Chi-Square Test for Goodness of Fit - Department of Physics

Chi-Square Test for Goodness of Fit 

Scientists often use the Chi-square (χ²) test to determine the “goodness of fit” between

theoretical and experimental data. In this test, we compare observed values with theoretical or 

expected values. Observed values are those that the researcher obtains empirically through direct 

observation. The theoretical or expected values are developed on the basis of an established 

theory or a working hypothesis. For example, if we flip a coin 200 times, we might expect to
tally 100 heads and 100 tails. In checking our hypothesis, we might find only 92
heads and 108 tails. Should we reject the hypothesis that this coin is fair? Or should we just attribute the
difference between expected and observed frequencies to random fluctuation?

Consider a second example: let’s suppose that we have an unbiased, six-sided die. We 

roll this die 300 times and tally the number of times each side appears: 

Face    Frequency
1       42
2       55
3       38
4       57
5       64
6       44

Ideally, we might expect every side to appear 50 times. What should we conclude from these 

results? Is the die biased? 

Null Hypothesis 

The use of the chi-squared distribution in hypothesis testing follows this process: (1) a
null hypothesis H₀ is stated, (2) a test statistic is calculated, (3) the observed value of the test
statistic is compared to a critical value, and (4) a decision is made whether or not to reject the null
hypothesis. An attractive feature of the chi-squared goodness-of-fit test is that it can be applied

to any univariate distribution for which you can calculate the cumulative distribution function. 

The null hypothesis is a statement that is assumed true. It is rejected only when the data
indicate, with a level of statistical confidence that exceeds a pre-determined threshold (usually
95 %), that the null hypothesis is false. If experimental observations indicate that the null
hypothesis should be rejected, it means either that the hypothesis is indeed false or that the
measured data gave an improbable result suggesting that the hypothesis is false when it is really
true. This risk of a false rejection is an unavoidable property of statistical tests.

Calculating Chi-squared 

For the chi-square goodness-of-fit computation, the data are divided into k bins and the 

test statistic is defined as 

\[ \chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} \tag{7} \]

where O_i is the observed frequency for bin i and E_i is the expected frequency for bin i.
Chi-squared is never negative and may range from 0 to ∞.

The chi-squared goodness-of-fit test is applied to binned data (i.e., data put into classes).
This is not actually a restriction, since for non-binned data a histogram or frequency table can be
made before computing the chi-square statistic. However, the value of the chi-squared test
statistic depends on how the data are binned. There is no optimal choice for the bin width (since
the optimal bin width depends on the distribution), but most reasonable choices should produce
similar, though not identical, results. One method that may work is to choose bins that have a
width of s/3, with the lower and upper bins at the sample mean ± 6s, where s is the sample
standard deviation. Another disadvantage of the chi-square test is that it requires a sufficient
sample size for the chi-square approximation to be valid: the expected frequency in each bin
should be at least 5. The test is not valid for small samples, and if some of the expected counts
are less than five, you may need to combine some bins in the tails.

Let’s apply this now to the above examples: 

Table I: The Coin Toss Example 

Face     O      E      (O − E)² / E
Heads    92     100    0.64
Tails    108    100    0.64
Totals   200    200    χ² = 1.28

Table II: The 6-sided Die Example 

Face     O      E      (O − E)² / E
1        42     50     1.28
2        55     50     0.50
3        38     50     2.88
4        57     50     0.98
5        64     50     3.92
6        44     50     0.72
Totals   300    300    χ² = 10.28
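The arithmetic in Tables I and II can be checked with a few lines of Python. This is only a minimal sketch of equation 7; the function name chi_square is illustrative and is not taken from any library.

```python
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all bins, as in equation 7."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Table I: 200 coin tosses, 100 expected in each category
coin = chi_square([92, 108], [100, 100])

# Table II: 300 rolls of a die, 50 expected for each face
die = chi_square([42, 55, 38, 57, 64, 44], [50] * 6)

print(f"coin: chi-squared = {coin:.2f}")  # 1.28
print(f"die:  chi-squared = {die:.2f}")   # 10.28
```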

Degrees of Freedom 

We have seen how to calculate a value for chi-squared, but so far, it doesn’t have much 

meaning. The chi-square distribution is tabulated and available in most texts on statistics (and 

reprinted here). To use the table, one must know how many degrees of freedom df are associated 

with the number of categories in the sample data. This is because there is a family of chi-square 

distributions, each a function of the number of degrees of freedom. 

The number of degrees of freedom is typically equal to k − 1. For example, in the coin
example, the expected frequencies for each of the two categories (heads, tails) are not

independent. To obtain the expected frequency of tails (100), we need only subtract the expected 

frequency of heads (100) from the total frequency (200). Similarly, for the die example, there 

are six possible categories of outcomes: the occurrence of each of the faces. Under the 

assumption that the die is fair, we expect a frequency of 50 for each of the faces, but these again 

are not independent. Once the frequency count is known for five of the bins, the frequency of 

the sixth bin is determined, since the total count is 300. Thus, only the frequencies in five of the 

six bins are free to vary − leading to five degrees of freedom for this example.

Levels of Confidence 

A chi-square table, like Table III, lists critical values of the chi-squared distribution in terms of
df and in terms of the significance level α = 1 − P, where P is the level of confidence. This
chi-squared goodness-of-fit method is not without risk: the data may lead to the rejection of the
null hypothesis when in fact it is true. This is why we speak

of confidence. In the coin flip example, the null hypothesis is that the frequency of “heads” is 

equal to the frequency of “tails.” In the more general case, we do not require equal probability 

for each of the categories. There are many cases where an expected category will contain the 

majority of tally marks over all other categories (one such example would be a survey enquiring 

about the public’s choice in an upcoming presidential election that includes all candidates on the 

ballot). 

In Table III, the critical values of χ² are given for up to 20 degrees of freedom. Three
different percentile points in each distribution are given, for α = 0.10, 0.05, and 0.025. The

standard practice in the world of statistics is to use a 95 % level of confidence in the hypothesis 

decision making. Thus, if the value of chi-squared that is calculated indicates a value of p that is 

less than or equal to 0.05, then the null hypothesis should be rejected. In the coin-flip example, 

you can toss a coin and get 14 heads out of twenty flips and find p = 0.0577. This would indicate 

that such an observation can happen by chance and the coin can be considered a fair coin. Such 

a finding would be described by statisticians as “not statistically significant at the 5 % level.” If 

one found 15 heads out of 20 tosses, then p would be somewhat less than 0.05 and the coin 

would be considered biased. This would be described as “statistically significant at the 5 % 

level.” The significance level of the test is not determined by the p value. It is pre-determined 

by the experimenter. You can choose a 90 % level, a 95 % level, a 99 % level, etc. 
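The p values quoted for 14 and 15 heads in 20 tosses are consistent with an exact one-sided binomial tail probability rather than a chi-square calculation. That reading is an assumption, since the text does not say how they were obtained. A short sketch that reproduces them (the function name upper_tail is illustrative):

```python
from math import comb


def upper_tail(heads, flips):
    """P(at least `heads` heads in `flips` tosses of a fair coin)."""
    favourable = sum(comb(flips, k) for k in range(heads, flips + 1))
    return favourable / 2 ** flips


print(f"14 or more heads in 20: p = {upper_tail(14, 20):.4f}")  # 0.0577
print(f"15 or more heads in 20: p = {upper_tail(15, 20):.4f}")  # 0.0207
```

The first value lies just above 0.05 and the second below it, which matches the two verdicts given in the text.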

For the coin flip example, there is one degree of freedom, and the χ² for the experiment given
in Table I is only 1.28. This corresponds to p = 0.26, which is considerably greater than 0.05.
Therefore, the null hypothesis that the coin is fair cannot be rejected. The smaller the p-value, the
stronger the evidence that the null hypothesis should be rejected.

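In the case of the data in Table II for the die, the chi-square value is 10.28 with five degrees
of freedom, which corresponds to a 93 % confidence level (p ≈ 0.07). Because 10.28 is below the
critical value of 11.071 listed in Table III for α = 0.05, the null hypothesis cannot be rejected at
the 95 % level, and the die would be considered fair.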
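When a table is not at hand, the p value for a given χ² and whole-number df can be computed directly. The sketch below uses the closed-form series for the upper tail of the chi-squared distribution with integer degrees of freedom. The function name chi2_sf is illustrative, and only the Python standard library is assumed.

```python
from math import erfc, exp, gamma, sqrt


def chi2_sf(x, df):
    """Upper-tail probability P(chi-squared > x) for integer df >= 1."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        return exp(-half) * sum(half ** j / gamma(j + 1) for j in range(df // 2))
    # Odd df: erfc(sqrt(x/2)) plus a finite sum over half-integer powers
    series = sum(half ** (j - 0.5) / gamma(j + 0.5) for j in range(1, (df + 1) // 2))
    return erfc(sqrt(half)) + exp(-half) * series


print(f"coin: p = {chi2_sf(1.28, 1):.2f}")   # 0.26
print(f"die:  p = {chi2_sf(10.28, 5):.2f}")  # 0.07
```

Evaluating the same function at a tabulated critical value, for example chi2_sf(3.841, 1), returns approximately the corresponding α of 0.05.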

Why Is p = 0.05 or a 95 % Level of Confidence Used?

Long ago, before the wide-spread availability of computers, calculating p values was 

somewhat difficult, so the values were tabulated for people to interpolate the p values. The 

tables that were most commonly used were published by Ronald A. Fisher beginning in the 

1930s. These tables were subsequently reproduced in statistics books everywhere. In Fisher’s 

books, he argued the level of p = 0.05 as the measure of whether something significant is going 

on by stating, 

The value for p = 0.05 or 1 in 20 is 1.96 or nearly 2; it is convenient to 

take this point as a limit in judging whether a deviation ought to be 

considered significant or not. Deviations exceeding twice the standard 

deviation are thus formally regarded as significant. Using this criterion, 

we should be led to follow up a false indication only once in 22 trials, 

even if the statistics were the only guide available. 

Fisher continued his discussion in another part of his book,

If one in twenty does not seem high enough odds, we may, if we prefer it,
draw the line at one in fifty (the 2 % point), or one in a hundred (the 1 %
point). Personally, the writer prefers to set a low standard of significance
at the 5 percent point, and ignore entirely all results which fail to reach
this level. A scientific fact should be regarded as experimentally
established only if a properly designed experiment rarely fails to give this
level of significance.

Table III. The Chi-Square Distribution 

Critical values χ²_α

df    α = 0.10    α = 0.05    α = 0.025        df    α = 0.10    α = 0.05    α = 0.025
1     2.706       3.841       5.024            11    17.275      19.675      21.920
2     4.605       5.991       7.378            12    18.549      21.026      23.337
3     6.251       7.815       9.348            13    19.812      22.362      24.736
4     7.779       9.488       11.143           14    21.064      23.685      26.119
5     9.236       11.071      12.833           15    22.307      24.996      27.488
6     10.645      12.592      14.449           16    23.542      26.296      28.845
7     12.017      14.067      16.013           17    24.769      27.587      30.191
8     13.362      15.507      17.535           18    25.989      28.869      31.526
9     14.684      16.919      19.023           19    27.204      30.144      32.852
10    15.987      18.307      20.483           20    28.412      31.410      34.170

Fractional Uncertainty Revisited 

When a reported value is determined by taking the average of a set of independent readings,
the fractional uncertainty is given by the ratio of the uncertainty divided by the average
value. For this example,

\[ \text{fractional uncertainty} = \frac{\text{uncertainty}}{\text{average}} = \frac{0.05\ \text{cm}}{31.19\ \text{cm}} = 0.0016 \approx 0.002 \]

Note that the fractional uncertainty is dimensionless (the uncertainty in cm was divided 

by the average in cm). An experimental physicist might make the statement that this 

measurement "is good to about 1 part in 500" or "precise to about 0.2%."

The fractional uncertainty is also important because it is used in propagating uncertainty 

in calculations using the result of a measurement, as discussed in the next section. 

Propagation of Uncertainty 

Let's say we are given a functional relationship between several measured variables

(x, y, z), 

Q = f(x,y,z) 

What is the uncertainty in Q if the uncertainties in x, y, and z are known? 

To calculate the variance in Q = f(x,y) as a function of the variances in x and y we use 

the following: 

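\[ \sigma_Q^2 = \sigma_x^2 \left(\frac{\partial Q}{\partial x}\right)^2 + \sigma_y^2 \left(\frac{\partial Q}{\partial y}\right)^2 + 2\sigma_{xy}^2 \left(\frac{\partial Q}{\partial x}\right)\left(\frac{\partial Q}{\partial y}\right) \tag{8} \]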

If the variables x and y are uncorrelated (σ xy = 0), the last term in the above equation is zero. 

We can derive equation 8 as follows:

Assume we have several measurements of the quantities x (e.g. x_1, x_2, ..., x_N) and y (e.g. y_1,
y_2, ..., y_N). Then, the averages of x and y are

\[ \bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i \quad \text{and} \quad \bar{y} = \frac{1}{N}\sum_{i=1}^{N} y_i \]

Assume that the measured values are close to the average values, and let Q̄ be Q evaluated at
the average values, Q̄ = f(x̄, ȳ). Let Q_i = f(x_i, y_i). Now, expand Q_i about the average values.

\[ Q_i = f(\bar{x}, \bar{y}) + (x_i - \bar{x})\left(\frac{\partial Q}{\partial x}\right)_{\bar{x},\bar{y}} + (y_i - \bar{y})\left(\frac{\partial Q}{\partial y}\right)_{\bar{x},\bar{y}} + \text{higher order terms} \]

But, let’s take the difference and neglect the higher order terms: 

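\[ Q_i - \bar{Q} = (x_i - \bar{x})\left(\frac{\partial Q}{\partial x}\right)_{\bar{x},\bar{y}} + (y_i - \bar{y})\left(\frac{\partial Q}{\partial y}\right)_{\bar{x},\bar{y}} \]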

The variance is

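\[
\begin{aligned}
\sigma_Q^2 &= \frac{1}{N}\sum_{i=1}^{N} (Q_i - \bar{Q})^2 \\
&= \frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})^2 \left(\frac{\partial Q}{\partial x}\right)^2_{\bar{x},\bar{y}}
 + \frac{1}{N}\sum_{i=1}^{N} (y_i - \bar{y})^2 \left(\frac{\partial Q}{\partial y}\right)^2_{\bar{x},\bar{y}}
 + \frac{2}{N}\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y}) \left(\frac{\partial Q}{\partial x}\right)_{\bar{x},\bar{y}} \left(\frac{\partial Q}{\partial y}\right)_{\bar{x},\bar{y}} \\
&= \sigma_x^2 \left(\frac{\partial Q}{\partial x}\right)^2_{\bar{x},\bar{y}}
 + \sigma_y^2 \left(\frac{\partial Q}{\partial y}\right)^2_{\bar{x},\bar{y}}
 + 2 \left(\frac{\partial Q}{\partial x}\right)_{\bar{x},\bar{y}} \left(\frac{\partial Q}{\partial y}\right)_{\bar{x},\bar{y}} \frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})
\end{aligned}
\]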

Since the derivatives are evaluated at the average values ( x, y ) we can pull them out of the 

summation. 

Example: Power in an electric circuit is P = I²R.

Let I = 1.0 ± 0.1 A and R = 10.0 ± 1.0 Ω. 

Determine the power and its uncertainty using propagation of errors, 

assuming I and R are uncorrelated. 

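\[
\begin{aligned}
\sigma_P^2 &= \sigma_I^2 \left(\frac{\partial P}{\partial I}\right)^2_{I=1} + \sigma_R^2 \left(\frac{\partial P}{\partial R}\right)^2_{R=10} \\
&= \sigma_I^2 (2IR)^2 + \sigma_R^2 (I^2)^2 \\
&= (0.1)^2 (2 \times 1.0 \times 10.0)^2 + (1.0)^2 (1.0^2)^2 = 5\ \text{watts}^2
\end{aligned}
\]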

The uncertainty in the power is the square root of the variance, √5 ≈ 2.2 W, which rounds to 2 W.

P = I²R = 10 ± 2 W

If the true value of the power was 10.0 W, and we measured it many times
with an uncertainty σ = ± 2 W and Gaussian statistics apply, then 68% of
the measurements would lie in the range of 8 to 12 watts.
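The same calculation can be carried out numerically for any function of uncorrelated variables. The sketch below estimates each partial derivative with a central difference and adds the terms in quadrature, which is equation 8 with the covariance term set to zero. The helper name propagate and the step size are illustrative choices, not part of the original derivation.

```python
from math import sqrt


def propagate(f, values, sigmas, rel_step=1e-6):
    """Standard uncertainty of f(*values) for uncorrelated inputs."""
    variance = 0.0
    for i, (v, s) in enumerate(zip(values, sigmas)):
        h = rel_step * (abs(v) or 1.0)        # small step in variable i
        up = list(values)
        down = list(values)
        up[i] = v + h
        down[i] = v - h
        dfdx = (f(*up) - f(*down)) / (2 * h)  # central-difference derivative
        variance += (s * dfdx) ** 2           # sigma_i^2 * (df/dx_i)^2
    return sqrt(variance)


def power(current, resistance):
    return current ** 2 * resistance


sigma_p = propagate(power, [1.0, 10.0], [0.1, 1.0])
print(f"P = {power(1.0, 10.0):.1f} W, sigma_P = {sigma_p:.2f} W")
# P = 10.0 W, sigma_P = 2.24 W
```

Rounded to one significant figure, the result agrees with the 2 W found analytically.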

More Examples: 

In each of the following examples, the uncertainty and the fractional uncertainty are 

given. 

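(a) f = x + y

\[ \frac{\partial f}{\partial x} = 1 \quad \text{and} \quad \frac{\partial f}{\partial y} = 1 \]

\[ \sigma_f = \sqrt{\sigma_x^2 + \sigma_y^2} \quad \text{and} \quad \frac{\sigma_f}{f} = \frac{\sqrt{\sigma_x^2 + \sigma_y^2}}{x + y} \]

(b) f = xy

\[ \frac{\partial f}{\partial x} = y \quad \text{and} \quad \frac{\partial f}{\partial y} = x \]

\[ \sigma_f = \sqrt{y^2 \sigma_x^2 + x^2 \sigma_y^2} \quad \text{and} \quad \frac{\sigma_f}{f} = \sqrt{\frac{\sigma_x^2}{x^2} + \frac{\sigma_y^2}{y^2}} \]

(c) f = x / y

\[ \frac{\partial f}{\partial x} = \frac{1}{y} \quad \text{and} \quad \frac{\partial f}{\partial y} = -\frac{x}{y^2} \]

\[ \sigma_f = \sqrt{\frac{\sigma_x^2}{y^2} + \frac{x^2 \sigma_y^2}{y^4}} \quad \text{and} \quad \frac{\sigma_f}{f} = \sqrt{\frac{\sigma_x^2}{x^2} + \frac{\sigma_y^2}{y^2}} \]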

Note: the fractional uncertainty in f, as shown in (b) and (c) above, has the same form for 

multiplication and division: The fractional uncertainty in a product or quotient is the square root 

of the sum of the squares of the fractional uncertainty of each individual term, as long as the 

terms are not correlated. 

Example: Find the fractional uncertainty in v, where v = at with a = 9.8 ± 0.1 m/s² and t = 3.4 ±
0.1 s.

\[ \frac{\sigma_v}{v} = \sqrt{\frac{\sigma_a^2}{a^2} + \frac{\sigma_t^2}{t^2}} = \sqrt{\left(\frac{0.1}{9.8}\right)^2 + \left(\frac{0.1}{3.4}\right)^2} = 0.031 \text{ or } 3.1\% \]

Notice that since the relative uncertainty in t (2.9 %) is significantly greater than the relative 

uncertainty for a (1.0 %), the relative uncertainty in v is essentially the same as for t (about 3%). 
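The quadrature sum for a product is quick to verify numerically. The sketch below uses the values that appear in the worked calculation, 0.1/9.8 for a and 0.1/3.4 for t. The variable names are illustrative.

```python
from math import hypot

rel_a = 0.1 / 9.8   # fractional uncertainty in a
rel_t = 0.1 / 3.4   # fractional uncertainty in t

# For a product or quotient of uncorrelated terms, fractional
# uncertainties add in quadrature; hypot(a, b) is sqrt(a^2 + b^2).
rel_v = hypot(rel_a, rel_t)

print(f"a: {rel_a:.1%}, t: {rel_t:.1%}, v: {rel_v:.1%}")
# a: 1.0%, t: 2.9%, v: 3.1%
```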

Time-saving approximation: "A chain is only as strong as its weakest link." 

If one of the uncertainty terms is more than 3 times greater than the other terms, the root-squares
formula can be skipped, and the combined uncertainty is simply the largest uncertainty.
This shortcut can save a lot of time without losing any meaningful accuracy in the estimate of the
overall uncertainty.
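A quick check of the borderline case shows how little the shortcut gives up. Suppose, as an illustrative assumption, that there are only two terms and that the larger, σ_1, is exactly three times the smaller, σ_2:

\[ \sqrt{\sigma_1^2 + \sigma_2^2} = \sigma_1 \sqrt{1 + \left(\tfrac{1}{3}\right)^2} = \sigma_1 \sqrt{\tfrac{10}{9}} \approx 1.05\, \sigma_1 \]

Taking σ_1 alone therefore understates the combined uncertainty by only about 5 %, which is lost in rounding once the uncertainty is quoted to one significant figure.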

The Upper-Lower Bound Method of Uncertainty Propagation 

An alternative and sometimes simpler procedure to the tedious propagation of
uncertainty law is the upper-lower bound method of uncertainty propagation. This

alternative method does not yield a standard uncertainty estimate (with a 68% confidence 

interval), but it does give a reasonable estimate of the uncertainty for practically any situation. 

The basic idea of this method is to use the uncertainty ranges of each variable to calculate the 

maximum and minimum values of the function. You can also think of this procedure as 

examining the best and worst case scenarios. For example, if you took an angle measurement: θ 

= 25° ± 1° and you needed to find f = cos θ , then 

f_min = cos(26°) = 0.8988        f_max = cos(24°) = 0.9135

f ≈ 0.906 ± 0.007 

Note that even though θ was only measured to 2 significant figures, f is known to 3 figures.
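The procedure for a single measured variable can be sketched as follows. The helper name upper_lower_bound is illustrative, and the sketch assumes the function is monotonic over the uncertainty range, so that its extremes occur at the two ends.

```python
from math import cos, radians


def upper_lower_bound(f, value, uncertainty):
    """Evaluate f at both ends of the range; return (midpoint, half-range)."""
    a = f(value - uncertainty)
    b = f(value + uncertainty)
    f_max, f_min = max(a, b), min(a, b)
    return (f_max + f_min) / 2, (f_max - f_min) / 2


best, unc = upper_lower_bound(lambda deg: cos(radians(deg)), 25.0, 1.0)
print(f"f = {best:.3f} ± {unc:.3f}")  # f = 0.906 ± 0.007
```

Because cosine decreases over this range, the larger angle gives the smaller value of f. Taking max and min explicitly avoids having to track which end is which.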

The uncertainty estimate from the upper-lower bound method is generally larger than the
standard uncertainty estimate found from the propagation of uncertainty law, because it
assumes that every variable sits at the extreme of its range at the same time.

The upper-lower bound method is especially useful when the functional relationship is
not clear or is incomplete. One practical application is forecasting the expected range in an
expense budget. In this case, some expenses may be fixed, while others may be uncertain, and
the range of these uncertain terms could be used to predict the upper and lower bounds on the
total expense.

Use of Significant Figures for Simple Propagation of Uncertainty

By following a few simple rules, significant figures can be used to find the appropriate
precision for a calculated result for the four most basic math functions, all without the use of
complicated formulas for propagating uncertainties.

For multiplication and division, the number of significant figures that are reliably known 

in a product or quotient is the same as the smallest number of significant figures in any of the 

original numbers. 

Example: 

       6.6       (2 significant figures)
  × 7328.7       (5 significant figures)
  48369.42 = 4.8 × 10⁴ (2 significant figures)
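Rounding to a given number of significant figures is easy to automate. A minimal sketch, in which the helper name round_sig is illustrative:

```python
from math import floor, log10


def round_sig(x, figures):
    """Round x to the given number of significant figures."""
    if x == 0:
        return 0.0
    # Position of the leading digit decides how many decimals to keep
    return round(x, figures - 1 - floor(log10(abs(x))))


product = 6.6 * 7328.7
print(round_sig(product, 2))  # 48000.0, i.e. 4.8 × 10^4
```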

For addition and subtraction, the result should be rounded off to the last decimal place
reported for the least precise number.

Examples:

    223.64          5560.5
  +  54           +    0.008
    278             5560.5

If a calculated number is to be used in further calculations, it is good practice to keep at
least one extra digit to reduce rounding errors that may accumulate. Then the final
answer should be rounded according to the above guidelines.

Uncertainty and Significant Figures

For the same reason that it is dishonest to report a result with more significant figures
than are reliably known, the uncertainty value should also not be reported with excessive
precision.

For example, if we measure the density of copper, it would be unreasonable to report a
result like:

measured density = 8.93 ± 0.4753 g/cm³ WRONG!

The uncertainty in the measurement cannot be known to that precision. In most
experimental work, the confidence in the uncertainty estimate is not much better than about ±
50% because of all the various sources of error, none of which can be known
exactly. Therefore, to be consistent with this large uncertainty in the uncertainty (!),
the uncertainty value should be stated to only one significant figure (or perhaps 2 sig. figs. if the
first digit is a 1). Experimental uncertainties should be rounded to one (or at most two)
significant figures, so the uncertainty in the above result should be rounded to ± 0.5 g/cm³.

To help give a sense of the amount of confidence that can be placed in the standard 

deviation, Table IV indicates the relative uncertainty associated with the standard deviation for 

various sample sizes. Note that in order for an uncertainty value to be reported to 3 significant 

figures, more than 10 000 readings would be required to justify this degree of precision! 

When an explicit uncertainty estimate is made, the uncertainty term indicates how many 

significant figures should be reported in the measured value (not the other way around!). For 

example, the uncertainty in the density measurement above is about 0.5 g/cm³, so this tells us

that the digit in the tenths place is uncertain, and should be the last one reported. The other digits 

in the hundredths place and beyond are insignificant, and should not be reported: 

measured density = 8.9 ± 0.5 g/cm³ RIGHT!

An experimental value should be rounded to an appropriate number of significant figures 

consistent with its uncertainty. This generally means that the last significant figure in any 

reported measurement should be in the same decimal place as the uncertainty. 
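The rounding rule can be expressed as a short helper. This sketch always keeps one significant figure in the uncertainty, ignoring the optional second figure when the leading digit is 1. The name round_to_uncertainty is illustrative.

```python
from math import floor, log10


def round_to_uncertainty(value, uncertainty):
    """Round the uncertainty to one significant figure and the value
    to the same decimal place."""
    decimals = -floor(log10(abs(uncertainty)))
    return round(value, decimals), round(uncertainty, decimals)


print(round_to_uncertainty(8.93, 0.4753))  # (8.9, 0.5)
```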

In most instances, this practice of rounding an experimental result to be consistent with 

the uncertainty estimate gives the same number of significant figures as the rules discussed 

earlier for simple propagation of uncertainties for adding, subtracting, multiplying, and dividing. 

Caution: When conducting an experiment, it is important to keep in mind that precision is 

expensive (both in terms of time and material resources). Do not waste your time trying to 

obtain a precise result when only a rough estimate is required. The cost increases exponentially 

with the amount of precision required, so the potential benefit of this precision must be weighed 

against the extra cost. 

Table IV. Relative Uncertainty Associated with the Standard Deviation for Various Sample 

Sizes 

N        Relative Uncert.*    Sig. Figs. Valid    Implied Uncertainty
2        71%                  1                   ± 10% to 100%
3        50%                  1                   ± 10% to 100%
4        41%                  1                   ± 10% to 100%
5        35%                  1                   ± 10% to 100%
10       24%                  1                   ± 10% to 100%
20       16%                  1                   ± 10% to 100%
30       13%                  1                   ± 10% to 100%
50       10%                  2                   ± 1% to 10%
100      7%                   2                   ± 1% to 10%
10000    0.7%                 3                   ± 0.1% to 1%

*The relative uncertainty is given by the approximate formula: 

\[ \frac{\sigma_\sigma}{\sigma} = \frac{1}{\sqrt{2(N-1)}} \]
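The "Relative Uncert." column of Table IV follows directly from this approximate formula. A short sketch that regenerates it (the function name is illustrative):

```python
from math import sqrt


def relative_uncertainty_of_sigma(n):
    """Approximate relative uncertainty of a standard deviation
    estimated from n readings: 1 / sqrt(2 (n - 1))."""
    return 1 / sqrt(2 * (n - 1))


for n in (2, 3, 4, 5, 10, 20, 30, 50, 100, 10000):
    print(f"{n:>6}  {relative_uncertainty_of_sigma(n):.1%}")
```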

Combining and Reporting Uncertainties 

In 1993, the International Organization for Standardization (ISO) published the first official
worldwide Guide to the Expression of Uncertainty in Measurement. Before this time, uncertainty
estimates were evaluated and reported according to different conventions depending on the
context of the measurement or the scientific discipline. Here are a few key points from this
100-page guide, which can be found in modified form on the NIST website (see References).

When reporting a measurement, the measured value should be reported along with an 

estimate of the total combined standard uncertainty of the value. The total uncertainty is found 

by combining the uncertainty components based on the two types of uncertainty analysis: 

Type A evaluation of standard uncertainty – method of evaluation of uncertainty by 

the statistical analysis of a series of observations. This method primarily includes random 

errors. 

Type B evaluation of standard uncertainty – method of evaluation of uncertainty by 

means other than the statistical analysis of series of observations. This method includes 

systematic errors and any other uncertainty factors that the experimenter believes are 

important. 

The individual uncertainty components should be combined using the law of propagation 

of uncertainties, commonly called the "root-sum-of-squares" or "RSS" method. When this is 

done, the combined standard uncertainty should be equivalent to the standard deviation of the 

result, making this uncertainty value correspond with a 68% confidence interval. If a wider 

confidence interval is desired, the uncertainty can be multiplied by a coverage factor (usually k 

= 2 or 3) to provide an uncertainty range that is believed to include the true value with a 

confidence of 95% or 99.7% respectively. If a coverage factor is used, there should be a clear 

explanation of its meaning so there is no confusion for readers interpreting the significance of the 

uncertainty value. 
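As a sketch of the root-sum-of-squares combination and the coverage factor, the snippet below combines one Type A and one Type B component. The numerical values are hypothetical and chosen only to make the arithmetic easy to follow. The function names are illustrative.

```python
from math import sqrt


def combined_standard_uncertainty(components):
    """Root-sum-of-squares of uncorrelated standard uncertainties."""
    return sqrt(sum(u ** 2 for u in components))


def expanded_uncertainty(u_c, k=2):
    """Multiply the combined standard uncertainty by coverage factor k."""
    return k * u_c


type_a = 0.03  # hypothetical: standard error from repeated readings
type_b = 0.04  # hypothetical: estimate of an instrument's systematic effect

u_c = combined_standard_uncertainty([type_a, type_b])
print(f"u_c = {u_c:.2f} (about 68 %), U = {expanded_uncertainty(u_c):.2f} (k = 2, about 95 %)")
# u_c = 0.05 (about 68 %), U = 0.10 (k = 2, about 95 %)
```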

You should be aware that the ± uncertainty notation may be used to indicate different 

confidence intervals, depending on the scientific discipline or context. For example, a public 

opinion poll may report that the results have a margin of error of ± 3%, which means that 

readers can be 95% confident (not 68% confident) that the reported results are accurate within 3 

percentage points. In physics, the same average result would be reported with an uncertainty of ±

1.5% to indicate the 68% confidence interval. 

Conclusion: "When do measurements agree with each other?" 

We now have the resources to answer the fundamental scientific question that was asked 

at the beginning of this error analysis discussion: "Does my result agree with a theoretical 

prediction or results from other experiments?" 

Generally speaking, a measured result agrees with a theoretical prediction if the 

prediction lies within the range of experimental uncertainty. Similarly, if two measured values 

have standard uncertainty ranges that overlap, then the measurements are said to be consistent 

(they agree). If the uncertainty ranges do not overlap, then the measurements are said to be 

discrepant (they do not agree). However, you should recognize that this overlap criterion can give

two opposite answers depending on the evaluation and confidence level of the uncertainty. It 

would be unethical to arbitrarily inflate the uncertainty range just to make the measurement agree 

with an expected value. A better procedure would be to discuss the size of the difference 

between the measured and expected values within the context of the uncertainty, and try to 

discover the source of the discrepancy if the difference is truly significant. 
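The overlap test, and its dependence on the chosen confidence level, can be sketched in a few lines. The measurement values below are hypothetical, and the function name ranges_overlap is illustrative.

```python
def ranges_overlap(a, sigma_a, b, sigma_b, k=1):
    """True if [a - k*sigma_a, a + k*sigma_a] and
    [b - k*sigma_b, b + k*sigma_b] overlap."""
    return abs(a - b) <= k * (sigma_a + sigma_b)


# Hypothetical measured value 9.6 ± 0.1 compared with a reference 9.81 ± 0.01
print(ranges_overlap(9.6, 0.1, 9.81, 0.01, k=1))  # False
print(ranges_overlap(9.6, 0.1, 9.81, 0.01, k=2))  # True
```

The same pair of values is discrepant at k = 1 and consistent at k = 2, which is why the coverage factor should always be stated alongside the comparison.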

References 

Taylor, John. An Introduction to Error Analysis, 2nd ed. University Science Books:
Sausalito, 1997.

Baird, D. C. Experimentation: An Introduction to Measurement Theory and Experiment
Design, 3rd ed. Prentice Hall: Englewood Cliffs, 1995.

Bevington, Philip and Robinson, D. Data Reduction and Error Analysis for the Physical
Sciences, 2nd ed. McGraw-Hill: New York, 1991.

Fisher, R. A. Statistical Methods for Research Workers. Oliver & Boyd Publishers, 1958.

ISO. Guide to the Expression of Uncertainty in Measurement. International Organization for 

Standardization (ISO) and the International Committee on Weights and Measures 

(CIPM): Switzerland, 1993. 

NIST. Guidelines for Evaluating and Expressing the Uncertainty of NIST Measurement 

Results, 1994. Available online: http://physics.nist.gov/Pubs/guidelines/contents.html 

Portions of this document on measurements and error were modified from a document originally prepared by

The University of North Carolina at Chapel Hill, Department of Physics and Astronomy
