
208 Part IV: Quality Assurance

Variance

One of the most interesting pieces of information associated with any data set is how the values in the data set vary from each other. A very important and widely used measure of dispersion is the variance, which focuses on how far the observations within a data set deviate from their mean and thus gives an overall understanding about the variability of the observations within the data set.

For example, if the values in the data set are $x_1, x_2, x_3, \ldots, x_n$ and the mean is $\bar{x}$, then $x_1 - \bar{x}, x_2 - \bar{x}, x_3 - \bar{x}, \ldots, x_n - \bar{x}$ are the deviations from the mean. It is then natural to sum these deviations and argue that if the sum is large, the values differ greatly from each other, and if it is small, they do not. Unfortunately, this argument does not hold, since the sum of the deviations is always zero, no matter how much the values in the data set differ from each other. This is true because the positive and negative deviations cancel exactly when summed: $\sum (x_i - \bar{x}) = \sum x_i - n\bar{x} = 0$. To avoid this cancellation we can square the deviations and then take their sum. The variance is then the average of the squared deviations from the mean $\bar{x}$. If the data set represents a population, then the deviations are taken from the population mean $\mu$. Thus, the population variance, denoted by $\sigma^2$ (read as sigma squared), is defined as

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(X_i - \mu)^2. \tag{18.5}$$

And the sample variance, denoted by $s^2$, is defined as:

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2. \tag{18.6}$$
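The definitions in equations (18.5) and (18.6) can be checked numerically with a short sketch. The data values below are hypothetical and the function names are chosen only for illustration. The sketch shows that the raw deviations sum to zero while the squared deviations do not, and that the two variances differ only in the divisor.

```python
def deviations(values):
    """Return each value's deviation from the mean of the data set."""
    center = sum(values) / len(values)
    return [x - center for x in values]


def population_variance(values):
    """Equation (18.5): sum of squared deviations divided by N."""
    return sum(d ** 2 for d in deviations(values)) / len(values)


def sample_variance(values):
    """Equation (18.6): sum of squared deviations divided by n - 1."""
    if len(values) < 2:
        raise ValueError("sample variance needs at least two observations")
    return sum(d ** 2 for d in deviations(values)) / (len(values) - 1)


# Hypothetical observations; their mean is 6.
data = [4, 8, 6, 2, 10]

print(sum(deviations(data)))      # 0.0 (positive and negative deviations cancel)
print(population_variance(data))  # 8.0  (40 / 5)
print(sample_variance(data))      # 10.0 (40 / 4)
```

Treating the same five values first as a population and then as a sample is done here only to contrast the divisors $N$ and $n - 1$; in practice a data set is one or the other.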

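For computational purposes we give below the simplified forms of the above formulas for population and sample variances:

$$\sigma^2 = \frac{1}{N}\left[\sum X_i^2 - \frac{\left(\sum X_i\right)^2}{N}\right] \tag{18.7}$$

$$s^2 = \frac{1}{n-1}\left[\sum X_i^2 - \frac{\left(\sum X_i\right)^2}{n}\right] \tag{18.8}$$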
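As a rough check, the simplified forms (18.7) and (18.8) can be compared with Python's standard `statistics.pvariance` and `statistics.variance`, which compute the population and sample variance. The data values and the function names `population_variance_shortcut` and `sample_variance_shortcut` are hypothetical, chosen only for this sketch.

```python
import math
import statistics


def population_variance_shortcut(values):
    """Equation (18.7): needs only the sum and the sum of squares."""
    n = len(values)
    total = sum(values)
    total_sq = sum(x * x for x in values)
    return (total_sq - total ** 2 / n) / n


def sample_variance_shortcut(values):
    """Equation (18.8): same bracket as (18.7), divided by n - 1."""
    n = len(values)
    total = sum(values)
    total_sq = sum(x * x for x in values)
    return (total_sq - total ** 2 / n) / (n - 1)


# Hypothetical observations: sum = 30, sum of squares = 220,
# so the bracketed term is 220 - 900 / 5 = 40.
data = [4, 8, 6, 2, 10]

print(population_variance_shortcut(data))  # 8.0
print(sample_variance_shortcut(data))      # 10.0

# Both shortcuts agree with the standard-library results.
print(math.isclose(population_variance_shortcut(data), statistics.pvariance(data)))
print(math.isclose(sample_variance_shortcut(data), statistics.variance(data)))
```

The simplified forms are convenient for hand calculation because they need only two running totals. In floating-point arithmetic they can lose precision when the mean is large relative to the spread, so the definitional forms are often the safer choice in software.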

Note that one difficulty in using the variance as a measure of dispersion is that it is not expressed in the same units as the data values. Rather, the variance is expressed in the square of those units. For example, if the data values are dollar amounts, then the variance is expressed in squared dollars, a unit that has no practical meaning. Therefore, for application purposes, we define another measure of dispersion, called the standard deviation, that is directly related to the variance. The standard deviation is measured in the same units as the data values.
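The unit problem can be illustrated with a small sketch. It assumes the usual definition of the standard deviation as the positive square root of the variance, which this passage has not yet stated, and it uses hypothetical dollar amounts. Converting the same amounts to cents scales the variance by 100 squared but scales the standard deviation by only 100, just like the data.

```python
import math
import statistics

# Hypothetical dollar amounts; their mean is 13.
amounts_dollars = [12.0, 15.0, 9.0, 18.0, 11.0]
amounts_cents = [100 * x for x in amounts_dollars]

# Sample variance (divisor n - 1) and its positive square root.
variance = statistics.variance(amounts_dollars)
std_dev = math.sqrt(variance)

print(f"variance: {variance:.2f} squared dollars")  # variance: 12.50 squared dollars
print(f"std dev:  {std_dev:.2f} dollars")           # std dev:  3.54 dollars

# Changing units from dollars to cents multiplies the data by 100.
variance_cents = statistics.variance(amounts_cents)
std_dev_cents = math.sqrt(variance_cents)

print(variance_cents / variance)  # roughly 10000: variance scales with the square
print(std_dev_cents / std_dev)    # roughly 100: standard deviation scales like the data
```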
