
Part IV: Quality Assurance

Types of defects          C      A      B      D      E
Weighted frequencies   75.0   50.0   45.0   24.0   12.5
Percent                36.3   24.2   21.8   11.6    6.1
Cumulative %           36.3   60.5   82.3   93.9  100.0

Figure 18.13 Pareto chart when weighted frequencies are used.

effective way to investigate for such an association is to prepare a graph by plotting one variable along the horizontal scale (x-axis) and the second variable along the vertical scale (y-axis). Each pair of observations (x, y) is then plotted as a point in the xy-plane. The graph prepared is called a scatter plot. A scatter plot is a very useful graphical tool because it depicts the nature and strength of associations between two variables. To illustrate, we consider the following example.

EXAMPLE 18.22

The cholesterol levels and the systolic blood pressures of 30 randomly selected U.S. males in the age group of 40 to 50 years are given in Table 18.8. Construct a scatter plot of these data and determine if there is any association between the cholesterol levels and systolic blood pressures.

Solution:

Figure 18.14 shows the scatter plot of the data in Table 18.8. This scatter plot clearly indicates that there is a fairly good upward linear trend. Also, if we draw a straight line through the data points, we can see that the data points are concentrated around the straight line within a narrow band. The upward trend indicates a positive association between the two variables whereas the width of the band indicates the strength of the association, which in this case is very strong. As the association between the two variables gets stronger and stronger, the band enclosing the plotted points becomes narrower and narrower. A downward trend indicates a negative relationship between two variables. A numerical measure of association between two numerical variables is called the Pearson or simply the correlation coefficient, named after the English statistician


Chapter 18: A. Basic Statistics and Applications

Table 18.8 Cholesterol levels and systolic blood pressures of 30 randomly selected U.S. males.

Subject            1    2    3    4    5    6    7    8    9   10
Cholesterol (x)  195  180  220  160  200  220  200  183  139  155
Systolic BP (y)  130  128  138  122  140  148  142  127  116  123

Subject           11   12   13   14   15   16   17   18   19   20
Cholesterol (x)  153  164  171  143  159  167  162  165  178  145
Systolic BP (y)  119  130  128  120  121  124  118  121  124  115

Subject           21   22   23   24   25   26   27   28   29   30
Cholesterol (x)  245  198  156  175  171  167  142  187  158  142
Systolic BP (y)  145  126  122  124  117  122  112  131  122  120

Figure 18.14 Scatter plot for the data in Table 18.8. (Horizontal axis: x; vertical axis: y; annotation on the plot: r = .891.)

Karl Pearson (1857–1936). The correlation coefficient between two numerical variables in sample data is usually denoted by r. The Greek letter ρ (rho) denotes the corresponding measure of association, that is, the correlation coefficient for a population of data.

The correlation coefficient is defined as

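\[
r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \, \sum (y_i - \bar{y})^2}}
  = \frac{\sum x_i y_i - \dfrac{\left(\sum x_i\right)\left(\sum y_i\right)}{n}}{\sqrt{\left[\sum x_i^2 - \dfrac{\left(\sum x_i\right)^2}{n}\right]\left[\sum y_i^2 - \dfrac{\left(\sum y_i\right)^2}{n}\right]}}. \tag{18.16}
\]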
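
As a numerical check, the two forms of equation (18.16) can be evaluated on the data of Table 18.8. The following is a minimal Python sketch, not part of the original text: the function and variable names are illustrative assumptions, and the data lists are transcribed by hand from the table. Both forms should agree with each other and give a value close to the r = .891 reported in Figure 18.14; small differences may arise from rounding or transcription.

```python
import math

# Data transcribed from Table 18.8 (subjects 1 to 30).
cholesterol = [195, 180, 220, 160, 200, 220, 200, 183, 139, 155,
               153, 164, 171, 143, 159, 167, 162, 165, 178, 145,
               245, 198, 156, 175, 171, 167, 142, 187, 158, 142]
systolic_bp = [130, 128, 138, 122, 140, 148, 142, 127, 116, 123,
               119, 130, 128, 120, 121, 124, 118, 121, 124, 115,
               145, 126, 122, 124, 117, 122, 112, 131, 122, 120]


def pearson_r_deviation(x, y):
    """First form of (18.16): sums of deviations from the sample means."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def pearson_r_computational(x, y):
    """Second form of (18.16): raw sums only, no means needed."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    s_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
    s_xx = sum(xi ** 2 for xi in x) - sum_x ** 2 / n
    s_yy = sum(yi ** 2 for yi in y) - sum_y ** 2 / n
    return s_xy / math.sqrt(s_xx * s_yy)


r_dev = pearson_r_deviation(cholesterol, systolic_bp)
r_comp = pearson_r_computational(cholesterol, systolic_bp)
print(f"deviation form:     r = {r_dev:.3f}")
print(f"computational form: r = {r_comp:.3f}")
```

The second form is often more convenient for hand calculation because it needs only the five running totals (sum of x, sum of y, sum of xy, sum of x squared, sum of y squared), whereas the first form requires computing the means before the deviations.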
