10.07.2015 Views

Using R for Introductory Statistics : John Verzani


lwd= The thickness of lines. Numbers bigger than 1 increase the default.
col= Specifies the color to use for the points or lines.

3.3.2 The correlation between two variables

The correlation between two variables numerically describes whether larger- and smaller-than-average values of one variable are related to larger- or smaller-than-average values of the other variable.

Figure 3.9 shows two data sets: the scattered one on the left is weakly correlated; the one on the right, with a trend, is strongly correlated. We drew horizontal and vertical lines through $(\bar{x}, \bar{y})$, breaking the figure into four quadrants. The correlated data shows that larger-than-average values of the x variable are paired with larger-than-average values of the y variable, as these points are concentrated in the upper-right quadrant and not scattered throughout both right quadrants. The same holds for smaller-than-average values.

For the correlated data, the products $(x_i - \bar{x})(y_i - \bar{y})$ will tend to be positive, as this happens in both the upper-right and lower-left quadrants. This is not the case with the scattered data set. Because of this, the quantity $\sum_i (x_i - \bar{x})(y_i - \bar{y})$ will be useful in describing the correlation between two variables. When the data is uncorrelated, the terms will tend to cancel each other out; for correlated data they will not.

Figure 3.9 Two data sets with horizontal and vertical lines drawn through $(\bar{x}, \bar{y})$. The data set on the left shows weak correlation and data spread throughout the four quadrants of the plot. The data set on the right is strongly correlated, and the data is concentrated into opposite quadrants.

To produce a numeric summary that can be used to compare data sets, this sum is scaled by a term related to the product of the sample standard deviations. With this scaling, the
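
The sum of products of deviations described above can be sketched in R as follows. The vectors `x` and `y` are hypothetical illustrative data, not taken from the book, and the scaling by `(n - 1) * sd(x) * sd(y)` is an assumption: it is the usual Pearson scaling, which the excerpt only describes as "a term related to the product of the sample standard deviations".

```r
# Hypothetical illustrative data (not from the book)
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

# Products of the deviations from the two sample means;
# a product is positive when a point lies in the upper-right
# or lower-left quadrant around (mean(x), mean(y))
prods <- (x - mean(x)) * (y - mean(y))

# The unscaled quantity: the sum of these products
s <- sum(prods)

# Assumed scaling (standard Pearson form): divide by (n - 1)
# times the product of the sample standard deviations
n <- length(x)
r <- s / ((n - 1) * sd(x) * sd(y))

# Check against R's built-in Pearson correlation
all.equal(r, cor(x, y))
```

If the assumed scaling matches the one the text goes on to define, the final comparison should report agreement with `cor(x, y)`.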
