Using R for Introductory Statistics : John Verzani

correlation only involves the respective z-scores, and the quantity is always between −1 and 1. When there is a linear relationship between x and y, values of r² close to 1 indicate a strong linear relationship, and values close to 0 indicate a weak one. (Sometimes r may be close to 0, but a different type of relationship holds.)

The Pearson correlation coefficient

The Pearson correlation coefficient, r, of two data vectors x and y is defined by

$$ r = \mathrm{cor}(x, y) = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) \tag{3.1} $$

where n is the number of data pairs, x̄ and ȳ are the sample means, and s_x and s_y are the sample standard deviations. The value of r is between −1 and 1. In R this is found with the cor() function, as in cor(x, y).

We look at the correlations for the three data sets just discussed. First we attach the variable names, as they have been previously detached.

> attach(homedata); attach(maydow); attach(kid.weights)

In Example 3.2, on Maplewood home values, we saw a nearly linear relationship between the 1970 assessed values and the 2000 ones. The correlation in this case is

> cor(y1970,y2000)
[1] 0.9111

In Example 3.3, where the temperature's influence on the Dow Jones average was considered, no trend was discernible. The correlation in this example is

> cor(max.temp[-1],diff(DJA))
[1] 0.01029

In the height-and-weight example, the correlation is

> cor(height,weight)
[1] 0.8238

The number is close to 1, but we have our doubts that a linear relationship is a correct description.

The Spearman rank correlation

If the relationship between the variables is not linear but is increasing, as with the apparent curve in the height-and-weight data set, we can still use the correlation coefficient to understand the strength of the relationship. Rather than use the raw data for the calculation, we use the ranked data. That is, the data is ordered from smallest to largest, and a data point's rank is its position after sorting, with 1 being the smallest and n the largest.
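Formula (3.1) says Pearson's r is an average of products of z-scores, and this is easy to check numerically. The following is a minimal sketch, not code from the book. The helper pearson_r() and the toy vectors x and y are hypothetical, and only base R functions (mean(), sd(), sum(), length(), cor()) are used. Because sd() divides by n − 1, these z-scores use the same s_x and s_y as the formula.

```r
# Sketch of formula (3.1): r as the scaled sum of products of z-scores.
# pearson_r() is a hypothetical helper for illustration only.
pearson_r <- function(x, y) {
  n <- length(x)
  zx <- (x - mean(x)) / sd(x)   # z-scores of x (sd() uses the n - 1 divisor)
  zy <- (y - mean(y)) / sd(y)   # z-scores of y
  sum(zx * zy) / (n - 1)        # average product, dividing by n - 1
}

# Hypothetical toy data, not one of the book's data sets
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)
pearson_r(x, y)   # about 0.7746
cor(x, y)         # built-in version; agrees up to rounding
```

With the data sets attached as above, pearson_r(y1970, y2000) should reproduce the value reported by cor(y1970,y2000), up to floating-point rounding.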
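The remark that r can be near 0 even though a different type of relationship holds can be seen with a symmetric, exactly quadratic pattern. This is a hypothetical toy example, not taken from the book. Here y is completely determined by x, yet the positive and negative products in the sum cancel, so r (and therefore r²) is essentially 0.

```r
# Sketch: an exact but non-linear (symmetric quadratic) relationship
# can still have a Pearson correlation of essentially zero.
x <- -5:5
y <- x^2          # y depends on x exactly, but not linearly
r <- cor(x, y)
r                 # zero up to floating-point rounding
r^2               # so r^2 is also about 0, despite the exact dependence
```

A scatterplot, for example plot(x, y), shows the dependence immediately. This is one reason to look at the data as well as at r.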
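Replacing raw values by their ranks and then applying the ordinary correlation can be sketched in base R with rank() and cor(). The vectors below are hypothetical. cor() also accepts method = "spearman", which should give the same value as correlating the ranks directly. By default, rank() gives tied values the average of their positions.

```r
# Sketch: Spearman's rank correlation is the Pearson correlation
# computed on ranks instead of on the raw values.
x <- c(1, 2, 3, 4, 5, 6)
y <- exp(x)                      # increasing, but strongly curved
cor(x, y)                        # Pearson on raw data: below 1
cor(rank(x), rank(y))            # Pearson on ranks: 1, ranks agree exactly
cor(x, y, method = "spearman")   # built-in Spearman option, same value
```

With kid.weights attached, cor(height, weight, method = "spearman") gives the rank-based counterpart of the 0.8238 computed above.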