10.07.2015 Views

Using R for Introductory Statistics : John Verzani



> var(test.scores.b) # larger, as anticipated
[1] 394.2
> sd(test.scores)
[1] 4.967

Quantiles, quintiles, percentiles, and more

The standard deviation, like the mean, can be skewed when an exceptionally large or small value is in the data. More resistant alternatives are available. A conceptually simple one (the IQR) is to take the range of the middle 50% of the data. That is, trim off 25% of the data from the left and right, and then take the range of what is remaining.

To be precise, we need to generalize the concept of the median. The median splits the data in half: half smaller than the median and half bigger. The quantiles generalize this. The pth quantile is at position 1+p(n−1) in the sorted data. When this is not an integer, a weighted average is used. † This value essentially splits the data so 100p% is smaller and 100(1−p)% is larger. Here p ranges from 0 to 1. The median then is the 0.5 quantile.

The percentiles do the same thing, except that a scale of 0 to 100 is used, instead of 0 to 1. The term quartiles refers to the 0, 25, 50, 75, and 100 percentiles, and the term quintiles refers to the 0, 20, 40, 60, 80, and 100 percentiles.

The quantile() function returns the quantiles. This function is called with the data vector and a value (or values) for p. We illustrate on a very simple data set, for which the answers are easily guessed.

> x = 0:5 # 0,1,2,3,4,5
> length(x)
[1] 6
> sum(sort(x)[3:4])/2 # the median the hard way
[1] 2.5
> median(x) # easy way. Clearly the middle
[1] 2.5
> quantile(x,.25)
 25%
1.25
> quantile(x,c(0.25,0.5,0.75)) # more than 1 at a time
 25%  50%  75%
1.25 2.50 3.75
> quantile(x) # default gives quartiles
  0%  25%  50%  75% 100%
0.00 1.25 2.50 3.75 5.00

† There are other definitions used for the pth quantile implemented in the quantile() function. These alternatives are specified with the type= argument. The default is type 7. See ?quantile for the details.

■ Example 2.7: Executive pay The exec.pay (UsingR) data set contains compensation to CEOs of 199 U.S. companies in the year 2000 in units of $10,000. The data is not
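Returning to the quantile definition given earlier: the position rule 1+p(n−1) and its weighted average can be checked by hand. The following is only a minimal sketch of that rule, assuming the default type 7 definition. The helper name quantile_by_position is hypothetical (it is not part of R), and IQR() is the base R function used for comparison.

```r
# Hand-rolled sketch of the position rule 1 + p(n - 1) (the type 7 default).
# quantile_by_position is a hypothetical helper name, not an R built-in.
quantile_by_position <- function(x, p) {
  x <- sort(x)
  n <- length(x)
  pos <- 1 + p * (n - 1)        # position in the sorted data
  lo <- floor(pos)              # neighbor at or below the position
  hi <- ceiling(pos)            # neighbor at or above the position
  w <- pos - lo                 # weight given to the upper neighbor
  (1 - w) * x[lo] + w * x[hi]   # weighted average of the two neighbors
}

x <- 0:5
q1 <- quantile_by_position(x, 0.25)  # position 2.25, between 1 and 2: 1.25
q3 <- quantile_by_position(x, 0.75)  # position 4.75, between 3 and 4: 3.75
iqr_by_hand <- q3 - q1               # range of the middle 50%: 2.5

# Compare against the built-in functions
all.equal(q1, unname(quantile(x, 0.25)))
all.equal(iqr_by_hand, IQR(x))
```

For x = 0:5 the 0.25 quantile sits at position 1+0.25(5) = 2.25, a quarter of the way from the second sorted value (1) to the third (2), which gives the 1.25 reported by quantile(x,.25).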
