10.07.2015

Using R for Introductory Statistics : John Verzani

symmetrically distributed, as a stem-and-leaf plot will show. Let’s use the quantile() function to look at the data. (exec.pay records pay in units of $10,000, so a value of 100 means $1 million.)

    > sum(exec.pay > 100)/length(exec.pay)   # proportion more
    [1] 0.09045                              # 9% make more than 1 million
    > quantile(exec.pay,0.9)                 # 914,000 dollars is the 90th percentile
     90%
    91.4
    > quantile(exec.pay,0.99)                # 9 million is the top 1 percentile
      99%
    906.6
    > sum(exec.pay …
    > quantile(exec.pay,.10)                 # the 10th percentile is 90,000
    10%
      9

Quantiles versus proportions
For a data vector x we can ask two related but inverse questions: what proportion of the data is less than or equal to a specified value? Or, for a specified proportion, what value has this proportion of the data less than or equal to it? The latter question is answered by the quantile() function.

The inter-quartile range
Returning to the idea of the middle 50% of the data, this would be the distance between the 75th percentile and the 25th percentile. This is known as the interquartile range (IQR) and is found in R with the IQR() function. For the executive pay data the IQR is

    > IQR(exec.pay)
    [1] 27.5

whereas, for comparison, the standard deviation is

    > sd(exec.pay)
    [1] 207.0

This is much bigger, because the largest values of exec.pay are far larger than the others and inflate the standard deviation.

z-scores
The z-score of a value is the number of standard deviations the value is from the sample mean of the data set. That is,

$$z_i = \frac{x_i - \bar{x}}{s},$$

where $\bar{x}$ is the sample mean and $s$ is the sample standard deviation.
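The two inverse questions from "Quantiles versus proportions" can be sketched in R as follows. This is a minimal illustration on a hypothetical toy vector `x` (not the exec.pay data). It asks "what proportion is at most 12?" and "what value has half the data at or below it?", and it uses `ecdf()` to show that the proportion function and `quantile(type = 1)` undo each other. R's default quantile rule interpolates, so it can land between data values.

```r
# Hypothetical toy data (NOT the exec.pay data set used in the text);
# one large value mimics a right-skewed pay distribution.
x <- c(2, 5, 7, 9, 12, 15, 20, 31, 48, 250)

# Question 1: what proportion of the data is <= a given value?
# mean() of a logical vector equals sum(...)/length(...).
mean(x <= 12)                 # 0.5

# Question 2: what value has a given proportion of the data <= it?
quantile(x, 0.5)              # 13.5 (default type 7 interpolates)

# ecdf() returns the "proportion" function; quantile(type = 1) inverts it.
Fn <- ecdf(x)
Fn(12)                        # 0.5
quantile(x, 0.5, type = 1)    # 12
```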
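The interquartile range described above is simply the 75th percentile minus the 25th percentile. This sketch checks that on the same hypothetical toy vector, using R's default quantile rule for both computations.

```r
# Hypothetical toy data, not the exec.pay data from the text
x <- c(2, 5, 7, 9, 12, 15, 20, 31, 48, 250)

# 25th and 75th percentiles (R's default quantile type 7)
q <- quantile(x, c(0.25, 0.75))
q                              # 25%: 7.5, 75%: 28.25

# The IQR is their difference; IQR() computes the same quantity
q[["75%"]] - q[["25%"]]        # 20.75
IQR(x)                         # 20.75
```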
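The text attributes the gap between the IQR (27.5) and the standard deviation (207.0) to a few very large values. A small experiment on hypothetical toy data shows the mechanism: shrinking the single extreme value leaves the IQR unchanged but cuts the standard deviation sharply. The vector and the replacement value 60 are illustrative assumptions, not from the source.

```r
# Hypothetical toy data with one extreme value
x  <- c(2, 5, 7, 9, 12, 15, 20, 31, 48, 250)
x2 <- replace(x, 10, 60)   # same data with the extreme value shrunk to 60

# The IQR depends only on the middle of the data, so it does not move...
c(IQR(x), IQR(x2))
# ...while the standard deviation, driven by squared deviations, drops a lot.
c(sd(x), sd(x2))
```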
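A minimal sketch of the z-score definition: each value's distance from the sample mean, measured in sample standard deviations. It computes the scores directly, compares them with base R's `scale()` (which by default centers by the mean and divides by the standard deviation), and confirms that z-scores have mean 0 and standard deviation 1. The data vector is hypothetical.

```r
# Hypothetical toy data, not the exec.pay data from the text
x <- c(2, 5, 7, 9, 12, 15, 20, 31, 48, 250)

# z-score: (value - sample mean) / sample standard deviation
z <- (x - mean(x)) / sd(x)
round(z, 2)

# scale() performs the same centering and scaling but returns a matrix
all.equal(z, as.vector(scale(x)))   # TRUE

# By construction: mean 0 (up to rounding error) and standard deviation 1
c(mean = mean(z), sd = sd(z))
```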