10.07.2015 Views

Using R for Introductory Statistics : John Verzani



good summary of a data set. However, it can be found using commands we’ve seen previously. For example, if x stores the data, then the mode may be found as follows:

> x = c(72, 75, 84, 84, 98, 94, 55, 62)
> which(table(x) == max(table(x)))
84
 5

That is, the mode is the value 84, which is the fifth value of x after sorting. Alternatively, the function which.max(), which determines the position of the max in a data vector, finds this value with which.max(table(x)).

The midrange is a natural measure of center: the middle of the range. It can be found using mean(range(x)). For some data sets it is close to the mean, but not when there are outliers. As it is even more sensitive to these than the mean, it isn’t widely used to summarize the center.

Summation notation

The definition of the mean involves a summation:

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}.$$

In statistics, this is usually written in a more compact form using summation notation. The above sum is rewritten as

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i.$$

The symbol ∑, the Greek capital sigma, is used to indicate a sum. The $i = 1$ on the bottom and $n$ on top indicate that we should include $x_i$ for $i = 1, 2, \ldots, n$, that is, $x_1, x_2, \ldots, x_n$. Sometimes the indices are explicitly indicated, as in

$$\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n.$$

When the variable that is being summed over is not in doubt, the summation notation is often shortened. For example,

$$\sum_{i=1}^{n} x_i = \sum_i x_i = \sum x_i.$$

Notationally, this is how summations are handled in R using the sum() function. If x is a data vector, then sum(x) adds up x[1]+x[2]+…+x[n].

The summation notation can be confusing at first but offers the advantages of being more compact to write and easier to manipulate algebraically. It also forces our attention on the operation of addition.

■ Example 2.6: Another formula for the mean We can use the summation formula to present another useful formula for the mean. Let Range(x) be all the values of the data set. When we add $x_1 + x_2 + \cdots + x_n$, if there are ties in the data, it is natural to group the same
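The commands discussed on this page for the mode, the midrange, and the summation form of the mean can be collected into one short sketch. It reuses the data vector x from the text. The variable names counts, mode_values, midrange, n, and xbar are illustrative assumptions and do not come from the book.

```r
# Data vector from the text
x <- c(72, 75, 84, 84, 98, 94, 55, 62)

# Mode: tabulate the data, then keep every value whose count equals the maximum.
# which() on a table returns positions named by the data values,
# so the names carry the mode(s).
counts <- table(x)
mode_values <- as.numeric(names(which(counts == max(counts))))
mode_values            # 84

# Midrange: the mean of the smallest and largest values
midrange <- mean(range(x))
midrange               # 76.5

# Mean written as a summation: (1/n) * (x[1] + x[2] + ... + x[n])
n <- length(x)
xbar <- sum(x) / n
xbar                   # 78

# The summation form agrees with the built-in mean()
all.equal(xbar, mean(x))
```

Note that which.max(table(x)) reports only the first maximum, so if several values tie for the highest count, the comparison against max(counts) used here is the variant that returns all of them.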
