Introduction to R
Introduction to R
Introduction to R
You also want an ePaper? Increase the reach of your titles
YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.
<strong>Introduction</strong> <strong>to</strong> R<br />
Alisha Albert-Green<br />
aalbertg@uwo.ca<br />
May 13, 2013<br />
1 / 77
Schedule<br />
Course: Monday, May 13, 2013 - Friday, May 17, 2013<br />
Lectures: 9:00am-10:30am<br />
Alisha Albert-Green (Monday - Thursday) & John Braun (Friday)<br />
Labs: 10:45am-12:30pm<br />
Matt Bartley (Monday - Friday)<br />
2 / 77
Course Overview<br />
<strong>Introduction</strong> <strong>to</strong> R:<br />
basic commands, searching for help, data management, downloading<br />
and installing packages<br />
Programming and Simulation:<br />
introduction <strong>to</strong> programming, creating and modifying functions,<br />
generating data<br />
Graphics:<br />
basic plotting, high level data visualization, examples<br />
ANOVA, Linear and Generalized Linear Regression:<br />
common statistical tests, introduction <strong>to</strong> linear and generalized linear<br />
models, fitting models in R, visualization<br />
Bootstrap:<br />
when and why <strong>to</strong> use the bootstrap, parametric and non-parametric<br />
bootstrap in R<br />
3 / 77
The R Project<br />
R is an open source package for statistical computing:<br />
Freely available, and its users are free <strong>to</strong> see how it is written and <strong>to</strong><br />
improve it<br />
The R Core Team is responsible for development and maintenance of R<br />
(Duncan Murdoch from Western is the only Canadian on this<br />
international team)<br />
R is available at: http://cran.r-project.org (this site is referred<br />
<strong>to</strong> as CRAN)<br />
Will run on many operating systems: Windows, Mac, Linux<br />
4 / 77
References<br />
Braun, W.J. and Murdoch, D.M. (2008). A First Course in Statistical<br />
Computing with R. Cambridge University Press.<br />
Maindonald, J. and Braun, W.J. (2010). Data Analysis and Graphics<br />
Using R: An Example-Based Approach, 3rd edition. Cambridge<br />
University Press.<br />
Vernables, W.N. and Ripley, B.D. (2002). Modern Applied Statistics<br />
with S. 4th edition. Springer.<br />
There is a lot of documentation online<br />
5 / 77
Course Conventions<br />
This course is about how <strong>to</strong> do computations in R, with an emphasis<br />
on data visualization<br />
Most of what we do would be the same on any system<br />
The user types input, and R responds with text or graphs as output. To<br />
indicate the difference, we have typeset the user input in red, and text<br />
output in blue (most of the time). For example:<br />
> This was typed by the user<br />
This is a response from R<br />
6 / 77
Installing R<br />
1 Go <strong>to</strong> CRAN (http://cran.r-project.org)<br />
7 / 77
Installing R<br />
1 Go <strong>to</strong> CRAN (http://cran.r-project.org)<br />
2 Click on “download R” from the “Getting Started” box<br />
7 / 77
Installing R<br />
1 Go <strong>to</strong> CRAN (http://cran.r-project.org)<br />
2 Click on “download R” from the “getting started” box<br />
3 Select the appropriate CRAN mirror (for example University of<br />
Toron<strong>to</strong>)<br />
7 / 77
Installing R<br />
1 Go <strong>to</strong> CRAN (http://cran.r-project.org)<br />
2 Click on “download R” from the “getting started” box<br />
3 Select the appropriate CRAN mirror (for example University of<br />
Toron<strong>to</strong>)<br />
4 Select “Download R” for your operating system<br />
7 / 77
Starting R<br />
One of the default settings of the installation procedure is <strong>to</strong> create<br />
an R icon on your computer’s desk<strong>to</strong>p.<br />
Double clicking on the R icon starts the program. 1 The first thing<br />
that will happen is that R will open the console.<br />
Something like the following will appear in the console:<br />
1 Other systems may install an icon <strong>to</strong> click, or may require you <strong>to</strong> type “R” at a<br />
command prompt.<br />
8 / 77
Starting R<br />
R version 3.0.0 (2013-04-03) -- "Masked Marvel"<br />
Copyright (C) 2013 The R Foundation for Statistical Computing<br />
Platform: x86_64-w64-mingw32/x64 (64-bit)<br />
R is free software and comes with ABSOLUTELY NO WARRANTY.<br />
You are welcome <strong>to</strong> redistribute it under certain conditions.<br />
Type ’license()’ or ’licence()’ for distribution details.<br />
Natural language support but running in an English locale<br />
R is a collaborative project with many contribu<strong>to</strong>rs.<br />
Type ’contribu<strong>to</strong>rs()’ for more information and<br />
’citation()’ on how <strong>to</strong> cite R or R packages in publications.<br />
Type ’demo()’ for some demos, ’help()’ for on-line help, or<br />
’help.start()’ for an HTML browser interface <strong>to</strong> help.<br />
Type ’q()’ <strong>to</strong> quit R.<br />
><br />
9 / 77
R as a Calcula<strong>to</strong>r<br />
The > sign tells you that R is ready for you <strong>to</strong> type in a command. For<br />
example, you can do addition:<br />
> 5 + 14<br />
[1] 19<br />
If you hit the ‘Enter’ key, you should see the result. The greater-than sign<br />
(>) is called the prompt symbol.<br />
The [1] indicates that this is the first (and in this case only) result from<br />
the command.<br />
10 / 77
R as a Calcula<strong>to</strong>r<br />
Other commands return multiple values, and each line of results will be<br />
labelled <strong>to</strong> aid the user in deciphering the output. For example, the<br />
sequence of integers from 5 <strong>to</strong> 20 may be displayed as follows:<br />
> options(width=40)<br />
> 5:20<br />
[1] 5 6 7 8 9 10 11 12 13 14 15 16<br />
[13] 17 18 19 20<br />
The first line starts with the first return value, so it is labelled [1]; the<br />
second line starts with the thirteenth, so is labelled [13].<br />
11 / 77
R as a Calcula<strong>to</strong>r<br />
Anything that can be computed with a pocket calcula<strong>to</strong>r can be computed<br />
at the R prompt. Here are some additional examples.<br />
> # Everything following a # sign is assumed <strong>to</strong> be a<br />
> # comment and is ignored by R.<br />
> 3 * 6 # multiplication<br />
[1] 18<br />
> 4 - 9 # subtraction<br />
[1] -5<br />
> 20 / 4 # division<br />
[1] 5<br />
> 11^2 # second power of 11<br />
[1] 121<br />
12 / 77
R as a Calcula<strong>to</strong>r<br />
> 111^2<br />
[1] 12321<br />
> 1111^2<br />
[1] 1234321<br />
> 11111^2<br />
[1] 123454321<br />
> 1111111^2 # only 9 digits are displayed by default<br />
[1] 1.234568e+12<br />
> options(digits=18) # we can display more digits now<br />
> 1111111^2<br />
[1] 1234567654321<br />
> 11111111^2 # do you see a pattern?<br />
[1] 123456787654321<br />
> 111111111^2 # is R always correct?<br />
[1] 12345678987654320<br />
13 / 77
Quitting R<br />
To end your R session, type:<br />
> q()<br />
If you then hit the Enter key, you will be asked whether or not you would<br />
like <strong>to</strong> save an image of the current workspace, or <strong>to</strong> cancel.<br />
The workspace contains a record, called the his<strong>to</strong>ry, of the computations<br />
you’ve done, and may contain some saved results.<br />
If you choose <strong>to</strong> save, then two files will be s<strong>to</strong>red in a folder on your<br />
system: .RData (a workspace including all objects i.e. variables) and<br />
.Rhis<strong>to</strong>ry (a file with all code typed in<strong>to</strong> the console).<br />
14 / 77
Script Files<br />
Rather than saving the workspace, we prefer <strong>to</strong> keep a record of the<br />
commands we entered<br />
Easiest way <strong>to</strong> do this is via a script edi<strong>to</strong>r, available from the<br />
File menu and clicking on “New script”<br />
Commands are executed by highlighting them and hitting Ctrl-R<br />
At the end of a session, save the final script for a permanent record of<br />
your work<br />
15 / 77
Global Variables<br />
R has a workspace known as the global environment that can be used <strong>to</strong><br />
s<strong>to</strong>re the results of calculations, and many other types of objects.<br />
For a first example, suppose we would like <strong>to</strong> s<strong>to</strong>re a result of the<br />
calculation -0.2 + -7.25 * 0.95 for future use. (These number arises<br />
out of a model for predicting Australian log-mortality rate for people at<br />
age 20).<br />
We will assign this value <strong>to</strong> an object called logMortality. To do this,<br />
we type:<br />
> intercept slope logMortality <br />
Notice that when we hit Enter, nothing appears on the screen except a<br />
new prompt: R has done what we asked and is waiting for us <strong>to</strong> ask for<br />
something else.<br />
16 / 77
Global Variables<br />
We can see the results of this assignment by typing the name of our new<br />
object at the prompt:<br />
> logMortality<br />
[1] -7.0875<br />
Think of this just anther calculation: R is calculating the result of the<br />
expression logMortality, and printing it.<br />
17 / 77
Global Variables<br />
We can also use logMorality in further calculations, if we wish.<br />
For example, we can calculate the mortality rate on the original scale:<br />
> mortality mortality<br />
[1] 0.0008354835<br />
18 / 77
Built-in Functions<br />
Most of the work in R is done with functions.<br />
For example, we saw that <strong>to</strong> quit R we type q(). This tells R <strong>to</strong> call the<br />
function named q.<br />
The parentheses surround the argument list, which in this case contains<br />
nothing: we just want R <strong>to</strong> quit, and do not need <strong>to</strong> tell it how.<br />
Another function is mean(). It computes a sample sample and has a<br />
required argument: x.<br />
> y mean(y) # compute the sample mean contained in the vec<strong>to</strong>r y<br />
[1] 2.5<br />
19 / 77
R is Case Sensitive!<br />
See what happens if you type:<br />
> x MEAN(x)<br />
Error: could not find function "MEAN"<br />
or<br />
> mean(x)<br />
[1] 5.5<br />
20 / 77
R is Case Sensitive!<br />
Now try:<br />
> MEAN MEAN(x)<br />
[1] 5.5<br />
The function mean() is built in <strong>to</strong> R. R considers MEAN <strong>to</strong> be a different<br />
function, because it is case-sensitive. However, we can create a function<br />
called MEAN ourselves.<br />
21 / 77
Other Useful Summaries<br />
Other summary statistics can be calculated for data s<strong>to</strong>red in vec<strong>to</strong>rs. In<br />
particular, try:<br />
> var(x) # computes the sample variance for the vec<strong>to</strong>r x<br />
> sd(x) # computes the sample standard deviation for x<br />
> min(x) # displays the minimum value in x<br />
> max(x) # displays the maximum value in x<br />
> summary(x) # computes the min., max., mean, median,<br />
> # 1st, 2nd and 3rd quantiles of x<br />
> quantile(x, probs = c(0.025, 0.975)) # computes the<br />
> # 2.5%and 97.5% quantiles of x<br />
> median(x) # computes the median of x<br />
22 / 77
objects()<br />
The calculations in the previous slides led <strong>to</strong> the creation of several simple<br />
R objects. These objects are s<strong>to</strong>red in the current R workspace. A list of<br />
all objects in the current workspace can be printed <strong>to</strong> the screen using the<br />
objects() function:<br />
> objects()<br />
[1] "intercept" "logMortality"<br />
[3] "MEAN" "mortality"<br />
[5] "slope" "x"<br />
[7] "y"<br />
A synonym for objects() is ls().<br />
Remember that if we quit our R session without saving the workspace<br />
image, then these objects will disappear.<br />
23 / 77
Vec<strong>to</strong>rs<br />
A numeric vec<strong>to</strong>r is a list of numbers. The c() function is used <strong>to</strong> collect<br />
things <strong>to</strong>gether in<strong>to</strong> a vec<strong>to</strong>r. We can type:<br />
> c(0, 7, 8)<br />
[1] 0 7 8<br />
Again, we can assign this <strong>to</strong> a named object:<br />
> x x<br />
[1] 0 7 8<br />
24 / 77
Patterned Vec<strong>to</strong>rs<br />
The : symbol can be used <strong>to</strong> create sequences of increasing (or<br />
decreasing) values. For example:<br />
> numbers3<strong>to</strong>10 numbers3<strong>to</strong>10<br />
[1] 3 4 5 6 7 8 9 10<br />
Vec<strong>to</strong>rs can be joined <strong>to</strong>gether with the c() function. Note what happens<br />
when we type:<br />
> c(numbers3<strong>to</strong>10, x) # recall x
Concatenating Vec<strong>to</strong>rs<br />
Here is another example of the use of the c() function:<br />
> someNumbers
Concatenating Vec<strong>to</strong>rs<br />
We can append numbers3<strong>to</strong>10 <strong>to</strong> the end of someNumbers, and then<br />
append the decreasing sequence from 4 <strong>to</strong> 1:<br />
> moreNumbers moreNumbers<br />
[1] 2 3 5 7 11 13 17 19 23<br />
[10] 29 31 37 41 43 47 59 67 71<br />
[19] 73 79 83 89 97 103 107 109 113<br />
[28] 119 3 4 5 6 7 8 9 10<br />
[37] 4 3 2 1<br />
Remember that the numbers in the square brackets give the index of the<br />
element immediately <strong>to</strong> the right. Among other things, this helps us <strong>to</strong><br />
identify the 27nd element of moreNumbers as 113.<br />
27 / 77
Extracting Elements From Vec<strong>to</strong>rs<br />
A nicer way <strong>to</strong> display the 27nd element of moreNumbers is <strong>to</strong> use square<br />
brackets <strong>to</strong> extract just that element:<br />
> moreNumbers[27]<br />
[1] 113<br />
To see the second element of x, type:<br />
> x[2]<br />
[1] 7<br />
28 / 77
Extracting Elements from Vec<strong>to</strong>rs<br />
We can extract more than one element at a time. For example:<br />
> someNumbers[c(3, 6, 7)]<br />
[1] 5 13 17<br />
To get the third through seventh element of numbers3<strong>to</strong>10, type:<br />
> numbers3<strong>to</strong>10[3:7]<br />
[1] 5 6 7 8 9<br />
Negative indices can be used <strong>to</strong> avoid certain elements. For example, we<br />
can select all but the second element of x as follows:<br />
> x[-2]<br />
[1] 0 8<br />
29 / 77
Extracting Elements from Vec<strong>to</strong>rs<br />
The third through eleventh elements of someNumbers can be avoided as<br />
follows:<br />
> someNumbers[-(3:11)]<br />
[1] 2 3 37 41 43 47 59 67 71<br />
[10] 73 79 83 89 97 103 107 109 113<br />
[19] 119<br />
Using a zero index returns nothing:<br />
> numbers3<strong>to</strong>10[c(0, 5:8)]<br />
[1] 7 8 9 10<br />
Usually, we would not include a zero here.<br />
30 / 77
Extracting Elements from Vec<strong>to</strong>rs<br />
Do not mix positive and negative indices. To see what happens, consider:<br />
> x[c(-2, 3)]<br />
Error in x[c(-2, 3)] :<br />
only 0’s may be mixed with negative subscripts<br />
The problem is that it is not clear what is <strong>to</strong> be extracted: do we want the<br />
third element of x before or after removing the second one?<br />
31 / 77
Vec<strong>to</strong>r Arithmentic<br />
Arithmetic can be done on R vec<strong>to</strong>rs. For example, we can multiply all<br />
elements of x by 3:<br />
> x * 3 # x
Vec<strong>to</strong>r Arithmentic<br />
Addition, subtraction and division by a constant have the same kind of<br />
effect.<br />
For example:<br />
> y x^3<br />
[1] 0 343 512<br />
33 / 77
Vec<strong>to</strong>r Arithmentic<br />
The previous examples show how a binary arithmetic opera<strong>to</strong>r can be used<br />
with vec<strong>to</strong>rs and constants.<br />
In general, the binary opera<strong>to</strong>rs also work element-by-element when<br />
applied <strong>to</strong> pairs of vec<strong>to</strong>rs.<br />
For example, we can compute y x i<br />
i<br />
, for i = 1, 2, 3, i.e. (y x 1<br />
1 , y x 2<br />
2 , y x 3<br />
3<br />
), as<br />
follows:<br />
> y^x<br />
[1] 1 128 6561<br />
34 / 77
Recycling<br />
When the vec<strong>to</strong>rs are different lengths, the shorter one is extended by<br />
recycling: values are repeated, starting at the beginning.<br />
For example, <strong>to</strong> see the pattern of addition of the numbers 1 <strong>to</strong> 10 plus 2<br />
and 3, we need only give the 2:3 vec<strong>to</strong>r once:<br />
> c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5,<br />
+ 6, 6, 7, 7, 8, 8, 9, 9, 10, 10) + 2:3<br />
[1] 3 4 4 5 5 6 6 7 7 8 8 9<br />
[13] 9 10 10 11 11 12 12 13<br />
35 / 77
Recycling<br />
R will give a warning if the length of the longer vec<strong>to</strong>r is not a multiple of<br />
the length of the smaller one, because that is usually a sign that<br />
something is wrong.<br />
For example, if we wanted <strong>to</strong> add 2, 3 and 4 <strong>to</strong> the vec<strong>to</strong>r:<br />
> c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5,<br />
+ 6, 6, 7, 7, 8, 8, 9, 9, 10, 10) + 2:4<br />
[1] 3 4 6 4 6 7 6 7 9 7 9 10 9 10 12 10 12 13 12 13<br />
Warning message:<br />
In c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10,<br />
10) + :<br />
longer object length is not a multiple of shorter<br />
object length<br />
36 / 77
Patterned Vec<strong>to</strong>rs<br />
We saw the use of the : opera<strong>to</strong>r for producing simple sequences of<br />
integers.<br />
Patterned vec<strong>to</strong>rs can also be produced using the seq() function as well<br />
as the rep() function. For example, the sequence of odd numbers less<br />
than or equal <strong>to</strong> 21 can be obtained using<br />
> seq(from = 1, <strong>to</strong> = 21, by = 2)<br />
[1] 1 3 5 7 9 11 13 15 17 19 21<br />
Notice the use of by = 2 here. The seq() function has several optional<br />
parameters, including one named by. If by is not specified, the default<br />
value of 1 will be used.<br />
37 / 77
Examples: Patterned Vec<strong>to</strong>rs<br />
Repeated patterns are obtained using rep() and s() .<br />
> rep(3, 12) # repeat the value 3, 12 times<br />
[1] 3 3 3 3 3 3 3 3 3 3 3 3<br />
> rep(seq(2, 20, by=2), 2) # repeat 2,4 ... 20, twice<br />
[1] 2 4 6 8 10 12 14 16 18 20 2 4 6 8 10 12 14 16 18 20<br />
> rep(c(1, 4), c(3, 2)) # repeat 1, 3 times and 4, twice<br />
[1] 1 1 1 4 4<br />
> rep(c(1, 4), each=3) # repeat each value 3 times<br />
[1] 1 1 1 4 4 4<br />
> rep(seq(2, 20, 2), rep(2, 10)) # repeat each value<br />
> # twice<br />
[1] 2 2 4 4 6 6 8 8 10 10 12 12 14 14 16 16 18 18 20 20<br />
38 / 77
Missing Values and Other Special Values<br />
The missing value symbol is NA. Missing values often arise in real data<br />
problems, but they can also arise because of the way calculations are<br />
performed.<br />
> someEvens someEvens[seq(2, 20, 2)]
Missing Values and Other Special Values<br />
Recall that x contains the values (0, 7, 8). Consider:<br />
> x / x<br />
[1] NaN 1 1<br />
The NaN symbol denotes a value which is ‘Not a Number’. This arises as a<br />
result of attempting <strong>to</strong> compute the indeterminate 0/0.<br />
This symbol is sometimes used when a calculation does not make sense. In<br />
other cases, special values may be shown, or you may get an error or<br />
warning message:<br />
> 1 / x<br />
[1] Inf 0.1428571 0.1250000<br />
Here Inf, R has tried <strong>to</strong> evaluate 1/0.<br />
40 / 77
Character Variables<br />
Scalars and vec<strong>to</strong>rs can be made up of strings of characters instead of<br />
numbers. All elements of a vec<strong>to</strong>r must be of the same type. For example:<br />
> primates morePrimates morePrimates<br />
[1] "chimpanzee" "gorilla" "gibbon" "baboon" "lemur"<br />
To check that these are character variables, we can type:<br />
> is.character(morePrimates)<br />
[1] TRUE<br />
or<br />
> class(morePrimates)<br />
[1] "character"<br />
41 / 77
Fac<strong>to</strong>r Variables<br />
Fac<strong>to</strong>rs offer an alternative way of s<strong>to</strong>ring character data. For example, a<br />
fac<strong>to</strong>r with four elements and having the two levels, control and<br />
treatment can be created using fac<strong>to</strong>r():<br />
> grp # these will be s<strong>to</strong>red as character variables<br />
> grp<br />
[1] "control" "treatment" "control" "treatment"<br />
> grp grp<br />
[1] control treatment control treatment<br />
Levels: control treatment<br />
42 / 77
Fac<strong>to</strong>r Variables<br />
Fac<strong>to</strong>rs are a more efficient way of s<strong>to</strong>ring character data when there are<br />
repeats among the vec<strong>to</strong>r elements.<br />
This is because the levels of a fac<strong>to</strong>r are internally coded as integers.<br />
To see what the codes are for our fac<strong>to</strong>r, we can type<br />
> as.integer(grp)<br />
[1] 1 2 1 2<br />
43 / 77
Fac<strong>to</strong>r Variables<br />
The labels for the levels are only s<strong>to</strong>red once each, rather than being<br />
repeated.<br />
The codes are indices in<strong>to</strong> the vec<strong>to</strong>r of levels:<br />
> levels(grp)<br />
[1] "control" "treatment"<br />
> levels(grp)[as.integer(grp)]<br />
[1] "control" "treatment" "control" "treatment"<br />
44 / 77
Fac<strong>to</strong>r Variables<br />
As for numeric vec<strong>to</strong>rs, square brackets [] are used <strong>to</strong> index fac<strong>to</strong>r and<br />
character vec<strong>to</strong>r elements. For example, the fac<strong>to</strong>r grp has 4 elements, so<br />
we can print out the third element by typing:<br />
> grp[3]<br />
[1] control<br />
Levels: control treatment<br />
We can access the third through fifth elements of morePrimates as<br />
follows:<br />
> morePrimates[3:5]<br />
[1] "gibbon" "baboon" "lemur"<br />
45 / 77
Conversions Between Variable Classes<br />
Let’s say, for example, that we have the following indica<strong>to</strong>r representing<br />
weather in London (1 = sunny, 2 = cloudy, 3 = rainy, 4 = snowy and 5 =<br />
hail). The vec<strong>to</strong>r wxInd contains the weather for the past week.<br />
> wxInd wxInd wxInd<br />
[1] 1 1 2 3 5 2 5<br />
Levels: 1 2 3 5<br />
46 / 77
Conversions Between Variable Classes<br />
What if we now wanted <strong>to</strong> convert these back <strong>to</strong> numeric variables:<br />
> as.numeric(wxInd)<br />
> 1 1 2 3 4 2 4 # This does not match our vec<strong>to</strong>r wxInd<br />
Instead, we use<br />
> as.numeric(as.character(wxInd))<br />
> 1 1 2 3 5 2 5<br />
47 / 77
TRUE/FALSE Statements<br />
When there may be missing values, the is.na() function should be used<br />
<strong>to</strong> detect them. For instance:<br />
> is.na(someEvens)<br />
[1] TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE<br />
[10] FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE<br />
[19] TRUE FALSE<br />
The result is an example of a “logical vec<strong>to</strong>r”.<br />
The ! symbol means “not”, so we can locate the non-missing values in<br />
someEvens as follows:<br />
> !is.na(someEvens)<br />
[1] FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE<br />
[10] TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE<br />
[19] FALSE TRUE<br />
48 / 77
TRUE/FALSE Statements<br />
We can then display the even numbers only:<br />
> someEvens[!is.na(someEvens)]<br />
[1] 2 4 6 8 10 12 14 16 18 20<br />
49 / 77
TRUE/FALSE Statements<br />
One of the basic types of vec<strong>to</strong>r in R holds logical values. For example, a<br />
logical vec<strong>to</strong>r may be constructed as<br />
> a b b[a]<br />
[1] 13 2<br />
The elements of b corresponding <strong>to</strong> TRUE are selected.<br />
50 / 77
TRUE/FALSE Statements<br />
If we attempt arithmetic on a logical vec<strong>to</strong>r, e.g.<br />
> sum(a)<br />
[1] 2<br />
then the operations are performed after converting FALSE <strong>to</strong> 0 and TRUE<br />
<strong>to</strong> 1. In this case the result is that we count how many occurrences of<br />
TRUE are in the vec<strong>to</strong>r.<br />
51 / 77
Logical Opera<strong>to</strong>rs<br />
There are also other logical opera<strong>to</strong>rs:<br />
& means in both sets A and B<br />
| means in set A or B<br />
! means not in set A and B<br />
These are all vec<strong>to</strong>rized, so, for example:<br />
> A B A&B<br />
[1] TRUE FALSE FALSE FALSE<br />
> A|B<br />
[1] TRUE FALSE TRUE TRUE<br />
> !A<br />
[1] FALSE TRUE TRUE FALSE<br />
52 / 77
Logical Opera<strong>to</strong>rs<br />
If we attempt logical operations on a numerical vec<strong>to</strong>r, 0 is taken <strong>to</strong> be<br />
FALSE, and any non-zero value is taken <strong>to</strong> be TRUE:<br />
> a<br />
[1] TRUE FALSE FALSE TRUE<br />
><br />
> b - 2<br />
[1] 11 5 6 0<br />
><br />
> a & (b - 2)<br />
[1] TRUE FALSE FALSE FALSE<br />
53 / 77
Relational Opera<strong>to</strong>rs<br />
It is often necessary <strong>to</strong> test relations when programming <strong>to</strong> decide whether<br />
they are TRUE or FALSE. R allows for equality and inequality relations <strong>to</strong><br />
be tested in using the relational opera<strong>to</strong>rs: , ==, >=, C # Which elements are greater than 4?<br />
> C > 4<br />
[1] FALSE TRUE TRUE<br />
> # Which elements are exactly equal <strong>to</strong> 4?<br />
> C == 4 # Rounding error may occur!<br />
[1] FALSE FALSE FALSE<br />
> Which elements are greater than or equal 4?<br />
> C >= 4<br />
[1] FALSE TRUE TRUE<br />
> # Which elements are not equal <strong>to</strong> 4?<br />
> C!=4<br />
[1] TRUE TRUE TRUE<br />
54 / 77
Relational Opera<strong>to</strong>rs<br />
> # Print element of C > 4<br />
> C[C > 4]<br />
[1] 6 9<br />
> D # Which elements of C are less than the corresponding<br />
> # element of D?<br />
> C < D<br />
[1] TRUE FALSE FALSE<br />
> # Print elements of C which are less than the<br />
> # corresponding elements of D:<br />
> C[C < D]<br />
[1] 3<br />
55 / 77
S<strong>to</strong>ring Data<br />
Usually, our data are not s<strong>to</strong>red in vec<strong>to</strong>rs...<br />
56 / 77
Matrices<br />
To arrange values in<strong>to</strong> a matrix, we use the matrix() function:<br />
> m m<br />
[,1] [,2] [,3]<br />
[1,] 1 3 5<br />
[2,] 2 4 6<br />
We can then access elements using two indices. For example, the value in<br />
the first row, second column is:<br />
> m[1, 2]<br />
[1] 3<br />
57 / 77
Matrices<br />
Whole rows or columns of matrices may be selected by leaving the<br />
corresponding index blank:<br />
> m[1,] # <strong>to</strong> display the first row<br />
[1] 1 3 5<br />
> m[, 3] # <strong>to</strong> display the third column<br />
[1] 5 6<br />
58 / 77
Numerical Summaries for Matrices<br />
Numerical summaries for matrices are also available:<br />
> rowMeans(m) # take the mean of the elements in each row<br />
[1] 3 4<br />
> rowSums(m) # take the sum of the elements in each row<br />
[1] 9 12<br />
Note that colMeans and colSums are also built-in functions.<br />
59 / 77
Data Frames<br />
Most data sets are s<strong>to</strong>red in R as data frames. These are like matrices, but<br />
with the columns having their own names.<br />
Columns can be of different types from each other, i.e. numeric, character,<br />
fac<strong>to</strong>r and logical.<br />
Use the data.frame() function <strong>to</strong> construct data frames from vec<strong>to</strong>rs:<br />
> colours numbers coloursAndNumbers
Data Frames<br />
We can see the contents of a data frame:<br />
> coloursAndNumbers<br />
colours numbers moreNumbers<br />
1 red 1 7<br />
2 yellow 2 8<br />
3 blue 3 9<br />
4 yellow 4 10<br />
5 blue 5 11<br />
6 blue 6 12<br />
and access the column names:<br />
> names(coloursAndNumbers)<br />
[1] "colours" "numbers" "moreNumbers"<br />
or the dimension:<br />
> dim(coloursAndNumbers)<br />
[1] 6 3 # corresponds <strong>to</strong> rows and columns<br />
61 / 77
Numerical Summaries for Data Frames<br />
Rows and columns of data frames, can be extracted as they were for<br />
matrices. Let’s say that we want <strong>to</strong> sum over the “numbers” and<br />
“moreNumbers” columns of coloursAndNumbers. We can do this as<br />
follows:<br />
> apply(coloursAndNumbers[,2:3], 2, sum)<br />
numbers moreNumbers<br />
21 57<br />
We can also find the mean of “numbers” by “colours” using the<br />
aggregate function:<br />
> aggregate(coloursAndNumbers$numbers,<br />
+ by = list("colours" = colours), sum)<br />
colours x<br />
1 blue 14<br />
2 red 1<br />
3 yellow 6<br />
62 / 77
Accessing Components of Data Frames<br />
Suppose we want a subset of the coloursAndNumbers data; we want all<br />
rows where “colours” is blue. I can simply use the subset function, as<br />
follows:<br />
> subset(coloursAndNumbers, colours == "blue")<br />
colours numbers moreNumbers<br />
3 blue 3 9<br />
5 blue 5 11<br />
6 blue 6 12<br />
To see the “colours” column:<br />
> coloursAndNumbers$numbers<br />
[1] 1 2 3 4 5 6<br />
Note: columns and rows of data frames can be accessed using the same<br />
syntax as for matrices.<br />
63 / 77
Lists<br />
Lists in R can contain many types of objects including vec<strong>to</strong>rs, matrices,<br />
functions and these may be of different dimension.<br />
Many functions return complicated results as lists.<br />
You can see the names of the objects in a list using the names() function,<br />
and extract parts of it. If, for example, we had a list named d:<br />
> names(d) # Print the names of the objects in the list<br />
> d$x # Print the x component of d<br />
The list() function is one way of organizing multiple pieces of output<br />
from functions.<br />
64 / 77
Lists<br />
> x y list(x = x, y = y)<br />
$x<br />
[1] 3 2 3<br />
$y<br />
[1] 7 7<br />
> mylist mylist$x<br />
[1] 3 2 3<br />
> mylist$mean<br />
function (x, ...)<br />
UseMethod("mean")<br />
<br />
65 / 77
Data Input and Output<br />
When in an R session, it is possible <strong>to</strong> read and write data <strong>to</strong> files outside<br />
of R, for example on your computer’s hard drive.<br />
Before we can discuss some of the many ways of doing this, it is important<br />
<strong>to</strong> know where the data is coming from or going <strong>to</strong>.<br />
66 / 77
Changing Direc<strong>to</strong>ries<br />
In Windows versions of R, it is possible <strong>to</strong> use the File | Change dir<br />
menu <strong>to</strong> choose the direc<strong>to</strong>ry or folder <strong>to</strong> which you wish <strong>to</strong> direct your<br />
data.<br />
It is also possible <strong>to</strong> use the setwd() function. For example, <strong>to</strong> work with<br />
data in the folder mydata on the C: drive, type<br />
setwd("C:/mydata")<br />
From now on, all data input and output will default <strong>to</strong> the mydata folder<br />
in the C: drive.<br />
In order <strong>to</strong> check the current working direc<strong>to</strong>ry, we can simply type:<br />
getwd()<br />
[1] "C:/mydata"<br />
67 / 77
dump() and source() functions<br />
Suppose you have an object called usefuldata in your workspace. To<br />
save this object for future use, type<br />
dump("usefuldata", "useful.R")<br />
This s<strong>to</strong>res the commands necessary <strong>to</strong> construct usefuldata in a file<br />
useful.R on your computer’s hard drive.<br />
To retrieve the vec<strong>to</strong>r in a future session, type<br />
source("useful.R")<br />
68 / 77
Saving and Retrieving Image Files<br />
The vec<strong>to</strong>rs and other objects created during an R session are s<strong>to</strong>red<br />
in the workspace known as the global environment.<br />
When ending an R session, we have the option of saving the<br />
workspace in a file called a workspace image.<br />
If we choose <strong>to</strong> do so, a file called by default .RData is created in the<br />
current working direc<strong>to</strong>ry (folder) which contains the information<br />
needed <strong>to</strong> reconstruct this workspace.<br />
In Windows, the workspace image will be au<strong>to</strong>matically loaded if R is<br />
started by clicking on the icon representing the file .RData, or if the<br />
.RData file is saved in the direc<strong>to</strong>ry from which R is started.<br />
69 / 77
Saving and Retrieving Image Files<br />
Again, we can begin an R session with that workspace image, by<br />
clicking on the icon for temp.RData.<br />
Alternatively, we can type load("temp.RData") after entering an R<br />
session.<br />
Objects that were already in the current workspace image will remain,<br />
unless they have the same name as objects in the workspace image<br />
associated with temp.RData.<br />
In the latter case, the current objects will be overwritten and lost.<br />
70 / 77
Data frames and the read.table() function<br />
Data sets frequently consist of more than one column of data, where each<br />
column represents measurements of a single variable.<br />
Each row usually represents a single observation.<br />
For example, the following data set consists of 4 observations on the three<br />
variables x, y, and z:<br />
x y z<br />
61 13 4<br />
175 21 18<br />
111 24 14<br />
124 23 18<br />
71 / 77
Reading in Data Frames<br />
If such a data set is s<strong>to</strong>red in a file called pretend.txt in the direc<strong>to</strong>ry<br />
mydata on the C: drive, then it can be read in<strong>to</strong> an R data frame:<br />
pretendDF
Dates and Times<br />
The base package has the function strptime() <strong>to</strong> convert from<br />
strings (e.g. "2007-12-25", or "12/25/07") <strong>to</strong> an internal<br />
numerical representation, and format() <strong>to</strong> convert back for printing.<br />
The ISOdate() and ISOdatetime() functions are used when<br />
numerical values for the year, month day, etc. are known.<br />
Other functions are available in the chron package.<br />
Functions, such as mdy.date() and date.mdy(), in the date<br />
package are useful for converting dates <strong>to</strong> Julian dates and vice versa<br />
These can be difficult functions <strong>to</strong> use, and a full description is<br />
beyond the scope of this course.<br />
73 / 77
Built-In Functions and Online Help<br />
The function mean() is an example of a built-in function.<br />
To get help with the mean function, you can type ?mean or help(mean).<br />
These will give you a description of the function for taking the mean of a<br />
vec<strong>to</strong>r. It tells the user what package mean() is contained in, as well as a<br />
description of its arguments and some examples of using this function.<br />
74 / 77
Built-In Functions and Online Help<br />
If you do not know the function name, it is often convenient <strong>to</strong> use<br />
help.start().<br />
This brings up an Internet browser, such as Internet Explorer or Firefox. 2<br />
The browser will show you a menu of several options, including a listing of<br />
installed packages. The base package contains many of the routinely used<br />
functions and is au<strong>to</strong>matically installed upon opening R.<br />
2 R depends on your system <strong>to</strong> have a properly installed browser. If it doesn’t have<br />
one, you may see an error message, or possibly nothing at all.<br />
75 / 77
Built-In Functions and Online Help<br />
Another function that is often used is help.search().<br />
For example, <strong>to</strong> see if there are any functions that do optimization (finding<br />
minima or maxima), type: help.search("optimization").<br />
76 / 77
Installing Packages<br />
All of the examples from <strong>to</strong>day required functions which are part of the<br />
base package in R. However, much of the more advanced functions<br />
require us <strong>to</strong> download packages. If, for example, we wanted <strong>to</strong> download<br />
the lattice package for high level graphics, we could use the following<br />
commands:<br />
> install.packages("lattice")<br />
Select the Canada (ON) CRAN mirror. Then, type:<br />
> library(lattice)<br />
This last line must be run each time a new R session is started.<br />
77 / 77