01.08.2014 Views

Introduction to R

Introduction to R

Introduction to R

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

<strong>Introduction</strong> <strong>to</strong> R<br />

Alisha Albert-Green<br />

aalbertg@uwo.ca<br />

May 13, 2013<br />

1 / 77


Schedule<br />

Course: Monday, May 13, 2013 - Friday, May 17, 2013<br />

Lectures: 9:00am-10:30am<br />

Alisha Albert-Green (Monday - Thursday) & John Braun (Friday)<br />

Labs: 10:45am-12:30pm<br />

Matt Bartley (Monday - Friday)<br />

2 / 77


Course Overview<br />

<strong>Introduction</strong> <strong>to</strong> R:<br />

basic commands, searching for help, data management, downloading<br />

and installing packages<br />

Programming and Simulation:<br />

introduction <strong>to</strong> programming, creating and modifying functions,<br />

generating data<br />

Graphics:<br />

basic plotting, high level data visualization, examples<br />

ANOVA, Linear and Generalized Linear Regression:<br />

common statistical tests, introduction <strong>to</strong> linear and generalized linear<br />

models, fitting models in R, visualization<br />

Bootstrap:<br />

when and why <strong>to</strong> use the bootstrap, parametric and non-parametric<br />

bootstrap in R<br />

3 / 77


The R Project<br />

R is an open source package for statistical computing:<br />

Freely available, and its users are free <strong>to</strong> see how it is written and <strong>to</strong><br />

improve it<br />

The R Core Team is responsible for development and maintenance of R<br />

(Duncan Murdoch from Western is the only Canadian on this<br />

international team)<br />

R is available at: http://cran.r-project.org (this site is referred<br />

<strong>to</strong> as CRAN)<br />

Will run on many operating systems: Windows, Mac, Linux<br />

4 / 77


References<br />

Braun, W.J. and Murdoch, D.M. (2008). A First Course in Statistical<br />

Computing with R. Cambridge University Press.<br />

Maindonald, J. and Braun, W.J. (2010). Data Analysis and Graphics<br />

Using R: An Example-Based Approach, 3rd edition. Cambridge<br />

University Press.<br />

Vernables, W.N. and Ripley, B.D. (2002). Modern Applied Statistics<br />

with S. 4th edition. Springer.<br />

There is a lot of documentation online<br />

5 / 77


Course Conventions<br />

This course is about how <strong>to</strong> do computations in R, with an emphasis<br />

on data visualization<br />

Most of what we do would be the same on any system<br />

The user types input, and R responds with text or graphs as output. To<br />

indicate the difference, we have typeset the user input in red, and text<br />

output in blue (most of the time). For example:<br />

> This was typed by the user<br />

This is a response from R<br />

6 / 77


Installing R<br />

1 Go <strong>to</strong> CRAN (http://cran.r-project.org)<br />

7 / 77


Installing R<br />

1 Go <strong>to</strong> CRAN (http://cran.r-project.org)<br />

2 Click on “download R” from the “Getting Started” box<br />

7 / 77


Installing R<br />

1 Go <strong>to</strong> CRAN (http://cran.r-project.org)<br />

2 Click on “download R” from the “getting started” box<br />

3 Select the appropriate CRAN mirror (for example University of<br />

Toron<strong>to</strong>)<br />

7 / 77


Installing R<br />

1 Go <strong>to</strong> CRAN (http://cran.r-project.org)<br />

2 Click on “download R” from the “getting started” box<br />

3 Select the appropriate CRAN mirror (for example University of<br />

Toron<strong>to</strong>)<br />

4 Select “Download R” for your operating system<br />

7 / 77


Starting R<br />

One of the default settings of the installation procedure is <strong>to</strong> create<br />

an R icon on your computer’s desk<strong>to</strong>p.<br />

Double clicking on the R icon starts the program. 1 The first thing<br />

that will happen is that R will open the console.<br />

Something like the following will appear in the console:<br />

1 Other systems may install an icon <strong>to</strong> click, or may require you <strong>to</strong> type “R” at a<br />

command prompt.<br />

8 / 77


Starting R<br />

R version 3.0.0 (2013-04-03) -- "Masked Marvel"<br />

Copyright (C) 2013 The R Foundation for Statistical Computing<br />

Platform: x86_64-w64-mingw32/x64 (64-bit)<br />

R is free software and comes with ABSOLUTELY NO WARRANTY.<br />

You are welcome <strong>to</strong> redistribute it under certain conditions.<br />

Type ’license()’ or ’licence()’ for distribution details.<br />

Natural language support but running in an English locale<br />

R is a collaborative project with many contribu<strong>to</strong>rs.<br />

Type ’contribu<strong>to</strong>rs()’ for more information and<br />

’citation()’ on how <strong>to</strong> cite R or R packages in publications.<br />

Type ’demo()’ for some demos, ’help()’ for on-line help, or<br />

’help.start()’ for an HTML browser interface <strong>to</strong> help.<br />

Type ’q()’ <strong>to</strong> quit R.<br />

><br />

9 / 77


R as a Calcula<strong>to</strong>r<br />

The > sign tells you that R is ready for you <strong>to</strong> type in a command. For<br />

example, you can do addition:<br />

> 5 + 14<br />

[1] 19<br />

If you hit the ‘Enter’ key, you should see the result. The greater-than sign<br />

(>) is called the prompt symbol.<br />

The [1] indicates that this is the first (and in this case only) result from<br />

the command.<br />

10 / 77


R as a Calcula<strong>to</strong>r<br />

Other commands return multiple values, and each line of results will be<br />

labelled <strong>to</strong> aid the user in deciphering the output. For example, the<br />

sequence of integers from 5 <strong>to</strong> 20 may be displayed as follows:<br />

> options(width=40)<br />

> 5:20<br />

[1] 5 6 7 8 9 10 11 12 13 14 15 16<br />

[13] 17 18 19 20<br />

The first line starts with the first return value, so it is labelled [1]; the<br />

second line starts with the thirteenth, so is labelled [13].<br />

11 / 77


R as a Calcula<strong>to</strong>r<br />

Anything that can be computed with a pocket calcula<strong>to</strong>r can be computed<br />

at the R prompt. Here are some additional examples.<br />

> # Everything following a # sign is assumed <strong>to</strong> be a<br />

> # comment and is ignored by R.<br />

> 3 * 6 # multiplication<br />

[1] 18<br />

> 4 - 9 # subtraction<br />

[1] -5<br />

> 20 / 4 # division<br />

[1] 5<br />

> 11^2 # second power of 11<br />

[1] 121<br />

12 / 77


R as a Calcula<strong>to</strong>r<br />

> 111^2<br />

[1] 12321<br />

> 1111^2<br />

[1] 1234321<br />

> 11111^2<br />

[1] 123454321<br />

> 1111111^2 # only 9 digits are displayed by default<br />

[1] 1.234568e+12<br />

> options(digits=18) # we can display more digits now<br />

> 1111111^2<br />

[1] 1234567654321<br />

> 11111111^2 # do you see a pattern?<br />

[1] 123456787654321<br />

> 111111111^2 # is R always correct?<br />

[1] 12345678987654320<br />

13 / 77


Quitting R<br />

To end your R session, type:<br />

> q()<br />

If you then hit the Enter key, you will be asked whether or not you would<br />

like <strong>to</strong> save an image of the current workspace, or <strong>to</strong> cancel.<br />

The workspace contains a record, called the his<strong>to</strong>ry, of the computations<br />

you’ve done, and may contain some saved results.<br />

If you choose <strong>to</strong> save, then two files will be s<strong>to</strong>red in a folder on your<br />

system: .RData (a workspace including all objects i.e. variables) and<br />

.Rhis<strong>to</strong>ry (a file with all code typed in<strong>to</strong> the console).<br />

14 / 77


Script Files<br />

Rather than saving the workspace, we prefer <strong>to</strong> keep a record of the<br />

commands we entered<br />

Easiest way <strong>to</strong> do this is via a script edi<strong>to</strong>r, available from the<br />

File menu and clicking on “New script”<br />

Commands are executed by highlighting them and hitting Ctrl-R<br />

At the end of a session, save the final script for a permanent record of<br />

your work<br />

15 / 77


Global Variables<br />

R has a workspace known as the global environment that can be used <strong>to</strong><br />

s<strong>to</strong>re the results of calculations, and many other types of objects.<br />

For a first example, suppose we would like <strong>to</strong> s<strong>to</strong>re a result of the<br />

calculation -0.2 + -7.25 * 0.95 for future use. (These number arises<br />

out of a model for predicting Australian log-mortality rate for people at<br />

age 20).<br />

We will assign this value <strong>to</strong> an object called logMortality. To do this,<br />

we type:<br />

> intercept slope logMortality <br />

Notice that when we hit Enter, nothing appears on the screen except a<br />

new prompt: R has done what we asked and is waiting for us <strong>to</strong> ask for<br />

something else.<br />

16 / 77


Global Variables<br />

We can see the results of this assignment by typing the name of our new<br />

object at the prompt:<br />

> logMortality<br />

[1] -7.0875<br />

Think of this just anther calculation: R is calculating the result of the<br />

expression logMortality, and printing it.<br />

17 / 77


Global Variables<br />

We can also use logMorality in further calculations, if we wish.<br />

For example, we can calculate the mortality rate on the original scale:<br />

> mortality mortality<br />

[1] 0.0008354835<br />

18 / 77


Built-in Functions<br />

Most of the work in R is done with functions.<br />

For example, we saw that <strong>to</strong> quit R we type q(). This tells R <strong>to</strong> call the<br />

function named q.<br />

The parentheses surround the argument list, which in this case contains<br />

nothing: we just want R <strong>to</strong> quit, and do not need <strong>to</strong> tell it how.<br />

Another function is mean(). It computes a sample sample and has a<br />

required argument: x.<br />

> y mean(y) # compute the sample mean contained in the vec<strong>to</strong>r y<br />

[1] 2.5<br />

19 / 77


R is Case Sensitive!<br />

See what happens if you type:<br />

> x MEAN(x)<br />

Error: could not find function "MEAN"<br />

or<br />

> mean(x)<br />

[1] 5.5<br />

20 / 77


R is Case Sensitive!<br />

Now try:<br />

> MEAN MEAN(x)<br />

[1] 5.5<br />

The function mean() is built in <strong>to</strong> R. R considers MEAN <strong>to</strong> be a different<br />

function, because it is case-sensitive. However, we can create a function<br />

called MEAN ourselves.<br />

21 / 77


Other Useful Summaries<br />

Other summary statistics can be calculated for data s<strong>to</strong>red in vec<strong>to</strong>rs. In<br />

particular, try:<br />

> var(x) # computes the sample variance for the vec<strong>to</strong>r x<br />

> sd(x) # computes the sample standard deviation for x<br />

> min(x) # displays the minimum value in x<br />

> max(x) # displays the maximum value in x<br />

> summary(x) # computes the min., max., mean, median,<br />

> # 1st, 2nd and 3rd quantiles of x<br />

> quantile(x, probs = c(0.025, 0.975)) # computes the<br />

> # 2.5%and 97.5% quantiles of x<br />

> median(x) # computes the median of x<br />

22 / 77


objects()<br />

The calculations in the previous slides led <strong>to</strong> the creation of several simple<br />

R objects. These objects are s<strong>to</strong>red in the current R workspace. A list of<br />

all objects in the current workspace can be printed <strong>to</strong> the screen using the<br />

objects() function:<br />

> objects()<br />

[1] "intercept" "logMortality"<br />

[3] "MEAN" "mortality"<br />

[5] "slope" "x"<br />

[7] "y"<br />

A synonym for objects() is ls().<br />

Remember that if we quit our R session without saving the workspace<br />

image, then these objects will disappear.<br />

23 / 77


Vec<strong>to</strong>rs<br />

A numeric vec<strong>to</strong>r is a list of numbers. The c() function is used <strong>to</strong> collect<br />

things <strong>to</strong>gether in<strong>to</strong> a vec<strong>to</strong>r. We can type:<br />

> c(0, 7, 8)<br />

[1] 0 7 8<br />

Again, we can assign this <strong>to</strong> a named object:<br />

> x x<br />

[1] 0 7 8<br />

24 / 77


Patterned Vec<strong>to</strong>rs<br />

The : symbol can be used <strong>to</strong> create sequences of increasing (or<br />

decreasing) values. For example:<br />

> numbers3<strong>to</strong>10 numbers3<strong>to</strong>10<br />

[1] 3 4 5 6 7 8 9 10<br />

Vec<strong>to</strong>rs can be joined <strong>to</strong>gether with the c() function. Note what happens<br />

when we type:<br />

> c(numbers3<strong>to</strong>10, x) # recall x


Concatenating Vec<strong>to</strong>rs<br />

Here is another example of the use of the c() function:<br />

> someNumbers


Concatenating Vec<strong>to</strong>rs<br />

We can append numbers3<strong>to</strong>10 <strong>to</strong> the end of someNumbers, and then<br />

append the decreasing sequence from 4 <strong>to</strong> 1:<br />

> moreNumbers moreNumbers<br />

[1] 2 3 5 7 11 13 17 19 23<br />

[10] 29 31 37 41 43 47 59 67 71<br />

[19] 73 79 83 89 97 103 107 109 113<br />

[28] 119 3 4 5 6 7 8 9 10<br />

[37] 4 3 2 1<br />

Remember that the numbers in the square brackets give the index of the<br />

element immediately <strong>to</strong> the right. Among other things, this helps us <strong>to</strong><br />

identify the 27nd element of moreNumbers as 113.<br />

27 / 77


Extracting Elements From Vec<strong>to</strong>rs<br />

A nicer way <strong>to</strong> display the 27nd element of moreNumbers is <strong>to</strong> use square<br />

brackets <strong>to</strong> extract just that element:<br />

> moreNumbers[27]<br />

[1] 113<br />

To see the second element of x, type:<br />

> x[2]<br />

[1] 7<br />

28 / 77


Extracting Elements from Vec<strong>to</strong>rs<br />

We can extract more than one element at a time. For example:<br />

> someNumbers[c(3, 6, 7)]<br />

[1] 5 13 17<br />

To get the third through seventh element of numbers3<strong>to</strong>10, type:<br />

> numbers3<strong>to</strong>10[3:7]<br />

[1] 5 6 7 8 9<br />

Negative indices can be used <strong>to</strong> avoid certain elements. For example, we<br />

can select all but the second element of x as follows:<br />

> x[-2]<br />

[1] 0 8<br />

29 / 77


Extracting Elements from Vec<strong>to</strong>rs<br />

The third through eleventh elements of someNumbers can be avoided as<br />

follows:<br />

> someNumbers[-(3:11)]<br />

[1] 2 3 37 41 43 47 59 67 71<br />

[10] 73 79 83 89 97 103 107 109 113<br />

[19] 119<br />

Using a zero index returns nothing:<br />

> numbers3<strong>to</strong>10[c(0, 5:8)]<br />

[1] 7 8 9 10<br />

Usually, we would not include a zero here.<br />

30 / 77


Extracting Elements from Vec<strong>to</strong>rs<br />

Do not mix positive and negative indices. To see what happens, consider:<br />

> x[c(-2, 3)]<br />

Error in x[c(-2, 3)] :<br />

only 0’s may be mixed with negative subscripts<br />

The problem is that it is not clear what is <strong>to</strong> be extracted: do we want the<br />

third element of x before or after removing the second one?<br />

31 / 77


Vec<strong>to</strong>r Arithmentic<br />

Arithmetic can be done on R vec<strong>to</strong>rs. For example, we can multiply all<br />

elements of x by 3:<br />

> x * 3 # x


Vec<strong>to</strong>r Arithmentic<br />

Addition, subtraction and division by a constant have the same kind of<br />

effect.<br />

For example:<br />

> y x^3<br />

[1] 0 343 512<br />

33 / 77


Vec<strong>to</strong>r Arithmentic<br />

The previous examples show how a binary arithmetic opera<strong>to</strong>r can be used<br />

with vec<strong>to</strong>rs and constants.<br />

In general, the binary opera<strong>to</strong>rs also work element-by-element when<br />

applied <strong>to</strong> pairs of vec<strong>to</strong>rs.<br />

For example, we can compute y x i<br />

i<br />

, for i = 1, 2, 3, i.e. (y x 1<br />

1 , y x 2<br />

2 , y x 3<br />

3<br />

), as<br />

follows:<br />

> y^x<br />

[1] 1 128 6561<br />

34 / 77


Recycling<br />

When the vec<strong>to</strong>rs are different lengths, the shorter one is extended by<br />

recycling: values are repeated, starting at the beginning.<br />

For example, <strong>to</strong> see the pattern of addition of the numbers 1 <strong>to</strong> 10 plus 2<br />

and 3, we need only give the 2:3 vec<strong>to</strong>r once:<br />

> c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5,<br />

+ 6, 6, 7, 7, 8, 8, 9, 9, 10, 10) + 2:3<br />

[1] 3 4 4 5 5 6 6 7 7 8 8 9<br />

[13] 9 10 10 11 11 12 12 13<br />

35 / 77


Recycling<br />

R will give a warning if the length of the longer vec<strong>to</strong>r is not a multiple of<br />

the length of the smaller one, because that is usually a sign that<br />

something is wrong.<br />

For example, if we wanted <strong>to</strong> add 2, 3 and 4 <strong>to</strong> the vec<strong>to</strong>r:<br />

> c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5,<br />

+ 6, 6, 7, 7, 8, 8, 9, 9, 10, 10) + 2:4<br />

[1] 3 4 6 4 6 7 6 7 9 7 9 10 9 10 12 10 12 13 12 13<br />

Warning message:<br />

In c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10,<br />

10) + :<br />

longer object length is not a multiple of shorter<br />

object length<br />

36 / 77


Patterned Vec<strong>to</strong>rs<br />

We saw the use of the : opera<strong>to</strong>r for producing simple sequences of<br />

integers.<br />

Patterned vec<strong>to</strong>rs can also be produced using the seq() function as well<br />

as the rep() function. For example, the sequence of odd numbers less<br />

than or equal <strong>to</strong> 21 can be obtained using<br />

> seq(from = 1, <strong>to</strong> = 21, by = 2)<br />

[1] 1 3 5 7 9 11 13 15 17 19 21<br />

Notice the use of by = 2 here. The seq() function has several optional<br />

parameters, including one named by. If by is not specified, the default<br />

value of 1 will be used.<br />

37 / 77


Examples: Patterned Vec<strong>to</strong>rs<br />

Repeated patterns are obtained using rep() and s() .<br />

> rep(3, 12) # repeat the value 3, 12 times<br />

[1] 3 3 3 3 3 3 3 3 3 3 3 3<br />

> rep(seq(2, 20, by=2), 2) # repeat 2,4 ... 20, twice<br />

[1] 2 4 6 8 10 12 14 16 18 20 2 4 6 8 10 12 14 16 18 20<br />

> rep(c(1, 4), c(3, 2)) # repeat 1, 3 times and 4, twice<br />

[1] 1 1 1 4 4<br />

> rep(c(1, 4), each=3) # repeat each value 3 times<br />

[1] 1 1 1 4 4 4<br />

> rep(seq(2, 20, 2), rep(2, 10)) # repeat each value<br />

> # twice<br />

[1] 2 2 4 4 6 6 8 8 10 10 12 12 14 14 16 16 18 18 20 20<br />

38 / 77


Missing Values and Other Special Values<br />

The missing value symbol is NA. Missing values often arise in real data<br />

problems, but they can also arise because of the way calculations are<br />

performed.<br />

> someEvens someEvens[seq(2, 20, 2)]


Missing Values and Other Special Values<br />

Recall that x contains the values (0, 7, 8). Consider:<br />

> x / x<br />

[1] NaN 1 1<br />

The NaN symbol denotes a value which is ‘Not a Number’. This arises as a<br />

result of attempting <strong>to</strong> compute the indeterminate 0/0.<br />

This symbol is sometimes used when a calculation does not make sense. In<br />

other cases, special values may be shown, or you may get an error or<br />

warning message:<br />

> 1 / x<br />

[1] Inf 0.1428571 0.1250000<br />

Here Inf, R has tried <strong>to</strong> evaluate 1/0.<br />

40 / 77


Character Variables<br />

Scalars and vec<strong>to</strong>rs can be made up of strings of characters instead of<br />

numbers. All elements of a vec<strong>to</strong>r must be of the same type. For example:<br />

> primates morePrimates morePrimates<br />

[1] "chimpanzee" "gorilla" "gibbon" "baboon" "lemur"<br />

To check that these are character variables, we can type:<br />

> is.character(morePrimates)<br />

[1] TRUE<br />

or<br />

> class(morePrimates)<br />

[1] "character"<br />

41 / 77


Fac<strong>to</strong>r Variables<br />

Fac<strong>to</strong>rs offer an alternative way of s<strong>to</strong>ring character data. For example, a<br />

fac<strong>to</strong>r with four elements and having the two levels, control and<br />

treatment can be created using fac<strong>to</strong>r():<br />

> grp # these will be s<strong>to</strong>red as character variables<br />

> grp<br />

[1] "control" "treatment" "control" "treatment"<br />

> grp grp<br />

[1] control treatment control treatment<br />

Levels: control treatment<br />

42 / 77


Fac<strong>to</strong>r Variables<br />

Fac<strong>to</strong>rs are a more efficient way of s<strong>to</strong>ring character data when there are<br />

repeats among the vec<strong>to</strong>r elements.<br />

This is because the levels of a fac<strong>to</strong>r are internally coded as integers.<br />

To see what the codes are for our fac<strong>to</strong>r, we can type<br />

> as.integer(grp)<br />

[1] 1 2 1 2<br />

43 / 77


Fac<strong>to</strong>r Variables<br />

The labels for the levels are only s<strong>to</strong>red once each, rather than being<br />

repeated.<br />

The codes are indices in<strong>to</strong> the vec<strong>to</strong>r of levels:<br />

> levels(grp)<br />

[1] "control" "treatment"<br />

> levels(grp)[as.integer(grp)]<br />

[1] "control" "treatment" "control" "treatment"<br />

44 / 77


Fac<strong>to</strong>r Variables<br />

As for numeric vec<strong>to</strong>rs, square brackets [] are used <strong>to</strong> index fac<strong>to</strong>r and<br />

character vec<strong>to</strong>r elements. For example, the fac<strong>to</strong>r grp has 4 elements, so<br />

we can print out the third element by typing:<br />

> grp[3]<br />

[1] control<br />

Levels: control treatment<br />

We can access the third through fifth elements of morePrimates as<br />

follows:<br />

> morePrimates[3:5]<br />

[1] "gibbon" "baboon" "lemur"<br />

45 / 77


Conversions Between Variable Classes<br />

Let’s say, for example, that we have the following indica<strong>to</strong>r representing<br />

weather in London (1 = sunny, 2 = cloudy, 3 = rainy, 4 = snowy and 5 =<br />

hail). The vec<strong>to</strong>r wxInd contains the weather for the past week.<br />

> wxInd wxInd wxInd<br />

[1] 1 1 2 3 5 2 5<br />

Levels: 1 2 3 5<br />

46 / 77


Conversions Between Variable Classes<br />

What if we now wanted <strong>to</strong> convert these back <strong>to</strong> numeric variables:<br />

> as.numeric(wxInd)<br />

> 1 1 2 3 4 2 4 # This does not match our vec<strong>to</strong>r wxInd<br />

Instead, we use<br />

> as.numeric(as.character(wxInd))<br />

> 1 1 2 3 5 2 5<br />

47 / 77


TRUE/FALSE Statements<br />

When there may be missing values, the is.na() function should be used<br />

<strong>to</strong> detect them. For instance:<br />

> is.na(someEvens)<br />

[1] TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE<br />

[10] FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE<br />

[19] TRUE FALSE<br />

The result is an example of a “logical vec<strong>to</strong>r”.<br />

The ! symbol means “not”, so we can locate the non-missing values in<br />

someEvens as follows:<br />

> !is.na(someEvens)<br />

[1] FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE<br />

[10] TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE<br />

[19] FALSE TRUE<br />

48 / 77


TRUE/FALSE Statements<br />

We can then display the even numbers only:<br />

> someEvens[!is.na(someEvens)]<br />

[1] 2 4 6 8 10 12 14 16 18 20<br />

49 / 77


TRUE/FALSE Statements<br />

One of the basic types of vec<strong>to</strong>r in R holds logical values. For example, a<br />

logical vec<strong>to</strong>r may be constructed as<br />

> a b b[a]<br />

[1] 13 2<br />

The elements of b corresponding <strong>to</strong> TRUE are selected.<br />

50 / 77


TRUE/FALSE Statements<br />

If we attempt arithmetic on a logical vec<strong>to</strong>r, e.g.<br />

> sum(a)<br />

[1] 2<br />

then the operations are performed after converting FALSE <strong>to</strong> 0 and TRUE<br />

<strong>to</strong> 1. In this case the result is that we count how many occurrences of<br />

TRUE are in the vec<strong>to</strong>r.<br />

51 / 77


Logical Opera<strong>to</strong>rs<br />

There are also other logical opera<strong>to</strong>rs:<br />

& means in both sets A and B<br />

| means in set A or B<br />

! means not in set A and B<br />

These are all vec<strong>to</strong>rized, so, for example:<br />

> A B A&B<br />

[1] TRUE FALSE FALSE FALSE<br />

> A|B<br />

[1] TRUE FALSE TRUE TRUE<br />

> !A<br />

[1] FALSE TRUE TRUE FALSE<br />

52 / 77


Logical Opera<strong>to</strong>rs<br />

If we attempt logical operations on a numerical vec<strong>to</strong>r, 0 is taken <strong>to</strong> be<br />

FALSE, and any non-zero value is taken <strong>to</strong> be TRUE:<br />

> a<br />

[1] TRUE FALSE FALSE TRUE<br />

><br />

> b - 2<br />

[1] 11 5 6 0<br />

><br />

> a & (b - 2)<br />

[1] TRUE FALSE FALSE FALSE<br />

53 / 77


Relational Opera<strong>to</strong>rs<br />

It is often necessary <strong>to</strong> test relations when programming <strong>to</strong> decide whether<br />

they are TRUE or FALSE. R allows for equality and inequality relations <strong>to</strong><br />

be tested in using the relational opera<strong>to</strong>rs: , ==, >=, C # Which elements are greater than 4?<br />

> C > 4<br />

[1] FALSE TRUE TRUE<br />

> # Which elements are exactly equal <strong>to</strong> 4?<br />

> C == 4 # Rounding error may occur!<br />

[1] FALSE FALSE FALSE<br />

> Which elements are greater than or equal 4?<br />

> C >= 4<br />

[1] FALSE TRUE TRUE<br />

> # Which elements are not equal <strong>to</strong> 4?<br />

> C!=4<br />

[1] TRUE TRUE TRUE<br />

54 / 77


Relational Opera<strong>to</strong>rs<br />

> # Print element of C > 4<br />

> C[C > 4]<br />

[1] 6 9<br />

> D # Which elements of C are less than the corresponding<br />

> # element of D?<br />

> C < D<br />

[1] TRUE FALSE FALSE<br />

> # Print elements of C which are less than the<br />

> # corresponding elements of D:<br />

> C[C < D]<br />

[1] 3<br />

55 / 77


S<strong>to</strong>ring Data<br />

Usually, our data are not s<strong>to</strong>red in vec<strong>to</strong>rs...<br />

56 / 77


Matrices<br />

To arrange values in<strong>to</strong> a matrix, we use the matrix() function:<br />

> m m<br />

[,1] [,2] [,3]<br />

[1,] 1 3 5<br />

[2,] 2 4 6<br />

We can then access elements using two indices. For example, the value in<br />

the first row, second column is:<br />

> m[1, 2]<br />

[1] 3<br />

57 / 77


Matrices<br />

Whole rows or columns of matrices may be selected by leaving the<br />

corresponding index blank:<br />

> m[1,] # <strong>to</strong> display the first row<br />

[1] 1 3 5<br />

> m[, 3] # <strong>to</strong> display the third column<br />

[1] 5 6<br />

58 / 77


Numerical Summaries for Matrices<br />

Numerical summaries for matrices are also available:<br />

> rowMeans(m) # take the mean of the elements in each row<br />

[1] 3 4<br />

> rowSums(m) # take the sum of the elements in each row<br />

[1] 9 12<br />

Note that colMeans and colSums are also built-in functions.<br />

59 / 77


Data Frames<br />

Most data sets are s<strong>to</strong>red in R as data frames. These are like matrices, but<br />

with the columns having their own names.<br />

Columns can be of different types from each other, i.e. numeric, character,<br />

fac<strong>to</strong>r and logical.<br />

Use the data.frame() function <strong>to</strong> construct data frames from vec<strong>to</strong>rs:<br />

> colours numbers coloursAndNumbers


Data Frames<br />

We can see the contents of a data frame:<br />

> coloursAndNumbers<br />

colours numbers moreNumbers<br />

1 red 1 7<br />

2 yellow 2 8<br />

3 blue 3 9<br />

4 yellow 4 10<br />

5 blue 5 11<br />

6 blue 6 12<br />

and access the column names:<br />

> names(coloursAndNumbers)<br />

[1] "colours" "numbers" "moreNumbers"<br />

or the dimension:<br />

> dim(coloursAndNumbers)<br />

[1] 6 3 # corresponds <strong>to</strong> rows and columns<br />

61 / 77


Numerical Summaries for Data Frames<br />

Rows and columns of data frames, can be extracted as they were for<br />

matrices. Let’s say that we want <strong>to</strong> sum over the “numbers” and<br />

“moreNumbers” columns of coloursAndNumbers. We can do this as<br />

follows:<br />

> apply(coloursAndNumbers[,2:3], 2, sum)<br />

numbers moreNumbers<br />

21 57<br />

We can also find the mean of “numbers” by “colours” using the<br />

aggregate function:<br />

> aggregate(coloursAndNumbers$numbers,<br />

+ by = list("colours" = colours), sum)<br />

colours x<br />

1 blue 14<br />

2 red 1<br />

3 yellow 6<br />

62 / 77


Accessing Components of Data Frames<br />

Suppose we want a subset of the coloursAndNumbers data; we want all<br />

rows where “colours” is blue. I can simply use the subset function, as<br />

follows:<br />

> subset(coloursAndNumbers, colours == "blue")<br />

colours numbers moreNumbers<br />

3 blue 3 9<br />

5 blue 5 11<br />

6 blue 6 12<br />

To see the “colours” column:<br />

> coloursAndNumbers$numbers<br />

[1] 1 2 3 4 5 6<br />

Note: columns and rows of data frames can be accessed using the same<br />

syntax as for matrices.<br />

63 / 77


Lists<br />

Lists in R can contain many types of objects including vec<strong>to</strong>rs, matrices,<br />

functions and these may be of different dimension.<br />

Many functions return complicated results as lists.<br />

You can see the names of the objects in a list using the names() function,<br />

and extract parts of it. If, for example, we had a list named d:<br />

> names(d) # Print the names of the objects in the list<br />

> d$x # Print the x component of d<br />

The list() function is one way of organizing multiple pieces of output<br />

from functions.<br />

64 / 77


Lists<br />

> x y list(x = x, y = y)<br />

$x<br />

[1] 3 2 3<br />

$y<br />

[1] 7 7<br />

> mylist mylist$x<br />

[1] 3 2 3<br />

> mylist$mean<br />

function (x, ...)<br />

UseMethod("mean")<br />

<br />

65 / 77


Data Input and Output<br />

When in an R session, it is possible <strong>to</strong> read and write data <strong>to</strong> files outside<br />

of R, for example on your computer’s hard drive.<br />

Before we can discuss some of the many ways of doing this, it is important<br />

<strong>to</strong> know where the data is coming from or going <strong>to</strong>.<br />

66 / 77


Changing Direc<strong>to</strong>ries<br />

In Windows versions of R, it is possible <strong>to</strong> use the File | Change dir<br />

menu <strong>to</strong> choose the direc<strong>to</strong>ry or folder <strong>to</strong> which you wish <strong>to</strong> direct your<br />

data.<br />

It is also possible <strong>to</strong> use the setwd() function. For example, <strong>to</strong> work with<br />

data in the folder mydata on the C: drive, type<br />

setwd("C:/mydata")<br />

From now on, all data input and output will default <strong>to</strong> the mydata folder<br />

in the C: drive.<br />

In order <strong>to</strong> check the current working direc<strong>to</strong>ry, we can simply type:<br />

getwd()<br />

[1] "C:/mydata"<br />

67 / 77


dump() and source() functions<br />

Suppose you have an object called usefuldata in your workspace. To<br />

save this object for future use, type<br />

dump("usefuldata", "useful.R")<br />

This s<strong>to</strong>res the commands necessary <strong>to</strong> construct usefuldata in a file<br />

useful.R on your computer’s hard drive.<br />

To retrieve the vec<strong>to</strong>r in a future session, type<br />

source("useful.R")<br />

68 / 77


Saving and Retrieving Image Files<br />

The vec<strong>to</strong>rs and other objects created during an R session are s<strong>to</strong>red<br />

in the workspace known as the global environment.<br />

When ending an R session, we have the option of saving the<br />

workspace in a file called a workspace image.<br />

If we choose <strong>to</strong> do so, a file called by default .RData is created in the<br />

current working direc<strong>to</strong>ry (folder) which contains the information<br />

needed <strong>to</strong> reconstruct this workspace.<br />

In Windows, the workspace image will be au<strong>to</strong>matically loaded if R is<br />

started by clicking on the icon representing the file .RData, or if the<br />

.RData file is saved in the direc<strong>to</strong>ry from which R is started.<br />

69 / 77


Saving and Retrieving Image Files<br />

Again, we can begin an R session with that workspace image, by<br />

clicking on the icon for temp.RData.<br />

Alternatively, we can type load("temp.RData") after entering an R<br />

session.<br />

Objects that were already in the current workspace image will remain,<br />

unless they have the same name as objects in the workspace image<br />

associated with temp.RData.<br />

In the latter case, the current objects will be overwritten and lost.<br />

70 / 77


Data frames and the read.table() function<br />

Data sets frequently consist of more than one column of data, where each<br />

column represents measurements of a single variable.<br />

Each row usually represents a single observation.<br />

For example, the following data set consists of 4 observations on the three<br />

variables x, y, and z:<br />

x y z<br />

61 13 4<br />

175 21 18<br />

111 24 14<br />

124 23 18<br />

71 / 77


Reading in Data Frames<br />

If such a data set is s<strong>to</strong>red in a file called pretend.txt in the direc<strong>to</strong>ry<br />

mydata on the C: drive, then it can be read in<strong>to</strong> an R data frame:<br />

pretendDF


Dates and Times<br />

The base package has the function strptime() <strong>to</strong> convert from<br />

strings (e.g. "2007-12-25", or "12/25/07") <strong>to</strong> an internal<br />

numerical representation, and format() <strong>to</strong> convert back for printing.<br />

The ISOdate() and ISOdatetime() functions are used when<br />

numerical values for the year, month day, etc. are known.<br />

Other functions are available in the chron package.<br />

Functions, such as mdy.date() and date.mdy(), in the date<br />

package are useful for converting dates <strong>to</strong> Julian dates and vice versa<br />

These can be difficult functions <strong>to</strong> use, and a full description is<br />

beyond the scope of this course.<br />

73 / 77


Built-In Functions and Online Help<br />

The function mean() is an example of a built-in function.<br />

To get help with the mean function, you can type ?mean or help(mean).<br />

These will give you a description of the function for taking the mean of a<br />

vec<strong>to</strong>r. It tells the user what package mean() is contained in, as well as a<br />

description of its arguments and some examples of using this function.<br />

74 / 77


Built-In Functions and Online Help<br />

If you do not know the function name, it is often convenient <strong>to</strong> use<br />

help.start().<br />

This brings up an Internet browser, such as Internet Explorer or Firefox. 2<br />

The browser will show you a menu of several options, including a listing of<br />

installed packages. The base package contains many of the routinely used<br />

functions and is au<strong>to</strong>matically installed upon opening R.<br />

2 R depends on your system <strong>to</strong> have a properly installed browser. If it doesn’t have<br />

one, you may see an error message, or possibly nothing at all.<br />

75 / 77


Built-In Functions and Online Help<br />

Another function that is often used is help.search().<br />

For example, <strong>to</strong> see if there are any functions that do optimization (finding<br />

minima or maxima), type: help.search("optimization").<br />

76 / 77


Installing Packages<br />

All of the examples from <strong>to</strong>day required functions which are part of the<br />

base package in R. However, much of the more advanced functions<br />

require us <strong>to</strong> download packages. If, for example, we wanted <strong>to</strong> download<br />

the lattice package for high level graphics, we could use the following<br />

commands:<br />

> install.packages("lattice")<br />

Select the Canada (ON) CRAN mirror. Then, type:<br />

> library(lattice)<br />

This last line must be run each time a new R session is started.<br />

77 / 77

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!