Survival Analysis using R

Bruce L. Jones
Department of Statistical and Actuarial Sciences
The University of Western Ontario
March 24, 2010


Outline

• What is R?
• Why use R?
• A bit about R
• What is Survival Analysis?
• The survival package in R
• Example


What is R?

• R is a free software environment for statistical computing and graphics.
• It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS.
• R is very popular among researchers in statistics.
• R is similar in appearance to S.
• R was initially written by Ross Ihaka and Robert Gentleman.


Why use R?

• It contains advanced statistical routines not yet available in other packages.
• It provides an unparalleled platform for programming new statistical methods in an easy and straightforward manner.
• It has state-of-the-art graphics capabilities.
• It’s free. Just go to http://www.r-project.org


Assignment, Vectors and Arrays

> 1+2*3
[1] 7
> x=3
> y<-2
> x+y
[1] 5
> z=c(2,3,4,5)
> z
[1] 2 3 4 5
> 2*z
[1]  4  6  8 10


Assignment, Vectors and Arrays

> z=2:5
> z
[1] 2 3 4 5
> z=seq(2,5,1)
> z
[1] 2 3 4 5
> zz=seq(10,300,3)
> zz
 [1]  10  13  16  19  22  25  28  31  34  37  40  43  46  49  52  55  58  61  64
[20]  67  70  73  76  79  82  85  88  91  94  97 100 103 106 109 112 115 118 121
[39] 124 127 130 133 136 139 142 145 148 151 154 157 160 163 166 169 172 175 178
[58] 181 184 187 190 193 196 199 202 205 208 211 214 217 220 223 226 229 232 235
[77] 238 241 244 247 250 253 256 259 262 265 268 271 274 277 280 283 286 289 292
[96] 295 298


Assignment, Vectors and Arrays

> mat=array(1:12,c(3,4))
> mat
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
> mat=matrix(1:12,3,4)
> mat
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12


Functions

> plus=function(a,b) a+b
> plus(3,4)
[1] 7
> plus(3)
Error in plus(3) : element 2 is empty;
   the part of the args list of '+' being evaluated was:
   (a, b)
> plus=function(a,b=0) a+b
> plus(3,4)
[1] 7
> plus(3)
[1] 3
> plus(1:3,4:5)
[1] 5 7 7
Warning message:
In a + b : longer object length is not a multiple of shorter object length
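The warning from plus(1:3,4:5) comes from R's recycling rule: the shorter vector is repeated until it is as long as the longer one. A minimal sketch that makes the recycling explicit, using base R only; the names a, b, recycled and result are illustrative and do not appear on the slides.

```r
# Recycling: the shorter vector is reused from its start until it
# matches the length of the longer vector.
a <- 1:3
b <- 4:5

# rep_len() spells out what R does implicitly inside a + b.
recycled <- rep_len(b, length(a))

# Both operands now have length 3, so no warning is issued.
result <- a + recycled
print(result)
```

The values agree with the slide's output; only the warning disappears, because the lengths now match.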


What is Survival Analysis?

Survival Analysis is the study of lifetimes and their distributions. It usually involves one or more of the following objectives:

• to explore the behaviour of the distribution of a lifetime.
• to model the distribution of a lifetime.
• to test for differences between the distributions of two or more lifetimes.
• to model the impact of one or more explanatory variables on a lifetime distribution.


The Nature of Lifetime Data

• It’s almost always incomplete.
  – It often involves right-censoring.
  – It sometimes involves left-truncation.
• The methods of survival analysis allow for this incompleteness.


The survival Package in R

> install.packages("survival") # first time only
--- Please select a CRAN mirror for use in this session ---
trying URL 'http://probability.ca/cran/bin/windows/contrib/2.10/survival_2.35-8.zip'
Content type 'application/zip' length 2445387 bytes (2.3 Mb)
opened URL
downloaded 2.3 Mb

package 'survival' successfully unpacked and MD5 sums checked

The downloaded packages are in
        C:\Documents and Settings\jones\Local Settings\Temp\RtmpEQ5ZaF\downloaded_packages
> library(survival)
Loading required package: splines


Creating a Survival Object

Example 1. Complete data lifetimes: 26, 42, 71, 85, 92.

> ex1.times=c(26,42,71,85,92)
> ex1.surv=Surv(ex1.times)
> ex1.surv
[1] 26 42 71 85 92
> class(ex1.surv)
[1] "Surv"
> class(ex1.times)
[1] "numeric"


Creating a Survival Object

Example 2. Right-censored lifetimes: 26, 42, 71, 80+, 80+.

> ex2.times=c(26,42,71,80,80)
> ex2.events=c(1,1,1,0,0)
> ex2.surv=Surv(ex2.times,ex2.events)
> ex2.surv
[1] 26  42  71  80+ 80+
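One way to read Example 2 is as the lifetimes of Example 1 observed only up to time 80. The sketch below builds the observed times and event indicators under that reading. The follow-up limit study.end = 80 is an assumption made for illustration, not something stated on the slides, and the variable names are hypothetical.

```r
library(survival)

# Complete lifetimes, taken from Example 1.
lifetimes <- c(26, 42, 71, 85, 92)

# Assumed end of follow-up (illustrative, not from the slides).
study.end <- 80

# The observed time is the smaller of the lifetime and the censoring time.
obs.time <- pmin(lifetimes, study.end)

# Event indicator: 1 = death observed, 0 = right-censored.
event <- as.numeric(lifetimes <= study.end)

# Printing the Surv object marks censored times with a trailing "+".
print(Surv(obs.time, event))
```

The two lifetimes beyond 80 are recorded as censored at 80, which gives the same times and event indicators as Example 2.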


Creating a Survival Object

Example 3. Left-truncated and right-censored lifetimes:
Left-truncation time is 40 for all individuals;
Event/right-censoring times are 42, 71, 80+, 80+.

> ex3.lttimes=rep(40,4)
> ex3.times=c(42,71,80,80)
> ex3.events=c(1,1,0,0)
> ex3.surv=Surv(ex3.lttimes,ex3.times,ex3.events)
> ex3.surv
[1] (40,42 ] (40,71 ] (40,80+] (40,80+]
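Left-truncation means that an individual enters observation only after surviving to the truncation time, so earlier failures never appear in the data. A minimal sketch, assuming the Example 1 lifetimes, the truncation time of 40 from Example 3, and a hypothetical end of follow-up at 80 (the latter is not stated on the slides); all variable names are illustrative.

```r
library(survival)

# Complete lifetimes, taken from Example 1.
lifetimes <- c(26, 42, 71, 85, 92)
entry     <- 40   # left-truncation time, as in Example 3
study.end <- 80   # assumed end of follow-up (illustrative)

# Individuals who fail before the entry time are never observed.
observed <- lifetimes > entry

# Entry time, exit time and event indicator for those who are observed.
start <- rep(entry, sum(observed))
exit  <- pmin(lifetimes[observed], study.end)
event <- as.numeric(lifetimes[observed] <= study.end)

# Three-argument Surv(): each record is an interval (start, exit].
print(Surv(start, exit, event))
```

The lifetime of 26 drops out because it ends before observation begins, leaving the same four records as Example 3.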


Real Data Example

Lifetimes: Times until death of 26 psychiatric patients
Number of deaths: 14
Number of censored observations: 12
Covariates: patient age and sex (15 females, 11 males)


Real Data Example

The Data

patient sex age time death    patient sex age time death
      1   2  51    1     1         14   2  30   37     0
      2   2  58    1     1         15   2  33   35     0
      3   2  55    2     1         16   1  36   25     1
      4   2  28   22     1         17   1  30   31     0
      5   1  21   30     0         18   1  41   22     1
      6   1  19   28     1         19   2  43   26     1
      7   2  25   32     1         20   2  45   24     1
      8   2  48   11     1         21   2  35   35     0
      9   2  47   14     1         22   1  29   34     0
     10   2  25   36     0         23   1  35   30     0
     11   2  31   31     0         24   1  32   35     1
     12   1  24   33     0         25   2  36   40     1
     13   1  25   33     0         26   1  32   39     0
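The table can also be typed directly into R, which is handy when the KMsurv package is not installed. A sketch, with the values copied from the table in patient order; the checks at the end compare against the counts quoted on the previous slide. Reading sex = 2 as female is inferred from those counts (15 females, 11 males) rather than stated in the table.

```r
# The 26 rows of the table, entered column by column (patients 1 to 26).
psych <- data.frame(
  sex   = c(2, 2, 2, 2, 1, 1, 2, 2, 2, 2, 2, 1, 1,
            2, 2, 1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 1),
  age   = c(51, 58, 55, 28, 21, 19, 25, 48, 47, 25, 31, 24, 25,
            30, 33, 36, 30, 41, 43, 45, 35, 29, 35, 32, 36, 32),
  time  = c(1, 1, 2, 22, 30, 28, 32, 11, 14, 36, 31, 33, 33,
            37, 35, 25, 31, 22, 26, 24, 35, 34, 30, 35, 40, 39),
  death = c(1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0,
            0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0)
)

# Quick consistency checks against the summary slide.
n.deaths   <- sum(psych$death)        # number of deaths
n.censored <- sum(psych$death == 0)   # number of censored observations
print(c(deaths = n.deaths, censored = n.censored))
print(table(psych$sex))
```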


Real Data Example

Questions

• Does the lifetime distribution behave the way we expect?
• Are the lifetimes different for females and males?
• Do the lifetimes depend on age?


Estimating the Survival Function

We can explore the lifetime distribution by examining nonparametric estimates of the survival function.

The R function survfit allows us to do this.

> library(KMsurv) # get the data
> data(psych)
> attach(psych)
> names(psych)
[1] "sex"   "age"   "time"  "death"
> psych.surv=Surv(age,age+time,death) # create a survival object
> psych.fit1=survfit(psych.surv~1) # obtain the estimates
> plot(psych.fit1,xlim=c(40,80),xlab="age",ylab="probability",
+ main="Survival Function Estimates") # plot the estimates
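For right-censored data without truncation, the nonparametric estimate returned by survfit is the Kaplan-Meier (product-limit) estimate: at each death time, the running survival probability is multiplied by one minus the proportion of those still at risk who die. The sketch below recomputes it by hand for the small Example 2 data set (not the psychiatric data, whose left-truncation changes the risk sets) and compares the result with survfit. The variable names are illustrative.

```r
library(survival)

# Example 2 data: right-censored lifetimes 26, 42, 71, 80+, 80+.
times  <- c(26, 42, 71, 80, 80)
events <- c(1, 1, 1, 0, 0)

# Distinct times at which a death was observed.
event.times <- sort(unique(times[events == 1]))

# Number at risk just before, and number of deaths at, each death time.
at.risk <- sapply(event.times, function(t) sum(times >= t))
deaths  <- sapply(event.times, function(t) sum(times == t & events == 1))

# Product-limit estimate: cumulative product of (1 - deaths / at.risk).
km.by.hand <- cumprod(1 - deaths / at.risk)

# The same quantities from survfit, picked out at the death times.
fit <- survfit(Surv(times, events) ~ 1)
km.survfit <- fit$surv[match(event.times, fit$time)]

print(data.frame(time = event.times, at.risk, deaths, km.by.hand, km.survfit))
```

With five subjects and one death at each of the first three times, the estimate steps down through 4/5, 3/5 and 2/5.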


Estimating the Survival Function

[Figure: "Survival Function Estimates"; x-axis: age (40 to 80); y-axis: probability (0.0 to 1.0)]


Estimating the Survival Function

Now let’s consider females and males separately.

> psych.fit2=survfit(psych.surv~sex) # separate by sex
> plot(psych.fit2,xlim=c(40,80),xlab="age",ylab="probability",
+ main="Survival Function Estimates for Males (red) and Females",
+ col=c("red","blue"))
> plot(psych.fit2,xlim=c(40,80),xlab="age",ylab="probability",
+ main="Survival Function Estimates for Males (red) and Females",
+ col=c("red","blue"), conf.int=T)


Estimating the Survival Function

[Two figures: "Survival Function Estimates for Females (blue) and Males"; x-axis: age (40 to 80); y-axis: probability (0.0 to 1.0)]


Testing for Differences

The R function survdiff allows us to test for differences between lifetime distributions.

> survdiff(psych.surv~sex)
Error in survdiff(psych.surv ~ sex) : Right censored data only
> psych.surv2=Surv(time,death) # create new survival object
> survdiff(psych.surv2~sex)
Call:
survdiff(formula = psych.surv2 ~ sex)

       N Observed Expected (O-E)^2/E (O-E)^2/V
sex=1 11        4     6.24     0.807      1.61
sex=2 15       10     7.76     0.650      1.61

 Chisq= 1.6  on 1 degrees of freedom, p= 0.205
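The reported p-value is the upper tail of a chi-squared distribution evaluated at the test statistic, with degrees of freedom equal to the number of groups minus one. A small sketch using the rounded statistic printed in the output above, so the result matches the reported 0.205 only approximately; the variable names are illustrative.

```r
# Test statistic as printed in the (O-E)^2/V column (rounded).
chisq <- 1.61

# Two groups (sex = 1 and sex = 2) give one degree of freedom.
df <- 1

# Upper-tail probability of the chi-squared distribution.
p.value <- pchisq(chisq, df = df, lower.tail = FALSE)
print(p.value)
```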


Fitting a Proportional Hazards Model

The model: $h(t \mid x_1, \ldots, x_p) = h_0(t) \exp(\beta_1 x_1 + \cdots + \beta_p x_p)$

• The PH model is often used when we are interested in the impact of the covariates, $x_1, \ldots, x_p$, but not the lifetime distributions themselves.
• We can estimate and make inferences about $\beta_1, \ldots, \beta_p$ without estimating $h_0$.
• The R function coxph allows us to do this.
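The reason $h_0$ need not be estimated can be seen from the standard Cox partial likelihood, sketched here under the assumption of no tied death times. The notation is introduced for this illustration and is not on the slides: $\delta_i$ is the event indicator of individual $i$, $t_i$ the observed time, $x_{i1}, \ldots, x_{ip}$ the covariate values, and $R(t_i)$ the set of individuals at risk just before $t_i$.

```latex
L(\beta_1, \ldots, \beta_p)
  = \prod_{i:\, \delta_i = 1}
    \frac{h_0(t_i) \exp(\beta_1 x_{i1} + \cdots + \beta_p x_{ip})}
         {\sum_{j \in R(t_i)} h_0(t_i) \exp(\beta_1 x_{j1} + \cdots + \beta_p x_{jp})}
  = \prod_{i:\, \delta_i = 1}
    \frac{\exp(\beta_1 x_{i1} + \cdots + \beta_p x_{ip})}
         {\sum_{j \in R(t_i)} \exp(\beta_1 x_{j1} + \cdots + \beta_p x_{jp})}
```

The baseline hazard $h_0(t_i)$ appears in both numerator and denominator of each factor and cancels, so the expression depends on the data only through the covariates and the ordering of the death times.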


Fitting a Proportional Hazards Model

> psych.coxph1=coxph(psych.surv~sex)
> summary(psych.coxph1)
Call:
coxph(formula = psych.surv ~ sex)

  n= 26
      coef exp(coef) se(coef)     z Pr(>|z|)
sex 0.3900    1.4770   0.6102 0.639    0.523

    exp(coef) exp(-coef) lower .95 upper .95
sex     1.477      0.677    0.4466     4.884

Rsquare= 0.016   (max possible= 0.926 )
Likelihood ratio test= 0.43  on 1 df,   p=0.5141
Wald test            = 0.41  on 1 df,   p=0.5227
Score (logrank) test = 0.41  on 1 df,   p=0.5203
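The columns of this summary are linked by simple arithmetic: exp(coef) is the estimated hazard ratio, the 95% limits are exp(coef ± z × se(coef)) with z the 0.975 normal quantile, and the Wald z statistic is coef / se(coef). A sketch that redoes the arithmetic from the rounded values printed above, so agreement holds only to the printed precision; the variable names are illustrative.

```r
# Values copied from the coxph summary (rounded as printed).
coef.sex <- 0.3900
se.sex   <- 0.6102

# Estimated hazard ratio for a one-unit increase in sex (2 versus 1).
hazard.ratio <- exp(coef.sex)

# 95% confidence limits: a normal-theory interval on the log scale,
# transformed back with exp().
z.crit <- qnorm(0.975)
ci <- exp(coef.sex + c(-1, 1) * z.crit * se.sex)

# Wald z statistic and its two-sided p-value.
z.stat  <- coef.sex / se.sex
p.value <- 2 * pnorm(-abs(z.stat))

print(c(hazard.ratio = hazard.ratio, lower = ci[1], upper = ci[2],
        z = z.stat, p = p.value))
```

The interval contains 1, which is the same message as the large p-value: these data do not show a clear difference in hazard between the sexes.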


Fitting a Proportional Hazards Model

Next we use our survival object psych.surv2, which does not involve left-truncation.

> psych.coxph2=coxph(psych.surv2~sex)
> summary(psych.coxph2)
Call:
coxph(formula = psych.surv2 ~ sex)

  n= 26
      coef exp(coef) se(coef)     z Pr(>|z|)
sex 0.7511    2.1194   0.6055 1.241    0.215

    exp(coef) exp(-coef) lower .95 upper .95
sex     2.119     0.4718    0.6469     6.944

Rsquare= 0.062   (max possible= 0.945 )
Likelihood ratio test= 1.66  on 1 df,   p=0.1981
Wald test            = 1.54  on 1 df,   p=0.2148
Score (logrank) test = 1.61  on 1 df,   p=0.2046

Note that the last test is exactly that performed using survdiff.


Fitting a Proportional Hazards Model

Finally, consider

> psych.coxph3=coxph(psych.surv2~age+sex)
> summary(psych.coxph3)
Call:
coxph(formula = psych.surv2 ~ age + sex)

  n= 26
        coef exp(coef) se(coef)      z Pr(>|z|)
age  0.20753   1.23063  0.05828  3.561  0.00037 ***
sex -0.52374   0.59230  0.73753 -0.710  0.47762
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    exp(coef) exp(-coef) lower .95 upper .95
age    1.2306     0.8126    1.0978     1.380
sex    0.5923     1.6883    0.1396     2.514

Rsquare= 0.553   (max possible= 0.945 )
Likelihood ratio test= 20.91  on 2 df,   p=2.879e-05
Wald test            = 14.3  on 2 df,   p=0.0007866
Score (logrank) test = 21.27  on 2 df,   p=2.409e-05
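Because the model is linear in age on the log-hazard scale, the age coefficient can be rescaled to any age difference: a difference of k years multiplies the hazard by exp(k × coef). A sketch using the rounded coefficient printed above; the ten-year comparison is an illustrative choice, not one made on the slides.

```r
# Age coefficient copied from the coxph summary (rounded as printed).
coef.age <- 0.20753

# Hazard ratio for a one-year difference in age at admission,
# holding sex fixed.
hr.one.year <- exp(coef.age)

# Hazard ratio for a ten-year difference under the same model.
hr.ten.years <- exp(10 * coef.age)

print(c(one.year = hr.one.year, ten.years = hr.ten.years))
```

A per-year ratio of about 1.23 compounds to roughly an eightfold hazard over ten years. With only 26 patients, that figure carries wide uncertainty.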


Conclusions about this Example

• There is great uncertainty due to the small number of observations.
• Times until death depend on age at first admission to the hospital.
• We cannot conclude that the lifetimes are different for females and males.


Fitting an Accelerated Failure Time Model

• This is a popular fully parametric model for which the lifetime distribution is the same for different covariate values, except that the time scale is multiplied by a different constant.
• The R function survreg can be used to fit an AFT model.
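The slides do not show a survreg call, so the following is only a sketch of what one might look like for the psychiatric data. The data frame is typed in from the data table so that the example does not depend on KMsurv, and the Weibull distribution (survreg's default) is an assumption made here, not a choice taken from the talk.

```r
library(survival)

# Psychiatric patient data, typed in from the data table (patients 1 to 26).
psych <- data.frame(
  sex   = c(2, 2, 2, 2, 1, 1, 2, 2, 2, 2, 2, 1, 1,
            2, 2, 1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 1),
  age   = c(51, 58, 55, 28, 21, 19, 25, 48, 47, 25, 31, 24, 25,
            30, 33, 36, 30, 41, 43, 45, 35, 29, 35, 32, 36, 32),
  time  = c(1, 1, 2, 22, 30, 28, 32, 11, 14, 36, 31, 33, 33,
            37, 35, 25, 31, 22, 26, 24, 35, 34, 30, 35, 40, 39),
  death = c(1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0,
            0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0)
)

# Weibull AFT model for time until death (an assumed distribution).
aft.fit <- survreg(Surv(time, death) ~ age + sex, data = psych,
                   dist = "weibull")
print(summary(aft.fit))

# In an AFT model, exp(coefficient) is the factor by which the time
# scale is multiplied per unit increase in the covariate.
time.ratios <- exp(coef(aft.fit)[c("age", "sex")])
print(time.ratios)
```

A time ratio below 1 shortens lifetimes and one above 1 lengthens them. survreg does not accept left-truncated Surv objects, so the sketch uses time since admission, as psych.surv2 does.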


Summary

• R is a flexible and free software environment for statistical computing and graphics.
• The survival package contains functions for survival analysis.
  – Surv creates a survival object.
  – survfit estimates (nonparametrically) the survival function.
  – survdiff performs tests for differences in lifetime distributions.
  – coxph fits the proportional hazards model.
  – survreg fits the accelerated failure time model.

These slides are here:
http://www.stats.uwo.ca/faculty/jones/survival_talk.pdf
