Survival Analysis using R
Bruce L. Jones
Department of Statistical and Actuarial Sciences
The University of Western Ontario
March 24, 2010

Outline
• What is R?
• Why use R?
• A bit about R
• What is Survival Analysis?
• The survival package in R
• Example

What is R?
• R is a free software environment for statistical computing and graphics.
• It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS.
• R is very popular among researchers in statistics.
• R is similar in appearance to S.
• R was initially written by Ross Ihaka and Robert Gentleman.

Why use R?
• It contains advanced statistical routines not yet available in other packages.
• It provides an unparalleled platform for programming new statistical methods in an easy and straightforward manner.
• It has state-of-the-art graphics capabilities.
• It’s free. Just go to http://www.r-project.org

Assignment, Vectors and Arrays
> 1+2*3
[1] 7
> x=3
> y<-2
> x+y
[1] 5
> z=c(2,3,4,5)
> z
[1] 2 3 4 5
> 2*z
[1]  4  6  8 10
>

Assignment, Vectors and Arrays
> z=2:5
> z
[1] 2 3 4 5
> z=seq(2,5,1)
> z
[1] 2 3 4 5
> zz=seq(10,300,3)
> zz
 [1]  10  13  16  19  22  25  28  31  34  37  40  43  46  49  52  55  58  61  64
[20]  67  70  73  76  79  82  85  88  91  94  97 100 103 106 109 112 115 118 121
[39] 124 127 130 133 136 139 142 145 148 151 154 157 160 163 166 169 172 175 178
[58] 181 184 187 190 193 196 199 202 205 208 211 214 217 220 223 226 229 232 235
[77] 238 241 244 247 250 253 256 259 262 265 268 271 274 277 280 283 286 289 292
[96] 295 298
>

Assignment, Vectors and Arrays
> mat=array(1:12,c(3,4))
> mat
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
> mat=matrix(1:12,3,4)
> mat
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
>

Functions
> plus=function(a,b) a+b
> plus(3,4)
[1] 7
> plus(3)
Error in plus(3) : element 2 is empty;
   the part of the args list of '+' being evaluated was:
   (a, b)
> plus=function(a,b=0) a+b
> plus(3,4)
[1] 7
> plus(3)
[1] 3
> plus(1:3,4:5)
[1] 5 7 7
Warning message:
In a + b : longer object length is not a multiple of shorter object length
>

What is Survival Analysis?
Survival Analysis is the study of lifetimes and their distributions. It usually involves one or more of the following objectives:
• to explore the behaviour of the distribution of a lifetime.
• to model the distribution of a lifetime.
• to test for differences between the distributions of two or more lifetimes.
• to model the impact of one or more explanatory variables on a lifetime distribution.

The Nature of Lifetime Data
• It’s almost always incomplete.
  – It often involves right-censoring.
  – It sometimes involves left-truncation.
• The methods of survival analysis allow for this incompleteness.

The survival Package in R
> install.packages("survival") # first time only
--- Please select a CRAN mirror for use in this session ---
trying URL 'http://probability.ca/cran/bin/windows/contrib/2.10/survival_2.35-8.zip'
Content type 'application/zip' length 2445387 bytes (2.3 Mb)
opened URL
downloaded 2.3 Mb
package 'survival' successfully unpacked and MD5 sums checked
The downloaded packages are in
C:\Documents and Settings\jones\Local Settings\Temp\RtmpEQ5ZaF\downloaded_packages
> library(survival)
Loading required package: splines
>

Creating a Survival Object
Example 1. Complete data lifetimes: 26, 42, 71, 85, 92.
> ex1.times=c(26,42,71,85,92)
> ex1.surv=Surv(ex1.times)
> ex1.surv
[1] 26 42 71 85 92
> class(ex1.surv)
[1] "Surv"
> class(ex1.times)
[1] "numeric"
>

Creating a Survival Object
Example 2. Right-censored lifetimes: 26, 42, 71, 80+, 80+.
> ex2.times=c(26,42,71,80,80)
> ex2.events=c(1,1,1,0,0)
> ex2.surv=Surv(ex2.times,ex2.events)
> ex2.surv
[1] 26  42  71  80+ 80+
>

Creating a Survival Object
Example 3. Left-truncated and right-censored lifetimes:
Left-truncation time is 40 for all individuals;
Event/right-censoring times are 42, 71, 80+, 80+.
> ex3.lttimes=rep(40,4)
> ex3.times=c(42,71,80,80)
> ex3.events=c(1,1,0,0)
> ex3.surv=Surv(ex3.lttimes,ex3.times,ex3.events)
> ex3.surv
[1] (40,42 ] (40,71 ] (40,80+] (40,80+]
>

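As a rough illustration of how Examples 1 to 3 relate, the sketch below derives the right-censored and left-truncated data from the complete lifetimes of Example 1. It assumes a common censoring time of 80 and a common truncation time of 40; the variable names (censor.time, trunc.time, in.study and so on) are illustrative and do not come from the original slides.

```r
# Complete lifetimes from Example 1.
lifetimes = c(26, 42, 71, 85, 92)

# Right-censoring (assumed common censoring time of 80):
# we observe min(lifetime, censor.time) and whether the death was seen.
censor.time = 80
obs.times  = pmin(lifetimes, censor.time)
obs.events = as.numeric(lifetimes <= censor.time)  # 1 = death observed, 0 = censored

# Left-truncation (assumed common truncation time of 40):
# only individuals still alive at time 40 ever enter the study.
trunc.time = 40
in.study  = lifetimes > trunc.time
lt.times  = obs.times[in.study]
lt.events = obs.events[in.study]

print(rbind(obs.times, obs.events))  # the data of Example 2
print(rbind(lt.times, lt.events))    # the data of Example 3
```

Under these assumptions the individual with lifetime 26 never appears in the left-truncated sample, which is why Example 3 has only four observations.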
Real Data Example
Lifetimes: Times until death of 26 psychiatric patients
Number of deaths: 14
Number of censored observations: 12
Covariates: patient age and sex (15 females, 11 males)

Real Data Example
The Data
patient sex age time death    patient sex age time death
      1   2  51    1     1         14   2  30   37     0
      2   2  58    1     1         15   2  33   35     0
      3   2  55    2     1         16   1  36   25     1
      4   2  28   22     1         17   1  30   31     0
      5   1  21   30     0         18   1  41   22     1
      6   1  19   28     1         19   2  43   26     1
      7   2  25   32     1         20   2  45   24     1
      8   2  48   11     1         21   2  35   35     0
      9   2  47   14     1         22   1  29   34     0
     10   2  25   36     0         23   1  35   30     0
     11   2  31   31     0         24   1  32   35     1
     12   1  24   33     0         25   2  36   40     1
     13   1  25   33     0         26   1  32   39     0

Real Data Example
Questions
• Does the lifetime distribution behave the way we expect?
• Are the lifetimes different for females and males?
• Do the lifetimes depend on age?

Estimating the Survival Function
We can explore the lifetime distribution by examining nonparametric estimates of the survival function.
The R function survfit allows us to do this.
> library(KMsurv) # get the data
> data(psych)
> attach(psych)
> names(psych)
[1] "sex" "age" "time" "death"
> psych.surv=Surv(age,age+time,death) # create a survival object
> psych.fit1=survfit(psych.surv~1) # obtain the estimates
> plot(psych.fit1,xlim=c(40,80),xlab="age",ylab="probability",
+ main="Survival Function Estimates") # plot the estimates
>

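For right-censored data, the default estimate returned by survfit is the Kaplan-Meier (product-limit) estimate. To make that calculation concrete, here is a minimal base-R sketch for the small data set of Example 2 (26, 42, 71, 80+, 80+). The helper names (death.times, n.risk, n.event, km) are chosen here for illustration only.

```r
# Example 2 data: observed times and event indicators (1 = death, 0 = censored).
times  = c(26, 42, 71, 80, 80)
events = c(1, 1, 1, 0, 0)

# Distinct times at which at least one death occurred.
death.times = sort(unique(times[events == 1]))

# Number still at risk just before each death time, and deaths at that time.
n.risk  = sapply(death.times, function(tj) sum(times >= tj))
n.event = sapply(death.times, function(tj) sum(times == tj & events == 1))

# Product-limit estimate: multiply the conditional survival fractions.
km = cumprod(1 - n.event / n.risk)

print(data.frame(time = death.times, n.risk, n.event, survival = km))
```

The estimate only drops at observed death times; the two observations censored at 80 contribute to the risk sets but never cause a step.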
Estimating the Survival Function
[Figure: Survival Function Estimates. Vertical axis: probability (0.0 to 1.0); horizontal axis: age (40 to 80).]

Estimating the Survival Function
Now let’s consider females and males separately.
> psych.fit2=survfit(psych.surv~sex) # separate by sex
> plot(psych.fit2,xlim=c(40,80),xlab="age",ylab="probability",
+ main="Survival Function Estimates for Males (red) and Females",
+ col=c("red","blue"))
> plot(psych.fit2,xlim=c(40,80),xlab="age",ylab="probability",
+ main="Survival Function Estimates for Males (red) and Females",
+ col=c("red","blue"), conf.int=T)
>

Estimating the Survival Function
[Figure: Survival Function Estimates for Females (blue) and Males. Vertical axis: probability (0.0 to 1.0); horizontal axis: age (40 to 80).]

Testing for Differences
The R function survdiff allows us to test for differences between lifetime distributions.
> survdiff(psych.surv~sex)
Error in survdiff(psych.surv ~ sex) : Right censored data only
> psych.surv2=Surv(time,death) # create new survival object
> survdiff(psych.surv2~sex)
Call:
survdiff(formula = psych.surv2 ~ sex)
       N Observed Expected (O-E)^2/E (O-E)^2/V
sex=1 11        4     6.24     0.807      1.61
sex=2 15       10     7.76     0.650      1.61
 Chisq= 1.6  on 1 degrees of freedom, p= 0.205
>

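The Observed, Expected and (O-E)^2/V columns come from the log-rank calculation. The sketch below reproduces it by hand in base R, assuming the data are typed in from the table shown earlier (in patient order) and that the standard log-rank test with hypergeometric variance is wanted. The names terms, O1, E1 and V are illustrative.

```r
# Psychiatric patient data, entered in patient order (1 to 26) from the data table.
sex   = c(2,2,2,2,1,1,2,2,2,2,2,1,1,2,2,1,1,1,2,2,2,1,1,1,2,1)
time  = c(1,1,2,22,30,28,32,11,14,36,31,33,33,37,35,25,31,22,26,24,35,34,30,35,40,39)
death = c(1,1,1,1,0,1,1,1,1,0,0,0,0,0,0,1,0,1,1,1,0,0,0,1,1,0)

death.times = sort(unique(time[death == 1]))

# At each distinct death time: deaths d, number at risk n, number at risk with sex=1.
terms = sapply(death.times, function(tj) {
  d  = sum(time == tj & death == 1)
  n  = sum(time >= tj)
  n1 = sum(time >= tj & sex == 1)
  e1 = d * n1 / n                       # expected deaths in group sex=1
  # Hypergeometric variance; zero when only one individual remains at risk.
  v  = if (n > 1) d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1) else 0
  c(e1 = e1, v = v)
})

O1 = sum(death[sex == 1])               # observed deaths in group sex=1
E1 = sum(terms["e1", ])
V  = sum(terms["v", ])
chisq = (O1 - E1)^2 / V
p = pchisq(chisq, df = 1, lower.tail = FALSE)

print(round(c(observed = O1, expected = E1, chisq = chisq, p = p), 3))
```

The results should agree, up to rounding, with the sex=1 row and the chi-square statistic in the survdiff output.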
Fitting a Proportional Hazards Model
The model: $h(t \mid x_1, \ldots, x_p) = h_0(t) \exp(\beta_1 x_1 + \cdots + \beta_p x_p)$
• The PH model is often used when we are interested in the impact of the covariates, $x_1, \ldots, x_p$, but not the lifetime distributions themselves.
• We can estimate and make inferences about $\beta_1, \ldots, \beta_p$ without estimating $h_0$.
• The R function coxph allows us to do this.

Fitting a Proportional Hazards Model
> psych.coxph1=coxph(psych.surv~sex)
> summary(psych.coxph1)
Call:
coxph(formula = psych.surv ~ sex)
  n= 26
      coef exp(coef) se(coef)     z Pr(>|z|)
sex 0.3900    1.4770   0.6102 0.639    0.523
    exp(coef) exp(-coef) lower .95 upper .95
sex     1.477      0.677    0.4466     4.884
Rsquare= 0.016   (max possible= 0.926 )
Likelihood ratio test= 0.43  on 1 df,   p=0.5141
Wald test            = 0.41  on 1 df,   p=0.5227
Score (logrank) test = 0.41  on 1 df,   p=0.5203

Fitting a Proportional Hazards Model
Next we use our survival object psych.surv2, which does not involve left-truncation.
> psych.coxph2=coxph(psych.surv2~sex)
> summary(psych.coxph2)
Call:
coxph(formula = psych.surv2 ~ sex)
  n= 26
      coef exp(coef) se(coef)     z Pr(>|z|)
sex 0.7511    2.1194   0.6055 1.241    0.215
    exp(coef) exp(-coef) lower .95 upper .95
sex     2.119     0.4718    0.6469     6.944
Rsquare= 0.062   (max possible= 0.945 )
Likelihood ratio test= 1.66  on 1 df,   p=0.1981
Wald test            = 1.54  on 1 df,   p=0.2148
Score (logrank) test = 1.61  on 1 df,   p=0.2046
Note that the last test is exactly that performed using survdiff.

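The exp(coef) column, its confidence limits, and the z and Pr(>|z|) columns all follow from coef and se(coef). A small sketch reproducing them for the sex coefficient of psych.coxph2, using the rounded values printed in the summary (so the last digit may differ slightly from the output); the variable names are illustrative.

```r
# Rounded values taken from the summary(psych.coxph2) output.
coef.sex = 0.7511
se.sex   = 0.6055

# Hazard ratio for a one-unit increase in sex (sex=2 relative to sex=1).
hazard.ratio = exp(coef.sex)

# 95% confidence interval: build it on the log scale, then exponentiate.
ci = exp(coef.sex + c(-1, 1) * qnorm(0.975) * se.sex)

# Wald z statistic and two-sided p-value.
z = coef.sex / se.sex
p = 2 * pnorm(-abs(z))

print(round(c(hazard.ratio = hazard.ratio, lower = ci[1], upper = ci[2],
              z = z, p = p), 4))
```

Because the interval for the hazard ratio contains 1, the data do not rule out equal hazards for the two sexes, in line with the p-value of about 0.215.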
Fitting a Proportional Hazards Model
Finally, consider
> psych.coxph3=coxph(psych.surv2~age+sex)
> summary(psych.coxph3)
Call:
coxph(formula = psych.surv2 ~ age + sex)
  n= 26
        coef exp(coef) se(coef)      z Pr(>|z|)
age  0.20753   1.23063  0.05828  3.561  0.00037 ***
sex -0.52374   0.59230  0.73753 -0.710  0.47762
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    exp(coef) exp(-coef) lower .95 upper .95
age    1.2306     0.8126    1.0978     1.380
sex    0.5923     1.6883    0.1396     2.514
Rsquare= 0.553   (max possible= 0.945 )
Likelihood ratio test= 20.91  on 2 df,   p=2.879e-05
Wald test            = 14.3  on 2 df,   p=0.0007866
Score (logrank) test = 21.27  on 2 df,   p=2.409e-05

Conclusions about this Example
• There is great uncertainty due to the small number of observations.
• Times until death depend on age at first admission to the hospital.
• We cannot conclude that the lifetimes are different for females and males.

Fitting an Accelerated Failure Time Model
• This is a popular fully parametric model for which the lifetime distribution is the same for different covariate values, except that the time scale is multiplied by a different constant.
• The R function survreg can be used to fit an AFT model.

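The slides do not show an AFT fit, so the following is only a sketch of how survreg might be applied to the same patient data. It assumes a Weibull lifetime distribution (the survreg default), the covariates age and sex, and data typed in from the table shown earlier; the object name aft.fit is hypothetical.

```r
library(survival)

# Psychiatric patient data, entered in patient order (1 to 26) from the data table.
psych = data.frame(
  sex   = c(2,2,2,2,1,1,2,2,2,2,2,1,1,2,2,1,1,1,2,2,2,1,1,1,2,1),
  age   = c(51,58,55,28,21,19,25,48,47,25,31,24,25,30,33,36,30,41,43,45,35,29,35,32,36,32),
  time  = c(1,1,2,22,30,28,32,11,14,36,31,33,33,37,35,25,31,22,26,24,35,34,30,35,40,39),
  death = c(1,1,1,1,0,1,1,1,1,0,0,0,0,0,0,1,0,1,1,1,0,0,0,1,1,0)
)

# Weibull AFT model for time since admission (right-censored only).
aft.fit = survreg(Surv(time, death) ~ age + sex, data = psych, dist = "weibull")
print(summary(aft.fit))

# survreg models log(time) linearly, so exp(coef) is the constant by which
# the time scale is multiplied for a one-unit increase in the covariate.
print(exp(coef(aft.fit)))
```

A value of exp(coef) below 1 for a covariate corresponds to shorter lifetimes as that covariate increases, which is the AFT counterpart of a hazard ratio above 1 in the proportional hazards model.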
Summary
• R is a flexible and free software environment for statistical computing and graphics.
• The survival package contains functions for survival analysis.
  – Surv creates a survival object.
  – survfit estimates (nonparametrically) the survival function.
  – survdiff performs tests for differences in lifetime distributions.
  – coxph fits the proportional hazards model.
  – survreg fits the accelerated failure time model.
These slides are here:
http://www.stats.uwo.ca/faculty/jones/survival_talk.pdf