
example analysis of simple linear regression


Breast Cancer Mortality and Mean Temperature

In the early 1960s, data were collected from official statistics registers of Great Britain, Norway and Sweden on breast cancer mortality. Death rates for neoplasms of the breast were calculated for various age groups and for certain areas at the same latitude. Age-specific death rates were then calculated for each area and converted to a mortality index using 100 as the age-specific rate for all of England and Wales. The mean annual temperatures at various latitudes under study were obtained from the British Meteorological Office.

Mean Annual        Mortality    Mean Annual        Mortality
Temperature (°F)   Index        Temperature (°F)   Index
51.3               102.5        46.3               78.9
49.9               104.5        42.1               84.6
50.0               100.4        44.2               81.7
49.2                95.9        43.5               72.2
48.5                87.0        42.3               65.1
47.8                95.0        40.2               68.1
47.3                88.6        31.8               67.3
45.1                89.2        34.0               52.5

The data are published in A.J. Lea (1965), British Medical Journal, Volume 1, pp. 488-490. The conclusions reached in the paper were that:

• An association between the death rate from neoplasms of the breast in females and mean annual temperature has been found.
• The nature of the association is that the death rate increases with the mean annual temperature.
• These findings apply, at present, only to Norway, Sweden and Great Britain.
• The evidence does not, at this stage, allow a statement of the factor or factors concerned to be made.


Regression Analysis in R: Breast Cancer Data

> cancer <- [...]
> attach(cancer)

> cancer
   temp mortality
1  51.3     102.5
2  49.9     104.5
3  50.0     100.4
4  49.2      95.9
5  48.5      87.0
6  47.8      95.0
7  47.3      88.6
8  45.1      89.2
9  46.3      78.9
10 42.1      84.6
11 44.2      81.7
12 43.5      72.2
13 42.3      65.1
14 40.2      68.1
15 31.8      67.3
16 34.0      52.5
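The call that originally loaded the data is not recoverable from this copy. As a minimal sketch, and only as one possible way to get the same object, the data frame can be typed in directly from the 16 rows printed above. The use of data.frame() here is an assumption, not necessarily how the original session created cancer.

```r
# Rebuild the data frame by hand from the 16 printed rows.
# Assumption: the original session may have read these values from a file instead.
cancer <- data.frame(
  temp = c(51.3, 49.9, 50.0, 49.2, 48.5, 47.8, 47.3, 45.1,
           46.3, 42.1, 44.2, 43.5, 42.3, 40.2, 31.8, 34.0),
  mortality = c(102.5, 104.5, 100.4, 95.9, 87.0, 95.0, 88.6, 89.2,
                78.9, 84.6, 81.7, 72.2, 65.1, 68.1, 67.3, 52.5)
)

# Quick structural check: 16 observations of 2 numeric variables.
str(cancer)
```

With the data frame in place, later calls can either follow the transcript and use attach(cancer), or pass data = cancer to lm(), which avoids putting the columns on the search path.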

> plot(temp,mortality)

[Figure: scatterplot of mortality (vertical axis, about 60 to 100) against temp (horizontal axis, about 35 to 50)]


> regout <- lm(mortality ~ temp)
> summary(regout)

Call:
lm(formula = mortality ~ temp)

Residuals:
     Min       1Q   Median       3Q      Max
-12.8358  -5.6319   0.4904   4.3981  14.1200

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -21.7947    15.6719  -1.391    0.186
temp          2.3577     0.3489   6.758  9.2e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.545 on 14 degrees of freedom
Multiple R-Squared: 0.7654, Adjusted R-squared: 0.7486
F-statistic: 45.67 on 1 and 14 DF, p-value: 9.202e-06
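The slope row of the coefficient table can be unpacked by hand: the t value is the estimate divided by its standard error (2.3577 / 0.3489 ≈ 6.758), and the p-value is the two-sided tail area of a t distribution with 14 degrees of freedom. The sketch below shows one way to check this. The names coefs, slope, se_slope, t_value and p_value are illustrative only, and the model is fitted with data = cancer rather than attach(), which gives the same fit.

```r
# Data as printed in the transcript.
cancer <- data.frame(
  temp = c(51.3, 49.9, 50.0, 49.2, 48.5, 47.8, 47.3, 45.1,
           46.3, 42.1, 44.2, 43.5, 42.3, 40.2, 31.8, 34.0),
  mortality = c(102.5, 104.5, 100.4, 95.9, 87.0, 95.0, 88.6, 89.2,
                78.9, 84.6, 81.7, 72.2, 65.1, 68.1, 67.3, 52.5)
)
regout <- lm(mortality ~ temp, data = cancer)

# Coefficient table as a numeric matrix
# (columns: Estimate, Std. Error, t value, Pr(>|t|)).
coefs <- coef(summary(regout))

slope    <- coefs["temp", "Estimate"]
se_slope <- coefs["temp", "Std. Error"]

# t statistic and two-sided p-value for H0: slope = 0.
t_value <- slope / se_slope
p_value <- 2 * pt(abs(t_value), df = df.residual(regout), lower.tail = FALSE)

print(c(t_value = t_value, p_value = p_value))

# 95% confidence interval for the slope.
print(confint(regout, "temp", level = 0.95))
```

Because the model has a single predictor, the squared t value for temp equals the F-statistic reported on the last line of the summary.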

> anova(regout)
Analysis of Variance Table

Response: mortality
          Df  Sum Sq Mean Sq F value    Pr(>F)
temp       1 2599.53 2599.53  45.669 9.202e-06 ***
Residuals 14  796.91   56.92
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
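The ANOVA table ties together several numbers from the regression summary: the F value is the regression mean square over the residual mean square (2599.53 / 56.92 ≈ 45.67), R-squared is the regression sum of squares over the total (2599.53 / (2599.53 + 796.91) ≈ 0.7654), and the residual standard error is the square root of the residual mean square (√56.92 ≈ 7.545). A minimal sketch of that arithmetic follows. The variable names are illustrative, and data = cancer is used in place of attach().

```r
# Data as printed in the transcript.
cancer <- data.frame(
  temp = c(51.3, 49.9, 50.0, 49.2, 48.5, 47.8, 47.3, 45.1,
           46.3, 42.1, 44.2, 43.5, 42.3, 40.2, 31.8, 34.0),
  mortality = c(102.5, 104.5, 100.4, 95.9, 87.0, 95.0, 88.6, 89.2,
                78.9, 84.6, 81.7, 72.2, 65.1, 68.1, 67.3, 52.5)
)
regout <- lm(mortality ~ temp, data = cancer)

# anova() returns a data frame with one row per term plus a Residuals row.
tab <- anova(regout)

ss_reg <- tab["temp", "Sum Sq"]        # regression sum of squares
ss_res <- tab["Residuals", "Sum Sq"]   # residual sum of squares
ms_res <- tab["Residuals", "Mean Sq"]  # residual mean square

f_value   <- (ss_reg / tab["temp", "Df"]) / ms_res  # F statistic
r_squared <- ss_reg / (ss_reg + ss_res)             # proportion of variance explained
resid_se  <- sqrt(ms_res)                           # residual standard error

print(c(F = f_value, R2 = r_squared, sigma = resid_se))
```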

> rescan <- [...]
> plot(temp,rescan)

[Figure: plot of rescan (vertical axis, about -10 to 15) against temp (horizontal axis, about 35 to 50)]
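The assignment that created rescan did not survive in this copy. The vertical range of the plot (about -10 to 15) matches the residual summary (minimum -12.84, maximum 14.12), so a reasonable reading is that rescan holds the residuals of regout. Under that assumption, the residual plot could be reproduced as sketched below. The use of residuals() and the dashed reference line at zero are additions for illustration, not taken from the original session.

```r
# Data as printed in the transcript.
cancer <- data.frame(
  temp = c(51.3, 49.9, 50.0, 49.2, 48.5, 47.8, 47.3, 45.1,
           46.3, 42.1, 44.2, 43.5, 42.3, 40.2, 31.8, 34.0),
  mortality = c(102.5, 104.5, 100.4, 95.9, 87.0, 95.0, 88.6, 89.2,
                78.9, 84.6, 81.7, 72.2, 65.1, 68.1, 67.3, 52.5)
)
regout <- lm(mortality ~ temp, data = cancer)

# Assumption: rescan is the vector of residuals from the fitted model.
rescan <- residuals(regout)

# Residuals against the predictor; a patternless band around zero
# is what the linear model assumes.
plot(cancer$temp, rescan, xlab = "temp", ylab = "rescan")
abline(h = 0, lty = 2)  # dashed reference line at zero (illustrative addition)
```

Least-squares residuals from a model with an intercept sum to zero, so the points should scatter on both sides of the reference line.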


> newtemp <- [...]
> tempred <- predict(regout,newtemp)
> tempred

       1        2        3        4        5        6        7        8
53.65153 56.00923 58.36692 60.72462 63.08231 65.44001 67.79770 70.15540
       9       10       11       12       13       14       15       16
72.51309 74.87079 77.22848 79.58617 81.94387 84.30156 86.65926 89.01695
      17       18       19       20
91.37465 93.73234 96.09004 98.44773
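The definition of newtemp is missing from this copy, but the fitted values constrain it: the first prediction, 53.65153, corresponds to -21.7947 + 2.3577 × 32, and successive predictions rise by 2.3577, the slope. That pattern is consistent with 20 temperatures from 32 to 51 °F in steps of one degree. The sketch below rebuilds the predictions under that assumption. The expression data.frame(temp = 32:51) is a guess at the original definition, and manual is an illustrative name.

```r
# Data as printed in the transcript.
cancer <- data.frame(
  temp = c(51.3, 49.9, 50.0, 49.2, 48.5, 47.8, 47.3, 45.1,
           46.3, 42.1, 44.2, 43.5, 42.3, 40.2, 31.8, 34.0),
  mortality = c(102.5, 104.5, 100.4, 95.9, 87.0, 95.0, 88.6, 89.2,
                78.9, 84.6, 81.7, 72.2, 65.1, 68.1, 67.3, 52.5)
)
regout <- lm(mortality ~ temp, data = cancer)

# Assumption: the new temperatures were 32, 33, ..., 51 degrees F.
# The column must be named temp so predict() can match it to the model formula.
newtemp <- data.frame(temp = 32:51)

# Point predictions from the fitted line.
tempred <- predict(regout, newtemp)

# The same predictions computed directly as intercept + slope * temp.
b <- coef(regout)
manual <- unname(b["(Intercept)"] + b["temp"] * newtemp$temp)

print(round(cbind(temp = newtemp$temp, tempred, manual), 5))
```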

> predict(regout,newtemp,interval="confidence")
        fit      lwr       upr
1  53.65153 43.39628  63.90679
2  56.00923 46.43702  65.58144
3  58.36692 49.46726  67.26659
4  60.72462 52.48444  68.96480
5  63.08231 55.48514  70.67948
6  65.44001 58.46482  72.41520
7  67.79770 61.41731  74.17809
8  70.15540 64.33428  75.97651
9  72.51309 67.20450  77.82169
10 74.87079 70.01313  79.72844
11 77.22848 72.74158  81.71538
12 79.58617 75.36863  83.80371
13 81.94387 77.87413  86.01361
14 84.30156 80.24474  88.35839
15 86.65926 82.47922  90.83929
16 89.01695 84.58893  93.44498
17 91.37465 86.59323  96.15606
18 93.73234 88.51350  98.95118
19 96.09004 90.36898 101.81110
20 98.44773 92.17520 104.72026

> predict(regout,newtemp,interval="prediction")
        fit      lwr       upr
1  53.65153 34.49385  72.80922
2  56.00923 37.20833  74.81013
3  58.36692 39.89937  76.83448
4  60.72462 42.56567  78.88356
5  63.08231 45.20597  80.95866
6  65.44001 47.81900  83.06102
7  67.79770 50.40356  85.19184
8  70.15540 52.95853  87.35226
9  72.51309 55.48288  89.54330
10 74.87079 57.97571  91.76586
11 77.22848 60.43625  94.02071
12 79.58617 62.86390  96.30845
13 81.94387 65.25826  98.62948
14 84.30156 67.61910 100.98403
15 86.65926 69.94641 103.37211
16 89.01695 72.24036 105.79355
17 91.37465 74.50133 108.24796
18 93.73234 76.72990 110.73478
19 96.09004 78.92678 113.25329
20 98.44773 81.09287 115.80260
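The two interval types share the same fitted values but differ in width. A confidence interval covers the mean mortality index at a given temperature and uses only the standard error of the fitted value. A prediction interval covers a single new observation, so it also includes the residual variance, which makes it wider at every temperature. The sketch below rebuilds both by hand, assuming newtemp holds the temperatures 32 to 51 (inferred from the fitted values, not stated in the source). The names pr, tcrit, s, manual_ci and manual_pi are illustrative.

```r
# Data as printed in the transcript.
cancer <- data.frame(
  temp = c(51.3, 49.9, 50.0, 49.2, 48.5, 47.8, 47.3, 45.1,
           46.3, 42.1, 44.2, 43.5, 42.3, 40.2, 31.8, 34.0),
  mortality = c(102.5, 104.5, 100.4, 95.9, 87.0, 95.0, 88.6, 89.2,
                78.9, 84.6, 81.7, 72.2, 65.1, 68.1, 67.3, 52.5)
)
regout <- lm(mortality ~ temp, data = cancer)
newtemp <- data.frame(temp = 32:51)  # assumption: inferred from the fitted values

# Fitted values with their standard errors, residual df and residual scale.
pr <- predict(regout, newtemp, se.fit = TRUE)
tcrit <- qt(0.975, df = pr$df)  # two-sided 95% critical value, 14 df
s <- pr$residual.scale          # residual standard error

# Confidence interval for the mean response: fit +/- t * se(fit).
manual_ci <- cbind(fit = pr$fit,
                   lwr = pr$fit - tcrit * pr$se.fit,
                   upr = pr$fit + tcrit * pr$se.fit)

# Prediction interval for a new observation: adds the residual variance s^2.
se_pred <- sqrt(pr$se.fit^2 + s^2)
manual_pi <- cbind(fit = pr$fit,
                   lwr = pr$fit - tcrit * se_pred,
                   upr = pr$fit + tcrit * se_pred)

print(round(manual_ci, 5))
print(round(manual_pi, 5))
```

Both interval types are narrowest near the mean observed temperature (about 44.6 °F) and widen toward the ends of the range, which is why the intervals at 32 °F are the widest in the tables above.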
