Stat-403/Stat-650 : Intermediate Sampling and Experimental Design ...
approximately stabilize at the true parameter values. Hence, a large value of n translates into a large value of t. This strong dependence of P on the sample size led Good (1982) to suggest that P-values be standardized to a sample size of 100, by replacing P by P√(n/100) (or 0.5, if that is smaller).

Even more arbitrary in a sense than P is the use of a standard cutoff value, usually denoted α. P-values less than or equal to α are deemed significant; those greater than α are nonsignificant. Use of α was advocated by Jerzy Neyman and Egon Pearson, whereas R. A. Fisher recommended presentation of observed P-values instead (Huberty 1993). Use of a fixed α level, say α = 0.05, promotes the seemingly nonsensical distinction between a significant finding if P = 0.049 and a nonsignificant finding if P = 0.051. Such minor differences are illusory anyway, as they derive from tests whose assumptions often are only approximately met (Preece 1990). Fisher objected to the Neyman-Pearson procedure because of its mechanical, automated nature (Mulaik et al. 1997).

Proving the Null Hypothesis

Discourses on hypothesis testing emphasize that null hypotheses cannot be proved; they can only be disproved (rejected). Failing to reject a null hypothesis does not mean that it is true. Especially with small samples, one must be careful not to accept the null hypothesis. Consider a test of the null hypothesis that a mean µ equals µ0. The situations illustrated in Figure 1 both reflect a failure to reject that hypothesis. Figure 1A suggests the null hypothesis may well be false, but the sample was too small to indicate significance; there is a lack of power. Conversely, Figure 1B shows that the data truly were consistent with the null hypothesis.
The two situations should lead to different conclusions about µ, but the P-values associated with the tests are identical.

Taking another look at the two issues of The Journal of Wildlife Management, I noted a number of articles that indicated a null hypothesis was proven. Among these were (1) no difference in slope aspect of random snags (P = 0.112, n = 57), (2) no difference in viable seeds (F2,6 = 3.18, P = 0.11), (3) lamb kill was not correlated to trapper hours (r12 = 0.50, P = 0.095), (4) no effect due to month (P = 0.07, n = 15), and (5) no significant differences in survival distributions (P-values > 0.014!, n variable). I selected the examples to illustrate null hypotheses claimed to be true, despite small sample sizes and P-values that were small but (usually) >0.05. All examples, I believe, reflect the lack of power (Fig. 1A) while claiming a lack of effect (Fig. 1B).

http://www.npwrc.usgs.gov/resource/1999/statsig/stathyp.htm (4 of 7) [12/6/2002 3:10:59 PM]
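The dependence of P on sample size, and Good's proposed standardization, can be sketched numerically. The snippet below is a minimal illustration, not part of the original article: it assumes a simple two-sided z-test of H0: µ = µ0 with a fixed observed effect of 0.1 standard deviations, and applies Good's rule of replacing P by P√(n/100), capped at 0.5. The function names are my own.

```python
import math

def z_test_p(effect, sd, n):
    # Two-sided P-value for a z-test of H0: mu = mu0,
    # given a fixed observed effect (xbar - mu0) in the data.
    z = effect / (sd / math.sqrt(n))
    return math.erfc(abs(z) / math.sqrt(2))  # = 2 * (1 - Phi(|z|))

def good_standardized_p(p, n):
    # Good's (1982) standardization to a sample size of 100:
    # replace P by P * sqrt(n/100), or 0.5 if that is smaller.
    return min(0.5, p * math.sqrt(n / 100))

# The identical small effect (0.1 sd) moves from "nonsignificant"
# to "significant" purely as n grows, while the standardized
# P-value changes far less.
for n in (25, 100, 400, 1600):
    p = z_test_p(0.1, 1.0, n)
    print(n, p, good_standardized_p(p, n))
```

Note that at n = 400 the raw P-value falls below the conventional α = 0.05 even though the observed effect is unchanged, which is exactly the sample-size dependence the text describes.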