P.S.Z.N.: Marine Ecology, 23 (1): 1–9 (2002)
© 2002 Blackwell Verlag, Berlin
ISSN 0173-9565

TOPIC

Accepted: August 26, 2001
Optimum Sample Size to Detect Perturbation Effects: The Importance of Statistical Power Analysis – A Critique

Marco Ortiz
Instituto de Investigaciones Oceanológicas, Facultad de Recursos del Mar, Universidad de Antofagasta, Casilla 170, Antofagasta, Chile.
E-mail: mortiz@uantof.cl

With 3 figures and 1 table

Keywords: Effect size, optimum sample size, precautionary principle, statistical power analysis, Type I (α) and Type II (β) errors, variability.
Abstract. This article describes statistical power analysis as an efficient strategy for estimating the optimum sample size. The principal aim is to criticise constructively, and to enrich, the results presented by Mouillot et al. (1999), who estimated the optimum sample size for evaluating possible perturbations. The authors made no reference to statistical power analysis, even though their objective clearly went beyond a simple stock evaluation to the assessment of management strategies in a particular marine ecosystem. Surprisingly, although they proposed (a priori) an ANOVA design to test a hypothesis spanning both spatial and temporal scales, they did not cover important topics related to power analysis and the precautionary principle, both of which are used in environmental impact assessment programmes for marine ecosystems. Based on their results and on statistical power analysis, it is demonstrated that variability (dispersion statistics), the key factor they used to estimate the sample size, is less relevant than the magnitude of the perturbation (effect size). Greater effort should therefore be devoted to estimating the effect size of a particular phenomenon than to attaining a desired variability.
Problem

In a recent paper, Mouillot et al. (1999) proposed a procedure for estimating the optimal sample size to assess perturbations of the fish fauna in a marine reserve. Contrary to what Mouillot et al. (1999) stated, the estimation of sample size is simple only in those situations where the aim of an investigation is to evaluate, at a single scale of time and space, the abundance (density or biomass) of a stock. Under these circumstances, one goal could be to evaluate the abundance of a particular
biological population, and a quite different goal could be to assess the putative impacts or perturbations on these populations. The abstract and the problem outline of their work indicate that the authors' principal objective was more than a simple stock evaluation; the focus of the study was rather on evaluating optimum management strategies for a particular marine ecosystem.
Even though Mouillot et al. (1999) used an elegant statistical method, one more appropriate for determining population descriptors such as the distribution pattern, their analysis of sample size estimation is unfortunately incomplete. Although the authors planned, a priori, to apply an ANOVA design to test the working hypothesis, they made no reference to statistical power analysis, which offers better opportunities for determining the deleterious effects of putative perturbations on the ecological system under study (Bernstein & Zalinski, 1983; Toft & Shea, 1983; Rotenberry & Wiens, 1985; Gerrodette, 1987; Andrew & Mapstone, 1987; Green, 1989; Peterman, 1990a, 1990b; Peterman & M'Gonigle, 1992; Schlese & Nelson, 1996; Gray, 1996; Ribic & Ganio, 1996; Underwood, 1981, 1991, 1993, 1994, 1996, 1997; Sheppard, 1999).
Had the authors explored power analysis theory, they would have recognised that while variability (dispersion statistics) is an important factor in estimating sample size, the magnitude of the perturbation is even more important. Hence, the questions to be addressed are: How large is the disturbance? How many samples are necessary to evaluate a putative perturbation?
Moreover, the authors did not introduce the precautionary principle into their analysis. This principle states that potentially damaging pollution emissions should be reduced even when there is no scientific evidence of a causal link between emissions and effects (Peterman & M'Gonigle, 1992). Even though this concept refers to a particular type of perturbation (pollution emissions), it can be used more generally to include other sources of perturbation. Environmental assessment programmes therefore deserve a deeper analysis, drawing on more concepts and tools, and should not reduce the issue solely to variability coefficients.
The principal objectives of the current work are to describe the statistical power analysis strategy and to demonstrate, using the results of Mouillot et al. (1999), how its application helps to improve the estimation of the optimum sample size required under a perturbation working hypothesis.
Proposed methodology

1. Statistical power analysis
Statistical power analysis is the most suitable procedure for estimating the optimal sample size (n) starting from a particular working hypothesis. The focus here is on the probability of correctly detecting an effect (e.g., of a perturbation), that is, of rejecting the null hypothesis (H0) of no effect and accepting H1 (Dixon & Massey, 1969; Winer, 1971; Cohen, 1988; Sokal & Rohlf, 1995). The power of a test is the probability of correctly rejecting a false null hypothesis (H0). Power is defined as 1 − β, β being the probability of making a Type II error, i.e., of incorrectly accepting H0. Power can be calculated from standard equations and tables (e.g., Dixon & Massey, 1969; Winer, 1971; Cohen, 1988). It is a function of the Type I error rate (α, the probability of incorrectly rejecting H0), the sample size (n), the sampling design, the sampling variability and the effect size. The effect size is the magnitude of the true effect: the larger it is, the more likely a given design with a given sample size will correctly reject H0 at a stated α level. Additionally, power analysis makes it possible to evaluate the quality of the sampling design and the size of the sampling units, especially when it is necessary to satisfy the assumptions of normality of the data set for parametric tests and of homogeneity of variances for parametric and non-parametric tests (Underwood, 1981, 1997).
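The dependence of power on the α level, the sample size and the effect size can be made concrete numerically. The sketch below is not from the original paper; it computes the power of a one-way ANOVA from the noncentral F distribution, using Cohen's f as the effect-size measure, and all names are illustrative.

```python
from scipy import stats

def anova_power(f_effect, n_per_group, k, alpha=0.05):
    """Power of a one-way ANOVA with k groups and n_per_group
    observations per group, for Cohen's effect size f."""
    df1 = k - 1                          # treatment degrees of freedom
    df2 = k * (n_per_group - 1)          # error degrees of freedom
    nc = f_effect**2 * k * n_per_group   # noncentrality parameter
    f_crit = stats.f.ppf(1 - alpha, df1, df2)
    # Probability that the F statistic exceeds the critical value
    # when the true effect is f_effect:
    return stats.ncf.sf(f_crit, df1, df2, nc)

# A medium effect (f = 0.25) with two groups of 19 samples gives a power
# of about 0.33, i.e., a two-in-three chance of missing a real effect.
print(round(anova_power(0.25, 19, 2), 2))
```

Here a larger f, a larger n or a more liberal α each raise the returned probability, which is exactly the trade-off discussed above.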
There are two types of statistical power analysis: the first is carried out before the start of data collection programmes, experiments or management manipulations (a priori power analysis), the second afterwards, once the work has been done (a posteriori power analysis). A priori analysis is commonly used in fisheries science before starting an experiment or management programme, in order to estimate the sample size necessary to generate acceptably high power. Another application is to plan the magnitude of treatment perturbations (effect sizes) necessary to produce high power. It is also used to determine beforehand how large an effect size would need to be to give acceptable power, given a planned sample size. A posteriori analysis, on the other hand, is relevant only when interpreting the results of a statistical test that has already failed to reject the null hypothesis (for more details see the excellent work of Peterman, 1990a).
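The first kind of a priori analysis (how many samples are needed for acceptable power?) amounts to inverting a power function. The sketch below is illustrative, not the authors' procedure; `a_priori_n` is a hypothetical helper that searches for the smallest per-group sample size reaching a target power.

```python
from scipy import stats

def anova_power(f_effect, n_per_group, k, alpha=0.05):
    """Power of a one-way ANOVA with k groups (noncentral F distribution)."""
    df1, df2 = k - 1, k * (n_per_group - 1)
    nc = f_effect**2 * k * n_per_group
    return stats.ncf.sf(stats.f.ppf(1 - alpha, df1, df2), df1, df2, nc)

def a_priori_n(f_effect, k, target_power=0.80, alpha=0.05):
    """Smallest per-group n whose power reaches target_power."""
    n = 2
    while anova_power(f_effect, n, k, alpha) < target_power:
        n += 1
    return n

# A large effect (f = 0.4, two groups) needs roughly 26 samples per group
# for power 0.80, while a small effect (f = 0.1) needs roughly 400.
```

Swapping which argument is solved for gives the other two a priori uses: fixing n and solving for the detectable effect size, or fixing both and reading off the achievable power.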
2. Data analysis

Based on the results obtained by Mouillot et al. (1999) (their Tables 5 and 6), power curves were calculated for the sample sizes corresponding to variability coefficients of 10 % and 25 %. This was done using the power tables for ANOVA designs (2 to 25 treatments) described by Cohen (1988), at three levels of effect size: small, medium and large (0.1, 0.25 and 0.4, respectively). Note that these levels were defined in a social-sciences context, and 'small', 'medium' and 'large' may not be suitable for biological and ecological studies; in the present analysis they are used only as extreme possible situations. The α rate was fixed at 0.05 and the acceptable power at 0.80 (β = 0.20). These levels of α and β are usually regarded as the minimum acceptable (Peterman, 1990a; Peterman & M'Gonigle, 1992; Schlese & Nelson, 1996; Ribic & Ganio, 1996; Underwood, 1981, 1997), although, under a conservative test of the hypothesis, one should set α = β, or a desired power of 0.95 for α = 0.05 (Peterman, 1990a). Additionally, Mapstone (1995) proposed a procedure for determining the α and β rates based on the magnitude of impacts (effect sizes). Using power analysis it is possible to increase the robustness of any statistical test, meaning that in testing a working hypothesis (e.g., perturbation) the probabilities of Type I and Type II statistical errors are simultaneously decreased (Tiku et al., 1986).
Finally, the results of Mouillot et al. (1999) (their Tables 5 and 6) were separated by habitat and time: (1) seagrass habitat west for June, July and August; (2) rocky bottoms south for October, November and December; and (3) rocky bottoms east for November and December (Table 1). The sample size (n) used to calculate the power curves was the average of those obtained by Mouillot et al. (1999) per habitat for the two coefficients of variation (10 % and 25 %).
Table 1. Power values (probabilities, in %) for an ANOVA design with α = 0.05, for 2 to 25 treatments and for each of the habitats analysed by Mouillot et al. (1999). A small effect size is 0.1, a medium effect 0.25 and a large effect 0.4 (sensu Cohen, 1988).

power (with Type I error α = 0.05)

treatments − 1    seagrass west                rocky bottoms south          rocky bottoms east
(k − 1)           CV 10 %       CV 25 %        CV 10 %       CV 25 %        CV 10 %       CV 25 %
(comparisons)     n = 118       n = 19         n = 112       n = 18         n = 130       n = 21
effect size     0.1 0.25 0.4  0.1 0.25 0.4   0.1 0.25 0.4  0.1 0.25 0.4   0.1 0.25 0.4  0.1 0.25 0.4
 1               34   97  99    9   33  68    29   94  99    8   31  66    36   98  99    9   36  73
 2               38   99  99    9   36  76    32   98  99    9   34  73    41   99  99    9   40  80
 3               43   99  99    9   41  83    36   99  99    9   39  80    46   99  99   10   45  87
 4               47   99  99   10   45  88    40   99  99    9   43  86    50   99  99   10   50  91
 5               52   99  99   10   49  92    44   99  99   10   47  90    55   99  99   11   54  94
 6               56   99  99   11   53  94    47   99  99   10   51  93    60   99  99   11   58  96
 8               63   99  99   11   60  97    54   99  99   11   57  97    67   99  99   12   66  99
10               69   99  99   12   67  99    60   99  99   12   64  98    73   99  99   13   72  99
12               74   99  99   13   72  99    65   99  99   12   69  99    78   99  99   14   77  99
15               81   99  99   14   78  99    71   99  99   13   76  99    83   99  99   15   84  99
24               92   99  99   21   97  99    85   99  99   16   89  99    94   99  99   18   94  99

Note: n = average sample size per habitat and coefficient of variability, from Mouillot et al. (1999).
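The entries of Table 1 can be checked against a direct noncentral-F computation. The sketch below rests on two assumptions not stated explicitly above, namely that n is a per-group sample size and that the effect sizes are Cohen's f; under them it reproduces, up to rounding, the first row for the seagrass west habitat at 25 % variability.

```python
from scipy import stats

def anova_power(f_effect, n_per_group, k, alpha=0.05):
    """One-way ANOVA power from the noncentral F distribution."""
    df1, df2 = k - 1, k * (n_per_group - 1)
    nc = f_effect**2 * k * n_per_group   # noncentrality parameter
    return stats.ncf.sf(stats.f.ppf(1 - alpha, df1, df2), df1, df2, nc)

# k - 1 = 1 comparison, n = 19 per group, effect sizes 0.1 / 0.25 / 0.4;
# Table 1 reports powers of 9, 33 and 68 (in %) for this row.
row = [round(100 * anova_power(f, 19, 2)) for f in (0.1, 0.25, 0.4)]
print(row)
```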
Fig. 1. Power curves for the ANOVA design for the seagrass west habitat with (a) 10 % (n = 118) and (b) 25 % (n = 19) variability (dispersion coefficient). The dotted line represents the minimum acceptable power of 0.80.
Re-evaluation of data

Figures 1, 2 and 3 show that for the seagrass west, rocky bottoms south and rocky bottoms east habitats, respectively, and independent of the coefficient of variability, the magnitude of a possible perturbation (effect size) must always be large, that is ≥ 0.40, to ensure a highly robust statistical test (ANOVA) for assessing the hypothesis. In a situation where the putative perturbation is small (effect size = 0.10), no sample size proposed by Mouillot et al. (1999) is sufficient. Additionally, if the effect size is < 0.10, more than 1000 samples must be taken for an ANOVA design.
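This last claim can be verified numerically. Under the same illustrative assumptions as before (balanced one-way ANOVA, Cohen's f as effect size, n per group), an effect size of 0.05 already pushes the required sample size well past 1000 per group:

```python
from scipy import stats

def anova_power(f_effect, n_per_group, k, alpha=0.05):
    """One-way ANOVA power from the noncentral F distribution."""
    df1, df2 = k - 1, k * (n_per_group - 1)
    nc = f_effect**2 * k * n_per_group
    return stats.ncf.sf(stats.f.ppf(1 - alpha, df1, df2), df1, df2, nc)

# Smallest per-group n reaching power 0.80 for f = 0.05 and two groups:
n = 2
while anova_power(0.05, n, 2) < 0.80:
    n += 1
print(n)  # well over 1000 samples per group
```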
Fig. 2. Power curves for the ANOVA design for the rocky shores south habitat with (a) 10 % (n = 112) and (b) 25 % (n = 18) variability (dispersion coefficient). The dotted line represents the minimum acceptable power of 0.80.
Based on these considerations, it is demonstrated that the procedure for estimating the sample size is more complex than that proposed by Mouillot et al. (1999). Definitively, estimating the magnitude of the perturbation, i.e. the effect size, becomes essential not only to increase the robustness of the statistical test, but also to optimise the costs of the sampling programme. For instance, in a situation with a large effect size (e.g., 0.40), a maximum variability of 25 % would suffice, significantly reducing the costs of sampling and the study time. Finally, once effect sizes are roughly estimated, based on previous sampling or a detailed review of the scientific literature (for comparable situations), a priori power analysis can be implemented to estimate the optimum sample size (Cohen, 1988; Peterman, 1990a; Underwood, 1997).
Fig. 3. Power curves for the ANOVA design for the rocky shores east habitat with (a) 10 % (n = 130) and (b) 25 % (n = 21) variability (dispersion coefficient). The dotted line represents the minimum acceptable power of 0.80.
Thus, an incorrect sampling programme for assessing a potential impact could have negative consequences for marine natural systems, especially when the null hypothesis (e.g., no perturbation) was not rejected and the a posteriori statistical power, which ultimately determines the quality of our conclusions, was not calculated. In those cases where the null hypothesis was not rejected and power was low, it is strongly recommended to use the precautionary principle as an important decision-making tool for improving management policies.
Conclusions

Decreasing the sampling variability (coefficient of variability) is definitively an excellent and useful strategy when the magnitude of the effect size is small and, therefore, a large sample size is required. However, this procedure must be applied once the size of the perturbation is known, not before. Any sampling programme design that could have deleterious consequences for natural systems should therefore be avoided, especially when the acceptance of the null hypothesis was incorrect. Such a situation could support misguided management plans, as described extensively by Peterman (1990a). In conclusion, the magnitude of the perturbation (effect size) is the most relevant information for any hypothesis test, and more effort must be focused towards its estimation (Rotenberry & Wiens, 1985).
Acknowledgements

I would like to thank Prof. Dr. M. Wolff, M.Sc. C. Jimenez, Dr. S. Jesse and the anonymous reviewers for criticising and improving the manuscript.
References

Andrew, N. & B. Mapstone, 1987: Sampling and the description of spatial pattern in marine ecology. Oceanogr. Mar. Biol. Annu. Rev., 25: 39–90.
Bernstein, B. & J. Zalinski, 1983: An optimum sampling design and power tests for environmental biologists. J. Environ. Manage., 16: 35–43.
Cohen, J., 1988: Statistical power analysis for the behavioral sciences. 2nd edition. L. Erlbaum Associates, Hillsdale, N.J.; 567 pp.
Dixon, W. & F. Massey, 1969: Introduction to statistical analysis. 3rd edition. McGraw-Hill Book Co., N.Y.; 638 pp.
Gerrodette, T., 1987: A power analysis for detecting trends. Ecology, 68(5): 1364–1372.
Gray, J., 1996: Environmental science and a precautionary approach revisited. Mar. Pollut. Bull., 32(7): 532–534.
Green, R., 1989: Power analysis and practical strategies for environmental monitoring. Environ. Res., 50: 195–205.
Mapstone, B., 1995: Scalable decision rules for environmental impact studies: Effect size, Type I and Type II errors. Ecol. Appl., 5(2): 401–410.
Mouillot, D., J.-M. Culioli, A. Leprete & J.-A. Tomasini, 1999: Dispersion statistics for three fish species (Symphodus ocellatus, Serranus scriba and Diplodus annularis) in the Lavezzi Islands Marine Reserve (South Corsica, Mediterranean Sea). P.S.Z.N.: Marine Ecology, 20(1): 19–34.
Peterman, R., 1990a: Statistical power analysis can improve fisheries research and management. Can. J. Fish. Aquat. Sci., 47: 1–15.
Peterman, R., 1990b: The importance of reporting statistical power: the forest decline and acidic deposition example. Ecology, 71(5): 2024–2027.
Peterman, R. & M. M'Gonigle, 1992: Statistical power analysis and the precautionary principle. Mar. Pollut. Bull., 24(5): 231–234.
Ribic, Ch. & L. Ganio, 1996: Power analysis for beach surveys of marine debris. Mar. Pollut. Bull., 32(7): 554–557.
Rotenberry, J. & J. Wiens, 1985: Statistical power analysis and community-wide patterns. Am. Nat., 125: 164–168.
Schlese, W. & W. Nelson, 1996: A power analysis of methods for assessment of change in seagrass cover. Aquat. Bot., 53: 227–233.
Sheppard, Ch., 1999: How large should my sample be? Some quick guides to sample size and the power of tests. Mar. Pollut. Bull., 38(6): 439–447.
Sokal, R. & F. Rohlf, 1995: Biometry. 3rd edition. W.H. Freeman and Co., San Francisco; 878 pp.
Tiku, M., W. Tan & N. Balakrishnan, 1986: Robust Inference. Marcel Dekker, Inc., N.Y.; 321 pp.
Toft, K. & P. Shea, 1983: Detecting community-wide patterns: Estimating power strengthens statistical inference. Am. Nat., 122(5): 618–625.
Underwood, A., 1981: Techniques of analysis of variance in experimental marine biology and ecology. Oceanogr. Mar. Biol. Annu. Rev., 19: 513–605.
Underwood, A., 1991: Beyond BACI: experimental designs for detecting human environmental impacts on temporal variations in natural populations. Aust. J. Mar. Freshwater Res., 42: 569–587.
Underwood, A., 1993: The mechanics of spatially replicated sampling programmes to detect environmental impacts in a variable world. Aust. J. Ecol., 18: 99–117.
Underwood, A., 1994: On beyond BACI: Sampling designs that might reliably detect environmental disturbances. Ecol. Appl., 4(1): 3–15.
Underwood, A., 1996: Detection, interpretation, prediction and management of environmental disturbances: some roles for experimental marine ecology. J. Exp. Mar. Biol. Ecol., 200: 1–27.
Underwood, A., 1997: Experiments in ecology: Their logical design and interpretation using analysis of variance. Cambridge University Press, Cambridge; 504 pp.
Winer, B., 1971: Statistical principles in experimental design. McGraw-Hill, N.Y.; 907 pp.