
P.S.Z.N.: Marine Ecology, 23 (1): 1–9 (2002)
© 2002 Blackwell Verlag, Berlin
ISSN 0173-9565

TOPIC
Accepted: August 26, 2001

Optimum Sample Size to Detect Perturbation Effects: The Importance of Statistical Power Analysis – A Critique

Marco Ortiz

Instituto de Investigaciones Oceanológicas, Facultad de Recursos del Mar, Universidad de Antofagasta, Casilla 170, Antofagasta, Chile.
E-mail: mortiz@uantof.cl

With 3 figures and 1 table

Keywords: Effect size, optimum sample size, precautionary principle, statistical power analysis, Type I (α) and Type II (β) errors, variability.

Abstract. The current article describes statistical power analysis as an efficient strategy for estimating the optimum sample size. The principal aim is to constructively criticise and enrich the results presented by Mouillot et al. (1999), who estimated the optimum sample size for evaluating possible perturbations. The authors made no reference to statistical power analysis, even though their objective clearly went beyond a simple stock evaluation to assessing management strategies in a particular marine ecosystem. Surprisingly, they proposed (a priori) an ANOVA design to test a hypothesis covering both spatial and temporal scales, yet did not address important topics related to power analysis and the precautionary principle, both of which are used in environmental impact assessment programmes for marine ecosystems. Based on their results and on statistical power analysis, it is demonstrated that the variability (dispersion statistics), the key factor they used to estimate the sample size, is less relevant than the magnitude of the perturbation (effect size). Therefore, greater effort should be devoted to estimating the effect size of a particular phenomenon than to achieving a desired variability.

Problem<br />

In a recent paper, Mouillot et al. (1999) proposed a procedure to estimate the optimal sample size for assessing perturbations of the fish fauna in a marine reserve. Contrary to what Mouillot et al. (1999) stated, the estimation of sample size is simple only in situations where the aim of an investigation is to evaluate, at a single scale of time and space, the abundance (density or biomass) of a stock. Under these circumstances, one goal could be to evaluate the abundance of a particular biological population; a quite different goal could be to assess the putative impacts or perturbations on these populations. The abstract and problem outline of their work indicate that the authors' principal objective was more than a simple stock evaluation: the focus of the study was rather on evaluating optimum management strategies for a particular marine ecosystem.

Even though Mouillot et al. (1999) used an elegant statistical method, it is one more appropriate for determining population descriptors such as the distribution pattern, and their analysis is unfortunately incomplete with regard to sample size estimation. Although the authors planned, a priori, to apply an ANOVA design to test the working hypothesis, they made no reference to statistical power analysis, which offers better opportunities for detecting the deleterious effects of putative perturbations on the ecological system under study (Bernstein & Zalinski, 1983; Toft & Shea, 1983; Rotenberry & Wiens, 1985; Gerrodette, 1987; Andrew & Mapstone, 1987; Green, 1989; Peterman, 1990a, 1990b; Peterman & M'Gonigle, 1992; Schlese & Nelson, 1996; Gray, 1996; Ribic & Ganio, 1996; Underwood, 1981, 1991, 1993, 1994, 1996, 1997; Sheppard, 1999).

Had the authors explored power analysis theory, they would have recognised that while variability (dispersion statistics) is an important factor in estimating sample size, the magnitude of the perturbation is even more important. Hence, the questions to be addressed are: How large is the disturbance? How many samples are necessary to evaluate a putative perturbation?

Moreover, the authors did not introduce the precautionary principle into their analysis. This principle states that potentially damaging pollution emissions should be reduced even if there is no scientific evidence to prove a causal link between emissions and effects (Peterman & M'Gonigle, 1992). Even though this concept refers to a particular type of perturbation (pollution emissions), it can be applied more generally to include other sources of perturbation. Therefore, environmental assessment programmes deserve a deeper analysis including more concepts and tools, and should not reduce the issue solely to variability coefficients.

The principal objectives of the current work are to describe the statistical power analysis strategy and to demonstrate, using the results of Mouillot et al. (1999), how its application helps to improve the estimation of the optimum sample size required under a perturbation working hypothesis.

Proposed methodology

1. Statistical power analysis<br />

Statistical power analysis constitutes the most suitable procedure for estimating the optimal sample size (n) starting from a particular working hypothesis. The focus here is on the probability of correctly detecting an effect (e.g., of a perturbation), that is, of rejecting the null hypothesis (H0) of no effect and accepting H1 (Dixon & Massey, 1969; Winer, 1971; Cohen, 1988; Sokal & Rohlf, 1995). The power of a test is the probability of correctly rejecting a false null hypothesis (H0). Power is defined as 1 − β, β being the probability of making a Type II error, i.e., incorrectly accepting H0. Power can be calculated from standard equations and tables (e.g., Dixon & Massey, 1969; Winer, 1971; Cohen, 1988). It is a function of the Type I error rate (α, the probability of incorrectly rejecting H0), the sample size (n), the sampling design, the sampling variability and the effect size. The effect size is the magnitude of the true effect: the larger it is, the more likely a given design with a given sample size will correctly reject H0 at a stated α level. Additionally, it is possible to evaluate the quality of the sampling design and the size of the sampling units, especially when it is necessary to satisfy the assumptions of normality of a data set for parametric tests and homogeneity of variances for parametric and non-parametric tests (Underwood, 1981, 1997).
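For a fixed one-way ANOVA, these quantities are tied together by the noncentral F distribution. The following sketch is illustrative only (the function name is ours, and reading n as a per-treatment sample size is an assumption); it computes power from Cohen's effect size f:

```python
from scipy.stats import f as f_dist, ncf

def anova_power(effect_f, n_per_group, k_groups, alpha=0.05):
    """Power of a fixed one-way ANOVA for Cohen's effect size f."""
    n_total = n_per_group * k_groups
    df1 = k_groups - 1                        # treatment degrees of freedom
    df2 = n_total - k_groups                  # error degrees of freedom
    lam = effect_f ** 2 * n_total             # noncentrality parameter f^2 * N
    f_crit = f_dist.ppf(1 - alpha, df1, df2)  # critical F value under H0
    # Power = P(F exceeds the critical value | H1), from the noncentral F
    return 1 - ncf.cdf(f_crit, df1, df2, lam)
```

For example, with 19 observations per treatment and two treatments, a large effect (f = 0.4) yields a power of roughly 0.68, whereas a small effect (f = 0.1) leaves power barely above the α level.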

There are two types of statistical power analysis: the first is carried out before the start of data collection programs, experiments or management manipulations (a priori power analysis), the second afterwards, once the work has been done (a posteriori power analysis). A priori analysis is commonly used in fisheries science before starting an experiment or management program in order to estimate the sample size necessary to generate acceptably high power. Another application is to plan the magnitude of treatment perturbations (effect sizes) necessary to produce high power. It is also used to determine beforehand how large an effect size would need to be in order to give an acceptable power, given a planned sample size. On the other hand, a posteriori analysis is relevant only when interpreting the results of a statistical test that has already failed to reject the null hypothesis (for more details see the excellent work of Peterman, 1990a).
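A minimal sketch of the a priori case, assuming the statsmodels package is available: its FTestAnovaPower solver inverts the power function to give the total number of observations needed for a target power (the effect sizes below are Cohen's conventional levels, not values taken from the paper):

```python
from statsmodels.stats.power import FTestAnovaPower

solver = FTestAnovaPower()

# Total observations needed for power 0.80 at alpha = 0.05 with two
# treatments, for Cohen's small, medium and large effect sizes.
for effect_f in (0.1, 0.25, 0.4):
    n_total = solver.solve_power(effect_size=effect_f, alpha=0.05,
                                 power=0.80, k_groups=2)
    print(f"f = {effect_f}: about {n_total:.0f} observations in total")
```

Because the noncentrality parameter scales as f²·N, the required total n grows with the inverse square of the effect size: a small effect (f = 0.1) demands roughly sixteen times as many samples as a large one (f = 0.4).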

2. Data analysis<br />

Based on the results obtained by Mouillot et al. (1999) (from their Tables 5 and 6), power curves were estimated for the numbers of samples corresponding to variability coefficients of 10 % and 25 %. This was carried out using the power tables for ANOVA designs (from 2 to 25 treatments) given by Cohen (1988) at three levels of effect size: small, medium and large (0.1, 0.25 and 0.4, respectively). Note that these levels of effect size were defined in a social sciences context, and 'small', 'medium' and 'large' may not be suitable for biological and ecological studies; in the present analysis they are used only as extreme possible situations. The α rate was fixed at 0.05 and the acceptable power at 0.80 (β = 0.20). These levels of α and β are usually regarded as the minimum acceptable (Peterman, 1990a; Peterman & M'Gonigle, 1992; Schlese & Nelson, 1996; Ribic & Ganio, 1996; Underwood, 1981, 1997), although, under a conservative test of the hypothesis, they should be set so that α = β, or at a desired power of 0.95 for α = 0.05 (Peterman, 1990a). Additionally, Mapstone (1995) proposed a procedure for determining the α and β rates based on the magnitude of impacts (effect sizes). Using power analysis it is possible to increase the robustness of any statistical test, in the sense that, in testing a working hypothesis (e.g., a perturbation), the probabilities of Type I and Type II statistical errors are decreased simultaneously (Tiku et al., 1986).
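As an aside, such power values no longer need to be read from printed tables; a sketch like the one below (an illustration assuming statsmodels is available and reading n = 19 as a per-treatment sample size, as for the seagrass west habitat at 25 % variability) regenerates them for a few treatment numbers:

```python
from statsmodels.stats.power import FTestAnovaPower

analysis = FTestAnovaPower()

# Power at the three Cohen effect sizes for k treatments, with 19
# observations per treatment (nobs is the total across all groups).
for k in (2, 3, 6, 11, 25):
    row = [analysis.power(effect_size=es, nobs=19 * k, alpha=0.05, k_groups=k)
           for es in (0.1, 0.25, 0.4)]
    print(k - 1, "comparisons:", [round(p, 2) for p in row])
```

With two treatments and a large effect, the computed power comes out close to the 68 % reported for this habitat, while a small effect stays near the α level.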

Finally, the results of Mouillot et al. (1999) (see their Tables 5 and 6) were separated by habitat and time: (1) seagrass habitat west for June, July and August, (2) rocky bottoms south for October, November and December, and (3) rocky bottoms east for November and December (Table 1). The sample size (n) used to calculate the power curves was the average of those obtained by Mouillot et al. (1999) per habitat for the two coefficients of variation (10 % and 25 %).


Table 1. Power values (probabilities, in %) for an ANOVA design with α = 0.05, for 2 to 25 treatments and for each of the habitats analysed by Mouillot et al. (1999). A small effect size is 0.1, a medium effect is 0.25 and a large effect is 0.4 (sensu Cohen, 1988). Within each column group, the three sub-columns give power for effect sizes 0.1, 0.25 and 0.4. n = average sample size and coefficient of variability per habitat from Mouillot et al. (1999).

power (with Type I error α = 0.05)

                seagrass west            rocky bottoms south      rocky bottoms east
treatments − 1  CV 10 %     CV 25 %     CV 10 %     CV 25 %     CV 10 %     CV 25 %
(k − 1)         n = 118     n = 19      n = 112     n = 18      n = 130     n = 21
 1              34 97 99     9 33 68    29 94 99     8 31 66    36 98 99     9 36 73
 2              38 99 99     9 36 76    32 98 99     9 34 73    41 99 99     9 40 80
 3              43 99 99     9 41 83    36 99 99     9 39 80    46 99 99    10 45 87
 4              47 99 99    10 45 88    40 99 99     9 43 86    50 99 99    10 50 91
 5              52 99 99    10 49 92    44 99 99    10 47 90    55 99 99    11 54 94
 6              56 99 99    11 53 94    47 99 99    10 51 93    60 99 99    11 58 96
 8              63 99 99    11 60 97    54 99 99    11 57 97    67 99 99    12 66 99
10              69 99 99    12 67 99    60 99 99    12 64 98    73 99 99    13 72 99
12              74 99 99    13 72 99    65 99 99    12 69 99    78 99 99    14 77 99
15              81 99 99    14 78 99    71 99 99    13 76 99    83 99 99    15 84 99
24              92 99 99    21 97 99    85 99 99    16 89 99    94 99 99    18 94 99


Fig. 1. Power curves for an ANOVA design for the seagrass west habitat with (a) 10 % (n = 118) and (b) 25 % (n = 19) variability (dispersion coefficient). The dotted line represents the minimum acceptable power of 0.80.

Re-evaluation of data<br />

Figures 1, 2 and 3 show that for the seagrass west, rocky shores south and rocky shores east habitats, respectively, independent of the coefficient of variability, the magnitude of a possible perturbation (effect size) must always be large, that is ≥ 0.40, to ensure high robustness of the statistical test (ANOVA) in assessing the hypothesis. In a situation where the putative perturbation is small (effect size = 0.10), no sample size proposed by Mouillot et al. (1999) is sufficient. Additionally, if the effect size is < 0.10, more than 1000 samples must be taken for an ANOVA design.


Fig. 2. Power curves for an ANOVA design for the rocky shores south habitat with (a) 10 % (n = 112) and (b) 25 % (n = 18) variability (dispersion coefficient). The dotted line represents the minimum acceptable power of 0.80.

Based on these considerations, it is demonstrated that the procedure for estimating sample size is more complex than that proposed by Mouillot et al. (1999). Estimating the magnitude of the perturbation, i.e., the effect size, becomes essential not only to increase the robustness of the statistical test, but also to optimise the costs of the sampling program. For instance, in a situation with a large effect size (e.g., 0.40), a variability of up to 25 % would suffice, which would significantly reduce the costs of sampling and the study time. Finally, once effect size rates are roughly estimated, based on previous sampling or a detailed review of the scientific literature (for comparable situations), a priori power analysis can be implemented to estimate the optimum sample size (Cohen, 1988; Peterman, 1990a; Underwood, 1997).


Fig. 3. Power curves for an ANOVA design for the rocky shores east habitat with (a) 10 % (n = 130) and (b) 25 % (n = 21) variability (dispersion coefficient). The dotted line represents the minimum acceptable power of 0.80.

Thus, an incorrect sampling program for assessing a potential impact could have negative consequences for marine natural systems, especially when the null hypothesis (e.g., of no perturbation) was not rejected and the a posteriori statistical power, which ultimately determines the quality of our conclusions, was not calculated. In those cases where the null hypothesis was not rejected and power was low, it is strongly recommended to use the precautionary principle as an important decision-making tool for improving management policies.



Conclusions<br />

Decreasing the sampling variability (coefficient of variability) is certainly an excellent and useful strategy when the magnitude of the effect size is small and, therefore, a large sample size is required. However, this procedure should be applied once the size of the perturbation is known, not before. Any sampling program design that could have deleterious consequences for natural systems should therefore be avoided, especially when the acceptance of the null hypothesis was incorrect. Such a situation could support misguided management plans, as described extensively by Peterman (1990a). In conclusion, the magnitude of the perturbation (effect size) is the most relevant information for any hypothesis test, and more effort must be focused on its estimation (Rotenberry & Wiens, 1985).

Acknowledgements<br />

I would like to thank Prof. Dr. M. Wolff, M.Sc. C. Jimenez, Dr. S. Jesse and the anonymous reviewers for criticising and improving the manuscript.

References<br />

Andrew, N. & B. Mapstone, 1987: Sampling and the description of spatial pattern in marine ecology. Oceanogr. Mar. Biol. Annu. Rev., 25: 39–90.
Bernstein, B. & J. Zalinski, 1983: An optimum sampling design and power tests for environmental biologists. J. Environ. Manage., 16: 35–43.
Cohen, J., 1988: Statistical power analysis for the behavioral sciences. 2nd edition. L. Erlbaum Associates, Hillsdale, N.J.; 567 pp.
Dixon, W. & F. Massey, 1969: Introduction to statistical analysis. 3rd edition. McGraw-Hill Book Co., N.Y.; 638 pp.
Gerrodette, T., 1987: A power analysis for detecting trends. Ecology, 68(5): 1364–1372.
Gray, J., 1996: Environmental science and a precautionary approach revisited. Mar. Pollut. Bull., 32(7): 532–534.
Green, R., 1989: Power analysis and practical strategies for environmental monitoring. Environ. Res., 50: 195–205.
Mapstone, B., 1995: Scalable decision rules for environmental impact studies: Effect size, Type I and Type II errors. Ecol. Appl., 5(2): 401–410.
Mouillot, D., J.-M. Culioli, A. Leprete & J.-A. Tomasini, 1999: Dispersion statistics for three fish species (Symphodus ocellatus, Serranus scriba and Diplodus annularis) in the Lavezzi Islands Marine Reserve (South Corsica, Mediterranean Sea). P.S.Z.N.: Marine Ecology, 20(1): 19–34.
Peterman, R., 1990a: Statistical power analysis can improve fisheries research and management. Can. J. Fish. Aquat. Sci., 47: 1–15.
Peterman, R., 1990b: The importance of reporting statistical power: the forest decline and acidic deposition example. Ecology, 71(5): 2024–2027.
Peterman, R. & M. M'Gonigle, 1992: Statistical power analysis and the precautionary principle. Mar. Pollut. Bull., 24(5): 231–234.
Ribic, Ch. & L. Ganio, 1996: Power analysis for beach surveys of marine debris. Mar. Pollut. Bull., 32(7): 554–557.
Rotenberry, J. & J. Wiens, 1985: Statistical power analysis and community-wide patterns. Am. Nat., 125: 164–168.
Schlese, W. & W. Nelson, 1996: A power analysis of methods for assessment of change in seagrass cover. Aquat. Bot., 53: 227–233.
Sheppard, Ch., 1999: How large should my sample be? Some quick guides to sample size and the power of tests. Mar. Pollut. Bull., 38(6): 439–447.
Sokal, R. & F. Rohlf, 1995: Biometry. 3rd edition. W.H. Freeman and Co., San Francisco; 878 pp.
Tiku, M., W. Tan & N. Balakrishnan, 1986: Robust Inference. Marcel Dekker, Inc., N.Y.; 321 pp.
Toft, K. & P. Shea, 1983: Detecting community-wide patterns: Estimating power strengthens statistical inference. Am. Nat., 122(5): 618–625.
Underwood, A., 1981: Techniques of analysis of variance in experimental marine biology and ecology. Oceanogr. Mar. Biol. Annu. Rev., 19: 513–605.
Underwood, A., 1991: Beyond BACI: experimental designs for detecting human environmental impacts on temporal variations in natural populations. Aust. J. Mar. Freshwater Res., 42: 569–587.
Underwood, A., 1993: The mechanics of spatially replicated sampling programmes to detect environmental impacts in a variable world. Aust. J. Ecol., 18: 99–117.
Underwood, A., 1994: On beyond BACI: Sampling designs that might reliably detect environmental disturbances. Ecol. Appl., 4(1): 3–15.
Underwood, A., 1996: Detection, interpretation, prediction and management of environmental disturbances: some roles for experimental marine ecology. J. Exp. Mar. Biol. Ecol., 200: 1–27.
Underwood, A., 1997: Experiments in ecology: Their logical design and interpretation using analysis of variance. Cambridge University Press, Cambridge; 504 pp.
Winer, B., 1971: Statistical principles in experimental design. McGraw-Hill, N.Y.; 907 pp.
