Behavioural Surveillance Surveys - The Wisdom of Whores


wisdomofwhores.com
14.03.2015 Views

Figure 7: Calculation of selection probabilities, sampling weights, and standardized sampling weights (hypothetical data)

In this example, calculations of standardized weights are shown for the first five of a sample of clusters chosen in a hypothetical survey. Let:

n_i = the number of sample elements chosen in cluster i,
P_i = overall probability of selection for sample elements in cluster i,
w_i = sampling weight for sample elements in cluster i, and
w_i' = standardized sampling weight for sample elements in cluster i.

Cluster No.   n_i    P_i     w_i      w_i n_i      w_i'
1             20     .033    30.30    606.06       .0502
2             11     .022    45.45    499.95       .0414
3             6      .030    33.33    199.98       .0166
4             13     .043    23.26    302.28       .0250
5             12     .023    43.48    521.76       .0432
. . .
Total         300                     12,073.02

Now suppose that you do your survey, and you randomly select 10 of the 30 brothels to be included in your sample. Measures of size are not available at the planning stage, so you select the brothels with equal probability. The small brothels therefore have the same chance of being selected as the larger brothels. In all likelihood (based on probabilities), you would select around 3-4 of the large brothels, and 6-7 of the small brothels.

If you sub-sampled a fixed quota of 20 women from each brothel (for a total sample of 200), you would end up with approximately 130 women from the smaller brothels, and only 70 women from the larger brothels. Since the women from the smaller brothels would be much less likely to use condoms than the women from the larger brothels, you would underestimate consistent condom use for the total sample. This is essentially a weighting problem, because the women selected from the large brothels should account for 75% of the sample.

By correctly keeping track of sampling probabilities for each cluster, weighting factors can easily be handled during analysis. But this can only happen if the proper information is recorded during the data collection process.
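To give a sense of how large this bias could be, the brothel example can be worked through in a short Python sketch. The split of 10 large and 20 small brothels is an assumption inferred from the expected counts quoted above, and the two condom use rates are purely hypothetical values chosen for illustration; neither appears in this section.

```python
# Worked version of the brothel example. Values marked "assumed" or
# "hypothetical" are NOT given in the text and are for illustration only.

N_BROTHELS = 30        # from the text
N_SELECTED = 10        # brothels drawn with equal probability (from the text)
QUOTA = 20             # women sub-sampled per selected brothel (from the text)
SHARE_LARGE = 0.75     # share of all women working in large brothels (from the text)

N_LARGE = 10           # assumed: inferred from "3-4 large, 6-7 small" out of 10 selected
USE_LARGE = 0.80       # hypothetical consistent condom use in large brothels
USE_SMALL = 0.40       # hypothetical consistent condom use in small brothels


def expected_sample_split():
    """Expected number of sampled women from large and from small brothels."""
    exp_large_brothels = N_SELECTED * N_LARGE / N_BROTHELS
    exp_small_brothels = N_SELECTED - exp_large_brothels
    return exp_large_brothels * QUOTA, exp_small_brothels * QUOTA


def unweighted_estimate(women_large, women_small):
    """Pooled estimate that treats every sampled woman equally."""
    users = women_large * USE_LARGE + women_small * USE_SMALL
    return users / (women_large + women_small)


def population_value():
    """Value the survey is trying to estimate, given the 75% / 25% split."""
    return SHARE_LARGE * USE_LARGE + (1 - SHARE_LARGE) * USE_SMALL


women_large, women_small = expected_sample_split()
naive = unweighted_estimate(women_large, women_small)
target = population_value()

print(f"Expected women sampled: large={women_large:.0f}, small={women_small:.0f}")
print(f"Unweighted estimate: {naive:.3f}")
print(f"Population value:    {target:.3f}")
```

Under these assumed rates the unweighted sample gives roughly 53% consistent condom use against a population value of 70%. The size of the gap depends entirely on the hypothetical rates, but its direction matches the underestimate described above.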
Sample forms specifying what type of data to record during data collection are provided in Appendix 4.

64 CHAPTER 5 BEHAVIORAL SURVEILLANCE SURVEYS
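The weight calculation in Figure 7 can be reproduced with a few lines of Python. This is a minimal sketch: the formulas w_i = 1 / P_i and w_i' = w_i n_i / (sum of w_i n_i over all sampled clusters) are inferred from the numbers in the figure rather than quoted from it, and the function name is hypothetical. The total of 12,073.02 is taken from the figure because it covers all sampled clusters, not only the five shown.

```python
# First five clusters from Figure 7: (cluster number, n_i, P_i).
CLUSTERS = [
    (1, 20, 0.033),
    (2, 11, 0.022),
    (3, 6, 0.030),
    (4, 13, 0.043),
    (5, 12, 0.023),
]

# Sum of w_i * n_i over ALL sampled clusters, as reported in Figure 7.
TOTAL_WEIGHTED_N = 12073.02


def standardized_weights(clusters, total_weighted_n):
    """Return one row per cluster: (no, n_i, P_i, w_i, w_i * n_i, w_i')."""
    rows = []
    for cluster_no, n_i, p_i in clusters:
        w_i = 1.0 / p_i                        # sampling weight (inferred formula)
        weighted_n = w_i * n_i                 # weighted number of sample elements
        w_std = weighted_n / total_weighted_n  # standardized weight (inferred formula)
        rows.append((cluster_no, n_i, p_i, w_i, weighted_n, w_std))
    return rows


rows = standardized_weights(CLUSTERS, TOTAL_WEIGHTED_N)
for cluster_no, n_i, p_i, w_i, weighted_n, w_std in rows:
    print(f"{cluster_no:>2} {n_i:>3} {p_i:.3f} {w_i:7.2f} {weighted_n:8.2f} {w_std:.4f}")
```

The standardized weights agree with the figure to four decimal places. Some of the intermediate w_i n_i products differ from the figure in the second decimal, apparently because the figure rounds w_i before multiplying.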

Calculating standard errors with multi-stage cluster designs

In order to test the statistical significance of observed changes or trends, it is necessary to have estimates of the magnitude of sampling error associated with the survey estimates, commonly referred to as standard errors. The estimation of sampling error depends upon the sample design used in collecting the data. As sample designs become more complex (e.g., when stratification, cluster sampling, and multiple stages of sample selection are used), the procedures for estimating standard errors become quite complicated. The estimation of standard errors for such designs is beyond the scope of this Guide, and consultation with a statistician is recommended.

Unfortunately, standard statistical software packages such as SPSS and EPI-INFO do not provide an adequate solution to this problem. While both packages will estimate the standard errors of observed changes on indicators and perform appropriate statistical tests, the standard errors they produce assume that simple random sampling was used in gathering the survey data. Since it is highly probable that cluster sampling will be employed in BSS, the estimated standard errors produced by these packages will be incorrect. Because these standard errors will usually be under-estimated, there is the danger that observed changes in indicators will be believed to be statistically significant when in fact they are not.

To avoid this problem, and perform careful data analysis, software that can take the cluster sampling design into account should be used. STATA and SUDAAN are two such packages. These programs do not assume that simple random sampling was used. They make use of the cluster information to calculate design effects, and adjust standard errors before conducting statistical tests, thereby avoiding the problem of incorrect conclusions.
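The gap between the two kinds of standard error can be illustrated with a small Python sketch. It is not how STATA or SUDAAN work internally; it uses a common textbook approximation (a between-cluster, Taylor-linearized variance for a proportion) and assumes an equally weighted sample with no stratification. The cluster data and function names are hypothetical.

```python
import math

# Hypothetical data: 10 clusters of 20 respondents each, with the number
# reporting the behaviour of interest varying strongly between clusters.
CLUSTER_SIZES = [20] * 10
CLUSTER_SUCCESSES = [18, 4, 15, 6, 17, 3, 16, 5, 14, 2]


def se_simple_random(successes, sizes):
    """Standard error of a proportion if simple random sampling is assumed."""
    n = sum(sizes)
    p = sum(successes) / n
    return math.sqrt(p * (1 - p) / n)


def se_clustered(successes, sizes):
    """Standard error based on between-cluster variation (linearization).

    Assumes equal weights and treats the clusters as the sampling units.
    """
    m = len(sizes)
    n = sum(sizes)
    p = sum(successes) / n
    # Squared deviation of each cluster total from what p would predict.
    ss = sum((y - p * k) ** 2 for y, k in zip(successes, sizes))
    return math.sqrt(m / (m - 1) * ss) / n


se_srs = se_simple_random(CLUSTER_SUCCESSES, CLUSTER_SIZES)
se_clu = se_clustered(CLUSTER_SUCCESSES, CLUSTER_SIZES)
design_effect = (se_clu / se_srs) ** 2

print(f"SE assuming simple random sampling: {se_srs:.4f}")
print(f"SE accounting for clustering:       {se_clu:.4f}")
print(f"Design effect:                      {design_effect:.2f}")
```

With these deliberately heterogeneous clusters, the cluster-based standard error is close to three times the simple random sampling one. A difference of that size would change the outcome of many significance tests, which is the danger described above.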
If using appropriate software is not an option, one other possibility is to compensate for the expected under-estimation of standard errors by tightening the criteria used for judging statistical significance. For example, instead of using p

