Behavioural Surveillance Surveys - The Wisdom of Whores
Figure 7: Calculation of selection probabilities, sampling weights, and standardized sampling weights (hypothetical data)

In this example, calculations of standardized weights are shown for the first five of a sample of clusters chosen in a hypothetical survey. Let n_i = the number of sample elements chosen in cluster i, P_i = overall probability of selection for sample elements in cluster i, w_i = sampling weight for sample elements in cluster i, and w_i' = standardized sampling weight for sample elements in cluster i. In the table, w_i = 1 / P_i, and w_i' is w_i n_i divided by the total of w_i n_i over all sampled clusters.

Cluster No.    n_i    P_i     w_i      w_i n_i      w_i'
1              20     .033    30.30    606.06       .0502
2              11     .022    45.45    499.95       .0414
3               6     .030    33.33    199.98       .0166
4              13     .043    23.26    302.28       .0250
5              12     .023    43.48    521.76       .0432
.
.
.
Total         300                      12,073.02

Now suppose that you do your survey, and you randomly select 10 of the 30 brothels to be included in your sample. Measures of size are not available at the planning stage, so you select the brothels with equal probability. The small brothels therefore have the same chance of being selected as the larger brothels. On average, you would select around 3-4 of the large brothels and 6-7 of the small brothels. If you sub-sampled a fixed quota of 20 women from each brothel (for a total sample of 200), you would end up with approximately 130 women from the smaller brothels, and only 70 women from the larger brothels. Since the women from the smaller brothels would be much less likely to use condoms than the women from the larger brothels, you would underestimate consistent condom use for the total sample.

This is essentially a weighting problem, because the women selected from the large brothels should account for 75% of the sample. By correctly keeping track of sampling probabilities for each cluster, weighting factors can easily be handled during analysis. But this can only happen if the proper information is recorded during the data collection process.
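The size of the bias in the brothel example can be illustrated with a short calculation. The sketch below is only an illustration under stated assumptions: the split into 10 large and 20 small brothels, the brothel sizes (120 and 20 women), and the condom use rates (80% and 40%) are hypothetical values chosen to be consistent with the figures quoted above (large brothels holding 75% of women, about 70 and 130 sampled women). It compares the expected unweighted estimate with the estimate obtained when each woman is weighted by the inverse of her selection probability.

```python
# Hypothetical population, assumed for illustration only.
N_LARGE, N_SMALL = 10, 20            # 30 brothels in total
SIZE_LARGE, SIZE_SMALL = 120, 20     # women per brothel (large ones hold 75% of women)
USE_LARGE, USE_SMALL = 0.80, 0.40    # assumed consistent condom use in each type

BROTHELS_SAMPLED = 10                # brothels chosen with equal probability
QUOTA = 20                           # women sub-sampled per chosen brothel
N_BROTHELS = N_LARGE + N_SMALL


def selection_probability(brothel_size):
    """Overall probability that a given woman is selected:
    P(brothel chosen) * P(woman chosen within that brothel)."""
    return (BROTHELS_SAMPLED / N_BROTHELS) * (QUOTA / brothel_size)


# Expected number of sampled women from each brothel type.
exp_large = BROTHELS_SAMPLED * N_LARGE / N_BROTHELS * QUOTA
exp_small = BROTHELS_SAMPLED * N_SMALL / N_BROTHELS * QUOTA

# True proportion in the whole (hypothetical) population.
women_large = N_LARGE * SIZE_LARGE
women_small = N_SMALL * SIZE_SMALL
true_rate = (women_large * USE_LARGE + women_small * USE_SMALL) / (
    women_large + women_small
)

# Unweighted estimate: every sampled woman counts equally.
unweighted = (exp_large * USE_LARGE + exp_small * USE_SMALL) / (
    exp_large + exp_small
)

# Weighted estimate: each woman is weighted by 1 / selection probability.
w_large = 1 / selection_probability(SIZE_LARGE)
w_small = 1 / selection_probability(SIZE_SMALL)
weighted = (exp_large * w_large * USE_LARGE + exp_small * w_small * USE_SMALL) / (
    exp_large * w_large + exp_small * w_small
)

print(f"expected sample: {exp_large:.0f} large, {exp_small:.0f} small")
print(f"true {true_rate:.3f}  unweighted {unweighted:.3f}  weighted {weighted:.3f}")
```

Under these assumed values the unweighted estimate falls well below the population value, while the weighted estimate recovers it, which is the point of recording selection probabilities for every cluster.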
Sample forms specifying what type of data to record during data collection are provided in Appendix 4.

CHAPTER 5 BEHAVIORAL SURVEILLANCE SURVEYS
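Once n_i and P_i have been recorded for each cluster, the calculations in Figure 7 are mechanical. The following is a minimal sketch that reproduces the five rows shown in the figure; the function and variable names are illustrative, and the total of w_i n_i (12,073.02) is taken from the figure because the remaining clusters are not listed.

```python
# (cluster number, n_i, P_i) for the first five clusters in Figure 7.
clusters = [
    (1, 20, 0.033),
    (2, 11, 0.022),
    (3, 6, 0.030),
    (4, 13, 0.043),
    (5, 12, 0.023),
]

# Sum of w_i * n_i over ALL sampled clusters, as printed in the Total row.
TOTAL_WEIGHTED_N = 12073.02


def sampling_weight(p):
    """Sampling weight: the inverse of the overall selection probability."""
    return 1.0 / p


def standardized_weight(n, p, total):
    """Standardized weight as used in Figure 7: w_i * n_i / sum(w_i * n_i)."""
    return sampling_weight(p) * n / total


rows = []
for cluster, n, p in clusters:
    w = sampling_weight(p)
    rows.append(
        {
            "cluster": cluster,
            "n": n,
            "P": p,
            "w": w,
            "w_n": w * n,
            "w_std": standardized_weight(n, p, TOTAL_WEIGHTED_N),
        }
    )

print("cluster   n     P       w      w*n    w_std")
for r in rows:
    print(
        f"{r['cluster']:>7} {r['n']:>3} {r['P']:>6.3f} "
        f"{r['w']:>7.2f} {r['w_n']:>8.2f} {r['w_std']:>8.4f}"
    )
```

Because the sketch carries w_i at full precision, some w_i n_i values differ from the printed ones in the second decimal place; the printed table appears to multiply the rounded w_i.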
Calculating standard errors with multi-stage cluster designs

In order to test the statistical significance of observed changes or trends, it is necessary to have estimates of the magnitude of sampling error associated with the survey estimates, commonly referred to as standard errors. The estimation of sampling error depends upon the sample design used in collecting the data. As sample designs become more complex (e.g., when stratification, cluster sampling, and multiple stages of sample selection are used), the procedures for estimating standard errors become quite complicated. The estimation of standard errors for such designs is beyond the scope of this Guide, and consultation with a statistician is recommended.

Unfortunately, standard statistical software packages such as SPSS and EPI-INFO do not provide an adequate solution to this problem. While both packages will estimate the standard errors of observed changes on indicators and perform statistical tests, the standard errors they produce assume that simple random sampling was used in gathering the survey data. Since it is highly probable that cluster sampling will be employed in BSS, the estimated standard errors produced by these packages will be incorrect. Because they assume simple random sampling, these standard errors will usually be under-estimated, and there is the danger that observed changes in indicators will be believed to be statistically significant when in fact they are not.

To avoid this problem, and perform careful data analysis, software that can account for cluster sampling in the analysis should be used. STATA and SUDAAN are two such packages. These programs do not assume that simple random sampling was used. They make use of the cluster information to calculate design effects, and adjust standard errors before conducting statistical tests, thereby avoiding the problem of incorrect conclusions.
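To give a sense of how much clustering can matter, the sketch below compares a simple-random-sampling standard error for a proportion with a cluster-based one. It is an illustration only, not the procedure implemented by STATA or SUDAAN: it assumes a single stratum, equal weights, no finite population correction, and uses the common ratio-estimator (linearization) approximation in which each cluster contributes one residual. The cluster data are invented for the example.

```python
import math

# Hypothetical data, assumed for illustration: for each sampled cluster,
# (number of respondents, number reporting consistent condom use).
clusters = [(20, 18), (20, 4), (20, 15), (20, 3), (20, 16), (20, 5)]


def proportion(clusters):
    """Overall sample proportion, pooling all clusters."""
    n = sum(m for m, _ in clusters)
    return sum(y for _, y in clusters) / n


def srs_standard_error(clusters):
    """Standard error that (wrongly) treats the data as a simple random sample."""
    n = sum(m for m, _ in clusters)
    p = proportion(clusters)
    return math.sqrt(p * (1 - p) / n)


def cluster_standard_error(clusters):
    """Linearization standard error treating clusters as the sampling units:
    var(p) ~ k / (k - 1) * sum((y_i - p * m_i)^2) / n^2."""
    k = len(clusters)
    n = sum(m for m, _ in clusters)
    p = proportion(clusters)
    sum_sq = sum((y - p * m) ** 2 for m, y in clusters)
    return math.sqrt(k / (k - 1) * sum_sq) / n


se_srs = srs_standard_error(clusters)
se_cluster = cluster_standard_error(clusters)

# Design effect: ratio of the cluster-based variance to the SRS variance.
design_effect = (se_cluster / se_srs) ** 2

print(f"p = {proportion(clusters):.3f}")
print(f"SE assuming simple random sampling: {se_srs:.4f}")
print(f"SE accounting for clusters:         {se_cluster:.4f}")
print(f"design effect:                      {design_effect:.2f}")
```

When respondents within a cluster resemble each other, as in this invented data set, the cluster-based standard error is considerably larger than the simple-random-sampling one, so a test based on the latter would overstate significance.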
If using appropriate software is not an option, one other possibility is to compensate for the expected under-estimation of standard errors by tightening the criteria used for judging statistical significance. For example, instead of using p