The purpose of sampling is to reduce the cost of collecting data about a population by gathering information from a subset instead of the entire population. While the optimal sampling procedure is heavily case-dependent, guidance and “rules of thumb” exist.
Factors Influencing Sample Size Decisions
As a first approximation, a total of about 100 interviews is generally adequate for drawing reasonable statistical inferences, even for subgroups of the surveyed population such as grid-connected households. If subgroups are to be compared, for example electrified and non-electrified households, the sample size per subgroup should not fall below 30 households.
A statistically accurate determination of the required sample size mainly depends on the study’s measurement objective, which is in most cases the examination of changes in specific indicators (e.g. firewood consumption, studying hours) over time or differences in these indicators between project and control areas.
Technically, the required sample size for a given indicator, for each household survey round and/or comparison group, depends on five factors:
- the number of households in the target population
- the initial or baseline level of the indicator (e.g. fuelwood consumption before the intervention)
- the magnitude of change that the survey should be able to measure reliably (e.g. 50 percent fuel savings for improved stoves in comparison to traditional stoves)
- the desired degree of confidence that an observed change did not occur by chance (the level of statistical significance), and
- the desired degree of confidence that an actual change of the magnitude specified above will be detected (statistical power).
The first two are given by the characteristics of the population in the target region, while the last three are determined by the researcher. Sample size formulas and “rules of thumb” exist to determine concrete sample size figures based on this information (see for example Magnani 1997).
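How these factors combine can be illustrated with a minimal sketch. It assumes the indicator is a proportion compared between two groups or survey rounds and uses a common textbook formula, which is not necessarily the exact one given in Magnani (1997). The function name, the defaults (5% significance, 80% power), and the `design_effect` multiplier are illustrative assumptions, and the size of the target population is ignored.

```python
import math
from statistics import NormalDist


def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80, design_effect=1.0):
    """Approximate sample size per group or survey round (illustrative only).

    p1: baseline level of the indicator, as a proportion between 0 and 1
    p2: level expected after the change that should be detectable
    alpha: significance level (two-sided test assumed)
    power: desired statistical power
    design_effect: assumed multiplier for cluster sampling (1.0 = none)
    """
    if p1 == p2:
        raise ValueError("p1 and p2 must differ to define a detectable change")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = design_effect * (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)


# Hypothetical example: detect a drop from 50% to 30% of households
n_per_group = sample_size_two_proportions(0.5, 0.3)
print(n_per_group)
```

A smaller expected change, a stricter significance level, or higher power each increases the result.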
Another, more pragmatic approach to sample size decisions is to invert this procedure:
The potential sample size is first established from the survey budget. It is then used to determine the explanatory power that can be expected from the survey, i.e. the combination of the last three factors above (detectable change, significance level, and statistical power). If the explanatory power turns out to be higher or lower than desired, minor adjustments can be made to either the budget or the sample size.
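The inverted procedure can be sketched as follows, again assuming a proportion indicator compared between two groups and a normal approximation. The cost figures, function names, and parameter values are hypothetical and serve only to show the direction of the calculation.

```python
import math
from statistics import NormalDist


def affordable_sample_size(budget, fixed_costs, cost_per_interview):
    """Number of interviews the budget allows after fixed costs (illustrative)."""
    return math.floor((budget - fixed_costs) / cost_per_interview)


def achieved_power(n_per_group, p1, p2, alpha=0.05):
    """Approximate power to detect a change from p1 to p2 with n per group.

    Assumes a two-sided test and a normal approximation.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    std_error = math.sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_group)
    return NormalDist().cdf(abs(p2 - p1) / std_error - z_alpha)


# Hypothetical budget figures, split evenly between two comparison groups
total = affordable_sample_size(budget=10_000, fixed_costs=2_000, cost_per_interview=40)
power = achieved_power(total // 2, p1=0.5, p2=0.3)
print(total, round(power, 2))
```

If the resulting power is judged too low, either the budget is raised or a larger minimum detectable change is accepted.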
Since non-response in surveys, particularly in developing countries, can never be ruled out, it has to be considered during the calculation of sample size requirements as well. An allowance of 10% should prove adequate in most situations.
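As a small worked example, the allowance can be applied by inflating the required number of completed interviews. The sketch below assumes the 10% is read as an expected non-response rate, so that the number of households approached is divided by one minus that rate; simply multiplying by 1.10 gives a slightly smaller figure. The function name is hypothetical.

```python
import math


def inflate_for_non_response(required_n, non_response_rate=0.10):
    """Households to approach so that about required_n interviews are completed.

    Assumes non_response_rate is the expected share of selected households
    that cannot be interviewed.
    """
    if not 0 <= non_response_rate < 1:
        raise ValueError("non_response_rate must be in [0, 1)")
    return math.ceil(required_n / (1 - non_response_rate))


# 100 completed interviews needed, 10% expected non-response
print(inflate_for_non_response(100))
```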
Selection of Sample Households
Once overall sample size requirements have been determined, the final step in developing the sample design is to determine how many clusters (e.g. villages) and how many households per cluster should be chosen. While smaller clusters imply higher sampling precision, they also imply higher survey costs because more clusters must be visited, which raises the cost of transporting and sustaining field staff. About 20 interviews per cluster is a reasonable compromise for this trade-off.
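The trade-off can be made concrete with a short sketch. It uses the standard design-effect approximation 1 + (m - 1) × ρ, where m is the number of interviews per cluster and ρ is the intra-cluster correlation. The value of ρ used below is purely an assumption for illustration and would have to be estimated for a real survey; the function names are hypothetical.

```python
import math


def clusters_needed(total_sample_size, interviews_per_cluster=20):
    """Number of clusters (e.g. villages) needed to reach the total sample size."""
    return math.ceil(total_sample_size / interviews_per_cluster)


def design_effect(interviews_per_cluster, intra_cluster_correlation):
    """Approximate variance inflation from cluster sampling: 1 + (m - 1) * rho."""
    return 1 + (interviews_per_cluster - 1) * intra_cluster_correlation


# Hypothetical total of 330 households and an assumed rho of 0.05
for m in (10, 20, 40):
    print(m, clusters_needed(330, m), round(design_effect(m, 0.05), 2))
```

Larger clusters reduce the number of sites to visit but increase the design effect, i.e. the loss of precision relative to a simple random sample of the same size.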
Whatever the sample size, random selection of the sample households is crucial for the representativeness of the collected data. Several easy-to-implement alternatives exist, such as segmentation and random-walk methods, in which the first household to be interviewed is drawn randomly and the enumerators then select households at a fixed interval, for example every fifth household.
In the optimal case, sample households are selected from a list of all households located within each cluster (e.g. from census enumerations). A sample of units is then chosen using either simple random or systematic sampling. In most cases, though, such lists are not available, and conducting pre-surveys to enumerate households is likely to be unacceptably costly and time-consuming.
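Where a household list does exist, the two list-based selection methods can be sketched as follows. The household identifiers and function names are hypothetical. The systematic variant uses an integer sampling interval with a random start, so households at the very end of a list whose length is not a multiple of the sample size are never drawn; this is a simplification.

```python
import random


def simple_random_sample(households, n, seed=None):
    """Draw n households without replacement, each with equal probability."""
    rng = random.Random(seed)
    return rng.sample(households, n)


def systematic_sample(households, n, seed=None):
    """Draw every k-th household from the list after a random start."""
    if not 0 < n <= len(households):
        raise ValueError("n must be between 1 and the number of households")
    rng = random.Random(seed)
    interval = len(households) // n      # sampling interval k
    start = rng.randrange(interval)      # random start within the first interval
    return [households[start + i * interval] for i in range(n)]


# Hypothetical cluster list of 100 households, 20 interviews wanted
household_list = [f"HH-{i:03d}" for i in range(1, 101)]
print(systematic_sample(household_list, 20, seed=1))
```

Passing a `seed` makes the draw reproducible, which can help when the selection has to be documented or audited.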
- Magnani, Robert (1997). Sampling Guide. Food and Nutrition Technical Assistance Project (FANTA), Washington, D.C.