After a probability sample is drawn and survey data are collected, researchers sometimes want to stratify the sample according to auxiliary information about the sampled population. This process is often called poststratification.
When poststratification is done properly, it can improve efficiency. It can also be used to adjust the sampling weights so that the weighted distribution of the sample over the poststrata agrees with known auxiliary information from other sources, such as the census. The adjusted weight is often called the poststratification weight. Poststratification techniques are commonly used in survey data analysis.
Poststratification is also used by epidemiologists, who frequently analyze health survey data. They often compute statistics based on a process called direct standardization, a form of poststratification. For example, certain diseases, such as cancer, are more common among older populations. Therefore, to compare prevalence rates among geographic regions whose populations have different age distributions, it is necessary to adjust for such demographic categories and to compute relative prevalence rates of the diseases.
For more information about poststratification, see Fuller (2009); Lohr (2010); Wolter (2007); Rao, Yung, and Hidiroglou (2002).
After you provide the population controls for each poststratum that is defined by the poststratification variables, the SURVEYMEANS procedure creates the poststratification weights accordingly. The procedure then computes the requested statistics by using the poststratification weights.
You can save the poststratification weights in an OUTPSWGT= data set for use in subsequent analyses.
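For a selected sample, let $p = 1, 2, \ldots, P$ be the poststratum index; let $L_p$ be the population total for each corresponding poststratum; and let $I_p(h,i,j)$ be the corresponding indicator variable for poststratum p, defined by
$$I_p(h,i,j) = \begin{cases} 1 & \text{if observation } (h,i,j) \text{ belongs to poststratum } p \\ 0 & \text{otherwise} \end{cases}$$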
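Denote the total sum of the original weights in the sample for each poststratum as
$$w_{\cdot\cdot\cdot}^{(p)} = \sum_{h=1}^{H}\sum_{i=1}^{n_h}\sum_{j=1}^{m_{hi}} I_p(h,i,j)\, w_{hij}$$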
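Then the poststratification weight for the observation (h, i, j) is
$$w_{hij}^{\mathrm{PS}} = \sum_{p=1}^{P} I_p(h,i,j)\, \frac{L_p}{w_{\cdot\cdot\cdot}^{(p)}}\, w_{hij}$$
That is, every observation in poststratum p has its original weight multiplied by the same factor $L_p / w_{\cdot\cdot\cdot}^{(p)}$, so the poststratification weights in poststratum p sum to $L_p$.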
The SURVEYMEANS procedure computes statistics by using the poststratification weights $w_{hij}^{\mathrm{PS}}$ instead of the original weights $w_{hij}$.
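As an illustration of the weight adjustment, the following Python sketch scales each original weight by the population total of its poststratum divided by the sum of the original weights observed in that poststratum. It is not SAS code and does not reproduce how PROC SURVEYMEANS is implemented; the function name `poststratify`, the parallel-list data layout, and the numbers are hypothetical.

```python
from collections import defaultdict


def poststratify(weights, poststrata, control_totals):
    """Scale each original weight w_hij by L_p / w^(p) for its poststratum p.

    weights        -- original sampling weights, one per observation
    poststrata     -- poststratum label p of each observation
    control_totals -- dict mapping each label p to its population total L_p
    """
    # w^(p): sum of the original weights of the sampled units in poststratum p
    weight_sums = defaultdict(float)
    for w, p in zip(weights, poststrata):
        weight_sums[p] += w
    # Poststratification weight: (L_p / w^(p)) * original weight
    return [control_totals[p] / weight_sums[p] * w
            for w, p in zip(weights, poststrata)]


weights = [10.0, 10.0, 20.0, 20.0, 40.0]
poststrata = ["A", "A", "B", "B", "B"]
control_totals = {"A": 30.0, "B": 120.0}
ps_weights = poststratify(weights, poststrata, control_totals)
print(ps_weights)  # [15.0, 15.0, 30.0, 30.0, 60.0]
```

Both poststrata happen to be scaled by 1.5 in this toy example; in general each poststratum gets its own factor, and the adjusted weights within poststratum p always add up to its control total.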
The standard errors and confidence intervals of the computed statistics are based on the estimated variance, which is obtained by either a replication method or the Taylor series method.
When you specify VARMETHOD=BRR or VARMETHOD=JACKKNIFE, PROC SURVEYMEANS computes the variance of a statistic by using replication methods, as described in the section Replication Methods for Variance Estimation. However, with poststratification, an extra step is needed to adjust the weights.
First, PROC SURVEYMEANS constructs a replicate and computes the appropriate replicate weights for it. Then, by using the poststratification control totals, the procedure adjusts these replicate weights in the same way as described previously for constructing the poststratification weights for the full sample. Next, PROC SURVEYMEANS computes the estimate of the desired statistic by using the poststratification weights that are adjusted from the replicate weights of the current replicate. After all replicates are processed, the final variance is estimated from the variability among the replicate estimates, as described in the section Replication Methods for Variance Estimation.
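The replicate-then-adjust order can be sketched in Python with a stratified delete-one-PSU jackknife. This is an illustrative assumption, not a transcription of the SAS implementation: the replicate weight rule (zero for the deleted PSU, multiplied by n_h/(n_h - 1) for the other PSUs in its stratum) and the coefficient (n_h - 1)/n_h are the usual textbook choices. All function names and data are hypothetical, and the toy data keep every poststratum non-empty in every replicate.

```python
from collections import defaultdict


def poststratify(weights, poststrata, totals):
    """Scale each positive weight by L_p / (sum of weights in its poststratum p)."""
    sums = defaultdict(float)
    for w, p in zip(weights, poststrata):
        sums[p] += w
    # Zero weights (a deleted PSU) stay zero and never trigger a division by zero
    return [totals[p] / sums[p] * w if w > 0 else 0.0
            for w, p in zip(weights, poststrata)]


def ps_mean(weights, poststrata, y, totals):
    """Poststratified mean: sum(w_PS * y) / sum(w_PS)."""
    w_ps = poststratify(weights, poststrata, totals)
    return sum(w * v for w, v in zip(w_ps, y)) / sum(w_ps)


def jackknife_ps_mean(strata, psus, weights, poststrata, y, totals):
    """Stratified delete-one-PSU jackknife; each replicate is re-poststratified."""
    full = ps_mean(weights, poststrata, y, totals)
    psus_in = defaultdict(set)
    for h, i in zip(strata, psus):
        psus_in[h].add(i)
    variance = 0.0
    for h, ids in psus_in.items():
        n_h = len(ids)
        if n_h < 2:
            continue  # this sketch skips strata that have a single PSU
        for dropped in ids:
            # Replicate weights: zero for the dropped PSU,
            # times n_h / (n_h - 1) for the other PSUs in stratum h
            rep = [0.0 if (hh == h and ii == dropped)
                   else w * n_h / (n_h - 1) if hh == h
                   else w
                   for hh, ii, w in zip(strata, psus, weights)]
            # Poststratify the replicate weights, then recompute the statistic
            est = ps_mean(rep, poststrata, y, totals)
            # Accumulate the variability among replicate estimates
            variance += (n_h - 1) / n_h * (est - full) ** 2
    return full, variance


# Toy design: 2 strata x 2 PSUs x 2 observations, poststrata A and B
strata = [1, 1, 1, 1, 2, 2, 2, 2]
psus = [1, 1, 2, 2, 1, 1, 2, 2]
weights = [10.0] * 8
poststrata = ["A", "B"] * 4
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
totals = {"A": 50.0, "B": 70.0}
mean, var = jackknife_ps_mean(strata, psus, weights, poststrata, y, totals)
print(mean, var)
```

For this toy design the sketch returns a poststratified mean of 550/120 (about 4.583) and a jackknife variance of about 0.5. The essential step is that `ps_mean` poststratifies each replicate's own weights; reusing the full-sample adjustment factors in every replicate would not reflect how the adjustment responds to each replicate sample.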
When you specify VARMETHOD=TAYLOR, or by default when you do not specify the VARMETHOD= option, PROC SURVEYMEANS uses the Taylor series method to estimate the variances of requested statistics.
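The sum and mean of variable Y under poststratification are
$$\hat{Y}^{\mathrm{PS}} = \sum_{h=1}^{H}\sum_{i=1}^{n_h}\sum_{j=1}^{m_{hi}} w_{hij}^{\mathrm{PS}}\, y_{hij}, \qquad \hat{\bar{Y}}^{\mathrm{PS}} = \frac{\hat{Y}^{\mathrm{PS}}}{w_{\cdot\cdot\cdot}^{\mathrm{PS}}}$$
where
$$w_{\cdot\cdot\cdot}^{\mathrm{PS}} = \sum_{h=1}^{H}\sum_{i=1}^{n_h}\sum_{j=1}^{m_{hi}} w_{hij}^{\mathrm{PS}}$$
is the sum of the poststratification weights over all observations in the sample.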
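For each poststratum $p = 1, 2, \ldots, P$, let the mean of variable Y in the poststratum be
$$\bar{y}^{(p)} = \frac{1}{w_{\cdot\cdot\cdot}^{\mathrm{PS}(p)}} \sum_{h=1}^{H}\sum_{i=1}^{n_h}\sum_{j=1}^{m_{hi}} I_p(h,i,j)\, w_{hij}^{\mathrm{PS}}\, y_{hij}$$
where $w_{\cdot\cdot\cdot}^{\mathrm{PS}(p)} = \sum_{h=1}^{H}\sum_{i=1}^{n_h}\sum_{j=1}^{m_{hi}} I_p(h,i,j)\, w_{hij}^{\mathrm{PS}}$ is the total of the poststratification weights in poststratum p (equal to $L_p$ whenever poststratum p contains sampled observations).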
For observation (h, i, j), assume that it belongs to the pth poststratum. Let
$$r_{hij} = \frac{L_p}{w_{\cdot\cdot\cdot}^{(p)}} \left(y_{hij} - \bar{y}^{(p)}\right)$$
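PROC SURVEYMEANS estimates the variance of $\hat{Y}^{\mathrm{PS}}$ as
$$\hat{V}\left(\hat{Y}^{\mathrm{PS}}\right) = \sum_{h=1}^{H} \hat{V}_h$$
where, if $n_h > 1$, then
$$\hat{V}_h = \frac{n_h (1 - f_h)}{n_h - 1} \sum_{i=1}^{n_h} \left(r_{hi\cdot} - \bar{r}_{h\cdot\cdot}\right)^2, \qquad r_{hi\cdot} = \sum_{j=1}^{m_{hi}} w_{hij}\, r_{hij}, \qquad \bar{r}_{h\cdot\cdot} = \frac{1}{n_h} \sum_{i=1}^{n_h} r_{hi\cdot}$$
and if $n_h = 1$, then
$$\hat{V}_h = \begin{cases} \text{missing} & \text{if } n_{h'} = 1 \text{ for } h' = 1, 2, \ldots, H \\ 0 & \text{if } n_{h'} > 1 \text{ for some } 1 \le h' \le H \end{cases}$$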
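Because $w_{\cdot\cdot\cdot}^{\mathrm{PS}} = \sum_{p=1}^{P} L_p$ is a known constant, PROC SURVEYMEANS estimates the variance of $\hat{\bar{Y}}^{\mathrm{PS}}$ as
$$\hat{V}\left(\hat{\bar{Y}}^{\mathrm{PS}}\right) = \frac{\hat{V}\left(\hat{Y}^{\mathrm{PS}}\right)}{\left(w_{\cdot\cdot\cdot}^{\mathrm{PS}}\right)^2}$$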
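A minimal Python sketch of the Taylor series computation for the poststratified sum and mean follows. It assumes the data arrive as parallel lists (stratum, PSU, original weight, poststratum label, value of Y), that the poststratum population totals are in a dict, and that the sampling rates f_h default to 0 (no finite population correction) unless supplied. The function names and toy data are hypothetical, and the "missing" rule for designs with a single PSU in every stratum is mirrored by returning NaN.

```python
import math
from collections import defaultdict


def stratified_variance(strata, psus, scores, rates=None):
    """Sum over strata of n_h (1 - f_h) / (n_h - 1) * sum_i (r_hi. - rbar_h..)^2.

    scores -- per-observation values w_hij * r_hij, summed into PSU totals r_hi.
    rates  -- optional dict {h: f_h}; a stratum not listed gets f_h = 0
    """
    psu_totals = defaultdict(float)
    for h, i, s in zip(strata, psus, scores):
        psu_totals[(h, i)] += s
    by_stratum = defaultdict(list)
    for (h, _), t in psu_totals.items():
        by_stratum[h].append(t)
    if all(len(ts) == 1 for ts in by_stratum.values()):
        return math.nan  # one PSU in every stratum: variance is missing
    rates = rates or {}
    variance = 0.0
    for h, ts in by_stratum.items():
        n_h = len(ts)
        if n_h == 1:
            continue  # contributes 0 because some other stratum has n_h > 1
        mean_h = sum(ts) / n_h
        variance += (n_h * (1 - rates.get(h, 0.0)) / (n_h - 1)
                     * sum((t - mean_h) ** 2 for t in ts))
    return variance


def taylor_ps_sum_mean(strata, psus, weights, poststrata, y, totals, rates=None):
    """Poststratified sum and mean of y with their Taylor-series variances."""
    w_orig = defaultdict(float)  # w^(p): original weights summed by poststratum
    for w, p in zip(weights, poststrata):
        w_orig[p] += w
    factor = {p: totals[p] / w_orig[p] for p in w_orig}  # L_p / w^(p)
    w_ps = [factor[p] * w for w, p in zip(weights, poststrata)]
    # Mean of y in each poststratum under the poststratification weights
    w_p, wy_p = defaultdict(float), defaultdict(float)
    for w, p, v in zip(w_ps, poststrata, y):
        w_p[p] += w
        wy_p[p] += w * v
    ybar = {p: wy_p[p] / w_p[p] for p in w_p}
    total = sum(w * v for w, v in zip(w_ps, y))
    w_total = sum(w_ps)
    # w_hij * r_hij, where r_hij = (L_p / w^(p)) * (y_hij - ybar_p)
    scores = [w * factor[p] * (v - ybar[p])
              for w, p, v in zip(weights, poststrata, y)]
    var_total = stratified_variance(strata, psus, scores, rates)
    return total, total / w_total, var_total, var_total / w_total ** 2


# Toy design: 2 strata x 2 PSUs x 2 observations, poststrata A and B
strata = [1, 1, 1, 1, 2, 2, 2, 2]
psus = [1, 1, 2, 2, 1, 1, 2, 2]
weights = [10.0] * 8
poststrata = ["A", "B"] * 4
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
totals = {"A": 50.0, "B": 70.0}

total, mean, var_total, var_mean = taylor_ps_sum_mean(
    strata, psus, weights, poststrata, y, totals)
print(total, var_total, var_mean)  # 550.0 7200.0 0.5

# Y constant within each poststratum: the poststratified sum has zero variance
flat = taylor_ps_sum_mean(strata, psus, weights, poststrata, [1.0, 2.0] * 4, totals)
print(flat[0], flat[2])  # 190.0 0.0
```

The second call shows why poststratification can improve efficiency: when Y is constant within each poststratum, every residual is zero, so only within-poststratum variation contributes to the estimated variance.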
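For a domain D, let $I_D(h,i,j)$ be the corresponding indicator variable:
$$I_D(h,i,j) = \begin{cases} 1 & \text{if observation } (h,i,j) \text{ belongs to domain } D \\ 0 & \text{otherwise} \end{cases}$$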
Let
$$z_{hij} = y_{hij}\, I_D(h,i,j)$$
be the value of variable Y restricted to domain D.
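The sum and mean of variable Y under poststratification in domain D are
$$\hat{Y}_D^{\mathrm{PS}} = \sum_{h=1}^{H}\sum_{i=1}^{n_h}\sum_{j=1}^{m_{hi}} w_{hij}^{\mathrm{PS}}\, z_{hij}, \qquad \hat{\bar{Y}}_D^{\mathrm{PS}} = \frac{\hat{Y}_D^{\mathrm{PS}}}{w_D^{\mathrm{PS}}}$$
where
$$w_D^{\mathrm{PS}} = \sum_{h=1}^{H}\sum_{i=1}^{n_h}\sum_{j=1}^{m_{hi}} w_{hij}^{\mathrm{PS}}\, I_D(h,i,j)$$
is the sum of the poststratification weights over all observations in the sample in domain D. For each poststratum $p = 1, 2, \ldots, P$, let the mean of variable Y restricted to the domain and the mean of the domain indicator variable in the poststratum be
$$\bar{z}^{(p)} = \frac{1}{w_{\cdot\cdot\cdot}^{\mathrm{PS}(p)}} \sum_{h=1}^{H}\sum_{i=1}^{n_h}\sum_{j=1}^{m_{hi}} I_p(h,i,j)\, w_{hij}^{\mathrm{PS}}\, z_{hij}, \qquad \bar{\delta}^{(p)} = \frac{1}{w_{\cdot\cdot\cdot}^{\mathrm{PS}(p)}} \sum_{h=1}^{H}\sum_{i=1}^{n_h}\sum_{j=1}^{m_{hi}} I_p(h,i,j)\, w_{hij}^{\mathrm{PS}}\, I_D(h,i,j)$$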
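Assume that the observation (h, i, j) belongs to the pth poststratum. Let
$$r_{hij}^{D} = \frac{L_p}{w_{\cdot\cdot\cdot}^{(p)}} \left(z_{hij} - \bar{z}^{(p)}\right), \qquad g_{hij}^{D} = \frac{L_p}{w_{\cdot\cdot\cdot}^{(p)}\, w_D^{\mathrm{PS}}} \left[\left(z_{hij} - \bar{z}^{(p)}\right) - \hat{\bar{Y}}_D^{\mathrm{PS}} \left(I_D(h,i,j) - \bar{\delta}^{(p)}\right)\right]$$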
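Then PROC SURVEYMEANS estimates the variance of the domain sum $\hat{Y}_D^{\mathrm{PS}}$ as
$$\hat{V}\left(\hat{Y}_D^{\mathrm{PS}}\right) = \sum_{h=1}^{H} \hat{V}_h$$
where, if $n_h > 1$, then
$$\hat{V}_h = \frac{n_h (1 - f_h)}{n_h - 1} \sum_{i=1}^{n_h} \left(r_{hi\cdot}^{D} - \bar{r}_{h\cdot\cdot}^{D}\right)^2, \qquad r_{hi\cdot}^{D} = \sum_{j=1}^{m_{hi}} w_{hij}\, r_{hij}^{D}, \qquad \bar{r}_{h\cdot\cdot}^{D} = \frac{1}{n_h} \sum_{i=1}^{n_h} r_{hi\cdot}^{D}$$
and if $n_h = 1$, then
$$\hat{V}_h = \begin{cases} \text{missing} & \text{if } n_{h'} = 1 \text{ for } h' = 1, 2, \ldots, H \\ 0 & \text{if } n_{h'} > 1 \text{ for some } 1 \le h' \le H \end{cases}$$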
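PROC SURVEYMEANS estimates the variance of the domain mean $\hat{\bar{Y}}_D^{\mathrm{PS}}$ as
$$\hat{V}\left(\hat{\bar{Y}}_D^{\mathrm{PS}}\right) = \sum_{h=1}^{H} \hat{V}_h$$
where, if $n_h > 1$, then
$$\hat{V}_h = \frac{n_h (1 - f_h)}{n_h - 1} \sum_{i=1}^{n_h} \left(g_{hi\cdot}^{D} - \bar{g}_{h\cdot\cdot}^{D}\right)^2, \qquad g_{hi\cdot}^{D} = \sum_{j=1}^{m_{hi}} w_{hij}\, g_{hij}^{D}, \qquad \bar{g}_{h\cdot\cdot}^{D} = \frac{1}{n_h} \sum_{i=1}^{n_h} g_{hi\cdot}^{D}$$
and if $n_h = 1$, then
$$\hat{V}_h = \begin{cases} \text{missing} & \text{if } n_{h'} = 1 \text{ for } h' = 1, 2, \ldots, H \\ 0 & \text{if } n_{h'} > 1 \text{ for some } 1 \le h' \le H \end{cases}$$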
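The domain estimators can be sketched in Python in the same way. In this sketch the residual for the domain sum is L_p / w^(p) times the domain-restricted value minus its poststratum mean; the residual for the domain mean additionally subtracts the domain mean times the centered domain indicator and divides by the domain's total poststratification weight. It assumes f_h = 0 (no finite population correction) and a 0/1 list for the domain indicator; the function names and toy data are hypothetical.

```python
import math
from collections import defaultdict


def stratified_variance(strata, psus, scores):
    """sum_h n_h / (n_h - 1) * sum_i (PSU total - stratum mean)^2, assuming f_h = 0."""
    psu_totals = defaultdict(float)
    for h, i, s in zip(strata, psus, scores):
        psu_totals[(h, i)] += s
    by_stratum = defaultdict(list)
    for (h, _), t in psu_totals.items():
        by_stratum[h].append(t)
    if all(len(ts) == 1 for ts in by_stratum.values()):
        return math.nan  # one PSU in every stratum: variance is missing
    variance = 0.0
    for ts in by_stratum.values():
        n_h = len(ts)
        if n_h > 1:  # strata with a single PSU contribute 0
            mean_h = sum(ts) / n_h
            variance += n_h / (n_h - 1) * sum((t - mean_h) ** 2 for t in ts)
    return variance


def taylor_ps_domain(strata, psus, weights, poststrata, y, in_domain, totals):
    """Poststratified domain sum and mean of y with Taylor-series variances."""
    w_orig = defaultdict(float)
    for w, p in zip(weights, poststrata):
        w_orig[p] += w
    factor = {p: totals[p] / w_orig[p] for p in w_orig}  # L_p / w^(p)
    w_ps = [factor[p] * w for w, p in zip(weights, poststrata)]
    z = [v * d for v, d in zip(y, in_domain)]  # y restricted to the domain
    # Poststratum means of z and of the domain indicator (poststratification weights)
    w_p, wz_p, wd_p = defaultdict(float), defaultdict(float), defaultdict(float)
    for w, p, zz, d in zip(w_ps, poststrata, z, in_domain):
        w_p[p] += w
        wz_p[p] += w * zz
        wd_p[p] += w * d
    zbar = {p: wz_p[p] / w_p[p] for p in w_p}
    dbar = {p: wd_p[p] / w_p[p] for p in w_p}
    dom_sum = sum(w * zz for w, zz in zip(w_ps, z))
    w_dom = sum(w * d for w, d in zip(w_ps, in_domain))  # domain weight total
    dom_mean = dom_sum / w_dom
    # Weighted residuals for the domain sum (r) and the domain mean (g)
    r = [w * factor[p] * (zz - zbar[p])
         for w, p, zz in zip(weights, poststrata, z)]
    g = [w * factor[p] * ((zz - zbar[p]) - dom_mean * (d - dbar[p])) / w_dom
         for w, p, zz, d in zip(weights, poststrata, z, in_domain)]
    return (dom_sum, dom_mean,
            stratified_variance(strata, psus, r),
            stratified_variance(strata, psus, g))


strata = [1, 1, 1, 1, 2, 2, 2, 2]
psus = [1, 1, 2, 2, 1, 1, 2, 2]
weights = [10.0] * 8
poststrata = ["A", "B"] * 4
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
totals = {"A": 50.0, "B": 70.0}
in_a = [1, 0] * 4  # domain D = the observations in poststratum A
print(taylor_ps_domain(strata, psus, weights, poststrata, y, in_a, totals))
# (200.0, 4.0, 1250.0, 0.5)
```

As a sanity check, a domain that contains every observation makes the centered indicator zero, so the domain results coincide with the full-sample poststratified sum and mean.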
Suppose you want to calculate the ratio of variable Y to variable X. Let $x_{hij}$ and $y_{hij}$ be the values of variable X and variable Y, respectively, for observation (h, i, j).
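The ratio of Y to X after poststratification is
$$\hat{R}^{\mathrm{PS}} = \frac{\sum_{h=1}^{H}\sum_{i=1}^{n_h}\sum_{j=1}^{m_{hi}} w_{hij}^{\mathrm{PS}}\, y_{hij}}{\sum_{h=1}^{H}\sum_{i=1}^{n_h}\sum_{j=1}^{m_{hi}} w_{hij}^{\mathrm{PS}}\, x_{hij}}$$
where $w_{hij}^{\mathrm{PS}}$ is the poststratification weight for observation (h, i, j).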
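Assume that the observation (h, i, j) belongs to the pth poststratum. Let
$$e_{hij} = \frac{L_p}{w_{\cdot\cdot\cdot}^{(p)}\, \hat{X}^{\mathrm{PS}}} \left[\left(y_{hij} - \bar{y}^{(p)}\right) - \hat{R}^{\mathrm{PS}} \left(x_{hij} - \bar{x}^{(p)}\right)\right], \qquad \hat{X}^{\mathrm{PS}} = \sum_{h=1}^{H}\sum_{i=1}^{n_h}\sum_{j=1}^{m_{hi}} w_{hij}^{\mathrm{PS}}\, x_{hij}$$
where $\bar{y}^{(p)}$ and $\bar{x}^{(p)}$ are the means of variable Y and variable X, respectively, in poststratum p, computed with the poststratification weights.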
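The variance of $\hat{R}^{\mathrm{PS}}$ is estimated by
$$\hat{V}\left(\hat{R}^{\mathrm{PS}}\right) = \sum_{h=1}^{H} \hat{V}_h$$
where, if $n_h > 1$, then
$$\hat{V}_h = \frac{n_h (1 - f_h)}{n_h - 1} \sum_{i=1}^{n_h} \left(e_{hi\cdot} - \bar{e}_{h\cdot\cdot}\right)^2, \qquad e_{hi\cdot} = \sum_{j=1}^{m_{hi}} w_{hij}\, e_{hij}, \qquad \bar{e}_{h\cdot\cdot} = \frac{1}{n_h} \sum_{i=1}^{n_h} e_{hi\cdot}$$
and if $n_h = 1$, then
$$\hat{V}_h = \begin{cases} \text{missing} & \text{if } n_{h'} = 1 \text{ for } h' = 1, 2, \ldots, H \\ 0 & \text{if } n_{h'} > 1 \text{ for some } 1 \le h' \le H \end{cases}$$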
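The ratio variance can be sketched in Python by forming the linearized residual for each observation: Y minus its poststratum mean, minus the ratio times (X minus its poststratum mean), scaled by L_p / w^(p) and divided by the poststratified total of X. As before, this sketch assumes f_h = 0, and the names and toy data are hypothetical.

```python
import math
from collections import defaultdict


def stratified_variance(strata, psus, scores):
    """sum_h n_h / (n_h - 1) * sum_i (PSU total - stratum mean)^2, assuming f_h = 0."""
    psu_totals = defaultdict(float)
    for h, i, s in zip(strata, psus, scores):
        psu_totals[(h, i)] += s
    by_stratum = defaultdict(list)
    for (h, _), t in psu_totals.items():
        by_stratum[h].append(t)
    if all(len(ts) == 1 for ts in by_stratum.values()):
        return math.nan  # one PSU in every stratum: variance is missing
    variance = 0.0
    for ts in by_stratum.values():
        n_h = len(ts)
        if n_h > 1:  # strata with a single PSU contribute 0
            mean_h = sum(ts) / n_h
            variance += n_h / (n_h - 1) * sum((t - mean_h) ** 2 for t in ts)
    return variance


def taylor_ps_ratio(strata, psus, weights, poststrata, y, x, totals):
    """Poststratified ratio of y to x and its Taylor-series variance."""
    w_orig = defaultdict(float)
    for w, p in zip(weights, poststrata):
        w_orig[p] += w
    factor = {p: totals[p] / w_orig[p] for p in w_orig}  # L_p / w^(p)
    w_ps = [factor[p] * w for w, p in zip(weights, poststrata)]
    # Poststratum means of y and x under the poststratification weights
    w_p, wy_p, wx_p = defaultdict(float), defaultdict(float), defaultdict(float)
    for w, p, v, u in zip(w_ps, poststrata, y, x):
        w_p[p] += w
        wy_p[p] += w * v
        wx_p[p] += w * u
    ybar = {p: wy_p[p] / w_p[p] for p in w_p}
    xbar = {p: wx_p[p] / w_p[p] for p in w_p}
    x_total = sum(w * u for w, u in zip(w_ps, x))  # poststratified total of X
    ratio = sum(w * v for w, v in zip(w_ps, y)) / x_total
    # w_hij * e_hij: linearized residual of the ratio
    scores = [w * factor[p] * ((v - ybar[p]) - ratio * (u - xbar[p])) / x_total
              for w, p, v, u in zip(weights, poststrata, y, x)]
    return ratio, stratified_variance(strata, psus, scores)


strata = [1, 1, 1, 1, 2, 2, 2, 2]
psus = [1, 1, 2, 2, 1, 1, 2, 2]
weights = [10.0] * 8
poststrata = ["A", "B"] * 4
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
totals = {"A": 50.0, "B": 70.0}
x = [1.0] * 8  # with X identically 1, the ratio is the poststratified mean of Y
ratio, var = taylor_ps_ratio(strata, psus, weights, poststrata, y, x, totals)
print(ratio, var)
```

Setting X to 1 everywhere turns the ratio into the poststratified mean of Y, which is a convenient consistency check; if Y is an exact multiple of X, every residual vanishes and the estimated variance is 0.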
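For a domain D, let $I_D(h,i,j)$ be the corresponding indicator variable:
$$I_D(h,i,j) = \begin{cases} 1 & \text{if observation } (h,i,j) \text{ belongs to domain } D \\ 0 & \text{otherwise} \end{cases}$$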
Let
$$y_{hij}^{D} = y_{hij}\, I_D(h,i,j), \qquad x_{hij}^{D} = x_{hij}\, I_D(h,i,j)$$
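The ratio of variable Y to variable X in domain D after poststratification is estimated by
$$\hat{R}_D^{\mathrm{PS}} = \frac{\sum_{h=1}^{H}\sum_{i=1}^{n_h}\sum_{j=1}^{m_{hi}} w_{hij}^{\mathrm{PS}}\, y_{hij}^{D}}{\sum_{h=1}^{H}\sum_{i=1}^{n_h}\sum_{j=1}^{m_{hi}} w_{hij}^{\mathrm{PS}}\, x_{hij}^{D}}$$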
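For each poststratum $p = 1, 2, \ldots, P$, let the means of the domain-restricted variables X and Y in the poststratum be
$$\bar{x}^{D(p)} = \frac{1}{w_{\cdot\cdot\cdot}^{\mathrm{PS}(p)}} \sum_{h=1}^{H}\sum_{i=1}^{n_h}\sum_{j=1}^{m_{hi}} I_p(h,i,j)\, w_{hij}^{\mathrm{PS}}\, x_{hij}^{D}, \qquad \bar{y}^{D(p)} = \frac{1}{w_{\cdot\cdot\cdot}^{\mathrm{PS}(p)}} \sum_{h=1}^{H}\sum_{i=1}^{n_h}\sum_{j=1}^{m_{hi}} I_p(h,i,j)\, w_{hij}^{\mathrm{PS}}\, y_{hij}^{D}$$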
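Assume that the observation (h, i, j) belongs to the pth poststratum. Let
$$e_{hij}^{D} = \frac{L_p}{w_{\cdot\cdot\cdot}^{(p)}\, \hat{X}_D^{\mathrm{PS}}} \left[\left(y_{hij}^{D} - \bar{y}^{D(p)}\right) - \hat{R}_D^{\mathrm{PS}} \left(x_{hij}^{D} - \bar{x}^{D(p)}\right)\right], \qquad \hat{X}_D^{\mathrm{PS}} = \sum_{h=1}^{H}\sum_{i=1}^{n_h}\sum_{j=1}^{m_{hi}} w_{hij}^{\mathrm{PS}}\, x_{hij}^{D}$$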
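Then PROC SURVEYMEANS estimates the variance of the domain ratio after poststratification as
$$\hat{V}\left(\hat{R}_D^{\mathrm{PS}}\right) = \sum_{h=1}^{H} \hat{V}_h$$
where, if $n_h > 1$, then
$$\hat{V}_h = \frac{n_h (1 - f_h)}{n_h - 1} \sum_{i=1}^{n_h} \left(e_{hi\cdot}^{D} - \bar{e}_{h\cdot\cdot}^{D}\right)^2, \qquad e_{hi\cdot}^{D} = \sum_{j=1}^{m_{hi}} w_{hij}\, e_{hij}^{D}, \qquad \bar{e}_{h\cdot\cdot}^{D} = \frac{1}{n_h} \sum_{i=1}^{n_h} e_{hi\cdot}^{D}$$
and if $n_h = 1$, then
$$\hat{V}_h = \begin{cases} \text{missing} & \text{if } n_{h'} = 1 \text{ for } h' = 1, 2, \ldots, H \\ 0 & \text{if } n_{h'} > 1 \text{ for some } 1 \le h' \le H \end{cases}$$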
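The domain ratio is the same linearization applied to the domain-restricted variables, which is how this Python sketch computes it. It assumes f_h = 0 and a 0/1 domain indicator list; the names and toy data are hypothetical.

```python
import math
from collections import defaultdict


def stratified_variance(strata, psus, scores):
    """sum_h n_h / (n_h - 1) * sum_i (PSU total - stratum mean)^2, assuming f_h = 0."""
    psu_totals = defaultdict(float)
    for h, i, s in zip(strata, psus, scores):
        psu_totals[(h, i)] += s
    by_stratum = defaultdict(list)
    for (h, _), t in psu_totals.items():
        by_stratum[h].append(t)
    if all(len(ts) == 1 for ts in by_stratum.values()):
        return math.nan  # one PSU in every stratum: variance is missing
    variance = 0.0
    for ts in by_stratum.values():
        n_h = len(ts)
        if n_h > 1:  # strata with a single PSU contribute 0
            mean_h = sum(ts) / n_h
            variance += n_h / (n_h - 1) * sum((t - mean_h) ** 2 for t in ts)
    return variance


def taylor_ps_domain_ratio(strata, psus, weights, poststrata, y, x, in_domain, totals):
    """Poststratified ratio of y to x within a domain, with Taylor-series variance."""
    w_orig = defaultdict(float)
    for w, p in zip(weights, poststrata):
        w_orig[p] += w
    factor = {p: totals[p] / w_orig[p] for p in w_orig}  # L_p / w^(p)
    w_ps = [factor[p] * w for w, p in zip(weights, poststrata)]
    yd = [v * d for v, d in zip(y, in_domain)]  # y restricted to the domain
    xd = [u * d for u, d in zip(x, in_domain)]  # x restricted to the domain
    w_p, wy_p, wx_p = defaultdict(float), defaultdict(float), defaultdict(float)
    for w, p, v, u in zip(w_ps, poststrata, yd, xd):
        w_p[p] += w
        wy_p[p] += w * v
        wx_p[p] += w * u
    ybar = {p: wy_p[p] / w_p[p] for p in w_p}
    xbar = {p: wx_p[p] / w_p[p] for p in w_p}
    x_total = sum(w * u for w, u in zip(w_ps, xd))  # poststratified domain total of X
    ratio = sum(w * v for w, v in zip(w_ps, yd)) / x_total
    # Weighted linearized residuals of the domain ratio
    scores = [w * factor[p] * ((v - ybar[p]) - ratio * (u - xbar[p])) / x_total
              for w, p, v, u in zip(weights, poststrata, yd, xd)]
    return ratio, stratified_variance(strata, psus, scores)


strata = [1, 1, 1, 1, 2, 2, 2, 2]
psus = [1, 1, 2, 2, 1, 1, 2, 2]
weights = [10.0] * 8
poststrata = ["A", "B"] * 4
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
totals = {"A": 50.0, "B": 70.0}
in_a = [1, 0] * 4  # domain D = the observations in poststratum A
ratio_a, var_a = taylor_ps_domain_ratio(
    strata, psus, weights, poststrata, y, [1.0] * 8, in_a, totals)
print(ratio_a, var_a)
```

With X identically 1, the domain ratio reduces to the poststratified domain mean of Y (here 4.0 for the observations in poststratum A), so the two computations can be checked against each other.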