To select a sample with PROC SURVEYSELECT, you input a SAS data set that contains the sampling frame (the list of units from which the sample is to be selected). You also specify the selection method, the desired sample size or sampling rate, and other selection parameters. PROC SURVEYSELECT selects the sample and produces an output data set that contains the selected units, their selection probabilities, and their sampling weights. For more information, see Chapter 102: The SURVEYSELECT Procedure.
In this example, the sample design is a stratified sample design, with households as the sampling units and selection by simple random sampling. The SAS data set HHFrame contains the sampling frame, which is the list of households in the survey population. The sampling frame is stratified by the variables State and Region. Within strata, households are selected by simple random sampling. The following PROC SURVEYSELECT statements select a probability sample of households according to this sample design:
proc surveyselect data=HHFrame out=HHSample
                  method=srs n=(3, 5, 3, 6, 2);
   strata State Region;
run;
The STRATA statement names the stratification variables State and Region. In the PROC SURVEYSELECT statement, the DATA= option names the SAS data set HHFrame as the input data set (or sampling frame) from which to select the sample. The OUT= option stores the sample in the SAS data set named HHSample. The METHOD=SRS option specifies simple random sampling as the sample selection method. The N= option specifies the stratum sample sizes, listed in the order in which the strata appear in the input data set.
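A practical variation on the statements above, offered as a sketch and not as part of the original example: the STRATA statement expects the input data set to be grouped by the stratification variables, so the frame is sorted first, and the SEED= option fixes the random number stream so that the same sample can be drawn again. The seed value 39647 and the data set name HHFrameSorted are arbitrary, hypothetical choices, and the five sample sizes assume that HHFrame contains exactly five State-by-Region strata.

```sas
/* Sort the frame so that each State-by-Region stratum is contiguous. */
/* HHFrameSorted is a hypothetical name chosen for this sketch.       */
proc sort data=HHFrame out=HHFrameSorted;
   by State Region;
run;

/* Same design as above; SEED= (arbitrary value) makes the selection reproducible. */
proc surveyselect data=HHFrameSorted out=HHSample
                  method=srs n=(3, 5, 3, 6, 2)
                  seed=39647;
   strata State Region;
run;
```

Writing the sorted frame to a separate data set leaves HHFrame unchanged, which can be convenient if other steps rely on its original order.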
The SURVEYSELECT procedure then selects a stratified random sample of households and produces the output data set HHSample, which contains the selected households together with their selection probabilities and sampling weights. The data set HHSample also contains the sampling unit identification variable Id and the stratification variables State and Region from the input data set HHFrame.
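One way to inspect the result is sketched below. Under simple random sampling within strata, a household in stratum h (with N_h frame households and n_h sampled) has selection probability n_h / N_h and sampling weight N_h / n_h, so the weights in each stratum should sum to that stratum's frame count. The sketch assumes the procedure's default output variable names, SelectionProb and SamplingWeight; if your output uses different names, adjust the VAR statements accordingly.

```sas
/* List the selected households with their probabilities and weights. */
/* SelectionProb and SamplingWeight are assumed default output names. */
proc print data=HHSample;
   var Id State Region SelectionProb SamplingWeight;
run;

/* Per stratum: N is the number sampled; the sum of the weights */
/* should equal the number of households in the frame stratum.  */
proc means data=HHSample n sum;
   class State Region;
   var SamplingWeight;
run;
```

Comparing the weight sums against stratum counts from HHFrame is a quick consistency check on the selected sample.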