The primary input data set for PROC SURVEYSELECT is the DATA= data set, which contains the list of units from which the sample is selected. You can use a secondary input data set to provide stratum-level design and selection information, such as sample sizes or rates, certainty size values, or stratum costs. This secondary input data set is sometimes called the SAMPSIZE= input data set. You can provide stratum sample sizes in the _NSIZE_ (or SampleSize) variable in the SAMPSIZE= data set.
The secondary input data set must contain all the STRATA variables, with the same type and length as in the DATA= data set. The STRATA groups should appear in the same order in the secondary data set as in the DATA= data set. You can name only one secondary data set in each invocation of PROC SURVEYSELECT.
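As an illustration of these requirements, the following minimal sketch supplies stratum sample sizes through a SAMPSIZE= data set. The data set names (Frame, SampleSizes, Sample), the variables CustomerID and Region, and all data values are hypothetical and chosen only for this example.

```sas
/* Hypothetical sampling frame: one row per unit, stratified by Region */
data Frame;
   length Region $ 8;
   input CustomerID Region $ @@;
   datalines;
1 East 2 East 3 East 4 East 5 East
6 West 7 West 8 West 9 West 10 West
;

/* Hypothetical secondary data set: one row per stratum.          */
/* Region has the same type and length as in the DATA= data set. */
data SampleSizes;
   length Region $ 8;
   input Region $ _NSIZE_;
   datalines;
East 2
West 3
;

/* Put both data sets in the same STRATA order */
proc sort data=Frame;       by Region; run;
proc sort data=SampleSizes; by Region; run;

/* Name the secondary data set in the SAMPSIZE= option */
proc surveyselect data=Frame method=srs sampsize=SampleSizes
                  seed=12345 out=Sample;
   strata Region;
run;
```

Declaring Region with the same LENGTH statement in both DATA steps is one simple way to keep the type and length of the STRATA variable consistent between the two data sets.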
You must name the secondary input data set in the appropriate PROC SURVEYSELECT or STRATA option, and use the designated variable name to provide the stratum-level values. For example, if you want to provide stratum-level costs for sample allocation, you name the secondary data set in the COST=SAS-data-set option in the STRATA statement. The data set must include the stratum costs in a variable named _COST_. You can use the secondary input data set for more than one option if it is appropriate for your design. For example, the secondary data set can include both stratum costs and stratum variances, which are required for optimal allocation (ALLOC=OPTIMAL).
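The following sketch shows one secondary data set serving two STRATA options at once. It assumes a small hypothetical frame (Frame) and a hypothetical stratum-level data set (StratumInfo); the variance and cost values are invented for illustration, and the total sample size of 6 is arbitrary.

```sas
/* Hypothetical sampling frame, entered already in Region order */
data Frame;
   length Region $ 8;
   input CustomerID Region $ @@;
   datalines;
1 East 2 East 3 East 4 East 5 East
6 West 7 West 8 West 9 West 10 West
;

/* Hypothetical secondary data set: one row per stratum, holding both */
/* the stratum variances (_VAR_) and the stratum costs (_COST_)       */
data StratumInfo;
   length Region $ 8;
   input Region $ _VAR_ _COST_;
   datalines;
East 4.0 10
West 9.0 25
;

/* Allocate a total sample of 6 units among the strata.            */
/* The same data set is named in both the VAR= and COST= options.  */
proc surveyselect data=Frame method=srs sampsize=6
                  seed=12345 out=AllocSample;
   strata Region / alloc=optimal var=StratumInfo cost=StratumInfo;
run;
```

Keeping the variances and costs in one data set means the STRATA variables need to be aligned with the frame only once.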
Instead of using a separate secondary input data set, you can include secondary information in the DATA= data set along with the sampling frame. When you include secondary information in the DATA= data set, name the DATA= data set in the appropriate options, and include the required variables in the DATA= data set.
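A minimal sketch of this alternative follows. It assumes a hypothetical frame (FrameWithSizes) that carries the stratum sample size in _NSIZE_ on every row; repeating the value on every row of a stratum is a choice made here for clarity, not a stated requirement.

```sas
/* Hypothetical frame that also carries the stratum sample sizes */
data FrameWithSizes;
   length Region $ 8;
   input CustomerID Region $ @@;
   /* Illustrative sizes, repeated on every row of the stratum */
   if Region = 'East' then _NSIZE_ = 2;
   else _NSIZE_ = 3;
   datalines;
1 East 2 East 3 East 4 East 5 East
6 West 7 West 8 West 9 West 10 West
;

/* The DATA= data set is also named in the SAMPSIZE= option */
proc surveyselect data=FrameWithSizes method=srs sampsize=FrameWithSizes
                  seed=12345 out=Sample2;
   strata Region;
run;
```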
Table 102.3 lists the available secondary data set variables, together with their descriptions and the corresponding options.
Table 102.3: PROC SURVEYSELECT Secondary Data Set Variables
Variable | Description | Statement | Option |
---|---|---|---|
_ALLOC_ | Allocation proportion | STRATA | ALLOC= |
_CERTP_ | Certainty proportion | PROC | CERTSIZE=P= |
_CERTSIZE_ | Certainty size | PROC | CERTSIZE= |
_COST_ | Cost | STRATA | COST= |
_MAXSIZE_ | Maximum size | PROC | MAXSIZE= |
_MINSIZE_ | Minimum size | PROC | MINSIZE= |
_NSIZE_ | Sample size | PROC | SAMPSIZE= |
_RATE_ | Sampling rate | PROC | SAMPRATE= |
_SEED_ | Random number seed | PROC | SEED= |
_VAR_ | Variance | STRATA | VAR= |