When you specify the GROUPS= option, PROC SURVEYSELECT randomly assigns the observations in the DATA= input data set to groups. The OUT= output data set contains all observations in the input data set and identifies the group to which each observation is assigned. If you do not specify an ID statement, the output data set contains all variables in the input data set. If you specify an ID statement, PROC SURVEYSELECT copies the variables that you specify from the input data set to the output data set.
You can specify the name of the output data set by using the OUT= option in the PROC SURVEYSELECT statement. If you omit the OUT= option, the data set is named DATAn, where n is the smallest integer that makes the name unique.
The random assignment output data set can include the following variables:
STRATA variables, if you specify a STRATA statement
Replicate, which is the replicate identification number. This variable is included when you specify the REPS= option.
ID variables, if you specify an ID statement
GroupID, which is the group identification number. If you specify a STRATA statement, PROC SURVEYSELECT performs random assignment independently within strata, and the groups are nested within strata.
InitialSeed, which is the initial seed for random number generation
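The following statements are a minimal sketch of a stratified random assignment, not a definitive usage. The input data set (Sashelp.Class), the number of groups, the seed value, and the output data set names are illustrative assumptions. Given the description above, the output data set should contain the STRATA variable, the ID variables, and GroupID, with group numbering nested within each stratum.

```sas
/* Sort by the stratification variable first;                    */
/* the STRATA statement expects input sorted by STRATA variables */
proc sort data=sashelp.class out=class_sorted;   /* illustrative input */
   by sex;
run;

/* Randomly assign the observations to 3 groups within each stratum */
proc surveyselect data=class_sorted
                  groups=3           /* illustrative number of groups   */
                  seed=20240         /* illustrative seed, for repeatability */
                  out=assigned;      /* illustrative output name        */
   strata sex;                       /* assignment is independent within each value of Sex */
   id name age;                      /* copy only these variables to the output */
run;

proc print data=assigned;
run;
```

If you omit the ID statement in this sketch, all variables in the input data set are copied to the output data set.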
If you specify the OUTSIZE option, the random assignment output data set also includes the following variables:
Total, which is the total number of units in the data set, or the total in the stratum if you specify a STRATA statement
NGroups, which is the number of groups in the data set, or the number in the stratum if you specify a STRATA statement
GroupSize, which is the number of units in the observation’s group
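As a hedged illustration of the OUTSIZE option, the following sketch requests the size variables for an unstratified assignment. The input data set (Sashelp.Class, which has 19 observations), the number of groups, the seed value, and the output data set name are assumptions chosen for illustration. By the definitions above, Total should equal the number of observations in the input data set and NGroups should equal the GROUPS= value on every row, while GroupSize can differ from one group to another when the total is not evenly divisible.

```sas
/* Random assignment to 4 groups, with size variables added to the output */
proc surveyselect data=sashelp.class   /* illustrative input              */
                  groups=4             /* illustrative number of groups   */
                  seed=20240           /* illustrative seed               */
                  outsize              /* adds Total, NGroups, GroupSize  */
                  out=assigned_sizes;  /* illustrative output name        */
run;

/* Inspect the group assignment and the size variables */
proc print data=assigned_sizes;
   var name GroupID Total NGroups GroupSize;
run;
```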