-
CERTSIZE
-
requests certainty selection, where the certainty size values are provided in the secondary input data set. Use the CERTSIZE option when you have already named the secondary data set in another option, such as the SAMPSIZE=SAS-data-set option. See the section Secondary Input Data Set for details.
The CERTSIZE option is available for METHOD=PPS and METHOD=PPS_SAMPFORD. The CERTSIZE option is not available with the SAMPLINGUNIT statement.
In certainty selection, PROC SURVEYSELECT automatically selects all sampling units that have size measures greater than or
equal to the stratum certainty size values. After identifying the certainty units, PROC SURVEYSELECT selects the remainder
of the sample according to the method that is specified in the METHOD= option.
You provide the stratum certainty size values in the secondary input data set variable _CERTSIZE_. Each certainty size value must be a positive number. The variable Certain in the OUT= data set identifies the certainty selections, which have selection probabilities equal to 1.
If you want to specify a single certainty size value for all strata, you can use the CERTSIZE=certain option.
-
CERTSIZE=certain
-
specifies the certainty size value, which must be a positive number. PROC SURVEYSELECT automatically selects all sampling
units that have size measures greater than or equal to the value certain. After identifying the certainty units, PROC SURVEYSELECT selects the remainder of the sample according to the method that
is specified in the METHOD= option.
The CERTSIZE= option is available for METHOD=PPS and METHOD=PPS_SAMPFORD. The CERTSIZE= option is not available with the SAMPLINGUNIT statement.
The variable Certain in the OUT= data set identifies the certainty selections, which have selection probabilities equal to 1.
If you request a stratified sample design with the STRATA statement and specify the CERTSIZE=certain option, PROC SURVEYSELECT uses the value certain for all strata. If you do not want to use the same certainty size for all strata, use the CERTSIZE=SAS-data-set option to specify a certainty size value for each stratum.
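The certainty rule itself is a simple threshold comparison. The following Python sketch is a hypothetical illustration, not PROC SURVEYSELECT's implementation: the function `split_certainty` and the size values are made up for this example. It returns the positions of the certainty units (which would have selection probability 1) and the positions of the units that remain for the METHOD= selection.

```python
def split_certainty(sizes, certain):
    """Split unit positions into certainty selections and the remainder.

    A unit is a certainty selection when its size measure is greater than
    or equal to the certainty size value `certain`, which must be positive.
    (Hypothetical helper for illustration only.)
    """
    if certain <= 0:
        raise ValueError("certainty size value must be a positive number")
    certainty = [i for i, s in enumerate(sizes) if s >= certain]
    remainder = [i for i, s in enumerate(sizes) if s < certain]
    return certainty, remainder


sizes = [120, 4500, 80, 5000, 310]
certainty, remainder = split_certainty(sizes, certain=4500)
print(certainty)  # [1, 3]
print(remainder)  # [0, 2, 4]
```

Only the units in `remainder` would then be passed to the PPS or Sampford selection.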
-
CERTSIZE=SAS-data-set
-
names a SAS data set that contains certainty size values for the strata. PROC SURVEYSELECT automatically selects all sampling
units that have size measures greater than or equal to the stratum certainty size values. After identifying the certainty
units, PROC SURVEYSELECT selects the remainder of the sample according to the method that is specified in the METHOD= option.
The CERTSIZE= option is available for METHOD=PPS and METHOD=PPS_SAMPFORD. The CERTSIZE= option is not available with the SAMPLINGUNIT statement.
You provide the stratum certainty size values in the CERTSIZE= data set variable _CERTSIZE_. Each certainty size value must be a positive number. The variable Certain in the OUT= data set identifies the certainty selections, which have selection probabilities equal to 1.
The CERTSIZE= input data set should contain all the STRATA variables, with the same type and length as in the DATA= data set. The STRATA groups should appear in the same order in the CERTSIZE= data set as in the DATA= data set. The CERTSIZE= data set must include a variable named _CERTSIZE_ that contains the certainty size value for each stratum. The CERTSIZE= data set is a secondary input data set. See the section Secondary Input Data Set for details. You can name only one secondary input data set in each invocation of the procedure.
If you want to specify a single certainty size value for all strata, you can use the CERTSIZE=certain option.
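To make the per-stratum lookup and the same-order requirement concrete, here is a hypothetical Python sketch. It assumes the secondary data set is represented as a list of `(stratum, certsize)` pairs with exactly one row per stratum, standing in for the _CERTSIZE_ variable; the function name and data are illustrative only and do not describe how PROC SURVEYSELECT reads the data set.

```python
def certainty_by_stratum(rows, certsize_table):
    """Identify certainty units by using a separate certainty size per stratum.

    rows: list of (stratum, size_measure) pairs, in DATA= order.
    certsize_table: list of (stratum, certsize) pairs, one per stratum
        (a stand-in for the _CERTSIZE_ variable; hypothetical layout).
    """
    # Strata in order of first appearance in the input rows.
    data_order = []
    for stratum, _ in rows:
        if stratum not in data_order:
            data_order.append(stratum)

    # The secondary table must list the strata in the same order.
    table_order = [stratum for stratum, _ in certsize_table]
    if table_order != data_order:
        raise ValueError("strata must appear in the same order as in the input data")

    certsize = dict(certsize_table)
    if any(value <= 0 for value in certsize.values()):
        raise ValueError("each certainty size value must be a positive number")

    # A unit is a certainty selection when its size reaches its stratum's value.
    return [i for i, (stratum, size) in enumerate(rows) if size >= certsize[stratum]]


rows = [("East", 900), ("East", 150), ("West", 300), ("West", 40)]
table = [("East", 800), ("West", 250)]
print(certainty_by_stratum(rows, table))  # [0, 2]
```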
-
CERTSIZE=P
-
requests certainty proportion selection, where the stratum certainty proportions are provided in the secondary input data
set. Use the CERTSIZE=P option when you have already named the secondary data set in another option, such as the SAMPSIZE=SAS-data-set option. See the section Secondary Input Data Set for details.
The CERTSIZE=P option is available for METHOD=PPS and METHOD=PPS_SAMPFORD. The CERTSIZE=P option is not available with the SAMPLINGUNIT statement.
In certainty proportion selection, PROC SURVEYSELECT automatically selects all sampling units that have size measures greater
than or equal to the stratum certainty proportion of the total stratum size. The procedure repeats this process with the remaining
units until no more certainty units are selected. After identifying the certainty units, PROC SURVEYSELECT selects the remainder
of the sample according to the method that is specified in the METHOD= option.
You provide the stratum certainty proportions in the secondary input data set variable _CERTP_. Each certainty proportion must be a positive number. You can specify a proportion value as a number between 0 and 1. Or you can specify a proportion value in percentage form as a number between 1 and 100, and PROC SURVEYSELECT converts that number to a proportion. The procedure treats the value 1 as 100% instead of 1%.
The variable Certain in the OUT= data set identifies the certainty selections, which have selection probabilities equal to 1.
If you want to specify a single certainty proportion for all strata, you can use the CERTSIZE=P=p option.
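The conversion rule for certainty proportion values can be written as a small function. The helper `to_proportion` below is hypothetical and exists only to make the rule concrete; it additionally assumes that values above 100 are invalid, which the documentation implies but does not state outright.

```python
def to_proportion(value):
    """Convert a certainty proportion value to the 0-1 scale.

    Values between 0 and 1 are proportions; values between 1 and 100 are
    percentages. The value 1 is treated as 100% (proportion 1.0), not 1%.
    Assumption: values above 100 are rejected.
    """
    if value <= 0 or value > 100:
        raise ValueError("certainty proportion must be in (0, 100]")
    return value / 100 if value > 1 else value


print(to_proportion(0.25))  # 0.25
print(to_proportion(25))    # 0.25
print(to_proportion(1))     # 1 (that is, 100%)
```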
-
CERTSIZE=P=p
-
specifies the certainty proportion. PROC SURVEYSELECT automatically selects all sampling units that have size measures greater
than or equal to the proportion p of the total stratum size. The procedure repeats this process with the remaining units until no more certainty units are
selected. After identifying the certainty units, PROC SURVEYSELECT selects the remainder of the sample according to the method
that is specified in the METHOD= option.
The CERTSIZE=P= option is available for METHOD=PPS and METHOD=PPS_SAMPFORD. The CERTSIZE=P= option is not available with the SAMPLINGUNIT statement.
The value of the certainty proportion p must be a positive number. You can specify p as a number between 0 and 1. Or you can specify p in percentage form as a number between 1 and 100, and PROC SURVEYSELECT converts that number to a proportion. The procedure
treats the value 1 as 100% instead of 1%.
The variable Certain in the OUT= data set identifies the certainty selections, which have selection probabilities equal to 1.
If you request a stratified sample design with the STRATA statement and specify the CERTSIZE=P=p option, PROC SURVEYSELECT uses the certainty proportion p for all strata. If you do not want to use the same certainty proportion for all strata, use the CERTSIZE=P=SAS-data-set option to specify a certainty proportion for each stratum.
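The repeated certainty pass can be sketched in Python for a single stratum. This is a hypothetical illustration, not the procedure's code: it reads "repeats this process with the remaining units" as recomputing the total size from the units that are not yet certainty selections, and it expects `p` already on the 0-1 scale.

```python
def certainty_proportion(sizes, p):
    """Identify certainty units for a certainty proportion p (0 < p <= 1).

    Each pass compares every remaining unit with p times the total size of
    the remaining units; passes repeat until one selects no new units.
    Returns (sorted certainty positions, remaining positions).
    """
    if not 0 < p <= 1:
        raise ValueError("p must be a proportion between 0 and 1")
    remaining = list(range(len(sizes)))
    certainty = []
    while remaining:
        total = sum(sizes[i] for i in remaining)
        new = {i for i in remaining if sizes[i] >= p * total}
        if not new:
            break  # no more certainty units
        certainty.extend(sorted(new))
        remaining = [i for i in remaining if i not in new]
    return sorted(certainty), remaining


sizes = [70, 20, 4, 3, 3]
print(certainty_proportion(sizes, 0.5))  # ([0, 1], [2, 3, 4])
```

With these values, the first pass selects the unit of size 70 (at least half of 100), the second pass selects the unit of size 20 (at least half of the remaining total of 30), and the third pass selects nothing, so the iteration stops.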
-
CERTSIZE=P=SAS-data-set
-
names a SAS data set that contains certainty proportions for the strata. PROC SURVEYSELECT automatically selects all sampling
units with size measures greater than or equal to the certainty proportion of the total stratum size. The procedure repeats
this process with the remaining units until no more certainty units are selected. After identifying the certainty units, PROC
SURVEYSELECT selects the remainder of the sample according to the method that is specified in the METHOD= option.
The CERTSIZE=P= option is available for METHOD=PPS and METHOD=PPS_SAMPFORD. The CERTSIZE=P= option is not available with the SAMPLINGUNIT statement.
You provide the stratum certainty proportions in the CERTSIZE=P= data set variable _CERTP_. Each certainty proportion must be a positive number. You can specify a proportion value as a number between 0 and 1. Or you can specify a proportion value in percentage form as a number between 1 and 100, and PROC SURVEYSELECT converts that number to a proportion. The procedure treats the value 1 as 100% instead of 1%.
The variable Certain in the OUT= data set identifies the certainty selections, which have selection probabilities equal to 1.
The CERTSIZE=P= input data set should contain all the STRATA variables, with the same type and length as in the DATA= data set. The STRATA groups should appear in the same order in the CERTSIZE=P= data set as in the DATA= data set. The CERTSIZE=P= data set must include a variable named _CERTP_ that contains the certainty proportion for each stratum. The CERTSIZE=P= data set is a secondary input data set. See the section Secondary Input Data Set for details. You can name only one secondary input data set in each invocation of the procedure.
If you want to specify a single certainty proportion for all strata, you can use the CERTSIZE=P=p option.
-
DATA=SAS-data-set
-
names the SAS data set from which PROC SURVEYSELECT selects the sample. If you omit the DATA= option, the procedure uses the
most recently created SAS data set. In sampling terminology, the input data set is the sampling frame (the list of units from which the sample is selected).
By default, the procedure uses input data set observations as sampling units and selects a sample of these units. Alternatively,
you can use the SAMPLINGUNIT statement to define sampling units as groups of observations (clusters).
-
JTPROBS
-
includes joint probabilities of selection in the OUT= output data set. This option is available for the following probability
proportional to size selection methods: METHOD=PPS, METHOD=PPS_SAMPFORD, and METHOD=PPS_WR. By default, PROC SURVEYSELECT outputs joint selection probabilities for METHOD=PPS_BREWER and METHOD=PPS_MURTHY, which select two units per stratum.
For details about computation of joint selection probabilities for a particular sampling method, see the method description
in the section Sample Selection Methods. For more information about the contents of the output data set, see the section Sample Output Data Set.
-
MAXSIZE
-
requests adjustment of size measures according to the stratum maximum size values provided in the secondary input data set.
Use the MAXSIZE option when you have already named the secondary input data set in another option, such as the SAMPSIZE=SAS-data-set option. See the section Secondary Input Data Set for details.
The MAXSIZE option is available when you use size measures for any PPS selection method and also include a STRATA statement. You provide size measures by specifying the SIZE statement or the PPS option in the SAMPLINGUNIT statement.
You provide the stratum maximum size values in the secondary input data set variable _MAXSIZE_. Each maximum size value must be a positive number.
When a size measure exceeds the specified maximum value for its stratum, PROC SURVEYSELECT adjusts the size measure downward to equal the maximum size value. If your sampling units are individual observations, the variable AdjustedSize in the OUT= data set contains the adjusted size measures.
If you use a SAMPLINGUNIT statement to define sampling units (clusters), then the procedure applies the MAXSIZE adjustment to the sampling unit size.
The sampling unit size equals the number of observations in the sampling unit if you specify the PPS option, or the sum of the observation size measures if you specify a SIZE statement. The output data set variable UnitSize contains the adjusted sampling unit size measures.
If you want to specify a single maximum size value for all strata, you can use the MAXSIZE=max option.
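For clusters, the adjustment applies to the sampling unit size rather than to each observation. The hypothetical Python sketch below models one stratum: `max_size` stands in for that stratum's _MAXSIZE_ value, and the `use_counts` flag is an illustrative switch between the PPS option (count of observations) and a SIZE statement (sum of size measures). It is not how PROC SURVEYSELECT is implemented.

```python
def adjusted_unit_sizes(clusters, use_counts, max_size):
    """Compute sampling unit sizes for clusters and cap them at max_size.

    clusters: dict mapping a sampling unit ID to its observations' size measures.
    use_counts: True mimics the PPS option (unit size = number of observations);
        False mimics a SIZE statement (unit size = sum of size measures).
    Returns a dict of adjusted unit sizes, analogous to UnitSize.
    """
    if max_size <= 0:
        raise ValueError("maximum size value must be a positive number")
    adjusted = {}
    for unit, obs_sizes in clusters.items():
        size = len(obs_sizes) if use_counts else sum(obs_sizes)
        adjusted[unit] = min(size, max_size)  # downward adjustment only
    return adjusted


clusters = {"A": [3, 4, 5], "B": [1, 1], "C": [10, 2, 8, 6]}
print(adjusted_unit_sizes(clusters, use_counts=False, max_size=12))  # {'A': 12, 'B': 2, 'C': 12}
print(adjusted_unit_sizes(clusters, use_counts=True, max_size=3))    # {'A': 3, 'B': 2, 'C': 3}
```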
-
MAXSIZE=max
-
specifies the maximum size value. The value of max must be a positive number.
When a size measure exceeds the value max, PROC SURVEYSELECT adjusts the size measure downward to equal max. If your sampling units are individual observations, the variable AdjustedSize in the OUT= data set contains the adjusted size measures.
If you use a SAMPLINGUNIT statement to define sampling units (clusters), then the procedure applies the MAXSIZE adjustment to the sampling unit size.
The sampling unit size equals the number of observations in the sampling unit if you specify the PPS option, or the sum of the observation size measures if you specify a SIZE statement. The output data set variable UnitSize contains the adjusted sampling unit size measures.
The MAXSIZE=max option is available when you use size measures for any PPS selection method. You provide size measures by specifying the
SIZE statement or the PPS option in the SAMPLINGUNIT statement.
If you request a stratified sample design with the STRATA statement and specify the MAXSIZE=max option, PROC SURVEYSELECT uses the maximum size max for all strata. If you do not want to use the same maximum size for all strata, use the MAXSIZE=SAS-data-set option to specify a maximum size value for each stratum.
-
MAXSIZE=SAS-data-set
-
names a SAS data set that contains maximum size values for the strata. You provide the stratum maximum size values in the MAXSIZE= data set variable _MAXSIZE_. Each maximum size value must be a positive number.
The MAXSIZE=SAS-data-set option is available when you use size measures for any PPS selection method and also include a STRATA statement. You provide size measures by specifying the SIZE statement or the PPS option in the SAMPLINGUNIT statement.
When a size measure exceeds the maximum size value for its stratum, PROC SURVEYSELECT adjusts the size measure downward to equal the maximum size value. If your sampling units are individual observations, the variable AdjustedSize in the OUT= data set contains the adjusted size measures.
If you use a SAMPLINGUNIT statement to define sampling units (clusters), then the procedure applies the MAXSIZE adjustment to the sampling unit size.
The sampling unit size equals the number of observations in the sampling unit if you specify the PPS option, or the sum of the observation size measures if you specify a SIZE statement. The output data set variable UnitSize contains the adjusted sampling unit size measures.
The MAXSIZE= input data set should contain all the STRATA variables, with the same type and length as in the DATA= data set. The STRATA groups should appear in the same order in the MAXSIZE= data set as in the DATA= data set. The MAXSIZE= data set must include a variable named _MAXSIZE_ that contains the maximum size value for each stratum. The MAXSIZE= data set is a secondary input data set. See the section Secondary Input Data Set for details. You can name only one secondary input data set in each invocation of the procedure.
If you want to specify a single maximum size value for all strata, you can use the MAXSIZE=max option.
-
METHOD=name
M=name
-
specifies the method for sample selection.
If you do not specify the METHOD= option, PROC SURVEYSELECT uses simple random sampling (METHOD=SRS) by default unless you specify a SIZE statement or the PPS option in the SAMPLINGUNIT statement. If you do specify a SIZE statement (or the PPS option), PROC SURVEYSELECT uses probability proportional to size
selection without replacement (METHOD=PPS) by default.
The following values are available for the METHOD= option:
-
BERNOULLI
-
requests Bernoulli sampling, which consists of N independent selection trials, each with the same constant inclusion probability, where N is the total number of sampling units in the stratum or data set. The sample size is not fixed but is a random variable.
See the section Bernoulli Sampling for details.
When you specify METHOD=BERNOULLI, you must provide the sampling rate (the constant inclusion probability) by using the SAMPRATE= option. For stratified sampling (which you request with the STRATA statement), you can specify the same sampling rate for each stratum with the SAMPRATE=r option. Or you can specify different sampling rates for different strata by using the SAMPRATE=(values) or SAMPRATE=SAS-data-set option.
Because Bernoulli sampling is based on a specified inclusion probability instead of a fixed sample size, METHOD=BERNOULLI
does not use the SAMPSIZE= option. Also, the ALLOC= option in the STRATA statement (which allocates the total sample size among strata) is not available with METHOD=BERNOULLI.
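Bernoulli sampling amounts to one independent coin flip per unit. The following Python sketch is a hypothetical illustration of that idea; it uses Python's `random` module, so it does not reproduce samples selected by PROC SURVEYSELECT even with the same seed value. The function name and arguments are assumptions for this example.

```python
import random


def bernoulli_sample(n_units, rate, seed=None):
    """Run one independent selection trial per unit with a constant inclusion probability."""
    if not 0 < rate <= 1:
        raise ValueError("rate must be in (0, 1]")
    rng = random.Random(seed)
    # rng.random() is uniform on [0, 1), so each unit is kept with probability `rate`.
    return [i for i in range(n_units) if rng.random() < rate]


sample = bernoulli_sample(1000, 0.1, seed=2024)
print(len(sample))  # varies with the seed; the expected sample size is 1000 * 0.1 = 100
```

Because the sample size is random, there is no sample size argument to supply, which mirrors why METHOD=BERNOULLI does not use SAMPSIZE=.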
-
POISSON
-
requests Poisson sampling. A generalization of Bernoulli sampling, Poisson sampling consists of N independent selection trials with a separate inclusion probability specified for each unit, where N is the total number of sampling units in the stratum or data set. The sample size is not fixed but is a random variable.
See the section Poisson Sampling for details.
You must provide inclusion probabilities for Poisson sampling in the SIZE variable. The probability values should be between 0 and 1. If a value of the SIZE variable is missing, nonpositive, or greater
than 1, PROC SURVEYSELECT omits the observation from sample selection.
Because Poisson sampling is based on specified inclusion probabilities instead of a fixed sample size, METHOD=POISSON does
not use the SAMPSIZE= option. Also, the ALLOC= option in the STRATA statement (which allocates the total sample size among strata) is not available with METHOD=POISSON.
The SAMPLINGUNIT statement is not available with METHOD=POISSON.
When METHOD=POISSON is specified with the SAMPRATE= option and without a SIZE statement, PROC SURVEYSELECT uses METHOD=BERNOULLI.
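Poisson sampling generalizes the Bernoulli coin flip by giving each unit its own probability. In this hypothetical Python sketch, the list `probs` plays the role of the SIZE variable, and units whose value is missing (`None` or NaN), nonpositive, or greater than 1 are skipped, mirroring the omission rule for the SIZE variable. It uses Python's `random` module and is not PROC SURVEYSELECT's implementation.

```python
import math
import random


def poisson_sample(probs, seed=None):
    """Run one independent trial per unit, each with its own inclusion probability.

    Units with a missing, nonpositive, or greater-than-1 probability are omitted.
    """
    rng = random.Random(seed)
    selected = []
    for i, p in enumerate(probs):
        if p is None or math.isnan(p) or p <= 0 or p > 1:
            continue  # omitted from sample selection
        if rng.random() < p:
            selected.append(i)
    return selected


print(poisson_sample([1.0, None, -0.2, 1.5, 1.0], seed=3))  # [0, 4]
```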
-
PPS
-
requests selection with probability proportional to size and without replacement. See the section PPS Sampling without Replacement for details. If you specify METHOD=PPS, you must name a size measure variable in the SIZE statement or specify the PPS option in the SAMPLINGUNIT statement.
-
PPS_BREWER | BREWER
-
requests selection according to Brewer’s method. Brewer’s method selects two units from each stratum with probability proportional
to size and without replacement. See the section Brewer’s PPS Method for details. If you specify METHOD=PPS_BREWER, you must name a size measure variable in the SIZE statement or specify the PPS option in the SAMPLINGUNIT statement. You do not need to specify the sample size with the SAMPSIZE= option because Brewer’s method selects two units from each stratum.
-
PPS_MURTHY | MURTHY
-
requests selection according to Murthy’s method. Murthy’s method selects two units from each stratum with probability proportional
to size and without replacement. See the section Murthy’s PPS Method for details. If you specify METHOD=PPS_MURTHY, you must name a size measure variable in the SIZE statement or specify the PPS option in the SAMPLINGUNIT statement. You do not need to specify the sample size with the SAMPSIZE= option because Murthy’s method selects two units from each stratum.
-
PPS_SAMPFORD | SAMPFORD
-
requests selection according to Sampford’s method. Sampford’s method selects units with probability proportional to size and
without replacement. See the section Sampford’s PPS Method for details. If you specify METHOD=PPS_SAMPFORD, you must name a size measure variable in the SIZE statement or specify the PPS option in the SAMPLINGUNIT statement.
-
PPS_SEQ | CHROMY
-
requests sequential selection with probability proportional to size and with minimum replacement. This method is also known
as Chromy’s method. See the section PPS Sequential Sampling for details. If you specify METHOD=PPS_SEQ, you must name a size measure variable in the SIZE statement or specify the PPS option in the SAMPLINGUNIT statement.
-
PPS_SYS
-
requests systematic selection with probability proportional to size. See the section PPS Systematic Sampling for details. If you specify METHOD=PPS_SYS, you must name a size measure variable in the SIZE statement or specify the PPS option in the SAMPLINGUNIT statement.
-
PPS_WR
-
requests selection with probability proportional to size and with replacement. See the section PPS Sampling with Replacement for details. If you specify METHOD=PPS_WR, you must name a size measure variable in the SIZE statement or specify the PPS option in the SAMPLINGUNIT statement.
-
SEQ
-
requests sequential selection according to Chromy’s method. If you specify METHOD=SEQ and do not specify a SIZE statement (or the PPS option in the SAMPLINGUNIT statement), PROC SURVEYSELECT uses sequential zoned selection with equal probability and without replacement. See the section Sequential Random Sampling for details.
If you specify METHOD=SEQ and also specify a SIZE statement (or the PPS option in the SAMPLINGUNIT statement), PROC SURVEYSELECT uses METHOD=PPS_SEQ, which is sequential selection with probability proportional to size and with minimum replacement. See the section PPS Sequential Sampling for more information.
-
SRS
-
requests simple random sampling, which is selection with equal probability and without replacement. See the section Simple Random Sampling for details. METHOD=SRS is the default if you do not specify the METHOD= option and also do not specify a SIZE statement (or the PPS option in the SAMPLINGUNIT statement).
-
SYS
-
requests systematic random sampling. If you specify METHOD=SYS and do not specify a SIZE statement (or the PPS option in the SAMPLINGUNIT statement), PROC SURVEYSELECT uses systematic selection with equal probability. See the section Systematic Random Sampling for more information.
If you specify METHOD=SYS and also specify a SIZE statement (or the PPS option in the SAMPLINGUNIT statement), PROC SURVEYSELECT uses METHOD=PPS_SYS, which is systematic selection with probability proportional to size. See the section PPS Systematic Sampling for details.
-
URS
-
requests unrestricted random sampling, which is selection with equal probability and with replacement. See the section Unrestricted Random Sampling for details.
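Because unrestricted random sampling draws with replacement, the same unit can be drawn more than once. This hypothetical Python sketch draws `n` units with equal probability and tallies the hits per unit, analogous to the NumberHits variable in the output data set. It uses Python's `random` module and is only an illustration of the technique, not the procedure's code.

```python
import random
from collections import Counter


def urs_sample(n_units, n, seed=None):
    """Draw n units with equal probability and with replacement.

    Returns a dict mapping each selected unit to its number of hits.
    """
    rng = random.Random(seed)
    draws = [rng.randrange(n_units) for _ in range(n)]
    return dict(sorted(Counter(draws).items()))


hits = urs_sample(n_units=5, n=8, seed=11)
print(hits)  # units drawn more than once have a hit count greater than 1
```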
-
MINSIZE
-
requests adjustment of size measures according to the stratum minimum size values provided in the secondary input data set.
Use the MINSIZE option when you have already named the secondary input data set in another option, such as the SAMPSIZE=SAS-data-set option. See the section Secondary Input Data Set for details.
The MINSIZE option is available when you use size measures for any PPS selection method and also include a STRATA statement. You provide size measures by specifying the SIZE statement or the PPS option in the SAMPLINGUNIT statement.
You provide the stratum minimum size values in the secondary input data set variable _MINSIZE_. Each minimum size value must be a positive number.
When a size measure is less than the specified minimum value for its stratum, PROC SURVEYSELECT adjusts the size measure upward to equal the minimum size value. If your sampling units are individual observations, the variable AdjustedSize in the OUT= data set contains the adjusted size measures.
If you use a SAMPLINGUNIT statement to define sampling units (clusters), then the procedure applies the MINSIZE adjustment to the sampling unit size.
The sampling unit size equals the number of observations in the sampling unit if you specify the PPS option, or the sum of the observation size measures if you specify a SIZE statement. The output data set variable UnitSize contains the adjusted sampling unit size measures.
If you want to specify a single minimum size value for all strata, you can use the MINSIZE=min option.
-
MINSIZE=min
-
specifies the minimum size value. The value of min must be a positive number.
When a size measure is less than the value min, PROC SURVEYSELECT adjusts the size measure upward to equal min. If your sampling units are individual observations, the variable AdjustedSize in the OUT= data set contains the adjusted size measures.
If you use a SAMPLINGUNIT statement to define sampling units (clusters), then the procedure applies the MINSIZE adjustment to the sampling unit size.
The sampling unit size equals the number of observations in the sampling unit if you specify the PPS option, or the sum of the observation size measures if you specify a SIZE statement. The output data set variable UnitSize contains the adjusted sampling unit size measures.
The MINSIZE=min option is available when you use size measures for any PPS selection method. You provide size measures by specifying the
SIZE statement or the PPS option in the SAMPLINGUNIT statement.
If you request a stratified sample design with the STRATA statement and specify the MINSIZE=min option, PROC SURVEYSELECT uses the minimum size min for all strata. If you do not want to use the same minimum size for all strata, use the MINSIZE=SAS-data-set option to specify a minimum size value for each stratum.
-
MINSIZE=SAS-data-set
-
names a SAS data set that contains minimum size values for the strata. You provide the stratum minimum size values in the MINSIZE= data set variable _MINSIZE_. Each minimum size value must be a positive number.
The MINSIZE=SAS-data-set option is available when you use size measures for any PPS selection method and also include a STRATA statement. You provide size measures by specifying the SIZE statement or the PPS option in the SAMPLINGUNIT statement.
When a size measure is less than the minimum size value for its stratum, PROC SURVEYSELECT adjusts the size measure upward to equal the minimum size value. If your sampling units are individual observations, the variable AdjustedSize in the OUT= data set contains the adjusted size measures.
If you use a SAMPLINGUNIT statement to define sampling units (clusters), then the procedure applies the MINSIZE adjustment to the sampling unit size.
The sampling unit size equals the number of observations in the sampling unit if you specify the PPS option, or the sum of the observation size measures if you specify a SIZE statement. The output data set variable UnitSize contains the adjusted sampling unit size measures.
The MINSIZE= input data set should contain all the STRATA variables, with the same type and length as in the DATA= data set. The STRATA groups should appear in the same order in the MINSIZE= data set as in the DATA= data set. The MINSIZE= data set must include a variable named _MINSIZE_ that contains the minimum size value for each stratum. The MINSIZE= data set is a secondary input data set. See the section Secondary Input Data Set for details. You can name only one secondary input data set in each invocation of the procedure.
If you want to specify a single minimum size value for all strata, you can use the MINSIZE=min option.
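The upward adjustment with stratum-specific minimums can be illustrated in Python. In this hypothetical sketch, the dict `minsize` stands in for the _MINSIZE_ values of the secondary data set, and the returned list is analogous to the AdjustedSize variable for observation-level sampling units. The function name and data are assumptions for this example.

```python
def adjust_min_by_stratum(rows, minsize):
    """Raise each size measure to at least its stratum's minimum size value.

    rows: list of (stratum, size_measure) pairs.
    minsize: dict mapping each stratum to its minimum size value.
    """
    if any(value <= 0 for value in minsize.values()):
        raise ValueError("each minimum size value must be a positive number")
    # Sizes below the stratum minimum are raised; larger sizes are unchanged.
    return [max(size, minsize[stratum]) for stratum, size in rows]


rows = [("East", 2), ("East", 15), ("West", 1), ("West", 9)]
print(adjust_min_by_stratum(rows, {"East": 5, "West": 3}))  # [5, 15, 3, 9]
```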
-
NMAX=n
-
specifies the maximum stratum sample size n for the SAMPRATE= option. When you specify the SAMPRATE= option, PROC SURVEYSELECT calculates the stratum sample size by multiplying the total
number of units in the stratum by the specified sampling rate. If this sample size is greater than the value NMAX=n, then PROC SURVEYSELECT selects only n units.
The maximum sample size n must be a positive integer. The NMAX= option is available only with the SAMPRATE= option, which can be used with equal probability
selection methods (METHOD=SRS, METHOD=URS, METHOD=SYS, and METHOD=SEQ). The NMAX= option is not available with METHOD=BERNOULLI, where the SAMPRATE= option specifies the constant inclusion probability.
-
NMIN=n
-
specifies the minimum stratum sample size n for the SAMPRATE= option. When you specify the SAMPRATE= option, PROC SURVEYSELECT calculates the stratum sample size by multiplying the total
number of units in the stratum by the specified sampling rate. If this sample size is less than the value NMIN=n, then PROC SURVEYSELECT selects n units.
The minimum sample size n must be a positive integer. The NMIN= option is available only with the SAMPRATE= option, which can be used with equal probability
selection methods (METHOD=SRS, METHOD=URS, METHOD=SYS, and METHOD=SEQ). The NMIN= option is not available with METHOD=BERNOULLI, where the SAMPRATE= option specifies the constant inclusion probability.
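The interaction of SAMPRATE=, NMIN=, and NMAX= can be sketched as a small calculation. This hypothetical Python function makes two assumptions that the text above does not spell out: a fractional product of stratum size and rate is rounded up, and the NMIN bound is applied before the NMAX bound. It illustrates the bounding logic only.

```python
import math


def stratum_sample_size(n_units, rate, nmin=None, nmax=None):
    """Sample size implied by a sampling rate, bounded by optional NMIN and NMAX values."""
    n = math.ceil(n_units * rate)  # assumption: fractional sizes are rounded up
    if nmin is not None:
        n = max(n, nmin)  # NMIN=: select at least nmin units
    if nmax is not None:
        n = min(n, nmax)  # NMAX=: select at most nmax units
    return n


print(stratum_sample_size(40, 0.25))           # 10
print(stratum_sample_size(40, 0.25, nmin=12))  # 12
print(stratum_sample_size(40, 0.25, nmax=8))   # 8
```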
-
NOPRINT
-
suppresses the display of all output. You can use the NOPRINT option when you want only to create an output data set. Note
that this option temporarily disables the Output Delivery System (ODS). For more information, see Chapter 20: Using the Output Delivery System.
-
OUT=SAS-data-set
-
names the output data set that contains the sample. If you omit the OUT= option, the data set is named DATAn, where n is the smallest integer that makes the name unique.
The output data set contains the units that are selected for the sample, in addition to design information and selection statistics,
depending on the selection method and output options that you request. See descriptions of the options JTPROBS, OUTALL, OUTHITS, OUTSEED, OUTSIZE, and STATS, which specify information to include in the output data set. See the section Sample Output Data Set for details about the contents of the output data set.
By default, the output data set contains only those units that are selected for the sample. To include all observations from
the input data set in the output data set, use the OUTALL option.
By default, the output data set includes one copy of each selected unit, even when a unit is selected more than once, which can occur when you use with-replacement or with-minimum-replacement selection methods. For with-replacement or with-minimum-replacement selection methods, the output data set includes a variable NumberHits that records the number of hits (selections) for each unit. To include a distinct copy of each selection in the output data set when the same unit is selected more than once, use the OUTHITS option.
If you specify the NOSAMPLE option in the STRATA statement, PROC SURVEYSELECT allocates the total sample size among the strata but does not select the sample. In this case, the OUT= data set contains the allocated sample sizes. See the section Allocation Output Data Set for details.
-
OUTALL
-
includes all observations from the DATA= input data set in the OUT= output data set. By default, the output data set includes only those units selected for the sample. When you specify the OUTALL option, the output data set includes all observations from the input data set and also contains a variable that indicates each observation’s selection status. The variable Selected equals 1 for an observation that is selected for the sample, and equals 0 for an observation that is not selected. For information about the contents of the output data set, see the section Sample Output Data Set.
The OUTALL option is available for equal probability selection methods (METHOD=SRS, METHOD=URS, METHOD=SYS, METHOD=SEQ, and METHOD=BERNOULLI). The OUTALL option is also available for METHOD=POISSON.
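The selection indicator that OUTALL adds can be pictured as a flag attached to every input unit. This hypothetical Python sketch pairs each unit with 1 if it was sampled and 0 otherwise, in the spirit of the Selected variable; the function name and data layout are assumptions for illustration.

```python
def flag_selection(units, selected):
    """Pair every input unit with a selection flag: 1 if sampled, else 0."""
    chosen = set(selected)
    return [(unit, 1 if unit in chosen else 0) for unit in units]


print(flag_selection(["a", "b", "c", "d"], ["b", "d"]))
# [('a', 0), ('b', 1), ('c', 0), ('d', 1)]
```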
-
OUTHITS
-
includes a distinct copy of each selected unit in the OUT= output data set when the same sampling unit is selected more than once. By default, the output data set contains a single copy of each unit selected, even when a unit is selected more than once, and the variable NumberHits records the number of hits (selections) for each unit. If you specify the OUTHITS option, the output data set contains m copies of a sampling unit for which NumberHits equals m. For example, with the OUTHITS option a unit that is selected three times is represented by three copies in the output data set.
A sampling unit can be selected more than once by with-replacement and with-minimum-replacement selection methods, which include
METHOD=URS, METHOD=PPS_WR, METHOD=PPS_SYS, and METHOD=PPS_SEQ. The OUTHITS option is available for these selection methods.
See the section Sample Output Data Set for details about the contents of the output data set.
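The difference between the default output and OUTHITS output can be shown as an expansion of hit counts. In this hypothetical Python sketch, the input dict maps each selected unit to its NumberHits value, and the result contains one entry per selection, as OUTHITS does for output rows.

```python
def expand_hits(number_hits):
    """Turn a unit -> NumberHits mapping into one row per selection."""
    rows = []
    for unit, hits in number_hits.items():
        rows.extend([unit] * hits)  # a unit with m hits contributes m copies
    return rows


print(expand_hits({"A": 1, "B": 3, "C": 2}))  # ['A', 'B', 'B', 'B', 'C', 'C']
```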
-
OUTSEED
-
includes the initial seed for each stratum in the OUT= output data set. The variable InitialSeed contains the stratum initial seeds. See the section Sample Output Data Set for details about the contents of the output data set.
To reproduce the same sample for any stratum in a subsequent execution of PROC SURVEYSELECT, you can specify the same stratum
initial seed with the SEED=SAS-data-set option, along with the same sample selection parameters. See the section Random Number Generation for more information.
The “Sample Selection Summary” table displays the initial random number seed for the entire sample selection, which is the same as the initial seed for
the first stratum when the design is stratified. To reproduce the entire sample, you can specify this same seed value in the
SEED= option, along with the same sample selection parameters.
Beginning in SAS/STAT 12.1, PROC SURVEYSELECT uses the Mersenne-Twister random number generator by default. In previous releases,
PROC SURVEYSELECT used the RANUNI random number generator, which you can now request by specifying the RANUNI option. To reproduce samples that PROC SURVEYSELECT selected in releases prior to SAS/STAT 12.1, specify the RANUNI option
with the SEED= option (for the same input data set and sample selection parameters).
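The reproducibility principle (same seed plus same selection parameters gives the same sample) can be illustrated in Python, whose random.Random class is also a Mersenne Twister (MT19937) generator. This is only an analogy: Python seeds and consumes its stream differently from SAS, so it will not reproduce any PROC SURVEYSELECT sample. The function name select_srs, the 100-unit frame, and the seed 39647 are arbitrary, hypothetical choices.

```python
import random


def select_srs(frame, n, seed):
    """Draw a simple random sample without replacement from a seeded stream."""
    rng = random.Random(seed)  # CPython's Random is a Mersenne Twister (MT19937)
    return sorted(rng.sample(frame, n))


frame = list(range(1, 101))  # hypothetical frame of 100 unit IDs
first_run = select_srs(frame, 10, seed=39647)
second_run = select_srs(frame, 10, seed=39647)
print(first_run == second_run)  # True: same seed and same parameters
```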
-
OUTSIZE
-
includes additional design and sampling frame information in the OUT= output data set.
If you use a STRATA statement, the OUTSIZE option provides stratum-level values in the output data set. Otherwise, the OUTSIZE option provides
overall values.
The OUTSIZE option includes the sample size or sampling rate in the output data set, depending on whether you specify the
SAMPSIZE= option or the SAMPRATE= option, respectively. For PPS selection methods, the OUTSIZE option includes the total size measure in the output data set.
If you do not provide size measures, or if you specify a SAMPLINGUNIT statement, the OUTSIZE option includes the total number of sampling units.
If you request size measure adjustment or certainty selection, the OUTSIZE option includes the following information in the
output data set: the minimum size measure if you specify the MINSIZE= option, the maximum size measure if you specify the MAXSIZE= option, the certainty size measure if you specify the CERTSIZE= option, and the certainty proportion if you specify the CERTSIZE=P= option.
For METHOD=BERNOULLI, the OUTSIZE option includes the following information in the output data set: total number of sampling units, selection
probability (sampling rate), expected sample size, and actual sample size. See the section Bernoulli Sampling for descriptions of these statistics.
For more information about the contents of the output data set, see the section Sample Output Data Set.
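For METHOD=BERNOULLI, the four quantities that OUTSIZE adds are easy to relate to one another. The following Python sketch is a hypothetical illustration, not SAS code: the function name and dictionary keys are invented, and only the definitions come from the text above. Each unit is kept independently with probability p, the expected sample size is N times p, and the actual sample size varies from draw to draw.

```python
import random


def bernoulli_sample(units, p, seed):
    """Select each unit independently with probability p and report summary stats."""
    if not 0 < p <= 1:
        raise ValueError("selection probability must be in (0, 1]")
    rng = random.Random(seed)
    selected = [u for u in units if rng.random() < p]
    stats = {
        "total_units": len(units),
        "selection_probability": p,
        "expected_sample_size": len(units) * p,  # N * p
        "actual_sample_size": len(selected),     # random; varies from draw to draw
    }
    return selected, stats


selected, stats = bernoulli_sample(list(range(100)), 0.2, seed=7)
print(stats)
```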
-
OUTSORT=SAS-data-set
-
names an output data set to store the sorted input data set. This option is available when you specify a CONTROL statement to sort the DATA= input data set for systematic or sequential selection methods (METHOD=SYS, METHOD=PPS_SYS, METHOD=SEQ, and METHOD=PPS_SEQ).
If you specify CONTROL variables but do not name an output data set with the OUTSORT= option, then the sorted data set replaces
the input data set.
-
RANUNI
-
requests uniform random number generation by the method of Fishman and Moore (1982), which PROC SURVEYSELECT used in releases prior to SAS/STAT 12.1. This is the same random number generator that the RANUNI function provides.
Beginning in SAS/STAT 12.1, PROC SURVEYSELECT uses the Mersenne-Twister random number generator by default. Developed by Matsumoto
and Nishimura (1998), the Mersenne-Twister random number generator has a very long period and good statistical properties. This is the random
number generator that the RAND function provides for the uniform distribution.
See the section Random Number Generation for details, and see
SAS Functions and CALL Routines: Reference for information about the RANUNI and RAND functions.
You can specify the RANUNI option with the SEED= option to reproduce samples that PROC SURVEYSELECT selected in releases prior to SAS/STAT 12.1. To reproduce a sample by
using the RANUNI and SEED= options, you must also specify the same input data set and sample selection parameters.
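The general form of a Fishman and Moore style generator can be sketched in Python as a prime-modulus multiplicative congruential generator. The modulus 2**31 - 1 and multiplier 397204094 are, to our knowledge, the constants that the SAS Functions and CALL Routines reference cites for RANUNI; treat them as an assumption to verify there. The class name is hypothetical, SAS's seed handling (including the clock fallback) is not modeled, and the sketch makes no claim to reproduce RANUNI's output.

```python
MODULUS = 2**31 - 1     # prime modulus
MULTIPLIER = 397204094  # multiplier believed to be cited for RANUNI (assumption: verify)


class PrimeModulusGenerator:
    """Multiplicative congruential generator x[k+1] = a * x[k] mod m (sketch)."""

    def __init__(self, seed: int):
        if not 0 < seed < MODULUS:
            raise ValueError("seed must lie in 1 .. 2**31 - 2")
        self.state = seed

    def next_uniform(self) -> float:
        # m is prime and 0 < a, x < m, so the new state is never 0;
        # dividing by m therefore gives a value strictly inside (0, 1).
        self.state = (MULTIPLIER * self.state) % MODULUS
        return self.state / MODULUS


gen = PrimeModulusGenerator(seed=12345)
print([round(gen.next_uniform(), 6) for _ in range(3)])
```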
-
REPS=nreps
-
specifies the number of sample replicates. The value of nreps must be a positive integer.
When you specify the REPS= option, PROC SURVEYSELECT selects nreps independent samples, each with the same sample size or sampling rate and the same sample design that you request. The variable Replicate in the OUT= data set contains the sample replicate number.
You can use replicated sampling to provide a simple method of variance estimation for any form of statistic, and also to evaluate
variable nonsampling errors such as interviewer differences. For information about replicated sampling, see Lohr (2010); Wolter (2007); Kish (1965, 1987); Kalton (1983). You can also use the REPS= option to perform a variety of other resampling and simulation tasks. See Cassell (2007) for more information.
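A minimal Python sketch of replicated sampling, assuming simple random sampling within each replicate: it draws nreps independent samples from the full frame, tags each row with its replicate number, and then applies the standard variance estimator for the mean of R independent replicate estimates, v = sum((t_r - t_bar)^2) / (R(R - 1)), as described in the replicated-sampling references above. The function names, the frame, and the seed are hypothetical; this is not SAS code.

```python
import random
from statistics import mean


def replicated_srs(frame, n, nreps, seed):
    """Select nreps independent SRS samples; return (replicate, unit) rows."""
    if nreps < 1:
        raise ValueError("nreps must be a positive integer")
    rng = random.Random(seed)
    rows = []
    for replicate in range(1, nreps + 1):
        # Each replicate is a fresh sample of size n drawn from the full frame.
        for unit in sorted(rng.sample(frame, n)):
            rows.append((replicate, unit))
    return rows


def replicate_variance(estimates):
    """Variance of the mean of R independent replicate estimates."""
    r = len(estimates)
    if r < 2:
        raise ValueError("need at least two replicates")
    t_bar = mean(estimates)
    return sum((t - t_bar) ** 2 for t in estimates) / (r * (r - 1))


frame = list(range(1, 201))  # hypothetical frame; each unit's value is its ID
rows = replicated_srs(frame, n=20, nreps=4, seed=2024)
estimates = [mean(u for rep, u in rows if rep == k) for k in range(1, 5)]
print(mean(estimates), replicate_variance(estimates))
```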
-
SAMPRATE=r
RATE=r
-
specifies the sampling rate, which is the proportion of units to select for the sample. The sampling rate r must be a positive number. You can specify r as a number between 0 and 1. Or you can specify r in percentage form as a number between 1 and 100, and PROC SURVEYSELECT converts that number to a proportion. The procedure
treats the value 1 as 100% instead of 1%.
The SAMPRATE= option is available only for equal probability selection methods (METHOD=SRS, METHOD=URS, METHOD=SYS, METHOD=SEQ, and METHOD=BERNOULLI). For systematic random sampling (METHOD=SYS), PROC SURVEYSELECT uses the inverse of the sampling rate r as the selection interval. See the section Systematic Random Sampling for details. For Bernoulli sampling (METHOD=BERNOULLI), PROC SURVEYSELECT uses the sampling rate r as the inclusion probability. See the section Bernoulli Sampling for details. For the other equal probability selection methods, PROC SURVEYSELECT converts the sampling rate r to the sample size before selection by multiplying the total number of units in the stratum or frame by the sampling rate
and rounding up to the nearest integer.
If you request a stratified sample design with the STRATA statement and specify the SAMPRATE=r option, PROC SURVEYSELECT uses the sampling rate r for each stratum. If you do not want to use the same sampling rate for each stratum, use the SAMPRATE=(values) option or the SAMPRATE=SAS-data-set option to specify a sampling rate for each stratum.
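The rate conventions above (proportion versus percentage, the special meaning of 1, rounding up, and the systematic interval 1/r) can be summarized in a short Python sketch. The function names are hypothetical, and rejecting values above 100 is an assumption of the sketch rather than documented behavior. Exact fractions are used because floating-point arithmetic can push a product such as 30 × 0.1 slightly above 3 and make a ceiling operation round up incorrectly.

```python
import math
from fractions import Fraction


def normalize_rate(r):
    """Proportion for 0 < r <= 1; percentage for 1 < r <= 100 (1 means 100%)."""
    value = Fraction(str(r))  # exact decimal value avoids float rounding in ceil()
    if value <= 0 or value > 100:  # upper bound is an assumption of this sketch
        raise ValueError("sampling rate must be positive and no greater than 100")
    return value if value <= 1 else value / 100


def sample_size_from_rate(total_units, r):
    """Sample size for SRS, URS, and SEQ: N times the rate, rounded up."""
    return math.ceil(total_units * normalize_rate(r))


def systematic_interval(r):
    """Selection interval for METHOD=SYS: the inverse of the rate."""
    return 1 / normalize_rate(r)


print(sample_size_from_rate(31, 0.1), sample_size_from_rate(31, 10))  # 4 4
print(systematic_interval(0.25))  # 4
```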
-
SAMPRATE=(values)
RATE=(values)
-
specifies stratum sampling rates, where the stratum sampling rate is the proportion of units to select from the stratum. You
can separate values with blanks or commas. The number of SAMPRATE= values must equal the number of strata in the input data set.
List the stratum sampling rate values in the order in which the strata appear in the input data set. When you use the SAMPRATE=(values) option, the input data set must be sorted by the STRATA variables in ascending order. You cannot use the DESCENDING or NOTSORTED
option in the STRATA statement.
Each stratum sampling rate value must be a nonnegative number. You can specify a rate value as a number between 0 and 1. Or you can
specify a rate value in percentage form as a number between 1 and 100, and PROC SURVEYSELECT converts that number to a proportion.
The procedure treats the value 1 as 100% instead of 1%.
To select a sample from a stratum, the value of the stratum sampling rate must be positive. If you specify a stratum sampling
rate of 0, then PROC SURVEYSELECT does not select a sample from the stratum. This has the effect of subsetting the input data
set before sample selection; the stratum that you omit is not included in the sampling frame or represented in the sample.
The SAMPRATE= option is available only for equal probability selection methods (METHOD=SRS, METHOD=URS, METHOD=SYS, METHOD=SEQ, and METHOD=BERNOULLI). For systematic random sampling (METHOD=SYS), PROC SURVEYSELECT uses the inverse of the stratum sampling rate as the stratum
selection interval. See the section Systematic Random Sampling for details. For Bernoulli sampling (METHOD=BERNOULLI), PROC SURVEYSELECT uses the stratum sampling rate as the inclusion
probability for the stratum. See the section Bernoulli Sampling for details. For the other equal probability selection methods, PROC SURVEYSELECT converts the stratum sampling rate to the
stratum sample size before selection by multiplying the total number of units in the stratum by the sampling rate and rounding
up to the nearest integer.
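For the stratified case, a hypothetical Python sketch can combine the ordering rule, the one-rate-per-stratum rule, and the treatment of a zero rate. Stratum counts are passed as (stratum, N_h) pairs in the order in which the strata appear in the input data set; all names are invented for illustration, and the sketch covers only the rate-to-size conversion used by SRS, URS, and SEQ.

```python
import math
from fractions import Fraction


def to_proportion(r):
    """Rates in [0, 1] are proportions; rates in (1, 100] are percentages."""
    value = Fraction(str(r))
    if value < 0 or value > 100:
        raise ValueError("stratum rate must be between 0 and 100")
    return value if value <= 1 else value / 100


def stratum_sizes_from_rates(stratum_counts, rates):
    """stratum_counts: (stratum, N_h) pairs in the order the strata appear in DATA=."""
    strata = [stratum for stratum, _ in stratum_counts]
    if strata != sorted(strata):
        raise ValueError("DATA= must be sorted by the STRATA variables in ascending order")
    if len(rates) != len(strata):
        raise ValueError("supply exactly one rate per stratum")
    sizes = {}
    for (stratum, n_units), r in zip(stratum_counts, rates):
        proportion = to_proportion(r)
        if proportion == 0:
            continue  # rate 0: the stratum is left out of the frame entirely
        sizes[stratum] = math.ceil(n_units * proportion)
    return sizes


counts = [("East", 40), ("North", 25), ("West", 12)]
print(stratum_sizes_from_rates(counts, [0.1, 0, 50]))  # {'East': 4, 'West': 6}
```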
-
SAMPRATE=SAS-data-set
RATE=SAS-data-set
-
names a SAS data set that contains stratum sampling rates, where the stratum sampling rate is the proportion of units to select
from the stratum. The SAMPRATE= data set should include a variable _RATE_ that contains the stratum sampling rates.
Each sampling rate value must be a nonnegative number. You can specify a rate value as a number between 0 and 1. Or you can
specify a rate value in percentage form as a number between 1 and 100, and PROC SURVEYSELECT converts that number to a proportion.
The procedure treats the value 1 as 100% instead of 1%.
To select a sample from a stratum, the value of the stratum sampling rate must be positive. If you specify a stratum sampling
rate of 0, then PROC SURVEYSELECT does not select a sample from the stratum. This has the effect of subsetting the input data
set before sample selection; the stratum that you omit is not included in the sampling frame or represented in the sample.
The SAMPRATE= input data set should contain all the STRATA variables, with the same type and length as in the DATA= data set.
The STRATA groups should appear in the same order in the SAMPRATE= data set as in the DATA= data set.
The SAMPRATE= option is available only for equal probability selection methods (METHOD=SRS, METHOD=URS, METHOD=SYS, METHOD=SEQ, and METHOD=BERNOULLI). For systematic random sampling (METHOD=SYS), PROC SURVEYSELECT uses the inverse of the stratum sampling rate as the stratum
selection interval. See the section Systematic Random Sampling for details. For Bernoulli sampling (METHOD=BERNOULLI), PROC SURVEYSELECT uses the stratum sampling rate as the inclusion
probability for the stratum. See the section Bernoulli Sampling for details. For the other equal probability selection methods, PROC SURVEYSELECT converts the stratum sampling rate to the
stratum sample size before selection by multiplying the total number of units in the stratum by the sampling rate and rounding
up to the nearest integer.
-
SAMPSIZE=n
N=n
-
specifies the sample size, which is the number of units to select for the sample. The sample size n must be a positive integer. For selection methods that select without replacement, the sample size n must not exceed the number of units in the input data set.
If you do not specify a SAMPLINGUNIT statement, then your sampling units are observations, and PROC SURVEYSELECT selects n observations. If you use a SAMPLINGUNIT statement to define sampling units as groups of observations (clusters), then the
procedure selects n clusters.
If you specify the SAMPSIZE=n option and request stratified selection with the STRATA statement, PROC SURVEYSELECT selects n units from each stratum unless you also specify the ALLOC= option in the STRATA statement to allocate the total sample size among the strata.
If you specify the ALLOC= option in the STRATA statement and the SAMPSIZE=n option, PROC SURVEYSELECT allocates the total sample size n among the strata according to the allocation method that you request. See the section Sample Size Allocation for details. If you specify the MARGIN= option with the ALLOC= option in the STRATA statement, PROC SURVEYSELECT determines the stratum sample sizes that provide
the requested margin of error for the allocation. Therefore, you cannot use the SAMPSIZE= option with the MARGIN= option.
For methods that select without replacement, the sample size n must not exceed the number of units in any stratum. If you do not want to select the same number of units from each stratum,
use the SAMPSIZE=(values) option or the SAMPSIZE=SAS-data-set option to specify a sample size for each stratum.
For without-replacement selection methods, by default, PROC SURVEYSELECT does not allow you to specify a stratum sample size
that is greater than the total number of units available in the stratum. If you specify the SELECTALL option, PROC SURVEYSELECT selects all stratum units when the stratum sample size exceeds the number of units in the stratum.
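When no ALLOC= option is given, SAMPSIZE=n applies the same n to every stratum, and the SELECTALL option changes what happens when a stratum is too small. The following hypothetical Python sketch models only that rule for a without-replacement method; allocation, MARGIN=, and with-replacement methods are not modeled, and the function name is invented.

```python
def stratum_sizes_for_n(stratum_counts, n, selectall=False):
    """Apply SAMPSIZE=n to every stratum for a without-replacement method."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    sizes = {}
    for stratum, n_units in stratum_counts.items():
        if n <= n_units:
            sizes[stratum] = n
        elif selectall:
            sizes[stratum] = n_units  # SELECTALL: take every unit in the stratum
        else:
            raise ValueError(
                f"stratum {stratum!r} has only {n_units} units; n={n} is too large"
            )
    return sizes


print(stratum_sizes_for_n({"A": 10, "B": 3}, 5, selectall=True))  # {'A': 5, 'B': 3}
```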
-
SAMPSIZE=(values)
N=(values)
-
specifies stratum sample sizes, where the stratum sample size is the number of units to select from the stratum. You can separate
values with blanks or commas. The number of SAMPSIZE= values must equal the number of strata in the input data set.
List the stratum sample size values in the order in which the strata appear in the input data set. When you use the SAMPSIZE=(values) option, the input data set must be sorted by the STRATA variables in ascending order. You cannot use the DESCENDING or NOTSORTED
option in the STRATA statement.
Each stratum sample size value must be a nonnegative integer. To select a sample from a stratum, the value of the stratum
sample size must be positive. If you specify a stratum sample size of 0, then PROC SURVEYSELECT does not select a sample from
the stratum. This has the effect of subsetting the input data set before sample selection; the stratum that you omit is not
included in the sampling frame or represented in the sample.
For without-replacement selection methods, by default, PROC SURVEYSELECT does not allow you to specify a stratum sample size
that is greater than the total number of units available in the stratum. If you specify the SELECTALL option, PROC SURVEYSELECT selects all stratum units when the stratum sample size exceeds the number of units in the stratum.
-
SAMPSIZE=SAS-data-set
N=SAS-data-set
-
names a SAS data set that contains stratum sample sizes, where the stratum sample size is the number of units to select from
the stratum. The SAMPSIZE= input data set should include a variable named _NSIZE_ or SampleSize that contains the stratum sample sizes.
Each stratum sample size value must be a nonnegative integer. To select a sample from a stratum, the value of the stratum
sample size must be positive. If you specify a stratum sample size of 0, then PROC SURVEYSELECT does not select a sample from
the stratum. This has the effect of subsetting the input data set before sample selection; the stratum that you omit is not
included in the sampling frame or represented in the sample.
The SAMPSIZE= input data set should contain all the STRATA variables, with the same type and length as in the DATA= data set.
The STRATA groups should appear in the same order in the SAMPSIZE= data set as in the DATA= data set. The SAMPSIZE= data set
is a secondary input data set. See the section Secondary Input Data Set for details. You can name only one secondary input data set in each invocation of the procedure.
For without-replacement selection methods, by default, PROC SURVEYSELECT does not allow you to specify a stratum sample size
that is greater than the total number of units available in the stratum. If you specify the SELECTALL option, PROC SURVEYSELECT selects all stratum units when the stratum sample size exceeds the number of units in the stratum.
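The requirements on a SAMPSIZE= secondary data set (a size variable named _NSIZE_ or SampleSize, strata in the same order as in DATA=, nonnegative integer sizes, and 0 meaning "omit this stratum") can be expressed as a validation routine. This Python sketch is hypothetical: rows are plain dictionaries standing in for observations, and the function name and the Region variable are invented for illustration.

```python
def read_stratum_sizes(secondary_rows, data_strata, strata_vars):
    """Validate a SAMPSIZE=-style secondary table and map each stratum to its size."""
    if not secondary_rows:
        raise ValueError("the secondary data set has no rows")
    size_var = next(
        (name for name in ("_NSIZE_", "SampleSize") if name in secondary_rows[0]), None
    )
    if size_var is None:
        raise ValueError("expected a variable named _NSIZE_ or SampleSize")
    keys = [tuple(row[v] for v in strata_vars) for row in secondary_rows]
    if keys != list(data_strata):
        raise ValueError("strata must appear in the same order as in the DATA= data set")
    sizes = {}
    for key, row in zip(keys, secondary_rows):
        n_h = row[size_var]
        if not isinstance(n_h, int) or n_h < 0:
            raise ValueError(f"stratum {key} needs a nonnegative integer size")
        if n_h > 0:  # a size of 0 drops the stratum from the frame
            sizes[key] = n_h
    return sizes


rows = [
    {"Region": "East", "SampleSize": 4},
    {"Region": "North", "SampleSize": 0},
    {"Region": "West", "SampleSize": 6},
]
print(read_stratum_sizes(rows, [("East",), ("North",), ("West",)], ["Region"]))
```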
-
SEED
-
indicates that stratum-level initial seeds are included in the secondary input data set. Use the SEED option when you have
already named the secondary input data set in another option, such as the SAMPSIZE=SAS-data-set option. See the section Secondary Input Data Set for details. You can name only one secondary input data set in each invocation of the procedure.
You provide the stratum initial seeds in the secondary input data set variable named _SEED_ or InitialSeed. The initial seeds must be positive integers.
See the description of the SEED=SAS-data-set option for more information about initial seeds for random number generation.
-
SEED=number
-
specifies the initial seed for random number generation. The SEED= value must be a positive integer. If you do not specify
the SEED= option, or if the SEED= value is negative or 0, PROC SURVEYSELECT uses the time of day from the computer’s clock
to obtain the initial seed. See the section Random Number Generation for more information.
If you request a stratified sample design with the STRATA statement, you can use the SEED=SAS-data-set option to specify an initial seed for each stratum. Otherwise, PROC SURVEYSELECT generates random numbers continuously across
strata from the random number stream initialized by the SEED= value.
You can use the OUTSEED option to include the stratum initial seeds in the output data set.
Whether or not you specify the SEED= option, PROC SURVEYSELECT displays the value of the initial seed in the “Sample Selection Summary” table. If you need to reproduce the same sample in a subsequent execution of PROC SURVEYSELECT, you can specify this same
seed value in the SEED= option, along with the same sample selection parameters, and PROC SURVEYSELECT will reproduce the
sample.
Beginning in SAS/STAT 12.1, PROC SURVEYSELECT uses the Mersenne-Twister random number generator by default. In previous releases,
PROC SURVEYSELECT used the RANUNI random number generator, which you can now request by specifying the RANUNI option. To reproduce samples that PROC SURVEYSELECT selected in releases prior to SAS/STAT 12.1, use the RANUNI option with the SEED= option (for the same input data set and sample selection parameters).
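With a single SEED= value and a stratified design, one random number stream serves every stratum in turn, so the draws for a later stratum depend on how many numbers the earlier strata consumed. The hypothetical Python sketch below illustrates that behavior with Python's own Mersenne Twister; it does not reproduce SAS streams, and the function name, frame, and seed are invented.

```python
import random


def stratified_srs_one_seed(strata, sizes, seed):
    """strata: stratum -> list of unit IDs; sizes: stratum -> sample size."""
    rng = random.Random(seed)  # a single stream for the whole selection
    sample = {}
    for stratum in sorted(strata):  # strata processed in ascending order
        # Draws continue from wherever the previous stratum left the stream.
        sample[stratum] = sorted(rng.sample(strata[stratum], sizes[stratum]))
    return sample


frame = {"East": list(range(1, 41)), "West": list(range(41, 61))}
run1 = stratified_srs_one_seed(frame, {"East": 4, "West": 3}, seed=58231)
run2 = stratified_srs_one_seed(frame, {"East": 4, "West": 3}, seed=58231)
print(run1 == run2)  # True
```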
-
SEED=SAS-data-set
-
names a SAS data set that contains initial seeds for the strata. You provide the stratum seeds in the SEED= input data set variable _SEED_ or InitialSeed.
The initial seed values must be positive integers. If the initial seed value for the first stratum is not a positive integer,
PROC SURVEYSELECT uses the time of day from the computer’s clock to obtain the initial seed. If the initial seed value for
a subsequent stratum is not a positive integer, PROC SURVEYSELECT continues to use the random number stream already initialized
by the seed for the previous stratum. See the section Random Number Generation for more information.
The SEED= input data set should contain all the STRATA variables, with the same type and length as in the DATA= data set.
The STRATA groups should appear in the same order in the SEED= data set as in the DATA= data set. The SEED= data set is a
secondary input data set. See the section Secondary Input Data Set for details. You can name only one secondary input data set in each invocation of the procedure.
You can use the OUTSEED option to include the stratum initial seeds in the output data set.
Whether or not you specify the SEED= option, PROC SURVEYSELECT displays the value of the initial seed in the “Sample Selection Summary” table. If you need to reproduce the same sample in a subsequent execution of PROC SURVEYSELECT, you can specify this same
seed value in the SEED= option, along with the same sample selection parameters, and PROC SURVEYSELECT will reproduce the
sample.
If you specify initial seeds by strata with the SEED=SAS-data-set option, you can reproduce the same sample in a subsequent execution of PROC SURVEYSELECT by specifying these same stratum
initial seeds, along with the same sample selection parameters. If you need to reproduce the same sample for only a subset
of the strata, you can use the same initial seeds for those strata in the subset.
Beginning in SAS/STAT 12.1, PROC SURVEYSELECT uses the Mersenne-Twister random number generator by default. In previous releases,
PROC SURVEYSELECT used the RANUNI random number generator, which you can now request by specifying the RANUNI option. To reproduce samples that PROC SURVEYSELECT selected in releases prior to SAS/STAT 12.1, use the RANUNI option with the SEED= option (for the same input data set and sample selection parameters).
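The per-stratum seed rules can be sketched as follows: a positive integer seed restarts the stream for that stratum, a missing or invalid seed for the first stratum falls back to a clock-based seed, and a missing or invalid seed for a later stratum continues the previous stratum's stream. This Python sketch is a hypothetical illustration using Python's Mersenne Twister rather than SAS's generator. It also shows why a stratum can be reproduced on its own: its sample depends only on its own seed and parameters.

```python
import random
import time


def stratified_srs_stratum_seeds(strata, sizes, seeds):
    """seeds: stratum -> initial seed; values that are not positive integers use the fallbacks."""
    sample = {}
    rng = None
    for position, stratum in enumerate(sorted(strata)):
        seed = seeds.get(stratum)
        if isinstance(seed, int) and seed > 0:
            rng = random.Random(seed)  # restart the stream at this stratum's seed
        elif position == 0:
            rng = random.Random(time.time_ns())  # no usable first seed: use the clock
        # Otherwise keep drawing from the stream left by the previous stratum.
        sample[stratum] = sorted(rng.sample(strata[stratum], sizes[stratum]))
    return sample


frame = {"East": list(range(1, 41)), "West": list(range(41, 61))}
sizes = {"East": 4, "West": 3}
a = stratified_srs_stratum_seeds(frame, sizes, {"East": 111, "West": 222})
b = stratified_srs_stratum_seeds(frame, sizes, {"East": 999, "West": 222})
print(a["West"] == b["West"])  # True: West used the same initial seed
```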
-
SELECTALL
-
requests that PROC SURVEYSELECT select all stratum units when the stratum sample size exceeds the total number of units in
the stratum. By default, PROC SURVEYSELECT does not allow you to specify a stratum sample size that is greater than the total
number of units in the stratum, unless you are using a with-replacement selection method.
The SELECTALL option is available for the following without-replacement selection methods: METHOD=SRS, METHOD=SYS, METHOD=SEQ, METHOD=PPS, and METHOD=PPS_SAMPFORD.
The SELECTALL option is not available for with-replacement selection methods, with-minimum-replacement methods, or those PPS
methods that select two units per stratum.
-
SORT=NEST | SERP
-
specifies the type of sorting by CONTROL variables. The option SORT=NEST requests nested sorting, and SORT=SERP requests hierarchic
serpentine sorting. The default is SORT=SERP. See the section Sorting by CONTROL Variables for descriptions of serpentine and nested sorting. When there is only one CONTROL variable, the two types of sorting are equivalent.
The SORT= option is available when you specify a CONTROL statement for systematic or sequential selection methods (METHOD=SYS, METHOD=PPS_SYS, METHOD=SEQ, and METHOD=PPS_SEQ). When you specify a CONTROL statement, PROC SURVEYSELECT sorts the input data set by the CONTROL variables within strata
before selecting the sample.
The SORT= option and the CONTROL statement are not available with a SAMPLINGUNIT statement. See the descriptions of the CONTROL and SAMPLINGUNIT statements for more information.
When you specify a CONTROL statement, you can also use the OUTSORT= option to name an output data set that contains the sorted input data set. Otherwise, if you do not specify the OUTSORT=
option, the sorted data set replaces the input data set.
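The contrast between the two sort types can be sketched in Python. Nested sorting orders every CONTROL variable ascending. In the serpentine sketch, the first variable ascends and each later variable reverses direction every time the group above it changes, so neighboring records stay close on every variable. For two CONTROL variables this is unambiguous; for three or more, the sketch alternates direction across the whole sequence of groups at each level, which is an assumption to check against the section Sorting by CONTROL Variables. Function names are hypothetical, and the sketch is not the procedure's implementation.

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]


def nested_sort(rows: List[Row], controls: Sequence[str]) -> List[Row]:
    """SORT=NEST sketch: ascending order of every CONTROL variable in turn."""
    return sorted(rows, key=lambda row: tuple(row[c] for c in controls))


def serpentine_sort(rows: List[Row], controls: Sequence[str]) -> List[Row]:
    """SORT=SERP sketch: the first variable ascends; each later variable
    alternates direction every time the group above it changes."""
    calls = [0] * len(controls)  # how many groups have been ordered at each depth

    def order(group: List[Row], depth: int) -> List[Row]:
        if depth == len(controls):
            return group
        name = controls[depth]
        descending = calls[depth] % 2 == 1
        calls[depth] += 1
        result: List[Row] = []
        for level in sorted({row[name] for row in group}, reverse=descending):
            result.extend(order([row for row in group if row[name] == level], depth + 1))
        return result

    return order(list(rows), 0)


rows = [{"Region": r, "Size": s} for r in (1, 2) for s in (1, 2, 3)]
print([(row["Region"], row["Size"]) for row in serpentine_sort(rows, ["Region", "Size"])])
# [(1, 1), (1, 2), (1, 3), (2, 3), (2, 2), (2, 1)]
```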
-
STATS
-
includes the selection probability and sampling weight in the OUT= output data set for equal probability selection methods
when you do not specify a STRATA statement. By default, the output data set does not include these values for equal probability selection methods unless you
specify a STRATA statement. The STATS option applies to the following selection methods: METHOD=SRS, METHOD=URS, METHOD=SYS, METHOD=SEQ, and METHOD=BERNOULLI.
In addition to the selection probability and sampling weight, the STATS option includes the following statistics in the output
data set for METHOD=BERNOULLI: total number of sampling units, expected sample size, actual sample size, and adjusted sampling weight. See the section
Bernoulli Sampling for more information.
For PPS selection methods, the output data set contains selection probabilities and sampling weights by default. The STATS
option has no effect for PPS methods.
For more information about the contents of the output data set, see the section Sample Output Data Set.
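For simple random sampling (METHOD=SRS), the selection probability and sampling weight that the STATS option adds are the usual textbook quantities n/N and N/n; in a stratified design, the same formulas apply within each stratum with n_h and N_h. The hypothetical Python sketch below computes them with exact fractions. It covers only SRS; the other equal probability methods, including the extra Bernoulli statistics, are not modeled.

```python
from fractions import Fraction


def srs_probability_and_weight(n, total_units):
    """Selection probability n/N and sampling weight N/n for simple random sampling."""
    if not 0 < n <= total_units:
        raise ValueError("need 0 < n <= N for sampling without replacement")
    probability = Fraction(n, total_units)
    return probability, 1 / probability


p, w = srs_probability_and_weight(25, 100)
print(p, w)  # 1/4 4
```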