-
DATA=SAS-data-set
-
names the SAS-data-set to be analyzed by PROC SURVEYFREQ. If you omit the DATA= option, the procedure uses the most recently created SAS data set.
-
MISSING
-
treats missing values as a valid (nonmissing) category for all categorical variables, which include TABLES, STRATA, and CLUSTER variables.
By default, if you do not specify the MISSING option, an observation is excluded from the analysis if it has a missing value for any STRATA or CLUSTER variable.
variable. Additionally, PROC SURVEYFREQ excludes an observation from a frequency or crosstabulation table if that observation
has a missing value for any of the variables in the table request, unless you specify the MISSING option. For more information,
see the section Missing Values.
-
NOMCAR
-
includes observations with missing values of TABLES
variables in the variance computation as not missing completely at random (NOMCAR) for Taylor series variance estimation. When you specify the NOMCAR option, PROC SURVEYFREQ computes variance estimates
by analyzing the nonmissing values as a domain (subpopulation), where the entire population includes both nonmissing and missing
domains. For more information, see the section Missing Values.
By default, PROC SURVEYFREQ completely excludes an observation from a frequency or crosstabulation table (and the corresponding
variance computations) if that observation has a missing value for any of the variables in the table request, unless you specify
the MISSING
option. The NOMCAR option has no effect when you specify the MISSING option, which treats missing values as a valid nonmissing
level.
The NOMCAR option applies only to Taylor series variance estimation. The replication methods, which you can request by specifying
the VARMETHOD=BRR
and VARMETHOD=JACKKNIFE
options, do not use the NOMCAR option.
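To make the difference between the default and NOMCAR concrete, here is a small Python sketch. The function `cell_total_variance` and its row layout are hypothetical. It models the variance of a weighted cell frequency with the standard stratified cluster Taylor formula (for each stratum, n_h(1 - f_h)/(n_h - 1) times the sum of squared deviations of the PSU cell totals). Treat its exact correspondence to PROC SURVEYFREQ's computation as an assumption. What it shows is this: under the default, a PSU whose observations all have missing TABLES values drops out of its stratum, while under NOMCAR that PSU stays in the design with a zero cell total.

```python
from collections import defaultdict


def cell_total_variance(rows, in_cell, nomcar=False, rates=None):
    """Taylor-series variance of a weighted cell frequency (hypothetical helper).

    rows: dicts with 'stratum', 'psu', 'weight', and 'value'; 'value' is None
          when the TABLES variable is missing.
    in_cell: predicate that is True when a nonmissing value falls in the cell.
    nomcar: False mimics the default (rows with missing values are dropped first);
            True keeps them in the design as a domain that contributes zero.
    rates: optional {stratum: first-stage sampling rate} for the fpc.
    Assumption: the standard stratified cluster Taylor formula stands in for
    PROC SURVEYFREQ's internal computation.
    """
    rates = rates or {}
    psu_totals = defaultdict(dict)  # stratum -> {psu: weighted cell total}
    for r in rows:
        missing = r["value"] is None
        if missing and not nomcar:
            continue  # default: the observation leaves the table and its variance
        contrib = 0.0 if missing or not in_cell(r["value"]) else r["weight"]
        totals = psu_totals[r["stratum"]]
        totals[r["psu"]] = totals.get(r["psu"], 0.0) + contrib
    variance = 0.0
    for h, totals in psu_totals.items():
        n_h = len(totals)
        if n_h < 2:
            continue  # this sketch simply skips single-PSU strata
        mean = sum(totals.values()) / n_h
        ss = sum((t - mean) ** 2 for t in totals.values())
        variance += n_h * (1.0 - rates.get(h, 0.0)) / (n_h - 1) * ss
    return variance


def is_yes(value):
    return value == "yes"


rows = [
    {"stratum": "A", "psu": 1, "weight": 10.0, "value": "yes"},
    {"stratum": "A", "psu": 2, "weight": 10.0, "value": "no"},
    {"stratum": "A", "psu": 3, "weight": 10.0, "value": None},
    {"stratum": "A", "psu": 4, "weight": 10.0, "value": "yes"},
]
print(cell_total_variance(rows, is_yes))               # PSU 3 dropped: n_h = 3
print(cell_total_variance(rows, is_yes, nomcar=True))  # PSU 3 kept: n_h = 4
```

In this toy stratum the default gives roughly 100 and NOMCAR gives roughly 133.3. The retained PSU changes both n_h and the spread of the PSU totals.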
-
NOSUMMARY
-
suppresses the display of the "Data Summary" table, which PROC SURVEYFREQ produces by default. For information about this
table, see the section Data Summary Table.
-
ORDER=DATA | FORMATTED | FREQ | INTERNAL
-
specifies the order of the variable levels in the frequency and crosstabulation tables, which you request in the TABLES
statement. The ORDER= option also controls the order of the STRATA
variable levels in the "Stratum Information" table.
The ORDER= option can take the following values:
| ORDER= | Levels Ordered By |
|---|---|
| DATA | Order of appearance in the input data set |
| FORMATTED | External formatted value, except for numeric variables with no explicit format, which are sorted by their unformatted (internal) value |
| FREQ | Descending frequency count; levels with the most observations come first in the order |
| INTERNAL | Unformatted value |
By default, ORDER=INTERNAL. The FORMATTED and INTERNAL orders are machine-dependent. The frequency count used by ORDER=FREQ
is the nonweighted frequency (sample size), rather than the weighted frequency.
For more information about sort order, see the chapter on the SORT procedure in the
Base SAS Procedures Guide and the discussion of BY-group processing in
SAS Language Reference: Concepts.
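The four orderings can be sketched in Python as follows. `order_levels` is a hypothetical helper. Python's own sort order stands in for the machine-dependent SAS collating sequence, and the sketch treats each distinct internal value as its own level, so it ignores the way a format can merge several values into one level. ORDER=FREQ uses unweighted counts, as described above. The tie-breaking rule for equal counts is an assumption.

```python
from collections import Counter


def order_levels(values, order="INTERNAL", fmt=None):
    """Return the distinct levels of `values` in the requested ORDER= sequence.

    values: the variable's values, one per observation, in input data set order.
    fmt: optional function mapping an internal value to its formatted string;
         None plays the role of "no explicit format".
    """
    levels = list(dict.fromkeys(values))  # distinct values in first-appearance order
    if order == "DATA":
        return levels
    if order == "INTERNAL":
        return sorted(levels)
    if order == "FORMATTED":
        if fmt is None:
            return sorted(levels)  # no explicit format: fall back to internal value
        return sorted(levels, key=fmt)
    if order == "FREQ":
        counts = Counter(values)  # unweighted counts (sample sizes), not weights
        # Assumption: ties are broken by internal value in this sketch.
        return sorted(levels, key=lambda v: (-counts[v], v))
    raise ValueError(f"unknown ORDER= value: {order}")


labels = {1: "Yes", 2: "No", 3: "Maybe"}
print(order_levels([3, 1, 2, 1, 3, 3], "FORMATTED", fmt=labels.get))
```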
-
PAGE
-
displays only one table per page. Otherwise, PROC SURVEYFREQ displays multiple tables per page as space permits.
-
RATE=value | SAS-data-set
R=value | SAS-data-set
-
specifies the sampling rate, which PROC SURVEYFREQ uses to compute a finite population correction for Taylor series variance estimation. You can provide a single sampling rate value, or you can provide stratum sampling rates by specifying a SAS-data-set.
If your sample design has multiple stages, you should specify the first-stage sampling rate, which is the ratio of the number of primary sampling units (PSUs) in the sample to the total number of PSUs in the population.
For a nonstratified sample design, or for a stratified sample design that uses the same sampling rate in all strata, you should
specify a single sampling rate value. If your design is stratified and uses different sampling rates in different strata, you should name a SAS-data-set that contains the stratification variables and the stratum sampling rates. You should provide the stratum sampling rates
in the data set variable named _RATE_. For more information, see the section Population Totals and Sampling Rates.
The sampling rate values must be nonnegative numbers. You can specify sampling rates as numbers between 0 and 1. Or you can
specify sampling rates in percentage form as numbers between 1 and 100, which PROC SURVEYFREQ converts to proportions. The
procedure treats the value 1 as 100% instead of 1%.
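The conversion rule above can be sketched in Python as follows. `normalize_rate` and `fpc` are hypothetical names. The factor 1 - f is the usual form of the finite population correction for a first-stage sampling rate f. The sketch rejects values above 100 by raising an error, which is an assumption and does not model how PROC SURVEYFREQ reports such values.

```python
def normalize_rate(value):
    """Convert a RATE= value to a proportion (hypothetical helper)."""
    if value < 0:
        raise ValueError("sampling rates must be nonnegative")
    if value <= 1:
        return float(value)  # proportion form; note that 1 means 100%, not 1%
    if value <= 100:
        return value / 100.0  # percentage form, converted to a proportion
    # Assumption: values above 100 are treated as invalid in this sketch.
    raise ValueError("sampling rates above 100 are not valid")


def fpc(rate):
    """Finite population correction factor 1 - f for a first-stage sampling rate f."""
    return 1.0 - normalize_rate(rate)


print(normalize_rate(0.25), normalize_rate(25), normalize_rate(1))
print(fpc(25))
```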
If you do not specify the RATE= or the TOTAL=
option, the Taylor series variance estimation does not include a finite population correction. You cannot specify both the
RATE= and the TOTAL=
option in the same PROC SURVEYFREQ statement.
PROC SURVEYFREQ does not use the RATE= or the TOTAL= option for BRR or jackknife variance estimation (which you can request
by specifying the VARMETHOD=BRR
or VARMETHOD=JACKKNIFE
option, respectively).
-
TOTAL=value | SAS-data-set
N=value | SAS-data-set
-
specifies the total number of primary sampling units (PSUs), which PROC SURVEYFREQ uses to compute a finite population correction for Taylor series variance estimation. You can provide
a single total value, or you can provide stratum totals by specifying a SAS-data-set. The totals must be positive numbers.
If your sample design has multiple stages, the totals you specify should count PSUs, which are the first-stage sampling units.
For a nonstratified sample design, you should specify a single total value, which refers to the total number of PSUs in the population. For a stratified sample design that has the same population
total in each stratum, you can specify a single total value, which refers to the total number of PSUs in each stratum. If your design is stratified and has different totals in different
strata, you should name a SAS-data-set that contains the stratification variables and the stratum totals. You should provide the stratum totals in the data set
variable named _TOTAL_. For more information, see the section Population Totals and Sampling Rates.
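A TOTAL= value determines the same first-stage sampling rate that RATE= supplies directly: f_h = n_h / N_h, where n_h is the number of sampled PSUs in stratum h and N_h is the population total of PSUs. The Python sketch below derives those rates from either a single total or per-stratum totals, mirroring the two forms described above. `rates_from_totals` is a hypothetical helper, and its check that n_h does not exceed N_h is a guard added for the sketch.

```python
def rates_from_totals(sampled_psus, totals):
    """Derive first-stage sampling rates f_h = n_h / N_h (hypothetical helper).

    sampled_psus: {stratum: number of PSUs in the sample}.
    totals: a single number (the same N_h in every stratum) or {stratum: N_h},
            playing the role of a data set with a _TOTAL_ variable.
    """
    rates = {}
    for h, n_h in sampled_psus.items():
        N_h = totals if isinstance(totals, (int, float)) else totals[h]
        if N_h <= 0:
            raise ValueError("totals must be positive numbers")
        if n_h > N_h:
            # Guard added for this sketch: a sample cannot exceed its population.
            raise ValueError(f"stratum {h}: more sampled PSUs than the total")
        rates[h] = n_h / N_h
    return rates


print(rates_from_totals({"A": 2, "B": 5}, 10))
print(rates_from_totals({"A": 2, "B": 5}, {"A": 4, "B": 50}))
```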
If you do not specify the RATE=
or the TOTAL= option, the Taylor series variance estimation does not include a finite population correction. You cannot specify
both the RATE=
and the TOTAL= option in the same PROC SURVEYFREQ statement.
PROC SURVEYFREQ does not use the RATE= or the TOTAL= option for BRR or jackknife variance estimation (which you can request
by specifying the VARMETHOD=BRR
or VARMETHOD=JACKKNIFE
option, respectively).
-
VARHEADER=LABEL | NAME | NAMELABEL
-
specifies the variable identification to use in the displayed output. By default, VARHEADER=NAME, which displays variable names
in the output. The VARHEADER= option affects the headers of the variable level columns in one-way frequency tables, crosstabulation
tables, and the "Stratum Information" table. The VARHEADER= option also controls variable identification in the table headers.
The VARHEADER= option can take the following values:
| VARHEADER= | Variable Identification Displayed |
|---|---|
| LABEL | Variable label |
| NAME | Variable name |
| NAMELABEL | Variable name and label, as Name (Label) |
-
VARMETHOD=BRR < (method-options)>
VARMETHOD=JACKKNIFE | JK < (method-options)>
VARMETHOD=TAYLOR
-
specifies the variance estimation method. VARMETHOD=TAYLOR requests the Taylor series method, which is the default if you do not specify the VARMETHOD= option or the REPWEIGHTS
statement. VARMETHOD=BRR requests variance estimation by balanced repeated replication (BRR), and VARMETHOD=JACKKNIFE requests
variance estimation by the delete-1 jackknife method.
For VARMETHOD=BRR and VARMETHOD=JACKKNIFE, you can specify method-options in parentheses after the variance method name. For example:
varmethod=BRR(reps=60 outweights=myReplicateWeights)
Table 97.2 summarizes the available method-options.
Table 97.2: Variance Estimation Options
You can specify the following values for the VARMETHOD= option:
-
BRR < (method-options)>
-
requests variance estimation by balanced repeated replication (BRR). The BRR method requires a stratified sample design that
has two primary sampling units (PSUs) in each stratum. If you specify this option, you must also specify a STRATA
statement unless you use a REPWEIGHTS
statement to provide replicate weights. For more information, see the section Balanced Repeated Replication (BRR).
You can specify the following method-options:
-
DFADJ
-
computes the degrees of freedom as the number of nonmissing strata for the individual table request. If you specify this option,
PROC SURVEYFREQ does not count any empty strata that occur when observations that have missing values of the TABLES
variables are removed from the analysis of the table. By default, PROC SURVEYFREQ computes the degrees of freedom by counting
the number of nonmissing strata for all valid observations in the input data set.
For more information, see the section Degrees of Freedom. For information about valid observations, see the section Data Summary Table.
This method-option has no effect when you specify the MISSING
option, which treats missing values as a valid nonmissing level.
This method-option is not used when you specify the degrees of freedom in the DF=
option in the TABLES statement or when you specify a REPWEIGHTS
statement to provide replicate weights. When you specify a REPWEIGHTS
statement, the degrees of freedom are the number of REPWEIGHTS variables (replicates) unless you specify the DF=
option in the REPWEIGHTS or the TABLES statement.
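The two BRR degrees-of-freedom rules can be contrasted in a short Python sketch. `brr_df` is a hypothetical helper. It assumes its input rows are already the valid observations, so the screening described in the Data Summary Table section is not modeled. Without DFADJ it counts strata over all of those rows. With DFADJ it first drops rows whose TABLES variables are missing, so a stratum that becomes empty is no longer counted.

```python
def brr_df(rows, table_vars, dfadj=False):
    """BRR degrees of freedom as the number of nonmissing strata (hypothetical helper).

    rows: dicts with a 'stratum' key plus the TABLES variables; assumed to be
          valid observations already.
    dfadj: False counts strata over all rows; True counts only strata that
           keep observations after missing TABLES values are removed.
    """
    if dfadj:
        rows = [r for r in rows if all(r[v] is not None for v in table_vars)]
    return len({r["stratum"] for r in rows})


rows = [
    {"stratum": 1, "x": "a"},
    {"stratum": 2, "x": None},  # stratum 2 has only a missing TABLES value
    {"stratum": 3, "x": "b"},
]
print(brr_df(rows, ["x"]), brr_df(rows, ["x"], dfadj=True))
```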
-
FAY < =value >
-
requests Fay’s method, which is a modification of the BRR method. For more information, see the section Fay’s BRR Method.
You can specify the value of the Fay coefficient, which is used in converting the original sampling weights to replicate weights. The Fay coefficient
must be a nonnegative number less than 1. By default, the Fay coefficient is 0.5.
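Fay's method is commonly described as follows. For a given replicate, a row of the Hadamard matrix selects one PSU in each two-PSU stratum. The weights of the selected PSU are multiplied by (2 - k) and those of the other PSU by k, where k is the Fay coefficient. With k = 0 this reduces to the standard BRR factors 2 and 0. The Python sketch below builds one replicate under those rules. `fay_replicate_weights` is a hypothetical helper, and the convention that +1 selects the first PSU of the pair is an assumption.

```python
def fay_replicate_weights(obs, psu_pairs, hadamard_row, fay=0.5):
    """Replicate weights for one Fay's BRR replicate (hypothetical helper).

    obs: list of (stratum, psu, weight) tuples.
    psu_pairs: {stratum: (first_psu, second_psu)}; BRR needs two PSUs per stratum.
    hadamard_row: {stratum: +1 or -1}.
    fay: Fay coefficient k, with 0 <= k < 1.
    """
    if not 0 <= fay < 1:
        raise ValueError("the Fay coefficient must be nonnegative and less than 1")
    out = []
    for stratum, psu, w in obs:
        first, second = psu_pairs[stratum]
        # Assumption: +1 selects the first PSU of the pair, -1 the second.
        selected = first if hadamard_row[stratum] == 1 else second
        factor = (2 - fay) if psu == selected else fay
        out.append(w * factor)
    return out


obs = [("A", 1, 10.0), ("A", 2, 10.0), ("B", 3, 4.0), ("B", 4, 6.0)]
pairs = {"A": (1, 2), "B": (3, 4)}
row = {"A": 1, "B": -1}
print(fay_replicate_weights(obs, pairs, row))         # default k = 0.5
print(fay_replicate_weights(obs, pairs, row, fay=0))  # standard BRR
```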
-
HADAMARD=SAS-data-set
H=SAS-data-set
-
names a SAS-data-set that contains the Hadamard matrix for BRR replicate construction. If you do not specify this method-option, PROC SURVEYFREQ generates an appropriate Hadamard matrix for replicate construction. For more information, see the sections
Balanced Repeated Replication (BRR) and Hadamard Matrix.
If a Hadamard matrix of a particular dimension exists, it is not necessarily unique. Therefore, if you want to use a specific
Hadamard matrix, you must provide the matrix as a SAS-data-set in this method-option.
In the HADAMARD= input data set, each variable corresponds to a column and each observation corresponds to a row of the Hadamard
matrix. You can use any variable names in the HADAMARD= data set. All values in the data set must equal either 1 or –1. You
must ensure that the matrix you provide is indeed a Hadamard matrix, that is, $\mathbf{H}\mathbf{H}' = R\,\mathbf{I}_R$, where $\mathbf{H}$ is the Hadamard matrix of dimension R and $\mathbf{I}_R$ is the identity matrix of dimension R. PROC SURVEYFREQ does not check the validity of the Hadamard matrix that you provide.
The HADAMARD= input data set must contain at least H variables, where H denotes the number of first-stage strata in your design. If the data set contains more than H variables, PROC SURVEYFREQ uses only the first H variables. Similarly, the HADAMARD= input data set must contain at least H observations.
If you do not specify the REPS=
method-option, the number of replicates is assumed to be the number of observations in the HADAMARD= input data set. If you specify the
number of replicates—for example, REPS=nreps—the first nreps observations in the HADAMARD= data set are used to construct the replicates.
You can specify the PRINTH
method-option to display the Hadamard matrix that PROC SURVEYFREQ uses to construct replicates for BRR.
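Because PROC SURVEYFREQ does not validate a HADAMARD= matrix, you may want to check it before you use it. The Python sketch below tests the defining property $\mathbf{H}\mathbf{H}' = R\,\mathbf{I}_R$ and then keeps the first `reps` rows and first H columns, as described above. It builds a test matrix with the Sylvester construction, which is just one way to produce a Hadamard matrix of dimension 2^k and is not necessarily the one the procedure generates. All function names are hypothetical.

```python
def is_hadamard(matrix):
    """Check that a square matrix has +1/-1 entries and satisfies H H' = R I."""
    R = len(matrix)
    if any(len(row) != R for row in matrix):
        return False
    if any(x not in (1, -1) for row in matrix for x in row):
        return False
    for i in range(R):
        for j in range(R):
            dot = sum(a * b for a, b in zip(matrix[i], matrix[j]))
            if dot != (R if i == j else 0):
                return False
    return True


def sylvester(k):
    """Sylvester construction of a Hadamard matrix of dimension 2**k."""
    H = [[1]]
    for _ in range(k):
        H = [row + row for row in H] + [row + [-x for x in row] for row in H]
    return H


def rows_and_columns_used(matrix, n_strata, reps=None):
    """Keep the first `reps` rows (all rows by default) and the first H columns."""
    if len(matrix) < n_strata or len(matrix[0]) < n_strata:
        raise ValueError("the matrix needs at least H rows and H columns")
    reps = len(matrix) if reps is None else reps
    return [row[:n_strata] for row in matrix[:reps]]


H8 = sylvester(3)
print(is_hadamard(H8), is_hadamard([[1, 1], [1, 1]]))
```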
-
OUTWEIGHTS=SAS-data-set
-
names a SAS-data-set to store the replicate weights that PROC SURVEYFREQ creates for BRR variance estimation. For information about replicate
weights, see the section Balanced Repeated Replication (BRR). For information about the contents of the OUTWEIGHTS= data set, see the section Replicate Weight Output Data Set.
The OUTWEIGHTS= method-option is not available when you provide replicate weights in a REPWEIGHTS
statement.
-
PRINTH
-
displays the Hadamard matrix that PROC SURVEYFREQ uses to construct replicates for BRR variance estimation. When you provide
the Hadamard matrix in the HADAMARD=
method-option, PROC SURVEYFREQ displays only the rows and columns that are actually used to construct replicates. For more information,
see the sections Balanced Repeated Replication (BRR) and Hadamard Matrix.
The PRINTH method-option is not available when you provide replicate weights in a REPWEIGHTS
statement because the procedure does not use a Hadamard matrix in this case.
-
REPS=number
-
specifies the number of replicates for BRR variance estimation. The value of number must be an integer greater than 1.
If you do not use the HADAMARD=
method-option to provide a Hadamard matrix, the number of replicates should be greater than the number of strata and should be a multiple
of 4. For more information, see the section Balanced Repeated Replication (BRR). If PROC SURVEYFREQ cannot construct a Hadamard matrix for the REPS= value that you specify, the value is increased until
a Hadamard matrix of that dimension can be constructed. Therefore, the actual number of replicates that PROC SURVEYFREQ uses
might be larger than number.
If you use the HADAMARD=
method-option to provide a Hadamard matrix, the value of number must not be greater than the number of rows in the Hadamard matrix. If you provide a Hadamard matrix and do not specify the
REPS= method-option, the number of replicates equals the number of rows in the Hadamard matrix.
If you do not specify the REPS= or the HADAMARD=
method-option and do not use a REPWEIGHTS
statement, the number of replicates equals the smallest multiple of 4 that is greater than the number of strata.
If you use a REPWEIGHTS
statement to provide replicate weights, PROC SURVEYFREQ does not use the REPS= method-option; the number of replicates equals the number of REPWEIGHTS variables.
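The default replicate count described above (the smallest multiple of 4 that is strictly greater than the number of strata) is simple integer arithmetic. A Python sketch follows. `default_brr_reps` is a hypothetical name, and the sketch does not model the case in which the procedure increases the count because it cannot construct a Hadamard matrix of that dimension.

```python
def default_brr_reps(n_strata):
    """Smallest multiple of 4 strictly greater than the number of strata."""
    if n_strata < 1:
        raise ValueError("BRR needs at least one stratum")
    return 4 * (n_strata // 4 + 1)


for h in (3, 4, 7, 8, 30):
    print(h, default_brr_reps(h))
```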
-
JACKKNIFE < (method-options)>
JK < (method-options)>
-
requests variance estimation by the delete-1 jackknife method. For more information, see the section The Jackknife Method. If you use a REPWEIGHTS
statement to provide replicate weights, VARMETHOD=JACKKNIFE is the default variance estimation method.
The delete-1 jackknife method requires at least two primary sampling units (PSUs) in each stratum for stratified designs unless
you use a REPWEIGHTS
statement to provide replicate weights.
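A common formulation of the delete-1 jackknife for stratified cluster samples creates one replicate per PSU. The deleted PSU gets weight 0, the other PSUs in its stratum h are reweighted by n_h/(n_h - 1), and all other strata keep their original weights. The replicate's jackknife coefficient is (n_h - 1)/n_h. The Python sketch below follows that formulation. `jackknife_replicates` is a hypothetical helper, and its exact agreement with PROC SURVEYFREQ's replicate construction is an assumption. Note that it enforces the two-PSUs-per-stratum requirement stated above.

```python
def jackknife_replicates(obs):
    """Delete-1 jackknife replicate weights and coefficients (hypothetical helper).

    obs: list of (stratum, psu, weight) tuples.
    Returns one (weights, coefficient) pair per PSU, in stratum order.
    """
    psus_by_stratum = {}
    for stratum, psu, _ in obs:
        psus = psus_by_stratum.setdefault(stratum, [])
        if psu not in psus:
            psus.append(psu)
    replicates = []
    for h, psus in psus_by_stratum.items():
        n_h = len(psus)
        if n_h < 2:
            raise ValueError(f"stratum {h} has fewer than two PSUs")
        for deleted in psus:
            weights = []
            for stratum, psu, w in obs:
                if stratum != h:
                    weights.append(w)  # other strata are unchanged
                elif psu == deleted:
                    weights.append(0.0)  # the deleted PSU drops out
                else:
                    weights.append(w * n_h / (n_h - 1))  # reweight the rest of stratum h
            replicates.append((weights, (n_h - 1) / n_h))
    return replicates


obs = [("A", 1, 2.0), ("A", 2, 2.0), ("A", 3, 2.0), ("B", 4, 5.0), ("B", 5, 5.0)]
for weights, coef in jackknife_replicates(obs):
    print(weights, round(coef, 3))
```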
You can specify the following method-options:
-
DFADJ
-
computes the degrees of freedom by using the number of nonmissing strata and clusters for the individual table request. If
you specify this method-option, PROC SURVEYFREQ does not count any empty strata or clusters that occur when observations that have missing values of the
TABLES
variables are removed from the analysis of the table. By default, PROC SURVEYFREQ computes the degrees of freedom by counting
the number of nonmissing strata and clusters for all valid observations in the input data set. The degrees of freedom for
VARMETHOD=JACKKNIFE equal the number of clusters minus the number of strata.
For more information, see the section Degrees of Freedom. For information about valid observations, see the section Data Summary Table.
This method-option has no effect when you specify the MISSING
option, which treats missing values as a valid nonmissing level.
This method-option is not used when you specify the degrees of freedom in the DF=
option in the TABLES statement or when you specify a REPWEIGHTS
statement to provide replicate weights. When you specify a REPWEIGHTS
statement, the degrees of freedom are the number of REPWEIGHTS variables (replicates) unless you specify the DF=
option in the REPWEIGHTS or the TABLES statement.
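The jackknife rule (number of clusters minus number of strata), with and without DFADJ, can be sketched in Python. `jackknife_df` is a hypothetical helper. It assumes its rows are already valid observations and identifies a cluster by its (stratum, PSU) pair, which is an assumption about how cluster identifiers nest within strata.

```python
def jackknife_df(rows, table_vars, dfadj=False):
    """Jackknife degrees of freedom: clusters minus strata (hypothetical helper).

    rows: dicts with 'stratum', 'psu', and the TABLES variables; assumed to be
          valid observations already.
    dfadj: True drops rows with missing TABLES values before counting.
    """
    if dfadj:
        rows = [r for r in rows if all(r[v] is not None for v in table_vars)]
    strata = {r["stratum"] for r in rows}
    # Assumption: PSU identifiers are nested within strata.
    clusters = {(r["stratum"], r["psu"]) for r in rows}
    return len(clusters) - len(strata)


rows = [
    {"stratum": "A", "psu": 1, "x": "a"},
    {"stratum": "A", "psu": 2, "x": None},
    {"stratum": "A", "psu": 3, "x": "b"},
    {"stratum": "B", "psu": 1, "x": "a"},
    {"stratum": "B", "psu": 2, "x": None},
]
print(jackknife_df(rows, ["x"]), jackknife_df(rows, ["x"], dfadj=True))
```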
-
OUTJKCOEFS=SAS-data-set
-
names a SAS-data-set to store the jackknife coefficients. For information about jackknife coefficients, see the section The Jackknife Method. For information about the contents of the OUTJKCOEFS= data set, see the section Jackknife Coefficient Output Data Set.
-
OUTWEIGHTS=SAS-data-set
-
names a SAS-data-set to store the replicate weights that PROC SURVEYFREQ creates for jackknife variance estimation. For information about replicate
weights, see the section The Jackknife Method. For information about the contents of the OUTWEIGHTS= data set, see the section Replicate Weight Output Data Set.
This method-option is not available when you use a REPWEIGHTS
statement to provide replicate weights.
-
TAYLOR
-
requests Taylor series variance estimation. This is the default method if you do not specify the VARMETHOD= option or a REPWEIGHTS
statement. For more information, see the section Taylor Series Variance Estimation.