PROC SURVEYFREQ <options> ;
The PROC SURVEYFREQ statement invokes the SURVEYFREQ procedure. It also identifies the data set to be analyzed, specifies the variance estimation method to use, and provides sample design information. The DATA= option names the input data set to be analyzed. The VARMETHOD= option specifies the variance estimation method, which is the Taylor series method by default. For Taylor series variance estimation, you can include a finite population correction factor in the analysis by providing either the sampling rate or population total with the RATE= or TOTAL= option. If your design is stratified with different sampling rates or totals for different strata, you can input these stratum rates or totals in a SAS data set that contains the stratification variables.
Table 90.1 summarizes the options available in the PROC SURVEYFREQ statement.
Table 90.1: PROC SURVEYFREQ Statement Options
Option | Description |
---|---|
DATA= | Names the input SAS data set |
MISSING | Treats missing values as a valid level |
NOMCAR | Treats missing values as not missing completely at random |
NOSUMMARY | Suppresses the display of the “Data Summary” table |
ORDER= | Specifies the order of variable levels |
PAGE | Displays only one table per page |
RATE= | Specifies the first-stage sampling rate |
TOTAL= | Specifies the total number of primary sampling units |
VARHEADER= | Specifies the variable identification to display |
VARMETHOD= | Specifies the variance estimation method |
You can specify the following options in the PROC SURVEYFREQ statement:
DATA=SAS-data-set
names the SAS data set to be analyzed by PROC SURVEYFREQ. If you omit the DATA= option, the procedure uses the most recently created SAS data set.
MISSING
treats missing values as a valid (nonmissing) category for all categorical variables, which include TABLES, STRATA, and CLUSTER variables.
By default, if you do not specify the MISSING option, an observation is excluded from the analysis if it has a missing value for any STRATA or CLUSTER variable. Additionally, PROC SURVEYFREQ excludes an observation from a frequency or crosstabulation table if that observation has a missing value for any of the variables in the table request, unless you specify the MISSING option. For more information, see the section Missing Values.
NOMCAR
includes observations with missing values of TABLES variables in the variance computation as not missing completely at random (NOMCAR) for Taylor series variance estimation. When you specify the NOMCAR option, PROC SURVEYFREQ computes variance estimates by analyzing the nonmissing values as a domain (subpopulation), where the entire population includes both nonmissing and missing domains. See the section Missing Values for details.
By default, PROC SURVEYFREQ completely excludes an observation from a frequency or crosstabulation table (and the corresponding variance computations) if that observation has a missing value for any of the variables in the table request, unless you specify the MISSING option. Note that the NOMCAR option has no effect when you specify the MISSING option, which treats missing values as a valid nonmissing level.
The NOMCAR option applies only to Taylor series variance estimation. The replication methods, which you request with the VARMETHOD=BRR and VARMETHOD=JACKKNIFE options, do not use the NOMCAR option.
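The difference between MISSING and NOMCAR can be sketched with two otherwise identical steps. The data set `MySurvey` and the variables `Region`, `School`, `SamplingWeight`, and `Response` are hypothetical names chosen for illustration; they are not part of the procedure.

```sas
/* Hypothetical data set and variable names, for illustration only. */

/* MISSING: a missing Response value becomes its own table level. */
proc surveyfreq data=MySurvey missing;
   strata Region;
   cluster School;
   weight SamplingWeight;
   tables Response;
run;

/* NOMCAR: missing Response values stay out of the table, but the      */
/* Taylor series variance treats the nonmissing values as a domain of  */
/* the full population instead of dropping the observations entirely.  */
proc surveyfreq data=MySurvey nomcar;
   strata Region;
   cluster School;
   weight SamplingWeight;
   tables Response;
run;
```

Specifying both options in one step would make NOMCAR redundant, because MISSING already treats missing values as a valid level.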
NOSUMMARY
suppresses the display of the “Data Summary” table, which PROC SURVEYFREQ produces by default. For details about this table, see the section Data Summary Table.
ORDER=DATA | FORMATTED | FREQ | INTERNAL
specifies the order of the variable levels in the frequency and crosstabulation tables, which you request in the TABLES statement. The ORDER= option also controls the order of the STRATA variable levels in the “Stratum Information” table.
The ORDER= option can take the following values:
ORDER= | Levels Ordered By |
---|---|
DATA | Order of appearance in the input data set |
FORMATTED | External formatted value, except for numeric variables with no explicit format, which are sorted by their unformatted (internal) value |
FREQ | Descending frequency count; levels with the most observations come first in the order |
INTERNAL | Unformatted value |
By default, ORDER=INTERNAL. The FORMATTED and INTERNAL orders are machine-dependent. Note that the frequency count used by ORDER=FREQ is the nonweighted frequency (sample size), rather than the weighted frequency.
For more information about sort order, see the chapter on the SORT procedure in the Base SAS Procedures Guide and the discussion of BY-group processing in SAS Language Reference: Concepts.
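As a minimal sketch, the following step lists the levels of a table variable from most to least frequent. The names `MySurvey`, `SamplingWeight`, and `Response` are hypothetical.

```sas
/* Hypothetical data set and variable names, for illustration only. */
/* ORDER=FREQ sorts levels by descending unweighted frequency       */
/* (sample size), even though a WEIGHT statement is present.        */
proc surveyfreq data=MySurvey order=freq;
   weight SamplingWeight;
   tables Response;
run;
```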
PAGE
displays only one table per page. Otherwise, PROC SURVEYFREQ displays multiple tables per page as space permits.
RATE=value | SAS-data-set
specifies the sampling rate as a nonnegative value, or identifies an input data set that provides the stratum sampling rates in a variable named _RATE_. PROC SURVEYFREQ uses this information to compute a finite population correction for Taylor series variance estimation. The procedure does not use the RATE= option for BRR or jackknife variance estimation, which you request with the VARMETHOD=BRR or VARMETHOD=JACKKNIFE option.
If your sample design has multiple stages, you should specify the first-stage sampling rate, which is the ratio of the number of primary sampling units (PSUs) that are selected to the total number of PSUs in the population.
For a nonstratified sample design, or for a stratified sample design with the same sampling rate in all strata, you should specify a nonnegative value for the RATE= option. If your design is stratified with different sampling rates in different strata, then you should name a SAS data set that contains the stratification variables and the stratum sampling rates. See the section Population Totals and Sampling Rates for details.
The sampling rate value must be a nonnegative number. You can specify value as a proportion between 0 and 1. Alternatively, you can specify value in percentage form as a number between 1 and 100, and PROC SURVEYFREQ converts that number to a proportion. The procedure treats the value 1 as 100% instead of 1%.
If you do not specify the RATE= or TOTAL= option, then the Taylor series variance estimation does not include a finite population correction. You cannot specify both the TOTAL= option and the RATE= option in the same PROC SURVEYFREQ statement.
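The two forms of the RATE= option might look as follows. This is a sketch under assumed names: `MySurvey`, `StratumRates`, `Region`, `School`, `SamplingWeight`, and `Response` are hypothetical, and the rates are made-up values. The stratification variable in the rates data set is assumed to match the STRATA variable in the analysis data set in name, type, and values.

```sas
/* Hypothetical names and values, for illustration only. */

/* Same first-stage sampling rate (5%) in every stratum.           */
/* By the percentage rule above, RATE=5 would request the same 5%. */
proc surveyfreq data=MySurvey rate=0.05;
   strata Region;
   cluster School;
   weight SamplingWeight;
   tables Response;
run;

/* Different rates by stratum: one observation per stratum, */
/* with the rate stored in the variable _RATE_.             */
data StratumRates;
   input Region $ _RATE_;
   datalines;
East 0.10
West 0.25
;

proc surveyfreq data=MySurvey rate=StratumRates;
   strata Region;
   cluster School;
   weight SamplingWeight;
   tables Response;
run;
```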
TOTAL=value | SAS-data-set
specifies the total number of primary sampling units (PSUs) in the study population as a positive value, or identifies an input data set that provides the stratum population totals in a variable named _TOTAL_. PROC SURVEYFREQ uses this information to compute a finite population correction for Taylor series variance estimation. The procedure does not use the TOTAL= option for BRR or jackknife variance estimation, which you request with the VARMETHOD=BRR or VARMETHOD=JACKKNIFE option.
For a nonstratified sample design, or for a stratified sample design with the same population total in all strata, you should specify a positive value for the TOTAL= option. If your sample design is stratified with different population totals in different strata, then you should name a SAS data set that contains the stratification variables and the stratum totals. See the section Population Totals and Sampling Rates for details.
If you do not specify the TOTAL= or RATE= option, then the Taylor series variance estimation does not include a finite population correction. You cannot specify both the TOTAL= option and the RATE= option in the same PROC SURVEYFREQ statement.
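A stratified design with different PSU totals per stratum could be specified along these lines. The data set names `MySurvey` and `StratumTotals`, the variable names, and the totals are hypothetical; the only name that the procedure requires is `_TOTAL_`.

```sas
/* Hypothetical names and values, for illustration only. */
/* One observation per stratum; _TOTAL_ holds the number */
/* of PSUs in that stratum of the study population.      */
data StratumTotals;
   input Region $ _TOTAL_;
   datalines;
East 120
West 80
;

proc surveyfreq data=MySurvey total=StratumTotals;
   strata Region;
   cluster School;
   weight SamplingWeight;
   tables Response;
run;
```

For a nonstratified design, or when every stratum has the same total, a single number such as `total=200` would take the place of the data set name.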
VARHEADER=LABEL | NAME | NAMELABEL
specifies the variable identification to use in the displayed output. By default, VARHEADER=NAME, which displays variable names in the output. The VARHEADER= option affects the headers of the variable level columns in one-way frequency tables, crosstabulation tables, and the “Stratum Information” table. The VARHEADER= option also controls variable identification in the table headers.
The VARHEADER= option can take the following values:
VARHEADER= | Variable Identification Displayed |
---|---|
LABEL | Variable label |
NAME | Variable name |
NAMELABEL | Variable name and label, as Name (Label) |
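For instance, the following sketch requests both the name and the label in the output headers. The data set, variables, and label text are hypothetical, and the LABEL statement is included only so that the example has a label to display.

```sas
/* Hypothetical names and label text, for illustration only. */
proc surveyfreq data=MySurvey varheader=namelabel;
   weight SamplingWeight;
   tables Response;
   label Response = 'Overall satisfaction';
run;
```

With these assumed names, the table header would identify the variable in the form Name (Label), as described in the table above.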
VARMETHOD=BRR <(method-options)> | JACKKNIFE <(method-options)> | TAYLOR
specifies the variance estimation method. VARMETHOD=TAYLOR requests the Taylor series method, which is the default if you do not specify the VARMETHOD= option or the REPWEIGHTS statement. VARMETHOD=BRR requests variance estimation by balanced repeated replication (BRR), and VARMETHOD=JACKKNIFE requests variance estimation by the delete-1 jackknife method.
For VARMETHOD=BRR and VARMETHOD=JACKKNIFE, you can specify method-options in parentheses after the variance method name. Table 90.2 summarizes the available method-options.
Table 90.2: Variance Estimation Options
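VARMETHOD= | Variance Estimation Method | Method Options |
---|---|---|
BRR | Balanced repeated replication | DFADJ, FAY <=value>, HADAMARD=SAS-data-set, OUTWEIGHTS=SAS-data-set, PRINTH, REPS=n |
JACKKNIFE | Jackknife | DFADJ, OUTJKCOEFS=SAS-data-set, OUTWEIGHTS=SAS-data-set |
TAYLOR | Taylor series linearization | None |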
Method-options must be enclosed in parentheses after the variance method name. For example:
varmethod=BRR(reps=60 outweights=myReplicateWeights)
The following values are available for the VARMETHOD= option:
BRR <(method-options)>
requests variance estimation by balanced repeated replication (BRR). The BRR method requires a stratified sample design with two primary sampling units (PSUs) in each stratum. If you specify the VARMETHOD=BRR option, you must also specify a STRATA statement unless you provide replicate weights with a REPWEIGHTS statement. See the section Balanced Repeated Replication (BRR) for details.
You can specify the following method-options in parentheses after VARMETHOD=BRR:
DFADJ
computes the degrees of freedom as the number of nonmissing strata for the individual table request. The degrees of freedom for VARMETHOD=BRR equal the number of strata, which by default is based on all valid observations in the data set. But if you specify the DFADJ method-option, PROC SURVEYFREQ does not count any empty strata that occur when observations with missing values of the TABLES variables are removed from the analysis of that table.
See the section Degrees of Freedom for more information. See the section Data Summary Table for details about valid observations.
The DFADJ method-option has no effect when you specify the MISSING option, which treats missing values as a valid nonmissing level. The DFADJ method-option is not used when you specify the degrees of freedom in the DF= option in the TABLES statement.
The DFADJ method-option cannot be used when you provide replicate weights with a REPWEIGHTS statement. When you use a REPWEIGHTS statement, the degrees of freedom equal the number of REPWEIGHTS variables (or replicates), unless you specify an alternative value in the DF= option in the REPWEIGHTS or TABLES statement.
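A minimal sketch of requesting the adjusted degrees of freedom with BRR follows. The data set and variable names are hypothetical, and the design is assumed to have exactly two PSUs (`School`) in each stratum (`Region`), as BRR requires.

```sas
/* Hypothetical names, for illustration only. */
/* DFADJ: strata that become empty after observations with     */
/* missing Response values are removed are not counted in the  */
/* degrees of freedom for this table.                          */
proc surveyfreq data=MySurvey varmethod=brr(dfadj);
   strata Region;
   cluster School;
   weight SamplingWeight;
   tables Response;
run;
```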
FAY <=value>
requests Fay’s method, which is a modification of the BRR method. See the section Fay’s BRR Method for details.
You can specify the value of the Fay coefficient, which is used in converting the original sampling weights to replicate weights. The Fay coefficient must be a nonnegative number less than 1. By default, the value of the Fay coefficient equals 0.5.
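As an illustration, the following step requests Fay's method with a coefficient of 0.3 instead of the default 0.5. The data set and variable names are hypothetical, and the design is assumed to satisfy the BRR requirement of two PSUs per stratum.

```sas
/* Hypothetical names, for illustration only. */
/* FAY=0.3 sets the Fay coefficient; FAY alone would use 0.5. */
proc surveyfreq data=MySurvey varmethod=brr(fay=0.3);
   strata Region;
   cluster School;
   weight SamplingWeight;
   tables Response;
run;
```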
HADAMARD=SAS-data-set
names a SAS data set that contains the Hadamard matrix for BRR replicate construction. If you do not provide a Hadamard matrix with the HADAMARD= method-option, PROC SURVEYFREQ generates an appropriate Hadamard matrix for replicate construction. See the sections Balanced Repeated Replication (BRR) and Hadamard Matrix for details.
If a Hadamard matrix of a given dimension exists, it is not necessarily unique. Therefore, if you want to use a specific Hadamard matrix, you must provide the matrix as a SAS data set in the HADAMARD=SAS-data-set method-option.
In the HADAMARD= input data set, each variable corresponds to a column of the Hadamard matrix, and each observation corresponds to a row of the matrix. You can use any variable names in the HADAMARD= data set. All values in the data set must equal either 1 or –1. You must ensure that the matrix you provide is indeed a Hadamard matrix; that is, $\mathbf{A}'\mathbf{A} = R\,\mathbf{I}$, where $\mathbf{A}$ is the Hadamard matrix of dimension R and $\mathbf{I}$ is an identity matrix. PROC SURVEYFREQ does not check the validity of the Hadamard matrix that you provide.
The HADAMARD= input data set must contain at least H variables, where H denotes the number of first-stage strata in your design. If the data set contains more than H variables, PROC SURVEYFREQ uses only the first H variables. Similarly, the HADAMARD= input data set must contain at least H observations.
If you do not specify the REPS= method-option, then the number of replicates is taken to be the number of observations in the HADAMARD= input data set. If you specify the number of replicates—for example, REPS=nreps—then the first nreps observations in the HADAMARD= data set are used to construct the replicates.
You can specify the PRINTH method-option to display the Hadamard matrix that the procedure uses to construct replicates for BRR.
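The following sketch supplies a 4-by-4 Hadamard matrix and asks the procedure to display it. The names `MyHadamard`, `MySurvey`, and the variables are hypothetical, and the design is assumed to have at most four strata with two PSUs each. Because the procedure does not validate the matrix, the sketch first checks the defining property with PROC IML, which assumes that SAS/IML is available at your site.

```sas
/* Hypothetical names, for illustration only. */
/* Each variable is a column and each observation a row of the matrix. */
data MyHadamard;
   input c1 c2 c3 c4;
   datalines;
1  1  1  1
1 -1  1 -1
1  1 -1 -1
1 -1 -1  1
;

/* Optional check (assumes SAS/IML): A'A must equal R times the identity. */
proc iml;
   use MyHadamard;
   read all var _NUM_ into A;
   close MyHadamard;
   R = nrow(A);
   isHadamard = all(t(A)*A = R*I(R));   /* 1 if valid, 0 otherwise */
   print isHadamard;
quit;

/* Use the matrix for BRR and display the rows and columns actually used. */
proc surveyfreq data=MySurvey varmethod=brr(hadamard=MyHadamard printh);
   strata Region;
   cluster School;
   weight SamplingWeight;
   tables Response;
run;
```

No REPS= value is given here, so the number of replicates is taken from the number of observations in `MyHadamard`.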
OUTWEIGHTS=SAS-data-set
names a SAS data set to store the replicate weights that PROC SURVEYFREQ creates for BRR variance estimation. See the section Balanced Repeated Replication (BRR) for information about replicate weights. See the section Replicate Weight Output Data Set for details about the contents of the OUTWEIGHTS= data set.
The OUTWEIGHTS= method-option is not available when you provide replicate weights with a REPWEIGHTS statement.
PRINTH
displays the Hadamard matrix used to construct replicates for BRR. When you provide the Hadamard matrix in the HADAMARD= method-option, PROC SURVEYFREQ displays only the rows and columns that are actually used to construct replicates. See the sections Balanced Repeated Replication (BRR) and Hadamard Matrix for more information.
The PRINTH method-option is not available when you provide replicate weights with a REPWEIGHTS statement because the procedure does not use a Hadamard matrix in this case.
REPS=n
specifies the number of replicates for BRR variance estimation. The value of n must be an integer greater than 1.
If you do not provide a Hadamard matrix with the HADAMARD= method-option, the number of replicates should be greater than the number of strata and should be a multiple of 4. See the section Balanced Repeated Replication (BRR) for more information. If a Hadamard matrix cannot be constructed for the REPS= value that you specify, the value is increased until a Hadamard matrix of that dimension can be constructed. Therefore, it is possible for the actual number of replicates used to be larger than the REPS= value that you specify.
If you provide a Hadamard matrix with the HADAMARD= method-option, the value of REPS= must not be less than the number of rows in the Hadamard matrix. If you provide a Hadamard matrix and do not specify the REPS= method-option, the number of replicates equals the number of rows in the Hadamard matrix.
If you do not specify the REPS= or HADAMARD= method-option and do not include a REPWEIGHTS statement, the number of replicates equals the smallest multiple of 4 that is greater than the number of strata.
If you provide replicate weights with a REPWEIGHTS statement, the procedure does not use the REPS= method-option. With a REPWEIGHTS statement, the number of replicates equals the number of REPWEIGHTS variables.
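To make the replicate-count rules concrete, consider a hypothetical design with 10 strata. With no REPS= or HADAMARD= method-option, the rule above gives 12 replicates, the smallest multiple of 4 that is greater than 10. The sketch below instead requests 16 replicates, which is also greater than the number of strata and a multiple of 4. All data set and variable names are assumptions made for illustration.

```sas
/* Hypothetical names; the design is assumed to have 10 strata */
/* with two PSUs each.                                         */
proc surveyfreq data=MySurvey
      varmethod=brr(reps=16 outweights=MyRepWeights);
   strata Region;
   cluster School;
   weight SamplingWeight;
   tables Response;
run;
```

The OUTWEIGHTS= method-option saves the generated replicate weights in `MyRepWeights` so that they can be inspected afterward.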
JACKKNIFE <(method-options)>
requests variance estimation by the delete-1 jackknife method. See the section The Jackknife Method for details. If you provide replicate weights with a REPWEIGHTS statement, VARMETHOD=JACKKNIFE is the default variance estimation method.
The jackknife method requires at least two primary sampling units (PSUs) in each stratum for stratified designs unless you provide replicate weights with a REPWEIGHTS statement.
You can specify the following method-options in parentheses after VARMETHOD=JACKKNIFE:
DFADJ
computes the degrees of freedom by using the number of nonmissing strata and clusters for the individual table request. The degrees of freedom for VARMETHOD=JACKKNIFE equal the number of clusters minus the number of strata, which by default is based on all valid observations in the data set. But if you specify the DFADJ method-option, PROC SURVEYFREQ does not count any empty strata or clusters that occur when observations with missing values of the TABLES variables are removed from the analysis of that table.
See the section Degrees of Freedom for more information. See the section Data Summary Table for details about valid observations.
The DFADJ method-option has no effect when you specify the MISSING option, which treats missing values as a valid nonmissing level. The DFADJ method-option is not used when you specify the degrees of freedom in the DF= option in the TABLES statement.
The DFADJ method-option cannot be used when you provide replicate weights with a REPWEIGHTS statement. When you include a REPWEIGHTS statement, the degrees of freedom equal the number of REPWEIGHTS variables (or replicates), unless you specify an alternative value in the DF= option in the REPWEIGHTS or TABLES statement.
OUTJKCOEFS=SAS-data-set
names a SAS data set to store the jackknife coefficients. See the section The Jackknife Method for information about jackknife coefficients. See the section Jackknife Coefficient Output Data Set for details about the contents of the OUTJKCOEFS= data set.
OUTWEIGHTS=SAS-data-set
names a SAS data set to store the replicate weights that PROC SURVEYFREQ creates for jackknife variance estimation. See the section The Jackknife Method for information about replicate weights. See the section Replicate Weight Output Data Set for details about the contents of the OUTWEIGHTS= data set.
The OUTWEIGHTS= method-option is not available when you provide replicate weights with a REPWEIGHTS statement.
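A jackknife analysis that saves both the replicate weights and the jackknife coefficients might be written as follows. The names `MySurvey`, `MyJKWeights`, `MyJKCoefs`, and the variables are hypothetical, and each stratum is assumed to contain at least two PSUs.

```sas
/* Hypothetical names, for illustration only. */
proc surveyfreq data=MySurvey
      varmethod=jackknife(outweights=MyJKWeights outjkcoefs=MyJKCoefs);
   strata Region;
   cluster School;
   weight SamplingWeight;
   tables Response;
run;
```

Saving both data sets is a design choice that lets a later analysis supply the same replicates through a REPWEIGHTS statement rather than regenerating them.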
TAYLOR
requests Taylor series variance estimation. This is the default method if you do not specify the VARMETHOD= option or a REPWEIGHTS statement. See the section Taylor Series Variance Estimation for details.