PROC FREQ produces two types of output data sets that you can use with other statistical and reporting procedures. You can request these data sets as follows:
Specify the OUT= option in a TABLES statement. This creates an output data set that contains frequency or crosstabulation table counts and percentages.
Specify an OUTPUT statement. This creates an output data set that contains statistics.
PROC FREQ does not display the output data sets. Use PROC PRINT, PROC REPORT, or any other SAS reporting tool to display an output data set.
In addition to these two output data sets, you can create a SAS data set from any piece of PROC FREQ output by using the Output Delivery System. See the section ODS Table Names for more information.
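As a minimal sketch of the ODS route, the following step captures two displayed tables as data sets. The data set names (work.Sample, work.CellStats, work.ChiSqStats) and the sample values are assumptions made for illustration; CrossTabFreqs and ChiSq are the ODS table names for the crosstabulation and chi-square tables.

```sas
/* Hypothetical sample data; names and values are illustrative only */
data work.Sample;
   input A B @@;
   datalines;
1 1  1 2  1 3  2 1  2 2  2 3  1 1  2 3
;
run;

/* Route the crosstabulation and chi-square tables to data sets */
ods output CrossTabFreqs=work.CellStats ChiSq=work.ChiSqStats;
proc freq data=work.Sample;
   tables A*B / chisq;
run;

/* PROC FREQ does not display the captured data sets, so print one */
proc print data=work.ChiSqStats;
run;
```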
The OUT= option in the TABLES statement creates an output data set that contains one observation for each combination of variable values (or table cell) in the last table request. By default, each observation contains the frequency and percentage for the table cell. When the input data set contains missing values, the output data set also contains an observation with the frequency of missing values. The output data set includes the following variables:
BY variables
table request variables, such as A, B, C, and D in the table request A*B*C*D
COUNT, which contains the table cell frequency
PERCENT, which contains the table cell percentage
If you specify the OUTEXPECT option in the TABLES statement for a two-way or multiway table, the output data set also includes expected frequencies. If you specify the OUTPCT option for a two-way or multiway table, the output data set also includes row, column, and table percentages. The additional variables are as follows:
EXPECTED, which contains the expected frequency
PCT_TABL, which contains the percentage of two-way table frequency, for n-way tables where n > 2
PCT_ROW, which contains the percentage of row frequency
PCT_COL, which contains the percentage of column frequency
If you specify the OUTCUM option in the TABLES statement for a one-way table, the output data set also includes cumulative frequencies and cumulative percentages. The additional variables are as follows:
CUM_FREQ, which contains the cumulative frequency
CUM_PCT, which contains the cumulative percentage
The OUTCUM option has no effect for two-way or multiway tables.
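The options above might be combined as in the following sketch. Because each OUT= data set reflects only the last table request in its TABLES statement, the two-way and one-way requests are placed in separate TABLES statements. The data set names and sample values are assumptions for illustration.

```sas
/* Hypothetical sample data; names and values are illustrative only */
data work.Sample;
   input A B @@;
   datalines;
1 1  1 2  1 3  2 1  2 2  2 3  1 1  2 3
;
run;

proc freq data=work.Sample noprint;
   /* Two-way table: adds EXPECTED, PCT_ROW, and PCT_COL */
   tables A*B / out=work.CellOut outexpect outpct;
   /* One-way table: adds CUM_FREQ and CUM_PCT */
   tables A / out=work.CumOut outcum;
run;

proc print data=work.CellOut;
run;

proc print data=work.CumOut;
run;
```

PCT_TABL does not appear in work.CellOut here because the request is a two-way table rather than an n-way table with n > 2.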
The following PROC FREQ statements create an output data set of frequencies and percentages:
proc freq;
   tables A A*B / out=D;
run;
The output data set D contains frequencies and percentages for the table of A by B, which is the last table request listed in the TABLES statement. If A has two levels (1 and 2), B has three levels (1, 2, and 3), and no table cell count is zero or missing, then the output data set D includes six observations, one for each combination of A and B levels. The first observation corresponds to A=1 and B=1; the second observation corresponds to A=1 and B=2; and so on. The data set includes the variables COUNT and PERCENT. The value of COUNT is the number of observations with the given combination of A and B levels. The value of PERCENT is the percentage of the total number of observations with that A and B combination.
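To make the layout of D concrete, here is a small sketch with made-up input. The input data set name work.Sample and its eight observations are assumptions; every combination of A and B occurs at least once, so D should have six observations.

```sas
/* Hypothetical input: 8 observations, A has 2 levels, B has 3 levels */
data work.Sample;
   input A B @@;
   datalines;
1 1  1 2  1 3  2 1  2 2  2 3  1 1  2 3
;
run;

proc freq data=work.Sample noprint;
   /* OUT= stores only the last request, A*B */
   tables A A*B / out=D;
run;

/* The A=1, B=1 cell occurs twice in the input, so its COUNT is 2
   and its PERCENT is 25 (2 of 8 observations) */
proc print data=D;
run;
```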
When PROC FREQ combines different variable values into the same formatted level, the output data set contains the smallest internal value for the formatted level. For example, suppose a variable X has the values 1.1, 1.4, 1.7, 2.1, and 2.3. When you submit the statement
format X 1.;
in a PROC FREQ step, the formatted levels listed in the frequency table for X are 1 and 2. If you create an output data set with the frequency counts, the internal values of the levels of X are 1.1 and 1.7. To report the internal values of X when you display the output data set, use a format of 3.1 for X.
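The formatted-level behavior described above can be tried with a sketch like the following. The data set names work.Levels and work.XOut are assumptions for illustration; the values of X are the ones used in the text.

```sas
/* The five values of X from the example in the text */
data work.Levels;
   input X @@;
   datalines;
1.1 1.4 1.7 2.1 2.3
;
run;

proc freq data=work.Levels noprint;
   /* Values are grouped into the formatted levels 1 and 2 */
   format X 1.;
   tables X / out=work.XOut;
run;

/* Display the stored internal values (1.1 and 1.7) instead of the
   formatted levels by overriding the format when printing */
proc print data=work.XOut;
   format X 3.1;
run;
```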
The OUTPUT statement creates a SAS data set that contains statistics computed by PROC FREQ. Table 40.7 lists the statistics that can be stored in the output data set. You identify which statistics to include by specifying output-options. See the description of the OUTPUT statement for details.
If you specify multiple TABLES statements or multiple table requests in a single TABLES statement, the contents of the output data set correspond to the last table request.
For a one-way table or a two-way table, the output data set contains one observation that stores the requested statistics for the table. For a multiway table, the output data set contains an observation for each two-way table (stratum) of the multiway crosstabulation. If you request summary statistics for the multiway table, the output data set also contains an observation that stores the across-strata summary statistics. If you use a BY statement, the output data set contains an observation (for one-way or two-way tables) or set of observations (for multiway tables) for each BY group.
The OUTPUT data set can include the following variables:
BY variables
Variables that identify the stratum for multiway tables, such as A and B in the table request A*B*C*D
Variables that contain the specified statistics
In addition to the specified estimate or test statistic, the output data set includes associated values such as standard errors, confidence limits, p-values, and degrees of freedom.
PROC FREQ constructs variable names for the statistics in the output data set by enclosing the output-option names in underscores. Variable names for the corresponding standard errors, confidence limits, p-values, and degrees of freedom are formed by combining the output-option names with prefixes that identify the associated values. Table 40.20 lists the prefixes and their descriptions.
Table 40.20: Output Data Set Variable Name Prefixes
Prefix | Description |
---|---|
E_ | Asymptotic standard error (ASE) |
L_ | Lower confidence limit |
U_ | Upper confidence limit |
E0_ | Null hypothesis ASE |
Z_ | Standardized value |
DF_ | Degrees of freedom |
P_ | p-value |
P2_ | Two-sided p-value |
PL_ | Left-sided p-value |
PR_ | Right-sided p-value |
XP_ | Exact p-value |
XP2_ | Exact two-sided p-value |
XPL_ | Exact left-sided p-value |
XPR_ | Exact right-sided p-value |
XPT_ | Exact point probability |
XMP_ | Exact mid p-value |
XL_ | Exact lower confidence limit |
XU_ | Exact upper confidence limit |
For example, the PCHI output-option in the OUTPUT statement includes the Pearson chi-square test in the output data set. The variable names for the Pearson chi-square statistic, its degrees of freedom, and the corresponding p-value are _PCHI_, DF_PCHI, and P_PCHI, respectively. For variables that were added to the output data set before SAS/STAT 8.2, PROC FREQ truncates the variable name to eight characters when the length of the prefix plus the output-option name exceeds eight characters.
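The PCHI naming can be checked with a sketch such as the one below. The data set names work.Sample and work.ChiSqOut and the sample values are assumptions for illustration. The CHISQ option is included in the TABLES statement so that the chi-square statistics are computed for the requested table.

```sas
/* Hypothetical sample data; names and values are illustrative only */
data work.Sample;
   input A B @@;
   datalines;
1 1  1 2  1 3  2 1  2 2  2 3  1 1  2 3
;
run;

proc freq data=work.Sample noprint;
   tables A*B / chisq;
   /* Store the Pearson chi-square test in the output data set */
   output out=work.ChiSqOut pchi;
run;

/* Statistic, degrees of freedom, and p-value for the two-way table */
proc print data=work.ChiSqOut;
   var _PCHI_ DF_PCHI P_PCHI;
run;
```

For a two-way request like this one, work.ChiSqOut holds a single observation, as described above for one-way and two-way tables.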