The UNIVARIATE Procedure

OUT= Output Data Set in the OUTPUT Statement

PROC UNIVARIATE creates an OUT= data set for each OUTPUT statement. This data set contains an observation for each combination of levels of the variables in the BY statement, or a single observation if you do not specify a BY statement. Thus the number of observations in the new data set corresponds to the number of groups for which statistics are calculated. Without a BY statement, the procedure computes statistics and percentiles by using all the observations in the input data set. With a BY statement, the procedure computes statistics and percentiles by using the observations within each BY group.
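As a minimal sketch of this behavior, the following statements request the same statistics with and without a BY statement. The input data set Sashelp.Class and the names Overall, ClassSorted, BySex, and the output variable names are illustrative choices, not part of the procedure syntax.

```sas
/* Without a BY statement: OUT= contains a single observation. */
proc univariate data=sashelp.class noprint;
   var height weight;
   /* Names after each keyword are matched, in order, to the VAR variables. */
   output out=Overall n=n_height n_weight
                      mean=mean_height mean_weight;
run;

/* BY-group processing expects the input to be sorted by the BY variable. */
proc sort data=sashelp.class out=ClassSorted;
   by sex;
run;

/* With a BY statement: OUT= contains one observation per BY group. */
proc univariate data=ClassSorted noprint;
   by sex;
   var height weight;
   output out=BySex n=n_height n_weight
                    mean=mean_height mean_weight;
run;

proc print data=Overall; run;
proc print data=BySex;   run;
```

In this sketch, BySex should contain the BY variable Sex in addition to the requested statistics, with one observation for each value of Sex.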

The variables in the OUT= data set are as follows:

BY statement variables. The values of these variables match the values in the corresponding BY group in the DATA= data set and indicate which BY group each observation summarizes.
variables created by selecting statistics in the OUTPUT statement. The statistics are computed using all the nonmissing data, or they are computed for each BY group if you use a BY statement.
variables created by requesting new percentiles with the PCTLPTS= option. The names of these new variables depend on the values of the PCTLPRE= and PCTLNAME= options.
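The naming of the percentile variables can be sketched as follows. The data set Sashelp.Class, the prefix ht_, and the suffixes lo, mid, and hi are illustrative assumptions chosen for this example.

```sas
proc univariate data=sashelp.class noprint;
   var height;
   output out=Pctls
          pctlpts  = 2.5 50 97.5   /* percentiles to compute             */
          pctlpre  = ht_           /* prefix for the new variable names  */
          pctlname = lo mid hi;    /* one suffix per PCTLPTS= value      */
run;

proc print data=Pctls; run;
```

With these options the new variables should be named ht_lo, ht_mid, and ht_hi. If you omit PCTLNAME=, the suffixes are taken from the PCTLPTS= values, with the decimal point replaced by an underscore (for example, ht_2_5 and ht_97_5).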

If the output data set contains a percentile variable or a quartile variable, the percentile definition assigned with the PCTLDEF= option in the PROC UNIVARIATE statement is recorded in the output data set label. See Example 4.8.

The following table lists variables available in the OUT= data set.

Table 4.36: Variables Available in the OUT= Data Set

Variable Name	Description
Descriptive Statistics
CSS	sum of squares corrected for the mean
CV	percent coefficient of variation
KURTOSIS | KURT	measurement of the heaviness of tails
MAX	largest (maximum) value
MEAN	arithmetic mean
MIN	smallest (minimum) value
MODE	most frequent value (if not unique, the smallest mode)
N	number of observations on which calculations are based
NMISS	number of missing observations
NOBS	total number of observations
RANGE	difference between the maximum and minimum values
SKEWNESS | SKEW	measurement of the tendency of the deviations to be larger in one direction than in the other
STD | STDDEV	standard deviation
STDMEAN | STDERR	standard error of the mean
SUM	sum
SUMWGT	sum of the weights
USS	uncorrected sum of squares
VAR	variance
Quantile Statistics
MEDIAN | Q2 | P50	middle value (50th percentile)
P1	1st percentile
P5	5th percentile
P10	10th percentile
P90	90th percentile
P95	95th percentile
P99	99th percentile
Q1 | P25	lower quartile (25th percentile)
Q3 | P75	upper quartile (75th percentile)
QRANGE	difference between the upper and lower quartiles (also known as the interquartile range)
Robust Statistics
GINI	Gini’s mean difference
MAD	median absolute difference
QN	2nd variation of median absolute difference
SN	1st variation of median absolute difference
STD_GINI	standard deviation for Gini’s mean difference
STD_MAD	standard deviation for median absolute difference
STD_QN	standard deviation for the second variation of the median absolute difference
STD_QRANGE	estimate of the standard deviation, based on interquartile range
STD_SN	standard deviation for the first variation of the median absolute difference
Hypothesis Test Statistics
MSIGN	sign statistic
NORMAL	test statistic for normality. If the sample size is less than or equal to 2000, this is the Shapiro-Wilk $W$ statistic. Otherwise, it is the Kolmogorov $D$ statistic.
PROBM	probability of a greater absolute value for the sign statistic
PROBN	$p$-value for the test of the null hypothesis that the data come from a normal distribution
PROBS	probability of a greater absolute value for the signed rank statistic
PROBT	two-tailed $p$-value for Student’s $t$ statistic with $n-1$ degrees of freedom
SIGNRANK	signed rank statistic
T	Student’s $t$ statistic to test the null hypothesis that the population mean is equal to $\mu_0$
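
The hypothesis test statistics in Table 4.36 can be saved in the same way as the descriptive statistics. The following is a minimal sketch that assumes the Sashelp.Class data set and a hypothesized mean of 60, specified with the MU0= option; the data set name Tests and the output variable names are illustrative.

```sas
/* MU0= sets the null-hypothesis location for the t, sign, and signed  */
/* rank tests. The NORMAL option is included here to request the       */
/* normality tests explicitly; it may not be needed when only the      */
/* OUTPUT statement keywords are used.                                 */
proc univariate data=sashelp.class mu0=60 normal noprint;
   var height;
   output out=Tests
          t        = t_height   probt = pt_height   /* Student's t      */
          msign    = m_height   probm = pm_height   /* sign test        */
          signrank = s_height   probs = ps_height   /* signed rank test */
          normal   = w_height   probn = pw_height;  /* normality test   */
run;

proc print data=Tests; run;
```

Because Sashelp.Class has fewer than 2000 observations, the value saved by NORMAL= in this sketch should be the Shapiro-Wilk $W$ statistic rather than the Kolmogorov $D$ statistic.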