The OUTPUT statement creates a new SAS data set that saves diagnostic measures calculated for the selected model. If you do not specify a keyword, then the only diagnostic included is the predicted response.
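The sketch below shows the statement with and without keywords. It assumes the SASHELP.CLASS sample data set that ships with SAS. The output data set names (work.pred_only, work.diag) and the variable names pWeight and rWeight are arbitrary choices made for illustration. PREDICTED= and RESIDUAL= are two of the OUTPUT statement keywords.

```sas
/* No keywords: the output data set contains the input variables   */
/* plus only the predicted response for the selected model.        */
proc glmselect data=sashelp.class;
   model Weight = Height Age / selection=stepwise;
   output out=work.pred_only;
run;

/* Keywords: name each requested diagnostic explicitly.            */
/* pWeight and rWeight are illustrative names, not defaults.        */
proc glmselect data=sashelp.class;
   model Weight = Height Age / selection=stepwise;
   output out=work.diag predicted=pWeight residual=rWeight;
run;
```

In work.diag, rWeight should equal Weight minus pWeight for every observation, because a residual is the observed response minus the predicted response.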
All the variables in the original data set are included in the new data set, along with variables created in the OUTPUT statement.
These new variables contain the values of a variety of statistics and diagnostic measures that are calculated for each observation in the data set.

If you specify a BY statement, then a variable _BY_ that indexes the BY groups is included. For each observation, the value of _BY_ is the index of the BY group to which that observation belongs. This variable is useful for matching BY groups with the macro variables that PROC GLMSELECT creates. See the section Macro Variables Containing Selected Models for details.
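The following sketch shows BY-group processing. It again assumes SASHELP.CLASS, and the grouping variable Sex and the data set names are illustrative. BY processing requires input sorted by the BY variables, which is why a PROC SORT step comes first. Because F sorts before M, the observations with Sex='F' are expected to form the first BY group.

```sas
/* BY processing requires data sorted by the BY variable(s). */
proc sort data=sashelp.class out=work.class_sorted;
   by Sex;
run;

proc glmselect data=work.class_sorted;
   by Sex;
   model Weight = Height Age / selection=stepwise;
   output out=work.byout predicted=pWeight;
run;

/* Each Sex value should map to exactly one _BY_ index. */
proc freq data=work.byout;
   tables Sex*_BY_ / norow nocol nopercent;
run;
```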
If you have requested n-fold cross validation by specifying CHOOSE=CV, SELECT=CV, or STOP=CV in the MODEL statement, then a variable _CVINDEX_ is included in the output data set. For each observation used for model training, the value of _CVINDEX_ is i if that observation is omitted in forming the ith subset of the training data. See the CVMETHOD= option for additional details. The value of _CVINDEX_ is 0 for all observations in the input data set that are not used for model training.
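The sketch below requests cross validation on SASHELP.CLASS with five random folds. The SEED= option in the PROC GLMSELECT statement makes the random fold assignment reproducible, and the seed value itself is arbitrary. This small data set has no missing values and the example uses no PARTITION statement, so every observation should be used for training. As a result, _CVINDEX_ should take only values from 1 to 5.

```sas
proc glmselect data=sashelp.class seed=12345;
   /* CHOOSE=CV requests cross validation, so _CVINDEX_ is added to OUT= */
   model Weight = Height Age / selection=forward(choose=cv) cvmethod=random(5);
   output out=work.cvout predicted=pWeight;
run;

/* Count how many observations were omitted from each training subset. */
proc freq data=work.cvout;
   tables _CVINDEX_;
run;
```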
If you have partitioned the input data with a PARTITION statement, then a character variable _ROLE_ is included in the output data set. For each observation, the value of _ROLE_ is as follows:
| _ROLE_ | Observation Role |
|---|---|
| TEST | testing |
| TRAIN | training |
| VALIDATE | validation |
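The following sketch uses random partitioning on SASHELP.CLASS. The FRACTION( ) suboptions VALIDATE= and TEST= request that roughly 25% of the observations be assigned to the validation role and roughly 25% to the test role. The remaining observations are used for training. The fractions and the seed are arbitrary. With only 19 observations, the actual number of observations in each role depends on the seed.

```sas
proc glmselect data=sashelp.class seed=12345;
   /* Randomly assign about 25% to validation and 25% to testing. */
   partition fraction(validate=0.25 test=0.25);
   model Weight = Height Age / selection=forward(choose=validate);
   output out=work.partout predicted=pWeight;
run;

/* Tabulate the role assigned to each observation. */
proc freq data=work.partout;
   tables _ROLE_;
run;
```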
If you want to create a SAS data set in a permanent library, you must specify a two-level name. For more information about permanent libraries and SAS data sets, see SAS Language Reference: Concepts.
Details on the specifications in the OUTPUT statement follow.