The main features of the GLMSELECT procedure are as follows:
Model Specification
supports different parameterizations for classification effects
supports any degree of interaction (crossed effects) and nested effects
supports hierarchy among effects
supports partitioning of data into training, validation, and testing roles
supports constructed effects including spline and multimember effects
Selection Control
provides multiple effect selection methods
enables selection from a very large number of effects (tens of thousands)
offers selection of individual levels of classification effects
provides effect selection based on a variety of selection criteria
provides stopping rules based on a variety of model evaluation criteria
provides leave-one-out and k-fold cross validation
supports data resampling and model averaging
Display and Output
produces graphical representations of the selection process
produces output data sets containing predicted values and residuals
produces an output data set containing the design matrix
produces macro variables containing selected models
supports parallel processing of BY groups
supports multiple SCORE statements
The GLMSELECT procedure supports the following effect selection methods. These methods are explained in detail in the section Model-Selection Methods.
Forward selection. This method starts with no effects in the model and adds effects.
Backward elimination. This method starts with all effects in the model and deletes effects.
Stepwise regression. This is similar to the FORWARD method except that effects already in the model do not necessarily stay there.
Least angle regression (LAR). This method, like forward selection, starts with no effects in the model and adds effects. The parameter estimates at any step are “shrunk” when compared to the corresponding least squares estimates.
LASSO. This method adds and deletes parameters based on a version of ordinary least squares where the sum of the absolute regression coefficients is constrained.
Hybrid versions of LAR and LASSO are also supported. They use LAR or LASSO to select the model, but then estimate the regression coefficients by ordinary weighted least squares.
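The idea behind forward selection, as described in the list above, can be illustrated with a minimal sketch. The sketch is written in Python with NumPy and is not PROC GLMSELECT code or its implementation. The function names (`rss`, `bic`, `forward_select`), the simulated data, and the use of a BIC-style criterion both to choose the next effect and to stop are assumptions made for illustration only.

```python
import numpy as np

def rss(X, y, cols):
    """Residual sum of squares of a least squares fit on an intercept plus the given columns."""
    design = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)

def bic(X, y, cols):
    """BIC-style criterion (assumed here): n*log(RSS/n) + p*log(n), with p counting the intercept."""
    n = len(y)
    return n * np.log(rss(X, y, cols) / n) + (len(cols) + 1) * np.log(n)

def forward_select(X, y):
    """Start with no effects; at each step add the candidate that lowers the criterion most."""
    selected = []
    remaining = list(range(X.shape[1]))
    best = bic(X, y, selected)
    while remaining:
        score, j = min((bic(X, y, selected + [j]), j) for j in remaining)
        if score >= best:
            break  # no candidate improves the criterion, so stop
        selected.append(j)
        remaining.remove(j)
        best = score
    return selected

# Simulated data (hypothetical): only columns 1 and 4 truly affect the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 3.0 * X[:, 1] - 2.0 * X[:, 4] + rng.normal(scale=0.5, size=200)

chosen = forward_select(X, y)
print("columns in order of entry:", chosen)
```

Backward elimination would run the same loop in reverse, starting from all columns and dropping one per step. A stepwise variant would add a check after each addition to see whether removing an effect already in the model improves the criterion.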
The GLMSELECT procedure is intended primarily as a model selection procedure and does not include regression diagnostics or other postselection facilities such as hypothesis testing, testing of contrasts, and LS-means analyses. The intention is that you use PROC GLMSELECT to select a model or a set of candidate models. You can then investigate these models further by fitting them in existing regression procedures.