An alternative to stepwise selection of variables is best subset selection. This method uses the branch-and-bound algorithm of Furnival and Wilson (1974) to find a specified number of best models containing one, two, or three variables, and so on, up to the single model containing all of the explanatory variables. The criterion used to determine the “best” subset is based on the global score chi-square statistic. For two models A and B, each having the same number of explanatory variables, model A is considered to be better than model B if the global score chi-square statistic for A exceeds that for B.
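The ranking criterion, the global score chi-square, tests the null hypothesis that every regression coefficient in a candidate model is zero. The following pure-Python sketch computes this statistic for one candidate subset. It assumes Breslow handling of tied event times, which is the PROC PHREG default. The function names `score_chi_square` and `_solve` are hypothetical, and this is not the procedure's own implementation.

```python
from typing import List, Sequence


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("information matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def score_chi_square(time: Sequence[float], event: Sequence[int],
                     x: Sequence[Sequence[float]]) -> float:
    """Global score chi-square for H0: beta = 0 in a Cox model (Breslow ties).

    time  -- follow-up time for each subject
    event -- 1 if the event was observed, 0 if censored
    x     -- one row of covariate values per subject (the candidate subset)
    """
    p = len(x[0])
    u = [0.0] * p                         # score vector evaluated at beta = 0
    info = [[0.0] * p for _ in range(p)]  # information matrix at beta = 0
    event_times = sorted({t for t, d in zip(time, event) if d})
    for t in event_times:
        risk = [x[j] for j in range(len(time)) if time[j] >= t]
        dead = [x[j] for j in range(len(time)) if time[j] == t and event[j]]
        n_risk, d = len(risk), len(dead)
        mean = [sum(r[k] for r in risk) / n_risk for k in range(p)]
        for k in range(p):
            u[k] += sum(r[k] for r in dead) - d * mean[k]
            for l in range(p):
                second = sum(r[k] * r[l] for r in risk) / n_risk
                info[k][l] += d * (second - mean[k] * mean[l])
    v = _solve(info, u)                   # v = I^{-1} U
    return sum(uk * vk for uk, vk in zip(u, v))  # U' I^{-1} U
```

For example, four hypothetical subjects with `time=[1, 2, 3, 4]`, `event=[1, 1, 1, 0]`, and the single covariate `[[1], [0], [1], [0]]` give a score chi-square of 8/13 (about 0.615). Comparing two models of the same size then reduces to comparing the two values this function returns.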
In the following statements, best subset selection is requested by specifying the SELECTION=SCORE option in the MODEL statement. The BEST=3 option requests that the procedure identify only the three best models for each model size; that is, for each number of covariates, PROC PHREG lists the three models with the highest score statistics among all possible models of that size.
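The reporting rule behind BEST=3 can be illustrated with a brute-force sketch that keeps, for each model size, the subsets with the largest score. Unlike the Furnival and Wilson branch-and-bound search, it enumerates and scores every subset, so it only shows which models are reported, not how PROC PHREG finds them. The `best_subsets` function and the toy additive score in the usage lines are hypothetical; in practice `score` would return the global score chi-square of the Cox model fitted to that subset.

```python
import heapq
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple

Subset = Tuple[str, ...]


def best_subsets(names: Sequence[str],
                 score: Callable[[Subset], float],
                 best: int = 3) -> Dict[int, List[Tuple[float, Subset]]]:
    """For each model size, keep the `best` subsets with the largest score.

    Brute-force enumeration of all 2**p - 1 subsets; this is NOT the
    Furnival-Wilson branch-and-bound search, which avoids scoring most subsets.
    """
    result: Dict[int, List[Tuple[float, Subset]]] = {}
    for size in range(1, len(names) + 1):
        scored = ((score(s), s) for s in combinations(names, size))
        # Descending order of score within this size, as in the procedure's listing.
        result[size] = heapq.nlargest(best, scored, key=lambda pair: pair[0])
    return result


if __name__ == "__main__":
    # Hypothetical additive score, used only to exercise the ranking logic.
    weights = {"a": 4, "b": 3, "c": 2, "d": 1}
    ranked = best_subsets(list(weights), lambda s: sum(weights[n] for n in s), best=2)
    for size, models in ranked.items():
        print(size, models)
```

With the nine covariates in this example, brute force means scoring 2^9 - 1 = 511 subsets, which is cheap; a branch-and-bound search becomes important mainly when the number of candidate variables is much larger.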
```sas
proc phreg data=Myeloma;
   model Time*VStatus(0)=LogBUN HGB Platelet Age LogWBC Frac LogPBM Protein SCalc
         / selection=score best=3;
run;
```
Output 67.2.1 displays the results of this analysis. The number of explanatory variables in the model is given in the first column, and the names of the variables are listed on the right. Within each model size, the models are listed in descending order of their score chi-square values. For example, among all models containing two explanatory variables, the model that contains the variables LogBUN and HGB has the largest score value (12.7252), the model that contains the variables LogBUN and Platelet has the second-largest score value (11.1842), and the model that contains the variables LogBUN and SCalc has the third-largest score value (9.9962).
Output 67.2.1: Best Variable Combinations

Regression Models Selected by Score Criterion

Number of Variables | Score Chi-Square | Variables Included in Model |
---|---|---|
1 | 8.5164 | LogBUN |
1 | 5.0664 | HGB |
1 | 3.1816 | Platelet |
2 | 12.7252 | LogBUN HGB |
2 | 11.1842 | LogBUN Platelet |
2 | 9.9962 | LogBUN SCalc |
3 | 15.3053 | LogBUN HGB SCalc |
3 | 13.9911 | LogBUN HGB Age |
3 | 13.5788 | LogBUN HGB Frac |
4 | 16.9873 | LogBUN HGB Age SCalc |
4 | 16.0457 | LogBUN HGB Frac SCalc |
4 | 15.7619 | LogBUN HGB LogPBM SCalc |
5 | 17.6291 | LogBUN HGB Age Frac SCalc |
5 | 17.3519 | LogBUN HGB Age LogPBM SCalc |
5 | 17.1922 | LogBUN HGB Age LogWBC SCalc |
6 | 17.9120 | LogBUN HGB Age Frac LogPBM SCalc |
6 | 17.7947 | LogBUN HGB Age LogWBC Frac SCalc |
6 | 17.7744 | LogBUN HGB Platelet Age Frac SCalc |
7 | 18.1517 | LogBUN HGB Platelet Age Frac LogPBM SCalc |
7 | 18.0568 | LogBUN HGB Age LogWBC Frac LogPBM SCalc |
7 | 18.0223 | LogBUN HGB Platelet Age LogWBC Frac SCalc |
8 | 18.3925 | LogBUN HGB Platelet Age LogWBC Frac LogPBM SCalc |
8 | 18.1636 | LogBUN HGB Platelet Age Frac LogPBM Protein SCalc |
8 | 18.1309 | LogBUN HGB Platelet Age LogWBC Frac Protein SCalc |
9 | 18.4550 | LogBUN HGB Platelet Age LogWBC Frac LogPBM Protein SCalc |
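The listing can also be read programmatically. The sketch below stores the rows for one to three variables, transcribed from Output 67.2.1, and picks the highest-scoring model of each size. The `rows` list and the `top_model_per_size` helper are illustrative only.

```python
from typing import Dict, Iterable, List, Tuple

# Rows transcribed from Output 67.2.1: (size, score chi-square, variables).
rows: List[Tuple[int, float, str]] = [
    (1, 8.5164, "LogBUN"),
    (1, 5.0664, "HGB"),
    (1, 3.1816, "Platelet"),
    (2, 12.7252, "LogBUN HGB"),
    (2, 11.1842, "LogBUN Platelet"),
    (2, 9.9962, "LogBUN SCalc"),
    (3, 15.3053, "LogBUN HGB SCalc"),
    (3, 13.9911, "LogBUN HGB Age"),
    (3, 13.5788, "LogBUN HGB Frac"),
]


def top_model_per_size(
        table: Iterable[Tuple[int, float, str]]) -> Dict[int, Tuple[float, Tuple[str, ...]]]:
    """Return {size: (score, variables)} for the highest-scoring row of each size."""
    best: Dict[int, Tuple[float, Tuple[str, ...]]] = {}
    for size, score, variables in table:
        if size not in best or score > best[size][0]:
            best[size] = (score, tuple(variables.split()))
    return best


for size, (score, variables) in sorted(top_model_per_size(rows).items()):
    print(size, score, " ".join(variables))
```

Because the procedure already sorts each size in descending order, the first row of each size is the one selected here; the explicit comparison simply avoids relying on that ordering.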