The GLMSELECT Procedure

Macro Variables Containing Selected Models

You often want to perform postselection analysis by using other SAS procedures. To facilitate this, PROC GLMSELECT saves the list of selected effects in a macro variable. The list does not explicitly include the intercept, so you can use it directly in the MODEL statement of other SAS/STAT regression procedures.

The following table describes the macro variables that PROC GLMSELECT creates. Note that when BY processing is used, one macro variable, indexed by the BY group number, is created for each BY group.

Macro Variable    Description

No BY processing
_GLSIND1          Selected model

BY processing
_GLSNUMBYS        Number of BY groups
_GLSIND1          Selected model for BY group 1
_GLSIND2          Selected model for BY group 2

 

You can use the macro variable _GLSIND as a synonym for _GLSIND1. If you do not use BY processing, _GLSNUMBYS is still defined and has the value 1.
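As a minimal sketch of the case without BY processing, the following statements select a model, write the stored values to the log, and refit the selected model with PROC GLM. The data set mydata and its variables y and x1 through x5 are hypothetical placeholders, and the sketch assumes that at least one effect is selected.

```sas
/* Hypothetical input: data set mydata with response y and regressors x1-x5 */
proc glmselect data=mydata;
   model y = x1 x2 x3 x4 x5 / selection=stepwise;
run;

/* Write the stored values to the log; _GLSIND is a synonym for _GLSIND1 */
%put NOTE: Number of BY groups: &_GLSNUMBYS;
%put NOTE: Selected effects: &_GLSIND;

/* Refit the selected model in another procedure */
proc glm data=mydata;
   model y = &_GLSIND;
run; quit;
```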

When you use BY processing, you need to associate each indexed macro variable with the observations in its BY group. To help with this, PROC GLMSELECT adds a variable named _BY_ to the output data set that you specify in an OUTPUT statement (see the section OUTPUT Statement). For each observation, the value of _BY_ is the index of the macro variable that holds the selected model for that observation's BY group.

The following statements create a data set with two BY groups and run PROC GLMSELECT to select a model for each BY group.

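/* Simulate 1,000 observations in two BY groups that have different true models */
data one(drop=i j);
   array x{5} x1-x5;
   do i=1 to 1000;
      classVar = mod(i,4)+1;      /* classification variable with levels 1-4 */
      do j=1 to 5;
         x{j} = ranuni(1);        /* uniform(0,1) regressors */
      end;
      if i<400 then do;
         byVar = 'group 1';
         y     = 3*classVar+7*x2+5*x2*x5+rannor(1);
      end;
      else do;
         byVar = 'group 2';
         y     = 2*classVar+x5+rannor(1);
      end;
      output;
   end;
run;

/* Select a model separately for each BY group and save the output data set */
proc glmselect data=one;
   by     byVar;
   class  classVar;
   model  y = classVar x1|x2|x3|x4|x5 @2 /
                  selection=stepwise(stop=aicc);
   output out=glmselectOutput;
run;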

The preceding PROC GLMSELECT step produces three macro variables:

Macro Variable    Value                Description

_GLSNUMBYS        2                    Number of BY groups
_GLSIND1          classVar x2 x2*x5    Selected model for the first BY group
_GLSIND2          classVar x5          Selected model for the second BY group
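One way to confirm which observations each index refers to is to tabulate the BY variable against _BY_ in the output data set. The following sketch assumes that the preceding PROC GLMSELECT step has run and that glmselectOutput contains both byVar and _BY_. You would expect one row per BY group.

```sas
/* List each combination of the BY variable and the _BY_ index */
proc freq data=glmselectOutput;
   tables byVar*_BY_ / list nocum nopercent;
run;
```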

You can now use these macro variables and the output data set created by PROC GLMSELECT to perform postselection analyses that match each selected model with the observations in its BY group. For example, the following statements create and run a macro that uses PROC GLM to perform LSMeans analyses.

%macro LSMeansAnalysis;
   %local i;   /* keep the loop index local to this macro */
   %do i=1 %to &_GLSNUMBYS;
      title1 "Analysis Using the Selected Model for BY group number &i";
      title2 "Selected Effects: &&_GLSIND&i";

      /* Display only the LSMeans table from PROC GLM */
      ods select LSMeans;
      /* Use only the observations in BY group &i */
      proc glm data=glmselectOutput(where = (_BY_ = &i));
         class classVar;
         model y = &&_GLSIND&i;
         lsmeans classVar;
      run;quit;
   %end;
%mend;
%LSMeansAnalysis;

The LSMeans analysis output from PROC GLM is shown in Figure 48.16.

Figure 48.16: LS-Means Analyses for Selected Models

Analysis Using the Selected Model for BY group number 1
Selected Effects: classVar x2 x2*x5

The GLM Procedure
Least Squares Means

classVar        y LSMEAN
1              7.8832052
2             10.9528618
3             13.9412216
4             16.7929355

Analysis Using the Selected Model for BY group number 2
Selected Effects: classVar x5

The GLM Procedure
Least Squares Means

classVar        y LSMEAN
1             2.46805014
2             4.52102826
3             6.53369479
4             8.49354763
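
Before you run a loop like the one in the LSMeansAnalysis macro, it can help to check that the macro variables exist and to write their values to the log. The following is a minimal sketch; the macro name ShowSelectedModels is hypothetical and is not part of PROC GLMSELECT.

```sas
%macro ShowSelectedModels;
   %local i;
   /* Stop early if the macro variable _GLSNUMBYS has not been created */
   %if not %symexist(_GLSNUMBYS) %then %do;
      %put WARNING: _GLSNUMBYS is not defined. Run PROC GLMSELECT first.;
      %return;
   %end;
   /* Write the selected effects for each BY group to the log */
   %do i=1 %to &_GLSNUMBYS;
      %put NOTE: BY group &i selected effects: &&_GLSIND&i;
   %end;
%mend ShowSelectedModels;
%ShowSelectedModels
```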