CLASS variable <(options)> …<variable <(options)>> </ global-options> ;
The CLASS statement names the classification variables to be used as explanatory variables in the analysis. Response variables do not need to be specified in the CLASS statement.
The CLASS statement must precede the MODEL statement. Most options can be specified either as individual variable options or as global-options. You can specify options for each variable by enclosing the options in parentheses after the variable name. You can also specify global-options for the CLASS statement by placing them after a slash (/). Global-options are applied to all the variables specified in the CLASS statement. If you specify more than one CLASS statement, the global-options specified in any one CLASS statement apply to all CLASS statements. However, individual CLASS variable options override the global-options. You can specify the following values for either an option or a global-option:
Parameter names for a CLASS predictor variable are constructed by concatenating the CLASS variable name with the CLASS levels. However, for the POLYNOMIAL and orthogonal parameterizations, parameter names are formed by concatenating the CLASS variable name and keywords that reflect the parameterization. See the section "Other Parameterizations" in Chapter 19: Shared Concepts and Topics for examples and further details.
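As a minimal sketch of how variable-level options and global-options combine, the following step applies options to one CLASS variable in parentheses and a global-option after the slash. The data set `work.trial`, the variables `Treatment`, `Sex`, and `Response`, and the level `'Placebo'` are hypothetical names chosen for illustration only.

```sas
/* Hypothetical data set and variable names, for illustration only */
proc logistic data=work.trial;
   /* Treatment: the options in parentheses override the global-option, */
   /* giving reference coding with 'Placebo' as the reference level.    */
   /* Sex: no variable-level options, so the global PARAM=EFFECT applies. */
   class Treatment (param=ref ref='Placebo') Sex / param=effect;
   model Response(event='1') = Treatment Sex;
run;
```

Because the CLASS statement must precede the MODEL statement, it appears first in the step.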
PROC LOGISTIC initially parameterizes the CLASS variables by looking at the levels of the variables across the complete data set. If you have an unbalanced replication of levels across variables or BY groups, then the design matrix and the parameter interpretation might be different from what you expect. For instance, suppose you have a model with one CLASS variable A with three levels (1, 2, and 3), and another CLASS variable B with two levels (1 and 2). If the third level of A occurs only with the first level of B, if you use the EFFECT parameterization, and if your model contains the effect A(B) and an intercept, then the design for A within the second level of B is not a differential effect. In particular, the design looks like the following:
Design Matrix

| B | A | A(B=1): A1 | A(B=1): A2 | A(B=2): A1 | A(B=2): A2 |
|---|---|---|---|---|---|
| 1 | 1 | 1 | 0 | 0 | 0 |
| 1 | 2 | 0 | 1 | 0 | 0 |
| 1 | 3 | -1 | -1 | 0 | 0 |
| 2 | 1 | 0 | 0 | 1 | 0 |
| 2 | 2 | 0 | 0 | 0 | 1 |
PROC LOGISTIC detects linear dependency among the last two design variables and sets the parameter for A2(B=2) to zero, resulting in an interpretation of these parameters as if they were reference- or dummy-coded. The REFERENCE or GLM parameterization might be more appropriate for such problems.
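The following sketch shows one way to act on that suggestion: first check which combinations of A and B occur in the data, and then fit the nested effect with the REFERENCE parameterization. The data set `work.study` and the response variable `Y` are hypothetical; A and B are the CLASS variables from the example above.

```sas
/* Hypothetical data set (work.study) and response variable (Y) */

/* Cross-tabulate the CLASS variables to see which A*B cells occur; */
/* a zero count (here, A=3 with B=2) signals unbalanced levels.      */
proc freq data=work.study;
   tables B*A / norow nocol nopercent;
run;

/* Fit A nested within B using reference coding instead of EFFECT coding */
proc logistic data=work.study;
   class A B / param=ref;
   model Y(event='1') = A(B);
run;
```

Specifying `param=glm` in place of `param=ref` requests the GLM parameterization instead.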