By default, the CATMOD procedure treats all variables as classification variables. As a result, there is no CLASS statement in PROC CATMOD. The values of a classification variable can be numeric or character. PROC CATMOD builds a set of effects-coded variables to represent the levels of the classification variable and then uses these to fit the model (for details, see the section Generation of the Design Matrix). You can modify the default by using the DIRECT statement to treat numeric independent variables as continuous variables. The classification variables, combinations of classification variables, and continuous variables are then used in fitting linear models to data.
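The following is a minimal sketch of how the DIRECT statement changes the default treatment of a variable. The data set work.trial and the variables count, response, treatment, and dose are hypothetical names chosen for illustration; they do not come from this section.

```sas
/* Sketch only: work.trial, count, response, treatment, and dose are assumed names */
proc catmod data=work.trial;
   weight count;                      /* count is assumed to hold cell frequencies */
   direct dose;                       /* treat the numeric variable dose as continuous */
   model response = treatment dose;   /* treatment stays a classification variable */
run;
```

Without the DIRECT statement, dose would be treated as a classification variable with one level per distinct value. The DIRECT statement is placed before the MODEL statement.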
The parameters of a linear model are generally divided into subsets that correspond to meaningful sources of variation in the response functions. These sources, called effects, can be specified in the MODEL, LOGLIN, FACTORS, REPEATED, and CONTRAST statements. Effects can be specified in any of the following ways:
A main effect is a single classification variable (that is, it produces class levels): A, B, C.
A crossed effect (or interaction) is two or more classification variables joined by asterisks, for example: A*B, A*B*C.
A nested effect is a main effect or an interaction, followed by a parenthetical field containing a main effect or an interaction. Multiple variables within the parentheses are assumed to form a crossed effect even when the asterisk is absent. In the following list, the last two effects are identical: B(A), C(A*B), A*B(C*D), A*B(C D).
A nested-by-value effect is the same as a nested effect except that any variable in the parentheses can be followed by an equal sign and a value: B(A=1), C(A B=1), C*D(A=1 B=1), A(C='low').
A direct effect is a variable specified in a DIRECT statement: X, Y.
Direct effects can be crossed with other effects: X*Y, X*X*X, X*A*B(C D=1).
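As a hedged illustration of how these effect types might appear together in MODEL statements, consider the sketch below. The data set work.survey and the variables r, a, b, and x are hypothetical, and the second step assumes that a takes the values 1 and 2. Whether a given model can actually be fit depends on the data.

```sas
/* Sketch only: work.survey, r, a, b, and x are assumed names */
proc catmod data=work.survey;
   direct x;                      /* x becomes a direct (continuous) effect */
   model r = a b a*b x;           /* two main effects, one crossed effect, one direct effect */
run;

proc catmod data=work.survey;
   model r = a b(a=1) b(a=2);     /* nested-by-value: b within a=1 and b within a=2 */
run;
```

The nested-by-value form in the second step specifies a separate effect of b at each named level of a.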
The variables for crossed and nested effects remain in the order in which they are first encountered. For example, in the following model, the effect A*B is reported as B*A since B appears before A in the statement:
model R=B A A*B C(A B);
Also, C(A B) is interpreted as C(A*B) and is therefore reported as C(B*A).
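To illustrate the ordering rule, the sketch below rewrites the same model with A listed before B. It reuses the placeholder names R, A, B, and C from the text and, like the examples in this section, relies on the most recently created data set.

```sas
/* Sketch only: same effects as the model above, with A encountered before B */
proc catmod;
   model R=A B A*B C(A B);
run;
```

Under the rule described above, the crossed effect would then be reported as A*B and the nested effect as C(A*B), because A is now the first variable encountered.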
You can shorten the specification of multiple effects by using bar notation. For example, the following statements illustrate two methods of writing a full three-way factorial model:
proc catmod;
   model y=a b c a*b a*c b*c a*b*c;
run;

proc catmod;
   model y=a|b|c;
run;
When you use the bar (|) notation, the right and left sides become effects, and the interaction between them becomes an effect. Multiple bars are permitted. The expressions are expanded from left to right, using rules 1 through 4 given in Searle (1971, p. 390):
Multiple bars are evaluated left to right. For example, A | B | C is evaluated as follows:
   A | B | C  →  { A | B } | C
              →  { A B A*B } | C
              →  A B A*B C A*C B*C A*B*C
Crossed and nested groups of variables are combined. For example, A(B) | C(D) generates A*C(B D), among other terms.
Duplicate variables are removed. For example, A(C) | B(C) generates A*B(C C), among other terms, and the extra C is removed.
Effects are discarded if a variable occurs on both the crossed and nested sides of an effect. For instance, A(B) | B(D E) generates A*B(B D E), but this effect is deleted.
You can also specify the maximum number of variables involved in any effect that results from bar evaluation by specifying that maximum number, preceded by an @ sign, at the end of the bar effect. For example, the specification A | B | C@2 would result in only those effects that contain two or fewer variables; in this case, the effects A, B, A*B, C, A*C, and B*C are generated.
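The sketch below shows the @ limit in a MODEL statement next to the explicit list of effects that, according to the description above, it stands for. The variable names y, a, b, and c follow the earlier factorial example and are placeholders.

```sas
/* Sketch only: bar notation limited to effects with at most two variables */
proc catmod;
   model y=a|b|c@2;
run;

/* The equivalent explicit specification, per the expansion described above */
proc catmod;
   model y=a b a*b c a*c b*c;
run;
```

Compared with the full factorial a|b|c, the only effect dropped is the three-way interaction a*b*c.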
Other examples of the bar notation follow:
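A | C(B) is equivalent to A C(B) A*C(B)
A(B) | C(B) is equivalent to A(B) C(B) A*C(B)
A(B) | B(D E) is equivalent to A(B) B(D E)
A | B(A) | C is equivalent to A B(A) C A*C B*C(A)
A | B(A) | C@2 is equivalent to A B(A) C A*C
A | B | C | D@2 is equivalent to A B A*B C A*C B*C D A*D B*D C*D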
For details about how the effects specified lead to a design matrix, see the section Generation of the Design Matrix.