This section illustrates that an analysis of variance model can be formulated as a simple regression model with optimal scoring. The purpose of the example is to explain one aspect of how PROC TRANSREG works, not to propose an alternative way of performing an analysis of variance.
Finding the overall fit of a large, unbalanced analysis of variance model can be handled as an optimal scoring problem without
creating large, sparse design matrices. For example, consider an unbalanced full main-effects and interactions ANOVA model
with six factors. Assume that a SAS data set is created with factor-level indicator variables c1 through c6 and dependent variable y. If each factor level consists of nonblank single characters, you can create a cell indicator in a DATA step with the following statement:
x=compress(c1||c2||c3||c4||c5||c6);
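For context, the following is a minimal sketch of a complete DATA step that contains this statement. The data set name anova, the two-level coding 'a'/'b', the sample size, the seed, and the formula for y are all hypothetical choices made only for illustration. Levels are drawn at random, so the cell counts are unequal and some cells can be empty.

```sas
/* Hypothetical unbalanced data: six two-level factors coded 'a' or 'b' */
data anova;
   length c1-c6 $ 1;
   array c[6] $ c1-c6;
   call streaminit(17);                 /* arbitrary seed */
   do i = 1 to 200;
      do j = 1 to 6;
         /* level 'b' is drawn with probability 0.4, so cells are unbalanced */
         c[j] = substr('ab', 1 + (rand('uniform') < 0.4), 1);
      end;
      /* illustrative response with one main effect and one interaction */
      y = rand('normal') + (c1 = 'b') + 0.5 * (c2 = 'b') * (c3 = 'b');
      /* cell indicator: one distinct six-character value per occupied cell */
      x = compress(c1 || c2 || c3 || c4 || c5 || c6);
      output;
   end;
   drop i j;
run;
```

Because every level is a single nonblank character, each combination of levels maps to a unique value of x.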
The following statements optimally score x (by using the OPSCORE transformation) and do not transform y:
proc transreg;
   model identity(y)=opscore(x);
   output;
run;
The final R square reported is the R square for the full analysis of variance model. This R square is the same R square that would be reported by both of the following PROC GLM steps:
proc glm;
   class x;
   model y=x;
run;

proc glm;
   class c1-c6;
   model y=c1|c2|c3|c4|c5|c6;
run;
PROC TRANSREG optimally scores the classes of x within the space of a single variable with values linearly related to the cell means, so the full ANOVA problem is reduced to a simple regression problem with an optimal independent variable. PROC TRANSREG requires only one iteration to find the optimal scoring of x but, by default, performs a second iteration, which reports no data changes.
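The link to the cell means can be checked directly. The following sketch is not how PROC TRANSREG computes its scores. It replaces each value of x with the mean of y in that cell and then fits an ordinary simple regression. It assumes a data set named anova (a hypothetical name) that contains y and the cell indicator x. The names scored and cellmean are also introduced only for this illustration.

```sas
/* Assumes data set ANOVA (hypothetical name) with variables y and x */
proc sql;
   /* SAS remerges the group mean onto every row of its cell */
   create table scored as
   select *, mean(y) as cellmean
   from anova
   group by x;
quit;

/* Simple regression of y on the cell means */
proc reg data=scored;
   model y = cellmean;
run;
quit;
```

The R square from this regression is the between-cell sum of squares divided by the total sum of squares, which is the R square of the full ANOVA model. The fitted intercept is 0 and the slope is 1, because the cell means are already the least squares predictions.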