In this example, a
and b
are classification variables and y
is the dependent variable. a
is declared fixed, and b
and a
*b
are random. Note that this design is unbalanced because the cell sizes are not all the same. PROC VARCOMP is invoked four
times, once for each of the general estimation methods. The data are from Hemmerle and Hartley (1973). The following statements produce Output 108.1.1.
data a; input a b y @@; datalines; 1 1 237 1 1 254 1 1 246 1 2 178 1 2 179 2 1 208 2 1 178 2 1 187 2 2 146 2 2 145 2 2 141 3 1 186 3 1 183 3 2 142 3 2 125 3 2 136 ;
proc varcomp method=type1 data=a; class a b; model y=a|b / fixed=1; run;
Output 108.1.1: VARCOMP Procedure: Method=TYPE1
Type 1 Analysis of Variance | ||||
---|---|---|---|---|
Source | DF | Sum of Squares | Mean Square | Expected Mean Square |
a | 2 | 11736 | 5868.218750 | Var(Error) + 2.725 Var(a*b) + 0.1 Var(b) + Q(a) |
b | 1 | 11448 | 11448 | Var(Error) + 2.6308 Var(a*b) + 7.8 Var(b) |
a*b | 2 | 299.041026 | 149.520513 | Var(Error) + 2.5846 Var(a*b) |
Error | 10 | 786.333333 | 78.633333 | Var(Error) |
Corrected Total | 15 | 24270 |
The "Class Level Information" table in Output 108.1.1 displays the levels of each variable specified in the CLASS statement. You can check this table to make sure the data are input correctly.
The Type I analysis of variance in Output 108.1.1 consists of a sequential partition of the total sum of squares. The mean square is the sum of squares divided by the degrees of freedom, and the expected mean square is the expected value of the mean square under the mixed model. The "Q" notation in the expected mean squares refers to a quadratic form in parameters of the parenthesized effect.
The Type I estimates of the variance components in Output 108.1.1 result from solving the linear system of equations established by equating the observed mean squares to their expected values.
The following statements are the same as before, except that the estimation method is MIVQUE0 instead of the default TYPE1. They produce Output 108.1.2.
proc varcomp method=mivque0 data=a; class a b; model y=a|b / fixed=1; run;
The MIVQUE0 estimates in Output 108.1.2 result from solving the equations established by the MIVQUE0 SSQ matrix. Note that the estimate of the variance component for the interaction effect, Var(a*b), is negative for this example.
The following statements use METHOD=ML to invoke maximum likelihood estimation. They produce Output 108.1.3.
proc varcomp method=ml data=a; class a b; model y=a|b / fixed=1; run;
Output 108.1.3: VARCOMP Procedure: Method=ML
Maximum Likelihood Iterations | ||||
---|---|---|---|---|
Iteration | Objective | Var(b) | Var(a*b) | Var(Error) |
0 | 78.3850371200 | 1031.49070 | 0 | 74.3909717935 |
1 | 78.2637043807 | 732.3606453635 | 0 | 77.4011688154 |
2 | 78.2635471161 | 723.6867470850 | 0 | 77.5301774839 |
3 | 78.2635471152 | 723.6658365289 | 0 | 77.5304926877 |
The "Maximum Likelihood Iterations" table in Output 108.1.3 shows that the Newton-Raphson algorithm used by PROC VARCOMP requires three iterations to converge.
The ML estimate of Var(a*b) is zero for this example, and the other two estimates are smaller than their Type I and MIVQUE0 counterparts.
One benefit of using likelihood-based methods is that an approximate covariance matrix is available from the matrix of second derivatives evaluated at the ML solution. This covariance matrix is valid asymptotically and can be unreliable in small samples.
Here the variance component estimates for B and the Error are negatively correlated, and the elements for Var(a*b) are set to zero because the estimate equals zero. Also, the very large variance for Var(b) indicates a lot of uncertainty about the estimate for Var(b), and one contributing explanation is that B has only two levels in this data set.
Finally, the following statements use the restricted maximum likelihood (REML) for estimation. They produce Output 108.1.4.
proc varcomp method=reml data=a; class a b; model y=a|b / fixed=1; run;
Output 108.1.4: VARCOMP Procedure: Method=REML
REML Iterations | ||||
---|---|---|---|---|
Iteration | Objective | Var(b) | Var(a*b) | Var(Error) |
0 | 63.4134144942 | 1269.52701 | 0 | 91.5581191305 |
1 | 63.0446869787 | 1601.84199 | 32.7632417174 | 76.9355562461 |
2 | 63.0311530508 | 1468.82932 | 27.2258186561 | 78.7548276319 |
3 | 63.0311265148 | 1464.33646 | 26.9564053003 | 78.8431476502 |
4 | 63.0311265127 | 1464.36727 | 26.9588525177 | 78.8423898761 |
The "REML Iterations" table in Output 108.1.4 shows that the REML optimization requires four iterations to converge.
The REML estimates in Output 108.1.4 are all larger than the corresponding ML estimates (adjusting for potential downward bias) and are fairly similar to the Type I estimates.
The "Asymptotic Covariance Matrix of Estimates" table in Output 108.1.4 shows that the Error variance component estimate is negatively correlated with the other two variance component estimates, and the estimated variances are all larger than their ML counterparts.