The MIANALYZE Procedure

Multivariate Inferences

Multivariate inference based on Wald tests can be done with m imputed data sets. The approach is a generalization of the approach taken in the univariate case (Rubin 1987, p. 137; Schafer 1997, p. 113). Suppose that $\hat{\mb{Q}_ i}$ and $\hat{\mb{W}_ i}$ are the point and covariance matrix estimates for a p-dimensional parameter $\mb{Q}$ (such as a multivariate mean) from the $i\mr{th}$ imputed data set, i = 1, 2, …, m. Then the combined point estimate for $\mb{Q}$ from the multiple imputation is the average of the m complete-data estimates:

\[  \overline{\mb{Q}} = \frac{1}{m} \sum _{i=1}^{m} \hat{\mb{Q}_ i}  \]

Suppose that $\overline{\mb{W}}$ is the within-imputation covariance matrix, which is the average of the m complete-data covariance estimates:

\[  \overline{\mb{W}} = \frac{1}{m} \sum _{i=1}^{m} \hat{\mb{W}_ i}  \]

Suppose also that $\mb{B}$ is the between-imputation covariance matrix:

\[  \mb{B} = \frac{1}{m-1} \sum _{i=1}^{m} (\hat{\mb{Q}_ i}-\overline{\mb{Q}}) (\hat{\mb{Q}_ i}-\overline{\mb{Q}})'  \]

Then the covariance matrix associated with $\overline{\mb{Q}}$ is the total covariance matrix

\[  \mb{T}_{0} = \overline{\mb{W}} + (1+\frac{1}{m})\mb{B}  \]
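The four pooling formulas above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how PROC MIANALYZE is implemented: the function name `pool_estimates` and the input values are made up for the example, and matrices are plain nested lists so that nothing beyond the standard library is needed.

```python
def pool_estimates(q_hats, w_hats):
    """Return (q_bar, w_bar, b, t0) from m complete-data estimates.

    q_hats: list of m point estimates, each a list of p floats.
    w_hats: list of m covariance estimates, each a p x p list of lists.
    """
    m = len(q_hats)
    if m < 2:
        raise ValueError("at least two imputations are needed to estimate B")
    p = len(q_hats[0])
    idx = range(p)

    # Combined point estimate: elementwise mean of the m estimates.
    q_bar = [sum(q[j] for q in q_hats) / m for j in idx]

    # Within-imputation covariance: elementwise mean of the m matrices.
    w_bar = [[sum(w[j][k] for w in w_hats) / m for k in idx] for j in idx]

    # Between-imputation covariance: sample covariance of the estimates.
    dev = [[q[j] - q_bar[j] for j in idx] for q in q_hats]
    b = [[sum(d[j] * d[k] for d in dev) / (m - 1) for k in idx] for j in idx]

    # Total covariance: W-bar + (1 + 1/m) B.
    t0 = [[w_bar[j][k] + (1 + 1 / m) * b[j][k] for k in idx] for j in idx]
    return q_bar, w_bar, b, t0


# Made-up inputs with m = 3 imputations and p = 2 parameters.
q_hats = [[1.0, 2.0], [2.0, 4.0], [3.0, 3.0]]
w_hats = [[[2.0, 0.0], [0.0, 2.0]] for _ in q_hats]

q_bar, w_bar, b, t0 = pool_estimates(q_hats, w_hats)
print(q_bar)  # [2.0, 3.0]
print(b)      # [[1.0, 0.5], [0.5, 1.0]]
```

Note the divisor m - 1 in the between-imputation matrix, in contrast to the divisor m used for the two averages.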

The natural multivariate extension of the t statistic used in the univariate case is the F statistic

\[  F_{0} = (\mb{Q}-\overline{\mb{Q}})' \mb{T}_{0}^{-1} (\mb{Q}-\overline{\mb{Q}})  \]

with degrees of freedom p and

\[  v=(m-1)(1+1/r)^{2}  \]

where

\[  r = (1+\frac{1}{m}) \,  \mr{trace} (\mb{B} \overline{\mb{W}}^{-1}) / p  \]

is an average relative increase in variance due to nonresponse (Rubin 1987, p. 137; Schafer 1997, p. 114).
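As a rough sketch of how r and v could be evaluated numerically, the Python below computes the trace term with a small hand-written matrix inverse. The helper names (`invert`, `relative_increase`, `df_f0`) and the example matrices are assumptions made for illustration and do not come from PROC MIANALYZE. Returning infinity when r is zero is likewise a choice made here, since the formula for v diverges when there is no between-imputation variance.

```python
def invert(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Augment the matrix with the identity: [A | I].
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        if aug[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for i in range(n):
            if i != col:
                factor = aug[i][col]
                aug[i] = [x - factor * y for x, y in zip(aug[i], aug[col])]
    # The right-hand half now holds the inverse.
    return [row[n:] for row in aug]


def relative_increase(b, w_bar, m):
    """r = (1 + 1/m) * trace(B * W_bar^{-1}) / p."""
    p = len(b)
    w_inv = invert(w_bar)
    trace = sum(b[j][k] * w_inv[k][j] for j in range(p) for k in range(p))
    return (1 + 1 / m) * trace / p


def df_f0(r, m):
    """v = (m - 1) * (1 + 1/r)^2; infinite when r is zero (assumption)."""
    if r == 0:
        return float("inf")
    return (m - 1) * (1 + 1 / r) ** 2


# Made-up matrices for p = 2 parameters and m = 3 imputations.
b = [[1.0, 0.5], [0.5, 1.0]]
w_bar = [[2.0, 0.0], [0.0, 2.0]]
m = 3

r = relative_increase(b, w_bar, m)   # approximately 2/3
v = df_f0(r, m)                      # approximately 12.5
print(r, v)
```

With these inputs, trace(B W̄⁻¹) is 1, so r = (4/3)(1)/2 = 2/3 and v = 2(1 + 3/2)² = 12.5.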

However, the reference distribution of the statistic $F_{0}$ is not easily derived. In particular, for small m the between-imputation covariance matrix $\mb{B}$ is unstable, and it does not have full rank when $m \le p$ (Schafer 1997, p. 113).

One solution is to assume additionally that the population between-imputation and within-imputation covariance matrices are proportional to each other (Schafer 1997, p. 113). This assumption implies that the fractions of missing information for all components of $\mb{Q}$ are equal. Under this assumption, a more stable estimate of the total covariance matrix is

\[  \mb{T} = (1+r) \overline{\mb{W}}  \]
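To see where this form comes from, consider the idealized case in which the sample matrices themselves are exactly proportional, $\mb{B} = \lambda \overline{\mb{W}}$ for some scalar $\lambda > 0$. This is an illustrative assumption only; the proportionality above is stated for the population matrices. In that case

\[  \mr{trace} (\mb{B} \overline{\mb{W}}^{-1}) / p = \mr{trace} (\lambda \mb{I}_{p}) / p = \lambda , \qquad r = (1+\frac{1}{m}) \lambda  \]

and substituting into the total covariance matrix $\mb{T}_{0}$ gives

\[  \mb{T}_{0} = \overline{\mb{W}} + (1+\frac{1}{m}) \lambda \overline{\mb{W}} = (1+r) \overline{\mb{W}} = \mb{T}  \]

so $\mb{T}$ replaces the p(p+1)/2 distinct elements of $\mb{B}$ with the single scalar r, which is why it is more stable for small m.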

With the total covariance matrix $\mb{T}$, the F statistic (Rubin 1987, p. 137)

\[  F = (\mb{Q}-\overline{\mb{Q}})' \mb{T}^{-1} (\mb{Q}-\overline{\mb{Q}}) / p  \]

has an F distribution with degrees of freedom p and $v_{1}$, where

\[  v_{1} = \frac{1}{2} (p+1) (m-1) (1+\frac{1}{r})^{2}  \]
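As a small worked illustration with made-up numbers (not taken from the source), take p = 2, m = 3, r = 2/3, $\overline{\mb{W}} = 2 \mb{I}_{2}$, and $\overline{\mb{Q}} = (2, 3)'$, and test the hypothesized value $\mb{Q} = \mb{0}$. Then

\[  \mb{T} = (1+\frac{2}{3}) \, 2 \mb{I}_{2} = \frac{10}{3} \mb{I}_{2}  \]

\[  F = \frac{1}{2} \cdot \frac{3}{10} \, (2^{2} + 3^{2}) = 1.95  \]

\[  v_{1} = \frac{1}{2} (2+1) (3-1) (1+\frac{3}{2})^{2} = 18.75  \]

so in this example F = 1.95 would be referred to an F distribution with 2 and 18.75 degrees of freedom.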

For $t=p(m-1) \leq 4$, PROC MIANALYZE uses the degrees of freedom $v_{1}$ in the analysis. For $t=p(m-1) > 4$, PROC MIANALYZE uses $v_{2}$, a better approximation of the degrees of freedom given by Li, Raghunathan, and Rubin (1991):

\[  v_{2} = 4 + (t-4) \left[ 1+ \frac{1}{r} (1-\frac{2}{t}) \right]^{2}  \]
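The rule for choosing between $v_{1}$ and $v_{2}$ can be written as a short function. The Python below is a minimal sketch of the rule as stated here, not the procedure's actual code. The name `reference_df` and the rejection of nonpositive r are assumptions of this example.

```python
def reference_df(p, m, r):
    """Denominator degrees of freedom for the pooled F statistic.

    p: number of parameters, m: number of imputations,
    r: average relative increase in variance (must be positive here).
    """
    if r <= 0:
        # Both formulas divide by r; this sketch simply rejects r <= 0.
        raise ValueError("r must be positive")
    t = p * (m - 1)
    if t <= 4:
        # v1 = (1/2)(p + 1)(m - 1)(1 + 1/r)^2
        return 0.5 * (p + 1) * (m - 1) * (1 + 1 / r) ** 2
    # v2 = 4 + (t - 4)[1 + (1/r)(1 - 2/t)]^2
    return 4 + (t - 4) * (1 + (1 / r) * (1 - 2 / t)) ** 2


# p = 2, m = 5 gives t = 8 > 4, so v2 applies.
print(reference_df(2, 5, 0.5))  # 29.0
```

For example, p = 1 and m = 5 give t = 4, so the function falls back to $v_{1}$; with r = 1 that is (1/2)(2)(4)(2)² = 16.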