With m imputations, m different sets of the point and variance estimates for a parameter Q can be computed. Suppose that $\hat{Q}_i$ and $\hat{W}_i$ are the point and variance estimates, respectively, from the ith imputed data set, i = 1, 2, …, m. Then the combined point estimate for Q from multiple imputation is the average of the m complete-data estimates:
$$\bar{Q} = \frac{1}{m} \sum_{i=1}^{m} \hat{Q}_i$$
Suppose that $\bar{W}$ is the within-imputation variance, which is the average of the m complete-data estimates:
$$\bar{W} = \frac{1}{m} \sum_{i=1}^{m} \hat{W}_i$$
And suppose that B is the between-imputation variance:
$$B = \frac{1}{m-1} \sum_{i=1}^{m} \left( \hat{Q}_i - \bar{Q} \right)^2$$
Then the variance estimate associated with $\bar{Q}$ is the total variance (Rubin, 1987):
$$T = \bar{W} + \left( 1 + \frac{1}{m} \right) B$$
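The combining rules above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the MIANALYZE implementation; the function name `pool_estimates` and the sample numbers are hypothetical.

```python
from statistics import mean, variance


def pool_estimates(q_hats, w_hats):
    """Combine m complete-data point estimates (q_hats) and variance
    estimates (w_hats) into (Q-bar, W-bar, B, T).

    Hypothetical helper for illustration only.
    """
    m = len(q_hats)
    if m < 2 or len(w_hats) != m:
        raise ValueError("need at least two imputations and matching lengths")
    q_bar = mean(q_hats)             # combined point estimate
    w_bar = mean(w_hats)             # within-imputation variance
    b = variance(q_hats)             # between-imputation variance (divisor m - 1)
    t = w_bar + (1 + 1 / m) * b      # total variance
    return q_bar, w_bar, b, t


# Made-up estimates from m = 5 imputed data sets.
q_bar, w_bar, b, t = pool_estimates(
    [10.0, 12.0, 11.0, 9.0, 13.0],
    [4.0, 4.0, 4.0, 4.0, 4.0],
)
print(q_bar, w_bar, b, t)
```

`statistics.variance` is used for B because it divides by m - 1, which matches the between-imputation formula.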
The statistic $(Q - \bar{Q})\, T^{-1/2}$ is approximately distributed as t with $v_m$ degrees of freedom (Rubin, 1987), where
$$v_m = (m - 1) \left[ 1 + \frac{\bar{W}}{(1 + m^{-1})\, B} \right]^2$$
The degrees of freedom $v_m$ depend on m and the ratio
$$r = \frac{(1 + m^{-1})\, B}{\bar{W}}$$
The ratio r is called the relative increase in variance due to nonresponse (Rubin, 1987). When there is no missing information about Q, the values of r and B are both zero. With a large value of m or a small value of r, the degrees of freedom $v_m$ will be large and the distribution of $(Q - \bar{Q})\, T^{-1/2}$ will be approximately normal.
Another useful statistic is the fraction of missing information about Q:
$$\hat{\lambda} = \frac{r + 2 / (v_m + 3)}{r + 1}$$
Both statistics r and $\hat{\lambda}$ are helpful diagnostics for assessing how the missing data contribute to the uncertainty about Q.
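As a rough sketch, the two diagnostics and the degrees of freedom can be computed from the within-imputation variance, the between-imputation variance B, and m. The function names below are hypothetical, and the formulas are the Rubin (1987) forms: r is (1 + 1/m) B divided by the within-imputation variance, the degrees of freedom are (m - 1)(1 + 1/r)^2, and the fraction of missing information is (r + 2/(df + 3)) / (r + 1).

```python
import math


def relative_increase(w_bar, b, m):
    """Relative increase in variance due to nonresponse, r."""
    return (1 + 1 / m) * b / w_bar


def rubin_df(m, r):
    """Degrees of freedom (m - 1) * (1 + 1/r)**2; infinite when r is zero."""
    if r == 0:
        return math.inf
    return (m - 1) * (1 + 1 / r) ** 2


def fraction_missing(r, v_m):
    """Fraction of missing information about Q."""
    return (r + 2 / (v_m + 3)) / (r + 1)


# Made-up inputs: within-imputation variance 4.0, B = 2.5, m = 5.
r = relative_increase(4.0, 2.5, 5)
v_m = rubin_df(5, r)
lam = fraction_missing(r, v_m)
print(r, v_m, lam)
```

When B is zero, r is zero, the degrees of freedom are infinite, and the fraction of missing information evaluates to zero, which agrees with the no-missing-information case described above.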
When the complete-data degrees of freedom $v_0$ are small, and there is only a modest proportion of missing data, the computed degrees of freedom, $v_m$, can be much larger than $v_0$, which is inappropriate. For example, with m = 5 and r = 10%, the computed degrees of freedom $v_m = 484$, which is inappropriate for data sets with complete-data degrees of freedom less than 484.
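Barnard and Rubin (1999) recommend the use of adjusted degrees of freedom
$$v_m^{*} = \left[ \frac{1}{v_m} + \frac{1}{\hat{v}_{obs}} \right]^{-1}$$
where $\hat{v}_{obs} = (1 - \gamma)\, v_0 (v_0 + 1) / (v_0 + 3)$ and $\gamma = (1 + m^{-1})\, B / T$.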
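The adjustment can be illustrated with a short Python sketch. The function name `adjusted_df` and the input values are hypothetical. The inputs are chosen so that m = 5 and r = 10%, as in the example above, with an assumed complete-data degrees of freedom of 20.

```python
def adjusted_df(w_bar, b, m, v0):
    """Barnard-Rubin (1999) adjusted degrees of freedom.

    w_bar: within-imputation variance, b: between-imputation variance,
    m: number of imputations, v0: complete-data degrees of freedom.
    Hypothetical helper for illustration only.
    """
    between = (1 + 1 / m) * b
    t = w_bar + between                       # total variance
    gamma = between / t
    v_obs = (1 - gamma) * v0 * (v0 + 1) / (v0 + 3)
    if b == 0:
        # No between-imputation variance: unadjusted df are infinite,
        # so the harmonic combination reduces to v_obs.
        return v_obs
    r = between / w_bar
    v_m = (m - 1) * (1 + 1 / r) ** 2          # unadjusted degrees of freedom
    return 1 / (1 / v_m + 1 / v_obs)


# w_bar = 1.2 and b = 0.1 give r = 0.1 with m = 5, so the unadjusted df are 484.
v_star = adjusted_df(1.2, 0.1, 5, 20)
print(v_star)
```

Because the adjusted value is a harmonic-type combination of the two terms, it is always smaller than both the unadjusted degrees of freedom and the complete-data degrees of freedom.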
If you specify the complete-data degrees of freedom $v_0$ with the EDF= option, the MIANALYZE procedure uses the adjusted degrees of freedom, $v_m^{*}$, for inference. Otherwise, the degrees of freedom $v_m$ are used.