The MI Procedure

Imputation Methods

This section describes the methods for multiple imputation that are available in the MI procedure. The method of choice depends on the pattern of missingness in the data and the type of the imputed variable, as summarized in Table 57.5.

Table 57.5: Imputation Methods in PROC MI

| Pattern of Missingness | Type of Imputed Variable | Type of Covariates | Available Methods |
|---|---|---|---|
| Monotone | Continuous | Arbitrary | Monotone regression; Monotone predicted mean matching; Monotone propensity score |
| Monotone | Classification (ordinal) | Arbitrary | Monotone logistic regression |
| Monotone | Classification (nominal) | Arbitrary | Monotone discriminant function |
| Arbitrary | Continuous | Continuous | MCMC full-data imputation; MCMC monotone-data imputation |
| Arbitrary | Continuous | Arbitrary | FCS regression; FCS predicted mean matching |
| Arbitrary | Classification (ordinal) | Arbitrary | FCS logistic regression |
| Arbitrary | Classification (nominal) | Arbitrary | FCS discriminant function |


To impute missing values for a continuous variable in data sets with monotone missing patterns, you should use either a parametric method that assumes multivariate normality or a nonparametric method that uses propensity scores (Rubin, 1987, pp. 124, 158; Lavori, Dawson, and Shera, 1995). The available parametric methods are the regression method (Rubin, 1987, pp. 166–167) and the predictive mean matching method (Heitjan and Little, 1991; Schenker and Taylor, 1996).
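The idea behind predictive mean matching can be illustrated with a minimal Python sketch. This is not PROC MI syntax and not the procedure's implementation. It assumes a single covariate, and it uses ordinary least-squares point estimates where a full implementation would draw the regression parameters from their posterior distribution. The function names and the donor count `k` are hypothetical.

```python
import random

def fit_simple_ols(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def pmm_impute(xs, ys, k=3, rng=None):
    """Fill None entries of ys by predictive mean matching on covariate xs.

    Simplification (an assumption of this sketch): the regression
    coefficients are point estimates, not posterior draws.
    """
    rng = rng or random.Random(0)
    observed = [(x, y) for x, y in zip(xs, ys) if y is not None]
    intercept, slope = fit_simple_ols([x for x, _ in observed],
                                      [y for _, y in observed])
    # Each donor pairs its predicted mean with its actually observed value.
    donors = [(intercept + slope * x, y) for x, y in observed]
    completed = []
    for x, y in zip(xs, ys):
        if y is not None:
            completed.append(y)
            continue
        target = intercept + slope * x
        # Keep the k donors whose predicted means are closest to the target.
        nearest = sorted(donors, key=lambda d: abs(d[0] - target))[:k]
        # The imputed value is an observed value borrowed from one donor.
        completed.append(rng.choice(nearest)[1])
    return completed

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, None, 8.2, None, 12.1]
completed = pmm_impute(xs, ys, k=2)
```

Because every imputed value is copied from an observed case, the imputations always fall within the range of the observed data, which is the main practical difference from the regression method.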

To impute missing values for a classification variable in data sets with monotone missing patterns, you should use the logistic regression method or the discriminant function method. Use the logistic regression method when the classification variable has a binary or ordinal response, and use the discriminant function method when the classification variable has a binary or nominal response.
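The selection rule for classification variables can be written as a small lookup. The Python sketch below is illustrative only: the function name and string labels are hypothetical, and the method names follow Table 57.5, where the FCS rows mirror the monotone rows.

```python
def classification_methods(pattern, response):
    """Return the candidate method names for a classification variable.

    pattern:  "monotone" or "arbitrary" (the missing data pattern)
    response: "binary", "ordinal", or "nominal"
    """
    prefix = {"monotone": "Monotone", "arbitrary": "FCS"}[pattern]
    by_response = {
        # A binary response can use either method.
        "binary": ["logistic regression", "discriminant function"],
        "ordinal": ["logistic regression"],
        "nominal": ["discriminant function"],
    }
    return [f"{prefix} {name}" for name in by_response[response]]

choices = classification_methods("monotone", "binary")
```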

For data sets with arbitrary missing patterns, you can use either of the following methods to impute missing values: a Markov chain Monte Carlo (MCMC) method (Schafer, 1997) that assumes multivariate normality, or a fully conditional specification (FCS) method (van Buuren, 2007; Brand, 1999) that assumes the existence of a joint distribution for all variables.

For continuous variables in data sets with arbitrary missing patterns, you can use the MCMC method to impute either all the missing values or just enough missing values to give the imputed data sets monotone missing patterns. Once the missing pattern is monotone, you have greater flexibility in your choice of imputation models: in addition to the MCMC method, you can use methods that do not rely on Markov chains, such as the regression method, and you can specify a different set of covariates for each imputed variable.
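To make the notion of "just enough" concrete, the following Python sketch checks whether a data set has a monotone missing pattern and lists the cells that would need to be imputed to make it monotone. It assumes the usual definition, which this section does not restate: for variables in a fixed order, the pattern is monotone when a missing value in one variable implies that all later variables are also missing for that observation. The function names are hypothetical, and `None` marks a missing value.

```python
def is_monotone(rows):
    """True if no observed value follows a missing value in any row."""
    for row in rows:
        seen_missing = False
        for value in row:
            if value is None:
                seen_missing = True
            elif seen_missing:
                return False  # an observed value after a missing one
    return True

def cells_to_fill(rows):
    """Return (row, column) indexes of missing cells that precede an
    observed value in the same row. Filling exactly these cells makes
    the pattern monotone for the given variable order."""
    cells = []
    for i, row in enumerate(rows):
        observed_positions = [j for j, v in enumerate(row) if v is not None]
        last_observed = observed_positions[-1] if observed_positions else -1
        cells.extend((i, j) for j in range(last_observed) if row[j] is None)
    return cells

data = [
    [1.2, 3.4, 5.6],
    [2.1, None, 4.3],   # missing in the middle: breaks monotonicity
    [0.7, 2.2, None],
    [1.9, None, None],
]
needs_imputing = cells_to_fill(data)
```

In this example only one of the four missing cells has to be imputed by MCMC; the remaining three can then be handled by a monotone method.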

Although the regression and MCMC methods assume multivariate normality, inferences based on multiple imputation can be robust to departures from multivariate normality if the amount of missing information is not large, because the imputation model is effectively applied not to the entire data set but only to its missing part (Schafer, 1997, pp. 147–148).

To impute missing values for both continuous and classification variables in data sets with arbitrary missing patterns, you can use FCS methods, which impute all variables under the assumption that a joint distribution for these variables exists (Brand, 1999; van Buuren, 2007). The choice of method for each variable parallels the monotone case: use the regression or predictive mean matching method for a continuous variable, the logistic regression method for a classification variable with a binary or ordinal response, and the discriminant function method for a classification variable with a binary or nominal response.
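The cycling structure of an FCS method can be sketched in Python for the simplest case of two continuous variables. This is a hedged illustration, not the PROC MI algorithm: it starts from column means, refits a simple regression for each variable in turn, and adds normal noise to the predictions, whereas a full implementation would also draw the regression parameters from their posterior distribution. The function names, the iteration count, and the seed are assumptions.

```python
import math
import random

def fit_line(xs, ys):
    """Least-squares fit y = a + b*x; returns (a, b, residual_sd)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, math.sqrt(sse / max(n - 2, 1))

def fcs_impute(rows, n_iter=10, seed=0):
    """rows: list of [y0, y1] pairs with None for missing.
    Returns a completed copy; the input is left unchanged."""
    rng = random.Random(seed)
    missing = [[v is None for v in row] for row in rows]
    filled = [list(row) for row in rows]
    # Starting values: replace each missing cell with its column's observed mean.
    for j in range(2):
        observed = [row[j] for row in rows if row[j] is not None]
        mean_j = sum(observed) / len(observed)
        for i, row in enumerate(filled):
            if missing[i][j]:
                row[j] = mean_j
    for _ in range(n_iter):
        for j in range(2):
            other = 1 - j
            # Fit variable j on the other variable, using cases where j is observed.
            fit_idx = [i for i in range(len(rows)) if not missing[i][j]]
            a, b, sd = fit_line([filled[i][other] for i in fit_idx],
                                [filled[i][j] for i in fit_idx])
            # Re-impute the missing cells of variable j from the current fit.
            for i in range(len(rows)):
                if missing[i][j]:
                    filled[i][j] = a + b * filled[i][other] + rng.gauss(0.0, sd)
    return filled

rows = [[1.0, 2.2], [2.0, None], [3.0, 6.1], [None, 8.0],
        [5.0, 9.9], [6.0, None], [None, 14.2], [8.0, 16.1]]
completed = fcs_impute(rows)
```

Each variable is imputed from its own conditional model given the other, which is why a classification variable could use a logistic or discriminant model in the same loop while a continuous variable uses regression.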

You can also use a TRANSFORM statement to transform variables so that they conform more closely to the multivariate normality assumption. Variables are transformed before the imputation process and then are reverse-transformed to create the imputed data set. In addition, all continuous variables are standardized before the imputation process and then are transformed back to the original scale after the imputation process.
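The transform, standardize, impute, and reverse sequence can be illustrated with a short Python sketch. It assumes a log transformation as the example and is not the TRANSFORM statement itself; the function names are hypothetical.

```python
import math

def forward(values):
    """Log-transform, then standardize. Returns (z_scores, mean, sd)
    so that the same constants can be used to reverse the mapping."""
    logged = [math.log(v) for v in values]
    mean = sum(logged) / len(logged)
    sd = math.sqrt(sum((v - mean) ** 2 for v in logged) / (len(logged) - 1))
    return [(v - mean) / sd for v in logged], mean, sd

def backward(z_scores, mean, sd):
    """Undo the standardization, then undo the log transformation."""
    return [math.exp(z * sd + mean) for z in z_scores]

observed = [12.0, 15.5, 20.0, 33.0, 58.0]
z, mean, sd = forward(observed)

# Hypothetical values imputed on the standardized log scale.
imputed_z = [-2.5, 0.0, 1.8]
imputed_original_scale = backward(imputed_z, mean, sd)
```

One consequence of imputing on the log scale is that every back-transformed value is positive, even when the imputed standardized value is far below the mean.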

Li (1988) presents a theoretical argument for convergence of the MCMC method in the continuous case and uses it to create imputations for incomplete multivariate continuous data. In practice, however, it is not easy to check the convergence of a Markov chain, especially for a large number of parameters. PROC MI generates statistics and plots that you can use to check for convergence of the MCMC method. The details are described in the section Checking Convergence in MCMC.
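As an illustration of the kind of statistic that can help assess convergence, the Python sketch below computes the sample autocorrelation of a sequence of parameter values across iterations. The choice of autocorrelation as the diagnostic, the simulated chain, and all names are assumptions of this sketch; it does not reproduce PROC MI output.

```python
import random

def autocorrelation(series, lag):
    """Sample autocorrelation of a sequence at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    num = sum((series[t] - mean) * (series[t + lag] - mean)
              for t in range(n - lag))
    return num / denom

# Hypothetical trace of one parameter (for example, a mean) over iterations,
# simulated here as a strongly dependent AR(1) sequence.
rng = random.Random(1)
trace = [0.0]
for _ in range(1999):
    trace.append(0.9 * trace[-1] + rng.gauss(0.0, 1.0))

lag1 = autocorrelation(trace, 1)
lag50 = autocorrelation(trace, 50)
```

A high autocorrelation at small lags that decays slowly suggests that successive iterations are strongly dependent, so more iterations between imputations may be needed before the draws can be treated as approximately independent.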