This example uses the EM algorithm to compute the maximum likelihood estimates (MLE) for the parameters of multivariate normally distributed data with missing values. The following statements invoke the MI procedure and request the EM algorithm to compute the MLE for the mean vector μ and covariance matrix Σ of a multivariate normal distribution from the input data set Fitness1:
proc mi data=Fitness1 seed=1518971 simple nimpute=0;
   em itprint outem=outem;
   var Oxygen RunTime RunPulse;
run;
Note that when you specify the NIMPUTE=0 option, the missing values are not imputed.
The “Model Information” table in Output 57.1.1 describes the method and options that the procedure would use if a positive number were specified in the NIMPUTE= option.
Output 57.1.1: Model Information
Model Information | |
---|---|
Data Set | WORK.FITNESS1 |
Method | MCMC |
Multiple Imputation Chain | Single Chain |
Initial Estimates for MCMC | EM Posterior Mode |
Start | Starting Value |
Prior | Jeffreys |
Number of Imputations | 0 |
Number of Burn-in Iterations | 200 |
Number of Iterations | 100 |
Seed for random number generator | 1518971 |
The “Missing Data Patterns” table in Output 57.1.2 lists distinct missing data patterns with corresponding frequencies and percentages. Here, a value of “X” means that the variable is observed in the corresponding group and a value of “.” means that the variable is missing. The table also displays group-specific variable means.
Output 57.1.2: Missing Data Patterns
Missing Data Patterns | ||||||||
---|---|---|---|---|---|---|---|---|
Group | Oxygen | RunTime | RunPulse | Freq | Percent | Group Mean: Oxygen | Group Mean: RunTime | Group Mean: RunPulse |
1 | X | X | X | 21 | 67.74 | 46.353810 | 10.809524 | 171.666667 |
2 | X | X | . | 4 | 12.90 | 47.109500 | 10.137500 | . |
3 | X | . | . | 3 | 9.68 | 52.461667 | . | . |
4 | . | X | X | 1 | 3.23 | . | 11.950000 | 176.000000 |
5 | . | X | . | 2 | 6.45 | . | 9.885000 | . |
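The pattern tabulation described above can be sketched outside SAS. The following Python snippet is only an illustration, not how PROC MI is implemented: the function name `missing_patterns` and the toy rows are hypothetical, and the patterns are ordered by frequency, which is an assumption and may differ from the ordering that the procedure uses.

```python
from collections import Counter

def missing_patterns(rows):
    """Tabulate missing-data patterns: 'X' = observed, '.' = missing (None)."""
    counts = Counter(
        "".join("." if v is None else "X" for v in row) for row in rows
    )
    n = len(rows)
    # One (pattern, frequency, percent) tuple per pattern, most frequent first.
    return [(p, f, round(100.0 * f / n, 2)) for p, f in counts.most_common()]

# Hypothetical toy rows (Oxygen, RunTime, RunPulse); NOT the Fitness1 data.
toy = [
    (44.6, 11.4, 178.0),
    (54.3, 8.7, 156.0),
    (39.4, None, None),
    (None, 11.95, 176.0),
    (45.1, 10.1, None),
    (49.9, 9.2, None),
]
table = missing_patterns(toy)
for pattern, freq, percent in table:
    print(pattern, freq, percent)
```

The percentages in Output 57.1.2 follow the same arithmetic: for example, 21 of the 31 observations have all three variables observed, which gives 67.74 percent.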
With the SIMPLE option, the procedure displays simple descriptive univariate statistics for available cases in the “Univariate Statistics” table in Output 57.1.3 and correlations from pairwise available cases in the “Pairwise Correlations” table in Output 57.1.4.
Output 57.1.3: Univariate Statistics
Univariate Statistics | |||||||
---|---|---|---|---|---|---|---|
Variable | N | Mean | Std Dev | Minimum | Maximum | Missing Count | Missing Percent |
Oxygen | 28 | 47.11618 | 5.41305 | 37.38800 | 60.05500 | 3 | 9.68 |
RunTime | 28 | 10.68821 | 1.37988 | 8.63000 | 14.03000 | 3 | 9.68 |
RunPulse | 22 | 171.86364 | 10.14324 | 148.00000 | 186.00000 | 9 | 29.03 |
Output 57.1.4: Pairwise Correlations
Pairwise Correlations | |||
---|---|---|---|
 | Oxygen | RunTime | RunPulse |
Oxygen | 1.000000000 | -0.849118562 | -0.343961742 |
RunTime | -0.849118562 | 1.000000000 | 0.247258191 |
RunPulse | -0.343961742 | 0.247258191 | 1.000000000 |
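As a rough illustration of what "pairwise available cases" means, the following Python sketch computes a Pearson correlation using only the cases where both variables are observed. The function name `pairwise_corr` and the toy values are hypothetical, and the sketch assumes that the means are also taken over those shared cases; it is not a statement of how PROC MI computes the table.

```python
from math import sqrt

def pairwise_corr(x, y):
    """Pearson correlation over the cases where both x and y are observed."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy / sqrt(sxx * syy)

# Hypothetical toy columns; None marks a missing value.
x = [1.0, 2.0, 3.0, None, 5.0]
y = [2.0, 4.0, None, 8.0, 10.0]
r = pairwise_corr(x, y)  # only the three cases with both values are used
print(r)
```

Because each pair of variables can draw on a different subset of cases, a matrix assembled this way is not guaranteed to be positive definite.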
When you use the EM statement, the MI procedure displays the initial parameter estimates for the EM algorithm in the “Initial Parameter Estimates for EM” table in Output 57.1.5.
Output 57.1.5: Initial Parameter Estimates for EM
Initial Parameter Estimates for EM | ||||
---|---|---|---|---|
_TYPE_ | _NAME_ | Oxygen | RunTime | RunPulse |
MEAN | | 47.116179 | 10.688214 | 171.863636 |
COV | Oxygen | 29.301078 | 0 | 0 |
COV | RunTime | 0 | 1.904067 | 0 |
COV | RunPulse | 0 | 0 | 102.885281 |
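Comparing Output 57.1.5 with Output 57.1.3 suggests that, in this run, the initial estimates are the available-case means and variances with all covariances set to zero. The following Python check (an illustration outside SAS) squares the standard deviations from Output 57.1.3 and compares them with the diagonal of the initial covariance matrix; small differences are expected because the printed standard deviations are rounded.

```python
# Available-case standard deviations copied from Output 57.1.3.
std_dev = {"Oxygen": 5.41305, "RunTime": 1.37988, "RunPulse": 10.14324}

# Diagonal entries of the initial covariance matrix copied from Output 57.1.5.
initial_var = {"Oxygen": 29.301078, "RunTime": 1.904067, "RunPulse": 102.885281}

# Largest absolute gap between a squared standard deviation and the
# corresponding initial variance.
max_gap = max(abs(std_dev[k] ** 2 - initial_var[k]) for k in std_dev)
print(max_gap)
```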
When you use the ITPRINT option in the EM statement, the “EM (MLE) Iteration History” table in Output 57.1.6 displays the iteration history for the EM algorithm.
Output 57.1.6: EM (MLE) Iteration History
EM (MLE) Iteration History | ||||
---|---|---|---|---|
_Iteration_ | -2 Log L | Oxygen | RunTime | RunPulse |
0 | 289.544782 | 47.116179 | 10.688214 | 171.863636 |
1 | 263.549489 | 47.116179 | 10.688214 | 171.863636 |
2 | 255.851312 | 47.139089 | 10.603506 | 171.538203 |
3 | 254.616428 | 47.122353 | 10.571685 | 171.426790 |
4 | 254.494971 | 47.111080 | 10.560585 | 171.398296 |
5 | 254.483973 | 47.106523 | 10.556768 | 171.389208 |
6 | 254.482920 | 47.104899 | 10.555485 | 171.385257 |
7 | 254.482813 | 47.104348 | 10.555062 | 171.383345 |
8 | 254.482801 | 47.104165 | 10.554923 | 171.382424 |
9 | 254.482800 | 47.104105 | 10.554878 | 171.381992 |
10 | 254.482800 | 47.104086 | 10.554864 | 171.381796 |
11 | 254.482800 | 47.104079 | 10.554859 | 171.381708 |
12 | 254.482800 | 47.104077 | 10.554858 | 171.381669 |
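To make the iteration history more concrete, the following Python sketch runs EM for a bivariate normal sample with missing values. It is a simplified illustration under stated assumptions, not the PROC MI implementation: the function name `em_bivariate_normal` is hypothetical, only two variables are handled, rows with both values missing are not supported, the starting values mimic the pattern seen in Output 57.1.5 (available-case means and variances, zero covariance), and the stopping rule (largest absolute parameter change below `tol`) is an assumption rather than the procedure's criterion.

```python
def em_bivariate_normal(rows, max_iter=500, tol=1e-10):
    """EM for a bivariate normal sample; None marks a missing value.

    Returns (mu1, mu2, s11, s12, s22), using divisor n in the M-step.
    """
    n = len(rows)
    obs1 = [a for a, _ in rows if a is not None]
    obs2 = [b for _, b in rows if b is not None]
    # Starting values: available-case means and variances, zero covariance.
    mu1 = sum(obs1) / len(obs1)
    mu2 = sum(obs2) / len(obs2)
    s11 = sum((a - mu1) ** 2 for a in obs1) / (len(obs1) - 1)
    s22 = sum((b - mu2) ** 2 for b in obs2) / (len(obs2) - 1)
    s12 = 0.0
    for _ in range(max_iter):
        # E-step: expected sufficient statistics given the current estimates.
        t1 = t2 = t11 = t12 = t22 = 0.0
        for a, b in rows:
            if a is not None and b is not None:
                e1, e2, v1, v2 = a, b, 0.0, 0.0
            elif b is None:
                # Conditional mean and variance of variable 2 given variable 1.
                e1, v1 = a, 0.0
                e2 = mu2 + s12 / s11 * (a - mu1)
                v2 = s22 - s12 ** 2 / s11
            else:
                # Conditional mean and variance of variable 1 given variable 2.
                e2, v2 = b, 0.0
                e1 = mu1 + s12 / s22 * (b - mu2)
                v1 = s11 - s12 ** 2 / s22
            t1 += e1
            t2 += e2
            t11 += e1 * e1 + v1
            t22 += e2 * e2 + v2
            t12 += e1 * e2
        # M-step: update the means and covariances from the expected statistics.
        m1, m2 = t1 / n, t2 / n
        new = (m1, m2, t11 / n - m1 * m1, t12 / n - m1 * m2, t22 / n - m2 * m2)
        change = max(abs(p - q) for p, q in zip(new, (mu1, mu2, s11, s12, s22)))
        mu1, mu2, s11, s12, s22 = new
        if change < tol:
            break
    return mu1, mu2, s11, s12, s22

# Hypothetical toy data: the second variable is missing in the last two rows.
toy = [(1.0, 2.0), (2.0, 3.0), (3.0, 5.0), (4.0, 4.0), (5.0, None), (6.0, None)]
estimates = em_bivariate_normal(toy)
print(estimates)
```

In this toy data the first variable is fully observed, so its estimated mean is simply the mean of all six values, while the mean of the second variable is adjusted through its regression on the first.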
The “EM (MLE) Parameter Estimates” table in Output 57.1.7 displays the maximum likelihood estimates for the mean vector μ and covariance matrix Σ of a multivariate normal distribution from the data set Fitness1.
Output 57.1.7: EM (MLE) Parameter Estimates
EM (MLE) Parameter Estimates | ||||
---|---|---|---|---|
_TYPE_ | _NAME_ | Oxygen | RunTime | RunPulse |
MEAN | | 47.104077 | 10.554858 | 171.381669 |
COV | Oxygen | 27.797931 | -6.457975 | -18.031298 |
COV | RunTime | -6.457975 | 2.015514 | 3.516287 |
COV | RunPulse | -18.031298 | 3.516287 | 97.766857 |
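To compare these estimates with the pairwise correlations in Output 57.1.4, the covariance matrix can be rescaled to a correlation matrix. The following Python sketch (an illustration outside SAS) applies the usual formula, dividing each covariance by the product of the two standard deviations, to the values copied from Output 57.1.7.

```python
from math import sqrt

names = ["Oxygen", "RunTime", "RunPulse"]

# EM (MLE) covariance matrix copied from Output 57.1.7.
cov = [
    [27.797931, -6.457975, -18.031298],
    [-6.457975, 2.015514, 3.516287],
    [-18.031298, 3.516287, 97.766857],
]

# corr[i][j] = cov[i][j] / sqrt(cov[i][i] * cov[j][j])
corr = [
    [cov[i][j] / sqrt(cov[i][i] * cov[j][j]) for j in range(3)]
    for i in range(3)
]

for name, row in zip(names, corr):
    print(name, " ".join(f"{v: .4f}" for v in row))
```

For example, the implied Oxygen and RunTime correlation is about -0.86, close to but not identical to the pairwise value of -0.849 in Output 57.1.4, because the EM estimates use the information in all observations jointly.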
You can also write the EM (MLE) parameter estimates to an output data set with the OUTEM= option. The following statements list the observations in the output data set outem:
proc print data=outem;
   title 'EM Estimates';
run;
The output data set outem in Output 57.1.8 is a TYPE=COV data set. The observation with _TYPE_=‘MEAN’ contains the MLE for the mean vector μ, and the observations with _TYPE_=‘COV’ contain the MLE for the covariance matrix Σ of a multivariate normal distribution from the data set Fitness1.
Output 57.1.8: EM Estimates
EM Estimates |
Obs | _TYPE_ | _NAME_ | Oxygen | RunTime | RunPulse |
---|---|---|---|---|---|
1 | MEAN | | 47.1041 | 10.5549 | 171.382 |
2 | COV | Oxygen | 27.7979 | -6.4580 | -18.031 |
3 | COV | RunTime | -6.4580 | 2.0155 | 3.516 |
4 | COV | RunPulse | -18.0313 | 3.5163 | 97.767 |