The following statements use the MVPMODEL procedure to conduct a preliminary principal component analysis:
ods graphics on;
proc mvpmodel data=MWflightDelays;
   var AA CO DL F9 FL NW UA US WN;
run;
The DATA= option specifies the input data set, which contains the process measurement variables. The VAR statement specifies the process measurement variables to be analyzed. The ODS GRAPHICS ON statement enables ODS Graphics, which is used to produce plots for interpreting the model.
The procedure first outputs a summary of the model and the data, as shown in Figure 12.1.
This output includes the number of principal components in the model and the number of variables. In this case the procedure produces a model with nine principal components by default, because there are nine process variables.
Next, the procedure outputs the correlation matrix shown in Figure 12.2.
Figure 12.2: Correlation Matrix
| | AA | CO | DL | F9 | FL | NW | UA | US | WN |
|---|---|---|---|---|---|---|---|---|---|
| AA | 1.0000 | 0.5640 | 0.5206 | 0.4874 | 0.5403 | 0.4860 | 0.6466 | 0.7856 | 0.5506 |
| CO | 0.5640 | 1.0000 | 0.7855 | 0.6580 | 0.8519 | 0.6421 | 0.7672 | 0.8415 | 0.6526 |
| DL | 0.5206 | 0.7855 | 1.0000 | 0.8231 | 0.7598 | 0.4782 | 0.4951 | 0.7463 | 0.4525 |
| F9 | 0.4874 | 0.6580 | 0.8231 | 1.0000 | 0.5119 | 0.2279 | 0.3509 | 0.6832 | 0.3914 |
| FL | 0.5403 | 0.8519 | 0.7598 | 0.5119 | 1.0000 | 0.6807 | 0.6975 | 0.8207 | 0.7186 |
| NW | 0.4860 | 0.6421 | 0.4782 | 0.2279 | 0.6807 | 1.0000 | 0.6715 | 0.5598 | 0.3970 |
| UA | 0.6466 | 0.7672 | 0.4951 | 0.3509 | 0.6975 | 0.6715 | 1.0000 | 0.7540 | 0.7736 |
| US | 0.7856 | 0.8415 | 0.7463 | 0.6832 | 0.8207 | 0.5598 | 0.7540 | 1.0000 | 0.8152 |
| WN | 0.5506 | 0.6526 | 0.4525 | 0.3914 | 0.7186 | 0.3970 | 0.7736 | 0.8152 | 1.0000 |
There are strong correlations (greater than 0.8) between several variable pairs, including F9 and DL (0.8231), CO and FL (0.8519), and US and WN (0.8152). This is not surprising, because these pairs of airlines have closely located hubs or focus cities.
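As a cross-check, the same correlations can be computed outside PROC MVPMODEL. The following is a minimal sketch that uses PROC CORR on the same data set. The NOMISS option, which restricts the computation to complete observations, is an assumption made here so that both procedures work from the same rows. If the data contain no missing values, it has no effect.

```sas
/* Sketch: reproduce the correlation matrix with PROC CORR.      */
/* NOSIMPLE suppresses the table of simple statistics, and       */
/* NOPROB suppresses the p-values.                               */
/* NOMISS (an assumption) keeps only complete observations.      */
proc corr data=MWflightDelays nosimple noprob nomiss;
   var AA CO DL F9 FL NW UA US WN;
run;
```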
The procedure also outputs the eigenvalue and variance information shown in Figure 12.3.
Figure 12.3: Eigenvalue and Variance Information
Eigenvalues of the Correlation Matrix

| | Eigenvalue | Difference | Proportion | Cumulative |
|---|---|---|---|---|
| 1 | 6.09006397 | 5.02872938 | 0.6767 | 0.6767 |
| 2 | 1.06133459 | 0.36642409 | 0.1179 | 0.7946 |
| 3 | 0.69491050 | 0.16102099 | 0.0772 | 0.8718 |
| 4 | 0.53388951 | 0.28357563 | 0.0593 | 0.9311 |
| 5 | 0.25031387 | 0.09537517 | 0.0278 | 0.9589 |
| 6 | 0.15493870 | 0.03339131 | 0.0172 | 0.9762 |
| 7 | 0.12154739 | 0.06166364 | 0.0135 | 0.9897 |
| 8 | 0.05988375 | 0.02676604 | 0.0067 | 0.9963 |
| 9 | 0.03311771 | | 0.0037 | 1.0000 |
The eigenvalues are the variances of the principal components, and the proportions reflect the relative amount of variance explained by each component. The eigenvalues and the proportions are ordered from largest to smallest. Recall that principal components are uncorrelated linear combinations of the variables: the first has the maximum possible variance, and each subsequent component has the maximum variance among directions orthogonal to the preceding ones.
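The Proportion and Cumulative columns can be reproduced by hand from the eigenvalues. Because the analysis is based on the correlation matrix of nine variables, the eigenvalues sum to 9 (the trace of the matrix). The worked arithmetic below uses the symbol $\lambda_k$ for the $k$th eigenvalue, which is notation introduced here for illustration.

```latex
\[
\text{Proportion}_k = \frac{\lambda_k}{\sum_{j=1}^{9} \lambda_j} = \frac{\lambda_k}{9},
\qquad
\text{Proportion}_1 = \frac{6.09006397}{9} \approx 0.6767
\]
\[
\text{Cumulative}_3 = \frac{6.09006397 + 1.06133459 + 0.69491050}{9}
= \frac{7.84630906}{9} \approx 0.8718
\]
```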
More than 85% of the variance is explained by the first three principal components (a cumulative proportion of 0.8718), as shown in the Cumulative column. This suggests that a model with three principal components is adequate; this is confirmed by the plots in Figure 12.4.
Figure 12.4 shows a paneled display, with a scree plot in the left panel and a variance-explained plot in the right panel.
The scree plot shows the eigenvalues for each principal component. Traditionally, the scree plot has been recommended as an aid in selecting the number of principal components for the model by examining the "knee" in the plot (Mardia, Kent, and Bibby, 1979). The variance-explained plot shows both the proportion of variance and the cumulative variance explained by the principal components.
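If three principal components are judged adequate, the model can be refit with that number of components instead of the default nine. The following is a minimal sketch, assuming that the NCOMP= option in the PROC MVPMODEL statement is used to set the number of components. Everything else is unchanged from the preliminary analysis.

```sas
/* Sketch: refit the model with three principal components.   */
/* NCOMP=3 is the only change from the preliminary analysis.  */
ods graphics on;
proc mvpmodel data=MWflightDelays ncomp=3;
   var AA CO DL F9 FL NW UA US WN;
run;
```

The choice of three components here follows the cumulative variance and scree plot discussed above. A different threshold for variance explained would lead to a different value for NCOMP=.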