The first 14 observations of the following data set (see Hawkins, Bradu, and Kass (1984)) are leverage points; however, only observations 12, 13, and 14 have large , and only observations 12 and 14 have large values.
title "Hawkins, Bradu, Kass (1984) Data"; aa = { 1 10.1 19.6 28.3 9.7, 2 9.5 20.5 28.9 10.1, 3 10.7 20.2 31.0 10.3, 4 9.9 21.5 31.7 9.5, 5 10.3 21.1 31.1 10.0, 6 10.8 20.4 29.2 10.0, 7 10.5 20.9 29.1 10.8, 8 9.9 19.6 28.8 10.3, 9 9.7 20.7 31.0 9.6, 10 9.3 19.7 30.3 9.9, 11 11.0 24.0 35.0 -0.2, 12 12.0 23.0 37.0 -0.4, 13 12.0 26.0 34.0 0.7, 14 11.0 34.0 34.0 0.1, 15 3.4 2.9 2.1 -0.4, 16 3.1 2.2 0.3 0.6, 17 0.0 1.6 0.2 -0.2, 18 2.3 1.6 2.0 0.0, 19 0.8 2.9 1.6 0.1, 20 3.1 3.4 2.2 0.4, 21 2.6 2.2 1.9 0.9, 22 0.4 3.2 1.9 0.3, 23 2.0 2.3 0.8 -0.8, 24 1.3 2.3 0.5 0.7, 25 1.0 0.0 0.4 -0.3, 26 0.9 3.3 2.5 -0.8, 27 3.3 2.5 2.9 -0.7, 28 1.8 0.8 2.0 0.3, 29 1.2 0.9 0.8 0.3, 30 1.2 0.7 3.4 -0.3, 31 3.1 1.4 1.0 0.0, 32 0.5 2.4 0.3 -0.4, 33 1.5 3.1 1.5 -0.6, 34 0.4 0.0 0.7 -0.7, 35 3.1 2.4 3.0 0.3, 36 1.1 2.2 2.7 -1.0, 37 0.1 3.0 2.6 -0.6, 38 1.5 1.2 0.2 0.9, 39 2.1 0.0 1.2 -0.7, 40 0.5 2.0 1.2 -0.5, 41 3.4 1.6 2.9 -0.1, 42 0.3 1.0 2.7 -0.7, 43 0.1 3.3 0.9 0.6, 44 1.8 0.5 3.2 -0.7, 45 1.9 0.1 0.6 -0.5, 46 1.8 0.5 3.0 -0.4, 47 3.0 0.1 0.8 -0.9, 48 3.1 1.6 3.0 0.1, 49 3.1 2.5 1.9 0.9, 50 2.1 2.8 2.9 -0.4, 51 2.3 1.5 0.4 0.7, 52 3.3 0.6 1.2 -0.5, 53 0.3 0.4 3.3 0.7, 54 1.1 3.0 0.3 0.7, 55 0.5 2.4 0.9 0.0, 56 1.8 3.2 0.9 0.1, 57 1.8 0.7 0.7 0.7, 58 2.4 3.4 1.5 -0.1, 59 1.6 2.1 3.0 -0.3, 60 0.3 1.5 3.3 -0.9, 61 0.4 3.4 3.0 -0.3, 62 0.9 0.1 0.3 0.6, 63 1.1 2.7 0.2 -0.3, 64 2.8 3.0 2.9 -0.5, 65 2.0 0.7 2.7 0.6, 66 0.2 1.8 0.8 -0.9, 67 1.6 2.0 1.2 -0.7, 68 0.1 0.0 1.1 0.6, 69 2.0 0.6 0.3 0.2, 70 1.0 2.2 2.9 0.7, 71 2.2 2.5 2.3 0.2, 72 0.6 2.0 1.5 -0.2, 73 0.3 1.7 2.2 0.4, 74 0.0 2.2 1.6 -0.9, 75 0.3 0.4 2.6 0.2 }; a = aa[,2:4]; b = aa[,5];
The data are also listed in Rousseeuw and Leroy (1987).
The complete enumeration must inspect 1,215,450 subsets.
Output 12.6.1 displays the iteration history for MVE.
optn = j(9,1,.); optn[1]= 3; /* ipri */ optn[2]= 1; /* pcov: print COV */ optn[3]= 1; /* pcor: print CORR */ optn[5]= -1; /* nrep: all subsets */ call mve(sc,xmve,dist,optn,a);
Output 12.6.1: Iteration History for MVE
Hawkins, Bradu, Kass (1984) Data |
Subset | Singular | Best Criterion |
Percent |
---|---|---|---|
121545 | 0 | 51.104276 | 10 |
243090 | 1 | 51.104276 | 20 |
364635 | 1 | 51.104276 | 30 |
486180 | 2 | 51.104276 | 40 |
607725 | 3 | 51.104276 | 50 |
729270 | 9 | 6.271725 | 60 |
850815 | 35 | 6.271725 | 70 |
972360 | 55 | 5.912308 | 80 |
1093905 | 76 | 5.912308 | 90 |
1215450 | 114 | 5.912308 | 100 |
Output 12.6.2 reports the robust parameter estimates for MVE.
Output 12.6.2: Robust Location Estimates
Robust MVE Location Estimates | |
---|---|
VAR1 | 1.5133333333 |
VAR2 | 1.8083333333 |
VAR3 | 1.7016666667 |
Robust MVE Scatter Matrix | |||
---|---|---|---|
VAR1 | VAR2 | VAR3 | |
VAR1 | 1.1143954802 | 0.0939548023 | 0.1416723164 |
VAR2 | 0.0939548023 | 1.1231497175 | 0.1174435028 |
VAR3 | 0.1416723164 | 0.1174435028 | 1.0747429379 |
Output 12.6.3 reports the eigenvalues of the robust scatter matrix and the robust correlation matrix.
Output 12.6.3: MVE Scatter Matrix
Eigenvalues of Robust Scatter Matrix |
|
---|---|
VAR1 | 1.3396371545 |
VAR2 | 1.0281247572 |
VAR3 | 0.9445262239 |
Robust Correlation Matrix | |||
---|---|---|---|
VAR1 | VAR2 | VAR3 | |
VAR1 | 1 | 0.0839808925 | 0.1294532696 |
VAR2 | 0.0839808925 | 1 | 0.1068951177 |
VAR3 | 0.1294532696 | 0.1068951177 | 1 |
Output 12.6.4 shows a portion of the classical Mahalanobis and robust distances obtained by complete enumeration. The first 14 observations are recognized as outliers (leverage points).
Output 12.6.4: Mahalanobis and Robust Distances
Classical Distances and Robust (Rousseeuw) Distances | |||
---|---|---|---|
Unsquared Mahalanobis Distance and | |||
Unsquared Rousseeuw Distance of Each Observation | |||
N | Mahalanobis Distances | Robust Distances | Weight |
1 | 1.916821 | 29.541649 | 0 |
2 | 1.855757 | 30.344481 | 0 |
3 | 2.313658 | 31.985694 | 0 |
4 | 2.229655 | 33.011768 | 0 |
5 | 2.100114 | 32.404938 | 0 |
6 | 2.146169 | 30.683153 | 0 |
7 | 2.010511 | 30.794838 | 0 |
8 | 1.919277 | 29.905756 | 0 |
9 | 2.221249 | 32.092048 | 0 |
10 | 2.333543 | 31.072200 | 0 |
11 | 2.446542 | 36.808021 | 0 |
12 | 3.108335 | 38.071382 | 0 |
13 | 2.662380 | 37.094539 | 0 |
14 | 6.381624 | 41.472255 | 0 |
15 | 1.815487 | 1.994672 | 1.000000 |
16 | 2.151357 | 2.202278 | 1.000000 |
17 | 1.384915 | 1.918208 | 1.000000 |
18 | 0.848155 | 0.819163 | 1.000000 |
19 | 1.148941 | 1.288387 | 1.000000 |
20 | 1.591431 | 2.046703 | 1.000000 |
21 | 1.089981 | 1.068327 | 1.000000 |
22 | 1.548776 | 1.768905 | 1.000000 |
23 | 1.085421 | 1.166951 | 1.000000 |
24 | 0.971195 | 1.304648 | 1.000000 |
25 | 0.799268 | 2.030417 | 1.000000 |
26 | 1.168373 | 1.727131 | 1.000000 |
27 | 1.449625 | 1.983831 | 1.000000 |
28 | 0.867789 | 1.073856 | 1.000000 |
29 | 0.576399 | 1.168060 | 1.000000 |
30 | 1.568868 | 2.091386 | 1.000000 |
... | ... | ... | ... |
75 | 1.899178 | 2.042560 | 1.000000 |
The graphs in Output 12.6.5 and Output 12.6.6 show the following:
the plot of standardized LMS residuals vs. robust distances
the plot of standardized LS residuals vs. Mahalanobis distances
The graph identifies the four good leverage points 11, 12, 13, and 14, which have small standardized LMS residuals but large robust distances, and the 10 bad leverage points , which have large standardized LMS residuals and large robust distances.
Output 12.6.5: Hawkins-Bradu-Kass Data: LMS Residuals vs. Robust Distances
Output 12.6.6: Hawkins-Bradu-Kass Data: LS Residuals vs. Mahalanobis Distances