This example uses the remote-sensing data. The observations are grouped into five crops: clover, corn, cotton, soybeans, and sugar beets. Four measures, x1 through x4, make up the descriptive variables.
In the first PROC DISCRIM statement, the DISCRIM procedure uses normal-theory methods (METHOD=NORMAL) and assumes equal within-class covariance matrices (POOL=YES) across the five crops. The PRIORS statement, PRIORS PROP, sets the prior probabilities proportional to the sample sizes. The LIST option lists the resubstitution classification results for each observation (Output 35.4.2). The CROSSVALIDATE option displays cross validation error-rate estimates (Output 35.4.3). The OUTSTAT= option stores the calibration information in a new data set so that it can be used to classify future observations. A second PROC DISCRIM statement uses this calibration information to classify a test data set. Note that the values of the identification variable, xvalues, are obtained by rereading the x1 through x4 fields in the data lines as a single character variable. The following statements produce Output 35.4.1 through Output 35.4.3:
title 'Discriminant Analysis of Remote Sensing Data on Five Crops';

data crops;
   input Crop $ 1-10 x1-x4 xvalues $ 11-21;
   datalines;
Corn      16 27 31 33
Corn      15 23 30 30
Corn      16 27 27 26
Corn      18 20 25 23
Corn      15 15 31 32
Corn      15 32 32 15
Corn      12 15 16 73
Soybeans  20 23 23 25
Soybeans  24 24 25 32
Soybeans  21 25 23 24
Soybeans  27 45 24 12
Soybeans  12 13 15 42
Soybeans  22 32 31 43
Cotton    31 32 33 34
Cotton    29 24 26 28
Cotton    34 32 28 45
Cotton    26 25 23 24
Cotton    53 48 75 26
Cotton    34 35 25 78
Sugarbeets22 23 25 42
Sugarbeets25 25 24 26
Sugarbeets34 25 16 52
Sugarbeets54 23 21 54
Sugarbeets25 43 32 15
Sugarbeets26 54 2 54
Clover    12 45 32 54
Clover    24 58 25 34
Clover    87 54 61 21
Clover    51 31 31 16
Clover    96 48 54 62
Clover    31 31 11 11
Clover    56 13 13 71
Clover    32 13 27 32
Clover    36 26 54 32
Clover    53 08 06 54
Clover    32 32 62 16
;
title2 'Using the Linear Discriminant Function';

proc discrim data=crops outstat=cropstat method=normal pool=yes
             list crossvalidate;
   class Crop;
   priors prop;
   id xvalues;
   var x1-x4;
run;
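
Before reusing the calibration data set, it can help to look at what the OUTSTAT= option wrote. The following is a minimal sketch, assuming the preceding step has already run and created Cropstat. It only tabulates the record types through the _TYPE_ variable and prints the first few records, so it makes no assumption about which _TYPE_ values are present.

```sas
/* Sketch: inspect the calibration data set created by OUTSTAT=cropstat. */
proc freq data=cropstat;
   tables _TYPE_ / nocum;           /* one row per kind of stored statistic */
run;

proc print data=cropstat(obs=10);   /* peek at the first ten records */
run;
```
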
Output 35.4.1: Linear Discriminant Function on Crop Data
Discriminant Analysis of Remote Sensing Data on Five Crops |
Using the Linear Discriminant Function |
Total Sample Size | 36 | DF Total | 35 |
---|---|---|---|
Variables | 4 | DF Within Classes | 31 |
Classes | 5 | DF Between Classes | 4 |
Number of Observations Read | 36 |
---|---|
Number of Observations Used | 36 |
Class Level Information | |||||
---|---|---|---|---|---|
Crop | Variable Name | Frequency | Weight | Proportion | Prior Probability |
Clover | Clover | 11 | 11.0000 | 0.305556 | 0.305556 |
Corn | Corn | 7 | 7.0000 | 0.194444 | 0.194444 |
Cotton | Cotton | 6 | 6.0000 | 0.166667 | 0.166667 |
Soybeans | Soybeans | 6 | 6.0000 | 0.166667 | 0.166667 |
Sugarbeets | Sugarbeets | 6 | 6.0000 | 0.166667 | 0.166667 |
Pooled Covariance Matrix Information | |
---|---|
Covariance Matrix Rank | Natural Log of the Determinant of the Covariance Matrix |
4 | 21.30189 |
Discriminant Analysis of Remote Sensing Data on Five Crops |
Using the Linear Discriminant Function |
Generalized Squared Distance to Crop | |||||
---|---|---|---|---|---|
From Crop | Clover | Corn | Cotton | Soybeans | Sugarbeets |
Clover | 2.37125 | 7.52830 | 4.44969 | 6.16665 | 5.07262 |
Corn | 6.62433 | 3.27522 | 5.46798 | 4.31383 | 6.47395 |
Cotton | 3.23741 | 5.15968 | 3.58352 | 5.01819 | 4.87908 |
Soybeans | 4.95438 | 4.00552 | 5.01819 | 3.58352 | 4.65998 |
Sugarbeets | 3.86034 | 6.16564 | 4.87908 | 4.65998 | 3.58352 |
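
The diagonal of this table can be checked by hand. The printed values are consistent with the usual generalized squared distance for a pooled covariance matrix, written here in assumed notation that does not appear in the output: $\bar{x}_i$ is the mean vector of crop $i$, $S_p$ is the pooled covariance matrix, and $q_j$ is the prior probability of crop $j$.

$$
D^2(i \mid j) = (\bar{x}_i - \bar{x}_j)^{\top} S_p^{-1} (\bar{x}_i - \bar{x}_j) - 2 \ln q_j
$$

When $i = j$ the quadratic term vanishes and the entry reduces to $-2 \ln q_j$:

$$
-2\ln\tfrac{11}{36} \approx 2.37125 \;(\text{Clover}), \qquad
-2\ln\tfrac{7}{36} \approx 3.27522 \;(\text{Corn}), \qquad
-2\ln\tfrac{6}{36} \approx 3.58352 \;(\text{Cotton, Soybeans, Sugarbeets})
$$

The prior term also explains why the table is not symmetric: the Clover-to-Corn and Corn-to-Clover entries differ by $7.52830 - 6.62433 = 0.90397$, which equals $3.27522 - 2.37125$.
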
Linear Discriminant Function for Crop | |||||
---|---|---|---|---|---|
Variable | Clover | Corn | Cotton | Soybeans | Sugarbeets |
Constant | -10.98457 | -7.72070 | -11.46537 | -7.28260 | -9.80179 |
x1 | 0.08907 | -0.04180 | 0.02462 | 0.0000369 | 0.04245 |
x2 | 0.17379 | 0.11970 | 0.17596 | 0.15896 | 0.20988 |
x3 | 0.11899 | 0.16511 | 0.15880 | 0.10622 | 0.06540 |
x4 | 0.15637 | 0.16768 | 0.18362 | 0.14133 | 0.16408 |
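
To see how these coefficients turn into posterior probabilities, the following DATA step is a minimal sketch that scores the first observation in the data (16 27 31 33). It assumes that the printed constant already includes the log of the prior probability, which is how the numbers in this table behave, and the array names are illustrative only. Because the coefficients are copied at the precision shown, the results should agree with the first row of the listing in Output 35.4.2 only to about three decimal places.

```sas
/* Sketch: posterior probabilities for one observation, computed from   */
/* the coefficients printed in the Linear Discriminant Function table.  */
data _null_;
   /* Column order: Clover, Corn, Cotton, Soybeans, Sugarbeets */
   array c0[5] _temporary_ (-10.98457 -7.72070 -11.46537 -7.28260 -9.80179);
   array c1[5] _temporary_ (0.08907 -0.04180 0.02462 0.0000369 0.04245);
   array c2[5] _temporary_ (0.17379 0.11970 0.17596 0.15896 0.20988);
   array c3[5] _temporary_ (0.11899 0.16511 0.15880 0.10622 0.06540);
   array c4[5] _temporary_ (0.15637 0.16768 0.18362 0.14133 0.16408);
   array post[5] Clover Corn Cotton Soybeans Sugarbeets;

   x1 = 16; x2 = 27; x3 = 31; x4 = 33;

   total = 0;
   do j = 1 to 5;
      /* exponentiated linear discriminant score for class j */
      post[j] = exp(c0[j] + c1[j]*x1 + c2[j]*x2 + c3[j]*x3 + c4[j]*x4);
      total = total + post[j];
   end;

   do j = 1 to 5;
      post[j] = post[j] / total;   /* normalize so the posteriors sum to 1 */
   end;

   put Clover= Corn= Cotton= Soybeans= Sugarbeets=;
run;
```

The largest of the five values belongs to Corn, which is the class that the listing assigns to this observation.
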
Output 35.4.2: Misclassified Observations: Resubstitution
Discriminant Analysis of Remote Sensing Data on Five Crops |
Using the Linear Discriminant Function |
Posterior Probability of Membership in Crop | |||||||
---|---|---|---|---|---|---|---|
xvalues | From Crop | Classified into Crop | Clover | Corn | Cotton | Soybeans | Sugarbeets |
16 27 31 33 | Corn | Corn | 0.0894 | 0.4054 | 0.1763 | 0.2392 | 0.0897 |
15 23 30 30 | Corn | Corn | 0.0769 | 0.4558 | 0.1421 | 0.2530 | 0.0722 |
16 27 27 26 | Corn | Corn | 0.0982 | 0.3422 | 0.1365 | 0.3073 | 0.1157 |
18 20 25 23 | Corn | Corn | 0.1052 | 0.3634 | 0.1078 | 0.3281 | 0.0955 |
15 15 31 32 | Corn | Corn | 0.0588 | 0.5754 | 0.1173 | 0.2087 | 0.0398 |
15 32 32 15 | Corn | Soybeans * | 0.0972 | 0.3278 | 0.1318 | 0.3420 | 0.1011 |
12 15 16 73 | Corn | Corn | 0.0454 | 0.5238 | 0.1849 | 0.1376 | 0.1083 |
20 23 23 25 | Soybeans | Soybeans | 0.1330 | 0.2804 | 0.1176 | 0.3305 | 0.1385 |
24 24 25 32 | Soybeans | Soybeans | 0.1768 | 0.2483 | 0.1586 | 0.2660 | 0.1502 |
21 25 23 24 | Soybeans | Soybeans | 0.1481 | 0.2431 | 0.1200 | 0.3318 | 0.1570 |
27 45 24 12 | Soybeans | Sugarbeets * | 0.2357 | 0.0547 | 0.1016 | 0.2721 | 0.3359 |
12 13 15 42 | Soybeans | Corn * | 0.0549 | 0.4749 | 0.0920 | 0.2768 | 0.1013 |
22 32 31 43 | Soybeans | Cotton * | 0.1474 | 0.2606 | 0.2624 | 0.1848 | 0.1448 |
31 32 33 34 | Cotton | Clover * | 0.2815 | 0.1518 | 0.2377 | 0.1767 | 0.1523 |
29 24 26 28 | Cotton | Soybeans * | 0.2521 | 0.1842 | 0.1529 | 0.2549 | 0.1559 |
34 32 28 45 | Cotton | Clover * | 0.3125 | 0.1023 | 0.2404 | 0.1357 | 0.2091 |
26 25 23 24 | Cotton | Soybeans * | 0.2121 | 0.1809 | 0.1245 | 0.3045 | 0.1780 |
53 48 75 26 | Cotton | Clover * | 0.4837 | 0.0391 | 0.4384 | 0.0223 | 0.0166 |
34 35 25 78 | Cotton | Cotton | 0.2256 | 0.0794 | 0.3810 | 0.0592 | 0.2548 |
22 23 25 42 | Sugarbeets | Corn * | 0.1421 | 0.3066 | 0.1901 | 0.2231 | 0.1381 |
25 25 24 26 | Sugarbeets | Soybeans * | 0.1969 | 0.2050 | 0.1354 | 0.2960 | 0.1667 |
34 25 16 52 | Sugarbeets | Sugarbeets | 0.2928 | 0.0871 | 0.1665 | 0.1479 | 0.3056 |
54 23 21 54 | Sugarbeets | Clover * | 0.6215 | 0.0194 | 0.1250 | 0.0496 | 0.1845 |
25 43 32 15 | Sugarbeets | Soybeans * | 0.2258 | 0.1135 | 0.1646 | 0.2770 | 0.2191 |
26 54 2 54 | Sugarbeets | Sugarbeets | 0.0850 | 0.0081 | 0.0521 | 0.0661 | 0.7887 |
12 45 32 54 | Clover | Cotton * | 0.0693 | 0.2663 | 0.3394 | 0.1460 | 0.1789 |
24 58 25 34 | Clover | Sugarbeets * | 0.1647 | 0.0376 | 0.1680 | 0.1452 | 0.4845 |
87 54 61 21 | Clover | Clover | 0.9328 | 0.0003 | 0.0478 | 0.0025 | 0.0165 |
51 31 31 16 | Clover | Clover | 0.6642 | 0.0205 | 0.0872 | 0.0959 | 0.1322 |
96 48 54 62 | Clover | Clover | 0.9215 | 0.0002 | 0.0604 | 0.0007 | 0.0173 |
31 31 11 11 | Clover | Sugarbeets * | 0.2525 | 0.0402 | 0.0473 | 0.3012 | 0.3588 |
56 13 13 71 | Clover | Clover | 0.6132 | 0.0212 | 0.1226 | 0.0408 | 0.2023 |
32 13 27 32 | Clover | Clover | 0.2669 | 0.2616 | 0.1512 | 0.2260 | 0.0943 |
36 26 54 32 | Clover | Cotton * | 0.2650 | 0.2645 | 0.3495 | 0.0918 | 0.0292 |
53 08 06 54 | Clover | Clover | 0.5914 | 0.0237 | 0.0676 | 0.0781 | 0.2392 |
32 32 62 16 | Clover | Cotton * | 0.2163 | 0.3180 | 0.3327 | 0.1125 | 0.0206 |

* Misclassified observation
Discriminant Analysis of Remote Sensing Data on Five Crops |
Using the Linear Discriminant Function |
Number of Observations and Percent Classified into Crop | ||||||
---|---|---|---|---|---|---|
From Crop | Clover | Corn | Cotton | Soybeans | Sugarbeets | Total |
Clover | 6 (54.55) | 0 (0.00) | 3 (27.27) | 0 (0.00) | 2 (18.18) | 11 (100.00) |
Corn | 0 (0.00) | 6 (85.71) | 0 (0.00) | 1 (14.29) | 0 (0.00) | 7 (100.00) |
Cotton | 3 (50.00) | 0 (0.00) | 1 (16.67) | 2 (33.33) | 0 (0.00) | 6 (100.00) |
Soybeans | 0 (0.00) | 1 (16.67) | 1 (16.67) | 3 (50.00) | 1 (16.67) | 6 (100.00) |
Sugarbeets | 1 (16.67) | 1 (16.67) | 0 (0.00) | 2 (33.33) | 2 (33.33) | 6 (100.00) |
Total | 10 (27.78) | 8 (22.22) | 5 (13.89) | 8 (22.22) | 5 (13.89) | 36 (100.00) |
Priors | 0.30556 | 0.19444 | 0.16667 | 0.16667 | 0.16667 | |

Error Count Estimates for Crop | ||||||
---|---|---|---|---|---|---|
 | Clover | Corn | Cotton | Soybeans | Sugarbeets | Total |
Rate | 0.4545 | 0.1429 | 0.8333 | 0.5000 | 0.6667 | 0.5000 |
Priors | 0.3056 | 0.1944 | 0.1667 | 0.1667 | 0.1667 | |
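
The Total entry in the error count table appears to be the prior-weighted average of the per-crop error rates. The following DATA step is a minimal sketch of that arithmetic, using the rounded rates and priors copied from the table above; the array names are illustrative. Because the inputs are rounded to four decimal places, the result agrees with the printed total only approximately.

```sas
/* Sketch: total error rate as the prior-weighted sum of class error rates. */
data _null_;
   /* Order: Clover, Corn, Cotton, Soybeans, Sugarbeets */
   array errRate[5]   _temporary_ (0.4545 0.1429 0.8333 0.5000 0.6667);
   array priorProb[5] _temporary_ (0.3056 0.1944 0.1667 0.1667 0.1667);

   total = 0;
   do j = 1 to 5;
      total = total + priorProb[j] * errRate[j];
   end;

   put total=;   /* close to the printed total of 0.5000 */
run;
```

With PRIORS PROP the priors equal the class proportions, so this weighted average is the same as the overall fraction of misclassified observations.
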
Output 35.4.3: Misclassified Observations: Cross Validation
Discriminant Analysis of Remote Sensing Data on Five Crops |
Using the Linear Discriminant Function |
Number of Observations and Percent Classified into Crop | ||||||
---|---|---|---|---|---|---|
From Crop | Clover | Corn | Cotton | Soybeans | Sugarbeets | Total |
Clover | | | | | | |
Corn | | | | | | |
Cotton | | | | | | |
Soybeans | | | | | | |
Sugarbeets | | | | | | |
Total | | | | | | |
Priors | | | | | | |

Error Count Estimates for Crop | ||||||
---|---|---|---|---|---|---|
 | Clover | Corn | Cotton | Soybeans | Sugarbeets | Total |
Rate | 0.6364 | 0.4286 | 1.0000 | 0.5000 | 0.8333 | 0.6667 |
Priors | 0.3056 | 0.1944 | 0.1667 | 0.1667 | 0.1667 | |
Next, you can use the calibration information stored in the Cropstat data set to classify a test data set. The TESTLIST option lists the classification results for each observation in the test data set. The following statements produce Output 35.4.4 and Output 35.4.5:
data test;
   input Crop $ 1-10 x1-x4 xvalues $ 11-21;
   datalines;
Corn      16 27 31 33
Soybeans  21 25 23 24
Cotton    29 24 26 28
Sugarbeets54 23 21 54
Clover    32 32 62 16
;
title2 'Classification of Test Data';

proc discrim data=cropstat testdata=test testout=tout testlist;
   class Crop;
   testid xvalues;
   var x1-x4;
run;

proc print data=tout;
   title 'Discriminant Analysis of Remote Sensing Data on Five Crops';
   title2 'Output Classification Results of Test Data';
run;
Output 35.4.4: Classification of Test Data
Discriminant Analysis of Remote Sensing Data on Five Crops |
Classification of Test Data |
Posterior Probability of Membership in Crop | |||||||
---|---|---|---|---|---|---|---|
xvalues | From Crop | Classified into Crop | Clover | Corn | Cotton | Soybeans | Sugarbeets |
16 27 31 33 | Corn | Corn | 0.0894 | 0.4054 | 0.1763 | 0.2392 | 0.0897 |
21 25 23 24 | Soybeans | Soybeans | 0.1481 | 0.2431 | 0.1200 | 0.3318 | 0.1570 |
29 24 26 28 | Cotton | Soybeans * | 0.2521 | 0.1842 | 0.1529 | 0.2549 | 0.1559 |
54 23 21 54 | Sugarbeets | Clover * | 0.6215 | 0.0194 | 0.1250 | 0.0496 | 0.1845 |
32 32 62 16 | Clover | Cotton * | 0.2163 | 0.3180 | 0.3327 | 0.1125 | 0.0206 |

* Misclassified observation
Discriminant Analysis of Remote Sensing Data on Five Crops |
Classification of Test Data |
Observation Profile for Test Data | |
---|---|
Number of Observations Read | 5 |
Number of Observations Used | 5 |
Number of Observations and Percent Classified into Crop | ||||||
---|---|---|---|---|---|---|
From Crop | Clover | Corn | Cotton | Soybeans | Sugarbeets | Total |
Clover | 0 (0.00) | 0 (0.00) | 1 (100.00) | 0 (0.00) | 0 (0.00) | 1 (100.00) |
Corn | 0 (0.00) | 1 (100.00) | 0 (0.00) | 0 (0.00) | 0 (0.00) | 1 (100.00) |
Cotton | 0 (0.00) | 0 (0.00) | 0 (0.00) | 1 (100.00) | 0 (0.00) | 1 (100.00) |
Soybeans | 0 (0.00) | 0 (0.00) | 0 (0.00) | 1 (100.00) | 0 (0.00) | 1 (100.00) |
Sugarbeets | 1 (100.00) | 0 (0.00) | 0 (0.00) | 0 (0.00) | 0 (0.00) | 1 (100.00) |
Total | 1 (20.00) | 1 (20.00) | 1 (20.00) | 2 (40.00) | 0 (0.00) | 5 (100.00) |
Priors | 0.30556 | 0.19444 | 0.16667 | 0.16667 | 0.16667 | |

Error Count Estimates for Crop | ||||||
---|---|---|---|---|---|---|
 | Clover | Corn | Cotton | Soybeans | Sugarbeets | Total |
Rate | 1.0000 | 0.0000 | 1.0000 | 0.0000 | 1.0000 | 0.6389 |
Priors | 0.3056 | 0.1944 | 0.1667 | 0.1667 | 0.1667 | |
Output 35.4.5: Output Data Set of the Classification Results for Test Data
Discriminant Analysis of Remote Sensing Data on Five Crops |
Output Classification Results of Test Data |
Obs | Crop | x1 | x2 | x3 | x4 | xvalues | Clover | Corn | Cotton | Soybeans | Sugarbeets | _INTO_ |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Corn | 16 | 27 | 31 | 33 | 16 27 31 33 | 0.08935 | 0.40543 | 0.17632 | 0.23918 | 0.08972 | Corn |
2 | Soybeans | 21 | 25 | 23 | 24 | 21 25 23 24 | 0.14811 | 0.24308 | 0.11999 | 0.33184 | 0.15698 | Soybeans |
3 | Cotton | 29 | 24 | 26 | 28 | 29 24 26 28 | 0.25213 | 0.18420 | 0.15294 | 0.25486 | 0.15588 | Soybeans |
4 | Sugarbeets | 54 | 23 | 21 | 54 | 54 23 21 54 | 0.62150 | 0.01937 | 0.12498 | 0.04962 | 0.18452 | Clover |
5 | Clover | 32 | 32 | 62 | 16 | 32 32 62 16 | 0.21633 | 0.31799 | 0.33266 | 0.11246 | 0.02056 | Cotton |
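
Because the TESTOUT= data set holds both the actual crop (Crop) and the assigned crop (_INTO_), a cross tabulation of the two gives a quick confusion table. The following is a minimal sketch, assuming the Tout data set created by the preceding step; the table options only suppress the percentages so that the raw counts are easier to read.

```sas
/* Sketch: confusion table of actual crop by assigned crop for the test data. */
proc freq data=tout;
   tables Crop*_INTO_ / norow nocol nopercent;
run;
```

Off-diagonal cells in the resulting table correspond to the observations flagged as misclassified in Output 35.4.4.
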
In the next example, PROC DISCRIM uses normal-theory methods (METHOD=NORMAL) and assumes unequal within-class covariance matrices (POOL=NO) for the remote-sensing data. The PRIORS statement, PRIORS PROP, sets the prior probabilities proportional to the sample sizes. The CROSSVALIDATE option displays cross validation error-rate estimates. Note that the total error count estimate by cross validation (0.5556) is much larger than the total error count estimate by resubstitution (0.1111). Resubstitution classifies the same observations that were used to fit the discriminant function, so a gap of this size suggests that its estimate is optimistically biased for these data. The following statements produce Output 35.4.6:
title2 'Using Quadratic Discriminant Function';

proc discrim data=crops method=normal pool=no crossvalidate;
   class Crop;
   priors prop;
   id xvalues;
   var x1-x4;
run;
Output 35.4.6: Quadratic Discriminant Function on Crop Data
Discriminant Analysis of Remote Sensing Data on Five Crops |
Using Quadratic Discriminant Function |
Total Sample Size | 36 | DF Total | 35 |
---|---|---|---|
Variables | 4 | DF Within Classes | 31 |
Classes | 5 | DF Between Classes | 4 |
Number of Observations Read | 36 |
---|---|
Number of Observations Used | 36 |
Class Level Information | |||||
---|---|---|---|---|---|
Crop | Variable Name | Frequency | Weight | Proportion | Prior Probability |
Clover | Clover | 11 | 11.0000 | 0.305556 | 0.305556 |
Corn | Corn | 7 | 7.0000 | 0.194444 | 0.194444 |
Cotton | Cotton | 6 | 6.0000 | 0.166667 | 0.166667 |
Soybeans | Soybeans | 6 | 6.0000 | 0.166667 | 0.166667 |
Sugarbeets | Sugarbeets | 6 | 6.0000 | 0.166667 | 0.166667 |
Within Covariance Matrix Information | ||
---|---|---|
Crop | Covariance Matrix Rank | Natural Log of the Determinant of the Covariance Matrix |
Clover | 4 | 23.64618 |
Corn | 4 | 11.13472 |
Cotton | 4 | 13.23569 |
Soybeans | 4 | 12.45263 |
Sugarbeets | 4 | 17.76293 |
Discriminant Analysis of Remote Sensing Data on Five Crops |
Using Quadratic Discriminant Function |
Generalized Squared Distance to Crop | |||||
---|---|---|---|---|---|
From Crop | Clover | Corn | Cotton | Soybeans | Sugarbeets |
Clover | 26.01743 | 1320 | 104.18297 | 194.10546 | 31.40816 |
Corn | 27.73809 | 14.40994 | 150.50763 | 38.36252 | 25.55421 |
Cotton | 26.38544 | 588.86232 | 16.81921 | 52.03266 | 37.15560 |
Soybeans | 27.07134 | 46.42131 | 41.01631 | 16.03615 | 23.15920 |
Sugarbeets | 26.80188 | 332.11563 | 43.98280 | 107.95676 | 21.34645 |
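
The diagonal of this table can be checked against the Within Covariance Matrix Information table in the same output. The printed values are consistent with the usual generalized squared distance for separate covariance matrices, written here in assumed notation that does not appear in the output: $\bar{x}_i$ is the mean vector of crop $i$, $S_j$ is the within-class covariance matrix of crop $j$, and $q_j$ is its prior probability.

$$
D^2(i \mid j) = (\bar{x}_i - \bar{x}_j)^{\top} S_j^{-1} (\bar{x}_i - \bar{x}_j) + \ln\lvert S_j\rvert - 2 \ln q_j
$$

When $i = j$ the quadratic term vanishes, leaving $\ln\lvert S_j\rvert - 2\ln q_j$:

$$
\begin{aligned}
\text{Clover:} &\quad 23.64618 - 2\ln\tfrac{11}{36} \approx 23.64618 + 2.37125 = 26.01743 \\
\text{Corn:} &\quad 11.13472 - 2\ln\tfrac{7}{36} \approx 11.13472 + 3.27522 = 14.40994 \\
\text{Cotton:} &\quad 13.23569 - 2\ln\tfrac{6}{36} \approx 13.23569 + 3.58352 = 16.81921
\end{aligned}
$$

The same arithmetic reproduces the Soybeans and Sugarbeets diagonal entries (16.03615 and 21.34645).
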
Discriminant Analysis of Remote Sensing Data on Five Crops |
Using Quadratic Discriminant Function |
Number of Observations and Percent Classified into Crop | ||||||
---|---|---|---|---|---|---|
From Crop | Clover | Corn | Cotton | Soybeans | Sugarbeets | Total |
Clover | | | | | | |
Corn | | | | | | |
Cotton | | | | | | |
Soybeans | | | | | | |
Sugarbeets | | | | | | |
Total | | | | | | |
Priors | | | | | | |

Error Count Estimates for Crop | ||||||
---|---|---|---|---|---|---|
 | Clover | Corn | Cotton | Soybeans | Sugarbeets | Total |
Rate | 0.1818 | 0.0000 | 0.0000 | 0.0000 | 0.3333 | 0.1111 |
Priors | 0.3056 | 0.1944 | 0.1667 | 0.1667 | 0.1667 | |
Discriminant Analysis of Remote Sensing Data on Five Crops |
Using Quadratic Discriminant Function |
Number of Observations and Percent Classified into Crop | ||||||
---|---|---|---|---|---|---|
From Crop | Clover | Corn | Cotton | Soybeans | Sugarbeets | Total |
Clover | | | | | | |
Corn | | | | | | |
Cotton | | | | | | |
Soybeans | | | | | | |
Sugarbeets | | | | | | |
Total | | | | | | |
Priors | | | | | | |

Error Count Estimates for Crop | ||||||
---|---|---|---|---|---|---|
 | Clover | Corn | Cotton | Soybeans | Sugarbeets | Total |
Rate | 0.1818 | 0.7143 | 0.6667 | 0.6667 | 0.8333 | 0.5556 |
Priors | 0.3056 | 0.1944 | 0.1667 | 0.1667 | 0.1667 | |