The DISCRIM Procedure

Example 35.4 Linear Discriminant Analysis of Remote-Sensing Data on Crops

This example uses the remote-sensing data. The observations are grouped into five crops: clover, corn, cotton, soybeans, and sugar beets. Four measures, x1 through x4, serve as the descriptive variables.

In the first PROC DISCRIM statement, the DISCRIM procedure uses normal-theory methods (METHOD=NORMAL) and assumes equal within-class covariance matrices (POOL=YES) across the five crops, which yields a linear discriminant function. The PRIORS statement, PRIORS PROP, sets the prior probabilities proportional to the sample sizes. The LIST option lists the resubstitution classification results for each observation (Output 35.4.2). The CROSSVALIDATE option displays cross validation error-rate estimates (Output 35.4.3). The OUTSTAT= option stores the calibration information in a new data set, Cropstat, so that future observations can be classified. A second PROC DISCRIM statement uses this calibration information to classify a test data set. Note that the values of the identification variable, xvalues, are obtained by rereading the x1 through x4 fields in the data lines as a single character variable. The following statements produce Output 35.4.1 through Output 35.4.3:

title 'Discriminant Analysis of Remote Sensing Data on Five Crops';

data crops;
   input Crop $ 1-10 x1-x4 xvalues $ 11-21;
   datalines;
Corn      16 27 31 33
Corn      15 23 30 30
Corn      16 27 27 26
Corn      18 20 25 23
Corn      15 15 31 32
Corn      15 32 32 15
Corn      12 15 16 73
Soybeans  20 23 23 25
Soybeans  24 24 25 32
Soybeans  21 25 23 24
Soybeans  27 45 24 12
Soybeans  12 13 15 42
Soybeans  22 32 31 43
Cotton    31 32 33 34
Cotton    29 24 26 28
Cotton    34 32 28 45
Cotton    26 25 23 24
Cotton    53 48 75 26
Cotton    34 35 25 78
Sugarbeets22 23 25 42
Sugarbeets25 25 24 26
Sugarbeets34 25 16 52
Sugarbeets54 23 21 54
Sugarbeets25 43 32 15
Sugarbeets26 54  2 54
Clover    12 45 32 54
Clover    24 58 25 34
Clover    87 54 61 21
Clover    51 31 31 16
Clover    96 48 54 62
Clover    31 31 11 11
Clover    56 13 13 71
Clover    32 13 27 32
Clover    36 26 54 32
Clover    53 08 06 54
Clover    32 32 62 16
;
title2 'Using the Linear Discriminant Function';

proc discrim data=crops outstat=cropstat method=normal pool=yes
             list crossvalidate;
   class Crop;
   priors prop;
   id xvalues;
   var x1-x4;
run;
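The INPUT statement above combines column input and list input: Crop comes from columns 1-10, x1 through x4 are read as numbers from the fields that follow, and xvalues rereads columns 11-21 as one character string. As a rough illustration outside SAS, the same slicing can be mimicked in Python. The helper name parse_crop_line is hypothetical, and the sketch only approximates what the INPUT statement does.

```python
def parse_crop_line(line):
    """Split one fixed-column data line in the spirit of the INPUT statement."""
    crop = line[0:10].strip()                     # columns 1-10  -> Crop
    xvalues = line[10:21].strip()                 # columns 11-21 -> xvalues (character)
    x = [int(tok) for tok in line[10:].split()]   # x1-x4 read as numbers
    return crop, x, xvalues

# "Sugarbeets" fills all ten columns, so no blank separates it from x1.
crop, x, xvalues = parse_crop_line("Sugarbeets22 23 25 42")
print(crop, x, repr(xvalues))
```

Column positions, not blanks, separate the crop name from the first measure, which is why the Sugarbeets lines can omit the space.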

Output 35.4.1: Linear Discriminant Function on Crop Data

Discriminant Analysis of Remote Sensing Data on Five Crops
Using the Linear Discriminant Function

The DISCRIM Procedure

Total Sample Size 36 DF Total 35
Variables 4 DF Within Classes 31
Classes 5 DF Between Classes 4

Number of Observations Read 36
Number of Observations Used 36

Class Level Information
Crop Variable Name Frequency Weight Proportion Prior Probability
Clover Clover 11 11.0000 0.305556 0.305556
Corn Corn 7 7.0000 0.194444 0.194444
Cotton Cotton 6 6.0000 0.166667 0.166667
Soybeans Soybeans 6 6.0000 0.166667 0.166667
Sugarbeets Sugarbeets 6 6.0000 0.166667 0.166667
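With PRIORS PROP, each prior probability is the class's share of the calibration sample, which is why the Proportion and Prior Probability columns agree. A minimal Python sketch of that arithmetic, using the frequencies from the table above:

```python
# Class frequencies as reported in the Class Level Information table.
freq = {"Clover": 11, "Corn": 7, "Cotton": 6, "Soybeans": 6, "Sugarbeets": 6}
n = sum(freq.values())  # total sample size

# Proportional priors: frequency divided by total sample size.
priors = {crop: count / n for crop, count in freq.items()}

for crop, p in priors.items():
    print(f"{crop:<10} {p:.6f}")
```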

Pooled Covariance Matrix Information
Covariance Matrix Rank    Natural Log of the Determinant of the Covariance Matrix
4                         21.30189

Discriminant Analysis of Remote Sensing Data on Five Crops
Using the Linear Discriminant Function

The DISCRIM Procedure

Generalized Squared Distance to Crop
From Crop Clover Corn Cotton Soybeans Sugarbeets
Clover 2.37125 7.52830 4.44969 6.16665 5.07262
Corn 6.62433 3.27522 5.46798 4.31383 6.47395
Cotton 3.23741 5.15968 3.58352 5.01819 4.87908
Soybeans 4.95438 4.00552 5.01819 3.58352 4.65998
Sugarbeets 3.86034 6.16564 4.87908 4.65998 3.58352
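The diagonal of this table is not zero because the generalized squared distance adds a prior term to the Mahalanobis distance. Assuming the usual pooled-covariance definition, D2(x, t) = (x - m_t)' S^-1 (x - m_t) - 2 ln(q_t), the distance from a class mean to its own class reduces to -2 ln(q_t). The Python sketch below checks that assumption against the reported diagonal; the variable names are illustrative only.

```python
import math

# Proportional priors (class frequency / 36).
priors = {"Clover": 11 / 36, "Corn": 7 / 36, "Cotton": 6 / 36,
          "Soybeans": 6 / 36, "Sugarbeets": 6 / 36}

# Diagonal entries copied from the Generalized Squared Distance table.
reported_diagonal = {"Clover": 2.37125, "Corn": 3.27522, "Cotton": 3.58352,
                     "Soybeans": 3.58352, "Sugarbeets": 3.58352}

# Prior term that is added to every distance to class t.
prior_term = {crop: -2.0 * math.log(q) for crop, q in priors.items()}

for crop in priors:
    print(f"{crop:<10} {prior_term[crop]:.5f}  reported {reported_diagonal[crop]:.5f}")
```

The same term explains why the table is not symmetric: for example, the Clover-to-Corn and Corn-to-Clover entries differ by the difference of the two prior terms.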

Linear Discriminant Function for Crop
Variable Clover Corn Cotton Soybeans Sugarbeets
Constant -10.98457 -7.72070 -11.46537 -7.28260 -9.80179
x1 0.08907 -0.04180 0.02462 0.0000369 0.04245
x2 0.17379 0.11970 0.17596 0.15896 0.20988
x3 0.11899 0.16511 0.15880 0.10622 0.06540
x4 0.15637 0.16768 0.18362 0.14133 0.16408
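To see how these coefficients lead to the posterior probabilities in the listings, the following Python sketch scores the first Corn observation (16, 27, 31, 33). It assumes that the reported constants already include the log of the prior, so that the posterior is the normalized exponential of the scores; that assumption is consistent with the listed posteriors to about three decimal places, given that the coefficients are rounded.

```python
import math

# [constant, x1, x2, x3, x4] copied from the Linear Discriminant Function table.
coef = {
    "Clover":     [-10.98457,  0.08907,   0.17379, 0.11899, 0.15637],
    "Corn":       [ -7.72070, -0.04180,   0.11970, 0.16511, 0.16768],
    "Cotton":     [-11.46537,  0.02462,   0.17596, 0.15880, 0.18362],
    "Soybeans":   [ -7.28260,  0.0000369, 0.15896, 0.10622, 0.14133],
    "Sugarbeets": [ -9.80179,  0.04245,   0.20988, 0.06540, 0.16408],
}

def posteriors(x):
    """Normalized exponentials of the linear discriminant scores."""
    scores = {c: b[0] + sum(bj * xj for bj, xj in zip(b[1:], x))
              for c, b in coef.items()}
    top = max(scores.values())                      # subtract max for stability
    weights = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}

post = posteriors([16, 27, 31, 33])
for crop, p in post.items():
    print(f"{crop:<10} {p:.4f}")
print("Classified into:", max(post, key=post.get))
```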


Output 35.4.2: Misclassified Observations: Resubstitution

Discriminant Analysis of Remote Sensing Data on Five Crops
Using the Linear Discriminant Function

The DISCRIM Procedure
Classification Results for Calibration Data: WORK.CROPS
Resubstitution Results using Linear Discriminant Function

Posterior Probability of Membership in Crop
xvalues From Crop Classified into Crop Clover Corn Cotton Soybeans Sugarbeets
16 27 31 33 Corn Corn   0.0894 0.4054 0.1763 0.2392 0.0897
15 23 30 30 Corn Corn   0.0769 0.4558 0.1421 0.2530 0.0722
16 27 27 26 Corn Corn   0.0982 0.3422 0.1365 0.3073 0.1157
18 20 25 23 Corn Corn   0.1052 0.3634 0.1078 0.3281 0.0955
15 15 31 32 Corn Corn   0.0588 0.5754 0.1173 0.2087 0.0398
15 32 32 15 Corn Soybeans * 0.0972 0.3278 0.1318 0.3420 0.1011
12 15 16 73 Corn Corn   0.0454 0.5238 0.1849 0.1376 0.1083
20 23 23 25 Soybeans Soybeans   0.1330 0.2804 0.1176 0.3305 0.1385
24 24 25 32 Soybeans Soybeans   0.1768 0.2483 0.1586 0.2660 0.1502
21 25 23 24 Soybeans Soybeans   0.1481 0.2431 0.1200 0.3318 0.1570
27 45 24 12 Soybeans Sugarbeets * 0.2357 0.0547 0.1016 0.2721 0.3359
12 13 15 42 Soybeans Corn * 0.0549 0.4749 0.0920 0.2768 0.1013
22 32 31 43 Soybeans Cotton * 0.1474 0.2606 0.2624 0.1848 0.1448
31 32 33 34 Cotton Clover * 0.2815 0.1518 0.2377 0.1767 0.1523
29 24 26 28 Cotton Soybeans * 0.2521 0.1842 0.1529 0.2549 0.1559
34 32 28 45 Cotton Clover * 0.3125 0.1023 0.2404 0.1357 0.2091
26 25 23 24 Cotton Soybeans * 0.2121 0.1809 0.1245 0.3045 0.1780
53 48 75 26 Cotton Clover * 0.4837 0.0391 0.4384 0.0223 0.0166
34 35 25 78 Cotton Cotton   0.2256 0.0794 0.3810 0.0592 0.2548
22 23 25 42 Sugarbeets Corn * 0.1421 0.3066 0.1901 0.2231 0.1381
25 25 24 26 Sugarbeets Soybeans * 0.1969 0.2050 0.1354 0.2960 0.1667
34 25 16 52 Sugarbeets Sugarbeets   0.2928 0.0871 0.1665 0.1479 0.3056
54 23 21 54 Sugarbeets Clover * 0.6215 0.0194 0.1250 0.0496 0.1845
25 43 32 15 Sugarbeets Soybeans * 0.2258 0.1135 0.1646 0.2770 0.2191
26 54 2 54 Sugarbeets Sugarbeets   0.0850 0.0081 0.0521 0.0661 0.7887
12 45 32 54 Clover Cotton * 0.0693 0.2663 0.3394 0.1460 0.1789
24 58 25 34 Clover Sugarbeets * 0.1647 0.0376 0.1680 0.1452 0.4845
87 54 61 21 Clover Clover   0.9328 0.0003 0.0478 0.0025 0.0165
51 31 31 16 Clover Clover   0.6642 0.0205 0.0872 0.0959 0.1322
96 48 54 62 Clover Clover   0.9215 0.0002 0.0604 0.0007 0.0173
31 31 11 11 Clover Sugarbeets * 0.2525 0.0402 0.0473 0.3012 0.3588
56 13 13 71 Clover Clover   0.6132 0.0212 0.1226 0.0408 0.2023
32 13 27 32 Clover Clover   0.2669 0.2616 0.1512 0.2260 0.0943
36 26 54 32 Clover Cotton * 0.2650 0.2645 0.3495 0.0918 0.0292
53 08 06 54 Clover Clover   0.5914 0.0237 0.0676 0.0781 0.2392
32 32 62 16 Clover Cotton * 0.2163 0.3180 0.3327 0.1125 0.0206

* Misclassified observation


Discriminant Analysis of Remote Sensing Data on Five Crops
Using the Linear Discriminant Function

The DISCRIM Procedure
Classification Summary for Calibration Data: WORK.CROPS
Resubstitution Summary using Linear Discriminant Function

Number of Observations and Percent Classified into Crop
From Crop   Clover        Corn          Cotton        Soybeans      Sugarbeets    Total
Clover      6 (54.55)     0 (0.00)      3 (27.27)     0 (0.00)      2 (18.18)     11 (100.00)
Corn        0 (0.00)      6 (85.71)     0 (0.00)      1 (14.29)     0 (0.00)      7 (100.00)
Cotton      3 (50.00)     0 (0.00)      1 (16.67)     2 (33.33)     0 (0.00)      6 (100.00)
Soybeans    0 (0.00)      1 (16.67)     1 (16.67)     3 (50.00)     1 (16.67)     6 (100.00)
Sugarbeets  1 (16.67)     1 (16.67)     0 (0.00)      2 (33.33)     2 (33.33)     6 (100.00)
Total       10 (27.78)    8 (22.22)     5 (13.89)     8 (22.22)     5 (13.89)     36 (100.00)
Priors      0.30556       0.19444       0.16667       0.16667       0.16667

Error Count Estimates for Crop
  Clover Corn Cotton Soybeans Sugarbeets Total
Rate 0.4545 0.1429 0.8333 0.5000 0.6667 0.5000
Priors 0.3056 0.1944 0.1667 0.1667 0.1667  
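The Total entry is the prior-weighted average of the per-class error rates. A short Python sketch of that calculation, using the counts from the resubstitution summary (the variable names are illustrative):

```python
# (correctly classified, class total) from the resubstitution summary.
counts = {"Clover": (6, 11), "Corn": (6, 7), "Cotton": (1, 6),
          "Soybeans": (3, 6), "Sugarbeets": (2, 6)}

n = sum(total for _, total in counts.values())

# Per-class error rate: share of the class that was misclassified.
rates = {c: 1 - ok / total for c, (ok, total) in counts.items()}

# Proportional priors, as set by PRIORS PROP.
priors = {c: total / n for c, (_, total) in counts.items()}

# Total error rate: per-class rates weighted by the priors.
total_rate = sum(priors[c] * rates[c] for c in counts)

for c in counts:
    print(f"{c:<10} rate {rates[c]:.4f}  prior {priors[c]:.4f}")
print(f"Total      rate {total_rate:.4f}")
```

Because the priors here are proportional to the class sizes, the weighted total coincides with the plain fraction of misclassified observations, 18 of 36.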


Output 35.4.3: Misclassified Observations: Cross Validation

Discriminant Analysis of Remote Sensing Data on Five Crops
Using the Linear Discriminant Function

The DISCRIM Procedure
Classification Summary for Calibration Data: WORK.CROPS
Cross-validation Summary using Linear Discriminant Function

Number of Observations and Percent Classified into Crop
From Crop   Clover        Corn          Cotton        Soybeans      Sugarbeets    Total
Clover      4 (36.36)     3 (27.27)     1 (9.09)      0 (0.00)      3 (27.27)     11 (100.00)
Corn        0 (0.00)      4 (57.14)     1 (14.29)     2 (28.57)     0 (0.00)      7 (100.00)
Cotton      3 (50.00)     0 (0.00)      0 (0.00)      2 (33.33)     1 (16.67)     6 (100.00)
Soybeans    0 (0.00)      1 (16.67)     1 (16.67)     3 (50.00)     1 (16.67)     6 (100.00)
Sugarbeets  2 (33.33)     1 (16.67)     0 (0.00)      2 (33.33)     1 (16.67)     6 (100.00)
Total       9 (25.00)     9 (25.00)     3 (8.33)      9 (25.00)     6 (16.67)     36 (100.00)
Priors      0.30556       0.19444       0.16667       0.16667       0.16667

Error Count Estimates for Crop
  Clover Corn Cotton Soybeans Sugarbeets Total
Rate 0.6364 0.4286 1.0000 0.5000 0.8333 0.6667
Priors 0.3056 0.1944 0.1667 0.1667 0.1667  


Next, you can use the calibration information stored in the Cropstat data set to classify a test data set. The TESTLIST option lists the classification results for each observation in the test data set. The following statements produce Output 35.4.4 and Output 35.4.5:

data test;
   input Crop $ 1-10 x1-x4 xvalues $ 11-21;
   datalines;
Corn      16 27 31 33
Soybeans  21 25 23 24
Cotton    29 24 26 28
Sugarbeets54 23 21 54
Clover    32 32 62 16
;
title2 'Classification of Test Data';

proc discrim data=cropstat testdata=test testout=tout testlist;
   class Crop;
   testid xvalues;
   var x1-x4;
run;

proc print data=tout;
   title 'Discriminant Analysis of Remote Sensing Data on Five Crops';
   title2 'Output Classification Results of Test Data';
run;

Output 35.4.4: Classification of Test Data

Discriminant Analysis of Remote Sensing Data on Five Crops
Classification of Test Data

The DISCRIM Procedure
Classification Results for Test Data: WORK.TEST
Classification Results using Linear Discriminant Function

Posterior Probability of Membership in Crop
xvalues From Crop Classified into Crop Clover Corn Cotton Soybeans Sugarbeets
16 27 31 33 Corn Corn   0.0894 0.4054 0.1763 0.2392 0.0897
21 25 23 24 Soybeans Soybeans   0.1481 0.2431 0.1200 0.3318 0.1570
29 24 26 28 Cotton Soybeans * 0.2521 0.1842 0.1529 0.2549 0.1559
54 23 21 54 Sugarbeets Clover * 0.6215 0.0194 0.1250 0.0496 0.1845
32 32 62 16 Clover Cotton * 0.2163 0.3180 0.3327 0.1125 0.0206

* Misclassified observation


Discriminant Analysis of Remote Sensing Data on Five Crops
Classification of Test Data

The DISCRIM Procedure
Classification Summary for Test Data: WORK.TEST
Classification Summary using Linear Discriminant Function

Observation Profile for Test Data
Number of Observations Read 5
Number of Observations Used 5

Number of Observations and Percent Classified into Crop
From Crop   Clover        Corn          Cotton        Soybeans      Sugarbeets    Total
Clover      0 (0.00)      0 (0.00)      1 (100.00)    0 (0.00)      0 (0.00)      1 (100.00)
Corn        0 (0.00)      1 (100.00)    0 (0.00)      0 (0.00)      0 (0.00)      1 (100.00)
Cotton      0 (0.00)      0 (0.00)      0 (0.00)      1 (100.00)    0 (0.00)      1 (100.00)
Soybeans    0 (0.00)      0 (0.00)      0 (0.00)      1 (100.00)    0 (0.00)      1 (100.00)
Sugarbeets  1 (100.00)    0 (0.00)      0 (0.00)      0 (0.00)      0 (0.00)      1 (100.00)
Total       1 (20.00)     1 (20.00)     1 (20.00)     2 (40.00)     0 (0.00)      5 (100.00)
Priors      0.30556       0.19444       0.16667       0.16667       0.16667

Error Count Estimates for Crop
  Clover Corn Cotton Soybeans Sugarbeets Total
Rate 1.0000 0.0000 1.0000 0.0000 1.0000 0.6389
Priors 0.3056 0.1944 0.1667 0.1667 0.1667  
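Three of the five test observations are misclassified, yet the Total rate is 0.6389 rather than 0.6. The difference arises because the total weights each class's error rate by the priors shown in the Priors row, which come from the calibration data, not by the class shares in the test data. The Python sketch below reproduces both figures from the listed results; the variable names are illustrative.

```python
# (true crop, assigned crop) for the five test observations in Output 35.4.4.
results = [("Corn", "Corn"), ("Soybeans", "Soybeans"), ("Cotton", "Soybeans"),
           ("Sugarbeets", "Clover"), ("Clover", "Cotton")]

# Priors carried over from the calibration data (class frequency / 36).
priors = {"Clover": 11 / 36, "Corn": 7 / 36, "Cotton": 6 / 36,
          "Soybeans": 6 / 36, "Sugarbeets": 6 / 36}

# Unweighted share of misclassified test observations.
raw_rate = sum(true != into for true, into in results) / len(results)

# Per-class error rates in the test data.
class_rate = {}
for crop in priors:
    members = [into for true, into in results if true == crop]
    class_rate[crop] = sum(into != crop for into in members) / len(members)

# Prior-weighted total, as reported in the Error Count Estimates table.
weighted_rate = sum(priors[c] * class_rate[c] for c in priors)

print(f"raw {raw_rate:.4f}  prior-weighted {weighted_rate:.4f}")
```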


Output 35.4.5: Output Data Set of the Classification Results for Test Data

Discriminant Analysis of Remote Sensing Data on Five Crops
Output Classification Results of Test Data

Obs Crop x1 x2 x3 x4 xvalues Clover Corn Cotton Soybeans Sugarbeets _INTO_
1 Corn 16 27 31 33 16 27 31 33 0.08935 0.40543 0.17632 0.23918 0.08972 Corn
2 Soybeans 21 25 23 24 21 25 23 24 0.14811 0.24308 0.11999 0.33184 0.15698 Soybeans
3 Cotton 29 24 26 28 29 24 26 28 0.25213 0.18420 0.15294 0.25486 0.15588 Soybeans
4 Sugarbeets 54 23 21 54 54 23 21 54 0.62150 0.01937 0.12498 0.04962 0.18452 Clover
5 Clover 32 32 62 16 32 32 62 16 0.21633 0.31799 0.33266 0.11246 0.02056 Cotton
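In this output, the _INTO_ value in every row is the crop with the largest posterior probability. The Python sketch below checks that reading against the five rows above. It is an illustration of the pattern in this table under that assumption, not a description of how the procedure is implemented.

```python
crops = ["Clover", "Corn", "Cotton", "Soybeans", "Sugarbeets"]

# (true crop, posterior probabilities in the column order above), from Output 35.4.5.
rows = [
    ("Corn",       [0.08935, 0.40543, 0.17632, 0.23918, 0.08972]),
    ("Soybeans",   [0.14811, 0.24308, 0.11999, 0.33184, 0.15698]),
    ("Cotton",     [0.25213, 0.18420, 0.15294, 0.25486, 0.15588]),
    ("Sugarbeets", [0.62150, 0.01937, 0.12498, 0.04962, 0.18452]),
    ("Clover",     [0.21633, 0.31799, 0.33266, 0.11246, 0.02056]),
]

# Pick the crop whose posterior is largest in each row.
into = [crops[max(range(len(p)), key=p.__getitem__)] for _, p in rows]

for (true, _), assigned in zip(rows, into):
    flag = "" if true == assigned else " *"
    print(f"{true:<10} -> {assigned}{flag}")
```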


In the next analysis, PROC DISCRIM uses normal-theory methods (METHOD=NORMAL) and assumes unequal within-class covariance matrices (POOL=NO) for the remote-sensing data, which yields a quadratic discriminant function. The PRIORS statement, PRIORS PROP, sets the prior probabilities proportional to the sample sizes. The CROSSVALIDATE option displays cross validation error-rate estimates. Note that the total error count estimate by cross validation (0.5556) is much larger than the total error count estimate by resubstitution (0.1111); resubstitution is optimistically biased because each observation is classified by a rule that it helped to fit. The following statements produce Output 35.4.6:

title2 'Using Quadratic Discriminant Function';

proc discrim data=crops method=normal pool=no crossvalidate;
   class Crop;
   priors prop;
   id xvalues;
   var x1-x4;
run;

Output 35.4.6: Quadratic Discriminant Function on Crop Data

Discriminant Analysis of Remote Sensing Data on Five Crops
Using Quadratic Discriminant Function

The DISCRIM Procedure

Total Sample Size 36 DF Total 35
Variables 4 DF Within Classes 31
Classes 5 DF Between Classes 4

Number of Observations Read 36
Number of Observations Used 36

Class Level Information
Crop Variable
Name
Frequency Weight Proportion Prior
Probability
Clover Clover 11 11.0000 0.305556 0.305556
Corn Corn 7 7.0000 0.194444 0.194444
Cotton Cotton 6 6.0000 0.166667 0.166667
Soybeans Soybeans 6 6.0000 0.166667 0.166667
Sugarbeets Sugarbeets 6 6.0000 0.166667 0.166667

Within Covariance Matrix Information
Crop        Covariance Matrix Rank    Natural Log of the Determinant of the Covariance Matrix
Clover      4                         23.64618
Corn        4                         11.13472
Cotton      4                         13.23569
Soybeans    4                         12.45263
Sugarbeets  4                         17.76293

Discriminant Analysis of Remote Sensing Data on Five Crops
Using Quadratic Discriminant Function

The DISCRIM Procedure

Generalized Squared Distance to Crop
From Crop Clover Corn Cotton Soybeans Sugarbeets
Clover 26.01743 1320 104.18297 194.10546 31.40816
Corn 27.73809 14.40994 150.50763 38.36252 25.55421
Cotton 26.38544 588.86232 16.81921 52.03266 37.15560
Soybeans 27.07134 46.42131 41.01631 16.03615 23.15920
Sugarbeets 26.80188 332.11563 43.98280 107.95676 21.34645
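With unpooled covariance matrices, the generalized squared distance picks up a log-determinant term in addition to the prior term. Assuming the usual definition, D2(x, t) = (x - m_t)' S_t^-1 (x - m_t) + ln|S_t| - 2 ln(q_t), each diagonal entry should equal the class's log determinant minus twice the log of its prior. The Python sketch below checks that assumption against the values reported in this output.

```python
import math

# Natural log of the determinant of each within-class covariance matrix.
log_det = {"Clover": 23.64618, "Corn": 11.13472, "Cotton": 13.23569,
           "Soybeans": 12.45263, "Sugarbeets": 17.76293}

# Proportional priors (class frequency / 36).
priors = {"Clover": 11 / 36, "Corn": 7 / 36, "Cotton": 6 / 36,
          "Soybeans": 6 / 36, "Sugarbeets": 6 / 36}

# Diagonal entries copied from the Generalized Squared Distance table.
reported_diagonal = {"Clover": 26.01743, "Corn": 14.40994, "Cotton": 16.81921,
                     "Soybeans": 16.03615, "Sugarbeets": 21.34645}

# Distance from a class mean to its own class: log determinant plus prior term.
expected = {c: log_det[c] - 2.0 * math.log(priors[c]) for c in priors}

for c in priors:
    print(f"{c:<10} {expected[c]:.5f}  reported {reported_diagonal[c]:.5f}")
```

The large log determinant for Clover makes every distance to Clover start from a high baseline, while its widely spread covariance keeps the Mahalanobis part small, which is consistent with the nearly constant Clover column in the table.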

Discriminant Analysis of Remote Sensing Data on Five Crops
Using Quadratic Discriminant Function

The DISCRIM Procedure
Classification Summary for Calibration Data: WORK.CROPS
Resubstitution Summary using Quadratic Discriminant Function

Number of Observations and Percent Classified into Crop
From Crop   Clover        Corn          Cotton        Soybeans      Sugarbeets    Total
Clover      9 (81.82)     0 (0.00)      0 (0.00)      0 (0.00)      2 (18.18)     11 (100.00)
Corn        0 (0.00)      7 (100.00)    0 (0.00)      0 (0.00)      0 (0.00)      7 (100.00)
Cotton      0 (0.00)      0 (0.00)      6 (100.00)    0 (0.00)      0 (0.00)      6 (100.00)
Soybeans    0 (0.00)      0 (0.00)      0 (0.00)      6 (100.00)    0 (0.00)      6 (100.00)
Sugarbeets  0 (0.00)      0 (0.00)      1 (16.67)     1 (16.67)     4 (66.67)     6 (100.00)
Total       9 (25.00)     7 (19.44)     7 (19.44)     7 (19.44)     6 (16.67)     36 (100.00)
Priors      0.30556       0.19444       0.16667       0.16667       0.16667

Error Count Estimates for Crop
  Clover Corn Cotton Soybeans Sugarbeets Total
Rate 0.1818 0.0000 0.0000 0.0000 0.3333 0.1111
Priors 0.3056 0.1944 0.1667 0.1667 0.1667  

Discriminant Analysis of Remote Sensing Data on Five Crops
Using Quadratic Discriminant Function

The DISCRIM Procedure
Classification Summary for Calibration Data: WORK.CROPS
Cross-validation Summary using Quadratic Discriminant Function

Number of Observations and Percent Classified into Crop
From Crop   Clover        Corn          Cotton        Soybeans      Sugarbeets    Total
Clover      9 (81.82)     0 (0.00)      0 (0.00)      0 (0.00)      2 (18.18)     11 (100.00)
Corn        3 (42.86)     2 (28.57)     0 (0.00)      0 (0.00)      2 (28.57)     7 (100.00)
Cotton      3 (50.00)     0 (0.00)      2 (33.33)     0 (0.00)      1 (16.67)     6 (100.00)
Soybeans    3 (50.00)     0 (0.00)      0 (0.00)      2 (33.33)     1 (16.67)     6 (100.00)
Sugarbeets  3 (50.00)     0 (0.00)      1 (16.67)     1 (16.67)     1 (16.67)     6 (100.00)
Total       21 (58.33)    2 (5.56)      3 (8.33)      3 (8.33)      7 (19.44)     36 (100.00)
Priors      0.30556       0.19444       0.16667       0.16667       0.16667

Error Count Estimates for Crop
  Clover Corn Cotton Soybeans Sugarbeets Total
Rate 0.1818 0.7143 0.6667 0.6667 0.8333 0.5556
Priors 0.3056 0.1944 0.1667 0.1667 0.1667
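To compare the four total error estimates in this example side by side, the Python sketch below recomputes them from the counts of correctly classified observations on the diagonals of the summary tables. It relies on the priors being proportional to the class sizes, in which case the prior-weighted total equals the overall fraction misclassified. The dictionary layout is illustrative only.

```python
n = 36  # total sample size

# Correctly classified counts per crop (Clover, Corn, Cotton, Soybeans, Sugarbeets),
# read from the diagonals of the four classification summaries.
correct = {
    ("linear", "resubstitution"):      [6, 6, 1, 3, 2],
    ("linear", "cross validation"):    [4, 4, 0, 3, 1],
    ("quadratic", "resubstitution"):   [9, 7, 6, 6, 4],
    ("quadratic", "cross validation"): [9, 2, 2, 2, 1],
}

# With proportional priors, total error = 1 - (number correct / n).
error = {key: 1 - sum(diag) / n for key, diag in correct.items()}

for (function, method), rate in error.items():
    print(f"{function:<10} {method:<17} {rate:.4f}")
```

The quadratic function has the lowest resubstitution estimate, but the gap between its resubstitution and cross validation estimates is far wider than the corresponding gap for the linear function.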