The iris data published by Fisher (1936) have been widely used for examples in discriminant analysis and cluster analysis. The sepal length, sepal width, petal length, and petal width are measured in millimeters on 50 iris specimens from each of three species: Iris setosa, I. versicolor, and I. virginica. The iris data set is available from the Sashelp library.
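Before running the selection, it can help to confirm the layout described above. The following is a minimal sketch, not part of the original example; it assumes only the variable names used in the VAR statement of this example and prints the specimen count per species and basic per-species statistics for the four measurements.

```sas
/* Count specimens per species (the description above implies 50 each) */
proc freq data=sashelp.iris;
   tables Species;
run;

/* Per-species summary statistics for the four measurements (in mm) */
proc means data=sashelp.iris n mean std min max maxdec=1;
   class Species;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;
```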
The following example performs a stepwise discriminant analysis of these data by using stepwise selection, the default method of PROC STEPDISC.
In the PROC STEPDISC statement, the BSSCP and TSSCP options display the between-class SSCP matrix and the total-sample corrected SSCP matrix. By default, the significance level of an F test from an analysis of covariance is used as the selection criterion. The variable under consideration is the dependent variable, and the variables already chosen act as covariates. The following SAS statements produce Output 89.1.1 through Output 89.1.8:
    title 'Fisher (1936) Iris Data';
    %let _stdvar = ;
    proc stepdisc data=sashelp.iris bsscp tsscp;
       class Species;
       var SepalLength SepalWidth PetalLength PetalWidth;
    run;
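The selection settings that drive this run can also be written out explicitly. The sketch below assumes the settings reported in Output 89.1.1 (stepwise selection, with significance levels of 0.15 to enter and to stay). The second call uses stricter levels of 0.01, which are chosen here purely for illustration and are not part of the original example.

```sas
/* Same analysis with the selection settings written out explicitly */
proc stepdisc data=sashelp.iris method=stepwise
              slentry=0.15 slstay=0.15 bsscp tsscp;
   class Species;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;

/* Stricter thresholds (illustrative values, not from the original example) */
proc stepdisc data=sashelp.iris method=stepwise
              slentry=0.01 slstay=0.01;
   class Species;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;
```

With the stricter thresholds, SepalLength would be expected not to enter the model, because its entry p-value at step 4 (0.0103 in Output 89.1.6) exceeds 0.01.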
Output 89.1.1: Iris Data: Summary Information
Fisher (1936) Iris Data |
The Method for Selecting Variables is STEPWISE | |||
---|---|---|---|
Total Sample Size | 150 | Variable(s) in the Analysis | 4 |
Class Levels | 3 | Variable(s) Will Be Included | 0 |
Significance Level to Enter | 0.15 | ||
Significance Level to Stay | 0.15 |
Number of Observations Read | 150 |
---|---|
Number of Observations Used | 150 |
Class Level Information | ||||
---|---|---|---|---|
Species | Variable Name | Frequency | Weight | Proportion |
Setosa | Setosa | 50 | 50.0000 | 0.333333 |
Versicolor | Versicolor | 50 | 50.0000 | 0.333333 |
Virginica | Virginica | 50 | 50.0000 | 0.333333 |
Output 89.1.2: Iris Data: Between-Class and Total-Sample SSCP Matrices
Fisher (1936) Iris Data |
Between-Class SSCP Matrix | |||||
---|---|---|---|---|---|
Variable | Label | SepalLength | SepalWidth | PetalLength | PetalWidth |
SepalLength | Sepal Length (mm) | 6321.21333 | -1995.26667 | 16524.84000 | 7127.93333 |
SepalWidth | Sepal Width (mm) | -1995.26667 | 1134.49333 | -5723.96000 | -2293.26667 |
PetalLength | Petal Length (mm) | 16524.84000 | -5723.96000 | 43710.28000 | 18677.40000 |
PetalWidth | Petal Width (mm) | 7127.93333 | -2293.26667 | 18677.40000 | 8041.33333 |
Total-Sample SSCP Matrix | |||||
---|---|---|---|---|---|
Variable | Label | SepalLength | SepalWidth | PetalLength | PetalWidth |
SepalLength | Sepal Length (mm) | 10216.83333 | -632.26667 | 18987.30000 | 7692.43333 |
SepalWidth | Sepal Width (mm) | -632.26667 | 2830.69333 | -4911.88000 | -1812.42667 |
PetalLength | Petal Length (mm) | 18987.30000 | -4911.88000 | 46432.54000 | 19304.58000 |
PetalWidth | Petal Width (mm) | 7692.43333 | -1812.42667 | 19304.58000 | 8656.99333 |
In step 1, the tolerance is 1.0 for each variable under consideration because no variables have yet entered the model. The variable PetalLength is selected because its F statistic, 1180.161, is the largest among all variables.
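The step 1 statistics for PetalLength can be checked by hand from the diagonal entries of the SSCP matrices in Output 89.1.2. In this worked sketch, the symbols $B_{jj}$ and $T_{jj}$ (between-class and total-sample sums of squares for variable $j$), $n = 150$ (observations), and $g = 3$ (classes) are notation introduced here, not taken from the original text.

```latex
\begin{aligned}
R^2 &= \frac{B_{jj}}{T_{jj}} = \frac{43710.28}{46432.54} \approx 0.941372 \\[4pt]
F   &= \frac{R^2}{1 - R^2}\cdot\frac{n - g}{g - 1}
     = \frac{0.941372}{0.058628}\cdot\frac{147}{2} \approx 1180.2
\end{aligned}
```

Both values agree, up to rounding, with the R-Square of 0.9414 and the F Value of 1180.16 that Output 89.1.3 reports for PetalLength on 2 and 147 degrees of freedom.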
Output 89.1.3: Iris Data: Stepwise Selection Step 1
Fisher (1936) Iris Data |
Statistics for Entry, DF = 2, 147 | |||||
---|---|---|---|---|---|
Variable | Label | R-Square | F Value | Pr > F | Tolerance |
SepalLength | Sepal Length (mm) | 0.6187 | 119.26 | <.0001 | 1.0000 |
SepalWidth | Sepal Width (mm) | 0.4008 | 49.16 | <.0001 | 1.0000 |
PetalLength | Petal Length (mm) | 0.9414 | 1180.16 | <.0001 | 1.0000 |
PetalWidth | Petal Width (mm) | 0.9289 | 960.01 | <.0001 | 1.0000 |
Variable PetalLength will be entered. |
Variable(s) That Have Been Entered |
---|
PetalLength |
Multivariate Statistics | |||||
---|---|---|---|---|---|
Statistic | Value | F Value | Num DF | Den DF | Pr > F |
Wilks' Lambda | 0.058628 | 1180.16 | 2 | 147 | <.0001 |
Pillai's Trace | 0.941372 | 1180.16 | 2 | 147 | <.0001 |
Average Squared Canonical Correlation | 0.470686 |
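With a single variable in the model, the multivariate statistics in this table reduce to simple functions of that variable's R-square. The following sketch uses $\Lambda$ for Wilks' lambda, $V$ for Pillai's trace, and $g = 3$ for the number of classes; this notation is an assumption introduced here for illustration.

```latex
\begin{aligned}
\Lambda &= 1 - R^2 = 1 - 0.941372 = 0.058628 \\
V &= R^2 = 0.941372 \\
\mathrm{ASCC} &= \frac{V}{g - 1} = \frac{0.941372}{2} = 0.470686
\end{aligned}
```

The last relation, Pillai's trace divided by the number of classes minus one, also holds at later steps; for example, $1.119908 / 2 = 0.559954$ matches the value reported at step 2.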
In step 2, with the variable PetalLength already in the model, PetalLength is tested for removal before a new variable is selected for entry. Because PetalLength meets the criterion to stay, it is used as a covariate in the analysis of covariance for variable selection. The variable SepalWidth is selected because its F statistic, 43.035, is the largest among all variables not in the model and because its associated tolerance, 0.8164, meets the criterion to enter. The process is repeated in steps 3 and 4: the variable PetalWidth is entered in step 3, and the variable SepalLength is entered in step 4.
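The analysis of covariance behind the step 2 entry test can be run directly. The following is a sketch using PROC GLM, which the original example does not use: the candidate variable SepalWidth is the dependent variable, Species is the classification effect, and the already-selected PetalLength is the covariate. Under these assumptions, the Type III F test for Species should reproduce, up to rounding, the entry statistic for SepalWidth (F = 43.04 on 2 and 146 degrees of freedom).

```sas
/* ANCOVA for the step 2 entry test of SepalWidth:                  */
/* candidate variable as response, PetalLength as covariate,       */
/* Species as the class effect whose adjusted F test is of interest */
proc glm data=sashelp.iris;
   class Species;
   model SepalWidth = PetalLength Species / ss3;
run;
quit;
```

Replacing SepalWidth with SepalLength or PetalWidth in the MODEL statement gives the corresponding tests for the other two candidates at this step.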
Output 89.1.4: Iris Data: Stepwise Selection Step 2
Fisher (1936) Iris Data |
Statistics for Removal, DF = 2, 147 | ||||
---|---|---|---|---|
Variable | Label | R-Square | F Value | Pr > F |
PetalLength | Petal Length (mm) | 0.9414 | 1180.16 | <.0001 |
No variables can be removed. |
Statistics for Entry, DF = 2, 146 | |||||
---|---|---|---|---|---|
Variable | Label | Partial R-Square | F Value | Pr > F | Tolerance |
SepalLength | Sepal Length (mm) | 0.3198 | 34.32 | <.0001 | 0.2400 |
SepalWidth | Sepal Width (mm) | 0.3709 | 43.04 | <.0001 | 0.8164 |
PetalWidth | Petal Width (mm) | 0.2533 | 24.77 | <.0001 | 0.0729 |
Variable SepalWidth will be entered. |
Variable(s) That Have Been Entered | |
---|---|
SepalWidth | PetalLength |
Multivariate Statistics | |||||
---|---|---|---|---|---|
Statistic | Value | F Value | Num DF | Den DF | Pr > F |
Wilks' Lambda | 0.036884 | 307.10 | 4 | 292 | <.0001 |
Pillai's Trace | 1.119908 | 93.53 | 4 | 294 | <.0001 |
Average Squared Canonical Correlation | 0.559954 |
Output 89.1.5: Iris Data: Stepwise Selection Step 3
Fisher (1936) Iris Data |
Statistics for Removal, DF = 2, 146 | ||||
---|---|---|---|---|
Variable | Label | Partial R-Square | F Value | Pr > F |
SepalWidth | Sepal Width (mm) | 0.3709 | 43.04 | <.0001 |
PetalLength | Petal Length (mm) | 0.9384 | 1112.95 | <.0001 |
No variables can be removed. |
Statistics for Entry, DF = 2, 145 | |||||
---|---|---|---|---|---|
Variable | Label | Partial R-Square | F Value | Pr > F | Tolerance |
SepalLength | Sepal Length (mm) | 0.1447 | 12.27 | <.0001 | 0.1323 |
PetalWidth | Petal Width (mm) | 0.3229 | 34.57 | <.0001 | 0.0662 |
Variable PetalWidth will be entered. |
Variable(s) That Have Been Entered | ||
---|---|---|
SepalWidth | PetalLength | PetalWidth |
Multivariate Statistics | |||||
---|---|---|---|---|---|
Statistic | Value | F Value | Num DF | Den DF | Pr > F |
Wilks' Lambda | 0.024976 | 257.50 | 6 | 290 | <.0001 |
Pillai's Trace | 1.189914 | 71.49 | 6 | 292 | <.0001 |
Average Squared Canonical Correlation | 0.594957 |
Output 89.1.6: Iris Data: Stepwise Selection Step 4
Fisher (1936) Iris Data |
Statistics for Removal, DF = 2, 145 | ||||
---|---|---|---|---|
Variable | Label | Partial R-Square | F Value | Pr > F |
SepalWidth | Sepal Width (mm) | 0.4295 | 54.58 | <.0001 |
PetalLength | Petal Length (mm) | 0.3482 | 38.72 | <.0001 |
PetalWidth | Petal Width (mm) | 0.3229 | 34.57 | <.0001 |
No variables can be removed. |
Statistics for Entry, DF = 2, 144 | |||||
---|---|---|---|---|---|
Variable | Label | Partial R-Square | F Value | Pr > F | Tolerance |
SepalLength | Sepal Length (mm) | 0.0615 | 4.72 | 0.0103 | 0.0320 |
Variable SepalLength will be entered. |
All variables have been entered. |
Multivariate Statistics | |||||
---|---|---|---|---|---|
Statistic | Value | F Value | Num DF | Den DF | Pr > F |
Wilks' Lambda | 0.023439 | 199.15 | 8 | 288 | <.0001 |
Pillai's Trace | 1.191899 | 53.47 | 8 | 290 | <.0001 |
Average Squared Canonical Correlation | 0.595949 |
Since no more variables can be added to or removed from the model, the procedure stops at step 5 and displays a summary of the selection process.
Output 89.1.7: Iris Data: Stepwise Selection Step 5
Fisher (1936) Iris Data |
Statistics for Removal, DF = 2, 144 | ||||
---|---|---|---|---|
Variable | Label | Partial R-Square | F Value | Pr > F |
SepalLength | Sepal Length (mm) | 0.0615 | 4.72 | 0.0103 |
SepalWidth | Sepal Width (mm) | 0.2335 | 21.94 | <.0001 |
PetalLength | Petal Length (mm) | 0.3308 | 35.59 | <.0001 |
PetalWidth | Petal Width (mm) | 0.2570 | 24.90 | <.0001 |
No variables can be removed. |
Output 89.1.8: Iris Data: Stepwise Selection Summary
No further steps are possible. |
Fisher (1936) Iris Data |
Stepwise Selection Summary | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Step | Number In | Entered | Removed | Label | Partial R-Square | F Value | Pr > F | Wilks' Lambda | Pr < Lambda | Average Squared Canonical Correlation | Pr > ASCC |
1 | 1 | PetalLength | | Petal Length (mm) | 0.9414 | 1180.16 | <.0001 | 0.05862828 | <.0001 | 0.47068586 | <.0001 |
2 | 2 | SepalWidth | | Sepal Width (mm) | 0.3709 | 43.04 | <.0001 | 0.03688411 | <.0001 | 0.55995394 | <.0001 |
3 | 3 | PetalWidth | | Petal Width (mm) | 0.3229 | 34.57 | <.0001 | 0.02497554 | <.0001 | 0.59495691 | <.0001 |
4 | 4 | SepalLength | | Sepal Length (mm) | 0.0615 | 4.72 | 0.0103 | 0.02343863 | <.0001 | 0.59594941 | <.0001 |
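The columns of the summary are linked: each entered variable scales Wilks' lambda by one minus its partial R-square, and each F value follows from the partial R-square and the degrees of freedom of the entry test. The worked sketch below uses $\Lambda_k$ for Wilks' lambda after step $k$ and $R^2_{p,k}$ for the partial R-square of the variable entered at step $k$; this notation is introduced here for illustration. Because the partial R-square values are printed to four decimals, the results agree with the table only to about four decimals.

```latex
\begin{aligned}
\Lambda_k &= \Lambda_{k-1}\,\bigl(1 - R^2_{p,k}\bigr) \\
\Lambda_2 &= 0.058628 \times (1 - 0.3709) \approx 0.0369 \\
\Lambda_3 &= 0.036884 \times (1 - 0.3229) \approx 0.0250 \\
\Lambda_4 &= 0.024976 \times (1 - 0.0615) \approx 0.0234 \\[6pt]
F_k &= \frac{R^2_{p,k}}{1 - R^2_{p,k}}\cdot\frac{\mathrm{df}_2}{\mathrm{df}_1},
\qquad
F_2 = \frac{0.3709}{0.6291}\cdot\frac{146}{2} \approx 43.04
\end{aligned}
```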
PROC STEPDISC automatically creates a list of the selected variables and stores it in a macro variable. You can submit the following statement to see the list of selected variables:
    * print the macro variable list;
    %put &_stdvar;
The macro variable _StdVar contains the following variable list:
SepalLength SepalWidth PetalLength PetalWidth
You could use this macro variable if you want to analyze these variables in subsequent steps as follows:
    proc discrim data=sashelp.iris;
       class Species;
       var &_stdvar;
    run;
The results of this step are not shown.
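If the selection step might select no variables, for example under much stricter significance levels, the subsequent step can be guarded. The following is a minimal sketch under that assumption; the macro name discrim_selected is hypothetical, and the CROSSVALIDATE option is added here only to illustrate one way of assessing the selected variables.

```sas
/* Hypothetical wrapper: run PROC DISCRIM only if _stdvar is non-empty */
%macro discrim_selected;
   %if %length(&_stdvar) = 0 %then %do;
      %put NOTE: No variables were selected by PROC STEPDISC.;
   %end;
   %else %do;
      /* Discriminant analysis on the selected variables, */
      /* with leave-one-out cross validation of the rule  */
      proc discrim data=sashelp.iris crossvalidate;
         class Species;
         var &_stdvar;
      run;
   %end;
%mend discrim_selected;

%discrim_selected
```

This relies on the statement `%let _stdvar = ;` from the beginning of the example, which ensures that the macro variable exists even if no selection has been run.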