The PROC VARCLUS statement invokes the VARCLUS procedure. By default, VARCLUS clusters the numeric variables in the most recently created SAS data set, starting with one cluster and splitting clusters until all clusters have at most one eigenvalue greater than one.
Table 107.1 summarizes the options available in the PROC VARCLUS statement.
Table 107.1: Options Available in the PROC VARCLUS Statement
| Option | Description |
|---|---|
| Data Sets | |
| | Specifies the input SAS data set |
| | Specifies the output SAS data set to contain statistics |
| | Specifies the output SAS data set for use with PROC TREE |
| Input Data Processing | |
| | Uses the covariance matrix instead of the correlation matrix |
| | Omits the intercept |
| | Specifies the divisor for variances |
| Number of Clusters | |
| | Specifies the maximum number of clusters |
| | Specifies the minimum number of clusters |
| | Specifies the maximum second eigenvalue in a cluster |
| | Specifies the minimum proportion of variance explained by a cluster component |
| Clustering Methods | |
| | Uses centroid components instead of principal components |
| | Clusters hierarchically |
| | Specifies the initialization method |
| | Specifies the maximum iterations during the alternating least squares phase |
| | Specifies the maximum iterations during the search phase |
| | Performs a multiple group component analysis |
| | Specifies the random number seed |
| Control Displayed Output | |
| | Displays the correlation matrix |
| | Suppresses displayed output |
| | Specifies ODS Graphics details |
| | Suppresses display of large matrices |
| | Displays means and standard deviations |
| | Suppresses all default displayed output except the final summary table |
| | Displays the cluster to which each variable is assigned during the iterations |
VARCLUS chooses which cluster to split based on the MAXEIGEN= and PROPORTION= options.
If you specify either or both of these two options, then only the specified options affect the choice of the cluster to split.
If you specify neither of these options, the criterion for choosing the cluster to split depends on the CENTROID option:

- If you specify CENTROID, VARCLUS splits the cluster with the smallest percentage of variation explained by its cluster component, as if you had specified the PROPORTION= option.
- If you do not specify CENTROID, VARCLUS splits the cluster with the largest eigenvalue associated with the second principal component, as if you had specified the MAXEIGEN= option.
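The selection rule above can be paraphrased as a small decision function. The following Python sketch is only an illustration of the documented rule, not SAS code and not how VARCLUS is implemented; the function name `active_split_criteria` and its Boolean parameters are hypothetical.

```python
from typing import Set


def active_split_criteria(maxeigen_given: bool,
                          proportion_given: bool,
                          centroid: bool) -> Set[str]:
    """Return the criteria that govern which cluster is split next.

    Hypothetical helper: the arguments say whether MAXEIGEN= and
    PROPORTION= were specified and whether CENTROID is in effect.
    """
    if maxeigen_given or proportion_given:
        # Only the options that were actually specified are consulted.
        active = set()
        if maxeigen_given:
            active.add("MAXEIGEN")
        if proportion_given:
            active.add("PROPORTION")
        return active
    # Neither option specified: the default depends on CENTROID.
    return {"PROPORTION"} if centroid else {"MAXEIGEN"}
```

For example, `active_split_criteria(False, False, True)` returns `{"PROPORTION"}`, which corresponds to specifying CENTROID with neither MAXEIGEN= nor PROPORTION=.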
The final number of clusters is controlled by three options: MAXCLUSTERS=, MAXEIGEN=, and PROPORTION=.
If you specify any of these three options, then only the options you specify affect the final number of clusters.
If you specify none of these options, VARCLUS continues to split clusters until the default splitting criterion is satisfied. The default splitting criterion depends on the CENTROID option:

- If you specify CENTROID, the default splitting criterion is PROPORTION=0.75.
- If you do not specify CENTROID, splitting is based on the MAXEIGEN= criterion, with a default that depends on the COVARIANCE option:
  - When analyzing a correlation matrix (no COVARIANCE option), the default value for MAXEIGEN= is one.
  - When analyzing a covariance matrix (using the COVARIANCE option), the default value for MAXEIGEN= is the average variance of the variables being clustered.
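As a worked illustration of these defaults, the sketch below returns the criterion and threshold that apply when none of MAXCLUSTERS=, MAXEIGEN=, and PROPORTION= is specified. It is a Python paraphrase of the rules above, not SAS syntax; the function name `default_split_criterion` and its parameters are assumptions made for the example.

```python
from statistics import mean
from typing import Sequence, Tuple


def default_split_criterion(centroid: bool,
                            covariance: bool,
                            variances: Sequence[float]) -> Tuple[str, float]:
    """Return (criterion, threshold) used by default.

    Hypothetical helper: `variances` holds the variances of the
    variables being clustered and is used only with COVARIANCE.
    """
    if centroid:
        # CENTROID: at least 75% of the variation must be explained.
        return ("PROPORTION", 0.75)
    if covariance:
        # Covariance matrix: threshold is the average variance.
        return ("MAXEIGEN", mean(variances))
    # Correlation matrix: second eigenvalue must not exceed one.
    return ("MAXEIGEN", 1.0)
```

With two variables whose variances are 2.0 and 4.0, `default_split_criterion(False, True, [2.0, 4.0])` gives `("MAXEIGEN", 3.0)`.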
VARCLUS continues to split clusters until any of the following conditions holds:

- The number of clusters equals the value specified for MAXCLUSTERS=.
- No cluster qualifies for splitting according to the MAXEIGEN= or PROPORTION= criterion.
- A cluster was chosen for splitting, but after iteratively reassigning variables to clusters, one of the clusters has no members.
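The three stopping conditions can be read as a single check performed after each attempted split. The Python sketch below is one way to express that check, under the assumption that the caller already knows whether any cluster qualifies for splitting and whether the last split left a cluster empty; the function `stop_reason` and its parameters are hypothetical and do not describe the procedure's internals.

```python
from typing import Optional


def stop_reason(n_clusters: int,
                maxclusters: Optional[int],
                any_cluster_qualifies: bool,
                split_left_empty_cluster: bool) -> Optional[str]:
    """Return why splitting stops, or None if it should continue."""
    # Condition 1: the MAXCLUSTERS= limit has been reached
    # (>= rather than == is only a defensive choice in this sketch).
    if maxclusters is not None and n_clusters >= maxclusters:
        return "MAXCLUSTERS= reached"
    # Condition 2: no cluster violates MAXEIGEN= or PROPORTION=.
    if not any_cluster_qualifies:
        return "no cluster qualifies for splitting"
    # Condition 3: reassignment emptied one of the new clusters.
    if split_left_empty_cluster:
        return "split produced an empty cluster"
    return None
```

A return value of `None` means that none of the conditions holds, so splitting continues.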
The following list gives details about the options.