The OUTSTAT= data set is TYPE=CORR, and it can be used as input to the SCORE procedure or a subsequent run of PROC VARCLUS. The OUTSTAT= data set contains the following variables:
- BY variables
- _NCL_, a numeric variable that gives the number of clusters
- _TYPE_, a character variable that indicates the type of statistic the observation contains
- _NAME_, a character variable that contains a variable name or a cluster name, which is of the form CLUSn, where n is the number of the cluster
- the variables that are clustered
The values of the _TYPE_ variable are listed in the following table.

Table 100.2: Values of the _TYPE_ variable

_TYPE_ | Contents |
---|---|
'MEAN' | Means |
'STD' | Standard deviations |
'USTD' | Uncorrected standard deviations, produced when the NOINT option is specified |
'N' | Number of observations |
'CORR' | Correlations |
'UCORR' | Uncorrected correlation matrix, produced when the NOINT option is specified |
'MEMBERS' | Number of members in each cluster |
'VAREXP' | Variance explained by each cluster |
'PROPOR' | Proportion of variance explained by each cluster |
'GROUP' | Number of the cluster to which each variable belongs |
'RSQUARED' | Squared multiple correlation of each variable with its cluster component |
'SCORE' | Standardized scoring coefficients |
'USCORE' | Scoring coefficients to be applied without subtracting the mean from the raw variables, produced when the NOINT option is specified |
'STRUCTUR' | Cluster structure |
'CCORR' | Correlations between cluster components |
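
To see which of these _TYPE_ values a particular run actually wrote, one option is to cross-tabulate _TYPE_ by _NCL_ in the OUTSTAT= data set. The following is a minimal sketch, not a prescribed workflow: the input data set `Train`, the variables `x1-x5`, and the choice of MAXCLUSTERS=3 are assumptions made only for illustration, and `Coef` is simply the name given to the OUTSTAT= data set.

```sas
/* Hypothetical input: data set Train with numeric variables x1-x5. */
proc varclus data=Train maxclusters=3 outstat=Coef noprint;
   var x1-x5;
run;

/* One row per combination of _TYPE_ and _NCL_.
   MISSING keeps the combinations where _NCL_ is missing. */
proc freq data=Coef;
   tables _type_ * _ncl_ / list missing;
run;
```

The listing shows how many observations of each statistic type are stored for each cluster solution.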
The observations with _TYPE_='MEAN', 'STD', 'N', and 'CORR' have missing values for the _NCL_ variable. All other values of the _TYPE_ variable are repeated for each cluster solution, with different solutions distinguished by the value of the _NCL_ variable. If you want to specify the OUTSTAT= data set with the SCORE procedure, you can use a DATA step to select observations with the _NCL_ variable missing or equal to the desired number of clusters as follows:
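
    /* Keep the overall statistics (_NCL_ missing) and the 3-cluster solution. */
    data Coef2;
       set Coef;
       if _ncl_ = . or _ncl_ = 3;
       drop _ncl_;
    run;

    /* Score the new data with the selected coefficients. */
    proc score data=NewScore score=Coef2;
    run;
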
PROC SCORE standardizes the new data by subtracting the original variable means that are stored in the _TYPE_='MEAN' observations and dividing by the original variable standard deviations from the _TYPE_='STD' observations. Then PROC SCORE multiplies the standardized variables by the coefficients from the _TYPE_='SCORE' observations to get the cluster scores.
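
Written as a formula, the computation described above looks as follows. The symbols are notation introduced here for illustration and do not appear in the data set: $x_{ij}$ is the value of variable $j$ in new observation $i$, $\bar{x}_j$ and $s_j$ are the mean and standard deviation of variable $j$ taken from the 'MEAN' and 'STD' observations, $b_{kj}$ is the 'SCORE' coefficient of variable $j$ for cluster $k$, and $p$ is the number of variables.

```latex
\[
  \text{score}_{ik} \;=\; \sum_{j=1}^{p} b_{kj}\,\frac{x_{ij} - \bar{x}_j}{s_j}
\]
```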
The OUTTREE= data set contains one observation for each variable clustered plus one observation for each cluster of two or more variables; that is, one observation for each node of the cluster tree. The total number of output observations is between n and 2n - 1, where n is the number of variables clustered.
The OUTTREE= data set contains the following variables:
- BY variables, if any
- _NAME_, a character variable that gives the name of the node. If the node is a cluster, the name is CLUSn, where n is the number of the cluster. If the node is a single variable, the variable name is used.
- _PARENT_, a character variable that gives the value of _NAME_ of the parent of the node. If the node is the root of the tree, _PARENT_ is blank.
- _LABEL_, a character variable that gives the label of the node. If the node is a cluster, the label is CLUSn, where n is the number of the cluster. If the node is a single variable, the variable label is used.
- _NCL_, the number of clusters
- _VAREXP_, the total variance explained by the clusters at the current level of the tree
- _PROPOR_, the total proportion of variance explained by the clusters at the current level of the tree
- _MINPRO_, the minimum proportion of variance explained by a cluster component
- _MAXEIG_, the maximum second eigenvalue of a cluster
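
Because each observation names a node and its parent, a data set with this layout can be drawn as a tree diagram, for example with PROC TREE. The following is a hedged sketch only: the input data set `Train` and the variables `x1-x5` are hypothetical, `Tree` is an arbitrary name for the OUTTREE= data set, and using _PROPOR_ as the height variable is one possible choice among the variables listed above.

```sas
/* Hypothetical input: data set Train with numeric variables x1-x5. */
proc varclus data=Train outtree=Tree noprint;
   var x1-x5;
run;

/* PROC TREE reads the _NAME_ and _PARENT_ variables by default.
   _PROPOR_ is used here as the height axis of the diagram. */
proc tree data=Tree horizontal;
   height _propor_;
run;
```

Any of the other numeric variables, such as _NCL_ or _VAREXP_, could be named in the HEIGHT statement instead if a different scale is more informative.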