If the coordinates in the DATA= data set are cluster means (for example, means produced by the FASTCLUS procedure), you can obtain accurate statistics in the cluster histories for METHOD=AVERAGE, METHOD=CENTROID, or METHOD=WARD if the data set contains both of the following:
a variable giving the number of original observations in each cluster (see the discussion of the FREQ statement earlier in this chapter)
a variable giving the root mean squared standard deviation of each cluster
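The following Python sketch shows what a summary row carrying these two variables looks like. Python is used only to illustrate the arithmetic; it is not SAS code. The sketch collapses raw observations into one row per cluster that holds the means, a frequency, and a root mean squared standard deviation. It assumes that RMSSTD is sqrt(W / (v(n - 1))), where W is the within-cluster sum of squared deviations summed over all v variables and n is the cluster size. Check the FASTCLUS documentation for the exact definition. The function name `summarize_clusters` is hypothetical.

```python
import numpy as np


def summarize_clusters(x, labels):
    """Collapse raw observations into one summary row per cluster.

    Each row is (label, mean vector, frequency, RMSSTD), the kind of row a
    cluster-means data set holds. ASSUMPTION: RMSSTD is
    sqrt(W / (v * (n - 1))), where W is the within-cluster sum of squared
    deviations over all v variables; a one-member cluster gets 0.
    """
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    nvars = x.shape[1]
    rows = []
    for k in np.unique(labels):
        members = x[labels == k]
        n = members.shape[0]
        means = members.mean(axis=0)
        # Within-cluster sum of squares, summed over every variable.
        w = float(((members - means) ** 2).sum())
        rmsstd = float(np.sqrt(w / (nvars * (n - 1)))) if n > 1 else 0.0
        rows.append((k, means, n, rmsstd))
    return rows


rows = summarize_clusters([[1, 2], [3, 4], [10, 10], [12, 14]], [0, 0, 1, 1])
for label, means, freq, rmsstd in rows:
    print(label, means, freq, round(rmsstd, 4))
```

For the first cluster, (1, 2) and (3, 4), the mean is (2, 3), the frequency is 2, and W = 4, so RMSSTD is sqrt(2). Computing (n - 1) * v * RMSSTD**2 gives back W = 4. This is why the two summary variables together can stand in for the raw observations.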
Specify the name of the variable containing root mean squared standard deviations in the RMSSTD statement. If you specify the RMSSTD statement, you must also specify a FREQ statement.
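The FREQ requirement follows from the arithmetic. Under the assumed definition RMSSTD = sqrt(W / (v(n - 1))), a cluster's within-cluster sum of squares is W = (n - 1) v RMSSTD², so RMSSTD without the frequency n cannot recover W. The Python sketch below merges two summary rows. It uses Ward's between-cluster term B_KL = ||x̄_K - x̄_L||² / (1/N_K + 1/N_L) and the identity W_M = W_K + W_L + B_KL. The helper names `within_ss` and `merge_summaries` are hypothetical. The sketch illustrates the arithmetic only; it does not reproduce how PROC CLUSTER computes these values internally.

```python
import numpy as np


def within_ss(freq, rmsstd, nvars):
    """Recover a cluster's within-cluster sum of squares from its summary.

    ASSUMPTION: RMSSTD = sqrt(W / (nvars * (freq - 1))). Inverting it needs
    both freq and rmsstd, so RMSSTD alone is not enough.
    """
    return (freq - 1) * nvars * rmsstd ** 2


def merge_summaries(mean_k, n_k, s_k, mean_l, n_l, s_l):
    """Merge two cluster summaries (mean vector, frequency, RMSSTD).

    Returns (merged mean, merged frequency, merged RMSSTD, Ward increase).
    """
    mean_k = np.asarray(mean_k, dtype=float)
    mean_l = np.asarray(mean_l, dtype=float)
    nvars = mean_k.size
    n_m = n_k + n_l
    # Ward's between-cluster term: the growth in total within-cluster SS.
    b_kl = float(((mean_k - mean_l) ** 2).sum()) / (1.0 / n_k + 1.0 / n_l)
    w_m = within_ss(n_k, s_k, nvars) + within_ss(n_l, s_l, nvars) + b_kl
    # Frequency-weighted mean of the merged cluster.
    mean_m = (n_k * mean_k + n_l * mean_l) / n_m
    s_m = float(np.sqrt(w_m / (nvars * (n_m - 1))))
    return mean_m, n_m, s_m, b_kl


# Two clusters summarized from raw points (1, 2), (3, 4) and (10, 10), (12, 14).
mean_m, n_m, s_m, b_kl = merge_summaries([2, 3], 2, np.sqrt(2), [11, 12], 2, np.sqrt(5))
print(n_m, round(b_kl, 6), round(within_ss(n_m, s_m, 2), 6))  # prints 4 162.0 176.0
```

Working from the summaries alone gives B_KL = 162 and W_M = 4 + 10 + 162 = 176. This is the same sum of squares you get by computing it directly from the four raw points around their mean (6.5, 7.5).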
If you omit the RMSSTD statement but the DATA= data set contains a variable named _RMSSTD_, the root mean squared standard deviations are obtained from the _RMSSTD_ variable.
An RMSSTD statement or an _RMSSTD_ variable is required when you specify the HYBRID option.
A data set created by PROC FASTCLUS with the MEAN= option contains _FREQ_ and _RMSSTD_ variables. You therefore do not need FREQ and RMSSTD statements when you use such a data set as input to the CLUSTER procedure.