The OUT= data set contains all the variables in the original data set plus new variables that contain the principal component scores. The N= option determines the number of new variables. The names of the new variables are formed by concatenating the value given by the PREFIX= option (or Prin if PREFIX= is omitted) to the numbers 1, 2, 3, and so on. The new variables have mean 0 and variance equal to the corresponding eigenvalue, unless you specify the STANDARD option to standardize the scores to unit variance. Also, if you specify the COV option, PROC PRINCOMP computes the principal component scores from the corrected or uncorrected (if the NOINT option is specified) variables rather than from the standardized variables.
If you use a PARTIAL statement, the OUT= data set also contains the residuals from predicting the VAR variables from the PARTIAL variables.
You cannot create an OUT= data set if the DATA= data set is TYPE=ACE, TYPE=CORR, TYPE=COV, TYPE=EST, TYPE=FACTOR, TYPE=SSCP, TYPE=UCORR, or TYPE=UCOV.
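As an illustration of the naming and scaling rules above, here is a minimal sketch. It assumes the Sashelp.Iris sample data set is available in your installation; the output data set name work.iris_scores and the prefix PC are arbitrary choices made for this example.

```sas
/* Keep two components and name them PC1 and PC2 instead of Prin1 and Prin2. */
/* work.iris_scores is a hypothetical name chosen for this sketch.           */
proc princomp data=sashelp.iris n=2 prefix=PC out=work.iris_scores noprint;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;

/* Without the STANDARD option, each score variable should have mean 0 */
/* and variance equal to the corresponding eigenvalue.                 */
proc means data=work.iris_scores mean var;
   var PC1 PC2;
run;
```

Adding the STANDARD option to the PROC PRINCOMP statement should instead give both score variables unit variance.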
The OUTSTAT= data set is similar to the TYPE=CORR data set that the CORR procedure produces. The following table relates the TYPE= value for the OUTSTAT= data set to the options that are specified in the PROC PRINCOMP statement:
Options | TYPE= |
---|---|
(Default) | CORR |
COV | COV |
NOINT | UCORR |
COV NOINT | UCOV |
Note that the default (neither the COV nor NOINT option) produces a TYPE=CORR data set.
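One way to confirm which TYPE= value a given combination of options produces is to create the OUTSTAT= data set and inspect it. The sketch below assumes the Sashelp.Iris sample data set; work.iris_stat is a hypothetical name used only for this example.

```sas
/* The COV option (without NOINT) should yield a TYPE=COV data set. */
proc princomp data=sashelp.iris cov outstat=work.iris_stat noprint;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;

/* The data set type appears in the PROC CONTENTS attribute listing. */
proc contents data=work.iris_stat;
run;

/* Print the statistics to see the _TYPE_ and _NAME_ variables. */
proc print data=work.iris_stat;
run;
```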
The new data set contains the following variables:
- the BY variables, if any
- two new variables, _TYPE_ and _NAME_, both character variables
- the variables that are analyzed (that is, those in the VAR statement); or, if there is no VAR statement, all numeric variables not listed in any other statement; or, if there is a PARTIAL statement, the residual variables as described in the section OUT= Data Set
Each observation in the new data set contains some type of statistic, as indicated by the _TYPE_ variable. The values of the _TYPE_ variable are as follows:
_TYPE_ | Contents |
---|---|
MEAN | mean of each variable. If you specify the PARTIAL statement, this observation is omitted. |
STD | standard deviations. If you specify the COV option, this observation is omitted, so the SCORE procedure does not standardize the variables before computing scores. If you use the PARTIAL statement, the standard deviation of a variable is computed as its root mean squared error as predicted from the PARTIAL variables. |
USTD | uncorrected standard deviations. When you specify the NOINT option in the PROC PRINCOMP statement, the OUTSTAT= data set contains standard deviations not corrected for the mean. However, if you also specify the COV option in the PROC PRINCOMP statement, this observation is omitted. |
N | number of observations on which the analysis is based. This value is the same for each variable. If you specify the PARTIAL statement and the value of the VARDEF= option is DF or unspecified, then the number of observations is decremented by the degrees of freedom of the PARTIAL variables. |
SUMWGT | the sum of the weights of the observations. This value is the same for each variable. If you specify the PARTIAL statement and VARDEF=WDF, then the sum of the weights is decremented by the degrees of freedom of the PARTIAL variables. This observation is output only if the value is different from that in the observation for which _TYPE_='N'. |
CORR | correlations between each variable and the variable specified by the _NAME_ variable. The number of observations for which _TYPE_='CORR' is equal to the number of variables being analyzed. If you specify the COV option, no _TYPE_='CORR' observations are produced. If you use the PARTIAL statement, the partial correlations, not the raw correlations, are output. |
UCORR | uncorrected correlation matrix. When you specify the NOINT option without the COV option in the PROC PRINCOMP statement, the OUTSTAT= data set contains a matrix of correlations not corrected for the means. If you also specify the COV option, these observations are omitted. |
COV | covariances between each variable and the variable specified by the _NAME_ variable. _TYPE_='COV' observations are produced only if you specify the COV option. If you use the PARTIAL statement, the partial covariances, not the raw covariances, are output. |
UCOV | uncorrected covariance matrix. When you specify the NOINT and COV options in the PROC PRINCOMP statement, the OUTSTAT= data set contains a matrix of covariances not corrected for the means. |
EIGENVAL | eigenvalues. If the N= option requests fewer principal components than the maximum number, only the specified number of eigenvalues is produced, with missing values filling out the observation. |
SCORE | eigenvectors. The _NAME_ variable contains the name of the corresponding principal component as constructed from the PREFIX= option. The number of observations for which _TYPE_='SCORE' equals the number of principal components computed. The eigenvectors have unit length unless you specify the STD (STANDARD) option, in which case the unit-length eigenvectors are divided by the square roots of the eigenvalues to produce scores that have unit standard deviations. When you do not specify the COV option, you can produce the principal component scores by multiplying the standardized data by these coefficients. When you specify the COV option, you can produce the principal component scores by multiplying the centered data by these coefficients. You should use the means, obtained from the observation for which _TYPE_='MEAN', to center the data. You should use the standard deviations, obtained from the observation for which _TYPE_='STD', to standardize the data. |
USCORE | scoring coefficients to be applied without subtracting the mean from the raw variables. Observations for which _TYPE_='USCORE' are produced when you specify the NOINT option in the PROC PRINCOMP statement. To obtain the principal component scores, these coefficients should be multiplied by the data that are standardized by the uncorrected standard deviations obtained from the observation for which _TYPE_='USTD'. |
RSQUARED | R squares for each VAR variable as predicted by the PARTIAL variables |
B | regression coefficients for each VAR variable as predicted by the PARTIAL variables. This observation is produced only if you specify the COV option. |
STB | standardized regression coefficients for each VAR variable as predicted by the PARTIAL variables. If you specify the COV option, this observation is omitted. |
You can use the data set with the SCORE procedure to compute principal component scores, or you can use it as input to the FACTOR procedure and specify METHOD=SCORE to rotate the components. If you use the PARTIAL statement, the scoring coefficients should be applied to the residuals, not to the original variables.
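The scoring workflow described above can be sketched as follows. This example assumes the Sashelp.Iris sample data set and uses the hypothetical names work.iris_stat and work.iris_scored; it is one possible arrangement, not the only way to apply the coefficients.

```sas
/* Default (correlation-based) analysis: the OUTSTAT= data set holds */
/* MEAN, STD, and SCORE observations, among others.                  */
proc princomp data=sashelp.iris outstat=work.iris_stat noprint;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;

/* PROC SCORE reads the MEAN and STD observations to standardize the  */
/* raw variables and then applies the _TYPE_='SCORE' coefficients.    */
/* The new score variables take their names from _NAME_ (Prin1, ...). */
proc score data=sashelp.iris score=work.iris_stat out=work.iris_scored;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;

/* Look at the first few computed scores. */
proc print data=work.iris_scored(obs=5);
   var Prin1-Prin4;
run;
```

The same OUTSTAT= data set can be applied to new observations by pointing the DATA= option of PROC SCORE at a different data set that contains the same VAR variables.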