The CANDISC Procedure

Output Data Sets

OUT= Data Set

The OUT= data set contains all the variables in the original data set plus new variables containing the canonical variable scores. You determine the number of new variables by using the NCAN= option. The names of the new variables are formed as described in the PREFIX= option. The new variables have means equal to zero and pooled within-class variances equal to one. An OUT= data set cannot be created if the DATA= data set is not an ordinary SAS data set.
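As a minimal sketch of how these options fit together, the following step requests two canonical variables and writes both output data sets. The data set names Train, TrainScores, and Coef, the CLASS variable c, and the quantitative variables x1-x3 are hypothetical names chosen for illustration.

```sas
* Hypothetical input: Train, with CLASS variable c and variables x1-x3;
proc candisc data=Train ncan=2 prefix=Can out=TrainScores outstat=Coef;
   class c;
   var x1 x2 x3;
run;

* The new score variables are named Can1 and Can2;
* their overall means should be zero, up to rounding;
proc means data=TrainScores mean;
   var Can1 Can2;
run;
```

With PREFIX=Can and NCAN=2, the score variables added to TrainScores are Can1 and Can2.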

OUTSTAT= Data Set

The OUTSTAT= data set is similar to the TYPE=CORR data set produced by the CORR procedure but contains many results in addition to those produced by the CORR procedure.

The OUTSTAT= data set is TYPE=CORR, and it contains the following variables:

  • the BY variables, if any

  • the CLASS variable

  • _TYPE_, a character variable of length 8 that identifies the type of statistic

  • _NAME_, a character variable of length 32 that identifies the row of the matrix or the name of the canonical variable

  • the quantitative variables (those in the VAR statement, or if there is no VAR statement, all numeric variables not listed in any other statement)

Each observation is identified by the variable _TYPE_, which takes the following values:

_TYPE_      Contents

N           number of observations both for the total sample (CLASS variable missing) and within each class (CLASS variable present)
SUMWGT      sum of weights both for the total sample (CLASS variable missing) and within each class (CLASS variable present) if a WEIGHT statement is specified
MEAN        means both for the total sample (CLASS variable missing) and within each class (CLASS variable present)
STDMEAN     total-standardized class means
PSTDMEAN    pooled within-class standardized class means
STD         standard deviations both for the total sample (CLASS variable missing) and within each class (CLASS variable present)
PSTD        pooled within-class standard deviations
BSTD        between-class standard deviations
RSQUARED    univariate R squares

The following kinds of observations are identified by the combination of the variables _TYPE_ and _NAME_. When the _TYPE_ variable has one of the following values, the _NAME_ variable identifies the row of the matrix:

_TYPE_      Contents

CSSCP       corrected SSCP matrix for the total sample (CLASS variable missing) and within each class (CLASS variable present)
PSSCP       pooled within-class corrected SSCP matrix
BSSCP       between-class SSCP matrix
COV         covariance matrix for the total sample (CLASS variable missing) and within each class (CLASS variable present)
PCOV        pooled within-class covariance matrix
BCOV        between-class covariance matrix
CORR        correlation matrix for the total sample (CLASS variable missing) and within each class (CLASS variable present)
PCORR       pooled within-class correlation matrix
BCORR       between-class correlation matrix

When the _TYPE_ variable has one of the following values, the _NAME_ variable identifies the canonical variable:

_TYPE_      Contents

CANCORR     canonical correlations
STRUCTUR    canonical structure
BSTRUCT     between canonical structure
PSTRUCT     pooled within-class canonical structure
SCORE       total sample standardized canonical coefficients
PSCORE      pooled within-class standardized canonical coefficients
RAWSCORE    raw canonical coefficients
CANMEAN     means of the canonical variables for each class
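Because every kind of statistic is stored in the same data set, it is often convenient to subset it by _TYPE_. The following is a minimal sketch, assuming the OUTSTAT= data set was saved under the hypothetical name Coef.

```sas
* List which _TYPE_ values are present and how many observations each has;
proc freq data=Coef;
   tables _TYPE_;
run;

* Print only the canonical correlations and the raw canonical coefficients;
proc print data=Coef;
   where _TYPE_ in ('CANCORR', 'RAWSCORE');
run;
```

In the printed rows, the _NAME_ column holds the name of the canonical variable, as described for these _TYPE_ values.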

You can use this data set with PROC SCORE to get scores on the canonical variables for new data by using one of the following forms:

* The CLASS variable C is numeric;
proc score data=NewData score=Coef(where=(c = .  )) out=Scores; 
run;

* The CLASS variable C is character;
proc score data=NewData score=Coef(where=(c = ' ')) out=Scores;
run;

The WHERE clause keeps only the observations in which the CLASS variable is missing, which excludes the within-class means and standard deviations. PROC SCORE standardizes the new data by subtracting the original variable means that are stored in the _TYPE_='MEAN' observations, and dividing by the original variable standard deviations from the _TYPE_='STD' observations. Then PROC SCORE multiplies the standardized variables by the coefficients from the _TYPE_='SCORE' observations to get the canonical scores.
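The scoring computation described above can be written compactly as follows. The notation is introduced here for illustration only and does not appear elsewhere in this section: p is the number of quantitative variables, q is the number of canonical variables, x_j is the value of the jth variable for a new observation, \bar{x}_j and s_j are the total-sample mean and standard deviation from the _TYPE_='MEAN' and _TYPE_='STD' observations, and b_{jk} is the coefficient of the jth variable in the _TYPE_='SCORE' observation whose _NAME_ identifies the kth canonical variable.

```latex
\[
  \mathrm{Can}_k \;=\; \sum_{j=1}^{p} b_{jk}\,\frac{x_j - \bar{x}_j}{s_j},
  \qquad k = 1, \dots, q
\]
```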