The following statements define two modules. One computes the correlation coefficients between the numeric variables, and the other computes standardized values for the data. For more efficient computations, use the built-in CORR function and the STD function.
```sas
proc iml;
/* Module to compute correlations */
start corr;
   n = nrow(x);                       /* number of observations              */
   sum = x[+,];                       /* compute column sums                 */
   xpx = t(x)*x - t(sum)*sum/n;       /* corrected (centered) SSCP matrix    */
   s = diag(1/sqrt(vecdiag(xpx)));    /* scaling matrix                      */
   corr = s*xpx*s;                    /* correlation matrix                  */
   print "Correlation Matrix",, corr[rowname=nm colname=nm];
finish corr;

/* Module to standardize data (note: overwrites x) */
start std;
   n = nrow(x);                       /* number of observations, so the      */
                                      /* module does not depend on CORR      */
   mean = x[+,]/n;                    /* means for columns                   */
   x = x - repeat(mean, n, 1);        /* center x to mean zero               */
   ss = x[##,];                       /* sum of squares for columns          */
   std = sqrt(ss/(n-1));              /* standard deviation estimate         */
   x = x*diag(1/std);                 /* scale to standard deviation 1       */
   print , "Standardized Data",, x[colname=nm];
finish std;

/* Sample run */
x = { 1 2 3,
      3 2 1,
      4 2 1,
      0 4 1,
     24 1 0,
      1 3 8};
nm = {age weight height};
run corr;
run std;
```
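As noted above, the built-in CORR function can replace the hand-written CORR module. The following is a minimal sketch, assuming SAS/IML 9.22 or later (where CORR is available). It reuses the sample data, and with the default Pearson method it should reproduce the correlation matrix that the module prints. The matrix name `c` is an arbitrary choice.

```sas
proc iml;
/* Same sample data and labels as the module example */
x  = { 1 2 3,
       3 2 1,
       4 2 1,
       0 4 1,
      24 1 0,
       1 3 8};
nm = {age weight height};

/* Built-in CORR: Pearson correlations between the columns of x
   (assumes SAS/IML 9.22 or later) */
c = corr(x);
print c[rowname=nm colname=nm label="Correlation Matrix"];
quit;
```

Because this starts a new PROC IML session, the user-defined CORR module from the earlier program is not defined here, so the name cannot conflict with the built-in function.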
The results are shown in Output 9.1.1.
Output 9.1.1: Correlation Coefficients and Standardized Values
Correlation Matrix

| corr | AGE | WEIGHT | HEIGHT |
|---|---|---|---|
| AGE | 1 | -0.717102 | -0.436558 |
| WEIGHT | -0.717102 | 1 | 0.3508232 |
| HEIGHT | -0.436558 | 0.3508232 | 1 |
Standardized Data (matrix x)

| AGE | WEIGHT | HEIGHT |
|---|---|---|
| -0.490116 | -0.322749 | 0.2264554 |
| -0.272287 | -0.322749 | -0.452911 |
| -0.163372 | -0.322749 | -0.452911 |
| -0.59903 | 1.6137431 | -0.452911 |
| 2.0149206 | -1.290994 | -0.792594 |
| -0.490116 | 0.6454972 | 1.924871 |
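Each column of the standardized data should have mean 0 and sample standard deviation 1. The following minimal sketch standardizes the data with the built-in MEAN and STD functions and then checks those two properties. It assumes SAS/IML 9.22 or later, both for these functions and for elementwise operations between a matrix and a row vector. The names `z`, `colMean`, and `colStd` are illustrative.

```sas
proc iml;
x  = { 1 2 3,
       3 2 1,
       4 2 1,
       0 4 1,
      24 1 0,
       1 3 8};
nm = {age weight height};

/* MEAN and STD each return a 1 x 3 row vector of column statistics;
   STD uses the n-1 divisor, like the STD module.
   The row vectors are applied to every row of x
   (assumes SAS/IML 9.22 or later) */
z = (x - mean(x)) / std(x);
print z[colname=nm label="Standardized Data"];

/* Check: column means should be (near) 0 and column
   standard deviations should be 1 */
colMean = mean(z);
colStd  = std(z);
print colMean[colname=nm], colStd[colname=nm];
quit;
```

Because of floating-point rounding, the column means may print as tiny nonzero values rather than exactly 0.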