The CALIS Procedure

Example 27.27 Linear Relations among Factor Loadings

In this example, you use the FACTOR modeling language of PROC CALIS to specify a confirmatory factor analysis model with linear constraints on loadings. You use SAS programming statements to set the constraints. This example also discusses the differences between fitting covariance structures and correlation structures in the current modeling context.

The correlation matrix of six variables from Kinzer and Kinzer (N=326) is used by Guttman (1957) as an example that yields an approximate simplex. McDonald (1980) uses this data set as an example of factor analysis where he assumes that the loadings on the second factor are linear functions of the loadings on the first factor. Let $\mb {B}$ be the factor loading matrix containing the two factors and six variables so that:

\[  \mb {B} = \left( \begin{matrix}  b_{11}   &  b_{12}  \\ b_{21}   &  b_{22}  \\ b_{31}   &  b_{32}  \\ b_{41}   &  b_{42}  \\ b_{51}   &  b_{52}  \\ b_{61}   &  b_{62}   \\ \end{matrix} \right)  \]

and

\[  b_{j2} = \alpha + \beta b_{j1},\quad j = 1, \ldots , 6  \]

The correlation structures are represented by:

\[  \mb {P} = \mb {B} \mb {B}^{\prime } + \bPsi  \]

where $\bPsi = \mr {diag}(\psi _{11},\psi _{22},\psi _{33},\psi _{44},\psi _{55},\psi _{66})$ represents the diagonal matrix of unique variances for the variables.

With parameters $\alpha $ and $\beta $ being unconstrained, McDonald (1980) has fitted an underidentified model with seven degrees of freedom. Browne (1982) imposes the following identification condition:

\[  \beta = -1  \]

In this example, Browne’s identification condition is imposed. The following is the specification of the confirmatory factor model using the FACTOR modeling language.

data kinzer(type=corr);
title "Data Matrix of Kinzer & Kinzer, see GUTTMAN (1957)";
   _type_ = 'corr';
   input _name_ $ var1-var6;
   datalines;
var1  1.00   .     .     .     .     .
var2   .51  1.00   .     .     .     .
var3   .46   .51  1.00   .     .     .
var4   .46   .47   .54  1.00   .     .
var5   .40   .39   .49   .57  1.00   .
var6   .33   .39   .47   .45   .56  1.00
  ;
proc calis data=kinzer nobs=326 nose;
   factor
      factor1 ===> var1-var6   = b11 b21 b31 b41 b51 b61 (6 *.6),
      factor2 ===> var1-var6   = b12 b22 b32 b42 b52 b62;
   pvar
      factor1-factor2 = 2 * 1.,
      var1-var6       = psi1-psi6 (6 *.3);
   cov
      factor1 factor2 = 0.;
   parameters alpha (1.);
   /* SAS Programming Statements to define dependent parameters */
   b12 = alpha - b11;
   b22 = alpha - b21;
   b32 = alpha - b31;
   b42 = alpha - b41;
   b52 = alpha - b51;
   b62 = alpha - b61;
   fitindex on(only)=[chisq df probchi];
run;

In the FACTOR statement, you specify two factors, named factor1 and factor2, for the variables. In this model, all manifest variables have nonzero loadings on the two factors. These loading parameters are specified after the equal signs and are named with the prefix 'b'. In the first entry of the FACTOR statement, you specify initial estimates in parentheses: the loadings on factor1 are all free parameters with initial estimates of 0.6. In the second entry of the FACTOR statement, you specify the loadings of var1-var6 on factor2. However, these parameters are dependent, as shown in the SAS programming statements. Initial values for these dependent parameters are thus unnecessary.

In the PVAR statement, the factor variances are fixed at one, while the error variances of the variables are free parameters named psi1-psi6. Again, you provide initial estimates for these error variance parameters. All have the initial value of 0.3.

An additional parameter alpha is specified in the PARAMETERS statement with an initial value of 1. Then, you use six SAS programming statements to define the loadings on the second factor as functions of the loadings on the first factor. Lastly, the FITINDEX statement is used to trim the results in the fit summary table.

In the specification, there are twelve loadings in the FACTOR statement and six error variances in the PVAR statement. Adding the parameter alpha to the list, there are 19 parameters in total. However, the loading parameters are not all independent of each other. As defined in the SAS programming statements, six loadings are dependent. This reduces the number of free parameters to 13. Hence the model has 8 = 21 – 13 degrees of freedom. Notice that the factor variances are fixed at 1, as specified in the PVAR statement, and the covariance between the two factors is fixed at zero, as specified in the COV statement.
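
The count can be retraced as a short tally. The only ingredient not spelled out in the text is the standard fact that a symmetric 6 x 6 matrix has 6(6+1)/2 distinct elements, which is presumably where the 21 comes from:

\[  \mbox{df} = \frac{6(6+1)}{2} - \left[ (12 + 6 + 1) - 6 \right] = 21 - 13 = 8  \]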

Output 27.27.1 shows a concise fit summary table. The chi-square test statistic of model fit is 10.337 with df = 8 (p = 0.242). This indicates a good model fit.

Output 27.27.1: Fit of the Correlation Structures

Fit Summary
Chi-Square 10.3374
Chi-Square DF 8
Pr > Chi-Square 0.2421


The estimated factor loading matrix is presented in Output 27.27.2, and the estimated error variances and the estimate for alpha are presented in Output 27.27.3.

Output 27.27.2: Loading Estimates

Factor Loading Matrix
  factor1 factor2
var1 0.3609 [b11] 0.6174 [b12]
var2 0.3212 [b21] 0.6571 [b22]
var3 0.4859 [b31] 0.4923 [b32]
var4 0.5745 [b41] 0.4038 [b42]
var5 0.7985 [b51] 0.1797 [b52]
var6 0.6736 [b61] 0.3046 [b62]


Output 27.27.3: Unique Variances and the Additional Parameter

Error Variances
Variable Parameter Estimate
var1 psi1 0.53036
var2 psi2 0.44986
var3 psi3 0.48756
var4 psi4 0.47278
var5 psi5 0.31125
var6 psi6 0.53815

Additional Parameters
Type Parameter Estimate
Independent alpha 0.97825


All these estimates are essentially the same as those reported in Browne (1982). Notice that there are no standard error estimates in the output, as requested by the NOSE option in the PROC CALIS statement. Standard error estimates are not of interest in this example.

In fitting the preceding factor model, wrong covariance structures rather than the intended correlation structures have been specified. As pointed out by Browne (1982), fitting such covariance structures directly is not entirely appropriate for analyzing correlations. For example, when fitting correlation structures, the diagonal elements of $\mb {P}$ must always be fixed at one. This constraint was never enforced in the preceding specification. A simple check of the estimates illustrates the problem. In Output 27.27.2, the loading estimates of VAR1 on the two factors are 0.3609 and 0.6174, respectively. In Output 27.27.3, the error variance estimate for VAR1 is 0.53036. The fitted variance of VAR1 can therefore be computed by the following equation:

\[  \mbox{fitted variance} = 0.3609^2 + 0.6174^2 + 0.53036 = 1.0418  \]

This fitted value exceeds 1.00 by about 0.04, whereas the standardized variance of VAR1 is required to be exactly 1.
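
The same check can be repeated for all six variables. The following DATA step is a minimal sketch, not part of the original example: the data set name fitted_var and the variable names b1, b2, psi, fitted, and deviation are made up for illustration, and the numbers are copied by hand from Output 27.27.2 and Output 27.27.3.

```sas
/* Sketch: recompute each fitted variance from the printed estimates.  */
/* Data set and variable names are hypothetical; values are copied     */
/* from Output 27.27.2 (loadings) and Output 27.27.3 (error variances).*/
data fitted_var;
   input variable $ b1 b2 psi;
   fitted    = b1**2 + b2**2 + psi;   /* diagonal element of BB' + Psi */
   deviation = fitted - 1;            /* distance from the required 1  */
   datalines;
var1 0.3609 0.6174 0.53036
var2 0.3212 0.6571 0.44986
var3 0.4859 0.4923 0.48756
var4 0.5745 0.4038 0.47278
var5 0.7985 0.1797 0.31125
var6 0.6736 0.3046 0.53815
;
proc print data=fitted_var noobs;
   format fitted deviation 8.4;
run;
```

The row for var1 should reproduce the value 1.0418 from the preceding equation. Any nonzero entry in the deviation column indicates a diagonal element that the first specification did not hold at one.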

Fortunately, even though the wrong covariance structure model has been analyzed, the preceding analysis is not completely useless. For the current confirmatory factor model, according to Browne (1982) the estimates obtained from fitting the wrong covariance structure model are still consistent (as if they were estimating the population parameters in the correlation structures). However, the chi-square test statistic as reported previously is not correct.

Note that using the CORR option in the PROC CALIS statement will not solve the problem. By specifying the CORR option you merely request PROC CALIS to use the correlation matrix directly as a covariance matrix in the objective function for model fitting. It still would not constrain the fitting of the diagonal elements to 1 during estimation.

In the next section, a solution to the correlation analysis problem is suggested. It is not claimed that this is the only solution or the best solution. Alternative treatments of the problem are possible.

Fitting the Correct Correlation Structures

The main idea of this solution is to embed the intended correlation structures (with correct constraints on the diagonal elements of the correlation matrix) into a covariance structure model so that the estimation methods of PROC CALIS can be applied legitimately to the specially constructed covariance structures.

First, the issue of the fixed ones on the diagonal of the correlation structure model is addressed. That is, the diagonal elements of the correlation structures represented by $(\mb {B} \mb {B}^{\prime } + \bPsi )$ must be fitted by ones. This can be accomplished by constraining the error variances as dependent parameters of the loadings, as shown in the following:

\[  \psi _{jj} = 1 - b_{j1}^2 - b_{j2}^2, \quad j = 1, \ldots , 6  \]

Other constraints might also serve the purpose, but the proposed constraints here are the most convenient and intuitive.
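
To see why this constraint yields unit diagonals, write $\psi _{jj}$ for the jth diagonal element of $\bPsi $ and substitute the constraint into the jth diagonal element of the correlation structures. Because the two factors are uncorrelated with unit variances, the jth diagonal element of $\mb {B} \mb {B}^{\prime }$ is simply the sum of the squared loadings:

\[  \left[ \mb {B} \mb {B}^{\prime } + \bPsi \right]_{jj} = b_{j1}^2 + b_{j2}^2 + \psi _{jj} = b_{j1}^2 + b_{j2}^2 + \left( 1 - b_{j1}^2 - b_{j2}^2 \right) = 1  \]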

Because the discrepancy functions used in PROC CALIS are derived for covariance matrices rather than correlation matrices, PROC CALIS is essentially set up for analyzing covariance structures (with or without mean structures), not correlation structures. Hence, the statistical theory behind PROC CALIS applies to covariance structure analysis, but it might not generalize to correlation structure analysis in all situations. Despite that, with some manipulations PROC CALIS can fit the correct correlation structures to the current data without compromising the statistical theory. These manipulations are now discussed. Recall that the correlation structures are represented by:

\[  \mb {P} = \mb {B} \mb {B}^{\prime } + \bPsi  \]

As before, in the $\mb {B}$ matrix, there are six linear constraints on the factor loadings. In addition, the diagonal elements of $(\mb {B} \mb {B}^{\prime } + \bPsi )$ are constrained to ones, as done by defining the error variances as dependent parameters of the loadings in the preceding equation. To analyze the correlation structures by using PROC CALIS, a covariance structure model with such correlation structures embedded is now specified. That is, the covariance structure to be fitted by PROC CALIS is as follows:

\[  \bSigma = \mb {D} \mb {P} \mb {D}^{\prime } = \mb {D} (\mb {B} \mb {B}^{\prime } + \bPsi ) \mb {D}^{\prime }  \]

where $\mb {D}$ is a $6 \times 6$ diagonal matrix containing the population standard deviations for the manifest variables. Theoretically, it is legitimate to analyze this covariance structure model in order to study the embedded correlation structures. In addition, it does not matter whether your input matrix is a correlation matrix, a covariance matrix, or any rescaled covariance matrix (obtained by multiplying any variables by any positive constants). You would get correct results if you could somehow specify these covariance structures correctly in PROC CALIS. However, PROC CALIS does not appear to offer a direct way to specify the diagonal matrix $\mb {D}$ of population standard deviations. So what can you do with this formulation? The answer is to rewrite the covariance structure model in a form similar to the usual confirmatory factor model, as presented in the following.

Let $\mb {T}=\mb {D} \mb {B}$ and $\mb {K} = \mb {D} \bPsi \mb {D}^{\prime }$. The covariance structure model of interest can now be rewritten as:

\[  \bSigma = \mb {T} \mb {T}^{\prime } + \mb {K}  \]

This form of covariance structures implies a confirmatory factor model with factor loading matrix $\mb {T}$ and error covariance matrix $\mb {K}$. This confirmatory factor model can certainly be specified using the FACTOR modeling language, in much the same way you specify a confirmatory factor model in the preceding section. However, because you are actually more interested in estimating the basic set of parameters in matrices $\mb {B}$ and $\bPsi $ of the embedded correlation structures, you would define the model parameters as functions of this basic set of parameters of interest. This can be accomplished by using the PARAMETERS and the SAS programming statements.
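
The intermediate step behind this rewriting is short. Distributing $\mb {D}$ and $\mb {D}^{\prime }$ over the sum gives:

\[  \bSigma = \mb {D} (\mb {B} \mb {B}^{\prime } + \bPsi ) \mb {D}^{\prime } = (\mb {D} \mb {B}) (\mb {D} \mb {B})^{\prime } + \mb {D} \bPsi \mb {D}^{\prime } = \mb {T} \mb {T}^{\prime } + \mb {K}  \]

Because both $\mb {D}$ and $\bPsi $ are diagonal, $\mb {K}$ is diagonal as well, so it can play the role of an error variance matrix in an ordinary confirmatory factor model.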

All in all, you can use the following statements to set up such a confirmatory factor model with the desired correlation structures embedded.

proc calis data=Kinzer nobs=326 nose;
   factor
      factor1 ===> var1-var6   = t11 t21 t31 t41 t51 t61,
      factor2 ===> var1-var6   = t12 t22 t32 t42 t52 t62;
   pvar
      factor1-factor2 = 2 * 1.,
      var1-var6       = k1-k6;
   cov
      factor1 factor2 = 0.;
   parameters alpha (1.) d1-d6 (6 * 1.)
              b11 b21 b31 b41 b51 b61 (6 *.6),
              b12 b22 b32 b42 b52 b62
              psi1-psi6;
   /* SAS Programming Statements */
   /* 12 Constraints on Correlation structures */
   b12  = alpha - b11;
   b22  = alpha - b21;
   b32  = alpha - b31;
   b42  = alpha - b41;
   b52  = alpha - b51;
   b62  = alpha - b61;
   psi1 = 1. - b11 * b11 - b12 * b12;
   psi2 = 1. - b21 * b21 - b22 * b22;
   psi3 = 1. - b31 * b31 - b32 * b32;
   psi4 = 1. - b41 * b41 - b42 * b42;
   psi5 = 1. - b51 * b51 - b52 * b52;
   psi6 = 1. - b61 * b61 - b62 * b62;
   /* Defining Covariance Structure Parameters */
   t11  = d1 * b11;
   t21  = d2 * b21;
   t31  = d3 * b31;
   t41  = d4 * b41;
   t51  = d5 * b51;
   t61  = d6 * b61;
   t12  = d1 * b12;
   t22  = d2 * b22;
   t32  = d3 * b32;
   t42  = d4 * b42;
   t52  = d5 * b52;
   t62  = d6 * b62;
   k1   = d1 * d1 * psi1;
   k2   = d2 * d2 * psi2;
   k3   = d3 * d3 * psi3;
   k4   = d4 * d4 * psi4;
   k5   = d5 * d5 * psi5;
   k6   = d6 * d6 * psi6;
   fitindex on(only)=[chisq df probchi];
run;

First, notice that the specifications in the FACTOR and the PVAR statements are essentially unchanged from the previous specification, except that the parameters are named differently here to reflect different model matrices. In the current specification, the factor loading parameters in matrix $\mb {T}$ are named with the prefix 't', and the error variance parameters in matrix $\mb {K}$ are named with the prefix 'k'. Specification of these parameters reflects the covariance structures. As you see in the last block of the SAS programming statements, all these parameters are functions of the correlation structure parameters in $\mb {B}$, $\bPsi $, and $\mb {D}$.

Next, in the PARAMETERS statement, all correlation structure parameters are defined, with initial values provided for alpha, the d’s, and the loadings on the first factor. These are the parameters of interest: alpha is used to define dependencies among loadings, d’s are the population standard deviations, b’s are the loading parameters, and psi’s are the error variance parameters. There are 25 parameters specified in this statement, but not all of them are free or independent.

In the first block of SAS programming statements, parameter dependencies or constraints on the correlation structures are specified. The first six statements realize the required linear relations among the factor loadings:

\[  b_{j2} = \alpha - b_{j1},\quad j = 1, \ldots , 6  \]

The next six statements constrain the error variances so as to ensure that an embedded correlation structure model is being fitted. That is, each error variance is dependent on the corresponding loadings, as prescribed by the following equation:

\[  \psi _{jj} = 1 - b_{j1}^2 - b_{j2}^2, \quad j = 1, \ldots , 6  \]

These twelve constraints reduce the number of independent parameters to 13, as expected.
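
As a tally, the 25 parameters in the PARAMETERS statement less the 12 dependent ones leave alpha, the six d’s, and the six loadings on the first factor as independent parameters. The degrees of freedom therefore stay at 8, using the same 21 as in the earlier degrees-of-freedom count:

\[  25 - 12 = 1 + 6 + 6 = 13, \qquad \mbox{df} = 21 - 13 = 8  \]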

The next block of SAS programming statements relates the correlation structure parameters to the covariance structures that are specified in the FACTOR and the PVAR statements. These SAS programming statements realize the required relations $\mb {T}=\mb {D} \mb {B}$ and $\mb {K} = \mb {D} \bPsi \mb {D}^{\prime }$, but in non-matrix forms:

\[  t_{ji} = d_{j} b_{ji} \quad (j = 1, \ldots , 6; \quad i = 1,2)  \]
\[  k_{jj} = d_{j} d_{j} \psi _{jj} \quad (j = 1, \ldots , 6)  \]

where $d_{j}$ denotes the jth diagonal element of $\mb {D}$.

The fit summary is presented in Output 27.27.4. The chi-square test statistic is 14.63 with df = 8 (p = 0.067). This shows that the previous chi-square test based on fitting a wrong covariance structure model is indeed questionable.

Output 27.27.4: Model Fit of the Correlation Structures

Fit Summary
Chi-Square 14.6269
Chi-Square DF 8
Pr > Chi-Square 0.0668


Estimates of the loadings and error variances are presented in Output 27.27.5. These estimates are for the covariance structure model with loading matrix $\mb {T}$ and error covariance matrix $\mb {K}$. They are rescaled versions of the correlation structure parameters and are not of primary interest themselves.

Output 27.27.5: Estimates of Loadings and Error Variances

Factor Loading Matrix
  factor1 factor2
var1 0.3448 [t11] 0.6367 [t12]
var2 0.3200 [t21] 0.6512 [t22]
var3 0.4873 [t31] 0.4778 [t32]
var4 0.5703 [t41] 0.3948 [t42]
var5 0.7741 [t51] 0.1964 [t52]
var6 0.6778 [t61] 0.3126 [t62]

Factor Covariance Matrix
  factor1 factor2
factor1 1.0000 0
factor2 0 1.0000

Error Variances
Variable Parameter Estimate
var1 k1 0.49119
var2 k2 0.46780
var3 k3 0.51597
var4 k4 0.50070
var5 k5 0.35505
var6 k6 0.47685


The parameter estimates of the embedded correlation structures are shown in Output 27.27.6 as additional parameters.

Output 27.27.6: Estimates of Correlation Structure Parameters

Additional Parameters
Type Parameter Estimate
Independent alpha 0.97400
  d1 1.00771
  d2 0.99712
  d3 0.99078
  d4 0.99085
  d5 0.99640
  d6 1.01687
  b11 0.34217
  b21 0.32095
  b31 0.49179
  b41 0.57553
  b51 0.77686
  b61 0.66659
Dependent b12 0.63183
  b22 0.65305
  b32 0.48222
  b42 0.39848
  b52 0.19714
  b62 0.30742
  psi1 0.48371
  psi2 0.47051
  psi3 0.52561
  psi4 0.50998
  psi5 0.35762
  psi6 0.46116
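
The rescaling relations between the two sets of estimates can be checked by hand. The following DATA step is a minimal sketch, not part of the original example: the data set name rescale_check and the variable names d, b1, b2, psi, t1, t2, and k are made up for illustration, and the values are copied from Output 27.27.6.

```sas
/* Sketch: rebuild the covariance structure estimates (T and K) from   */
/* the correlation structure estimates. Data set and variable names    */
/* are hypothetical; values are copied from Output 27.27.6.            */
data rescale_check;
   input variable $ d b1 b2 psi;
   t1 = d * b1;          /* loading on factor1: t_j1 = d_j * b_j1      */
   t2 = d * b2;          /* loading on factor2: t_j2 = d_j * b_j2      */
   k  = d * d * psi;     /* error variance:     k_jj = d_j**2 * psi_jj */
   datalines;
var1 1.00771 0.34217 0.63183 0.48371
var2 0.99712 0.32095 0.65305 0.47051
var3 0.99078 0.49179 0.48222 0.52561
var4 0.99085 0.57553 0.39848 0.50998
var5 0.99640 0.77686 0.19714 0.35762
var6 1.01687 0.66659 0.30742 0.46116
;
proc print data=rescale_check noobs;
   var variable t1 t2 k;
   format t1 t2 8.4 k 8.5;
run;
```

The printed t1, t2, and k columns should agree with the loadings and error variances in Output 27.27.5, apart from small differences in the last digit that arise because the inputs here are already rounded.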


Except for the population standard deviation parameter d’s, all other parameters estimated in the current model can be compared with those from the previous fitting of an incorrect covariance structure model. Although estimates in the current model do not differ very much from those in the previous specification, it is at least reassuring that they are obtained from fitting a correctly specified covariance structure model with the intended correlation structures embedded.