The GENMOD Procedure

Generalized Estimating Equations

This section illustrates the use of the REPEATED statement to fit a GEE model, using repeated measures data from the Six Cities study of the health effects of air pollution (Ware et al., 1984). The data analyzed are the 16 selected cases in Lipsitz et al. (1994). The binary response is the wheezing status of each of the 16 children at ages 9, 10, 11, and 12 years. The mean response is modeled by logistic regression on three explanatory variables: city of residence, age, and maternal smoking status at that age. The binary responses within each child are assumed to be equally correlated, which implies an exchangeable correlation structure.

The data set and SAS statements that fit the model by the GEE method are as follows:

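data six;
   /* Each data line holds one child: case ID, city, then four     */
   /* (age, smoke, wheeze) triples. The trailing @@ holds the line */
   /* so that later INPUT statements keep reading from it.         */
   input case city$ @@;
   do i=1 to 4;
      input age smoke wheeze @@;
      output;   /* one observation per child per age */
   end;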
   datalines;
 1 portage   9 0 1  10 0 1  11 0 1  12 0 0
 2 kingston  9 1 1  10 2 1  11 2 0  12 2 0
 3 kingston  9 0 1  10 0 0  11 1 0  12 1 0
 4 portage   9 0 0  10 0 1  11 0 1  12 1 0
 5 kingston  9 0 0  10 1 0  11 1 0  12 1 0
 6 portage   9 0 0  10 1 0  11 1 0  12 1 0
 7 kingston  9 1 0  10 1 0  11 0 0  12 0 0
 8 portage   9 1 0  10 1 0  11 1 0  12 2 0
 9 portage   9 2 1  10 2 0  11 1 0  12 1 0
10 kingston  9 0 0  10 0 0  11 0 0  12 1 0
11 kingston  9 1 1  10 0 0  11 0 1  12 0 1
12 portage   9 1 0  10 0 0  11 0 0  12 0 0
13 kingston  9 1 0  10 0 1  11 1 1  12 1 1
14 portage   9 1 0  10 2 0  11 1 0  12 2 1
15 kingston  9 1 0  10 1 0  11 1 0  12 2 1
16 portage   9 1 1  10 1 1  11 2 0  12 1 0
;

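proc genmod data=six;
   class case city;
   /* logistic regression for the mean of wheeze */
   model wheeze = city age smoke / dist=bin;
   /* GEE with an exchangeable working correlation, clustered by case */
   repeated subject=case / type=exch covb corrw;
run;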

The CLASS statement and the MODEL statement specify the model for the mean of the response variable wheeze as a logistic regression with city, age, and smoke as independent variables, just as for an ordinary logistic regression.
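Written out, the mean model can be sketched as follows. The notation is an assumption introduced here for illustration: \mu_{ij} denotes the mean of the binary response for child i at measurement j, and city_i is an indicator for kingston (portage is the reference level in Figure 40.30). Which response level is treated as the event depends on the procedure's response-level ordering, which this section does not state.

```latex
\operatorname{logit}(\mu_{ij})
  = \log\frac{\mu_{ij}}{1-\mu_{ij}}
  = \beta_0 + \beta_1\,\mathrm{city}_i
            + \beta_2\,\mathrm{age}_{ij}
            + \beta_3\,\mathrm{smoke}_{ij},
\qquad i = 1,\dots,16,\quad j = 1,\dots,4
```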

The REPEATED statement invokes the GEE method, specifies the correlation structure, and controls the displayed output from the GEE model. The option SUBJECT=CASE specifies that individual subjects be identified in the input data set by the variable case. The SUBJECT= variable case must be listed in the CLASS statement. Measurements on individual subjects at ages 9, 10, 11, and 12 are in the proper order in the data set, so the WITHINSUBJECT= option is not required. The TYPE=EXCH option specifies an exchangeable working correlation structure, the COVB option specifies that the parameter estimate covariance matrix be displayed, and the CORRW option specifies that the final working correlation be displayed.
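If the measurements were not already in order within each subject, the ordering could be named explicitly. The following is a minimal sketch, not part of the original example: it assumes that the DO-loop index i written by the DATA step (values 1 through 4) identifies the measurement occasion, and it lists i in the CLASS statement because the within-subject effect is a classification effect.

```sas
/* Sketch: state the within-subject ordering explicitly.          */
/* Assumption: i (1-4) from the DATA step marks the occasion.     */
proc genmod data=six;
   class case city i;
   model wheeze = city age smoke / dist=bin;
   repeated subject=case / withinsubject=i type=exch covb corrw;
run;
```

With complete data and an exchangeable structure, the ordering should not change the fit. It matters more for order-dependent structures or when some measurements are missing.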

Initial parameter estimates for iterative fitting of the GEE model are computed as in an ordinary generalized linear model, as described previously. Results of the initial model fit displayed as part of the generated output are not shown here. Statistics for the initial model fit such as parameter estimates, standard errors, deviances, and Pearson chi-squares do not apply to the GEE model and are valid only for the initial model fit. The following figures display information that applies to the GEE model fit.

Figure 40.27 displays general information about the GEE model fit.

Figure 40.27: GEE Model Information

The GENMOD Procedure

GEE Model Information
Correlation Structure           Exchangeable
Subject Effect                  case (16 levels)
Number of Clusters              16
Correlation Matrix Dimension    4
Maximum Cluster Size            4
Minimum Cluster Size            4


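Figure 40.28 displays the parameter estimate covariance matrices requested by the COVB option. Both model-based and empirical covariances are produced. The diagonal entries of the empirical matrix are the squares of the standard errors in Figure 40.30, so Prm1, Prm2, Prm4, and Prm5 correspond to the intercept, city kingston, age, and smoke. Prm3, the reference level city portage, is fixed at zero and does not appear in the matrices.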

Figure 40.28: GEE Parameter Estimate Covariance Matrices

Covariance Matrix (Model-Based)
            Prm1        Prm2        Prm4        Prm5
Prm1     5.74947    -0.22257    -0.53472     0.01655
Prm2    -0.22257     0.45478   -0.002410     0.01876
Prm4    -0.53472   -0.002410     0.05300    -0.01658
Prm5     0.01655     0.01876    -0.01658     0.19104

Covariance Matrix (Empirical)
            Prm1        Prm2        Prm4        Prm5
Prm1     9.33994    -0.85104    -0.83253    -0.16534
Prm2    -0.85104     0.47368     0.05736     0.04023
Prm4    -0.83253     0.05736     0.07778   -0.002364
Prm5    -0.16534     0.04023   -0.002364     0.13051


The exchangeable working correlation matrix specified by the CORRW option is displayed in Figure 40.29.

Figure 40.29: GEE Working Correlation Matrix

Working Correlation Matrix
          Col1      Col2      Col3      Col4
Row1    1.0000    0.1648    0.1648    0.1648
Row2    0.1648    1.0000    0.1648    0.1648
Row3    0.1648    0.1648    1.0000    0.1648
Row4    0.1648    0.1648    0.1648    1.0000
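
The constant off-diagonal value is what the exchangeable structure implies: a single correlation parameter shared by every pair of measurements on the same child. As a sketch, with \alpha as assumed notation for that parameter and Y_{ij} for the response of child i at measurement j, the estimate shown in the matrix corresponds to:

```latex
\operatorname{Corr}(Y_{ij}, Y_{ik}) =
\begin{cases}
1,      & j = k \\
\alpha, & j \neq k
\end{cases}
\qquad \text{with } \hat{\alpha} = 0.1648
```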


The parameter estimates table, displayed in Figure 40.30, contains parameter estimates, standard errors, confidence intervals, Z scores, and p-values for the parameter estimates. Empirical standard error estimates are used in this table. A table that displays model-based standard errors can be created by using the REPEATED statement option MODELSE.
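
As a minimal sketch of that request, the same model can be refit with MODELSE added to the REPEATED statement. The COVB and CORRW options are dropped here only to keep the output short.

```sas
/* Sketch: also request the table of model-based standard errors. */
proc genmod data=six;
   class case city;
   model wheeze = city age smoke / dist=bin;
   repeated subject=case / type=exch modelse;
run;
```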

Figure 40.30: GEE Parameter Estimates Table

Analysis Of GEE Parameter Estimates
Empirical Standard Error Estimates

                                 Standard      95% Confidence
Parameter              Estimate     Error              Limits       Z  Pr > |Z|
Intercept               -1.2751    3.0561   -7.2650    4.7148   -0.42    0.6765
city       kingston     -0.1223    0.6882   -1.4713    1.2266   -0.18    0.8589
city       portage       0.0000    0.0000    0.0000    0.0000       .         .
age                      0.2036    0.2789   -0.3431    0.7502    0.73    0.4655
smoke                    0.0935    0.3613   -0.6145    0.8016    0.26    0.7957
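
The entries in this table appear consistent with the usual Wald calculations from the empirical covariance matrix in Figure 40.28: the standard error is the square root of the diagonal variance, Z is the estimate divided by its standard error, and the confidence limits are the estimate plus or minus a standard normal quantile times the standard error. The following DATA step is a sketch that checks this for the city kingston row. The variable names are illustrative, and because the inputs are rounded display values, the results can differ from the table in the last digit.

```sas
/* Sketch: recompute the city kingston row from displayed values. */
data _null_;
   est = -0.1223;                 /* estimate, Figure 40.30            */
   v   = 0.47368;                 /* empirical variance of Prm2, 40.28 */
   se  = sqrt(v);                 /* empirical standard error          */
   z   = est / se;                /* Wald Z statistic                  */
   q   = probit(0.975);           /* standard normal 97.5th percentile */
   lower = est - q*se;            /* lower 95% confidence limit        */
   upper = est + q*se;            /* upper 95% confidence limit        */
   p   = 2*(1 - probnorm(abs(z)));   /* two-sided p-value              */
   put se= 8.4 z= 8.2 lower= 8.4 upper= 8.4 p= 8.4;
run;
```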