This section illustrates the use of the REPEATED statement to fit a GEE model, using repeated measures data from the “Six Cities” study of the health effects of air pollution (Ware et al., 1984). The data analyzed are the 16 selected cases in Lipsitz et al. (1994). The binary response is the wheezing status of 16 children at ages 9, 10, 11, and 12 years. The mean response is modeled by logistic regression, with city of residence, age, and maternal smoking status at the particular age as explanatory variables. The binary responses for an individual child are assumed to be equally correlated, which implies an exchangeable correlation structure.
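The model just described can be written out as a sketch. The symbols here are notation assumed for illustration and do not appear in the original text: $i$ indexes the 16 children, $j$ indexes the four ages, $\mu_{ij}$ is the mean of the binary response $Y_{ij}$, and $\alpha$ is the common within-child correlation.

```latex
\operatorname{logit}(\mu_{ij})
  = \beta_0 + \beta_1\,\mathrm{city}_i + \beta_2\,\mathrm{age}_{ij} + \beta_3\,\mathrm{smoke}_{ij},
\qquad
\operatorname{Corr}(Y_{ij}, Y_{ik}) = \alpha \quad \text{for } j \neq k .
```

City is constant within a child in these data, whereas age and maternal smoking status vary from one measurement to the next.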
The data set and SAS statements that fit the model by the GEE method are as follows:
data six;
   input case city$ @@;
   do i=1 to 4;
      input age smoke wheeze @@;
      output;
   end;
   datalines;
 1 portage   9 0 1  10 0 1  11 0 1  12 0 0
 2 kingston  9 1 1  10 2 1  11 2 0  12 2 0
 3 kingston  9 0 1  10 0 0  11 1 0  12 1 0
 4 portage   9 0 0  10 0 1  11 0 1  12 1 0
 5 kingston  9 0 0  10 1 0  11 1 0  12 1 0
 6 portage   9 0 0  10 1 0  11 1 0  12 1 0
 7 kingston  9 1 0  10 1 0  11 0 0  12 0 0
 8 portage   9 1 0  10 1 0  11 1 0  12 2 0
 9 portage   9 2 1  10 2 0  11 1 0  12 1 0
10 kingston  9 0 0  10 0 0  11 0 0  12 1 0
11 kingston  9 1 1  10 0 0  11 0 1  12 0 1
12 portage   9 1 0  10 0 0  11 0 0  12 0 0
13 kingston  9 1 0  10 0 1  11 1 1  12 1 1
14 portage   9 1 0  10 2 0  11 1 0  12 2 1
15 kingston  9 1 0  10 1 0  11 1 0  12 2 1
16 portage   9 1 1  10 1 1  11 2 0  12 1 0
;
proc genmod data=six;
   class case city;
   model wheeze = city age smoke / dist=bin;
   repeated subject=case / type=exch covb corrw;
run;
The CLASS statement and the MODEL statement specify the model for the mean of the response variable wheeze as a logistic regression with city, age, and smoke as independent variables, just as for an ordinary logistic regression. The DIST=BIN option requests the binomial distribution, whose default link function is the logit.
The REPEATED statement invokes the GEE method, specifies the correlation structure, and controls the displayed output from the GEE model. The option SUBJECT=CASE specifies that individual subjects be identified in the input data set by the variable case. The SUBJECT= variable case must be listed in the CLASS statement. Measurements on individual subjects at ages 9, 10, 11, and 12 are in the proper order in the data set, so the WITHINSUBJECT= option is not required. The TYPE=EXCH option specifies an exchangeable working correlation structure, the COVB option specifies that the parameter estimate covariance matrix be displayed, and the CORRW option specifies that the final working correlation be displayed.
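If the measurements were not already in age order within each child, the WITHINSUBJECT= option could supply the ordering. The following is a minimal sketch, not part of the original example. It assumes that the loop index i (values 1 to 4, written to the data set by the DATA step) is an acceptable within-subject effect. Like the SUBJECT= variable, the within-subject variable must be listed in the CLASS statement.

```sas
proc genmod data=six;
   class case city i;   /* i is the measurement index 1-4 from the DATA step */
   model wheeze = city age smoke / dist=bin;
   repeated subject=case / withinsubject=i type=exch covb corrw;
run;
```

The variable i is used rather than age because age enters the model as a continuous covariate and so cannot also be listed in the CLASS statement.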
Initial parameter estimates for iterative fitting of the GEE model are computed as in an ordinary generalized linear model, as described previously. Results of the initial model fit displayed as part of the generated output are not shown here. Statistics for the initial model fit such as parameter estimates, standard errors, deviances, and Pearson chi-squares do not apply to the GEE model and are valid only for the initial model fit. The following figures display information that applies to the GEE model fit.
Figure 40.27 displays general information about the GEE model fit.
Figure 40.27: GEE Model Information
GEE Model Information | |
---|---|
Correlation Structure | Exchangeable |
Subject Effect | case (16 levels) |
Number of Clusters | 16 |
Correlation Matrix Dimension | 4 |
Maximum Cluster Size | 4 |
Minimum Cluster Size | 4 |
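Figure 40.28 displays the parameter estimate covariance matrices specified by the COVB option. Both model-based and empirical covariances are produced. Prm3, the parameter for the reference level of city (portage), is fixed at zero and therefore does not appear in either matrix.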
Figure 40.28: GEE Parameter Estimate Covariance Matrices
Covariance Matrix (Model-Based)

| | Prm1 | Prm2 | Prm4 | Prm5 |
---|---|---|---|---|
Prm1 | 5.74947 | -0.22257 | -0.53472 | 0.01655 |
Prm2 | -0.22257 | 0.45478 | -0.002410 | 0.01876 |
Prm4 | -0.53472 | -0.002410 | 0.05300 | -0.01658 |
Prm5 | 0.01655 | 0.01876 | -0.01658 | 0.19104 |

Covariance Matrix (Empirical)

| | Prm1 | Prm2 | Prm4 | Prm5 |
---|---|---|---|---|
Prm1 | 9.33994 | -0.85104 | -0.83253 | -0.16534 |
Prm2 | -0.85104 | 0.47368 | 0.05736 | 0.04023 |
Prm4 | -0.83253 | 0.05736 | 0.07778 | -0.002364 |
Prm5 | -0.16534 | 0.04023 | -0.002364 | 0.13051 |
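
For orientation, the two covariance estimators in Figure 40.28 can be sketched in the standard GEE form. The notation is assumed here rather than taken from this section: $K$ is the number of subjects, $\mathbf{D}_i = \partial\boldsymbol{\mu}_i/\partial\boldsymbol{\beta}$, and $\mathbf{V}_i$ is the working covariance matrix of the response vector $\mathbf{Y}_i$ for subject $i$.

```latex
\Sigma_m = \mathbf{I}_0^{-1},
\qquad
\mathbf{I}_0 = \sum_{i=1}^{K} \mathbf{D}_i' \mathbf{V}_i^{-1} \mathbf{D}_i
\\[6pt]
\Sigma_e = \mathbf{I}_0^{-1}\,\mathbf{I}_1\,\mathbf{I}_0^{-1},
\qquad
\mathbf{I}_1 = \sum_{i=1}^{K} \mathbf{D}_i' \mathbf{V}_i^{-1}
   \,\widehat{\operatorname{Cov}}(\mathbf{Y}_i)\,
   \mathbf{V}_i^{-1} \mathbf{D}_i,
\qquad
\widehat{\operatorname{Cov}}(\mathbf{Y}_i)
   = (\mathbf{Y}_i - \hat{\boldsymbol{\mu}}_i)(\mathbf{Y}_i - \hat{\boldsymbol{\mu}}_i)'
```

The model-based estimator $\Sigma_m$ relies on the working correlation being correct, whereas the empirical ("sandwich") estimator $\Sigma_e$ remains consistent when it is misspecified. This is why the two matrices can differ noticeably in a small sample such as this one.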
The exchangeable working correlation matrix specified by the CORRW option is displayed in Figure 40.29.
Figure 40.29: GEE Working Correlation Matrix
Working Correlation Matrix

| | Col1 | Col2 | Col3 | Col4 |
---|---|---|---|---|
Row1 | 1.0000 | 0.1648 | 0.1648 | 0.1648 |
Row2 | 0.1648 | 1.0000 | 0.1648 | 0.1648 |
Row3 | 0.1648 | 0.1648 | 1.0000 | 0.1648 |
Row4 | 0.1648 | 0.1648 | 0.1648 | 1.0000 |
The parameter estimates table, displayed in Figure 40.30, contains parameter estimates, standard errors, confidence intervals, Z scores, and p-values for the parameter estimates. Empirical standard error estimates are used in this table. A table that displays model-based standard errors can be created by using the REPEATED statement option MODELSE.
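As a brief sketch of the MODELSE option mentioned above, the same model can be refit with that option added to the REPEATED statement. The COVB and CORRW options are dropped here only to keep the output short; that choice is an assumption of this illustration and not part of the original example.

```sas
proc genmod data=six;
   class case city;
   model wheeze = city age smoke / dist=bin;
   /* MODELSE requests an additional parameter estimates table */
   /* that is based on model-based standard errors             */
   repeated subject=case / type=exch modelse;
run;
```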
Figure 40.30: GEE Parameter Estimates Table
Analysis Of GEE Parameter Estimates (Empirical Standard Error Estimates)

Parameter | Level | Estimate | Standard Error | Lower 95% Confidence Limit | Upper 95% Confidence Limit | Z | Pr > \|Z\| |
---|---|---|---|---|---|---|---|
Intercept | | -1.2751 | 3.0561 | -7.2650 | 4.7148 | -0.42 | 0.6765 |
city | kingston | -0.1223 | 0.6882 | -1.4713 | 1.2266 | -0.18 | 0.8589 |
city | portage | 0.0000 | 0.0000 | 0.0000 | 0.0000 | . | . |
age | | 0.2036 | 0.2789 | -0.3431 | 0.7502 | 0.73 | 0.4655 |
smoke | | 0.0935 | 0.3613 | -0.6145 | 0.8016 | 0.26 | 0.7957 |
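
The entries in Figure 40.30 follow from the empirical covariance matrix in Figure 40.28. As a hedged check rather than part of the original example, the following DATA step recomputes the age row from the rounded displayed values: the estimate 0.2036 and the empirical variance 0.07778 of Prm4, whose square root matches the age standard error. The variable names are chosen for illustration only.

```sas
data _null_;
   est      = 0.2036;    /* age estimate, Figure 40.30               */
   variance = 0.07778;   /* empirical variance of Prm4, Figure 40.28 */
   se    = sqrt(variance);              /* standard error            */
   z     = est / se;                    /* Z score                   */
   zcrit = probit(0.975);               /* normal 97.5th percentile  */
   lower = est - zcrit*se;              /* lower 95% limit           */
   upper = est + zcrit*se;              /* upper 95% limit           */
   p     = 2*(1 - probnorm(abs(z)));    /* two-sided p-value         */
   put se= 8.4 z= 8.2 lower= 8.4 upper= 8.4 p= 8.4;
run;
```

Because the inputs are the rounded values shown in the figures, the results written to the log should agree with the age row of the table only to within rounding in the last displayed digit.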