This section illustrates some of the basic features of the GEE procedure by analyzing longitudinal data from Stokes, Davis, and Koch (2012).
In this study, researchers followed 25 children at ages 8, 9, 10, and 11 years to investigate the health effects of air pollution on children. The binary response is the wheezing status of each child at the four ages. The explanatory variables are age, city, and a passive smoking index (with values 0, 1, and 2) that represents the degree of smoking in the home. The responses for an individual child are assumed to be equally correlated, which implies an exchangeable correlation structure.
The following statements create the data set Children:
    data Children;
       input ID City$ @@;
       do i=1 to 4;
          input Age Smoke Symptom @@;
          output;
       end;
       datalines;
     1 steelcity  8 0 1  9 0 1  10 0 1  11 0 0
     2 steelcity  8 2 1  9 2 1  10 2 1  11 1 0
     3 steelcity  8 2 1  9 2 0  10 1 0  11 0 0
     4 greenhills 8 0 0  9 1 1  10 1 1  11 0 0
     5 steelcity  8 0 0  9 1 0  10 1 0  11 1 0
     6 greenhills 8 0 1  9 0 0  10 0 0  11 0 1
     7 steelcity  8 1 1  9 1 1  10 0 1  11 0 0
     8 greenhills 8 1 0  9 1 0  10 1 0  11 2 0
     9 greenhills 8 2 1  9 2 0  10 1 1  11 1 0
    10 steelcity  8 0 0  9 0 0  10 0 0  11 1 0
    11 steelcity  8 1 1  9 0 0  10 0 0  11 0 1
    12 greenhills 8 0 0  9 0 0  10 0 0  11 0 0
    13 steelcity  8 2 1  9 2 1  10 1 0  11 0 1
    14 greenhills 8 0 1  9 0 1  10 0 0  11 0 0
    15 steelcity  8 2 0  9 0 0  10 0 0  11 2 1
    16 greenhills 8 1 0  9 1 0  10 0 0  11 1 0
    17 greenhills 8 0 0  9 0 1  10 0 1  11 1 1
    18 steelcity  8 1 1  9 2 1  10 0 0  11 1 0
    19 steelcity  8 2 1  9 1 0  10 0 1  11 0 0
    20 greenhills 8 0 0  9 0 1  10 0 1  11 0 0
    21 steelcity  8 1 0  9 1 0  10 1 0  11 2 1
    22 greenhills 8 0 1  9 0 1  10 0 0  11 0 0
    23 steelcity  8 1 1  9 1 0  10 0 1  11 0 0
    24 greenhills 8 1 0  9 1 1  10 1 1  11 2 1
    25 greenhills 8 0 1  9 0 0  10 0 0  11 0 0
    ;
The following statements fit the model by the GEE method:
    proc gee data=Children descending;
       class ID City;
       model Symptom = City Age Smoke / dist=bin link=logit;
       repeated subject=ID / type=exch covb corrw;
    run;
Both the MODEL statement and the REPEATED statement are required.
The DIST=BIN and LINK=LOGIT options in the MODEL statement request a logistic regression with the variable Symptom as the response and City, Age, and Smoke as explanatory variables.
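Written out, the marginal model that these options request can be sketched as the following equation. The subscripts i (child) and j (measurement occasion) and the β symbols are notation introduced here for illustration only. The sketch assumes that the DESCENDING option makes Symptom = 1 the modeled event and that steelcity is the reference level of City, as the parameter estimates table suggests.

```latex
% Illustrative notation (not from the original text):
%   i indexes children, j indexes the four measurement occasions;
%   the indicator is 1 for greenhills and 0 for steelcity (assumed reference level).
\operatorname{logit}\,\Pr(\mathrm{Symptom}_{ij} = 1)
  = \beta_0
  + \beta_1\,\mathbf{1}[\mathrm{City}_i = \text{greenhills}]
  + \beta_2\,\mathrm{Age}_{ij}
  + \beta_3\,\mathrm{Smoke}_{ij}
```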
The REPEATED statement specifies the correlation structure and requests various tables in the output. The SUBJECT=ID option specifies that individual subjects are identified in the input data set by the variable ID, which must be listed in the CLASS statement. Measurements of individual subjects at ages 8, 9, 10, and 11 are in the proper order in the data set, so the WITHIN= option is not required. The TYPE=EXCH option specifies an exchangeable working correlation structure, the COVB option requests the parameter estimate covariance matrix, and the CORRW option requests the working correlation matrix.
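If the records were not already in age order within each child, one possible way to handle the ordering is sketched below. This is an illustration under stated assumptions, not part of the original example. The data set name ChildrenOrdered and the variable Visit are hypothetical, and the sketch assumes that the WITHIN= effect must be a CLASS variable, which is why a copy of Age is used so that Age itself stays continuous in the model.

```sas
/* Hypothetical sketch: copy Age into a separate ordering variable. */
/* ChildrenOrdered and Visit are illustrative names only.           */
data ChildrenOrdered;
   set Children;
   Visit = Age;   /* used only to order measurements within a child */
run;

proc gee data=ChildrenOrdered descending;
   class ID City Visit;
   model Symptom = City Age Smoke / dist=bin link=logit;
   /* WITHIN=Visit tells the procedure how to order each child's records */
   repeated subject=ID / within=Visit type=exch covb corrw;
run;
```

A simpler alternative in many cases is to sort the data by ID and Age with PROC SORT before calling PROC GEE.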
Figure 42.1 shows the "Model Information" table, which provides information about the specified logistic regression model and the input data set.
Figure 42.2 displays general information about the GEE analysis. Each subject has four measurements.
Figure 42.3 displays the model-based and empirical covariance matrices of the parameter estimates.
The exchangeable working correlation matrix is displayed in Figure 42.4.
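For reference, an exchangeable working correlation matrix for four measurements per subject has the following general form. The symbol α for the common correlation is notation assumed here for illustration; the estimated value is the one that the procedure reports in the figure.

```latex
% Exchangeable structure: every pair of measurements on the same child
% shares a single correlation parameter \alpha (illustrative symbol).
R(\alpha) =
\begin{pmatrix}
1      & \alpha & \alpha & \alpha \\
\alpha & 1      & \alpha & \alpha \\
\alpha & \alpha & 1      & \alpha \\
\alpha & \alpha & \alpha & 1
\end{pmatrix}
```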
The parameter estimates table, shown in Figure 42.5, contains parameter estimates, standard errors, confidence intervals, Z scores, and p-values for the parameter estimates. Empirical standard error estimates are used in this table. You can create a table that uses model-based standard errors by specifying the MODELSE option in the REPEATED statement. The results indicate that smoking exposure is significant with a p-value of 0.0211, Age is marginally influential with a p-value of 0.0893, and City does not influence wheezing. The parameter estimate for Age is -0.3201, which indicates that the odds ratio of wheezing for children who are one year older, compared with younger children, is exp(-0.3201) ≈ 0.726.
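Because the model uses a logit link, an estimate is converted to an odds ratio by exponentiating it, and the same applies to its confidence limits. The following DATA step is a minimal sketch of that arithmetic, with the values copied by hand from the Age row of Figure 42.5. The variable names are illustrative and are not produced by PROC GEE.

```sas
/* Illustrative arithmetic only: values are typed in from Figure 42.5. */
data _null_;
   Estimate = -0.3201;   /* Age estimate on the logit scale */
   Lower    = -0.6894;   /* lower 95% confidence limit      */
   Upper    =  0.0492;   /* upper 95% confidence limit      */

   OddsRatio = exp(Estimate);
   ORLower   = exp(Lower);
   ORUpper   = exp(Upper);

   /* Write the results to the SAS log */
   put OddsRatio= 6.4 ORLower= 6.4 ORUpper= 6.4;
run;
```

The odds ratio comes out at roughly 0.73, with limits of about 0.50 and 1.05. The interval contains 1, which is consistent with the marginal p-value of 0.0893.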
Figure 42.5: GEE Parameter Estimates Table
Parameter Estimates for Response Model with Empirical Standard Error

Parameter | | Estimate | Standard Error | 95% Lower Confidence Limit | 95% Upper Confidence Limit | Z | Pr > \|Z\|
---|---|---|---|---|---|---|---
Intercept | | 2.2615 | 2.0243 | -1.7060 | 6.2290 | 1.12 | 0.2639
City | greenhil | 0.0418 | 0.5435 | -1.0234 | 1.1070 | 0.08 | 0.9387
City | steelcit | 0.0000 | 0.0000 | 0.0000 | 0.0000 | . | .
Age | | -0.3201 | 0.1884 | -0.6894 | 0.0492 | -1.70 | 0.0893
Smoke | | 0.6506 | 0.2821 | 0.0978 | 1.2035 | 2.31 | 0.0211
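To compare these empirical standard errors with model-based ones, you could rerun the analysis with the MODELSE option in the REPEATED statement. The following statements are a sketch of that variation. They reuse the original model unchanged and omit the COVB and CORRW options only for brevity.

```sas
/* Same model as before; MODELSE requests a parameter estimates  */
/* table that is based on model-based standard errors.           */
proc gee data=Children descending;
   class ID City;
   model Symptom = City Age Smoke / dist=bin link=logit;
   repeated subject=ID / type=exch modelse;
run;
```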
Goodness-of-fit criteria for the model are displayed in Figure 42.6. For more information about the quasi-likelihood information criterion (QIC), see the section Quasi-likelihood Information Criterion.