The GLIMMIX Procedure

Example 41.12 Fitting a Marginal (GEE-Type) Model

A marginal GEE-type model for clustered data is a model for correlated data that is specified through a mean function, a variance function, and a working covariance structure. Because the assumed covariance structure can be wrong, the covariance matrix of the parameter estimates is not based on the model alone. Rather, one of the empirical (sandwich) estimators is used to make inferences robust against the choice of working covariance structure. PROC GLIMMIX can fit marginal models by using R-side random effects and drawing on the distributional specification in the MODEL statement to derive the link and variance functions. The EMPIRICAL= option in the PROC GLIMMIX statement enables you to choose one of a number of empirical covariance estimators.

The data for this example are from Thall and Vail (1990) and reflect the number of seizures of patients suffering from epileptic episodes. After an eight-week period without treatment, patients were observed four times in two-week intervals during which they received a placebo or the drug Progabide in addition to other therapy. These data are also analyzed in Example 40.7 of Chapter 40: The GENMOD Procedure. The following DATA step creates the data set seizures. The variable id identifies the subjects in the study, and the variable trt identifies whether a subject received the placebo (trt = 0) or the drug Progabide (trt = 1). The variable x1 takes on value 0 for the baseline measurement and 1 otherwise.

data seizures;
   array c{5};
   input id trt c1-c5;
   do i=1 to 5;
      x1    = (i > 1);
      ltime = (i=1)*log(8) + (i ne 1)*log(2);
      cnt   = c{i};
      output;
   end;
   keep id cnt x1 trt ltime;
   datalines;
101 1  76 11 14  9  8
102 1  38  8  7  9  4
103 1  19  0  4  3  0
104 0  11  5  3  3  3
106 0  11  3  5  3  3
107 0   6  2  4  0  5
108 1  10  3  6  1  3
110 1  19  2  6  7  4
111 1  24  4  3  1  3
112 1  31 22 17 19 16
113 1  14  5  4  7  4
114 0   8  4  4  1  4
116 0  66  7 18  9 21
117 1  11  2  4  0  4
118 0  27  5  2  8  7
121 1  67  3  7  7  7
122 1  41  4 18  2  5
123 0  12  6  4  0  2
124 1   7  2  1  1  0
126 0  52 40 20 23 12
128 1  22  0  2  4  0
129 1  13  5  4  0  3
130 0  23  5  6  6  5
135 0  10 14 13  6  0
137 1  46 11 14 25 15
139 1  36 10  5  3  8
141 0  52 26 12  6 22
143 1  38 19  7  6  7
145 0  33 12  6  8  4
147 1   7  1  1  2  3
201 0  18  4  4  6  2
202 0  42  7  9 12 14
203 1  36  6 10  8  8
204 1  11  2  1  0  0
205 0  87 16 24 10  9
206 0  50 11  0  0  5
208 1  22  4  3  2  4
209 1  41  8  6  5  7
210 0  18  0  0  3  3
211 1  32  1  3  1  5
213 0 111 37 29 28 29
214 1  56 18 11 28 13
215 0  18  3  5  2  5
217 0  20  3  0  6  7
218 1  24  6  3  4  0
219 0  12  3  4  3  4
220 0   9  3  4  3  4
221 1  16  3  5  4  3
222 0  17  2  3  3  5
225 1  22  1 23 19  8
226 0  28  8 12  2  8
227 0  55 18 24 76 25
228 1  25  2  3  0  1
230 0   9  2  1  2  1
232 1  13  0  0  0  0
234 0  10  3  1  4  2
236 1  12  1  4  3  2
238 0  47 13 15 13 12
;

The model fit initially with the following PROC GLIMMIX statements is a Poisson generalized linear model with effects for an intercept, the baseline measurement, the treatment, and their interaction:

proc glimmix data=seizures;
   model cnt = x1 trt x1*trt / dist=poisson offset=ltime
                               ddfm=none s;
run;

The DDFM=NONE option is chosen in the MODEL statement to produce chi-square and z tests instead of F and t tests.

Because the initial pretreatment time period is four times as long as the subsequent measurement intervals, an offset variable is used to standardize the counts. If $Y_{ij}$ denotes the number of seizures of subject i in time interval j of length $t_ j$, then $Y_{ij}/t_ j$ is the number of seizures per time unit. Modeling the average number per time unit with a log link leads to $\log \{ \mr {E}[Y_{ij}/t_ j]\}  = \mb {x}’\bbeta $ or $\log \{ \mr {E}[Y_{ij}]\}  = \mb {x}’\bbeta + \log \{ t_ j\} $. The logarithm of time (variable ltime) thus serves as an offset. Suppose that $\beta _0$ denotes the intercept, $\beta _1$ the effect of x1, and $\beta _2$ the effect of trt. Then $\exp \{ \beta _0\} $ is the expected number of seizures per week in the placebo group at baseline. The corresponding numbers in the treatment group are $\exp \{ \beta _0 + \beta _2\} $ at baseline and $\exp \{ \beta _0 + \beta _1 + \beta _2\} $ for postbaseline visits.

The Model Information table shows that the parameters in this Poisson model are estimated by maximum likelihood (Output 41.12.1). In addition to the default link and variance function, the variable ltime is used as an offset.

Output 41.12.1: Model Information in Poisson GLM

The GLIMMIX Procedure

Model Information
Data Set WORK.SEIZURES
Response Variable cnt
Response Distribution Poisson
Link Function Log
Variance Function Default
Offset Variable ltime
Variance Matrix Diagonal
Estimation Technique Maximum Likelihood
Degrees of Freedom Method None


Fit statistics and parameter estimates are shown in Output 41.12.2.

Output 41.12.2: Results from Fitting Poisson GLM

Fit Statistics
-2 Log Likelihood 3442.66
AIC (smaller is better) 3450.66
AICC (smaller is better) 3450.80
BIC (smaller is better) 3465.34
CAIC (smaller is better) 3469.34
HQIC (smaller is better) 3456.54
Pearson Chi-Square 3015.16
Pearson Chi-Square / DF 10.54

Parameter Estimates
Effect Estimate Standard Error DF t Value Pr > |t|
Intercept 1.3476 0.03406 Infty 39.57 <.0001
x1 0.1108 0.04689 Infty 2.36 0.0181
trt -0.1080 0.04865 Infty -2.22 0.0264
x1*trt -0.3016 0.06975 Infty -4.32 <.0001


Because this is a generalized linear model, the large value for the ratio of the Pearson chi-square statistic and its degrees of freedom is indicative of a model shortcoming. The data are considerably more dispersed than is expected under a Poisson model. There could be many reasons for this overdispersion—for example, a misspecified mean model, data that might not be Poisson distributed, an incorrect variance function, and correlations among the observations. Because these data are repeated measurements, the presence of correlations among the observations from the same subject is a likely contributor to the overdispersion.

The following PROC GLIMMIX statements fit a marginal model with correlations. The model is a marginal one, because no G-side random effects are specified on which the distribution could be conditioned. The choice of the id variable as the SUBJECT effect indicates that observations from different IDs are uncorrelated. Observations from the same ID are assumed to follow a compound symmetry (equicorrelation) model. The EMPIRICAL option in the PROC GLIMMIX statement requests the classical sandwich estimator as the covariance estimator for the fixed effects:

proc glimmix data=seizures empirical;
   class id;
   model cnt = x1 trt x1*trt / dist=poisson offset=ltime
                               ddfm=none covb s;
   random _residual_ / subject=id type=cs vcorr;
run;

The Model Information table shows that the parameters are now estimated by residual pseudo-likelihood (compare Output 41.12.3 and Output 41.12.1). And in this fact lies the main difference between fitting marginal models with PROC GLIMMIX and with GEE methods as per Liang and Zeger (1986), where parameters of the working correlation matrix are estimated by the method of moments.

Output 41.12.3: Model Information in Marginal Model

The GLIMMIX Procedure

Model Information
Data Set WORK.SEIZURES
Response Variable cnt
Response Distribution Poisson
Link Function Log
Variance Function Default
Offset Variable ltime
Variance Matrix Blocked By id
Estimation Technique Residual PL
Degrees of Freedom Method None
Fixed Effects SE Adjustment Sandwich - Classical


According to the compound symmetry model, there is substantial correlation among the observations from the same subject (Output 41.12.4).

Output 41.12.4: Covariance Parameter Estimates and Correlation Matrix

Estimated V Correlation Matrix for id 101
Row Col1 Col2 Col3 Col4 Col5
1 1.0000 0.6055 0.6055 0.6055 0.6055
2 0.6055 1.0000 0.6055 0.6055 0.6055
3 0.6055 0.6055 1.0000 0.6055 0.6055
4 0.6055 0.6055 0.6055 1.0000 0.6055
5 0.6055 0.6055 0.6055 0.6055 1.0000

Covariance Parameter Estimates
Cov Parm Subject Estimate Standard Error
CS id 6.4653 1.3833
Residual   4.2128 0.3928


The parameter estimates in Output 41.12.5 are the same as in the Poisson generalized linear model (Output 41.12.2), because of the balance in these data. The standard errors have increased substantially, however, by taking into account the correlations among the observations.

Output 41.12.5: GEE-Type Inference for Fixed Effects

Solutions for Fixed Effects
Effect Estimate Standard Error DF t Value Pr > |t|
Intercept 1.3476 0.1574 Infty 8.56 <.0001
x1 0.1108 0.1161 Infty 0.95 0.3399
trt -0.1080 0.1937 Infty -0.56 0.5770
x1*trt -0.3016 0.1712 Infty -1.76 0.0781

Empirical Covariance Matrix for Fixed Effects
Effect Row Col1 Col2 Col3 Col4
Intercept 1 0.02476 -0.00115 -0.02476 0.001152
x1 2 -0.00115 0.01348 0.001152 -0.01348
trt 3 -0.02476 0.001152 0.03751 -0.00300
x1*trt 4 0.001152 -0.01348 -0.00300 0.02931