When experimental units are naturally or artificially clustered, failure times of experimental units within a cluster are correlated. Lee, Wei, and Amato (1992) estimate the regression parameters in the Cox model by maximizing a partial likelihood function under an independent working correlation assumption and estimate the variance of the estimated regression coefficients by using a robust sandwich variance estimator that accounts for the intracluster dependence.
The Diabetic Retinopathy Study (DRS) is a randomized, controlled clinical trial of more than 1,700 patients across 15 medical centers. One objective of this study was to determine if photocoagulation treatment delays the occurrence of blindness. One eye of each patient was randomly assigned to treatment and the other eye to control. See Example 73.11 in Chapter 73: The PHREG Procedure, for more information about the data set and a similar analysis; see http://www.nei.nih.gov/neitrials/static/study62.asp for more information about the DRS.
Each patient is a cluster that contributes two observations to the input data set, one for each eye. The following variables are available:
ID
, patient’s identification
Time
, failure time
Status
, event indicator (0=censored, and 1=uncensored)
Treatment
, treatment received (1=laser photocoagulation, and 0=otherwise)
DiabeticType
, type of diabetes (0=juvenile onset with age of onset at 20 or under, and 1= adult onset with age of onset over 20)
The following DATA step creates the data set Blind
, which represents 197 diabetic patients from the DRS:
data Blind; input ID Time Status DiabeticType Treatment @@; datalines; 5 46.23 0 1 1 5 46.23 0 1 0 14 42.50 0 0 1 14 31.30 1 0 0 16 42.27 0 0 1 16 42.27 0 0 0 25 20.60 0 0 1 25 20.60 0 0 0 29 38.77 0 0 1 29 0.30 1 0 0 46 65.23 0 0 1 46 54.27 1 0 0 49 63.50 0 0 1 49 10.80 1 0 0 56 23.17 0 0 1 56 23.17 0 0 0 61 1.47 0 0 1 61 1.47 0 0 0 71 58.07 0 1 1 71 13.83 1 1 0 100 46.43 1 1 1 100 48.53 0 1 0 112 44.40 0 1 1 112 7.90 1 1 0 120 39.57 0 1 1 120 39.57 0 1 0 127 30.83 1 1 1 127 38.57 1 1 0 133 66.27 0 1 1 133 14.10 1 1 0 150 20.17 1 0 1 150 6.90 1 0 0 167 58.43 0 1 1 167 41.40 1 1 0 176 58.20 0 0 1 176 58.20 0 0 0 185 57.43 0 1 1 185 57.43 0 1 0 190 56.03 0 0 1 190 56.03 0 0 0 ... more lines ... 1705 8.00 0 0 1 1705 8.00 0 0 0 1717 51.60 0 1 1 1717 42.33 1 1 0 1727 49.97 0 1 1 1727 2.90 1 1 0 1746 45.90 0 0 1 1746 1.43 1 0 0 1749 41.93 0 1 1 1749 41.93 0 1 0 ;
The following SAS statements request a proportional hazards regression of Time
on Treatment
, DiabeticType
, and the Treatment
DiabeticType
interaction, with Status
as the censoring indicator. The CLUSTER statement indicates the observations that came from the same patient.
proc surveyphreg data=Blind; model Time*Status(0) = Treatment DiabeticType Treatment*DiabeticType; cluster id; run;
Output 100.1.1 displays some summary information. There are 394 observations and 197 patients (clusters). Almost 61% of the observations
are censored. The p-values for the null model are less than 0.0001 for both the likelihood ratio test and the Wald test (Output 100.1.2), which indicates that the survival time is highly dependent on Treatment
and DiabeticType
. In this example, the likelihood ratio statistic has a chi-square distribution with 3 degrees of freedom and the Wald statistics
has the F distribution with the numerator degrees of freedom 3 and the denominator degrees of freedom 196. The denominator degrees
of freedom are calculated as the number of clusters (197) minus one.
Output 100.1.3 displays parameter estimates, standard errors, t statistics, denominator degrees of freedom, p-values, and hazard ratios. In this example data set, Treatment
and Treatment
DiabeticType
interaction are significant with p-values 0.023 and 0.006, respectively. Since the model contains Treatment
DiabeticType
interaction, the exponential of the estimated regression coefficient is not the hazard ratio. Use the ESTIMATE statement
to calculate the hazard ratios.