The SURVEYPHREG Procedure

Time and CLASS Variables Usage

The following DATA step creates an artificial data set, Test, to be used in this section. There are six variables in Test: the variable T contains the failure times; the variable Status is the censoring indicator variable with the value 1 for an uncensored failure time and the value 0 for a censored time; the variable A is a categorical variable with values 1, 2, and 3 representing three different categories; the variable MirrorT is an exact copy of T; the variable W is the observation weight; and the variable S is the strata indicator.

data Test;
   input T Status A W S @@;
   MirrorT = T;
   datalines;
 23    1    1   10   1    7    0   1   20   2
 23    1    1   10   1   10    1   1   20   2
 20    0    1   10   1   13    0   1   20   2
 24    1    1   10   1   10    1   1   20   2
 18    1    2   10   1    6    1   2   20   2
 18    0    2   10   1    6    1   2   20   2
 13    0    2   10   1   13    1   2   20   2
  9    0    2   10   1   15    1   2   20   2
  8    1    3   10   1    6    1   3   20   2
 12    0    3   10   1    4    1   3   20   2
 11    1    3   10   1    8    1   1   20   2
  6    1    3   10   1    7    1   3   20   2
  7    1    3   10   1   12    1   3   20   2
  9    1    2   10   1   15    1   2   20   2
  3    1    2   10   1   14    0   3   20   2
  6    1    1   10   1   13    1   2   20   2
;
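
Before fitting any models, it can help to confirm that the data were read as intended. The following statements are a minimal sketch and are not part of the original example. The double trailing at sign (@@) in the INPUT statement holds each data line so that two observations are read from it, so Test should contain 32 observations.

```sas
/* Cross-tabulate stratum by category to check the layout of Test */
proc freq data=Test;
   tables S*A / nopercent norow nocol;
run;

/* List the first six observations, including the derived copy MirrorT */
proc print data=Test(obs=6);
   var T MirrorT Status A W S;
run;
```

In the listing, MirrorT should match T in every row, and W should be 10 in stratum 1 and 20 in stratum 2.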

Time Variable on the Right Side of the MODEL Statement

The time variable cannot be used explicitly as an explanatory effect in the MODEL statement. The following statements produce an error message:

proc surveyphreg data=Test;
   weight W;
   strata S;
   class A;
   model T*Status(0)=T*A;
run;

To use the time variable as an explanatory effect, replace T in the effect with MirrorT, which is an exact copy of T, as in the following statements:

proc surveyphreg data=Test;
   weight W;
   strata S;
   class A;
   model T*Status(0)=A*MirrorT;
run;

Note that neither T*A nor MirrorT*A in the MODEL statement is time-dependent. The results of fitting this model are shown in Figure 97.3.

Figure 97.3: T*A Effect

The SURVEYPHREG Procedure

Analysis of Maximum Likelihood Estimates
Parameter     DF     Estimate   Standard Error   t Value   Pr > |t|   Hazard Ratio
MirrorT*A 1   30   -17.560699         0.337239    -52.07     <.0001          0.000
MirrorT*A 2   30   -17.424235         0.331186    -52.61     <.0001          0.000
MirrorT*A 3   30   -17.448672         0.290159    -60.13     <.0001          0.000
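
To see why the MirrorT*A effect is not time-dependent, it can help to write out the model. The following is a sketch that assumes the usual proportional hazards form. The symbols $\lambda_0$, $\beta_j$, and the indicator $\mathbf{1}\{\cdot\}$ are notation introduced here for illustration and do not appear in the original text.

```latex
\lambda_i(t) = \lambda_0(t)\,
  \exp\!\Bigl( \sum_{j=1}^{3} \beta_j \,\mathrm{MirrorT}_i \,\mathbf{1}\{A_i = j\} \Bigr)
```

Each covariate $\mathrm{MirrorT}_i \,\mathbf{1}\{A_i = j\}$ is a fixed value for observation $i$ and does not change with $t$, so the effect is an ordinary interaction rather than a time-dependent covariate.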


CLASS Variables and Programming Statements

In PROC SURVEYPHREG, the levels of CLASS variables are determined by the CLASS statement and the input data and are not affected by user-supplied programming statements. Consider the following statements, which produce the results in Figure 97.4. Variable A is declared as a CLASS variable in the CLASS statement.

proc surveyphreg data=Test;
   weight W;
   strata S;
   class A;
   model T*Status(0)=A;
run;

Figure 97.4 shows the parameters that correspond to A and their respective regression coefficient estimates.

Figure 97.4: Design Variable and Regression Coefficient Estimates

The SURVEYPHREG Procedure

Class Level Information
Class Levels Values
A 3 1 2 3

Analysis of Maximum Likelihood Estimates
Parameter     DF     Estimate   Standard Error   t Value   Pr > |t|   Hazard Ratio
A 1           30    -1.162184         0.655136     -1.77     0.0862          0.313
A 2           30    -0.616962         0.521841     -1.18     0.2464          0.540
A 3           30            0                .         .          .          1.000


Now consider a programming statement that attempts to change the value of the CLASS variable A, as in the following specification:

proc surveyphreg data=Test;
   weight W;
   strata S;
   class A;
   model T*Status(0)=A;
   if A=3 then A=2;
run;

The results of this analysis are shown in Figure 97.5 and are identical to those in Figure 97.4. The programming statement if A=3 then A=2 has no effect on the design variables for A, which have already been determined.

Figure 97.5: Design Variable and Regression Coefficient Estimates

The SURVEYPHREG Procedure

Class Level Information
Class Levels Values
A 3 1 2 3

Analysis of Maximum Likelihood Estimates
Parameter     DF     Estimate   Standard Error   t Value   Pr > |t|   Hazard Ratio
A 1           30    -1.162184         0.655136     -1.77     0.0862          0.313
A 2           30    -0.616962         0.521841     -1.18     0.2464          0.540
A 3           30            0                .         .          .          1.000
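
If the goal is actually to combine levels 2 and 3 of A, one option is to recode the variable in a DATA step before the procedure runs, so that the CLASS statement sees the new values. The following is a minimal sketch; the data set name TestRecode is a hypothetical name chosen for illustration.

```sas
/* Recode A before the analysis so the CLASS levels reflect the change */
data TestRecode;
   set Test;
   if A=3 then A=2;   /* merge category 3 into category 2 */
run;

proc surveyphreg data=TestRecode;
   weight W;
   strata S;
   class A;
   model T*Status(0)=A;
run;
```

Because the recoding happens before the procedure reads the data, A has only the values 1 and 2 in TestRecode, and the Class Level Information table would be expected to list two levels instead of three.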


Additionally, a CLASS variable that is used in a programming statement is not treated as a collection of the corresponding design variables. Consider the following statements:

proc surveyphreg data=Test;
   class A;
   model T*Status(0)=A X;
   X=T*A;
run;

The CLASS variable A generates two design variables as explanatory variables. The variable X created by the X=T*A programming statement is a single time-dependent covariate whose values are evaluated using the exact values of A given in the data, not the dummy coded values that represent A. In the data set Test, A has the values of 1, 2, and 3, and these values are multiplied by the values of T to produce X. If A were a character variable with values 'Bird', 'Cat', and 'Dog', the programming statement X=T*A would have produced an error in the attempt to multiply a number by a character value. The results are shown in Figure 97.6.

Figure 97.6: Single Time-Dependent Variable X*A

The SURVEYPHREG Procedure

Analysis of Maximum Likelihood Estimates
Parameter     DF     Estimate   Standard Error   t Value   Pr > |t|   Hazard Ratio
A 1           31     0.158010         1.222654      0.13     0.8980          1.171
A 2           31     0.008993         0.674629      0.01     0.9894          1.009
A 3           31            0                .         .          .          1.000
X             31     0.092679         0.073746      1.26     0.2182          1.097
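
Written out, and assuming the usual proportional hazards form, the model that contains A and X can be sketched as follows. The symbols $\lambda_0$, $\beta_1$, $\beta_2$, and $\gamma$ are notation introduced here for illustration.

```latex
\lambda_i(t) = \lambda_0(t)\,
  \exp\!\bigl( \beta_1 \,\mathbf{1}\{A_i = 1\}
             + \beta_2 \,\mathbf{1}\{A_i = 2\}
             + \gamma \, t \, A_i \bigr)
```

The CLASS effect A enters through indicator variables with level 3 as the reference, whereas X enters through the raw numeric value $A_i \in \{1, 2, 3\}$ multiplied by $t$, which is why X contributes a single parameter.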


If you want to create time-dependent covariates from the values of a CLASS variable, you can use syntax like the following, which is not equivalent to the preceding program:

proc surveyphreg data=Test;
   class A;
   model T*Status(0)=A X1 X2;
   X1= T*(A=1);
   X2= T*(A=2);
run;

The Boolean parenthetical expressions (A=1) and (A=2) resolve to a value of 1 or 0, depending on whether the expression is true or false, respectively.

The results of this analysis, which serves as a simple test of the proportional hazards assumption, are shown in Figure 97.7.

Figure 97.7: Simple Test of Proportional Hazards Assumption

The SURVEYPHREG Procedure

Analysis of Maximum Likelihood Estimates
Parameter     DF     Estimate   Standard Error   t Value   Pr > |t|   Hazard Ratio
A 1           31    -0.007655         1.284875     -0.01     0.9953          0.992
A 2           31    -0.881383         1.834533     -0.48     0.6343          0.414
A 3           31            0                .         .          .          1.000
X1            31    -0.155220         0.172914     -0.90     0.3763          0.856
X2            31     0.011554         0.198796      0.06     0.9540          1.012
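
The Boolean expressions used in the programming statements for X1 and X2 can be checked in an ordinary DATA step, where they evaluate the same way. The following is a minimal sketch; the variable names I1 and I2 are hypothetical and are used only for this check.

```sas
/* Evaluate the Boolean expressions for each observation of Test */
data _null_;
   set Test;
   I1 = (A=1);       /* 1 when A is 1, otherwise 0 */
   I2 = (A=2);       /* 1 when A is 2, otherwise 0 */
   put A= I1= I2=;   /* write the values to the SAS log */
run;
```

Observations with A=3 have both indicators equal to 0, which is consistent with level 3 being the reference level in the output above.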


In general, when your model contains a categorical explanatory variable that is time-dependent, it might be necessary to use hardcoded dummy variables to represent the categories of the categorical variable.
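
As an illustration of hardcoded dummy variables, the following sketch revisits the character-valued case mentioned earlier. It is not part of the original example: the data set TestChar and the variable Animal are hypothetical, and the sketch assumes that character comparisons in programming statements behave as they do in a DATA step.

```sas
/* Build a character version of A (hypothetical variable Animal) */
data TestChar;
   set Test;
   length Animal $4;
   select (A);
      when (1) Animal='Bird';
      when (2) Animal='Cat';
      otherwise Animal='Dog';
   end;
run;

/* Comparisons return 1 or 0, so no character value is ever multiplied */
proc surveyphreg data=TestChar;
   class Animal;
   model T*Status(0)=Animal X1 X2;
   X1 = T*(Animal='Bird');
   X2 = T*(Animal='Cat');
run;
```

Under these assumptions, the comparison turns each category into a numeric indicator before it is multiplied by T, which avoids the error that an expression such as T*Animal would produce.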