The PHREG Procedure

Firth’s Modification for Maximum Likelihood Estimation

In fitting a Cox model, the phenomenon of monotone likelihood is observed if the likelihood converges to a finite value while at least one parameter diverges (Heinze and Schemper, 2001).

Let $\mb {x}_ l(t)$ denote the vector explanatory variables for the lth individual at time t. Let $t_{1} < t_{2} < \ldots <t_{m}$ denote the k distinct, ordered event times. Let $ d_ j$ denote the multiplicity of failures at $t_ j$; that is, $d_ j$ is the size of the set $\mc {D}_ j$ of individuals that fail at $t_ j$. Let $\mc {R}_ j$ denote the risk set just before $t_ j$. Let $\bbeta =(\beta _1, \ldots , \beta _ k)’$ be the vector of regression parameters. The Breslow log partial likelihood is given by

\[  l(\bbeta ) = \log L(\bbeta ) = \sum _{j=1}^ m \biggl \{  \bbeta ’ \sum _{l\in \mc {D}_ j}\mb {x}_ l(t_ j) - d_ j \log \sum _{h \in \mc {R}j} \mr {e}^{\bbeta \mb {x}_ h(t_ j)} \biggr \}   \]

Denote

\[  \mb {S}_ j^{(a)}(\bbeta ) = \sum _{h \in \mc {R}j} \mr {e}^{\bbeta \mb {x}_ h(t_ j)} [\mb {x}_ h(t_ j)]^{\otimes a} \hspace{1cm} a=0,1,2  \]

Then the score function is given by

$\displaystyle  \mb {U}(\bbeta )  $
$\displaystyle \equiv  $
$\displaystyle  (U(\beta _1), \ldots , U(\beta _ k))’  $
$\displaystyle  $
$\displaystyle = $
$\displaystyle  \frac{\partial l(\bbeta )}{\partial \bbeta }  $
$\displaystyle  $
$\displaystyle = $
$\displaystyle  \sum _{j=1}^ m \biggl \{  \sum _{l \in \mc {D}j}\mb {x}_ l(t_ j) - d_ j \frac{\mb {S}_ j^{(1)}(\bbeta )}{S_ j^{0}(\bbeta )} \biggl \}   $

and the Fisher information matrix is given by

$\displaystyle  \mc {I}(\bbeta )  $
$\displaystyle = $
$\displaystyle  - \frac{\partial ^2 l(\bbeta )}{\partial \bbeta ^2} $
$\displaystyle  $
$\displaystyle  = $
$\displaystyle  \sum _{j=1}^ m d_ j \biggl \{  \frac{\mb {S}_ j^{(2)}(\bbeta )}{S_ j^{(0)}(\bbeta )} - \biggl [ \frac{\bS _ j^{(1)}(\bbeta )}{\mb {S}_ j^{(0)}(\bbeta )} \biggr ] \biggl [ \frac{\mb {S}_ j^{(1)}(\bbeta )}{\mb {S}_ j^{(0)}(\bbeta )} \biggr ]’ \biggr \}   $

Heinze (1999); Heinze and Schemper (2001) applied the idea of Firth (1993) by maximizing the penalized partial likelihood

\[  l^*(\bbeta ) = l(\bbeta ) + 0.5 \log (|\mc {I}(\bbeta )|)  \]

The score function $\mb {U}(\bbeta )$ is replaced by the modified score function by $\mb {U}^*(\bbeta )\equiv (U^*(\beta _1), \ldots , U^*(\beta _ k))’$, where

\[  U^*(\beta _ r) = U(\beta _ r) + 0.5 \mr {tr} \biggl \{ \mc {I}^{-1}(\bbeta ) \frac{\partial \mc {I}(\bbeta )}{\partial \beta _ r} \biggr \}  \hspace{1cm} r=1,\ldots ,k  \]

The Firth estimate is obtained iteratively as

\[  \bbeta ^{(s+1)} = \bbeta ^{(s)} + \mc {I}^{-1}(\bbeta ^{(s)})\mb {U}^*(\bbeta ^{(s)})  \]

The covariance matrix $\hat{\mb {V}}$ is computed as $\mc {I}^{-1}(\hat{\bbeta })$, where $\hat{\bbeta }$ is the maximum penalized partial likelihood estimate.

Explicit formulae for $\frac{\partial \mc {I}(\bbeta )}{\partial \beta _ r}, r=1,\ldots ,k$

Denote

$\displaystyle  \mb {x}_ h(t)  $
$\displaystyle = $
$\displaystyle  (x_{h1}(t), \ldots , x_{hk}(t))’  $
$\displaystyle \mb {Q}_{jr}^{(a)}(\bbeta )  $
$\displaystyle = $
$\displaystyle  \sum _{h \in \mc {R}j} \mr {e}^{\bbeta \mb {x}_ h(t_ j)} x_{hr}(t_ j) [\mb {x}_ h(t_ j)]^{\otimes a} \hspace{1cm} a=0,1,2; r=1,\ldots ,k  $

Then

$\displaystyle  \frac{\partial \mc {I}(\bbeta )}{\partial \beta _ r}  $
$\displaystyle = $
$\displaystyle  \sum _{j=1}^ m d_ j \biggl \{  \biggl [ \frac{\mb {Q}_{jr}^{(2)}(\bbeta )}{S_ j^{(0)}(\bbeta )} - \frac{\mb {Q}_{jr}^{(0)}(\bbeta )}{S_ j^{(0)}(\bbeta )} \frac{\mb {S}_{j}^{(2)}(\bbeta )}{S_ j^{(0)}(\bbeta )} \biggl ] -  $
$\displaystyle  $
$\displaystyle  $
$\displaystyle  \biggl [ \frac{\mb {Q}_{jr}^{(1)}(\bbeta )}{S_ j^{(0)}(\bbeta )} - \frac{\mb {Q}_{jr}^{(0)}(\bbeta )}{S_ j^{(0)}(\bbeta )} \frac{\mb {S}_{j}^{(1)}(\bbeta )}{S_ j^{(0)}(\bbeta )} \biggl ] \biggl [\frac{\mb {S}_{j}^{(1)}(\bbeta )}{S_ j^{(0)}(\bbeta )} \biggl ]’ - $
$\displaystyle  $
$\displaystyle  $
$\displaystyle  \biggl [\frac{\mb {S}_{j}^{(1)}(\bbeta )}{S_ j^{(0)}(\bbeta )} \biggl ] \biggl [ \frac{\mb {Q}_{jr}^{(1)}(\bbeta )}{S_ j^{(0)}(\bbeta )} - \frac{\mb {Q}_{jr}^{(0)}(\bbeta )}{S_ j^{(0)}(\bbeta )} \frac{\mb {S}_{j}^{(1)}(\bbeta )}{S_ j^{(0)}(\bbeta )} \biggl ]’ \biggr \}  \hspace{1cm} r=1,\ldots ,k  $