The PHREG Procedure

Firth’s Modification for Maximum Likelihood Estimation

In fitting a Cox model, the phenomenon of monotone likelihood is observed if the likelihood converges to a finite value while at least one parameter diverges (Heinze and Schemper, 2001).

Let $\mb{x}_ l(t)$ denote the vector explanatory variables for the lth individual at time t. Let $t_{1} < t_{2} < \ldots <t_{m}$ denote the k distinct, ordered event times. Let $ d_ j$ denote the multiplicity of failures at $t_ j$; that is, $d_ j$ is the size of the set $\mc{D}_ j$ of individuals that fail at $t_ j$. Let $\mc{R}_ j$ denote the risk set just before $t_ j$. Let $\bbeta =(\beta _1, \ldots , \beta _ k)’$ be the vector of regression parameters. The Breslow log partial likelihood is given by

\[  l(\bbeta ) = \log L(\bbeta ) = \sum _{j=1}^ m \biggl \{  \bbeta ’ \sum _{l\in \mc{D}_ j}\mb{x}_ l(t_ j) - d_ j \log \sum _{h \in \mc{R}j} \mr{e}^{\bbeta '\mb{x}_ h(t_ j)} \biggr \}   \]

Denote

\[  \mb{S}_ j^{(a)}(\bbeta ) = \sum _{h \in \mc{R}j} \mr{e}^{\bbeta '\mb{x}_ h(t_ j)} [\mb{x}_ h(t_ j)]^{\otimes a} \hspace{1cm} a=0,1,2  \]

Then the score function is given by

\begin{eqnarray*}  \mb{U}(\bbeta ) & \equiv &  (U(\beta _1), \ldots , U(\beta _ k))’ \\ & =&  \frac{\partial l(\bbeta )}{\partial \bbeta } \\ & =&  \sum _{j=1}^ m \biggl \{  \sum _{l \in \mc{D}j}\mb{x}_ l(t_ j) - d_ j \frac{\mb{S}_ j^{(1)}(\bbeta )}{S_ j^{0}(\bbeta )} \biggl \}  \end{eqnarray*}

and the Fisher information matrix is given by

\begin{eqnarray*}  \mc{I}(\bbeta ) & =&  - \frac{\partial ^2 l(\bbeta )}{\partial \bbeta ^2}\\ &  =&  \sum _{j=1}^ m d_ j \biggl \{  \frac{\mb{S}_ j^{(2)}(\bbeta )}{S_ j^{(0)}(\bbeta )} - \biggl [ \frac{\bS _ j^{(1)}(\bbeta )}{\mb{S}_ j^{(0)}(\bbeta )} \biggr ] \biggl [ \frac{\mb{S}_ j^{(1)}(\bbeta )}{\mb{S}_ j^{(0)}(\bbeta )} \biggr ]’ \biggr \}  \end{eqnarray*}

Heinze (1999); Heinze and Schemper (2001) applied the idea of Firth (1993) by maximizing the penalized partial likelihood

\[  l^*(\bbeta ) = l(\bbeta ) + 0.5 \log (|\mc{I}(\bbeta )|)  \]

The score function $\mb{U}(\bbeta )$ is replaced by the modified score function by $\mb{U}^*(\bbeta )\equiv (U^*(\beta _1), \ldots , U^*(\beta _ k))’$, where

\[  U^*(\beta _ r) = U(\beta _ r) + 0.5 \mr{tr} \biggl \{ \mc{I}^{-1}(\bbeta ) \frac{\partial \mc{I}(\bbeta )}{\partial \beta _ r} \biggr \}  \hspace{1cm} r=1,\ldots ,k  \]

The Firth estimate is obtained iteratively as

\[  \bbeta ^{(s+1)} = \bbeta ^{(s)} + \mc{I}^{-1}(\bbeta ^{(s)})\mb{U}^*(\bbeta ^{(s)})  \]

The covariance matrix $\hat{\mb{V}}$ is computed as $\mc{I}^{-1}(\hat{\bbeta })$, where $\hat{\bbeta }$ is the maximum penalized partial likelihood estimate.

Explicit formulae for $\frac{\partial \mc{I}(\bbeta )}{\partial \beta _ r}, r=1,\ldots ,k$

Denote

\begin{eqnarray*}  \mb{x}_ h(t) & =&  (x_{h1}(t), \ldots , x_{hk}(t))’ \\ \mb{Q}_{jr}^{(a)}(\bbeta ) & =&  \sum _{h \in \mc{R}j} \mr{e}^{\bbeta '\mb{x}_ h(t_ j)} x_{hr}(t_ j) [\mb{x}_ h(t_ j)]^{\otimes a} \hspace{1cm} a=0,1,2; r=1,\ldots ,k \end{eqnarray*}

Then

\begin{eqnarray*}  \frac{\partial \mc{I}(\bbeta )}{\partial \beta _ r} & =&  \sum _{j=1}^ m d_ j \biggl \{  \biggl [ \frac{\mb{Q}_{jr}^{(2)}(\bbeta )}{S_ j^{(0)}(\bbeta )} - \frac{\mb{Q}_{jr}^{(0)}(\bbeta )}{S_ j^{(0)}(\bbeta )} \frac{\mb{S}_{j}^{(2)}(\bbeta )}{S_ j^{(0)}(\bbeta )} \biggl ] - \\ & &  \biggl [ \frac{\mb{Q}_{jr}^{(1)}(\bbeta )}{S_ j^{(0)}(\bbeta )} - \frac{\mb{Q}_{jr}^{(0)}(\bbeta )}{S_ j^{(0)}(\bbeta )} \frac{\mb{S}_{j}^{(1)}(\bbeta )}{S_ j^{(0)}(\bbeta )} \biggl ] \biggl [\frac{\mb{S}_{j}^{(1)}(\bbeta )}{S_ j^{(0)}(\bbeta )} \biggl ]’ -\\ & &  \biggl [\frac{\mb{S}_{j}^{(1)}(\bbeta )}{S_ j^{(0)}(\bbeta )} \biggl ] \biggl [ \frac{\mb{Q}_{jr}^{(1)}(\bbeta )}{S_ j^{(0)}(\bbeta )} - \frac{\mb{Q}_{jr}^{(0)}(\bbeta )}{S_ j^{(0)}(\bbeta )} \frac{\mb{S}_{j}^{(1)}(\bbeta )}{S_ j^{(0)}(\bbeta )} \biggl ]’ \biggr \}  \hspace{1cm} r=1,\ldots ,k \end{eqnarray*}