The LOGISTIC Procedure

Regression Diagnostics

For binary response data, regression diagnostics developed by Pregibon (1981) can be requested by specifying the INFLUENCE option. For diagnostics available with conditional logistic regression, see the section Regression Diagnostic Details. These diagnostics can also be obtained from the OUTPUT statement.
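For example, the following sketch requests the diagnostics with the INFLUENCE option and also writes them to a data set through the OUTPUT statement. The data set trial, the response y, and the covariates x1 and x2 are hypothetical placeholders; the OUTPUT keywords shown (H=, RESCHI=, RESDEV=, C=, CBAR=, DIFDEV=, DIFCHISQ=) name the diagnostics defined later in this section.

   proc logistic data=trial;
      model y(event='1') = x1 x2 / influence;
      output out=diag h=leverage reschi=reschi resdev=resdev
             c=c cbar=cbar difdev=difdev difchisq=difchisq;
   run;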

This section uses the following notation:

$r_ j, n_ j$

$r_ j$ is the number of event responses out of $n_ j$ trials for the jth observation. If events/trials syntax is used, $r_ j$ is the value of events and $n_ j$ is the value of trials. For single-trial syntax, $n_ j=1$, and $r_ j=1$ if the ordered response is 1, and $r_ j=0$ if the ordered response is 2.

$w_ j$

is the weight of the jth observation.

$\pi _ j$

is the probability of an event response for the jth observation given by $\pi_j=F(\alpha +\bbeta'\mb{x}_j)$, where $F(\cdot)$ is the inverse link function.

$\widehat{\bbeta }$

is the maximum likelihood estimate (MLE) of $(\alpha ,\beta _{1},\dots ,\beta _{s})^\prime $.

${\widehat{\bV }}(\widehat{\bbeta })$

is the estimated covariance matrix of $\widehat{\bbeta }$.

$\hat{p}_ j,\hat{q}_ j$

$\hat{p}_ j$ is the estimate of $\pi _ j$ evaluated at $\widehat{\bbeta }$, and $\hat{q}_ j= 1-\hat{p}_ j$.

Pregibon (1981) suggests using index plots of several diagnostic statistics to identify influential observations and to quantify the effects of individual observations on various aspects of the maximum likelihood fit. In an index plot, the diagnostic statistic is plotted against the observation number. In general, the distributions of these diagnostic statistics are not known, so cutoff values cannot be given for determining when the values are large. However, the IPLOTS and INFLUENCE options in the MODEL statement and the PLOTS option in the PROC LOGISTIC statement provide displays of the diagnostic values, allowing visual inspection and comparison of the values across observations. In these plots, if the model is correctly specified and fits all observations well, then no extreme points should appear.
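For instance, a minimal sketch (again with hypothetical data set and variable names) that requests the influence displays both through the PLOTS option and through the MODEL statement options:

   proc logistic data=trial plots(only)=influence;
      model y(event='1') = x1 x2 / influence iplots;
   run;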

The next five sections give formulas for these diagnostic statistics.

Hat Matrix Diagonal (Leverage)

The diagonal elements of the hat matrix are useful in detecting extreme points in the design space, where they tend to take larger values. The jth diagonal element is

\begin{eqnarray*}  h_{j}= \left\{  \begin{array}{ll} \widetilde{w}_j(1,\mb{x}'_j) {\widehat{\bV}}(\widehat{\bbeta})(1,\mb{x}'_j)' &  \mbox{Fisher scoring}\\ \widehat{w}_j(1,\mb{x}'_j) {\widehat{\bV}}(\widehat{\bbeta})(1,\mb{x}'_j)' &  \mbox{Newton-Raphson} \end{array} \right. \end{eqnarray*}

where

\begin{eqnarray*}  \widetilde{w}_ j &  = &  \frac{w_ j n_ j}{\hat{p}_ j\hat{q}_ j[g'(\hat{p}_ j)]^2} \\ \widehat{w}_ j &  = &  \widetilde{w}_ j + \frac{ w_ j(r_ j - n_ j\hat{p}_ j) [\hat{p}_ j\hat{q}_ j g''(\hat{p}_ j) + (\hat{q}_ j-\hat{p}_ j)g'(\hat{p}_ j)]}{ (\hat{p}_ j\hat{q}_ j)^2 [g'(\hat{p}_ j)]^3} \end{eqnarray*}

and $g'(\cdot)$ and $g''(\cdot)$ are the first and second derivatives of the link function $g(\cdot)$, respectively.

For a binary response logit model, the hat matrix diagonal elements are

\[  h_{j} = w_jn_j\hat{p}_j\hat{q}_j (1, \mb{x}_j'){\widehat{\bV}}(\widehat{\bbeta}) \left( \begin{array}{c} 1 \\ \mb{x}_j \end{array} \right)  \]

If the estimated probability is extreme (less than 0.1 or greater than 0.9, approximately), then the hat diagonal might be greatly reduced in value. Consequently, when an observation has a very large or very small estimated probability, its hat diagonal value is not a good indicator of the observation’s distance from the design space (Hosmer and Lemeshow, 2000, p. 171).
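Because no cutoff value is available, a practical screen is simply to rank observations by leverage. The following sketch (hypothetical data set and variable names as before) writes the hat diagonals with the H= keyword in the OUTPUT statement and lists the five highest-leverage observations:

   proc logistic data=trial;
      model y(event='1') = x1 x2;
      output out=lev h=h;
   run;

   proc sort data=lev;
      by descending h;
   run;

   proc print data=lev(obs=5);
      var x1 x2 h;
   run;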

Residuals

Residuals are useful in identifying observations that are not explained well by the model. Pearson residuals are components of the Pearson chi-square statistic, and deviance residuals are components of the deviance. The Pearson residual for the jth observation is

\[  \chi _ j=\frac{\sqrt {w_ j}(r_ j-n_ j\hat{p}_ j)}{\sqrt {n_ j\hat{p}_ j\hat{q}_ j}}  \]

The Pearson chi-square statistic is the sum of squares of the Pearson residuals.

The deviance residual for the jth observation is

\begin{eqnarray*}  d_j= &  \left\{  \begin{array}{ll} -\sqrt{-2w_jn_j\log(\hat{q}_j)} &  \mbox{if }r_j=0 \\ \pm \sqrt{2w_j[r_j\log(\frac{r_j}{n_j\hat{p}_j})+ (n_j-r_j)\log(\frac{n_j-r_j}{n_j\hat{q}_j}) ]} &  \mbox{if }0<r_j<n_j \\ \sqrt{-2w_jn_j\log(\hat{p}_j)} &  \mbox{if }r_j=n_j \end{array} \right. \end{eqnarray*}

where the plus (minus) in $\pm $ is used if ${{r_ j}/{n_ j}}$ is greater (less) than ${\hat{p}}_ j$. The deviance is the sum of squares of the deviance residuals.
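As a check on these formulas, the following sketch reproduces the Pearson and deviance residuals in a DATA step for single-trial syntax with unit weights ($n_j=1$, $w_j=1$), assuming a numeric response y coded 0/1 with 1 as the event; the computed values chi2 and d2 should match the RESCHI= and RESDEV= values written by the OUTPUT statement. Data set and variable names are hypothetical.

   proc logistic data=trial;
      model y(event='1') = x1 x2;
      output out=res p=phat reschi=chi resdev=d;
   run;

   data check;
      set res;
      r = (y = 1);                                  /* r_j under single-trial syntax */
      chi2 = (r - phat) / sqrt(phat*(1 - phat));    /* Pearson residual              */
      if r = 0 then d2 = -sqrt(-2*log(1 - phat));   /* deviance residual, r_j = 0    */
      else          d2 =  sqrt(-2*log(phat));       /* deviance residual, r_j = 1    */
   run;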

The STDRES option in the INFLUENCE and PLOTS=INFLUENCE options computes three more residuals (Collett, 2003). The Pearson and deviance residuals are standardized to have approximately unit variance:

\begin{eqnarray*}  e_{p_j}& =& \frac{\chi_j}{\sqrt{1-h_j}} \\ e_{d_j}& =& \frac{d_j}{\sqrt{1-h_j}} \end{eqnarray*}

The likelihood residuals, which estimate components of a likelihood ratio test of deleting an individual observation, are a weighted combination of the standardized Pearson and deviance residuals:

\begin{eqnarray*}  e_{l_ j}& =& \mathrm{sign}(r_ j-n_ j\hat{p}_ j)\sqrt {h_ je_{p_ j}^2+(1-h_ j)e_{d_ j}^2} \end{eqnarray*}
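A sketch of the same computation in a DATA step, starting from the unstandardized residuals and the hat diagonals (hypothetical names as before); note that $\mathrm{sign}(r_j-n_j\hat{p}_j)$ equals the sign of the Pearson residual, so sign(chi) is used below:

   proc logistic data=trial;
      model y(event='1') = x1 x2;
      output out=res reschi=chi resdev=d h=h;
   run;

   data stdres;
      set res;
      ep = chi / sqrt(1 - h);                          /* standardized Pearson residual  */
      ed = d   / sqrt(1 - h);                          /* standardized deviance residual */
      el = sign(chi) * sqrt(h*ep**2 + (1 - h)*ed**2);  /* likelihood residual            */
   run;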

DFBETAS

For each parameter estimate, the procedure calculates a DFBETAS diagnostic for each observation. The DFBETAS diagnostic for an observation is the standardized difference in the parameter estimate due to deleting the observation, and it can be used to assess the effect of an individual observation on each estimated parameter of the fitted model. Instead of reestimating the parameters every time an observation is deleted, PROC LOGISTIC uses the one-step estimate; see the section Predicted Probability of an Event for Classification. For the jth observation, the DFBETAS are given by

\[  \mbox{DFBETAS}_{ij}={\bDelta}_i \widehat{\bbeta}_j^1 / \hat{\sigma}_i  \]

where $i=0, 1, \ldots, s$, $\hat{\sigma}_i$ is the standard error of the ith component of $\widehat{\bbeta}$, and ${\bDelta}_i \widehat{\bbeta}_j^1$ is the ith component of the one-step difference

\[  {\bDelta }\widehat{\bbeta }_ j^1 = \frac{w_ j(r_ j-n_ j\hat{p}_ j)}{1-h_{j}}{\widehat{\bV }}(\widehat{\bbeta }) \left( \begin{array}{c} 1 \\ \mb{x}_ j \end{array} \right)  \]

${\bDelta }\widehat{\bbeta }_ j^1$ is the approximate change ($\widehat{\bbeta } - \widehat{\bbeta }_ j^1$) in the vector of parameter estimates due to the omission of the jth observation. The DFBETAS are useful in detecting observations that are causing instability in the selected coefficients.
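The DFBETAS can be written to a data set with the DFBETAS=_ALL_ keyword in the OUTPUT statement, which creates one variable per parameter (the intercept and each covariate). A hedged sketch with hypothetical names:

   proc logistic data=trial;
      model y(event='1') = x1 x2 / influence;
      output out=dfb dfbetas=_all_;
   run;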

C and CBAR

C and CBAR are confidence interval displacement diagnostics that provide scalar measures of the influence of individual observations on $\widehat{\bbeta }$. These diagnostics are based on the same idea as the Cook distance in linear regression theory (Cook and Weisberg, 1982), but use the one-step estimate. C and CBAR for the jth observation are computed as

\[  C_ j=\chi _ j^2 h_{j} / (1-h_{j})^2  \]

and

\[  \overline{C}_ j=\chi _ j^2 h_{j} / (1-h_{j})  \]

respectively.

Typically, to use these statistics, you plot them against an index and look for outliers.
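Given the Pearson residuals and hat diagonals from the OUTPUT statement, both displacement diagnostics can be reproduced in a DATA step, as in this sketch with hypothetical names:

   proc logistic data=trial;
      model y(event='1') = x1 x2;
      output out=res reschi=chi h=h;
   run;

   data cdiag;
      set res;
      c    = chi**2 * h / (1 - h)**2;   /* C    */
      cbar = chi**2 * h / (1 - h);      /* CBAR */
   run;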

DIFDEV and DIFCHISQ

DIFDEV and DIFCHISQ are diagnostics for detecting ill-fitted observations; in other words, observations that contribute heavily to the disagreement between the data and the predicted values of the fitted model. DIFDEV is the change in the deviance due to deleting an individual observation, while DIFCHISQ is the change in the Pearson chi-square statistic for the same deletion. By using the one-step estimate, DIFDEV and DIFCHISQ for the jth observation are computed as

\[  \mbox{DIFDEV}=d_ j^2+\overline{C}_ j  \]

and

\[  \mbox{DIFCHISQ}=\overline{C}_ j/h_{j}  \]
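These two diagnostics can likewise be reproduced from the residuals and hat diagonals written by the OUTPUT statement; a sketch with hypothetical names, where difchisq could equivalently be computed as chi**2/(1 - h):

   proc logistic data=trial;
      model y(event='1') = x1 x2;
      output out=res reschi=chi resdev=d h=h;
   run;

   data fitdiag;
      set res;
      cbar     = chi**2 * h / (1 - h);
      difdev   = d**2 + cbar;    /* DIFDEV   */
      difchisq = cbar / h;       /* DIFCHISQ */
   run;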