The QLIM Procedure

Ordinal Discrete Choice Modeling

Binary Probit and Logit Model

The binary choice model is

\[  y^{*}_{i} = \mathbf{x}_{i}’\bbeta + \epsilon _{i}  \]

where the value of the latent dependent variable, $y^{*}_{i}$, is observed only as follows:

\begin{eqnarray*}  y_{i} &  = 1 &  \hbox{if } y^{*}_{i}>0 \\ &  = 0 &  \hbox{otherwise} \end{eqnarray*}

The disturbance, $\epsilon _{i}$, of the probit model has a standard normal distribution with the cumulative distribution function (CDF)

\[  \Phi (x)=\int _{-\infty }^{x}\frac{1}{\sqrt {2\pi }}\exp (-t^2/2)dt  \]

The disturbance of the logit model has a standard logistic distribution with the CDF

\[  \Lambda (x)=\frac{\exp (x)}{1+\exp (x)} = \frac{1}{1+\exp (-x)}  \]

The binary discrete choice model has the following probability that the event $\{ y_{i}=1\} $ occurs:

\[  P(y_{i}=1) = F(\mathbf{x}_{i}’\bbeta ) = \left\{  \begin{array}{ll} \Phi (\mathbf{x}_{i}’\bbeta ) &  \mr {(probit)} \\ \Lambda (\mathbf{x}_{i}’\bbeta ) &  \mr {(logit)} \end{array} \right.  \]
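As an illustrative sketch (Python, not part of PROC QLIM), the two response probabilities can be evaluated with standard library functions; the function names here are hypothetical:

```python
import math

def probit_prob(xb):
    """P(y=1) under the probit model: standard normal CDF at x'beta."""
    return 0.5 * (1.0 + math.erf(xb / math.sqrt(2.0)))

def logit_prob(xb):
    """P(y=1) under the logit model: standard logistic CDF at x'beta."""
    return 1.0 / (1.0 + math.exp(-xb))

# At x'beta = 0 both models give probability 1/2.
print(probit_prob(0.0))
print(logit_prob(0.0))
```

For the same index $\mathbf{x}_{i}’\bbeta$, the two CDFs give similar but not identical probabilities, which is why the parameter scales of the two models differ.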

The log-likelihood function is

\[  \ell = \sum _{i=1}^{N}\left\{ y_{i}\log [F(\mathbf{x}_{i}’\bbeta )] + (1-y_{i})\log [1-F(\mathbf{x}_{i}’\bbeta )]\right\}   \]

where the CDF $F(x)$ is $\Phi (x)$ for the probit model and $\Lambda (x)$ for the logit model. The first-order derivatives of the logit model are

\[  \frac{\partial \ell }{\partial \bbeta } = \sum _{i=1}^{N}(y_{i}- \Lambda (\mathbf{x}_{i}’\bbeta ))\mathbf{x}_{i}  \]
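The logit score can be checked numerically against finite differences of the log-likelihood. The following sketch (Python, with hypothetical helper names and made-up toy data) does exactly that:

```python
import math

def logit_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def loglik(beta, X, y):
    """Binary logit log-likelihood: sum of y*log(F) + (1-y)*log(1-F)."""
    ll = 0.0
    for xi, yi in zip(X, y):
        p = logit_cdf(sum(b * x for b, x in zip(beta, xi)))
        ll += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return ll

def score(beta, X, y):
    """Analytic gradient: sum_i (y_i - Lambda(x_i'beta)) x_i."""
    g = [0.0] * len(beta)
    for xi, yi in zip(X, y):
        r = yi - logit_cdf(sum(b * x for b, x in zip(beta, xi)))
        for k in range(len(beta)):
            g[k] += r * xi[k]
    return g

# Hypothetical toy data: intercept plus one regressor.
X = [(1.0, 0.2), (1.0, -1.5), (1.0, 0.7), (1.0, 2.0)]
y = [1, 0, 1, 0]
beta = [0.1, -0.3]

# Central finite differences should agree with the analytic score.
h = 1e-6
for k in range(len(beta)):
    bp = list(beta); bp[k] += h
    bm = list(beta); bm[k] -= h
    fd = (loglik(bp, X, y) - loglik(bm, X, y)) / (2 * h)
    print(abs(fd - score(beta, X, y)[k]))
```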

The derivatives of the probit model are more complicated:

\[  \frac{\partial \ell }{\partial \bbeta } = \sum _{i=1}^{N} \left[\frac{(2y_{i} - 1)\phi (\mathbf{x}_{i}’\bbeta )}{\Phi ((2y_{i} - 1)\mathbf{x}_{i}’\bbeta )}\right] \mathbf{x}_{i} = \sum _{i=1}^{N}r_{i} \mathbf{x}_{i}  \]

where

\[  r_{i} = \frac{(2y_{i} - 1)\phi (\mathbf{x}_{i}’\bbeta )}{\Phi ((2y_{i} - 1)\mathbf{x}_{i}’\bbeta )}  \]
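Since $\phi$ is symmetric and $1-\Phi (z)=\Phi (-z)$, the single expression $r_{i}=(2y_{i}-1)\phi (\mathbf{x}_{i}’\bbeta )/\Phi ((2y_{i}-1)\mathbf{x}_{i}’\bbeta )$ collapses the $y_{i}=1$ and $y_{i}=0$ cases of the score into one term. This can be verified numerically with an illustrative sketch (Python, hypothetical names, made-up toy data):

```python
import math

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def loglik(beta, X, y):
    """Binary probit log-likelihood."""
    ll = 0.0
    for xi, yi in zip(X, y):
        p = norm_cdf(sum(b * x for b, x in zip(beta, xi)))
        ll += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return ll

def score(beta, X, y):
    """Probit score using r_i = (2y_i - 1) phi(x'b) / Phi((2y_i - 1) x'b)."""
    g = [0.0] * len(beta)
    for xi, yi in zip(X, y):
        xb = sum(b * x for b, x in zip(beta, xi))
        q = 2 * yi - 1
        r = q * norm_pdf(xb) / norm_cdf(q * xb)
        for k in range(len(beta)):
            g[k] += r * xi[k]
    return g

# Hypothetical toy data.
X = [(1.0, 0.2), (1.0, -1.5), (1.0, 0.7), (1.0, 2.0)]
y = [1, 0, 1, 0]
beta = [0.1, -0.3]

# Compare the analytic score to central finite differences.
h = 1e-6
for k in range(len(beta)):
    bp = list(beta); bp[k] += h
    bm = list(beta); bm[k] -= h
    fd = (loglik(bp, X, y) - loglik(bm, X, y)) / (2 * h)
    print(abs(fd - score(beta, X, y)[k]))
```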

Note that the logit maximum likelihood estimates are approximately $\frac{\pi }{\sqrt {3}}$ times greater than the probit maximum likelihood estimates: the probit parameter estimates, $\bbeta $, are standardized against a unit-variance error, whereas the error term with the logistic distribution has a variance of $\frac{\pi ^{2}}{3}$.
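The scale factor can be confirmed numerically: integrating $x^{2}$ against the standard logistic density recovers a variance of $\pi ^{2}/3$, so the logistic error has standard deviation $\pi /\sqrt {3}\approx 1.81$. A quick sketch (Python, stdlib only):

```python
import math

def logistic_pdf(x):
    """Density of the standard logistic distribution (symmetric form)."""
    e = math.exp(-abs(x))
    return e / (1.0 + e) ** 2

# Riemann-sum approximation of the variance (the mean is 0).
lo, hi, n = -40.0, 40.0, 200000
h = (hi - lo) / n
var = sum(((lo + i * h) ** 2) * logistic_pdf(lo + i * h) * h
          for i in range(n + 1))

print(var)                      # close to pi^2 / 3 ~ 3.2899
print(math.pi / math.sqrt(3))   # scale factor ~ 1.8138
```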

Ordinal Probit/Logit

When the dependent variable is observed as an ordered response with $M$ categories, binary discrete choice modeling is not appropriate for data analysis. McKelvey and Zavoina (1975) proposed the ordinal (or ordered) probit model.

Consider the following regression equation:

\[  y_{i}^{*} = \mathbf{x}_{i}’\bbeta + \epsilon _{i}  \]

where the error disturbances, $\epsilon _{i}$, have the distribution function $F$. The unobserved continuous random variable, $y_{i}^{*}$, is observed only through $M$ ordered categories. Suppose there are $M+1$ extended real numbers, $\mu _{0},\cdots ,\mu _{M}$, where $\mu _{0}=-\infty $, $\mu _{1}=0$, $\mu _{M}=\infty $, and $\mu _{0} \leq \mu _{1} \leq \cdots \leq \mu _{M}$. Define

\[  R_{i,j} = \mu _{j} - \mathbf{x}_{i}’\bbeta  \]

The probability that the unobserved dependent variable is contained in the $j$th category can be written as

\[  P[\mu _{j-1}< y_{i}^{*} \leq \mu _{j}] = F(R_{i,j}) - F(R_{i,j-1})  \]
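Because $F(R_{i,0})=F(-\infty )=0$ and $F(R_{i,M})=F(\infty )=1$, the category probabilities telescope to 1 for each observation. A small sketch (Python, hypothetical values) of the ordinal probit case:

```python
import math

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def F(z):
    """Normal CDF extended to the infinite boundary thresholds."""
    if z == float("-inf"):
        return 0.0
    if z == float("inf"):
        return 1.0
    return norm_cdf(z)

# Hypothetical setup: M = 3 categories with thresholds mu_1 = 0, mu_2 = 1,
# plus the boundary values mu_0 = -inf and mu_3 = +inf.
mu = [float("-inf"), 0.0, 1.0, float("inf")]
xb = 0.35  # some value of x_i'beta

probs = [F(mu[j] - xb) - F(mu[j - 1] - xb) for j in range(1, 4)]
print(probs)       # the three category probabilities
print(sum(probs))  # telescopes to 1
```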

The log-likelihood function is

\[  \ell = \sum _{i=1}^{N}\sum _{j=1}^{M}d_{ij}\log \left[F(R_{i,j}) - F(R_{i,j-1})\right]  \]

where

\[  d_{ij} = \left\{  \begin{array}{cl} 1&  \mr {if}\;  \mu _{j-1}< y_{i} \leq \mu _{j} \\ 0&  \mr {otherwise} \end{array} \right.  \]

The first derivatives are written as

\[  \frac{\partial \ell }{\partial \bbeta } = \sum _{i=1}^{N}\sum _{j=1}^{M} d_{ij}\left[\frac{f(R_{i,j-1}) - f(R_{i,j})}{F(R_{i,j})-F(R_{i,j-1})} \mathbf{x}_{i}\right]  \]
\[  \frac{\partial \ell }{\partial \mu _{k}} = \sum _{i=1}^{N}\sum _{j=1}^{M} d_{ij}\left[\frac{\delta _{j,k}f(R_{i,j}) - \delta _{j-1,k}f(R_{i,j-1})}{F(R_{i,j})-F(R_{i,j-1})}\right]  \]

where $f(x) = \frac{d F(x)}{dx}$, $\delta _{j,k}=1$ if $j=k$, and $\delta _{j,k}=0$ otherwise. When the ordinal probit model is estimated, $F(R_{i,j})=\Phi (R_{i,j})$; the ordinal logit model is estimated when $F(R_{i,j})=\Lambda (R_{i,j})$. The first threshold parameter, $\mu _{1}$, is estimated when the LIMIT1=VARYING option is specified. By default (LIMIT1=ZERO), $\mu _{1}$ is fixed at zero, so that $M-2$ threshold parameters ($\mu _{2},\dots ,\mu _{M-1}$) are estimated.
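The $\bbeta$ derivative above can be checked against finite differences of the ordinal probit log-likelihood. The sketch below (Python, hypothetical names and made-up data, $M=3$ with $\mu _{1}=0$) treats the boundary terms by setting $f(R_{i,0})=f(R_{i,M})=0$:

```python
import math

SQRT2 = math.sqrt(2.0)
SQRT2PI = math.sqrt(2.0 * math.pi)

def F(z):  # standard normal CDF
    return 0.5 * (1.0 + math.erf(z / SQRT2))

def f(z):  # standard normal PDF
    return math.exp(-0.5 * z * z) / SQRT2PI

def cat_bounds(xb, mu, j, M):
    """Return (F(R_j), F(R_{j-1})) with the infinite limits handled."""
    hi = 1.0 if j == M else F(mu[j - 1] - xb)
    lo = 0.0 if j == 1 else F(mu[j - 2] - xb)
    return hi, lo

def loglik(beta, mu, X, y, M):
    ll = 0.0
    for xi, yi in zip(X, y):
        xb = sum(b * x for b, x in zip(beta, xi))
        hi, lo = cat_bounds(xb, mu, yi, M)
        ll += math.log(hi - lo)
    return ll

def score_beta(beta, mu, X, y, M):
    """Analytic d ell / d beta: [f(R_{j-1}) - f(R_j)] /
    [F(R_j) - F(R_{j-1})] times x_i, summed over observations."""
    g = [0.0] * len(beta)
    for xi, yi in zip(X, y):
        xb = sum(b * x for b, x in zip(beta, xi))
        fj = 0.0 if yi == M else f(mu[yi - 1] - xb)
        fjm1 = 0.0 if yi == 1 else f(mu[yi - 2] - xb)
        hi, lo = cat_bounds(xb, mu, yi, M)
        w = (fjm1 - fj) / (hi - lo)
        for k in range(len(beta)):
            g[k] += w * xi[k]
    return g

# Hypothetical toy data: categories coded 1..3, thresholds mu_1=0, mu_2=1.
X = [(1.0, -0.5), (1.0, 0.3), (1.0, 1.2), (1.0, -0.2)]
y = [1, 2, 3, 2]
beta, mu, M = [0.1, 0.4], [0.0, 1.0], 3

# Central finite differences should agree with the analytic score.
h = 1e-6
for k in range(len(beta)):
    bp = list(beta); bp[k] += h
    bm = list(beta); bm[k] -= h
    fd = (loglik(bp, mu, X, y, M) - loglik(bm, mu, X, y, M)) / (2 * h)
    print(abs(fd - score_beta(beta, mu, X, y, M)[k]))
```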

Ordered probit models were analyzed by Aitchison and Silvey (1957), and Cox (1970) discussed ordered response data by using the logit model. They defined the probability that $y_{i}^{*}$ belongs to the $j$th category as

\[  P[\mu _{j-1}< y_{i}^{*} \leq \mu _{j}] = F(\mu _{j}+\mathbf{x}_{i}’\btheta ) - F(\mu _{j-1}+\mathbf{x}_{i}’\btheta )  \]

where $\mu _{0}=-\infty $ and $\mu _{M}=\infty $. Therefore, the ordered response model analyzed by Aitchison and Silvey can be estimated if the LIMIT1=VARYING option is specified. Note that $\btheta =-\bbeta $.

Goodness-of-Fit Measures

The goodness-of-fit measures discussed in this section apply only to discrete dependent variable models.

McFadden (1974) suggested a likelihood ratio index that is analogous to the $R^{2}$ in the linear regression model:

\[  R^{2}_{M} = 1 - \frac{\ln L}{\ln L_{0}}  \]

where $L$ is the value of the maximized likelihood function and $L_{0}$ is the value of the likelihood function when all regression coefficients except the intercept term are zero. It can be shown that $\ln L_{0}$ can be written as

\[  \ln L_{0} = \sum _{j=1}^{M} N_{j} \ln \left(\frac{N_{j}}{N}\right)  \]

where $N_{j}$ is the number of responses in category $j$.
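This follows because the intercept-only model fits each category probability by its sample share $N_{j}/N$, which maximizes the multinomial log-likelihood. A short sketch (Python, hypothetical counts) illustrates the computation and the maximality:

```python
import math

# Hypothetical category counts N_j for a 3-category response.
counts = [30, 50, 20]
N = sum(counts)

# ln L_0 for the intercept-only model: fitted probability of
# category j is its sample share N_j / N.
lnL0 = sum(Nj * math.log(Nj / N) for Nj in counts)
print(lnL0)

# Sanity check: any other probability vector gives a lower log-likelihood.
alt = [0.3, 0.4, 0.3]
lnL_alt = sum(Nj * math.log(pj) for Nj, pj in zip(counts, alt))
print(lnL0 > lnL_alt)
```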

Estrella (1998) proposes the following requirements for a goodness-of-fit measure to be desirable in discrete choice modeling:

  • The measure must take values in $[0,1]$, where 0 represents no fit and 1 corresponds to perfect fit.

  • The measure should be directly related to the valid test statistic for significance of all slope coefficients.

  • The derivative of the measure with respect to the test statistic should comply with corresponding derivatives in a linear regression.

Estrella’s (1998) measure is written

\[  R_{E1}^{2} = 1 - \left(\frac{\ln L}{\ln L_{0}}\right) ^{-\frac{2}{N}\ln L_{0}}  \]

An alternative measure suggested by Estrella (1998) is

\[  R_{E2}^{2} = 1 - [ (\ln L - K) / \ln L_{0} ]^{-\frac{2}{N}\ln L_{0}}  \]

where $\ln L_{0}$ is computed with the slope parameters set to zero, $N$ is the number of observations used, and $K$ represents the number of estimated parameters.

Other goodness-of-fit measures are summarized as follows:

\[  R_{CU1}^{2} = 1 - \left(\frac{L_{0}}{L}\right)^{\frac{2}{N}} \; \; (\mr {Cragg-Uhler 1})  \]
\[  R_{CU2}^{2} = \frac{1 - (L_{0}/L)^{\frac{2}{N}}}{1 - L_{0}^{\frac{2}{N}}} \; \; (\mr {Cragg-Uhler 2})  \]
\[  R_{A}^{2} = \frac{2(\ln L - \ln L_{0})}{2(\ln L - \ln L_{0})+N} \; \; (\mr {Aldrich-Nelson})  \]
\[  R_{VZ}^{2} = R_{A}^{2}\frac{2\ln L_{0} - N}{2\ln L_{0}} \; \; (\mr {Veall-Zimmermann})  \]
\[  R_{MZ}^{2} = \frac{\sum _{i=1}^{N}(\hat{y}_{i} - \bar{\hat{y_{i}}})^{2}}{N +\sum _{i=1}^{N}(\hat{y}_{i} - \bar{\hat{y_{i}}})^{2}} \; \; (\mr {McKelvey-Zavoina})  \]

where $\hat{y}_{i}=\mathbf{x}_{i}’\hat{\bbeta }$ and $\bar{\hat{y_{i}}} = \sum _{i=1}^{N} \hat{y}_{i} / N$.
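Given the two log-likelihood values, several of the measures above are direct arithmetic. An illustrative sketch (Python, with made-up log-likelihoods; Cragg-Uhler 1 is computed in logs to avoid exponentiating large likelihoods):

```python
import math

# Hypothetical fitted values: log-likelihoods, sample size, parameter count.
lnL, lnL0, N, K = -45.0, -66.4, 100, 3

r2_mcfadden = 1.0 - lnL / lnL0
r2_e1 = 1.0 - (lnL / lnL0) ** (-(2.0 / N) * lnL0)          # Estrella 1
r2_cu1 = 1.0 - math.exp((2.0 / N) * (lnL0 - lnL))          # (L0/L)^{2/N}
r2_an = 2.0 * (lnL - lnL0) / (2.0 * (lnL - lnL0) + N)      # Aldrich-Nelson

print(r2_mcfadden, r2_e1, r2_cu1, r2_an)
```

Each measure lies in $(0,1)$ for these inputs; McKelvey-Zavoina additionally requires the fitted latent values $\hat{y}_{i}$, so it is not reproducible from the log-likelihoods alone.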