The COUNTREG Procedure

Negative Binomial Regression

The Poisson regression model can be generalized by introducing an unobserved heterogeneity term for observation i. Thus, the individuals are assumed to differ randomly in a manner that is not fully accounted for by the observed covariates. This is formulated as

\[  E(y_{i}|\mathbf{x}_{i}, \tau _{i}) = \mu _{i} \tau _{i} = e^{\mathbf{x}_{i}'\bbeta + \epsilon _{i}}  \]

where the unobserved heterogeneity term $\tau _{i} = e^{\epsilon _{i}}$ is independent of the vector of regressors $\mathbf{x}_{i}$. Then the distribution of $y_{i}$ conditional on $\mathbf{x}_{i}$ and $\tau _{i}$ is Poisson with conditional mean and conditional variance $\mu _{i} \tau _{i}$:

\[  f(y_{i}|\mathbf{x}_{i},\tau _{i}) = \frac{\exp (-\mu _{i}\tau _{i}) (\mu _{i}\tau _{i})^{y_{i}}}{y_{i}!}  \]

Let $g(\tau _{i})$ be the probability density function of $\tau _{i}$. Then, the distribution $f(y_{i}|\mathbf{x}_{i})$ (no longer conditional on $\tau _{i}$) is obtained by integrating $f(y_{i}|\mathbf{x}_{i},\tau _{i})$ with respect to $\tau _{i}$:

\[  f(y_{i}|\mathbf{x}_{i}) = \int _{0}^{\infty } f(y_{i}|\mathbf{x}_{i},\tau _{i}) g(\tau _{i}) d\tau _{i}  \]

An analytical solution to this integral exists when $\tau _{i}$ is assumed to follow a gamma distribution. This solution is the negative binomial distribution. When the model contains a constant term, it is necessary to assume that $E(e^{\epsilon _{i}}) = E(\tau _{i}) =1$ in order to identify the mean of the distribution. Thus, it is assumed that $\tau _{i}$ follows a gamma($\theta , \theta $) distribution with $E(\tau _{i}) =1$ and $V(\tau _{i}) = 1/\theta $,

\[  g(\tau _{i}) = \frac{\theta ^{\theta }}{\Gamma (\theta )} \tau _{i}^{\theta -1}\exp (-\theta \tau _{i})  \]

where $\Gamma (x)=\int _{0}^{\infty }z^{x-1}\exp (-z)dz$ is the gamma function and $\theta $ is a positive parameter. Then, the density of $y_{i}$ given $\mathbf{x}_{i}$ is derived as

\begin{eqnarray*}
f(y_{i}|\mathbf{x}_{i}) & = & \int _{0}^{\infty }f(y_{i}|\mathbf{x}_{i},\tau _{i}) g(\tau _{i})d\tau _{i} \\
& = & \frac{\theta ^{\theta } \mu _{i}^{y_{i}}}{y_{i}!\Gamma (\theta )} \int _{0}^{\infty } e^{-(\mu _{i}+\theta )\tau _{i}}\tau _{i}^{\theta +y_{i}-1}d\tau _{i} \\
& = & \frac{\theta ^{\theta }\mu _{i}^{y_{i}}\Gamma (y_{i}+\theta )}{y_{i}!\Gamma (\theta )(\theta +\mu _{i})^{\theta +y_{i}}} \\
& = & \frac{\Gamma (y_{i}+\theta )}{y_{i}!\Gamma (\theta )} \left(\frac{\theta }{\theta +\mu _{i}}\right)^{\theta } \left(\frac{\mu _{i}}{\theta +\mu _{i}}\right)^{y_{i}}
\end{eqnarray*}

With the substitution $\alpha =\frac{1}{\theta }$ ($\alpha > 0$), the negative binomial distribution can be rewritten as

\[  f(y_{i}|\mathbf{x}_{i}) = \frac{\Gamma (y_{i}+\alpha ^{-1})}{y_{i}!\Gamma (\alpha ^{-1})} \left(\frac{\alpha ^{-1}}{\alpha ^{-1}+\mu _{i}}\right)^{\alpha ^{-1}} \left(\frac{\mu _{i}}{\alpha ^{-1}+\mu _{i}}\right)^{y_{i}}, \quad y_{i} = 0,1,2,\ldots  \]
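As a numerical sanity check of this derivation, the mixing integral can be evaluated by quadrature and compared with the closed-form density, which is the pmf of scipy.stats.nbinom with size $\theta $ and success probability $\theta /(\theta +\mu _{i})$. The following Python sketch is illustrative only; the values of mu, theta, and y are arbitrary and not part of the procedure.

# Numerical check: integrate the Poisson(mu*tau) pmf against the
# gamma(theta, theta) density and compare with the closed-form
# negative binomial pmf derived above. Values are illustrative.
import numpy as np
from scipy import integrate, stats

mu, theta, y = 2.5, 4.0, 3

def integrand(tau):
    # f(y | x, tau) * g(tau): Poisson pmf times gamma(theta, theta) density
    return stats.poisson.pmf(y, mu * tau) * stats.gamma.pdf(tau, a=theta, scale=1.0 / theta)

mixture, _ = integrate.quad(integrand, 0.0, np.inf)

# Closed form: nbinom with size theta and success probability theta/(theta + mu)
closed_form = stats.nbinom.pmf(y, theta, theta / (theta + mu))

print(mixture, closed_form)  # the two values agree to quadrature accuracy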

Thus, the negative binomial distribution is derived as a gamma mixture of Poisson random variables. It has conditional mean

\[  E(y_{i}|\mathbf{x}_{i})=\mu _{i} = e^{\mathbf{x}_{i}'\bbeta }  \]

and conditional variance

\[  V(y_{i}|\mathbf{x}_{i}) = \mu _{i} [1+\frac{1}{\theta }\mu _{i}] = \mu _{i}[1+\alpha \mu _{i}] > E(y_{i}|\mathbf{x}_{i})  \]
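The overdispersion relation can be illustrated by simulation: draw $\tau _{i}$ from gamma($\theta , \theta $), then $y_{i}$ from Poisson($\mu _{i}\tau _{i}$), and compare the sample variance with $\mu _{i}(1+\alpha \mu _{i})$. A minimal Python sketch, with illustrative parameter values:

# Simulation sketch of the overdispersion result: tau_i ~ gamma(theta, theta),
# y_i | tau_i ~ Poisson(mu_i * tau_i). Parameter values are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
mu, theta = 2.5, 4.0
alpha = 1.0 / theta

tau = rng.gamma(shape=theta, scale=1.0 / theta, size=1_000_000)
y = rng.poisson(mu * tau)

print(y.mean(), mu)                    # sample mean is close to mu = 2.5
print(y.var(), mu * (1 + alpha * mu))  # sample variance is close to 4.0625 > mu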

The conditional variance of the negative binomial distribution exceeds the conditional mean; this overdispersion results from the neglected unobserved heterogeneity. The negative binomial model with variance function $V(y_{i}|\mathbf{x}_{i}) = \mu _{i}+\alpha \mu _{i}^2$, which is quadratic in the mean, is referred to as the NEGBIN2 model (Cameron and Trivedi, 1986). To estimate this model, specify DIST=NEGBIN(p=2) in the MODEL statement. The Poisson distribution is a special case of the negative binomial distribution where $\alpha =0$. A test of the Poisson distribution can be carried out by testing the hypothesis that $\alpha =\frac{1}{\theta }=0$. A Wald test of this hypothesis is provided (it is the reported t statistic for the estimated $\alpha $ in the negative binomial model).
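For readers working outside SAS, a rough Python analogue of this NEGBIN2 fit is sketched below using the NegativeBinomial class in statsmodels, whose summary reports a Wald (z) test for the estimated $\alpha $ analogous to the t statistic described above. The simulated data and all parameter values are illustrative, not part of the COUNTREG procedure.

# Sketch of an analogous NEGBIN2 fit via statsmodels; the 'alpha' row
# of the summary carries the Wald test of alpha = 0. Simulated data;
# names and values are illustrative only.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
X = sm.add_constant(x)
mu = np.exp(0.5 + 0.8 * x)

theta = 4.0                              # true alpha = 1/theta = 0.25
tau = rng.gamma(theta, 1.0 / theta, size=n)
y = rng.poisson(mu * tau)

res = sm.NegativeBinomial(y, X, loglike_method='nb2').fit(disp=0)
print(res.summary())                     # Wald test for alpha appears in the output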

The log-likelihood function of the negative binomial regression model (NEGBIN2) is given by

\begin{eqnarray*}
\mathcal{L} & = & \sum _{i=1}^{N} w_{i}\Bigg\{ \sum _{j=0}^{y_{i}-1}\ln (j+\alpha ^{-1}) -\ln (y_{i}!) \\
& & - (y_{i}+\alpha ^{-1}) \ln (1+\alpha \exp (\mathbf{x}_{i}^{\prime }\bbeta )) + y_{i}\ln (\alpha ) + y_{i}\mathbf{x}_{i}^{\prime }\bbeta \Bigg\}
\end{eqnarray*}

where the equality

\[  \Gamma (y+a)/\Gamma (a)= \prod _{j=0}^{y-1} (j+a)  \]

which holds when $y$ is an integer, has been used to write the ratio of gamma functions as a sum of logarithms. See the section Poisson Regression for the definition of $w_{i}$.
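A minimal Python sketch of this log-likelihood follows, with the sum of $\ln (j+\alpha ^{-1})$ written as the equivalent gammaln difference and the weights $w_{i}$ taken as 1. The function and variable names are illustrative, not part of COUNTREG.

# NEGBIN2 log-likelihood sketch: sum_{j=0}^{y-1} ln(j + 1/alpha) is
# computed as gammaln(y + 1/alpha) - gammaln(1/alpha); weights w_i = 1.
import numpy as np
from scipy.special import gammaln

def negbin2_loglik(beta, alpha, X, y):
    mu = np.exp(X @ beta)
    inv_a = 1.0 / alpha
    ll = (gammaln(y + inv_a) - gammaln(inv_a)   # sum of ln(j + 1/alpha)
          - gammaln(y + 1)                      # -ln(y!)
          - (y + inv_a) * np.log1p(alpha * mu)
          + y * np.log(alpha) + y * np.log(mu))
    return ll.sum()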

The gradient is

\[  \frac{\partial \mathcal{L}}{\partial \bbeta } = \sum _{i=1}^{N} w_{i}\frac{y_{i}-\mu _{i}}{1+\alpha \mu _{i}} \mathbf{x}_{i}  \]

and

\[  \frac{\partial \mathcal{L}}{\partial \alpha } = \sum _{i=1}^{N} w_{i}\left\{  - \alpha ^{-2} \sum _{j=0}^{y_{i}-1} \frac{1}{(j + \alpha ^{-1})} + \alpha ^{-2} \ln (1+\alpha \mu _{i}) + \frac{y_{i}-\mu _{i}}{\alpha (1+\alpha \mu _{i})}\right\}   \]
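These gradient expressions can be verified against finite differences of the log-likelihood. The following self-contained Python sketch repeats the NEGBIN2 log-likelihood and checks the analytic derivatives numerically; all data and parameter values are illustrative.

# Finite-difference check of the NEGBIN2 gradient expressions above.
import numpy as np
from scipy.special import gammaln

def loglik(beta, alpha, X, y):
    mu = np.exp(X @ beta)
    inv_a = 1.0 / alpha
    return np.sum(gammaln(y + inv_a) - gammaln(inv_a) - gammaln(y + 1)
                  - (y + inv_a) * np.log1p(alpha * mu)
                  + y * np.log(alpha) + y * np.log(mu))

def grad(beta, alpha, X, y):
    mu = np.exp(X @ beta)
    inv_a = 1.0 / alpha
    g_beta = X.T @ ((y - mu) / (1.0 + alpha * mu))
    # sum_{j=0}^{y_i - 1} 1/(j + 1/alpha), computed per observation
    s = np.array([np.sum(1.0 / (np.arange(yi) + inv_a)) for yi in y])
    g_alpha = np.sum(-inv_a**2 * s + inv_a**2 * np.log1p(alpha * mu)
                     + (y - mu) / (alpha * (1.0 + alpha * mu)))
    return g_beta, g_alpha

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
beta, alpha = np.array([0.4, 0.7]), 0.5
y = rng.poisson(np.exp(X @ beta) * rng.gamma(1.0 / alpha, alpha, size=200))

g_beta, g_alpha = grad(beta, alpha, X, y)
eps = 1e-6
fd_alpha = (loglik(beta, alpha + eps, X, y) - loglik(beta, alpha - eps, X, y)) / (2 * eps)
e0 = np.array([eps, 0.0])
fd_beta0 = (loglik(beta + e0, alpha, X, y) - loglik(beta - e0, alpha, X, y)) / (2 * eps)
print(g_alpha, fd_alpha)     # analytic and numeric values agree closely
print(g_beta[0], fd_beta0)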

Cameron and Trivedi (1986) consider a general class of negative binomial models with mean $\mu _{i}$ and variance function $\mu _{i}+\alpha \mu _{i}^{p}$. The NEGBIN2 model, with $p =2$, is the standard formulation of the negative binomial model. Models with other values of p, $- \infty < p < \infty $, have the same density $f(y_{i}|\mathbf{x}_{i})$ except that $\alpha ^{-1}$ is replaced everywhere by $\alpha ^{-1} \mu _{i}^{2-p}$. The negative binomial model NEGBIN1, which sets $p=1$, has variance function $V(y_{i}|\mathbf{x}_{i}) = \mu _{i} + \alpha \mu _{i}$, which is linear in the mean. To estimate this model, specify DIST=NEGBIN(p=1) in the MODEL statement.
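Concretely, setting $p=1$ and replacing $\alpha ^{-1}$ by $\alpha ^{-1}\mu _{i}$ in the density gives

\[  f(y_{i}|\mathbf{x}_{i}) = \frac{\Gamma (y_{i}+\alpha ^{-1}\mu _{i})}{y_{i}!\, \Gamma (\alpha ^{-1}\mu _{i})} \left(\frac{1}{1+\alpha }\right)^{\alpha ^{-1}\mu _{i}} \left(\frac{\alpha }{1+\alpha }\right)^{y_{i}}  \]

since $\alpha ^{-1}\mu _{i}/(\alpha ^{-1}\mu _{i}+\mu _{i}) = 1/(1+\alpha )$ and $\mu _{i}/(\alpha ^{-1}\mu _{i}+\mu _{i}) = \alpha /(1+\alpha )$. Taking logarithms and applying the same product identity as before yields the log-likelihood that follows.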

The log-likelihood function of the NEGBIN1 regression model is given by

\begin{eqnarray*}
\mathcal{L} & = & \sum _{i=1}^{N} w_{i}\Bigg\{ \sum _{j=0}^{y_{i}-1} \ln \left(j+\alpha ^{-1}\exp (\mathbf{x}_{i}^{\prime }\bbeta ) \right) \\
& & -\ln (y_{i}!) -\left(y_{i}+\alpha ^{-1}\exp (\mathbf{x}_{i}^{\prime }\bbeta )\right)\ln (1+\alpha ) + y_{i}\ln (\alpha ) \Bigg\}
\end{eqnarray*}

See the section Poisson Regression for the definition of $w_{i}$.
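A minimal Python sketch of the NEGBIN1 log-likelihood, again with the product over $j$ replaced by a gammaln difference and the weights $w_{i}$ taken as 1 (names are illustrative):

# NEGBIN1 log-likelihood sketch: the shape parameter is now
# a_i = exp(x'beta)/alpha, which varies with the observation.
import numpy as np
from scipy.special import gammaln

def negbin1_loglik(beta, alpha, X, y):
    mu = np.exp(X @ beta)
    a = mu / alpha                        # alpha^{-1} * exp(x'beta)
    return np.sum(gammaln(y + a) - gammaln(a) - gammaln(y + 1)
                  - (y + a) * np.log1p(alpha) + y * np.log(alpha))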

The gradient is

\[  \frac{\partial \mathcal{L}}{\partial \bbeta } = \sum _{i=1}^{N} w_ i\left\{  \left( \sum _{j=0}^{y_{i}-1} \frac{\mu _{i}}{(j \alpha + \mu _{i})} \right) \mathbf{x}_{i} - \alpha ^{-1} \ln (1+\alpha )\mu _{i} \mathbf{x}_{i} \right\}   \]

and

\[  \frac{\partial \mathcal{L}}{\partial \alpha } = \sum _{i=1}^{N} w_{i}\left\{  - \left( \sum _{j=0}^{y_{i}-1} \frac{\alpha ^{-1}\mu _{i}}{(j \alpha + \mu _{i})} \right) + \alpha ^{-2} \mu _{i} \ln (1+\alpha ) - \frac{(y_{i}+\alpha ^{-1}\mu _{i})}{1+\alpha } + \frac{y_{i}}{\alpha } \right\}   \]
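As with NEGBIN2, this gradient can be checked against a finite difference of the log-likelihood. A self-contained Python sketch, with illustrative data and parameter values:

# Finite-difference check of the NEGBIN1 gradient in alpha above.
import numpy as np
from scipy.special import gammaln

def loglik(beta, alpha, X, y):
    mu = np.exp(X @ beta)
    a = mu / alpha
    return np.sum(gammaln(y + a) - gammaln(a) - gammaln(y + 1)
                  - (y + a) * np.log1p(alpha) + y * np.log(alpha))

def grad_alpha(beta, alpha, X, y):
    mu = np.exp(X @ beta)
    # sum_{j=0}^{y_i - 1} mu_i / (j*alpha + mu_i), per observation
    s = np.array([np.sum(m / (np.arange(yi) * alpha + m))
                  for m, yi in zip(mu, y)])
    return np.sum(-s / alpha
                  + mu * np.log1p(alpha) / alpha**2
                  - (y + mu / alpha) / (1.0 + alpha)
                  + y / alpha)

rng = np.random.default_rng(2)
X = np.column_stack([np.ones(300), rng.normal(size=300)])
beta, alpha = np.array([0.3, 0.6]), 0.4
y = rng.poisson(np.exp(X @ beta))   # any nonnegative counts suffice for the check

eps = 1e-6
fd = (loglik(beta, alpha + eps, X, y) - loglik(beta, alpha - eps, X, y)) / (2 * eps)
print(grad_alpha(beta, alpha, X, y), fd)   # analytic and numeric values agree closely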