The main motivation for zero-inflated count models is that real-life data frequently display overdispersion and excess zeros.
Zero-inflated count models provide a way of modeling the excess zeros in addition to allowing for overdispersion. In particular,
for each observation, there are two possible data generation processes. The result of a Bernoulli trial is used to determine
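which of the two processes is used. For observation $i$, Process 1 is chosen with probability $\varphi_i$
and Process 2 with probability $1 - \varphi_i$. Process 1 generates only zero counts. Process 2 generates counts
from either a Poisson or a negative binomial model. In general,

$$
y_i \sim
\begin{cases}
0 & \text{with probability } \varphi_i \\
g(y_i) & \text{with probability } 1 - \varphi_i
\end{cases}
$$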
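Therefore, the probability of $\{Y_i = y_i\}$ can be described as

$$
\begin{aligned}
P(Y_i = 0) &= \varphi_i + (1 - \varphi_i)\, g(0) \\
P(Y_i = y_i) &= (1 - \varphi_i)\, g(y_i), \qquad y_i > 0
\end{aligned}
$$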
where $g(y_i)$ is the probability function of either the Poisson or the negative binomial distribution. You can
save the estimated probability $\varphi_i$ to the output data set with the PROBZERO= option in the OUTPUT statement.
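
As an illustration of the two-process mechanism, the following Python sketch evaluates the zero-inflated Poisson probabilities and simulates draws from the same mixture. It is not part of the TCOUNTREG procedure: the function names, the choice of the Poisson case, and the values used for the Poisson mean (`lam`) and the zero-inflation probability (`phi`) are assumptions made only for this example.

```python
import math
import random


def poisson_pmf(y, lam):
    """Poisson probability of the count y for a Poisson mean lam."""
    return math.exp(-lam) * lam ** y / math.factorial(y)


def zip_pmf(y, lam, phi):
    """Zero-inflated Poisson probability of observing the count y.

    phi is the probability that Process 1 (always zero) is chosen.
    """
    if y == 0:
        # A zero can come from Process 1 or from the Poisson process.
        return phi + (1.0 - phi) * poisson_pmf(0, lam)
    # A positive count can come only from the Poisson process.
    return (1.0 - phi) * poisson_pmf(y, lam)


def draw_poisson(rng, lam):
    """Draw one Poisson variate by multiplying uniforms (Knuth's method)."""
    threshold = math.exp(-lam)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def draw_zip(rng, lam, phi):
    """A Bernoulli trial selects Process 1 (zero) or Process 2 (Poisson)."""
    if rng.random() < phi:
        return 0
    return draw_poisson(rng, lam)


# Hypothetical parameter values, chosen only for illustration.
rng = random.Random(0)
sample = [draw_zip(rng, lam=2.0, phi=0.3) for _ in range(20000)]
share_of_zeros = sample.count(0) / len(sample)
```

With a large sample, `share_of_zeros` should be close to `zip_pmf(0, 2.0, 0.3)`, which exceeds the plain Poisson probability of zero because of the extra mass contributed by Process 1.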
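When the probability $\varphi_i$ depends on the characteristics of observation $i$, $\varphi_i$ is written as a
function of $\mathbf{z}_i'\boldsymbol{\gamma}$, where $\mathbf{z}_i'$ is the $1 \times (q+1)$ vector of
zero-inflation covariates and $\boldsymbol{\gamma}$ is the $(q+1) \times 1$ vector of zero-inflation coefficients
to be estimated. (The zero-inflation intercept is $\gamma_0$; the coefficients for the $q$ zero-inflation
covariates are $\gamma_1, \ldots, \gamma_q$.) The function $F$ that relates the product
$\mathbf{z}_i'\boldsymbol{\gamma}$ (which is a scalar) to the probability $\varphi_i$ is called the zero-inflation
link function,

$$
\varphi_i = F(\mathbf{z}_i'\boldsymbol{\gamma})
$$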
In the TCOUNTREG procedure, the zero-inflation covariates are indicated in the ZEROMODEL statement. Furthermore, the zero-inflation
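link function $F$ can be specified as either the logistic function,

$$
F(\mathbf{z}_i'\boldsymbol{\gamma}) = \Lambda(\mathbf{z}_i'\boldsymbol{\gamma})
= \frac{\exp(\mathbf{z}_i'\boldsymbol{\gamma})}{1 + \exp(\mathbf{z}_i'\boldsymbol{\gamma})}
$$

or the standard normal cumulative distribution function (also called the probit function),

$$
F(\mathbf{z}_i'\boldsymbol{\gamma}) = \Phi(\mathbf{z}_i'\boldsymbol{\gamma})
= \int_{-\infty}^{\mathbf{z}_i'\boldsymbol{\gamma}} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{u^2}{2}\right) du
$$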
The zero-inflation link function is specified with the LINK= option in the ZEROMODEL statement. The default zero-inflation link function is the logistic function.
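
To make the two link functions concrete, the following Python sketch maps the scalar product of the zero-inflation covariates and coefficients to a zero-inflation probability. It is only an illustration and does not reproduce the procedure's internal computations: the function names and the covariate and coefficient values are hypothetical, and the covariate vector is assumed to carry a leading 1 for the intercept.

```python
import math


def logistic_link(t):
    """Logistic function exp(t) / (1 + exp(t)), written to avoid overflow."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def probit_link(t):
    """Standard normal cumulative distribution function at t."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def zero_inflation_probability(z, gamma, link=logistic_link):
    """Apply the link to the scalar product of covariates and coefficients.

    z is assumed to start with 1.0 so that gamma[0] acts as the intercept.
    """
    index = sum(z_j * g_j for z_j, g_j in zip(z, gamma))
    return link(index)


# Hypothetical values: intercept 0.5 and one covariate with coefficient -0.25.
z_i = [1.0, 1.0]
gamma = [0.5, -0.25]
phi_logistic = zero_inflation_probability(z_i, gamma, link=logistic_link)
phi_probit = zero_inflation_probability(z_i, gamma, link=probit_link)
```

Both links return values strictly between 0 and 1 and agree at an index of zero, where each gives 0.5. For other index values they generally differ, so coefficients estimated under one link are not directly comparable to those estimated under the other.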