Discrete choice logit models fall in the framework of a generalized linear model (GLM) with a logit link. The Metropolis-Hastings sampling approach of Gamerman (1997) is well suited to this type of model.
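In the GLM setting, the data $y_i$ are assumed to be independent with exponential family density

\[ f(y_i \mid \theta_i) = \exp\left[ \frac{y_i \theta_i - b(\theta_i)}{\phi_i} \right] c(y_i, \phi_i) \]

The means $\mu_i = E(y_i)$ are related to the canonical parameters $\theta_i$ via $\mu_i = b'(\theta_i)$ and to the regression coefficients $\beta$ via the link function

\[ g(\mu_i) = \eta_i = x_i' \beta \]

The maximum likelihood (ML) estimator in a GLM and the asymptotic variance are obtained by iterative application of weighted least squares (IWLS) to transformed observations. Following McCullagh and Nelder (1989), define the transformed response as

\[ \tilde{y}_i(\beta) = \eta_i + (y_i - \mu_i) \, g'(\mu_i) \]

and define the corresponding weights as

\[ W_i^{-1}(\beta) = b''(\theta_i) \, \{ g'(\mu_i) \}^2 \]

Suppose a normal prior is specified on $\beta$, $\beta \sim N(a, R)$. The posterior density is as follows:

\[ \pi(\beta \mid y) \propto \exp\left[ \sum_i \frac{y_i \theta_i - b(\theta_i)}{\phi_i} - \frac{1}{2} (\beta - a)' R^{-1} (\beta - a) \right] \]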
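To make these definitions concrete, the following is a minimal Python sketch for the simplest special case, a binary logit model. This is an assumption made for illustration only; the discrete choice models discussed here are multinomial. In the binary case $b(\theta) = \log(1 + e^{\theta})$, $\phi_i = 1$, $\mu_i = 1 / (1 + e^{-\eta_i})$, and $g'(\mu_i) = 1 / \{\mu_i (1 - \mu_i)\}$, so the weights reduce to $W_i = \mu_i (1 - \mu_i)$. The function names `iwls_quantities` and `log_posterior` are hypothetical and do not belong to any library.

```python
import numpy as np

def iwls_quantities(beta, X, y):
    """Transformed response and weights for a binary logit model (illustrative)."""
    eta = X @ beta                      # linear predictor eta_i = x_i' beta
    mu = 1.0 / (1.0 + np.exp(-eta))     # mean mu_i = b'(theta_i), the logistic function
    var = mu * (1.0 - mu)               # b''(theta_i) for the Bernoulli family
    g_prime = 1.0 / var                 # derivative of the logit link g(mu)
    y_tilde = eta + (y - mu) * g_prime  # transformed response
    w = 1.0 / (var * g_prime ** 2)      # weights W_i, equal to mu_i (1 - mu_i) here
    return y_tilde, w

def log_posterior(beta, X, y, a, R_inv):
    """Log posterior density, up to an additive constant, under the N(a, R) prior."""
    eta = X @ beta
    # Bernoulli log likelihood: sum of y_i * theta_i - b(theta_i), with theta_i = eta_i
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))
    diff = beta - a
    return loglik - 0.5 * diff @ R_inv @ diff

# Small made-up data set, used only to exercise the functions
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 1.0, 1.0])
y_tilde, w = iwls_quantities(np.zeros(2), X, y)
lp = log_posterior(np.zeros(2), X, y, np.zeros(2), np.eye(2))
```

At $\beta = 0$ every $\mu_i$ equals 0.5, so all weights equal 0.25 and the transformed response is $4 (y_i - 0.5)$.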
Gamerman (1997) proposes that Metropolis-Hastings sampling be combined with iterative weighted least squares as follows:
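1. Start with $\beta^{(0)}$ and $t = 1$.
2. Sample $\beta^*$ from the proposal density $N(m^{(t)}, C^{(t)})$, where

   \[ m^{(t)} = \{ R^{-1} + X' W(\beta^{(t-1)}) X \}^{-1} \{ R^{-1} a + X' W(\beta^{(t-1)}) \, \tilde{y}(\beta^{(t-1)}) \} \]

   \[ C^{(t)} = \{ R^{-1} + X' W(\beta^{(t-1)}) X \}^{-1} \]

   Here $X$ is the design matrix with rows $x_i'$, and $W(\beta)$ is the diagonal matrix of the weights $W_i(\beta)$.
3. Accept $\beta^*$ with probability

   \[ \alpha(\beta^{(t-1)}, \beta^*) = \min\left[ 1, \frac{\pi(\beta^* \mid y) \, q(\beta^*, \beta^{(t-1)})}{\pi(\beta^{(t-1)} \mid y) \, q(\beta^{(t-1)}, \beta^*)} \right] \]

   where $\pi(\beta \mid y)$ is the posterior density and $q(\beta^*, \beta^{(t-1)})$ and $q(\beta^{(t-1)}, \beta^*)$ are the transition probabilities that are based on the proposal density $N(m^{(\cdot)}, C^{(\cdot)})$. More specifically, $q(\beta^{(t-1)}, \beta^*)$ is an $N(m^{(t)}, C^{(t)})$ density that is evaluated at $\beta^*$, whereas $q(\beta^*, \beta^{(t-1)})$ is an $N(m^*, C^*)$ density that is evaluated at $\beta^{(t-1)}$, where $m^*$ and $C^*$ have the same expression as $m^{(t)}$ and $C^{(t)}$ but depend on $\beta^*$ instead of $\beta^{(t-1)}$. If $\beta^*$ is not accepted, the chain stays with $\beta^{(t-1)}$.
4. Set $t = t + 1$ and return to step 2.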
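The steps above can be sketched in Python as follows. This is an illustrative sketch and not the implementation of any particular software: it assumes a binary logit model (so the weights are $\mu_i (1 - \mu_i)$ and $\phi_i = 1$), it starts the chain at zero, and the names `proposal_moments`, `log_normal_density`, `log_posterior`, and `gamerman_sampler` are hypothetical.

```python
import numpy as np

def proposal_moments(beta, X, y, a, R_inv):
    """Mean m and covariance C of the weighted least squares proposal at beta."""
    eta = X @ beta
    mu = 1.0 / (1.0 + np.exp(-eta))
    w = mu * (1.0 - mu)                     # logit weights W_i
    wz = w * eta + (y - mu)                 # W_i * y_tilde_i, written without dividing by w
    C = np.linalg.inv(R_inv + X.T @ (w[:, None] * X))
    m = C @ (R_inv @ a + X.T @ wz)
    return m, C

def log_normal_density(x, m, C):
    """Log density of N(m, C) evaluated at x."""
    d = x - m
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * (len(x) * np.log(2.0 * np.pi) + logdet + d @ np.linalg.solve(C, d))

def log_posterior(beta, X, y, a, R_inv):
    """Log posterior up to a constant: Bernoulli log likelihood plus normal log prior."""
    eta = X @ beta
    d = beta - a
    return np.sum(y * eta - np.logaddexp(0.0, eta)) - 0.5 * d @ R_inv @ d

def gamerman_sampler(X, y, a, R, n_iter, rng):
    R_inv = np.linalg.inv(R)
    beta = np.zeros(X.shape[1])             # step 1: starting value (assumed to be zero)
    draws = np.empty((n_iter, X.shape[1]))
    accepted = 0
    for t in range(n_iter):
        # step 2: draw beta_star from N(m, C) built at the current state
        m, C = proposal_moments(beta, X, y, a, R_inv)
        beta_star = m + np.linalg.cholesky(C) @ rng.standard_normal(len(m))
        # step 3: the reverse move uses moments built at beta_star
        m_star, C_star = proposal_moments(beta_star, X, y, a, R_inv)
        log_ratio = (log_posterior(beta_star, X, y, a, R_inv)
                     + log_normal_density(beta, m_star, C_star)
                     - log_posterior(beta, X, y, a, R_inv)
                     - log_normal_density(beta_star, m, C))
        if np.log(rng.uniform()) < log_ratio:
            beta = beta_star
            accepted += 1
        draws[t] = beta                     # step 4: store the state and continue
    return draws, accepted / n_iter

# Simulated data, used only to exercise the sampler
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.standard_normal(200)])
true_beta = np.array([0.5, -1.0])
y = (rng.uniform(size=200) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)
draws, rate = gamerman_sampler(X, y, np.zeros(2), 10.0 * np.eye(2), 500, rng)
```

The acceptance ratio is computed on the log scale to avoid underflow, and the proposal moments are recomputed at the candidate value because the proposal is not symmetric.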
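You can extend this methodology to logit models that have random effects. If there are random effects, the link function is extended to

\[ g(\mu_i) = x_i' \beta + z_i' \gamma_i \]

where the random effects are assumed to have a normal distribution, $\gamma_i \sim N(0, \Omega)$, and $\Omega \sim \text{inverse Wishart}(\nu_0, V_0)$. The posterior density is

\[ \pi(\beta, \gamma_1, \ldots, \gamma_N, \Omega \mid y) \propto \exp\left[ \sum_i \frac{y_i \theta_i - b(\theta_i)}{\phi_i} - \frac{1}{2} (\beta - a)' R^{-1} (\beta - a) - \frac{1}{2} \sum_i \gamma_i' \Omega^{-1} \gamma_i \right] |\Omega|^{-N/2} \, \pi(\Omega) \]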
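The parameters are divided into blocks, $\beta$, $\gamma_1, \ldots, \gamma_N$, and $\Omega$. For the fixed-effects $\beta$ block, the conditional posterior has the same form, but the link changes to include $z_i' \gamma_i$, which are taken as known constants (offsets) at each iteration. The only change that is needed is to replace the transformed response $\tilde{y}(\beta^{(t-1)})$ with $\tilde{y}(\beta^{(t-1)}) - Z\gamma$, where $Z\gamma$ denotes the vector of offsets $z_i' \gamma_i$, in step 2 of the previous Gamerman procedure.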
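For the random-effects $\gamma_i$ block, the same Metropolis-Hastings sampling with the least squares proposal applies. The conditional posterior is

\[ \pi(\gamma_i \mid \beta, \Omega, y) \propto \exp\left[ \frac{y_i \theta_i - b(\theta_i)}{\phi_i} - \frac{1}{2} \gamma_i' \Omega^{-1} \gamma_i \right] \]

The transformed response is now $\tilde{y}(\gamma_i^{(t-1)}) - X_i \beta$, and the proposal density is $N(m_i^{(t)}, C_i^{(t)})$, where

\[ m_i^{(t)} = \{ \Omega^{-1} + Z_i' W(\gamma_i^{(t-1)}) Z_i \}^{-1} Z_i' W(\gamma_i^{(t-1)}) \{ \tilde{y}(\gamma_i^{(t-1)}) - X_i \beta \} \]

\[ C_i^{(t)} = \{ \Omega^{-1} + Z_i' W(\gamma_i^{(t-1)}) Z_i \}^{-1} \]

Here $X_i$ and $Z_i$ denote the rows of the fixed-effects and random-effects design matrices that correspond to $\gamma_i$.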
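Both the fixed-effects block and the random-effects block use the same weighted least squares proposal, with the other block entering as an offset. A minimal Python sketch of that shared computation follows. It again assumes a binary logit model for illustration, and the function name `offset_proposal_moments` and its argument names are hypothetical.

```python
import numpy as np

def offset_proposal_moments(coef, D, y, offset, prior_mean, prior_prec):
    """Proposal mean and covariance for one block, treating `offset` as known.

    coef       : current value of the block being updated (beta or gamma_i)
    D          : design matrix of that block (X for beta, Z_i for gamma_i)
    offset     : contribution of the other block to the linear predictor
    prior_prec : prior precision (R^{-1} for beta, Omega^{-1} for gamma_i)
    """
    eta = D @ coef + offset                 # full linear predictor
    mu = 1.0 / (1.0 + np.exp(-eta))
    w = mu * (1.0 - mu)                     # logit weights W_i
    # W times (transformed response minus offset), written without dividing by w
    wz = w * (eta - offset) + (y - mu)
    C = np.linalg.inv(prior_prec + D.T @ (w[:, None] * D))
    m = C @ (prior_prec @ prior_mean + D.T @ wz)
    return m, C

# Made-up data for a single unit whose six observations share one random effect
rng = np.random.default_rng(1)
X = rng.standard_normal((6, 2))
Z = rng.standard_normal((6, 1))
y = np.array([0.0, 1.0, 1.0, 0.0, 1.0, 0.0])
beta = np.array([0.2, -0.1])
gamma = np.array([0.3])

# Fixed-effects block: offset Z gamma, prior N(a, R) with a = 0 and R = I
m_beta, C_beta = offset_proposal_moments(beta, X, y, Z @ gamma, np.zeros(2), np.eye(2))
# Random-effects block: offset X beta, prior N(0, Omega) with Omega = I
m_gamma, C_gamma = offset_proposal_moments(gamma, Z, y, X @ beta, np.zeros(1), np.eye(1))
```

Because the prior mean of the random effects is zero, the prior term drops out of the proposal mean in the second call.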
Finally, for the covariance matrix $\Omega$ block, direct sampling from an $\text{inverse Wishart}(\nu_0 + N, V_0 + S)$ is used, where $S = \sum_{i=1}^{N} \gamma_i \gamma_i'$.
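The direct draw for the covariance block can be sketched with NumPy alone. The sketch below relies on the fact that if $\Omega$ is inverse Wishart with scale $\Psi$, then $\Omega^{-1}$ is Wishart with scale $\Psi^{-1}$, and it builds the Wishart draw as a sum of outer products, which requires the degrees of freedom to be an integer that is at least the dimension (an assumption of this sketch). The function name `draw_covariance` and the argument names `nu0` and `V0` are hypothetical.

```python
import numpy as np

def draw_covariance(gammas, nu0, V0, rng):
    """One draw of Omega from inverse Wishart(nu0 + N, V0 + S), S = sum_i gamma_i gamma_i'.

    gammas : array of shape (N, k), one row per random-effect vector gamma_i
    nu0    : prior degrees of freedom (assumed integer here)
    V0     : prior scale matrix of shape (k, k)
    """
    N, k = gammas.shape
    S = gammas.T @ gammas                   # sum of outer products gamma_i gamma_i'
    df = nu0 + N                            # posterior degrees of freedom
    scale_inv = np.linalg.inv(V0 + S)       # scale matrix of the Wishart on Omega^{-1}
    # Wishart(df, scale_inv) draw: sum of df outer products of N(0, scale_inv) vectors
    L = np.linalg.cholesky(scale_inv)
    vectors = rng.standard_normal((df, k)) @ L.T
    precision = vectors.T @ vectors
    return np.linalg.inv(precision)         # invert to obtain the inverse Wishart draw

# Made-up random effects, used only to exercise the function
rng = np.random.default_rng(2)
gammas = rng.standard_normal((50, 2))
Omega = draw_covariance(gammas, nu0=4, V0=np.eye(2), rng=rng)
```

For non-integer degrees of freedom, a library routine such as `scipy.stats.invwishart` would be the more natural choice.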
The chain is initialized with random effects set to 0 and the covariance set to the identity matrix. Updating is done first for the fixed effects, $\beta$, as a block to position the chain in the correct region of the parameter space. Then the random effects are updated, and finally the covariance of the random effects is updated. For more information about this algorithm, see Gamerman (1997).