Choice models that have random effects (or random coefficients) provide a way to estimate individual-level or group-specific utilities. Because people have different preferences, it can be misleading to pool the whole sample into a single set of utilities. The desire to account for individual differences, instead of treating all respondents alike, poses challenges in marketing research. For logit models that have random effects, using frequentist methods to optimize the likelihood function can be numerically difficult. Bayesian methods are well suited for analysis with random effects.
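Choice models that have random effects generalize the standard choice models to incorporate individual-level effects. Let the utility that individual $i$ obtains from alternative $j$ in choice situation $t$ ($t = 1, \ldots, T$) be

$$
u_{ijt} = \mathbf{x}_{ijt}'\boldsymbol{\beta} + \mathbf{z}_{ijt}'\boldsymbol{\gamma}_i + \epsilon_{ijt}
$$

$$
y_{ijt} = \begin{cases} 1 & \text{if } u_{ijt} \ge \max(u_{i1t}, \ldots, u_{iJt}) \\ 0 & \text{otherwise} \end{cases}
$$

where $y_{ijt}$ is the observed choice for individual $i$ and alternative $j$ in choice situation $t$; $\mathbf{x}_{ijt}$ is the fixed design vector for individual $i$ and alternative $j$ in choice situation $t$; $\boldsymbol{\beta}$ are the fixed coefficients; $\mathbf{z}_{ijt}$ is the random design vector for individual $i$ and alternative $j$ in choice situation $t$; $\boldsymbol{\gamma}_i$ are the random coefficients for individual $i$ corresponding to $\mathbf{z}_{ijt}$; and $\epsilon_{ijt}$ is the error term.

It is assumed that each $\boldsymbol{\gamma}_i$ is drawn from a superpopulation and that this superpopulation is normal, $\boldsymbol{\gamma}_i \sim \text{iid } N(\mathbf{0}, \boldsymbol{\Omega}_\gamma)$. An additional stage is added to the model in which a prior for $\boldsymbol{\Omega}_\gamma$ is specified:

$$
\pi(\boldsymbol{\gamma}_i) = N(\mathbf{0}, \boldsymbol{\Omega}_\gamma), \qquad \pi(\boldsymbol{\Omega}_\gamma) = \text{inverse Wishart}(\nu_0, \mathbf{V}_0)
$$

The covariance matrix $\boldsymbol{\Omega}_\gamma$ characterizes the extent of heterogeneity among individuals. Large diagonal elements of $\boldsymbol{\Omega}_\gamma$ indicate substantial heterogeneity in part-worths. Off-diagonal elements indicate patterns in the evaluation of attribute levels.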
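The link between the utilities and the choice probabilities can be written out explicitly. The following is a sketch under the standard logit assumption that the error terms are independent type I extreme-value variates, an assumption that the text above does not state. The symbols are chosen for this sketch: $\mathbf{x}_{ijt}'\boldsymbol{\beta}$ denotes the fixed part of the utility, $\mathbf{z}_{ijt}'\boldsymbol{\gamma}_i$ the random part, and $J$ the number of alternatives in a choice set.

```latex
% Conditional choice probability for individual i, alternative j, choice situation t
P(y_{ijt} = 1 \mid \boldsymbol{\beta}, \boldsymbol{\gamma}_i)
  = \frac{\exp(\mathbf{x}_{ijt}'\boldsymbol{\beta} + \mathbf{z}_{ijt}'\boldsymbol{\gamma}_i)}
         {\sum_{k=1}^{J} \exp(\mathbf{x}_{ikt}'\boldsymbol{\beta} + \mathbf{z}_{ikt}'\boldsymbol{\gamma}_i)}

% Special case of two alternatives (J = 2): a logistic function of the utility difference
P(y_{i1t} = 1 \mid \boldsymbol{\beta}, \boldsymbol{\gamma}_i)
  = \frac{1}{1 + \exp\{-[(\mathbf{x}_{i1t} - \mathbf{x}_{i2t})'\boldsymbol{\beta}
                        + (\mathbf{z}_{i1t} - \mathbf{z}_{i2t})'\boldsymbol{\gamma}_i]\}}
```

In the two-alternative case, only the differences between the design vectors of the two alternatives matter, so an attribute that takes the same value for both alternatives in a choice set does not affect the choice probability.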
Consider a study that estimates the market demand for kitchen trash cans (Rossi, 2013). There are four attributes, and each has two levels: touchless opening (Yes/No), material (Steel/Plastic), automatic trash bag replacement (Yes/No), and price (USD80/USD40). The number of all possible hypothetical types of trash cans is $2^4 = 16$. Including more attributes and more levels can quickly make the number of combinations unmanageable. The study uses a fractional factorial design, in which the first three factors are set up as a full factorial design and the fourth is generated as the product of the first three. This design, which is shown in Table 27.1, confounds the three-way interaction with the effect of the fourth factor.
Table 27.1: Design for the Trash Can Study
Obs | Touchless | Steel | AutoBag | Price80
---|---|---|---|---
1 | –1 | –1 | –1 | –1
2 | –1 | –1 | 1 | 1
3 | –1 | 1 | –1 | 1
4 | –1 | 1 | 1 | –1
5 | 1 | –1 | –1 | 1
6 | 1 | –1 | 1 | –1
7 | 1 | 1 | –1 | –1
8 | 1 | 1 | 1 | 1
In Table 27.1, 1 means "Yes" and –1 means "No." This is a balanced design, in which each level appears the same number of times. This study forms each choice set from only two alternatives, by randomly sampling two rows from Table 27.1, and gives each individual 10 choice sets (or choice tasks). For more information about how to design a choice model efficiently, see Kuhfeld (2010).
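The construction of the design in Table 27.1 can be sketched in a DATA step: a full factorial in the first three factors, with the fourth factor computed as their product. This is a minimal illustration, not the code that the study used; the data set name Design is hypothetical.

```sas
/* Sketch: rebuild the half-fraction design of Table 27.1.            */
/* The data set name Design is an assumption made for illustration.   */
data Design;
   do Touchless = -1, 1;
      do Steel = -1, 1;
         do AutoBag = -1, 1;
            Obs + 1;                                /* row counter (sum statement) */
            Price80 = Touchless * Steel * AutoBag;  /* fourth factor = product     */
            output;
         end;
      end;
   end;
run;

proc print data=Design noobs;
   var Obs Touchless Steel AutoBag Price80;
run;
```

Because Touchless varies slowest and AutoBag varies fastest, the eight rows come out in the same order as in Table 27.1. Defining Price80 as the product of the other three columns is what makes the price effect indistinguishable from the three-way interaction.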
Data were obtained by enrolling 104 people and assigning 10 choice tasks to each of them: for each task, the participants stated their preference between two types of trash cans. The following steps read in the data:
    data Trashcan;
       input ID Task Choice Index Touchless Steel AutoBag Price80 @@;
       datalines;
    1 1 1 1 0 1 1 0   1 1 0 2 1 1 0 0
    1 2 0 1 0 0 0 0   1 2 1 2 1 1 1 1
    1 3 0 1 0 0 0 0   1 3 1 2 1 1 0 0
    1 4 0 1 0 1 0 1   1 4 1 2 1 0 0 1
    1 5 1 1 0 1 1 0   1 5 0 2 1 0 0 1
    1 6 0 1 0 0 1 1   1 6 1 2 1 1 1 1
    1 7 0 1 1 0 0 1   1 7 1 2 1 1 0 0
    1 8 0 1 0 0 1 1   1 8 1 2 0 1 1 0
    1 9 1 1 0 0 1 1   1 9 0 2 1 0 0 1
    1 10 0 1 0 1 1 0   1 10 1 2 1 0 1 0
    2 1 1 1 0 1 1 0   2 1 0 2 1 1 0 0
    2 2 0 1 0 0 0 0   2 2 1 2 1 1 1 1
    2 3 0 1 0 0 0 0   2 3 1 2 1 1 0 0
    2 4 0 1 0 1 0 1   2 4 1 2 1 0 0 1
    2 5 1 1 0 1 1 0   2 5 0 2 1 0 0 1
    2 6 1 1 0 0 1 1   2 6 0 2 1 1 1 1
    2 7 1 1 1 0 0 1   2 7 0 2 1 1 0 0
    2 8 1 1 0 0 1 1   2 8 0 2 0 1 1 0
    2 9 1 1 0 0 1 1   2 9 0 2 1 0 0 1
    2 10 1 1 0 1 1 0   2 10 0 2 1 0 1 0
    3 1 ... more lines ... 2 1 2 1 1 1 1
    104 3 0 1 0 0 0 0   104 3 1 2 1 1 0 0
    104 4 0 1 0 1 0 1   104 4 1 2 1 0 0 1
    104 5 1 1 0 1 1 0   104 5 0 2 1 0 0 1
    104 6 0 1 0 0 1 1   104 6 1 2 1 1 1 1
    104 7 0 1 1 0 0 1   104 7 1 2 1 1 0 0
    104 8 0 1 0 0 1 1   104 8 1 2 0 1 1 0
    104 9 0 1 0 0 1 1   104 9 1 2 1 0 0 1
    104 10 0 1 0 1 1 0   104 10 1 2 1 0 1 0
    ;

    proc print data=Trashcan (obs=8);
    run;
The data for the first four choice tasks are shown in Figure 27.8.
In the data, ID is the individual’s ID number, Task indexes the choice tasks, and Index identifies the alternative (1 or 2) within each choice task. The response is Choice, which is 1 for the alternative that the individual chose in a choice task and 0 for the other alternative. Touchless, Steel, AutoBag, and Price80 are the attribute variables; for each of them, 1 means "Yes" and 0 means "No." In the data, 0 replaces the –1 values that are shown in the design matrix in Table 27.1.
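Before fitting a model, it can be useful to confirm that the data have the layout described above: two alternatives per choice set, exactly one of which is chosen. The following PROC SQL step is a sketch of such a check. It assumes that the complete Trashcan data set (without the elided lines) is available; the names BadChoiceSets, nChosen, nAlt, and the alias s are chosen here for illustration.

```sas
/* Sketch: count choice sets that do not have exactly two alternatives */
/* with exactly one alternative chosen. Column names and the alias s   */
/* are hypothetical.                                                   */
proc sql;
   select count(*) as BadChoiceSets
      from (select ID, Task,
                   sum(Choice) as nChosen,   /* alternatives marked as chosen */
                   count(*)    as nAlt       /* alternatives in the choice set */
               from Trashcan
               group by ID, Task) as s
      where s.nChosen ne 1 or s.nAlt ne 2;
quit;
```

A count of 0 would indicate that every combination of ID and Task forms a valid choice set of the kind that the CHOICESET= option expects.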
The following statements fit a logit model with random effects:
    proc bchoice data=Trashcan seed=123 nmc=30000 thin=2 nthreads=4;
       class ID Task;
       model Choice = Touchless Steel AutoBag Price80 / choiceset=(ID Task);
       random Touchless Steel AutoBag Price80 / sub=ID monitor=(1 to 5) type=un;
    run;
The NTHREADS option in the PROC BCHOICE statement specifies the number of threads to be used for analytic computations and sampling. Using four threads at the same time enhances the efficiency and reduces the run time. If you do not specify the NTHREADS option, the default number is 1. The maximum number of threads should not exceed the total number of CPUs on the host where the analytic computations execute.
The choice set is specified by ID (which identifies the participants) and by Task (which identifies each of the 10 choice tasks that are assigned to each participant). The variables ID and Task are needed in the CLASS statement because they define the choice set in the MODEL statement.
In addition to the MODEL statement for fixed effects, the RANDOM statement is added for random effects. Note that Touchless, Steel, AutoBag, and Price80 are listed as both fixed and random effects, so that their average part-worth values in the population are estimated via fixed effects and the deviation from the overall mean for each individual is represented through random effects. The SUB=ID argument in the RANDOM statement defines ID as a subject index for the random-effects grouping, so that each person with a different ID has his or her own random effects. The MONITOR option requests the production of the individual-level random-effects parameter estimates, and the MONITOR=(1 to 5) option requests the random-effects parameter estimates for the first five subjects. (By default, PROC BCHOICE does not output results for any individual-level random-effects parameters.) The TYPE=UN option in the RANDOM statement specifies an unstructured covariance matrix for the random effects. The unstructured type provides a mechanism for estimating the correlation between the random effects. The TYPE=VC (variance components) option, which is the default structure, models a different variance component for each random effect.
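For comparison, the same model with the default variance components structure can be requested by changing only the TYPE= option. The following statements are a sketch of that variant and use only options that appear in the preceding call; no output from this variant is reported in this example, and the results that follow are for the TYPE=UN model.

```sas
/* Sketch of a variant: TYPE=VC estimates one variance per random      */
/* effect and no covariances between random effects.                   */
proc bchoice data=Trashcan seed=123 nmc=30000 thin=2 nthreads=4;
   class ID Task;
   model Choice = Touchless Steel AutoBag Price80 / choiceset=(ID Task);
   random Touchless Steel AutoBag Price80 / sub=ID type=vc;
run;
```

With four random effects, the unstructured form has 10 distinct covariance parameters (4 variances and 6 covariances), whereas the variance components form has only the 4 variances.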
Summary statistics for the fixed coefficients ($\boldsymbol{\beta}$), the covariance of the random coefficients ($\boldsymbol{\Omega}_\gamma$), and the random coefficients ($\boldsymbol{\gamma}_i$) for the first five individuals are shown in Figure 27.9.
Figure 27.9: Posterior Summary Statistics
Posterior Summaries and Intervals

Parameter | Subject | N | Mean | Standard Deviation | 95% HPD Lower | 95% HPD Upper
---|---|---|---|---|---|---
Touchless | | 15000 | 1.7081 | 0.2679 | 1.2053 | 2.2581
Steel | | 15000 | 1.0433 | 0.2526 | 0.5499 | 1.5478
AutoBag | | 15000 | 2.1593 | 0.3463 | 1.4962 | 2.8520
Price80 | | 15000 | -4.6163 | 0.6317 | -5.8642 | -3.4235
RECov Touchless, Touchless | | 15000 | 3.0062 | 0.9900 | 1.3471 | 4.9893
RECov Steel, Touchless | | 15000 | -0.4443 | 0.8014 | -2.0999 | 1.0894
RECov Steel, Steel | | 15000 | 2.5606 | 0.9148 | 1.0161 | 4.3751
RECov AutoBag, Touchless | | 15000 | -0.6839 | 0.8485 | -2.3454 | 1.0370
RECov AutoBag, Steel | | 15000 | 0.0501 | 0.6696 | -1.3860 | 1.2717
RECov AutoBag, AutoBag | | 15000 | 3.5241 | 1.5374 | 1.0363 | 6.5421
RECov Price80, Touchless | | 15000 | -1.2238 | 1.1181 | -3.4774 | 0.7575
RECov Price80, Steel | | 15000 | -1.5802 | 1.1229 | -3.8779 | 0.5613
RECov Price80, AutoBag | | 15000 | -2.1971 | 1.6354 | -5.5615 | 0.7227
RECov Price80, Price80 | | 15000 | 7.5585 | 3.1071 | 2.4740 | 13.8806
Touchless | ID 1 | 15000 | 0.5149 | 1.0161 | -1.4304 | 2.5541
Steel | ID 1 | 15000 | -0.3185 | 1.0738 | -2.4518 | 1.7835
AutoBag | ID 1 | 15000 | 1.6236 | 1.4019 | -0.7959 | 4.7049
Price80 | ID 1 | 15000 | -0.5927 | 2.1788 | -4.9792 | 3.4309
Touchless | ID 2 | 15000 | -1.7297 | 0.9174 | -3.4325 | 0.1308
Steel | ID 2 | 15000 | -1.4287 | 0.8517 | -3.0397 | 0.2828
AutoBag | ID 2 | 15000 | 0.4463 | 1.3149 | -2.0657 | 3.0433
Price80 | ID 2 | 15000 | 4.6565 | 1.5015 | 1.5290 | 7.4524
Touchless | ID 3 | 15000 | -0.5857 | 0.9883 | -2.5030 | 1.3642
Steel | ID 3 | 15000 | 0.5854 | 0.9766 | -1.2045 | 2.6450
AutoBag | ID 3 | 15000 | -1.8484 | 1.2446 | -4.4333 | 0.4567
Price80 | ID 3 | 15000 | 3.6182 | 1.7028 | 0.3880 | 6.9603
Touchless | ID 4 | 15000 | -0.5922 | 0.9599 | -2.4904 | 1.2913
Steel | ID 4 | 15000 | 0.3505 | 1.0201 | -1.6013 | 2.4416
AutoBag | ID 4 | 15000 | 0.9801 | 1.3792 | -1.6352 | 3.8147
Price80 | ID 4 | 15000 | -1.9014 | 2.2739 | -6.6483 | 2.1584
Touchless | ID 5 | 15000 | -1.2820 | 1.1807 | -3.5653 | 1.0154
Steel | ID 5 | 15000 | 1.6341 | 1.1977 | -0.5354 | 4.1211
AutoBag | ID 5 | 15000 | 1.2724 | 1.5373 | -1.5898 | 4.5012
Price80 | ID 5 | 15000 | -0.3725 | 2.3928 | -5.1457 | 4.2140
The fixed effects (Touchless, Steel, AutoBag, and Price80) are shown in the first four rows. Across all the respondents in the data, the average part-worths for touchless opening, steel material, and automatic trash bag replacement are all positive, indicating that, on average, respondents favor those features; the average part-worth for having to pay USD80 for a trash can instead of USD40 is negative (–4.6), which is intuitive, because spending more money is usually unfavorable.
The covariance estimate of the random coefficients ($\hat{\boldsymbol{\Omega}}_\gamma$) is displayed by the parameters whose labels begin with "RECov":

$$
\hat{\boldsymbol{\Omega}}_\gamma =
\begin{pmatrix}
 3.0062 & \cdot & \cdot & \cdot \\
-0.4443 & 2.5606 & \cdot & \cdot \\
-0.6839 & 0.0501 & 3.5241 & \cdot \\
-1.2238 & -1.5802 & -2.1971 & 7.5585
\end{pmatrix}
$$

where the rows and columns are ordered as Touchless, Steel, AutoBag, and Price80, and the dots refer to the corresponding elements in the lower part of the symmetric covariance matrix. The covariance estimate of the random coefficients characterizes the variability of part-worths across respondents. Some of the diagonal elements of the matrix are large. For example, the variance for price (labeled "RECov Price80, Price80") is 7.56, indicating substantial unexplained differences in response to price. Off-diagonal elements of the matrix illustrate attribute levels that tend to be evaluated similarly (positive covariance) or differently (negative covariance) across all the respondents. The posterior mean covariances between each of the attributes (Touchless, Steel, and AutoBag) and Price80 are all negative, suggesting that the respondents who prefer some of the new features are also those who are unwilling to pay a higher price for the trash can, although the 95% HPD intervals for these three covariances all include zero. Therefore, offering a discounted price might be a particularly effective method of introducing the new features to customers.
The next set of parameters that are displayed are the estimates for the individual-level random effects for the first five respondents (see Figure 27.9). These estimates are the deviations from the overall means (which are estimated via the fixed effects). For example, the estimated part-worth for touchless opening for the first respondent (who is labeled "ID 1" in the Subject column) is 1.7 + 0.5 = 2.2.
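The same addition can be carried out for all four attributes of the first respondent. The following DATA step is a small sketch that enters the posterior means from Figure 27.9 by hand and adds the fixed effect to the random effect for ID 1; the data set name PartWorthID1 and the variable names Attribute, Fixed, Random, and PartWorth are hypothetical.

```sas
/* Sketch: individual-level part-worths for ID 1 as fixed effect plus   */
/* random effect. Values are the posterior means in Figure 27.9; the    */
/* data set and variable names are chosen for illustration.             */
data PartWorthID1;
   input Attribute :$9. Fixed Random;
   PartWorth = Fixed + Random;
   datalines;
Touchless  1.7081  0.5149
Steel      1.0433 -0.3185
AutoBag    2.1593  1.6236
Price80   -4.6163 -0.5927
;

proc print data=PartWorthID1 noobs;
run;
```

The resulting values of PartWorth are 2.2230, 0.7248, 3.7829, and -5.2090. Adding posterior means in this way gives the posterior mean of each individual part-worth, but it does not give an interval for it; an interval would require the posterior samples of the sum.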
Allenby and Rossi (1999) and Rossi, Allenby, and McCulloch (2005) propose a hierarchical Bayesian random-effects model that is set up in a different way such that there are no fixed effects but only random effects. For more information about this type of model, see the section Random Effects and a follow-up example in A Random-Effects-Only Logit Model.