In many situations, a choice model that includes characteristics of both the alternatives and the individuals is needed for investigating consumer choice.
Consider an example of travel demand. People are asked to choose among travel by auto, plane, or public transit (bus or train).
The following SAS statements create the data set Travel. The variables AutoTime, PlanTime, and TranTime represent the total travel time that is required to get to a destination by using auto, plane, or public transit, respectively. The variable Age represents the age of each individual who is surveyed, and the variable Chosen contains each individual's choice of travel mode.
    data Travel;
       input AutoTime PlanTime TranTime Age Chosen $;
       AgeCtr = Age - 34;
       datalines;
    10.0 4.5 10.5 32 Plane
    5.5 4.0 7.5 13 Auto
    4.5 6.0 5.5 41 Transit
    3.5 2.0 5.0 41 Transit
    1.5 4.5 4.0 47 Auto
    10.5 3.0 10.5 24 Plane
    7.0 3.0 9.0 27 Auto
    9.0 3.5 9.0 21 Plane
    4.0 5.0 5.5 23 Auto
    22.0 4.5 22.5 30 Plane
    7.5 5.5 10.0 58 Plane
    11.5 3.5 11.5 36 Transit
    3.5 4.5 4.5 43 Auto
    12.0 3.0 11.0 33 Plane
    18.0 5.5 20.0 30 Plane
    23.0 5.5 21.5 28 Plane
    4.0 3.0 4.5 44 Plane
    5.0 2.5 7.0 37 Transit
    3.5 2.0 7.0 45 Auto
    12.5 3.5 15.5 35 Plane
    1.5 4.0 2.0 22 Auto
    ;
In this example, the AutoTime, PlanTime, and TranTime variables apply to the alternatives, whereas Age is a characteristic of the individuals. AgeCtr, a centered version of Age, is created by subtracting 34 (the sample's mean age, rounded to the nearest year) from each individual's age. To study how the choice depends on both the travel time and the age of the individuals, you need to incorporate both types of variables.
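The centering constant can be checked by hand from the 21 ages in the datalines above. The notation $\bar{a}$ for the sample mean age is introduced here only for illustration. The mean rounds to 34, so AgeCtr is centered only approximately:

```latex
\bar{a} = \frac{1}{21}\sum_{i=1}^{21} \mathrm{Age}_i = \frac{710}{21} \approx 33.81,
\qquad
\mathrm{AgeCtr}_i = \mathrm{Age}_i - 34,
\qquad
\frac{1}{21}\sum_{i=1}^{21} \mathrm{AgeCtr}_i \approx -0.19
```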
Before you invoke PROC BCHOICE to fit a choice logit model, you must arrange your data in such a way that there is one observation for each combination of individual and alternative. In this example, let Subject identify the individuals, let TravTime represent the travel time for each mode of transportation, and let Choice have the value 1 if the alternative is chosen and 0 otherwise. The following SAS statements rearrange the data set Travel into a new data set, Travel2, and display the first nine observations:
    data Travel2(keep=Subject Mode TravTime Age AgeCtr Choice);
       array Times[3] AutoTime PlanTime TranTime;
       array Allmodes[3] $ _temporary_ ('Auto' 'Plane' 'Transit');
       set Travel;
       Subject = _n_;                  /* one subject per input row */
       do i = 1 to 3;                  /* write one row per alternative */
          Mode = Allmodes[i];
          TravTime = Times[i];
          Choice = (Chosen eq Mode);   /* 1 if this alternative was chosen */
          output;
       end;
    run;
    proc print data=Travel2(obs=9);
       by Subject;
       id Subject;
    run;
The data for the first nine observations is shown in Output 27.1.1.
Notice that each subject in the data set Travel corresponds to a block of three observations in the data set Travel2, one for each travel alternative. The response variable Choice indicates the chosen alternative by the value 1 and the two unchosen alternatives by the value 0. Exactly one alternative is chosen.
The following SAS statements invoke PROC BCHOICE to fit the choice logit model:
    proc bchoice data=Travel2 seed=124;
       class Mode Subject / param=ref order=data;
       model Choice = Mode TravTime / choiceset=(Subject);
    run;
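For reference, the choice logit model that these statements fit can be sketched as follows. The symbols are introduced here for illustration and are not PROC BCHOICE output:

- $i$ indexes subjects (choice sets) and $j$ indexes the three modes.
- $d_{\mathrm{Auto},j}$ and $d_{\mathrm{Plane},j}$ are the reference-coded Mode indicators. Under param=ref with order=data, Transit is the last level and therefore the reference.
- $t_{ij}$ is TravTime.

```latex
u_{ij} = \beta_{\mathrm{Auto}}\, d_{\mathrm{Auto},j} + \beta_{\mathrm{Plane}}\, d_{\mathrm{Plane},j} + \beta_T\, t_{ij},
\qquad
P(\text{subject } i \text{ chooses } j) = \frac{\exp(u_{ij})}{\sum_{k \in \{\mathrm{Auto},\,\mathrm{Plane},\,\mathrm{Transit}\}} \exp(u_{ik})}
```

The choiceset=(Subject) option is what groups the three rows of each subject into one choice set, so the sum in the denominator runs only over that subject's alternatives. Transit's utility has no Mode term, so $\beta_{\mathrm{Auto}}$ and $\beta_{\mathrm{Plane}}$ are part-worths measured relative to Transit.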
The "Choice Sets Summary" table shows that there are 21 choice sets and that each consists of three alternatives and one chosen alternative (each subject chooses one of the three travel modes). This indicates that the data are arranged correctly.
Posterior summary statistics are shown in Output 27.1.3. With Transit as the reference mode (its part-worth normalized to 0), the negative part-worth (posterior mean) of Auto might reflect that driving is more inconvenient than traveling by bus or train. The negative part-worth of Plane might reflect that traveling by plane is more expensive than traveling by bus or train. However, both interpretations are only suggestive, because both 95% HPD intervals contain 0. The posterior mean of TravTime is negative, which makes sense because having to spend more time en route is often unfavorable.
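To see why a negative TravTime coefficient means that longer trips make an alternative less attractive, take the ratio of the choice logit probabilities of two alternatives $j$ and $k$ in the same choice set. The notation is illustrative: $u_{ij}$ is the utility of alternative $j$ for subject $i$, and it contains the term $\beta_T\, t_{ij}$, where $t_{ij}$ is TravTime and $\beta_T$ is its coefficient. The denominators cancel in the ratio:

```latex
\frac{P_{ij}}{P_{ik}} = \exp\bigl(u_{ij} - u_{ik}\bigr),
\qquad
\frac{\partial}{\partial t_{ij}} \log\frac{P_{ij}}{P_{ik}} = \beta_T
```

Each additional unit of travel time on alternative $j$ therefore multiplies its odds against any other alternative by $\exp(\beta_T)$. That factor is below 1 when $\beta_T < 0$.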
To study the relationship between the choice of transportation and the age of the people who make the choice, you need to create an interaction between AgeCtr and Mode. AgeCtr is not estimable by itself, because it takes the same value for every alternative in an individual's choice set. The following statements request the interaction between AgeCtr and Mode:
    proc bchoice data=Travel2 seed=124;
       class Mode Subject / param=ref order=data;
       model Choice = Mode Mode*AgeCtr TravTime / choiceset=(Subject);
    run;
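A short derivation shows why AgeCtr enters only through the interaction. The notation is illustrative:

- $u_{ij}$ is the utility of alternative $j$ for subject $i$ without any age term.
- $\delta$ is a hypothetical single age coefficient shared by all three modes.

Because $\mathrm{AgeCtr}_i$ is the same for every alternative in subject $i$'s choice set, the common term factors out of the choice logit probability:

```latex
\frac{\exp(u_{ij} + \delta\,\mathrm{AgeCtr}_i)}{\sum_{k} \exp(u_{ik} + \delta\,\mathrm{AgeCtr}_i)}
= \frac{\exp(\delta\,\mathrm{AgeCtr}_i)\,\exp(u_{ij})}{\exp(\delta\,\mathrm{AgeCtr}_i)\sum_{k} \exp(u_{ik})}
= \frac{\exp(u_{ij})}{\sum_{k} \exp(u_{ik})}
```

The likelihood therefore carries no information about $\delta$. In the Mode*AgeCtr specification, age enters only the Auto and Plane utilities, each with its own coefficient. Transit's utility has no age term, so those coefficients do not cancel.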
Output 27.1.4: PROC BCHOICE Posterior Summary Statistics
Posterior Summaries and Intervals

| Parameter | N | Mean | Standard Deviation | 95% HPD Interval (Lower) | 95% HPD Interval (Upper) |
|---|---|---|---|---|---|
| Mode Auto | 5000 | -0.2634 | 0.7883 | -1.8072 | 1.1395 |
| Mode Plane | 5000 | -2.8210 | 1.5370 | -5.9228 | -0.0686 |
| AgeCtr*Mode Auto | 5000 | -0.0986 | 0.0678 | -0.2182 | 0.0350 |
| AgeCtr*Mode Plane | 5000 | 0.0251 | 0.0775 | -0.1268 | 0.1618 |
| TravTime | 5000 | -0.7608 | 0.2564 | -1.2943 | -0.3473 |
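As a rough illustration of what these estimates imply, the posterior means in Output 27.1.4 can be plugged into the choice logit formula for the first subject in Travel. That subject is 32 years old, so AgeCtr = -2, and has travel times of 10.0, 4.5, and 10.5 for Auto, Plane, and Transit.

Plugging in posterior means gives only a point approximation. It is not the posterior mean of the probabilities, which would average over all 5,000 draws. Utilities are written relative to Transit, whose Mode and interaction terms are 0:

```latex
\begin{aligned}
u_{\mathrm{Auto}} - u_{\mathrm{Transit}} &= -0.2634 + (-0.0986)(-2) + (-0.7608)(10.0 - 10.5) = 0.3142\\
u_{\mathrm{Plane}} - u_{\mathrm{Transit}} &= -2.8210 + (0.0251)(-2) + (-0.7608)(4.5 - 10.5) = 1.6936\\
P_{\mathrm{Auto}} &= \frac{e^{0.3142}}{1 + e^{0.3142} + e^{1.6936}} \approx \frac{1.369}{7.808} \approx 0.175\\
P_{\mathrm{Plane}} &\approx \frac{5.439}{7.808} \approx 0.697,
\qquad
P_{\mathrm{Transit}} \approx \frac{1}{7.808} \approx 0.128
\end{aligned}
```

The largest plug-in probability goes to Plane, which is the mode this subject actually chose.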
The parameter estimate for Mode Auto reflects the part-worth of Auto relative to Transit for an individual of mean age (AgeCtr = 0, that is, 34 years old). The parameter estimate for Mode Plane is the corresponding part-worth of Plane for an individual of mean age. There are two interaction effects:

- The first is the effect of a one-unit change in age on the log odds of choosing Auto over Transit.
- The second is the effect of a one-unit change in age on the log odds of choosing Plane over Transit.
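The interaction coefficients act on the log odds relative to Transit, so exponentiating their posterior means gives an approximate per-year multiplier on those odds. Like any plug-in summary, this is not a full posterior calculation:

```latex
\exp(-0.0986) \approx 0.906 \quad (\text{Auto vs. Transit}),
\qquad
\exp(0.0251) \approx 1.025 \quad (\text{Plane vs. Transit})
```

Each additional year of age is thus associated with roughly 9% lower odds of choosing Auto over Transit and roughly 2.5% higher odds of choosing Plane over Transit. Both corresponding 95% HPD intervals in Output 27.1.4 contain 0, so neither age effect is clearly supported by these data.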