The BCHOICE Procedure (Experimental)

Example 27.5 Heterogeneity Affected by Individual Characteristics

Rossi, Allenby, and McCulloch (2005) studied scanner panel data on margarine purchases. The data were first analyzed in Allenby and Rossi (1991) and cover purchases of ten brands of margarine. This example considers a subset of the data on six margarine brands: Parkay stick, Blue Bonnet stick, Fleischmann’s stick, a house brand stick, a generic stick, and Shedd’s Spread tub. There are 313 households, which made a total of 3,405 purchases. Two demographic characteristics of these households, income and family size, are expected to affect the central location of the distribution of heterogeneity.

The data set Sashelp.Margarin is available in the SASHELP library. The following statements print its first 24 observations:

proc print data=Sashelp.Margarin (obs=24);
   by HouseID Set;
   id HouseID Set;
run;

The data for the first four choice sets are shown in Output 27.5.1.

Output 27.5.1: Data for the First Four Choice Sets

HouseID  Set  Choice  Brand  LogPrice   LogInc  FamSize
2100016    1       1  PPk    -0.41552  3.48124        2
                   0  PBB    -0.40048  3.48124        2
                   0  PFl     0.08618  3.48124        2
                   0  PHse   -0.56212  3.48124        2
                   0  PGen   -1.02165  3.48124        2
                   0  PSS    -0.16252  3.48124        2

HouseID  Set  Choice  Brand  LogPrice   LogInc  FamSize
2100016    2       1  PPk    -0.46204  3.48124        2
                   0  PBB    -0.40048  3.48124        2
                   0  PFl    -0.01005  3.48124        2
                   0  PHse   -0.56212  3.48124        2
                   0  PGen   -1.02165  3.48124        2
                   0  PSS    -0.16252  3.48124        2

HouseID  Set  Choice  Brand  LogPrice   LogInc  FamSize
2100016    3       1  PPk    -1.23787  3.48124        2
                   0  PBB    -0.69315  3.48124        2
                   0  PFl    -0.01005  3.48124        2
                   0  PHse   -0.56212  3.48124        2
                   0  PGen   -1.02165  3.48124        2
                   0  PSS    -0.23572  3.48124        2

HouseID  Set  Choice  Brand  LogPrice   LogInc  FamSize
2100016    4       1  PPk    -0.47804  3.48124        2
                   0  PBB    -0.49430  3.48124        2
                   0  PFl    -0.01005  3.48124        2
                   0  PHse   -0.56212  3.48124        2
                   0  PGen   -1.02165  3.48124        2
                   0  PSS    -0.16252  3.48124        2


The variable HouseID is the household ID. Each household made at least five purchases, which are indexed by Set. The variable Choice indicates which of the six margarine brands was chosen in each purchase, or choice set (1 for the chosen brand and 0 otherwise). The variable Brand has the value PPk for Parkay stick, PBB for Blue Bonnet stick, PFl for Fleischmann’s stick, PHse for the house brand stick, PGen for the generic stick, and PSS for Shedd’s Spread tub. The variable LogPrice is the logarithm of the product price. The variables LogInc and FamSize provide information about household income and family size, respectively.

For the purpose of comparison, the following statements fit the random-effects-only logit model by using random walk Metropolis sampling as suggested in Rossi, Allenby, and McCulloch (2005):

proc bchoice data=Sashelp.Margarin seed=123 nmc=20000 thin=4 alg=rwm nthreads=4;
   class Brand(ref='PPk') HouseID Set;
   model Choice = / choiceset=(HouseID Set);
   random  Brand LogPrice / subject=HouseID remean=(LogInc FamSize)
           type=un monitor=(1);
run;

The ALG=RWM option in the PROC BCHOICE statement requests the random walk Metropolis sampling algorithm. In the RANDOM statement, the REMEAN=(LOGINC FAMSIZE) option requests estimation of a nonzero mean for the random effects, modeled as a function of household income and family size. The TYPE=UN option specifies an unstructured covariance matrix for the random effects, and the MONITOR=(1) option requests posterior summaries of the random effects for the first subject (household). No fixed effects are specified in the MODEL statement. The NMC=20000 option in the PROC BCHOICE statement runs the chain for 20,000 iterations, and the THIN=4 option keeps one of every four samples. Output 27.5.2 shows summary statistics for the mean matrix of the random coefficients ($\bGamma $), the covariance of the random coefficients ($\bOmega _{\bgamma }$), and the random coefficients ($\bgamma _ i$) for the first household.
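The following is a sketch of the model these options estimate. It assumes that the demographics enter the mean of the random coefficients linearly, as the REMean parameter names (intercept, LogInc, and FamSize terms for each coefficient) suggest. The demographic vector $\mathbf{z}_ i$ is notation introduced here for illustration and is not PROC BCHOICE syntax. Under that assumption, the random coefficients for household $i$ can be written as

\[  \bgamma _ i \sim N\left(\bGamma \mathbf{z}_ i,\  \bOmega _{\bgamma }\right), \qquad \mathbf{z}_ i = (1,\  \mbox{LogInc}_ i,\  \mbox{FamSize}_ i)'  \]

Here $\bgamma _ i$ has six elements: five Brand effects relative to Parkay stick, plus LogPrice. That makes $\bGamma $ a $6\times 3$ matrix whose 18 elements appear as the REMean parameters. The unstructured $6\times 6$ matrix $\bOmega _{\bgamma }$ has $6\times 7/2 = 21$ distinct elements, which appear as the RECov parameters. The retained sample size is $20000/4 = 5000$, which matches the N column of Output 27.5.2.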

Output 27.5.2: Posterior Summary Statistics

The BCHOICE Procedure

Posterior Summaries and Intervals
Parameter Subject N Mean Standard Deviation 95% HPD Interval
REMean Brand PBB   5000 -1.1809 0.6122 -2.3775 -0.0144
REMean Brand PFl   5000 -3.5116 2.0188 -7.5975 0.2097
REMean Brand PGen   5000 -4.9821 1.1709 -7.2338 -2.6884
REMean Brand PHse   5000 -3.2291 0.8966 -5.0406 -1.5415
REMean Brand PSS   5000 -0.0157 1.2214 -2.4096 2.3238
REMean LogPrice   5000 -3.3194 0.8545 -5.0431 -1.6453
REMean Brand PBB LogInc   5000 0.0550 0.2043 -0.3367 0.4451
REMean Brand PFl LogInc   5000 0.8263 0.6562 -0.4800 2.0819
REMean Brand PGen LogInc   5000 -0.5483 0.3930 -1.3391 0.2101
REMean Brand PHse LogInc   5000 0.0272 0.3023 -0.5640 0.6158
REMean Brand PSS LogInc   5000 -0.5861 0.4168 -1.3908 0.2276
REMean LogPrice LogInc   5000 -0.3175 0.2990 -0.8955 0.2697
REMean Brand PBB FamSize   5000 -0.0336 0.0966 -0.2275 0.1437
REMean Brand PFl FamSize   5000 -0.7625 0.3231 -1.4176 -0.1522
REMean Brand PGen FamSize   5000 0.5773 0.1836 0.2248 0.9402
REMean Brand PHse FamSize   5000 0.2346 0.1352 -0.0467 0.4808
REMean Brand PSS FamSize   5000 0.0406 0.1997 -0.3719 0.4186
REMean LogPrice FamSize   5000 0.1085 0.1331 -0.1504 0.3580
RECov Brand PBB, Brand PBB   5000 2.1824 0.3916 1.4411 2.9386
RECov Brand PFl, Brand PBB   5000 2.0848 0.8517 0.4737 3.7633
RECov Brand PFl, Brand PFl   5000 13.1762 3.9078 6.3684 21.0989
RECov Brand PGen, Brand PBB   5000 1.9526 0.5760 0.8471 3.0718
RECov Brand PGen, Brand PFl   5000 1.5669 1.7949 -1.6053 5.2591
RECov Brand PGen, Brand PGen   5000 8.3864 1.3307 6.0342 11.1880
RECov Brand PHse, Brand PBB   5000 1.5108 0.4554 0.6690 2.4392
RECov Brand PHse, Brand PFl   5000 2.4807 1.3482 -0.2733 4.9664
RECov Brand PHse, Brand PGen   5000 5.7498 0.9319 4.0263 7.6664
RECov Brand PHse, Brand PHse   5000 5.4675 0.8456 3.8925 7.1852
RECov Brand PSS, Brand PBB   5000 1.1780 0.6307 -0.0181 2.4864
RECov Brand PSS, Brand PFl   5000 0.8534 2.1434 -3.5777 5.0734
RECov Brand PSS, Brand PGen   5000 4.9601 1.1765 2.7987 7.4216
RECov Brand PSS, Brand PHse   5000 3.4612 0.8947 1.8634 5.3448
RECov Brand PSS, Brand PSS   5000 8.8466 1.7773 5.5686 12.3094
RECov LogPrice, Brand PBB   5000 -0.2652 0.3179 -0.8778 0.3718
RECov LogPrice, Brand PFl   5000 2.1725 0.8906 0.4017 3.8500
RECov LogPrice, Brand PGen   5000 -1.1063 0.5708 -2.2724 -0.0595
RECov LogPrice, Brand PHse   5000 -0.4487 0.4543 -1.2981 0.4822
RECov LogPrice, Brand PSS   5000 0.1428 0.6313 -1.0706 1.3744
RECov LogPrice, LogPrice   5000 2.0727 0.4839 1.1196 3.0223
Brand PBB HouseID 2100016 5000 -2.2892 1.0105 -4.3643 -0.4129
Brand PFl HouseID 2100016 5000 -4.1063 2.7737 -9.7447 0.5782
Brand PGen HouseID 2100016 5000 -6.5999 1.6118 -9.7086 -3.5667
Brand PHse HouseID 2100016 5000 -2.9469 1.1851 -5.3047 -0.7302
Brand PSS HouseID 2100016 5000 -3.3378 2.1715 -7.5906 0.7075
LogPrice HouseID 2100016 5000 -4.4983 1.1717 -6.8224 -2.2033


Table 27.11 collects the posterior means and standard deviations of $\bGamma $ that are shown in Output 27.5.2. The Parameter column lists the parameters that are specified in the model, namely the brands and price. The Intercept column shows the average part-worths of each brand (relative to the reference brand, Parkay stick) and of price at LogInc=0 and FamSize=0. The LogInc and FamSize columns list the modifying effects of household income and family size, respectively, on the preference for each brand and on price sensitivity. Larger families show more interest in the generic brand and, less clearly, the house brand, and they tend to stay away from Fleischmann’s. For example, consider the part-worth estimates for Fleischmann’s. The posterior mean for REMean Brand PFl FamSize (the Fleischmann’s row and the FamSize column) is –0.76 with a standard deviation of 0.32. This means that each additional family member is associated with a decrease of 0.76 in the expected part-worth for Fleischmann’s. In general, the demographics of households are only weakly associated with preference for brand and price. Of the 12 LogInc and FamSize effects, only the FamSize effects for Fleischmann’s and the generic brand have 95% HPD intervals that exclude zero. These results are in good agreement with those of Rossi, Allenby, and McCulloch (2005).

Table 27.11: Posterior Mean and Standard Deviation of $\bGamma $

Parameter       Statistic  Intercept           LogInc                     FamSize
Blue Bonnet     Name       REMean Brand PBB    REMean Brand PBB LogInc    REMean Brand PBB FamSize
                Mean       –1.18               0.06                       –0.03
                Std        0.61                0.20                       0.10
Fleischmann’s   Name       REMean Brand PFl    REMean Brand PFl LogInc    REMean Brand PFl FamSize
                Mean       –3.51               0.83                       –0.76
                Std        2.02                0.66                       0.32
Generic         Name       REMean Brand PGen   REMean Brand PGen LogInc   REMean Brand PGen FamSize
                Mean       –4.98               –0.55                      0.58
                Std        1.17                0.39                       0.18
House           Name       REMean Brand PHse   REMean Brand PHse LogInc   REMean Brand PHse FamSize
                Mean       –3.23               0.03                       0.23
                Std        0.90                0.30                       0.14
Shedd’s Spread  Name       REMean Brand PSS    REMean Brand PSS LogInc    REMean Brand PSS FamSize
                Mean       –0.02               –0.59                      0.04
                Std        1.22                0.42                       0.20
LogPrice        Name       REMean LogPrice     REMean LogPrice LogInc     REMean LogPrice FamSize
                Mean       –3.32               –0.32                      0.11
                Std        0.85                0.30                       0.13


Because the demographic variables are not centered at zero, the Intercept column shows the average part-worths of each brand and of price for households with LogInc=0 and FamSize=0. These values are not very meaningful, because no household has a family size of 0. It is better to center the demographic variables at their means. The posterior means in the Intercept column can then be interpreted as the part-worths of a household that has average income and average family size.
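A one-line rearrangement shows why centering helps. It assumes that the mean part-worth is linear in LogInc and FamSize, as the Intercept, LogInc, and FamSize columns of Table 27.11 imply. For a single row of $\bGamma $ (one brand or LogPrice), let $g_0$, $g_1$, and $g_2$ denote the intercept, LogInc coefficient, and FamSize coefficient. These labels are introduced here for illustration only. Let $\overline{\mbox{LogInc}}$ and $\overline{\mbox{FamSize}}$ be the averages used for centering:

\[  g_0 + g_1\,  \mbox{LogInc} + g_2\,  \mbox{FamSize} = \left(g_0 + g_1\,  \overline{\mbox{LogInc}} + g_2\,  \overline{\mbox{FamSize}}\right) + g_1\left(\mbox{LogInc} - \overline{\mbox{LogInc}}\right) + g_2\left(\mbox{FamSize} - \overline{\mbox{FamSize}}\right)  \]

After centering, the slopes $g_1$ and $g_2$ keep the same meaning. The new intercept is the mean part-worth of a household at the average income and family size. A refit with centered variables places its prior on this new intercept rather than the old one. As a result, its posterior summaries need not be an exact shift of the values in Table 27.11, although they are expected to be close when the prior is diffuse.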

Even without centering, you can compute the part-worths for households of any income level and family size. For example, the estimated intercept, LogInc coefficient, and FamSize coefficient for Fleischmann’s are –3.51, 0.83, and –0.76, respectively. The average part-worth of Fleischmann’s for a household with average income (LogInc=3.1) and average family size (FamSize=3) is therefore as follows:

\[  -3.51+0.83\times 3.1-0.76\times 3=-3.22  \]

You can obtain part-worths for all other brands and compare their popularity among average households.
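Applying the same calculation to every coefficient gives the following mean part-worths for a household with LogInc=3.1 and FamSize=3. The inputs are the posterior means from Output 27.5.2, rounded to two decimals; for Shedd’s Spread this means the FamSize coefficient 0.04, from REMean Brand PSS FamSize = 0.0406. Parkay stick is the reference brand, so its part-worth is 0. Because each part-worth is a linear function of $\bGamma $, these plug-in values equal the posterior means of the part-worths, apart from rounding:

\[  \mbox{Blue Bonnet: } -1.18+0.06\times 3.1-0.03\times 3=-1.084  \]
\[  \mbox{Fleischmann’s: } -3.51+0.83\times 3.1-0.76\times 3=-3.217  \]
\[  \mbox{Generic: } -4.98-0.55\times 3.1+0.58\times 3=-4.945  \]
\[  \mbox{House: } -3.23+0.03\times 3.1+0.23\times 3=-2.447  \]
\[  \mbox{Shedd’s Spread: } -0.02-0.59\times 3.1+0.04\times 3=-1.729  \]
\[  \mbox{LogPrice: } -3.32-0.32\times 3.1+0.11\times 3=-3.982  \]

Consider a household whose coefficients equal these means, facing equal prices for all brands. Its brands rank, from most to least preferred, as Parkay stick, Blue Bonnet, Shedd’s Spread, the house brand, Fleischmann’s, and the generic brand. This ordering ignores both price differences between brands and the heterogeneity captured by $\bOmega _{\bgamma }$.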

The posterior means and standard deviations of the covariance matrix of the random coefficients ($\bOmega _{\bgamma }$) are displayed by the parameters labeled RECov Brand PBB, Brand PBB; RECov Brand PFl, Brand PBB; and so on. Several of the diagonal terms (the variances) are large, such as 13.18 for Fleischmann’s, 8.85 for Shedd’s Spread, and 8.39 for the generic brand. These values indicate considerable heterogeneity between households in margarine brand preference and price sensitivity. The covariance between the generic and house brands, RECov Brand PHse, Brand PGen, is also large (posterior mean 5.75, 95% HPD interval 4.03 to 7.67), suggesting that household preferences for these two brands are highly correlated.
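One rough way to gauge the strength of that association is to divide the posterior mean of the covariance by the square root of the product of the posterior means of the two variances. This approximation is introduced here for illustration and is not a quantity that Output 27.5.2 reports:

\[  \rho _{\mbox{PHse},\mbox{PGen}} \approx \frac{5.7498}{\sqrt {5.4675\times 8.3864}} \approx 0.85  \]

This is a ratio of posterior means rather than the posterior mean of the correlation, so it is only a plug-in approximation. The full posterior distribution of the correlation would have to be computed from the correlation implied by each posterior draw of $\bOmega _{\bgamma }$.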

The last set of parameters in Output 27.5.2 contains the estimates of the random coefficients for the first household (HouseID 2100016).