This example shows how to obtain the posterior predictive distribution of the probability that each alternative in a choice set is chosen. The posterior predictive distribution gives you the expected choice probabilities of all the alternatives in the data. It also lets you predict market share for simulated or hypothetical products or marketplaces that do not directly reflect the choice sets in the data.
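As a sketch of what is being sampled, assume the fixed-effects logit model from the section A Simple Logit Model. The symbols below are introduced here for illustration and are not notation taken from this example: $\boldsymbol\beta^{(t)}$ is the $t$th retained posterior draw of the parameter vector, $\mathbf{x}_j$ is the attribute row of alternative $j$, $J$ is the number of alternatives in the choice set, and $T$ is the number of retained draws.

```latex
% Choice probability of alternative j under posterior draw t (logit form, assumed)
p_j^{(t)} \;=\; \frac{\exp\!\left(\mathbf{x}_j'\boldsymbol\beta^{(t)}\right)}
                     {\sum_{k=1}^{J}\exp\!\left(\mathbf{x}_k'\boldsymbol\beta^{(t)}\right)},
\qquad t = 1,\dots,T

% Posterior mean used as the predicted choice probability (market share)
\hat{p}_j \;=\; \frac{1}{T}\sum_{t=1}^{T} p_j^{(t)}
```

Under this form, the probabilities within one choice set sum to 1 for every draw, so their posterior means also sum to 1.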
Suppose you have a data set that contains all the attribute variables (the design matrix) for all the alternatives in a choice set. For example, in the candy study earlier in the chapter, in the section A Simple Logit Model, you can use the same eight alternatives, which are defined by three binary attributes: Dark is 1 for dark chocolate and 0 for milk chocolate; Soft is 1 for soft center and 0 for chewy center; Nuts is 1 if the candy contains nuts and 0 if it contains no nuts. The following data set contains the eight alternatives:
data DesignMatrix;
   input Dark Soft Nuts;
   datalines;
0 0 0
0 0 1
0 1 0
0 1 1
1 0 0
1 0 1
1 1 0
1 1 1
;
You can use the PREDDIST statement, which obtains samples from the posterior predictive distribution of each of the choice probabilities by using the posterior samples of parameters in the model:
proc bchoice data=Chocs outpost=Bsamp nmc=10000 thin=2 seed=124;
   class Dark(ref='0') Soft(ref='0') Nuts(ref='0') Subj;
   model Choice = Dark Soft Nuts / choiceset=(Subj);
   preddist covariates=DesignMatrix nalter=8 outpred=Predout;
run;

%POSTSUM(data=Predout, var=Prob_1_:);
In the PREDDIST statement, the COVARIATES= option names the data set that contains the explanatory variable values for which the predictions are made. This data set must contain the same explanatory variables that are used in the model. The NALTER= option specifies the number of alternatives in each choice set in the COVARIATES= data set. All choice sets in that data set must have the same number of alternatives. If you omit the COVARIATES= option, the DATA= data set that you specify in the PROC BCHOICE statement is used instead. The OUTPRED= option creates an output data set that contains the samples from the posterior predictive distribution of the choice probabilities. You can then use SAS autocall macros to analyze the posterior samples. For example, the %POSTSUM macro provides summary statistics.
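To illustrate how NALTER= partitions the COVARIATES= data set, the following sketch reuses the DesignMatrix data set but treats it as two hypothetical markets of four candies each. It assumes that rows are assigned to choice sets in consecutive blocks of NALTER= rows, and that the output variables are named Prob_1_1 through Prob_1_4 and Prob_2_1 through Prob_2_4. The data set name PredTwoSets is made up for this illustration, and no results are shown because the step has not been run here.

```sas
/* Hypothetical scenario: two markets of four alternatives each.       */
/* Assumption: the first four rows of DesignMatrix (Dark=0) form        */
/* choice set 1 and the last four rows (Dark=1) form choice set 2.      */
proc bchoice data=Chocs outpost=Bsamp nmc=10000 thin=2 seed=124;
   class Dark(ref='0') Soft(ref='0') Nuts(ref='0') Subj;
   model Choice = Dark Soft Nuts / choiceset=(Subj);
   preddist covariates=DesignMatrix nalter=4 outpred=PredTwoSets;
run;

/* Summarize the predicted probabilities for both choice sets */
%POSTSUM(data=PredTwoSets, var=Prob_:);
```

Because Dark is constant within each of these two hypothetical markets, the predicted shares within a market would depend only on the Soft and Nuts effects.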
You can predict the choice probabilities by using the means of their posterior predictive distributions. The results from using the %POSTSUM macro are shown in Output 27.7.1. There is only one choice set for choice probability prediction, and it contains a total of eight alternatives. This explains the parameter names in the first column of the output, where the first number indexes the choice set and the second number indexes the alternative within that choice set. The most preferred chocolate candy is the sixth one, Dark/Chewy/Nuts, which has a posterior mean choice probability of about 0.49, or roughly half the market.
Output 27.7.1: Choice of Chocolate Candies
Summary Statistics
Parameter | N | Mean | StdDev | P25 | P50 | P75 |
---|---|---|---|---|---|---|
Prob_1_1 | 5000 | 0.05541 | 0.04252 | 0.02396 | 0.04529 | 0.07282 |
Prob_1_2 | 5000 | 0.13093 | 0.08009 | 0.06949 | 0.11487 | 0.17965 |
Prob_1_3 | 5000 | 0.00686 | 0.00896 | 0.00137 | 0.00366 | 0.00853 |
Prob_1_4 | 5000 | 0.01578 | 0.01786 | 0.00389 | 0.00966 | 0.02061 |
Prob_1_5 | 5000 | 0.21016 | 0.10328 | 0.13431 | 0.19301 | 0.27497 |
Prob_1_6 | 5000 | 0.49462 | 0.13152 | 0.40465 | 0.49734 | 0.58797 |
Prob_1_7 | 5000 | 0.02617 | 0.02793 | 0.00717 | 0.01683 | 0.03550 |
Prob_1_8 | 5000 | 0.06008 | 0.05261 | 0.02032 | 0.04571 | 0.08514 |
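The summary above can be cross-checked with standard procedures. The following sketch assumes that the Predout data set holds one row per retained posterior draw, with variables Prob_1_1 through Prob_1_8, as the %POSTSUM call implies. The data set name CheckSum and the variable Total are made up for this illustration.

```sas
/* Cross-check the %POSTSUM summary with PROC MEANS */
proc means data=Predout n mean std p25 median p75 maxdec=5;
   var Prob_1_:;
run;

/* For each posterior draw, add up the eight choice probabilities.  */
/* Within one choice set they should sum to 1 (up to rounding).     */
data CheckSum;
   set Predout;
   Total = sum(of Prob_1_:);
run;

/* The minimum and maximum of Total should both be close to 1 */
proc means data=CheckSum min max;
   var Total;
run;
```

The same check applies to the posterior means in Output 27.7.1, which add up to approximately 1 across the eight alternatives.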