MODEL dependent-variable-list ~ distribution <options> ;
The MODEL statement specifies the conditional distribution of the data given the parameters (the likelihood function). You specify a single dependent variable or a list of dependent variables, a tilde ~, and then a distribution with its arguments. The dependent variables can be variables from the input data set or functions of the symbols in the program. You must specify the dependent variables unless you use the GENERAL function or the DGENERAL function (see the section Specifying a New Distribution for more details).
The MODEL statement assumes that the observations are independent of each other, conditional on the model parameters. If you want to model dependent data (that is, $f(y_i \mid \theta, y_j)$ for $j \neq i$), you can use the JOINTMODEL option in the PROC MCMC statement. See the section Modeling Joint Likelihood for more details. By default, the log-likelihood value is the sum of the individual log-likelihood values of the observations.
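The default behavior under the independence assumption can be written as a single sum. The notation is chosen here for illustration and is an assumption, not taken from the original: $n$ is the number of observations, $y_i$ is the $i$th response, $\theta$ is the parameter vector, and $f$ is the density named in the MODEL statement.

```latex
\[
\log L(\theta \mid y_1, \ldots, y_n) \;=\; \sum_{i=1}^{n} \log f(y_i \mid \theta)
\]
```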
You can specify multiple MODEL statements. You can define likelihood functions that are independent of each other. For example, in the following statements, the dependent variables y1 and y2 are independent of each other:
   model y1 ~ normal(alpha, var=s21);
   model y2 ~ normal(beta, var=s22);
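For context, two independent MODEL statements might sit inside a complete program as in the following minimal sketch. The data set name mydata, the output data set post, the initial values, and the prior distributions are all hypothetical choices made for illustration; only the two MODEL statements come from the text.

```sas
/* Sketch only: MYDATA is a hypothetical data set that contains y1 and y2. */
proc mcmc data=mydata outpost=post nmc=10000 seed=1;
   parms alpha 0 beta 0;
   parms s21 1 s22 1;

   /* Illustrative priors; replace them with priors suited to your problem. */
   prior alpha beta ~ normal(0, var=1e4);
   prior s21 s22 ~ igamma(shape=2, scale=2);

   /* Each MODEL statement contributes its own term to the log likelihood. */
   model y1 ~ normal(alpha, var=s21);
   model y2 ~ normal(beta, var=s22);
run;
```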
Alternatively, you can use marginal and conditional distributions to define a joint log-likelihood function for multiple dependent variables. For example, the following statements jointly define a distribution over (y1, y2). They specify a marginal distribution for the dependent variable y1 and a conditional distribution for the dependent variable y2:
   model y1 ~ normal(alpha, var=s21);
   model y2 ~ normal(beta * y1, var=s22);
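The marginal-times-conditional construction corresponds to the usual factorization of a joint density. In the sketch below, $N(y;\, m, v)$ denotes a normal density with mean $m$ and variance $v$; this notation is an assumption introduced here for illustration.

```latex
\[
p(y_1, y_2 \mid \alpha, \beta, s_{21}, s_{22})
  \;=\; p(y_1 \mid \alpha, s_{21}) \; p(y_2 \mid y_1, \beta, s_{22})
\]
\[
\log p(y_1, y_2 \mid \alpha, \beta, s_{21}, s_{22})
  \;=\; \log N(y_1;\, \alpha,\, s_{21}) \;+\; \log N(y_2;\, \beta\, y_1,\, s_{22})
\]
```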
Every program must have at least one MODEL statement. If you want to run a Monte Carlo simulation that does not require a response variable, use the GENERAL function in the MODEL statement:
model general(0);
PROC MCMC interprets the statement as a flat likelihood function with a constant log-likelihood value of 0.
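A minimal sketch of such a simulation follows. It assumes that a data set with a single observation and no variables is enough as input, and it uses a standard normal prior purely as an illustration; the data set names and the parameter name alpha are hypothetical.

```sas
/* Sketch only: a placeholder input data set with one observation. */
data x;
run;

proc mcmc data=x outpost=simout nmc=10000 seed=23;
   parms alpha 0;

   /* The draws of alpha come from this prior because the likelihood is flat. */
   prior alpha ~ normal(0, sd=1);

   /* Flat likelihood: constant log-likelihood value of 0. */
   model general(0);
run;
```

Because the likelihood is constant, the posterior samples saved in the OUTPOST= data set are draws from the prior distribution.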
PROC MCMC is a programming language that is similar to the DATA step, and the order of statement evaluation is important. For example, the MODEL statement must come after any SAS programming statements that define or modify arguments used in the construction of the log likelihood. In PROC MCMC, a symbol can be defined multiple times and used at different places. Using an expression out of order produces erroneous results that can also be hard to detect.
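The ordering requirement can be illustrated with a simple regression sketch. The data set mydata, the variables x and y, the parameter names, and the priors are hypothetical; the point is only that the assignment to mu precedes the MODEL statement that uses it.

```sas
/* Sketch only: MYDATA is a hypothetical data set that contains x and y. */
proc mcmc data=mydata outpost=post nmc=10000 seed=1;
   parms beta0 0 beta1 0;
   parms s2 1;

   prior beta0 beta1 ~ normal(0, var=1e6);
   prior s2 ~ igamma(shape=0.01, scale=0.01);

   /* Define mu first ... */
   mu = beta0 + beta1 * x;

   /* ... then use it. Moving the assignment below the MODEL statement
      would evaluate the likelihood with a stale or missing value of mu. */
   model y ~ normal(mu, var=s2);
run;
```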
Do not embed the MODEL statement within programming statements. For example, suppose you have three response variables, y1, y2, and y3, and want to model each with a normal distribution. The following statements lead to erroneous output:
   array Y[3] y1 y2 y3;
   do i = 1 to 3;
      model y[i] ~ normal(mu, sd=s);
   end;
Instead, you should do one of the following.
Use separate MODEL statements:
   model y1 ~ normal(mu, sd=s);
   model y2 ~ normal(mu, sd=s);
   model y3 ~ normal(mu, sd=s);
Use the GENERAL function to construct a joint distribution of the three dependent variables and use a single MODEL statement to specify the log-likelihood function:
   llike = logpdf("normal", y1, mu, s)
         + logpdf("normal", y2, mu, s)
         + logpdf("normal", y3, mu, s);
   model y1 y2 y3 ~ general(llike);
See the section Specifying a New Distribution for more information about how to use the GENERAL function to specify an arbitrary distribution.
Missing data are allowed in the response variables; the MODEL statement augments missing data automatically. (In releases before SAS/STAT 12.1, observations with missing values were discarded prior to analysis and PROC MCMC did not attempt to model these values.) In each iteration, PROC MCMC samples missing values from their posterior distributions and incorporates them as part of the simulation. PROC MCMC creates one variable for each missing response value. There are two ways to create the missing value variable names; see the NAMESUFFIX= option for the naming convention of the variables.
Standard distributions that the MODEL statement supports are listed in Table 55.2 (univariate) and Table 55.3 (multivariate). See the section Standard Distributions for density specifications. You can also specify all distributions except the multinomial distribution in the PRIOR and HYPERPRIOR statements. The RANDOM statement supports only a subset of the distributions (see Table 55.4).
PROC MCMC allows some distributions to be parameterized in multiple ways. For example, you can specify a normal distribution with a variance, standard deviation, or precision parameter. For distributions that have different parameterizations, you must specify an option to clearly name the ambiguous parameter. For example, in the normal distribution, you must indicate whether the second argument represents variance, standard deviation, or precision.
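As an illustration, the three parameterizations of the normal distribution might be written as follows. The symbols y, mu, s2, s, and tau are hypothetical names; the sketch assumes s2 = s**2 and tau = 1/s2, so that all three statements describe the same distribution. Only one of them should be active in a program, which is why the second and third are commented out.

```sas
/* Option 1: the second argument is a variance. */
model y ~ normal(mu, var=s2);

/* Option 2: the second argument is a standard deviation (s2 = s**2).
model y ~ normal(mu, sd=s);
*/

/* Option 3: the second argument is a precision (tau = 1/s2).
model y ~ normal(mu, prec=tau);
*/
```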
All univariate distributions, with the exception of binary and uniform, can have the optional LOWER= and UPPER= arguments, which specify a truncated density. See the section Truncation and Censoring for more details. Truncation is not supported for multivariate distributions.
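The following sketch shows how the LOWER= and UPPER= arguments might be used. The data set mydata, the response y, the bounds, the initial values, and the priors are hypothetical and chosen only for illustration.

```sas
/* Sketch only: MYDATA is a hypothetical data set that contains y. */
proc mcmc data=mydata outpost=post nmc=10000 seed=1;
   parms mu 50 s 5;

   prior mu ~ normal(50, sd=100);

   /* Normal density truncated below at 0, which keeps s positive. */
   prior s ~ normal(0, sd=10, lower=0);

   /* Response density truncated to the interval from 0 to 100. */
   model y ~ normal(mu, sd=s, lower=0, upper=100);
run;
```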
Table 55.2: Univariate Distributions
Distribution Name | Definition |
---|---|
beta(<a=> $\alpha$, <b=> $\beta$) | Beta distribution with shape parameters $\alpha$ and $\beta$ |
binary(<prob\|p=> p) | Binary (Bernoulli) distribution with probability of success p. You can use the alias bern for this distribution. |
binomial(<n=> n, <prob\|p=> p) | Binomial distribution with count n and probability of success p |
cauchy(<location\|loc\|l=> a, <scale\|s=> b) | Cauchy distribution with location a and scale b |
chisq(<df=> $\nu$) | $\chi^2$ distribution with $\nu$ degrees of freedom |
dgeneral(ll) | General log-likelihood function that you construct using SAS programming statements for single or multiple discrete parameters. Also see the function general. The name dlogden is an alias for this function. |
expchisq(<df=> $\nu$) | Log transformation of a $\chi^2$ distribution with $\nu$ degrees of freedom: $\theta \sim \text{chisq}(\nu) \Leftrightarrow \log(\theta) \sim \text{expchisq}(\nu)$. You can use the alias echisq for this distribution. |
expexpon(scale\|s= $\lambda$) | Log transformation of an exponential distribution with scale or inverse-scale parameter $\lambda$: $\theta \sim \text{expon}(\lambda) \Leftrightarrow \log(\theta) \sim \text{expexpon}(\lambda)$. You can use the alias eexpon for this distribution. |
expGamma(<shape\|sp=> a, scale\|s= $\lambda$) | Log transformation of a gamma distribution with shape a and scale or inverse-scale $\lambda$: $\theta \sim \text{gamma}(a, \lambda) \Leftrightarrow \log(\theta) \sim \text{expGamma}(a, \lambda)$. You can use the alias egamma for this distribution. |
expichisq(<df=> $\nu$) | Log transformation of an inverse $\chi^2$ distribution with $\nu$ degrees of freedom: $\theta \sim \text{ichisq}(\nu) \Leftrightarrow \log(\theta) \sim \text{expichisq}(\nu)$. You can use the alias eichisq for this distribution. |
expiGamma(<shape\|sp=> a, scale\|s= $\lambda$) | Log transformation of an inverse-gamma distribution with shape a and scale or inverse-scale $\lambda$: $\theta \sim \text{igamma}(a, \lambda) \Leftrightarrow \log(\theta) \sim \text{expiGamma}(a, \lambda)$. You can use the alias eigamma for this distribution. |
expsichisq(<df=> $\nu$, <scale\|s=> s) | Log transformation of a scaled inverse $\chi^2$ distribution with $\nu$ degrees of freedom and scale parameter s: $\theta \sim \text{sichisq}(\nu, s) \Leftrightarrow \log(\theta) \sim \text{expsichisq}(\nu, s)$. You can use the alias esichisq for this distribution. |
expon(scale\|s= $\lambda$) | Exponential distribution with scale or inverse-scale parameter $\lambda$ |
gamma(<shape\|sp=> a, scale\|s= $\lambda$) | Gamma distribution with shape a and scale or inverse-scale $\lambda$ |
geo(<prob\|p=> p) | Geometric distribution with probability p |
general(ll) | General log-likelihood function that you construct using SAS programming statements for single or multiple continuous parameters. The argument ll is an expression for the log of the distribution. If there are multiple variables specified before the tilde in a MODEL, PRIOR, or HYPERPRIOR statement, ll is interpreted as the log of the joint distribution for these variables. Note that in the MODEL statement, the response variable specified before the tilde is just a placeholder and is of no consequence; the variable must have appeared in the construction of ll in the programming statements. general(constant) is equivalent to a uniform distribution on the real line. You can use the alias logden for this distribution. |
ichisq(<df=> $\nu$) | Inverse $\chi^2$ distribution with $\nu$ degrees of freedom |
igamma(<shape\|sp=> a, scale\|s= $\lambda$) | Inverse-gamma distribution with shape a and scale or inverse-scale $\lambda$ |
laplace(<location\|loc\|l=> $\mu$, scale\|s= $\lambda$) | Laplace distribution with location $\mu$ and scale or inverse-scale $\lambda$. This is also known as the double exponential distribution. You can use the alias dexpon for this distribution. |
logistic(<location\|loc\|l=> a, <scale\|s=> b) | Logistic distribution with location a and scale b |
lognormal(<mean\|m=> $\mu$, sd= $\lambda$) | Log-normal distribution with mean $\mu$ and a value of $\lambda$ for the standard deviation, variance, or precision. You can use the alias lnorm for this distribution. |
negbin(<n=> n, <prob\|p=> p) | Negative binomial distribution with count n and probability of success p. You can use the alias nb for this distribution. |
normal(<mean\|m=> $\mu$, sd= $\lambda$) | Normal (Gaussian) distribution with mean $\mu$ and a value of $\lambda$ for the standard deviation, variance, or precision. You can use the aliases gaussian, norm, or n for this distribution. |
pareto(<shape\|sp=> a, <scale\|s=> b) | Pareto distribution with shape a and scale b |
poisson(<mean\|m=> $\lambda$) | Poisson distribution with mean $\lambda$ |
sichisq(<df=> $\nu$, <scale\|s=> s) | Scaled inverse $\chi^2$ distribution with $\nu$ degrees of freedom and scale parameter s |
t(<mean\|m=> $\mu$, sd= $\lambda$, <df=> $\nu$) | t distribution with mean $\mu$, standard deviation or variance or precision $\lambda$, and $\nu$ degrees of freedom |
uniform(<left\|l=> a, <right\|r=> b) | Uniform distribution on the range from a to b. You can use the alias unif for this distribution. |
wald(<mean\|m=> $\mu$, <iscale\|is=> $\lambda$) | Wald distribution with mean parameter $\mu$ and inverse scale parameter $\lambda$. This is also known as the inverse Gaussian distribution. You can use the alias igaussian for this distribution. |
weibull($\mu$, c, $\sigma$) | Weibull distribution with location (threshold) parameter $\mu$, shape parameter c, and scale parameter $\sigma$ |
Table 55.3: Multivariate Distributions
Distribution Name | Definition |
---|---|
dirichlet(<alpha=> $\alpha$) | Dirichlet distribution with parameter vector $\alpha$, where $\alpha$ must be a one-dimensional array of length greater than 1 |
iwish(<df=> $\nu$, <scale=> S) | Inverse Wishart distribution with $\nu$ degrees of freedom and symmetric positive definite scale array S |
mvn(<mu=> $\mu$, <cov=> $\Sigma$) | Multivariate normal distribution with mean vector $\mu$ and covariance matrix $\Sigma$ |
mvnar(<mu=> $\mu$, sd= $\sigma$, <rho=> $\rho$) | Multivariate normal distribution with mean vector $\mu$ and a covariance matrix $\Sigma$. The covariance matrix is a multiple of the scale and a matrix with a first-order autoregressive structure. When rho=0, this distribution becomes a multivariate normal distribution with shared variance. |
multinom(<p=> p) | Multinomial distribution with probability vector p |
The options in the MODEL statement apply only when there are missing values in the response variable. You can specify the following options: