The MCMC Procedure

Functions of Random-Effects Parameters

When you specify a RANDOM statement in a program, PROC MCMC internally creates a random-effects parameter for every unique value in the SUBJECT= variable. You can calculate any transformations of these random-effects parameters by applying SAS functions to the effect, and you can use the transformed variable in the subsequent statements. For example, the following statements perform a logit transformation of an effect:

random u ~ normal(mu, var=s2) subject=students;
p = logistic(u);
...

The value of the variable p changes with u as the procedure steps through the input data set: for different unique values of the students variable, u takes on a different parameter value, and p changes accordingly.

To save all the transformed values in p to the OUTPOST= data set, you cannot just specify the MONITOR=(p) option in the PROC MCMC statement. With such a specification, PROC MCMC can save only one value of p (usually the value associated with the last observation in the data set); it cannot save all values. To output all transformed values, you must create an array to store every transformation and use the MONITOR= option to save the entire array to the OUTPOST= data set. The difficult part of the programming involves the creation of the correct array index to use in different types of the SUBJECT= variables. The rest of this section describes how to monitor functions of random-effects parameters in different situations.

Indexing Subject Variable

This subsection describes how to monitor transformation of an effect u when the students variable is an indexing subject variable. An indexing subject variable is an integer variable that takes value from one to the total number of unique subjects in a variable. In other words, the variable can be used as an index in a SAS array. The indexing subject variable does not need to be sorted for the example code in this section to work. An example of an indexing variable takes the values of (1 2 3 4 5 1 2 3 4 5), where the total number of observation is n=10 and the number of unique values is m=5.

The following statements create an indexing variable students in the data set a:

data a;
   input students @@;
   datalines;
   1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 10
;

The following statements run a random-effects model without any response variables. There are only random-effects parameters in the model; the program calculates the logit transformation of each effect, saves the results to the OUTPOST= data set, and produces Figure 55.21:

proc mcmc data=a monitor=(p) diag=none stats=none outpost=a1 
   plots=none seed=1;
   array p[10];
   random u ~ n(0, sd=1) subject=students;
   p[students] = logistic(u);
   model general(0);
run;

proc print data=a1(obs=3);
run;

The ARRAY statement creates an array p of size 10, which is the number of unique values in students. The p array stores all the transformed values. The RANDOM statement declares u to be an effect with the subject variable students. The P[STUDENTS] assignment statement calculates the logit transformations of u and saves them in appropriate array elements—this is why the students variable must be an indexing variable. Because the students variable used in the p[] array is also the subject variable in the RANDOM statement, PROC MCMC can match each random-effects parameter with the corresponding element in array p. The MONITOR= option monitors all elements in p and saves the output to the a1 data set. The a1 data set contains variables p1p10. Figure 55.21 shows the first three observations of the OUTPOST= data set.

Figure 55.21: Monitor Functions of Random Effect u

Obs Iteration p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 u_1 u_2 u_3 u_4 u_5 u_6 u_7 u_8 u_9 u_10 LogReff LogLike LogPost
1 1 0.5050 0.7334 0.5375 0.5871 0.6862 0.4944 0.5740 0.5991 0.6812 0.8678 0.0198 1.0120 0.1504 0.3521 0.7826 -0.0225 0.2983 0.4019 0.7595 1.8819 -12.2658 0 -12.2658
2 2 0.5563 0.4254 0.7260 0.5280 0.4593 0.5797 0.5813 0.7360 0.9079 0.3351 0.2261 -0.3006 0.9742 0.1120 -0.1630 0.3213 0.3281 1.0253 2.2887 -0.6854 -13.2393 0 -13.2393
3 3 0.6132 0.8031 0.4735 0.3135 0.7047 0.5273 0.6761 0.2379 0.2938 0.3986 0.4607 1.4058 -0.1060 -0.7837 0.8698 0.1092 0.7361 -1.1641 -0.8772 -0.4112 -12.3984 0 -12.3984


The variable p1 is the logit transformation of the variable u_1, p2 is the logit transformation of the variable u_2, and so on.

The same idea works for a students variable that is unsorted. The following statements create an unsorted indexing variable students with repeated measures in each subject, fit the same model, and produce Figure 55.22:

data a;
   input students @@;
   datalines;
   1 1 1 3 5 3 4 5 3 1 5 5 4 4 2 2 2 2 4 3 
;
proc mcmc data=a monitor=(p) diag=none stats=none outpost=a1
   plots=none seed=1;
   array p[5];
   random u ~ n(0, sd=1) subject=students;
   p[students] = logistic(u);
   model general(0);
run;

proc print data=a1(obs=3);
run;

Figure 55.22: Monitor Functions of Random Effect u When the students Variable is Unsorted

Obs Iteration p1 p2 p3 p4 p5 u_1 u_3 u_5 u_4 u_2 LogReff LogLike LogPost
1 1 0.5050 0.6862 0.7334 0.5871 0.5375 0.0198 1.0120 0.1504 0.3521 0.7826 -5.4865 0 -5.4865
2 2 0.4944 0.8678 0.5740 0.6812 0.5991 -0.0225 0.2983 0.4019 0.7595 1.8819 -6.7793 0 -6.7793
3 3 0.5563 0.4593 0.4254 0.5280 0.7260 0.2261 -0.3006 0.9742 0.1120 -0.1630 -5.1595 0 -5.1595


There are five random-effects parameters in this example, and the array p also has five elements. The values p1p5 are the transformations of u_1u_5, respectively. The u variables are not sorted from u_1 to u_5 because PROC MCMC creates the names according to the order by which the subject variable appears in the input data set. Nevertheless, because students is an indexing variable, the first element p[1] stores the transformation that corresponds to students=1 (which is u_1), the second element p[2] stores the transformation that corresponds to students=2, and so on.

Non-Indexing Subject Variable

A non-indexing subject variable can take values of character literals (for example, names) or numerals (for example, ZIP code or a person’s weight). This section illustrates how to monitor functions of random-effects parameters in these situations.

Suppose you have unsorted character literals as the subject variable:

data a;
   input students$ @@;
   datalines;
   smith john john mary kay smith lizzy ben ben dylan 
   ben toby abby mary kay kay lizzy ben dylan mary
;

A statement such as following does not work anymore because a character variable cannot be used as an array index:

p[students] = logistic(u);

In this situation, you usually need to do two things: (1) find out the number of unique values in the subject variable, and (2) create a numeric index variable that replaces the students array index. You can use the following statements to do the first task:

proc sql noprint;
   select count(distinct(students)) into :nuniq from a;   
quit;
%put &nuniq;

The PROC SQL call counts the distinct values in the students variable and saves the count to the macro variable &nuniq. The macro variable is used later to specify the element size of the p array. In this example, the a data set contains 20 observations and 9 unique elements (the value of &nuinq).

The following statements create an Index variable in the a data set that is in the order by which the students names appear in the data set:

proc freq data=a order=data noprint;
   tables students / out=_f(keep=students);
run;

proc print data=_f;
run;

data a(drop=n);
   set a;
   do i = 1 to nobs until(students=n);
      set _f(keep=students rename=(students=n)) point=i nobs=nobs;
      Index = i;
   end;
run;

proc print data=a;
run;

The PROC FREQ call identifies the unique students names and saves them to the _f data set, which is displayed in Figure 55.23.

Figure 55.23: Unique Names in the Variable Students

Obs students
1 smith
2 john
3 mary
4 kay
5 lizzy
6 ben
7 dylan
8 toby
9 abby


The DATA step steps through the a data set and creates an Index variable to match the order in which the students names appear in the data set. The new a data set[5] is displayed in Figure 55.24.

Figure 55.24: New Index Variable in the a Data Set

Obs students Index
1 smith 1
2 john 2
3 john 2
4 mary 3
5 kay 4
6 smith 1
7 lizzy 5
8 ben 6
9 ben 6
10 dylan 7
11 ben 6
12 toby 8
13 abby 9
14 mary 3
15 kay 4
16 kay 4
17 lizzy 5
18 ben 6
19 dylan 7
20 mary 3


Student smith is the first subject, and his Index value is one. The same student appears again in the sixth observation, which is given the same Index value. Now, this Index variable can be used to index the p array, in a similar fashion as demonstrated in previous programs:

data _f;
   set _f;
   subj = compress('p_'||students);
run;
proc sql noprint;
   select subj into :pnames separated by ' ' from _f;
quit;
%put &pnames;

proc mcmc data=a monitor=(p) diag=none stats=none outpost=a1 
   plots=none seed=1;
   array p[&nuniq] &pnames;
   random u ~ n(0, sd=1) subject=students;
   p[index] = logistic(u);
   model general(0);
run;

proc print data=a1(obs=3);
run;

The first part of the DATA step and the PROC SQL call create array names for the p array that match the subject names in the students variable. It is not necessary to include these steps for the PROC MCMC program to work, but it makes the output more readable. The first DATA step steps through the _f data set and creates a subj variable that concatenates the prefix characters p_ with the names of the students. The PROC SQL calls put all the subj values to a macro called &pnames, which looks like the following:

p_smith p_john p_mary p_kay p_lizzy p_ben p_dylan p_toby p_abby

In the PROC MCMC program, the ARRAY statement defines an p array with size &nuniq (9), and use the macro variable &pnames to name the array elements. The P[INDEX] assignment statement uses the Index variable to find the correct array element to store the transformation. Figure 55.25 displays the first few observations of the OUTPOST=a1 data set.

Figure 55.25: First Few Observations of the Outpost Data Set

Obs Iteration p_smith p_john p_mary p_kay p_lizzy p_ben p_dylan p_toby p_abby u_smith u_john u_mary u_kay u_lizzy u_ben u_dylan u_toby u_abby LogReff LogLike LogPost
1 1 0.5050 0.7334 0.5375 0.5871 0.6862 0.4944 0.5740 0.5991 0.6812 0.0198 1.0120 0.1504 0.3521 0.7826 -0.0225 0.2983 0.4019 0.7595 -9.5761 0 -9.5761
2 2 0.8678 0.5563 0.4254 0.7260 0.5280 0.4593 0.5797 0.5813 0.7360 1.8819 0.2261 -0.3006 0.9742 0.1120 -0.1630 0.3213 0.3281 1.0253 -11.2370 0 -11.2370
3 3 0.9079 0.3351 0.6132 0.8031 0.4735 0.3135 0.7047 0.5273 0.6761 2.2887 -0.6854 0.4607 1.4058 -0.1060 -0.7837 0.8698 0.1092 0.7361 -13.1867 0 -13.1867


There are nine random-effects parameters (u_smith, u_john, and so on). There are nine elements of the p array (p_smith, p_john, and so on); each is the logit transformation of corresponding u elements.

You can use the same statements for subject variables that are numeric non-indexing variables. The following statements create a students variable that take large numbers that cannot be used as indices to an array. The rest of the program monitors functions of the effect u. The output is not displayed here.

data a;
   call streaminit(1);
   do i = 1 to 20;
      students = rand("poisson", 20);
      output;
   end;
   drop i;
run;

proc sql noprint;
   select count(distinct(students)) into :nuniq from a;   
quit;
%put &nuniq;

proc freq data=a order=data noprint;
   tables students / out=_f(keep=students);
run;

data a(drop=n);
   set a;
   do i = 1 to nobs until(students=n);
      set _f(keep=students rename=(students=n)) point=i nobs=nobs;
      Index = i;
   end;
run;

data _f;
   set _f;
   subj = compress('p_'||students);
proc sql noprint;
   select subj into :pnames separated by ' ' from _f;
   quit;
%put &pnames;

proc mcmc data=a monitor=(p) diag=none stats=none outpost=a1 
   plots=none seed=1;
   array p[&nuniq] &pnames;
   random u ~ n(0, sd=1) subject=students;
   p[index] = logistic(u);
   model general(0);
run;

proc print data=a1(obs=3);
run;


[5] The programming code that creates and adds the Index variable to the data set a keeps all variables from the original data set and does not discard them.