BY
variables ;
A BY statement is used with the FIT statement to obtain separate estimates for observations in groups defined by the BY variables. If an output model file is written using the OUTMODEL= option, the parameter values that are stored are those from the last BY group processed. To save parameter estimates for each BY group, use the OUTEST= option in the FIT statement.
A BY statement is used with the SOLVE statement to obtain solutions for observations in groups defined by the BY variables. If the BY variables in the DATA= data set and the ESTDATA= data set are identical, then the two data sets are synchronized and the calculations are performed by using the data and parameters for each BY group. This holds for BY variables in the SDATA= data set as well. If the BY variables do not match, BY-group processing is abandoned in either the ESTDATA= data set or the SDATA= data set, whichever has the missing BY value. If the DATA= data set does not contain BY variables and the ESTDATA= data set or the SDATA= data set does, then BY-group processing is performed for the ESTDATA= data set and the SDATA= data set by reusing the data in the DATA= data set for each BY group.
If both the FIT and SOLVE tasks require BY-group processing, then two separate BY statements are needed. If the parameters for each BY group in the OUTEST= data set that is obtained from the FIT task are to be used for the corresponding BY group in the SOLVE task, then one of the two BY statements must appear after the SOLVE statement.
The following linear regression example illustrates the use of BY-group processing. The data sets A and D, which are used for fitting and solving, respectively, each contain three groups.
/*------ data set for fit task ------*/
data a ;
   do group = 1 to 3 ;
      do i = 1 to 100 ;
         x = normal(1) ;
         y = 2 + 3*x + rannor(1) ;
         output ;
      end ;
   end ;
run ;

/*------ data set for solve task ------*/
data d ;
   do group = 1 to 3 ;
      x = normal(1) ;
      output ;
   end ;
run ;
/*------ 2 BY statements, one of which appears after the SOLVE statement ------*/
proc model data = a ;
   by group ;
   y = a0 + a1*x ;
   fit y / outest = b1 ;
   solve y / data = d estdata = b1 out = c1 ;
   by group ;
run ;
proc print data = b1 ; run ;
proc print data = c1 ; run ;
The parameter estimates that the FIT statement obtains for each BY group, shown in Figure 19.15, are used for the corresponding BY group in the SOLVE statement. The output data set is shown in Figure 19.16.
Figure 19.15: Listing of OUTEST= Data Set Created in the FIT Statement with Two BY Statements
Obs | group | _NAME_ | _TYPE_ | _STATUS_ | _NUSED_ | a0 | a1 |
---|---|---|---|---|---|---|---|
1 | 1 | | OLS | 0 Converged | 100 | 2.00338 | 3.00298 |
2 | 2 | | OLS | 0 Converged | 100 | 2.05091 | 3.08808 |
3 | 3 | | OLS | 0 Converged | 100 | 2.15528 | 3.04290 |
Figure 19.16: Listing of OUT= Data Set Created in the SOLVE Statement with Two BY Statements
Obs | group | _TYPE_ | _MODE_ | _ERRORS_ | y | x |
---|---|---|---|---|---|---|
1 | 1 | PREDICT | SIMULATE | 0 | 7.42322 | 1.80482 |
2 | 2 | PREDICT | SIMULATE | 0 | 1.80413 | -0.07992 |
3 | 3 | PREDICT | SIMULATE | 0 | 3.36202 | 0.39658 |
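As a quick consistency check, each value of y in Figure 19.16 should equal a0 + a1*x evaluated with that group's estimates from Figure 19.15; for group 1, 2.00338 + 3.00298 × 1.80482 ≈ 7.42322. The following DATA step is a minimal sketch of that check, assuming that B1 and C1 exist as created above and are both ordered by group. The data set name CHECK1 and the variables Y_MANUAL and DIFF are illustrative names, not part of the original example.

```sas
/* Sketch: recompute the SOLVE predictions from the per-group OUTEST= estimates. */
/* check1, y_manual, and diff are hypothetical names introduced for illustration. */
data check1 ;
   merge b1(keep = group a0 a1) c1(keep = group x y) ;
   by group ;
   y_manual = a0 + a1*x ;   /* model equation evaluated by hand            */
   diff     = y - y_manual ; /* should be near zero if the group's own      */
                             /* estimates were used for that group          */
run ;
proc print data = check1 ; run ;
```

A DIFF value that is clearly nonzero for some group would suggest that the estimates for a different BY group were applied to it.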
If only one BY statement is used and it appears before the SOLVE statement, then parameters for the last BY group in the OUTEST= data set are used for all BY groups for the SOLVE task.
/*------ 1 BY statement that appears before the SOLVE statement ------*/
proc model data = a ;
   by group ;
   y = a0 + a1*x ;
   fit y / outest = b2 ;
   solve y / data = d estdata = b2 out = c2 ;
run ;
proc print data = b2 ; run ;
proc print data = c2 ; run ;
The estimates of the parameters are shown in Figure 19.17, and the output data set of the SOLVE statement is shown in Figure 19.18. The parameter estimates are identical to those in Figure 19.15. However, because the parameters for the last BY group (group 3) are used for every observation, only the predicted value for the last observation in C2 matches its counterpart in C1; the other predicted values differ.
Figure 19.17: Listing of OUTEST= Data Set Created in the FIT Statement with One BY Statement That Appears before the SOLVE Statement
Obs | group | _NAME_ | _TYPE_ | _STATUS_ | _NUSED_ | a0 | a1 |
---|---|---|---|---|---|---|---|
1 | 1 | | OLS | 0 Converged | 100 | 2.00338 | 3.00298 |
2 | 2 | | OLS | 0 Converged | 100 | 2.05091 | 3.08808 |
3 | 3 | | OLS | 0 Converged | 100 | 2.15528 | 3.04290 |
Figure 19.18: Listing of OUT= Data Set Created in the SOLVE Statement with One BY Statement That Appears before the SOLVE Statement
Obs | _TYPE_ | _MODE_ | _ERRORS_ | y | x |
---|---|---|---|---|---|
1 | PREDICT | SIMULATE | 0 | 7.64717 | 1.80482 |
2 | PREDICT | SIMULATE | 0 | 1.91211 | -0.07992 |
3 | PREDICT | SIMULATE | 0 | 3.36202 | 0.39658 |
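The values in Figure 19.18 are consistent with the group 3 estimates being applied to every observation; for the first observation, 2.15528 + 3.04290 × 1.80482 ≈ 7.64717. The following sketch repeats that calculation for all observations, assuming that B2 and C2 exist as created above and that the last observation in B2 holds the estimates for the last BY group. The names LAST_PARMS, CHECK2, Y_MANUAL, and DIFF are hypothetical and introduced only for illustration.

```sas
/* Sketch: apply the last BY group's estimates to every observation in C2. */
/* last_parms, check2, y_manual, and diff are hypothetical names.          */
data last_parms ;
   set b2(keep = a0 a1) end = last ;
   if last ;                          /* keep only the final observation of B2 */
run ;

data check2 ;
   if _n_ = 1 then set last_parms ;   /* a0 and a1 are retained for all rows   */
   set c2 ;
   y_manual = a0 + a1*x ;             /* model equation evaluated by hand      */
   diff     = y - y_manual ;          /* should be near zero for every row     */
run ;
proc print data = check2 ; run ;
```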
If only one BY statement is used and it appears after the SOLVE statement, then BY-group processing does not apply to the FIT task. In this case, the OUTEST= data set does not contain the BY variable, and the single set of parameter estimates obtained from the FIT task is used for all BY groups during the SOLVE task.
/*------ 1 BY statement that appears after the SOLVE statement ------*/
proc model data = a ;
   y = a0 + a1*x ;
   fit y / outest = b3 ;
   solve y / data = d estdata = b3 out = c3 ;
   by group ;
run ;
proc print data = b3 ; run ;
proc print data = c3 ; run ;
The output data sets B3 and C3 are listed in Figure 19.19 and Figure 19.20, respectively.
Figure 19.19: Listing of OUTEST= Data Set Created in the FIT Statement with One BY Statement That Appears after the SOLVE Statement
Obs | _NAME_ | _TYPE_ | _STATUS_ | _NUSED_ | a0 | a1 |
---|---|---|---|---|---|---|
1 | | OLS | 0 Converged | 300 | 2.06624 | 3.04219 |
Figure 19.20: Listing of OUT= Data Set Created in the First SOLVE Statement with One BY Statement That Appears after the SOLVE Statement
Obs | group | _TYPE_ | _MODE_ | _ERRORS_ | y | x |
---|---|---|---|---|---|---|
1 | 1 | PREDICT | SIMULATE | 0 | 7.55686 | 1.80482 |
2 | 2 | PREDICT | SIMULATE | 0 | 1.82312 | -0.07992 |
3 | 3 | PREDICT | SIMULATE | 0 | 3.27270 | 0.39658 |
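The examples above do not cover the case, described at the start of this section, in which the DATA= data set for the SOLVE task has no BY variables but the ESTDATA= data set does. The following is a minimal, untested sketch of that setup, assuming the data set A from the earlier example. The data set names D_NOGROUP, B4, and C4 are hypothetical. According to the rule stated earlier, the observations in D_NOGROUP would be reused for each BY group found in B4.

```sas
/* Sketch: SOLVE input without the BY variable (hypothetical data set d_nogroup). */
data d_nogroup ;
   do i = 1 to 2 ;
      x = rannor(1) ;
      output ;
   end ;
run ;

/* b4 and c4 are hypothetical names for the OUTEST= and OUT= data sets. */
proc model data = a ;
   by group ;                 /* BY-group processing for the FIT task          */
   y = a0 + a1*x ;
   fit y / outest = b4 ;
   solve y / data = d_nogroup estdata = b4 out = c4 ;
   by group ;                 /* BY-group processing for the SOLVE task, where */
                              /* only the ESTDATA= data set contains GROUP     */
run ;
proc print data = c4 ; run ;
```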