The PARTITION statement specifies how observations in the input data set are logically partitioned into disjoint subsets for
model training, validation, and testing. You can either designate a variable in the input data set, together with a set of
formatted values of that variable, to determine the role of each observation, or specify proportions to use for randomly
assigning observations to each role.
An alternative to using a PARTITION statement is to provide a variable named _ROLE_ in the input data set to define roles of
observations in the input data. If you specify a PARTITION statement, then the _ROLE_ variable is ignored if it is present in
the input data set. If you do not specify a PARTITION statement and the input data do not contain a variable named _ROLE_,
then all observations in the input data set are assigned to model training.
You can specify the following mutually exclusive options:
ROLEVAR=variable (<TEST='value'> <TRAIN='value'> <VALIDATE='value'>)
ROLE=variable (<TEST='value'> <TRAIN='value'> <VALIDATE='value'>)
names the variable in the input data set whose formatted values determine the role of each observation. The TEST=, TRAIN=,
and VALIDATE= suboptions specify the formatted values that identify each role. If you do not specify
the TRAIN= suboption, then all observations whose role is not determined by the TEST= or VALIDATE= suboptions are assigned
to training. If you specify a TESTDATA= data set in the PROC ADAPTIVEREG statement, then you cannot also specify the TEST=
suboption in the PARTITION statement. If you specify a VALDATA= data set in the PROC ADAPTIVEREG statement, then you cannot
also specify the VALIDATE= suboption in the PARTITION statement.
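
As an illustration of the ROLEVAR= option, the following minimal sketch assumes a hypothetical data set named work.housing that contains a response variable price, predictors sqft and age, and a character variable group whose formatted values are 'TR', 'VA', and 'TE'. None of these names come from this documentation; substitute your own.

```sas
/* Sketch only: work.housing, price, sqft, age, and group are hypothetical names. */
proc adaptivereg data=work.housing;
   model price = sqft age;
   /* Observations with group='TR' are used for training, 'VA' for validation, 'TE' for testing. */
   partition rolevar=group(train='TR' validate='VA' test='TE');
run;
```

If the TRAIN= suboption were omitted here, every observation whose group value is neither 'VA' nor 'TE' would be assigned to training.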
FRACTION(<TEST=fraction> <VALIDATE=fraction>)
randomly assigns training, validation, and testing roles to the observations in the input data according to the proportions that are
specified by the fraction values in the TEST= and VALIDATE= suboptions. If you specify both the TEST= and VALIDATE= suboptions,
then the sum of the specified fractions must be less than 1, and the remaining observations are assigned to
the training role. If you specify a TESTDATA= data set in the PROC ADAPTIVEREG statement, then you cannot also specify the
TEST= suboption in the PARTITION statement. If you specify a VALDATA= data set in the PROC ADAPTIVEREG statement, then you
cannot also specify the VALIDATE= suboption in the PARTITION statement.
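
As an illustration of the FRACTION option, the following minimal sketch uses the same hypothetical data set and variable names (work.housing, price, sqft, age), which are assumptions and not part of this documentation. The SEED= option in the PROC ADAPTIVEREG statement is included on the assumption that you want a reproducible random split; check the PROC ADAPTIVEREG statement documentation for its exact behavior.

```sas
/* Sketch only: the data set and variable names are hypothetical. */
proc adaptivereg data=work.housing seed=12345;
   model price = sqft age;
   /* 0.3 + 0.2 = 0.5, which is less than 1, so the remaining observations go to training. */
   partition fraction(validate=0.3 test=0.2);
run;
```

Because roles are assigned at random, the realized share of observations in each role is only approximately equal to the requested fraction.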