The MultiLoad bulk-load operation (or MultiLoad job)
works in phases. The first is the
acquisition phase,
during which data is transferred from SAS to Teradata work tables.
The second is the application phase, during which data is applied
to the target table.
If the MultiLoad job
fails during the acquisition phase, you can restart the job from the
last successful checkpoint. The exact observation from which the MultiLoad
job must be restarted is displayed in the SAS log. If the MultiLoad
job fails in the application phase, when data is loaded onto
the target table from the work tables, restart the MultiLoad
job outside of SAS. The MultiLoad restart script is displayed in the
SAS log. You can run the generated MultiLoad script outside of SAS
to complete the load.
You can use ML_CHECKPOINT=
to specify the checkpoint rate. Specify a value for ML_CHECKPOINT=
if you want restart capability. If checkpoint tracking is not used
and the MultiLoad job fails in the acquisition phase, the load must
be restarted from the beginning. ML_CHECKPOINT=0 is the default,
and no checkpoints are recorded if you use the default.
If ML_CHECKPOINT= is
between 1 and 59 inclusive, checkpoints are recorded at the specified
interval in minutes. If ML_CHECKPOINT= is 60 or greater, a checkpoint
occurs after each multiple of the specified number of rows is loaded.
ML_CHECKPOINT= functions
very much like the Teradata MultiLoad utility checkpoint, but it differs
from the DBCOMMIT= data set option.
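For example, here is a minimal sketch of the two styles of checkpoint rate. It assumes the trlib libref and the work.testdata data set that are used in the examples below, and the two DATA steps are alternatives rather than a sequence.
/* Values from 1 to 59 are an interval in minutes: checkpoint every 15 minutes */
data trlib.mlfloat(MultiLoad=yes ML_CHECKPOINT=15);
set work.testdata;
run;
/* Values of 60 or more are a row count: checkpoint after every 20,000 rows loaded */
data trlib.mlfloat(MultiLoad=yes ML_CHECKPOINT=20000);
set work.testdata;
run;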
These restrictions apply
when you restart a failed MultiLoad job.
-
The failed MultiLoad job must have
specified a checkpoint rate other than 0 using the ML_CHECKPOINT=
data set option. Otherwise, restarting begins from the first record
of the source data.
Checkpoints are relevant
only to the acquisition phase of MultiLoad. Even if ML_CHECKPOINT=0
is specified, a checkpoint takes place at the end of the acquisition
phase. If the job fails after that (in the application phase), you
must restart the job outside of SAS using the MultiLoad script that
is written to the SAS log.
For example, this MultiLoad
job takes a checkpoint every 1000 records.
libname trlib teradata user=testuser pw=XXXXXX server=dbc;
/* Create data to MultiLoad */
data work.testdata;
do x=1 to 50000;
output;
end;
run;
data trlib.mlfloat(MultiLoad=yes ML_CHECKPOINT=1000);
set work.testdata;
run;
-
You must restart the failed MultiLoad
job as an append process because the target table already exists.
You must also identify the work, restart, and error tables
that were used in the original job.
For example, assume
that the DATA step shown above failed with this error message in the
SAS log:
ERROR: MultiLoad failed with DBS error 2644 after a checkpoint was
taken for 13000 records. Correct error and restart as an append
process with data set options ML_RESTART=SAS_ML_RS_1436199780,
ML_ERROR1=SAS_ML_ET_1436199780, ML_ERROR2=SAS_ML_UT_1436199780,
and ML_WORK=SAS_ML_WT_1436199780. If the first run used FIRSTOBS=n,
use the value (7278+n-1) for FIRSTOBS in the restart. Otherwise,
use FIRSTOBS=7278. Sometimes the FIRSTOBS value that is used
on the restart can be an earlier position than the last checkpoint
because restart is block-oriented and not record-oriented.
After you fix the error,
you must restart the job as an append process. You must also specify
the same work, error, and restart tables as you used in the earlier
run. You use a FIRSTOBS= value on the source table to specify the
record from which to restart.
/* Restart a MultiLoad job that failed in the acquisition phase
after correcting the error */
proc append data=work.testdata(FIRSTOBS=7278)
base=trlib.mlfloat(MultiLoad=YES ML_RESTART=SAS_ML_RS_1436199780
ML_ERROR1=SAS_ML_ET_1436199780 ML_ERROR2=SAS_ML_UT_1436199780
ML_WORK=SAS_ML_WT_1436199780 ML_CHECKPOINT=1000);
run;
-
If you used ML_LOG= in the run
that failed, you can specify the same value for ML_LOG= on restart.
Therefore, you need not specify four data set options to identify
the temporary tables that MultiLoad uses.
For example, assume
that this is how the original run used ML_LOG=:
data trlib.mlfloat(MultiLoad=yes ML_CHECKPOINT=1000 ML_LOG=MY_ERRORS);
set work.testdata;
run;
If this DATA step fails
with this error, the restart capability needs only ML_LOG= to identify
all necessary tables.
ERROR: MultiLoad failed with DBS error 2644 after a
checkpoint was taken for 13000 records. Correct error and restart
as an append process with data set options ML_RESTART=SAS_ML_RS_1436199780,
ML_ERROR1=SAS_ML_ET_1436199780, ML_ERROR2=SAS_ML_UT_1436199780, and
ML_WORK=SAS_ML_WT_1436199780. If the first run used FIRSTOBS=n, use
the value (7278+n-1) for FIRSTOBS in the restart. Otherwise use
FIRSTOBS=7278. Sometimes the FIRSTOBS value used on the restart can be an
earlier position than the last checkpoint because restart is block-oriented
and not record-oriented.
proc append data=work.testdata(FIRSTOBS=7278)
base=trlib.mlfloat(MultiLoad=YES ML_LOG=MY_ERRORS ML_CHECKPOINT=1000);
run;
-
If the MultiLoad process fails
in the application phase, SAS has already transferred all data to
be loaded to Teradata. You must restart a MultiLoad job outside of
SAS using the script that is written to the SAS log. See your Teradata
documentation on the MultiLoad utility for instructions on how to
run MultiLoad scripts. Here is an example of a script that is written
in the SAS log.
=-=-= MultiLoad restart script starts here =-=-=
.LOGTABLE MY_ERRORS_RS;
.LOGON boom/mloaduser,********;
.begin import mload tables "mlfloat" CHECKPOINT 0 WORKTABLES
MY_ERRORS_WT ERRORTABLES
MY_ERRORS_ET MY_ERRORS_UT
NOTIFY HIGH EXIT SASMLNE.DLL TEXT '2180';
.layout saslayout indicators;
.FIELD "x" * FLOAT;
.DML Label SASDML;
insert into "mlfloat".*;
.IMPORT INFILE DUMMY
AXSMOD SASMLAM.DLL '2180 2180 2180'
FORMAT UNFORMAT LAYOUT SASLAYOUT
APPLY SASDML;
.END MLOAD;
.LOGOFF;
=-=-= MultiLoad restart script ends here =-=-=
ERROR: MultiLoad failed with DBS error 2644 in the application
phase. Run the MultiLoad restart script listed above outside of
SAS to restart the job.
-
If the original run used a value
for FIRSTOBS= for the source data, use the formula from the SAS log
error message to calculate the value for FIRSTOBS= upon restart. The
following example shows how to do this.
/* Create data to MultiLoad */
data work.testdata;
do x=1 to 50000;
output;
end;
run;
libname trlib teradata user=testuser pw=testpass server=boom;
/* Load 40,000 rows to the Teradata table */
data trlib.mlfloat(MultiLoad=yes ML_CHECKPOINT=1000 ML_LOG=MY_ERRORS);
set work.testdata(FIRSTOBS=10001);
run;
Assume that the DATA
step shown above failed with this error message:
ERROR: MultiLoad failed with DBS error 2644 after a checkpoint
was taken for 13000 records. Correct the error and restart the load as an
append process with data set option ML_LOG=MY_ERRORS. If the first run used
FIRSTOBS=n, use the value (7278+n-1) for FIRSTOBS in the restart.
Otherwise use FIRSTOBS=7278. Sometimes the FIRSTOBS value specified on the
restart can be an earlier position than the last checkpoint because MultiLoad
restart is block-oriented and not record-oriented.
The FIRSTOBS= value for the
restart step can be calculated using the formula provided in the
error message: FIRSTOBS=7278+10001-1=17278. Use FIRSTOBS=17278 on
the source data.
proc append data=work.testdata(FIRSTOBS=17278)
base=trlib.mlfloat(MultiLoad=YES ML_LOG=MY_ERRORS ML_CHECKPOINT=1000);
run;
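After the restarted append completes, a quick optional check (not part of the restart procedure itself) is to count the rows in the target table. In this example the table should contain 40,000 rows, because the original load read work.testdata starting at FIRSTOBS=10001 out of 50,000 observations.
proc sql;
select count(*) as total_rows
from trlib.mlfloat;
quit;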
Please keep these considerations
in mind.
-
DBCOMMIT= is disabled for MultiLoad
in order to prevent any conflict with ML_CHECKPOINT=.
-
ERRLIMIT= is not available for
MultiLoad because the number of errors is known only at the end of
each load phase.
-
For restart to work correctly,
the data source must return data in the same order. If the order
of the data that is read varies from one run to another and the load
job fails in the acquisition phase, delete the temporary tables and
restart the load as a new process (one way to do this is shown in the
sketch after this list). If the job fails in the application phase,
restart the job outside of SAS as usual. You do this because the data
that is needed to complete the load has already been transferred.
-
The restart capability in MultiLoad
is block-oriented, not record-oriented. For example, if a checkpoint
was taken at 5000 records, you might need to restart from an earlier
record, such as record 4000. You do this because the block of data
containing record 5001 might have started at record 4000. The exact
record where restart should occur is displayed in the SAS log.
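As mentioned in the considerations above, if the source data cannot be returned in the same order and the job failed in the acquisition phase, delete the temporary tables and start the load over as a new process. One possible way to remove them is with explicit SQL pass-through, sketched here. This is only an illustration: the table names are the ones reported in the SAS log for the failed job (with ML_LOG=, they are based on that prefix instead, for example MY_ERRORS_RS), you need the privileges to drop them, and the target table itself might also need to be released from its MultiLoad state (see the Teradata RELEASE MLOAD statement in the Teradata documentation).
proc sql;
connect to teradata (user=testuser password=XXXXXX server=dbc);
/* Drop the restart, error, and work tables that the failed job left behind */
execute (drop table SAS_ML_RS_1436199780) by teradata;
execute (drop table SAS_ML_ET_1436199780) by teradata;
execute (drop table SAS_ML_UT_1436199780) by teradata;
execute (drop table SAS_ML_WT_1436199780) by teradata;
disconnect from teradata;
quit;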