MULTILOAD= Data Set Option

Specifies whether Teradata Insert and Append operations should use the Teradata MultiLoad utility.
Valid in: DATA and PROC steps (when creating and appending to DBMS tables using SAS/ACCESS software)
Default: NO
Restriction: You cannot use ML_LOG with any of these options: ML_ERROR1, ML_ERROR2, ML_WORK, and ML_RESTART.
Data source: Teradata
See: BUFFERS= data set option, BULKLOAD= LIBNAME option, BULKLOAD= data set option, DBCOMMIT= LIBNAME option, DBCOMMIT= data set option, FASTEXPORT= LIBNAME option, Maximizing Teradata Load Performance , MBUFSIZE= data set option, ML_CHECKPOINT= data set option, ML_ERROR1= data set option, ML_ERROR2= data set option, ML_LOG= data set option, ML_RESTART= data set option, ML_WORK= data set option, QUERY_BAND= LIBNAME option, QUERY_BAND= data set option, SLEEP= data set option, , TENACITY= data set option, Using MultiLoad

Syntax

MULTILOAD=YES | NO

Syntax Description

YES
uses the Teradata MultiLoad utility, if available, to load Teradata tables.
NO
sends inserts to Teradata tables one row at a time.

Details

Bulk Loading

The SAS/ACCESS MultiLoad facility provides a bulk-loading method of loading both empty and existing Teradata tables. Unlike FastLoad, MultiLoad can append data to existing tables.
To determine whether threaded Reads are actually generated, turn on SAS tracing by setting OPTIONS SASTRACE=”,,,d” in your program.

Data Buffers

Two data set options are available for tuning the number and the size of data buffers that are used for transferring data from SAS to Teradata. Data is transferred from SAS to Teradata using shared memory. The default shared memory buffer size is 64K. The default number of shared memory buffers used for the transfer is 2. You can use BUFFERS= to vary the number of buffers for data transfer from 1 to 8. You can use MBUFSIZE= to vary the size of the shared memory buffers from the size of each data row up to 1MB.

Temporary Tables

The Teradata MultiLoad utility uses four different temporary tables when it performs the bulk-load operation. It uses a log table to track restart information, two error tables to track errors, and a work table to hold data before the Insert operation is made.
By default, the SAS/ACCESS MultiLoad facility generates names for these temporary tables, where randnum represents a random number. To specify a different name for these tables, use ML_RESTART=, ML_ERROR1=, ML_ERROR2=, and ML_WORK=, respectively.
Temporary Table
Table Name
Restart table
SAS_ML_RS_randnum
Acquisition error table
SAS_ML_ET_randnum
Application error table
SAS_ML_UT_randnum
Work table
SAS_ML_WT_randnum
You can use ML_LOG= to specify a prefix for the temporary table names that MultiLoad uses.
Here is the order that is used for naming the temporary tables that MultiLoad uses:
  1. If you set ML_LOG=, the prefix that you specified is used when naming temporary tables for MultiLoad.
  2. If you do not specify ML_LOG=, the values that you specified for ML_ERROR1, ML_ERROR2, ML_WORK, ML_RESTART are used.
  3. If you do not specify any table naming options, temporary table names are generated by default.

Restarting MultiLoad

The MultiLoad bulk-load operation (or MultiLoad job) works in phases. The first is the acquisition phase, during which data is transferred from SAS to Teradata work tables. The second is the application phase, during which data is applied to the target table.
If the MultiLoad job fails during the acquisition phase, you can restart the job from the last successful checkpoint. The exact observation from which the MultiLoad job must be restarted is displayed in the SAS log. If the MultiLoad job fails in the application phase—when data is loaded onto the target tables from the work table—restart the MultiLoad job outside of SAS. The MultiLoad restart script is displayed in the SAS log. You can run the generated MultiLoad script outside of SAS to complete the load.
You can use ML_CHECKPOINT= to specify the checkpoint rate. Specify a value for ML_CHECKPOINT= if you want restart capability. If checkpoint tracking is not used and the MultiLoad fails in the acquisition phase, the load needs to be restarted from the beginning. ML_CHECKPOINT=0 is the default, and no checkpoints are recoded if you use the default.
If ML_CHECKPOINT is between 1 and 59 inclusive, checkpoints are recorded at the specified interval in minutes. If ML_CHECKPOINT is greater than or equal to 60, a checkpoint occurs after a multiple of the specified rows are loaded.
ML_CHECKPOINT= functions very much like the Teradata MultiLoad utility checkpoint, but it differs from the DBCOMMIT= data set option.
These restrictions apply when you restart a failed MultiLoad job.
  • The failed MultiLoad job must have specified a checkpoint rate other than 0 using the ML_CHECKPOINT= data set option. Otherwise, restarting begins from the first record of the source data.
    Checkpoints are relevant only to the acquisition phase of MultiLoad. Even if ML_CHECKPOINT=0 is specified, a checkpoint takes place at the end of the acquisition phase. If the job fails after that (in the application phase) you must restart the job outside of SAS using the MultiLoad script written to the SAS log.
    For example, this MultiLoad job takes a checkpoint every 1000 records.
    libname trlib teradata user=testuser pw=XXXXXX server=dbc;
    /* Create data to MultiLoad */
    data work.testdata;
         do x=1 to 50000;
            output;
         end;
    end;
    data trlib.mlfloat(MultiLoad=yes ML_CHECKPOINT=1000);
    set work.testdata;
    run;
  • You must restart the failed MultiLoad job as an append process because the target table already exists. It is also necessary to identify the work tables, restart table, and the error tables used in the original job.
    For example, assume that the DATA step shown above failed with this error message in the SAS log:
    ERROR: MultiLoad failed with DBS error 2644 after a checkpoint was 
    taken for 13000 records.  Correct error and restart as an append 
    process with data set options ML_RESTART=SAS_ML_RS_1436199780, 
    ML_ERROR1=SAS_ML_ET_1436199780, ML_ERROR2=SAS_ML_UT_1436199780, 
    and ML_WORK=SAS_ML_WT_1436199780.  If the first run used FIRSTOBS=n, 
    use the value (7278+n-1) for FIRSTOBS in the restart.  Otherwise, 
    use FIRSTOBS=7278.  Sometimes the FIRSTOBS value that is used 
    on the restart can be an earlier position than the last checkpoint 
    because restart is block-oriented and not record-oriented.
    After you fix the error, you must restart the job as an append process. You must also specify the same work, error, and restart tables as you used in the earlier run. You use a FIRSTOBS= value on the source table to specify the record from which to restart.
    /* Restart a MultiLoad job that failed in the acquisition phase 
    after correcting the error */
    proc append data=work.testdata(FIRSTOBS=7278)
      base=trmlib.mlfloat(MultiLoad=YES ML_RESTART=SAS_ML_RS_1436199780
        ML_ERROR1=SAS_ML_ET_1436199780 ML_ERROR2=SAS_ML_UT_1436199780
        ML_WORK=SAS_ML_WT_1436199780 ML_CHECKPOINT=1000);
    run;
  • If you used ML_LOG= in the run that failed, you can specify the same value for ML_LOG= on restart. Therefore, you need not specify four data set options to identify the temporary tables that MultiLoad uses.
    For example, assume that this is how the original run used ML_LOG=:
    data trlib.mlfloat(MultiLoad=yes ML_CHECKPOINT=1000 ML_LOG=MY_ERRORS);
      set work.testdata;
    run; 
    If this DATA step fails with this error, the restart capability needs only ML_LOG= to identify all necessary tables.
    ERROR:  MultiLoad failed with DBS error 2644 after a 
    checkpoint was taken for 13000 records.  Correct error and restart 
    as an append process with data set options ML_RESTART=SAS_ML_RS_1436199780, 
    ML_ERROR1=SAS_ML_ET_1436199780, ML_ERROR2=SAS_ML_UT_1436199780, and 
    ML_WORK=SAS_ML_WT_1436199780.  If the first run used FIRSTOBS=n, use 
    the value (7278+n-1) for FIRSTOBS in the restart.  Otherwise use 
    FIRSTOBS=7278.  Sometimes the FIRSTOBS value used on the restart can be an 
    earlier position than the last checkpoint because restart is block-oriented 
    and not record-oriented.
    proc append data=work.testdata(FIRSTOBS=7278)
       base=trlib.mlfloat(MultiLoad=YES ML_LOG=MY_ERRORS ML_CHECKPOINT=1000);
    run; 
  • If the MultiLoad process fails in the application phase, SAS has already transferred all data to be loaded to Teradata. You must restart a MultiLoad job outside of SAS using the script that is written to the SAS log. See your Teradata documentation on the MultiLoad utility for instructions on how to run MultiLoad scripts. Here is an example of a script that is written in the SAS log.
    =-=-= MultiLoad restart script starts here =-=-=
    .LOGTABLE MY_ERRORS_RS;
    .LOGON boom/mloaduser,********;
    .begin import mload tables "mlfloat" CHECKPOINT 0 WORKTABLES
       MY_ERRORS_WT ERRORTABLES
       MY_ERRORS_ET MY_ERRORS_UT
    /*TIFY HIGH EXIT SASMLNE.DLL TEXT '2180*/;
    .layout saslayout indicators;
    .FIELD "x" *  FLOAT;
    .DML Label SASDML;
    insert into "mlfloat".*;
    .IMPORT INFILE DUMMY
    /*SMOD SASMLAM.DLL '2180 2180 2180 */
    FORMAT UNFORMAT LAYOUT SASLAYOUT
    APPLY SASDML;
    .END MLOAD;
    .LOGOFF;
    =-=-=  MultiLoad restart script ends here  =-=-=
    ERROR:  MultiLoad failed with DBS error 2644 in the application 
    phase.  Run the MultiLoad restart script listed above outside of 
    SAS to restart the job. 
  • If the original run used a value for FIRSTOBS= for the source data, use the formula from the SAS log error message to calculate the value for FIRSTOBS= upon restart. These examples show how to do this.
    /* Create data to MultiLoad */
    data work.testdata;
         do x=1 to 50000;
            output;
         end;
    run;
    libname trlib teradata user=testuser pw=testpass server=boom;
    /* Load 40,000 rows to the Teradata table */
    data trlib.mlfloat(MultiLoad=yes ML_CHECKPOINT=1000 ML_LOG=MY_ERRORS);
    set work.testdata(FIRSTOBS=10001);
    run;
    Assume that the DATA step shown above failed with this error message:
    ERROR: MultiLoad failed with DBS error 2644 after a checkpoint 
    was taken for 13000 records.  Correct the error and restart the load as an 
    append process with data set option ML_LOG=MY_ERRORS.  If the first run used 
    FIRSTOBS=n, use the value (7278+n-1) for FIRSTOBS in the restart.  
    Otherwise use FIRSTOBS=7278.  Sometimes the FIRSTOBS value specified on the 
    restart can be an earlier position than the last checkpoint because MultiLoad
    restart is block-oriented and not record-oriented.
    The FIRSTOBS for the restart step can be calculated using the formula provided—that is, FIRSTOBS=7278+100001-1=17278. Use FIRSTOBS=17278 on the source data.
    proc append data=work.testdata(FIRSTOBS=17278)
       base=trlib.mlfloat(MultiLoad=YES ML_LOG=MY_ERRORS ML_CHECKPOINT=1000);
    run;
    Please keep these considerations in mind.
    • DBCOMMIT= is disabled for MultiLoad in order to prevent any conflict with ML_CHECKPOINT=.
    • ERRLIMIT= is not available for MultiLoad because the number of errors are known only at the end of each load phase.
    • For restart to work correctly, the data source must return data in the same order. If the order of data that is read varies from one run to another and the load job fails in the application phase, delete temporary tables and restart the load as a new process. If the job fails in the application phase, restart the job outside of SAS as usual. You do this because the data that is needed to complete the load has already been transferred.
    • The restart capability in MultiLoad is block-oriented, not record-oriented. For example, if a checkpoint was taken at 5000 records, you might need to restart from an earlier record, such as record 4000. You do this because the block of data containing record 5001 might have started at record 4000. The exact record where restart should occur is displayed in the SAS log.

Examples

Example 1: Load SAS Data to an Alternate Database

This example uses MultiLoad to load SAS data to an alternate database. It specifies database=mloaduser in the LIBNAME statement.
libname trlib teradata user=testuser pw=testpass server=dbc database=mloaduser;
/*MultiLoad 20000 observations into alternate database mloaduser */
data trlib.trmload14(DBCREATE_TABLE_OPTS="PRIMARY INDEX(IDNUM)" MultiLoad=yes
   ML_LOG=TRMLOAD14  ML_CHECKPOINT=5000);
 set permdata.BIG1MIL(drop=year obs=20000);
run;

Example 2: Extract Data from One Table to Another

This example extracts data from one table using FastExport and loads data into another table using MultiLoad.
libname trlib teradata user=testuser pw=testpass server=dbc;
/*  Create data to load  */
data trlib.trodd(DBCREATE_TABLE_OPTS="PRIMARY INDEX(IDNUM)" MultiLoad=yes);
  set permdata.BIG1MIL(drop=year obs=10000);
where mod(IDNUM,2)=1;
run;
/*  FastExport from one table and MultiLoad into another  */
proc append data=trlib.treven(dbsliceparm=all)  base=trlib.trall(MultiLOAD=YES);
run;