Bulk Loading for DB2 under z/OS

Overview

By default, the DB2 under z/OS interface loads data into tables by preparing an SQL INSERT statement, executing the INSERT statement for each row, and issuing a COMMIT statement. You must specify BULKLOAD=YES to start the DB2 LOAD utility. You can then bulk-load rows of data as a single unit, which can significantly enhance performance. For smaller tables, the extra overhead of the bulk-loading process might slow performance. For larger tables, the speed of the bulk-loading process outweighs the overhead costs. DB2 under z/OS bulk-loading examples are available.
When you use bulk load, see the SYSPRINT output for information about the load. If you run the LOAD utility and it fails, ignore the messages in the SAS log because they might be inaccurate. However, if errors existed before you ran the LOAD utility, error messages in the SAS log might be valid.
SAS/ACCESS Interface to DB2 under z/OS provides bulk loading through DSNUTILS, an IBM stored procedure that start the DB2 LOAD utility. DSNUTILS is included in DB2 Version 6 and later, and it is available for DB2 Version 5 in a maintenance release. Because the LOAD utility is complex, familiarize yourself with it before you use it through SAS/ACCESS. Also check with your database administrator to determine whether this utility is available.

Data Set Options for Bulk Loading

Below are the DB2 under z/OSbulk-load data set options. All begin with BL_ for bulk load. To use the bulk-load facility, you must specify BULKLOAD=YES or all bulk-load options are ignored. (The DB2 under z/OS interface alias for BULKLOAD= is DB2LDUTIL=.)

File Allocation and Naming for Bulk Loading

When you use bulk loading, these files (data sets) are allocated.
  • The DB2 DSNUTILS procedure allocates these as new and catalogs the SysDisc, SysMap, and SysErr files unless BL_DB2LDEXT=USERUN. If BL_DB2LDEXT=USERUN, data sets are allocated as old and are kept.
  • The DB2 interface engine allocates as new and catalogs the files SysIn and SysRec when the execution method specifies to generate them.
  • The DB2 interface engine allocates as new and catalogs the file SysPrint when the execution method specifies to run the utility.
All allocations of these data sets are reversed by the end of the step. If errors occur before SysRec is generated, any of these data sets that were allocated as new and cataloged are deleted as part of cleanup because they would be empty.
The interface engine uses these options when it allocates nonexisting SYS data set names.
  • DSNUTILS uses BL_DB2DEVT_PERM= and BL_DB2SPC_PERM= for SysDisc, SysMap, and SysErr.
  • The DB2 interface engine uses BL_DB2DEVT_PERM= for SysIn, SysRec, and SysPrint.
  • SysRec uses BL_DB2RECSPC=. BL_DB2RECSPC= is necessary because the engine cannot determine how much space the SysRec requires—it depends on the volume of data being loaded into the table.
  • DSNUTILs uses BL_DB2DEVT_TEMP= and BL_DB2SPC_TEMP= to allocate the other data set names that the LOAD utility requires.
This table shows how SysIn and SysRec are allocated based on the values of BL_DB2LDEXT= and BL_DB2IN=, and BL_DB2REC=.
SysIn and SysRec Allocation
BL_DB2LDEXT=
BL_DB2IN=/ BL_DB2REC=
Data set name
DISPOSITION
GENRUN
not specified
generated
NEW, CATALOG, DELETE
GENRUN
specified
specified
NEW, CATALOG, DELETE
GENONLY
not specified
generated
NEW, CATALOG, DELETE
GENONLY
specified
specified
NEW, CATALOG, DELETE
USERUN
not specified
ERROR
USERUN
specified
specified
OLD, KEEP, KEEP
When SAS/ACCESS Interface to DB2 under z/OS uses existing files, you must specify the filenames. When the interface generates the files, it creates them with names that you provide or with unique names that it generates. Engine-generated filenames use system-generated data set names with the format SYSyyddd.Thhmmss.RA000.jobname.name.Hgg, where
SYSyyddd
is replaced by the user ID. The user ID that is used to prequalify these generated data set names is determined the same as within the rest of SAS, except when running in a server environment, where the authenticated ID of the client is used.
name
is replaced by the given SYS ddname of the data set.
For example, if you do not specify any data set names and run GENRUN under TSO, you obtain a set of files allocated with names such as
USERID.T125547.RA000.USERID.DB2DISC.H01
USERID.T125547.RA000.USERID.DB2ERR.H01
USERID.T125547.RA000.USERID.DB2IN.H01
USERID.T125547.RA000.USERID.DB2MAP.H01
USERID.T125547.RA000.USERID.DB2PRINT.H01
USERID.T125547.RA000.USERID.DB2REC.H01
This naming convention produces unique names, even within a sysplex (within one second per user ID per system). It therefore makes it easy to associate all information for each utility execution and separate it from other executions.
Bulk-load files are removed at the end of the load process to save space. They are not removed if the utility fails to allow for the load process to be restarted.

Examples

Use these LIBNAME statements for all examples.
libname db2lib db2;
libname shlib db2 connection=shared;
Create a table.
data db2lib.table1 (bulkload=yes);
   x=1;
   name='Tom';
run;
Append Table1 to itself.
data shlib.table1
   (bulkload=yes bl_db2tblxst=yes bl_db2ldct1='RESUME YES');
   set shlib.table1;
run;
Replace Table1 with itself.
data shlib.table1
   (bulkload=yes bl_db2tblxst=yes bd_db2ldct1='REPLACE');
   set shlib.table1;
run;
Load DB2 tables directly from other objects.
data db2lib.emp (bulkload=yes);
   bl_db2ldct1='replace log no nocopypend' 
   bl_db2cursor='select * from dsn8710.emp');
   set db2lib.emp (obs=0);
run;
You can also use this option in a PROC SQL statement to load DB2 tables directly from other objects, as shown below.
options sastrace=',,,d';
libname db2lib db2 authid=dsn8710;
libname mylib db2;

proc datasets library=mylib;
  delete emp;run;

proc sql;
     connect to db2;
     create table mylib.emp
            (BULKLOAD=YES
             BL_DB2LDCT1='REPLACE LOG NO NOCOPYPEND'
             BL_DB2CURSOR='SELECT FIRSTNAME, LASTNAME, WORKDEPT,
                                  HIREDATE, JOB, SALARY, BONUS, COMM
                           FROM DSN8710.EMP')
     as select firstname, lastname, workdept,
               hiredate, job, salary, bonus, comm
        from db2lib.emp (obs=0);
quit;
Here is another similar example.
options sastrace=',,,d';
libname db2lib db2 authid=dsn8710;
libname mylib db2;

proc datasets library=mylib;
  delete emp;run;

proc sql;
     connect to db2;
     create table mylib.emp
            (BULKLOAD=YES
             BL_DB2LDCT1='REPLACE LOG NO NOCOPYPEND'
             BL_DB2CURSOR='SELECT FIRSTNAME, LASTNAME, WORKDEPT,
                                  HIREDATE, JOB, SALARY, BONUS, COMM
                           FROM DSN8710.EMP'
             BL_DB2LDCT3='RUNSTATS TABLESPACE DSNDB04.TEMPTTABL
                          TABLE(ALL) INDEX(ALL) REPORT YES')
     as select firstname, lastname, workdept,
               hiredate, job, salary, bonus, comm
        from db2lib.emp (obs=0);
quit;
Generate control and data files, create the table, but do not run the utility to load it.
data shlib.table2 (bulkload=yes
   bl_db2ldext=genonly bl_db2in='userid.sysin' bl_db2rec='userid.sysrec');
   set shlib.table1;
run;
Use the control and data files that you generated in the preceding example load the table. The OBS=1 data set option on the input file prevents the DATA step from reading the whole file. Because the data is really in SysRec, you need only the input file to satisfy the engine.
data db2lib.table2 (bulkload=yes bl_db2tblxst=yes
     bl_db2ldext=userun bl_db2in='userid.sysin' bl_db2rec='userid.sysrec');
   set db2lib.table1 (obs=1);
run; 
A more efficient approach than the previous example is to eliminate going to DB2 to read even one observation from the input table. This also means that the DATA step processes only one observation, without any input I/O. Note that the one variable V is not on the table. Any variables listed here (there is no need for more than one), are irrelevant because the table already exists; they are not used.
data db2lib.table2 (bulkload=yes bl_db2tblxst=yes
     bl_db2ldext=userun bl_db2in='userid.sysin' bl_db2rec='userid.sysrec');
   v=0;
run;
Generate control and data files, but do not create the table or run the utility. Setting BL_DB2TBLXST=YES when the table does not exist prevents you from creating the table; this only makes sense because you are not going to load any data into the table at this time.
data db2lib.table3 (bulkload=yes bl_db2tblxst=yes
     bl_db2ldext=genonly bl_db2in='userid.sysin' bl_db2rec='userid.sysrec');
   set db2lib.table1;
run;
Use the control and data files that you generated in the preceding example to load the table. The OBS=1 data set option on the input file prevents the DATA step from reading the whole file. In this case, you must specify the input file because it contains the column definitions that are necessary to create the table.
data shlib.table3 (bulkload=yes bl_db2ldext=userun
     bl_db2in='userid.sysin' bl_db2rec='userid.sysrec');
   set shlib.table1 (obs=1);
run;
If you know the column names, a more efficient approach than the previous example is to eliminate going to DB2 to obtain the column definitions. In this case, the variable names and data types must match, because they are used to create the table. However, the values specified for the variables are not included on the table, because all data to load comes from the existing SysRec.
data db2lib.table3 (bulkload=yes bl_db2ldext=userun
     bl_db2in='userid.sysin' bl_db2rec='userid.sysrec');
   x=0;
   name='???';
run;
You can use other applications that do output processing.
data work.a;
   x=1;
run;

proc sql;
   create db2lib.table4 (bulkload=yes) as select * from a;
quit;