In-database processing
has several advantages over processing within SAS. These advantages
include increased security, reduced network traffic, and the potential
for faster processing. Increased security is possible because sensitive
data does not have to be extracted from the DBMS. Faster processing
is possible because data is manipulated locally, on the DBMS, using
high-speed secondary storage devices instead of being transported
across a relatively slow network connection, because the DBMS might
have more processing resources at its disposal, and because the DBMS
might be capable of optimizing a query for execution in a highly parallel
and scalable fashion.
When the DATA= input
data set is stored as a table or view in a database management system
(DBMS), the PROC TABULATE procedure can use in-database processing
to perform most of its work within the database. In-database processing
can provide the advantages of faster processing and reduced data transfer
between the database and SAS software.
In-database processing
for PROC TABULATE supports the following database management systems:
PROC TABULATE performs
in-database processing by using SQL implicit pass-through. The procedure
generates SQL queries that are based on the classifications and the
statistics that you specify in the TABLE statement. The database executes
these SQL queries to construct initial summary tables, which are then
transmitted to PROC TABULATE.
If class variables are
specified, the procedure creates an SQL GROUP BY clause that represents
the n-way type. Only the n-way class tree is generated on the DBMS.
The result set that is created when the aggregation query executes
in the database is read by SAS into the internal PROC TABULATE data
structure.
When SAS format definitions
have been deployed in the database, formatting of class variables
occurs in the database. If the SAS format definitions have not been
deployed in the database, the in-database aggregation occurs on the
raw values, and the relevant formats are applied by SAS as the results'
set is merged into the PROC TABULATE internal structures. Multi-label
formatting is always done by SAS using the initially aggregated result
set that is returned by the database.
The following statistics
are supported for in-database processing: N, NMISS, MIN, MAX, RANGE,
SUM, SUMWGT, CSS, USS, VAR, STD, STDERR, UCLM, LCLM, and CV.
The SQLGENERATION system
option or LIBNAME statement option controls whether and how in-database
procedures are run inside the database. By default, the in-database
procedures are run inside the database when possible. There are many
data set options that will prevent in-database processing. For a complete
listing, see “In-Database Procedures” in
SAS/ACCESS for Relational Databases: Reference.