In-database
processing has several advantages over processing within SAS. These
advantages include increased security, reduced network traffic, and
the potential for faster processing. Increased security is possible
because sensitive data does not have to be extracted from the DBMS.
Faster processing is possible because data is manipulated locally,
on the DBMS, using high-speed secondary storage devices instead of
being transported across a relatively slow network connection, because
the DBMS might have more processing resources at its disposal, and
because the DBMS might be capable of optimizing a query for execution
in a highly parallel and scalable fashion.
When the DATA= input
data set is stored as a table or view in a database management system
(DBMS), the PROC SORT procedure can use in-database processing to
sort the data. In-database processing can provide the advantages of
faster processing and reduced data transfer between the database and
SAS software.
In-database processing
for PROC SORT now supports the following database management systems:
PROC SORT performs in-database
processing using SQL explicit pass-through. The pass-through facility
uses
SAS/ACCESS to connect to a DBMS and to send statements directly
to the DBMS for execution. This facility lets you use the SQL syntax
of your DBMS. For details, see "Pass-Through Facility for Relational
Databases" in
SAS/ACCESS for Relational Databases: Reference.
In the third maintenance
release for SAS 9.2, in-database processing is used by PROC SORT when
a combination of procedure and system options are properly set. When
system option SORTPGM=BEST, system option SQLGENERATION= is set to
cause in-database processing, and when the PROC SORT NODUPKEY option
is specified, PROC SORT generates a DBMS SQL query that sorts the
data. The sorted results can either remain as a new table within the
DBMS or can be returned to SAS. To view the SQL queries generated,
set the SASTRACE= option.
The SAS system option
SORTPGM= can also be used without setting the SQLGENERATION option
to instruct PROC SORT to use either the DBMS, SAS, or the HOST to
perform the sort. If SORTPGM=BEST is specified, then either the DBMS,
SAS, or HOST will perform the sort. The observation ordering that
is produced by PROC SORT will depend on whether the DBMS or SAS performs
the sorting.
If the DBMS performs
the sort, then the configuration and characteristics of the DBMS sorting
program will affect the resulting data order. The DBMS configuration
settings and characteristics that can affect data order include character
collation, ordering of NULL values, and sort stability. Most database
management systems do not guarantee sort stability, and the sort might
be performed by the DBMS regardless of the state of the SORTEQUALS/NOSORTEQUALS
system option and EQUALS/NOEQUALS procedure option.
If you set the SAS system
option SORTPGM= to SAS, then unordered data is delivered from the
DBMS to SAS and SAS performs the sorting. However, consistency in
the delivery order of observations from a DBMS is not guaranteed.
Therefore, even though SAS can perform a stable sort on the DBMS data,
SAS cannot guarantee that the ordering of observations within output
BY groups is the same from one PROC SORT execution to the next. To
achieve consistency in the ordering of observations within BY groups,
first populate a SAS data set with the DBMS data, and then use the
EQUALS or SORTEQUALS option to perform a stable sort.
In-database processing
is affected by the following circumstances:
-
When PROC SORT options,
SORTSEQ=, or DUPOUT=, are specified, no in-database processing occurs.
-
For in-database processing, the
OUT= procedure option must be specified and the output data set cannot
refer to the input table on the DBMS.
-
LIBNAME options and data set options
can also affect whether in-database processing occurs and what type
of query is generated. See "In-Database Procedures" in
SAS/ACCESS for Relational Databases: Reference for a complete list of these options. The user can
also set OPTIONS MSGLEVEL=I in SAS to see which options prevent or
affect in-database processing.