When large data sets
are stored in an external database, the transfer of the data sets
to computers that run SAS can be impacted by performance, security,
and resource management issues. SAS in-database processing can greatly
reduce data transfer by having the database perform the initial data
aggregation.
In-database processing
for PROC MEANS supports the following database management systems:
Under the correct conditions,
PROC MEANS generates an SQL query based on the statements that are
used and the output statistics that are specified in the PROC step.
If class variables are specified, the procedure creates an SQL GROUP
BY clause that represents the n-way type. The result set that is created
when the aggregation query executes in the database is read by SAS
into the internal PROC MEANS data structure, and all subsequent types
are derived from the original n-way type to form the final analysis
results. When SAS format definitions have been deployed in the database,
formatting of class variables occurs in the database. If the SAS format
definitions have not been deployed in the database, the in-database
aggregation occurs on the raw values, and the relevant formats are
applied by SAS as the results' set is merged into the PROC MEANS internal
structures. Multi-label formatting is always done by SAS using the
initially aggregated result set that is returned by the database.
The CLASS, TYPES, WAYS, VAR, BY, FORMAT, and WHERE statements are
supported when PROC MEANS is processed inside the database. FREQ,
ID, IDMIN, IDMAX, and IDGROUPS are not supported. The following statistics
are supported for in-database processing: N, NMISS, MIN, MAX, RANGE,
SUM, SUMWGT, MEAN, CSS, USS, VAR, STD, STDERR, PRET, UCLM, LCLM, CLM,
and CV.
Weighting for in-database
processing is supported only for N, NMISS, MIN, MAX, RANGE, SUM, SUMWGT,
and MEAN.
The following statistics
are currently not supported for in-database processing: SKEW, KURT,
P1, P5, P10, P20, P25/Q1, P30, P40, P50/MEDIAN, P60, P70, P75/Q3,
P80, P90, P95, P99, and MODE.
The SQLGENERATION system
option or LIBNAME statement option controls whether and how in-database
procedures are run inside the database. By default, the in-database
procedures are run inside the database when possible. There are many
data set options that will prevent in-database processing: OBS=, FIRSTOBS=,
RENAME=, and DBCONDITION=. For a complete listing, refer to “In-Database
Procedures in Teradata” in
SAS/ACCESS for Relational Databases: Reference.
In-database processing
can greatly reduce the volume of data transferred to the procedure
if there are no class variables (one row is returned) or if the selected
class variables have a small number of unique values. However, because
PROC MEANS loads the result set into its internal structures, the
memory requirements for the SAS process will be equivalent to what
would have been required without in-database processing. The CPU requirements
for the SAS process should be significantly reduced if the bulk of
the data summarization occurs inside the database. The real time required
for summarization should be significantly reduced because many database-process
queries are in parallel.
For more information
about database processing, see
SAS/ACCESS for Relational Databases: Reference.