|
|
Use the improved SAS®9
sort algorithm
|
SAS®9 includes
a rewritten SORT algorithm that incorporates threading and data latency
reduction algorithms. The SAS®9 sort uses multiple threads and
outperforms a SAS 8 sort in almost all circumstances.
|
|
Perform the following
steps:
-
-
Drop unnecessary columns.
-
|
Direct sort utility
files to fast storage devices
|
Use the WORK invocation
option, the UTILLOC invocation option, or both options to direct SORT
procedure utility files to fast, less-utilized storage devices. Some
procedure utility files are accessed heavily, and separating them
from other active files might improve performance.
|
Distribute sort utility
files across multiple devices
|
Distribute SORT procedure
utility files across multiple fast, less-utilized devices. Direct
the SORT procedure utility file of each job to a different device.
Use the WORK invocation option, the UTILLOC invocation option, or
both options.
|
Pre-sort explicitly
on the most common sort key
|
SAS Data Integration
Studio might arrange a table in sort order, one or multiple times.
For large tables in which sort order is required multiple times, look
for a common sort order. Use the MSGLEVEL=I option to expose information
that is in the SAS log to determine where sorts occur.
|
Change the default SORTSIZE
value
|
For large tables, set
SORTSIZE to 256 MB or 512 MB. For extremely large tables (a billion
or more wide rows), set SORTSIZE to 1 GB or higher. Tune these recommended
values further based on empirical testing or based on in-depth knowledge
of your hardware and operating system.
|
Change the default MEMSIZE
value
|
Set MEMSIZE at least
50% larger than SORTSIZE.
|
Set the NOSORTEQUALS
system option
|
In an ETL process flow,
maintaining relative row order is rarely a requirement. If maintaining
the relative order of rows with identical key values is not important,
set the system option NOSORTEQUALS to save resources.
|
Set the UBUFNO option
to the maximum of 20
|
The UBUFNO option specifies
the number of utility I/O buffers. In some cases, maximizing UBUFNO
increases sort performance up to 10%. Increasing UBUFNO has no negative
ramifications.
|
Use the TAGSORT option
for nearly sorted data
|
TAGSORT is an alternative
SAS 8 sort algorithm that is useful for data that is almost in sort
order. The option is most effective when the sort-key width is no
more than 5 percent of the total uncompressed column width. Using
the TAGSORT option on a large unsorted data set results in extremely
long sort times compared to a SAS®9 sort that uses multiple threads.
|
Use relational database
sort engines to pre-sort tables without data order issues
|
Pre-sorting in relational
databases might outperform sorting that is based on SAS. Use options
of the SAS Data Integration Studio Extract transformation to generate
an ORDER BY clause in the SAS SQL. The ORDER BY clause asks the relational
database to return the rows in that particular sorted order.
|
Determine disk space
requirements to complete a sort
|
Size the following sort
data components:
-
-
SORT procedure utility file
-
|
|
Because sorting is so
I/O intensive, it is important to start with only the rows and columns
that are needed for the sort. The SORT procedure WORK files and the
output file are dependent on the input file size.
|
Size SORT procedure
utility files
|
Consider a number of
factors to size the SORT procedure utility files:
-
sizing information of the input
data
-
any pad bytes added to character
columns
-
any pad bytes added to short numeric
columns
-
pad bytes that align each row by
8 bytes (for SAS data sets)
-
8 bytes per row overhead for EQUALS
processing
-
per-page unused space in the SORT
procedure utility files
-
multi-pass merge: doubling of SORT
procedure utility files (or sort failure)
|
|
To size the output data,
apply the sizing rules of the destination data store to the columns
that are produced by the sort.
|