Indexing
a SAS data set can significantly improve the performance of WHERE
processing. An index is an optional file that you can create for SAS
data files in order to provide direct access to specific observations.
Processing a WHERE expression
without an index requires SAS to sequentially read observations in
order to find the ones that match the selection criteria. Without
an index, SAS first checks for the sort indicator, which is stored
with the data file from a previous SORT procedure or SORTEDBY= data
set option. If the sort indicator is validated, SAS takes advantage
of it and stops reading the file once it is clear there are no more
values that satisfy the WHERE expression. For example, consider a
data set that is sorted by Age, without an index. To process the expression
where age le 25
, SAS stops reading observations after
it finds an observation that is greater than 25. Note that while SAS
can determine when to stop reading observations, without an index,
there is no indication where to begin, so SAS always begins with the
first observation, which can require reading a lot of observations.
Having an index enables
SAS to determine which observations satisfy the criteria, which is
referred to as optimizing the WHERE expression. However, by default,
SAS decides whether to use the index or read the entire data set sequentially.
For details about how SAS uses an index to process a WHERE expression, see
Using an Index for WHERE Processing.
In addition to creating
indexes for the data set, here are some guidelines for writing efficient
WHERE expressions:
Constructing Efficient WHERE Expressions
|
|
|
Avoid using the LIKE
operator that begins with % or _.
|
where
country like 'A%INA';
|
where
country like '%INA';
|
Avoid using arithmetic
expressions.
|
|
|