An index is an optional file that you can create to
provide direct access to specific rows. The index stores values in
ascending value order for a specific column or columns and includes
information about the location of those values within rows in the
table. In other words, an index enables you to locate a row by value.
For example, if you use SAS to find a specific Social Security number
(123-45-6789), SAS performs the search differently depending on whether
there is an index on the row that contains the Social Security numbers:
-
Without an index, SAS accesses
rows sequentially in the order in which they are stored in the table.
SAS reads each row, looking for SSN=123-45-6789 until the value is
found, or all observations are read.
-
With an index on column SSN, SAS
accesses the row directly. SAS satisfies the condition by using the
index and going straight to the row that contains the value. SAS does
not have to read each row.
When you create an index,
you designate which columns to index. You can create two types of
indexes:
-
a simple index, which consists
of the values of one column
-
a composite index, which consists
of the values of more than one column, with the values concatenated
to form a single value
For each indexed column,
you can also perform these tasks:
-
declare unique values. A unique
index guarantees that values for one column or the combination of
a composite group of columns remain unique for every row in the table.
If an update tries to add a duplicate value to that column, then the
update is rejected.
-
keep missing values from using
space in the index by specifying that missing values are not maintained
by the index.
In addition to writing
SAS code to create indexes, you can create indexes on target tables
by using SAS Data Integration Studio. In SAS Data Integration Studio,
you use the properties window for the table to index individual columns.
When you create the index, you can also specify
Unique
values and
No missing values.
Note that any indexes registered in metadata for a target table are
physically created when the job is run. Simply editing the properties
for an existing table and adding indexes does not update the physical
table. The following figure shows the SAS Data Integration Studio
properties dialog box for a table:
In general, SAS can
use an index to improve performance in these situations:
-
For cube loading, a composite index
on the columns that make up the cube's hierarchies might provide best
results.
-
For WHERE processing, an index
can provide faster and more efficient access to a subset of data.
Note that to process a WHERE expression, SAS decides whether to use
an index, or to read the table sequentially.
Note: For WHERE processing, the
Base SAS engine uses a maximum of one index. The SPD Engine can use
multiple indexes.
Even though an index
can reduce the time that is required to locate a set of rows, especially
for a large table, there are costs that are associated with creating,
storing, and maintaining the index. When deciding whether to create
an index, you must consider increased resource usage, along with the
performance improvement.
Once an index exists,
SAS treats it as part of the table. That is, if you add or delete
columns or modify values, the index is automatically updated.
For more information
about indexes, see
SAS Language Reference: Concepts.