Searching through a large data set for observations that satisfy some complex criteria can take a long time. You can reduce this search time by indexing the data set. The INDEX statement builds a special companion file that contains the values and record numbers of the indexed variables. After the index is built, queries that use a WHERE clauses might use the index to make the processing more efficient. Any number of variables can be indexed, but only one index is in use at a given time.
If you sort a data set in place or use the PURGE statement to delete observations, indices for the data set are deleted.
After you index a data set, the SAS/IML language has the option to use the index when a search is conducted with respect to the indexed variables. The indexes are updated automatically whenever you change values in indexed variables. When an index is in use, observations cannot be randomly accessed by their physical location numbers. In other words, you cannot use the POINT clause when an index is in effect.
To see how the INDEX statement works, make a copy of the Sashelp.Class
data set, as follows:
data Class; set Sashelp.Class; run;
If you want a list of all female students in the Class
data set, you can first index the data by the Sex
variable:
proc iml; use Class; index Sex;
If you subsequently submit a WHERE clause that uses the Sex
variable, the index is used. Of course, the Class
data set is small, so you will not notice any performance improvement for these data. However, for large data sets indexing
can improve performance.
Now list all students by using the following statement:
list all;
Figure 7.23: Indexed Observations
OBS Name Sex Age Height Weight ------ -------- --- --------- --------- --------- 2 Alice F 13.0000 56.5000 84.0000 3 Barbara F 13.0000 65.3000 98.0000 4 Carol F 14.0000 62.8000 102.5000 7 Jane F 12.0000 59.8000 84.5000 8 Janet F 15.0000 62.5000 112.5000 11 Joyce F 11.0000 51.3000 50.5000 12 Judy F 14.0000 64.3000 90.0000 13 Louise F 12.0000 56.3000 77.0000 14 Mary F 15.0000 66.5000 112.0000 1 Alfred M 14.0000 69.0000 112.5000 5 Henry M 14.0000 63.5000 102.5000 6 James M 12.0000 57.3000 83.0000 9 Jeffrey M 13.0000 62.5000 84.0000 10 John M 12.0000 59.0000 99.5000 15 Philip M 16.0000 72.0000 150.0000 16 Robert M 12.0000 64.8000 128.0000 17 Ronald M 15.0000 67.0000 133.0000 18 Thomas M 11.0000 57.5000 85.0000 19 William M 15.0000 66.5000 112.0000 |
Notice that the indexed observations are sorted by Sex
rather than by the OBS
number. Retrievals that use the Sex
variable are quicker than retrievals of data that are not indexed.