To conditionally
select observations from a SAS data set, you can use either a WHERE
expression or a subsetting IF statement. While they both test a condition
to determine whether SAS should process an observation, they differ
as follows:
-
The subsetting IF statement can
be used only in a DATA step. A subsetting IF statement tests the condition
after an observation is read into the Program Data Vector (PDV). If
the condition is true, SAS continues processing the current observation.
Otherwise, the observation is discarded, and processing continues
with the next observation.
-
You can use a WHERE expression
in both a DATA step and SAS procedures, as well as in a windowing
environment, SCL programs, and as a data set option. A WHERE expression
tests the condition before an observation is read into the PDV. If
the condition is true, the observation is read into the PDV and processed. If
the condition is false, the observation is not read into the PDV,
and processing continues with the next observation, which can yield
substantial savings when observations contain many variables or very
long character variables (up to 32K bytes). In addition, a WHERE expression
can be optimized with an index, and the WHERE expression enables more
operators, such as LIKE and CONTAINS.
Note: Although it is generally
more efficient to use a WHERE expression and avoid the move to the
PDV before processing, if the data set contains observations with
very few variables, the move to the PDV could be cheap. However, one
variable containing 32K bytes of character data is not cheap, even
though it is only one variable.
In most cases, you can
use either method. However, the following table provides a list of
tasks that require you to use a specific method:
Tasks Requiring Either WHERE Expression or Subsetting IF Statement
|
|
Make the selection in
a procedure without using a preceding DATA step
|
|
Take advantage of the
efficiency available with an indexed data set
|
|
Use one of a group of
special operators, such as BETWEEN-AND, CONTAINS, IS MISSING or IS
NULL, LIKE, SAME-AND, and Sounds-Like
|
|
Base the selection on
anything other than a variable value that already exists in a SAS
data set. For example, you can select a value that is read from raw
data, or a value that is calculated or assigned during the course
of the DATA step
|
|
Make the selection at
some point during a DATA step rather than at the beginning
|
|
Execute the selection
conditionally
|
|