To create a subset data set, specify the name of the subset data set in the DATA statement, bring in the full data set with a SET statement, and specify the subsetting criteria with either subsetting IF statements or WHERE statements.
For example, suppose you have a data set that contains time series observations for each of several states. The following DATA step uses a WHERE statement to exclude observations with dates before 1970 and uses a subsetting IF statement to select observations for the state NC:
data subset; set full; where date >= '1jan1970'd; if state = 'NC'; run;
In this case, it makes no difference logically whether the WHERE statement or the IF statement is used, and you can combine several conditions in one subsetting statement. The following statements produce the same results as the previous example:
data subset; set full; if date >= '1jan1970'd & state = 'NC'; run;
The WHERE statement acts on the input data sets specified in the SET statement before observations are processed by the DATA step program, whereas the IF statement is executed as part of the DATA step program. If the input data set is indexed, using the WHERE statement can be more efficient than using the IF statement. However, the WHERE statement can refer only to variables in the input data set, not to variables computed by the DATA step program.
To subset the variables of a data set, use KEEP or DROP statements or use KEEP= or DROP= data set options. See SAS Language Reference: Dictionary for information about KEEP and DROP statements and SAS data set options.
For example, suppose you want to subset the data set as in the preceding example, but you want to include in the subset data set only the variables DATE, X, and Y. You could use the following statements:
data subset; set full; if date >= '1jan1970'd & state = 'NC'; keep date x y; run;