The DATASOURCE Procedure

OUT= Data Set

The OUT= data set can contain the following variables:

the BY variables, which identify cross-sectional dimensions when the input data file contains time series replicated for different values of the BY variables. Use the BY variables in a WHERE statement to process the OUT= data set by cross sections. The order in which BY variables are defined in the OUT= data set corresponds to the order in which the data file is sorted.
DATE, a SAS date-, time-, or datetime-valued variable that reports the time period of each observation. The values of the DATE variable may span different time ranges for different BY groups. The format of the DATE variable depends on the INTERVAL= option.
the periodic time series variables, which are included in the OUT= data set only if they have data in at least one selected BY group and they are not discarded by a KEEP or DROP statement.
the event variables, which are included in the OUT= data set if they are not discarded by a KEEP or DROP statement. By default, these variables are not output to the OUT= data set.

The values of BY variables remain constant in each cross section. Observations within each BY group correspond to the sampling of the series variables at the time periods indicated by the DATE variable.

You can create a set of single indexes for the OUT= data set by using the INDEX option, provided there are BY variables. Under some circumstances, this may increase the efficiency of subsequent PROC and DATA steps that use BY and WHERE statements. However, there is a cost associated with creation and maintenance of indexes. The SAS Language Reference: Concepts lists the conditions under which the benefits of indexes outweigh the cost.

With data files containing cross sections, there can be various degrees of overlap among the series variables. At one extreme, all the series variables contain data for all the cross sections. In this case, the output data set is very compact. At the other extreme, the set of time series variables is unique for each cross section, which makes the output data set very sparse, as depicted in Table 12.4.

Table 12.4: The OUT= Data Set Containing Unique Series for Each BY Group

| BY variables (BY1 ${\dots }$ BYP) | Series in first BY group (F1 F2 F3 ${\dots }$ FN) | Series in second BY group (S1 S2 S3 ${\dots }$ SM) | ${\dots }$ | Series in last BY group (T1 T2 T3 ${\dots }$ TK) |
|---|---|---|---|---|
| BY group 1 | DATA is here | | | |
| BY group 2 | | DATA is here | | |
| ${\vdots }$ | | | DATA is here | |
| BY group N | | | | DATA is here |

Data is missing everywhere except on the diagonal.

The data in Table 12.4 can be represented more compactly if cross-sectional information is incorporated into series variable names.
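The following Python sketch illustrates that idea on toy data. It is only a model of the layout, not SAS code and not the procedure's own behavior: the BY variable `BY1`, the series names `F1`, `F2`, `S1`, `S2`, the sample values, and the `<series>_<BY value>` naming scheme are all assumptions chosen for illustration.

```python
# Hypothetical sparse layout: each BY group has its own series, so every
# observation carries missing values (None) for the other group's series.
sparse_rows = [
    {"BY1": "A", "DATE": "2020Q1", "F1": 1.0, "F2": 2.0, "S1": None, "S2": None},
    {"BY1": "A", "DATE": "2020Q2", "F1": 1.5, "F2": 2.5, "S1": None, "S2": None},
    {"BY1": "B", "DATE": "2020Q1", "F1": None, "F2": None, "S1": 10.0, "S2": 20.0},
    {"BY1": "B", "DATE": "2020Q2", "F1": None, "F2": None, "S1": 11.0, "S2": 21.0},
]


def compact(rows, by_vars=("BY1",), date_var="DATE"):
    """Merge observations that share a date into one row.

    Each non-missing series value is stored under a name that embeds the
    BY-group value, for example F1 in group A becomes F1_A (assumed scheme).
    """
    merged = {}
    for row in rows:
        suffix = "_".join(str(row[b]) for b in by_vars)
        out = merged.setdefault(row[date_var], {date_var: row[date_var]})
        for name, value in row.items():
            if name in by_vars or name == date_var or value is None:
                continue
            out[f"{name}_{suffix}"] = value
    # One row per date, in ascending date order.
    return [merged[d] for d in sorted(merged)]


def count_missing(rows):
    """Count the missing (None) cells in a list of row dictionaries."""
    return sum(value is None for row in rows for value in row.values())


compact_rows = compact(sparse_rows)
print(len(sparse_rows), count_missing(sparse_rows))
print(len(compact_rows), count_missing(compact_rows))
```

In this toy case the four sparse observations collapse to one observation per date, and the BY variable disappears because its value is carried in the series names. If the BY groups covered different date ranges, some series would simply be absent from some of the merged rows.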