Working with Time Series Data


Interleaved Time Series

Normally, a time series data set has only one observation for each time period or, for a data set in time series cross-sectional form, one observation for each time period within each cross section. However, it is sometimes useful to store several related time series in the same variable when the different series do not correspond to levels of a cross-sectional dimension of the data.

In this case, the different time series can be interleaved. An interleaved time series data set is similar to a time series cross-sectional data set, except that the observations are sorted differently and the ID variable that distinguishes the different time series does not represent a cross-sectional dimension.
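The difference in sort order can be sketched as follows. This is only an illustration: the data set SERIES and the output names INTERLEAVED and BYSERIES are hypothetical, and SERIES is assumed to hold a date variable (DATE), an ID variable (_TYPE_) that labels each series, and a single variable of values (CPI).

```sas
/* Hypothetical input: SERIES, with variables DATE, _TYPE_, and CPI. */

/* Interleaved ordering: observations for the same date are grouped together. */
proc sort data=series out=interleaved;
   by date _type_;
run;

/* Cross-sectional-style ordering: each series is one contiguous block. */
proc sort data=series out=byseries;
   by _type_ date;
run;
```

Both output data sets contain the same observations; only the order differs.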

Some SAS/ETS procedures produce interleaved output data sets. The interleaved time series form is a convenient way to store procedure output when the results consist of several different kinds of series for each of several input series. (Interleaved time series are also easy to process with plotting procedures. See the section Plotting Time Series.)

For example, the FORECAST procedure fits a model to each input time series and computes predicted values and residuals from the model. The FORECAST procedure then uses the model to compute forecast values beyond the range of the input data and also to compute upper and lower confidence limits for the forecast values.

Thus, the output from PROC FORECAST consists of up to five related time series for each variable that is forecast: actual values, predicted or forecast values, residuals, and upper and lower confidence limits. These series are stored in a single output variable with the same name as the series that is being forecast, and the observations for each series are identified by values of the variable _TYPE_. The observations are interleaved in the output data set, with observations for the same date grouped together.

The following statements show how to use PROC FORECAST to forecast the variable CPI in the USCPI data set. Figure 3.5 shows part of the output data set produced by PROC FORECAST and illustrates the interleaved structure of this data set.

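/* Forecast CPI 12 months ahead; OUTFULL and OUTRESID add the actual  */
/* values, predictions, confidence limits, and residuals to FOREOUT.  */
proc forecast data=uscpi interval=month lead=12
              out=foreout outfull outresid;
   var cpi;
   id date;
run;

/* List the first six observations of the interleaved output. */
proc print data=foreout(obs=6);
run;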

Figure 3.5: Partial Listing of Output Data Set Produced by PROC FORECAST

Obs  date     _TYPE_    _LEAD_      cpi
  1  JUN1990  ACTUAL         0  129.900
  2  JUN1990  FORECAST       0  130.817
  3  JUN1990  RESIDUAL       0   -0.917
  4  JUL1990  ACTUAL         0  130.400
  5  JUL1990  FORECAST       0  130.678
  6  JUL1990  RESIDUAL       0   -0.278

Observations with _TYPE_=ACTUAL contain the values of CPI read from the input data set. Observations with _TYPE_=FORECAST contain one-step-ahead predicted values for observations with dates in the range of the input series and contain forecast values for observations for dates beyond the range of the input series. Observations with _TYPE_=RESIDUAL contain the difference between the actual and one-step-ahead predicted values. Observations with _TYPE_=U95 and _TYPE_=L95 contain the upper and lower bounds, respectively, of the 95% confidence interval for the forecasts.
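One way to see how the _TYPE_ values relate to one another is to turn each interleaved series into its own variable. The following is a minimal sketch, assuming the FOREOUT data set from the preceding example is still in date order; the output data set name WIDE is an arbitrary choice.

```sas
/* Sketch: one observation per date, one variable per _TYPE_ value. */
/* Assumes FOREOUT is sorted by DATE; WIDE is a hypothetical name.  */
proc transpose data=foreout out=wide(drop=_name_);
   by date;      /* one output observation for each date           */
   id _type_;    /* values of _TYPE_ become the new variable names */
   var cpi;      /* the interleaved values to spread across them   */
run;

proc print data=wide(obs=6);
run;
```

In WIDE, a variable is missing for any date that has no observation of that type; for example, ACTUAL is missing for dates beyond the range of the input series.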

Using Interleaved Data Sets as Input to SAS/ETS Procedures

Interleaved time series data sets are not directly accepted as input by SAS/ETS procedures. However, it is easy to use a WHERE statement with any procedure to subset the input data and select one of the interleaved time series as the input.

For example, to analyze the residual series contained in the PROC FORECAST output data set with another SAS/ETS procedure, include a WHERE _TYPE_='RESIDUAL' statement. The following statements perform a spectral analysis of the residuals produced by PROC FORECAST in the preceding example:

proc spectra data=foreout out=spectout;
   var cpi;
   where _type_='RESIDUAL';
run;
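The same subsetting can also be written with the WHERE= data set option, or the selected series can first be copied into its own data set. The following sketch assumes the FOREOUT data set from the earlier example; the data set name RESID and the variable name RESIDUAL are hypothetical choices.

```sas
/* Alternative 1: subset the input with the WHERE= data set option. */
proc spectra data=foreout(where=(_type_='RESIDUAL')) out=spectout;
   var cpi;
run;

/* Alternative 2: copy the residual series into its own data set.   */
/* KEEP= is applied before RENAME=, so KEEP= lists the old name.    */
data resid(keep=date cpi rename=(cpi=residual));
   set foreout;
   where _type_ = 'RESIDUAL';
run;
```

A separate data set such as RESID can be convenient when the same series is passed to several procedures.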

Combined Cross Sections and Interleaved Time Series Data Sets

Interleaved time series output data sets produced from BY-group processing of time series cross-sectional input data sets have a complex structure that combines a cross-sectional dimension, a time dimension, and the values of the _TYPE_ variable. For example, consider the PROC FORECAST output data set produced by the following statements:

title "FORECAST Output Data Set with BY Groups";

proc forecast data=cpicity interval=month
              method=expo lead=2
              out=foreout outfull outresid;
   var cpi;
   id date;
   by city;
run;
proc print data=foreout(obs=6);
run;

The output data set FOREOUT contains many different time series in the single variable CPI. (The first few observations of FOREOUT are shown in Figure 3.6.) BY groups that are identified by the variable CITY contain the result series for the different cities. Within each value of CITY, the actual, forecast, residual, and confidence limits series are stored in interleaved form, with the observations for the different series identified by the values of _TYPE_.

Figure 3.6: Combined Cross Sections and Interleaved Time Series Data

FORECAST Output Data Set with BY Groups

Obs  city     date   _TYPE_    _LEAD_      cpi
  1  Chicago  JAN90  ACTUAL         0  128.100
  2  Chicago  JAN90  FORECAST       0  128.252
  3  Chicago  JAN90  RESIDUAL       0   -0.152
  4  Chicago  FEB90  ACTUAL         0  129.200
  5  Chicago  FEB90  FORECAST       0  128.896
  6  Chicago  FEB90  RESIDUAL       0    0.304
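
A WHERE statement can be combined with BY-group processing to select one kind of series for every cross section, or to select a single series for one cross section. The sketch below assumes the FOREOUT data set produced by the preceding statements; the use of PROC MEANS and PROC PRINT is only illustrative, and the condition _LEAD_ > 0 is assumed to mark observations beyond the range of the input data, as the zero values in the listing suggest.

```sas
/* Summary statistics of the residual series, separately for each city. */
proc means data=foreout mean std min max;
   where _type_ = 'RESIDUAL';
   by city;
   var cpi;
run;

/* Forecasts beyond the end of the input data for a single city.   */
/* Assumption: _LEAD_ > 0 identifies the out-of-sample forecasts.  */
proc print data=foreout;
   where city = 'Chicago' and _type_ = 'FORECAST' and _lead_ > 0;
   var date _lead_ cpi;
run;
```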