The VARMAX Procedure

Forecasting

The optimal (minimum MSE) $l$-step-ahead forecast of $\mb {y} _{t+l}$ is

\begin{eqnarray*}  \mb {y} _{t+l|t} & =&  \sum _{j=1}^ p \Phi _ j \mb {y} _{t+l-j|t} + \sum _{j=0}^ s \Theta _ j^* \mb {x} _{t+l-j|t} - \sum _{j=l}^ q \Theta _ j \bepsilon _{t+l-j}, ~ ~ l \leq q \\ \mb {y} _{t+l|t} & =&  \sum _{j=1}^ p \Phi _ j \mb {y} _{t+l-j|t} + \sum _{j=0}^ s \Theta _ j^* \mb {x} _{t+l-j|t}, ~ ~ l > q \end{eqnarray*}

with $\mb {y} _{t+l-j|t} = \mb {y} _{t+l-j}$ and $\mb {x} _{t+l-j|t} = \mb {x} _{t+l-j}$ for $l\leq j$. For the forecasts $\mb {x} _{t+l-j|t}$, see the section State-Space Representation.

Covariance Matrices of Prediction Errors without Exogenous (Independent) Variables

Under the stationarity assumption, the optimal (minimum MSE) $l$-step-ahead forecast of $\mb {y} _{t+l}$ has an infinite moving-average form, $ \mb {y} _{t+l|t} = \sum _{j=l}^{\infty }\Psi _{j}\bepsilon _{t+l-j}$. The prediction error of the optimal $l$-step-ahead forecast is $\mb {e} _{t+l|t} = \mb {y} _{t+l}-\mb {y} _{t+l|t} = \sum _{j=0}^{l-1}\Psi _{j}\bepsilon _{t+l-j}$, with zero mean and covariance matrix,

\[  \Sigma (l) = \mr {Cov} (\mb {e} _{t+l|t} ) = \sum _{j=0}^{l-1}\Psi _{j}\Sigma \Psi _{j}’ = \sum _{j=0}^{l-1}\Psi ^{o}_{j}\Psi _{j}^{o}  \]

where $\Psi _ j^ o = \Psi _ j P$ with a lower triangular matrix $P$ such that $\Sigma =PP’$. Under the assumption of normality of the $\bepsilon _ t$, the $l$-step-ahead prediction error $\mb {e} _{t+l|t}$ is also normally distributed as multivariate $N(0, \Sigma (l))$. Hence, it follows that the diagonal elements $\sigma ^2_{ii}(l)$ of $\Sigma (l)$ can be used, together with the point forecasts $y_{i,t+l|t}$, to construct $l$-step-ahead prediction intervals of the future values of the component series, $y_{i,t+l}$.

The following statements use the COVPE option to compute the covariance matrices of the prediction errors for a VAR(1) model. The parts of the VARMAX procedure output are shown in Figure 35.36 and Figure 35.37.

proc varmax data=simul1;
   model y1 y2 / p=1 noint lagmax=5
                 printform=both
                 print=(decompose(5) impulse=(all) covpe(5));
run;

Figure 35.36 is the output in a matrix format associated with the COVPE option for the prediction error covariance matrices.

Figure 35.36: Covariances of Prediction Errors (COVPE Option)

The VARMAX Procedure

Prediction Error Covariances
Lead Variable y1 y2
1 y1 1.28875 0.39751
  y2 0.39751 1.41839
2 y1 2.92119 1.00189
  y2 1.00189 2.18051
3 y1 4.59984 1.98771
  y2 1.98771 3.03498
4 y1 5.91299 3.04856
  y2 3.04856 4.07738
5 y1 6.69463 3.85346
  y2 3.85346 5.07010


Figure 35.37 is the output in a univariate format associated with the COVPE option for the prediction error covariances. This printing format more easily explains the prediction error covariances of each variable.

Figure 35.37: Covariances of Prediction Errors

Prediction Error Covariances by Variable
Variable Lead y1 y2
y1 1 1.28875 0.39751
  2 2.92119 1.00189
  3 4.59984 1.98771
  4 5.91299 3.04856
  5 6.69463 3.85346
y2 1 0.39751 1.41839
  2 1.00189 2.18051
  3 1.98771 3.03498
  4 3.04856 4.07738
  5 3.85346 5.07010


Covariance Matrices of Prediction Errors in the Presence of Exogenous (Independent) Variables

Exogenous variables can be both stochastic and nonstochastic (deterministic) variables. Considering the forecasts in the VARMAX($p$,$q$,$s$) model, there are two cases.

When exogenous (independent) variables are stochastic (future values not specified):

As defined in the section State-Space Representation, $\mb {y} _{t+l|t}$ has the representation

\[  \mb {y} _{t+l|t} = \sum _{j=l}^{\infty }V_{j}\mb {a} _{t+l-j} + \sum _{j=l}^{\infty }\Psi _{j}\bepsilon _{t+l-j}  \]

and hence

\[  \mb {e} _{t+l|t} = \sum _{j=0}^{l-1}V_{j}\mb {a} _{t+l-j} + \sum _{j=0}^{l-1}\Psi _{j}\bepsilon _{t+l-j}  \]

Therefore, the covariance matrix of the $l$-step-ahead prediction error is given as

\[  \Sigma (l) = \mr {Cov} (\mb {e} _{t+l|t} ) = \sum _{j=0}^{l-1}V_{j}\Sigma _{a} V_{j}’ + \sum _{j=0}^{l-1}\Psi _{j}\Sigma _{\epsilon }\Psi _{j}’  \]

where $\Sigma _{a}$ is the covariance of the white noise series $\mb {a} _ t$, and $\mb {a} _ t$ is the white noise series for the VARMA($p$,$q$) model of exogenous (independent) variables, which is assumed not to be correlated with $\bepsilon _{t}$ or its lags.

When future exogenous (independent) variables are specified:

The optimal forecast $\mb {y} _{t+l|t}$ of $\mb {y} _{t}$ conditioned on the past information and also on known future values $\mb {x} _{t+1}, \ldots , \mb {x} _{t+l}$ can be represented as

\[  \mb {y} _{t+l|t} = \sum _{j=0}^{\infty }\Psi _{j}^{*}\mb {x} _{t+l-j} + \sum _{j=l}^{\infty }\Psi _{j}\bepsilon _{t+l-j}  \]

and the forecast error is

\[  \mb {e} _{t+l|t} = \sum _{j=0}^{l-1}\Psi _{j}\bepsilon _{t+l-j}  \]

Thus, the covariance matrix of the $l$-step-ahead prediction error is given as

\[  \Sigma (l) = \mr {Cov} (\mb {e} _{t+l|t} ) = \sum _{j=0}^{l-1}\Psi _{j}\Sigma _{\epsilon }\Psi _{j}’  \]

Decomposition of Prediction Error Covariances

In the relation $\Sigma (l) = \sum _{j=0}^{l-1}\Psi ^{o}_{j}\Psi _{j}^{o}$, the diagonal elements can be interpreted as providing a decomposition of the $l$-step-ahead prediction error covariance $\sigma _{ii}^2(l)$ for each component series $y_{it}$ into contributions from the components of the standardized innovations $\bepsilon _ t$.

If you denote the ($i,n$)th element of $\Psi ^{o}_{j}$ by $\psi _{j,in}$, the MSE of $y_{i,t+h|t}$ is

\[  \mr {MSE} (y_{i,t+h|t})=\mr {E} (y_{i,t+h} - y_{i,t+h|t})^2 = \sum _{j=0}^{l-1} \sum _{n=1}^{k} \psi _{j,in}^2  \]

Note that $\sum _{j=0}^{l-1} \psi _{j,in}^2$ is interpreted as the contribution of innovations in variable $n$ to the prediction error covariance of the $l$-step-ahead forecast of variable $i$.

The proportion, $\omega _{l,in}$, of the $l$-step-ahead forecast error covariance of variable $i$ accounting for the innovations in variable $n$ is

\[  \omega _{l,in} = \sum _{j=0}^{l-1} \psi _{j,in}^2/\mr {MSE} (y_{i,t+h|t})  \]

The following statements use the DECOMPOSE option to compute the decomposition of prediction error covariances and their proportions for a VAR(1) model:

proc varmax data=simul1;
   model y1 y2 / p=1 noint print=(decompose(15))
                 printform=univariate;
run;

The proportions of decomposition of prediction error covariances of two variables are given in Figure 35.38. The output explains that about 91.356% of the one-step-ahead prediction error covariances of the variable $y_{2t}$ is accounted for by its own innovations and about 8.644% is accounted for by $y_{1t}$ innovations.

Figure 35.38: Decomposition of Prediction Error Covariances (DECOMPOSE Option)

Proportions of Prediction Error Covariances
by Variable
Variable Lead y1 y2
y1 1 1.00000 0.00000
  2 0.88436 0.11564
  3 0.75132 0.24868
  4 0.64897 0.35103
  5 0.58460 0.41540
y2 1 0.08644 0.91356
  2 0.31767 0.68233
  3 0.50247 0.49753
  4 0.55607 0.44393
  5 0.53549 0.46451


Forecasting of the Centered Series

If the CENTER option is specified, the sample mean vector is added to the forecast.

Forecasting of the Differenced Series

If dependent (endogenous) variables are differenced, the final forecasts and their prediction error covariances are produced by integrating those of the differenced series. However, if the PRIOR option is specified, the forecasts and their prediction error variances of the differenced series are produced.

Let $\mb {z} _{t}$ be the original series with some appended zero values that correspond to the unobserved past observations. Let $\Delta (B)$ be the ${k \times k}$ matrix polynomial in the backshift operator that corresponds to the differencing specified by the MODEL statement. The off-diagonal elements of $\Delta _{i}$ are zero, and the diagonal elements can be different. Then $\mb {y} _{t} = \Delta (B)\mb {z} _{t}$.

This gives the relationship

\[  \mb {z} _{t} = \Delta ^{-1}(B) \mb {y} _{t} = \sum _{j=0}^{\infty } \Lambda _{j}\mb {y} _{t-j}  \]

where $\Delta ^{-1}(B) =\sum _{j=0}^{\infty }\Lambda _{j} B^{j}$ and $\Lambda _{0} = I_{k}$.

The $l$-step-ahead prediction of $\mb {z} _{t+l}$ is

\[  \mb {z} _{t+l|t} = \sum _{j=0}^{l-1}\Lambda _{j} \mb {y} _{t+l-j|t} + \sum _{j=l}^{\infty }\Lambda _{j} \mb {y} _{t+l-j}  \]

The $l$-step-ahead prediction error of $\mb {z} _{t+l}$ is

\begin{eqnarray*}  \sum _{j=0}^{l-1} \Lambda _{j} \left(\mb {y} _{t+l-j} - \mb {y} _{t+l-j|t}\right) = \sum _{j=0}^{l-1} \left(\sum _{u=0}^{j} \Lambda _{u} \Psi _{j-u}\right) \bepsilon _{t+l-j} \end{eqnarray*}

Letting $\Sigma _{\mb {z} }(0)=0$, the covariance matrix of the l-step-ahead prediction error of $\mb {z} _{t+l}$, $\Sigma _{\mb {z} }(l)$, is

\begin{eqnarray*}  \Sigma _{\mb {z} }(l) & =& \sum _{j=0}^{l-1}{\left(\sum _{u=0}^{j} \Lambda _{u}\Psi _{j-u}\right) \Sigma _{\epsilon } \left(\sum _{u=0}^{j}\Lambda _{u} \Psi _{j-u}\right)’} \\ & =&  \Sigma _{\mb {z} }(l-1) + \left(\sum _{j=0}^{l-1} \Lambda _{j} \Psi _{l-1-j}\right) \Sigma _{\epsilon } \left(\sum _{j=0}^{l-1} \Lambda _{j} \Psi _{l-1-j}\right)’ \end{eqnarray*}

If there are stochastic exogenous (independent) variables, the covariance matrix of the l-step-ahead prediction error of $\mb {z} _{t+l}$, $\Sigma _{\mb {z} }(l)$, is

\begin{eqnarray*}  \Sigma _{\mb {z} }(l) & =&  \Sigma _{\mb {z} }(l-1) +\left(\sum _{j=0}^{l-1}\Lambda _{j} \Psi _{l-1-j}\right) \Sigma _{\epsilon } \left(\sum _{j=0}^{l-1}\Lambda _{j} \Psi _{l-1-j}\right)’ \\ & &  +\left(\sum _{j=0}^{l-1}\Lambda _{j} V_{l-1-j}\right) \Sigma _{a} \left(\sum _{j=0}^{l-1}\Lambda _{j} V_{l-1-j}\right)’ \end{eqnarray*}