Given the order p, let $P_t$ be the vector of current and past values relevant to prediction of $x_{t+1}$:

$$P_t = (x'_t, x'_{t-1}, \ldots, x'_{t-p})'$$
Let $F_t$ be the vector of current and future values:

$$F_t = (x'_t, x'_{t+1}, \ldots, x'_{t+p})'$$
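In the canonical correlation analysis, consider submatrices of the sample covariance matrix of $P_t$ and $F_t$. This covariance matrix, $V$, has a block Hankel form:

$$V = \begin{bmatrix} C_0 & C'_1 & C'_2 & \cdots & C'_p \\ C'_1 & C'_2 & C'_3 & \cdots & C'_{p+1} \\ \vdots & \vdots & \vdots & & \vdots \\ C'_p & C'_{p+1} & C'_{p+2} & \cdots & C'_{2p} \end{bmatrix}$$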
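The block Hankel structure can be made concrete with a short Python sketch (an illustration, not the procedure's implementation). It assumes the sample autocovariance $C_k = \frac{1}{n}\sum_t (x_{t+k}-\bar{x})(x_t-\bar{x})'$, so that block $(i, j)$ of $V$, the covariance of $x_{t-i}$ with $x_{t+j}$, is $C'_{i+j}$ and depends only on $i + j$. The divisor, the mean correction, and the helper names `autocov` and `block_hankel` are assumptions made for illustration.

```python
import numpy as np


def autocov(x: np.ndarray, k: int) -> np.ndarray:
    """Sample lag-k autocovariance C_k = (1/n) * sum_t (x_{t+k} - xbar)(x_t - xbar)'."""
    xc = x - x.mean(axis=0)
    n = xc.shape[0]
    return xc[k:].T @ xc[: n - k] / n


def block_hankel(x, p: int) -> np.ndarray:
    """Block Hankel V whose (i, j) block estimates Cov(x_{t-i}, x_{t+j}) = C'_{i+j}."""
    x = np.asarray(x, dtype=float)
    r = x.shape[1]
    c = [autocov(x, k) for k in range(2 * p + 1)]
    v = np.zeros(((p + 1) * r, (p + 1) * r))
    for i in range(p + 1):      # block row: lag i of the past vector P_t
        for j in range(p + 1):  # block column: lead j of the future vector F_t
            v[i * r:(i + 1) * r, j * r:(j + 1) * r] = c[i + j].T
    return v
```

For a bivariate series and p = 2, `block_hankel(x, 2)` returns a 6 × 6 matrix whose top-left block is $C_0$ and whose blocks are constant along each anti-diagonal.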
The canonical correlation analysis forms a sequence of potential state vectors $z_t^j$. Examine a sequence $F_t^j$ of subvectors of $F_t$, form the submatrix $V^j$ that consists of the rows and columns of $V$ that correspond to the components of $F_t^j$, and compute its canonical correlations.
The smallest canonical correlation of $V^j$ is then used in the selection of the components of the state vector. The selection process is described in the following discussion. For more details about this process, see Akaike (1976).
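Computing canonical correlations between the past vector $P_t$ and a candidate $F_t^j$ requires the covariance of each vector as well as their cross-covariance. One standard way to do this, sketched below in Python, is to whiten both sides with Cholesky factors and take singular values. The function name and the use of full covariance blocks are illustrative choices; this is not a description of how PROC STATESPACE carries out the computation.

```python
import numpy as np


def canonical_correlations(s_pp, s_ff, s_pf):
    """Canonical correlations of P and F from Var(P), Var(F), and Cov(P, F).

    They are the singular values of Lp^{-1} S_pf Lf^{-T}, where
    S_pp = Lp Lp' and S_ff = Lf Lf' are Cholesky factorizations.
    """
    lp = np.linalg.cholesky(np.asarray(s_pp, dtype=float))
    lf = np.linalg.cholesky(np.asarray(s_ff, dtype=float))
    m = np.linalg.solve(lp, np.asarray(s_pf, dtype=float))  # Lp^{-1} S_pf
    m = np.linalg.solve(lf, m.T).T                           # ... times Lf^{-T}
    return np.linalg.svd(m, compute_uv=False)  # descending; rho_min is the last


# Toy check: F holds one copy of a past variable plus pure noise.
rng = np.random.default_rng(1)
past = rng.standard_normal((500, 3))
cand = np.column_stack([past[:, 0], rng.standard_normal(500)])
s = np.cov(np.hstack([past, cand]), rowvar=False)
rho = canonical_correlations(s[:3, :3], s[3:, 3:], s[:3, 3:])
```

In the toy check, the largest canonical correlation is 1 (the shared variable) and the smallest, playing the role of $\rho_{\min}$, is close to 0 because the second component carries no information from the past.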
In the following discussion, the notation $x_{t+k|t}$ denotes the wide sense conditional expectation (best linear predictor) of $x_{t+k}$, given all $x_s$ with $s$ less than or equal to $t$. In the notation $x_{i,t+1}$, the first subscript denotes the $i$th component of $x_{t+1}$.
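The initial state vector $z_t^1$ is set to $x_t$. The sequence $F_t^j$ is initialized by setting

$$F_t^1 = (z_t^{1\prime}, x_{1,t+1|t})' = (x'_t, x_{1,t+1|t})'$$

That is, start by considering whether to add $x_{1,t+1|t}$ to the initial state vector $z_t^1$.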
The procedure forms the submatrix $V^1$ that corresponds to $F_t^1$ and computes its canonical correlations. Denote the smallest canonical correlation of $V^1$ as $\rho_{\min}$. If $\rho_{\min}$ is significantly greater than 0, $x_{1,t+1|t}$ is added to the state vector.
If the smallest canonical correlation of $V^1$ is not significantly greater than 0, then a linear combination of $F_t^1$ is uncorrelated with the past, $P_t$. Assuming that the determinant of $C_0$ is not 0 (that is, no input series is a constant), you can take the coefficient of $x_{1,t+1|t}$ in this linear combination to be 1. Denote the coefficients of $z_t^1$ in this linear combination as $l$. This gives the relationship:

$$x_{1,t+1|t} = l' x_t$$
Therefore, the current state vector already contains all the past information useful for predicting $x_{1,t+1}$ and any greater leads of $x_{1,t}$. The variable $x_{1,t+1|t}$ is not added to the state vector, nor are any terms $x_{1,t+k|t}$ considered as possible components of the state vector. The variable $x_1$ is no longer active for state vector selection.
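The normalization that produces $l$ can be sketched in Python: take the $F_t^j$-side canonical weight vector paired with $\rho_{\min}$, rescale it so that the candidate's coefficient is 1, and read off the remaining coefficients. The sketch assumes that the candidate is the last component of $F_t^j$, that $P_t$ has at least as many components as $F_t^j$, and it returns $l$ with the sign that makes $x_{1,t+1|t} = l' x_t$ hold. The helper name is hypothetical. In the toy data the observed future value stands in for its predictor; both have the same covariance with the past, and the zero-correlation direction depends only on that cross-covariance.

```python
import numpy as np


def coefficients_for_rejected(s_pp, s_ff, s_pf):
    """Return (rho_min, l) with the candidate (last component of F) scaled to 1.

    Assumes dim(P) >= dim(F) so every right singular vector has a singular value.
    """
    lp = np.linalg.cholesky(np.asarray(s_pp, dtype=float))
    lf = np.linalg.cholesky(np.asarray(s_ff, dtype=float))
    m = np.linalg.solve(lp, np.asarray(s_pf, dtype=float))
    m = np.linalg.solve(lf, m.T).T
    _, sing, vt = np.linalg.svd(m)
    b = np.linalg.solve(lf.T, vt[-1])  # F-side canonical weights for rho_min
    # b'F is (nearly) uncorrelated with P, so its projection on the past is 0;
    # solving that projection for the candidate gives candidate = l'z.
    return sing[-1], -b[:-1] / b[-1]


# Toy data: the candidate is 0.5*z1 - 2*z2 plus noise unrelated to the past.
rng = np.random.default_rng(2)
n = 2000
z = rng.standard_normal((n, 2))
past = np.column_stack([z, rng.standard_normal(n)])
cand = 0.5 * z[:, 0] - 2.0 * z[:, 1] + rng.standard_normal(n)
f = np.column_stack([z, cand])
s = np.cov(np.hstack([past, f]), rowvar=False)
rho_min, l = coefficients_for_rejected(s[:3, :3], s[3:, 3:], s[:3, 3:])
```

Here `l` comes out close to (0.5, -2), the coefficients used to generate the candidate, and `rho_min` is near 0.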
The process described for $x_{1,t+1|t}$ is repeated for the remaining elements of $F_t$. The next candidate for inclusion in the state vector is the next component of $F_t$ that corresponds to an active variable. Components of $F_t$ that correspond to inactive variables that produced a zero $\rho_{\min}$ in a previous step are skipped.
Denote the next candidate as $x_{l,t+k|t}$. The vector $F_t^j$ is formed from the current state vector and $x_{l,t+k|t}$ as follows:

$$F_t^j = (z_t^{j\prime}, x_{l,t+k|t})'$$
The matrix $V^j$ is formed from $F_t^j$, and its canonical correlations are computed. The smallest canonical correlation of $V^j$ is judged to be either greater than 0 or equal to 0. If it is judged to be greater than 0, $x_{l,t+k|t}$ is added to the state vector. If it is judged to be 0, then a linear combination of $F_t^j$ is uncorrelated with $P_t$, and the variable $x_l$ is now inactive.
The state vector selection process continues until no active variables remain.
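The bookkeeping of this selection sequence (candidate order, skipping inactive variables, stopping when none remain) can be sketched in Python. The significance decision is passed in as a function, standing in for the information-criterion test. The names, the (i, k) encoding of $x_{i,t+k|t}$ with 0-based variable indices, and the choice to stop before lead $p$ (whose candidates are never added) are illustrative assumptions.

```python
from typing import Callable, List, Tuple

# (i, k) stands for x_{i,t+k|t}: variable index i (0-based), lead k.
Component = Tuple[int, int]


def select_state_vector(
    r: int,
    p: int,
    is_significant: Callable[[List[Component], Component], bool],
) -> List[Component]:
    """Sketch of the canonical correlation state vector selection sequence."""
    state: List[Component] = [(i, 0) for i in range(r)]  # z_t^1 = x_t
    active = set(range(r))
    for k in range(1, p):  # leads 1..p-1; lead-p candidates are never added
        for i in range(r):
            if i not in active:
                continue  # skip variables that already produced a zero rho_min
            candidate = (i, k)
            if is_significant(state, candidate):
                state.append(candidate)  # rho_min judged > 0
            else:
                active.discard(i)  # rho_min judged 0: x_i becomes inactive
        if not active:
            break  # no active variables remain
    return state


# Hypothetical decisions: only x_{1,t+1|t} (index (0, 1)) is significant.
print(select_state_vector(2, 3, lambda state, cand: cand == (0, 1)))
# [(0, 0), (1, 0), (0, 1)]
```

With those decisions, the second variable drops out at lead 1 and the first at lead 2, leaving the state vector $(x_{1,t}, x_{2,t}, x_{1,t+1|t})$.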
For each step in the canonical correlation sequence, the significance of the smallest canonical correlation $\rho_{\min}$ is judged by an information criterion from Akaike (1976). This information criterion is

$$-n \ln(1 - \rho_{\min}^2) - \lambda\,(r(p+1) - q + 1)$$

where $q$ is the dimension of $F_t^j$ at the current step, $r$ is the order of the state vector, $p$ is the order of the vector autoregressive process, and $\lambda$ is the value of the SIGCORR= option. The default is SIGCORR=2. If this information criterion is less than or equal to 0, $\rho_{\min}$ is taken to be 0; otherwise, it is taken to be significantly greater than 0. (Do not confuse this information criterion with the AIC.)
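As a worked illustration, the Python sketch below evaluates the criterion for given values. It assumes that $n$ is the number of observations used; the function names are hypothetical. With n = 100, r = 2, p = 2, and q = 3, the penalty is $\lambda \cdot 4 = 8$ at the default SIGCORR=2, so $\rho_{\min} = 0.3$ gives $-100\ln(0.91) - 8 \approx 1.43 > 0$ (significant), while $\rho_{\min} = 0.25$ gives about $-1.55$ (taken to be 0).

```python
import math


def cancorr_information_criterion(rho_min, n, r, p, q, sigcorr=2.0):
    """-n ln(1 - rho_min^2) - lambda * (r(p+1) - q + 1), with lambda = SIGCORR=."""
    penalty_df = r * (p + 1) - q + 1
    return -n * math.log(1.0 - rho_min ** 2) - sigcorr * penalty_df


def judged_positive(rho_min, n, r, p, q, sigcorr=2.0):
    """True if rho_min is taken to be significantly greater than 0."""
    return cancorr_information_criterion(rho_min, n, r, p, q, sigcorr) > 0


print(round(cancorr_information_criterion(0.30, 100, 2, 2, 3), 2))  # 1.43
print(round(cancorr_information_criterion(0.25, 100, 2, 2, 3), 2))  # -1.55
```

Raising `sigcorr` makes the test stricter: at SIGCORR=3 the same $\rho_{\min} = 0.3$ is no longer judged positive.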
Variables in $x_{t+p|t}$ are not added to the model, even with a positive information criterion, because of the singularity of $V$. You can force the consideration of more candidate state variables by increasing the size of the $V$ matrix by specifying a PASTMIN= option value larger than $p$.
To print the details of the canonical correlation analysis process, specify the CANCORR option in the PROC STATESPACE statement. The CANCORR option prints the candidate state vectors, the canonical correlations, and the information criteria for testing the significance of the smallest canonical correlation.
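Bartlett’s $\chi^2$ and its degrees of freedom are also printed when the CANCORR option is specified. The formula used for Bartlett’s $\chi^2$ is

$$\chi^2 = -\left(n - 0.5\,(r(p+1) - q + 1)\right)\ln(1 - \rho_{\min}^2)$$

with $r(p+1) - q + 1$ degrees of freedom.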
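A matching Python sketch of Bartlett’s statistic (again assuming $n$ is the number of observations; the function name is hypothetical) returns the statistic and its degrees of freedom. A p-value would come from the upper tail of a $\chi^2$ distribution with that many degrees of freedom. For example, with n = 100, r = 2, p = 2, q = 3, and $\rho_{\min} = 0.3$, there are 4 degrees of freedom and $\chi^2 = 98 \cdot (-\ln 0.91) \approx 9.24$.

```python
import math


def bartlett_chi_square(rho_min, n, r, p, q):
    """Bartlett's chi-square for the smallest canonical correlation and its df."""
    df = r * (p + 1) - q + 1
    stat = -(n - 0.5 * df) * math.log(1.0 - rho_min ** 2)
    return stat, df


stat, df = bartlett_chi_square(0.30, 100, 2, 2, 3)
print(df, round(stat, 2))  # 4 9.24
```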
Figure 28.12 shows the output of the CANCORR option for the introductory example in the section Getting Started: STATESPACE Procedure.
```sas
proc statespace data=in out=out lead=10 cancorr;
   var x(1) y(1);
   id t;
run;
```
New variables are added to the state vector if the information criteria are positive. In this example, $y_{t+1|t}$ and $x_{t+2|t}$ are not added to the state vector because the information criteria for these models are negative.
If the information criterion is nearly 0, then you might want to investigate models that arise if the opposite decision is made regarding $\rho_{\min}$. This investigation can be accomplished by using a FORM statement to specify part or all of the state vector.
When a candidate variable $x_{l,t+k|t}$ yields a zero $\rho_{\min}$ and is not added to the state vector, a linear combination of $F_t^j$ is uncorrelated with $P_t$. Because of the method used to construct the $F_t^j$ sequence, the coefficient of $x_{l,t+k|t}$ in this linear combination can be taken as 1. Denote the coefficients of $z_t^j$ in this linear combination as $l$.
This gives the relationship:

$$x_{l,t+k|t} = l' z_t^j$$

The vector $l$ is used as a preliminary estimate of the first $r$ columns of the row of the transition matrix $F$ corresponding to $x_{l,t+k-1|t}$.
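How each rejected candidate supplies one preliminary row of $F$ can be sketched in Python. The sketch assumes the (i, k) encoding of $x_{i,t+k|t}$ and that the state vector lists components in the order they were selected. It also assumes, beyond what this section states, that a component whose next lead is itself in the state vector gets a row that simply selects that lead, which follows from $x_{i,t+k+1|t+1}$ differing from $x_{i,t+k+1|t}$ only by an innovation term. Names and the numeric $l$ vectors are hypothetical.

```python
import numpy as np


def preliminary_transition(state, relations):
    """Assemble preliminary rows of the transition matrix F.

    state: list of (i, k) pairs for x_{i,t+k|t}, in the order they were selected.
    relations: dict mapping variable i to the vector l obtained when its
        candidate was rejected; len(l) is the size of z_t^j at that step.
    """
    dim = len(state)
    position = {comp: j for j, comp in enumerate(state)}
    f = np.zeros((dim, dim))
    for j, (i, k) in enumerate(state):
        next_lead = (i, k + 1)
        if next_lead in position:
            # x_{i,t+k+1|t} is itself a state component: the row selects it.
            f[j, position[next_lead]] = 1.0
        elif i in relations:
            # (i, k+1) was the rejected candidate: its l fills the first columns.
            l = np.asarray(relations[i], dtype=float)
            f[j, : len(l)] = l
        # Otherwise (for example, selection stopped at the lead limit) the row stays 0.
    return f


# Hypothetical l vectors for the state vector (x_{1,t}, x_{2,t}, x_{1,t+1|t}).
state = [(0, 0), (1, 0), (0, 1)]
relations = {1: [0.2, 0.3], 0: [0.1, -0.4, 0.5]}
print(preliminary_transition(state, relations))
```

The row for $x_{2,t}$ takes its two entries from the $l$ found when $x_{2,t+1|t}$ was rejected, and the row for $x_{1,t+1|t}$ takes three entries from the $l$ found when $x_{1,t+2|t}$ was rejected.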