In a presentation by Ansley and de Jong (2012), three monthly time series are jointly modeled to obtain long-term—several decades long—temperature predictions for certain
regions of the northern hemisphere. This example shows how you can specify and fit the final model that this presentation
proposed. The following data set, Temp
, contains four variables: date
dates the monthly observations; UAH
contains monthly satellite global temperature readings, starting in December 1978; CRU
contains monthly temperature data, starting in January 1850 (from a different source); and GISS
contains monthly temperature data, starting in January 1880 (from yet another source). All these temperature data are scaled
suitably so that the numbers represent temperature readings in centigrade.
data Temp; input UAH CRU GISS @@; date = intnx('month', '01jan1850'd, _n_-1); format date date.; datalines; . 8.243 . . 9.733 . ... more lines ...
The following statements produce scatter plots of these three series in a single graph:
proc sgplot data=Temp; title "Scatter Plots of the Temperature Series"; scatter x=date y=cru; scatter x=date y=uah ; scatter x=date y=giss; run;
Output 27.12.1 shows the resulting graph. As already noted, these three series start at different points in the past. However, they all
end at the same time: they all have measurements until January 2012, which is the last month in the data set. The mean levels
of these series are different: the GISS
measurements are generally larger than CRU
and UAH
by about 4 degrees. In addition, the variability in the CRU
values seems to decrease with time (this is more apparent when the series is plotted by itself). The goal of the analysis
is to use these data to make long-term predictions about future temperature levels. The following statements append 1,200
missing measurements to Temp
, so that the model fitted by using the SSM procedure can be extrapolated to obtain temperature forecasts 100 years in the
future:
data append(keep=date cru giss uah); do i=1 to 1200; cru = .; giss=.; uah=.; date = intnx('month','01jan2012'D, i); format date monyy7.; output; end; run;
proc append base=Temp data=Append; run;
Ansley and de Jong propose a parsimonious model that links these three time series. It can be described as
where and are intercepts that are associated with CRU
and UAH
, respectively; is an integrated random walk trend; is a zero-mean, autoregressive noise term (which is scaled by an unknown scaling factor a); and () are independent observation errors with different variances that are also scaled suitably. Note that the trend and the autoregressive noise term are shared by the models of the three series, and, for identification purposes, the intercept for GISS
is taken to be zero. In addition, the model parameters are assumed to be interrelated and are parameterized in a particular
way (which leads to fewer parameters to estimate, and their relative scaling helps in parameter estimation). This special
parameterization can be expressed as a function of seven basic parameters: loga1, logr1, logr3, logsigma, b, c, and rhoParm (this naming convention is different from that used by Ansley and de Jong (2012)).
Let , and let the scaling factors , , and . Then the model parameters can be described as follows:
The parameters that are associated with the autoregression : the damping factor , the variance of the disturbance term = , and the variance of the initial state =
The parameters that are associated with the integrated random walk trend: the variance of the disturbance term in the slope equation =
The variance of the observation errors for GISS
=
The variance of the observation errors for UAH
=
The variance of the observation errors for CRU
is taken to be time-varying with the following form: for
where = 1,945 (the number of observations in the unappended data set). For 1,945, it is fixed at its last value: .
The following DATA step adds an observation index, tindex
, to Temp
, which is used in the SSM procedure to define the time-varying observation error variance for CRU
:
data temp; set temp; tindex = _n_; run;
The following statements fit the preceding model to the Temp
data:
ods output RegressionEstimates=regEst; proc ssm data=Temp; id date interval=month; parms loga1 logr1 logr3 logsigma; parms b=0 c=0; parms rhoParm; rho = (exp(rhoParm)-1)/(exp(rhoParm)+1); sigmaSq = exp(2*logsigma); initSigmaSq = sigmaSq/(1-rho*rho); a1 = exp(loga1); a1Sq = a1*a1; r1sq = exp(2*logr1); r3sq = exp(2*logr3); giss_var = a1Sq*r1sq*sigmaSq; nobs=1945; if tindex <= nobs then cru_var = a1Sq*exp(2*b + 2*c*tindex/nobs)*sigmaSq; else cru_var = a1Sq*exp(2*b + 2*c)*sigmaSq; uah_var = a1Sq*r3sq*sigmaSq; UAH_Intercept=1.0; CRU_Intercept=1.0; trend level(ll) variance=0 slopevar=sigmaSq; state auto(1) T(g)=(rho) cov(g)=(sigmaSq) cov1(g)=(initSigmaSq); comp auto_common = auto*(a1); state wn(3) type=wn cov(d)=(giss_var cru_var uah_var); comp wn_giss = wn[1]; comp wn_cru = wn[2]; comp wn_uah = wn[3]; model GISS = level auto_common wn_giss; model CRU = CRU_Intercept level auto_common wn_cru; model UAH = UAH_Intercept level auto_common wn_uah; comp slope = level_state_*(0 1); output out=For pdv press; run;
Output 27.12.2 shows the estimated intercepts and . As expected, they are quite close (see the scatter plots of CRU
and UAH
in Output 27.12.1).
The estimates of the basic parameters that underlie the model parameters are shown in Output 27.12.3.
The following DATA steps add two variables (CRU_ADJ
= CRU
– and UAH_ADJ
= UAH
– ) to the output data set For
. These adjusted versions of CRU
and UAH
have the same mean level as GISS
—estimated .
data _NULL_; set regEst; if _n_ = 1 then call symput('intercept1',trim(left(estimate))); else call symput('intercept2',trim(left(estimate))); run; data for; set For; cru_adj = cru - &intercept1; uah_adj = uah - &intercept2; run;
The following statements produce a graph that contains four plots: scatter plots of GISS
, CRU_ADJ
, and UAH_ADJ
and a series plot of the estimated .
proc sgplot data=For; where date < '01feb2021'd; title "Fitted Trend for the Temp Series (Up to Year 2020) "; title2 "(CRU and UAH Adjusted by Their Estimated Intercepts) "; scatter x=date y=cru_adj / LEGENDLABEL="Adjusted CRU" MARKERATTRS=GraphData1(symbol=star size=3); scatter x=date y=uah_adj / LEGENDLABEL="Adjusted UAH" MARKERATTRS=GraphData2(symbol=plus size=3); scatter x=date y=giss / MARKERATTRS=GraphData3(symbol=triangle size=3); series x=date y=smoothed_level / MARKERATTRS=GraphData4; run;
Output 27.12.4 shows the resulting graph. It shows that the estimated mean level tracks the observed data quite well.
The following statements produce the time series plot of the estimated along with the 95% confidence band:
proc sgplot data=For; title "Temperature Projections for the Next 100 Years"; band x=date lower=smoothed_lower_level upper=smoothed_upper_level; series x=date y=smoothed_level; refline '01feb2012'd / axis=x lineattrs=(pattern=shortdash) LEGENDLABEL= "Start of Multistep Forecasts" name="Forecast Reference Line"; run;
Output 27.12.5 shows the resulting graph.
The following statements produce a similar graph for the estimated slope of :
proc sgplot data=For; where date <= '01feb2031'd; title "The Monthly Rate of Temperature Change (Up to Year 2030)"; band x=date lower=smoothed_lower_slope upper=smoothed_upper_slope; series x=date y=smoothed_slope; refline '01feb2012'd / axis=x lineattrs=(pattern=shortdash) LEGENDLABEL= "Start of Multistep Forecasts" name="Forecast Reference Line"; refline 0 / axis=y lineattrs=(pattern=dash); run;
Output 27.12.6 shows the resulting plot of the estimated slope of .
Based on the preceding analysis (see the plots of and its slope in Output 27.12.5 and Output 27.12.6), it appears that there has been statistically significant warming over the last 10 years, but the warming does not appear
to be accelerating.