This example presents a module that computes statistics associated with a linear regression. The module is similar to the REGRESS module, which is included in the IMLMLIB library:
```
proc iml;
start regress(x, y, name, tval=, l1=, l2=, l3=);
   n = nrow(x);                        /* number of observations */
   k = ncol(x);                        /* number of variables */
   xpx = x` * x;                       /* cross-products */
   xpy = x` * y;
   xpxi = inv(xpx);                    /* inverse crossproducts */
   b = xpxi * xpy;                     /* parameter estimates */
   yhat = x * b;                       /* predicted values */
   resid = y - yhat;                   /* residuals */
   sse = resid` * resid;               /* sum of squared errors */
   dfe = n - k;                        /* degrees of freedom for error */
   mse = sse / dfe;                    /* mean squared error */
   rmse = sqrt(mse);                   /* root mean squared error */
   covb = xpxi # mse;                  /* covariance of estimates */
   stdb = sqrt(vecdiag(covb));         /* standard errors */
   t = b / stdb;                       /* t test for estimates=0 */
   probt = 1 - cdf("F", t#t, 1, dfe);  /* significance probability */
   paramest = b || stdb || t || probt;
   print paramest[c={"Estimate" "StdErr" "t" "Pr>|t|"} r=name
                  l="Parameter Estimates" f=Best6.];

   s = diag(1/stdb);
   corrb = s * covb * s;               /* correlation of estimates */
   reset fw=6 spaces=3;                /* for proper formatting */
   print covb[r=name c=name l="Covariance of Estimates"],
         corrb[r=name c=name l="Correlation of Estimates"];

   if nrow(tval) = 0 then return;      /* is a t value specified? */
   projx = x * xpxi * x`;              /* hat matrix */
   vresid = (i(n) - projx) * mse;      /* covariance of residuals */
   vpred = projx # mse;                /* covariance of predicted values */
   h = vecdiag(projx);                 /* hat leverage values */
   lowerm = yhat - tval # sqrt(h*mse);      /* lower conf limit for mean */
   upperm = yhat + tval # sqrt(h*mse);      /* upper CL for mean */
   lower = yhat - tval # sqrt(h*mse+mse);   /* lower CL for individual */
   upper = yhat + tval # sqrt(h*mse+mse);   /* upper CL for individual */
   R = y || yhat || resid || h || lowerm || upperm || lower || upper;
   labels = {"y" "yhat" "resid" "h" "lowerm" "upperm" "lower" "upper"};
   reset fw=6 spaces=1;
   print R[c=labels label="Predicted values, Residuals, and Limits"];

   /* test hypothesis that L*b = 0, where L is a linear
      combination of the estimates */
   do i = 1 to 3;
      L = value("L" + strip(char(i)));  /* get L matrix for L1, L2, and L3 */
      if nrow(L) = 0 then return;
      dfn = nrow(L);
      Lb = L * b;
      vLb = L * xpxi * L`;
      q = Lb` * inv(vLb) * Lb / dfn;
      f = q / mse;
      prob = 1 - cdf("F", f, dfn, dfe);
      test = dfn || dfe || f || prob;
      print L, test[c={"dfn" "dfe" "F" "Pr>F"} f=Best6.
                    l="Test Hypothesis that L*b = 0"];
   end;
finish;
```
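As an aside, the module forms the explicit inverse of the crossproducts matrix. A numerically friendlier way to compute the estimation step is to use the SOLVE function, which solves the normal equations without computing the inverse. The following sketch is illustrative and is not part of the IMLMLIB module; the toy `x` and `y` here are placeholders, not the population data used later in this example:

```
proc iml;
/* sketch: solve the normal equations directly instead of
   forming inv(x`*x); the data here are illustrative only */
x = {1 1, 1 2, 1 3, 1 4};        /* toy design matrix */
y = {1.1, 1.9, 3.2, 3.9};        /* toy response */
b = solve(x`*x, x`*y);           /* same result as inv(x`*x)*(x`*y) */
print b;
```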
The module accepts up to three matrices for testing hypotheses of the form L*b = 0, where each row of L specifies a linear combination of the parameters. For more information about the computation, see the documentation for the TEST statement in the REG procedure in SAS/STAT User's Guide.
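In the notation of the module, each test in the DO loop is the usual general linear hypothesis F test. With dfn = nrow(L) and dfe = n - k, the statistic is

$$
F = \frac{(L\hat{b})^{\prime}\,\bigl[\,L(X^{\prime}X)^{-1}L^{\prime}\,\bigr]^{-1}\,(L\hat{b})}{\mathit{dfn}\cdot\mathit{MSE}}
$$

which is compared to an F distribution with dfn and dfe degrees of freedom.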
The following statements call the REGRESS module on data that describe the size of the US population (in millions) for the eight decades 1790–1860. The statements fit a quadratic regression model to the data and test three hypotheses about the parameters in the model.
```
/* Quadratic regression on US population for decades beginning 1790 */
decade = T(1:8);
name = {"Intercept", "Decade", "Decade**2"};
x = decade##0 || decade || decade##2;
y = {3.929, 5.308, 7.239, 9.638, 12.866, 17.069, 23.191, 31.443};

/* n-p=5 dof at 0.025 level to get 95% confidence interval */
tval = quantile("T", 1-0.025, nrow(x)-ncol(x));

L1 = {0 1 0};      /* test hypothesis Lb=0 for linear coef */
L2 = {0 1 0,       /* test hypothesis Lb=0 for linear, quad */
      0 0 1};
L3 = {0 1 1};      /* test hypothesis Lb=0 for linear+quad */

option linesize=100;
run regress(x, y, name, tval, L1, L2, L3);
```
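Because tval and the L matrices are declared as optional arguments (note the trailing `=` signs in the START statement), you can also call the module with only the required arguments. For example, the following call prints only the parameter estimates and the covariance and correlation tables, and then returns:

```
run regress(x, y, name);   /* no t value: skip the limits and tests */
```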
The parameter estimates are shown in the first table in Output 9.3.1. The next two tables show the covariance and correlation of the estimates, respectively.
The predicted values, residuals, leverage, and confidence limits for the mean and individual predictions are shown in Output 9.3.2.
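In terms of the leverage values $h_i$ (the diagonal of the hat matrix) computed by the module, the limits in the table are

$$
\hat{y}_i \pm t^{*}\sqrt{h_i\,\mathit{MSE}} \quad\text{(mean)}
\qquad
\hat{y}_i \pm t^{*}\sqrt{(1+h_i)\,\mathit{MSE}} \quad\text{(individual)}
$$

where $t^{*}$ is the 0.975 quantile of the t distribution with $n-k=5$ degrees of freedom (the tval argument).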
Output 9.3.2: Regression Results: Predicted Values and Residuals
Predicted values, Residuals, and Limits

| y | yhat | resid | h | lowerm | upperm | lower | upper |
|--------|--------|--------|--------|--------|--------|--------|--------|
| 3.929 | 4.499 | -0.57 | 0.7083 | 3.0017 | 5.9964 | 2.1737 | 6.8244 |
| 5.308 | 5.008 | 0.3 | 0.2798 | 4.067 | 5.949 | 2.9954 | 7.0207 |
| 7.239 | 6.5963 | 0.6427 | 0.2321 | 5.7391 | 7.4535 | 4.6214 | 8.5711 |
| 9.638 | 9.2638 | 0.3742 | 0.2798 | 8.3228 | 10.205 | 7.2511 | 11.276 |
| 12.866 | 13.011 | -0.145 | 0.2798 | 12.07 | 13.952 | 10.998 | 15.023 |
| 17.069 | 17.837 | -0.768 | 0.2321 | 16.979 | 18.694 | 15.862 | 19.812 |
| 23.191 | 23.742 | -0.551 | 0.2798 | 22.801 | 24.683 | 21.729 | 25.755 |
| 31.443 | 30.727 | 0.7164 | 0.7083 | 29.229 | 32.224 | 28.401 | 33.052 |
The results of the hypothesis tests are shown in Output 9.3.3. The first hypothesis is that the coefficient of the linear term is 0. This hypothesis is not rejected at the 0.05 significance level. The second hypothesis is that the coefficients of the linear and quadratic terms are simultaneously 0. This hypothesis is soundly rejected. The third hypothesis is that the linear coefficient is equal to the negative of the quadratic coefficient. Given the data, this hypothesis is not rejected.
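You can cross-check these tests with the TEST statement in PROC REG, on which the computation is based. The following sketch is illustrative; the data set name USPop and the variable names Pop, Decade, and Decade2 are assumptions, not part of the example above:

```
/* Sketch: the same fit and tests in PROC REG. The data set
   and variable names are illustrative placeholders. */
data USPop;
   do Decade = 1 to 8;
      input Pop @@;
      Decade2 = Decade**2;
      output;
   end;
   datalines;
3.929 5.308 7.239 9.638 12.866 17.069 23.191 31.443
;

proc reg data=USPop;
   model Pop = Decade Decade2;
   Linear: test Decade = 0;               /* corresponds to L1 */
   Both:   test Decade = 0, Decade2 = 0;  /* corresponds to L2 */
   Sum:    test Decade + Decade2 = 0;     /* corresponds to L3 */
quit;
```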