The single equation example shown in the preceding section was written in normalized form and specified as an assignment of the regression function to the dependent variable LHUR. However, sometimes it is impossible or inconvenient to write a nonlinear model in normalized form.
To write a general form equation, give the equation a name with the prefix "EQ.". This EQ.-prefixed variable represents the equation error. Write the equation as an assignment to this variable.
For example, suppose you have the following nonlinear model that relates the variables x and y:

$$\epsilon = a + b \ln(c y + d x)$$
Naming this equation ‘one’, you can fit this model with the following statements:
```sas
proc model data=xydata;
   eq.one = a + b * log( c * y + d * x );
   fit one;
run;
```
The use of the EQ. prefix tells PROC MODEL that the variable is an error term and that it should not expect actual values for the variable ONE in the input data set.
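Conceptually, an EQ. variable is a residual function of the parameters and the data. The Python sketch below illustrates the idea outside SAS; it is not PROC MODEL code, and the helpers `eq_one`, `to_general_form`, and `supply_prediction` are hypothetical names. It evaluates the residual of the model above from x and y alone, and shows that a normalized equation such as `eec = a1 + a2 * eegp` becomes a general-form equation by subtracting the prediction from the actual value.

```python
import math


def eq_one(a, b, c, d, x, y):
    """General-form residual for eq.one = a + b*log(c*y + d*x).

    Only the parameters and the data values x and y are needed;
    no 'actual' value of ONE is read from the data.
    """
    return a + b * math.log(c * y + d * x)


def to_general_form(predict):
    """Turn a normalized prediction f(params, data) into a residual actual - f(params, data)."""
    def residual(actual, *args):
        return actual - predict(*args)
    return residual


def supply_prediction(a1, a2, eegp):
    """Normalized-form supply equation: eec = a1 + a2 * eegp (hypothetical helper)."""
    return a1 + a2 * eegp


# Same residual as eq.supply = eec - (a1 + a2 * eegp)
supply_residual = to_general_form(supply_prediction)

print(eq_one(1.0, 2.0, 1.0, 1.0, 0.5, 0.5))       # log(1) = 0, so the residual equals a: 1.0
print(supply_residual(10.0, 7.0, -0.5, 2.0))      # 10 - (7 - 1) = 4.0
```

The second function mirrors how the supply and demand equations later in this section are written: each EQ. assignment is the actual quantity minus its prediction.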
General form specifications are often useful when you have several equations for the same dependent variable. This is common in supply and demand models, where both the supply equation and the demand equation are written as predictions for quantity as functions of price.
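For example, consider the following supply and demand system:

$$\text{supply:}\quad \text{quantity} = a_1 + a_2\,\text{price} + \epsilon_1$$

$$\text{demand:}\quad \text{quantity} = b_1 + b_2\,\text{price} + b_3\,\text{income} + \epsilon_2$$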
Assume the quantity of interest is the amount of energy consumed in the U.S., the price is the price of gasoline, and the income variable is consumer debt. When the market is at equilibrium, these equations determine the market price and the equilibrium quantity. These equations are written in general form as

$$\epsilon_1 = \text{quantity} - (a_1 + a_2\,\text{price})$$

$$\epsilon_2 = \text{quantity} - (b_1 + b_2\,\text{price} + b_3\,\text{income})$$
Note that the endogenous variables quantity and price depend on both error terms, so OLS estimates of these equations would be biased and inconsistent; OLS should not be used. The following example uses three-stage least squares (3SLS) estimation.
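To see why OLS fails, suppose supply is $\text{quantity} = a_1 + a_2\,\text{price} + \epsilon_1$ and demand is $\text{quantity} = b_1 + b_2\,\text{price} + b_3\,\text{income} + \epsilon_2$. Equating the two and solving gives $\text{price} = (b_1 - a_1 + b_3\,\text{income} + \epsilon_2 - \epsilon_1)/(a_2 - b_2)$, so price moves with the supply error $\epsilon_1$. The Python sketch below is a toy simulation with made-up parameter values, not the CITIMON data. It compares the OLS slope of the supply equation with a single-instrument IV slope that uses income (which shifts demand only) as the instrument. That IV estimator is a simpler relative of the 3SLS method used below, not what PROC MODEL computes.

```python
import random


def cov(u, v):
    """Sample covariance (divisor n) of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / n


def simulate_market(n, seed=0):
    """Draw n equilibrium (income, price, quantity) observations from a toy linear system.

    Hypothetical true values: a1=2, a2=1 (supply); b1=5, b2=-1, b3=1 (demand).
    """
    a1, a2, b1, b2, b3 = 2.0, 1.0, 5.0, -1.0, 1.0
    rng = random.Random(seed)
    income, price, quantity = [], [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)   # income (exogenous)
        e1 = rng.gauss(0.0, 1.0)  # supply error
        e2 = rng.gauss(0.0, 1.0)  # demand error
        # Equilibrium price: both errors enter, so price is correlated with e1.
        p = (b1 - a1 + b3 * z + e2 - e1) / (a2 - b2)
        q = a1 + a2 * p + e1      # quantity read off the supply equation
        income.append(z)
        price.append(p)
        quantity.append(q)
    return income, price, quantity


income, price, quantity = simulate_market(20000)

# OLS slope of the supply equation: regress quantity on price.
a2_ols = cov(quantity, price) / cov(price, price)
# IV slope: income is uncorrelated with e1, so it is a valid instrument for supply.
a2_iv = cov(income, quantity) / cov(income, price)

print(f"true a2 = 1.0, OLS = {a2_ols:.3f}, IV = {a2_iv:.3f}")
```

With these hypothetical values the OLS slope settles near 1/3 in large samples rather than the true value 1, while the IV slope stays close to 1.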
Data for this model is obtained from the SASHELP.CITIMON data set.
```sas
title1 'Supply-Demand Model using General-form Equations';

proc model data=sashelp.citimon;
   endogenous eegp eec;
   exogenous exvus cciutc;
   parameters a1 a2 b1 b2 b3;
   label eegp   = 'Gasoline Retail Price'
         eec    = 'Energy Consumption'
         cciutc = 'Consumer Debt';

   /* -------- Supply equation ------------- */
   eq.supply = eec - (a1 + a2 * eegp);

   /* -------- Demand equation ------------- */
   eq.demand = eec - (b1 + b2 * eegp + b3 * cciutc);

   /* -------- Instrumental variables -------*/
   lageegp  = lag(eegp);
   lag2eegp = lag2(eegp);

   /* -------- Estimate parameters --------- */
   fit supply demand / n3sls fsrsq;
   instruments _EXOG_ lageegp lag2eegp;
run;
```
The FIT statement specifies the two equations to estimate and the estimation method, N3SLS. Note that 3SLS is an alias for N3SLS. The FSRSQ option requests a report of the first-stage R² statistics, which help you judge whether the selected instruments are acceptable.
Since three-stage least squares is an instrumental variables method, instruments are specified with the INSTRUMENTS statement. The instruments selected are all the exogenous variables, selected with the _EXOG_ option, and two lags of the variable EEGP: LAGEEGP and LAG2EEGP.
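As a rough illustration of what a first-stage R² measures, the Python sketch below regresses a variable on an intercept plus a set of instruments and reports the R². It assumes, as a simplification, that for linear equations like these the first-stage R² for a parameter is essentially the R² from regressing the variable that multiplies that parameter on the instruments. The helpers `solve` and `first_stage_r2` and all data are hypothetical. Under that reading, a regressor that is itself an instrument (as CCIUTC is, through _EXOG_) is predicted perfectly, which would be consistent with the 1.0000 reported for b3 in Figure 19.8.

```python
import random


def solve(a, b):
    """Solve the square system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def first_stage_r2(target, instruments):
    """R-square from an OLS regression of target on an intercept plus the instrument columns."""
    rows = [[1.0] + [inst[t] for inst in instruments] for t in range(len(target))]
    k = len(rows[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bi * v for bi, v in zip(beta, r)) for r in rows]
    mean = sum(target) / len(target)
    sse = sum((y - f) ** 2 for y, f in zip(target, fitted))
    sst = sum((y - mean) ** 2 for y in target)
    return 1.0 - sse / sst


# Hypothetical data: "price" is driven mostly by "lagged_price"; "debt" is an exogenous regressor.
rng = random.Random(1)
n = 200
debt = [rng.gauss(0.0, 1.0) for _ in range(n)]
lagged_price = [rng.gauss(0.0, 1.0) for _ in range(n)]
price = [0.9 * lp + 0.1 * rng.gauss(0.0, 1.0) for lp in lagged_price]
unrelated = [rng.gauss(0.0, 1.0) for _ in range(n)]

r2_debt = first_stage_r2(debt, [debt, lagged_price])           # regressor is itself an instrument
r2_price_strong = first_stage_r2(price, [debt, lagged_price])  # informative instrument set
r2_price_weak = first_stage_r2(price, [debt, unrelated])       # instruments unrelated to price
print(f"{r2_debt:.4f} {r2_price_strong:.4f} {r2_price_weak:.4f}")
```

A first-stage R² near zero, as in the last case, signals weak instruments; values near 1 indicate that the instruments explain the endogenous regressor well.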
The data set CITIMON has four observations that generate missing values because values for EEGP, EEC, or CCIUTC are missing. This is revealed in the "Observations Processed" output shown in Figure 19.7. Missing values are also generated when the equations cannot be computed for a given observation. Missing observations are not used in the estimation.
The lags used to create the instruments also reduce the number of observations used. In this case, the first two observations were used to fill the lags of EEGP.
The data set has a total of 145 observations, of which four generated missing values and two were used to fill lags, which left 139 observations for the estimation. In the estimation summary, in Figure 19.8, the total degrees of freedom for the model and error is 139.
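The bookkeeping can be sketched in Python under a simplified rule, which is an assumption and not PROC MODEL's actual code: a row is dropped if it falls in the first rows needed to fill the lags, if any current value is missing, or if a lagged variable is missing in one of the preceding rows. The function `usable_observations` and the series below are hypothetical; in particular, the placement of the four missing values at the end of EEC is made up for illustration.

```python
def usable_observations(columns, lagged, max_lag):
    """Count rows left for estimation under a simplified rule (assumption).

    A row is dropped if it is one of the first max_lag rows (lags cannot be filled),
    if any current value is missing (None), or if any lagged variable is missing
    at one of the max_lag preceding rows.
    """
    n = len(next(iter(columns.values())))
    usable = 0
    for t in range(max_lag, n):
        if any(col[t] is None for col in columns.values()):
            continue
        if any(columns[name][t - k] is None
               for name in lagged for k in range(1, max_lag + 1)):
            continue
        usable += 1
    return usable


# Hypothetical series of 145 rows; assume the last four EEC values are missing.
n = 145
data = {
    "eegp": [100.0 + t for t in range(n)],
    "eec": [7.0] * (n - 4) + [None] * 4,
    "cciutc": [1.0e6] * n,
}
print(usable_observations(data, lagged=["eegp"], max_lag=2))  # 145 - 2 - 4 = 139
```

Under the same rule, a single missing EEGP value in the middle of the series would cost three rows rather than one: the row itself and the two following rows whose lagged instruments would refer to it.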
Figure 19.8: Supply-Demand Parameter Estimates
Nonlinear 3SLS Parameter Estimates

Parameter | Estimate | Approx Std Err | t Value | Approx Pr > \|t\| | 1st Stage R-Square
---|---|---|---|---|---
a1 | 7.30952 | 0.3799 | 19.24 | <.0001 | 1.0000
a2 | -0.00853 | 0.00328 | -2.60 | 0.0103 | 0.9617
b1 | 6.82196 | 0.3788 | 18.01 | <.0001 | 1.0000
b2 | -0.00614 | 0.00303 | -2.02 | 0.0450 | 0.9617
b3 | 9E-7 | 3.165E-7 | 2.84 | 0.0051 | 1.0000
One disadvantage of specifying equations in general form is that there are no actual values associated with the equation, so the R² statistic cannot be computed.
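For reference, the usual definition of R² for an equation with dependent variable $y_t$ and residuals $\hat{\epsilon}_t$ is

$$R^2 = 1 - \frac{\sum_t \hat{\epsilon}_t^2}{\sum_t (y_t - \bar{y})^2}$$

The denominator measures the total variation of the actual values $y_t$ around their mean $\bar{y}$. An EQ. equation supplies only the residual $\hat{\epsilon}_t$, so there is no $y_t$ against which to measure that variation.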