This example demonstrates how you can use PROC ADAPTIVEREG to fit a nonparametric Poisson regression model.
The example concerns a study of mackerel egg density. The data are a subset of the 1992 mackerel egg survey conducted over the Porcupine Bank west of Ireland. The survey took place in the peak spawning area. Scientists took samples by hauling a net up from deep sea to the sea surface. Then they counted the number of spawned mackerel eggs and used other geographic information to estimate the sizes and distributions of spawning stocks. The data set is used as an example in Bowman and Azzalini (1997).
The following SAS DATA step creates the data set Mackerel. This data set contains 634 observations and six variables. The response variable Egg_Count is the number of mackerel eggs collected from each sampling net. Longitude and Latitude are the location values in degrees east and north, respectively, of each sample station. Net_Area is the area of the sampling net in square meters. Depth records the sea bed depth in meters at the sampling location. Distance is the distance in geographic degrees from the sample location to the continental shelf edge.
title 'Mackerel Egg Density Study';

data Mackerel;
   input Egg_Count Longitude Latitude Net_Area Depth Distance;
   datalines;
0 -4.65 44.57 0.242 4342 0.8395141177
0 -4.48 44.57 0.242 4334 0.8591926336
0 -4.3 44.57 0.242 4286 0.8930152895
1 -2.87 44.02 0.242 1438 0.3956408691
4 -2.07 44.02 0.242 166 0.0400088237
3 -2.13 44.02 0.242 460 0.0974234463
0 -2.27 44.02 0.242 810 0.2362566569

   ... more lines ...

22 -4.22 46.25 0.19 205 0.1181120828
21 -4.28 46.25 0.19 237 0.129990854
0 -4.73 46.25 0.19 2500 0.3346500536
5 -4.25 47.23 0.19 114 0.718192582
3 -3.72 47.25 0.19 100 0.9944669778
0 -3.25 47.25 0.19 64 1.2639918431
;
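Before modeling, it can be worth confirming that the data set was read as described (number of observations, variable list, value ranges). The following is a minimal sketch that uses standard Base SAS procedures; it is not part of the original example.

```sas
/* Quick sanity check of the Mackerel data set created above. */
proc contents data=Mackerel;   /* lists the variables and the observation count */
run;

/* Summary statistics for every variable read by the INPUT statement. */
proc means data=Mackerel n nmiss min mean max;
   var Egg_Count Longitude Latitude Net_Area Depth Distance;
run;
```

The NMISS column is a convenient way to spot values that were not read correctly from the data lines.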
The response values are counts, so the Poisson distribution might be a reasonable model. The quantity of interest is the mackerel egg density, which can be expressed as

$$\text{Density} = \frac{E(\text{Egg\_Count})}{\text{Net\_Area}}$$

This is equivalent to a Poisson regression with the response variable Egg_Count, the offset variable log(Net_Area), and other covariates.
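To see why an offset turns a model for counts into a model for density, a short derivation may help. It is a sketch that assumes the log link customary for Poisson regression; the symbols $\mu$ (the expected egg count) and $f(\mathbf{x})$ (the fitted function of the covariates) are introduced here for illustration and do not appear in the original text.

```latex
\begin{aligned}
\log \mu &= \log(\text{Net\_Area}) + f(\mathbf{x}) \\
\log \frac{\mu}{\text{Net\_Area}} &= f(\mathbf{x}) \\
\text{Density} = \frac{\mu}{\text{Net\_Area}} &= \exp\{ f(\mathbf{x}) \}
\end{aligned}
```

The offset enters the linear predictor with its coefficient fixed at 1, so the covariates describe the egg count per unit of net area rather than the raw count.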
The following statements produce the plot of the mackerel egg density with respect to the sampling station location:
data temp;
   set mackerel;
   density = egg_count/net_area;
run;

proc template;
   define statgraph surface;
      dynamic _title _z;
      begingraph / designwidth=defaultDesignHeight;
         entrytitle _title;
         layout overlay / xaxisopts=(offsetmin=0 offsetmax=0
                                     linearopts=(thresholdmin=0 thresholdmax=0))
                          yaxisopts=(offsetmin=0 offsetmax=0
                                     linearopts=(thresholdmin=0 thresholdmax=0));
            contourplotparm z=_z y=latitude x=longitude / gridded=FALSE;
         endlayout;
      endgraph;
   end;
run;

ods graphics on;

proc sgrender data=temp template=surface;
   dynamic _title='Mackerel Egg Density' _z='density';
run;
Output 24.4.1 displays the mackerel egg density in the sampling area. The black hole in the upper right corner is due to missing values in that area.
Output 24.4.1: Mackerel Egg Density
In this example, the dependent variable is the mackerel egg count, the independent variables are the geographic measurements at each sampling station, and the logarithm of the sampling net area is the offset variable. The following statements fit the nonparametric Poisson regression model:
data mackerel;
   set mackerel;
   log_net_area = log(net_area);
run;

proc adaptivereg data=mackerel;
   model egg_count = longitude latitude depth distance /
         offset=log_net_area dist=poisson;
   output out=mackerelout p(ilink);
run;
Output 24.4.2 lists basic model information such as the offset variable, distribution, and link function.
Output 24.4.2: Model Information
Mackerel Egg Density Study

Model Information | |
---|---|
Data Set | WORK.MACKEREL |
Response Variable | Egg_Count |
Offset Variable | log_net_area |
Distribution | Poisson |
Link Function | Log |
Output 24.4.3 lists fit statistics for the final model.
Output 24.4.3: Fit Statistics
Fit Statistics | |
---|---|
GCV | 6.94340 |
GCV R-Square | 0.79204 |
Effective Degrees of Freedom | 29 |
Log Likelihood | -2777.21279 |
Deviance | 4008.60601 |
The final model consists of basis functions and interactions between basis functions of the four geographic variables. Output 24.4.4 lists the seven functional components of the final model: three one-way spline transformations and four two-way spline interactions.
Output 24.4.4: ANOVA Decomposition
Functional Component | Number of Bases | DF | Change in Lack of Fit If Omitted | Change in GCV If Omitted |
---|---|---|---|---|
Longitude | 3 | 6 | 2035.77 | 3.3216 |
Depth | 1 | 2 | 420.59 | 0.6780 |
Latitude | 1 | 2 | 265.05 | 0.4104 |
Longitude Latitude | 2 | 4 | 199.17 | 0.2496 |
Depth Distance | 3 | 6 | 552.75 | 0.8030 |
Depth Latitude | 2 | 4 | 680.45 | 1.0723 |
Depth Longitude | 2 | 4 | 415.77 | 0.6198 |
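As a consistency check, the DF column can be tied back to the fit statistics. The arithmetic below assumes that each basis function is charged 2 degrees of freedom (which matches the ratio of DF to Number of Bases in every row of the table) and that the intercept contributes 1; neither assumption is stated in the text above.

```latex
\underbrace{(3 + 1 + 1 + 2 + 3 + 2 + 2)}_{14\ \text{bases}} \times 2 + 1 = 28 + 1 = 29
```

Under these assumptions the total agrees with the Effective Degrees of Freedom value of 29 reported in Output 24.4.3.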
The “Variable Importance” table in Output 24.4.5 displays the relative importance of the four variables. Longitude is the most important, followed by Depth, Latitude, and Distance.
Output 24.4.5: Variable Importance
Variable | Number of Bases | Importance |
---|---|---|
Longitude | 7 | 100.00 |
Depth | 8 | 30.26 |
Latitude | 5 | 18.93 |
Distance | 3 | 8.56 |
Output 24.4.6 displays the predicted mackerel egg density over the spawning area.
Output 24.4.6: Predicted Mackerel Egg Density
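A plot like this one could be produced by reusing the surface template defined earlier. The following is only a sketch under several assumptions that the text does not confirm: that the OUTPUT statement stores the inverse-link prediction in a variable named Pred, that this prediction is an expected count that includes the offset, and that the input variables are carried into mackerelout. The data set name mackerelpred and the variable pred_density are hypothetical names chosen for illustration.

```sas
/* Sketch only: check the contents of mackerelout before relying on this.
   Assumes Pred holds the predicted expected count (offset included) and
   that Net_Area, Longitude, and Latitude are present in mackerelout. */
data mackerelpred;
   set mackerelout;
   pred_density = pred / net_area;   /* expected eggs per square meter */
run;

/* Reuse the statgraph template named surface from the earlier step. */
proc sgrender data=mackerelpred template=surface;
   dynamic _title='Predicted Mackerel Egg Density' _z='pred_density';
run;
```

Dividing by Net_Area puts the prediction on the same scale as the observed density plotted in Output 24.4.1, which makes the two plots directly comparable.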