Suppose that, in a junior high school, there are a total of 4,000 students in grades 7, 8, and 9. You want to know how household income and the number of children in a household affect students’ average weekly spending for ice cream.
In order to answer this question, you draw a sample by using simple random sampling from the student population in the junior
high school. You randomly select 40 students and ask them their average weekly expenditure for ice cream, their household
income, and the number of children in their household. The answers from the 40 students are saved as the following SAS data
set IceCream:
data IceCream;
   input Grade Spending Income Kids @@;
   datalines;
7   7  39  2   7   7  38  1   8  12  47  1   9  10  47  4
7   1  34  4   7  10  43  2   7   3  44  4   8  20  60  3
8  19  57  4   7   2  35  2   7   2  36  1   9  15  51  1
8  16  53  1   7   6  37  4   7   6  41  2   7   6  39  2
9  15  50  4   8  17  57  3   8  14  46  2   9   8  41  2
9   8  41  1   9   7  47  3   7   3  39  3   7  12  50  2
7   4  43  4   9  14  46  3   8  18  58  4   9   9  44  3
7   2  37  1   7   1  37  2   7   4  44  2   7  11  42  2
9   8  41  2   8  10  42  2   8  13  46  1   7   2  40  3
9   6  45  1   9  11  45  4   7   2  36  1   7   9  46  1
;
In the data set IceCream, the variable Grade indicates a student’s grade. The variable Spending contains the dollar amount of each student’s average weekly spending for ice cream. The variable Income specifies the household income, in thousands of dollars. The variable Kids indicates how many children are in a student’s family.
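Before fitting the model, it can be worth confirming that the DATA step read all 40 observations as intended. The following is a minimal sketch of such a check with PROC MEANS; it is not part of the original example. If the data were read correctly, the output should report 40 observations for Spending with a mean of 8.75 and a sum of 350, which are the values shown in the “Data Summary” table in Figure 94.1.

```sas
/* Sanity check (not part of the original example):           */
/* count, mean, and sum of Spending in the IceCream data set. */
proc means data=IceCream n mean sum;
   var Spending;
run;
```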
The following PROC SURVEYREG statements request a regression analysis:
title1 'Ice Cream Spending Analysis';
title2 'Simple Random Sample Design';
proc surveyreg data=IceCream total=4000;
   class Kids;
   model Spending = Income Kids / solution;
run;
The PROC SURVEYREG statement invokes the procedure. The TOTAL=4000 option specifies the population total, that is, the 4,000 students from which the sample is drawn. The CLASS statement requests that the procedure use the variable Kids as a classification variable in the analysis. The MODEL statement describes the linear model that you want to fit, with Spending as the dependent variable and Income and Kids as the independent variables. The SOLUTION option in the MODEL statement requests that the procedure output the regression coefficient estimates.
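The model that these statements describe can be written out as a sketch. The symbols below (β₀, β₁, γₖ, ε, f) are notation assumed here for illustration and do not appear in the original example. Because Kids is a classification variable with four levels, it contributes one indicator term per level. The second line shows the sampling fraction implied by TOTAL=4000 and the sample of 40 students; when a population total is specified, the procedure takes the corresponding finite population correction into account in its variance estimates.

```latex
\[
\text{Spending}_i = \beta_0 + \beta_1\,\text{Income}_i
  + \sum_{k=1}^{4} \gamma_k\,\mathbf{1}[\text{Kids}_i = k] + \varepsilon_i
\]
\[
f = \frac{n}{N} = \frac{40}{4000} = 0.01, \qquad 1 - f = 0.99
\]
```

Note that the four indicator columns for Kids sum to the intercept column, so not all of the γₖ can be estimated separately.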
Figure 94.1 displays the summary of the data, the summary of the fit, and the levels of the classification variable Kids. The “Fit Statistics” table displays the denominator degrees of freedom, which are used in F tests and t tests in the regression analysis.
Figure 94.1: Summary of Data
Ice Cream Spending Analysis
Simple Random Sample Design

Data Summary | |
---|---|
Number of Observations | 40 |
Mean of Spending | 8.75000 |
Sum of Spending | 350.00000 |

Fit Statistics | |
---|---|
R-square | 0.8132 |
Root MSE | 2.4506 |
Denominator DF | 39 |

Class Level Information

Class Variable | Levels | Values |
---|---|---|
Kids | 4 | 1 2 3 4 |
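Two of the values in Figure 94.1 can be checked by hand. The sketch below assumes the usual rule for a design with no strata and no clusters, in which the denominator degrees of freedom equal the number of observations minus one; the symbols n and ȳ are notation introduced here for illustration.

```latex
\[
\bar{y} = \frac{\text{Sum of Spending}}{n} = \frac{350}{40} = 8.75,
\qquad
\text{Denominator DF} = n - 1 = 40 - 1 = 39
\]
```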
Figure 94.2 displays the tests for model effects. The effect Income is significant in the linear regression model (p < .0001), while the effect Kids is not significant at the 5% level (p = 0.4385).
Figure 94.2: Testing Effects in the Regression
Tests of Model Effects

Effect | Num DF | F Value | Pr > F |
---|---|---|---|
Model | 4 | 119.15 | <.0001 |
Intercept | 1 | 153.32 | <.0001 |
Income | 1 | 324.45 | <.0001 |
Kids | 3 | 0.92 | 0.4385 |

Note: The denominator degrees of freedom for the F tests is 39.
The regression coefficient estimates and their standard errors and associated t tests are displayed in Figure 94.3.
Figure 94.3: Regression Coefficients
Estimated Regression Coefficients

Parameter | Estimate | Standard Error | t Value | Pr > \|t\| |
---|---|---|---|---|
Intercept | -26.084677 | 2.46720403 | -10.57 | <.0001 |
Income | 0.775330 | 0.04304415 | 18.01 | <.0001 |
Kids 1 | 0.897655 | 1.12352876 | 0.80 | 0.4292 |
Kids 2 | 1.494032 | 1.24705263 | 1.20 | 0.2381 |
Kids 3 | -0.513181 | 1.33454891 | -0.38 | 0.7027 |
Kids 4 | 0.000000 | 0.00000000 | . | . |

Note: The denominator degrees of freedom for the t tests is 39.
Note: Matrix X'X is singular and a generalized inverse was used to solve the normal equations. Estimates are not unique.
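The singularity note reflects the classification variable: with an intercept in the model, only three of the four Kids levels can be estimated separately, so the estimate for the last level (Kids 4) is set to zero and the other Kids estimates are read as differences from that level. As a worked illustration, the following sketch computes the fitted weekly spending for a hypothetical student whose household income is $40,000 (Income = 40) and whose household has two children. The input values are chosen here for illustration only and do not come from the original example.

```latex
\[
\widehat{\text{Spending}}
  = -26.084677 + 0.775330 \times 40 + 1.494032
  = -26.084677 + 31.013200 + 1.494032
  \approx 6.42
\]
```

For the same income with four children, the Kids term is zero and the fitted value is about 4.93 dollars per week. Because the Kids effect is not significant at the 5% level, a difference of this size should be interpreted with caution.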