Recall the example in the section Getting Started: SURVEYREG Procedure, which analyzed a stratified simple random sample of students from a junior high school to examine how household income and the number of children in a household affect students’ average weekly spending for ice cream. You can use the same sample to study how average weekly spending relates to household income separately for male and female students. Gender was not used in the sample design (the sample is stratified by grade only), so the number of sampled students of each gender is random rather than fixed by the design. Estimation for such subpopulations is called domain analysis (subgroup analysis).
This example shows how you can use PROC SURVEYREG to perform domain analysis. The data set follows:
data IceCreamDataDomain;
   input Grade Spending Income Gender$ @@;
   datalines;
7 7 39 M   7 7 38 F   8 12 47 F  9 10 47 M  7 1 34 M
7 10 43 M  7 3 44 M   8 20 60 F  8 19 57 M  7 2 35 M
7 2 36 F   9 15 51 F  8 16 53 F  7 6 37 F   7 6 41 M
7 6 39 M   9 15 50 M  8 17 57 F  8 14 46 M  9 8 41 M
9 8 41 F   9 7 47 F   7 3 39 F   7 12 50 M  7 4 43 M
9 14 46 F  8 18 58 M  9 9 44 F   7 2 37 F   7 1 37 M
7 4 44 M   7 11 42 M  9 8 41 M   8 10 42 M  8 13 46 F
7 2 40 F   9 6 45 F   9 11 45 M  7 2 36 F   7 9 46 F
;

data IceCreamDataDomain;
   set IceCreamDataDomain;
   if Grade=7 then Prob=20/1824;
   if Grade=8 then Prob=9/1025;
   if Grade=9 then Prob=11/1151;
   Weight=1/Prob;
run;
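The second DATA step sets each student's weight to the reciprocal of the selection probability in that student's grade, which is N_h/n_h for grade h. The following Python sketch is only an illustration of that arithmetic, not part of the SAS example. It uses the sample sizes and stratum totals from the Prob assignments and checks that the weights add up to the 4,000 students in the population. The names POP_TOTALS, SAMPLE_SIZES, and sampling_weight are hypothetical.

```python
# Stratum population sizes N_h and sample sizes n_h, taken from the
# Prob assignments (Prob = n_h / N_h) in the DATA step.
POP_TOTALS = {7: 1824, 8: 1025, 9: 1151}
SAMPLE_SIZES = {7: 20, 8: 9, 9: 11}


def selection_prob(grade: int) -> float:
    """Probability of selection under simple random sampling within a grade."""
    return SAMPLE_SIZES[grade] / POP_TOTALS[grade]


def sampling_weight(grade: int) -> float:
    """Sampling weight: reciprocal of the selection probability (N_h / n_h)."""
    return 1.0 / selection_prob(grade)


# Each of the n_h sampled students in grade h carries weight N_h / n_h,
# so the weights in a grade sum to N_h and all weights sum to the population size.
total_weight = sum(SAMPLE_SIZES[g] * sampling_weight(g) for g in POP_TOTALS)

for g in sorted(POP_TOTALS):
    print(f"Grade {g}: weight = {sampling_weight(g):.4f}")
print(f"Sum of weights: {total_weight:.1f}")
```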
In the data set IceCreamDataDomain, the variable Grade indicates a student’s grade, which is the stratification variable. The variable Spending contains the dollar amount of each student’s average weekly spending for ice cream. The variable Income specifies the household income, in thousands of dollars. The variable Gender indicates a student’s gender. The variable Prob contains each student’s probability of selection, which is the number of students sampled in that grade divided by the number of students in the grade. The sampling weight Weight is the reciprocal of Prob.

The following DATA step creates the data set StudentTotals, which contains the stratum totals:
data StudentTotals;
   input Grade _TOTAL_;
   datalines;
7 1824
8 1025
9 1151
;
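In the data set StudentTotals, the variable Grade is the stratification variable, and the variable _TOTAL_ contains the total number of students in each stratum of the survey population. PROC SURVEYREG uses these totals, which are supplied through the TOTAL= option, to apply finite population corrections to the variance estimates.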
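Under stratified simple random sampling, the finite population correction for stratum h is 1 - n_h/N_h. For a stratified design without clustering, the denominator degrees of freedom for the t tests equal the number of observations minus the number of strata. The Python sketch below illustrates both quantities for this sample. It assumes these standard textbook forms and is not PROC SURVEYREG's implementation. The names fpc and denominator_df are hypothetical.

```python
# Stratum totals N_h (from StudentTotals) and sample sizes n_h (from Prob).
POP_TOTALS = {7: 1824, 8: 1025, 9: 1151}
SAMPLE_SIZES = {7: 20, 8: 9, 9: 11}


def sampling_fraction(grade: int) -> float:
    """f_h = n_h / N_h, the fraction of the stratum that was sampled."""
    return SAMPLE_SIZES[grade] / POP_TOTALS[grade]


def fpc(grade: int) -> float:
    """Finite population correction 1 - f_h for stratum h."""
    return 1.0 - sampling_fraction(grade)


# Degrees of freedom for a stratified (unclustered) design: n - H.
n_obs = sum(SAMPLE_SIZES.values())   # 40 students
n_strata = len(SAMPLE_SIZES)         # 3 grades
denominator_df = n_obs - n_strata

for g in sorted(POP_TOTALS):
    print(f"Grade {g}: f_h = {sampling_fraction(g):.5f}, fpc = {fpc(g):.5f}")
print("Denominator df:", denominator_df)
```

Because about 1% of each grade was sampled, each correction factor is close to 1, so the correction reduces the variance estimates only slightly in this example.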
The following statements demonstrate how you can analyze the relationship between spending and income among male and female students:
title1 'Ice Cream Spending Analysis';
title2 'Domain Analysis by Gender';
proc surveyreg data=IceCreamDataDomain total=StudentTotals;
   strata Grade;
   model Spending = Income;
   domain Gender;
   weight Weight;
run;
Output 94.7.1 gives a summary of the domains.
Output 94.7.1: Domain Analysis Summary
Ice Cream Spending Analysis
Domain Analysis by Gender

Gender=F

Domain Summary | |
---|---|
Number of Observations | 40 |
Number of Observations in Domain | 19 |
Number of Observations Not in Domain | 21 |
Sum of Weights in Domain | 1926.9 |
Weighted Mean of Spending | 9.37611 |
Weighted Sum of Spending | 18066.5 |

Gender=M

Domain Summary | |
---|---|
Number of Observations | 40 |
Number of Observations in Domain | 21 |
Number of Observations Not in Domain | 19 |
Sum of Weights in Domain | 2073.1 |
Weighted Mean of Spending | 8.92305 |
Weighted Sum of Spending | 18498.7 |
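The Domain Summary values are weighted counts and sums over the observations in each domain. The following Python sketch reproduces them from the raw data and uses the per-grade weights N_h/n_h from the DATA step. Domain membership is represented as a 0/1 indicator over all 40 observations, which is the usual way domain estimators are written: every observation stays in the sample, and only the indicator changes. The names read_records and domain_summary are hypothetical. Values are rounded when printed so that they can be compared with Output 94.7.1.

```python
# Raw records from IceCreamDataDomain: Grade Spending Income Gender,
# several records per line (the SAS step reads them with @@).
DATALINES = """
7 7 39 M   7 7 38 F   8 12 47 F  9 10 47 M  7 1 34 M
7 10 43 M  7 3 44 M   8 20 60 F  8 19 57 M  7 2 35 M
7 2 36 F   9 15 51 F  8 16 53 F  7 6 37 F   7 6 41 M
7 6 39 M   9 15 50 M  8 17 57 F  8 14 46 M  9 8 41 M
9 8 41 F   9 7 47 F   7 3 39 F   7 12 50 M  7 4 43 M
9 14 46 F  8 18 58 M  9 9 44 F   7 2 37 F   7 1 37 M
7 4 44 M   7 11 42 M  9 8 41 M   8 10 42 M  8 13 46 F
7 2 40 F   9 6 45 F   9 11 45 M  7 2 36 F   7 9 46 F
"""

# Sampling weights N_h / n_h by grade, as computed in the DATA step.
WEIGHTS = {7: 1824 / 20, 8: 1025 / 9, 9: 1151 / 11}


def read_records(text):
    """Split the datalines into (grade, spending, income, gender) tuples."""
    t = text.split()
    return [(int(t[i]), float(t[i + 1]), float(t[i + 2]), t[i + 3])
            for i in range(0, len(t), 4)]


def domain_summary(records, gender):
    """Weighted domain statistics that use a 0/1 domain indicator."""
    ind = [1 if r[3] == gender else 0 for r in records]
    sum_w = sum(WEIGHTS[r[0]] * d for r, d in zip(records, ind))
    sum_wy = sum(WEIGHTS[r[0]] * r[1] * d for r, d in zip(records, ind))
    return {
        "n": len(records),
        "n_in": sum(ind),
        "n_out": len(records) - sum(ind),
        "sum_w": sum_w,
        "mean": sum_wy / sum_w,
        "total": sum_wy,
    }


records = read_records(DATALINES)
for g in ("F", "M"):
    s = domain_summary(records, g)
    print(g, s["n_in"], s["n_out"], round(s["sum_w"], 1),
          round(s["mean"], 5), round(s["total"], 1))
```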
Output 94.7.2 shows the parameter estimates for the model within each domain.
Output 94.7.2: Parameter Estimates within Domain
Ice Cream Spending Analysis
Domain Analysis by Gender

Gender=F

Estimated Regression Coefficients

Parameter | Estimate | Standard Error | t Value | Pr > \|t\|
---|---|---|---|---
Intercept | -23.751681 | 2.30795437 | -10.29 | <.0001
Income | 0.735366 | 0.04757001 | 15.46 | <.0001

Note: The denominator degrees of freedom for the t tests is 37.

Gender=M

Estimated Regression Coefficients

Parameter | Estimate | Standard Error | t Value | Pr > \|t\|
---|---|---|---|---
Intercept | -23.213291 | 2.13361241 | -10.88 | <.0001
Income | 0.729419 | 0.04589801 | 15.89 | <.0001

Note: The denominator degrees of freedom for the t tests is 37.
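Within a domain, the coefficient estimates are weighted least squares estimates computed from the domain's observations and their sampling weights. The standard errors come from a design-based (Taylor series) variance estimator that this sketch does not attempt to reproduce. The Python code below is a minimal sketch of the point estimates under that assumption. The helper wls_fit is hypothetical, and its results should agree with Output 94.7.2 up to rounding.

```python
# Raw records from IceCreamDataDomain: Grade Spending Income Gender.
DATALINES = """
7 7 39 M   7 7 38 F   8 12 47 F  9 10 47 M  7 1 34 M
7 10 43 M  7 3 44 M   8 20 60 F  8 19 57 M  7 2 35 M
7 2 36 F   9 15 51 F  8 16 53 F  7 6 37 F   7 6 41 M
7 6 39 M   9 15 50 M  8 17 57 F  8 14 46 M  9 8 41 M
9 8 41 F   9 7 47 F   7 3 39 F   7 12 50 M  7 4 43 M
9 14 46 F  8 18 58 M  9 9 44 F   7 2 37 F   7 1 37 M
7 4 44 M   7 11 42 M  9 8 41 M   8 10 42 M  8 13 46 F
7 2 40 F   9 6 45 F   9 11 45 M  7 2 36 F   7 9 46 F
"""

# Sampling weights N_h / n_h by grade.
WEIGHTS = {7: 1824 / 20, 8: 1025 / 9, 9: 1151 / 11}


def read_records(text):
    """Split the datalines into (grade, spending, income, gender) tuples."""
    t = text.split()
    return [(int(t[i]), float(t[i + 1]), float(t[i + 2]), t[i + 3])
            for i in range(0, len(t), 4)]


def wls_fit(xs, ys, ws):
    """Weighted least squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / sw
    ybar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


records = read_records(DATALINES)
estimates = {}
for gender in ("F", "M"):
    dom = [r for r in records if r[3] == gender]   # observations in the domain
    estimates[gender] = wls_fit([r[2] for r in dom],          # Income
                                [r[1] for r in dom],          # Spending
                                [WEIGHTS[r[0]] for r in dom])  # Weight
    intercept, slope = estimates[gender]
    print(f"{gender}: Intercept = {intercept:.6f}, Income = {slope:.6f}")
```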
In this example, Income is significant (p < .0001) in the models for both the female and the male domains, and the two models are very similar: the estimated Income coefficient is 0.735366 for female students and 0.729419 for male students. In other data, regression models can differ substantially from one domain to another.