Consider the example in the section Stratified Sampling. The study population is a junior high school with a total of 4,000 students in grades 7, 8, and 9. Researchers want to know how much these students spend weekly for ice cream, on the average, and what percentage of students spend at least $10 weekly for ice cream.
The example in the section Stratified Sampling assumes that the sample of students was selected using a stratified simple random sample design. This example shows analysis based on a more complex sample design.
Suppose that every student belongs to a study group and that study groups are formed within each grade level. Each study group contains between two and four students. Table 92.5 shows the total number of study groups for each grade.
Table 92.5: Study Groups and Students by Grade
Grade | Number of Study Groups | Number of Students |
---|---|---|
7 | 608 | 1,824 |
8 | 252 | 1,025 |
9 | 403 | 1,151 |
Total | 1,263 | 4,000 |
It is quicker and more convenient to collect data from students in the same study group than to collect data from students individually. Therefore, this study uses a stratified clustered sample design. The primary sampling units, or clusters, are study groups. The list of all study groups in the school is stratified by grade level. From each grade level, a sample of study groups is randomly selected, and all students in each selected study group are interviewed. The sample consists of eight study groups from the 7th grade, three groups from the 8th grade, and five groups from the 9th grade.
The SAS data set IceCreamStudy saves the responses of the selected students:
    data IceCreamStudy;
       input Grade StudyGroup Spending @@;
       if (Spending < 10) then Group='less';
          else Group='more';
       datalines;
    7 34 7  7 34 7  7 412 4  9 27 14  7 34 2
    9 230 15  9 27 15  7 501 2  9 230 8  9 230 7
    7 501 3  8 59 20  7 403 4  7 403 11  8 59 13
    8 59 17  8 143 12  8 143 16  8 59 18  9 235 9
    8 143 10  9 312 8  9 235 6  9 235 11  9 312 10
    7 321 6  8 156 19  8 156 14  7 321 3  7 321 12
    7 489 2  7 489 9  7 78 1  7 78 10  7 489 2
    7 156 1  7 78 6  7 412 6  7 156 2  9 301 8
    ;
In the data set IceCreamStudy, the variable Grade contains a student’s grade. The variable StudyGroup identifies a student’s study group. It is possible for students from different grades to have the same study group number because study groups are sequentially numbered within each grade. The variable Spending contains a student’s response regarding how much the student spends per week for ice cream, in dollars. The variable Group indicates whether a student spends at least $10 weekly for ice cream. It is not necessary to store the data in order of grade and study group.
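One way to confirm that the data match the design described above is to count the distinct study groups and the students within each grade. The following PROC SQL step is a minimal sketch of that check; the column names NClusters and NStudents are illustrative and are not part of the original example. Grouping by Grade matters because study group numbers repeat across grades (group 156 appears in both grade 7 and grade 8).

```sas
/* Illustrative check: clusters and students per stratum. */
/* NClusters and NStudents are hypothetical column names. */
proc sql;
   select Grade,
          count(distinct StudyGroup) as NClusters,
          count(*) as NStudents
   from IceCreamStudy
   group by Grade;
quit;
```

For these data the counts should be 8 groups and 20 students in grade 7, 3 groups and 9 students in grade 8, and 5 groups and 11 students in grade 9, in agreement with the stated sample of 16 study groups.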
The SAS data set StudyGroups is created to provide PROC SURVEYMEANS with the sample design information shown in Table 92.5:
    data StudyGroups;
       input Grade _total_;
       datalines;
    7 608
    8 252
    9 403
    ;
The variable Grade identifies the strata, and the variable _TOTAL_ contains the total number of study groups in each stratum. As discussed in the section Specification of Population Totals and Sampling Rates, the population totals stored in the variable _TOTAL_ should be expressed in terms of the primary sampling units (PSUs), which are study groups in this example. Therefore, the variable _TOTAL_ contains the total number of study groups for each grade, rather than the total number of students.
In order to obtain unbiased estimates, you create sampling weights by using the following SAS statements:
    data IceCreamStudy;
       set IceCreamStudy;
       if Grade=7 then Prob=8/608;
       if Grade=8 then Prob=3/252;
       if Grade=9 then Prob=5/403;
       Weight=1/Prob;
    run;
The sampling weights are the reciprocals of the selection probabilities. The variable Weight contains the sampling weights. Because the sampling design is clustered and all students from each selected cluster are interviewed, the sampling weights equal the inverse of the cluster (or study group) selection probabilities.
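As a worked check of the weight arithmetic, the following sketch writes $w_h$ for the weight in grade $h$ (notation introduced here for illustration) and uses the sample sizes of 20, 9, and 11 students found in the data lines for grades 7, 8, and 9:

$$
\begin{aligned}
w_7 &= \frac{608}{8} = 76, \qquad w_8 = \frac{252}{3} = 84, \qquad w_9 = \frac{403}{5} = 80.6 \\
\sum_{\text{students}} w &= 20(76) + 9(84) + 11(80.6) = 1520 + 756 + 886.6 = 3162.6
\end{aligned}
$$

The total agrees with the Sum of Weights value that PROC SURVEYMEANS reports in Output 92.1.1.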
The following SAS statements perform the analysis for this sample design:
    title1 'Analysis of Ice Cream Spending';
    proc surveymeans data=IceCreamStudy total=StudyGroups;
       strata Grade / list;
       cluster StudyGroup;
       var Spending Group;
       weight Weight;
    run;
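PROC SURVEYMEANS also accepts stratum sampling rates through the RATE= option in place of TOTAL=. The sketch below assumes a secondary data set, given the hypothetical name StudyGroupRates, that holds the stratification variable Grade and the rates in a variable named _RATE_. It is an alternative way to supply the same design information and is not part of the original example.

```sas
/* Hypothetical alternative: supply sampling rates instead of totals. */
/* StudyGroupRates is an illustrative data set name.                  */
data StudyGroupRates;
   set StudyGroups;
   /* sampled study groups divided by all study groups in the stratum */
   if Grade=7 then _rate_ = 8/_total_;
   else if Grade=8 then _rate_ = 3/_total_;
   else if Grade=9 then _rate_ = 5/_total_;
   keep Grade _rate_;
run;

proc surveymeans data=IceCreamStudy rate=StudyGroupRates;
   strata Grade / list;
   cluster StudyGroup;
   var Spending Group;
   weight Weight;
run;
```

Because each rate is the number of sampled study groups divided by the stratum total, this specification should lead to the same finite population correction as the TOTAL= version, and therefore to the same estimates and standard errors.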
Output 92.1.1 provides information about the sample design and the input data set. There are three strata in the sample design, and the sample contains 16 clusters and 40 observations. The variable Group has two levels, 'less' and 'more.'
Output 92.1.1: Data Summary and Class Information
Analysis of Ice Cream Spending

Data Summary | |
---|---|
Number of Strata | 3 |
Number of Clusters | 16 |
Number of Observations | 40 |
Sum of Weights | 3162.6 |

Class Level Information

Class Variable | Levels | Values |
---|---|---|
Group | 2 | less more |
Output 92.1.2 displays information for each stratum. Since the primary sampling units in this design are study groups, the population totals shown in Output 92.1.2 are the total numbers of study groups for each stratum or grade. This differs from Figure 92.3, which provides the population totals in terms of students since students were the primary sampling units for that design. Output 92.1.2 also displays the number of clusters for each stratum and analysis variable.
Output 92.1.2: Stratum Information
Stratum Index | Grade | Population Total | Sampling Rate | N Obs | Variable | Level | N | Clusters |
---|---|---|---|---|---|---|---|---|
1 | 7 | 608 | 1.32% | 20 | Spending | | 20 | 8 |
 | | | | | Group | less | 17 | 8 |
 | | | | | | more | 3 | 3 |
2 | 8 | 252 | 1.19% | 9 | Spending | | 9 | 3 |
 | | | | | Group | less | 0 | 0 |
 | | | | | | more | 9 | 3 |
3 | 9 | 403 | 1.24% | 11 | Spending | | 11 | 5 |
 | | | | | Group | less | 6 | 4 |
 | | | | | | more | 5 | 4 |
Output 92.1.3 displays the estimates of the average weekly ice cream expenditure and the percentage of students spending at least $10 weekly for ice cream.
Output 92.1.3: Statistics
Variable | Level | N | Mean | Std Error of Mean | Lower 95% CL for Mean | Upper 95% CL for Mean |
---|---|---|---|---|---|---|
Spending | | 40 | 8.923860 | 0.650859 | 7.51776370 | 10.3299565 |
Group | less | 23 | 0.561437 | 0.056368 | 0.43966057 | 0.6832130 |
 | more | 17 | 0.438563 | 0.056368 | 0.31678698 | 0.5603394 |
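The point estimates in Output 92.1.3 can be reproduced by hand as weighted ratios. As a sketch, take the stratum weights 76, 84, and 80.6 for grades 7, 8, and 9 (the reciprocals of the selection probabilities), and the sample sums of Spending in those grades, which are 100, 139, and 111 when added up from the IceCreamStudy data lines. The symbols $\hat{\bar{y}}$ and $\hat{p}_{\text{more}}$ are notation introduced here for illustration.

$$
\begin{aligned}
\hat{\bar{y}} &= \frac{76(100) + 84(139) + 80.6(111)}{3162.6} = \frac{28222.6}{3162.6} \approx 8.92386 \\
\hat{p}_{\text{more}} &= \frac{76(3) + 84(9) + 80.6(5)}{3162.6} = \frac{1387}{3162.6} \approx 0.438563
\end{aligned}
$$

Both values agree with the Mean column. The standard errors are not reproduced here because they depend on the variability between study groups within each grade rather than on these totals alone.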