This example uses the Customers data set from the section Getting Started: SURVEYSELECT Procedure. The data set Customers contains an Internet service provider’s current subscribers, and the service provider wants to select a sample from this population for a customer satisfaction survey. This example illustrates proportional allocation, which allocates the total sample size among the strata in proportion to the stratum sizes.
The section Getting Started: SURVEYSELECT Procedure gives an example of stratified sampling, where the list of customers is stratified by State and Type. Figure 102.4 displays the strata in a table of State by Type for the 13,471 customers. There are four states and two levels of Type, forming a total of eight strata. A sample of 15 customers was selected from each stratum by using the following PROC SURVEYSELECT statements:

    title1 'Customer Satisfaction Survey';
    title2 'Stratified Sampling';
    proc surveyselect data=Customers method=srs n=15
                      seed=1953 out=SampleStrata;
       strata State Type;
    run;

The STRATA statement names the stratification variables State and Type. In the PROC SURVEYSELECT statement, the N= option specifies a sample size of 15 customers in each stratum.
Instead of specifying the number of customers to select from each stratum, you can specify the total sample size and request allocation of the total sample size among the strata. The following PROC SURVEYSELECT statements request proportional allocation, which allocates the total sample size in proportion to the stratum sizes:

    title1 'Customer Satisfaction Survey';
    title2 'Proportional Allocation';
    proc surveyselect data=Customers n=1000
                      out=SampleSizes;
       strata State Type / alloc=prop nosample;
    run;

The STRATA statement names the stratification variables State and Type. In the STRATA statement, the ALLOC=PROP option requests proportional allocation. The NOSAMPLE option requests that no sample be selected after the procedure computes the sample size allocation. In the PROC SURVEYSELECT statement, the N= option specifies a total sample size of 1000 customers to be allocated among the strata.
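As a sketch of the arithmetic behind proportional allocation, let $N_h$ be the number of sampling units in stratum $h$, $N$ the number of units in all $H$ strata combined, and $n$ the requested total sample size. These symbols are introduced here for illustration and are not part of the procedure syntax. The target sample size for stratum $h$ is then:

```latex
n_h^{*} = n \cdot \frac{N_h}{N}, \qquad N = \sum_{h=1}^{H} N_h
```

In this example, $n = 1000$, $N = 13{,}471$, and $H = 8$. The targets $n_h^{*}$ are generally not integers, so the allocated stratum sample sizes are rounded versions of these values.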
Output 102.4.1 displays the output from PROC SURVEYSELECT, which summarizes the sample allocation. The total sample size of 1000 is allocated among the eight strata by using proportional allocation. The allocated sample sizes are stored in the SAS data set SampleSizes.
The following PROC PRINT statements display the allocation output data set SampleSizes, which is shown in Output 102.4.2:

    title1 'Customer Satisfaction Survey';
    title2 'Proportional Allocation';
    proc print data=SampleSizes;
    run;

Output 102.4.2: Stratum Sample Sizes

Customer Satisfaction Survey
Proportional Allocation

Obs | State | Type | Total | AllocProportion | SampleSize | ActualProportion |
---|---|---|---|---|---|---|
1 | AL | New | 1238 | 0.09190 | 92 | 0.092 |
2 | AL | Old | 706 | 0.05241 | 52 | 0.052 |
3 | FL | New | 2170 | 0.16109 | 161 | 0.161 |
4 | FL | Old | 1370 | 0.10170 | 102 | 0.102 |
5 | GA | New | 3488 | 0.25893 | 259 | 0.259 |
6 | GA | Old | 1940 | 0.14401 | 144 | 0.144 |
7 | SC | New | 1684 | 0.12501 | 125 | 0.125 |
8 | SC | Old | 875 | 0.06495 | 65 | 0.065 |
The output data set SampleSizes includes one observation for each of the eight strata, which are identified by the stratification variables State and Type. The variable Total contains the number of sampling units in the stratum, and the variable AllocProportion contains the proportion of the total sample size to allocate to the stratum. The variable SampleSize contains the allocated stratum sample size. For the first stratum (State='AL' and Type='New'), the total number of sampling units is 1238 customers, the allocation proportion is 0.09190, and the allocated sample size is 92 customers. The sum of the allocated sample sizes equals the requested total sample size of 1000 customers.
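One way to confirm these totals is to sum the Total and SampleSize columns with PROC MEANS. This is a sketch that assumes the SampleSizes data set from the allocation step is still available; the title text is illustrative. From the values in Output 102.4.2, the sums should be 13,471 sampling units and 1000 allocated customers.

```sas
/* Sum the stratum sizes and the allocated sample sizes.       */
/* Assumes SampleSizes was created by the ALLOC=PROP step.     */
title1 'Customer Satisfaction Survey';
title2 'Allocation Totals';
proc means data=SampleSizes sum maxdec=0;
   var Total SampleSize;
run;
```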
The output data set also includes the variable ActualProportion, which contains actual stratum proportions of the total sample size. The actual proportion for a stratum is the stratum sample size divided by the total sample size. For the first stratum (State='AL' and Type='New'), the actual proportion is 0.092, while the allocation proportion is 0.09190. The target sample sizes computed from the allocation proportions are often not integers, and PROC SURVEYSELECT uses a rounding algorithm to obtain integer sample sizes and maintain the requested total sample size. Due to rounding and other restrictions, the actual proportions can differ from the target allocation proportions. For more information, see the section Sample Size Allocation.
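The effect of rounding can be inspected directly. The following sketch assumes the SampleSizes data set shown in Output 102.4.2 and the requested total sample size of 1000. It recomputes the unrounded target for each stratum and its difference from the allocated size. The data set name AllocCheck and the variables Target and Diff are illustrative names, not output of PROC SURVEYSELECT. For the first stratum, the target is approximately 1000 × 0.09190 = 91.90, and the allocated size is 92.

```sas
/* Compare unrounded target sizes with the allocated sizes.    */
/* AllocCheck, Target, and Diff are hypothetical names used    */
/* only for this check; 1000 is the total from the N= option.  */
data AllocCheck;
   set SampleSizes;
   Target = 1000 * AllocProportion;   /* unrounded target size */
   Diff   = SampleSize - Target;      /* change due to rounding */
run;

title1 'Customer Satisfaction Survey';
title2 'Target and Allocated Sample Sizes';
proc print data=AllocCheck;
   var State Type Total AllocProportion Target SampleSize Diff;
run;
```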
If you want to use the allocated sample sizes in a later invocation of PROC SURVEYSELECT, you can name the allocation data set in the N=SAS-data-set option, as shown in the following PROC SURVEYSELECT statements:

    title1 'Customer Satisfaction Survey';
    title2 'Stratified Sampling';
    proc surveyselect data=Customers method=srs n=SampleSizes
                      seed=1953 out=SampleStrata;
       strata State Type;
    run;

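After the sample is selected, a quick check is to count the selected customers in each stratum. The sketch below assumes the SampleStrata output data set created by the preceding statements; the title text is illustrative. The cell counts in the State-by-Type table should match the SampleSize values in the allocation data set.

```sas
/* Count selected customers per stratum in the output sample.  */
/* Assumes SampleStrata was created by the preceding step.     */
title1 'Customer Satisfaction Survey';
title2 'Selected Customers by Stratum';
proc freq data=SampleStrata;
   tables State*Type / norow nocol nopercent;
run;
```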