The SURVEYSELECT Procedure

Stratified Sampling

In this section, stratification is added to the sample design for the customer satisfaction survey. The sampling frame, which is the list of all customers, is stratified by State and Type. This divides the sampling frame into nonoverlapping subgroups formed from the values of the State and Type variables. Samples are then selected independently within the strata.

PROC SURVEYSELECT requires that the input data set be sorted by the STRATA variables. The following PROC SORT statements sort the Customers data set by the stratification variables State and Type:

proc sort data=Customers;
   by State Type;
run;
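If the original order of the Customers data set needs to be preserved, the sort can write to a separate data set instead of sorting in place. The following is a minimal sketch; CustomersSorted is a hypothetical name that does not appear elsewhere in this example.

```sas
/* Sketch: write a sorted copy rather than sorting Customers in place. */
/* CustomersSorted is a hypothetical data set name.                    */
proc sort data=Customers out=CustomersSorted;
   by State Type;
run;
```

With this variant, later steps would name DATA=CustomersSorted wherever this example uses DATA=Customers.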

The following PROC FREQ statements display the crosstabulation of the Customers data set by State and Type:

title1 'Customer Satisfaction Survey';
title2 'Strata of Customers';
proc freq data=Customers;
   tables State*Type;
run;

Figure 102.4 presents the table of State by Type for the 13,471 customers. There are four states and two levels of Type, forming a total of eight strata.

Figure 102.4: Stratification of Customers by State and Type

Customer Satisfaction Survey
Strata of Customers

The FREQ Procedure

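Table of State by Type

State     Type

Frequency|
Percent  |
Row Pct  |
Col Pct  |New     |Old     |  Total
---------+--------+--------+
AL       |   1238 |    706 |   1944
         |   9.19 |   5.24 |  14.43
         |  63.68 |  36.32 |
         |  14.43 |  14.43 |
---------+--------+--------+
FL       |   2170 |   1370 |   3540
         |  16.11 |  10.17 |  26.28
         |  61.30 |  38.70 |
         |  25.29 |  28.01 |
---------+--------+--------+
GA       |   3488 |   1940 |   5428
         |  25.89 |  14.40 |  40.29
         |  64.26 |  35.74 |
         |  40.65 |  39.66 |
---------+--------+--------+
SC       |   1684 |    875 |   2559
         |  12.50 |   6.50 |  19.00
         |  65.81 |  34.19 |
         |  19.63 |  17.89 |
---------+--------+--------+
Total        8580     4891    13471
            63.69    36.31   100.00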



The following PROC SURVEYSELECT statements select a probability sample of customers from the Customers data set according to the stratified sample design:

title1 'Customer Satisfaction Survey';
title2 'Stratified Sampling';
proc surveyselect data=Customers method=srs n=15
                  seed=1953 out=SampleStrata;
   strata State Type;
run;

The STRATA statement names the stratification variables State and Type. In the PROC SURVEYSELECT statement, the METHOD=SRS option specifies simple random sampling. The N= option specifies a sample size of 15 customers in each stratum. If you want to specify different sample sizes for different strata, you can use the N=SAS-data-set option to name a secondary data set that contains the stratum sample sizes. The SEED= option specifies 1953 as the initial seed for random number generation.
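As an illustration of the N=SAS-data-set form, the following sketch builds a secondary data set of stratum sample sizes and passes it to the N= option. The secondary data set contains the STRATA variables, with the strata in the same order as in the input data set, and a variable named _NSIZE_ that holds the stratum sample sizes. The data set names StrataSizes and SampleStrataBySize and the allocation rule (20 customers from strata with at least 2,000 customers, 10 otherwise) are assumptions made only for this sketch. Deriving the sizes from Customers itself keeps the type and length of State and Type consistent with the input data set.

```sas
/* Sketch: one row per stratum, with the sample size in _NSIZE_.      */
/* StrataSizes and the 20/10 allocation rule are hypothetical.        */
proc sql;
   create table StrataSizes as
   select State, Type,
          case when count(*) >= 2000 then 20 else 10 end as _NSIZE_
   from Customers
   group by State, Type
   order by State, Type;   /* same stratum order as the sorted input */
quit;

/* Select the sample by using the stratum-specific sizes. */
proc surveyselect data=Customers method=srs n=StrataSizes
                  seed=1953 out=SampleStrataBySize;
   strata State Type;
run;
```

Under this hypothetical rule, the FL/New and GA/New strata would each contribute 20 customers and the remaining six strata 10 each, based on the counts in Figure 102.4.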

Figure 102.5 displays the output from PROC SURVEYSELECT, which summarizes the sample selection. A total of 120 customers are selected.

Figure 102.5: Sample Selection Summary

Customer Satisfaction Survey
Stratified Sampling

The SURVEYSELECT Procedure

Selection Method       Simple Random Sampling
Strata Variables       State
                       Type

Input Data Set         CUSTOMERS
Random Number Seed     1953
Stratum Sample Size    15
Number of Strata       8
Total Sample Size      120
Output Data Set        SAMPLESTRATA



The following PROC PRINT statements display the first 30 observations of the output data set SampleStrata:

title1 'Customer Satisfaction Survey';
title2 'Sample Selected by Stratified Design';
title3 '(First 30 Observations)';
proc print data=SampleStrata(obs=30);
run;

Figure 102.6 displays the first 30 observations of the output data set SampleStrata, which contains the sample of 120 customers: 15 customers from each of the eight strata. The variable SelectionProb contains the selection probability for each customer in the sample. Because customers are selected with equal probability within strata in this design, the selection probability equals the stratum sample size (15) divided by the stratum population size. The selection probabilities differ from stratum to stratum because the stratum population sizes differ. The selection probability for each customer in the first stratum (State='AL' and Type='New') is 15/1238 = 0.012116, and the selection probability for each customer in the second stratum (State='AL' and Type='Old') is 15/706 = 0.021246. The variable SamplingWeight contains the sampling weights, which are computed as inverse selection probabilities.
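The arithmetic behind SelectionProb and SamplingWeight can be reproduced by hand. The following DATA step is a minimal sketch that recomputes both quantities for the first two strata from the stratum counts in Figure 102.4; the variable names n and PopSize are chosen only for this illustration.

```sas
/* Sketch: recompute selection probabilities and sampling weights */
/* for the AL/New and AL/Old strata (counts from Figure 102.4).   */
data _null_;
   n = 15;                            /* stratum sample size (N=15)     */
   do PopSize = 1238, 706;            /* stratum population sizes       */
      SelectionProb  = n / PopSize;   /* equal probability in a stratum */
      SamplingWeight = PopSize / n;   /* inverse selection probability  */
      put PopSize= SelectionProb= 8.6 SamplingWeight= 8.4;
   end;
run;
```

The values written to the log should agree with the SelectionProb and SamplingWeight columns in Figure 102.6.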

Figure 102.6: Customer Sample (First 30 Observations)

Customer Satisfaction Survey
Sample Selected by Stratified Design
(First 30 Observations)

Obs State Type CustomerID Usage SelectionProb SamplingWeight
1 AL New 015-57-9903 26 0.012116 82.5333
2 AL New 052-18-5029 576 0.012116 82.5333
3 AL New 064-72-0145 88 0.012116 82.5333
4 AL New 291-22-2497 1221 0.012116 82.5333
5 AL New 305-62-6833 187 0.012116 82.5333
6 AL New 309-63-9722 534 0.012116 82.5333
7 AL New 413-76-0209 435 0.012116 82.5333
8 AL New 492-18-7867 70 0.012116 82.5333
9 AL New 508-16-8324 189 0.012116 82.5333
10 AL New 561-82-0366 392 0.012116 82.5333
11 AL New 685-24-1718 74 0.012116 82.5333
12 AL New 800-20-2155 21 0.012116 82.5333
13 AL New 857-94-2672 77 0.012116 82.5333
14 AL New 918-29-9618 540 0.012116 82.5333
15 AL New 963-93-4916 33 0.012116 82.5333
16 AL Old 000-88-0484 401 0.021246 47.0667
17 AL Old 005-80-0241 114 0.021246 47.0667
18 AL Old 171-99-9085 210 0.021246 47.0667
19 AL Old 182-45-1938 160 0.021246 47.0667
20 AL Old 208-99-1105 60 0.021246 47.0667
21 AL Old 229-48-6213 1169 0.021246 47.0667
22 AL Old 265-55-4763 1370 0.021246 47.0667
23 AL Old 467-73-7465 14 0.021246 47.0667
24 AL Old 509-38-7128 173 0.021246 47.0667
25 AL Old 601-71-3629 142 0.021246 47.0667
26 AL Old 603-40-7787 302 0.021246 47.0667
27 AL Old 702-39-0977 270 0.021246 47.0667
28 AL Old 861-79-5340 101 0.021246 47.0667
29 AL Old 908-20-0603 340 0.021246 47.0667
30 AL Old 937-69-9106 182 0.021246 47.0667
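
Because each sampling weight is the stratum population size divided by 15, the 15 weights in a stratum should sum to that stratum's population size, apart from rounding. The following sketch offers one way to check this against Figure 102.4; the title text is an assumption made for the illustration.

```sas
/* Sketch: sum the sampling weights within each stratum. Each sum */
/* should match the stratum frequency shown in Figure 102.4.      */
title1 'Customer Satisfaction Survey';
title2 'Sum of Sampling Weights by Stratum';
proc means data=SampleStrata sum;
   class State Type;
   var SamplingWeight;
run;
```

Summed over all eight strata, the weights should total the frame size of 13,471 customers.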