This example uses the Customers data set from the section Getting Started: SURVEYSELECT Procedure. The data set Customers contains an Internet service provider’s current subscribers, and the service provider wants to select a sample from this population for a customer satisfaction survey.
This example illustrates replicated sampling, which selects multiple samples from the survey population according to the same design. You can use replicated sampling to provide a simple method of variance estimation, or to evaluate variable nonsampling errors such as interviewer differences. For information about replicated sampling, see Lohr (2010), Wolter (2007), Kish (1965), Kish (1987), and Kalton (1983).
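As a brief illustration of the variance estimation mentioned above, the following is the standard replicated-sampling (random groups) estimator in its simplest form. The symbols are introduced here for illustration and are not part of the procedure's output: $R$ is the number of replicates, $\hat{\theta}_r$ is the estimate computed from replicate $r$ alone, and $\bar{\theta}$ is their average. It assumes the replicates are selected independently by the same design.

```latex
\bar{\theta} = \frac{1}{R} \sum_{r=1}^{R} \hat{\theta}_r ,
\qquad
\hat{v}(\bar{\theta}) = \frac{1}{R(R-1)} \sum_{r=1}^{R} \left( \hat{\theta}_r - \bar{\theta} \right)^2
```

With only a few replicates (for example, $R = 4$), this estimator has few degrees of freedom ($R - 1 = 3$), so it is simple to compute but can be unstable.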
This design includes four replicates, each with a sample size of 50 customers. The sampling frame is stratified by State and sorted by Type and Usage within strata. Customers are selected by sequential random sampling with equal probability within strata. The following PROC SURVEYSELECT statements select a probability sample of customers from the Customers data set by using this design:
   title1 'Customer Satisfaction Survey';
   title2 'Replicated Sampling';
   proc surveyselect data=Customers method=seq n=(8 12 20 10)
                     reps=4 seed=40070 ranuni out=SampleRep;
      strata State;
      control Type Usage;
   run;
The STRATA statement names the stratification variable State. The CONTROL statement names the control variables Type and Usage.
In the PROC SURVEYSELECT statement, the METHOD=SEQ option requests sequential random sampling. The REPS= option specifies four replicates of this sample. The N=(8 12 20 10) option lists the stratum sample sizes for each replicate. The N= option lists the stratum sample sizes in the same order as the strata appear in the Customers data set, which has been sorted by State. The sample size of eight customers corresponds to the first stratum, State = 'AL'. The sample size 12 corresponds to the next stratum, State = 'FL', and so on.
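Because the N= values are matched to strata purely by position, it can help to confirm the stratum order before sampling. The following is a minimal sketch, not part of the original example: it sorts Customers by State (only needed if the data set is not already sorted) and then lists the strata in the order in which they appear.

```sas
/* Sort the sampling frame by the stratification variable.          */
/* This step is only needed if Customers is not already sorted.     */
proc sort data=Customers;
   by State;
run;

/* List the strata in sorted order, with the number of customers    */
/* in each, so the N= values can be matched to strata by position.  */
proc freq data=Customers;
   tables State;
run;
```

The first row of the frequency table corresponds to the first value in the N= list, the second row to the second value, and so on.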
The SEED= option specifies 40070 as the initial seed for random number generation. The RANUNI option requests random number generation by the RANUNI generator, which PROC SURVEYSELECT uses in releases before SAS/STAT 12.1. (Beginning in SAS/STAT 12.1, PROC SURVEYSELECT uses the Mersenne-Twister random number generator by default.) You can specify the RANUNI option with the SEED= option to reproduce samples that PROC SURVEYSELECT selects in releases before SAS/STAT 12.1. To reproduce a sample by using the RANUNI and SEED= options, you must also specify the same input data set and sample selection parameters.
Output 102.1.1 displays the output from PROC SURVEYSELECT, which summarizes the sample selection. A total of 200 customers is selected in four replicates. PROC SURVEYSELECT selects each replicate by using sequential random sampling within strata determined by State. The sampling frame Customers is sorted by the control variables Type and Usage within strata, according to hierarchic serpentine sorting. The output data set SampleRep contains the sample.
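One way to check that the output data set matches the design is to cross-tabulate replicate by stratum. This is a sketch added for illustration, not part of the original example. If the selection ran as described, each replicate should show the stratum counts given in the N= option (8, 12, 20, and 10), for 50 customers per replicate and 200 overall.

```sas
/* Count selected customers by replicate and stratum.           */
/* The percentage options are suppressed to show counts only.   */
proc freq data=SampleRep;
   tables Replicate*State / norow nocol nopercent;
run;
```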
The following PROC PRINT statements display the selected customers for the first stratum, State = 'AL', from the output data set SampleRep:
   title1 'Customer Satisfaction Survey';
   title2 'Sample Selected by Replicated Design';
   title3 '(First Stratum)';
   proc print data=SampleRep;
      where State = 'AL';
   run;
Output 102.1.2 displays the 32 sample customers of the first stratum (State = 'AL') from the output data set SampleRep, which includes the entire sample of 200 customers. The variable SelectionProb contains the selection probability, and SamplingWeight contains the sampling weight. Because customers are selected with equal probability within strata in this design, all customers in the same stratum have the same selection probability. These selection probabilities and sampling weights apply to a single replicate, and the variable Replicate contains the sample replicate number.
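The values in the output can be checked with a short calculation. The notation is introduced here for illustration: $n_h$ is the per-replicate sample size in stratum $h$, $N_h$ is the number of customers in that stratum, $\pi_h$ is the selection probability, and $w_h$ is the sampling weight. The stratum size $N_{\mathrm{AL}} = 1944$ is not stated in the text; it is implied by the reported weight of 243 and the sample size of 8.

```latex
\pi_h = \frac{n_h}{N_h}, \qquad w_h = \frac{1}{\pi_h} = \frac{N_h}{n_h}

% First stratum (State = 'AL'), single replicate:
N_{\mathrm{AL}} = n_{\mathrm{AL}} \, w_{\mathrm{AL}} = 8 \times 243 = 1944, \qquad
\pi_{\mathrm{AL}} = \frac{8}{1944} = \frac{1}{243} \approx 0.004115226
```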
Output 102.1.2: Customer Sample (First Stratum)
Customer Satisfaction Survey
Sample Selected by Replicated Design
(First Stratum)

Obs | State | Replicate | CustomerID | Type | Usage | SelectionProb | SamplingWeight |
---|---|---|---|---|---|---|---|
1 | AL | 1 | 882-37-7496 | New | 572 | .004115226 | 243 |
2 | AL | 1 | 581-32-5534 | New | 863 | .004115226 | 243 |
3 | AL | 1 | 980-29-2898 | Old | 571 | .004115226 | 243 |
4 | AL | 1 | 172-56-4743 | Old | 128 | .004115226 | 243 |
5 | AL | 1 | 998-55-5227 | Old | 35 | .004115226 | 243 |
6 | AL | 1 | 625-44-3396 | New | 60 | .004115226 | 243 |
7 | AL | 1 | 627-48-2509 | New | 114 | .004115226 | 243 |
8 | AL | 1 | 257-66-6558 | New | 172 | .004115226 | 243 |
9 | AL | 2 | 622-83-1680 | New | 22 | .004115226 | 243 |
10 | AL | 2 | 343-57-1186 | New | 53 | .004115226 | 243 |
11 | AL | 2 | 976-05-3796 | New | 110 | .004115226 | 243 |
12 | AL | 2 | 859-74-0652 | New | 303 | .004115226 | 243 |
13 | AL | 2 | 476-48-1066 | New | 839 | .004115226 | 243 |
14 | AL | 2 | 109-27-8914 | Old | 2102 | .004115226 | 243 |
15 | AL | 2 | 743-25-0298 | Old | 376 | .004115226 | 243 |
16 | AL | 2 | 722-08-2215 | Old | 105 | .004115226 | 243 |
17 | AL | 3 | 668-57-7696 | New | 200 | .004115226 | 243 |
18 | AL | 3 | 300-72-0129 | New | 471 | .004115226 | 243 |
19 | AL | 3 | 073-60-0765 | New | 656 | .004115226 | 243 |
20 | AL | 3 | 526-87-0258 | Old | 672 | .004115226 | 243 |
21 | AL | 3 | 726-61-0387 | Old | 150 | .004115226 | 243 |
22 | AL | 3 | 632-29-9020 | Old | 51 | .004115226 | 243 |
23 | AL | 3 | 417-17-8378 | New | 56 | .004115226 | 243 |
24 | AL | 3 | 091-26-2366 | New | 93 | .004115226 | 243 |
25 | AL | 4 | 336-04-1288 | New | 419 | .004115226 | 243 |
26 | AL | 4 | 827-04-7407 | New | 650 | .004115226 | 243 |
27 | AL | 4 | 317-70-6496 | Old | 452 | .004115226 | 243 |
28 | AL | 4 | 002-38-4582 | Old | 206 | .004115226 | 243 |
29 | AL | 4 | 181-83-3990 | Old | 33 | .004115226 | 243 |
30 | AL | 4 | 675-34-7393 | New | 47 | .004115226 | 243 |
31 | AL | 4 | 228-07-6671 | New | 65 | .004115226 | 243 |
32 | AL | 4 | 298-46-2434 | New | 161 | .004115226 | 243 |
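
The use of replicates for variance estimation, mentioned at the start of this example, can be sketched as follows. This is an illustration under stated assumptions rather than part of the original example. It treats Usage as the analysis variable purely for demonstration (in a real survey the analysis variable would be a survey response), and the data set name RepMeans and the variable name MeanUsage are hypothetical. The first step computes a weighted mean within each replicate by using the single-replicate sampling weights. The second step summarizes the four replicate estimates, and the standard error of their mean is the replicated-sampling estimate of the standard error.

```sas
/* Step 1: weighted mean of Usage within each replicate.        */
/* SamplingWeight applies to a single replicate, so it is used  */
/* as is for the per-replicate estimates.                       */
proc means data=SampleRep noprint nway;
   class Replicate;
   var Usage;
   weight SamplingWeight;
   output out=RepMeans mean=MeanUsage;   /* hypothetical names */
run;

/* Step 2: mean and standard error across the replicate         */
/* estimates (one observation per replicate in RepMeans).       */
proc means data=RepMeans n mean stderr;
   var MeanUsage;
run;
```

With only four replicates, the resulting standard error is based on three degrees of freedom, so it should be read as a rough estimate.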