In this example, an Internet service provider conducts a customer satisfaction survey. The survey population consists of the company’s current subscribers. The company plans to select a sample of customers from this population, interview the selected customers, and then make inferences about the entire survey population from the sample data.
The SAS data set Customers contains the sampling frame, which is the list of units in the survey population. The sample of customers will be selected from this sampling frame. The data set Customers is constructed from the company’s customer database. It contains one observation for each customer, with a total of 13,471 observations.
The following PROC PRINT statements display the first 10 observations of the data set Customers and produce Figure 102.1:

    title1 'Customer Satisfaction Survey';
    title2 'First 10 Observations';
    proc print data=Customers(obs=10);
    run;

Figure 102.1: Customers Data Set (First 10 Observations)

Customer Satisfaction Survey
First 10 Observations

Obs | CustomerID | State | Type | Usage |
---|---|---|---|---|
1 | 416-87-4322 | AL | New | 839 |
2 | 288-13-9763 | GA | Old | 224 |
3 | 339-00-8654 | GA | Old | 2451 |
4 | 118-98-0542 | GA | New | 349 |
5 | 421-67-0342 | FL | New | 562 |
6 | 623-18-9201 | SC | New | 68 |
7 | 324-55-0324 | FL | Old | 137 |
8 | 832-90-2397 | AL | Old | 1563 |
9 | 586-45-0178 | GA | New | 615 |
10 | 801-24-5317 | SC | New | 728 |
In the SAS data set Customers, the variable CustomerID uniquely identifies each customer. The variable State contains the state of the customer’s address. The company has customers in four states: Georgia (GA), Alabama (AL), Florida (FL), and South Carolina (SC). The variable Type equals 'Old' if the customer has subscribed to the service for more than one year; otherwise, the variable Type equals 'New'. The variable Usage contains the customer’s average monthly service usage, in minutes.
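
Before drawing a sample, it can be useful to confirm that the sampling frame matches the description above. The following statements are a minimal sketch and not part of the original example. They assume only the data set Customers and the variables CustomerID, State, Type, and Usage described in this section; the column names NumObs and NumCustomers are hypothetical labels chosen for illustration.

```sas
/* Check that CustomerID is unique: the two counts should agree */
/* if the frame holds exactly one observation per customer.     */
proc sql;
   select count(*)                   as NumObs,
          count(distinct CustomerID) as NumCustomers
   from Customers;
quit;

/* Count the customers in each State-by-Type cell. These cells  */
/* are the strata used by the stratified designs.               */
proc freq data=Customers;
   tables State*Type / nopercent norow nocol;
run;

/* Summarize average monthly usage (in minutes) over the frame. */
proc means data=Customers n min mean max;
   var Usage;
run;
```

If the two counts from PROC SQL differ, the frame contains duplicate customers and should be cleaned before sampling. The PROC FREQ table also shows whether every state and type combination holds enough customers to support the per-stratum sample size you plan to request.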
The following sections illustrate the use of PROC SURVEYSELECT for probability sampling with three different designs for the customer satisfaction survey. All three designs are one-stage, with customers as the sampling units. The first design is simple random sampling without stratification. In the second design, customers are stratified by state and type, and the sample is selected by simple random sampling within strata. In the third design, customers are sorted within strata by usage, and the sample is selected by systematic random sampling within strata.
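
As a rough preview, the three designs might be requested along the following lines. This is a sketch under stated assumptions, not the code from the sections that follow. The sample sizes, sampling rate, seeds, and output data set names (SampleSRS, SampleStrata, SampleControl) are illustrative choices, and the stratified request assumes that every stratum contains at least 15 customers.

```sas
/* Design 1: simple random sampling without stratification.     */
/* N=100 and SEED=1953 are illustrative assumptions.            */
proc surveyselect data=Customers method=srs n=100
                  seed=1953 out=SampleSRS;
run;

/* Designs 2 and 3 use a STRATA statement, which expects the    */
/* input data set to be sorted by the stratification variables. */
proc sort data=Customers;
   by State Type;
run;

/* Design 2: simple random sampling within each State-by-Type   */
/* stratum. A single N= value applies to every stratum          */
/* (15 per stratum is an assumed value).                        */
proc surveyselect data=Customers method=srs n=15
                  seed=1953 out=SampleStrata;
   strata State Type;
run;

/* Design 3: systematic random sampling within strata. The      */
/* CONTROL statement orders customers by Usage within each      */
/* stratum before selection. RATE=0.02 (2%) is assumed.         */
proc surveyselect data=Customers method=sys rate=0.02
                  seed=1234 out=SampleControl;
   strata State Type;
   control Usage;
run;
```

Specifying SEED= makes each selection reproducible. Sorting by Usage before systematic selection spreads the sample across low-usage and high-usage customers within each stratum.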