Introduction to Survey Sampling and Analysis Procedures

Overview: Survey Sampling and Analysis Procedures

This chapter introduces the SAS/STAT procedures for survey sampling and describes how you can use these procedures to analyze survey data.

Researchers often use sample survey methodology to obtain information about a large population by selecting and measuring a sample from that population. Because items in the population vary, researchers apply probability-based scientific designs to select the sample. A probability-based design reduces the risk of a distorted view of the population and enables statistically valid inferences to be made from the sample. For more information about statistical sampling and analysis of complex survey data, see Lohr (2010), Kalton (1983), Cochran (1977), and Kish (1965).

To select probability-based random samples from a study population, you can use the SURVEYSELECT procedure, which provides a variety of methods for probability sampling. To analyze sample survey data, you can use the SURVEYMEANS, SURVEYFREQ, SURVEYREG, SURVEYLOGISTIC, and SURVEYPHREG procedures, which incorporate the sample design into the analyses.
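To make the idea of probability sampling concrete, the following is a minimal Python sketch of simple random sampling without replacement. It is a language-neutral illustration only: it is not PROC SURVEYSELECT syntax and does not reproduce that procedure's algorithms. The function name srs_without_replacement, the frame of unit IDs, and the seed are assumptions made for this example.

```python
import random


def srs_without_replacement(frame, n, seed=None):
    """Select n units from a sampling frame by simple random sampling
    without replacement (illustrative sketch, not SAS code).

    Returns a list of (unit, selection_probability, sampling_weight) tuples.
    """
    N = len(frame)
    if not 0 < n <= N:
        raise ValueError("n must be between 1 and the frame size")
    rng = random.Random(seed)
    prob = n / N      # every unit has the same inclusion probability n/N
    weight = N / n    # each sampled unit represents N/n population units
    return [(unit, prob, weight) for unit in rng.sample(frame, n)]


# Hypothetical sampling frame of 1,000 unit IDs
frame = list(range(1, 1001))
sample = srs_without_replacement(frame, 100, seed=2024)
print(len(sample), "units selected; weights sum to",
      sum(w for _, _, w in sample))
```

Because every unit has a known, nonzero selection probability, the sampling weights (the inverse probabilities) sum to the frame size, which is what allows estimates from the sample to be projected to the population.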

Many SAS/STAT procedures, such as the MEANS, FREQ, GLM, LOGISTIC, and PHREG procedures, can compute sample means, produce crosstabulation tables, and estimate regression relationships. However, in most of these procedures, statistical inference is based on the assumption that the sample is drawn from an infinite population by simple random sampling. If the sample is in fact selected from a finite population by using a complex survey design, these procedures generally do not calculate the estimates and their variances according to the design actually used. Using analyses that are not appropriate for your sample design can lead to incorrect statistical inferences.
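As a small illustration of why the design matters, consider the simplest case: a simple random sample of n units drawn without replacement from a finite population of N units. The standard error of the sample mean then includes the finite population correction (1 - n/N), which an analysis that assumes an infinite population omits. The Python sketch below compares the two; the function names and the data values are hypothetical, and the sketch covers only this one design, not stratified, clustered, or unequally weighted samples.

```python
import math
import statistics


def se_mean_infinite(y):
    """Standard error of the mean under the infinite-population assumption."""
    return statistics.stdev(y) / math.sqrt(len(y))


def se_mean_srs_finite(y, N):
    """Standard error of the mean for simple random sampling without
    replacement from a finite population of size N."""
    n = len(y)
    if N < n:
        raise ValueError("population size N cannot be smaller than the sample")
    fpc = 1 - n / N   # finite population correction
    return math.sqrt(fpc) * statistics.stdev(y) / math.sqrt(n)


# Hypothetical measurements on 8 units sampled from a population of 20
y = [12, 15, 11, 18, 14, 16, 13, 17]
print("infinite-population SE:", se_mean_infinite(y))
print("finite-population SE:  ", se_mean_srs_finite(y, N=20))
```

Here the sampling fraction is large (8 of 20), so ignoring the finite population overstates the standard error. With complex designs the discrepancy can go in either direction, which is why a design-based analysis is needed.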

The SURVEYMEANS, SURVEYFREQ, SURVEYREG, SURVEYLOGISTIC, and SURVEYPHREG procedures properly analyze complex survey data by taking into account the sample design. These procedures can be used for multistage or single-stage designs, with or without stratification, and with or without unequal weighting. The survey analysis procedures provide a choice of variance estimation methods, which include Taylor series linearization, balanced repeated replication (BRR), and the jackknife.
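To give a feel for replication-based variance estimation, the following Python sketch applies a delete-one jackknife to a weighted mean. It makes strong simplifying assumptions that are not taken from this chapter: a single stratum, each observation treated as its own primary sampling unit, and squared deviations taken around the full-sample estimate. It is not the implementation used by the survey analysis procedures, and the function names are hypothetical.

```python
def weighted_mean(y, w):
    """Weighted mean of y with sampling weights w."""
    return sum(wi * yi for yi, wi in zip(y, w)) / sum(w)


def jackknife_variance(y, w):
    """Delete-one jackknife variance of the weighted mean.

    Assumes one stratum and one observation per primary sampling unit
    (an illustrative simplification).
    """
    n = len(y)
    if n < 2:
        raise ValueError("at least two units are required")
    full = weighted_mean(y, w)
    replicates = []
    for i in range(n):
        # Drop unit i. The usual n/(n-1) rescaling of the remaining weights
        # cancels in a ratio estimator such as the mean, so it is omitted.
        y_rep = y[:i] + y[i + 1:]
        w_rep = w[:i] + w[i + 1:]
        replicates.append(weighted_mean(y_rep, w_rep))
    return (n - 1) / n * sum((r - full) ** 2 for r in replicates)


# Hypothetical data: five sampled units with unequal sampling weights
y = [12.0, 15.0, 11.0, 18.0, 14.0]
w = [10.0, 10.0, 25.0, 25.0, 30.0]
print("estimate:", weighted_mean(y, w))
print("jackknife variance:", jackknife_variance(y, w))
```

With equal weights, this jackknife estimate reduces to the familiar s²/n for the sample mean, which is a convenient check on the replicate formula.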

Table 14.1 briefly describes the SAS/STAT sampling and analysis procedures.

Table 14.1: Survey Sampling and Analysis Procedures in SAS/STAT Software

PROC SURVEYSELECT
Selection Methods	Simple random sampling (without replacement)
	Unrestricted random sampling (with replacement)
	Systematic
	Sequential
	Bernoulli
	Poisson
	Probability proportional to size (PPS) sampling, with and without replacement
	PPS systematic
	PPS for two units per stratum
	PPS sequential with minimum replacement
Allocation Methods	Proportional
	Optimal
	Neyman
Sampling Tools	Stratified sampling
	Cluster sampling
	Replicated sampling
	Serpentine sorting
PROC SURVEYMEANS
Statistics	Means and totals
	Proportions
	Quantiles
	Geometric means
	Ratios
	Standard errors
	Confidence limits
Analyses	Hypothesis tests
	Domain analysis
	Poststratification
Graphics	Histograms
	Box plots
	Summary panel plots
	Domain box plots
PROC SURVEYFREQ
Tables	One-way frequency tables
	Two-way and multiway crosstabulation tables
	Estimates of population totals and proportions
	Standard errors
	Confidence limits
Analyses	Tests of goodness of fit
	Tests of independence
	Risks and risk differences
	Odds ratios and relative risks
	Kappa coefficients
Graphics	Weighted frequency and percent plots
	Mosaic plots
	Odds ratio, relative risk, and risk difference plots
	Kappa plots
PROC SURVEYREG
Analyses	Linear regression model fitting
	Regression coefficients
	Covariance matrices
	Confidence limits
	Hypothesis tests
	Estimable functions
	Contrasts
	Least squares means (LS-means) of effects
	Custom hypothesis tests among LS-means
	Regression with constructed effects
	Predicted values and residuals
	Domain analysis
Graphics	Fit plots
PROC SURVEYLOGISTIC
Analyses	Cumulative logit regression model fitting
	Logit, probit, and complementary log-log link functions
	Generalized logit regression model fitting
	Regression coefficients
	Covariance matrices
	Confidence limits
	Hypothesis tests
	Odds ratios
	Estimable functions
	Contrasts
	Least squares means (LS-means) of effects
	Custom hypothesis tests among LS-means
	Regression with constructed effects
	Model diagnostics
	Domain analysis
PROC SURVEYPHREG
Analyses	Proportional hazards regression model fitting
	Breslow and Efron likelihoods
	Regression coefficients
	Covariance matrices
	Confidence limits
	Hypothesis tests
	Hazard ratios
	Contrasts
	Predicted values and standard errors
	Martingale, Schoenfeld, score, and deviance residuals
	Domain analysis