This section uses a data set that is obtained by stratified random sampling from a simulated finite population to illustrate some of the basic features of PROC SURVEYPHREG.
Suppose the library system for a small county wants to study the length of time that books are borrowed over a specified study
period, adjusting for the age of the borrower and accounting for the fact that some books are never returned. Suppose there
are 10 branch libraries in the county. Assume that a list of 11,617 (simulated) transactions is available for the study period
October 1, 2008, to December 31, 2008, and assume that this list can be used as the sampling frame. A stratified random sample
with replacement is used to select 100 transactions, where branch libraries are the strata. The total number of transactions
within branches ranges from 510 to 2,011 for the study period. The total sample size of 100 transactions is allocated proportionally
across branches based on the number of transactions. For each selected transaction, telephone interviews were conducted to
find out additional characteristics of the borrower. The data set LibrarySurvey contains the following variables for all units (transactions) in the sample:
Branch, the library branch from which the book was borrowed
SamplingWeight, the survey sampling weight for the transaction
CheckOut, the date the book was borrowed
CheckIn, the date the book was returned, with a missing value if the book was not returned by December 31, 2008
Age, the age of the borrower
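As an illustration of the design described above, the following sketch shows one way a proportionally allocated stratified sample could be drawn with PROC SURVEYSELECT. It is not taken from the original example: the frame data set Transactions, the output data set SampleTransactions, and the seed value are hypothetical, and METHOD=URS is chosen only because the text describes sampling with replacement.

```sas
/* Hypothetical sampling frame: one row per transaction, with a Branch variable. */
/* The STRATA statement expects the input data to be sorted by the strata.       */
proc sort data=Transactions;
   by Branch;
run;

proc surveyselect data=Transactions
                  method=urs        /* unrestricted random sampling (with replacement) */
                  sampsize=100      /* total sample size across all branches           */
                  seed=12345        /* arbitrary seed, for reproducibility only        */
                  stats             /* request selection statistics and weights        */
                  out=SampleTransactions;
   strata Branch / alloc=prop;      /* allocate the 100 units proportionally to branch size */
run;
```

With ALLOC=PROP, SAMPSIZE= is interpreted as the total sample size, and each branch receives a share roughly proportional to its number of transactions in the frame.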
data LibrarySurvey;
   input Branch 2. SamplingWeight 7.2 CheckOut date10. CheckIn date10. Age;
   datalines;
 1 103.60 08NOV2008 13NOV2008 18
 1 103.60 01OCT2008 07OCT2008 30
 1 103.60 05NOV2008 06NOV2008 73
 1 103.60 25OCT2008 26OCT2008 53
 1 103.60 09NOV2008 10NOV2008 55
 2 127.50 10DEC2008 15DEC2008 39
 2 127.50 19DEC2008         . 33
 2 127.50 26NOV2008 27NOV2008 41
 2 127.50 03NOV2008 07NOV2008 33

   ... more lines ...

10 118.35 14NOV2008 17NOV2008 29
10 118.35 11DEC2008 13DEC2008 35
10 118.35 21NOV2008 23NOV2008 46
;
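data LibrarySurvey;
   set LibrarySurvey;
   /* Returned = 1 if the book was checked in, 0 if it is still out (censored) */
   Returned = (CheckIn ^= .);
   /* Length of the loan in days; censored loans are measured up to December 31, 2008 */
   if (Returned) then lenBorrow = CheckIn - CheckOut;
   else lenBorrow = input('31Dec2008', date9.) - CheckOut;
run;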
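Before fitting a model, it can help to confirm that the derived variables behave as intended. The following statements are a minimal sketch of such a check, not part of the original example. They tabulate Returned with and without the sampling weights and summarize lenBorrow separately for returned and unreturned books.

```sas
/* Unweighted counts of returned (1) and censored (0) transactions in the sample */
proc freq data=LibrarySurvey;
   tables Returned;
run;

/* Weighted counts: estimates of the corresponding population totals */
proc freq data=LibrarySurvey;
   tables Returned;
   weight SamplingWeight;
run;

/* Loan length in days, by censoring status; values should be nonnegative */
proc means data=LibrarySurvey n min mean max;
   class Returned;
   var lenBorrow;
run;
```

The unweighted and weighted counts should be comparable to the censoring summaries that PROC SURVEYPHREG reports for the same data.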
PROC SURVEYPHREG can be used to estimate the regression parameters of a proportional hazards model and the design-based variance of the estimated coefficients. The design-based variance is useful when the finite population is considered fixed, as in this example. See Lohr (2010) and Särndal, Swensson, and Wretman (1992) for details.
The following statements request a proportional hazards regression of lenBorrow on Age with Returned as the censor indicator. A transaction is considered to be censored if its check-in date is missing. The WEIGHT statement specifies the sampling weight variable (SamplingWeight), and the STRATA statement specifies the stratification variable (Branch).
proc surveyphreg data = LibrarySurvey;
   weight SamplingWeight;
   strata Branch;
   model lenBorrow*Returned(0) = Age;
run;
Summary information about the model, number of observations, survey design, censored values, and variance estimation method is shown in Figure 100.1. The "Model Information" table summarizes the model you fit.
The "Number of Observations" table displays the number of observations read and used by the procedure. This table also displays the sum of weights read and used. The sum of weights read (11,616.79) can be used as an estimator of the population size, and the sum of weights used can be used as an estimator of the number of respondents in the population.
The "Design Summary" table displays survey design information such as stratification and clustering. This example implements a stratified design with 10 strata.
The "Censored Summary" and "Weighted Censored Summary" tables display the unweighted and weighted numbers of censored and event units. Weighted counts can be used as estimators of the corresponding finite population quantities. For example, Figure 100.1 shows that 10% of the sampled units are censored and an estimated 10.05% of the population units are censored.
Parameter estimates and their standard errors are shown in Figure 100.2. The estimated regression coefficient for Age is highly significant with a value of 0.062. Because the event being modeled is the return of a book, a positive coefficient means that the hazard of return increases with the borrower's age (a hazard ratio of about exp(0.062) ≈ 1.06 per additional year), so older borrowers tend to return books sooner (recall that these are simulated data). In this example, the procedure uses the STRATA and WEIGHT statements to incorporate stratification and unequal weighting, respectively, into variance estimation. The degrees of freedom are calculated as the number of sampling units (100) minus the number of strata (10), which gives 90. Note that the estimated variance reported in Figure 100.2 ignores the finite population correction (fpc). You can use the TOTAL= or RATE= option in the PROC statement to include an fpc in your variance estimator.
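The following statements sketch how an fpc might be requested by using the TOTAL= option with a secondary data set. This is an illustration under stated assumptions, not part of the original example. The data set name BranchTotals is hypothetical, and the stratum totals are approximated by summing the sampling weights within each branch, which is reasonable only if each weight equals the branch's transaction count divided by its sample size. If the actual branch totals from the sampling frame are available, use those instead. An fpc is meaningful only when the sample within each branch is treated as drawn without replacement.

```sas
/* Assumption: the sum of weights within a branch approximates that branch's */
/* total number of transactions. The variable must be named _TOTAL_.         */
proc sql;
   create table BranchTotals as
   select Branch, sum(SamplingWeight) as _TOTAL_
   from LibrarySurvey
   group by Branch;
quit;

/* Same model as before, with stratum totals supplied for the fpc */
proc surveyphreg data = LibrarySurvey total = BranchTotals;
   weight SamplingWeight;
   strata Branch;
   model lenBorrow*Returned(0) = Age;
run;
```

Because only about 100 of roughly 11,617 transactions are sampled, the sampling fraction is under 1%, so including the fpc would be expected to change the standard errors only slightly.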