In a stratified sample, some strata might contain only one sampling unit. When this happens, PROC SURVEYREG by default collapses the strata that contain a single sampling unit into a pooled stratum. For more information, see the section Stratum Collapse.
Suppose that you have the following data:
data Sample;
   input Stratum X Y W;
   datalines;
10 0 0 5
10 1 1 5
11 1 1 10
11 1 2 10
12 3 3 16
33 4 4 45
14 6 7 50
12 3 4 16
;
As before, the variable Stratum is the stratification variable, the variable X is the independent variable, and the variable Y is the dependent variable; the variable W contains the sampling weights. You want to regress Y on X. In the data set Sample, Stratum=33 and Stratum=14 each contain only one observation. By default, PROC SURVEYREG collapses these two strata into one pooled stratum in the regression analysis.
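Before running the analysis, you can check which strata contain a single sampling unit. The following is a minimal sketch, assuming that each observation in Sample is its own sampling unit (no CLUSTER variable is used); it counts the observations in each stratum and lists the strata that have exactly one.

```sas
/* List strata that contain exactly one observation.              */
/* Assumes each observation is a sampling unit (no CLUSTER var).  */
proc sql;
   select Stratum, count(*) as nUnits
      from Sample
      group by Stratum
      having count(*) = 1;
quit;
```

For the data above, this query lists Stratum=14 and Stratum=33, the two strata that PROC SURVEYREG collapses.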
To provide the finite population correction information, create the SAS data set StratumTotals:
data StratumTotals;
   input Stratum _TOTAL_;
   datalines;
10 10
11 20
12 32
33 40
33 45
14 50
15 .
66 70
;
The variable Stratum is the stratification variable, and the variable _TOTAL_ contains the stratum totals. The data set StratumTotals contains more strata than the data set Sample (Stratum=15 and Stratum=66 do not appear in Sample). Also, in the data set StratumTotals, more than one observation contains the stratum total for Stratum=33:
33 40
33 45
PROC SURVEYREG accepts this type of input. The procedure ignores strata that are not present in the data set Sample, and when a stratum has multiple entries, it uses the first one. In this example, the procedure therefore uses the stratum total _TOTAL_=40 for Stratum=33.
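If you prefer to make the first-entry rule explicit in the data rather than rely on the procedure's default handling, one option is to keep only the first record for each stratum before the analysis. This sketch uses PROC SORT with the NODUPKEY option; because the default EQUALS behavior preserves the original order within each BY group, the first record for each stratum is the one retained. The output data set name StratumTotalsFirst is hypothetical.

```sas
/* Keep only the first record for each stratum.                    */
/* StratumTotalsFirst is a hypothetical name for the output table. */
/* NODUPKEY drops later records with the same BY value; the default */
/* EQUALS option keeps the original order within each stratum, so   */
/* Stratum=33 retains _TOTAL_=40.                                   */
proc sort data=StratumTotals out=StratumTotalsFirst nodupkey;
   by Stratum;
run;
```

You could then specify TOTAL=StratumTotalsFirst in the PROC SURVEYREG statement; the stratum totals used should be the same as with the original data set.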
The following SAS statements perform the regression analysis:
title1 'Stratified Sample with Single Sampling Unit in Strata';
title2 'With Stratum Collapse';
proc surveyreg data=Sample total=StratumTotals;
   strata Stratum / list;   /* LIST displays the stratum information table */
   model Y = X;
   weight W;
run;
Output 101.6.1 shows that the input data set contains five strata, two of which are collapsed into a pooled stratum. Because of the collapse, the denominator degrees of freedom are 4 (see the section Denominator Degrees of Freedom).
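The value 4 can be checked by hand. This worked calculation assumes that, for a design with strata but no clusters, the denominator degrees of freedom equal the number of observations $n$ minus the number of strata $H$, with the pooled stratum counted as a single stratum:

```latex
\begin{aligned}
\text{df} &= n - H \\
          &= 8 - \bigl|\{10,\ 11,\ 12,\ \text{pooled}\}\bigr| \\
          &= 8 - 4 = 4
\end{aligned}
```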
Output 101.6.2 displays the stratification information, including stratum collapse. In the Collapsed column, the fourth stratum (Stratum=14) and the fifth stratum (Stratum=33) are marked Yes, which indicates that these two strata are collapsed into the pooled stratum (Stratum Index=0). The sampling rate for the pooled stratum is 2% (see the section Sampling Rate of the Pooled Stratum from Collapse).
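As a rough check of the 2% figure, assume that the pooled sampling rate is the number of sampled units in the collapsed strata divided by the sum of their stratum totals, using the first _TOTAL_ entry (40) for Stratum=33:

```latex
f_{\text{pooled}} = \frac{n_{14} + n_{33}}{N_{14} + N_{33}}
                  = \frac{1 + 1}{50 + 40}
                  = \frac{2}{90} \approx 0.022 \approx 2\%
```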
Output 101.6.3 displays the parameter estimates and the tests of the significance of the model effects.
Alternatively, if you prefer not to collapse strata with a single sampling unit, you can specify the NOCOLLAPSE option in the STRATA statement:
title1 'Stratified Sample with Single Sampling Unit in Strata';
title2 'Without Stratum Collapse';
proc surveyreg data=Sample total=StratumTotals;
   strata Stratum / list nocollapse;   /* NOCOLLAPSE keeps single-unit strata separate */
   model Y = X;
   weight W;
run;
Output 101.6.4 does not contain the stratum collapse information displayed in Output 101.6.1, and the denominator degrees of freedom are 3 instead of 4.
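Assuming that the denominator degrees of freedom for this design are the number of observations $n$ minus the number of strata $H$, the value 3 follows because each of the five strata is now counted separately:

```latex
\text{df} = n - H = 8 - \bigl|\{10,\ 11,\ 12,\ 14,\ 33\}\bigr| = 8 - 5 = 3
```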
In Output 101.6.5, no stratum collapse occurs, even though the fourth and fifth strata each contain only one observation.
Because the strata are not collapsed, the standard error estimates of the parameters in Output 101.6.6 differ from those in Output 101.6.3, as do the tests of significance of the model effects.