In a stratified sample, some strata might contain only one sampling unit. When this happens, PROC SURVEYREG by default collapses the strata that contain a single sampling unit into a pooled stratum. For more detailed information about stratum collapse, see the section Stratum Collapse.
Suppose that you have the following data:
data Sample;
   input Stratum X Y W;
   datalines;
10 0 0 5
10 1 1 5
11 1 1 10
11 1 2 10
12 3 3 16
33 4 4 45
14 6 7 50
12 3 4 16
;
The variable Stratum is again the stratification variable, the variable X is the independent variable, the variable Y is the dependent variable, and the variable W contains the sampling weights. You want to regress Y on X. In the data set Sample, Stratum=33 and Stratum=14 each contain only one observation. By default, PROC SURVEYREG collapses these strata into one pooled stratum in the regression analysis.
To input the finite population correction information, you create the SAS data set StratumTotals:
data StratumTotals;
   input Stratum _TOTAL_;
   datalines;
10 10
11 20
12 32
33 40
33 45
14 50
15 .
66 70
;
The variable Stratum is the stratification variable, and the variable _TOTAL_ contains the stratum totals, that is, the total number of sampling units in each stratum of the population. The data set StratumTotals contains more strata than the data set Sample (Stratum=15 and Stratum=66 do not appear in Sample). Also, in the data set StratumTotals, more than one observation contains the stratum total for Stratum=33:

33 40
33 45

PROC SURVEYREG allows this type of input. The procedure simply ignores strata that are not present in the data set Sample; for a stratum with multiple entries, the procedure uses the first observation. In this example, Stratum=33 has the stratum total _TOTAL_=40.
The following SAS statements perform the regression analysis:
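title1 'Stratified Sample with Single Sampling Unit in Strata';
title2 'With Stratum Collapse';
/* TOTAL= supplies the stratum totals for the finite population correction */
proc surveyreg data=Sample total=StratumTotals;
   strata Stratum / list;   /* LIST displays the Stratum Information table */
   model Y = X;
   weight W;
run;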
Output 94.6.1 shows that the input data set contains five strata in total and that two of them are collapsed into a pooled stratum. Because of the collapse, the denominator degrees of freedom is 4 (see the section Denominator Degrees of Freedom).
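As a hand check, assuming the rule for a stratified design without clustering that the denominator degrees of freedom equals the number of observations n minus the number of strata H (the symbols n and H are introduced here for illustration; see the section Denominator Degrees of Freedom for the exact definition), the collapse leaves H = 4 strata: Stratum=10, Stratum=11, Stratum=12, and the pooled stratum.

```latex
\mathrm{df} = n - H = 8 - 4 = 4
```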
Output 94.6.1: Summary of Data and Regression
Stratified Sample with Single Sampling Unit in Strata |
With Stratum Collapse |
Data Summary | |
---|---|
Number of Observations | 8 |
Sum of Weights | 157.00000 |
Weighted Mean of Y | 4.31210 |
Weighted Sum of Y | 677.00000 |
Design Summary | |
---|---|
Number of Strata | 5 |
Number of Strata Collapsed | 2 |
Fit Statistics | |
---|---|
R-square | 0.9564 |
Root MSE | 0.5111 |
Denominator DF | 4 |
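The Data Summary values can be reproduced by hand from the data set Sample. Writing w_i for the weight W and y_i for the response Y of observation i (notation introduced here for illustration), the sum of weights, the weighted sum of Y, and the weighted mean of Y are:

```latex
\begin{aligned}
\sum_i w_i &= 5 + 5 + 10 + 10 + 16 + 45 + 50 + 16 = 157 \\
\sum_i w_i y_i &= 5(0) + 5(1) + 10(1) + 10(2) + 16(3) + 45(4) + 50(7) + 16(4) = 677 \\
\bar{y}_w &= \frac{\sum_i w_i y_i}{\sum_i w_i} = \frac{677}{157} \approx 4.31210
\end{aligned}
```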
Output 94.6.2 displays the stratification information, including stratum collapse. Under the column Collapsed, the fourth stratum (Stratum=14) and the fifth stratum (Stratum=33) are marked as 'Yes', which indicates that these two strata are collapsed into the pooled stratum (Stratum Index=0). The sampling rate for the pooled stratum is 2.22% (see the section Sampling Rate of the Pooled Stratum from Collapse).
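The sampling rates in the Stratum Information table can be checked as the number of sampled units n_h in stratum h divided by its stratum total N_h (notation introduced here for illustration), using the first entry, 40, for Stratum=33. For the pooled stratum, dividing the combined sample count of the two collapsed strata by their combined totals reproduces the reported 2.22%; see the section Sampling Rate of the Pooled Stratum from Collapse for the exact definition.

```latex
\begin{aligned}
f_h &= \frac{n_h}{N_h}: \quad
f_{10} = \tfrac{2}{10} = 20.0\%, \quad
f_{11} = \tfrac{2}{20} = 10.0\%, \quad
f_{12} = \tfrac{2}{32} = 6.25\%, \\
f_{14} &= \tfrac{1}{50} = 2.00\%, \quad
f_{33} = \tfrac{1}{40} = 2.50\%, \\
f_{0} &= \frac{1 + 1}{50 + 40} = \frac{2}{90} \approx 2.22\%
\end{aligned}
```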
Output 94.6.3 displays the parameter estimates and the tests of the significance of the model effects.
Output 94.6.2: Stratification Information
Stratum Information | |||||
---|---|---|---|---|---|
Stratum Index | Collapsed | Stratum | N Obs | Population Total | Sampling Rate |
1 | | 10 | 2 | 10 | 20.0% |
2 | | 11 | 2 | 20 | 10.0% |
3 | | 12 | 2 | 32 | 6.25% |
4 | Yes | 14 | 1 | 50 | 2.00% |
5 | Yes | 33 | 1 | 40 | 2.50% |
0 | | Pooled | 2 | 90 | 2.22% |
Note: | Strata with only one observation are collapsed into the stratum with Stratum Index "0". |
Output 94.6.3: Parameter Estimates and Effect Tests
Tests of Model Effects | |||
---|---|---|---|
Effect | Num DF | F Value | Pr > F |
Model | 1 | 173.01 | 0.0002 |
Intercept | 1 | 0.00 | 0.9961 |
X | 1 | 173.01 | 0.0002 |
Note: | The denominator degrees of freedom for the F tests is 4. |
Estimated Regression Coefficients | ||||
---|---|---|---|---|
Parameter | Estimate | Standard Error | t Value | Pr > |t| |
Intercept | 0.00179469 | 0.34306373 | 0.01 | 0.9961 |
X | 1.12598708 | 0.08560466 | 13.15 | 0.0002 |
Note: | The denominator degrees of freedom for the t tests is 4. |
Alternatively, if you prefer not to collapse strata with a single sampling unit, you can specify the NOCOLLAPSE option in the STRATA statement:
title1 'Stratified Sample with Single Sampling Unit in Strata';
title2 'Without Stratum Collapse';
proc surveyreg data=Sample total=StratumTotals;
   /* NOCOLLAPSE keeps strata that contain a single observation separate */
   strata Stratum / list nocollapse;
   model Y = X;
   weight W;
run;
Output 94.6.4 does not contain the stratum collapse information displayed in Output 94.6.1, and the denominator degrees of freedom is 3 instead of 4.
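Under the same assumed rule for a stratified design without clustering (number of observations n minus number of strata H, symbols introduced here for illustration), all five original strata are now counted separately:

```latex
\mathrm{df} = n - H = 8 - 5 = 3
```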
Output 94.6.4: Summary of Data and Regression
Stratified Sample with Single Sampling Unit in Strata |
Without Stratum Collapse |
Data Summary | |
---|---|
Number of Observations | 8 |
Sum of Weights | 157.00000 |
Weighted Mean of Y | 4.31210 |
Weighted Sum of Y | 677.00000 |
Design Summary | |
---|---|
Number of Strata | 5 |
Fit Statistics | |
---|---|
R-square | 0.9564 |
Root MSE | 0.5111 |
Denominator DF | 3 |
In Output 94.6.5, although the fourth stratum and the fifth stratum contain only one observation, no stratum collapse occurs.
Output 94.6.5: Stratification Information
Stratum Information | ||||
---|---|---|---|---|
Stratum Index | Stratum | N Obs | Population Total | Sampling Rate |
1 | 10 | 2 | 10 | 20.0% |
2 | 11 | 2 | 20 | 10.0% |
3 | 12 | 2 | 32 | 6.25% |
4 | 14 | 1 | 50 | 2.00% |
5 | 33 | 1 | 40 | 2.50% |
As a result of not collapsing strata, the standard error estimates of the parameters, shown in Output 94.6.6, differ from those in Output 94.6.3, and so do the tests of the significance of the model effects. The parameter estimates themselves are unchanged.
Output 94.6.6: Parameter Estimates and Effect Tests
Tests of Model Effects | |||
---|---|---|---|
Effect | Num DF | F Value | Pr > F |
Model | 1 | 347.27 | 0.0003 |
Intercept | 1 | 0.00 | 0.9962 |
X | 1 | 347.27 | 0.0003 |
Note: | The denominator degrees of freedom for the F tests is 3. |
Estimated Regression Coefficients | ||||
---|---|---|---|---|
Parameter | Estimate | Standard Error | t Value | Pr > |t| |
Intercept | 0.00179469 | 0.34302581 | 0.01 | 0.9962 |
X | 1.12598708 | 0.06042241 | 18.64 | 0.0003 |
Note: | The denominator degrees of freedom for the t tests is 3. |
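The test statistics for X in Output 94.6.3 and Output 94.6.6 can be checked from the displayed estimates and standard errors: the t value is the estimate divided by its standard error, and for an effect with one numerator degree of freedom the F value equals the square of t. The values below are rounded to the precision shown in the tables.

```latex
\begin{aligned}
\text{With collapse:} \quad t_X &= \frac{1.12598708}{0.08560466} \approx 13.15, & F_X &= t_X^2 \approx 173.01 \\
\text{Without collapse:} \quad t_X &= \frac{1.12598708}{0.06042241} \approx 18.64, & F_X &= t_X^2 \approx 347.27
\end{aligned}
```

Although t is larger without collapse, the p-value is slightly larger (0.0003 versus 0.0002), because these tests use 3 rather than 4 denominator degrees of freedom.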