This example illustrates how you can use PROC NPAR1WAY to perform a one-way nonparametric analysis. The data from Halverson and Sherwood (1930) consist of weight gain measurements for five different levels of gossypol additive in animal feed. Gossypol is a substance contained in cottonseed shells, and these data were collected to study the effect of gossypol on animal nutrition.
The following DATA step statements create the SAS data set Gossypol:
data Gossypol;
   input Dose n;
   do i=1 to n;
      input Gain @@;
      output;
   end;
   datalines;
0 16
228 229 218 216 224 208 235 229 233 219 224 220 232 200 208 232
.04 11
186 229 220 208 228 198 222 273 216 198 213
.07 12
179 193 183 180 143 204 114 188 178 134 208 196
.10 17
130 87 135 116 118 165 151 59 126 64 78 94 150 160 122 110 178
.13 11
154 130 130 118 118 104 112 134 98 100 104
;
The data set Gossypol contains the variable Dose, which represents the amount of gossypol additive, and the variable Gain, which represents the weight gain.
Researchers are interested in whether there is a difference in weight gain among animals receiving the different dose levels of gossypol. The following statements invoke the NPAR1WAY procedure to perform a nonparametric analysis of this problem:
proc npar1way data=Gossypol;
   class Dose;
   var Gain;
run;
The variable Dose is the CLASS variable, and the VAR statement specifies that the variable Gain is the response variable. The CLASS statement is required, and you must name exactly one CLASS variable. You can name one or more analysis variables in the VAR statement. If you omit the VAR statement, PROC NPAR1WAY analyzes all numeric variables in the data set except for the CLASS variable, the FREQ variable, and the BY variables.
When no analysis options are specified in the PROC NPAR1WAY statement, the ANOVA, WILCOXON, MEDIAN, VW, SAVAGE, and EDF options are invoked by default. The tables in the following figures show the results of these analyses.
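If you want only one of these analyses, you can name its option in the PROC NPAR1WAY statement. The following is a minimal sketch that assumes you want only the Wilcoxon scores analysis of the Gossypol data; naming the option replaces the default set of analyses.

```sas
/* Sketch: request only the Wilcoxon (Kruskal-Wallis) analysis */
/* instead of the full set of default analyses.                */
proc npar1way data=Gossypol wilcoxon;
   class Dose;
   var Gain;
run;
```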
The tables in Figure 65.1 are produced with the ANOVA option. For each level of the CLASS variable Dose, PROC NPAR1WAY displays the number of observations and the mean of the analysis variable Gain. PROC NPAR1WAY also displays a standard analysis of variance on the raw data. This gives the same results as the GLM and ANOVA procedures. The p-value for the F test is <0.0001, which indicates that Dose accounts for a significant portion of the variability of the dependent variable Gain.
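To check the statement that the ANOVA option matches the GLM procedure, you could fit the same one-way model directly. The following sketch assumes the Gossypol data set created earlier; the F test for Dose in its output should agree with the "Among" row of the PROC NPAR1WAY analysis of variance table.

```sas
/* Sketch: one-way analysis of variance of Gain by Dose with PROC GLM */
proc glm data=Gossypol;
   class Dose;
   model Gain = Dose;
run;
quit;
```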
Figure 65.1: Analysis of Variance
Analysis of Variance for Variable Gain Classified by Variable Dose

Dose | N | Mean |
---|---|---|
0 | 16 | 222.187500 |
0.04 | 11 | 217.363636 |
0.07 | 12 | 175.000000 |
0.1 | 17 | 120.176471 |
0.13 | 11 | 118.363636 |

Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
---|---|---|---|---|---|
Among | 4 | 140082.986077 | 35020.74652 | 55.8143 | <.0001 |
Within | 62 | 38901.998997 | 627.45160 | | |

Average scores were used for ties.
The WILCOXON option produces the output in Figure 65.2. PROC NPAR1WAY first provides a summary of the Wilcoxon scores for the analysis variable Gain by class level. For each level of the CLASS variable Dose, PROC NPAR1WAY displays the following information: number of observations, sum of the Wilcoxon scores, expected sum under the null hypothesis of no difference among class levels, standard deviation under the null hypothesis, and mean score.
Next, PROC NPAR1WAY displays the one-way ANOVA statistic, which for Wilcoxon scores is known as the Kruskal-Wallis test. The statistic equals 52.6656, with four degrees of freedom, which is the number of class levels minus one. The p-value (probability of a larger statistic under the null hypothesis) is <0.0001. This leads to rejection of the null hypothesis that there is no difference in location for Gain among the levels of Dose. This p-value is asymptotic, computed from the asymptotic chi-square distribution of the test statistic. For certain data sets, such as small data sets or data sets that are sparse, skewed, or heavily tied, it might also be useful to compute the exact p-value. You can use the EXACT statement to request exact p-values for any of the location or scale tests available in PROC NPAR1WAY.
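As an illustration of the EXACT statement, the following sketch requests a Monte Carlo estimate of the exact p-value for the Wilcoxon (Kruskal-Wallis) test. The MC option is used here on the assumption that a full exact computation for 67 observations in five groups could be expensive; the sample count and seed are arbitrary illustrative values, not recommendations.

```sas
/* Sketch: Monte Carlo estimate of the exact Kruskal-Wallis p-value. */
/* N= and SEED= values are arbitrary choices for illustration.       */
proc npar1way data=Gossypol wilcoxon;
   class Dose;
   var Gain;
   exact wilcoxon / mc n=10000 seed=12345;
run;
```

Specifying SEED= makes the Monte Carlo estimate reproducible from run to run.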
Figure 65.2: Wilcoxon Score Analysis
Wilcoxon Scores (Rank Sums) for Variable Gain Classified by Variable Dose

Dose | N | Sum of Scores | Expected Under H0 | Std Dev Under H0 | Mean Score |
---|---|---|---|---|---|
0 | 16 | 890.50 | 544.0 | 67.978966 | 55.656250 |
0.04 | 11 | 555.00 | 374.0 | 59.063588 | 50.454545 |
0.07 | 12 | 395.50 | 408.0 | 61.136622 | 32.958333 |
0.1 | 17 | 275.50 | 578.0 | 69.380741 | 16.205882 |
0.13 | 11 | 161.50 | 374.0 | 59.063588 | 14.681818 |

Average scores were used for ties.
Kruskal-Wallis Test | |
---|---|
Chi-Square | 52.6656 |
DF | 4 |
Pr > Chi-Square | <.0001 |
Figure 65.3 through Figure 65.5 display the analyses produced by the MEDIAN, VW, and SAVAGE options. For each score type, PROC NPAR1WAY provides a summary of scores and the one-way ANOVA statistic, as previously described for Wilcoxon scores. Other score types available in PROC NPAR1WAY are Siegel-Tukey, Ansari-Bradley, Klotz, and Mood, which can be used to test for scale differences. Conover scores can be used to test for differences in both location and scale. Additionally, you can specify the SCORES=DATA option, which uses the input data as scores. This option gives you the flexibility to construct any scores for your data with the DATA step and then analyze these scores with PROC NPAR1WAY.
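The following sketch illustrates both ideas from the preceding paragraph. The first step requests the scale-oriented score types by their PROC NPAR1WAY option names. The second and third steps construct scores outside the procedure and analyze them with SCORES=DATA; here the scores are simply average ranks from PROC RANK, so the resulting one-way statistic is expected to agree with the Kruskal-Wallis test. The data set name Ranked and the variable name GainRank are hypothetical names chosen for this illustration.

```sas
/* Sketch 1: scale tests (Ansari-Bradley, Klotz, Mood, Siegel-Tukey) */
/* plus Conover scores for location and scale differences.           */
proc npar1way data=Gossypol ab klotz mood st conover;
   class Dose;
   var Gain;
run;

/* Sketch 2: build your own scores, then analyze them as data.       */
/* Ranked and GainRank are hypothetical names.                       */
proc rank data=Gossypol out=Ranked ties=mean;
   var Gain;
   ranks GainRank;
run;

proc npar1way data=Ranked scores=data;
   class Dose;
   var GainRank;
run;
```

Any other scoring rule can be substituted for the PROC RANK step, for example a DATA step that transforms Gain before the analysis.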
Figure 65.3: Median Score Analysis
Median Scores (Number of Points Above Median) for Variable Gain Classified by Variable Dose

Dose | N | Sum of Scores | Expected Under H0 | Std Dev Under H0 | Mean Score |
---|---|---|---|---|---|
0 | 16 | 16.0 | 7.880597 | 1.757902 | 1.00 |
0.04 | 11 | 11.0 | 5.417910 | 1.527355 | 1.00 |
0.07 | 12 | 6.0 | 5.910448 | 1.580963 | 0.50 |
0.1 | 17 | 0.0 | 8.373134 | 1.794152 | 0.00 |
0.13 | 11 | 0.0 | 5.417910 | 1.527355 | 0.00 |

Average scores were used for ties.
Median One-Way Analysis | |
---|---|
Chi-Square | 54.1765 |
DF | 4 |
Pr > Chi-Square | <.0001 |
Figure 65.4: Van der Waerden (Normal) Score Analysis
Van der Waerden Scores (Normal) for Variable Gain Classified by Variable Dose

Dose | N | Sum of Scores | Expected Under H0 | Std Dev Under H0 | Mean Score |
---|---|---|---|---|---|
0 | 16 | 16.116474 | 0.0 | 3.325957 | 1.007280 |
0.04 | 11 | 8.340899 | 0.0 | 2.889761 | 0.758264 |
0.07 | 12 | -0.576674 | 0.0 | 2.991186 | -0.048056 |
0.1 | 17 | -14.688921 | 0.0 | 3.394540 | -0.864054 |
0.13 | 11 | -9.191777 | 0.0 | 2.889761 | -0.835616 |

Average scores were used for ties.
Van der Waerden One-Way Analysis | |
---|---|
Chi-Square | 47.2972 |
DF | 4 |
Pr > Chi-Square | <.0001 |
Figure 65.5: Savage Score Analysis
Savage Scores (Exponential) for Variable Gain Classified by Variable Dose

Dose | N | Sum of Scores | Expected Under H0 | Std Dev Under H0 | Mean Score |
---|---|---|---|---|---|
0 | 16 | 16.074391 | 0.0 | 3.385275 | 1.004649 |
0.04 | 11 | 7.693099 | 0.0 | 2.941300 | 0.699373 |
0.07 | 12 | -3.584958 | 0.0 | 3.044534 | -0.298746 |
0.1 | 17 | -11.979488 | 0.0 | 3.455082 | -0.704676 |
0.13 | 11 | -8.203044 | 0.0 | 2.941300 | -0.745731 |

Average scores were used for ties.
Savage One-Way Analysis | |
---|---|
Chi-Square | 39.4908 |
DF | 4 |
Pr > Chi-Square | <.0001 |
The tables in Figure 65.6 display the empirical distribution function statistics, comparing the distribution of Gain for the different levels of Dose. These tables are produced by the EDF option, and they include Kolmogorov-Smirnov statistics and Cramér–von Mises statistics.
Figure 65.6: Empirical Distribution Function Analysis
Kolmogorov-Smirnov Test for Variable Gain Classified by Variable Dose

Dose | N | EDF at Maximum | Deviation from Mean at Maximum |
---|---|---|---|
0 | 16 | 0.000000 | -1.910448 |
0.04 | 11 | 0.000000 | -1.584060 |
0.07 | 12 | 0.333333 | -0.499796 |
0.1 | 17 | 1.000000 | 2.153861 |
0.13 | 11 | 1.000000 | 1.732565 |
Total | 67 | 0.477612 | |

Maximum Deviation Occurred at Observation 36
Value of Gain at Maximum = 178.0

Kolmogorov-Smirnov Statistics (Asymptotic) | | | |
---|---|---|---|
KS | 0.457928 | KSa | 3.748300 |

Cramer-von Mises Test for Variable Gain Classified by Variable Dose

Dose | N | Summed Deviation from Mean |
---|---|---|
0 | 16 | 2.165210 |
0.04 | 11 | 0.918280 |
0.07 | 12 | 0.348227 |
0.1 | 17 | 1.497542 |
0.13 | 11 | 1.335745 |

Cramer-von Mises Statistics (Asymptotic) | | | |
---|---|---|---|
CM | 0.093508 | CMa | 6.265003 |
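The Kolmogorov-Smirnov values in Figure 65.6 can be cross-checked by hand. The following DATA step is a sketch that assumes each "Deviation from Mean at Maximum" equals the square root of the class sample size times the difference between the class EDF and the pooled EDF, that KS is the square root of the average squared deviation over all 67 observations, and that KSa is KS multiplied by the square root of the total sample size. These relationships are inferred from the displayed numbers rather than quoted from the procedure's formulas, and the inputs are the rounded values printed in the table.

```sas
/* Sketch: reproduce KS and KSa from the rounded values in Figure 65.6. */
/* The formulas are assumptions inferred from the displayed output.     */
data _null_;
   array n[5]   _temporary_ (16 11 12 17 11);    /* class sample sizes  */
   array edf[5] _temporary_ (0 0 0.333333 1 1);  /* class EDF at max    */
   pooled = 0.477612;                            /* pooled EDF at max   */
   total = 0;
   ss = 0;
   do i = 1 to 5;
      total = total + n[i];
      dev = sqrt(n[i]) * (edf[i] - pooled);      /* deviation from mean */
      ss = ss + dev**2;
      put 'Deviation for level ' i ': ' dev 10.6;
   end;
   ks  = sqrt(ss / total);
   ksa = ks * sqrt(total);
   put ks= 10.6 ksa= 10.6;
run;
```

Under these assumptions the log should show values close to the KS and KSa entries in the figure, up to rounding of the inputs. The Cramér–von Mises entries appear to be related in a similar way, with CMa equal to CM multiplied by the total sample size of 67.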
PROC NPAR1WAY uses ODS Graphics to create graphs as part of its output. The following statements produce a box plot of Wilcoxon scores for Gain classified by Dose. ODS Graphics must be enabled before producing graphs.
ods graphics on;
proc npar1way data=Gossypol plots(only)=wilcoxonboxplot;
   class Dose;
   var Gain;
run;
ods graphics off;
Figure 65.7 displays the box plot of Wilcoxon scores. This graph corresponds to the Wilcoxon scores analysis shown in Figure 65.2. To remove the p-value from the box plot display, you can specify the NOSTATS plot option in parentheses following the WILCOXONBOXPLOT option.
Box plots are available for all PROC NPAR1WAY score types except median scores, which are displayed in a stacked bar chart. If ODS Graphics is enabled but you do not specify the PLOTS= option, then PROC NPAR1WAY produces all plots that are associated with the analyses that you request.
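The following sketch combines the two points above: it suppresses the p-value in the Wilcoxon box plot with the NOSTATS plot option and also requests the plot of median scores. It assumes that the median scores plot is requested with the MEDIANPLOT keyword and that several plot requests can be listed in parentheses after PLOTS(ONLY)=.

```sas
/* Sketch: Wilcoxon box plot without the p-value, plus the median    */
/* scores plot. MEDIANPLOT and the list syntax are assumptions here. */
ods graphics on;
proc npar1way data=Gossypol wilcoxon median
              plots(only)=(wilcoxonboxplot(nostats) medianplot);
   class Dose;
   var Gain;
run;
ods graphics off;
```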
Figure 65.7: Box Plot of Wilcoxon Scores
In the preceding example, the CLASS variable Dose has five levels, and the analyses examine possible differences among these five levels (samples). The following statements invoke the NPAR1WAY procedure to perform a nonparametric analysis of the two lowest levels of Dose:
proc npar1way data=Gossypol;
   where Dose <= .04;
   class Dose;
   var Gain;
run;
The tables in the following figures show the results of this two-sample analysis. The tables in Figure 65.8 are produced by the ANOVA option.
Figure 65.8: Analysis of Variance for Two-Sample Data
Analysis of Variance for Variable Gain Classified by Variable Dose

Dose | N | Mean |
---|---|---|
0 | 16 | 222.187500 |
0.04 | 11 | 217.363636 |

Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
---|---|---|---|---|---|
Among | 1 | 151.683712 | 151.683712 | 0.5587 | 0.4617 |
Within | 25 | 6786.982955 | 271.479318 | | |

Average scores were used for ties.
Figure 65.9 displays the output produced by the WILCOXON option. PROC NPAR1WAY provides a summary of the Wilcoxon scores for the analysis variable Gain for each of the two class levels. Because there are only two levels, PROC NPAR1WAY displays the two-sample test, based on the simple linear rank statistic with Wilcoxon scores. The normal approximation includes a continuity correction. To remove the continuity correction, you can specify the CORRECT=NO option. PROC NPAR1WAY also gives a t approximation for the Wilcoxon two-sample test. As in the multisample analysis, PROC NPAR1WAY computes a one-way ANOVA statistic, which for Wilcoxon scores is known as the Kruskal-Wallis test. None of these p-values indicates a difference in Gain between the two Dose levels at the 0.05 level of significance.
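To see the effect of the continuity correction, you could rerun the two-sample Wilcoxon analysis with CORRECT=NO, as in the following sketch. It assumes the same Gossypol data set and WHERE subset as the preceding statements.

```sas
/* Sketch: two-sample Wilcoxon test without the continuity correction */
proc npar1way data=Gossypol wilcoxon correct=no;
   where Dose <= .04;
   class Dose;
   var Gain;
run;
```

As a hand check against Figure 65.9, the corrected statistic is (124.5 - 154 + 0.5) / 20.221565, which is about -1.4341. Without the correction the ratio is (124.5 - 154) / 20.221565, about -1.4588, and its square, about 2.128, matches the Kruskal-Wallis chi-square of 2.1282.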
Figure 65.10 through Figure 65.12 display the two-sample analyses produced by the MEDIAN, VW, and SAVAGE options.
Figure 65.9: Wilcoxon Two-Sample Analysis
Wilcoxon Scores (Rank Sums) for Variable Gain Classified by Variable Dose

Dose | N | Sum of Scores | Expected Under H0 | Std Dev Under H0 | Mean Score |
---|---|---|---|---|---|
0 | 16 | 253.50 | 224.0 | 20.221565 | 15.843750 |
0.04 | 11 | 124.50 | 154.0 | 20.221565 | 11.318182 |

Average scores were used for ties.
Wilcoxon Two-Sample Test | |
---|---|
Statistic | 124.5000 |
Normal Approximation | |
Z | -1.4341 |
One-Sided Pr < Z | 0.0758 |
Two-Sided Pr > |Z| | 0.1515 |
t Approximation | |
One-Sided Pr < Z | 0.0817 |
Two-Sided Pr > |Z| | 0.1635 |
Z includes a continuity correction of 0.5. |
Kruskal-Wallis Test | |
---|---|
Chi-Square | 2.1282 |
DF | 1 |
Pr > Chi-Square | 0.1446 |
Figure 65.10: Median Two-Sample Analysis
Median Scores (Number of Points Above Median) for Variable Gain Classified by Variable Dose

Dose | N | Sum of Scores | Expected Under H0 | Std Dev Under H0 | Mean Score |
---|---|---|---|---|---|
0 | 16 | 9.0 | 7.703704 | 1.299995 | 0.562500 |
0.04 | 11 | 4.0 | 5.296296 | 1.299995 | 0.363636 |

Average scores were used for ties.
Median Two-Sample Test | |
---|---|
Statistic | 4.0000 |
Z | -0.9972 |
One-Sided Pr < Z | 0.1593 |
Two-Sided Pr > |Z| | 0.3187 |
Median One-Way Analysis | |
---|---|
Chi-Square | 0.9943 |
DF | 1 |
Pr > Chi-Square | 0.3187 |
Figure 65.11: Van der Waerden (Normal) Two-Sample Analysis
Van der Waerden Scores (Normal) for Variable Gain Classified by Variable Dose

Dose | N | Sum of Scores | Expected Under H0 | Std Dev Under H0 | Mean Score |
---|---|---|---|---|---|
0 | 16 | 3.346520 | 0.0 | 2.320336 | 0.209157 |
0.04 | 11 | -3.346520 | 0.0 | 2.320336 | -0.304229 |

Average scores were used for ties.
Van der Waerden Two-Sample Test | |
---|---|
Statistic | -3.3465 |
Z | -1.4423 |
One-Sided Pr < Z | 0.0746 |
Two-Sided Pr > |Z| | 0.1492 |
Van der Waerden One-Way Analysis | |
---|---|
Chi-Square | 2.0801 |
DF | 1 |
Pr > Chi-Square | 0.1492 |
Figure 65.12: Savage Two-Sample Analysis
Savage Scores (Exponential) for Variable Gain Classified by Variable Dose

Dose | N | Sum of Scores | Expected Under H0 | Std Dev Under H0 | Mean Score |
---|---|---|---|---|---|
0 | 16 | 1.834554 | 0.0 | 2.401839 | 0.114660 |
0.04 | 11 | -1.834554 | 0.0 | 2.401839 | -0.166778 |

Average scores were used for ties.
Savage Two-Sample Test | |
---|---|
Statistic | -1.8346 |
Z | -0.7638 |
One-Sided Pr < Z | 0.2225 |
Two-Sided Pr > |Z| | 0.4450 |
Savage One-Way Analysis | |
---|---|
Chi-Square | 0.5834 |
DF | 1 |
Pr > Chi-Square | 0.4450 |
The tables in Figure 65.13 display the empirical distribution function statistics, comparing the distribution of Gain for the two levels of Dose. The p-value for the Kolmogorov-Smirnov two-sample test is 0.6199, which indicates no rejection of the null hypothesis that the Gain distributions are identical for the two levels of Dose.
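The Kolmogorov-Smirnov p-value reported here is asymptotic. With only 27 observations in the two groups, an exact p-value might be of interest. The following sketch assumes that your release of PROC NPAR1WAY accepts the KS keyword in the EXACT statement for two-sample data.

```sas
/* Sketch: EDF tests for the two lowest dose levels, with an exact    */
/* Kolmogorov-Smirnov p-value (assumes EXACT KS is supported).        */
proc npar1way data=Gossypol edf;
   where Dose <= .04;
   class Dose;
   var Gain;
   exact ks;
run;
```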
Figure 65.13: Two-Sample EDF Tests
Kolmogorov-Smirnov Test for Variable Gain Classified by Variable Dose

Dose | N | EDF at Maximum | Deviation from Mean at Maximum |
---|---|---|---|
0 | 16 | 0.250000 | -0.481481 |
0.04 | 11 | 0.545455 | 0.580689 |
Total | 27 | 0.370370 | |

Maximum Deviation Occurred at Observation 4
Value of Gain at Maximum = 216.0

Kolmogorov-Smirnov Two-Sample Test (Asymptotic) | | | |
---|---|---|---|
KS | 0.145172 | D | 0.295455 |
KSa | 0.754337 | Pr > KSa | 0.6199 |

Cramer-von Mises Test for Variable Gain Classified by Variable Dose

Dose | N | Summed Deviation from Mean |
---|---|---|
0 | 16 | 0.098638 |
0.04 | 11 | 0.143474 |

Cramer-von Mises Statistics (Asymptotic) | | | |
---|---|---|---|
CM | 0.008967 | CMa | 0.242112 |

Kuiper Test for Variable Gain Classified by Variable Dose

Dose | N | Deviation from Mean |
---|---|---|
0 | 16 | 0.090909 |
0.04 | 11 | 0.295455 |

Kuiper Two-Sample Test (Asymptotic) | | | | | |
---|---|---|---|---|---|
K | 0.386364 | Ka | 0.986440 | Pr > Ka | 0.8383 |