BOXCHART Statement: ANOM Procedure

Creating ANOM Boxcharts from Group Summary Data

Note: See Creating BOXCHARTS from Group Summary Data in the SAS/QC Sample Library.

The previous example illustrates how you can create ANOM charts for means using measurement data. However, in many applications, the data are provided as group summary statistics. This example illustrates how you can use the BOXCHART statement with data of this type.

The following data set (Labels) provides the data from the preceding example in summarized form:

data Labels;
   input Position DeviationL Deviation1 DeviationX
         DeviationM Deviation3 DeviationH DeviationS;
   DeviationN = 10;
   datalines;
1  -0.0647  -0.0362  -0.02234  -0.02620  -0.0016  0.0094  0.02281
2  -0.0332  -0.0201   0.01625   0.02045   0.0438  0.0564  0.03347
3  -0.0440  -0.0139   0.00604   0.00570   0.0285  0.0486  0.02885
4   0.0362   0.0530   0.06473   0.06030   0.0755  0.1073  0.02150
5  -0.0464  -0.0074   0.00813   0.00760   0.0302  0.0374  0.02593
6  -0.0384  -0.0285  -0.01283  -0.00950   0.0017  0.0071  0.01599
;

A listing of Labels is shown in FigureĀ 4.4. There is exactly one observation for each group (note that the groups are still indexed by Position). There are eight summary variables in Labels.

  • DeviationL contains the group minimums (low values).

  • Deviation1 contains the 25th percentile (first quartile) of each group.

  • DeviationX contains the group means.

  • DeviationM contains the group medians.

  • Deviation3 contains the 75th percentile (third quartile) of each group.

  • DeviationH contains the group maximums (high values).

  • DeviationS contains the group standard deviations.

  • DeviationN contains the group sample sizes (these are all 10 in this case).

Figure 4.4: The Summary Data Set Labels

The Data Set Labels

Position DeviationL Deviation1 DeviationX DeviationM Deviation3 DeviationH DeviationS DeviationN
1 -0.0647 -0.0362 -0.02234 -0.02620 -0.0016 0.0094 0.02281 10
2 -0.0332 -0.0201 0.01625 0.02045 0.0438 0.0564 0.03347 10
3 -0.0440 -0.0139 0.00604 0.00570 0.0285 0.0486 0.02885 10
4 0.0362 0.0530 0.06473 0.06030 0.0755 0.1073 0.02150 10
5 -0.0464 -0.0074 0.00813 0.00760 0.0302 0.0374 0.02593 10
6 -0.0384 -0.0285 -0.01283 -0.00950 0.0017 0.0071 0.01599 10



You can read this data set by specifying it as a SUMMARY= data set in the PROC ANOM statement, as follows:

ods graphics on;
title 'Analysis of Label Deviations';
proc anom summary=Labels;
   boxchart Deviation*Position / odstitle=title1;
run;

The resulting ANOM boxchart is shown in FigureĀ 4.5.

Note that Deviation is not the name of a SAS variable in the data set but is, instead, the common prefix for the names of the eight summary variables. The suffix characters L, 1, X, M, 3, H, S, and N indicate the contents of the variable. For example, the suffix characters 1 and 3 indicate first and third quartiles. Thus, you can specify three group summary variables in a SUMMARY= data set with a single name (Deviation), which is referred to as the response. The name Position specified after the asterisk is the name of the group-variable.

Figure 4.5: ANOM Chart for Means in Data Set Labels

ANOM Chart for Means in Data Set Labels


In general, a SUMMARY= input data set used with the BOXCHART statement must contain the following variables:

  • group variable

  • group minimum variable

  • group first quartile variable

  • group mean variable

  • group median variable

  • group third quartile variable

  • group maximum variable

  • group standard deviation variable

  • group sample size variable

Furthermore, the names of the summary variables must begin with the response name specified in the BOXCHART statement and end with the appropriate suffix characters. If the names do not follow this convention, you can use the RENAME option in the PROC ANOM statement to rename the variables for the duration of the ANOM procedure step. If a label is associated with the group mean variable, it is used to label the vertical axis.

In summary, the interpretation of response depends on the input data set.

  • If raw data are read using the DATA= option (as in the previous example), response is the name of the SAS variable containing the response measurements.

  • If summary data are read using the SUMMARY= option (as in this example), response is the common prefix for the names of the variables containing the summary statistics.

For more information, see SUMMARY= Data Set.