The OUTBOX= data set saves group summary statistics and outlier values. The following variables can be saved:
the group variable
the variable _VAR_, containing the analysis variable name
the variable _TYPE_, identifying features of box-and-whiskers plots
the variable _VALUE_, containing values of box-and-whiskers plot features
the variable _ID_, containing labels for outliers
the variable _HTML_, containing URLs associated with plot features
_ID_ is included in the OUTBOX= data set only if the keyword SCHEMATICID or SCHEMATICIDFAR is specified with the BOXSTYLE= option. _HTML_ is present only if one or more of the HTML=, OUTHIGHHTML=, or OUTLOWHTML= options are specified.
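The following statements are a minimal sketch of requesting an OUTBOX= data set that includes the _ID_ variable. They assume a data set Steel, already sorted by Lot, that contains the analysis variable Width and the group variable Lot (the names used in this section's PLOT example). The ID variable Heat is hypothetical, so substitute one of your own variables.

```sas
/* Assumes Steel is sorted by Lot and contains Width and Lot.   */
/* Heat is a hypothetical ID variable used to label outliers.   */
proc boxplot data=Steel;
   /* BOXSTYLE=SCHEMATICID causes _ID_ to be saved in OUTBOX= */
   plot Width*Lot / boxstyle=schematicid outbox=WidthBox;
   id Heat;
run;

/* List the variables that were saved in the OUTBOX= data set */
proc contents data=WidthBox;
run;
```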
Each observation in an OUTBOX= data set records the value of a single feature of one group’s box-and-whiskers plot, such as its mean. The _TYPE_ variable identifies the feature whose value is recorded in _VALUE_. Table 28.8 lists valid _TYPE_ variable values.
Table 28.8: Valid _TYPE_ Values in an OUTBOX= Data Set
| _TYPE_ | Description |
|---|---|
| N | group size |
| MIN | minimum group value |
| Q1 | group first quartile |
| MEDIAN | group median |
| MEAN | group mean |
| Q3 | group third quartile |
| MAX | group maximum value |
| STDDEV | group standard deviation |
| LOW | low outlier value |
| HIGH | high outlier value |
| LOWHISKR | low whisker value, if different from MIN |
| HIWHISKR | high whisker value, if different from MAX |
| FARLOW | low far outlier value |
| FARHIGH | high far outlier value |
Additionally, the following variables, if specified, are included:
block variables
symbol variable
BY variables
ID variables
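Because an OUTBOX= data set stores one feature per observation, you might sometimes want one row per group instead. The statements below are a hedged sketch of one way to do that with PROC TRANSPOSE. They assume a data set Steel, sorted by Lot, with a single analysis variable Width. The data set names WidthBox, WidthStats, and WidthWide are illustrative. Outlier observations (LOW, HIGH, FARLOW, FARHIGH) are dropped first because a group can have several of them, and that would produce duplicate ID values within a BY group.

```sas
/* Assumes Steel is sorted by Lot and has one analysis variable, Width. */
proc boxplot data=Steel;
   plot Width*Lot / boxstyle=schematic outbox=WidthBox;
run;

/* Keep only features that occur at most once per group */
data WidthStats;
   set WidthBox;
   where _TYPE_ not in ('LOW', 'HIGH', 'FARLOW', 'FARHIGH');
run;

/* One row per Lot, one column per _TYPE_ value (N, MIN, Q1, ...) */
proc transpose data=WidthStats out=WidthWide(drop=_NAME_);
   by Lot;
   id _TYPE_;
   var _VALUE_;
run;
```

Groups whose whiskers coincide with MIN or MAX have no LOWHISKR or HIWHISKR observation, so those columns are missing for such groups.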
The OUTHISTORY= data set saves group summary statistics. The following variables are saved:
the group variable
group minimum variables, named by the analysis variable name with the suffix L
group first-quartile variables, named by the analysis variable name with the suffix 1
group mean variables, named by the analysis variable name with the suffix X
group median variables, named by the analysis variable name with the suffix M
group third-quartile variables, named by the analysis variable name with the suffix 3
group maximum variables, named by the analysis variable name with the suffix H
group standard deviation variables, named by the analysis variable name with the suffix S
group size variables, named by the analysis variable name with the suffix N
If an analysis variable name has the maximum length of 32 characters, PROC BOXPLOT forms summary statistic names from its first 16 characters, its last 15 characters, and the appropriate suffix, so that each resulting name also fits in 32 characters (16 + 15 + 1).
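The DATA step below is an illustrative sketch that reproduces the naming rule described above. It does not call PROC BOXPLOT. The variable name ThisAnalysisVariableNameIs32Char is hypothetical and chosen to be exactly 32 characters long. For a 32-character name, the "last 15 characters" are characters 18 through 32, so character 17 is dropped.

```sas
/* Hypothetical illustration of the OUTHISTORY= naming rule. */
data _null_;
   length VarName $32 StatName $32 Suffix $1;
   VarName = 'ThisAnalysisVariableNameIs32Char';   /* exactly 32 characters */
   do i = 1 to 8;
      Suffix = char('L1XM3HSN', i);                 /* the eight suffixes    */
      if lengthn(VarName) = 32 then
         /* first 16 characters + characters 18-32 (last 15) + suffix */
         StatName = cats(substr(VarName, 1, 16), substr(VarName, 18, 15), Suffix);
      else
         /* shorter names: the name followed by the suffix */
         StatName = cats(VarName, Suffix);
      put StatName=;
   end;
run;
```

With this input, the mean variable name written to the log should be ThisAnalysisVaribleNameIs32CharX.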
Group summary variables are created for each analysis variable specified in the PLOT statement. For example, consider the following statements:
```sas
proc boxplot data=Steel;
   plot (Width Diameter)*Lot / outhistory=Summary;
run;
```
The data set Summary contains variables named Lot, WidthL, Width1, WidthM, WidthX, Width3, WidthH, WidthS, WidthN, DiameterL, Diameter1, DiameterM, DiameterX, Diameter3, DiameterH, DiameterS, and DiameterN.
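As a hedged sketch, the OUTHISTORY= data set Summary from the statements above can be inspected like any other SAS data set. As we understand it, it can also be read back through the HISTORY= option of the PROC BOXPLOT statement to redraw the plots without the raw data. In that case the PLOT statement names the original analysis variables (Width, Diameter), not the suffixed summary variables. Verify this against the HISTORY= documentation for your release.

```sas
/* Assumes the OUTHISTORY= data set Summary created by the statements above. */
proc print data=Summary;
   var Lot WidthN WidthX WidthS DiameterN DiameterX DiameterS;
run;

/* Redraw the (non-schematic) box plots from the saved summary statistics */
proc boxplot history=Summary;
   plot (Width Diameter)*Lot;
run;
```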
Additionally, the following variables, if specified, are included:
BY variables
block variables
symbol variable
ID variables
Note that an OUTHISTORY= data set does not contain outlier values and therefore cannot, in general, be used to save a schematic box plot. You can use an OUTBOX= data set to save a schematic box plot summary.
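The following is a minimal sketch of saving a schematic box plot summary with OUTBOX= and then reusing it. It assumes a data set Steel, sorted by Lot, with analysis variable Width. The data set name WidthBox is illustrative. The second step assumes that the BOX= option of the PROC BOXPLOT statement reads an OUTBOX=-style data set and that the PLOT statement names the analysis variable recorded in _VAR_. Treat this as an assumption to confirm in the BOX= documentation.

```sas
/* Assumes Steel is sorted by Lot and contains Width and Lot. */
proc boxplot data=Steel;
   /* OUTBOX= keeps LOW/HIGH outlier values, unlike OUTHISTORY= */
   plot Width*Lot / boxstyle=schematic outbox=WidthBox;
run;

/* Redraw the schematic plot from the saved summary (BOX= input) */
proc boxplot box=WidthBox;
   plot Width*Lot / boxstyle=schematic;
run;
```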