You can select one of the following three types of graphics in PROC REG: ODS, traditional, and line printer. ODS Graphics is the preferred method of creating graphs, superseding the other two. This section describes the options that are available on the PROC REG, PAINT, and PLOT statements for traditional and line printer graphics.
When ODS Graphics is enabled, you can use the PLOTS= option in the PROC REG statement to create plots by using ODS Graphics. For more information about ODS Graphics options, see the PLOTS= option in the section Syntax: REG Procedure.
If ODS Graphics is not enabled and you specify the LINEPRINTER option, line printer plots are produced; otherwise, traditional graphics are produced.
Table 85.9 summarizes the options available in the PROC REG statement for line printer and traditional graphics.
Table 85.9: PROC REG Statement Traditional Graphics and Line Printer Options
Option | Description |
---|---|
ANNOTATE= | Specifies an annotation data set |
GOUT= | Specifies the graphics catalog in which graphics output is saved |
LINEPRINTER | Creates printer plots |
The following options are used to produce line printer and traditional graphics:
The PAINT statement is used with line printer plots. See the PLOTS= option for information about using ODS Graphics to create modern statistical graphics.
The PAINT statement selects observations to be painted or highlighted in a scatter plot on line printer output; the PAINT statement is ignored if the LINEPRINTER option is not specified in the PROC REG statement.
All observations that satisfy condition are painted with a plotting symbol, which you can set with the SYMBOL= option. The PAINT statement does not generate a scatter plot and must be followed by a PLOT statement, which does. Several PAINT statements can be used before a PLOT statement, and all prior PAINT statement requests are applied to all later PLOT statements.
The PAINT statement lists the observation numbers of the observations selected, the total number of observations selected, and the plotting symbol used to paint the points.
On a plot, paint symbols take precedence over all other symbols. If any position contains more than one painted point, the paint symbol for the observation plotted last is used.
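The precedence rule can be modeled in a few lines. This is an illustrative Python sketch, not SAS code. The function name `symbol_at_position` is hypothetical. The representation of one plot position is also an assumption: a list of paint symbols in plotting order, with `None` for unpainted observations.

```python
from typing import Optional, Sequence


def symbol_at_position(paint_symbols: Sequence[Optional[str]], default_symbol: str) -> str:
    """Return the symbol shown at one plot position.

    paint_symbols has one entry per observation at this position, in the order
    the observations are plotted; None marks an unpainted observation.
    """
    painted = [s for s in paint_symbols if s is not None]
    # Paint symbols override all other symbols; the last painted observation wins.
    return painted[-1] if painted else default_symbol


print(symbol_at_position([None, "a", "b"], "2"))  # b
print(symbol_at_position([None, None], "2"))      # 2
```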
The PAINT statement cannot be used when a TYPE=CORR, TYPE=COV, or TYPE=SSCP data set is used as the input data set for PROC REG. Also, the PAINT statement cannot be used for models with more than one dependent variable. Note that the syntax for the PAINT statement is the same as the syntax for the REWEIGHT statement.
Condition is used to select observations to be painted. The syntax of condition is
variable compare value
or
variable compare value logical variable compare value
where
variable is one of the following:
a variable name in the input data set
OBS., which is the observation number
keyword., where keyword is a keyword for a statistic requested in the OUTPUT statement
compare is an operator that compares variable to value. Compare can be any one of the following: <, <=, >, >=, =, ^=. The operators LT, LE, GT, GE, EQ, and NE, respectively, can be used instead of the preceding symbols. See the "Expressions" section in SAS Language Reference: Concepts for more information about comparison operators.
value gives an unformatted value of variable. Observations are selected to be painted if they satisfy the condition created by variable compare value. Value can be a number or a character string. If value is a character string, it must be eight characters or fewer and must be enclosed in quotes. In addition, value is case-sensitive. For example, the statements
paint name='henry';
and
paint name='Henry';
are not the same.
logical is one of two logical operators, AND or OR. To specify AND, use AND or the symbol &. To specify OR, use OR or the symbol |.
Here are some examples of the variable compare value form:
paint name='Henry';
paint residual.>=20;
paint obs.=99;
Here are some examples of the variable compare value logical variable compare value form:
paint name='Henry'|name='Mary';
paint residual.>=20 or residual.<=10;
paint obs.>=11 and residual.<=20;
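The following Python sketch shows how the two condition forms select observations. It is a model of the rules stated above, not SAS code. The names `paint_select` and `check_clause` are hypothetical, and so is the sample data. The sketch makes four rules explicit:

* Each operator has a mnemonic equivalent.
* Character values are compared case-sensitively.
* Character values are limited to eight characters.
* OBS. is the 1-based observation number.

```python
import operator
from typing import Any, Dict, List, Optional, Tuple

Clause = Tuple[str, str, Any]  # (variable, compare, value)

# Comparison operators allowed in a condition: symbols and their mnemonic forms.
COMPARE = {
    "<": operator.lt, "LT": operator.lt,
    "<=": operator.le, "LE": operator.le,
    ">": operator.gt, "GT": operator.gt,
    ">=": operator.ge, "GE": operator.ge,
    "=": operator.eq, "EQ": operator.eq,
    "^=": operator.ne, "NE": operator.ne,
}
# Logical operators: AND or &, OR or |.
LOGICAL = {"AND": all, "&": all, "OR": any, "|": any}


def check_clause(row: Dict[str, Any], clause: Clause) -> bool:
    variable, compare, value = clause
    if isinstance(value, str) and len(value) > 8:
        raise ValueError("character values must be eight characters or fewer")
    # Python string equality is case-sensitive, so 'henry' does not match 'Henry'.
    return COMPARE[compare.upper()](row[variable], value)


def paint_select(observations: List[Dict[str, Any]], clause: Clause,
                 logical: Optional[str] = None,
                 second: Optional[Clause] = None) -> List[int]:
    """Return the 1-based observation numbers (OBS.) that satisfy the condition."""
    selected = []
    for number, obs in enumerate(observations, start=1):
        row = {**obs, "OBS.": number}
        results = [check_clause(row, clause)]
        combine = all
        if logical is not None and second is not None:
            results.append(check_clause(row, second))
            combine = LOGICAL[logical.upper()]
        if combine(results):
            selected.append(number)
    return selected


# Hypothetical data; the "residual." key stands in for the RESIDUAL. statistic.
observations = [
    {"name": "Henry", "residual.": 25.0},
    {"name": "henry", "residual.": 5.0},
    {"name": "Mary", "residual.": -3.0},
]
print(paint_select(observations, ("name", "=", "Henry")))  # [1]
print(paint_select(observations, ("residual.", ">=", 20), "or", ("residual.", "<=", 10)))  # [1, 2, 3]
print(paint_select(observations, ("OBS.", "GE", 2), "and", ("residual.", "<=", 20)))  # [2, 3]
```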
Instead of specifying condition, the ALLOBS option can be used to select all observations. This is most useful when you want to unpaint all observations. For example,
paint allobs / reset;
resets the symbols for all observations.
The following options can be used whether a condition, the ALLOBS option, or nothing is specified before the slash. If only options are listed, they apply to the observations selected in the previous PAINT statement, not to the observations selected by reapplying the condition from the previous PAINT statement. For example, in the statements
paint r.>0 / symbol='a';
reweight r.>0;
refit;
paint / symbol='b';
the second PAINT statement paints only those observations selected in the first PAINT statement. No additional observations are painted even if, after refitting the model, there are new observations that meet the condition in the first PAINT statement.
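This behavior amounts to remembering the previous selection instead of re-evaluating its condition. The Python sketch below models it. The class `PaintSession` is hypothetical, as are the residual values, which are chosen so that observation 2 changes sign after the refit.

```python
from typing import Callable, Dict, List, Optional


class PaintSession:
    """Hypothetical model of how successive PAINT statements reuse selections."""

    def __init__(self) -> None:
        self.last_selected: List[int] = []  # observation numbers from the previous PAINT
        self.symbols: Dict[int, str] = {}   # observation number -> paint symbol

    def paint(self, values: List[float], symbol: str,
              condition: Optional[Callable[[float], bool]] = None) -> List[int]:
        if condition is None:
            # Only options were given: reuse the previous selection without re-evaluating.
            selected = list(self.last_selected)
        else:
            selected = [n for n, v in enumerate(values, start=1) if condition(v)]
        for n in selected:
            self.symbols[n] = symbol
        self.last_selected = selected
        return selected


residuals_before = [1.5, -2.0, 0.3]
residuals_after_refit = [1.0, 0.5, 0.2]  # observation 2 now has r > 0

session = PaintSession()
session.paint(residuals_before, "a", lambda r: r > 0)  # like: paint r.>0 / symbol='a';
session.paint(residuals_after_refit, "b")               # like: paint / symbol='b';
print(session.symbols)  # {1: 'b', 3: 'b'}
```

Observation 2 stays unpainted even though it satisfies r > 0 after the refit, because the second call never looks at the condition.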
Note: Options are not available when either the UNDO or STATUS option is used.
You can specify the following options after a slash (/).
The PLOT statement is used with line printer and traditional graphics. See the PLOTS= option for information about using ODS Graphics to create modern statistical graphics.
The PLOT statement in PROC REG displays scatter plots with yvariable on the vertical axis and xvariable on the horizontal axis. Line printer plots are generated if the LINEPRINTER option is specified in the PROC REG statement; otherwise, the traditional graphics are created. Points in line printer plots can be marked with symbols, while global graphics statements such as GOPTIONS and SYMBOL are used to enhance the traditional graphics. Note that the plots you request by using the PLOT statement are independent of the ODS graphical displays (see the section ODS Graphics) that are available in PROC REG.
As with most other interactive statements, the PLOT statement implicitly refits the model. For example, if a PLOT statement is preceded by a REWEIGHT statement, the model is recomputed, and the plot reflects the new model.
If there are multiple MODEL statements preceding a PLOT statement, then the PLOT statement refers to the latest MODEL statement.
The PLOT statement cannot be used when a TYPE=CORR, TYPE=COV, or TYPE=SSCP data set is used as input to PROC REG.
You can specify several PLOT statements for each MODEL statement, and you can specify more than one plot in each PLOT statement.
More than one yvariable*xvariable pair can be specified to request multiple plots. The yvariables and xvariables can be as follows:
any variables specified in the VAR or MODEL statement before the first RUN statement
keyword., where keyword is a regression diagnostic statistic available in the OUTPUT statement (see Table 85.10). For example,
plot predicted.*residual.;
generates one plot of the predicted values by the residuals for each dependent variable in the MODEL statement. These statistics can also be plotted against any of the variables in the VAR or MODEL statements.
the keyword OBS. (the observation number), which can be plotted against any of the preceding variables
the keyword NPP. or NQQ., which can be used with any of the preceding variables to construct normal P-P or Q-Q plots, respectively (see the section Construction of Q-Q and P-P Plots for more information)
keywords for model fit summary statistics available in the OUTEST= data set with _TYPE_=PARMS (see Table 85.10). A SELECTION= method (other than NONE) must be requested in the MODEL statement for these variables to be plotted. If one member of a yvariable*xvariable pair is from the OUTEST= data set, the other member must also be from the OUTEST= data set.
The OUTPUT statement and the OUTEST= option are not required when their keywords are specified in the PLOT statement.
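An NQQ. plot pairs each ordered value with a standard normal quantile. The Python sketch below computes those pairs. It assumes Blom-type plotting positions (i − 0.375)/(n + 0.25), which is an assumption made here. Check the section Construction of Q-Q and P-P Plots for the exact positions PROC REG uses. The function name `normal_qq_points` is hypothetical.

```python
from statistics import NormalDist
from typing import List, Tuple


def normal_qq_points(residuals: List[float]) -> List[Tuple[float, float]]:
    """Pair each ordered residual with a standard normal quantile.

    Assumes Blom-type plotting positions (i - 0.375) / (n + 0.25).
    """
    n = len(residuals)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    ordered = sorted(residuals)
    return [
        (std_normal.inv_cdf((i - 0.375) / (n + 0.25)), r)
        for i, r in enumerate(ordered, start=1)
    ]


points = normal_qq_points([0.5, -1.2, 2.0, 0.1])
for quantile, residual in points:
    print(f"{quantile:8.4f} {residual:6.2f}")
```

If the residuals are roughly normal, the points fall near a straight line.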
The yvariable and xvariable specifications can be replaced by a set of variables and statistics enclosed in parentheses. When this occurs, all possible combinations of yvariable and xvariable are generated. For example, the following two statements are equivalent:
plot (y1 y2)*(x1 x2);
plot y1*x1 y1*x2 y2*x1 y2*x2;
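The parenthesized form is a Cartesian product of the two lists. The short Python sketch below reproduces the expansion in the same order as the equivalent statement. The function name `expand_plot_request` is hypothetical.

```python
from itertools import product
from typing import List


def expand_plot_request(yvars: List[str], xvars: List[str]) -> List[str]:
    """Expand (y1 y2)*(x1 x2) into every yvariable*xvariable pair, y varying slowest."""
    return [f"{y}*{x}" for y, x in product(yvars, xvars)]


print(expand_plot_request(["y1", "y2"], ["x1", "x2"]))
# ['y1*x1', 'y1*x2', 'y2*x1', 'y2*x2']
```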
The statement
plot;
is equivalent to respecifying the most recent PLOT statement without any options. However, the line printer options COLLECT, HPLOTS=, SYMBOL=, and VPLOTS=, described in the section Line Printer Plots, apply across PLOT statements and remain in effect if they have been previously specified.
Options for traditional graphics are described in the following section; for line printer options, see the section Line Printer Plots.
The display of traditional graphics is described in the following paragraphs. The options are summarized in Table 85.11 and described in the section "Dictionary of PLOT Statement Options."
Several line printer statements and options are not supported for traditional graphics. In particular, the PAINT statement is disabled, as are the PLOT statement options CLEAR, COLLECT, HPLOTS=, NOCOLLECT, SYMBOL=, and VPLOTS=. To display more than one plot per page or to collect plots from multiple PLOT statements, use the GREPLAY procedure (see SAS/GRAPH: Reference). Also note that traditional graphics options are not recognized for line printer plots.
The fitted model equation and a label are displayed in the top margin of the plot; this display can be suppressed with the NOMODEL option. If the label is requested but cannot fit on one line, it is not displayed. The equation and label are displayed on one line when possible; if more lines are required, the label is displayed in the first line with the model equation in successive lines. If displaying the entire equation causes the plot to be unacceptably small, the equation is truncated. Table 85.11 lists options to control the display of the equation.
Four statistics are displayed by default in the right margin: the number of observations, R-square, the adjusted R-square, and the root mean square error. The display of these statistics can be suppressed with the NOSTAT option. You can specify other options to request the display of various statistics in the right margin; see Table 85.11.
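For a simple linear regression, the four default margin statistics can be computed as follows. This Python sketch assumes a model with an intercept and one regressor (p = 2 parameters). It also assumes the textbook definitions:

* R-square = 1 − SSE/SST
* adjusted R-square = 1 − (1 − R-square)(n − 1)/(n − p)
* root MSE = √(SSE/(n − p))

The function name `margin_statistics` and the data are hypothetical.

```python
import math
from typing import Dict, List


def margin_statistics(x: List[float], y: List[float]) -> Dict[str, float]:
    """Fit y = b0 + b1*x by least squares; return N, R-square, adjusted R-square, root MSE."""
    n = len(x)
    p = 2  # parameters: intercept and slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)  # corrected total sum of squares
    rsq = 1 - sse / sst
    adj_rsq = 1 - (1 - rsq) * (n - 1) / (n - p)
    root_mse = math.sqrt(sse / (n - p))
    return {"N": n, "RSquare": rsq, "AdjRSq": adj_rsq, "RootMSE": root_mse}


stats = margin_statistics([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(stats)
```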
A default reference line at zero is displayed if residuals are plotted. If the dependent variable is plotted against the independent variable in a simple linear regression model, the fitted regression line is displayed by default. Default reference lines can be suppressed with the NOLINE option; the lines are not displayed if the OVERLAY option is specified.
Specialized plots are requested with special options. For each coefficient, the RIDGEPLOT option plots the ridge estimates against the ridge values $k$; see the description of the RIDGEPLOT option in the section "Dictionary of PLOT Statement Options" for more details. The CONF option plots $100(1-\alpha)\%$ confidence intervals for the mean, while the PRED option plots $100(1-\alpha)\%$ prediction intervals; see the descriptions of these options in the section "Dictionary of PLOT Statement Options" for more details.
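The difference between CONF and PRED intervals comes from the standard errors of the mean and individual predictions (the STDP. and STDI. keywords). The Python sketch below uses the standard simple linear regression formulas:

* mean prediction: the variance term is MSE(1/n + (x₀ − x̄)²/Sxx)
* individual prediction: the variance term is MSE(1 + 1/n + (x₀ − x̄)²/Sxx)

The t critical value must be supplied by the caller, because the Python standard library has no t distribution. The function name and data are hypothetical.

```python
import math
from typing import List, Tuple

Interval = Tuple[float, float]


def mean_and_individual_intervals(x: List[float], y: List[float], x0: float,
                                  t_crit: float) -> Tuple[Interval, Interval]:
    """Return (interval for the mean, interval for an individual response) at x0.

    t_crit is the t quantile for the chosen alpha with n - 2 error degrees of freedom.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    mse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    y_hat = b0 + b1 * x0
    stdp = math.sqrt(mse * (1 / n + (x0 - x_bar) ** 2 / sxx))      # std error, mean prediction
    stdi = math.sqrt(mse * (1 + 1 / n + (x0 - x_bar) ** 2 / sxx))  # std error, individual prediction
    conf = (y_hat - t_crit * stdp, y_hat + t_crit * stdp)
    pred = (y_hat - t_crit * stdi, y_hat + t_crit * stdi)
    return conf, pred


# 95% intervals at x0 = 3; 3.182 is approximately the 0.975 t quantile with 3 degrees of freedom.
conf, pred = mean_and_individual_intervals([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], 3.0, 3.182)
print(conf, pred)
```

The prediction interval is always the wider of the two, because it adds the variance of a single new response to the uncertainty in the fitted mean.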
If a SELECTION= method is requested, the fitted model equation and the statistics displayed in the margin correspond to the selected model. For the ADJRSQ and CP methods, the selected model is treated as a submodel of the full model. If a CP.*NP. plot is requested, the CHOCKING= and CMALLOWS= options display model selection reference lines; see the descriptions of these options in the section "Dictionary of PLOT Statement Options" for more details.
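Each point in a CP.*NP. plot pairs a candidate model's number of parameters p with its Mallows' $C_p$ value. The Python sketch below assumes the textbook definition, with the full model's MSE as the estimate of the error variance:

$C_p = SSE_p/MSE_{full} - (n - 2p)$

For the full model, this value equals p exactly. The function name `mallows_cp` and the numbers are hypothetical.

```python
def mallows_cp(sse_p: float, mse_full: float, n: int, p: int) -> float:
    """Mallows' C_p for a submodel with p parameters (intercept included).

    sse_p is the submodel's error sum of squares; mse_full is the full model's
    mean squared error, used as the estimate of the error variance.
    """
    return sse_p / mse_full - (n - 2 * p)


print(mallows_cp(40.0, 2.0, 20, 3))  # 6.0, a 3-parameter submodel
print(mallows_cp(30.0, 2.0, 20, 5))  # 5.0, the full model: SSE = MSE * (n - p), so C_p = p
```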
The following table lists the keywords available as PLOT statement xvariables and yvariables. All keywords have a trailing dot; for example, "COOKD." requests Cook’s D statistic. Neither the OUTPUT statement nor the OUTEST= option needs to be specified.
Table 85.10: Keywords for PLOT Statement xvariables
Keyword | Description |
---|---|
Diagnostic Statistics | |
COOKD. | Cook's D influence statistics |
COVRATIO. | standard influence of observation on covariance of betas |
DFFITS. | standard influence of observation on predicted value |
H. | leverage |
LCL. | lower bound of $100(1-\alpha)\%$ confidence interval for individual prediction |
LCLM. | lower bound of $100(1-\alpha)\%$ confidence interval for the mean of the dependent variable |
PREDICTED. | predicted values |
PRESS. | residuals from refitting the model with current observation deleted |
RESIDUAL. or R. | residuals |
RSTUDENT. | studentized residuals with the current observation deleted |
STDI. | standard error of the individual predicted value |
STDP. | standard error of the mean predicted value |
STDR. | standard error of the residual |
STUDENT. | residuals divided by their standard errors |
UCL. | upper bound of $100(1-\alpha)\%$ confidence interval for individual prediction |
UCLM. | upper bound of $100(1-\alpha)\%$ confidence interval for the mean of the dependent variable |
Other Keywords Used with Diagnostic Statistics | |
NPP. | normal probability-probability plot |
NQQ. | normal quantile-quantile plot |
OBS. | observation number (cannot plot against OUTEST= statistics) |
Model Fit Summary Statistics | |
ADJRSQ. | adjusted R-square |
AIC. | Akaike's information criterion |
BIC. | Sawa's Bayesian information criterion |
CP. | Mallows' $C_p$ statistic |
EDF. | error degrees of freedom |
GMSEP. | estimated MSE of prediction, assuming multivariate normality |
IN. | number of regressors in the model, not including the intercept |
JP. | $J_p$, final prediction error |
MSE. | mean squared error |
NP. | number of parameters in the model (including the intercept) |
PC. | Amemiya's prediction criterion |
RMSE. | root MSE |
RSQ. | R-square |
SBC. | SBC statistic |
SP. | $S_p$ statistic |
SSE. | error sum of squares |
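Several of the model fit summary statistics follow directly from the error sum of squares, the sample size n, and the number of parameters p. The Python sketch below assumes these common definitions:

* AIC = n ln(SSE/n) + 2p
* SBC = n ln(SSE/n) + p ln(n)
* EDF = n − p
* MSE = SSE/EDF

Confirm these against the REG procedure's model-selection documentation before relying on them. The function name `fit_summary` and the inputs are hypothetical.

```python
import math
from typing import Dict


def fit_summary(sse: float, n: int, p: int) -> Dict[str, float]:
    """Model fit summary statistics from SSE, sample size n, and parameter count p."""
    edf = n - p  # error degrees of freedom
    mse = sse / edf
    return {
        "EDF": edf,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "AIC": n * math.log(sse / n) + 2 * p,
        "SBC": n * math.log(sse / n) + p * math.log(n),
    }


summary = fit_summary(sse=2.4, n=5, p=2)
print(summary)
```

SBC penalizes each parameter by ln(n) instead of 2. It therefore favors smaller models once n exceeds about 7.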
Table 85.11 summarizes the options available in the PLOT statement. These options are available unless the LINEPRINTER option is specified in the PROC REG statement. For complete descriptions, see the section "Dictionary of PLOT Statement Options" .
Table 85.11: Traditional Graphics Options
Option | Description |
---|---|
General Graphics Options | |
ANNOTATE= | Specifies the annotate data set |
CHOCKING= | Requests a reference line for model selection criteria |
CMALLOWS= | Requests a reference line for the model selection criterion |
CONF | Requests plots of $100(1-\alpha)\%$ confidence intervals for the mean |
DESCRIPTION= | Specifies a description for graphics catalog member |
NAME= | Names the plot in the graphics catalog |
OVERLAY | Overlays plots from the same model |
PRED | Requests plots of $100(1-\alpha)\%$ prediction intervals for individual responses |
RIDGEPLOT | Requests the ridge trace for ridge regression |
Axis and Legend Options | |
LEGEND= | Specifies LEGEND statement to be used |
NOLEGEND | Suppresses display of the legend |
HAXIS= | Specifies tick mark values for horizontal axis |
VAXIS= | Specifies tick mark values for vertical axis |
Reference Line Options | |
HREF= | Specifies reference lines perpendicular to horizontal axis |
LHREF= | Specifies line style for HREF= lines |
LLINE= | Specifies line style for lines displayed by default |
LVREF= | Specifies line style for VREF= lines |
NOLINE | Suppresses display of any default reference line |
VREF= | Specifies reference lines perpendicular to vertical axis |
Color Options | |
CAXIS= | Specifies color for axis line and tick marks |
CFRAME= | Specifies color for frame |
CHREF= | Specifies color for HREF= lines |
CLINE= | Specifies color for lines displayed by default |
CTEXT= | Specifies color for text |
CVREF= | Specifies color for VREF= lines |
Options for Displaying the Fitted Model Equation | |
MODELFONT= | Specifies font of model equation and model label |
MODELHT= | Specifies text height of model equation and model label |
MODELLAB= | Specifies model label |
NOMODEL | Suppresses display of the fitted model and the label |
Options for Displaying Statistics in the Plot Margin | |
AIC | Displays Akaike's information criterion |
BIC | Displays Sawa's Bayesian information criterion |
CP | Displays Mallows' $C_p$ statistic |
EDF | Displays the error degrees of freedom |
GMSEP | Displays the estimated MSE of prediction assuming multivariate normality |
IN | Displays the number of regressors in the model not including the intercept |
JP | Displays the $J_p$ statistic |
MSE | Displays the mean squared error |
NOSTAT | Suppresses display of the default statistics: the number of observations, R-square, adjusted R-square, and root MSE |
NP | Displays the number of parameters in the model including the intercept |
PC | Displays the PC statistic |
SBC | Displays the SBC statistic |
SP | Displays the $S_p$ statistic |
SSE | Displays the error sum of squares |
STATFONT= | Specifies font of text displayed in the margin |
STATHT= | Specifies height of text displayed in the margin |
The following entries describe the PLOT statement options in detail. Note that these options are available unless you specify the LINEPRINTER option in the PROC REG statement.
Line printer plots are requested with the LINEPRINTER option in the PROC REG statement. Points in line printer plots can be marked with symbols, which can be specified as a single character enclosed in quotes or the name of any variable in the input data set.
If a character variable is used for the symbol, the first (leftmost) nonblank character in the formatted value of the variable is used as the plotting symbol. If a character in quotes is specified, that character becomes the plotting symbol. If character plotting symbols are used and different symbols are needed at the same point, the symbol '?' is displayed at that point.
If an unformatted numeric variable is used for the symbol, the symbols '1', '2', …, '9' are used for variable values 1, 2, …, 9. For noninteger values, only the integer portion is used as the plotting symbol. For values of 10 or greater, the symbol '*' is used. For negative values, a '?' is used. If a numeric variable is used, and if there is more than one plotting symbol needed at the same point, the sum of the variable values is used at that point. If the sum exceeds 9, the symbol '*' is used.
If a symbol is not specified, the number of replicates at the point is displayed. The symbol '*' is used if there are 10 or more replicates.
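The three symbol rules for line printer plots can be summarized in a short Python sketch. It models the rules described above and is not SAS code; all function names are hypothetical. Two cases are not covered by the description, and this sketch handles them by assumption:

* A numeric value between 0 and 1 is shown as "0".
* A negative sum of several values is treated like a single negative value.

```python
from typing import Sequence


def numeric_symbol(values: Sequence[float]) -> str:
    """Symbol at one position when an unformatted numeric variable supplies symbols."""
    if len(values) > 1:
        total = sum(values)
        if total > 9:  # the sum of the values exceeds 9
            return "*"
        value = total  # otherwise apply the single-value rules (assumption for negative sums)
    else:
        value = values[0]
    if value < 0:
        return "?"
    if value >= 10:
        return "*"
    # Integer portion of the value; values in [0, 1) show "0" here (assumption).
    return str(int(value))


def character_symbol(formatted_values: Sequence[str]) -> str:
    """Symbol when a character variable (or a quoted character) supplies symbols."""
    symbols = {v.lstrip()[0] for v in formatted_values}  # first nonblank character
    return symbols.pop() if len(symbols) == 1 else "?"


def replicate_symbol(count: int) -> str:
    """Symbol when no symbol is specified: the number of replicates at the point."""
    return "*" if count >= 10 else str(count)


print(numeric_symbol([4, 5]), numeric_symbol([4, 6]), character_symbol(["a", "b"]))  # 9 * ?
```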
If the LINEPRINTER option is used, you can specify the following options in the PLOT statement after a slash (/):