The BIVAR statement computes bivariate kernel density estimates. Table 48.1 summarizes the options available in the BIVAR statement.
Table 48.1: BIVAR Statement Options
Option | Description
BIVSTATS | Produces a table of the covariance and correlation for each density estimate
BWM= | Specifies the bandwidth multiplier
GRIDL= | Specifies the lower grid limit
GRIDU= | Specifies the upper grid limit
LEVELS | Requests a table of levels for contours of the bivariate density
NGRID= | Specifies the number of grid points associated with each variable
NOPRINT | Suppresses output tables
OUT= | Specifies the name of the output data set
PERCENTILES | Requests that a table of percentiles be computed
PLOTS= | Requests one or more plots
UNISTATS | Produces a table for each density estimate containing standard univariate statistics and the bandwidths
The basic syntax for the BIVAR statement specifies two variables:
BIVAR v1 <(v-options)> v2 <(v-options)> </ options> ;
This statement requests a bivariate kernel density estimate for the variables v1 and v2. The v-options optionally specified in parentheses after a variable name apply only to that variable and override the corresponding global options specified after the slash (/).
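As a minimal sketch of how v-options interact with global options, the following step assumes a hypothetical data set named mydata with numeric variables x and y (these names are not part of this documentation). The global BWM= and NGRID= settings apply to both variables, except that the BWM= v-option on x overrides the global multiplier for x only.

```sas
/* mydata, x, and y are hypothetical names used only for illustration */
proc kde data=mydata;
   /* BWM=2 in parentheses applies only to x;            */
   /* y uses the global BWM=1.5; both use NGRID=100      */
   bivar x (bwm=2) y / bwm=1.5 ngrid=100;
run;
```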
You can specify a list of more than two variables:
BIVAR v1 <(v-options)> v2 <(v-options)> … vN <(v-options)> </ options> ;
This statement requests a bivariate kernel density estimate for each distinct pair of variables in the list. For example, if you specify
bivar x y z;
then a bivariate kernel density estimate is computed for each of the variable pairs (x, y), (x, z), and (y, z).
Alternatively, you can specify an explicit list of variable pairs, with each pair enclosed in parentheses:
BIVAR (v1 v2) (v3 v4) … (vN-1 vN) </ options> ;
(You can also specify v-options after a variable name that appears in an explicit pair; they are omitted here for clarity.) This statement requests a bivariate kernel density estimate for each pair of variables. For example, if you specify
bivar (x y) (y z);
then bivariate kernel density estimates are computed for (x, y) and (y, z).
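To illustrate how the two forms relate, the following sketch requests the same three estimates twice: once with a variable list and once with explicit pairs. The data set name mydata and the variables x, y, and z are hypothetical.

```sas
/* mydata and the variables x, y, z are hypothetical */
proc kde data=mydata;
   bivar x y z;               /* every distinct pair: (x,y), (x,z), (y,z) */
run;

proc kde data=mydata;
   bivar (x y) (x z) (y z);   /* the same three pairs, listed explicitly  */
run;
```

The explicit form is the one to use when only some of the pairs are wanted, as in the (x, y) and (y, z) example above.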
Note: The VAR statement supported by PROC KDE in SAS 8 and earlier releases is now obsolete. The VAR statement has been replaced
by the UNIVAR and the BIVAR statements, which enable you to produce multiple kernel density estimates with a single invocation
of the procedure.
You can specify the following options in the BIVAR statement. As noted, some options can be used as v-options.
BIVSTATS
    produces a table for each density estimate containing the covariance and correlation between the two variables.
BWM=number
    specifies the bandwidth multiplier applied to each variable in each kernel density estimate. The default value is 1. Larger multipliers produce a smoother estimate, and smaller ones produce a rougher estimate. To specify different bandwidth multipliers for different variables, specify BWM= as a v-option.
GRIDL=number
    specifies the lower grid limit applied to each variable in each kernel density estimate. The default value for a given variable is the minimum observed value of that variable. To specify different lower grid limits for different variables, specify GRIDL= as a v-option.
GRIDU=number
    specifies the upper grid limit applied to each variable in each kernel density estimate. The default value for a given variable is the maximum observed value of that variable. To specify different upper grid limits for different variables, specify GRIDU= as a v-option.
LEVELS
LEVELS=numlist
    requests a table of levels for contours of the bivariate density. The contours are defined in such a way that the density has a constant level along each contour, and the volume enclosed by each contour corresponds to a specified percent. In other words, the contours correspond to slices or levels of the density surface taken along the density axis. You can specify the percents used to define the contours. The default values are 1, 5, 10, 50, 90, 95, 99, and 100. The “Levels” table also provides the minimum and maximum values for each contour along the directions of the two data variables.
NGRID=number
NG=number
    specifies the number of grid points associated with each variable in each kernel density estimate. The default value is 60. To specify different numbers of grid points for different variables, specify NGRID= as a v-option.
NOPRINT
    suppresses output tables produced by the BIVAR statement. You can use the NOPRINT option when you want to produce graphical output only.
OUT=SAS-data-set
    specifies the name of the output data set in which kernel density estimates are saved. This output data set contains the following variables:
        var1, whose value is the name of the first variable in a bivariate kernel density estimate
        var2, whose value is the name of the second variable in a bivariate kernel density estimate
        value1, with values corresponding to grid coordinates for the first variable
        value2, with values corresponding to grid coordinates for the second variable
        density, with values equal to kernel density estimates at the associated grid point
        count, containing the number of original observations contained in the bin corresponding to a grid point
PERCENTILES
PERCENTILES=numlist
    requests that a table of percentiles be computed for each BIVAR variable. You can specify a list of percentiles to be computed. The default percentiles are 0.5, 1, 2.5, 5, 10, 25, 50, 75, 90, 95, 97.5, 99, and 99.5.
PLOTS=plot-request<(options)> | ALL | NONE
PLOTS=(plot-request<(options)> <plot-request<(options)>>)
    requests one or more plots of the bivariate data and kernel density estimate. When you specify only one plot request, you can omit the parentheses around the plot request. ODS Graphics must be enabled before plots can be requested. For example:
        ods graphics on;
        proc kde data=octane;
           bivar Rater Customer / plots=all;
        run;
        ods graphics off;
    For more information about enabling and disabling ODS Graphics, see the section Enabling and Disabling ODS Graphics in Chapter 21: Statistical Graphics Using ODS.
    By default, if ODS Graphics is enabled and you do not specify the PLOTS= option, then the BIVAR statement creates a contour plot. If you specify the PLOTS= option, you get only the requested plots.
    The following plot-requests are available:
    ALL
        produces all bivariate plots.
    CONTOUR
        produces a contour plot of the bivariate density estimate.
    CONTOURSCATTER
        produces a contour plot of the bivariate density estimate overlaid with a scatter plot of the data.
    HISTOGRAM <(view-options)>
        produces a bivariate histogram of the data. The following view-options can be specified:
            ROTATE=angle
                rotates the histogram angle degrees, where –180 < angle < 180. By default, angle = 54.
            TILT=angle
                tilts the histogram angle degrees, where –180 < angle < 180. By default, angle = 20.
    HISTSURFACE <(view-options)>
        produces a bivariate histogram of the data overlaid with a surface plot of the bivariate kernel density estimate. The following view-options can be specified:
            ROTATE=angle
                rotates the histogram and kernel density surface angle degrees, where –180 < angle < 180. By default, angle = 54.
            TILT=angle
                tilts the histogram and kernel density surface angle degrees, where –180 < angle < 180. By default, angle = 20.
    NONE
        suppresses all plots, including the contour plot that is produced by default when ODS Graphics is enabled and the PLOTS= option is not specified.
    SCATTER
        produces a scatter plot of the data.
    SURFACE <(view-options)>
        produces a surface plot of the bivariate kernel density estimate. The following view-options can be specified:
            ROTATE=angle
                rotates the kernel density surface angle degrees, where –180 < angle < 180. By default, angle = 54.
            TILT=angle
                tilts the kernel density surface angle degrees, where –180 < angle < 180. By default, angle = 20.
UNISTATS
    produces a table for each density estimate containing standard univariate statistics for each of the two variables and the bandwidths used to compute the kernel density estimate. The statistics listed are the mean, variance, standard deviation, range, and interquartile range.
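The table options and the OUT= option described above can be combined in one statement. The following is a minimal sketch that reuses the octane data set and the Rater and Customer variables from the PLOTS= example; the output data set name kde_out is a hypothetical choice, and the LEVELS and PERCENTILES options are given without lists so that the documented defaults apply.

```sas
proc kde data=octane;
   /* Request the optional tables and save the gridded estimate.  */
   /* kde_out is a hypothetical name for the OUT= data set.       */
   bivar Rater Customer / bivstats unistats levels percentiles
                          ngrid=30 out=kde_out;
run;

/* Inspect the first few grid points of the saved estimate.       */
/* The variable names are those listed under the OUT= option.     */
proc print data=kde_out(obs=10);
   var var1 var2 value1 value2 density count;
run;
```

A smaller NGRID= value is used here only to keep the output data set compact; by default each variable has 60 grid points.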