The KDE Procedure

BIVAR Statement

The BIVAR statement computes bivariate kernel density estimates. Table 52.1 summarizes the options available in the BIVAR statement.

Table 52.1: BIVAR Statement Options

Option         Description
BIVSTATS       Produces a table for each density estimate containing the covariance and correlation between the two variables
BWM=           Specifies the bandwidth multiplier
GRIDL=         Specifies the lower grid limit
GRIDU=         Specifies the upper grid limit
LEVELS         Requests a table of levels for contours of the bivariate density
NGRID=         Specifies the number of grid points associated with each variable
NOPRINT        Suppresses output tables
OUT=           Specifies the name of the output data set
PERCENTILES    Requests a table of percentiles
PLOTS=         Requests one or more plots
UNISTATS       Produces a table for each density estimate containing standard univariate statistics and the bandwidths


The basic syntax for the BIVAR statement specifies two variables:

BIVAR v1 <(v-options)> v2 <(v-options)> </ options> ;

This statement requests a bivariate kernel density estimate for the variables v1 and v2. Any v-options specified in parentheses after a variable name apply only to that variable and override the corresponding global options specified after the slash (/).
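The following sketch illustrates how a v-option overrides a global option. The data set SIM and the variables X and Y are hypothetical, simulated here only so that the example is self-contained. X is assigned its own bandwidth multiplier through a v-option, while Y uses the global value given after the slash.

```sas
/* Hypothetical simulated data: two correlated normal variables */
data sim;
   call streaminit(12345);
   do i = 1 to 500;
      x = rand('normal');
      y = 0.6*x + rand('normal');
      output;
   end;
   drop i;
run;

proc kde data=sim;
   /* x uses its v-option BWM=0.5; y falls back to the global BWM=2 */
   bivar x (bwm=0.5) y / bwm=2;
run;
```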

You can specify a list of more than two variables:

BIVAR v1 <(v-options)> v2 <(v-options)> … vN <(v-options)> </ options> ;

This statement requests a bivariate kernel density estimate for each distinct pair of variables in the list. For example, if you specify

bivar x y z;

then a bivariate kernel density estimate is computed for each of the variable pairs (x, y), (x, z), and (y, z).

Alternatively, you can specify an explicit list of variable pairs, with each pair enclosed in parentheses:

BIVAR (v1 v2) (v3 v4) … (vN-1 vN) </ options> ;

(You can also specify v-options following a variable name appearing in an explicit pair, but they are omitted here for clarity.) This statement requests a bivariate kernel density estimate for each pair of variables. For example, if you specify

bivar (x y) (y z);

then bivariate kernel density estimates are computed for (x, y) and (y, z).

Note: The VAR statement supported by PROC KDE in SAS 8 and earlier releases is now obsolete. The VAR statement has been replaced by the UNIVAR and the BIVAR statements, which enable you to produce multiple kernel density estimates with a single invocation of the procedure.

You can specify the following options in the BIVAR statement. As noted, some options can be used as v-options.

BIVSTATS

produces a table for each density estimate containing the covariance and correlation between the two variables.

BWM=number

specifies the bandwidth multiplier applied to each variable in each kernel density estimate. The default value is 1. Larger multipliers produce a smoother estimate, and smaller ones produce a rougher estimate. To specify different bandwidth multipliers for different variables, specify BWM= as a v-option.
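As a rough illustration of the smoothing effect, the sketch below fits the same hypothetical simulated data twice, once with doubled and once with halved bandwidths, and requests a contour plot each time so the two estimates can be compared visually. The data set and variable names are assumptions made for this example.

```sas
/* Hypothetical simulated data: two correlated normal variables */
data sim;
   call streaminit(12345);
   do i = 1 to 500;
      x = rand('normal');
      y = 0.6*x + rand('normal');
      output;
   end;
   drop i;
run;

ods graphics on;

/* Smoother estimate: bandwidths doubled for both variables */
proc kde data=sim;
   bivar x y / bwm=2 plots=contour;
run;

/* Rougher estimate: bandwidths halved for both variables */
proc kde data=sim;
   bivar x y / bwm=0.5 plots=contour;
run;

ods graphics off;
```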

GRIDL=number

specifies the lower grid limit applied to each variable in each kernel density estimate. The default value for a given variable is the minimum observed value of that variable. To specify different lower grid limits for different variables, specify GRIDL= as a v-option.

GRIDU=number

specifies the upper grid limit applied to each variable in each kernel density estimate. The default value for a given variable is the maximum observed value of that variable. To specify different upper grid limits for different variables, specify GRIDU= as a v-option.
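GRIDL= and GRIDU= are typically used together to fix the region over which the density is evaluated. In this sketch, which uses hypothetical simulated data, the global limits apply to Y, and X receives narrower limits through v-options.

```sas
/* Hypothetical simulated data: two correlated normal variables */
data sim;
   call streaminit(12345);
   do i = 1 to 500;
      x = rand('normal');
      y = 0.6*x + rand('normal');
      output;
   end;
   drop i;
run;

proc kde data=sim;
   /* x is evaluated on [-4, 4]; y uses the global limits [-5, 5] */
   bivar x (gridl=-4 gridu=4) y / gridl=-5 gridu=5;
run;
```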

LEVELS
LEVELS=numlist

requests a table of levels for contours of the bivariate density. Each contour is a curve along which the density is constant, chosen so that the volume enclosed by the contour corresponds to a specified percentage. In other words, the contours correspond to slices, or levels, of the density surface taken along the density axis. You can specify these percentages in numlist; the default values are 1, 5, 10, 50, 90, 95, 99, and 100. The Levels table also provides the minimum and maximum values of each contour along the directions of the two data variables.
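The sketch below requests a Levels table for only three contours. It assumes that numlist is written as space-separated numbers, as is usual for SAS numeric lists; the data set SIM and its variables are hypothetical.

```sas
/* Hypothetical simulated data: two correlated normal variables */
data sim;
   call streaminit(12345);
   do i = 1 to 500;
      x = rand('normal');
      y = 0.6*x + rand('normal');
      output;
   end;
   drop i;
run;

proc kde data=sim;
   /* Contours enclosing 50, 90, and 95 percent (assumed numlist form) */
   bivar x y / levels=50 90 95;
run;
```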

NGRID=number
NG=number

specifies the number of grid points associated with each variable in each kernel density estimate. The default value is 60. To specify different numbers of grid points for different variables, specify NGRID= as a v-option.
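A finer grid gives a smoother-looking rendering of the estimate at the cost of more computation and a larger output data set. In this sketch, with hypothetical simulated data and a hypothetical output data set KDEGRID, X is evaluated at 100 grid points through a v-option and Y at the global 40 points. The OUT= data set holds one observation per grid point.

```sas
/* Hypothetical simulated data: two correlated normal variables */
data sim;
   call streaminit(12345);
   do i = 1 to 500;
      x = rand('normal');
      y = 0.6*x + rand('normal');
      output;
   end;
   drop i;
run;

proc kde data=sim;
   /* 100 grid points for x (v-option), 40 for y (global option) */
   bivar x (ngrid=100) y / ngrid=40 out=kdegrid;
run;
```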

NOPRINT

suppresses output tables produced by the BIVAR statement. You can use the NOPRINT option when you want to produce graphical output only.

OUT=SAS-data-set

specifies the name of the output data set in which kernel density estimates are saved. This output data set contains the following variables:

  • var1, whose value is the name of the first variable in a bivariate kernel density estimate

  • var2, whose value is the name of the second variable in a bivariate kernel density estimate

  • value1, with values corresponding to grid coordinates for the first variable

  • value2, with values corresponding to grid coordinates for the second variable

  • density, with values equal to kernel density estimates at the associated grid point

  • count, containing the number of original observations that fall in the bin corresponding to a grid point
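The following sketch shows one way to use the OUT= data set: it saves the estimate without printing tables, then sorts the grid points to find the one with the highest estimated density. The names SIM, X, Y, and KDEOUT are hypothetical; the variables VAR1, VAR2, VALUE1, VALUE2, DENSITY, and COUNT are those listed above.

```sas
/* Hypothetical simulated data: two correlated normal variables */
data sim;
   call streaminit(12345);
   do i = 1 to 500;
      x = rand('normal');
      y = 0.6*x + rand('normal');
      output;
   end;
   drop i;
run;

ods graphics off;
proc kde data=sim;
   /* Save the grid estimates; suppress the printed tables */
   bivar x y / out=kdeout noprint;
run;

/* Put the grid point with the largest density estimate first */
proc sort data=kdeout;
   by descending density;
run;

proc print data=kdeout(obs=1);
   var var1 var2 value1 value2 density count;
run;
```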

PERCENTILES
PERCENTILES=numlist

requests that a table of percentiles be computed for each BIVAR variable. You can specify a list of percentiles to be computed. The default percentiles are 0.5, 1, 2.5, 5, 10, 25, 50, 75, 90, 95, 97.5, 99, and 99.5.
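To restrict the table to a few percentiles, list them after PERCENTILES=. This sketch assumes the same space-separated numlist form as LEVELS= and uses hypothetical simulated data.

```sas
/* Hypothetical simulated data: two correlated normal variables */
data sim;
   call streaminit(12345);
   do i = 1 to 500;
      x = rand('normal');
      y = 0.6*x + rand('normal');
      output;
   end;
   drop i;
run;

proc kde data=sim;
   /* Quartiles and median only, for each of x and y */
   bivar x y / percentiles=25 50 75;
run;
```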

PLOTS=plot-request<(options)> | ALL | NONE
PLOTS=(plot-request<(options)> <... plot-request <(options)>>)

requests one or more plots of the bivariate data and kernel density estimate. When you specify only one plot request, you can omit the parentheses around the plot request.

ODS Graphics must be enabled before plots can be requested. For example:

ods graphics on;

proc kde data=octane;
   bivar Rater Customer / plots=all;
run;

ods graphics off;

For more information about enabling and disabling ODS Graphics, see the section Enabling and Disabling ODS Graphics in Chapter 21: Statistical Graphics Using ODS.

By default, if ODS Graphics is enabled and you do not specify the PLOTS= option, then the BIVAR statement creates a contour plot. If you specify the PLOTS= option, you get only the requested plots.
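Because specifying PLOTS= replaces the default contour plot, any plot you still want must be requested explicitly. The sketch below, using hypothetical simulated data, requests two plots; the parentheses around the plot requests are required when more than one is listed.

```sas
/* Hypothetical simulated data: two correlated normal variables */
data sim;
   call streaminit(12345);
   do i = 1 to 500;
      x = rand('normal');
      y = 0.6*x + rand('normal');
      output;
   end;
   drop i;
run;

ods graphics on;
proc kde data=sim;
   /* Only these two plots are produced; no default contour plot */
   bivar x y / plots=(contourscatter surface);
run;
ods graphics off;
```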

The following plot-requests are available.

ALL

produces all bivariate plots.

CONTOUR

produces a contour plot of the bivariate density estimate.

CONTOURSCATTER

produces a contour plot of the bivariate density estimate overlaid with a scatter plot of the data.

HISTOGRAM <(view-options)>

produces a bivariate histogram of the data. The following view-options can be specified:

ROTATE=angle

rotates the histogram by angle degrees, where -180 < angle < 180. By default, angle = 54.

TILT=angle

tilts the histogram by angle degrees, where -180 < angle < 180. By default, angle = 20.

HISTSURFACE <(view-options)>

produces a bivariate histogram of the data overlaid with a surface plot of the bivariate kernel density estimate. The following view-options can be specified:

ROTATE=angle

rotates the histogram and kernel density surface by angle degrees, where -180 < angle < 180. By default, angle = 54.

TILT=angle

tilts the histogram and kernel density surface by angle degrees, where -180 < angle < 180. By default, angle = 20.

NONE

suppresses all plots, including the contour plot that is produced by default when ODS Graphics is enabled and the PLOTS= option is not specified.

SCATTER

produces a scatter plot of the data.

SURFACE <(view-options)>

produces a surface plot of the bivariate kernel density estimate. The following view-options can be specified:

ROTATE=angle

rotates the kernel density surface by angle degrees, where -180 < angle < 180. By default, angle = 54.

TILT=angle

tilts the kernel density surface by angle degrees, where -180 < angle < 180. By default, angle = 20.
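View-options are given in parentheses after the plot request. The sketch below changes the viewing angles of the surface and histogram-surface plots, with both angles inside the allowed range of -180 to 180. It assumes that multiple view-options are separated by spaces and uses hypothetical simulated data.

```sas
/* Hypothetical simulated data: two correlated normal variables */
data sim;
   call streaminit(12345);
   do i = 1 to 500;
      x = rand('normal');
      y = 0.6*x + rand('normal');
      output;
   end;
   drop i;
run;

ods graphics on;
proc kde data=sim;
   /* Override the default rotation (54) and tilt (20) for both plots */
   bivar x y / plots=(surface(rotate=30 tilt=45)
                      histsurface(rotate=-60 tilt=30));
run;
ods graphics off;
```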

UNISTATS

produces a table for each density estimate containing standard univariate statistics for each of the two variables and the bandwidths used to compute the kernel density estimate. The statistics listed are the mean, variance, standard deviation, range, and interquartile range.
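UNISTATS and BIVSTATS can be combined to report the univariate summaries and bandwidths alongside the covariance and correlation for each pair. This sketch uses hypothetical simulated data; with a three-variable list, one set of tables is produced for each of the pairs (x, y), (x, z), and (y, z).

```sas
/* Hypothetical simulated data: three correlated normal variables */
data sim;
   call streaminit(12345);
   do i = 1 to 500;
      x = rand('normal');
      y = 0.6*x + rand('normal');
      z = rand('normal') - 0.3*y;
      output;
   end;
   drop i;
run;

proc kde data=sim;
   /* Univariate statistics, bandwidths, covariance, and correlation */
   bivar x y z / unistats bivstats;
run;
```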