When your data contains null values or when internal
processing generates intermediate data sets that contain null values,
you might receive different result sets depending on whether the processing
is done by SAS or by the DBMS. Although in many cases this does not
present a problem, it is important to understand how these differences
occur.
Most relational database
systems have a special value called null, which means an absence of
information and is analogous to a SAS missing value.
SAS/ACCESS translates
SAS missing values to DBMS null values when creating DBMS tables from
within SAS. Conversely,
SAS/ACCESS translates DBMS null values to
SAS missing values when reading DBMS data into SAS.
However, there is an
important difference in the behavior of DBMS null values and SAS missing
values.
-
A DBMS null value is interpreted
as the absence of data, so you cannot sort a DBMS null value or evaluate
it with standard comparison operators.
-
A SAS missing value is interpreted
as its internal floating-point representation because SAS supports
28 missing values (where a period (.) is the most common missing
value). Because SAS supports multiple missing values, you can sort
a SAS missing value and evaluate it with standard comparison operators.
This means that SAS
and the DBMS interpret null values differently, which has significant
implications when
SAS/ACCESS passes queries to a DBMS for processing.
This can be an issue in these situations:
-
when filtering data (for example,
in a WHERE clause, a HAVING clause, or an outer join ON clause). SAS
interprets null values as missing; many DBMS exclude null values from
consideration. For example, if you have null values in a DBMS column
that is used in a WHERE clause, your results might differ depending
on whether the WHERE clause is processed in SAS or is passed to the
DBMS for processing. This is because the DBMS removes null values
from consideration in a WHERE clause, but SAS does not.
-
when using certain functions. For
example, if you use the MIN aggregate function on a DBMS column that
contains null values, the DBMS does not consider the null values,
but SAS interprets the null values as missing. This interpretation
affects the result.
-
when submitting outer joins where
internal processing generates nulls for intermediate result sets.
-
when sorting data. SAS sorts null
values low; most DBMSs sort null values high.
See Sorting DBMS Data for more information.
For example, create
a simple data set that consists of one observation and one variable.
libname myoralib oracle user=testuser password=testpass;
data myoralib.table;
x=.; /*create a missing value */
run;
Then print the data
set using a WHERE clause, which
SAS/ACCESS passes to the DBMS for
processing.
proc print data=myoralib.table;
where x<0;
run;
The log indicates that
no observations were selected by the WHERE clause, because Oracle
interprets the missing value as the absence of data, and does not
evaluate it with the less-than (<) comparison operator.
When there is the potential
for inconsistency, consider using one of these strategies.
-
Use the DIRECT_SQL= LIBNAME
option to control whether SAS or the DBMS handles processing.
-
-
Add the
is not
null expression to WHERE clauses and ON clauses
to ensure that you obtain the same result regardless of whether SAS
or the DBMS does the processing.
Use the NULLCHAR= data
set option to specify how the DBMS interprets missing SAS character
values when updating DBMS data or inserting rows into a DBMS table.
You can use the first
of these strategies to force SAS to process the data in this example.
libname myoralib oracle user=testuser password=testpass
direct_sql=nowhere; /* forces SAS to process WHERE clauses */
data myoralib.table;
x=.; /*create a missing value */
run;
You can then print the
data set using a WHERE clause
proc print data=myoralib.table;
where x<0;
run;
This time the log indicates
that one observation was read from the data set because SAS evaluates
the missing value as satisfying the less-than-zero condition in the
WHERE clause.