Choosing Your Degree of Numeric Precision

Factors That Can Cause Calculation Differences

Different factors affect numeric precision. This issue is common for many people, including SAS users. Though computers and software can help, you are limited in how precisely you can calculate, compare, and represent data. Therefore, only those people who generate and use data can determine the exact degree of precision that suits their enterprise needs.
As you decide the degree of precision that you want, you need to consider that these system factors can cause calculation differences:
  • hardware limitations
  • differences among operating systems
  • different software or different versions of the same software
  • different database management systems (DBMSs)
These factors can also cause differences:
  • the use of finite number sets to represent infinite real numbers
  • how numbers are stored, because storage sizes can vary
You also need to consider how conversions are performed—on, between, or across any of these system or calculation factors.

Examples of Problems That Result in Numeric Imprecision

Overview

Depending on the degree of precision that you want, calculating the value of r can result in a tiny residual in a floating-point unit. When you compare the value of r to 0.0, you might find that r≠0.0. The numbers are very close but not equal. This type of discrepancy in results can stem from problems in representing, rounding, displaying, and selectively extracting data.

Representing Data

Some numbers can be represented exactly, but others cannot. As shown in this example, the number 10.25, which terminates in binary, can be represented exactly.
data x;
   x=10.25;
   put x hex16.;
run;
The output from this DATA step is an exact number: 4024800000000000. However, the number 10.1 cannot be represented exactly, as this example shows.
data x;
   x=10.1;
   put x hex16.;
run;
The output from this DATA step is an inexact number: 4024333333333333.

Rounding Data

As this example shows, rounding errors can result from platform-specific differences. No solution exists for such situations.
data x;
   x=10.1;
   put x hex16.;
   y=100000;
   newx=(x+y)-y;
   put newx hex16.;
 run;
In Windows and Linux environments, the output from this DATA step is 4024333333333333 (8/10-byte hardware double). In the Solaris x64 environment, the output is 4024333333334000 (8/8-byte hardware double).

Displaying Data

For certain numbers such as x.5, the precision of displayed data depends on whether you round up or down. Low-precision formatting (rounding down) can produce different results on different platforms. In this example, the same high-precision (rounding up) result occurs for X=8.3, X=8.5, or X=hex16. However, a different result occurs for X=8.1 because this number does not yield the same level of precision.
data;
   x=input('C047DFFFFFFFFFFF', hex16.);
   put x= 8.1 x= 8.3 x= 8.5 x= hex16.;
run;
Here is the output under Windows or Linux (high-precision formatting).
x=-47.8
x=-47.750 x=-47.7500
x=C047DFFFFFFFFFFF
Here is the output under Solaris x64 (low-precision formatting).
x=-47.7
x=-47.750 x=-47.7500
x=C047DFFFFFFFFFFF
To fix the problem that this example illustrates, you must select a number that yields the next precision level—in this case, 8.2.

Selectively Extracting Data

Results can also vary when you access data that is stored on one system by using a client on a different system. This example illustrates running a DATA step from a Windows client to access SAS data in the z/OS environment.
data z(keep=x);
   x=5.2;
   output;
   y=1000;
   x=(x+y)-y;   /*almost 5.2 */
   output;
run;

proc print data=z;
run;
Here is the output this DATA step produces.
Obs   x
1    5.2
2    5.2
The next example illustrates the output that you receive when you execute the DATA step interactively under Windows or under z/OS.
data z1;
   set z(where=(x=5.2));
run;
Here is the corresponding z/OS output.
NOTE: There were 1 observations read from the data set WORK.Z.
WHERE x=5.2;
NOTE: The data set WORK.Z1 has 1 observations and 1 variables.
The DATA statement used 0.00 CPU seconds and 14476K.
In the above example, the expected count was not returned correctly under z/OS because the imperfection of the data and finite precision are not taken into account. You cannot use equality to obtain a correct count because it does not include the “almost 5.2” cases in that count. To obtain the correct results under z/OS, you must run this DATA step:
data z1;
   set z(where=(compfuzz(x,5.2,1e-10)=0));
run;
Here is the z/OS output from this DATA step.
NOTE: There were 2 observations read from the data set WORK.Z.
WHERE COMPFUZZ(x, 5.2, 1E-10)=0;
NOTE: The data set WORK.Z1 has 2 observations and 1 variables.

Your Options When Choosing the Degree of Precision That You Need

After you determine the degree of precision that your enterprise needs, you can refine your software. You can use macros, sensitivity analyses, or fuzzy comparisons such as extractions or filters to extract data from databases or from different versions of SAS.
If you are running SAS 9.2, use the COMPFUZZ (fuzzy comparison) function. Otherwise, use this macro.
/*****************************************************************************/
/* This macro defines an EQFUZZ operator.  The subsequent DATA step shows    */
/* how to use this operator to test for equality within a certain tolerance. */
/*****************************************************************************/
%macro eqfuzz(var1, var2, fuzz=1e-12);
abs((&var1 - &var2) / &var1) < &fuzz
%mend;

data _null_;
   x=0;
   y=1;
   do i=1 to 10;
      x+0.1;
   end;
   if x=y then put 'x exactly equal to y';
   else if %eqfuzz(x,y) then put 'x close to y';
   else put 'x nowhere close to y';
run;
When you read numbers in from an external DBMS that supports precision beyond 15 digits, you can lose that precision. You cannot do anything about this for existing databases. However, when you design new databases, you can set constraints to limit precision to about 15 digits or you can select a numeric DBMS data type to match the numeric SAS data type. For example, select the BINARY_DOUBLE type in Oracle (precise up to 15 digits) instead of the NUMBER type (precise up to 38 digits).
When you read numbers in from an external DBMS for noncomputational purposes, use the DBSASTYPE= data set option, as shown in this example.
libname ora oracle user=scott password=tiger path=path;
data sasdata;
   set ora.catalina2( dbsastype= ( c1='char(20)') ) ;
run;
This option retrieves numbers as character strings and preserves precision beyond 15 digits. For details, see the DBSASTYPE= data set option.

References

See these resources for more detail about numeric precision, including variables that can affect precision.
  • The Aggregate. 2008. "Numerical Precision, Accuracy, and Range," Aggregate.org: Unbridled Computing. Lexington, KY: University of Kentucky. Available at http://aggregate.org/NPAR. Accessed on July 30, 2012.
  • IEEE. 2008. “IEEE 754: Standard for Binary Floating-Point Arithmetic.” Available at http://grouper.ieee.org/groups/754/index.html. This standard defines 32-bit and 64-bit floating-point representations and computational results. Accessed on July 30, 2012.
  • SAS Institute Inc. 2007. TS-230. Dealing with Numeric Representation Error in SAS Applications. Cary, NC: SAS Institute Inc. Available at http://support.sas.com/techsup/technote/ts230.html. Accessed on July 30, 2012.
  • SAS Institute Inc. 2007. TS-654. Numeric Precision 101. Cary, NC: SAS Institute Inc. Available at http://support.sas.com/techsup/technote/ts654.pdf. This document is an overview of numeric precision and how it is represented in SAS applications. Accessed on July 30, 2012.