1. Overview
When working with floating-point numbers, we often encounter rounding errors, commonly referred to as the double precision issue.
In this short tutorial, we’ll learn what causes such a problem, how it affects our code, and how to deal with it.
2. Floating-Point Numbers
Before we dive in, let’s briefly discuss how floating-point numbers work. Computers typically represent them using the IEEE 754 standard, which defines how real numbers are encoded in a binary format.
Because this representation is binary and finite, it can’t always represent decimal fractions exactly. Java provides two primitive types for floating-point numbers: float and double. Both have finite precision: 32 bits for float and 64 bits for double.
According to the standard, the representation of a double-precision data type consists of three parts:
- Sign bit – contains the sign of the number (1 bit)
- Exponent – controls the scale of the number (11 bits)
- Fraction (Mantissa) – contains the significant digits of the number (52 bits)
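To make the three parts concrete, here’s a minimal sketch that splits the raw bits of a double into its sign, exponent, and fraction fields. The class name DoubleBits and its helper methods are hypothetical names chosen for illustration; only Double.doubleToLongBits() and Long.toBinaryString() come from the JDK:

```java
class DoubleBits {

    // Returns the 64 raw IEEE 754 bits of a double as a zero-padded binary string.
    static String toBinary(double value) {
        long bits = Double.doubleToLongBits(value);
        // Long.toBinaryString drops leading zeros, so we pad back to 64 characters.
        return String.format("%64s", Long.toBinaryString(bits)).replace(' ', '0');
    }

    // Bit 0 of the string: the sign.
    static String sign(double value) {
        return toBinary(value).substring(0, 1);
    }

    // The next 11 bits: the exponent.
    static String exponent(double value) {
        return toBinary(value).substring(1, 12);
    }

    // The remaining 52 bits: the fraction (mantissa).
    static String fraction(double value) {
        return toBinary(value).substring(12);
    }

    public static void main(String[] args) {
        double value = 0.1;
        System.out.println(sign(value) + " - " + exponent(value) + " - " + fraction(value));
    }
}
```

We can pass any double to these helpers to inspect how the value is laid out in memory.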
3. The Double Precision Issue
Now, to understand the double precision issue, let’s perform a simple addition of two decimal numbers:
double first = 0.1;
double second = 0.2;
double result = first + second;
Using basic math, we’d expect 0.3 as the result. However, if we run the code, we see the actual result is different:
assertNotEquals(0.3, result);
assertEquals(0.30000000000000004, result);
The issue behind this rounding error lies in the binary representation of the floating-point numbers.
Since we have a fixed number of bits, some decimal numbers, such as 0.1, can’t be represented exactly in a binary format.
As an example, let’s write the 0.1 value using the IEEE 754 standard. We can use tools such as Float Exposed, Float Toy, or IEEE 754 visualization to see what the binary format of the value looks like.
Here’s the number 0.1 converted from the decimal system to IEEE 754 binary:
0 - 01111111011 - 1001100110011001100110011001100110011001100110011010
Here, we see the “0011” sequence repeats in the mantissa part of the value. In exact binary, 0.1 is the repeating fraction 0.000110011001100..., so this pattern would continue forever. The last four bits are 1010 rather than 1001 because the value is rounded to the nearest number that fits into the 52 available fraction bits.
Unfortunately, we can’t store an infinite sequence of digits. Therefore, the number must be rounded to fit into its finite binary representation.
Consequently, calculations operate on these rounded approximations rather than on the exact decimal values, and the result of each operation may be rounded again. As a result, we see rounding errors during arithmetic computations.
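To see the approximation that’s actually stored, we can convert a double with the BigDecimal(double) constructor, which preserves the exact binary value instead of the shortened decimal form we usually see printed. The following is a small sketch; the class name StoredValue and the method exact() are hypothetical names used for illustration:

```java
import java.math.BigDecimal;

class StoredValue {

    // new BigDecimal(double) converts the stored binary value exactly,
    // so the result exposes the approximation hidden behind the literal.
    static BigDecimal exact(double value) {
        return new BigDecimal(value);
    }

    public static void main(String[] args) {
        // prints 0.1000000000000000055511151231257827021181583404541015625
        System.out.println(exact(0.1));

        // The stored sum and the stored 0.3 are two different approximations.
        System.out.println(exact(0.1 + 0.2));
        System.out.println(exact(0.3));
    }
}
```

The stored value of 0.1 is slightly larger than one tenth, and the sum 0.1 + 0.2 ends up as a different double than the literal 0.3, which is why the equality check fails.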
It’s important to note that not all floating-point values produce rounding errors. Values with a finite binary representation that fits into the available bits, such as 0.5, 0.25, or 0.75, are stored exactly.
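As a quick illustration, the sketch below compares a sum of values that are powers of two (and therefore exact in binary) with the sum from the earlier example. The class name ExactBinaryFractions and the method sumIsExact() are hypothetical names used for illustration:

```java
class ExactBinaryFractions {

    // Returns true when a + b, computed in double arithmetic, equals the expected double exactly.
    static boolean sumIsExact(double a, double b, double expected) {
        return a + b == expected;
    }

    public static void main(String[] args) {
        // 0.5 = 1/2 and 0.25 = 1/4 have finite binary representations.
        System.out.println(sumIsExact(0.5, 0.25, 0.75)); // prints true

        // 0.1 and 0.2 don't, so the sum differs from the double closest to 0.3.
        System.out.println(sumIsExact(0.1, 0.2, 0.3));   // prints false
    }
}
```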
4. Dealing With the Double Precision Problem
We can avoid the double precision issue by using classes such as BigDecimal, which performs arbitrary-precision decimal arithmetic instead of binary floating-point arithmetic.
Now, let’s perform the same addition, but this time using BigDecimal instead of the double type:
BigDecimal first = BigDecimal.valueOf(0.1);
BigDecimal second = BigDecimal.valueOf(0.2);
BigDecimal result = first.add(second);
assertEquals(BigDecimal.valueOf(0.3), result);
Here, as opposed to the previous example, we get the expected result of 0.3.
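How we create the BigDecimal matters. BigDecimal.valueOf(double) and the String constructor start from the short decimal form, whereas the BigDecimal(double) constructor copies the exact binary value, rounding error included. The sketch below contrasts the two approaches; the class name BigDecimalConstruction and its methods are hypothetical names used for illustration:

```java
import java.math.BigDecimal;

class BigDecimalConstruction {

    // Built from decimal strings: the operands are exactly 0.1 and 0.2.
    static BigDecimal sumFromStrings() {
        return new BigDecimal("0.1").add(new BigDecimal("0.2"));
    }

    // Built from doubles via the constructor: the operands carry the binary approximation.
    static BigDecimal sumFromDoubleConstructor() {
        return new BigDecimal(0.1).add(new BigDecimal(0.2));
    }

    public static void main(String[] args) {
        BigDecimal expected = new BigDecimal("0.3");

        System.out.println(sumFromStrings().equals(expected));                    // prints true
        System.out.println(sumFromDoubleConstructor().compareTo(expected) == 0);  // prints false

        // equals() also compares the scale, so compareTo() is the safer numeric check.
        System.out.println(new BigDecimal("0.30").equals(expected));              // prints false
        System.out.println(new BigDecimal("0.30").compareTo(expected) == 0);      // prints true
    }
}
```

For this reason, it’s generally safer to create BigDecimal values from strings or with valueOf(), and to compare them with compareTo() when the scale may differ.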
5. Conclusion
In this short article, we learned what the double precision issue is and how to deal with it.
To sum up, rounding errors occur because the binary formats defined by the IEEE 754 standard can’t represent most decimal fractions exactly. When we need exact decimal results, we can use types like BigDecimal, which provides arbitrary-precision decimal arithmetic.