1. Overview

Since the dawn of Java, all numerical data types have been signed. The only exception is char, a 16-bit unsigned type that is meant for characters rather than arithmetic. In many situations, however, we need unsigned values. For example, if we count the number of occurrences of an event, we don’t want to encounter a negative value.

Support for unsigned arithmetic has finally been part of the JDK since version 8. It comes in the form of the Unsigned Integer API, which consists primarily of static methods in the Integer and Long classes.

In this tutorial, we’ll go over this API and give instructions on how to use unsigned numbers correctly.

2. Bit-Level Representations

To understand how to handle signed and unsigned numbers, let’s take a look at their representation at the bit level first.

In Java, numbers are encoded using the two’s complement system. This encoding implements many basic arithmetic operations, including addition, subtraction, and multiplication, in the same way, whether the operands are signed or unsigned.

Things should be clearer with a code example. For the sake of simplicity, we’ll use variables of the byte primitive data type. Operations are similar for other integral numerical types, such as short, int, or long.

Assume we have a variable of type byte with the value of 100. This number has the binary representation 0110_0100.

Let’s double this value:

byte b1 = 100;
byte b2 = (byte) (b1 << 1);

The left shift operator in the given code moves all the bits in variable b1 one position to the left, which doubles the value when the bits are read as an unsigned number. The binary representation of variable b2 will then be 1100_1000.

In an unsigned type system, this value represents a decimal number equivalent to 2^7 + 2^6 + 2^3, or 200. Nevertheless, in a signed system, the left-most bit works as the sign bit. Therefore, the result is -2^7 + 2^6 + 2^3, or -56.

A quick test can verify the outcome:

assertEquals(-56, b2);

We can see that the computations of signed and unsigned numbers are the same. Differences only appear when the JVM interprets a binary representation as a decimal number.
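To make the two interpretations visible side by side, here’s a minimal sketch that reads the same byte both ways. The class name UnsignedByteView and the helper doubled are made up for illustration; Byte.toUnsignedInt is a JDK 8 method that widens a byte to an int without sign extension:

```java
class UnsignedByteView {

    // Doubles a byte with a left shift, keeping only the low 8 bits
    static byte doubled(byte value) {
        return (byte) (value << 1);
    }

    public static void main(String[] args) {
        byte b2 = doubled((byte) 100);

        System.out.println(b2);                     // signed view: -56
        System.out.println(Byte.toUnsignedInt(b2)); // unsigned view: 200
        System.out.println(b2 & 0xFF);              // same unsigned view via masking: 200
    }
}
```

The bits stored in b2 never change. Only the way we read them does.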

The addition, subtraction, and multiplication operations can work with unsigned numbers without requiring any changes in the JDK. Other operations, such as comparison or division, handle signed and unsigned numbers differently.

This is where the Unsigned Integer API comes into play.
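Before moving on, we can check the claim about addition, subtraction, and multiplication with a small sketch. The class and method names below are hypothetical. The sketch only borrows parseUnsignedInt and toUnsignedString from the API to get unsigned values in and out of int variables, while the arithmetic itself uses the ordinary operators:

```java
class SameBitsArithmetic {

    // Adds two unsigned decimal strings with the plain + operator
    static String unsignedSum(String x, String y) {
        int a = Integer.parseUnsignedInt(x);
        int b = Integer.parseUnsignedInt(y);
        return Integer.toUnsignedString(a + b);
    }

    // Subtracts with the plain - operator
    static String unsignedDifference(String x, String y) {
        int a = Integer.parseUnsignedInt(x);
        int b = Integer.parseUnsignedInt(y);
        return Integer.toUnsignedString(a - b);
    }

    // Multiplies with the plain * operator
    static String unsignedProduct(String x, String y) {
        int a = Integer.parseUnsignedInt(x);
        int b = Integer.parseUnsignedInt(y);
        return Integer.toUnsignedString(a * b);
    }

    public static void main(String[] args) {
        System.out.println(unsignedSum("3000000000", "1000000000"));        // 4000000000
        System.out.println(unsignedDifference("3000000000", "1000000000")); // 2000000000
        System.out.println(unsignedProduct("2000000000", "2"));             // 4000000000
    }
}
```

Note that the results still wrap around modulo 2^32, just as signed results do, so this only works while the true result fits in 32 bits.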

3. The Unsigned Integer API

The Unsigned Integer API provides support for unsigned integer arithmetic in Java 8. Most members of this API are static methods in the Integer and Long classes.

Methods in these classes work similarly. We’ll thus focus on the Integer class only, leaving out the Long class for brevity.
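For reference, here’s a brief sketch of the Long counterparts, so that the similarity is easy to see. The class name UnsignedLongDemo and its helper are illustrative only; the Long methods it calls are the ones added in JDK 8:

```java
class UnsignedLongDemo {

    // True when x is smaller than y under the unsigned interpretation
    static boolean unsignedLess(long x, long y) {
        return Long.compareUnsigned(x, y) < 0;
    }

    public static void main(String[] args) {
        long max = Long.MAX_VALUE;
        long min = Long.MIN_VALUE;

        System.out.println(unsignedLess(max, min));                        // true
        System.out.println(Long.divideUnsigned(min, max));                 // 1
        System.out.println(Long.remainderUnsigned(min, max));              // 1
        System.out.println(Long.toUnsignedString(min));                    // 9223372036854775808
        System.out.println(Long.parseUnsignedLong("18446744073709551615")); // -1
    }
}
```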

3.1. Comparison

The Integer class defines a method named compareUnsigned to compare unsigned numbers. This method considers all binary values unsigned, ignoring the notion of the sign bit.

Let’s start with two numbers at the boundaries of the int data type:

int positive = Integer.MAX_VALUE;
int negative = Integer.MIN_VALUE;

If we compare these numbers as signed values, positive is obviously greater than negative:

int signedComparison = Integer.compare(positive, negative);
assertEquals(1, signedComparison);

When comparing numbers as unsigned values, the left-most bit is considered the most significant bit instead of the sign bit. Thus, the result is different, with positive being smaller than negative:

int unsignedComparison = Integer.compareUnsigned(positive, negative);
assertEquals(-1, unsignedComparison);

It should be clearer if we take a look at the binary representation of those numbers:

  • MAX_VALUE -> 0111_1111_…_1111
  • MIN_VALUE -> 1000_0000_…_0000

When the left-most bit is a regular value bit, MIN_VALUE is one unit larger than MAX_VALUE in the binary system. This test confirms that:

assertEquals(negative, positive + 1);
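Since compareUnsigned has the shape of a comparator, one possible use is sorting values by their unsigned order. The following is only a sketch with a hypothetical class and helper method, not something taken from the article’s source code:

```java
import java.util.Arrays;

class UnsignedSort {

    // Returns a copy of the input sorted by unsigned magnitude
    static Integer[] sortUnsigned(Integer[] values) {
        Integer[] copy = Arrays.copyOf(values, values.length);
        Arrays.sort(copy, Integer::compareUnsigned);
        return copy;
    }

    public static void main(String[] args) {
        Integer[] values = {-1, 0, Integer.MIN_VALUE, Integer.MAX_VALUE, 1};

        // Prints 0, 1, 2147483647, 2147483648, 4294967295 on separate lines
        for (Integer value : sortUnsigned(values)) {
            System.out.println(Integer.toUnsignedString(value));
        }
    }
}
```

Values with the left-most bit set, which are negative in the signed view, end up at the end of the array.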

3.2. Division and Modulo

Just like the comparison operation, the unsigned division and modulo operations process all bits as value bits. The quotients and remainders are therefore different when we perform these operations on signed and unsigned numbers:

int positive = Integer.MAX_VALUE;
int negative = Integer.MIN_VALUE;

assertEquals(-1, negative / positive);
assertEquals(1, Integer.divideUnsigned(negative, positive));

assertEquals(-1, negative % positive);
assertEquals(1, Integer.remainderUnsigned(negative, positive));
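One way to convince ourselves of these results is to widen both operands to long with Integer.toUnsignedLong and use the ordinary operators. The sketch below assumes a made-up class UnsignedDivisionCheck and compares both approaches for -1, which is 4294967295 in the unsigned view:

```java
class UnsignedDivisionCheck {

    // Unsigned quotient computed with plain long division after widening
    static long quotientViaLong(int dividend, int divisor) {
        return Integer.toUnsignedLong(dividend) / Integer.toUnsignedLong(divisor);
    }

    // Unsigned remainder computed with the plain long remainder operator after widening
    static long remainderViaLong(int dividend, int divisor) {
        return Integer.toUnsignedLong(dividend) % Integer.toUnsignedLong(divisor);
    }

    public static void main(String[] args) {
        System.out.println(-1 / 2);                           // signed: 0
        System.out.println(Integer.divideUnsigned(-1, 2));    // unsigned: 2147483647
        System.out.println(quotientViaLong(-1, 2));           // 2147483647

        System.out.println(-1 % 2);                           // signed: -1
        System.out.println(Integer.remainderUnsigned(-1, 2)); // unsigned: 1
        System.out.println(remainderViaLong(-1, 2));          // 1
    }
}
```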

3.3. Parsing

When parsing a String using the parseUnsignedInt method, the text argument can represent a number greater than MAX_VALUE.

A value that large cannot be parsed with the parseInt method, which can only handle textual representations of numbers from MIN_VALUE to MAX_VALUE.

The following test case verifies the parsing results:

Throwable thrown = catchThrowable(() -> Integer.parseInt("2147483648"));
assertThat(thrown).isInstanceOf(NumberFormatException.class);

assertEquals(Integer.MAX_VALUE + 1, Integer.parseUnsignedInt("2147483648"));

Notice that the parseUnsignedInt method can parse a string indicating a number larger than MAX_VALUE, but will fail to parse any string with a leading minus sign.
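The limits of parseUnsignedInt can be probed with a short sketch. The class UnsignedParsing and its parses helper are hypothetical names; the overload that takes a radix is part of the JDK:

```java
class UnsignedParsing {

    // Reports whether the text is accepted by Integer.parseUnsignedInt
    static boolean parses(String text) {
        try {
            Integer.parseUnsignedInt(text);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(parses("4294967295")); // true: the largest 32-bit unsigned value
        System.out.println(parses("4294967296")); // false: needs 33 bits
        System.out.println(parses("-1"));         // false: a minus sign is rejected

        // The radix overload reads the same bit pattern from hexadecimal text
        System.out.println(Integer.parseUnsignedInt("ffffffff", 16)); // -1
    }
}
```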

3.4. Formatting

Similar to parsing, when formatting a number, an unsigned operation regards all bits as value bits. Consequently, we can produce the textual representation of a number about twice as large as MAX_VALUE.

The following test case confirms the formatting result of MIN_VALUE in both cases — signed and unsigned:

String signedString = Integer.toString(Integer.MIN_VALUE);
assertEquals("-2147483648", signedString);

String unsignedString = Integer.toUnsignedString(Integer.MIN_VALUE);
assertEquals("2147483648", unsignedString);
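Two related members are worth a quick look as well: the toUnsignedString overload that accepts a radix, and toUnsignedLong, which widens an int without sign extension. The class name UnsignedFormatting and its helper are made up for this sketch:

```java
class UnsignedFormatting {

    // Formats the bits of an int as an unsigned hexadecimal string
    static String toUnsignedHex(int value) {
        return Integer.toUnsignedString(value, 16);
    }

    public static void main(String[] args) {
        System.out.println(Integer.toString(-1));         // -1
        System.out.println(Integer.toUnsignedString(-1)); // 4294967295
        System.out.println(Integer.toString(-1, 16));     // -1
        System.out.println(toUnsignedHex(-1));            // ffffffff
        System.out.println(Integer.toUnsignedLong(-1));   // 4294967295
    }
}
```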

4. Pros and Cons

Many developers, especially those coming from a language that supports unsigned data types, such as C, welcome the introduction of unsigned arithmetic operations. However, this isn’t necessarily a good thing.

There are two main reasons for the demand for unsigned numbers.

First, there are cases for which a negative value can never occur, and using an unsigned type can prevent such a value in the first place. Second, with an unsigned type, we can double the range of usable positive values compared to its signed counterpart.

Let’s analyze the rationale behind the appeal for unsigned numbers.

Even when a variable should always be non-negative, a value less than 0 can be handy for indicating an exceptional situation.

For instance, the String.indexOf method returns the position of the first occurrence of a certain character in a string. It returns -1 to denote the absence of such a character.

The other reason for unsigned numbers is the expansion of the value space. However, if the range of a signed type isn’t enough, it’s unlikely that a doubled range would suffice.

If a data type isn’t large enough, we need to use another data type that supports much larger values, such as long instead of int, or BigInteger rather than long.

Another problem with the Unsigned Integer API is that the binary form of a number is the same regardless of whether it’s signed or unsigned. It’s therefore easy to mix signed and unsigned values, which may lead to unexpected results.
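Here’s a minimal sketch of how such a mix-up could look. The class MixedSignPitfall and both helper methods are hypothetical; the point is only that the ordinary > operator silently applies the signed interpretation:

```java
class MixedSignPitfall {

    // Signed check: wrong if value holds an unsigned quantity
    static boolean exceedsSigned(int value, int limit) {
        return value > limit;
    }

    // Unsigned check: treats every bit of both operands as a value bit
    static boolean exceedsUnsigned(int value, int limit) {
        return Integer.compareUnsigned(value, limit) > 0;
    }

    public static void main(String[] args) {
        // Stored as a negative int, although it stands for 3000000000
        int size = Integer.parseUnsignedInt("3000000000");

        System.out.println(exceedsSigned(size, 1000));   // false
        System.out.println(exceedsUnsigned(size, 1000)); // true
    }
}
```

Nothing in the type system tells the two kinds of int apart, so the compiler can’t flag the first check.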

5. Conclusion

Support for unsigned arithmetic in Java came at the request of many people. However, the benefits it brings are unclear. We should exercise caution when using this new feature to avoid unexpected outcomes.

As always, the source code for this article is available over on GitHub.

James Roper
Guest

I think one of the biggest benefits is working with binary protocols that contain unsigned integers. If you have to parse a 32 bit unsigned int, you can do it by reading a regular int (make sure to get the endianness right!), but then if you have to use it, eg if you have to do a comparison, what option do you have? You have to special case negatives. Having unsigned operations allows easy working with types prescribed by binary protocols and formats in their native type.
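The scenario in this comment could be sketched roughly as follows. Everything here is an assumption made for illustration: the class UnsignedFieldReader, the readUInt32 helper, the big-endian byte order, and the sample bytes don’t come from any particular protocol:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class UnsignedFieldReader {

    // Reads a 32-bit field, assumed big-endian, keeping its bits in a regular int
    static int readUInt32(byte[] data) {
        return ByteBuffer.wrap(data).order(ByteOrder.BIG_ENDIAN).getInt();
    }

    public static void main(String[] args) {
        // Hypothetical wire bytes encoding the unsigned value 4294967294
        byte[] wire = {(byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFE};
        int length = readUInt32(wire);

        System.out.println(length);                                     // signed view: -2
        System.out.println(Integer.toUnsignedLong(length));             // 4294967294
        System.out.println(Integer.compareUnsigned(length, 1024) > 0);  // true
    }
}
```

With compareUnsigned, the comparison needs no special case for values that look negative in the signed view.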