1. Introduction
When working with Google’s Protocol Buffers (Protobuf) in Java, we inevitably encounter the need to handle binary data. This often leads to a choice between the standard byte[] and Protobuf’s custom ByteString class. While both represent a sequence of bytes, they have fundamental differences in their design and intended use.
In this article, we’ll explore the characteristics of both types, highlight their key differences with code examples, and provide guidance on when to use each for optimal performance and maintainability.
2. Defining Maven Dependencies
To start, we’ll need to include the protobuf-java dependency in our project:
<dependency>
<groupId>com.google.protobuf</groupId>
<artifactId>protobuf-java</artifactId>
<version>4.31.1</version>
</dependency>
This dependency provides access to the ByteString class and the necessary Protobuf APIs.
3. Understanding byte[]
The byte[] is a core Java data structure for representing a sequence of raw bytes. Its primary characteristic is mutability. This allows us to modify its elements directly after creation, which is essential for tasks like building a buffer to read data from a stream.
Let’s illustrate its mutable nature with a simple example test. We’ll define a byte array and then replace an element in it:
@Test
public void givenByteArray_whenModified_thenChangesPersist() {
    // Here, we'll initialize a mutable buffer
    byte[] data = new byte[4];
    // We'll read data into the buffer
    ByteArrayInputStream inputStream = new ByteArrayInputStream(new byte[] { 0x01, 0x02, 0x03, 0x04 });
    try {
        inputStream.read(data);
    } catch (IOException e) {
        e.printStackTrace();
    }
    // Note, the first byte is 1
    assertEquals(1, data[0]);
    // We can directly modify the first byte
    data[0] = 0x05;
    // The modification is persisted
    assertEquals(5, data[0]);
}
As the test shows, a byte[] can be changed in place, making it a flexible choice for scenarios where we need to manipulate the contents of a buffer.
4. Understanding ByteString
ByteString is a class provided by the Protobuf library for handling sequences of bytes. Unlike byte[], ByteString is immutable. Once created, its contents cannot be altered, which is similar to how the String class works in Java.
This immutability offers several advantages. The first is thread safety: an immutable object can be shared across multiple threads without synchronization.
The second is efficiency: operations like substring() and concat() are optimized to avoid unnecessary copying. Instead of duplicating all the data, these methods often create new ByteString objects that share a reference to the original data, which saves both memory and time.
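To make these two operations concrete, here is a minimal sketch using substring() and concat(). The class and method names (ByteStringSharingSketch, slice, join) are hypothetical and exist only for illustration. Whether the underlying bytes are actually shared or copied is an internal detail of the library and may depend on the sizes involved, so the sketch only demonstrates the observable behavior:

```java
import com.google.protobuf.ByteString;

class ByteStringSharingSketch {

    // Returns the bytes in [start, end) as a new ByteString; the source is untouched
    static ByteString slice(ByteString source, int start, int end) {
        return source.substring(start, end);
    }

    // Joins two ByteStrings into a new one; neither input is modified
    static ByteString join(ByteString first, ByteString second) {
        return first.concat(second);
    }

    public static void main(String[] args) {
        ByteString hello = ByteString.copyFromUtf8("Hello, ");
        ByteString world = ByteString.copyFromUtf8("World");

        ByteString joined = join(hello, world);
        ByteString tail = slice(joined, 7, 12);

        System.out.println(joined.toStringUtf8()); // prints Hello, World
        System.out.println(tail.toStringUtf8());   // prints World
        System.out.println(hello.size());          // prints 7, the input is unchanged
    }
}
```

Because every operation returns a new instance, the original hello and world values can still be shared freely after the calls.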
Let’s look at the immutability of ByteString:
@Test
public void givenByteString_whenCreated_thenIsImmutable() {
    // We'll create an immutable ByteString from a mutable byte array
    byte[] originalArray = new byte[] { 0x01, 0x02, 0x03, 0x04 };
    ByteString byteString = ByteString.copyFrom(originalArray);
    // The value of the first byte is 1
    assertEquals(1, byteString.byteAt(0));
    // We'll try to modify the original array
    originalArray[0] = 0x05;
    // The ByteString's contents remain unchanged
    assertEquals(1, byteString.byteAt(0));
}
The test confirms that even if the source byte[] is modified, the ByteString remains unchanged. This behavior is key to its reliability within Protobuf.
5. Key Differences
The contrasting natures of byte[] and ByteString lead to key differences that influence our design decisions.
5.1. Mutability vs. Immutability
This is the most fundamental difference. byte[] is mutable, making it ideal for data that needs to be modified in place, such as in-memory buffers or during stream processing.
In contrast, ByteString is immutable, which ensures data integrity and thread-safety. This makes it the perfect choice for persistent or shared data, especially within the context of a message format.
5.2. Performance
For simple read/write operations, performance is similar. However, ByteString demonstrates its true efficiency during more complex operations like concatenation.
To concatenate two byte[] arrays, we must create a new, larger array and copy all the data, which can be an expensive operation. ByteString’s concat() method is highly optimized, often creating a new instance that references both original objects without performing a full data copy, which significantly reduces memory allocations.
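The array-side cost described above can be sketched with plain JDK calls. The class and method names (ByteArrayConcatSketch, concat) are hypothetical; the point is only that every concatenation allocates a fresh array and copies both inputs into it:

```java
import java.util.Arrays;

class ByteArrayConcatSketch {

    // Allocates a new array sized for both inputs and copies each of them into it
    static byte[] concat(byte[] first, byte[] second) {
        byte[] result = Arrays.copyOf(first, first.length + second.length);
        System.arraycopy(second, 0, result, first.length, second.length);
        return result;
    }

    public static void main(String[] args) {
        byte[] joined = concat(new byte[] { 1, 2 }, new byte[] { 3, 4, 5 });
        System.out.println(Arrays.toString(joined)); // prints [1, 2, 3, 4, 5]
    }
}
```

With arrays, the work grows with the total number of bytes on every call, which is the overhead that ByteString’s concat() tries to avoid for larger inputs.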
5.3. API and Protobuf Integration
byte[] has a minimal API, so most complex operations require custom logic. ByteString, on the other hand, offers a rich API tailored for binary data, including methods like startsWith(), endsWith(), and substring().
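As an illustration of the custom logic a byte[] requires, here is a hand-written prefix check, which is what ByteString’s startsWith() provides out of the box. The class and method names (ByteArrayPrefixSketch, startsWith) are hypothetical and the sketch uses only the JDK:

```java
class ByteArrayPrefixSketch {

    // Returns true if data begins with the bytes of prefix
    static boolean startsWith(byte[] data, byte[] prefix) {
        if (prefix.length > data.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] header = { (byte) 0x89, 'P', 'N', 'G' };
        byte[] file = { (byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A };
        System.out.println(startsWith(file, header)); // prints true
        System.out.println(startsWith(header, file)); // prints false
    }
}
```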
Most importantly, ByteString is the native type for the bytes fields within Protobuf messages. It ensures seamless and efficient serialization and deserialization. We can see this by looking at a simple Protobuf definition:
message UserData {
    string name = 1;
    bytes profile_image = 2;
}
The generated Java class will represent the profile_image field as a ByteString, not a byte[]. This integration is a core part of Protobuf’s design.
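Since the generated UserData class only exists after running the Protobuf compiler, the sketch below uses BytesValue instead, a wrapper message that ships with protobuf-java and has a single bytes field named value. The class and method names (BytesFieldSketch, roundTrip) are hypothetical. For UserData, the equivalent accessors would presumably be setProfileImage() and getProfileImage(), following the usual generated-code naming, but that is an assumption about code not shown here:

```java
import com.google.protobuf.ByteString;
import com.google.protobuf.BytesValue;
import com.google.protobuf.InvalidProtocolBufferException;

class BytesFieldSketch {

    // Stores raw bytes in a message, serializes it, and reads the field back after parsing
    static ByteString roundTrip(byte[] raw) {
        BytesValue message = BytesValue.newBuilder()
          .setValue(ByteString.copyFrom(raw)) // the setter takes a ByteString, not a byte[]
          .build();
        byte[] wire = message.toByteArray();
        try {
            return BytesValue.parseFrom(wire).getValue(); // the getter returns a ByteString
        } catch (InvalidProtocolBufferException e) {
            throw new IllegalStateException("Unexpected parse failure", e);
        }
    }

    public static void main(String[] args) {
        ByteString value = roundTrip(new byte[] { 0x01, 0x02, 0x03 });
        System.out.println(value.size()); // prints 3
    }
}
```

Note that the message API never exposes a byte[] for the field itself; a byte[] only appears at the boundaries, when we copy data in or serialize the whole message.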
6. Conversion Between Types
In practice, we’ll often need to convert between byte[] and ByteString when interoperating with standard Java APIs.
6.1. byte[] to ByteString
To convert a byte[] to a ByteString, we use the static ByteString.copyFrom() method. This operation creates a new ByteString and copies the data, ensuring the new instance’s immutability:
@Test
public void givenByteArray_whenCopiedToByteString_thenDataIsCopied() {
    // We'll start with a mutable byte array
    byte[] byteArray = new byte[] { 0x01, 0x02, 0x03 };
    // Create a new ByteString from it
    ByteString byteString = ByteString.copyFrom(byteArray);
    // We'll assert that the data is the same
    assertEquals(byteArray[0], byteString.byteAt(0));
    // Here, we change the original array
    byteArray[0] = 0x05;
    // Note, the ByteString remains unchanged, confirming the copy
    assertEquals(1, byteString.byteAt(0));
    assertNotSame(byteArray, byteString.toByteArray());
}
6.2. ByteString to byte[]
The conversion in the other direction uses the toByteArray() method. This method returns a new byte[] instance with a copy of the ByteString’s data:
@Test
public void givenByteString_whenConvertedToByteArray_thenDataIsCopied() {
    // We'll start with an immutable ByteString
    ByteString byteString = ByteString.copyFromUtf8("Baeldung");
    // Create a mutable byte array from it
    byte[] byteArray = byteString.toByteArray();
    // Here, the byte array now has a copy of the data
    assertEquals('B', (char) byteArray[0]);
    // We'll change the new array
    byteArray[0] = 'X';
    // Note, the original ByteString remains unchanged
    assertEquals('B', (char) byteString.byteAt(0));
    assertNotSame(byteArray, byteString.toByteArray());
}
It’s essential to note that both conversions involve a complete data copy, which can introduce overhead for large byte sequences.
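When we only need to read the bytes, one option worth considering is asReadOnlyByteBuffer(), which exposes the contents as a read-only java.nio.ByteBuffer rather than a fresh byte[]. The sketch below uses hypothetical names (ReadOnlyViewSketch, sum). Whether this call avoids a copy depends on how the ByteString is represented internally, so it should be treated as an assumption to measure rather than a guarantee:

```java
import com.google.protobuf.ByteString;
import java.nio.ByteBuffer;

class ReadOnlyViewSketch {

    // Sums the bytes of a ByteString by iterating a read-only buffer instead of calling toByteArray()
    static int sum(ByteString source) {
        ByteBuffer view = source.asReadOnlyByteBuffer();
        int total = 0;
        while (view.hasRemaining()) {
            total += view.get();
        }
        return total;
    }

    public static void main(String[] args) {
        ByteString data = ByteString.copyFrom(new byte[] { 1, 2, 3 });
        System.out.println(sum(data)); // prints 6
        System.out.println(data.asReadOnlyByteBuffer().isReadOnly()); // prints true
    }
}
```

The buffer rejects writes, so the immutability of the ByteString is preserved even though no byte[] copy is handed out.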
7. Conclusion
In this article, we explored the fundamental differences between byte[] and ByteString, starting with the mutable nature of byte[] and its use in low-level stream operations, followed by the immutability of ByteString. We then examined the key differences in performance and API, and finally saw how to convert between the two types.
Ultimately, the choice between them comes down to a simple principle: we use byte[] for mutable, general-purpose buffers, and we use ByteString as the default for all binary data in our Protobuf messages.
The code backing this article is available on GitHub.