1. Overview

The Java programming language provides arrays and collections to group objects together. Often, a collection is backed by an array and modeled with a set of methods to process the elements it contains.

While developing software, it’s quite common to use both of these data structures. Hence, programmers need a bridging mechanism to convert these elements from one form to another. The asList method from the Arrays class and the Collection interface’s toArray method form this bridge.
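For instance, here’s a minimal illustration of this bridge in both directions:

Integer[] numbers = { 1, 2, 3 };

// array -> collection
List<Integer> numberList = Arrays.asList(numbers);

// collection -> array
Object[] numbersAgain = numberList.toArray();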

In this tutorial, we’ll do an in-depth analysis of an interesting argument: which toArray method to use and why? We’ll also use JMH-assisted benchmarking to support these arguments.

2. The toArray Rabbit Hole

Before aimlessly invoking the toArray method, let’s understand what’s inside the box. The Collection interface offers two methods to transform a collection into an array:

Object[] toArray()

<T> T[] toArray(T[] a)

Both methods return an array containing all elements of the collection. To demonstrate this, let’s create a list of natural numbers:

List<Integer> naturalNumbers = IntStream
    .range(1, 10000)
    .boxed()
    .collect(Collectors.toList());

2.1. Collection.toArray()

The toArray() method allocates a new in-memory array with a length equal to the size of the collection. Internally, it invokes Arrays.copyOf on the underlying array backing the collection. Therefore, the collection keeps no reference to the returned array, and it’s safe to use and modify:

Object[] naturalNumbersArray = naturalNumbers.toArray();

However, we cannot merely cast the result into an Integer[]. Doing so will result in a ClassCastException.
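For example, the following cast compiles but fails at runtime, because the runtime type of the returned array is Object[]:

// throws ClassCastException
Integer[] integers = (Integer[]) naturalNumbers.toArray();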

2.2. <T> T[] Collection.toArray(T[] a)

Unlike the non-parameterized method, this one accepts a pre-allocated array as an argument. Additionally, the use of generics in the method’s definition mandates the same type for the input and the returned array. This also solves the casting problem we observed earlier with Object[].

This variant works distinctively based on the size of the input array:

  • If the length of the pre-allocated array is less than the collection’s size, a new array of the required length and the same type is allocated:
Integer[] naturalNumbersArray = naturalNumbers.toArray(new Integer[0]);
  • If the input array is large enough to contain the collection’s elements, it’s returned with those elements inside:
Integer[] naturalNumbersArray = naturalNumbers.toArray(new Integer[naturalNumbers.size()]);

Now, let’s switch back to the original question of selecting the faster and better-performing candidate.

3. Performance Trials

Let’s begin with a simple experiment that compares the zero-sized (toArray(new T[0])) and the pre-sized (toArray(new T[size])) variants. We’ll use the popular ArrayList and the AbstractCollection-backed TreeSet for the trials. Also, we’ll include differently sized (small, medium, and large) collections to have a broad spectrum of sample data.

3.1. The JMH Benchmark

Next, let’s put together a JMH (Java Microbenchmark Harness) benchmark for our trials. We’ll configure the size and type parameters of the collection for the benchmark:

@Param({ "10", "10000", "10000000" })
private int size;

@Param({ "array-list", "tree-set" })
private String type;

Additionally, we’ll define benchmark methods for the zero-sized and the pre-sized toArray variants:

@Benchmark
public String[] zero_sized() {
    return collection.toArray(new String[0]);
}

@Benchmark
public String[] pre_sized() {
    return collection.toArray(new String[collection.size()]);
}
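For completeness, here’s a minimal sketch of the benchmark state setup that populates the collection under test; the field names and setup logic shown are illustrative assumptions rather than the benchmark’s exact code:

private Collection<String> collection;

@Setup(Level.Trial)
public void setup() {
    // pick the collection type under test and fill it with 'size' elements
    collection = "array-list".equals(type) ? new ArrayList<>() : new TreeSet<>();
    for (int i = 0; i < size; i++) {
        collection.add(String.valueOf(i));
    }
}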

3.2. Benchmark Results

Running the above benchmark on an 8 vCPU, 32 GB RAM, Linux x86_64 Virtual Machine with JMH (v1.28) and JDK (1.8.0_292) furnishes the results shown below. The Score reveals the average execution time, in nanoseconds per operation, for each of the benchmarked methods.

The lower the value, the better the performance:

Benchmark                   (size)      (type)  Mode  Cnt          Score          Error  Units

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

TestBenchmark.zero_sized        10  array-list  avgt   15         24.939 ±        1.202  ns/op
TestBenchmark.pre_sized         10  array-list  avgt   15         38.196 ±        3.767  ns/op
----------------------------------------------------------------------------------------------
TestBenchmark.zero_sized     10000  array-list  avgt   15      15244.367 ±      238.676  ns/op
TestBenchmark.pre_sized      10000  array-list  avgt   15      21263.225 ±      802.684  ns/op
----------------------------------------------------------------------------------------------
TestBenchmark.zero_sized  10000000  array-list  avgt   15   82710389.163 ±  6616266.065  ns/op
TestBenchmark.pre_sized   10000000  array-list  avgt   15  100426920.878 ± 10381964.911  ns/op

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

TestBenchmark.zero_sized        10    tree-set  avgt   15         66.802 ±        5.667  ns/op
TestBenchmark.pre_sized         10    tree-set  avgt   15         66.009 ±        4.504  ns/op
----------------------------------------------------------------------------------------------
TestBenchmark.zero_sized     10000    tree-set  avgt   15      85141.622 ±     2323.420  ns/op
TestBenchmark.pre_sized      10000    tree-set  avgt   15      89090.155 ±     4895.966  ns/op
----------------------------------------------------------------------------------------------
TestBenchmark.zero_sized  10000000    tree-set  avgt   15  211896860.317 ± 21019102.769  ns/op
TestBenchmark.pre_sized   10000000    tree-set  avgt   15  212882486.630 ± 20921740.965  ns/op

After careful observation of the above results, it’s quite evident that the zero-sized method calls outperform the pre-sized ones for all sizes and collection types in this trial.

For now, these numbers are just data. To have a detailed understanding, let’s dig deep and analyze them.

3.3. The Allocation Rate

Hypothetically, it can be assumed that the zero-sized toArray method calls perform better than the pre-sized ones due to optimized memory allocations per operation. Let’s clarify this by executing another benchmark and quantifying the average allocation rates – the memory in bytes allocated per operation – for the benchmarked methods.

JMH provides a GC profiler (-prof gc) that internally uses ThreadMXBean#getThreadAllocatedBytes to calculate the allocation rate per @Benchmark.
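We can attach this profiler from the command line or programmatically through the JMH Runner API; here’s a minimal sketch of the latter, reusing the benchmark class from the earlier trial:

public static void main(String[] args) throws RunnerException {
    Options options = new OptionsBuilder()
        .include(TestBenchmark.class.getSimpleName())
        .addProfiler(GCProfiler.class)   // equivalent to passing -prof gc
        .build();
    new Runner(options).run();
}

Here are the resulting normalized allocation rates (gc.alloc.rate.norm):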

Benchmark                                                    (size)      (type)  Mode  Cnt          Score           Error   Units

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

TestBenchmark.zero_sized:·gc.alloc.rate.norm                     10  array-list  avgt   15         72.000 ±         0.001    B/op
TestBenchmark.pre_sized:·gc.alloc.rate.norm                      10  array-list  avgt   15         56.000 ±         0.001    B/op
---------------------------------------------------------------------------------------------------------------------------------
TestBenchmark.zero_sized:·gc.alloc.rate.norm                  10000  array-list  avgt   15      40032.007 ±         0.001    B/op
TestBenchmark.pre_sized:·gc.alloc.rate.norm                   10000  array-list  avgt   15      40016.010 ±         0.001    B/op
---------------------------------------------------------------------------------------------------------------------------------
TestBenchmark.zero_sized:·gc.alloc.rate.norm               10000000  array-list  avgt   15   40000075.796 ±         8.882    B/op
TestBenchmark.pre_sized:·gc.alloc.rate.norm                10000000  array-list  avgt   15   40000062.213 ±         4.739    B/op

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

TestBenchmark.zero_sized:·gc.alloc.rate.norm                     10    tree-set  avgt   15         56.000 ±         0.001    B/op
TestBenchmark.pre_sized:·gc.alloc.rate.norm                      10    tree-set  avgt   15         56.000 ±         0.001    B/op
---------------------------------------------------------------------------------------------------------------------------------
TestBenchmark.zero_sized:·gc.alloc.rate.norm                  10000    tree-set  avgt   15      40055.818 ±        16.723    B/op
TestBenchmark.pre_sized:·gc.alloc.rate.norm                   10000    tree-set  avgt   15      41069.423 ±      1644.717    B/op
---------------------------------------------------------------------------------------------------------------------------------
TestBenchmark.zero_sized:·gc.alloc.rate.norm               10000000    tree-set  avgt   15   40000155.947 ±         9.416    B/op
TestBenchmark.pre_sized:·gc.alloc.rate.norm                10000000    tree-set  avgt   15   40000138.987 ±         7.987    B/op

Clearly, the above numbers show that the allocation rate is more or less the same for identical sizes, irrespective of the collection type or the toArray variant. Therefore, we can rule out differences in memory allocation rate as the cause of the performance gap between the pre-sized and zero-sized toArray variants.

3.4. The toArray(T[] a) Internals

To track down the root cause of this difference, let’s delve into the ArrayList internals:

// input array too small: allocate a new array of the collection's size and runtime type
if (a.length < size)
    return (T[]) Arrays.copyOf(elementData, size, a.getClass());
// input array large enough: copy the elements directly into it
System.arraycopy(elementData, 0, a, 0, size);
if (a.length > size)
    a[size] = null; // mark the end of the collection's elements
return a;

Basically, depending on the length of the pre-allocated array, it’s either an Arrays.copyOf or the native System.arraycopy method call that copies the underlying elements of the collection into an array.

Further, looking at the copyOf method, it’s evident that a copy array with a length equal to the collection’s size is created first, followed by the System.arraycopy invocation:

T[] copy = ((Object)newType == (Object)Object[].class)
    ? (T[]) new Object[newLength]
    : (T[]) Array.newInstance(newType.getComponentType(), newLength);
System.arraycopy(original, 0, copy, 0,
    Math.min(original.length, newLength));

If both the zero-sized and the pre-sized methods eventually invoke the native System.arraycopy method, why is the zero-sized method call faster?

The answer lies in the direct cost of the CPU time spent performing zero initialization of the externally pre-allocated array, which makes the toArray(new T[size]) call slower.

4. Zero Initializations

The Java Language Specification directs that newly instantiated arrays and objects must have the default field values rather than leftover garbage from memory. Hence, the runtime must zero out newly allocated storage. However, the JIT compiler can eliminate this zeroing when it can prove that the array will be completely overwritten immediately after allocation. Benchmarking experiments have shown that the zero-sized toArray calls benefit from this elimination, while the pre-sized calls do not.

Let’s consider a couple of benchmarks:

@Benchmark
public Foo[] arraycopy_srcLength() {
    // src and size are @State fields populated during setup
    Object[] src = this.src;
    Foo[] dst = new Foo[size];
    // copy length taken from the source array
    System.arraycopy(src, 0, dst, 0, src.length);
    return dst;
}

@Benchmark
public Foo[] arraycopy_dstLength() {
    Object[] src = this.src;
    Foo[] dst = new Foo[size];
    // copy length taken from the freshly allocated destination array
    System.arraycopy(src, 0, dst, 0, dst.length);
    return dst;
}

Experimental observations show that in the arraycopy_srcLength benchmark, the System.arraycopy immediately following the array allocation is able to avoid pre-zeroing the dst array. The arraycopy_dstLength execution, however, cannot avoid the pre-zeroing.

Coincidentally, the latter arraycopy_dstLength case is similar to the pre-sized method collection.toArray(new String[collection.size()]), where zeroing cannot be eliminated, hence its slowness.

5. Benchmarks on Newer JDKs

Finally, let’s execute the original benchmark on the recently released JDKs, while also configuring the JVM to use the newer and much improved G1 garbage collector.
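One way to apply the collector to every benchmark fork is through JMH’s @Fork annotation; here’s a minimal sketch, where the fork count and mode settings are assumptions:

@Fork(value = 3, jvmArgsAppend = { "-XX:+UseG1GC" })
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class ToArrayBenchmark {
    // zero_sized and pre_sized benchmark methods as before
}

Here are the results across the JDK versions: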

# VM version: JDK 11.0.2, OpenJDK 64-Bit Server VM, 11.0.2+9
-----------------------------------------------------------------------------------
Benchmark                    (size)      (type)  Mode  Cnt    Score    Error  Units
-----------------------------------------------------------------------------------
ToArrayBenchmark.zero_sized     100  array-list  avgt   15  199.920 ± 11.309  ns/op
ToArrayBenchmark.pre_sized      100  array-list  avgt   15  237.342 ± 14.166  ns/op
-----------------------------------------------------------------------------------
ToArrayBenchmark.zero_sized     100    tree-set  avgt   15  819.306 ± 85.916  ns/op
ToArrayBenchmark.pre_sized      100    tree-set  avgt   15  972.771 ± 69.743  ns/op
###################################################################################

# VM version: JDK 14.0.2, OpenJDK 64-Bit Server VM, 14.0.2+12-46
------------------------------------------------------------------------------------
Benchmark                    (size)      (type)  Mode  Cnt    Score    Error   Units
------------------------------------------------------------------------------------
ToArrayBenchmark.zero_sized     100  array-list  avgt   15  158.344 ±   3.862  ns/op
ToArrayBenchmark.pre_sized      100  array-list  avgt   15  214.340 ±   5.877  ns/op
------------------------------------------------------------------------------------
ToArrayBenchmark.zero_sized     100    tree-set  avgt   15  877.289 ± 132.673  ns/op
ToArrayBenchmark.pre_sized      100    tree-set  avgt   15  934.550 ± 148.660  ns/op

####################################################################################

# VM version: JDK 15.0.2, OpenJDK 64-Bit Server VM, 15.0.2+7-27
------------------------------------------------------------------------------------
Benchmark                    (size)      (type)  Mode  Cnt    Score     Error  Units
------------------------------------------------------------------------------------
ToArrayBenchmark.zero_sized     100  array-list  avgt   15  147.925 ±   3.968  ns/op
ToArrayBenchmark.pre_sized      100  array-list  avgt   15  213.525 ±   6.378  ns/op
------------------------------------------------------------------------------------
ToArrayBenchmark.zero_sized     100    tree-set  avgt   15  820.853 ± 105.491  ns/op
ToArrayBenchmark.pre_sized      100    tree-set  avgt   15  947.433 ± 123.782  ns/op

####################################################################################

# VM version: JDK 16, OpenJDK 64-Bit Server VM, 16+36-2231
------------------------------------------------------------------------------------
Benchmark                    (size)      (type)  Mode  Cnt    Score     Error  Units
------------------------------------------------------------------------------------
ToArrayBenchmark.zero_sized     100  array-list  avgt   15  146.431 ±   2.639  ns/op
ToArrayBenchmark.pre_sized      100  array-list  avgt   15  214.117 ±   3.679  ns/op
------------------------------------------------------------------------------------
ToArrayBenchmark.zero_sized     100    tree-set  avgt   15  818.370 ± 104.643  ns/op
ToArrayBenchmark.pre_sized      100    tree-set  avgt   15  964.072 ± 142.008  ns/op

####################################################################################

Interestingly, the toArray(new T[0]) method has been consistently faster than toArray(new T[size]). Moreover, its performance has steadily improved with every new release of the JDK.

5.1. Java 11 Collection.toArray(IntFunction<T[]>) 

In Java 11, the Collection interface introduced a new default toArray method that accepts an IntFunction<T[]> generator as an argument (one that will generate a new array of the desired type and the provided length).

This method always invokes the generator function with a value of zero, thereby guaranteeing that the faster zero-sized toArray(T[]) variant is executed.
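For example, we can simply pass an array constructor reference as the generator; this requires Java 11 or later:

Integer[] naturalNumbersArray = naturalNumbers.toArray(Integer[]::new);

Under the hood, the default implementation delegates to toArray(generator.apply(0)), i.e. the zero-sized variant.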

6. Conclusion

In this article, we probed into the different toArray overloaded methods of the Collection interface. We also ran performance trials leveraging the JMH micro-benchmarking tool across different JDKs.

We understood the necessity and the impact of zeroing, and observed how the internally allocated array avoids the zeroing cost, thus winning the performance race. Lastly, we can firmly conclude that the toArray(new T[0]) variant is faster than toArray(new T[size]) and should therefore always be the preferred option when we need to convert a collection to an array.

As always, the code used in this article can be found over on GitHub.
