**I just announced the new *** Learn Spring * course, focused on the fundamentals of Spring 5 and Spring Boot 2:

*Learn Spring*course, focused on the fundamentals of Spring 5 and Spring Boot 2:

Last modified: May 7, 2019

If you have a few years of experience in the Java ecosystem, and you're interested in sharing that experience with the community (and getting paid for your work of course), have a look at the "Write for Us" page. Cheers. Eugen

In this tutorial, **we’ll talk about the performance of different collections from the Java Collection API**. When we talk about collections, we usually think about the *List, Map, *and* Set* data structures and their common implementations.

First of all, we’ll look at Big-O complexity insights for common operations, and after, we’ll show the real numbers of some collection operations running time.

Usually, **when we talk about time complexity, we refer to Big-O notation**. Simply put, the notation describes how the time to perform the algorithm grows with the size of the input.

Useful write-ups are available to learn more about Big-O notation theory or practical Java examples.

Let’s start with a simple list – which is an ordered collection.

Here, we’ll have a look at a performance overview of the *ArrayList, LinkedList, *and* CopyOnWriteArrayList* implementations.

**The ArrayList in Java is backed by an array**. This helps to understand the internal logic of its implementation. A more comprehensive guide for the

So, let’s first focus on the time complexity of the common operations, at a high level:

– takes*add()**O(1)*time– in average runs in*add(index, element)**O(n)*time– is always a constant time*get()**O(1)*operation– runs in linear*remove()**O(n)*time. We have to iterate the entire array to find the element qualifying for removal– also runs in linear time. It iterates through the internal array and checking each element one by one. So the time complexity for this operation always requires**indexOf()***O(n)*time– implementation is based on*contains()**indexOf()*. So it will also run in*O(n)*time

This implementation of the *List* interface is **very useful when working with multi-threaded applications**. It’s thread-safe and explained well in this guide here.

Here’s the performance Big-O notation overview for *CopyOnWriteArrayList*:

– depends on the position we add value, so the complexity is**add()***O(n)*– is**get()***O(1)*constant time operation– takes**remove()***O(n)*time– likewise, the complexity is**contains()***O(n)*

As we can see, using this collection is very expensive because of the performance characteristics of the *add()* method.

** LinkedList is a linear data structure which consists of nodes holding a data field and a reference to another node**. For more

Let’s present the average estimate of the time we need to perform some basic operations:

– supports*add()**O(1)*constant-time insertion at any position– searching for an element takes*get()**O(n)*time– removing an element also takes*remove()**O(1)*operation, as we provide the position of the element– also has*contains()**O(n)*time complexity

Now, to prove the theory, let’s play with actual data. **To be more precise, we’ll present the JMH (Java Microbenchmark Harness) test results of the most common collection operations**.

In case you aren’t familiar with JMH tool, check out this useful guide.

First, we present the main parameters of our benchmark tests:

@BenchmarkMode(Mode.AverageTime) @OutputTimeUnit(TimeUnit.MICROSECONDS) @Warmup(iterations = 10) public class ArrayListBenchmark { }

Then, we set the warmup iterations number to *10*. Also, we wish to see the average running time of our results displayed in microseconds.

Now, it’s time to run our performance tests. First, we start with the *ArrayList*:

@State(Scope.Thread) public static class MyState { List<Employee> employeeList = new ArrayList<>(); long iterations = 100000; Employee employee = new Employee(100L, "Harry"); int employeeIndex = -1; @Setup(Level.Trial) public void setUp() { for (long i = 0; i < iterations; i++) { employeeList.add(new Employee(i, "John")); } employeeList.add(employee); employeeIndex = employeeList.indexOf(employee); } }

Inside of our *ArrayListBenchmark*, we add the *State* class to hold the initial data.

Here, we create an *ArrayList* of *Employee* objects. After, **we initialize it with 100.000 items inside of the setUp() method. The @State indicates that the @Benchmark tests have full access to the variables declared in it within the same thread.**

Finally, it’s time to add the benchmark tests for the *add(), contains(), indexOf(), remove(), *and *get()* methods:

@Benchmark public void testAdd(ArrayListBenchmark.MyState state) { state.employeeList.add(new Employee(state.iterations + 1, "John")); } @Benchmark public void testAddAt(ArrayListBenchmark.MyState state) { state.employeeList.add((int) (state.iterations), new Employee(state.iterations, "John")); } @Benchmark public boolean testContains(ArrayListBenchmark.MyState state) { return state.employeeList.contains(state.employee); } @Benchmark public int testIndexOf(ArrayListBenchmark.MyState state) { return state.employeeList.indexOf(state.employee); } @Benchmark public Employee testGet(ArrayListBenchmark.MyState state) { return state.employeeList.get(state.employeeIndex); } @Benchmark public boolean testRemove(ArrayListBenchmark.MyState state) { return state.employeeList.remove(state.employee); }

All the results are presented in microseconds:

Benchmark Mode Cnt Score Error ArrayListBenchmark.testAdd avgt 20 2.296 ± 0.007 ArrayListBenchmark.testAddAt avgt 20 101.092 ± 14.145 ArrayListBenchmark.testContains avgt 20 709.404 ± 64.331 ArrayListBenchmark.testGet avgt 20 0.007 ± 0.001 ArrayListBenchmark.testIndexOf avgt 20 717.158 ± 58.782 ArrayListBenchmark.testRemove avgt 20 624.856 ± 51.101

**From the results we can learn, that testContains() and testIndexOf() methods run in approximately the same time**. We can also clearly see the huge difference between the

While searching or removing an element roughly costs *700* microseconds. These numbers are the proof of the theoretical part, where we learned that *add(), *and *get()* has *O(1)* time complexity and the other methods are *O(n)*. *n=10.000* elements in our example.

Likewise, we can write the same tests for *CopyOnWriteArrayList* collection. All we need is to replace the *ArrayList* in employeeList with the *CopyOnWriteArrayList* instance.

Here are the results of the benchmark test:

Benchmark Mode Cnt Score Error CopyOnWriteBenchmark.testAdd avgt 20 652.189 ± 36.641 CopyOnWriteBenchmark.testAddAt avgt 20 897.258 ± 35.363 CopyOnWriteBenchmark.testContains avgt 20 537.098 ± 54.235 CopyOnWriteBenchmark.testGet avgt 20 0.006 ± 0.001 CopyOnWriteBenchmark.testIndexOf avgt 20 547.207 ± 48.904 CopyOnWriteBenchmark.testRemove avgt 20 648.162 ± 138.379

Here, again, the numbers confirm the theory. As we can see, *testGet()* on average runs in 0.006 ms which we can consider as *O(1)*. **Comparing to ArrayList, we also notice the significant difference between testAdd() method results. As we have here O(n) complexity for the add() method versus ArrayList’s O(1). **

**We can clearly see the linear growth of the time, as performance numbers are 878.166 compared to 0.051**.

Now, it’s *LinkedList* time:

Benchmark Cnt Score Error testAdd 20 2.580 ± 0.003 testContains 20 1808.102 ± 68.155 testGet 20 1561.831 ± 70.876 testRemove 20 0.006 ± 0.001

We can see from the scores, that adding and removing elements in *LinkedList* are quite fast.

Furthermore, there’s a significant performance gap between add/remove and get/contains operations.

With the latest JDK versions, we’re witnessing significant performance improvement for *Map* implementations, such as replacing the *LinkedList* with the balanced tree node structure in *HashMap, LinkedHashMap *internal implementations. **This shortens the element lookup worst-case scenario from O(n) to O(log(n)) time during the HashMap collisions**.

However, if we implement proper *.equals()* and *.hashcode()* methods collisions are unlikely.

To learn more about *HashMap* collisions check out this write-up. **From the write-up, we can also learn, that storing and retrieving elements from the HashMap takes constant O(1) time**.

Let’s show some actual numbers. First, for the *HashMap*:

Benchmark Mode Cnt Score Error HashMapBenchmark.testContainsKey avgt 20 0.009 ± 0.002 HashMapBenchmark.testGet avgt 20 0.011 ± 0.001 HashMapBenchmark.testPut avgt 20 0.019 ± 0.002 HashMapBenchmark.testRemove avgt 20 0.010 ± 0.001

**As we see, the numbers prove the O(1) constant time for running the methods listed above.** Now, let’s do a comparison of the

For all of the listed methods, **we have O(1) for HashMap, LinkedHashMap, IdentityHashMap, WeakHashMap, EnumMap and ConcurrentHashMap.**

Let’s present the results of the remaining test scores in form of one table:

Benchmark LinkedHashMap IdentityHashMap WeakHashMap ConcurrentHashMap testContainsKey 0.008 0.009 0.014 0.011 testGet 0.011 0.109 0.019 0.012 testPut 0.020 0.013 0.020 0.031 testRemove 0.011 0.115 0.021 0.019

From the output numbers, we can confirm the claims of *O(1)* time complexity.

**For the tree structure TreeMap and ConcurrentSkipListMap the put(), get(), remove(), containsKey() operations time is O(log(n)).**

Here, **we want to make sure that our performance tests will run approximately in logarithmic time**. For that reason, we initialize the maps with n*=1000, 10,000, 100,000, 1,000,000* items continuously.

In this case, we’re interested in the total time of execution:

items count (n) 1000 10,000 100,000 1,000,000 all tests total score 00:03:17 00:03:17 00:03:30 00:05:27

When *n=1000* we have the total of *00:03:17* milliseconds execution time. *n=10,000* the time is almost unchanged *00:03:18 ms. n=100,000* has minor increase *00:03:30*. And finally, when *n=1,000,000* the run completes in *00:05:27 ms*.

**After comparing the runtime numbers with the log(n) function of each n, we can confirm that the correlation of both functions matches.**

Generally, *Set* is a collection of unique elements. Here, we’re going to examine the *HashSet*, *LinkedHashSet*, *EnumSet, TreeSet, CopyOnWriteArraySet,* and* ConcurrentSkipListSet* implementations of the *Set* interface.

To better understand the internals of the *HashSet*, this guide is here to help.

Now, let’s jump ahead to present the time complexity numbers. **For HashSet, LinkedHashSet, and EnumSet the add(), remove() and contains() operations cost constant O(1) time. Thanks to the internal HashMap implementation.**

**Likewise, the **** TreeSet has O(log(n)) time complexity** for the operations listed for the previous group. That’s because of the

For *CopyOnWriteArraySet,* the *add(), remove() *and *contains()* methods have O(n) average time complexity.

Now, let’s jump to our benchmark tests:

@Benchmark public boolean testAdd(SetBenchMark.MyState state) { return state.employeeSet.add(state.employee); } @Benchmark public Boolean testContains(SetBenchMark.MyState state) { return state.employeeSet.contains(state.employee); } @Benchmark public boolean testRemove(SetBenchMark.MyState state) { return state.employeeSet.remove(state.employee); }

Furthermore, we leave the remaining benchmark configurations as they are.

Let’s see the behavior of the runtime execution score for *HashSet *and *LinkedHashSet *having *n = 1000; 10,000; 100,000* items.

For the *HashSet, * the numbers are:

Benchmark 1000 10,000 100,000 .add() 0.026 0.023 0.024 .remove() 0.009 0.009 0.009 .contains() 0.009 0.009 0.010

Similarly, the results for *LinkedHashSet* are:

Benchmark 1000 10,000 100,000 .add() 0.022 0.026 0.027 .remove() 0.008 0.012 0.009 .contains() 0.008 0.013 0.009

As we see, the scores remain almost the same for each operation. Even more, when we compare them with the *HashMap* test outputs, they look the same as well.

**As a result, we confirm that all the tested methods run in constant O(1) time.**

In this article, **we present the time complexity of the most common implementations of the Java data structures.**

Separately, we show the actual runtime performance of each type of collection through the JVM benchmark tests. We have also compared the performance of the same operations in different collections. As a result, we learn to choose the right collection that fits our needs.

As usual, the complete code for this article is available over on GitHub.