1. Overview

In this tutorial, we’ll discuss the differences between Set and List in Java with the help of a simple example. Also, we’ll compare the two data structures in terms of performance and memory allocation.

2. Conceptual Difference

Both List and Set are part of the Java Collections Framework. However, there are a few important differences:

  • A List can contain duplicates, but a Set can’t
  • A List will preserve the order of insertion, but a Set may or may not
  • Since insertion order may not be maintained in a Set, it doesn’t allow index-based access as in the List

Please note that there are a few implementations of the Set interface which maintain order, for example, LinkedHashSet.
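The three differences listed above can be seen side by side in a minimal sketch. The class and method names here (ListVsSetBasics, toList, toInsertionOrderedSet) are illustrative and not part of the article’s source code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative helper class (hypothetical name), not taken from the article's repository
class ListVsSetBasics {

    // Copies the values into a List: duplicates and insertion order are both kept
    static List<String> toList(String... values) {
        return new ArrayList<>(Arrays.asList(values));
    }

    // Copies the values into a LinkedHashSet: duplicates are dropped, insertion order is kept
    static Set<String> toInsertionOrderedSet(String... values) {
        return new LinkedHashSet<>(Arrays.asList(values));
    }

    public static void main(String[] args) {
        List<String> list = toList("b", "a", "b");
        Set<String> set = toInsertionOrderedSet("b", "a", "b");

        System.out.println(list);        // [b, a, b]
        System.out.println(list.get(2)); // b (index-based access)
        System.out.println(set);         // [b, a]
        // set.get(0) would not compile: the Set interface has no index-based accessor
    }
}
```

We use a LinkedHashSet here only so that the printed order is predictable. With a plain HashSet, the duplicate would still be dropped, but the iteration order would not be guaranteed.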

3. Code Example

3.1. Allowing Duplicates

Adding a duplicate item is allowed for a List. However, it isn’t for a Set:

@Test
public void givenList_whenDuplicates_thenAllowed(){
    List<Integer> integerList = new ArrayList<>();
    integerList.add(2);
    integerList.add(3);
    integerList.add(4);
    integerList.add(4);
    assertEquals(4, integerList.size());
}
@Test
public void givenSet_whenDuplicates_thenNotAllowed(){
    Set<Integer> integerSet = new HashSet<>();
    integerSet.add(2);
    integerSet.add(3);
    integerSet.add(4);
    integerSet.add(4);
    assertEquals(3, integerSet.size());
}

3.2. Maintaining Insertion Order

A Set maintains order depending on the implementation. For example, a HashSet is not guaranteed to preserve order, but a LinkedHashSet is. Let’s see an example of ordering with LinkedHashSet:

@Test
public void givenSet_whenOrdering_thenMayBeAllowed(){
    Set<Integer> set1 = new LinkedHashSet<>();
    set1.add(2);
    set1.add(3);
    set1.add(4);
    Set<Integer> set2 = new LinkedHashSet<>();
    set2.add(2);
    set2.add(3);
    set2.add(4);
    Assert.assertArrayEquals(set1.toArray(), set2.toArray());
}

Since a Set isn’t guaranteed to maintain order, the Set interface doesn’t provide index-based access such as get(int).
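When we do need positional access to the elements of a Set, one common approach is to copy it into a List first. The sketch below contrasts the iteration order of a LinkedHashSet with that of a TreeSet, which sorts its elements instead. The class and method names are hypothetical and used only for illustration:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.TreeSet;

// Illustrative helper class (hypothetical name), not taken from the article's repository
class SetOrderingDemo {

    // De-duplicates through a LinkedHashSet, then copies to a List to allow get(int)
    static List<Integer> inInsertionOrder(Integer... values) {
        return new ArrayList<>(new LinkedHashSet<>(Arrays.asList(values)));
    }

    // De-duplicates through a TreeSet, which iterates in natural (ascending) order
    static List<Integer> inSortedOrder(Integer... values) {
        return new ArrayList<>(new TreeSet<>(Arrays.asList(values)));
    }

    public static void main(String[] args) {
        List<Integer> inserted = inInsertionOrder(4, 2, 3, 2);
        System.out.println(inserted);                  // [4, 2, 3]
        System.out.println(inserted.get(0));           // 4
        System.out.println(inSortedOrder(4, 2, 3, 2)); // [2, 3, 4]
    }
}
```

Copying costs time and memory proportional to the size of the Set, so this is a sketch for occasional use rather than something to do inside a hot loop.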

4. Performance Comparison Between List and Set

Let’s compare the performance of the List and Set data structures using the Java Microbenchmark Harness (JMH). First, we’ll create two classes: ListAndSetAddBenchmark and ListAndSetContainBenchmark. Then, we’ll measure the execution time of the add() and contains() methods for both data structures.

4.1. JMH Parameters

We’ll execute the benchmark tests with the following parameters:

@BenchmarkMode(Mode.SingleShotTime)
@Warmup(iterations = 3, time = 10, timeUnit = TimeUnit.MILLISECONDS)
@Measurement(iterations = 3, time = 10, timeUnit = TimeUnit.MILLISECONDS)
public class ListAndSetAddBenchmark {
}

In the class above, the @BenchmarkMode(Mode.SingleShotTime) annotation sets the mode in which the benchmark runs. In SingleShotTime mode, each iteration invokes the benchmark method once and measures how long that single invocation takes.

The @Warmup annotation specifies the number of iterations and the time to run each iteration during the warm-up phase. In our case, the warm-up phase consists of three iterations of 10 milliseconds each.

Furthermore, the @Measurement annotation specifies the number of iterations and the time to run each iteration during the measurement phase. Our example class configures three measurement iterations of 10 milliseconds each. Note that in SingleShotTime mode an iteration is a single invocation of the benchmark method, so the iteration count is the setting that matters most here.

4.2. add()

First, let’s create an inner class to declare variables that the benchmark methods will use:

@State(Scope.Benchmark)
public static class Params {
    public int addNumber = 10000000;
    public List<Integer> arrayList = new ArrayList<>();
    public Set<Integer> hashSet = new HashSet<>();
}

The @State annotation marks the class as a state class, which holds the data that the benchmark methods use. With Scope.Benchmark, a single instance is shared by all threads running the benchmark.

Next, let’s test the add() operation for an ArrayList:

@Benchmark
public void addElementToArrayList(Params param, Blackhole blackhole) {
    // Start from an empty list so that every invocation adds the same number of elements
    param.arrayList.clear();
    for (int i = 0; i < param.addNumber; i++) {
        blackhole.consume(param.arrayList.add(i));
    }
}

The method above measures the time it takes to add 10,000,000 elements to an ArrayList. The @Benchmark annotation indicates that it’s a benchmark method. The Blackhole parameter consumes the return value of each add() call so that the JIT compiler can’t optimize the call away.

Furthermore, let’s test adding elements to a HashSet:

@Benchmark
public void addElementToHashSet(Params param, Blackhole blackhole) {
    // Start from an empty set so that every invocation adds the same number of elements
    param.hashSet.clear();
    for (int i = 0; i < param.addNumber; i++) {
        blackhole.consume(param.hashSet.add(i));
    }
}

Here, we measure the time it takes to add 10,000,000 elements to a HashSet. As before, the @Benchmark annotation marks the method as a benchmark method. When JMH encounters the method, it generates code to measure its performance.

Finally, let’s compare the test result:

Benchmark             Mode  Cnt  Score   Error  Units
addElementToArrayList   ss   15  0.386 ± 1.266   s/op
addElementToHashSet     ss   15  0.419 ± 2.535   s/op

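The scores suggest that adding elements to an ArrayList is slightly faster than adding them to a HashSet (0.386 s/op versus 0.419 s/op), but the reported error is larger than the difference, so the gap isn’t statistically conclusive. That said, an ArrayList only appends to its backing array, whereas a HashSet has to hash each element and allocate an entry for it, so an ArrayList is generally a reasonable choice when we only need to append elements as fast as possible.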
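For a quick sanity check without JMH, a rough timing loop along the following lines can be used. This is only a sketch: it relies on System.nanoTime() and a few throwaway rounds instead of JMH’s warm-up, forking, and statistics, so its numbers are far less trustworthy than the benchmark above. The class name, the method name, and the element count are assumptions made for illustration:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;

// Crude stand-in for the JMH benchmark (hypothetical class), for illustration only
class NaiveAddTiming {

    // Adds the integers 0..count-1 to the target and returns the elapsed nanoseconds
    static long timeAdds(Collection<Integer> target, int count) {
        long start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            target.add(i);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int count = 1_000_000; // assumption: smaller than the article's 10,000,000 to keep the run short

        // A few throwaway rounds so the JIT compiler has a chance to optimize the loop
        for (int round = 0; round < 3; round++) {
            timeAdds(new ArrayList<>(), count);
            timeAdds(new HashSet<>(), count);
        }

        long listNanos = timeAdds(new ArrayList<>(), count);
        long setNanos = timeAdds(new HashSet<>(), count);
        System.out.printf("ArrayList: %d ms, HashSet: %d ms%n",
          listNanos / 1_000_000, setNanos / 1_000_000);
    }
}
```

The absolute numbers will vary between machines and runs, which is exactly why a harness such as JMH is preferable for any conclusion we want to rely on.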

4.3. contains()

First, let’s define an inner class to fill up the ArrayList and HashSet:

@State(Scope.Benchmark)
public static class Params {
    @Param({"5000000"})
    public int searchElement;
    
    @Param({"10000000"})
    public int collectionSize;
        
    public List<Integer> arrayList;
    public Set<Integer> hashSet;
        
    @Setup(Level.Iteration)
    public void setup() {
        arrayList = new ArrayList<>();
        hashSet = new HashSet<>();
        for (int i = 0; i < collectionSize; i++) {
            arrayList.add(i);
            hashSet.add(i);
        }
    }
}

The @Param annotation specifies the parameters for the benchmark. In this case, it defines two parameters, searchElement and collectionSize, each with a single value. These parameters configure the benchmark.

Also, the @Setup annotation marks the method that should be executed before each iteration.

Next, let’s test the contains() operation using an ArrayList:

@Benchmark
public void searchElementInArrayList(Params param, Blackhole blackhole) {
    // One lookup per invocation; the Blackhole keeps the JIT from discarding the result
    blackhole.consume(param.arrayList.contains(param.searchElement));
}

The searchElementInArrayList() method searches for 5000000 in the ArrayList.

Finally, let’s test the contains() operation using a HashSet:

@Benchmark
public void searchElementInHashSet(Params param, Blackhole blackhole) {
    // One lookup per invocation; the Blackhole keeps the JIT from discarding the result
    blackhole.consume(param.hashSet.contains(param.searchElement));
}

As in the searchElementInArrayList() method, we search for 5000000, this time in the HashSet.

Here’s the result:

Benchmark                 Mode   Cnt   Score   Error  Units
searchElementInArrayList     ss   15   0.014 ± 0.015   s/op
searchElementInHashSet       ss   15   ≈ 10⁻⁵          s/op

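The result shows that searching for an element in a HashSet is much faster than searching for it in an ArrayList. This is expected: ArrayList.contains() scans the elements one by one, which takes O(n) time, whereas HashSet.contains() uses the element’s hash code to jump to the right bucket, which takes O(1) time on average. So, a HashSet is the better choice when we need fast membership checks.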
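To make the difference in lookup strategy tangible, we can count how many equals() calls a single contains() triggers in each collection. The sketch below uses a hypothetical CountingKey type with an instrumented equals() method. It isn’t part of the benchmark, and the collection size is kept small on purpose:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;

// Illustrative demo class (hypothetical name), not taken from the article's repository
class ContainsCostDemo {

    // Hypothetical key type that counts how often equals() is invoked
    static final class CountingKey {
        static long equalsCalls = 0;
        final int id;

        CountingKey(int id) {
            this.id = id;
        }

        @Override
        public boolean equals(Object other) {
            equalsCalls++;
            return other instanceof CountingKey && ((CountingKey) other).id == id;
        }

        @Override
        public int hashCode() {
            return Integer.hashCode(id);
        }
    }

    // Fills the collection with keys 0..size-1, then counts the equals() calls of one contains()
    static long equalsCallsForContains(Collection<CountingKey> collection, int size, int searchId) {
        for (int i = 0; i < size; i++) {
            collection.add(new CountingKey(i));
        }
        CountingKey.equalsCalls = 0; // ignore any comparisons made while filling
        if (!collection.contains(new CountingKey(searchId))) {
            throw new IllegalStateException("searchId must be in the range 0..size-1");
        }
        return CountingKey.equalsCalls;
    }

    public static void main(String[] args) {
        long listCalls = equalsCallsForContains(new ArrayList<>(), 1000, 500);
        long setCalls = equalsCallsForContains(new HashSet<>(), 1000, 500);
        // The list compares elements 0..500 in turn; the set only inspects one bucket
        System.out.println("ArrayList equals() calls: " + listCalls);
        System.out.println("HashSet equals() calls:   " + setCalls);
    }
}
```

The count for the ArrayList grows with the position of the element we search for, while the count for the HashSet stays small regardless of the collection size, as long as the hash codes are well distributed.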

5. Memory Allocation Comparison Between List and Set

In the previous section, we saw different metrics that measure the performance of List and Set with respect to time. Let’s measure the memory allocation for the benchmark methods by specifying the gc profiler option “-prof gc” while running the benchmark.

Let’s modify the main() method and configure the JMH run options for the two benchmark classes:

public static void main(String[] args) throws RunnerException {
    Options opt = new OptionsBuilder()
      .include(ListAndSetAddBenchmark.class.getSimpleName())
      .forks(1)
      .addProfiler("gc")
      .build();
    new Runner(opt).run();
}

In the method above, we create a new Options object to configure JMH. First, we use the include() method to specify the benchmark that should be run. Next, we set the number of forked JVM processes used to run the benchmark with the forks() method.

Furthermore, we specify the profiler to use with the addProfiler() method. In this case, we are using the gc profiler.

This configuration works for the ListAndSetAddBenchmark class. Also, we need to modify the main() method of ListAndSetContainBenchmark to add the gc profiler:

public static void main(String[] args) throws RunnerException { 
    Options opt = new OptionsBuilder()
      .include(ListAndSetContainBenchmark.class.getSimpleName()) 
      .forks(1) 
      .addProfiler("gc") 
      .build(); 
    new Runner(opt).run(); 
}

Here’s the result of the test:

Benchmark                               Mode  Cnt    Score      Error   Units
addElementToArrayList:·gc.alloc.rate      ss    3  172.685 ±  254.719  MB/sec
addElementToHashSet:·gc.alloc.rate        ss    3  504.746 ± 1323.322  MB/sec
searchElementInArrayList:·gc.alloc.rate   ss    3  248.628 ±  395.569  MB/sec
searchElementInHashSet:·gc.alloc.rate     ss    3  254.192 ±  235.294  MB/sec

The result shows that for the add() operation, addElementToHashSet() has a higher gc.alloc.rate of 504.746 MB/sec compared to addElementToArrayList() with a value of 172.685 MB/sec. This suggests that HashSet is allocating more memory during execution compared to an ArrayList.

Furthermore, the HashSet shows a slightly higher allocation rate for the search benchmark (254.192 MB/sec versus 248.628 MB/sec), although the difference is well within the reported error.

The large error values indicate considerable variability in the results, which may depend on factors such as JVM warm-up and code optimization.

6. Conclusion

In this article, we learned the difference between a List and a Set in Java. Additionally, we ran benchmarks to compare the two with respect to time and memory allocation. Which one is better depends on the use case: in our tests, an ArrayList was at least as fast for adding elements, while a HashSet was far faster for contains() lookups.

As always, the source code for the examples is available over on GitHub.
