## 1. Overview

*HashSet* is a collection for storing unique elements.

In this tutorial, we'll discuss the performance of the *removeAll()* method in the *java.util.HashSet *class.

## 2. *HashSet.removeAll()*

The *removeAll* method removes all the elements, that are contained in the *collection*:

```
Set<Integer> set = new HashSet<Integer>();
set.add(1);
set.add(2);
set.add(3);
set.add(4);
Collection<Integer> collection = new ArrayList<Integer>();
collection.add(1);
collection.add(3);
set.removeAll(collection);
Integer[] actualElements = new Integer[set.size()];
Integer[] expectedElements = new Integer[] { 2, 4 };
assertArrayEquals(expectedElements, set.toArray(actualElements));
```

As a result, elements 1 and 3 will be removed from the set.

## 3. Internal Implementation and Time Complexity

*The removeAll()* method determines which one is smaller – the set or the collection. This is done by invoking the *size() *method on the set and the collection.

**If the collection has fewer elements than the set**, then it iterates over the specified collection with the time complexity O(*n*). It also checks if the element is present in the set with the time complexity O(1). And if the element is present, it's being removed from the set using the *remove()* method of the set, which again has a time complexity of O(1). So **the overall time complexity is O( n)**.

**If the set has fewer elements than the collection**, then it iterates over this set using O(*n*). Then it checks if each element is present in the collection by invoking its *contains()* method. And if such an element is present, then the element is removed from the set. So this depends on the time complexity of the *contains()* method.

Now in this case, if the collection is an *ArrayList*, the time complexity of the *contains()* method is O(*m*). So **overall time complexity to remove all elements present in the ArrayList from the set is O(n * m)**.

If the collection is again *HashSet*, the time complexity of the *contains()* method is O(1). So **overall time complexity to remove all elements present in the HashSet from the set is O(n)**.

## 4. Performance

To see the performance difference between the above 3 cases, let's write a simple JMH benchmark test.

For the first case, we'll initialize the set and collection, where we have more elements in the set than the collection. In the second case, we'll initialize the set and the collection, where we have more elements in the collection than the set. And in the third case, we'll initialize 2 sets, where we'll have 2nd set having more number of elements than the 1st one:

```
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5)
public class HashSetBenchmark {
@State(Scope.Thread)
public static class MyState {
private Set employeeSet1 = new HashSet<>();
private List employeeList1 = new ArrayList<>();
private Set employeeSet2 = new HashSet<>();
private List employeeList2 = new ArrayList<>();
private Set<Employee> employeeSet3 = new HashSet<>();
private Set<Employee> employeeSet4 = new HashSet<>();
private long set1Size = 60000;
private long list1Size = 50000;
private long set2Size = 50000;
private long list2Size = 60000;
private long set3Size = 50000;
private long set4Size = 60000;
@Setup(Level.Trial)
public void setUp() {
// populating sets
}
}
}
```

After, we add our benchmark tests:

```
@Benchmark
public boolean given_SizeOfHashsetGreaterThanSizeOfCollection_whenRemoveAllFromHashSet_thenGoodPerformance(MyState state) {
return state.employeeSet1.removeAll(state.employeeList1);
}
@Benchmark
public boolean given_SizeOfHashsetSmallerThanSizeOfCollection_whenRemoveAllFromHashSet_thenBadPerformance(MyState state) {
return state.employeeSet2.removeAll(state.employeeList2);
}
@Benchmark
public boolean given_SizeOfHashsetSmallerThanSizeOfAnotherHashSet_whenRemoveAllFromHashSet_thenGoodPerformance(MyState state) {
return state.employeeSet3.removeAll(state.employeeSet4);
}
```

And here are the results:

```
Benchmark Mode Cnt Score Error Units
HashSetBenchmark.testHashSetSizeGreaterThanCollection avgt 20 2700457.099 ± 475673.379 ns/op
HashSetBenchmark.testHashSetSmallerThanCollection avgt 20 31522676649.950 ± 3556834894.168 ns/op
HashSetBenchmark.testHashSetSmallerThanOtherHashset avgt 20 2672757.784 ± 224505.866 ns/op
```

**We can see the HashSet.removeAll() performs pretty bad when the HashSet has fewer elements than the Collection, which is passed as an argument to the removeAll() method. But when the other collection is again HashSet, then the performance is good.**

## 5. Conclusion

In this article, we saw the performance of *removeAll()* in HashSet. When the set has fewer elements than the collection, then the performance of *removeAll()* depends on the time complexity of the *contains()* method of the collection.

As usual, the complete code for this article is available over on GitHub.