Course – LS – All

Get started with Spring and Spring Boot, through the Learn Spring course:

>> CHECK OUT THE COURSE

1. Overview

HashMap is a powerful tool for storing and managing key-value pairs in Java programming. However, sometimes our data may contain duplicates in value.

In this tutorial, we’ll explore removing duplicate values from a HashMap. 

2. Introduction to the Problem

HashMap allows multiple keys to have the same value, making duplicates inevitable in certain scenarios. Let’s see an example:

Map<String, String> initDevMap() {
    Map<String, String> devMap = new HashMap<>();
    devMap.put("Tom", "Linux");
    devMap.put("Kent", "Linux");

    devMap.put("Bob", "MacOS");
    devMap.put("Eric", "MacOS");

    devMap.put("Peter", "Windows");
    devMap.put("Saajan", "Windows");
    devMap.put("Jan", "Windows");

    devMap.put("Kevin", "FreeBSD");
    return devMap;
}

In the example above, the initDevMap() method initializes a new HashMap carrying the associations of the developer name and the operating system (OS) they work with. Let’s say we want to remove the duplicate OS names from the map. So, we’ll get a map with only four entries. 

A straightforward approach might be iterating through the map, keeping track of the values, and deleting the map entry when we find duplicate ones. Further, we may want to use Iterator to walk through the map and remove entries to avoid the ConcurrentModificationException.

However, in this tutorial, we’ll learn a different approach. Also, we’ll cover two deduplication scenarios:

  • As long as the result only contains unique values, we don’t care about which entries are removed.
  • For deduplication, we should follow specific rules like retaining the first/last key from “A-Z sorting,” preserving the longest/shortest name, and more.

3. Inverting the Map Twice

We know HashMap doesn’t allow duplicate keys. Therefore, if we invert the input map from “developer -> OS” to “OS -> developer,” the same OS names are removed. Then, we can invert the map back to get the final result.

Next, let’s implement this “inverting twice” idea and check if it works as expected:

Map<String, String> devMap = initDevMap();
Map<String, String> tmpReverseMap = new HashMap<>();
Map<String, String> result = new HashMap<>();
for (String name : devMap.keySet()) {
    tmpReverseMap.put(devMap.get(name), name);
}
for (String os : tmpReverseMap.keySet()) {
    result.put(tmpReverseMap.get(os), os);
}
assertThat(result.values()).hasSize(4)
  .containsExactlyInAnyOrder("Windows", "MacOS", "Linux", "FreeBSD");

As we can see, we use two for loops to invert the map twice. Then, we verify the result with the handy assertj library.

4. Using the Stream API

If we work with Java 8 or later, we can implement the “inverting twice” approach using the Stream API:

Map<String, String> devMap = initDevMap();
Map<String, String> result = devMap.entrySet()
  .stream()
  .collect(toMap(Map.Entry::getValue, Map.Entry::getKey, (keyInMap, keyNew) -> keyInMap))
  .entrySet()
  .stream()
  .collect(toMap(Map.Entry::getValue, Map.Entry::getKey));
assertThat(result.values()).hasSize(4)
  .containsExactlyInAnyOrder("Windows", "MacOS", "Linux", "FreeBSD");

The stream-based implementation is more fluent than the two for-loop ones. It’s worth noting that we used the Collector.toMap() method for the inversion operation. The first inversion aims to remove duplicate values. Since we have duplicate values, after key-value inversion, we have key (OS names) collisions. Therefore, we put a merge function there to handle key conflicts. Also, since we don’t care which entry is left in the map after the deduplication, we made the merge function return the key already in the map. In other words, we discard the duplicate values coming later.

5. Adapting the Specific Deduplication Requirement.

As HashMap doesn’t guarantee the entry order, using the solution so far, we cannot decide which entry is left in the map after the deduplication.

However, we can satisfy different requirements by adjusting the merge function. Next, let’s see an example.

Let’s say we’ve got a new requirement: if multiple developers use the same OS, the developer with the longest name should be kept in the map. So, the expected result looks like this:

Map<String, String> expected = ImmutableMap.of(
  "Eric", "MacOS",
  "Kent", "Linux",
  "Saajan", "Windows",
  "Kevin", "FreeBSD");

Next, let’s see how to achieve that:

Map<String, String> result = devMap.entrySet()
  .stream()
  .collect(toMap(Map.Entry::getValue, Map.Entry::getKey, (k1, k2) -> k1.length() > k2.length() ? k1 : k2))
  .entrySet()
  .stream()
  .collect(toMap(Map.Entry::getValue, Map.Entry::getKey));
assertThat(result).hasSize(4)
  .isEqualTo(expected);

As we can see, we took the previous stream-based implementation and only modified the merge function in toMap() to return the longer key when a collision occurs.

6. Conclusion

In this article, we’ve learned the “inverting twice” approach to removing duplicate HashMap values. Also, we’ve seen how to modify toMap()’s merge function to adapt a specific deduplication requirement.

As always, the complete source code for the examples is available over on GitHub.

Course – LS – All

Get started with Spring and Spring Boot, through the Learn Spring course:

>> CHECK OUT THE COURSE
res – REST with Spring (eBook) (everywhere)
Comments are open for 30 days after publishing a post. For any issues past this date, use the Contact form on the site.