Course – LS – All

Get started with Spring and Spring Boot, through the Learn Spring course:

>> CHECK OUT THE COURSE

1. Overview

Finding the most frequent characters in a string is a common task in programming.

In this tutorial, we’ll discuss various ways to achieve that in Java.

2. Using Java Stream API

Stream API is a significant new feature that Java 8 brings us. First, let’s solve the problem using Java Streams.

But before we dive into the implementation, let’s understand the problem quickly through an example. Let’s say we have a string INPUT1:

static final String INPUT1 = "aaaaaaaaaa(10) bbbbbbb ccccc dddd eee ff";

Here, the most frequent character in INPUT1  is the ‘a‘ characterSo, the goal is to find ‘a‘.

Next, let’s create a method to solve the problem:

static Character byStream(String input) {
    return input.chars()
      .mapToObj(x -> (char) x)
      .collect(groupingBy(x -> x, counting()))
      .entrySet()
      .stream()
      .max(comparingByValue())
      .get()
      .getKey();
}

As we can see, using the Stream API, we can chain multiple methods calls into one single statement. This is because Java Stream API applies the fluent interface pattern.

So next, let’s understand how this single statement does the job:

  • input.chars() – break the input string into an IntStream
  • mapToObject( x -> (char) x) – cast the integer to char and “box” it to Character
  • collect(groupingBy(x -> x, counting())) – group by Characters, counting() the occurrences, and collect them as Map<Character, Long>
  • entrySet().stream() – create a stream of all map entries
  • max(comparingByValue()) – find the maximal entry by the entry’s value. Here, we should note that the return type is Optional<Entry<Character, Long>>
  • get() – get the entry object
  • getKey() – get the entry’s key, which is the Character object with the maximal times of occurrences

Now that we understand how the byStream() method works, let’s test if it does the job correctly:

char result1 = CharacterWithHighestFrequency.byStream(INPUT1);
assertEquals('a', result1);

The test passes when we run it. So, we’ve solved the problem.

The method works correctly until we encounter some inputs like the following one day:

static final String INPUT2 = "YYYYYYY(7) bbbbb -------(7) dddd eee kkkkkkk(7) ff";

In INPUT2, three characters appear in the string in the same frequency (7): ‘Y‘, ‘‘, and ‘k‘. Our byStream() method will still work if the requirement defines any character from ‘Y‘, ‘‘, and ‘k‘ as the correct result.

However, if we need all the most frequent characters, our solution won’t work anymore.

Next, let’s build solutions covering both INPUT1 and INPUT2 cases.

3. Using a HashMap

HashMap maintains key-value associations. We can create a HashMap<Character, Integer> to hold Character and its occurrences mappings.

Then, we can find the entries with maximal occurrences. The entries keys are the answer.

So next, let’s implement this idea as a method:

static Set<Character> byMap(String input) {
    Map<Character, Integer> map = new HashMap<>();
    for (char c : input.toCharArray()) {
        map.compute(c, (character, count) -> count == null ? 1 : ++count);
    }

    int maxCount = map.values()
      .stream()
      .mapToInt(Integer::intValue)
      .max()
      .getAsInt();

    return map.keySet()
      .stream()
      .filter(c -> map.get(c) == maxCount)
      .collect(toSet());
}

The byMap() method above returns Set<Character> instead of a single char or Character object.

Next, let’s walk through the implementation quickly.

As we’ve discussed, we first create an empty Map object. Then, we loop through the characters in the input string and increment characters’ occurrences in the map.

It’s worth mentioning that we use the Map.compute() method to simplify the implementation of “If a key (Character) exists, increment its value, otherwise create an entry: theCharacter: 1“.

After we pass through the characters, we need to find the highest occurrences count. So, we use the Stream API to find out maxCount. We can also find maxCount while looping through the characters. We’ll see how this is done later in the buckets approach.

Finally, we use the Stream API again to collect the characters with maxCount occurrences.

One goal of the byMap() method is to cover both INPUT1 and INPUT2 scenarios. We can write a test to verify that it works as expected:

static final Set<Character> EXPECTED1 = Collections.singleton('a');
static final Set<Character> EXPECTED2 = ImmutableSet.of('Y', '-', 'k');

Set<Character> result1 = CharacterWithHighestFrequency.byMap(INPUT1);
assertEquals(EXPECTED1, result1);
                                                                     
Set<Character> result2 = CharacterWithHighestFrequency.byMap(INPUT2);
assertEquals(EXPECTED2, result2);

4. Using “Buckets”

We’ve learned that the HashMap-based approach holds character -> occurrences mappings and then finds the characters with the maximal occurrences. Assuming that we only focus on the ASCII character set, we can improve the HashMap-based solution by maintaining character -> occurrences associations with an array. Thus, we can save the Map and HashMap‘s overhead.

We know that the ASCII character set consists of 128 characters. Also, the ASCII codes of all these 128 characters range from 0 to 127. Therefore, we can initialize an int[] array with 128 elements (buckets). The array indexes are from 0 to 127, which covers all ASCII characters’ ASCII codes.

We increment the corresponding character bucket when going through the given string’s characters.

Next, let’s implement this idea in the byBucket() method:

static Set<Character> byBucket(String input) {
    int[] buckets = new int[128];

    int maxCount = 0;
    for (char c : input.toCharArray()) {
        buckets[c]++;
        maxCount = Math.max(buckets[c], maxCount);
    }

    int finalMaxCount = maxCount;
    return IntStream.range(0, 128)
      .filter(c -> buckets[c] == finalMaxCount)
      .mapToObj(i -> (char) i)
      .collect(Collectors.toSet());
}

As the code above shows, buckets is the array with 128 elements. As it’s an int[] primitive array, the initial values are all zero.

When we loop through the characters, we increment the bucket value simply by buckets[c]++. After that, we update the maxCount variable in the loop too. Thus, after the loop, we have all characters’ occurrence counts and the highest occurrences value in the maxCount variable.

The final step is finding out which bucket has the same value as maxCount, and converting their indexes to Character objects as the answer. We use the Stream API again to do that.

It’s worth mentioning that we reassigned maxCount to another int variable to make it effectively final so that the lambda expression in the filter() method call can accept it.

Finally, let’s create a test to cover the INPUT1 and INPUT2 cases to check if the byBucket() method works as expected:

Set<Character> result1 = CharacterWithHighestFrequency.byBucket(INPUT1);
assertEquals(EXPECTED1, result1);

Set<Character> result2 = CharacterWithHighestFrequency.byBucket(INPUT2);
assertEquals(EXPECTED2, result2);

The test passes if we give it a run.

5. Conclusion

In this article, we’ve explored how to find the most frequent characters in a string. We’ve discussed three approaches through examples:

  • one single statement solution using Java Stream API
  • HashMap-based solution
  • bucket-based solution

As usual, all code snippets presented here are available over on GitHub.

Course – LS – All

Get started with Spring and Spring Boot, through the Learn Spring course:

>> CHECK OUT THE COURSE
res – REST with Spring (eBook) (everywhere)
2 Comments
Oldest
Newest
Inline Feedbacks
View all comments
Comments are open for 30 days after publishing a post. For any issues past this date, use the Contact form on the site.