Working on getting your persistence layer right with Spring?

# Finding the N-th Occurrence of a Substring in a String in Java

Last updated: November 7, 2023

## 1. Overview

Locating a substring in a larger string is a common operation when we work with Java. Usually, we can find a substring’s index using the *indexOf()* method.

In this tutorial, we’ll explore various approaches to solve an interesting and pragmatic problem: finding the n-th occurrence of a substring in a larger string.

## 2. Introduction to the Problem

The standard *indexOf()* method can give us the index of a substring in a string. For example, we can find the index (*8*) of the substring *“a”* in “*This is a word.*“:

```
int firstIdx = "This is a word.".indexOf("a");
assertEquals(8, firstIdx);
```

However, **the indexOf(substring) method always returns the index of the substring’s first occurrence. **Therefore, sometimes, it’s inconvenient when we want to know a substring’s n-th occurrence index using this method. Let’s see an example:

```
final static String INPUT = "a word, a word, a word, a word";
// indexes of "a": "0 8 16 24 "
```

As the example above shows, *INPUT* contains four “*a*“s. We can obtain the first “a” index (*0*) by calling *indexOf(“a”)* directly. However, we may need solutions to get other indexes of “a”‘s occurrences, such as 8, 16, and 24.

Next, let’s see how to solve this problem.

For simplicity, we’ll leverage unit test assertions to verify whether each approach returns the expected result.

## 3. Finding the Substring’s Second Occurrence

Before we solve the “finding the n-th occurrence” problem, let’s first find the index of the second (*n=2*) occurrence of the substring “a”. Then, we’ll extend the “find the second occurrence” solution to various “the n-th occurrence” solutions.

We’ve learned that the *indexOf(“a”)* returns the index of the first occurrence of “a”. Further, we can pass the *fromIndex int* parameter to the *indexOf()* method. **The fromIndex parameter indicates the character index we start searching from. **

Therefore, a straightforward idea to get the second occurrence’s index is calling *indexOf()* twice:

- Get the index of the first occurrence by calling
*indexOf(substring)*and save the result in a variable, let’s say*firstIdx* - Get the index of the second occurrence by calling
*indexOf(substring, firstIdx + substring.length())*

Next, let’s implement this approach and test with our *INPUT* string:

```
int firstIdx = INPUT.indexOf("a");
int secondIdx = INPUT.indexOf("a", firstIdx + "a".length());
assertEquals(8, secondIdx);
```

Now, some of us might have realized **we can call indexOf() n times with the corresponding fromIdx parameter to get the index of the n-th occurrence. **For example, we can add another

*indexOf()*call to the test above to get the third occurrence’s index:

*thirdIdx = INPUT.indexOf(“a”, secondIdx + “a”.length());*.

So next, let’s extend the “finding the second occurrence” solution to “finding the n-th occurrence”.

## 4. Finding the Substring’s N-th Occurrence

Usually, we’d use recursion or looping to implement repeated operations.

### 4.1. The Recursive Approach

So, let’s first implement the recursive solution:

```
int nthIndexOf(String input, String substring, int nth) {
if (nth == 1) {
return input.indexOf(substring);
} else {
return input.indexOf(substring, nthIndexOf(input, substring, nth - 1) + substring.length());
}
}
```

The implementation is pretty straightforward. **The nth variable works as a counter. We reduce its value in each recursion step. **

Next, let’s test the method with our example data:

```
int result1 = nthIndexOf(INPUT, "a", 1);
assertEquals(0, result1);
int result2 = nthIndexOf(INPUT, "a", 2);
assertEquals(8, result2);
int result3 = nthIndexOf(INPUT, "a", 3);
assertEquals(16, result3);
int result4 = nthIndexOf(INPUT, "a", 4);
assertEquals(24, result4);
int result5 = nthIndexOf(INPUT, "a", 5);
assertEquals(-1, result5);
```

The test passes if we give it a run. Also, as we can see, when the total occurrence count is less than the *nth* value, the method returns *-1*.

### 4.2. The Iterative Approach

Similarly, we can implement the same idea in an iterative approach:

```
static int nthIndexOf2(String input, String substring, int nth) {
int index = -1;
while (nth > 0) {
index = input.indexOf(substring, index + substring.length());
if (index == -1) {
return -1;
}
nth--;
}
return index;
}
```

The test with the same input passes as well:

```
int result1 = nthIndexOf2(INPUT, "a", 1);
assertEquals(0, result1);
int result2 = nthIndexOf2(INPUT, "a", 2);
assertEquals(8, result2);
int result3 = nthIndexOf2(INPUT, "a", 3);
assertEquals(16, result3);
int result4 = nthIndexOf2(INPUT, "a", 4);
assertEquals(24, result4);
int result5 = nthIndexOf2(INPUT, "a", 5);
assertEquals(-1, result5);
```

## 5. The Regex-Based Solution

We’ve seen how to solve the problem using the *indexOf()* method. Alternatively, we can use Java’s Regex API to solve the problem.

** Matcher.find() allows us to find the next matched occurrence in the input string. **Therefore, we can call

*Matcher.find()*n times to get the n-th match. Also,

**we can get each match’s start index using**

*Matcher.start()*:```
int nthOccurrenceIndex(String input, String regexPattern, int nth) {
Matcher matcher = Pattern.compile(regexPattern).matcher(INPUT);
for (int i = 0; i < nth; i++) {
if (!matcher.find()) {
return -1;
}
}
return matcher.start();
}
```

Next, let’s create a test to verify whether the regex-based solution does the job correctly:

```
int result1 = nthOccurrenceIndex(INPUT, "a", 1);
assertEquals(0, result1);
int result2 = nthOccurrenceIndex(INPUT, "a", 2);
assertEquals(8, result2);
int result3 = nthOccurrenceIndex(INPUT, "a", 3);
assertEquals(16, result3);
int result4 = nthOccurrenceIndex(INPUT, "a", 4);
assertEquals(24, result4);
int result5 = nthOccurrenceIndex(INPUT, "a", 5);
assertEquals(-1, result5);
```

It’s worth noting that this solution allows us to match dynamic substrings matching a pattern in the input. However, on the other hand, the *indexOf()* based approach only works with fixed substrings.

## 6. Conclusion

In this article, we’ve learned various ways to locate the n-th occurrence of a substring within a string:

- The recursive solution based on the
*indexOf()*method - The iterative solution based on the
*indexOf()*method - Regex-based solution

As always, the complete source code for the examples is available over on GitHub.