I just announced the new Spring 5 modules in REST With Spring:

>> CHECK OUT THE COURSE

1. Overview

In this article, we’ll look at different ways to search an array for a specified value.

We’ll also compare how these perform using JMH (the Java Microbenchmark Harness) to determine which method works best.

2. Setup

For our examples, we’ll use an array that contains randomly generated Strings for each test:

String[] seedArray(int length) {
    String[] strings = new String[length];
    Random value = new Random();
    for (int i = 0; i < length; i++) {
        strings[i] = String.valueOf(value.nextInt());
    }
    return strings;
}

To reuse the array in each benchmark, we’ll declare an inner class to hold the array and the count so we can declare its scope for JMH:

@State(Scope.Benchmark)
public static class SearchData {
    static int count = 1000;
    static String[] strings = seedArray(1000);
}

Three commonly used methods for searching an array are as a List, a Set, or with a loop that examines each member until it finds a match.

Let’s start with three methods that implement each algorithm:

boolean searchList(String[] strings, String searchString) {
    return Arrays.asList(SearchData.strings)
      .contains(searchString);
}

boolean searchSet(String[] strings, String searchString) {
    Set<String> stringSet = new HashSet<>(Arrays.asList(SearchData.strings));
    
    return stringSet.contains(searchString);
}

boolean searchLoop(String[] strings, String searchString) {
    for (String string : SearchData.strings) {
        if (string.equals(searchString))
        return true;
    }
    
    return false;
}

We’ll use these class annotations to tell JMH to output average time in microseconds and run for five warmup iterations to ensure that our tests are reliable:

@BenchmarkMode(Mode.AverageTime)
@Warmup(iterations = 5)
@OutputTimeUnit(TimeUnit.MICROSECONDS)

And run each test in a loop:

@Benchmark
public void searchArrayLoop() {
    for (int i = 0; i < SearchData.count; i++) {
        searchLoop(SearchData.strings, "T");
    }
}

@Benchmark
public void searchArrayAllocNewList() {
    for (int i = 0; i < SearchData.count; i++) {
        searchList(SearchData.strings, "T");
    }
}

@Benchmark
public void searchArrayAllocNewSet() {
    for (int i = 0; i < SearchData.count; i++) {
        searchSet(SearchData.strings, "S");
    }
}

When we run with 1000 searches for each method, our results look something like this:

SearchArrayTest.searchArrayAllocNewList  avgt   20    937.851 ±  14.226  us/op
SearchArrayTest.searchArrayAllocNewSet   avgt   20  14309.122 ± 193.844  us/op
SearchArrayTest.searchArrayLoop          avgt   20    758.060 ±   9.433  us/op

The loop search is more efficient than others. But this is at least partly because of how we’re using collections.

We’re creating a new List instance with each call to searchList() and a new List and a new HashSet with each call to searchSet(). Creating these objects creates an additional cost that looping through the array doesn’t.

4. More Efficient Search

What happens when we create single instances of List and Set and then reuse them for each search?

Let’s give it a try:

public void searchArrayReuseList() {
    List asList = Arrays.asList(SearchData.strings);
    for (int i = 0; i < SearchData.count; i++) {
        asList.contains("T");
    }
}

public void searchArrayReuseSet() {
    Set asSet = new HashSet<>(Arrays.asList(SearchData.strings));
    for (int i = 0; i < SearchData.count; i++) {
        asSet.contains("T");
    }
}

We’ll run these methods with the same JMH annotations as above, and include the results for the simple loop for comparison.

We see very different results:

SearchArrayTest.searchArrayLoop          avgt   20    758.060 ±   9.433  us/op
SearchArrayTest.searchArrayReuseList     avgt   20    837.265 ±  11.283  us/op
SearchArrayTest.searchArrayReuseSet      avgt   20     14.030 ±   0.197  us/op

While searching the List is marginally faster than before, Set drops to less than 1 percent of the time required for the loop!

Now that we’ve removed the time required for creating new Collections from each search, these results make sense.

Searching a hash table, the structure underlying a HashSet, has a time complexity of 0(1), while an array, which underlies the ArrayList is 0(n).

Another method for searching an array is a binary search. While very efficient, a binary search requires that the array is sorted in advance.

Let’s sort the array and try the binary search:

@Benchmark
public void searchArrayBinarySearch() {
    Arrays.sort(SearchData.strings);
    for (int i = 0; i < SearchData.count; i++) {
        Arrays.binarySearch(SearchData.strings, "T");
    }
}
SearchArrayTest.searchArrayBinarySearch  avgt   20     26.527 ±   0.376  us/op

Binary search is very fast, although less efficient than the HashSet: the worst case performance for a binary search is 0(log n), which places its performance between that of an array search and a hash table.

6. Conclusion

We’ve seen several methods of searching through an array.

Based on our results, a HashSet works best for searching through a list of values. However, we need to create them in advance and store them in the Set.

As always, the full source code of the examples is available over on GitHub.

I just announced the new Spring 5 modules in REST With Spring:

>> CHECK OUT THE LESSONS

Leave a Reply

2 Comments on "Check if a Java Array Contains a Value"

Notify of
avatar
Sort by:   newest | oldest | most voted
Krzysztof Bieroń
Guest

– 5 iterations are not reliable at all in such simple test, because these methods won’t be touched by JIT compiler. Maybe it’s not something that would change results dramatically, but in perf tests you want to be precise.
– even if JIT would catch this up (eg. because of big loops) it would make it even worse, because in every test there is loop with potential to dead code elimination, by simplifying it to one method call (last loop iteration is enough to get same result and JIT sees it clearly).

Grzegorz Piwowarek
Editor

You’re generally right but here we don’t care about the precise measurement of the absolute execution time – we’re looking and a relative comparison between various methods