1. Overview

Sometimes we want to extract certain elements from a collection. In Scala, we have both the filter and withFilter methods that can be used for this purpose.

In this tutorial, we’ll learn about the differences between them and how they can affect performance.

2. Example

To make our comparisons tangible, we’ll filter a collection of Programmers. Each of them has a name, a level (Junior, Mid, or Senior), and a list of knownLanguages:

case class Programmer(name: String,
                      level: Level,
                      knownLanguages: List[String])

sealed trait Level
object Level {
  case object Junior extends Level
  case object Mid extends Level
  case object Senior extends Level
}

Let’s define a shortlist of programmers to use in our examples:

val programmers: List[Programmer] = List(
  Programmer(name = "Kelly",
             level = Level.Mid,
             knownLanguages = List("JavaScript")),
  Programmer(name = "John",
             level = Level.Senior,
             knownLanguages = List("Java", "Scala", "Kotlin")),
  Programmer(name = "Dave",
             level = Level.Junior,
             knownLanguages = List("C", "C++"))
)

We’ll try to find the names of the programmers who know more than one language and are at the Mid or Senior level:

import java.util.concurrent.atomic.AtomicInteger

// Counts each call and logs which programmer's level is being checked
def isMidOrSenior(implicit counter: AtomicInteger): Programmer => Boolean =
  programmer => {
    counter.incrementAndGet()
    println("verify level " + programmer)
    List(Level.Mid, Level.Senior).contains(programmer.level)
  }

// Counts each call and logs which programmer's languages are being checked
def knowsMoreThan1Language(implicit counter: AtomicInteger): Programmer => Boolean =
  programmer => {
    counter.incrementAndGet()
    println("verify number of known languages " + programmer)
    programmer.knownLanguages.size > 1
  }

val getName: Programmer => String =
  programmer => {
    println("get name " + programmer)
    programmer.name
  }

To make the evaluation order and the amount of work visible, all three functions print a line when they run, and the two predicates also increment a shared counter.
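The counting and printing code is duplicated in both predicates. As a sketch, a small helper could wrap any predicate with the same instrumentation. The name instrumented is hypothetical (it is not a library function), and the example uses integers instead of programmers for brevity:

```scala
import java.util.concurrent.atomic.AtomicInteger

object Instrumentation {
  // Hypothetical helper, not part of any library: wraps a predicate so that
  // every call is counted and logged with a label before the predicate runs.
  def instrumented[A](label: String, counter: AtomicInteger)(p: A => Boolean): A => Boolean =
    a => {
      counter.incrementAndGet()
      println(s"$label $a")
      p(a)
    }

  val calls: AtomicInteger = new AtomicInteger(0)

  val isEven: Int => Boolean = instrumented[Int]("check even", calls)(n => n % 2 == 0)

  // filter calls the wrapped predicate once per element: three times here.
  val evens: List[Int] = List(1, 2, 3).filter(isEven)
}
```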

3. The filter Method

3.1. Signature

The filter method takes a predicate function and returns a new collection that keeps only those elements that satisfy the given predicate:

def filter(p: A => Boolean): Repr

Repr is the type of the actual collection, which is List[Programmer] in our case.

A is the type of the elements in the collection, which is Programmer in our case.
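Because the result type follows the source collection, filtering a List gives a List and filtering a Vector gives a Vector, and the result is built right away. A minimal check, using plain integers instead of programmers:

```scala
object FilterReturnType {
  val numbers: List[Int] = List(1, 2, 3, 4, 5)
  val vector: Vector[Int] = Vector(1, 2, 3, 4, 5)

  // filter on a List returns a new List...
  val evenList: List[Int] = numbers.filter(_ % 2 == 0)

  // ...and filter on a Vector returns a new Vector.
  val evenVector: Vector[Int] = vector.filter(_ % 2 == 0)
}
```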

3.2. Evaluation

Let’s find the desired programmers:

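// shouldBe comes from ScalaTest's Matchers, so this runs inside a test suite
implicit val counter: AtomicInteger = new AtomicInteger(0)

val desiredProgrammers: List[Programmer] = programmers
  .filter(isMidOrSenior)          // first pass: 3 predicate calls
  .filter(knowsMoreThan1Language) // second pass: 2 predicate calls

counter.get() shouldBe 5
desiredProgrammers.map(getName) shouldBe List("John") // third pass
counter.get() shouldBe 5 // getName doesn't touch the counter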

This prints the following to the console:

verify level Programmer(Kelly,Mid,List(JavaScript))
verify level Programmer(John,Senior,List(Java, Scala, Kotlin))
verify level Programmer(Dave,Junior,List(C, C++))
verify number of known languages Programmer(Kelly,Mid,List(JavaScript))
verify number of known languages Programmer(John,Senior,List(Java, Scala, Kotlin))
get name Programmer(John,Senior,List(Java, Scala, Kotlin))

The first filter applied isMidOrSenior to all three programmers and dropped Dave, who is a Junior. The second filter then traversed the resulting two-element list and applied knowsMoreThan1Language, dropping Kelly because she knows only one language.

In the end, we were left with John, who isn’t a Junior and knows more than one language.

It took five predicate evaluations (three in the first pass and two in the second), and map then made a third pass over the result. Each filter call also built a new List, and the one produced by the first call was only an intermediate step.
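The same pass-by-pass order can be reproduced in a smaller, self-contained sketch that records predicate calls in a ListBuffer instead of printing them. Here isPositive and isEven are hypothetical stand-ins for our two predicates. Every isPositive call happens before any isEven call, and the first filter produces an intermediate List:

```scala
import scala.collection.mutable.ListBuffer

object EagerFilterTrace {
  val trace: ListBuffer[String] = ListBuffer.empty[String]

  // Each predicate records the element it was called with.
  val isPositive: Int => Boolean = n => { trace += s"positive $n"; n > 0 }
  val isEven: Int => Boolean = n => { trace += s"even $n"; n % 2 == 0 }

  // First pass: isPositive runs on every element and builds an intermediate list.
  val afterFirstPass: List[Int] = List(-1, 2, 3).filter(isPositive)

  // Second pass: isEven runs only on the survivors of the first pass.
  val result: List[Int] = afterFirstPass.filter(isEven)
}
```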

4. The withFilter Method

4.1. Signature

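The withFilter method takes the same kind of predicate as filter. The difference is the return type: instead of a new collection, it returns a WithFilter, which holds the original collection and the predicate without evaluating anything yet. In Scala 2.13, it’s declared as:

def withFilter(p: A => Boolean): WithFilter[A, CC]

A is the element type, as in the filter method, and CC is the collection’s type constructor, which is List in our case.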
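A short sketch, assuming Scala 2.13 or later, shows that the value returned by withFilter is not a List, that chaining withFilter gives back another WithFilter, and that map is what finally produces a List:

```scala
import scala.collection.WithFilter

object WithFilterType {
  val numbers: List[Int] = List(1, 2, 3, 4, 5, 6)

  // withFilter returns a lazy WithFilter wrapper, not a List.
  val evens: WithFilter[Int, List] = numbers.withFilter(_ % 2 == 0)

  // Chaining returns another WithFilter that checks both predicates.
  val bigEvens: WithFilter[Int, List] = evens.withFilter(_ > 2)

  // map (like flatMap and foreach) is what traverses the source; here it builds a List.
  val doubled: List[Int] = bigEvens.map(_ * 2)
}
```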

4.2. Evaluation

Now let’s check the behavior of the withFilter method:

import scala.collection.WithFilter

implicit val counter: AtomicInteger = new AtomicInteger(0)

// In Scala 2.13, WithFilter takes the element type and the collection's type constructor
val desiredProgrammers: WithFilter[Programmer, List] =
  programmers
    .withFilter(isMidOrSenior)
    .withFilter(knowsMoreThan1Language)

counter.get() shouldBe 0 // no predicate has run yet

desiredProgrammers.map(getName) shouldBe List("John")
counter.get() shouldBe 5

The counter is still 0 after both withFilter calls, so calling withFilter doesn’t evaluate any predicate. Evaluation happens only when we call map.

Executing map prints the following result to the console:

verify level Programmer(Kelly,Mid,List(JavaScript))
verify number of known languages Programmer(Kelly,Mid,List(JavaScript))
verify level Programmer(John,Senior,List(Java, Scala, Kotlin))
verify number of known languages Programmer(John,Senior,List(Java, Scala, Kotlin))
get name Programmer(John,Senior,List(Java, Scala, Kotlin))
verify level Programmer(Dave,Junior,List(C, C++))

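From the above output, we can see that the two predicates and the mapping function were interleaved in a single pass. Each programmer was checked against isMidOrSenior and, only if that check passed, against knowsMoreThan1Language. getName ran only for John, who passed both checks, and Dave’s languages were never checked because he had already failed the level check.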
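The same interleaving shows up in a reduced sketch that records calls in a ListBuffer. Here isPositive, isEven, and describe are hypothetical stand-ins for our two predicates and getName. For each element, the second predicate runs only if the first one passed, and the mapping function runs as soon as both pass:

```scala
import scala.collection.mutable.ListBuffer

object LazyWithFilterTrace {
  val trace: ListBuffer[String] = ListBuffer.empty[String]

  val isPositive: Int => Boolean = n => { trace += s"positive $n"; n > 0 }
  val isEven: Int => Boolean = n => { trace += s"even $n"; n % 2 == 0 }
  val describe: Int => String = n => { trace += s"map $n"; s"#$n" }

  // One traversal: both predicates and the mapping are applied element by element.
  val result: List[String] =
    List(-1, 2, 3).withFilter(isPositive).withFilter(isEven).map(describe)
}
```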

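This behavior can improve performance, especially for large collections or long chains. The number of predicate calls is the same in both versions (five), but with withFilter the two filters and the map share a single pass over the collection and no intermediate list is built. With filter, we needed three passes (one per filter call plus one for map), and the first filter produced a list that was only an intermediate step.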
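To see where the single pass comes from, here is a deliberately simplified, hypothetical MiniWithFilter class. It is not the standard library’s implementation, which works on any collection type, but it follows the same idea: keep the source and a combined predicate, and do all the work in one traversal when map or foreach is called:

```scala
// Hypothetical, simplified stand-in for WithFilter (List only).
final class MiniWithFilter[A](source: List[A], p: A => Boolean) {
  // Chaining composes the predicates instead of building an intermediate list.
  def withFilter(q: A => Boolean): MiniWithFilter[A] =
    new MiniWithFilter[A](source, a => p(a) && q(a))

  // A single pass: test each element and map it immediately if it passes.
  def map[B](f: A => B): List[B] = {
    val out = List.newBuilder[B]
    source.foreach(a => if (p(a)) out += f(a))
    out.result()
  }

  def foreach(f: A => Unit): Unit =
    source.foreach(a => if (p(a)) f(a))
}

object MiniWithFilterDemo {
  val result: List[String] =
    new MiniWithFilter[Int](List(-1, 2, 3, 4), _ > 0)
      .withFilter(_ % 2 == 0)
      .map(n => s"#$n")
}
```

Because withFilter here only combines the predicates with &&, no list is built until map runs.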

5. Comparison

We saw that although the two methods have nearly identical signatures, they evaluate very differently. When we chain more than one filtering operation, withFilter is the better choice from a performance point of view, because it avoids intermediate collections and extra passes.
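One caveat: a WithFilter stores the predicate, not the filtered result, so every call to map, flatMap, or foreach runs the predicates over the whole source collection again. If we need to traverse the filtered elements several times, a single filter that materializes them once may be cheaper. A small sketch, assuming Scala 2.13 or later, counts the predicate calls:

```scala
import java.util.concurrent.atomic.AtomicInteger

object WithFilterReevaluation {
  val calls: AtomicInteger = new AtomicInteger(0)

  // The predicate counts how many times it runs.
  val evens = List(1, 2, 3, 4).withFilter { n =>
    calls.incrementAndGet()
    n % 2 == 0
  }

  val callsBeforeMap: Int = calls.get() // 0: nothing evaluated yet

  val first: List[Int] = evens.map(identity)
  val callsAfterFirstMap: Int = calls.get() // 4: one call per element

  // A second traversal runs the predicate over all four elements again.
  val second: List[Int] = evens.map(_ * 10)
  val callsAfterSecondMap: Int = calls.get() // 8
}
```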

So why would we ever use the filter method? We should remember that withFilter traverses the collection only when we call foreach, map, or flatMap on the WithFilter. If we only want to apply a single predicate without any mapping on the result, filter is the better choice, because it gives us the collection directly.
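For a single predicate with no mapping, filter hands back a List that we can use immediately. A brief sketch with integers:

```scala
object SinglePredicate {
  val numbers: List[Int] = List(1, 2, 3, 4, 5)

  // One predicate and no mapping: filter returns a List directly,
  // so size, head, and the rest of the List API are available right away.
  val evens: List[Int] = numbers.filter(_ % 2 == 0)
  val evenCount: Int = evens.size
  val firstEven: Int = evens.head

  // By contrast, numbers.withFilter(_ % 2 == 0).size would not compile:
  // WithFilter only offers map, flatMap, foreach, and withFilter.
}
```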

If we end up with only a WithFilter after some previous operation, we can force it to produce a collection by calling map with the identity function.
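A minimal sketch, assuming Scala 2.13 or later and assuming an earlier step has handed us only a WithFilter:

```scala
import scala.collection.WithFilter

object ForceWithFilter {
  // Stand-in for a WithFilter produced by some earlier operation.
  val pending: WithFilter[String, List] =
    List("Kelly", "John", "Dave").withFilter(_.startsWith("J"))

  // Mapping with identity traverses it once and builds a plain List.
  val names: List[String] = pending.map(identity)
}
```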

6. Conclusion

In this short article, we explored the differences between the filter and withFilter methods. We saw how to use them and how to choose between them depending on our use case.

As always, the full source code of the article is available over on GitHub.
