1. Overview

Regular expressions are a staple of text processing. In this tutorial, we’ll look at the features of the scala.util.matching.Regex class in Scala and how we can use it in practice.

2. Regex Class in Scala

scala.util.matching.Regex is based on the java.util.regex package in Java. It provides a very clean and concise API. Additionally, with pattern matching, the Regex class gains extra readability.

There are two ways to define a Regex object. First, by explicitly creating it:

import scala.util.matching.Regex

val polishPostalCode = new Regex("([0-9]{2})\\-([0-9]{3})")

Second, by calling the r method on a String, which is the idiomatic Scala way to do the same:

val polishPostalCode = "([0-9]{2})\\-([0-9]{3})".r
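Doubled backslashes become hard to read as patterns grow. As a minimal sketch, the same pattern can be written in a triple-quoted string, where each backslash is written only once. The object and value names below are hypothetical and exist only for illustration.

```scala
object PostalCodePatterns {
  // Escaped form, as in the text: every backslash is doubled.
  val escaped = "([0-9]{2})\\-([0-9]{3})".r

  // Triple-quoted form: backslashes are written once.
  val raw = """([0-9]{2})\-([0-9]{3})""".r

  // Both definitions produce the same underlying pattern string.
  def samePattern: Boolean = escaped.regex == raw.regex
}
```

Either form can be used interchangeably in the examples that follow.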

Now, let’s have a closer look at typical use cases for regular expressions with Regex.

3. Finding Matches

One of the most common use cases is finding matches in text.

We can use the findFirstIn method, which returns an Option[String] object:

val postCode = polishPostalCode.findFirstIn("Warsaw 01-011, Jerusalem Avenue")
assertEquals(Some("01-011"), postCode)

Alternatively, we can use findFirstMatchIn, which returns Option[Match]:

val postCodeMatch = polishPostalCode.findFirstMatchIn("Warsaw 01-011, Jerusalem Avenue")
assertEquals(Some("011"), for (m <- postCodeMatch) yield m.group(2))
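Because both methods return an Option, the no-match case has to be handled somewhere. The following is one possible sketch of doing so with standard Option combinators. The object name, the method names, and the "unknown" default are assumptions made for this example.

```scala
import scala.util.matching.Regex

object FirstMatchExample {
  val polishPostalCode: Regex = """([0-9]{2})\-([0-9]{3})""".r

  // Returns the first postal code, or a fallback value when nothing matches.
  def firstCodeOrDefault(text: String, default: String = "unknown"): String =
    polishPostalCode.findFirstIn(text).getOrElse(default)

  // Returns both captured groups of the first match, if there is one.
  def firstCodeParts(text: String): Option[(String, String)] =
    polishPostalCode.findFirstMatchIn(text).map(m => (m.group(1), m.group(2)))
}
```

Here, findFirstIn is enough when only the matched text matters, while findFirstMatchIn gives access to the individual groups.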

To find all matches, we have similarly named methods. The first, findAllIn, returns a MatchIterator:

val postCodes = polishPostalCode.findAllIn("Warsaw 01-011, Jerusalem Avenue, Cracow 30-059, Mickiewicza Avenue")
  .toList
assertEquals(List("01-011", "30-059"), postCodes)

Similarly, findAllMatchIn returns an Iterator[Match]:

val postCodes = polishPostalCode.findAllMatchIn("Warsaw 01-011, Jerusalem Avenue, Cracow 30-059, Mickiewicza Avenue")
  .toList
val postalDistricts = for (m <- postCodes) yield m.group(1)
assertEquals(List("01", "30"), postalDistricts)
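Each Match also carries positional information. As a small sketch, we could pair every postal code with the index at which it starts in the input, using the matched and start members of Match. The object and method names are hypothetical.

```scala
object AllMatchesExample {
  val polishPostalCode = """([0-9]{2})\-([0-9]{3})""".r

  // Pairs each matched postal code with its start index in the text.
  def codesWithOffsets(text: String): List[(String, Int)] =
    polishPostalCode.findAllMatchIn(text).map(m => (m.matched, m.start)).toList
}
```

Since the result of findAllMatchIn is an Iterator, it can be traversed only once, which is why the sketch converts it to a List before returning it.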

4. Extracting Values

We can also use a Regex as an extractor in pattern matching, which binds each capturing group to a variable when the input matches:

val timestamp = "([0-9]{2}):([0-9]{2}):([0-9]{2})\\.([0-9]{3})".r
val description = "11:34:01.411" match {
  case timestamp(hour, minutes, _, _) => s"It's $minutes minutes after $hour"
}

assertEquals("It's 34 minutes after 11", description)
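A match expression with a single case throws a MatchError when the input doesn't fit the pattern. One way to guard against that, sketched below, is to add a fallback case and return an Option. The TimestampParser object and its describe method are hypothetical names introduced for this example.

```scala
object TimestampParser {
  val timestamp = """([0-9]{2}):([0-9]{2}):([0-9]{2})\.([0-9]{3})""".r

  // Describes a timestamp, or returns None when the input
  // does not match the whole pattern.
  def describe(input: String): Option[String] = input match {
    case timestamp(hour, minutes, _, _) => Some(s"It's $minutes minutes after $hour")
    case _                              => None
  }
}
```

With this shape, callers decide how to react to malformed input instead of dealing with an exception.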

By default, a Regex used as an extractor behaves as if the pattern were “anchored”, that is, wrapped between the characters ^ and $ as in “^pattern$”, so it matches only when the entire input fits the pattern. However, we can lift this restriction by calling the unanchored method of Regex, which returns an UnanchoredRegex:

val timestampUnanchored = timestamp.unanchored

Now, we can put additional text around the match, and we’ll still be able to find it:

val description = "Timestamp: 11:34:01.411 error appeared" match {
  case timestampUnanchored(hour, minutes, _, _) => s"It's $minutes minutes after $hour"
}

assertEquals("It's 34 minutes after 11", description)
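To make the difference concrete, here is a rough sketch that runs the same input through both variants. The helper names matchesWhole and containsTimestamp are assumptions, and the _* pattern simply ignores all captured groups.

```scala
object AnchoringExample {
  val timestamp = """([0-9]{2}):([0-9]{2}):([0-9]{2})\.([0-9]{3})""".r
  val timestampUnanchored = timestamp.unanchored

  // True only when the whole input is a timestamp.
  def matchesWhole(input: String): Boolean = input match {
    case timestamp(_*) => true
    case _             => false
  }

  // True when a timestamp appears anywhere in the input.
  def containsTimestamp(input: String): Boolean = input match {
    case timestampUnanchored(_*) => true
    case _                       => false
  }
}
```

Note that unanchored returns a new object, so the original timestamp value keeps its anchored behavior.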

5. Replacing Text

Another crucial feature is replacing text. We can achieve it with the overloaded replaceAllIn method:

val minutes = timestamp.replaceAllIn("11:34:01.311", m => m.group(2))

assertEquals("34", minutes)

Also, we can nicely combine this function with pattern matching:

val hoursAndMinutes = timestamp.replaceAllIn("11:34:01.311", _ match {
  case timestamp(hours, minutes, _, _) => s"$hours-$minutes"
})

assertEquals("11-34", hoursAndMinutes)
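The other overload of replaceAllIn takes a replacement String, in which $1, $2, and so on refer to the captured groups. As a brief sketch, we could use it together with replaceFirstIn, which replaces only the first match. The object name, the method names, and the "<time>" placeholder are hypothetical.

```scala
object ReplaceExample {
  val timestamp = """([0-9]{2}):([0-9]{2}):([0-9]{2})\.([0-9]{3})""".r

  // Drops seconds and milliseconds from every timestamp, keeping "HH:mm".
  def truncateToMinutes(text: String): String =
    timestamp.replaceAllIn(text, "$1:$2")

  // Replaces only the first timestamp found in the text with a placeholder.
  def maskFirst(text: String): String =
    timestamp.replaceFirstIn(text, "<time>")
}
```

Because $ and \ have a special meaning in replacement strings, a replacement that should be inserted literally can be passed through Regex.quoteReplacement first.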

6. Conclusion

In this tutorial, we introduced the Regex class from Scala’s standard library. As we have seen, it provides a readable API that covers the most common regular expression use cases: finding matches, extracting values, and replacing text.

As usual, the full source code can be found over on GitHub.
