Data Validation With Cats | Baeldung on Scala

1. Introduction

Data validation is the process of verifying the integrity and structure of data before it’s used in business operations. Cats offers data structures and methods that ease data validation in a Scala application.

In this tutorial, we’ll look at different approaches to data validation and find out the limitations and benefits of each.

2. Problem Statement

We’ll explore the following scenario. A university has contracted us to help automate the processing of scholarship applications. The university has the following eligibility requirements:

Applicants should be citizens of Uganda, Kenya, or Tanzania.
They should be 25 years old or older as of the current year.
They should have a CGPA (Cumulative Grade Point Average) of 3.0 or higher.

We’d like to return the appropriate error message(s) if a candidate doesn’t meet one or more of the scholarship requirements. However, if the candidate meets all the requirements, we’ll return a case class showing all the supplied information.
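Before bringing in any library, the three rules can be sketched as plain predicates. This is only an illustrative baseline: EligibilityRules and its member names are hypothetical and not part of the tutorial’s code, and the sketch assumes that a CGPA of exactly 3.0 is accepted, as in the code later in this article.

```scala
// Hypothetical baseline, not part of the tutorial's code: the three
// eligibility rules written as plain Boolean predicates.
object EligibilityRules:
  private val countries = Set("uganda", "kenya", "tanzania")

  // Country names are compared case-insensitively.
  def validCountry(country: String): Boolean =
    countries.contains(country.toLowerCase)

  def validAge(age: Int): Boolean = age >= 25

  // Assumption: 3.0 itself is accepted.
  def validCgpa(cgpa: Double): Boolean = cgpa >= 3.0

  // Collects one message per failed rule; an empty list means "eligible".
  def failures(country: String, age: Int, cgpa: Double): List[String] =
    List(
      Option.when(!validCountry(country))("Invalid Country"),
      Option.when(!validAge(age))("Invalid Age"),
      Option.when(!validCgpa(cgpa))("Invalid Cgpa")
    ).flatten
```

A list of strings works for reporting, but it gives us no typed result for a successful application, which is the gap the rest of the article addresses.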

3. Setting Up

To be able to follow along, we’ll need to add the Cats Effect library to our build.sbt file:

val scala3Version = "3.3.1"

lazy val root = project
  .in(file("."))
  .settings(
    name := "dataValidation",
    version := "0.1.0-SNAPSHOT",

    scalaVersion := scala3Version,

    libraryDependencies += "org.typelevel" %% "cats-effect" % "3.5.2",
    libraryDependencies += "org.scalatest" %% "scalatest" % "3.2.17" % Test
  )

Cats Effect provides a pure asynchronous runtime for our application. It also brings in Cats as a transitive dependency. We’ll be using Scala 3.3.1, which was the current LTS release at the time of writing. Lastly, all tests for this project will be written with ScalaTest, and the implementations will be available on GitHub.

4. Smart Constructors With Option

In this section, we’ll create the necessary case classes to validate the scholarship data and employ smart constructors for data validation.

First, we define case classes for Country, Age, and Cgpa to represent the university’s requirements:

object Utilities:
  case class Country(value: String)
  object Country:
    private val countries = List("uganda", "kenya", "tanzania")
    def apply(value: String): Option[Country] =
      Some(value)
        .filter(v => countries.contains(v.toLowerCase))
        .map(c => new Country(c.toLowerCase))

  case class Age(value: Int)
  object Age:
    def apply(value: Int): Option[Age] =
      Some(value).filter(_ >= 25).map(new Age(_))

  case class Cgpa(value: Double)
  object Cgpa:
    def apply(value: Double): Option[Cgpa] =
      Some(value).filter(_ >= 3.0).map(new Cgpa(_))

Here we define case classes to represent the required inputs for a scholarship. Additionally, each case class has a companion object with an apply() method that returns an Option of that class.

In the body of the apply() method, we pass the argument to a Some and call the filter() method with our predicate. Therefore, if the value satisfies the predicate, it’s passed to a constructor to produce the required case class wrapped in a Some; otherwise, we return a None.
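To see the smart constructors in action, here is a small self-contained sketch. It repeats reduced copies of Country and Age so that it compiles on its own; SmartConstructorDemo and describe are hypothetical names introduced only for illustration.

```scala
// Reduced copies of the tutorial's Country and Age smart constructors.
final case class Country(value: String)
object Country:
  private val countries = List("uganda", "kenya", "tanzania")
  def apply(value: String): Option[Country] =
    Some(value)
      .filter(v => countries.contains(v.toLowerCase))
      .map(c => new Country(c.toLowerCase))

final case class Age(value: Int)
object Age:
  def apply(value: Int): Option[Age] =
    Some(value).filter(_ >= 25).map(new Age(_))

// Hypothetical demo object, not part of the tutorial's code.
object SmartConstructorDemo:
  val kenya: Option[Country] = Country("KENYA") // Some(Country(kenya))
  val ghana: Option[Country] = Country("Ghana") // None
  val tooYoung: Option[Age] = Age(23)           // None

  // With Option, a failure carries no explanation of what went wrong.
  def describe(result: Option[Age]): String =
    result.fold("rejected, reason unknown")(a => s"accepted: ${a.value}")
```

Note that a None tells the caller only that validation failed, not which rule was broken, which motivates the error-carrying types in the next section.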

5. Error Accumulation With Cats

Here we’ll explore the different approaches we can use to handle error accumulation with Cats.

5.1. Error Accumulation Using EitherNec

An Option tells us that validation failed, but not why, and it can’t report more than one failure. We can accumulate errors with EitherNec from Cats, which is an Either with a NonEmptyChain as the error channel.

Cats provides Chain as an immutable sequence data structure that supports constant-time, O(1), prepending, appending, and concatenation. By contrast, appending to a List takes time proportional to the length of the list, which makes Chain a better fit for collecting errors:

import cats.data.EitherNec
import cats.syntax.all.*

object Version1:
  import Utilities.*

  case class Scholarship(country: Country, age: Age, cgpa: Cgpa)
  object Scholarship:
    def apply(
      value1: String,
      value2: Int,
      value3: Double
    ): EitherNec[String, Scholarship] =
      (
        Country(value1).toRightNec("Invalid Country"),
        Age(value2).toRightNec("Invalid Age"),
        Cgpa(value3).toRightNec("Invalid Cgpa")
      ).parMapN(
        Scholarship.apply
      )

Here we have a Scholarship case class with three fields: Country, Age, and Cgpa. The apply() method in the Scholarship companion object takes a String, an Int, and a Double and passes each one to the appropriate smart constructor.

Additionally, we convert the Option values to EitherNec[String, _] using the toRightNec() method, which takes the error message as a String. We then call parMapN() and pass the results to Scholarship.apply:

import cats.effect.{IOApp, IO, ExitCode}

object DataValidation extends IOApp.Simple:
  import Version1.*
  def run: IO[Unit] =
    Scholarship("Uganda", 23, 2.5) match
      case Right(x) => IO.println(x)
      case Left(y)  => IO.println(y.toChain)

// Chain(Invalid Age, Invalid Cgpa)

When we run the program, both errors are accumulated in a Chain.
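The output above is a Chain, so a brief look at Chain and NonEmptyChain on their own may help. The following is a minimal sketch: ChainDemo is a hypothetical name, and the first line is a Scala CLI directive that assumes the snippet is run standalone with Scala CLI and cats-core 2.10.0 (inside the sbt project from this article it isn’t needed).

```scala
//> using dep "org.typelevel::cats-core:2.10.0"

import cats.data.{Chain, NonEmptyChain}

// Hypothetical demo object, not part of the tutorial's code.
object ChainDemo:
  // A Chain with a single element.
  val ageOnly: Chain[String] = Chain.one("Invalid Age")

  // Appending, prepending, and concatenating all return new Chains.
  val appended: Chain[String] = ageOnly :+ "Invalid Cgpa"
  val prepended: Chain[String] = "Invalid Country" +: appended
  val concatenated: Chain[String] = Chain("Invalid Country") ++ appended

  // A NonEmptyChain always has at least one element, so head is safe.
  val errors: NonEmptyChain[String] = NonEmptyChain("Invalid Age", "Invalid Cgpa")
  val first: String = errors.head

  // toChain drops the non-emptiness guarantee, as in the println above.
  val asChain: Chain[String] = errors.toChain
```

Because the error channel of EitherNec is a NonEmptyChain, a Left can never hold an empty collection of errors.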

5.2. Error Accumulation Using ValidatedNec

The Validated data type is similar to Either: it has a Valid state that holds a successful value and an Invalid state that holds an error value. Similar to EitherNec, Cats also provides ValidatedNec for error accumulation.

Since the Either type is a Monad, its mapN() stops at the first Left, so in Version1 we handled error accumulation with parMapN(), the parallel version of mapN(). However, reaching for parallel computation just to accumulate errors is more machinery than we need. We’re better off using an Applicative Functor like ValidatedNec. Here’s why.

With an Applicative Functor, the encountered errors are combined with a Semigroup operation, which removes the need for parallel computation. Therefore, we can rewrite our example with ValidatedNec:

import cats.data.ValidatedNec

object Version2:
  import Utilities.*

  case class Scholarship(country: Country, age: Age, cgpa: Cgpa)
  object Scholarship:
    def apply(
        value1: String,
        value2: Int,
        value3: Double
    ): ValidatedNec[String, Scholarship] =
      (
        Country(value1).toValidNec("Invalid Country"),
        Age(value2).toValidNec("Invalid Age"),
        Cgpa(value3).toValidNec("Invalid Cgpa")
      ).mapN(
        Scholarship.apply
      )

Now we return a ValidatedNec[String, Scholarship] from our apply() function, using the toValidNec() method on Option to create each ValidatedNec:

import cats.data.Validated.{Invalid, Valid}

object DataValidation extends IOApp.Simple:
  import Version2.*
  def run: IO[Unit] =
    Scholarship("Uganda", 23, 2.5) match
      case Valid(x)   => IO.println(x)
      case Invalid(y) => IO.println(y.toChain)

// Chain(Invalid Age, Invalid Cgpa)

In the end, we manage to get rid of parMapN(), and now use the simpler mapN() function. In the next section, we’ll look at a better way to represent errors.
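To make the difference between the two sections concrete, the sketch below compares mapN() and parMapN() on EitherNec with mapN() on ValidatedNec, using two checks that have already failed. AccumulationDemo and its value names are hypothetical, and the first line is a Scala CLI directive that assumes the snippet is run standalone with cats-core 2.10.0.

```scala
//> using dep "org.typelevel::cats-core:2.10.0"

import cats.data.{EitherNec, NonEmptyChain, ValidatedNec}
import cats.syntax.all.*

// Hypothetical demo object, not part of the tutorial's code.
object AccumulationDemo:
  // Stand-ins for an age check and a CGPA check that both failed.
  val ageE: EitherNec[String, Int] = Option.empty[Int].toRightNec("Invalid Age")
  val cgpaE: EitherNec[String, Double] = Option.empty[Double].toRightNec("Invalid Cgpa")

  // Either's mapN stops at the first Left: only "Invalid Age" is kept.
  val sequential: EitherNec[String, (Int, Double)] = (ageE, cgpaE).mapN((_, _))

  // parMapN accumulates both errors.
  val parallel: EitherNec[String, (Int, Double)] = (ageE, cgpaE).parMapN((_, _))

  val ageV: ValidatedNec[String, Int] = Option.empty[Int].toValidNec("Invalid Age")
  val cgpaV: ValidatedNec[String, Double] = Option.empty[Double].toValidNec("Invalid Cgpa")

  // Validated's plain mapN already accumulates both errors.
  val applicative: ValidatedNec[String, (Int, Double)] = (ageV, cgpaV).mapN((_, _))

  // The Semigroup operation used to merge the two error channels.
  val combined: NonEmptyChain[String] =
    NonEmptyChain.one("Invalid Age") |+| NonEmptyChain.one("Invalid Cgpa")
```

The last value shows the Semigroup step in isolation: accumulating errors amounts to combining the two NonEmptyChain values.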

6. Representing Errors With ADTs

In Scala, it’s common practice to represent errors as ADTs (Algebraic Data Types). These have the following benefits:

Improved pattern matching on errors.
Better representation of illegal states.
Better readability.

We can create ADTs with the use of sealed traits and child classes or with the use of enums in Scala 3. Thus, for this example, we’ll be using a sealed trait:

object Utilities2:
  sealed trait ScholarshipValidationError:
    val errMsg: String
  object ScholarshipValidationError:
    case object CountryValidationError extends ScholarshipValidationError:
      override val errMsg: String = "Must come from Uganda, Kenya or Tanzania."
    case object AgeValidationError extends ScholarshipValidationError:
      override val errMsg: String = "Must be 25 years or more."
    case object CgpaValdiationError extends ScholarshipValidationError:
      override val errMsg: String = "CGPA must be 3.0 or more"

Here we create a sealed trait, ScholarshipValidationError, with one abstract val, errMsg, which is a String. Furthermore, within the companion object, we have three case objects, CountryValidationError, AgeValidationError, and CgpaValdiationError, that each extend ScholarshipValidationError.
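Since enums were mentioned as the Scala 3 alternative, here is a rough sketch of a similar ADT written as an enum. ScholarshipError, its case names, and EnumDemo are hypothetical names chosen to avoid clashing with the sealed trait; the rest of the tutorial continues to use the sealed trait.

```scala
// Hypothetical enum-based variant of the error ADT.
// The message is passed as a constructor parameter instead of an override.
enum ScholarshipError(val errMsg: String):
  case CountryError extends ScholarshipError("Must come from Uganda, Kenya or Tanzania.")
  case AgeError extends ScholarshipError("Must be 25 years or more.")
  case CgpaError extends ScholarshipError("CGPA must be 3.0 or more")

// Hypothetical demo object, not part of the tutorial's code.
object EnumDemo:
  import ScholarshipError.*

  // The compiler checks that this match covers every case of the enum.
  def field(error: ScholarshipError): String = error match
    case CountryError => "country"
    case AgeError     => "age"
    case CgpaError    => "cgpa"
```

Either encoding gives exhaustive pattern matching; the enum is shorter, while the sealed trait leaves room for cases that carry their own data.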

With this in place, each error carries a message that describes what went wrong. We can now use these errors in our smart constructors:

import cats.data.{Validated, ValidatedNec}

object Utilities2:
  ...
  import ScholarshipValidationError.*

  case class Country(value: String)
  object Country:
    private val countries = List("uganda", "kenya", "tanzania")
    def apply(
      value: String
    ): ValidatedNec[ScholarshipValidationError, Country] =
      Validated.condNec(
        countries.contains(value.toLowerCase),
        new Country(value.toLowerCase),
        CountryValidationError
      )

  case class Age(value: Int)
  object Age:
    def apply(value: Int): ValidatedNec[ScholarshipValidationError, Age] =
      Validated.condNec(
        value >= 25,
        new Age(value),
        AgeValidationError
      )

  case class Cgpa(value: Double)
  object Cgpa:
    def apply(value: Double): ValidatedNec[ScholarshipValidationError, Cgpa] =
      Validated.condNec(
        value >= 3.0,
        new Cgpa(value),
        CgpaValdiationError
      )

Each apply() function for Country, Age, and Cgpa now returns a ValidatedNec with ScholarshipValidationError as the error channel.

The Validated.condNec() method above takes a predicate, a success value, and a failure value, here a subtype of ScholarshipValidationError, and produces a ValidatedNec:

object Version3:
  import Utilities2.*
  case class Scholarship(country: Country, age: Age, cgpa: Cgpa)
  object Scholarship:
    def apply(
        value1: String,
        value2: Int,
        value3: Double
    ): ValidatedNec[ScholarshipValidationError, Scholarship] =
      (
        Country(value1),
        Age(value2),
        Cgpa(value3)
      ).mapN(
        Scholarship.apply
      )

Here we no longer need to call toValidNec() since our apply() functions already produce this type. Therefore, the result of apply() in the Scholarship object is now ValidatedNec[ScholarshipValidationError, Scholarship], with the error represented as a ScholarshipValidationError instead of a String:

object DataValidation extends IOApp.Simple:
  import Version3.*
  def run: IO[Unit] =
    Scholarship("Uganda", 23, 2.5) match
      case Valid(x)   => IO.println(x)
      case Invalid(y) => IO.println(y.toChain)

// Chain(AgeValidationError, CgpaValdiationError)

This is a better representation of our errors, and we can call errMsg on the value if we need to pass these to the scholarship applicant.
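As a sketch of that last step, the accumulated errors can be mapped to their messages before they are shown to the applicant. The snippet below is self-contained and uses a reduced copy of the error ADT with only two checks; ErrMsgDemo, validateCountry, validateAge, and messages are hypothetical names, and the first line is a Scala CLI directive that assumes the snippet is run standalone with cats-core 2.10.0.

```scala
//> using dep "org.typelevel::cats-core:2.10.0"

import cats.data.{Validated, ValidatedNec}
import cats.syntax.all.*

// Reduced copy of the tutorial's error ADT, limited to two cases.
sealed trait ScholarshipValidationError:
  def errMsg: String
case object CountryValidationError extends ScholarshipValidationError:
  val errMsg: String = "Must come from Uganda, Kenya or Tanzania."
case object AgeValidationError extends ScholarshipValidationError:
  val errMsg: String = "Must be 25 years or more."

// Hypothetical demo object, not part of the tutorial's code.
object ErrMsgDemo:
  private val countries = List("uganda", "kenya", "tanzania")

  def validateCountry(value: String): ValidatedNec[ScholarshipValidationError, String] =
    Validated.condNec(countries.contains(value.toLowerCase), value.toLowerCase, CountryValidationError)

  def validateAge(value: Int): ValidatedNec[ScholarshipValidationError, Int] =
    Validated.condNec(value >= 25, value, AgeValidationError)

  // Returns the applicant-facing messages; an empty list means no errors.
  def messages(country: String, age: Int): List[String] =
    (validateCountry(country), validateAge(age)).mapN((_, _)) match
      case Validated.Valid(_)        => Nil
      case Validated.Invalid(errors) => errors.toChain.toList.map(_.errMsg)
```

Keeping the errors as ADT values until the last moment means the wording of the messages can change without touching the validation logic.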

7. Conclusion

In this article, we’ve looked at different methods of validating input in Scala, starting with Option-based smart constructors and moving on to Either and Validated from Cats. In addition, we looked at error accumulation with EitherNec and ValidatedNec and saw the difference between a Monad and an Applicative Functor.

As always, the code for this article can be found over on GitHub.
