1. Introduction

In this tutorial, we’ll demonstrate different ways to read and write CSV files in Scala. Of course, our demonstration wouldn’t be complete if we didn’t explore some of the most popular libraries that provide CSV read and write capabilities.

Hence, we include sections on the Scala CSV, OpenCSV, and Apache Commons CSV libraries.

2. Building Blocks

To make our examples comparable and easier to work with, let’s define two Scala traits that each of our CSV implementations will extend, namely CommaSeparatedValuesWriter and CommaSeparatedValuesReader.

Specifically, the CommaSeparatedValuesReader defines a read method that returns a convenient digest of the CSV file:

import java.io.File

trait CommaSeparatedValuesReader {
  def read(file: File): CSVReadDigest
}

The CSVReadDigest, which is the return type of the read function, contains headers and rows as properties:

case class CSVReadDigest(headers: Seq[String], rows: Seq[Seq[String]])

Similarly, the CommaSeparatedValuesWriter defines a method that writes headers and rows to a given file and returns a Try[Unit] to report success or failure:

import java.io.File
import scala.util.Try

trait CommaSeparatedValuesWriter {
  def write(
    file: File,
    headers: Seq[String],
    rows: Seq[Seq[String]]
  ): Try[Unit]
}
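Because every implementation shares these two traits, we can compare them generically. Here’s a minimal sketch of a round-trip check, assuming a temporary file is acceptable for the test; CsvRoundTrip, roundTrip, and isLossless are hypothetical names, not part of any library:

```scala
import java.io.File
import scala.util.Try

// Repeated from the building blocks so the sketch compiles on its own
case class CSVReadDigest(headers: Seq[String], rows: Seq[Seq[String]])

trait CommaSeparatedValuesReader {
  def read(file: File): CSVReadDigest
}

trait CommaSeparatedValuesWriter {
  def write(file: File, headers: Seq[String], rows: Seq[Seq[String]]): Try[Unit]
}

// Hypothetical helper for comparing implementations through the shared traits
object CsvRoundTrip {
  // Writes with one implementation, reads back with another, then deletes the temp file
  def roundTrip(
    writer: CommaSeparatedValuesWriter,
    reader: CommaSeparatedValuesReader,
    headers: Seq[String],
    rows: Seq[Seq[String]]
  ): Try[CSVReadDigest] =
    Try(File.createTempFile("csv-roundtrip", ".csv")).flatMap { file =>
      try writer.write(file, headers, rows).flatMap(_ => Try(reader.read(file)))
      finally file.delete()
    }

  // True when the data comes back exactly as it was written
  def isLossless(
    writer: CommaSeparatedValuesWriter,
    reader: CommaSeparatedValuesReader,
    headers: Seq[String],
    rows: Seq[Seq[String]]
  ): Boolean =
    roundTrip(writer, reader, headers, rows).toOption
      .contains(CSVReadDigest(headers, rows))
}
```

For instance, passing a SimpleCSVWriter and a SimpleCSVReader together with a single value such as Smith, John should make isLossless return false, because the naive reader splits that value at its comma.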

3. PrintWriter and BufferedReader

Before we import any external libraries, it’s useful to showcase how we can write and read a CSV file using readily available Java tools, the PrintWriter and BufferedReader.

3.1. Write

The SimpleCSVWriter iterates through the input data and prints each line, with its values joined by commas, to the underlying PrintWriter:

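import java.io.{File, IOException, PrintWriter}
import scala.util.Try

class SimpleCSVWriter extends CommaSeparatedValuesWriter {
  override def write(
    file: File,
    headers: Seq[String],
    rows: Seq[Seq[String]]
  ): Try[Unit] = Try {
    val writer = new PrintWriter(file)
    try {
      // Naive join: commas, quotes or line breaks inside a value are not escaped
      writer.println(headers.mkString(","))
      rows.foreach(row => writer.println(row.mkString(",")))
      // PrintWriter swallows I/O errors, so we check for them explicitly
      if (writer.checkError()) throw new IOException(s"Failed to write $file")
    } finally writer.close()
  }
}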
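Resource handling can also be delegated to scala.util.Using. Here’s a minimal sketch, assuming Scala 2.13 or later; UsingCSVWriter is a hypothetical name. It swaps PrintWriter for a BufferedWriter obtained from java.nio.file.Files, whose write methods throw an IOException on failure instead of silently setting an error flag, and it fixes the encoding to UTF-8:

```scala
import java.io.File
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import scala.util.{Try, Using}

// Hypothetical alternative to SimpleCSVWriter (Scala 2.13+ for Using)
object UsingCSVWriter {
  def write(file: File, headers: Seq[String], rows: Seq[Seq[String]]): Try[Unit] =
    // Using closes the writer even if a write fails and wraps any exception in a Failure
    Using(Files.newBufferedWriter(file.toPath, StandardCharsets.UTF_8)) { writer =>
      (headers +: rows).foreach { row =>
        // Still a naive join: values containing commas are not quoted
        writer.write(row.mkString(","))
        // newLine() writes the platform's line separator
        writer.newLine()
      }
    }
}
```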

3.2. Read

The SimpleCSVReader reads a file’s contents through a BufferedReader. The readLinesRecursively method exhausts the file by calling the BufferedReader’s readLine method once per invocation and recursing until readLine returns null, which signals the end of the file. Each line is then split on commas:

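import java.io.{BufferedReader, File, FileInputStream, InputStreamReader}
import scala.annotation.tailrec

class SimpleCSVReader extends CommaSeparatedValuesReader {
  override def read(file: File): CSVReadDigest = {
    val in = new InputStreamReader(new FileInputStream(file))
    val bufferedReader = new BufferedReader(in)

    // Reads one line per call until readLine() returns null (end of file)
    @tailrec
    def readLinesRecursively(
      currentBufferedReader: BufferedReader,
      result: Seq[Seq[String]]
    ): Seq[Seq[String]] = {
      currentBufferedReader.readLine() match {
        case null => result
        case line =>
          readLinesRecursively(
            currentBufferedReader,
            // The -1 limit keeps trailing empty fields such as in "a,b,"
            result :+ line.split(",", -1).toSeq
          )
      }
    }

    // A Vector keeps appends cheap; the reader is closed even if reading fails
    val csvLines =
      try readLinesRecursively(bufferedReader, Vector.empty)
      finally bufferedReader.close()

    // The first line holds the headers; an empty file yields an empty digest
    CSVReadDigest(
      csvLines.headOption.getOrElse(Seq.empty),
      csvLines.drop(1)
    )
  }
}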
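For comparison, here’s a minimal non-recursive sketch that reads the file through scala.io.Source and scala.util.Using, assuming Scala 2.13 or later. SourceCSVReader is a hypothetical name and, unlike the trait’s read method, it returns a Try. The split limit of -1 keeps trailing empty fields, which a plain split(",") silently drops:

```scala
import java.io.File
import scala.io.Source
import scala.util.{Try, Using}

// Same shape as the article's digest, repeated so the sketch compiles on its own
case class CSVReadDigest(headers: Seq[String], rows: Seq[Seq[String]])

// Hypothetical iterator-based alternative to SimpleCSVReader (Scala 2.13+ for Using)
object SourceCSVReader {
  def read(file: File): Try[CSVReadDigest] =
    Using(Source.fromFile(file, "UTF-8")) { source =>
      // Materialize all lines before Using closes the source
      val lines = source.getLines().map(_.split(",", -1).toSeq).toVector
      CSVReadDigest(lines.headOption.getOrElse(Seq.empty), lines.drop(1))
    }
}
```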

4. Scala CSV

Scala CSV is a Scala library whose methods accept and return Scala data structures, which makes CSV handling less cumbersome, since no Java-to-Scala conversions are needed.

4.1. Write

The class that provides the library’s CSV write capabilities is CSVWriter. Our ScalaCSVWriter wraps it, and since CSVWriter exposes methods that accept Seq[Any] arguments, writing lines to a CSV file becomes trivial:

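import com.github.tototoshi.csv.CSVWriter
import java.io.File
import scala.util.Try

class ScalaCSVWriter extends CommaSeparatedValuesWriter {
  override def write(
    file: File,
    headers: Seq[String],
    rows: Seq[Seq[String]]
  ): Try[Unit] = Try {
    // Relies on the implicit default CSVFormat provided by the library
    val writer = CSVWriter.open(file)
    try {
      writer.writeRow(headers)
      writer.writeAll(rows)
    } finally writer.close()
  }
}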

4.2. Read

Likewise, reading a file with Scala CSV’s CSVReader is straightforward. The reader’s all method returns the whole file as a List[List[String]], hence our ScalaCSVReader wrapper is quite short:

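import com.github.tototoshi.csv.CSVReader
import java.io.File

class ScalaCSVReader extends CommaSeparatedValuesReader {
  override def read(file: File): CSVReadDigest = {
    val reader = CSVReader.open(file)
    // all() loads every line into memory; the reader is closed even if it fails
    val all = try reader.all() finally reader.close()
    CSVReadDigest(all.headOption.getOrElse(Seq.empty), all.drop(1))
  }
}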

5. OpenCSV

OpenCSV is a popular and widely used Java library for reading and writing CSV files, so let’s create our own Scala wrappers to showcase it.

5.1. Write

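To use the writeAll method of OpenCSV’s CSVWriter, we must first transform our input into a Java Iterable of String arrays. The second argument, false, tells the writer to quote only the values that need it rather than every value. Let’s write the OpenCSVWriter: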

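import com.opencsv.CSVWriter
import java.io.{BufferedWriter, File, FileWriter}
import scala.jdk.CollectionConverters._
import scala.util.Try

class OpenCSVWriter extends CommaSeparatedValuesWriter {
  override def write(
    file: File,
    headers: Seq[String],
    rows: Seq[Seq[String]]
  ): Try[Unit] = Try(
    new CSVWriter(new BufferedWriter(new FileWriter(file)))
  ).flatMap((csvWriter: CSVWriter) =>
    Try {
      // Close the writer even if writeAll fails
      try {
        // Headers and rows become a java.util.List[Array[String]];
        // false = quote only the values that need it
        csvWriter.writeAll(
          (headers +: rows).map(_.toArray).asJava,
          false
        )
      } finally csvWriter.close()
    }
  )
}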

Furthermore, OpenCSV’s CSVWriter class includes writeAll variants that accept a java.sql.ResultSet, which makes it particularly useful when dealing with data fetched from databases.
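The false argument passed to writeAll is OpenCSV’s applyQuotesToAll flag: per its documentation, quotes are then applied only to values that contain the separator, quote, escape, or line-break characters. Here’s a rough, stdlib-only sketch of that rule, assuming a double-quote character that is escaped by doubling it, as RFC 4180 describes; MinimalQuoting is a hypothetical name and doesn’t reproduce OpenCSV’s exact behavior:

```scala
// Hypothetical helper approximating "quote only when needed"
object MinimalQuoting {
  private val QuoteChar = '"'

  // A value needs quotes if it contains the separator, a quote, or a line break
  def needsQuotes(value: String, separator: Char): Boolean =
    value.exists(c => c == separator || c == QuoteChar || c == '\n' || c == '\r')

  // Wraps the value in quotes and doubles any embedded quotes, only when required
  def quote(value: String, separator: Char = ','): String =
    if (needsQuotes(value, separator))
      QuoteChar.toString + value.replace("\"", "\"\"") + QuoteChar
    else value

  // Joins a row, quoting each value only if it needs it
  def formatRow(row: Seq[String], separator: Char = ','): String =
    row.map(quote(_, separator)).mkString(separator.toString)
}
```

With this rule, formatRow(Seq("1", "a,b", "c")) produces 1,"a,b",c, so a reader that understands quotes can recover the original three values.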

5.2. Read

Similar to the SimpleCSVReader, the OpenCSVReader uses recursion to read a file’s contents. The recursive readLinesRecursively method collects the CSV rows by calling CSVReader’s readNext method, which returns the next record as an array of values, or null once the end of the file is reached:

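import com.opencsv.CSVReader
import java.io.{File, FileInputStream, InputStreamReader}
import scala.annotation.tailrec

class OpenCSVReader extends CommaSeparatedValuesReader {
  override def read(file: File): CSVReadDigest = {
    val reader = new CSVReader(
      new InputStreamReader(new FileInputStream(file))
    )

    // readNext() parses one record, honoring quotes, and returns null at end of file
    @tailrec
    def readLinesRecursively(
      currentReader: CSVReader,
      result: Seq[Seq[String]]
    ): Seq[Seq[String]] = {
      currentReader.readNext() match {
        case null => result
        case line => readLinesRecursively(currentReader, result :+ line.toSeq)
      }
    }

    // A Vector keeps appends cheap; the reader is closed even if parsing fails
    val csvLines =
      try readLinesRecursively(reader, Vector.empty)
      finally reader.close()

    CSVReadDigest(
      csvLines.headOption.getOrElse(Seq.empty),
      csvLines.drop(1)
    )
  }
}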
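Unlike a plain split(","), readNext understands quoted values. To illustrate what that involves, here’s a simplified, stdlib-only sketch of a quote-aware parser for a single line, assuming RFC 4180-style quoting (doubled quotes inside quoted values) and no line breaks inside values. QuoteAwareParser is a hypothetical name; OpenCSV’s own parser handles more cases, such as multi-line values and configurable escape characters:

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical, simplified parser: one physical line, double quotes, doubled-quote escaping
object QuoteAwareParser {
  def parseLine(line: String, separator: Char = ','): Seq[String] = {
    val fields = ListBuffer.empty[String]
    val current = new StringBuilder
    var inQuotes = false
    var i = 0
    while (i < line.length) {
      val c = line.charAt(i)
      if (inQuotes) {
        if (c == '"') {
          // A doubled quote inside a quoted value stands for one literal quote
          if (i + 1 < line.length && line.charAt(i + 1) == '"') {
            current.append('"')
            i += 1
          } else inQuotes = false // closing quote
        } else current.append(c)
      } else if (c == '"') {
        inQuotes = true // opening quote
      } else if (c == separator) {
        // An unquoted separator ends the current field
        fields += current.toString
        current.clear()
      } else current.append(c)
      i += 1
    }
    fields += current.toString
    fields.toList
  }
}
```

For example, parseLine("1,\"Smith, John\",42") yields Seq("1", "Smith, John", "42"), whereas splitting the same line on every comma yields four fields.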

6. Apache Commons CSV

The Apache Commons CSV library enables us to read and write CSV files in various formats. It’s been evolving since 2014 and is widely used, mainly by Java projects.

6.1. Write

In contrast with the other implementations in our examples, Apache Commons CSV’s CSVPrinter receives its format as a second constructor argument, a CSVFormat. The CSVFormat builder lets us start from a predefined format and override any property that suits us. In our ApacheCommonsCSVWriter, we set the header property of the DEFAULT format, so the printer writes the headers as the first record:

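import java.io.{File, FileWriter}
import org.apache.commons.csv.{CSVFormat, CSVPrinter}
import scala.util.Try

class ApacheCommonsCSVWriter extends CommaSeparatedValuesWriter {
  override def write(
    file: File,
    headers: Seq[String],
    rows: Seq[Seq[String]]
  ): Try[Unit] = Try {
    // The header configured on the format is printed as the first record
    val csvFormat = CSVFormat.DEFAULT
      .builder()
      .setHeader(headers: _*)
      .build()

    val printer = new CSVPrinter(new FileWriter(file), csvFormat)
    // Each row becomes one record; the printer is closed even if printing fails
    try rows.foreach(row => printer.printRecord(row: _*))
    finally printer.close()
  }
}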

6.2. Read

The CSVFormat is also used for parsing, albeit differently. In our implementation, the setHeader call without arguments is the Apache Commons CSV way of telling the parser to read the headers from the first line of the file. Calling the format’s parse method then yields a CSVParser, which provides a variety of methods for reading the records of the input file. So, let’s write our example, keeping in mind that the returned Java objects need Java-to-Scala conversions:

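import java.io.{File, FileInputStream, InputStreamReader}
import org.apache.commons.csv.CSVFormat
import scala.jdk.CollectionConverters._

class ApacheCommonsCSVReader extends CommaSeparatedValuesReader {
  override def read(file: File): CSVReadDigest = {
    val in = new InputStreamReader(new FileInputStream(file))
    // setHeader() without arguments reads the header names from the first record
    val csvParser = CSVFormat.DEFAULT
      .builder()
      .setHeader()
      .build()
      .parse(in)
    try {
      CSVReadDigest(
        csvParser.getHeaderNames.asScala.toSeq,
        csvParser.getRecords.asScala.map(r => r.values().toSeq).toSeq
      )
    } finally csvParser.close()
  }
}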
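Since the parser knows the header names, Apache Commons CSV also lets us look a value up by column name through CSVRecord.get(String). With our library-independent CSVReadDigest, a similar lookup can be sketched by zipping each row with the headers. NamedRecords and toNamedRecords are hypothetical names, and the sketch assumes every row has as many values as there are headers:

```scala
// Repeated from the building blocks so the sketch compiles on its own
case class CSVReadDigest(headers: Seq[String], rows: Seq[Seq[String]])

object NamedRecords {
  // Hypothetical helper: one Map per row, keyed by header name.
  // zip silently drops extra values (or unmatched headers) when lengths differ.
  def toNamedRecords(digest: CSVReadDigest): Seq[Map[String, String]] =
    digest.rows.map(row => digest.headers.zip(row).toMap)
}
```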

7. CSV Delimiters

Before we conclude, it’s worth mentioning one of the most frequent issues with CSV datasets: commas inside the values themselves. When a value such as Smith, John is split on every comma, it turns into two fields, which leads to parse errors and wrong results.

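One compromise is to choose a delimiter that is very unlikely to appear in our data, such as a tab, a pipe, or another special character or sequence of characters. Alternatively, we can enclose values in double quotes, as RFC 4180 describes, but then any quotes inside a value must be escaped as well, typically by doubling them. The library-based writers in this tutorial can handle such quoting for us, while our naive SimpleCSVWriter and SimpleCSVReader don’t.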

In short, we should know the contents of our files well before choosing a delimiter for a dataset.
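Knowing the data can also be turned into a quick check. Here’s a minimal sketch, assuming the headers and rows are already in memory; DelimiterChoice, chooseDelimiter, and the candidate list are hypothetical. It returns the first candidate delimiter that occurs in no header or value, or None when every candidate clashes with the data:

```scala
// Hypothetical helper: pick a delimiter that never occurs in the data
object DelimiterChoice {
  val DefaultCandidates: Seq[Char] = Seq(',', '\t', '|', ';')

  def chooseDelimiter(
    headers: Seq[String],
    rows: Seq[Seq[String]],
    candidates: Seq[Char] = DefaultCandidates
  ): Option[Char] = {
    val values = headers ++ rows.flatten
    // A candidate is safe only if no header or value contains it
    candidates.find(candidate => !values.exists(_.contains(candidate)))
  }
}
```

A None result suggests that no simple delimiter is safe and that quoting the values is the more robust option.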

8. Conclusion

In this short tutorial, we demonstrated how to read and write CSV files with Scala.

Additionally, we implemented the same pair of traits with the Scala CSV, OpenCSV, and Apache Commons CSV libraries. Finally, we highlighted the common delimiter issue that developers usually have to address.

The code backing this article is available on GitHub.