1. Introduction

Scala has a very powerful and extensive Collections API in the standard library. These APIs make it easy for users to apply methods to single collections or seamlessly combine and perform operations on multiple collections.

In this tutorial, we’ll explore two methods for combining multiple collections: zip() and lazyZip().

2. The zip() Method

We can use the zip() method to combine two collections into a single collection of tuples. It pairs the elements at the same position in each collection and creates a two-element tuple for each pair. Let’s look at an example:

val list1 = List(1, 2, 3)
val list2 = List("a", "b", "c")
val zipped = list1.zip(list2)
zipped shouldBe List((1, "a"), (2, "b"), (3, "c"))

We can observe that applying the zip() operation to two lists generated a new list of tuples.

The zip() method is implemented in the IterableOps trait of the Scala collections library. As a result, it is available on any collection that implements the trait. However, the pairing is only predictable for collections with a defined iteration order; on unordered collections such as Set and Map, we shouldn’t rely on which elements end up together.

When the zip() operation is used with collections of different sizes, the result has as many elements as the shorter of the two input collections. The remaining elements of the longer collection are dropped.
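As a minimal sketch of this truncation, the example below zips a three-element list with a two-element list. The object and value names are hypothetical and chosen only for illustration. It also shows the standard zipAll() method, which pads the shorter side with a default value instead of dropping elements:

```scala
object ZipSizesExample {
  val numbers = List(1, 2, 3)
  val letters = List("a", "b")

  // zip stops at the end of the shorter collection, so the 3 is dropped
  val truncated: List[(Int, String)] = numbers.zip(letters)

  // zipAll pads the shorter side: 0 would fill in for missing numbers,
  // "?" fills in for missing letters
  val padded: List[(Int, String)] = numbers.zipAll(letters, 0, "?")
}
```

Here, truncated contains two tuples, while padded contains three, with "?" paired with the last number.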

We can also chain zip operations on multiple collections, creating a nested tuple structure:

val list = List(1, 2, 3)
val res: List[((Int, Int), Int)] = list.zip(list).zip(list)

Each successive zip() call pairs the tuples produced so far with the elements of the next collection, which is why the result is nested.
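If a flat tuple is easier to work with, one possible approach is to unpack the nested structure with a pattern match. This is only a sketch, and the names used here are hypothetical:

```scala
object FlattenNestedZipExample {
  val list = List(1, 2, 3)

  // Chained zip calls nest the first pair inside the outer tuple
  val nested: List[((Int, Int), Int)] = list.zip(list).zip(list)

  // Destructure ((a, b), c) and rebuild it as a flat three-element tuple
  val flat: List[(Int, Int, Int)] = nested.map { case ((a, b), c) => (a, b, c) }
}
```

The extra map() step traverses the collection once more, so it trades a little work for a more convenient shape.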

3. The lazyZip() Method

Scala provides an alternative form of the zip() method, lazyZip(). Unlike the zip() method, which performs eager evaluation, lazyZip() adopts lazy evaluation, postponing computation until the elements are accessed.

Let’s look at an example:

val list1 = List(1, 2, 3)
val list2 = List("a", "b", "c")
val zipped = list1.lazyZip(list2)
zipped.toList shouldBe List((1, "a"), (2, "b"), (3, "c"))

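The variable zipped doesn’t hold a collection of tuples yet; it holds a lazy LazyZip2 value that only remembers the two source collections. When we call toList, the pairs are actually built and collected into a List.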
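Besides converting the lazy result into a collection, we can also transform it directly. The sketch below, with hypothetical names, uses the map() and exists() methods of the lazy result, which receive the paired elements as separate arguments rather than as a tuple:

```scala
object LazyZipMapExample {
  val list1 = List(1, 2, 3)
  val list2 = List("a", "b", "c")

  // map takes a two-argument function, so no intermediate list of tuples
  // is built before the final List[String] is produced
  val repeated = list1.lazyZip(list2).map((n, s) => s * n)

  // exists checks the pairs one by one without building a collection
  val anyLengthMatch = list1.lazyZip(list2).exists((n, s) => s.length == n)
}
```

Here, repeated is List("a", "bb", "ccc"), and anyLengthMatch is true because the first pair already satisfies the predicate.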

This lazy evaluation is beneficial when working with large or potentially infinite collections. Let’s consider a scenario where we use lazyZip() on infinite collections:

val infiniteNumbers: LazyList[Int] = LazyList.from(1)
val infiniteStrings: LazyList[String] = LazyList.iterate("a")(_ + "a")
val result = infiniteNumbers.lazyZip(infiniteStrings)
result.take(3).toList shouldBe List((1, "a"), (2, "aa"), (3, "aaa"))

Here, we have two infinite collections built with LazyList. Even though we apply lazyZip() to infinite collections, nothing is evaluated immediately. Zipping occurs only when we request elements, in this case the first three through take(3).toList.

Unlike the zip() method, lazyZip() automatically flattens chained calls for up to four collections; beyond that, it creates a nested structure:

val list = List(1, 2, 3)
val level4Res = list.lazyZip(list).lazyZip(list).lazyZip(list).toList
level4Res shouldBe List((1, 1, 1, 1), (2, 2, 2, 2), (3, 3, 3, 3))
val level5Res = list.lazyZip(list).lazyZip(list).lazyZip(list).lazyZip(list).toList
level5Res shouldBe List(
  ((1, 1, 1, 1), 1),
  ((2, 2, 2, 2), 2),
  ((3, 3, 3, 3), 3)
)

We can observe that zipping a fifth collection produces a nested structure: the flat four-element tuple becomes the first element of a new pair.
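The flattening also applies to the functions we pass to the lazy result. As an illustrative sketch with hypothetical names and data, zipping three lists lets map() accept a plain three-argument function:

```scala
object LazyZip3Example {
  val prices = List(10, 20, 30)
  val quantities = List(1, 2, 3)
  val discounts = List(0, 5, 10)

  // Chaining two lazyZip calls yields a three-way lazy zip whose map
  // takes three separate arguments instead of a nested tuple
  val totals = prices.lazyZip(quantities).lazyZip(discounts).map {
    (price, qty, discount) => price * qty - discount
  }
}
```

With chained zip() calls, the same function would have to destructure a nested ((price, qty), discount) tuple instead.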

The lazyZip() method was introduced in Scala 2.13 to replace the zipped method available on tuples of collections in Scala 2.12 and earlier versions, which is deprecated since 2.13.

4. Simple Performance Comparison

In this section, we’ll explore a basic method to compare the time taken by zip() and lazyZip() operations:

def timed[T](f: => T): T = {
  val startTime = System.nanoTime()
  val res = f
  val endTime = System.nanoTime()
  println(s"Time taken for operation: ${(endTime - startTime) / 1000000} milliseconds")
  res // return the computed value so that the declared result type T is satisfied
}

def main(): Unit = {
  val largeList = (1 to 10000000).toList
  println("--- zip ---")
  timed(largeList.zip(largeList).take(100)) // eager evaluation
  println("--- lazyZip without eval ---")
  timed(largeList.lazyZip(largeList)) // lazy evaluation of lazyZip
  println("--- lazyZip with partial eval ---")
  timed(largeList.lazyZip(largeList).take(100).toList) // force partial evaluation of lazyZip
  println("--- lazyZip with full eval ---")
  timed(largeList.lazyZip(largeList).toList) // force full evaluation of lazyZip
}

Here, we perform different operations using zip() and lazyZip() methods and calculate the time taken for each:

[Figure: zip vs lazyZip operation timings]

We can observe that the partial evaluation took very little time despite the collection’s size.
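A single timed run on the JVM can be skewed by JIT compilation and garbage collection. As a rough sketch, and not a replacement for a proper benchmarking tool, the hypothetical timedMillis helper below runs the operation a few times before measuring and returns the elapsed time together with the result so that we can compare values programmatically. The helper name, the warm-up count, and the list size are all assumptions made for illustration:

```scala
object TimingSketch {
  // Hypothetical helper: evaluates `f` `warmups` times to warm up the JVM,
  // then measures one more evaluation and returns (result, elapsed millis)
  def timedMillis[T](warmups: Int)(f: => T): (T, Long) = {
    (1 to warmups).foreach(_ => f)
    val start = System.nanoTime()
    val res = f
    val elapsedMillis = (System.nanoTime() - start) / 1000000
    (res, elapsedMillis)
  }

  val data = (1 to 100000).toList

  // Eager: zips the whole list, then keeps the first 100 pairs
  val (eagerHead, eagerMs) = timedMillis(3)(data.zip(data).take(100))

  // Lazy: only the first 100 pairs are ever built
  val (lazyHead, lazyMs) = timedMillis(3)(data.lazyZip(data).take(100).toList)
}
```

Both variants produce the same 100 pairs; only the amount of work done to obtain them differs. The measured values vary between machines and runs, so they are indicative at best.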

5. Conclusion

In this article, we looked at zip() and lazyZip() methods on Scala Collections.

While zip() is a simple choice for smaller datasets, where building the full result eagerly costs little, lazyZip() is beneficial for very large collections because it defers computation and avoids building results we never use. Additionally, we observed that lazyZip() automatically flattens the result when chaining up to four collections. We should choose between zip() and lazyZip() based on the size of the collections and on how much of the zipped result we actually need.

As always, the sample code used in this tutorial is available over on GitHub.
