1. Overview

Calculating moving averages is a common task in data analysis and time-series forecasting. Scala 3 has various ways to tackle this problem, each with its trade-offs in terms of performance and complexity.

In this tutorial, we’ll explore one approach using Scala’s built-in collection methods. With it, we’ll achieve a time complexity of O(n)*O(k). Additionally, we’ll review an optimization using foldLeft and a Queue to reduce the cost of the operation over the window, arguably achieving an O(n) efficiency.

We’ll also discuss why the O(n)*O(k) approach might be more than adequate for most practical purposes when k is small.

2. Understanding Moving Averages

A moving average is a statistical calculation that we can use to understand trends in data. To this effect, we have to split the data into a collection of overlapping subsets and then take each subset’s average.

We’ll encounter them in domains like finance, statistics, and data science, and we will need them to smooth out data and identify trends.

2.1. Types of Moving Averages

Before diving into the Scala-specifics, it’s worth noting that there are different types of moving averages, each with its applications. Let’s look at the definitions of three of the most used ones:

  1. Simple Moving Average (SMA): this is the average of specific data points.
  2. Weighted Moving Average (WMA): this type gives more weight to recent data points.
  3. Geometric Moving Average (GMA): this type uses the geometric mean instead of the arithmetic mean, making it more sensitive to price movements.

For instance, WMA is often used in financial markets to give more importance to recent price changes, while GMA is commonly used in various scientific computations.

2.2. Two-Step Process

Calculating a moving average can be seen as a two-step process:

  1. Creating Windows: the first step is to create a collection of subsets (aka. windows) from the original data set. Each window contains a specific number of data points
  2. Applying an Operation: the second step is to use an operation (like summing, averaging, etc.) for each window

window creation diagram

The diagram above illustrates creating windows for a simple moving sum. Each node represents a data point, and the highlighted nodes represent a window. Lastly, the operation (in this case, summing) is applied to each window.

In the operation step, we can use any function that makes sense for our data. For instance, if we’re working with stock prices, calculate the average price over a certain period. Or we might want to find the maximum/minimum temperature in each window if we are working with temperature data.

Now that we understand moving averages and how they work, let’s explore how to implement them in Scala 3.

3. Using Sliding Windows

A naive implementation of this approach results in a time complexity of O(n)*O(k), where n is the size of the list and k is the window size. The simplicity and understability make this the choice ideal for small window sizes.

In Scala, we get the naive implementation for free with any collection implementing the Seq trait like List, Vector, Array, and String. We can use the sliding method to convert any Seq of elements into one of the subsequences of a given size.

Let’s explore how to calculate Simple Moving Averages (SMA) and Weighted Moving Averages (WMA).

3.1. Simple Moving Average (SMA)

First, let’s define a case class to represent the stock price on a given day:

case class StockPrice(date: String, price: Double)

Now, let’s calculate the three-day moving average using the sliding method:

val stockPrices = List(
  StockPrice("2021-01-01", 1.2),
  StockPrice("2021-01-02", 1.3),
  StockPrice("2021-01-03", 1.4),
  StockPrice("2021-01-04", 1.5),
  StockPrice("2021-01-05", 1.6)
)

val windowSize = 3

val sma = stockPrices.sliding(windowSize).map { window =>
  val average = window.map(_.price).sum / window.size
  val lastDate = window.last.date
  StockPrice(lastDate, average)
}.toList

Here, we use the sliding method to create a collection of windows, each containing three stock prices. Finally, we calculate the average price for each window and associate it with the last date in the window.

3.2. Weighted Moving Average (WMA)

We’ll assign weights to the prices in each window for the weighted moving average and then calculate the average:

val weights = List(0.1, 0.2, 0.7)

val wma = stockPrices.sliding(windowSize).map { window =>
  val weightedSum = (window.map(_.price) zip weights).map { case (price, weight) =>
    price * weight
  }.sum
  val lastDate = window.last.date
  StockPrice(lastDate, weightedSum)
}.toList

We can choose any weights that make sense for your specific use case. In our example, we avoid needing an extra division after the sum by selecting weights that add up to one.

3.3. Time Complexity

Both operations have a cost of O(n)*O(k), but when k is a small number, we can rationally simplify the cost of the operation over each window to be O(1)

Then again, if k is not a tiny single-digit integer and we are working with more considerable amounts of data, we will likely need to dig deeper into the Scala toolbox to avoid paying the recurring costs of calculation over each window.

4. Optimizing the Moving Average With foldLeft and a Queue

When dealing with large data sets and non-trivial window sizes, the sliding window approach’s O(n)*O(k) time complexity can become a bottleneck.

val windowSize = 60

val optimizedSMA = stockPrices
  .foldLeft((Queue.empty[Double], List.empty[StockPrice], 0.0)) {
    case ((queue, averages, sum), StockPrice(date, price)) =>
      val newQueue = queue.enqueue(price)
      val newSum = sum + price

      if (newQueue.size > windowSize) {
        val (oldest, remainingQueue) = newQueue.dequeue
        val newAverage = (newSum - oldest) / windowSize
        (
          remainingQueue,
          StockPrice(date, newAverage) :: averages,
          newSum - oldest
        )
      } else if (newQueue.size == windowSize) {
        val newAverage = newSum / windowSize
        (newQueue, StockPrice(date, newAverage) :: averages, newSum)
      } else {
        (newQueue, averages, newSum)
      }
  }
  ._2
  .reverse

println("Optimized 30-day Simple Moving Averages:")
optimizedSMA.foreach { case StockPrice(date, average) =>
  println(s"$date: $average")
}

To mitigate this, we can use foldLeft and a Queue to maintain a running sum and achieve a more efficient O(n) time complexity:

In this example, we use foldLeft to iterate through the list of stock prices. We maintain a Queue to keep track of the elements in the current window and a running sum of those elements. We update the Queue and the sum as we iterate through the list.

Finally, we calculate the new average and add a collection. Obviously, we can calculate the value only from the point the window is complete for the first time.

This approach has a time complexity of O(n), making it more efficient for larger data sets and window sizes.

4.1. Time Complexity Trade-Offs

We made a classic trade-off in the code above: We used more space to save time. Consequently, we managed to speed things up to O(n) by using more memory for a Queue and the running sum.

But here’s the catch: this trick will only work for some types of window operations. For example, if we’re using a weighted moving average, the math gets more complicated, and our speedy solution doesn’t apply.

5. Conclusion

In this article, we’ve explored the fascinating world of moving averages in Scala 3, from understanding what they are to implementing them differently.

We started with the straightforward sliding window approach, which is excellent for small data sets and window sizes. Then, we delved into an optimized version using foldLeft and a Queue, which is more suitable for larger data sets.

We also discussed the time complexity trade-offs, emphasizing that while we can optimize for time, it often comes at the cost of using more space. And let’s not forget that not all types of moving averages can be optimized in the same way.

As always, the complete source code for the examples is available over on GitHub.

2 Comments
Oldest
Newest
Inline Feedbacks
View all comments
Comments are open for 30 days after publishing a post. For any issues past this date, use the Contact form on the site.