## 1. Introduction

In this tutorial, we’ll explore methods to create **a random sample of a fixed size** **from Scala List**. For example, if we need to select some five arbitrary elements from the list:

*List[Int] = List(0,1,2,3,4,5,6,7,8,9)*, the result might resemble

*List(8, 0, 3, 7, 5)*.

## 2. Certain Considerations Regarding Functional Approaches

In Scala, we often adhere to the functional style when working with collections, employing combinators like *map* and *flatMap*. Essentially, we traverse the collection and update each value. However, **when it comes to accessing some random elements, traversing the list for updates may perform pretty weakly**.

## 3. Solution Based on Tail Recursion

For the purpose of illustration, we can address the problem using tail recursion:

```
def getRandomSampleRec[T](list: List[T], size: Int): List[T] = {
@tailrec
def rec(xs: List[T], acc: List[T]): List[T] = {
if (acc.size == size) {
acc
} else {
val index = Random.nextInt(xs.size)
val (left, right) = xs.splitAt(index)
val (xsUpd, next) = if (right.nonEmpty) {
(left ::: right.tail, right.head)
} else {
(left.dropRight(1), left.tail.last)
}
rec(xsUpd, next :: acc)
}
}
if (size == 0) {
List.empty[T]
} else if (size > list.size) {
list
} else {
rec(list, List.empty[T])
}
}
```

Let’s review the key steps of the solution:

- We utilize
*splitAt(index)*on the list*xs*to effectively create a new list (*xsUpd*) without the random element*next*. - During each iteration, we prepend the element
*next*to the accumulator*acc.*We terminate once the*acc*reaches the desired*size*.

It’s important to note that here, we impose a significant burden on performance. This is due to the fact that *splitAt* and list concatenation (*:::*) both have *O(N)* time complexity and, since we repeat these operations *size* times, **the overall time complexity becomes** ** O(size*N)**. Utilizing this method would be appropriate only for small samples.

## 4. Solution Based on Zipping Collection With Random Indexes

Among other purely functional solutions, we might consider the following approach:

```
def getRandomSampleZip[T](list: List[T], size: Int): List[T] =
list
.map(elem => (Random.nextInt(), elem))
.sortBy(_._1)
.map(_._2)
.take(size)
```

Let’s break down the process:

**We associate each element of the original**. As a result, we obtain a*List[T]*with a random*Int*index using*map**List[(Int, T)]*.- Next, we sort the
*List[(Int, T)]*by these random indexes. - We then convert the sorted collection back into a
*List[T]*by omitting random indexes using*map(_._2)*, and we limit the result with*take(size)*.

By following this approach, **the time complexity is** ** O(N*log(N))**, which corresponds to the complexity of the sorting operation.

## 5. Utilizing *Random.shuffle* from the Standard Library

We can leverage a standard Scala library method — ** shuffle from scala.util.Random** —

**to mix elements of the**. Then, we take only the desired number of elements:

*List*```
def getRandomSampleShuffle[T](list: List[T], size: Int): List[T] =
Random.shuffle(list).take(size)
```

*Random.shuffle* uses the *mutable.ArrayBuffer* data structure under the hood, which enhances the performance of element access by index.

**This approach boasts a time complexity of** ** O(N)** since the

*shuffle*operation traverses the list only once. In the worst case, the list will be traversed twice, if the

*size*of the randomized sample matches the size of the initial list (as seen when

*take*is called). Furthermore, this solution is notably elegant.

## 6. Testing

For testing purposes, **we’ll extract samples from a sorted List of elements ranging from 0 to 99**. Throughout the testing phase, our objectives include ensuring that the resulting samples’ elements are both

**shuffled (and hence not sorted) and non-repeating**. Additionally, we must confirm that the sample

**sizes are accurate**. As an appendix, we may also ensure which method performs the best:

```
"RandomFixedSizeSample" should {
"create a random sample out of the initial List" in {
val list = List.range(0, 100)
val sampleSize = 10
val list_0 = RandomFixedSizeSample.getRandomSampleRec(list, sampleSize)
val list_1 = RandomFixedSizeSample.getRandomSampleZip(list, sampleSize)
val list_2 = RandomFixedSizeSample.getRandomSampleShuffle(list, sampleSize)
list_0.size shouldBe sampleSize
list_1.size shouldBe sampleSize
list_2.size shouldBe sampleSize
list_0.toSet.size shouldBe list_0.size
list_1.toSet.size shouldBe list_1.size
list_2.toSet.size shouldBe list_2.size
isSorted(list_0) shouldBe false
isSorted(list_1) shouldBe false
isSorted(list_2) shouldBe false
}
"ensure getRandomSampleShuffle is the most performant, then goes getRandomSampleZip and then getRandomSampleRec" in {
val list = List.range(0, 10_000)
val sampleSize = 100
val start_0 = System.nanoTime()
RandomFixedSizeSample.getRandomSampleRec(list, sampleSize)
val end_0 = System.nanoTime()
val duration_0 = end_0 - start_0
val start_1 = System.nanoTime()
RandomFixedSizeSample.getRandomSampleZip(list, sampleSize)
val end_1 = System.nanoTime()
val duration_1 = end_1 - start_1
val start_2 = System.nanoTime()
RandomFixedSizeSample.getRandomSampleShuffle(list, sampleSize)
val end_2 = System.nanoTime()
val duration_2 = end_2 - start_2
duration_0 should be > duration_1
duration_1 should be > duration_2
}
```

## 7. Conclusion

As we can see, **leveraging scala.util.Random method shuffle leads to the best performance and conciseness**. Additionally, we’ve observed that zipping collections with random numbers is also pretty powerful.

As usual, these examples are available over on GitHub.