1. Overview

Random numbers are useful whenever we need to pick an unbiased value, for example in a simulation. In this tutorial, we’ll learn a few common ways of generating random numbers in the Linux environment.

2. Pseudorandom Number Generator

By definition, random numbers are unpredictable. Computers, on the other hand, are designed to produce deterministic output. Therefore, the numbers they generate algorithmically are not truly random, but only pseudorandom.

Pseudorandom number generators (PRNGs) use computational algorithms to generate long sequences of random-looking results. However, every value in such a sequence depends on an initial value, which we call the seed. If we know the seed and the algorithm, we can reproduce the entire sequence and predict every value in it.

The catch is that as long as the seed is unknown and the algorithm is hard enough to guess, the sequence appears to be random. So, in most common use cases, it is safe to use PRNGs. However, if the use case depends strongly on the quality of the randomness, we should switch to physical methods that use hardware to generate random numbers.

In the next few sections, our focus will be limited to pseudorandom number generation using popular Unix utilities.

3. $RANDOM in Bash

Let’s say that we want to simulate rolling a die with faces numbered from 1 to 6.

To do this, we can use $RANDOM, a built-in Bash variable that gives a pseudorandom number:

$ echo $RANDOM
30627
$ echo $RANDOM
10419

With $RANDOM, we get an integer between 0 and 32767. However, in this case, we need values between 1 and 6, so we can use the modulo (%) operator to map the result into the range from min=1 to max=6:

$ cat dice.sh
#!/bin/bash

function roll_dice {
    min=1
    max=6
    # Shell arithmetic; (max - min + 1) is the count of possible values
    number=$(( min + RANDOM % (max - min + 1) ))
    echo "$number"
}

Note that we didn’t provide a seed value here, which means that Bash falls back on the seed it chooses itself when the shell starts. If we don’t want to rely on that default, for instance when two invocations start at the same moment, we can set the seed ourselves.

Assigning a value to RANDOM seeds the generator, so we can introduce variation by explicitly initializing it:

RANDOM=$$

Since two concurrently running invocations have different process IDs, we can use the PID ($$) as a seed value that differs between them.
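To see what explicit seeding does, here is a minimal sketch that assumes Bash. It seeds RANDOM twice with the same arbitrary value (42, chosen only for illustration) and compares the two resulting sequences. The variable names first and second are hypothetical:

```shell
#!/bin/bash
# Seed the generator with an arbitrary value (42 is just an example)
RANDOM=42
first="$RANDOM $RANDOM $RANDOM"

# Seed again with the same value and draw three more numbers
RANDOM=42
second="$RANDOM $RANDOM $RANDOM"

if [ "$first" = "$second" ]; then
    echo "same seed, same sequence"
else
    echo "sequences differ"
fi
```

This is why a seed that differs between invocations, such as the PID, is a reasonable choice when two runs shouldn’t mirror each other.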

Finally, let’s call this function via the roll.sh script:

$ cat roll.sh
#!/bin/bash
. ./dice.sh
roll_dice
$ ./roll.sh
4
$ ./roll.sh
3
$ ./roll.sh
5
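
The modulo step in roll_dice generalizes to any inclusive range. The following is only a sketch, assuming Bash; random_in_range is a hypothetical helper that isn’t part of dice.sh:

```shell
#!/bin/bash
# Hypothetical helper: print a pseudorandom integer in the inclusive range [min, max].
# Assumes max - min is smaller than 32768, the number of distinct $RANDOM values.
random_in_range() {
    local min=$1
    local max=$2
    echo $(( min + RANDOM % (max - min + 1) ))
}

# Roll the die 1000 times and count any result outside 1..6
out_of_range=0
for (( i = 0; i < 1000; i++ )); do
    roll=$(random_in_range 1 6)
    if [ "$roll" -lt 1 ] || [ "$roll" -gt 6 ]; then
        out_of_range=$(( out_of_range + 1 ))
    fi
done
echo "out of range: $out_of_range"
```

Because the 32768 possible values of $RANDOM don’t divide evenly by 6, the modulo approach slightly favors the lowest results, which is usually negligible for a simulation like this one.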

4. Using awk

awk is a useful language and Unix utility for working with text files. Let’s see how we can generate random numbers using awk.

4.1. Dataset

Let’s say that we have a file, students.txt, that contains the names of students belonging to different groups, one group per line:

$ cat students.txt
Bryan,Roger,Christina
Rishabh,Mary,Rose
Paul,Vikram,Yasim
Leo,Immy,Kudrat

Now, we’re required to choose exactly one student randomly from each group.

4.2. srand() and rand()

In awk, we have access to two functions, srand() and rand(). rand() returns a random floating-point number that is at least 0 and less than 1, while srand() initializes the seed value for the random sequence.

Let’s write our awk script in the choose.awk file:

#!/bin/awk -f
BEGIN {
    FS=",";
    srand(seed);
}

{
    field=int(1.0 + rand()*NF)
    print $field
}

Essentially, we’re doing a one-time initialization of the field separator (FS) and the seed in the BEGIN block. Then, for each line, we multiply the floating-point value by the number of fields in that line (NF), add 1, and truncate the result with int() to get a field index between 1 and NF.
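
As a quick sanity check of this scaling, the sketch below draws many field indexes for a line with three fields and reports the smallest and largest index seen. The seed 42, the draw count, the field_range wrapper, and the variable names nf, min, and max are illustrative assumptions, not part of choose.awk:

```shell
#!/bin/bash
# Hypothetical wrapper: $1 is the seed, $2 is the number of fields to simulate.
field_range() {
    awk -v seed="$1" -v nf="$2" 'BEGIN {
        srand(seed)
        min = nf; max = 1
        for (i = 0; i < 1000; i++) {
            field = int(1.0 + rand() * nf)   # same scaling as in choose.awk
            if (field < min) min = field
            if (field > max) max = field
        }
        print min, max
    }'
}

field_range 42 3
```

Since rand() never returns 1, the index can’t exceed the number of fields, so every draw should land between 1 and 3.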

Finally, let’s execute our script by supplying the seed value as an external variable using the -v flag:

$ awk -v seed=$RANDOM -f choose.awk students.txt
Christina
Rose
Yasim
Kudrat

Of course, we can execute this multiple times to see that we’re, in fact, getting random results each time.

5. Using Pseudodevice Files

Internally, numbers are stored as bits. So, another way to generate random numbers is by generating random bytes. In this section, we’ll use the pseudodevice files as PRNGs.

5.1. /dev/random and /dev/urandom

In Unix, everything is a file, and that includes devices, which appear under the /dev directory of the filesystem. Additionally, Unix has a set of special pseudodevice files that are not associated with real hardware. Among these, /dev/random and /dev/urandom are two pseudodevice files that serve as PRNGs:

$ ls -l /dev/*random
crw-rw-rw- 1 root root 1, 8 Oct 14 13:30 /dev/random
crw-rw-rw- 1 root root 1, 9 Oct 14 13:30 /dev/urandom

As we can see, all users can read from and write to these files, but no user has the execute (x) privilege.

Interestingly, the kernel gathers noisy data from various devices and feeds it into an internal entropy pool. Both /dev/random and /dev/urandom draw on this pool to produce random bytes of data.

With a basic understanding of how all of this works internally, let’s see how they differ. The “u” in urandom stands for unlimited. Essentially, even if the entropy pool contains less entropy than required to fulfill the request, /dev/urandom never blocks, whereas /dev/random can block in such cases. So, depending on our requirements for randomness and latency, we should make the right choice.

5.2. Random Number from Random Bytes

Let’s put our theoretical understanding of these pseudodevices into practice by generating a random 4-byte integer. To do so, we can make use of the dd command:

$ dd if=/dev/urandom of=~/random_number count=4 bs=1
4+0 records in
4+0 records out
4 bytes copied, 0.0002276 s, 17.6 kB/s

We’re essentially copying 4 bytes of data from /dev/urandom to ~/random_number. As the data is stored as raw bytes, we can’t read it with the usual commands such as cat. Instead, we’ll use the od command to interpret the raw bytes as an integer:

$ od -An --format=dI ~/random_number
   449690100

We must note that we specified the format as dI (decimal integer) so that od interprets the bytes as an integer. We also used the -An option to suppress the offset address field in the output.
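
The two steps above can also be combined by letting od read from the device directly. This is a sketch rather than the approach shown above: random_uint32 is a hypothetical helper, -N limits the number of bytes read, and -t u4 formats them as one unsigned 4-byte integer:

```shell
#!/bin/bash
# Hypothetical helper: read 4 bytes from /dev/urandom and print them
# as a single unsigned decimal integer.
random_uint32() {
    od -An -N4 -t u4 /dev/urandom | tr -d '[:space:]'
    echo
}

value=$(random_uint32)
echo "random value: $value"

# Map the value to a die roll, reusing the modulo idea from the $RANDOM section
echo "die roll: $(( value % 6 + 1 ))"
```

Skipping the intermediate file avoids leaving random_number behind in the home directory, at the cost of not being able to re-read the same bytes later.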

6. Conclusion

The goal of this tutorial was to leverage popular Unix utilities to generate random numbers.

We started by developing a basic understanding of pseudorandom number generators and then explored several tools, namely Bash, awk, and pseudodevice files, to solve different use cases.
