1. Overview

Sometimes, we need a human-readable text file whose content is random. For example, we might be developing a program that takes a text file as input. If we don’t want to test this program with the same input file every time, we need a way to generate text files with random content.

In this tutorial, we’ll discuss how to generate human-readable text files with random content.

2. Using the base64 Command

Base64 is a popular binary-to-text encoding scheme for transferring binary data over channels that only support text. For example, it’s used for transferring binary data such as images on the World Wide Web.

We can use the base64 command to generate a human-readable text file from binary data.

base64 is used for both Base64 encoding and decoding. If we use it without any options, it encodes the input. Its -d option is for decoding the encoded data.
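
To see both directions in action, here’s a minimal sketch that encodes a short fixed string and then decodes it again. The variable name encoded is only for illustration:

```shell
# Encode a short, fixed string with base64 (no options means "encode").
encoded=$(printf 'hello' | base64)
echo "$encoded"                             # prints aGVsbG8=

# Decode it back with -d; the extra echo only adds a trailing newline.
printf '%s' "$encoded" | base64 -d; echo    # prints hello
```

The trailing = in the encoded text is padding, which Base64 adds when the input length isn’t a multiple of 3 bytes.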

We’ll get the random binary data from the /dev/urandom device file. This file is an interface to the kernel’s pseudorandom number generator, so reading from it returns a stream of random bytes.

Now, let’s try to generate a human-readable file with random content using base64:

$ base64 /dev/urandom | head -c 10
pIS3VBfRhf

In the base64 /dev/urandom part of the above command, base64 encoded the binary data read from /dev/urandom. However, since /dev/urandom never runs out of data, base64 would keep producing output indefinitely. So, we took only the first 10 characters of the encoded data using the head -c 10 part of the command. In this example, the generated random data was pIS3VBfRhf, which consists of 10 characters.

We can generate the human-readable file with random content simply by redirecting the output of the above command to a file:

$ base64 /dev/urandom | head -c 10 > random_file.txt
$ cat random_file.txt
/8ZtS5IfK+

Here, we redirected the encoded output to the file random_file.txt. Then, we checked its content using the cat command. As expected, the output consists of 10 characters.

Base64 encoding uses an alphabet of 64 characters. These characters are the alphanumeric characters and the + and / characters.
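
One thing to keep in mind for larger files is that GNU base64 wraps its output at 76 characters by default, so head -c would count the newline characters too. As a sketch, assuming GNU coreutils, the hypothetical helper below turns wrapping off with -w 0 and writes exactly the requested number of Base64 characters:

```shell
# Hypothetical helper (not part of any tool): write SIZE random Base64
# characters to the file OUT.
# Assumption: GNU coreutils base64, where -w 0 disables line wrapping.
random_base64_file() {
    size="$1"
    out="$2"
    base64 -w 0 /dev/urandom | head -c "$size" > "$out"
}
```

For example, calling random_base64_file 200 random_file.txt would leave 200 characters from the Base64 alphabet in random_file.txt, with no newlines in between.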

3. Using the tr Command

We can also use the tr command for converting binary data to human-readable data and hence for creating text files with random content. This command can be used, in general, for performing text transformations such as case conversions, text replacement, or deleting characters.

Let’s consider an example:

$ tr -dc '[:graph:]' < /dev/urandom | head -c 10
m1#Vs\:ABp

The -d option of tr deletes the characters in the given set, and the -c option complements that set. [:graph:] specifies the set of all printable characters, excluding space. We quote it so that the shell doesn’t try to expand it as a filename pattern. Therefore, the tr -dc '[:graph:]' < /dev/urandom part of the command removed every byte read from /dev/urandom except the printable characters. That’s why we observed only printable characters in the output.

The second part of the command after the pipe, head -c 10, just limited the output to 10 characters as before.
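
Because the output of /dev/urandom differs on every run, the effect of -dc is easier to see on a fixed input. The following sketch uses a small wrapper, keep_only, which is a made-up name for illustration, and sets LC_ALL=C as a precaution so that tr treats its input as plain bytes:

```shell
# Hypothetical filter: delete everything except the characters in the
# tr set passed as the first argument.
keep_only() {
    LC_ALL=C tr -dc "$1"
}

# Fixed input containing a space, a tab, and a newline.
printf 'a b\tc\n1!' | keep_only '[:graph:]'; echo          # prints abc1!

# Punctuation and spaces are dropped when only [:alnum:] is kept.
printf 'Hello, World 42!' | keep_only '[:alnum:]'; echo    # prints HelloWorld42
```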

We can change the set of the output characters. For example, we can use [:alnum:] for getting just letters and digits in the output:

$ tr -dc '[:alnum:]' < /dev/urandom | head -c 10
miB150lFw2

The set specified by [:graph:] is a superset of the one specified by [:alnum:], since it also contains the punctuation characters.

Finally, we redirect the output to a file to generate a text file with random content:

$ tr -dc '[:graph:]' < /dev/urandom | head -c 10 > random_file.txt
$ cat random_file.txt
TKt4r$#Z7r

We directed the output to the file random_file.txt. The file consisted of 10 randomly generated characters.
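
If we want several lines of text instead of a single run of characters, one possible approach is to fold the filtered stream into fixed-width lines. The following is only a sketch: the helper name random_text_lines and its parameters are made up for illustration, and LC_ALL=C is set as a precaution so that tr treats the input as plain bytes:

```shell
# Hypothetical helper: write COUNT lines, each WIDTH random alphanumeric
# characters long, to the file OUT.
random_text_lines() {
    count="$1"
    width="$2"
    out="$3"
    LC_ALL=C tr -dc '[:alnum:]' < /dev/urandom \
        | fold -w "$width" \
        | head -n "$count" > "$out"
}
```

Here, fold -w breaks the endless stream into lines of the given width, and head -n stops the pipeline after the requested number of lines. For instance, random_text_lines 5 20 random_file.txt would produce 5 lines of 20 characters each.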

4. Using the strings Command

The strings command is another option for obtaining a text file with random content. We generally use strings for extracting the human-readable text in binary files. So, we can use the same idea to turn the output of /dev/urandom into human-readable text:

$ strings -s "" /dev/urandom | head -c 10
BB2\!C5./8

Here, strings looked for sequences of at least 4 printable characters, which is its default minimum length. We can change this minimum using the -n option.

Normally, strings prints each sequence it finds on a separate line, as the newline is the default output separator. However, it’s possible to customize the output separator using the -s option. So, the -s "" part of the command specified an empty string as the separator. In other words, we concatenated the sequences that strings found.

We used the head -c 10 part of the command to limit the output to 10 characters as before.

We can direct the output to a file as before to have a human-readable file with random content:

$ strings -s "" /dev/urandom | head -c 10 > random_file.txt
$ cat random_file.txt
7)XM/F%F;^
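
Putting the -n and -s options together, a parameterized version might look like the sketch below. It assumes the strings from GNU binutils, which supports -s, and the helper name random_strings_file is hypothetical. In GNU strings, spaces and tabs count as printable characters, so they may show up in the result:

```shell
# Hypothetical helper: write SIZE random printable characters to OUT,
# built from runs of at least MIN_LEN printable characters.
# Assumption: GNU binutils strings (-n sets the minimum run length,
# -s "" joins the runs with an empty separator).
random_strings_file() {
    size="$1"
    min_len="$2"
    out="$3"
    strings -n "$min_len" -s "" /dev/urandom | head -c "$size" > "$out"
}
```

For example, random_strings_file 100 4 random_file.txt would keep the default minimum run length of 4 and write 100 characters to random_file.txt.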

5. Using a Dictionary

The methods we’ve discussed so far used /dev/urandom as the source of random binary data. We converted the binary data to a human-readable format using base64, tr, or strings. So, the generated content was completely random and had no meaning in any language.

If we need to generate random content with meaningful words, then we can use a dictionary. An English dictionary often comes installed in Linux distros. The dictionary file is /usr/share/dict/words. If it’s missing, we can usually install it from the distro’s package repository. So, we can generate a file with randomly chosen words from this dictionary:

$ shuf -n 5 /usr/share/dict/words | tr '\n' ' ' > random_file.txt
$ cat random_file.txt
discriminated intermeasuring viceversally nightwear oversocial

Here, we used the shuf command to pick 5 lines at random from the dictionary, which holds one word per line. We achieved it using the shuf -n 5 /usr/share/dict/words part of the command.

shuf prints each selected word on a separate line by default, so we used tr for replacing newlines with blanks. The tr '\n' ' ' part of the command was for this purpose.

Finally, we redirected the output to the file random_file.txt. As the output shows, the file contains 5 randomly chosen words from the dictionary.
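
To produce several lines of random words rather than a single one, we could repeat the same idea in a loop. The sketch below is one possible way to do it: the helper name random_sentences and its parameters are hypothetical, the word list defaults to /usr/share/dict/words but can be overridden, and paste is used instead of tr so that every line ends with a newline:

```shell
# Hypothetical helper: write COUNT lines of WORDS random words each to OUT.
# The optional fourth argument is the word list (one word per line);
# it defaults to /usr/share/dict/words.
random_sentences() {
    count="$1"
    words="$2"
    out="$3"
    dict="${4:-/usr/share/dict/words}"

    : > "$out"                       # start with an empty file
    i=0
    while [ "$i" -lt "$count" ]; do
        # Pick the words, then join them with spaces on a single line.
        shuf -n "$words" "$dict" | paste -sd ' ' - >> "$out"
        i=$((i + 1))
    done
}
```

For instance, random_sentences 3 5 random_file.txt would write 3 lines with 5 randomly chosen dictionary words on each.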

6. Conclusion

In this article, we discussed four different methods for generating a human-readable text file with random content.

First, we used base64 for obtaining text data using Base64 encoding.

Second, we filtered the binary data using tr. We achieved this by deleting the non-printable characters in the binary data.

Third, we used strings. We extracted the human-readable text in the binary data and concatenated the extracted character sequences.

The base64, tr, and strings commands all used the /dev/urandom device file as the source of random binary data.

Finally, we used the dictionary file /usr/share/dict/words to generate a file with meaningful words. We used shuf for randomly selecting words from the dictionary.
