1. Introduction

Generating a large number of files can be helpful when testing functions, performance, and limits. Having a fresh batch ready quickly can be important for streamlining continuous integration or simply not wasting time.

In this tutorial, we explore several methods to quickly create lots of files. First, we use classic loops to implement a solution. After that, we turn to common interpreters. Finally, we consider a standard-based approach.

To verify our results, we clear all buffers and create one hundred thousand (100000) files with every method. In each case, we compare the real component as returned by the time command.
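The procedure above can be sketched as a small harness. This is only an illustration under a few assumptions: the names count, workdir, and created are hypothetical, the count is reduced so the demo finishes quickly, and sync stands in for the buffer clearing, since the exact step isn't spelled out here. To measure a method, we would wrap the loop with time as in the sections that follow.

```shell
# Sketch: run one method in a throwaway directory and check the file count.
# "count", "workdir", and "created" are illustrative names, not from the article.
count=1000                       # reduced from 100000 for a quick demo
workdir=$(mktemp -d) || exit 1   # scratch directory, removed at the end

(
  cd "$workdir" || exit 1
  sync                           # flush pending writes before the run (assumed step)
  i=1
  while [ "$i" -le "$count" ]; do : >> "$i"; i=$((i + 1)); done
)

# Count regular files to confirm the method created exactly $count of them.
created=$(find "$workdir" -type f | wc -l | tr -d ' ')
echo "created: $created"

rm -rf "$workdir"
```

Running every method in a fresh, empty directory keeps leftover files from an earlier run from skewing either the count or the timing.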

We tested the code in this tutorial on Debian 11 (Bullseye) with GNU Bash 5.1.4. It should work in most POSIX-compliant environments, although the for ((...)) loop and brace expansion require Bash or a compatible shell.

2. Using Shell Loops

Like other major languages, Bash has for and while loops. In fact, we can use either to create many files using Bash alone.

2.1. Bash for Loop

The Bash-specific for loop construct can set the number of iterations via a control expression. We use this form with the $i variable, which doubles as the filename:

$ for ((i=1; i<=100000; i++)); do : >> "$i"; done

Within the loop body, we create a file by redirecting the output of the : colon (null) utility.
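To see why an appending redirection is a safe choice here, the following minimal sketch compares : >> with : > on a file that already has content. The directory and variable names (demo, after_append, after_truncate, new_created) are made up for the illustration.

```shell
# Sketch: compare ': >>' (append) with ': >' (truncate) on an existing file.
demo=$(mktemp -d) || exit 1
printf 'keep me\n' > "$demo/existing"

: >> "$demo/existing"            # appends nothing, so the content is preserved
after_append=$(cat "$demo/existing")

: > "$demo/existing"             # truncates, so the content is lost
after_truncate=$(cat "$demo/existing")

: >> "$demo/new"                 # a missing file is created empty
if [ -f "$demo/new" ]; then new_created=yes; else new_created=no; fi

echo "append: '$after_append' truncate: '$after_truncate' new: $new_created"
rm -rf "$demo"
```

In other words, with >> the loop creates missing files but leaves any existing file with the same name untouched.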

Let’s check the performance:

$ time { for ((i=1; i<=100000; i++)); do : >> "$i"; done; }

real    0m3.405s
user    0m1.113s
sys     0m2.244s

At 3.405 s, we can take this as a baseline.

2.2. POSIX while

POSIX defines the while loop construct and behavior. By introducing a control variable and incrementing it, we can make a solution similar to the one with for:

$ i=1; while [ "$i" -le 100000 ]; do : >> "$i"; i=$(($i + 1)); done

Naturally, the times are comparable:

$ time { i=1; while [ "$i" -le 100000 ]; do : >> "$i"; i=$(($i + 1)); done; }

real    0m3.516s
user    0m1.403s
sys     0m2.084s

Actually, considering the minor fluctuations of each run, the results with shell loops are more or less identical. However, there is room for improvement.

3. Using Interpreted Programming Languages

As usual, most universal interpreters can help with our task. Although languages like C might be marginally more efficient, their drawbacks in terms of complexity, compilation time, and much lower flexibility exclude them from our methods.

3.1. Perl

The perl interpreter is itself written in C, so using it in the correct way provides many performance benefits without the same drawbacks.

Using perl, we can create the same type of for loop we had with Bash earlier:

$ perl -e 'for ($i=1;$i<=100000;$i++) { open($f, ">", "$i"); }'

Here, we use the built-in Perl open() function to create files.

Let’s time this solution:

$ time perl -e 'for ($i=1;$i<=100000;$i++) { open($f, ">", "$i"); }'

real    0m2.260s
user    0m0.354s
sys     0m1.874s

At 2.260 s, we removed more than a second from our previous best. Of this time, the interpreter launch accounts for around 0.003 s.

3.2. Python

In python, for loops often look different than in Bash:

$ python -c 'for i in range(100000): open(str(i), "w");'

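Here, we use the built-in range() function to get all values for our filenames. Notably, range(100000) yields 0 through 99999, so the names differ slightly from the 1 to 100000 of the other methods, while the number of files stays the same. In each case, we call the built-in Python open() function to create the file.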

Timing this, we get an equivalent time to that of Perl:

$ time python -c 'for i in range(100000): open(str(i), "w");'

real    0m2.154s
user    0m0.274s
sys     0m1.850s

Launching Python takes around 0.010 s. One other overhead we might consider is the conversion of i to a string with str(). However, "$i" in Perl implicitly does the same due to loose variable types.

3.3. Ruby

Finally, ruby has its own syntax for iteration, which looks like the Bash {a..b} brace expansion with ranges:

$ ruby -e '(1..100000).each { |i| File.open(i.to_s, "w") }'

After generating each number, we store it in i, which we convert to a string with to_s. This makes the code similar to that of Perl and Python:

$ time ruby -e '(1..100000).each { |i| File.open(i.to_s, "w") }'

real    0m2.207s
user    0m0.457s
sys     0m1.912s

So, the similar execution time is not a surprise. Still, Ruby takes around 0.060 s to launch, making it the heaviest of the three interpreters.

4. Using POSIX Commands

Actually, POSIX implementations can compete with the times above by employing their own toolset:

$ printf '%s ' {1..100000} | xargs touch

In this case, we generate an argument list with printf and brace expansion. After that, xargs passes its elements to touch, packing as many as it can into each invocation.
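To make the batching visible, we can replace touch with a tiny sh -c command that prints how many arguments it received. This is only a sketch: n, batches, runs, and total are illustrative names, the count is reduced, and -n 1000 caps the batch size artificially. Without -n, xargs fills each command line up to the limits of the system, so the number of invocations varies between environments.

```shell
# Sketch: count how often xargs runs its command and how many arguments each run gets.
n=5000                                   # reduced, illustrative count
batches=$(
  i=1
  while [ "$i" -le "$n" ]; do echo "$i"; i=$((i + 1)); done |
    xargs -n 1000 sh -c 'echo "$#"' sh   # each run prints its argument count
)

runs=$(printf '%s\n' "$batches" | wc -l | tr -d ' ')
total=0
for b in $batches; do total=$((total + b)); done

echo "invocations: $runs, arguments passed in total: $total"
```

Each invocation of the real touch therefore handles a whole batch of names instead of a single one, which keeps the number of processes low.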

Because of the minimalistic utilities adhering to the POSIX philosophy of one tool = one job, we get a very efficient outcome despite multiple calls:

$ time printf '%s ' {1..100000} | xargs touch

real    0m2.241s
user    0m0.287s
sys     0m1.898s

At 2.241 s, the time is equivalent to the third-party interpreters we looked at earlier.
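One caveat is that brace expansion such as {1..100000} is a feature of Bash and similar shells rather than of POSIX sh. As a hedged sketch, we can produce the same names with a POSIX while loop and pipe them to xargs. We didn't time this variant, and the names target, n, and made are hypothetical, with a reduced count for the demo.

```shell
# Sketch: generate names with a POSIX while loop instead of Bash brace expansion.
target=$(mktemp -d) || exit 1    # scratch directory, removed at the end
n=1000                           # reduced from 100000 for a quick demo

(
  cd "$target" || exit 1
  i=1
  while [ "$i" -le "$n" ]; do
    echo "$i"                    # one filename per line for xargs
    i=$((i + 1))
  done | xargs touch
)

# Confirm that every name ended up as a file.
made=$(find "$target" -type f | wc -l | tr -d ' ')
echo "made: $made"
rm -rf "$target"
```

Since the names are plain numbers, we don't need to worry about the whitespace and quote handling of xargs here.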

5. Summary

In this article, we looked at different ways to generate many files: shell loops, the Perl, Python, and Ruby interpreters, and a pipeline of standard utilities.

In conclusion, even without interpreters, we can generate lots of files quickly and efficiently by just using POSIX-standard tooling.
