The Python programming language is an all-around environment for large-scale development as well as simple scripting. Furthermore, we can combine features of our shell and available commands to produce hybrid scripts and one-liners that perform complex tasks.
In this tutorial, we explain output buffering in Python and how to use the tee command with the python interpreter. First, we explore applications of tee and piping to the command. After that, we briefly refresh our knowledge about buffering. Next, we discuss buffering and piping in relation to python. Finally, we get into options for buffer control in Python.
2. Piping to tee
The tee command reads from standard input and writes the data to both standard output and one or more files, which makes it applicable to many scenarios:
$ watch date | tee --append output
Moreover, we can use tee to store changes back to a piped file:
$ cat file | tr --delete 'x' | tee file
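In this case, we delete every x from file and write the result back to file, while also displaying the new content on the terminal. However, tee truncates file as soon as it opens it, which can happen before cat has finished reading, so the result of this pattern isn’t guaranteed. Notably, we can replace cat with a python command to process its output.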
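Because tee truncates its target when it opens it, a pipeline that reads and writes the same file can race with the reader. As a hedged alternative, here is a minimal Python sketch of the same task that reads the whole file before writing anything back. The function name delete_in_place and the sample content are illustrative assumptions, not part of any library:

```python
import tempfile
from pathlib import Path


def delete_in_place(path, char):
    """Remove every occurrence of char from the file and return the new text."""
    target = Path(path)
    # Read everything first, so the truncating write below cannot lose input.
    original = target.read_text()
    updated = original.replace(char, "")
    target.write_text(updated)
    return updated


# Demonstration on a throwaway file (the name "file" mirrors the shell example).
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "file"
    sample.write_text("xylophone box\n")
    shown = delete_in_place(sample, "x")  # what we would echo to the terminal
    saved = sample.read_text()            # what ended up in the file
```

Returning the new text lets the caller print it, which roughly corresponds to tee showing the content on the terminal.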
However, not all command output is generated the same way, mainly due to buffering.
3. Buffering
The concept of a buffer applies to many areas of an operating system (OS) and computing in general. In the most basic case, a buffer helps when a data producer sometimes gets ahead of a consumer due to varying chunk sizes and processing speeds:
+----------+     +----------+     +----------+
|          |     |  BUFFER  |     |          |
| Producer | >>>>DATA_CHANNEL>>>> | Consumer |
|          | <---COMM_CHANNEL---> |          |
+----------+                      +----------+
Importantly, if a producer is constantly ahead of the consumer, we might have an out-of-sync system, so BUFFER would simply delay the inevitable due to its limited capacity. Buffering only solves temporary synchronization issues.
Usually, buffers release data once the consumer is ready. Even then, we might not want to perform a buffer flush, i.e., full data release. There is often a way to signal releases, such as through a separate COMM_CHANNEL.
However, in some cases, a buffer might dispense data automatically based on different criteria:
- amount of data consumed
- buffer fill percent
- special signal or marker like end-of-file (EOF), i.e., full buffering
- specific character like newline, i.e., line buffering
In relation to this, how data is released depends on the producer, consumer, and inter-process communication (IPC) method.
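To make the release criteria above more concrete, here is a toy model of a buffer between a producer and a consumer. This is only a sketch under our own assumptions: the class name SketchBuffer, its parameters, and the capacity of 64 characters are hypothetical and don’t mirror any real implementation:

```python
class SketchBuffer:
    """Toy buffer that hands data to a consumer according to a release policy."""

    def __init__(self, consumer, capacity=8, line_buffered=False):
        self.consumer = consumer            # callable that receives released data
        self.capacity = capacity            # release once this many characters are held
        self.line_buffered = line_buffered  # also release whenever a newline arrives
        self.pending = ""

    def write(self, data):
        self.pending += data
        # Line buffering: a newline triggers a release.
        # Either mode: reaching the capacity triggers a release.
        if (self.line_buffered and "\n" in data) or len(self.pending) >= self.capacity:
            self.flush()

    def flush(self):
        """Release everything held so far."""
        if self.pending:
            self.consumer(self.pending)
            self.pending = ""

    def close(self):
        # End of stream: whatever is left must be released.
        self.flush()


received = []
full = SketchBuffer(received.append, capacity=64)
full.write("Text.\n")
held_back = list(received)    # nothing released yet
full.close()
after_close = list(received)  # released at the end of the stream

line_received = []
line = SketchBuffer(line_received.append, capacity=64, line_buffered=True)
line.write("Text.\n")         # the newline releases the data immediately
```

The same write is held back in one mode and delivered at once in the other, which is the difference we see later with pipes and terminals.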
4. Python Buffering Behavior
Like other commands, the Python interpreter can produce output that we can then pass to other commands and processes.
4.1. Line Buffering
To analyze how this affects the behavior of pipelines, we use a short code snippet:
$ python -c 'import time; print("Text."); time.sleep(666);'
Text.
[...666-second pause...]$
Here, Text. appears immediately: when stdout is a terminal, Python 3 flushes the buffer at every newline, and print() appends a newline by default. Let’s omit the newline and run the code again:
$ python -c 'import sys; import time; sys.stdout.write("Text."); time.sleep(666);'
[...666-second pause...]Text.$
Consequently, we don’t see Text. until the program exits after over 11 minutes. While this may be inconvenient, it’s a rare occasion to have output without newline separators.
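We can reproduce the effect of line buffering without waiting by building a text stream over an in-memory sink. This is a hedged sketch: the helper make_stream is our own name, and the BytesIO object only stands in for the terminal:

```python
import io


def make_stream(line_buffering):
    """Build a text stream over an in-memory sink, layered like a text-mode stdout."""
    sink = io.BytesIO()  # stands in for the terminal or pipe
    stream = io.TextIOWrapper(
        io.BufferedWriter(sink),
        encoding="utf-8",
        newline="\n",  # keep "\n" untranslated on every platform
        line_buffering=line_buffering,
    )
    return sink, stream


sink, stream = make_stream(line_buffering=True)

stream.write("Text.")
without_newline = sink.getvalue()  # still empty: no newline seen yet

stream.write("\n")
with_newline = sink.getvalue()     # the newline pushed the whole line out
```

Inspecting the sink after each write shows that the text only reaches it once a newline is written.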
4.2. Full Buffering
Now, let’s explore how our print() sample code is affected by piping to cat:
$ python -c 'import time; print("Text."); time.sleep(666);' | cat
[...666-second pause...]
Text.
$
In this case, it looks like the time.sleep(666) call comes before the print("Text.") call. Yet, that’s not the case. What actually occurs is full buffering: when stdout is a pipe rather than a terminal, Python holds the output until its buffer fills up or the program ends. Thus, we don’t see any output during the pause, although the print() call has already been made.
Naturally, the same happens with tee as well:
$ python -c 'import time; print("Text."); time.sleep(666);' | tee
[...666-second pause...]
Text.
$
Of course, this behavior is the same for other commands since its root cause isn’t the command after the pipe: Python switches to full buffering whenever its stdout is connected to a pipe instead of a terminal.
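To check how the interpreter sets up its stdout when it’s connected to a pipe, we can ask a child process to report on itself. The sketch below is only an illustration: the names PROBE and probe_piped_stdout are our own, and the report goes to stderr so that it isn’t held back by the stdout buffer we’re inspecting:

```python
import os
import subprocess
import sys

# Child code: report how its own stdout is configured.
PROBE = (
    "import sys; "
    "sys.stderr.write(f'{sys.stdout.isatty()} {sys.stdout.line_buffering}')"
)


def probe_piped_stdout():
    """Run a child interpreter with stdout on a pipe and return its report."""
    # Drop PYTHONUNBUFFERED so an inherited setting does not skew the result.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONUNBUFFERED"}
    result = subprocess.run(
        [sys.executable, "-c", PROBE],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
        env=env,
    )
    return result.stderr.strip()


report = probe_piped_stdout()  # "<isatty> <line_buffering>" as seen by the child
```

With a piped stdout, we expect the child to report that it has no terminal and that line buffering is off.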
5. Controlling Python Buffering
Knowing when and why Python might restrict the output, we can come up with ways to control or work around the buffering.
5.1. flush Parameter
The Python print() function has a flush parameter that forces a flush, i.e., releases all the buffered data right after the call:
$ python -c 'import time; print("Text.", flush=True); time.sleep(666);' | cat
Text.
[...666-second pause...]
$
Critically, the flush argument is only available in Python 3, starting with version 3.3, and works for print() alone.
This means it doesn’t help against full buffering when we write by other means, e.g., sys.stdout.write().
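In a longer script, the flush parameter is typically useful for progress messages that a downstream command should see right away. Here is a minimal sketch, assuming a hypothetical helper named report_progress and an in-memory sink that stands in for the pipe:

```python
import io
import sys


def report_progress(steps, stream=None):
    """Print one line per step and flush right away so a pipe reader sees each line."""
    stream = sys.stdout if stream is None else stream
    for number, step in enumerate(steps, start=1):
        print(f"[{number}] {step}", file=stream, flush=True)


# Demonstration against an in-memory sink that stands in for a pipe.
sink = io.BytesIO()
stream = io.TextIOWrapper(io.BufferedWriter(sink), encoding="utf-8", newline="\n")
report_progress(["load", "save"], stream)
delivered = sink.getvalue()  # both lines reached the sink without closing the stream
```

Calling report_progress() without a stream argument writes to sys.stdout instead.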
5.2. Manual Flush
For a more universal way to flush, we can leverage the flush() method of the sys.stdout object:
$ python -c 'import sys; import time; print("Text."); sys.stdout.flush(); time.sleep(666);' | cat
Text.
[...666-second pause...]
$
Here, we have control over the times at which the stdout buffer is flushed.
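Since we decide when to call flush(), we can also release data in batches instead of after every write. The following sketch is one possible way to do that. The names write_batched and RecordingSink, as well as the batch size of three, are our own assumptions for illustration:

```python
import io
import sys


def write_batched(lines, stream=None, every=3):
    """Write lines, flushing once per batch of `every` lines and once at the end."""
    stream = sys.stdout if stream is None else stream
    for count, line in enumerate(lines, start=1):
        stream.write(line + "\n")
        if count % every == 0:
            stream.flush()  # checkpoint: make this batch visible downstream
    stream.flush()          # release any remainder


class RecordingSink(io.BytesIO):
    """In-memory stand-in for a pipe that remembers each chunk it receives."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def write(self, data):
        self.chunks.append(bytes(data))
        return super().write(data)


sink = RecordingSink()
stream = io.TextIOWrapper(io.BufferedWriter(sink), encoding="utf-8", newline="\n")
write_batched(["a", "b", "c", "d"], stream, every=3)
chunks = sink.chunks  # one chunk per flush that had data to release
```

The recorded chunks show that the first three lines travel together, while the last line arrives with the final flush.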
5.3. Buffering Control
If we want to switch between full buffering and line buffering, we can do that as well:
$ python -c 'import sys; import time; sys.stdout.reconfigure(line_buffering=True); print("Text."); time.sleep(666);' | cat
Text.
[...666-second pause...]
$
Again, this solution is only available in Python 3, specifically version 3.7 and later, but provides a flexible way to change the type of buffering.
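To see the switch happen without a pipeline, we can call reconfigure() on a text stream that writes to an in-memory sink. This is a hedged sketch that requires Python 3.7 or later, and the variable names are our own:

```python
import io

sink = io.BytesIO()  # stands in for the pipe
stream = io.TextIOWrapper(io.BufferedWriter(sink), encoding="utf-8", newline="\n")

buffering_before = stream.line_buffering  # newlines do not trigger a flush yet
stream.write("first\n")
held = sink.getvalue()                    # nothing delivered so far

stream.reconfigure(line_buffering=True)   # also flushes data that is already pending
buffering_after = stream.line_buffering
flushed_by_reconfigure = sink.getvalue()

stream.write("second\n")                  # delivered right away because of the newline
delivered = sink.getvalue()
```

The first line is held until reconfigure() runs, whereas the second line reaches the sink as soon as it’s written.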
5.4. Using the -u Flag
Alternatively, we can pass the -u flag to the interpreter to force unbuffered stdout and stderr:
$ python -u -c 'import time; print("Text."); time.sleep(666);' | cat
Text.
[...666-second pause...]
$
Critically, using -u disables buffering entirely, so there isn’t even buffering per line:
$ python -u -c 'import sys; import time; sys.stdout.write("Text."); time.sleep(666);' | cat
Text.
[...666-second pause...]
$
Turning buffering off might have unwanted consequences, such as reduced performance.
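Moreover, the exact effect of the -u flag depends on the Python version. In Python 2, the flag also places stdin, stdout, and stderr in binary mode on systems where that matters, whereas in Python 3 it only affects stdout and stderr. Critically, in Python 2, -u doesn’t influence the internal buffering of file methods like readlines() or xreadlines() and file-object iterators like for line in sys.stdin. To avoid buffering in those cases, we usually employ sys.stdin.readline() in an infinite loop.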
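The readline() loop can be written compactly with the two-argument form of iter(). The sketch below is one way to do it, with read_lines_promptly as a hypothetical helper name. The read-ahead concern mainly applies to Python 2, so in Python 3 this is just a conservative pattern:

```python
import io
import sys


def read_lines_promptly(stream=None):
    """Yield lines one at a time via readline(), stopping at end of input."""
    stream = sys.stdin if stream is None else stream
    # iter(callable, sentinel) keeps calling readline() until it returns "" (EOF).
    for line in iter(stream.readline, ""):
        yield line.rstrip("\n")


# Demonstration with an in-memory stream standing in for sys.stdin.
sample = io.StringIO("first\nsecond\n")
lines = list(read_lines_promptly(sample))
```

In a real script, we would iterate over read_lines_promptly() without arguments to process each line of sys.stdin as it arrives.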
Finally, we can include the -u flag in the Python script shebang like #!/usr/bin/python3 -u or #!/usr/bin/python -u.
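To verify what -u changes, we can again let a child interpreter report on its own stdout, this time via the write_through attribute. This is a sketch under stated assumptions: it expects Python 3.7 or later, where that attribute exists, and the names PROBE and probe_write_through are our own:

```python
import os
import subprocess
import sys

# Child code: report whether writes to stdout bypass the text-layer buffer.
PROBE = "import sys; sys.stderr.write(str(sys.stdout.write_through))"


def probe_write_through(*flags):
    """Run a child interpreter with the given flags and a piped stdout."""
    # Drop PYTHONUNBUFFERED so only the command-line flags decide the mode.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONUNBUFFERED"}
    result = subprocess.run(
        [sys.executable, *flags, "-c", PROBE],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
        env=env,
    )
    return result.stderr.strip()


default_mode = probe_write_through()         # regular, buffered stdout
unbuffered_mode = probe_write_through("-u")  # stdout forced to be unbuffered
```

Comparing the two reports shows whether the flag took effect for the child process.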
5.5. Using stdbuf
As a more shell-oriented solution, we can also employ stdbuf to request a buffering mode for a command in a pipeline:
$ stdbuf --output=L python -c 'import time; print("Text."); time.sleep(666);' | cat
Text.
[...666-second pause...]
$
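In this case, we requested [L]ine buffering via the --output or -o option.
Alternatively, we can use 0 instead of L to disable buffering altogether, mimicking the Python -u flag. Notably, stdbuf adjusts the C standard I/O streams of the command it launches, so it may have no effect on Python 3, which manages its stdout buffer itself. In that case, the Python-level options are the reliable choice.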
6. Summary
In this article, we talked about the tee command, buffers, and pipes, and how those concepts relate to the use of output from python.
In conclusion, knowing how buffering applies to a given situation can be vital to achieving the behavior we aim for.