1. Introduction

The Python programming language is an all-around environment for large-scale development as well as simple scripting. Furthermore, we can combine Python with features of our shell and available commands to produce hybrid scripts and one-liners that perform complex tasks.

In this tutorial, we explain output buffering in Python and how it affects piping the output of the python interpreter to the tee command. First, we explore applications of tee and piping to it. After that, we briefly refresh our knowledge about buffering. Next, we discuss buffering and piping in relation to Python. Finally, we get into options for buffer control in Python.

We tested the code in this tutorial on Debian 11 (Bullseye) with GNU Bash 5.1.4. It should work in most POSIX-compliant environments unless otherwise specified.

2. Piping to tee

The tee command copies its standard input both to stdout and to one or more files. In particular, we usually feed it by piping the output of a previous command to tee.

In fact, the tee command is versatile and applicable to many scenarios:

$ watch date | tee --append output

Here, we append the output of the watch date command to the output file while also leaving the stream to stdout intact.
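To make the mechanics concrete, here's a minimal Python sketch of what tee does with the data it reads. The tee() function below is our own illustration, not a library API, and it assumes line-oriented text input: it copies every line to all given output streams and flushes after each one, so readers further down the pipeline get the data without delay.

```python
def tee(source, *sinks):
    """Copy each line from source to every sink (illustrative sketch, not a library API)."""
    for line in source:
        for sink in sinks:
            sink.write(line)
            # Flush right away so each line is passed on instead of waiting in a buffer
            sink.flush()
```

For instance, calling tee(sys.stdin, sys.stdout, log) with log = open("output", "a") roughly mirrors tee --append output, minus the option handling of the real tool.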

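Moreover, we might try to use tee to store changes back to a piped file:

$ cat file | tr --delete 'x' | tee file

The intent here is to delete every x from file, save the modifications back to file, and display the new content on the terminal. However, this is a race condition: all commands in a pipeline start concurrently, and tee truncates file as soon as it opens it, which can happen before cat has read the data. Depending on timing, file may end up empty or only partially processed. A safer variant writes to a temporary file first and only then replaces the original:

$ tr --delete 'x' < file | tee file.tmp && mv file.tmp file

Notably, instead of cat or tr, we can also use a python command to produce or process the data.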
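Since tee truncates its output file on startup, a single pipeline that reads and writes the same file can race. If we process the data in Python instead, one safe approach is to read the whole file first and then atomically swap in a rewritten copy. The rewrite_in_place() helper below is a hypothetical sketch. It assumes the file fits in memory and that we can create a temporary file in the same directory.

```python
import os
import shutil
import sys
import tempfile


def rewrite_in_place(path, transform, echo=None):
    """Apply transform() to a file's text and save the result back safely (hypothetical helper)."""
    echo = sys.stdout if echo is None else echo
    with open(path) as src:
        new_text = transform(src.read())  # read everything before touching the file
    # Write the result to a temporary file next to the original ...
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(new_text)
        shutil.copymode(path, tmp_path)  # keep the original file permissions
        # ... and atomically replace the original with it
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
    echo.write(new_text)  # like tee, also show the new content
    return new_text
```

For example, rewrite_in_place("file", lambda text: text.replace("x", "")) deletes every x like the tr --delete 'x' step and prints the new content.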

However, not all commands pass on their output in the same way or at the same time, mainly due to buffering.

3. Buffering

The concept of a buffer applies to many areas of an operating system (OS) and computing in general. In the most basic case, a buffer helps when a data producer can temporarily get ahead of a consumer due to varying chunk sizes and processing speeds:

+----------+   DATA_CHANNEL   +----------+
|          |   +----------+   |          |
| Producer | > |  BUFFER  | > | Consumer |
|          |   +----------+   |          |
|          | <-COMM_CHANNEL-> |          |
+----------+                  +----------+

Importantly, if a producer is consistently ahead of the consumer, the system is out of sync, and BUFFER only delays the inevitable: once its limited capacity is exhausted, the producer has to wait or drop data. Thus, buffering only solves temporary synchronization issues.

For example, every pipe has a kernel buffer (usually 64 KiB on Linux) that lets the writer run ahead of the reader. We can't turn that buffer off. However, before data even reaches the pipe, it usually passes through the writing process's own user-space output buffer, and that one we can control.

Usually, a buffer releases data once the consumer is ready. Even then, we might not want to perform a buffer flush, i.e., release all buffered data at once. Often, there's a way to signal when to release data, such as a separate COMM_CHANNEL.

However, in some cases, a buffer might dispense data automatically based on different criteria:

  • amount of data consumed
  • buffer fill level, e.g., releasing only when the buffer is full, i.e., full buffering
  • special signal or marker like end-of-file (EOF)
  • specific character like newline, i.e., line buffering
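To illustrate the last two release criteria, here's a toy buffer in Python. ToyBuffer is a hypothetical class for illustration only. A list plays the consumer, capacity is counted in characters, and mode selects between releasing only on a full buffer and also releasing on a newline.

```python
class ToyBuffer:
    """Toy model of buffer release policies (hypothetical, for illustration only)."""

    def __init__(self, capacity, mode, sink):
        if mode not in ("full", "line"):
            raise ValueError("mode must be 'full' or 'line'")
        self.capacity = capacity  # maximum number of characters held back
        self.mode = mode          # "full": release when full; "line": also release on newline
        self.sink = sink          # list standing in for the consumer
        self.pending = []

    def write(self, text):
        for char in text:
            self.pending.append(char)
            if len(self.pending) >= self.capacity:
                self.flush()  # buffer is full: release everything
            elif self.mode == "line" and char == "\n":
                self.flush()  # newline seen: release the completed line

    def flush(self):
        """Release all pending data to the consumer at once."""
        if self.pending:
            self.sink.append("".join(self.pending))
            self.pending.clear()
```

With capacity=8, writing "ab\ncd" in "line" mode hands "ab\n" to the consumer immediately and keeps "cd" pending. In "full" mode, nothing is released until the eighth character arrives or we call flush() explicitly.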

Which of these criteria apply depends on the producer, the consumer, and the inter-process communication (IPC) method.

4. Python Buffering Behavior

Like other commands, the Python interpreter can produce output that we can then pass to other commands and processes.

However, Python 3 line-buffers stdout only when it's attached to an interactive terminal. When we redirect stdout to a file or a pipe, Python performs full (block) buffering instead.
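We can check this from Python itself. The sketch below uses a helper name of our own choosing. It starts a child interpreter whose stdout is a pipe, similar to piping it to cat, and has the child report whether its stdout is a terminal and whether it's line-buffered. The line_buffering attribute belongs to the io.TextIOWrapper object behind sys.stdout.

```python
import subprocess
import sys

# Child program: report whether its stdout is a terminal and whether it is line-buffered
CHILD_CODE = "import sys; print(sys.stdout.isatty(), sys.stdout.line_buffering)"


def piped_child_stdout_mode():
    """Run a Python child with stdout connected to a pipe and return its report."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        stdout=subprocess.PIPE,  # the child's stdout becomes a pipe, not a terminal
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print("piped child:", piped_child_stdout_mode())
```

Through the pipe, the child should report False False. Running CHILD_CODE directly in a terminal with python -c and default settings should print True True instead.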

4.1. Line Buffering

To analyze how this affects the behavior of pipelines, we use a short code snippet:

$ python -c 'import time; print("Text."); time.sleep(666);'
Text.
[...666-second pause...]$

In this example, we run a Python one-liner [-c]ommand that prints Text. followed by a newline and then uses the built-in time module to sleep() for 666 seconds. Since stdout is the terminal, Python line-buffers it, so the newline releases Text. immediately.

Let’s omit the newline and run the code again:

$ python -c 'import sys; import time; sys.stdout.write("Text."); time.sleep(666);'
[...666-second pause...]Text.$

This time, we use the sys.stdout.write() method, which, unlike print(), doesn't append a newline to its output.

Consequently, Text. stays in the buffer because no newline arrives to trigger a release. We only see it when the interpreter flushes stdout on exit, after waiting for over 11 minutes. While this may be inconvenient, output without newline separators is relatively rare.

4.2. Full Buffering

Now, let’s explore how our print() sample code is affected by piping to cat:

$ python -c 'import time; print("Text."); time.sleep(666);' | cat
[...666-second pause...]

In this case, it looks like the time.sleep(666) call comes before the print("Text.") call. Yet, that's not the case. What actually occurs is full buffering: Python keeps Text. in its own output buffer and doesn't write it to stdout, i.e., the pipe, until that buffer fills up or the program exits. Thus, we don't see any output, although the print() call has already been made.
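We can observe the same effect inside a single Python process with an anonymous pipe. In this sketch, the helper names are our own. open() wraps the write end of a pipe in the same text, buffered, and raw layers that the interpreter uses for a piped stdout. We then write one complete line and check what has reached the read end before and after an explicit flush(). The sketch assumes a POSIX system.

```python
import os


def read_available(fd):
    """Return the bytes currently waiting in a non-blocking pipe, or b"" if it is empty."""
    try:
        return os.read(fd, 4096)
    except BlockingIOError:
        return b""


def pipe_write_demo(buffering):
    """Write one complete line to a pipe through a text stream; report what a reader sees."""
    read_fd, write_fd = os.pipe()
    os.set_blocking(read_fd, False)  # an empty pipe now raises instead of blocking
    # buffering=-1 is open()'s default (full buffering for a pipe); buffering=1 means line buffering
    with open(write_fd, "w", buffering=buffering, encoding="utf-8") as writer:
        writer.write("Text.\n")
        before_flush = read_available(read_fd)
        writer.flush()
        after_flush = read_available(read_fd)
    os.close(read_fd)
    return before_flush, after_flush


if __name__ == "__main__":
    print("default buffering:", pipe_write_demo(-1))
    print("line buffering:   ", pipe_write_demo(1))
```

With the default buffering=-1, the call should return (b'', b'Text.\n'): the complete line waits in the writer's buffer until flush(). With buffering=1, it should return (b'Text.\n', b''), since the newline triggers the release, as on a terminal.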

Naturally, the same happens with tee as well:

$ python -c 'import time; print("Text."); time.sleep(666);' | tee
[...666-second pause...]

Of course, we see the same behavior with other commands after the pipe, since the root cause isn't the consuming command. Rather, it's the full buffering Python applies when its stdout is a pipe instead of a terminal.

5. Controlling Python Buffering

Knowing when and why Python holds back its output, we can come up with ways to control or work around the buffering.

5.1. flush Parameter

The Python built-in print() function has a dedicated flush parameter that flushes the stream after the call, i.e., releases all currently buffered data at once:

$ python -c 'import time; print("Text.", flush=True); time.sleep(666);' | cat
Text.
[...666-second pause...]

Critically, the flush argument is only available in Python 3.3 and later, and it only flushes the stream after that particular print() call.

It doesn't change the buffering mode of stdout, so output we produce by other means, e.g., sys.stdout.write(), remains fully buffered.
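To see that a flushed line really arrives while the producer is still running, we can read from a piped child process. The first_line_while_running() helper below is our own sketch. It appends a sleep to the given child code, reads the first line from the pipe, and records whether the child was still alive at that moment.

```python
import subprocess
import sys


def first_line_while_running(child_code, sleep_seconds=5):
    """Start a piped Python child that sleeps after child_code; read its first output line.

    Returns the line and whether the child was still running when the line arrived.
    """
    proc = subprocess.Popen(
        [sys.executable, "-c", f"{child_code}; import time; time.sleep({sleep_seconds})"],
        stdout=subprocess.PIPE,
        text=True,
    )
    try:
        line = proc.stdout.readline()  # blocks until a full line reaches the pipe or EOF
        still_running = proc.poll() is None
    finally:
        proc.kill()  # don't wait for the sleep to finish
        proc.wait()
        proc.stdout.close()
    return line, still_running


if __name__ == "__main__":
    print(first_line_while_running('print("Text.", flush=True)'))
```

With flush=True, the line should arrive while the child is still sleeping. With a plain print("Text.") and default settings, readline() should only return once the child exits after about five seconds, because the line waits in the child's buffer until then.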

5.2. Manual Flush

For a more universal way to flush, we can call the flush() method of the sys.stdout object:

$ python -c 'import sys; import time; print("Text."); sys.stdout.flush(); time.sleep(666);' | cat
Text.
[...666-second pause...]

Here, we control exactly when the stdout buffer is flushed, regardless of whether the data came from print() or sys.stdout.write().
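A typical case for manual flushing is progress output that doesn't end in a newline until the work is done. The show_progress() function below is a hypothetical example. It writes dots with write() and calls flush() after each one, so they show up even when stdout is a pipe.

```python
import sys
import time


def show_progress(steps, delay=0.1, stream=None):
    """Print one dot per step on a single line, flushing so each dot appears immediately."""
    stream = sys.stdout if stream is None else stream
    for _ in range(steps):
        stream.write(".")
        # Without a newline, neither line nor full buffering would release the dot yet
        stream.flush()
        time.sleep(delay)  # stand-in for real work
    stream.write("\n")
    stream.flush()


if __name__ == "__main__":
    show_progress(5)
```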

5.3. Buffering Control

If we want to switch between full buffering and line buffering, we can do that as well:

$ python -c 'import sys; import time; sys.stdout.reconfigure(line_buffering=True); print("Text."); time.sleep(666);' | cat
Text.
[...666-second pause...]

Again, this solution is only available in Python 3, specifically version 3.7 and later, but it provides a flexible way to change the type of buffering.
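In a script, we might switch to line buffering once at startup, and only when it isn't already on. The helper below is our own sketch. It relies on io.TextIOWrapper.reconfigure(), which requires Python 3.7 or later, and accepts any such text stream, defaulting to sys.stdout.

```python
import sys


def use_line_buffering(stream=None):
    """Switch a text stream (sys.stdout by default) to line buffering if it isn't already."""
    stream = sys.stdout if stream is None else stream
    # TextIOWrapper.reconfigure() is available since Python 3.7
    if not stream.line_buffering:
        stream.reconfigure(line_buffering=True)
    return stream


if __name__ == "__main__":
    use_line_buffering()
    print("Text.")  # released at the newline even when stdout is a pipe
```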

5.4. Using the -u Flag

The python interpreter provides an execution flag to make both stdout and stderr [-u]nbuffered:

$ python -u -c 'import time; print("Text."); time.sleep(666);' | cat
Text.
[...666-second pause...]

Critically, using -u disables buffering entirely, so there isn’t even buffering per line:

$ python -u -c 'import sys; import time; sys.stdout.write("Text."); time.sleep(666);' | cat
Text.[...666-second pause...]

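Turning buffering off might have unwanted consequences, such as reduced performance, since each write can then turn into a separate system call.

Moreover, in Python 2, -u also placed all streams, including stdin, in binary mode on systems where that matters. In addition, Python 2's -u didn't affect the internal buffering of file methods like readlines() or xreadlines() or of file-object iterators like for line in sys.stdin. To avoid buffering in those cases, the usual workaround was calling sys.stdin.readline() in an infinite loop. In Python 3, -u affects only stdout and stderr and has no effect on stdin.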
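The readline() loop mentioned above can be written compactly with the two-argument form of iter(), which keeps calling readline() until it returns an empty string at EOF. The filter_lines() function and its upper-casing step are hypothetical. The sketch also flushes after every line, so its own output buffering doesn't hold data back when it sits in the middle of a pipeline.

```python
import sys


def filter_lines(source=None, sink=None):
    """Upper-case each input line and pass it on as soon as it arrives (illustrative filter)."""
    source = sys.stdin if source is None else source
    sink = sys.stdout if sink is None else sink
    # iter(callable, sentinel) keeps calling readline() until it returns "" at EOF
    for line in iter(source.readline, ""):
        sink.write(line.upper())
        sink.flush()  # keep our own output buffering from holding lines back
```

As a standalone script, it would simply end with a call to filter_lines(), which reads sys.stdin and writes to sys.stdout.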

Finally, we can include the -u flag in the Python script shebang like #!/usr/bin/python3 -u or #!/usr/bin/python -u.

5.5. Using stdbuf

As a more shell-oriented solution, we can try stdbuf, which adjusts the buffering of the standard streams of the single command it runs:

$ stdbuf --output=L python -c 'import time; print("Text."); time.sleep(666);' | cat
[...666-second pause...]

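In this case, we request [L]ine buffering for stdout via the --output or -o option. However, Text. still doesn't appear before the pause. stdbuf works by preloading a library that changes the C standard I/O (stdio) buffering of the command, while Python 3 implements sys.stdout with its own io module rather than C stdio. Thus, stdbuf has no effect on Python 3 output and only helps with programs that write through C stdio streams.

For such programs, we can use 0 instead of L to disable buffering altogether, which corresponds to the Python -u flag. For Python 3 itself, the shell-level counterpart of -u is setting the PYTHONUNBUFFERED environment variable to a non-empty string, e.g., by prefixing the command with PYTHONUNBUFFERED=1.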
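When the program in question is Python 3, another shell-level option is the PYTHONUNBUFFERED environment variable. The Python documentation describes it as equivalent to -u when set to a non-empty string. The sketch below uses helper names of our own, and reading sys.stdout.write_through requires Python 3.7+. It runs a piped child with and without the variable and asks the child to report the relevant settings.

```python
import os
import subprocess
import sys

# Child program: report the -u flag state and whether stdout writes go straight through
CHILD_CODE = "import sys; print(sys.flags.unbuffered, sys.stdout.write_through)"


def child_buffering(unbuffered):
    """Run a piped Python child with or without PYTHONUNBUFFERED and return its report."""
    env = {key: value for key, value in os.environ.items() if key != "PYTHONUNBUFFERED"}
    if unbuffered:
        env["PYTHONUNBUFFERED"] = "1"  # any non-empty value acts like -u
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        stdout=subprocess.PIPE,
        env=env,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print("without PYTHONUNBUFFERED:", child_buffering(False))
    print("with PYTHONUNBUFFERED=1: ", child_buffering(True))
```

With the variable set, the child should report 1 True, meaning unbuffered mode is active and writes to sys.stdout pass straight through. Without it, the child should report 0 False.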

6. Summary

In this article, we talked about the tee command, buffers, and pipes, and how those concepts relate to piping the output of Python.

In conclusion, knowing how buffering applies to a given situation can be vital to achieving the behavior we aim for.
