There are multiple utilities in Linux that can act on files and standard input. Some can even act on a continuous stream of data directed to them.
In this tutorial, we deal with the grep (Global Regular Expression Print) command and how we can:
- pass a continuous stream of data to grep
- filter and output that data
In particular, we define streams and discuss stream manipulation. After that, we briefly discuss how grep handles streams and files. Next, we see a continuous stream produced by a specific tool. Finally, we use grep on a continuous stream and demonstrate buffering control.
We tested the code in this tutorial on Debian 11 (Bullseye) with GNU Bash 5.1.4. Most of it should work in any POSIX-compliant environment. However, grep --line-buffered and stdbuf are not specified by POSIX, so they may be missing or behave differently outside GNU systems.
At its most basic level, a stream is just a pipeline for data. We say "for" rather than "with" on purpose, because a stream can exist without any data flowing through it.
Consider a keyboard. There is a stream for that input device, but it doesn’t mean we constantly slam keys to feed it with data. In practice, interested processes can subscribe to streams as if they were radio stations.
They can do that in multiple ways, for example, through pipes, files, signals, and sockets.
Signals notify processes of an event, but they usually don't carry actual data beyond the signal itself. Sockets, on the other hand, are mainly used for networking. In fact, we can roughly equate a socket to a network pipe.
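A named pipe (FIFO) makes the idea of a stream that exists before any data concrete. The following is a minimal sketch, not part of the original walkthrough. It assumes a Linux shell with mkfifo and mktemp, and the scratch directory and the FIFO name stream are hypothetical:

```shell
# A named pipe (FIFO) is a stream that exists on disk before any data flows.
dir=$(mktemp -d)              # hypothetical scratch directory
fifo="$dir/stream"            # hypothetical FIFO name
mkfifo "$fifo"

# The reader "subscribes" first: opening the FIFO blocks until a writer shows up.
cat "$fifo" > "$dir/received" &

sleep 1                       # the stream exists, but no data has been sent yet
echo 'Data.' > "$fifo"        # a writer opens the FIFO and sends one line

wait                          # cat exits once the writer closes its end
received=$(cat "$dir/received")
echo "$received"
rm -rf "$dir"
```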
This brings us to stream manipulation in general, which is our primary interest.
3. Stream Manipulation
In general, the shell offers many ways to redirect data:
$ echo 'Data.' >> file.ext
$ cat file.ext
Data.
$ echo 'Data.' | cat
Data.
In both cases above, we use shell operators to redirect data. First, the >> operator appends the output of echo to a file. Next, we output the file contents back via cat (concatenate). After that, we pipe the same data with | directly to cat.
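Since the session above appends with >>, rerunning it keeps growing file.ext. This sketch contrasts >> with the truncating > operator and with the pipe. It works inside a hypothetical scratch directory from mktemp, so no real file is touched:

```shell
dir=$(mktemp -d)                      # hypothetical scratch directory
echo 'Data.' >> "$dir/file.ext"       # >> creates the file if needed and appends
echo 'More.' >> "$dir/file.ext"       # a second append keeps the first line
appended=$(cat "$dir/file.ext")
echo 'Only.' > "$dir/file.ext"        # > truncates the file before writing
truncated=$(cat "$dir/file.ext")
piped=$(echo 'Data.' | cat)           # | feeds echo's stdout into cat's stdin
printf '%s\n--\n%s\n--\n%s\n' "$appended" "$truncated" "$piped"
rm -rf "$dir"
```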
Importantly, pipes almost always have buffers.
Buffering often means that data doesn't get through until a given amount has accumulated, until the buffer is flushed (for example, at the end of a line), or until one end terminates, either because the process exits or because it closes its end of the stream. In short, a buffer is a place where data temporarily accumulates.
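The pipe's own buffer is easy to observe. In this sketch, assuming a Linux shell, the writer finishes long before the reader starts reading, yet no data is lost, because it accumulates in the pipe in the meantime:

```shell
# printf exits right away, but its line is not lost: it waits in the
# pipe's kernel buffer until the reader, delayed by one second, reads it.
held=$(printf 'Data.\n' | { sleep 1; cat; })
echo "$held"
```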
Some commands also buffer data internally.
The grep tool is one of them. When grep reads a file, its internal buffering is usually the only layer involved. When grep works on a stream, on the other hand, the stream itself provides a second buffering layer.
Indeed, we can just pass data to grep via stream redirection:
$ echo 'Content.' | grep 'Con'
Content.
In this instance, we pipe a string directly to grep for processing. Specifically, the stdout of echo becomes the stdin of grep. Once echo finishes, grep reaches the end of its input, processes the string, and exits, so all processes terminate along with the pipe.
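Because grep terminates once its input is done, its exit status can tell a script whether the piped data matched. Here is a small sketch using the -q (quiet) option, which suppresses the matching line itself. The pattern Xyz is just an arbitrary non-matching example:

```shell
# -q prints nothing; grep exits as soon as it finds a match or its input ends.
# The exit status is 0 if a line matched and 1 if none did.
echo 'Content.' | grep -q 'Con'
matched=$?
echo 'Content.' | grep -q 'Xyz'
unmatched=$?
echo "matched=$matched unmatched=$unmatched"
```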
However, there is an alternative way.
Of course, grep can act on files directly by just using the filename:
$ echo 'Content.' > file.ext
$ grep 'Con' file.ext
Content.
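The two approaches can also be mixed. In this sketch, assuming GNU grep, the special file name - stands for standard input, so a single grep run searches both file.ext and the piped data. The scratch directory and the string Context. are hypothetical:

```shell
dir=$(mktemp -d)                          # hypothetical scratch directory
cd "$dir" || exit 1
echo 'Content.' > file.ext
# '-' stands for standard input, so grep searches the file and the pipe in one run.
# With several inputs, each match is prefixed with the name of its source.
both=$(echo 'Context.' | grep 'Con' file.ext -)
echo "$both"
cd - > /dev/null
rm -rf "$dir"
```

GNU grep labels the piped data as (standard input), so the output pairs file.ext:Content. with (standard input):Context.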
But what if we wanted to monitor the file for changes? By combining with another tool, we can do just that.
5. Continuous Streams With tail
The tail command has the -f (follow) flag, which makes tail keep running after it prints the end of the file and output any data appended to the file, instead of terminating right away.
For example, if we follow a file with tail -f in one terminal and append data to that file in another, we expect to see the same data in the first terminal. Let's see this in action.
First, we run tail:
$ tail -f /file.ext
After that, in another terminal, we send data to file.ext:
$ echo 'Line.' >> /file.ext
Indeed, we see the same information at the other end. Let’s now add our filter to the equation.
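The two-terminal experiment can also be scripted in a single shell by running tail -f in the background. This is a sketch under a few assumptions. A scratch directory from mktemp stands in for /file.ext, tail's output goes to a file that we inspect afterward, and the sleep durations are arbitrary values that should be long enough on an idle machine:

```shell
dir=$(mktemp -d)                          # hypothetical scratch directory
: > "$dir/file.ext"                       # start from an empty file
tail -f "$dir/file.ext" > "$dir/seen" &   # "first terminal": follow the file
tail_pid=$!
sleep 1
echo 'Line.' >> "$dir/file.ext"           # "second terminal": append a line
sleep 2                                   # give tail time to notice the change
kill "$tail_pid"                          # tail -f never exits on its own
wait "$tail_pid" 2>/dev/null || true      # reap tail; ignore its signal exit status
seen=$(cat "$dir/seen")
echo "$seen"
rm -rf "$dir"
```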
6. Continuous Stream Processing With grep and tail
This time, we pipe a continuous stream from tail to grep:
$ tail -f /file.ext | grep 'Line'
Next, we add data to the file in another terminal:
$ echo 'Line.' >> /file.ext
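Now, depending on the exact setup, we might not see the output. Why? Because of buffering. Both tail, which writes into the pipe, and grep may buffer their output. Programs built on the C standard I/O library typically line-buffer output that goes to a terminal but block-buffer output that goes to a pipe or a file, so data may wait until a line feed arrives or until a certain number of bytes accumulates.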
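The effect is easy to reproduce without two terminals. In this sketch, a hypothetical writer prints one matching line and then keeps the pipe open for three seconds, standing in for tail -f. Further, grep writes into a regular file instead of a terminal, which is where stdio block buffering applies. The timings are assumptions for an idle machine:

```shell
dir=$(mktemp -d)                          # hypothetical scratch directory
# Stand-in for "tail -f": print one matching line, then keep the pipe open.
# grep writes to a regular file here, not a terminal, so stdio block-buffers it.
{ echo 'Line.'; sleep 3; } | grep 'Line' > "$dir/out" &
sleep 1
during=$(cat "$dir/out")   # typically empty: the match sits in grep's buffer
wait                       # the writer exits, grep hits end of input and flushes
after=$(cat "$dir/out")
printf 'during: [%s]\nafter: [%s]\n' "$during" "$after"
rm -rf "$dir"
```

While the writer is alive, the match typically stays inside grep. It only shows up once the input ends and grep flushes its output on exit.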
However, we can control and prevent this.
7. Buffer Control
We can use the --line-buffered flag of grep to make it flush its output after each line instead of waiting for its buffer to fill:
$ tail -f /file.ext | grep --line-buffered 'Line'
With this option, every line appended to file.ext should produce output right away. However, if we don't output a newline character, nothing gets through, since the line is not complete yet:
$ echo -n 'Line.' >> /file.ext
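Both behaviors of --line-buffered can be checked side by side. In this sketch, two hypothetical writers stand in for tail -f. One sends Line. with a newline, and the other sends it without one (via printf, since echo -n is not portable across shells). Both keep their pipes open for three seconds, grep writes to regular files, and the timings are assumptions:

```shell
dir=$(mktemp -d)                          # hypothetical scratch directory
# Stand-ins for "tail -f": each writer sends "Line." and keeps its pipe open.
{ echo 'Line.'; sleep 3; } | grep --line-buffered 'Line' > "$dir/full" &
{ printf 'Line.'; sleep 3; } | grep --line-buffered 'Line' > "$dir/partial" &
sleep 1
full=$(cat "$dir/full")         # expected "Line.": flushed right after the newline
partial=$(cat "$dir/partial")   # expected empty: the line is not complete yet
wait                            # at end of input, grep processes the partial line
partial_after=$(cat "$dir/partial")
printf 'full: [%s]\npartial: [%s]\npartial after end: [%s]\n' \
  "$full" "$partial" "$partial_after"
rm -rf "$dir"
```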
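Even with --line-buffered, we may still encounter cases where the command writing into the pipe buffers its output. For these cases, there is a tool at our disposal: stdbuf, which runs a command with modified buffering for its standard streams.
In fact, we can enforce line buffering on the writing side of the pipe as well: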
$ stdbuf --output=L tail -f /file.ext | grep --line-buffered 'Line'
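Setting the --output option to L (line) makes tail line-buffer its standard output, while --line-buffered does the same for grep. Thus, we have line buffering on both sides of the pipe.
We can also use stdbuf to disable the output buffering of tail completely. To achieve this, we replace L with 0 (unbuffered):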
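Since stdbuf only changes how a program buffers its standard streams, its effect is clearest on a program whose output would otherwise be block-buffered. This sketch follows the example in the stdbuf documentation and uses cut instead of tail. The stand-in writer, the input text, and the timings are assumptions:

```shell
dir=$(mktemp -d)                          # hypothetical scratch directory
# cut writes to a regular file, which would normally mean block buffering;
# stdbuf --output=L switches its stdout to line buffering instead.
{ echo 'Line. one'; sleep 3; } | stdbuf --output=L cut -d ' ' -f1 > "$dir/out" &
sleep 1
during=$(cat "$dir/out")   # expected "Line." while the writer is still running
wait
echo "during: [$during]"
rm -rf "$dir"
```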
$ stdbuf --output=0 tail -f /file.ext | grep 'Line'
Depending on the setup, this should produce immediate output for every line appended to file.ext.
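To compare the default behavior with --output=0, this sketch uses a hypothetical helper function, probe, that feeds one line into a command while keeping the input open and reports what reached the output after one second. As in the stdbuf documentation example, cut serves as the buffered program, and the timings are assumptions:

```shell
# probe: hypothetical helper that pipes one line into the given command while
# the input stays open, then prints what reached the output after one second.
probe() {
  out=$(mktemp)
  { echo 'Line. one'; sleep 2; } | "$@" > "$out" &
  sleep 1
  cat "$out"
  wait
  rm -f "$out"
}

default=$(probe cut -d ' ' -f1)                         # no stdbuf
unbuffered=$(probe stdbuf --output=0 cut -d ' ' -f1)   # output buffering off
printf 'default: [%s]\nunbuffered: [%s]\n' "$default" "$unbuffered"
```

By default, the field typically stays in cut's buffer until its input ends, while the unbuffered run writes it out immediately.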
In this article, we saw how grep can be used with continuous streams of data. In addition, we applied buffer control via stdbuf.
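To demonstrate both tools on a continuous stream, we used tail. This is not the only way to produce such streams, and the methods discussed should work with most command-line tools. However, stdbuf only affects programs that use C standard I/O streams and don't adjust their own buffering.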
In conclusion, grep works with continuous streams out of the box, but there are options to further enhance and control its functionality.