Last updated: March 18, 2024
In this tutorial, we’ll focus on filtering the output of the df disk space usage command based on how full each filesystem is. This is useful when we only need to display filesystems that have passed a certain usage percentage or that still have a certain percentage of free space.
We’ll first look at the awk command, which handles numeric comparisons on command output well. Then, we’ll discuss the grep command, which is a popular tool for text filtering.
Before applying the space usage filters, let’s first examine the initial df command, which we’ll then filter:
$ df
Filesystem 1K-blocks Used Available Use% Mounted on
tmpfs 1627020 2412 1624608 1% /run
/dev/sda1 114792976 63567940 45347668 59% /
/dev/sdb1 960302096 282964952 628482720 32% /home
tmpfs 8135088 3556 8131532 1% /dev/shm
...
Here, we can see the list of all filesystems and their space usage percentage in the fifth column.
Now, let’s apply a filter to display only the filesystems with a usage of 50% or more. For that, we’ll use awk.
The awk command is a powerful tool for text processing, based on the AWK programming language. It provides a handy way to filter Linux command output.
First, let’s apply an awk one-liner to our problem:
$ df | awk '0+$5 >= 50'
/dev/sda1 114792976 63567940 45347668 59% /
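Let’s look at this command in more detail. Here, $5 refers to the fifth whitespace-separated field of each line, which is the Use% column. The 0+ prefix forces awk to treat that field as a number, so a value such as 59% is read as 59, while the non-numeric Use% header is read as 0. Finally, >= 50 is the condition, and because no action follows it, awk prints every line that satisfies it.
In the end, we’re left with the lines where the usage is 50% or more. In our case, it’s the /dev/sda1 root filesystem, which is at 59%.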
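To see why the 0+ prefix matters, we can feed a few sample values through awk and print what each one becomes after the numeric conversion. This is only a minimal sketch: the show_conversion function name and the input values are made up for illustration and aren’t part of the original command:

```shell
# Print each input value next to the number awk derives from it.
# show_conversion is a hypothetical helper name used only in this sketch.
show_conversion() {
    awk '{ print $1, "->", 0+$1 }'
}

# Sample values standing in for the fifth column of the df output
printf '%s\n' 'Use%' '1%' '59%' '100%' | show_conversion
# Use% -> 0
# 1% -> 1
# 59% -> 59
# 100% -> 100
```

Here, 59% turns into 59, whereas the Use% header turns into 0, which is why the header never passes the >= 50 test.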
If, for instance, we need to compare to a different number, we can replace 50 in our original command with the number we need.
Furthermore, we can change the comparison operator >= as well. For example, switching to < displays the filesystems that have less space used and, therefore, more space available.
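As a rough sketch of both adjustments, we can pass the threshold in through awk’s -v option and flip the operator to <. The usage_below and sample_df names, as well as the canned input, are assumptions made for this example so that it produces the same result on any machine. With real data, we’d pipe df into the function instead:

```shell
# usage_below LIMIT: read df-style output on stdin and print the header
# plus every filesystem whose Use% value is below LIMIT.
# (usage_below is a hypothetical helper name.)
usage_below() {
    awk -v limit="$1" 'NR == 1 || 0+$5 < limit+0'
}

# Canned copy of the df output shown earlier, used instead of a live df call
sample_df() {
    cat <<'EOF'
Filesystem 1K-blocks Used Available Use% Mounted on
tmpfs 1627020 2412 1624608 1% /run
/dev/sda1 114792976 63567940 45347668 59% /
/dev/sdb1 960302096 282964952 628482720 32% /home
EOF
}

sample_df | usage_below 50
```

With this input, the header, the /run line, and the /home line are printed, while /dev/sda1 at 59% is dropped. The NR == 1 test keeps the header explicitly, although with the < operator the header would pass anyway, since 0+$5 evaluates to 0 for it.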
Another option is to filter the df output using the grep command. This may be a little trickier than the awk command because grep isn’t specialized in numeric comparisons.
To display the lines with a disk usage of 50% or higher, we’ll use the -E switch:
$ df | grep -E "([5-9][0-9]|100)%"
/dev/sda1 114792976 63567944 45347664 59% /
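Let’s see what this command does. The -E switch enables extended regular expressions, so we can use grouping and alternation without escaping them. Inside the group, [5-9][0-9] matches any two-digit number from 50 to 99, while the 100 alternative covers a completely full filesystem. The trailing % restricts the match to a percentage value, so the numbers in the other columns are ignored.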
We can see that the resulting output is the same as our previous example with awk.
If, for instance, we need to display a disk usage of 20% or higher, we may replace 5 with 2 in our command (because 20 starts with 2):
$ df | grep -E "([2-9][0-9]|100)%"
/dev/sda1 114792976 63567928 45347680 59% /
/dev/sdb1 960302096 282965252 628482420 32% /home
Now, we get the two filesystems in the output. Each of them is at least 20% full.
There is a limitation to this method, though. The simple pattern only works for thresholds that are multiples of 10 (e.g., 10, 20, 30, etc.). For any other threshold, the regex pattern becomes more complex, and we need to craft a separate pattern for each number.
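To illustrate the extra complexity, here’s one possible pattern for a threshold of 75%, which isn’t a multiple of 10. It has to list 75 to 79, 80 to 99, and 100 as separate alternatives. The usage_at_least_75 and sample_usage names and the sample lines are hypothetical and exist only to make the sketch reproducible:

```shell
# Keep only the lines whose percentage value is 75% or higher.
# (usage_at_least_75 is a hypothetical helper name.)
usage_at_least_75() {
    grep -E '(7[5-9]|[89][0-9]|100)%'
}

# Made-up df-style lines covering values around the threshold
sample_usage() {
    cat <<'EOF'
/dev/sdc1 1000 70 930 7% /mnt/a
/dev/sdd1 1000 740 260 74% /mnt/b
/dev/sde1 1000 750 250 75% /mnt/c
/dev/sdf1 1000 800 200 80% /mnt/d
/dev/sdg1 1000 1000 0 100% /mnt/e
EOF
}

sample_usage | usage_at_least_75
```

With this input, only the 75%, 80%, and 100% lines pass the filter. A different threshold, such as 83%, would need its own set of alternatives, whereas the awk version only needs a different number.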
In this article, we learned how to filter the df command output based on disk space usage criteria.
We looked at the awk and grep commands. We also discussed possible limitations of the grep regex pattern.