1. Introduction
The Bash history feature has many uses:
- audit – provide a list of commands each user has run, potentially with a date and time
- automation – enable searching for a past command and rerunning it
- conciseness – provide a way to prune history so it only includes commands of interest
- flexibility – offer the fc and history builtins, as well as the ! (exclamation point) history expansion operator
Naturally, such a comprehensive feature may also require maintenance.
In this tutorial, we look at ways to avoid and remove duplicate entries in the Bash history files. First, we explore the Bash history file format. After that, we see how to manually remove duplicates. Next, we check the history control variables, which help us prevent duplicates in the future. Finally, we explain that combining both approaches leads to the best results.
We tested the code in this tutorial on Debian 11 (Bullseye) with GNU Bash 5.1.4. Since the history facility is specific to Bash rather than part of POSIX, it should work in most environments that run a recent version of Bash.
2. Bash History File Format
Importantly, the command history is kept in memory while Bash is running and, by default, only written out to the respective file when the shell exits. We can also write it out manually with the -a or -w flags of history.
The default format of the $HOME/.bash_history (or $HISTFILE) file is fairly basic:
$ cat $HOME/.bash_history
echo First command.
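In general, most commands in the history file appear as they were written at the prompt, one per row, in chronological order. However, there can be exceptions. For example, depending on the cmdhist and lithist shell options, a multi-line command can be stored as a single row or spread over several. Depending on the syntax, here-strings may break the mechanics as well.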
By setting $HISTTIMEFORMAT to the appropriate strftime() format, we can add the exact timestamp of each command:
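As a minimal sketch, assuming we want the date and time in an ISO-like layout, the assignment could look like the one below. The format string is only one possible choice, and we would usually place the line in $HOME/.bashrc so that it persists across sessions:

```shell
# Assumed format: YYYY-MM-DD HH:MM:SS plus a trailing space as a separator.
# Any strftime() format string can be used instead.
HISTTIMEFORMAT='%F %T '
```

With the variable set, the history builtin shows each entry with its formatted timestamp.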
This is reflected as comments in the $HOME/.bash_history file:
$ cat $HOME/.bash_history
Regardless of the chosen time display format, stored timestamps are always in epoch time. Notably, setting $HISTTIMEFORMAT doesn’t retroactively add timestamp comments to the history file, just as unsetting it doesn’t remove them.
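To illustrate the stored format, here is a rough sketch that builds a hypothetical history file with one timestamp comment and decodes it. The decode_history function name and the sample epoch value are our own assumptions, and the conversion relies on the -d @EPOCH syntax of GNU date:

```shell
# Hypothetical helper: print history entries with readable UTC timestamps.
# Like Bash, it treats a line starting with # and a digit as a timestamp.
# Assumes GNU date for the -d @EPOCH syntax.
decode_history() {
  stamp=''
  while IFS= read -r line; do
    case $line in
      '#'[0-9]*)
        stamp=${line#?}            # strip the leading # to get the epoch value
        ;;
      *)
        if [ -n "$stamp" ]; then
          printf '%s  %s\n' "$(date -u -d "@$stamp" '+%F %T')" "$line"
        else
          printf '%s\n' "$line"    # entry without a stored timestamp
        fi
        stamp=''
        ;;
    esac
  done < "$1"
}

# Hypothetical sample: one timestamp comment followed by its command
sample=$(mktemp)
printf '%s\n' '#1650000000' 'echo First command.' > "$sample"
decode_history "$sample"
rm -f "$sample"
```

For this sample, the function prints the command prefixed with 2022-04-15 05:20:00, the UTC equivalent of the stored epoch value.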
3. Manual Deduplication
After getting to know the Bash history format, let’s see how to manually remove duplicates from it after the fact.
Here, we treat $HOME/.bash_history as a regular file. Because of this, it would be best to dump everything to the file before running any deduplication attempt, as the latter wouldn’t apply to commands in memory:
$ history -a
$ history -w
Of course, the standard sort command has the --unique or -u flag for removing duplicates, while uniq collapses adjacent repeated lines. However, both rely on sorting to catch every duplicate, which would break the chronological order of our history.
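A short sketch with a hypothetical three-line sample shows the problem. The file contents and variable names here are assumptions made purely for illustration:

```shell
# Hypothetical sample history with a non-adjacent duplicate
sample=$(mktemp)
printf '%s\n' 'ls -l' 'cd /tmp' 'ls -l' > "$sample"

# sort -u drops the duplicate, but reorders the commands alphabetically
sorted=$(LC_ALL=C sort -u "$sample")
printf '%s\n' "$sorted"

# uniq alone keeps the order, but misses the duplicate since it isn't adjacent
adjacent=$(uniq "$sample")
printf '%s\n' "$adjacent"

rm -f "$sample"
```

In the first result, cd /tmp moves ahead of ls -l, although it was entered later. In the second, both copies of ls -l survive.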
Due to that, we take a refined approach involving awk to remove duplicates without sorting:
$ cat $HOME/.bash_history
$ awk '!a[$0]++' $HOME/.bash_history
Here, awk removes the second instance of test from the file:
- $0 is the current line
- a is an associative array mapping each line to the number of its occurrences
- ++ increments each element in the array as the lines come
- ! ensures that any non-zero (already occurred) element returns false, preventing the line from being printed
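To see the expression at work without touching the real history, we can try it on a throwaway file. The sample contents below are hypothetical:

```shell
# Hypothetical sample history containing the command "test" twice
sample=$(mktemp)
printf '%s\n' 'test' 'echo First command.' 'test' 'ls -l' > "$sample"

# Keep only the first occurrence of each line, preserving the original order
deduped=$(awk '!a[$0]++' "$sample")
printf '%s\n' "$deduped"

rm -f "$sample"
```

The output keeps test, echo First command., and ls -l in their original order, while the second test is dropped.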
If the results are acceptable, we can commit them to the original file:
$ awk '!a[$0]++' $HOME/.bash_history > $HOME/.bash_history.tmp &&
mv $HOME/.bash_history.tmp $HOME/.bash_history
There are three main drawbacks to this solution:
- no deduplication of in-memory history
- possible confusion when .bash_history contains timestamps
- need to reapply it for new entries
While we can address all of these by setting a trigger that first writes out the history and by modifying the awk command to check for and handle timestamp comments, there is a possibly better option.
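For reference, a timestamp-aware variant might look like the sketch below. Rather than stripping all timestamp comments, it keeps each timestamp together with the first occurrence of its command and drops both for later duplicates. The dedupe_history name is hypothetical, and the sketch assumes one-line commands:

```shell
# Hypothetical helper: deduplicate a history file that may contain
# timestamp comments, keeping each timestamp with its surviving command.
# Assumes every command occupies a single line.
dedupe_history() {
  awk '
    /^#[0-9]+$/ { stamp = $0; next }   # remember a timestamp comment
    !seen[$0]++ {                      # first occurrence of this command
      if (stamp != "") print stamp
      print
    }
    { stamp = "" }                     # a timestamp covers one command only
  ' "$1"
}
```

As with the plain awk command, we can redirect the output to a temporary file and move it over the original once the results look acceptable.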
4. $HISTCONTROL and $HISTIGNORE
Part of the Bash history facility is the $HISTCONTROL settings variable.
By setting $HISTCONTROL to a colon-separated list of values, we can control the stream of commands committed to history:
- ignorespace – if a command begins with space, it doesn’t go to the history
- ignoredups – if a line matches the previous line, it doesn’t get added to the history again
- ignoreboth – combines ignorespace and ignoredups
- erasedups – similar to ignoredups, but removes all previous lines that match the current one before adding it to the history list
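In essence, the last option is closest to our needs. While erasedups doesn’t deduplicate the whole history retroactively, it does remove earlier copies of a command from the in-memory list once that command is entered again. The history file reflects this only after the in-memory list is written out in full, for example, with history -w.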
To be sure, we can simply set the $HISTCONTROL variable with all possible options at shell startup, for example, in $HOME/.bashrc:
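A minimal sketch of such a setting follows. Placing it in $HOME/.bashrc is our assumption, and any startup file that interactive shells read would do:

```shell
# ignoreboth covers ignorespace and ignoredups; erasedups is added on top
HISTCONTROL=ignoreboth:erasedups
```

Since ignoreboth is shorthand for the first two values, this is equivalent to listing ignorespace, ignoredups, and erasedups separately.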
In addition, there’s the $HISTIGNORE variable, which holds a colon-separated list of patterns. Command lines that match one of them in full aren’t saved in the history. With its globbing support, where & (ampersand) matches the previous history line and * wildcards work as usual, we can simulate ignoreboth, but not erasedups.
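As an illustration, the following assignment should behave like ignoreboth. It uses the two patterns that the Bash manual describes as equivalent to ignoredups and ignorespace:

```shell
# &     matches the previous history line, like ignoredups
# [ ]*  matches any command line that starts with a space, like ignorespace
HISTIGNORE='&:[ ]*'
```

We can append further patterns after another colon to skip specific commands altogether.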
5. Summary
In this article, we looked at duplicates in the Bash history list, as well as how to deal with them.
In conclusion, there are two main ways to deal with duplicates: manual deduplication of the history file works only retroactively, while the history control variables mainly work proactively. Therefore, applying the first once and then setting up the second is the most reliable way to keep duplicates out of the Bash history.