- audit – provide a list of commands each user has run, potentially with a date and time
- automation – enable searching for a past command and rerunning it
- conciseness – provide a way to prune history so it only includes commands of interest
- flexibility – offer several interfaces: the fc and history builtins, as well as the ! (exclamation point) operator
Naturally, a comprehensive function may also require maintenance.
In this tutorial, we look at ways to avoid and remove duplicate entries in the Bash history files. First, we explore the Bash history file format. After that, we see how to manually remove duplicates. Next, we check the history control variables, which help us prevent duplicates in the future. Finally, we explain that combining both approaches leads to the best results.
2. Bash History File Format
Importantly, the command history is kept in memory while Bash is running and only written out to the respective file periodically, on exit, or manually, with the -a or -w flags of history.
The default format of the $HOME/.bash_history (or $HISTFILE) file is fairly basic:
$ cat $HOME/.bash_history
echo First command.
echo Baeldung
mkdir /dir
cd /dir
cat <
In general, most commands in the history file appear as they were written at the prompt, one per line, in chronological order. However, there are exceptions: a multiline construct such as a here-document can occupy several lines of the file. Depending on the syntax, here-strings may break this one-command-per-line layout as well.
$ HISTTIMEFORMAT=''
$ history
1 HISTTIMEFORMAT=''
2 history
$ HISTTIMEFORMAT='%s'
$ history
1 1679906660HISTTIMEFORMAT=''
2 1679906661history
3 1679906667HISTTIMEFORMAT='%s'
4 1679906668history
When $HISTTIMEFORMAT is set, each timestamp is also stored as a comment line in the $HOME/.bash_history file:
$ cat $HOME/.bash_history
#1679906660
HISTTIMEFORMAT=''
#1679906661
history
#1679906667
HISTTIMEFORMAT='%s'
#1679906668
history
Regardless of the chosen time display format, stored timestamps are always in epoch time. Notably, setting $HISTTIMEFORMAT doesn’t retroactively add timestamp comments to the history file, just as unsetting it doesn’t remove them.
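To make the two-line layout concrete, the following sketch pairs each timestamp comment with the command that follows it. The function name show_history_with_epoch is hypothetical, and the sketch assumes every entry occupies a single line, which doesn’t hold for multiline commands.

```shell
# Hypothetical helper: print "<epoch><TAB><command>" for each entry of a history file.
# Assumes one command per line; entries without a timestamp comment get "-".
show_history_with_epoch() {
    awk '
        /^#[0-9]+$/ { ts = substr($0, 2); next }   # timestamp comment: remember it, print nothing
        {
            label = (ts == "") ? "-" : ts          # "-" marks an entry with no stored timestamp
            print label "\t" $0
            ts = ""                                # a timestamp applies only to the next command
        }
    ' "$1"
}
```

For example, show_history_with_epoch "$HOME/.bash_history" lists the whole file in this paired form.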
3. Manual Deduplication
After getting to know the Bash history format, let’s see how to manually remove duplicates from it after the fact.
Here, we treat $HOME/.bash_history as a regular file. Because of this, it would be best to dump everything to the file before running any deduplication attempt, as the latter wouldn’t apply to commands in memory:
$ history -a
$ history -w
$ cat $HOME/.bash_history
test
VAR='value'
test
echo 'Last.'
$ awk '!a[$0]++' $HOME/.bash_history
test
VAR='value'
echo 'Last.'
Here, awk removes the second instance of test from the file:
- $0 is the current line
- a is an associative array mapping each line to the number of its occurrences
- ++ increments each element in the array as the lines come
- ! ensures that any non-zero (already occurred) element returns false, preventing the line from being printed
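Before changing anything, it can help to preview what the filter would drop. The sketch below reuses the same counting array; the function names list_duplicates and count_removed are hypothetical.

```shell
# Print each line that occurs more than once, exactly one time.
# a[$0]++ yields the count seen so far, so it equals 1 only on the second occurrence.
list_duplicates() {
    awk 'a[$0]++ == 1' "$1"
}

# Count the lines that awk '!a[$0]++' would remove (every repeat after the first).
count_removed() {
    awk 'a[$0]++ { n++ } END { print n + 0 }' "$1"
}
```

Running list_duplicates "$HOME/.bash_history" shows which commands are repeated, while count_removed reports how many lines the deduplication would delete.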
If the results are acceptable, we can commit them to the original file:
$ awk '!a[$0]++' $HOME/.bash_history > $HOME/.bash_history.tmp && mv $HOME/.bash_history.tmp $HOME/.bash_history
There are three main drawbacks to this solution:
- no deduplication of in-memory history
- possible confusion when .bash_history contains timestamps
- has to be reapplied on new entries
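The timestamp issue arises because a plain line filter treats the #epoch comments as ordinary lines: a unique timestamp survives even when its command is dropped. One way around this, sketched below, is to carry each timestamp along with its command. The function name dedup_timestamped is hypothetical, and the sketch assumes single-line entries, so multiline commands would need extra handling.

```shell
# Hypothetical helper: keep the first occurrence of each command together with
# its timestamp comment (if any); drop repeats and their timestamps.
dedup_timestamped() {
    awk '
        /^#[0-9]+$/ { ts = $0; next }   # hold the timestamp until its command is seen
        !seen[$0]++ {                   # first occurrence of this command
            if (ts != "") print ts
            print
        }
        { ts = "" }                     # the held timestamp is used or discarded
    ' "$1"
}
```

The output can be reviewed and then committed with the same temporary-file-and-mv step used for the plain awk filter.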
4. $HISTCONTROL and $HISTIGNORE
Part of the Bash history facility is the $HISTCONTROL settings variable.
By setting $HISTCONTROL to a colon-separated list of values, we can control the stream of commands committed to history:
- ignorespace – if a command begins with space, it doesn’t go to the history
- ignoredups – if a line matches the previous line, it doesn’t get re-added to the history
- ignoreboth – combines ignorespace and ignoredups
- erasedups – similar to ignoredups, but removes all previous lines that match the current one before adding it to the history list
In essence, the last option is closest to our needs. While erasedups doesn’t deduplicate the whole history retroactively before it sees a duplicate, it still avoids future duplicates both in-memory and in the file.
To be sure, we can set the $HISTCONTROL variable to a combination that covers all options, such as ignoreboth:erasedups, in a shell startup file like $HOME/.bashrc.
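A minimal sketch of such a setting follows. Placing it in a per-user startup file such as $HOME/.bashrc is an assumption; any file read by interactive shells would do.

```shell
# Assumed location: a Bash startup file such as $HOME/.bashrc
# ignoreboth = ignorespace + ignoredups
# erasedups  = remove older matching entries before adding the new one
export HISTCONTROL=ignoreboth:erasedups
```

Since ignoreboth already covers ignorespace and ignoredups, adding erasedups enables all of the listed behaviors.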
In addition, there’s the $HISTIGNORE variable, which holds a colon-separated list of patterns; command lines that match one of them aren’t saved to the history. With its globbing support, where & (ampersand) matches the previous history line and * wildcards work as usual, we can simulate ignoreboth, but not erasedups.
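As a sketch of that simulation, the value below is a commonly used pattern pair. Each pattern has to match the complete command line, and the startup-file placement is again an assumption.

```shell
# Assumed location: a Bash startup file such as $HOME/.bashrc
# '&'    matches the previous history line (behaves like ignoredups)
# '[ ]*' matches any line that begins with a space (behaves like ignorespace)
export HISTIGNORE='&:[ ]*'
```

No pattern in this list can reach back and delete older matching entries, which is why erasedups has no equivalent here.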
5. Summary
In this tutorial, we looked at duplicates in the Bash history list and how to deal with them.
In conclusion, there are two main ways to handle duplicates: manual deduplication works only retroactively, while the history control variables mainly work proactively. Applying the first and then setting up the second keeps the Bash history free of duplicates.