Remove and Avoid Duplicate Entries in Bash History

1. Introduction

The Bash history feature has many uses:

audit – provide a list of commands each user has run, potentially with a date and time
automation – enable searching for a past command and rerunning it
conciseness – provide a way to prune history so it only includes commands of interest
flexibility – offer several interfaces: the fc and history builtins, as well as the ! (exclamation point) history expansion operator

Naturally, a feature this comprehensive also needs maintenance, including the cleanup of duplicate entries.

In this tutorial, we look at ways to avoid and remove duplicate entries in the Bash history files. First, we explore the Bash history file format. After that, we see how to manually remove duplicates. Next, we check the history control variables, which help us prevent duplicates in the future. Finally, we explain that combining both approaches leads to the best results.

We tested the code in this tutorial on Debian 11 (Bullseye) with GNU Bash 5.1.4. It should work in most POSIX-compliant environments where Bash is available.

2. Bash History File Format

Importantly, Bash keeps the command history in memory while it's running and only writes it out to the respective file on exit or on demand, with the -a or -w flags of history.

The default format of the $HOME/.bash_history (or $HISTFILE) file is fairly basic:

$ cat $HOME/.bash_history
echo First command.
echo Baeldung
mkdir /dir
cd /dir
cat <

In general, most commands in the history file appear as they were written at the prompt, one per row, in chronological order. However, there can be exceptions. Depending on the syntax, here-strings may break the mechanics as well.

By setting $HISTTIMEFORMAT to the appropriate strftime() format, we can add the exact timestamp of each command:

$ HISTTIMEFORMAT=''
$ history
    1  HISTTIMEFORMAT=''
    2  history
$ HISTTIMEFORMAT='%s'
$ history
    1  1679906660HISTTIMEFORMAT=''
    2  1679906661history
    3  1679906667HISTTIMEFORMAT='%s'
    4  1679906668history

This is reflected as comments in the $HOME/.bash_history file:

$ cat $HOME/.bash_history
#1679906660
HISTTIMEFORMAT=''
#1679906661
history
#1679906667
HISTTIMEFORMAT='%s'
#1679906668
history

Regardless of the chosen time display format, stored timestamps are always in epoch time. Notably, setting $HISTTIMEFORMAT doesn’t retroactively add timestamp comments to the history file, just as resetting it doesn’t remove them.
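
As a small illustration of this layout, the following sketch pairs each timestamp comment with the command that follows it. The function name pair_history_timestamps and the sample file are hypothetical, and the sketch assumes that every command occupies a single line:

```shell
# Hypothetical helper: print "<epoch><TAB><command>" for each entry of a
# history file that contains "#<epoch>" timestamp comments.
pair_history_timestamps() {
  awk '
    /^#[0-9]+$/ { ts = substr($0, 2); next }  # remember the epoch, skip the comment
    { print ts "\t" $0; ts = "" }             # emit the command with its timestamp
  ' "$1"
}

# Sample data modeled on the listing above (not a real history file)
sample=$(mktemp)
printf '%s\n' '#1679906660' "HISTTIMEFORMAT=''" '#1679906661' 'history' > "$sample"
pair_history_timestamps "$sample"
rm -f "$sample"
```

Entries that have no preceding timestamp comment simply get an empty first field.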

3. Manual Deduplication

After getting to know the Bash history format, let’s see how to manually remove duplicates from it post-factum.

Here, we treat $HOME/.bash_history as a regular file. Because of this, it would be best to dump everything to the file before running any deduplication attempt, as the latter wouldn’t apply to commands in memory:

$ history -a
$ history -w

Of course, the standard uniq and sort commands can remove duplicates: uniq drops adjacent repeated lines, while sort has the --unique or -u flag. However, both depend on sorted input to catch all duplicates, which could break the chronological order of our history.
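
To illustrate the ordering problem, here is a small sketch with made-up history lines. The history_sample function is hypothetical, and LC_ALL=C is only there to make the sort order predictable:

```shell
# Made-up history lines in chronological order
history_sample() {
  printf '%s\n' 'test' "VAR='value'" 'test' "echo 'Last.'"
}

# sort -u drops the duplicate, but reorders the remaining lines by byte value
history_sample | LC_ALL=C sort -u
```

The duplicate is gone, but the remaining lines no longer appear in the order in which we entered them.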

Due to that, we take a refined approach involving awk to remove duplicates without sorting:

$ cat $HOME/.bash_history
test
VAR='value'
test
echo 'Last.'
$ awk '!a[$0]++' $HOME/.bash_history
test
VAR='value'
echo 'Last.'

Here, awk skips the second instance of test when printing the file, which itself remains unchanged:

$0 is the current line
a[] is an associative array mapping each line to the number of its occurrences
++ increments each element in the array as the lines come
! ensures that any non-zero (already occurred) element returns false, preventing the line from being printed
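
To make the idiom easier to follow, we can also write the same logic out step by step. This is only an expanded sketch of the one-liner: the function name dedup_keep_first and the array name seen are arbitrary choices:

```shell
# Same logic as !a[$0]++, written out step by step.
# The array name "seen" is arbitrary.
dedup_keep_first() {
  awk '{
    if (seen[$0] == 0) {   # first time this exact line appears
      print                # keep it
    }
    seen[$0]++             # count the occurrence either way
  }' "$@"
}

printf '%s\n' 'test' "VAR='value'" 'test' "echo 'Last.'" | dedup_keep_first
```

Without file arguments, the function reads from standard input, so it can sit at the end of a pipeline.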

If the results are acceptable, we can commit them to the original file:

$ awk '!a[$0]++' "$HOME/.bash_history" > "$HOME/.bash_history.tmp" &&
   mv "$HOME/.bash_history.tmp" "$HOME/.bash_history"

There are three main drawbacks to this solution:

no deduplication of in-memory history
possible confusion when .bash_history contains timestamps
has to be reapplied on new entries

We can address all three: a trigger can first write out the history and then run a modified awk command that also accounts for timestamp comments. Still, there is a possibly better option.
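
For completeness, a timestamp-aware variant of the awk command could look roughly like the sketch below. The function name dedup_history_file is hypothetical. The sketch assumes single-line commands and treats any line of the form #<digits> as a timestamp comment, so a command that looks exactly like that would be misread:

```shell
# Hypothetical helper: deduplicate a history file that may contain
# "#<epoch>" timestamp comments, keeping the first occurrence of each
# command together with its timestamp.
dedup_history_file() {
  file=$1
  tmp=$(mktemp) || return 1
  if awk '
    /^#[0-9]+$/ { ts = $0; next }   # hold the timestamp until we see its command
    {
      if (!seen[$0]++) {            # first occurrence of this command
        if (ts != "") print ts      # restore its timestamp comment, if any
        print
      }
      ts = ""                       # a timestamp belongs to one command only
    }
  ' "$file" > "$tmp"; then
    mv "$tmp" "$file"               # replace the original only on success
  else
    rm -f "$tmp"
    return 1
  fi
}
```

In an interactive session, one possible sequence is to run history -a first, then call dedup_history_file "$HOME/.bash_history", and finally reload the in-memory list with history -c followed by history -r.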

4. $HISTCONTROL and $HISTIGNORE

Part of the Bash history facility is the $HISTCONTROL settings variable.

By setting $HISTCONTROL to a colon-separated list of values, we can control the stream of commands committed to history:

ignorespace – if a command begins with space, it doesn’t go to the history
ignoredups – if a line matches the previous history line, it doesn’t get added to the history again
ignoreboth – combines ignorespace and ignoredups
erasedups – similar to ignoredups, but removes all previous lines that match the current one from the history list before adding it

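In essence, the last option is closest to our needs. While erasedups doesn’t deduplicate the whole history up front, it removes the earlier copies of a command as soon as we enter that command again. This keeps new duplicates out of the in-memory list and, once that list is written out in full, for example with history -w, out of the file as well.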

To be sure, we can simply set the $HISTCONTROL variable with all possible options in a shell startup file such as ~/.bashrc:

export HISTCONTROL=ignoreboth:erasedups
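
To see the effect without touching our real history, we can try the setting in a separate, non-interactive Bash and add entries by hand with history -s. This is only a sketch: the function name demo_histcontrol and the sample commands are made up for illustration:

```shell
# Run the demo in a child Bash so it cannot touch the history of the
# current session or the real history file.
demo_histcontrol() {
  bash --norc --noprofile <<'EOF'
unset HISTFILE HISTTIMEFORMAT HISTIGNORE
HISTCONTROL=ignoreboth:erasedups
history -c                      # start from an empty in-memory list
history -s 'test'
history -s "VAR='value'"
history -s 'test'               # erasedups should drop the earlier 'test'
history -s ' echo secret'       # leading space: ignorespace should skip it
history
EOF
}

demo_histcontrol
```

With these settings, we'd expect the list to end up with VAR='value' followed by a single test entry, and without the command that starts with a space.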

In addition, there’s the $HISTIGNORE variable, which holds a colon-separated list of patterns. Bash doesn’t save any command line that matches one of them. With its globbing support, where an & (ampersand) matches the previous history line and * wildcards work as usual, we can simulate ignoreboth, but not erasedups.
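
As a sketch of that simulation, the pattern list '&:[ ]*' combines a match on the previous line with a match on lines that begin with a space. As before, the function name demo_histignore and the sample commands are hypothetical, and the demo runs in a child Bash to leave our own history alone:

```shell
demo_histignore() {
  bash --norc --noprofile <<'EOF'
unset HISTFILE HISTTIMEFORMAT HISTCONTROL
HISTIGNORE='&:[ ]*'             # & = previous line, [ ]* = starts with a space
history -c                      # start from an empty in-memory list
history -s 'ls'
history -s 'ls'                 # same as the previous line: should be skipped
history -s ' echo hidden'       # leading space: should be skipped
history -s 'pwd'
history -s 'ls'                 # previous line is pwd, so this one is saved
history
EOF
}

demo_histignore
```

Since & only looks at the immediately preceding line, the final ls is saved again even though an earlier ls is still in the list. This is exactly the gap that erasedups closes.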

5. Summary

In this article, we looked at duplicates in the Bash history list, as well as how to deal with them.

In conclusion, there are two main ways to deal with duplicates: one works only retroactively, while the other mainly works proactively. Therefore, applying the first once and then setting up the second keeps the Bash history free of duplicates.
