Last updated: June 30, 2024
While performing text manipulation, especially with big datasets, removing the last instance of a pattern can be challenging since common tools might not easily handle this task.
In this tutorial, we’ll learn how to remove the last occurrence of a pattern in a file using command-line utilities such as sed, awk, and tac.
Let’s take a look at the items.txt sample text file:
$ cat items.txt
item1,item2,item3,
item4,item5,item6,
item7,item8,item9,
The file contains comma-delimited values.
Unfortunately, the last line has an extra occurrence of a comma (,) after the item9 value. So, we aim to remove the last comma from the items.txt file.
Let’s explore how to solve this use case via the sed command-line utility.
We can start with one of the common sed idioms to read the entire file into the pattern space:
$ sed -E ':a;N;$!ba' items.txt
item1,item2,item3,
item4,item5,item6,
item7,item8,item9,
Now, let’s break this down to understand the nitty-gritty of the logic. To begin with, we define the a label so that we can loop. Then, the N command appends the next input line to the pattern space, and $!ba branches back to the a label on every line except the last one ($). Finally, we can see that the entire file is displayed because sed prints the pattern space by default when the script ends.
If we look closely, there are just two commands other than the label definition:
:a
N
$!ba
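To check that the loop really collects every line into a single pattern space, we can print it unambiguously with the l command. This is only a small sketch on made-up three-line input (not items.txt), and it assumes GNU sed:

```shell
# Made-up three-line input; -n suppresses the automatic print so only `l` writes output
joined=$(printf 'a\nb\nc\n' | sed -n ':a;N;$!ba;l')
# `l` shows embedded newlines as \n and marks the end of the pattern space with $
printf '%s\n' "$joined"
```

The embedded \n sequences in the output indicate that all three lines sit in one pattern space when the loop ends.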
Finally, we can use a greedy match with the (.*),(.*) groups in the substitution (s) command. Because the first .* is greedy, it consumes as much of the pattern space as possible, so the literal comma between the groups matches the last comma in the file. The replacement then joins the two groups, \1 and \2, without that comma:
$ sed -E ':a;N;$!ba; s/(.*),(.*)/\1\2/' items.txt
item1,item2,item3,
item4,item5,item6,
item7,item8,item9
Thus, we get the correct results.
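One caveat is worth knowing: with GNU sed, the N command prints the pattern space and exits when there’s no next line, so the command above leaves a one-line file unchanged. A possible variant, sketched here with a hypothetical helper name (remove_last_comma) and made-up input, guards N with $! so that the substitution still runs:

```shell
# remove_last_comma is a hypothetical helper name used only for this sketch
remove_last_comma() {
  # $!{N;ba;} appends lines only while more input remains,
  # so the substitution also runs when the input has a single line
  sed -E ':a;$!{N;ba;};s/(.*),(.*)/\1\2/' "$@"
}

# Single-line input: the trailing comma is removed
one_line=$(printf 'x,y,\n' | remove_last_comma)
# Two-line input: only the comma between c and d is removed
two_lines=$(printf 'a,b,\nc,d\n' | remove_last_comma)
printf '%s\n' "$one_line" "$two_lines"
```

This variant still holds the whole file in the pattern space, so the memory concern remains the same.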
The greedy approach to read the entire file into the pattern space works fine for smaller datasets. However, we may notice performance issues with large datasets because of extensive memory utilization.
To optimize memory usage, we can employ the tac command to reverse the order of lines, remove the pattern, and then reverse the order of lines back again:
$ tac <file> | <sed script to remove> | tac
Since tac shows the file’s contents with the last line first and the first line last, we use sed to remove the last occurrence of the pattern in the first line containing it.
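To make the reversal concrete, here’s a minimal sketch on made-up input, assuming tac from GNU coreutils is installed:

```shell
# Made-up input: tac prints the lines in reverse order, last line first
reversed=$(printf 'first\nsecond\nthird\n' | tac)
printf '%s\n' "$reversed"
# Applying tac a second time restores the original order
restored=$(printf '%s\n' "$reversed" | tac)
printf '%s\n' "$restored"
```

Since the second tac undoes the first, any edit made between the two calls lands on the corresponding line of the original file.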
Let’s see the entire series of commands in action:
$ tac items.txt \
| sed -n -E ':remove_and_print;s/(.*),(.*)/\1\2/;t print_only; p; n; b remove_and_print; :print_only; p; n; b print_only;' \
| tac
item1,item2,item3,
item4,item5,item6,
item7,item8,item9
Again, the approach works as expected. Let’s break down the sed script, which uses extended regular expressions (-E) for the substitution.
We define two labels for flow control, namely, remove_and_print and print_only. Then, within the remove_and_print block, we try to remove the last occurrence of the pattern on the current line. After a successful substitution, the t command transfers the flow to print_only. Otherwise, we print (p) the unchanged line, read the next (n) one, and branch (b) back to try again:
:remove_and_print
s/(.*),(.*)/\1\2/;
t print_only;
p;
n;
b remove_and_print;
Moreover, within the print_only block, we print (p) the current line and read the next (n) one until the input ends:
:print_only;
p;
n;
b print_only;
Notably, the advantage of this approach is that we’re keeping a single line in the pattern space, so it doesn’t use much memory.
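As an aside, GNU sed offers the 0,/regexp/ address form, which ends the range at the first matching line, even when that’s line 1. The following sketch swaps the label-based flow control for that address. It’s an alternative that assumes GNU sed, not the script explained above, and it uses made-up input:

```shell
# Made-up input whose final line contains no comma at all
input=$(printf 'a,b\nc,d\nplain\n')
# After tac, the range 0,/,/ covers lines up to the first one containing a comma,
# so the substitution changes only that line
result=$(printf '%s\n' "$input" \
  | tac \
  | sed -E '0,/,/ s/(.*),(.*)/\1\2/' \
  | tac)
printf '%s\n' "$result"
```

Like the label-based script, this keeps only one line in the pattern space at a time.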
Let’s learn how we can use the awk utility to remove the last occurrence of a comma in the items.txt file.
Let’s start by looking at the remove_comma.awk script in its entirety:
$ cat remove_comma.awk
function sub_at_position(line, position) {
    len = length(line);
    pre = substr(line, 1, position-1);
    post = substr(line, position+1, len-position);
    return pre post;
}
{
    buffer[NR] = $0;
    n = split($0, a, ",");
    if (n > 1) {
        last_occurrence = NR;
        position_last_comma = length($0) - length(a[n]);
    }
}
END {
    for (i = 1; i <= NR; i++) {
        if (i == last_occurrence) {
            buffer[i]=sub_at_position(buffer[i], position_last_comma);
        }
        print buffer[i];
    }
}
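Now, we can go step by step to understand the code flow within the script. For every line, we store the text in the buffer array and split it on commas. If the line contains at least one comma (n > 1), we record its line number in last_occurrence and the index of its last comma in position_last_comma, overwriting the values from any earlier line. Then, in the END block, we print all the buffered lines.
Only for the last_occurrence line, we use the sub_at_position() function to remove the comma marked by the position_last_comma index.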
Finally, let’s execute the remove_comma.awk script to remove the last occurrence of comma (,) in the items.txt file:
$ awk -f remove_comma.awk items.txt
item1,item2,item3,
item4,item5,item6,
item7,item8,item9
It looks like we nailed this one as well.
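The index arithmetic is easier to follow on a line whose last comma isn’t at the end. Here’s a minimal sketch with made-up input, where pos is a hypothetical variable name standing in for position_last_comma:

```shell
# Made-up line with the last comma in the middle rather than at the end
out=$(printf 'a,b,c\n' | awk '{
  n = split($0, a, ",")               # a[n] holds the text after the last comma
  pos = length($0) - length(a[n])     # 1-based index of the last comma
  print pos, substr($0, 1, pos - 1) substr($0, pos + 1)
}')
printf '%s\n' "$out"
```

Here, the line has five characters and the last field has one, so pos evaluates to 4, which is the index of the second comma.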
Like the greedy approach with the sed utility, the buffer-based approach with awk utilizes a lot of memory. So, it’s not preferred for large datasets. However, we can again optimize the approach by using tac.
In this case, tac reverses the items.txt file, awk removes the comma from the first matching line, and a second tac restores the original line order:
$ tac items.txt | awk -f remove_comma_optimized.awk | tac
Now, let’s take a look at the remove_comma_optimized.awk script in its entirety:
$ cat remove_comma_optimized.awk
function sub_at_position(line, position) {
    len = length(line);
    pre = substr(line, 1, position-1);
    # Keep everything after the removed character
    post = substr(line, position+1, len-position);
    return pre post;
}
BEGIN {
    is_done=0;
}
{
    if (!is_done) {
        n = split($0, a, ",");
        if (n > 1) {
            last_occurrence = NR;
            position_last_comma = length($0) - length(a[n]);
            $0=sub_at_position($0, position_last_comma);
            is_done=1;
        }
    }
    print $0
}
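Next, we can understand the optimizations done in the remove_comma_optimized.awk script over the remove_comma.awk script. Since tac feeds the lines in reverse order, the first line containing a comma is the one we need to edit. So, we remove the comma as soon as we see it, set the is_done flag to skip the check for the remaining lines, and print each line immediately instead of storing the whole file in a buffer.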
Lastly, let’s execute the remove_comma_optimized.awk script in combination with tac:
$ tac items.txt | awk -f remove_comma_optimized.awk | tac
item1,item2,item3,
item4,item5,item6,
item7,item8,item9
It works as expected and removes the last occurrence of a comma from the input file.
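The same streaming idea can also be sketched inline with awk’s match() function, where a greedy .*, makes RLENGTH the index of the last comma on the line. This is an alternative formulation rather than the script above. The removed flag is a hypothetical name playing the role of is_done, and the input is made up:

```shell
# Made-up input whose final line contains no comma
result=$(printf 'a,b\nc,d\nplain\n' \
  | tac \
  | awk '!removed && match($0, /.*,/) {
      # The greedy .* makes RLENGTH the index of the last comma on this line
      $0 = substr($0, 1, RLENGTH - 1) substr($0, RLENGTH + 1)
      removed = 1
    }
    { print }' \
  | tac)
printf '%s\n' "$result"
```

Because each line is printed as soon as it’s read, only the current line is held in memory.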
Vim is a versatile text editor that can be used for effective text manipulation. We can write a vim script to solve the use case of removing the last occurrence of a pattern in a file.
We can automate text editing operations using a vim script and run them repeatedly. So, let’s write a basic function in the remove_last_pattern.vim Vim script file:
$ cat remove_last_pattern.vim
function! RemoveLastPattern(pattern)
    " Get the total number of lines in the file
    let l:last_line_num = line('$')
    " Move cursor to the end of the current line
    normal! $
    " Initialize a flag to track if pattern is found
    let l:pattern_found = 0
    " Get the current line where the cursor is positioned
    let l:line = getline('.')
    " Find the position of the last occurrence of the pattern in the current line
    let l:pos = strridx(l:line, a:pattern)
    " Search for the last occurrence in the entire file, starting from the last line
    for l:lnum in reverse(range(1, l:last_line_num))
        let l:line = getline(l:lnum)
        let l:pos = strridx(l:line, a:pattern)
        if l:pos != -1
            " strpart() keeps the text before the match correct even when the match starts at index 0
            let l:line = strpart(l:line, 0, l:pos) . strpart(l:line, l:pos + len(a:pattern))
            call setline(l:lnum, l:line)
            let l:pattern_found = 1
            break
        endif
    endfor
endfunction
" Map the function to a command for ease of use
command! -nargs=1 RemoveLast :call RemoveLastPattern(<q-args>)
Initially, the script can look overwhelming. However, it’s just a series of vim commands. Let’s look closer to understand the complete logic and each action within the RemoveLastPattern() function:
A string function, strridx(), finds the index of the last occurrence of the given text in the current line. It treats its argument as a literal string rather than a regular expression, and it returns -1 if a match isn’t found. If a match is found, we remove the matched text from the current line and use the break command to end the loop iterations.
Lastly, we create a custom command, RemoveLast, that calls the RemoveLastPattern() function and accepts exactly one argument (-nargs=1). Notably, Vim replaces <q-args> with the argument passed to the RemoveLast command, quoted as a string, so that it reaches the function as a:pattern.
Let’s open the items.txt file using the vim command:
$ vim items.txt
Now, we source the remove_last_pattern.vim script so that we get access to the RemoveLast custom command:
:source remove_last_pattern.vim
Next, we can call the RemoveLast command with a comma (,) as the first argument:
:RemoveLast ,
At this point, we successfully removed the last occurrence of a comma (,) in the items.txt file.
Finally, after verifying the changes, we can choose to save the file:
:wq
Thus, we have a convenient way to implement and apply the use case in Vim.
In this article, we learned how to remove the last occurrence of a pattern in a file.
In particular, we explored command-line utilities, such as sed, awk, and tac, as well as a Vim script. The choice between these options depends on the size of the file and the preference of the user.