1. Overview

A common task when working on the Linux command line is searching for a string or a pattern and then replacing or deleting it. However, some special characters, such as the newline, make this task less trivial than we might anticipate.

In this tutorial, we’ll explore several approaches to remove newline characters using tools such as tr, awk, Perl, paste, sed, Bash, and the Vim editor.

2. Preparing Our Example File

Before we start, let’s create a text file named some_names.txt that we’ll use to apply all our strategies:

$ cat > some_names.txt << _eof_
Martha,
Charlotte,
Diego,
William,
_eof_

The goal is to end up with a CSV-like file with the content:

Martha,Charlotte,Diego,William,

3. Using tr

When we need to delete characters or replace them with others, tr often comes to mind because it’s easy to use.

The tr command reads from the standard input (stdin), performs some operations (translate, squeeze, delete), and then writes the result to the standard output (stdout).

We’ll now focus on the “delete” operation. With the parameter -d, we define a set of characters that we want tr to remove.

Since we just want to delete the newlines, we place only this character in the set and then redirect the standard output to a new CSV file:

$ tr -d "\n" < some_names.txt > some_names.csv

Now, let’s see the content of our CSV file:

$ cat some_names.csv
Martha,Charlotte,Diego,William,
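
One way to confirm that the newlines are really gone is to count them with wc -l, which counts newline characters. The following is a minimal sketch that recreates the example file with printf so it can run on its own; the variable names before and after are only illustrative:

```shell
# Recreate the example input (same content as some_names.txt above).
printf '%s\n' 'Martha,' 'Charlotte,' 'Diego,' 'William,' > some_names.txt

# Delete every newline, as in the tr command above.
tr -d '\n' < some_names.txt > some_names.csv

# wc -l counts newline characters, so the count should drop to zero.
before=$(wc -l < some_names.txt)
after=$(wc -l < some_names.csv)
echo "newlines before: $before, after: $after"
```

Since tr removes every newline, the resulting file doesn’t end with one either, which is why the shell prompt may appear right after the last comma when we cat it.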

4. Using awk

The awk program is a well-known, powerful, and useful tool that allows us to process text using patterns and actions.

It lets us perform some operations in a very straightforward way, with the help of some tricks:

$ awk 1 ORS='' some_names.txt > some_names.csv

Let’s see the content of our CSV file:

$ cat some_names.csv
Martha,Charlotte,Diego,William,

Let’s take a closer look to understand how we solved the problem.

We wrote the pattern “1” because it evaluates to true for every record. Since the pattern has no action, awk performs the default action, which is to print the entire record followed by the value of the ORS variable.

We also set the ORS (Output Record Separator) variable, which is a newline by default, to the empty string.

Following these two steps, we consumed every record, then printed them using the empty string as the output record separator. In other words, we simply ignored the newline.

Another way is to set ORS inside the awk program text itself:

$ awk 'ORS="";1' some_names.txt

And an extended version of that would be:

$ awk 'BEGIN{ ORS="" } { print $0 }' some_names.txt

Here, we do the same, but this time we use the BEGIN pattern, whose action sets the ORS variable before any of the input is read. The main action then prints the $0 variable, which contains the whole record (usually an entire line of the input).
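
As a related variation that isn’t part of the commands above, we could sidestep ORS entirely with awk’s printf, which prints only what its format string says. The sketch below also sets ORS to a visible character, purely for illustration, to show where awk places the separator:

```shell
# Recreate the example input (same content as some_names.txt above).
printf '%s\n' 'Martha,' 'Charlotte,' 'Diego,' 'William,' > some_names.txt

# printf in awk never appends ORS, so no newline is written.
awk '{ printf "%s", $0 }' some_names.txt > some_names.csv

# With a visible ORS, we can see it follows every record, including the last.
awk 'BEGIN { ORS = "|" } { print $0 }' some_names.txt
# prints: Martha,|Charlotte,|Diego,|William,|
```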

5. Using Perl

Perl is a language that has a great set of features for text processing.

We’ll use the Perl interpreter in a sed-like way:

$ perl -pe 's/\n//' some_names.txt > some_names.csv

Let’s take a look at how this command works:

  • -p tells Perl to wrap our script in a loop that reads the input line by line, runs the script on each line, and then prints the line
  • -e tells Perl to use the next string as a one-line script
  • ‘s/\n//’ is the script that instructs Perl to remove the \n character

And now, let’s review our CSV file:

$ cat some_names.csv
Martha,Charlotte,Diego,William,

6. Using paste

The paste program is a utility that merges lines of files, but we can also use it to remove newlines.

Let’s try with the next one-liner:

$ paste -sd "" some_names.txt > some_names.csv

Now, let’s check our CSV file:

$ cat some_names.csv
Martha,Charlotte,Diego,William,

We’re able to achieve this because paste has the parameters -s, which joins all the lines of each file into a single line (one file at a time), and -d, which allows us to define the empty string as the delimiter.

With these two paste options, we can get what we want without mentioning the newline.
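
Not every paste implementation is guaranteed to accept an empty delimiter list. POSIX describes '\0' as the way to spell “empty string” in the -d list, so a more portable sketch might look like this:

```shell
# Recreate the example input (same content as some_names.txt above).
printf '%s\n' 'Martha,' 'Charlotte,' 'Diego,' 'William,' > some_names.txt

# '\0' in the delimiter list means "empty string", not a NUL character.
paste -s -d '\0' some_names.txt > some_names.csv

# paste still ends its single output line with one newline.
wc -l < some_names.csv
```

Unlike tr -d, paste keeps one newline at the very end of its output, which is usually what we want for a text file.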

7. Using sed

When we talk about processing text, the sed stream editor usually comes to mind, regardless of the problem.

The script ‘s/<pattern>/<replacement>/’ is commonly used in sed.

Let’s use it to replace the line endings and see what happens:

$ sed 's/\n//g' some_names.txt
Martha,
Charlotte,
Diego,
William,

There’s no change because sed reads one line at a time and strips the trailing newline before placing the line into the pattern space, so the pattern \n never finds anything to match.

Let’s try with this new one-liner:

$ sed ':label1 ; N ; $! b label1 ; s/\n//g' some_names.txt > some_names.csv

Next, let’s see what’s inside our CSV file:

$ cat some_names.csv
Martha,Charlotte,Diego,William,

Now we have what we wanted.

Let’s break down each section (separated by the semicolon) of the script to understand how it works:

  • :label1 creates a label named label1
  • N tells sed to append the next line of input to the pattern space, separated from the previous content by a newline
  • $! b label1 tells sed to branch (go to) our label label1 if the current line isn’t the last one
  • s/\n//g removes the \n character from what is in the pattern space

In other words, with all these pieces together, we construct a loop that finishes when sed reaches the last line of the input. Only then does the substitution run, on a pattern space that holds the whole file.
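
Some sed implementations may be strict about what can follow a label on the same line. A commonly seen alternative, shown here as a sketch rather than a guaranteed fix for every sed, passes each command as a separate -e expression:

```shell
# Recreate the example input (same content as some_names.txt above).
printf '%s\n' 'Martha,' 'Charlotte,' 'Diego,' 'William,' > some_names.txt

# Same loop as before, with one -e per command instead of semicolons.
sed -e ':label1' -e 'N' -e '$!b label1' -e 's/\n//g' some_names.txt > some_names.csv

cat some_names.csv
# prints: Martha,Charlotte,Diego,William,
```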

8. Using a Bash Command-Line Script

Bash is installed in most Linux distributions, so we could try to use it to get what we want.

One option that we could use is a while loop:

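$ while IFS= read -r row
do
    printf '%s' "$row"
done < some_names.txt > some_names.csv

Here, in the while loop and with the help of the Bash built-in read, we read the content of the file some_names.txt one line at a time and assign each line to the variable row. Setting IFS to the empty string for read and passing -r keeps leading and trailing whitespace and backslashes intact.

After that, the built-in printf prints that line without the newline. We pass the line as an argument to the '%s' format rather than as the format itself, so a % or a backslash in the data is printed literally. And finally, we redirect the output to our CSV file.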
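
A loop like this can misbehave when a line contains a percent sign or a backslash, because printf would interpret them if the line were used as the format string, and read would process backslashes without -r. Here’s a small sketch with a made-up input file, tricky_names.txt, that shows a defensive form of the loop:

```shell
# Hypothetical input with characters that printf and read treat specially.
printf '%s\n' '100%,' 'C:\temp,' > tricky_names.txt

# IFS= and -r keep each line verbatim; '%s' prints it as plain data.
out=$(while IFS= read -r row
do
    printf '%s' "$row"
done < tricky_names.txt)

printf '%s\n' "$out"
# prints: 100%,C:\temp,
```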

We can achieve the same with the help of the readarray built-in, the IFS variable, and the parameter expansion mechanism:

$ OLDIFS=$IFS ; IFS='' ; readarray -t file_array < some_names.txt ; echo "${file_array[*]}" > some_names.csv ; IFS=$OLDIFS

Bash is full of tricks, and we’re using a few of them here. Let’s understand it section by section:

  • OLDIFS=$IFS: We save the current variable IFS into the OLDIFS variable.
  • IFS='': We set IFS to the empty string
  • With readarray -t file_array, we read the content of the some_names.txt file into the array file_array, removing the trailing newline from each row
  • With “${file_array[*]}”, Bash expands the array file_array into a single word whose values are separated by the first character of the IFS variable; because IFS is empty here, the values are joined with nothing between them
  • Finally, we restore the IFS variable

But we can be a little trickier using a subshell:

$ (
readarray -t file_array < some_names.txt;
IFS='';
echo "${file_array[*]}" > some_names.csv;
)

This is equivalent, and it keeps our current IFS value safe because changes made to variables inside a subshell don’t affect the parent shell.
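
The same joining rule also applies to “$*” and the positional parameters, which gives us a compact way to see the effect of IFS without touching the current shell. In this sketch, join_with is a hypothetical helper name, and its body runs in a subshell because it’s wrapped in parentheses:

```shell
# join_with SEP ARGS...: join ARGS using the first character of SEP.
# The ( ... ) body is a subshell, so the caller's IFS is left untouched.
join_with() (
    IFS=$1
    shift
    printf '%s\n' "$*"
)

join_with '' 'Martha,' 'Charlotte,' 'Diego,' 'William,'
# prints: Martha,Charlotte,Diego,William,

join_with '-' a b c
# prints: a-b-c
```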

It’s worth mentioning that the IFS variable is special: Bash uses it to split words after expansions and to split the lines that read processes. The default value of the Bash IFS variable is <space><tab><newline>, or " \t\n".

Finally, let’s see what is now inside our CSV file:

$ cat some_names.csv
Martha,Charlotte,Diego,William,

9. Using the Vim Editor

In Linux, we have many editor flavors, but let’s focus on one of the most famous.

Vim (Vi Improved) is an editor equipped with a lot of useful utilities.

Let’s open our example file in the Vim editor:

$ vim some_names.txt
Martha,
Charlotte,
Diego,
William,

Next, let’s run the command :%s/\n// in Vim’s command-line mode.

Right now, we have something like this:

Martha,Charlotte,Diego,William,

Now, let’s save the content into a file called some_names.csv with :w some_names.csv.

To finish this section, let’s understand what happened. With the command s/\n//, we remove the \n at the end of a line, which joins it with the next one. And with the % range, Vim applies this to all the lines of the file.

10. Conclusion

Removing newlines leads us to think about strategies beyond those that delete common characters. In this article, we’ve reviewed some of these strategies using commands like tr, awk, Perl, paste, sed, Bash, and the Vim editor.
