Merge Two Files Line By Line in Linux

1. Overview

We know that we can use the command cat file1 file2 to concatenate multiple files. However, sometimes, we want to combine two files column-wise.

In this tutorial, we’ll learn how to do this on the Linux command line.

2. Introduction to the Problem

Combining two files column-wise means displaying them side by side so that we can read or compare their content more easily.

Let’s see a simple example:

$ head left.txt right.txt 
==> left.txt <==
I am line 1 on the left.
I am line 2 on the left.
I am line 3 on the left.
I am line 4 on the left.

==> right.txt <==
Right side: line #1
Right side: line #2
Right side: line #3
Right side: line #4

As the output above shows, we have two files, left.txt and right.txt. Now, we want to have a new file or output like this:

I am line 1 on the left.	Right side: line #1
I am line 2 on the left.	Right side: line #2
I am line 3 on the left.	Right side: line #3
I am line 4 on the left.	Right side: line #4

This example shows the best case: all lines in left.txt have the same length, so the output is neatly aligned.

However, in practice, the lines in left.txt can vary in length, and we may still want the output to be neatly aligned.

Also, the number of lines in the two files can differ. Sometimes, we would like the side-by-side view to show when one of the files has run out of lines.

Furthermore, in some cases, it would be useful to customize the separator between the contents of the two files.

In this tutorial, we’ll explore different solutions to cover all these aspects.

Next, let’s see them in action.

3. Displaying Two Files Side by Side – the paste Command

The paste command can merge lines of multiple files. Also, it’s pretty easy to use:

$ paste left.txt right.txt 
I am line 1 on the left.	Right side: line #1
I am line 2 on the left.	Right side: line #2
I am line 3 on the left.	Right side: line #3
I am line 4 on the left.	Right side: line #4

As we can see, we simply pass the two files to the paste command, and it merges the corresponding lines of the input files.

The default delimiter of the paste command is a tab.

In this example, the two columns also line up neatly. Since all lines in left.txt have the same length, every tab advances to the same position.
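Since paste writes to standard output, saving the merged view to a new file only takes a redirection. The following is a minimal sketch: the scratch directory and the file name merged.txt are assumptions made for illustration, not part of the example above. It also uses cut to show that the tab delimiter keeps the two columns recoverable.

```shell
# Work in a scratch directory so that no existing files are touched (assumption).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the two sample files; printf reuses its format for each argument.
printf 'I am line %s on the left.\n' 1 2 3 4 > left.txt
printf 'Right side: line #%s\n' 1 2 3 4 > right.txt

# Save the side-by-side view to a new file (merged.txt is a hypothetical name).
paste left.txt right.txt > merged.txt

# The tab delimiter lets cut split the two columns apart again.
cut -f 1 merged.txt   # the lines of left.txt
cut -f 2 merged.txt   # the lines of right.txt
```

Splitting by tab only works this cleanly while the input lines themselves contain no tabs.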

4. Pretty Align the Output – the column Command

We’ve seen the paste command can easily display two files side by side.

However, as we’ve mentioned, in practice, the input files may not be so uniform. For example, the lines in left.txt can have different lengths, which makes the output messy.

4.1. The New Problem

Let’s see another pair of files, left2.txt and right2.txt:

$ cat -n left2.txt
     1	I'm line1.
     2	I am line2, I am very very very long.
     3	
     4	Hi, I'm line4.

$ cat right2.txt 
Line1 on the right
Line2 on the right
Line3 on the right
Line4 on the right
Line5 on the right
Line6 on the right

In the left2.txt file, we have four lines. Also, the third line is empty. We’ve used the cat command with the -n option to add line numbers to the output so that we can identify the empty lines.

Further, the second line is longer than the other lines in the file.

On the other hand, the right2.txt has six lines of the same length.
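Before merging, it can help to confirm that the two files really differ in their number of lines. The following is a small sketch that recreates the two sample files in a scratch directory (an assumption for illustration) and compares their line counts with wc; the variable names are hypothetical.

```shell
# Recreate the sample files in a scratch directory (assumption).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' "I'm line1." 'I am line2, I am very very very long.' '' "Hi, I'm line4." > left2.txt
printf 'Line%s on the right\n' 1 2 3 4 5 6 > right2.txt

# Arithmetic expansion strips the padding that some wc implementations add.
left_count=$(( $(wc -l < left2.txt) ))
right_count=$(( $(wc -l < right2.txt) ))

# Report the counts only when they differ.
if [ "$left_count" -ne "$right_count" ]; then
    echo "left2.txt: $left_count lines, right2.txt: $right_count lines"
fi
```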

Now, if we pass the two files to the paste command, we’ll get:

$ paste left2.txt right2.txt 
I'm line1.	Line1 on the right
I am line2, I am very very very long.	Line2 on the right
	Line3 on the right
Hi, I'm line4.	Line4 on the right
	Line5 on the right
	Line6 on the right

The paste command still lists the contents of the two files side by side. However, the columns are misaligned, and the output isn’t easy to read.

Next, let’s pretty align the output.

4.2. The column Command

The column command is a member of the util-linux package, which is by default available on most modern Linux distributions.

As its name implies, the column command is good at reformatting input into columns. In addition, it provides many options to control the output alignment and format.

To solve our problem, we need only two options:

-s – Setting the separator between fields
-t – Asking column to reformat the content as a table. That is, the columns will be aligned

We’ve learned that the default delimiter of the paste command is a tab. Therefore, we can tell the column command to use a tab as the separator to reformat the result:

$ paste left2.txt right2.txt | column -s $'\t' -t 
I'm line1.                             Line1 on the right
I am line2, I am very very very long.  Line2 on the right
                                       Line3 on the right
Hi, I'm line4.                         Line4 on the right
                                       Line5 on the right
                                       Line6 on the right

Now, the output looks much nicer.

When we pass the tab to column's -s option, we may notice that we used a special format: $'\t'.

If we pass the string '\t', the column command won’t treat it as a tab. The shell doesn’t expand backslash escapes inside plain single quotes, so column receives two literal characters, a backslash and the letter t, and uses them as separators.

However, when we use the form $'…', the shell replaces the backslash-escaped characters as specified by the ANSI C standard before column sees them.
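The $'…' form is a shell feature, and not every sh implementation provides it. As a hedged alternative for such shells, printf can produce the tab character instead. The sketch below only illustrates the difference between the two strings; the variable names are hypothetical.

```shell
# Generate a real tab character without relying on $'...' quoting.
tab=$(printf '\t')

# A single-quoted '\t' is two characters: a backslash and the letter t.
literal_len=$(( $(printf '%s' '\t' | wc -c) ))
tab_len=$(( $(printf '%s' "$tab" | wc -c) ))
echo "literal: $literal_len byte(s), tab: $tab_len byte(s)"

# The variable works wherever a tab separator is expected, for example with cut.
printf 'a\tb\n' | cut -d "$tab" -f 2

# It could be passed to column in the same way (not executed here):
#   paste left2.txt right2.txt | column -s "$tab" -t
```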

4.3. Choosing the Separator

We’ve seen how the column -s $'\t' -t command aligns the output. However, a question may come up: What if the contents of the input files contain tabs already?

Yes, if the files contain tabs, this column command won’t produce our expected result.

The solution is to pick an unused character as the separator for both the paste and column commands.

We can use the paste command’s -d option to set a delimiter that overrides the default tab. Later, when we pipe the result to the column command, we can use the same character as the separator.

Actually, choosing an unused character isn’t an easy task, as we cannot predict what characters are in the input files.

However, we can choose some invisible characters that usually won’t be contained in file contents, such as:

\a – Alert bell
\e – Escape
\f – Form feed
\v – Vertical tab
\x01 – Start of heading
\x02 – Start of text
…

Next, let’s see an example that uses the '\a' character as the separator to get a pretty-aligned output:

$ paste -d $'\a' left2.txt right2.txt | column -s $'\a' -t
I'm line1.                             Line1 on the right
I am line2, I am very very very long.  Line2 on the right
                                       Line3 on the right
Hi, I'm line4.                         Line4 on the right
                                       Line5 on the right
                                       Line6 on the right

As the output above shows, we’ve got the expected result.
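Since we cannot know in advance which characters the input files contain, one option is to test a candidate separator before using it. The following is a minimal sketch, assuming a hypothetical helper named is_unused_char and two throwaway sample files; it relies only on grep's exit status.

```shell
# is_unused_char CHAR FILE... (hypothetical helper)
# Succeeds only if CHAR appears in none of the given files.
is_unused_char() {
    sep_char=$1
    shift
    # -F: match the character literally; -q: report only via the exit status.
    grep -F -q -e "$sep_char" -- "$@"
    # grep exits with 1 when nothing matched; treat errors (2) as "not safe".
    [ "$?" -eq 1 ]
}

# Two throwaway sample files (assumption): only b.txt contains a bell character.
workdir=$(mktemp -d)
printf 'plain text\n' > "$workdir/a.txt"
printf 'rings a \a bell\n' > "$workdir/b.txt"

bell=$(printf '\a')
if is_unused_char "$bell" "$workdir/a.txt"; then
    echo "bell is safe for a.txt"
fi
if ! is_unused_char "$bell" "$workdir/a.txt" "$workdir/b.txt"; then
    echo "bell already occurs in the inputs, pick another character"
fi
```

If the check fails, we can simply try the next candidate from the list of invisible characters.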

However, sometimes we want to customize the separator between the contents of the two files.

Also, looking at the output, we cannot tell whether left2.txt has four lines or six. So, when the two files contain different numbers of lines, we may want the result to indicate that the shorter file has no more lines.

Next, let’s see how to achieve these requirements.

5. Customizing the Separator and Identifying Nonexistent Lines

awk is a powerful text-processing utility, and it can process multiple files. Its C-like scripting language makes it flexible for processing text.

Let’s say we would like to use the string “<- = ->” to separate lines from two input files.

Also, if one file has no more lines, we would like to display the text: “[ File Ended. No More Lines ]”.

5.1. The awk Solution

Now, let’s see how awk solves the problem:

awk -v sep='<- = ->' -v no_line_txt='[ File Ended. No More Lines ]' '
    NR==FNR { max_length = (length > max_length)? length : max_length
              left_lines = FNR
              left[FNR] = $0
              next
            }
    { printf "%-*s %s %s\n", max_length, (FNR in left? left[FNR] : no_line_txt), sep, $0 }
    END     { if (FNR < left_lines) {
                for (i=FNR+1; i <= left_lines; i++)
                    printf "%-*s %s %s\n", max_length, left[i], sep, no_line_txt
              }
            }
'  left2.txt right2.txt

Let’s see what output the command above will produce:

I'm line1.                            <- = -> Line1 on the right
I am line2, I am very very very long. <- = -> Line2 on the right
                                      <- = -> Line3 on the right
Hi, I'm line4.                        <- = -> Line4 on the right
[ File Ended. No More Lines ]         <- = -> Line5 on the right
[ File Ended. No More Lines ]         <- = -> Line6 on the right

Good! The output is exactly what we expected.

Next, let’s understand how the awk command does the job.

5.2. How Does the awk Command Work?

Now, let’s walk through the awk code and understand how it works:

awk -v sep='<- = ->' -v no_line_txt='[ File Ended. No More Lines ]' '

Here, we declare two awk variables, sep and no_line_txt, to store the custom separator and the hint for missing lines.

Next, we’ll handle the first file:

NR==FNR { max_length = (length > max_length)? length : max_length
          left_lines = FNR
          left[FNR] = $0
          next
        }

First, we go through the first file and find out the length of the longest line (max_length) and the number of lines (left_lines). Also, we store all lines in an associative array: left[lineNumber]=The line’s text.

Then, awk starts reading the second input file:

{ printf "%-*s %s %s\n", max_length, (FNR in left? left[FNR] : no_line_txt), sep, $0 }

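Since we already know max_length, we can control the padding dynamically. The * in the %-*s format takes the field width from the max_length argument, so the left column is left-aligned and padded to the length of the longest line. As a result, the lines from the second file all start at the same position.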

Once the FNR of the second file is not in the array left[], it means the second file has more lines than the first file. For those additional lines, we print the value of no_line_txt on the left side.

Of course, we separate the left and right sides by the sep variable.

END     { if (FNR < left_lines) {
            for (i=FNR+1; i <= left_lines; i++)
                printf "%-*s %s %s\n", max_length, left[i], sep, no_line_txt
          }
        }

In our example, the second file contains more lines than the first input file. However, for different input files, the situation can be the opposite.

Therefore, in the END block, we need to check if the second file has fewer lines. If this is the case, we should print no_line_txt on the right side for the missing lines.

Finally, we need to feed the awk command two input files:

'  left2.txt right2.txt

The lines in the first input will be listed on the left side, while the content of the second file will be listed on the right side.
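To exercise the opposite case, where the second file is the shorter one, we can wrap the same logic in a shell function and feed it two small files. This is only a sketch under stated assumptions: the function name side_by_side, the variable right_lines, and the sample files are hypothetical. It also builds the printf format by string concatenation instead of using %-*s, which produces the same padding here, and it indexes the left array with the loop counter in the END block.

```shell
# side_by_side LEFT RIGHT (hypothetical wrapper around the awk approach)
side_by_side() {
    awk -v sep='<- = ->' -v no_line_txt='[ File Ended. No More Lines ]' '
        # First file: remember every line and the longest line length.
        NR == FNR { if (length($0) > max_length) max_length = length($0)
                    left_lines = FNR
                    left[FNR] = $0
                    next
                  }
        # Second file: pair each line with the stored left line, if any.
        { fmt = "%-" max_length "s %s %s\n"
          printf fmt, ((FNR in left) ? left[FNR] : no_line_txt), sep, $0
          right_lines = FNR
        }
        # Left lines without a partner get the hint text on the right side.
        END { fmt = "%-" max_length "s %s %s\n"
              for (i = right_lines + 1; i <= left_lines; i++)
                  printf fmt, left[i], sep, no_line_txt
            }
    ' "$1" "$2"
}

# Sample input (assumption): three lines on the left, one on the right.
workdir=$(mktemp -d)
printf '%s\n' 'one' 'two two' 'three' > "$workdir/a.txt"
printf '%s\n' 'x' > "$workdir/b.txt"

side_by_side "$workdir/a.txt" "$workdir/b.txt"
```

With this input, the first output line pairs "one" with "x", and the remaining two left lines are followed by the hint text.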

6. Conclusion

In this article, we’ve learned how to combine two input files column-wise through examples.

In most cases, paste and column commands can help us to achieve our goals.

Also, the flexible awk script allows us to do more customizations.
