1. Overview

The while loop with the read command is a well-known and efficient way to read a file line by line. However, sometimes we need a way to accomplish more complicated tasks.
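As a reminder, here is a minimal sketch of that basic loop. The sample file and the variable names are assumptions made up for this illustration:

```shell
#!/bin/bash

# Hypothetical sample file, created only for this demonstration
sample=$(mktemp)
printf '%s\n' 'first line' '  second line' 'third\tline' > "$sample"

count=0
last=""
# IFS= keeps leading and trailing whitespace; -r keeps backslashes literal
while IFS= read -r line
do
    count=$((count + 1))
    last=$line
    printf '%d: %s\n' "$count" "$line"
done < "$sample"

rm -f "$sample"
```

The loop body runs once per line, and the redirection after done supplies the file to read.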

In this tutorial, we’ll learn how to read corresponding lines from two files.

2. Problem Statement

Let’s assume that we have two text files, fileA.txt and fileB.txt:

$ cat fileA.txt

File A - line #1
File A - line #2
File A - line #3

$ cat fileB.txt

File B - line #1
File B - line #2
File B - line #3

Next, we want to perform some operation on corresponding lines from both files. As a stand-in for that operation, let’s introduce a mock-up function, two_lines_operation, which only prints its arguments:

two_lines_operation ()
{
    echo "Doing something with lines from two files:"
    printf 'fileA.txt line: %s\n' "${1}" 
    printf 'fileB.txt line: %s\n' "${2}" 
    printf '\n'
}

So, our desired output looks like this:

Doing something with lines from two files:
fileA.txt line: File A - line #1
fileB.txt line: File B - line #1

Doing something with lines from two files:
fileA.txt line: File A - line #2
fileB.txt line: File B - line #2

Doing something with lines from two files:
fileA.txt line: File A - line #3
fileB.txt line: File B - line #3

Finally, we’re going to put our function in a library file named mockup, which our scripts will load with source.
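The following sketch shows one way to create and load such a library. It assumes a temporary directory for the library file, whereas the scripts in this tutorial expect mockup to sit next to them:

```shell
#!/bin/bash

# Assumption: a temporary directory holds the library for this demo only
libdir=$(mktemp -d)

cat > "$libdir/mockup" <<'EOF'
two_lines_operation ()
{
    echo "Doing something with lines from two files:"
    printf 'fileA.txt line: %s\n' "${1}"
    printf 'fileB.txt line: %s\n' "${2}"
    printf '\n'
}
EOF

# Load the function definition into the current shell
source "$libdir/mockup"

output=$(two_lines_operation "first" "second")
printf '%s\n' "$output"

rm -rf "$libdir"
```

After source, the function is available like any other command in the script.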

3. The Nested Loop Approach

As a first attempt, let’s use nested while read loops, keeping track of the line numbers as we go.

Let’s take a look at the nreader script:

#!/bin/bash

source mockup # library with function to work with lines

countA=0
while read -r lineA
do
    # restart the fileB.txt line counter on every pass
    countB=0
    while read -r lineB
    do
        if [ "$countA" -eq "$countB" ]
        then
            two_lines_operation "$lineA" "$lineB"
            break
        fi
        countB=$((countB + 1))
    done < fileB.txt
    countA=$((countA + 1))
done < fileA.txt

We count the lines in both files and call two_lines_operation only when the line numbers match. Let’s check the result:

$ ./nreader

Doing something with lines from two files:
fileA.txt line: File A - line #1
fileB.txt line: File B - line #1

Doing something with lines from two files:
fileA.txt line: File A - line #2
fileB.txt line: File B - line #2

Doing something with lines from two files:
fileA.txt line: File A - line #3
fileB.txt line: File B - line #3

This meets our requirements, but the construction is far from optimal. For every line of fileA.txt, we reread fileB.txt from the beginning just to pick out one matching line.
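To make the cost visible, the sketch below repeats the nested loop on two three-line files and counts how many lines it reads from the second one. The file contents and the inner_reads counter are assumptions added for this illustration:

```shell
#!/bin/bash

tmp=$(mktemp -d)
printf 'A%d\n' 1 2 3 > "$tmp/fileA.txt"
printf 'B%d\n' 1 2 3 > "$tmp/fileB.txt"

inner_reads=0   # hypothetical counter of lines read from fileB.txt
countA=0
while read -r lineA
do
    countB=0
    while read -r lineB
    do
        inner_reads=$((inner_reads + 1))
        if [ "$countA" -eq "$countB" ]
        then
            break
        fi
        countB=$((countB + 1))
    done < "$tmp/fileB.txt"
    countA=$((countA + 1))
done < "$tmp/fileA.txt"

echo "Lines in fileA.txt: $countA, lines read from fileB.txt: $inner_reads"
rm -rf "$tmp"
```

For three lines, the inner loop reads 1 + 2 + 3 = 6 lines, so the work grows roughly with the square of the file length.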

4. Reading Simultaneously From Both Files

Let’s extend the well-known while read loop to operate with two files. So, we’re going to read two files simultaneously within the same loop pass.

4.1. Bash 4.1 and Higher

Since Bash 4.1, we can let the shell allocate file descriptors for us instead of choosing the numbers ourselves. Let’s apply it in the rreader script:

#!/bin/bash

source mockup

# open input files
exec {fdA}<fileA.txt
exec {fdB}<fileB.txt

while read -r -u "$fdA" lineA && read -r -u "$fdB" lineB
do
    two_lines_operation "$lineA" "$lineB"
done

exec {fdA}>&- {fdB}>&- # close input files

Let’s notice that exec {fd}<file_name opens the file, allocates a free file descriptor, and stores its number in the variable fd. Afterward, we need to explicitly close the descriptor with exec {fd}>&-.

Now, let’s check the output:

$ ./rreader

Doing something with lines from two files:
fileA.txt line: File A - line #1
fileB.txt line: File B - line #1

Doing something with lines from two files:
fileA.txt line: File A - line #2
# ... more output skipped
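To see what the {fd} syntax stores in the variable, we can try a small sketch like the one below. It assumes Bash 4.1 or later, and the sample file is made up for the demonstration:

```shell
#!/bin/bash

sample=$(mktemp)
printf '%s\n' 'only line' > "$sample"

# Bash picks a free descriptor and stores its number in the variable fd
exec {fd}<"$sample"
allocated=$fd
echo "Allocated descriptor: $allocated"

read -r -u "$fd" line
echo "Read: $line"

# Close the descriptor once we are done with it
exec {fd}<&-

rm -f "$sample"
```

The printed number is normally 10 or higher, which keeps it clear of the low descriptors that scripts tend to use by hand.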

4.2. Earlier Bash Versions

When working with earlier Bash versions, we need to provide valid file descriptors on our own. Moreover, this approach is still quite widespread. So, let’s rewrite the rreader script:

#!/bin/bash

source mockup

while read -r -u 3 lineA && read -r -u 4 lineB
do
    two_lines_operation "$lineA" "$lineB"
done 3<"fileA.txt" 4<"fileB.txt"

We use file descriptor 3 for fileA.txt and 4 for fileB.txt. We should avoid 0, 1, and 2 because they’re reserved for stdin, stdout, and stderr, respectively.
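Because the two read calls are joined with &&, the loop ends as soon as either file runs out of lines. The sketch below illustrates this with files of unequal length. The file contents and the pairs counter are assumptions for this example:

```shell
#!/bin/bash

tmp=$(mktemp -d)
printf '%s\n' 'A1' 'A2' 'A3' > "$tmp/fileA.txt"
printf '%s\n' 'B1' 'B2' > "$tmp/fileB.txt"   # one line shorter

pairs=0
while read -r -u 3 lineA && read -r -u 4 lineB
do
    pairs=$((pairs + 1))
    printf '%s <-> %s\n' "$lineA" "$lineB"
done 3<"$tmp/fileA.txt" 4<"$tmp/fileB.txt"

echo "Processed pairs: $pairs"
rm -rf "$tmp"
```

Here, the third line of the longer file is read but never processed, which may or may not be what we want.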

5. Working With Arrays

Let’s notice that the solutions shown so far process the files line by line, without reading a whole file’s content into memory. Now, let’s read the files’ lines into arrays and process them later.

Let’s study the areader script. First, let’s define empty arrays to store the lines. With this, we can simply append each line at the end of the array. Then, we fill the arrays in the read loop.

#!/bin/bash

source mockup

exec {fdA}<fileA.txt
exec {fdB}<fileB.txt

# define empty arrays
arrA=()
arrB=()

while read -r -u "$fdA" lineA && read -r -u "$fdB" lineB
do
    arrA+=("$lineA") # append line at the end of array
    arrB+=("$lineB")
done

exec {fdA}>&- {fdB}>&-

# check the result
for i in "${!arrA[@]}"
do
    two_lines_operation "${arrA[$i]}" "${arrB[$i]}"
done

Let’s notice that when printing the arrays’ elements, we loop over the indices of the first array, ${!arrA[@]}. This is correct as long as both arrays have equal lengths.

Next, let’s check the output:

$ ./areader

Doing something with lines from two files:
fileA.txt line: File A - line #1
fileB.txt line: File B - line #1

Doing something with lines from two files:
fileA.txt line: File A - line #2
# ... more output skipped

6. Using mapfile

The mapfile command reads lines from standard input into an array. Therefore, we don’t need to write an explicit loop.

The command has been a builtin since Bash version 4, and readarray is a synonym for it.

6.1. Reading in Whole Files

Let’s read two files into corresponding arrays with the mareader script:

#!/bin/bash

source mockup

mapfile -t arrA < fileA.txt
mapfile -t arrB < fileB.txt

for i in "${!arrA[@]}"
do 
    two_lines_operation "${arrA[$i]}" "${arrB[$i]}"
done

Let’s notice the -t option, which strips the trailing newline from each line read.

Now let’s check the usual output:

$ ./mareader

Doing something with lines from two files:
fileA.txt line: File A - line #1
fileB.txt line: File B - line #1

Doing something with lines from two files:
fileA.txt line: File A - line #2
# ... more output skipped
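The loop in mareader walks over the indices of arrA, so it relies on both files having the same number of lines. If that isn’t guaranteed, one possible guard is to stop at the shorter array, as in this sketch. The file contents and the limit variable are assumptions for the illustration:

```shell
#!/bin/bash

tmp=$(mktemp -d)
printf '%s\n' 'A1' 'A2' 'A3' > "$tmp/fileA.txt"
printf '%s\n' 'B1' 'B2' > "$tmp/fileB.txt"

mapfile -t arrA < "$tmp/fileA.txt"
mapfile -t arrB < "$tmp/fileB.txt"

# Use the shorter array's length as the loop bound
limit=${#arrA[@]}
if [ "${#arrB[@]}" -lt "$limit" ]
then
    limit=${#arrB[@]}
fi

for ((i = 0; i < limit; i++))
do
    printf '%s <-> %s\n' "${arrA[$i]}" "${arrB[$i]}"
done

rm -rf "$tmp"
```

Whether to stop early, pad with empty strings, or report an error depends on the task, so this is only one of several reasonable choices.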

6.2. Reading in a Part of the File

Let’s read only a part of the file. With the -s option, we discard an initial number of lines. Then, with the -n option, we read at most a specified number of lines. As an example, we’re going to read only the second line of each file:

#!/bin/bash

source mockup
mapfile -t -n1 -s1 arrA < fileA.txt
mapfile -t -n1 -s1 arrB < fileB.txt

for i in "${!arrA[@]}"
do 
    two_lines_operation "${arrA[$i]}" "${arrB[$i]}"
done

Let’s run the script:

$ ./mareader

Doing something with lines from two files:
fileA.txt line: File A - line #2
fileB.txt line: File B - line #2
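The two options combine freely. As a small sketch on a made-up four-line file, we can skip one line and then keep the next two:

```shell
#!/bin/bash

sample=$(mktemp)
printf '%s\n' 'one' 'two' 'three' 'four' > "$sample"

# Skip the first line (-s 1), then store at most two lines (-n 2)
mapfile -t -s 1 -n 2 part < "$sample"

printf 'Elements: %d\n' "${#part[@]}"
printf '%s\n' "${part[@]}"

rm -f "$sample"
```

The array part ends up holding the second and third lines of the file.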

7. Playing Around With paste

The paste command reads several input files at once and prints their corresponding lines side by side. However, if we want to perform other operations, we need access to each line.

Therefore, let’s feed a while read loop with paste’s output. Because the tab is the default separator for the paste command, we’re going to use it as IFS too in the preader script:

#!/bin/bash

source mockup

while IFS=$'\t' read -r lineA lineB
do
    two_lines_operation "$lineA" "$lineB"
done < <(paste fileA.txt fileB.txt)

Let’s notice the use of the process substitution <(paste fileA.txt fileB.txt). Briefly, with this construct, read sees paste’s output as a file. Let’s also pay attention to the space between the two < characters: the first one is the redirection operator, and the second one belongs to the process substitution.
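One practical reason to prefer process substitution over a pipeline is that, with Bash’s default settings, a loop at the end of a pipeline runs in a subshell and loses its variables. The sketch below compares both forms. The counter names are assumptions for this illustration:

```shell
#!/bin/bash

# Pipeline: the loop runs in a subshell, so its changes to piped are lost
piped=0
printf '%s\n' 'a' 'b' 'c' | while read -r line
do
    piped=$((piped + 1))
done

# Process substitution: the loop runs in the current shell
substituted=0
while read -r line
do
    substituted=$((substituted + 1))
done < <(printf '%s\n' 'a' 'b' 'c')

echo "After the pipeline: $piped"
echo "After process substitution: $substituted"
```

In a script run with default options, only the second counter keeps its value after the loop.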

Now, let’s check the output:

$ ./preader

Doing something with lines from two files:
fileA.txt line: File A - line #1
fileB.txt line: File B - line #1

Doing something with lines from two files:
fileA.txt line: File A - line #2
# ... more output skipped

This script breaks if a line of the first file itself contains a tab, because read then splits that line at the wrong place. Of course, we can use any other separator for both paste and IFS, choosing one that isn’t expected in fileA.txt.
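As a sketch of that idea, the example below uses the pipe character as the separator, assuming it never occurs in fileA.txt. The sample files deliberately contain a tab and an empty line, and the pairs array is a name chosen for this illustration:

```shell
#!/bin/bash

tmp=$(mktemp -d)
# fileA.txt: a line with a tab, an empty line, and a plain line
printf 'A\tone\n\nA three\n' > "$tmp/fileA.txt"
printf '%s\n' 'B1' 'B2' 'B3' > "$tmp/fileB.txt"

pairs=()
# Assumption: '|' does not appear in fileA.txt
while IFS='|' read -r lineA lineB
do
    pairs+=("[$lineA] [$lineB]")
done < <(paste -d '|' "$tmp/fileA.txt" "$tmp/fileB.txt")

printf '%s\n' "${pairs[@]}"
rm -rf "$tmp"
```

A side benefit is that a non-whitespace separator keeps an empty line from fileA.txt as an empty first field. With a tab, which read treats as whitespace, the leading separator would be stripped and the fileB.txt line would land in lineA.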

8. Miscellaneous Commands

Now let’s look through solutions that use standard command-line utilities to select lines. So, we’re going to read a line from the first file and then retrieve the corresponding one from the second file.

Let’s notice that this approach is similar to the nested loop example and is similarly less efficient than the others presented so far. However, for small files, it may be acceptable.

8.1. The head and tail Solution

Let’s get the nth line from the file with the construct head -n | tail -1:

#!/bin/bash

source mockup

nr=0

while read -r lineA
do
    nr=$((nr + 1))
    # print the first nr lines of fileB.txt and keep only the last of them
    two_lines_operation "$lineA" "$(head -n "$nr" fileB.txt | tail -n 1)"
done < fileA.txt
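One detail worth knowing about this construct is what happens when we ask for a line past the end of the file. The sketch below, on a made-up three-line file, shows both cases:

```shell
#!/bin/bash

sample=$(mktemp)
printf '%s\n' 'one' 'two' 'three' > "$sample"

# Line 2 exists: head prints two lines and tail keeps the last of them
second=$(head -n 2 "$sample" | tail -n 1)

# Line 5 does not exist: head prints all three lines, tail keeps the last
beyond=$(head -n 5 "$sample" | tail -n 1)

echo "Line 2: $second"
echo "Line 5: $beyond"

rm -f "$sample"
```

So, if fileB.txt is shorter than fileA.txt, this approach keeps returning the last line of fileB.txt instead of an empty string.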

8.2. The sed Solution

Next, let’s use sed, the stream editor, to find the nth line and quit immediately:

#!/bin/bash

source mockup

nr=0

while read -r lineA
do
    nr=$((nr + 1))
    # d discards every line before line nr; q prints line nr and quits
    two_lines_operation "$lineA" "$(sed "${nr}q;d" fileB.txt)"
done < fileA.txt
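To see how the q;d pair works in isolation, we can run it on a made-up three-line file, as in this sketch:

```shell
#!/bin/bash

sample=$(mktemp)
printf '%s\n' 'one' 'two' 'three' > "$sample"

# On line 2, q prints the line and stops; d deletes line 1 before that
second=$(sed '2q;d' "$sample")

# Line 5 is never reached, so every line is deleted and nothing is printed
beyond=$(sed '5q;d' "$sample")

echo "Line 2: $second"
echo "Line 5: [$beyond]"

rm -f "$sample"
```

Because sed quits at the requested line, it doesn’t read the rest of the file.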

8.3. The awk Solution

Finally, let’s use awk:

#!/bin/bash

source mockup

nr=0

while read -r lineA
do
    nr=$((nr + 1))
    # pass nr to awk as the variable n instead of splicing it into the program
    two_lines_operation "$lineA" "$(awk -v n="$nr" 'NR == n {print; exit}' fileB.txt)"
done < fileA.txt
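The awk program can also be tried on its own. This sketch passes the line number with the -v option, which keeps the program text in single quotes. The sample file and the variable n are assumptions for this illustration:

```shell
#!/bin/bash

sample=$(mktemp)
printf '%s\n' 'one' 'two' 'three' > "$sample"

n=2
# NR is the current line number; exit stops awk after the matching line
second=$(awk -v n="$n" 'NR == n { print; exit }' "$sample")

echo "Line $n: $second"

rm -f "$sample"
```

Passing the value with -v is a matter of style here, since a plain integer is also safe to embed in a double-quoted program.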

9. Conclusion

In this article, we learned how to read corresponding lines from two input files. We started from a naive, nested loop solution. Then, we used the while read loop to access both files simultaneously, within a single pass of the loop.

Then, we split the problem into two parts. First, we read the files into the Bash arrays, subsequently processing the arrays’ elements. We populated the arrays either directly with read or using the mapfile command.

Next, we approached the problem with the paste command.

Finally, we looked at miscellaneous command-line utilities, which help read a line with a given number.
