Comparing Strings Using the diff Command

1. Introduction

Sometimes in Linux, we wish to compare text. The diff command is a common solution for this. It provides a few different ways of outputting text differences.

In this short tutorial, we’ll look at how to use diff to compare files and strings.

2. diff Command

We can use the diff command to compare the contents of files. With a little help from the shell, it can also spot differences between two strings.

2.1. How diff Works

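diff finds the longest common subsequence (LCS) of lines, that is, the longest series of lines that appears in both inputs in the same order, though not necessarily contiguously. Everything outside that subsequence is reported as a difference. The classic dynamic-programming solution to the LCS problem takes O(N×M) time, where N and M are the numbers of lines in the two inputs.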

The diff command points out two main types of differences: lines that have been removed and lines that have been added. Removed lines appear in the first input but not the second, while added lines are the opposite, present in the second input but not the first. When a line differs between the inputs, diff reports a change, which is a removal paired with an addition.
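To see a pure addition and a pure removal in isolation, here is a minimal sketch. The file names and the temporary directory are made up for illustration, and printf is used so that \n reliably becomes a newline.

```shell
# Illustrative files in a throwaway directory (names are hypothetical).
workdir=$(mktemp -d)
printf 'box\ntable\n' > "$workdir/short.txt"
printf 'box\ntable\nchair\n' > "$workdir/long.txt"

# diff exits with status 1 when its inputs differ, so "|| true"
# keeps the script going even under "set -e".

# The second input has an extra line: diff reports an addition (a).
added=$(diff "$workdir/short.txt" "$workdir/long.txt" || true)
echo "$added"

# The first input has an extra line: diff reports a deletion (d).
removed=$(diff "$workdir/long.txt" "$workdir/short.txt" || true)
echo "$removed"

rm -r "$workdir"
```

In the output, 2a3 means that after line 2 of the first input, line 3 of the second was added, while 3d2 means that line 3 of the first input was deleted.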

2.2. Installation

diff is part of the diffutils package, which most Linux distributions install by default. If it’s missing, we can add it with our package manager, for example on Debian or Ubuntu:

$ sudo apt update
$ sudo apt install diffutils

After this, the diff command is available on the system.
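Before installing anything, we may want to check whether diff is already present. A small sketch, where the diff_status variable is only an illustrative name:

```shell
# command -v prints the path of a command and fails if it is not found.
if command -v diff >/dev/null 2>&1; then
    diff_status="installed"
else
    diff_status="missing"
fi
echo "diff is $diff_status"
```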

3. Comparison of Files and Strings

First, let’s view the contents of two sample files using the cat command:

$ cat Info1.txt

box
table
chair

$ cat Info2.txt

box
table
monitor

Now we can use diff to compare the files:

$ diff Info1.txt Info2.txt

3c3
< chair
---
> monitor

Here, 3c3 means that line 3 of the first file must change to match line 3 of the second. Lines starting with < come from the first file or string, while lines starting with > come from the second. The --- line separates the two.
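In scripts, we often only need to know whether two files differ. diff reports this through its exit status: 0 when the inputs are identical, 1 when they differ, and 2 when something went wrong. The sketch below recreates the two sample files in a temporary directory, which is an assumption made to keep the example self-contained.

```shell
workdir=$(mktemp -d)
printf 'box\ntable\nchair\n' > "$workdir/Info1.txt"
printf 'box\ntable\nmonitor\n' > "$workdir/Info2.txt"

# Different files: the exit status is 1.
status_differ=0
diff "$workdir/Info1.txt" "$workdir/Info2.txt" > /dev/null || status_differ=$?

# A file compared with itself: the exit status is 0.
status_same=0
diff "$workdir/Info1.txt" "$workdir/Info1.txt" > /dev/null || status_same=$?

echo "differ: $status_differ, same: $status_same"
rm -r "$workdir"
```

Capturing the status with || also keeps the script from aborting under set -e when the files differ.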

3.1. Temporary Files

If we don’t already have our text for comparison in files, we can turn each string into a temporary file before using the diff command:

$ echo -e "example\nz\nx\ny" > str1.txt
$ echo -e "example\nz\nx\nv" > str2.txt
$ diff str1.txt str2.txt

The -e option makes Bash’s echo interpret \n as a newline, so each file holds four lines. diff then reports the one line that differs:

4c4
< y
---
> v

After comparing, we can remove these temporary files:

$ rm str1.txt str2.txt
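The create, compare, and remove steps can be bundled into one function. This is only a sketch: compare_strings is a hypothetical helper name, and mktemp is used so that the temporary files don’t clash with existing ones.

```shell
# Hypothetical helper: diff two strings through temporary files.
compare_strings() {
    local first second status
    first=$(mktemp)
    second=$(mktemp)
    # printf '%s\n' writes the string as-is, followed by a newline.
    printf '%s\n' "$1" > "$first"
    printf '%s\n' "$2" > "$second"
    status=0
    diff "$first" "$second" || status=$?
    # Clean up before handing diff's exit status back to the caller.
    rm -f "$first" "$second"
    return "$status"
}

# Usage: command substitution turns \n into real newlines.
compare_strings "$(printf 'example\nz\nx\ny')" "$(printf 'example\nz\nx\nv')" || true
```

Because the function returns the exit status of diff, it can be used directly in an if statement.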

3.2. Here Strings

In Bash, a here string (<<<) passes a string to a command on one of its file descriptors, by default standard input. diff needs two inputs, so we attach each string to its own descriptor and point diff at them through /dev/fd:

$ diff /dev/fd/3 /dev/fd/4 3<<< $'example\nfirst\nstring' 4<<< $'example\nsecond\nstring'

2c2
< first
---
> second

Here, $'...' is Bash quoting that turns \n into a real newline. Writing two plain <<< redirections doesn’t work, since both would target standard input and only the last one would take effect.
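A here string is also handy when only one side of the comparison is a string. diff reads standard input when an operand is -, so a here string can stand in for one of the two files. A minimal Bash sketch, in which the reference file and the variable names are illustrative:

```shell
#!/usr/bin/env bash
# Illustrative reference file holding the first text.
reference=$(mktemp)
printf 'example\nfirst\nstring\n' > "$reference"

# $'...' turns \n into real newlines; the here string adds a final newline.
candidate=$'example\nsecond\nstring'

# "-" tells diff to read its second input from standard input.
# diff exits with 1 when the inputs differ, hence "|| true".
result=$(diff "$reference" - <<< "$candidate" || true)
echo "$result"

rm -f "$reference"
```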

3.3. Process Substitution

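Process substitution, written <(command), lets us pass the output of a command where a file name is expected. First, let’s view the contents of both dir1 and dir2 using the ls command: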

$ ls dir1
str1.txt
str2.txt

$ ls dir2

str1.txt
str4.txt

Following that, we use the diff command to compare the ls output for the two directories:

$ diff <(ls dir1) <(ls dir2)

2c2
< str2.txt
---
> str4.txt

Process substitution works just as well for strings. We wrap each string in an echo command and pass the results to diff as its two inputs:

$ diff <(echo -e "This\nis\na\ntext") <(echo -e "This\nis\na\nfile")

4c4
< text
---
> file

In both cases, diff reads the output of the commands as if it were reading two files, first for the directory listings and then for the strings.
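For repeated use, the string comparison can be wrapped in a small Bash function. This is a sketch rather than a standard tool: strdiff is a hypothetical name, and printf '%s\n' is swapped in for echo so that the strings are written exactly as given.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare two strings via process substitution.
strdiff() {
    diff <(printf '%s\n' "$1") <(printf '%s\n' "$2")
}

# Usage: $'...' makes Bash turn \n into real newlines.
# diff exits with 1 when the strings differ, hence "|| true".
strdiff $'This\nis\na\ntext' $'This\nis\na\nfile' || true
```

No temporary files are left behind, since the shell manages the file descriptors behind each <( ).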

4. Output Formats of the diff Command

We select the output format of diff with command-line flags. We’ll see a few different output formats in this section.

4.1. Default Format

Let’s look at diff’s default output format:

$ diff <(echo -e "example\n1\n2\n3") <(echo -e "example\n1\n5\n3")

3c3
< 2
---
> 5

As we can see, diff reports the changed line in the same form as in the previous examples.

4.2. Unified Format

The unified format shows each difference together with its position and a few surrounding context lines. It’s the format commonly used by version control tools such as Git. Here’s an example:

$ diff -u <(echo -e "x\ny\nz") <(echo -e "x\ny\nv")

@@ -1,3 +1,3 @@
 x
 y
-z
+v

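The first line, the hunk header, gives the position of the hunk: -1,3 means it covers 3 lines of the first string starting at line 1, and +1,3 means the same for the second string. diff also prints two header lines that name the inputs, starting with --- and +++, which we’ve left out here. Within the hunk, lines starting with a space are unchanged context, a - marks a line found only in the first string, and a + marks a line found only in the second.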
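With process substitution, the two header lines of the unified format contain paths such as /dev/fd/63 and a timestamp, which aren’t very informative. Assuming GNU diff, the --label option replaces them with names of our choosing. In this sketch, the labels old and new are arbitrary:

```shell
#!/usr/bin/env bash
# --label (GNU diff) replaces the file name and timestamp in the header;
# the first label applies to the first input, the second to the second.
unified=$(diff -u --label old --label new \
    <(printf 'x\ny\nz\n') <(printf 'x\ny\nv\n') || true)
echo "$unified"
```

The output starts with the lines --- old and +++ new, followed by the same hunk as above.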

4.3. Side-By-Side Format

The side-by-side form can be a little easier to read:

$ diff -y <(echo -e "This\nis\nsoftware") <(echo -e "This\nis\nhardware")

This                                      This
is                                        is
software                                | hardware

The | symbol marks a line that differs between the two strings. The left column is the first string and the right column is the second.
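For longer inputs, the side-by-side view can get wide and noisy. Assuming GNU diff, -W sets the total output width and --suppress-common-lines hides the lines that match. A short sketch, where the width of 40 columns is an arbitrary choice:

```shell
#!/usr/bin/env bash
# Show only the differing lines, in a 40-column side-by-side layout.
# diff exits with 1 when the inputs differ, hence "|| true".
changed_only=$(diff -y -W 40 --suppress-common-lines \
    <(printf 'This\nis\nsoftware\n') <(printf 'This\nis\nhardware\n') || true)
echo "$changed_only"
```

Only the software and hardware line remains, still separated by the | marker.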

4.4. The ed Script Format

Lastly, the -e option makes diff produce a script for the ed editor that turns the first input into the second. It contains only the commands and text needed for the change, with no surrounding context:

$ diff -e <(echo -e "L\ni\nn\nu\nx\na") <(echo -e "L\ni\nn\nu\nx\nb")

6c
b
.

It shows us three essential pieces of information: the line number (6), the command (c for change), and the replacement text (b). The lone . on the last line ends the text input for ed.
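To show what such a script is for, the sketch below applies it to a file with ed. It assumes that ed may not be installed and skips that step if so. The file name and the variables are illustrative. Since the script from diff -e only edits the buffer, we append w (write) and q (quit) ourselves.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
printf 'L\ni\nn\nu\nx\na\n' > "$workdir/old.txt"

# Capture the ed script; diff exits with 1 when the inputs differ.
ed_script=$(diff -e "$workdir/old.txt" <(printf 'L\ni\nn\nu\nx\nb\n') || true)
echo "$ed_script"

patched="skipped"
if command -v ed >/dev/null 2>&1; then
    # Feed the script plus write and quit commands to ed (-s: quiet).
    printf '%s\nw\nq\n' "$ed_script" | ed -s "$workdir/old.txt"
    # The last line of the file should now be b instead of a.
    if [ "$(tail -n 1 "$workdir/old.txt")" = "b" ]; then
        patched="yes"
    fi
fi
echo "patched: $patched"
rm -r "$workdir"
```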

5. Conclusion

In this article, we discussed a few ways of using the diff command to compare strings.

First, we saw how the diff command works in general and how to install it. Then we discussed different ways of passing strings to diff, finding process substitution to be the most convenient.

We also looked at several output formats of the diff command, each suited to a different purpose.

Finally, we saw that this versatility makes diff a solid default tool for comparing strings.
