1. Introduction
File and directory comparisons have many uses. From pre-backup assessments through collaboration and version control to administrative verification, knowing how to check data and its structure for similarities can save time and resources.
In this tutorial, we explore how to compare a local directory to a remote one. First, we create a simple setup. Next, we discuss comparing directories in general, and we pick a tool for our needs. After that, we discuss mounts as a possible option when comparing local and remote directories. Then, we turn to a ubiquitous remote access protocol for possible solutions. Finally, we leverage a classic tool for remote synchronization.
We tested the code in this tutorial on Debian 11 (Bullseye) with GNU Bash 5.1.4. It should work in most POSIX-compliant environments.
2. Sample Directory Structure
To demonstrate comparisons, we have three directories:
- /dir1 on localhost, i.e., localhost:/dir1
- /dir2 on localhost, i.e., localhost:/dir2
- /rdir on xost, i.e., xost:/rdir
Let’s include some example contents in /dir1:
$ tree /dir1
/dir1
├── file1
├── file2
├── file3
└── subdir
    └── subfile

1 directory, 4 files
Similarly, we also populate /rdir on xost:
$ tree /rdir
/rdir
├── file1
└── subdir
    └── subfile
    └── subfile2

1 directory, 3 files
Here, we use tree to confirm the directory contents.
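To follow along without writing to the filesystem root, one way to recreate the sample contents of /dir1 is sketched below. The scratch location from mktemp and the base variable are assumptions for illustration only, since the tutorial itself works with /dir1 directly:

```shell
#!/usr/bin/env bash
# Sketch: recreate the sample layout of /dir1 under a scratch directory.
# "base" is a hypothetical variable; the tutorial uses /dir1 at the filesystem root.
base=$(mktemp -d)

mkdir -p "$base/dir1/subdir"
touch "$base/dir1/file1" "$base/dir1/file2" "$base/dir1/file3"
touch "$base/dir1/subdir/subfile"

# List everything below dir1, relative to it, in a stable order
(cd "$base/dir1" && find . -mindepth 1 | LC_ALL=C sort)
```

The same commands, run on the remote host with a different set of files, would produce a layout like /rdir.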
3. Comparing Directories
The GNU Diffutils package contains de facto standard tools for file and directory comparison.
In it, there are several commands for the purpose:
- cmp for comparisons of two files byte by byte, usually best for binary data
- diff3 for comparisons of three files by line
- sdiff for side-by-side comparisons and interactive merging of two files
- diff for general file and directory comparisons
Usually, we’d only focus on the main diff command since it can compare two local directories:
$ diff /dir1 /dir2
[...]
We can also choose only to summarize whether files differ by adding the --brief flag:
$ diff --brief /dir1 /dir2
[...]
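This way, diff reports which files exist on only one side, as well as which files have the same names but different content. By default, it only looks at the top level of each directory, so we can add the --recursive (-r) flag to descend into subdirectories as well.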
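As a minimal sketch of such a summary on two local directories, the example below builds two small scratch trees and compares them. The left and right directories are hypothetical stand-ins for /dir1 and /dir2, and LC_ALL=C is set only to keep the messages in English:

```shell
#!/usr/bin/env bash
# Sketch: recursive, summary-only comparison of two local directories.
# The scratch directories "left" and "right" are hypothetical stand-ins for /dir1 and /dir2.
work=$(mktemp -d)
left=$work/left
right=$work/right
mkdir -p "$left/subdir" "$right/subdir"

echo 'same' > "$left/file1";  echo 'same' > "$right/file1"
echo 'old'  > "$left/file2";  echo 'new'  > "$right/file2"
echo 'only here' > "$left/subdir/subfile"

# --recursive descends into subdirectories; --brief reports only whether files differ.
# diff exits with status 1 when it finds differences, so we keep the status separately.
report=$(LC_ALL=C diff --recursive --brief "$left" "$right")
status=$?

printf '%s\n' "$report"
echo "diff exit status: $status"
```

Here, the exit status is handy in scripts: 0 means no differences, 1 means differences were found, and 2 signals trouble.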
However, diff won’t work directly if we simply supply a remote host path like xost:/rdir. Let’s see how we can work around that.
4. Remote Directory Comparison via Mounts
The standard way of making a remote path look like a local one is via mounts.
For example, assuming we have public key authentication already in place, we can perform an SSHFS mount and run diff directly on the directories of interest:
$ mkdir /xost-rdir
$ sshfs -o allow_other,default_permissions xost:/rdir /xost-rdir
$ diff /dir1 /xost-rdir
[...]
The mount integrates the remote xost:/rdir path into the local filesystem tree at /xost-rdir. Of course, we can also employ SMB/CIFS via Samba, NFS, and others instead of SSHFS.
In any case, diff and any other comparison tools can work seamlessly on the mounted directory. Still, there are other options.
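For a one-off comparison, the mount, the comparison, and the cleanup can be wrapped together. The function below is only a sketch: diff_via_sshfs is a hypothetical helper, and it assumes that sshfs and fusermount are installed and that key-based SSH access to the remote host already works:

```shell
#!/usr/bin/env bash
# Sketch: mount a remote directory read-only, compare, then unmount.
# diff_via_sshfs is a hypothetical helper; it assumes sshfs and fusermount are installed
# and that key-based SSH access to the remote host already works.
diff_via_sshfs() {
  local local_dir=$1 remote_spec=$2   # remote_spec looks like host:/path
  local mount_point status
  mount_point=$(mktemp -d) || return 2

  # -o ro keeps the comparison from changing anything on the remote side
  if ! sshfs -o ro "$remote_spec" "$mount_point"; then
    rmdir "$mount_point"
    return 2
  fi

  diff --recursive --brief "$local_dir" "$mount_point"
  status=$?

  # Always unmount and remove the temporary mount point
  fusermount -u "$mount_point"
  rmdir "$mount_point"
  return "$status"
}

# Example call (not executed here): diff_via_sshfs /dir1 xost:/rdir
```

Mounting read-only is a deliberate choice here, since a comparison shouldn't be able to modify the remote data.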
5. Remote Directory Comparison via SSH
Working over the ubiquitous Secure Shell (SSH) protocol with default public key authentication already in place, we can compare directory structures in several steps.
5.1. Acquire Directory Structure
The first step is to use tools that return a uniform representation of the data we’re after.
To get a view of the directory structure and files, we can use one of many methods:
- tree has flags to flatten, sort, and recursively output directory trees
- ls can add more information to a directory structure
- find with its -printf switch can refine the data format of its output
- du may require some further output processing with cut, but also works
For example, let’s use tree to get our directory structure into a file:
$ tree /dir1 > /dir1.tree
$ cat /dir1.tree
/dir1
├── file1
├── file2
├── file3
└── subdir
    └── subfile

1 directory, 4 files
After this, we have the local data in the /dir1.tree file. Getting a remote directory structure via SSH is relatively straightforward:
$ ssh xost 'tree /rdir' > /rdir.tree
$ cat /rdir.tree
/rdir
├── file1
└── subdir
    └── subfile
    └── subfile2

1 directory, 3 files
Now, we have what we need in files. In our case, that’s only the paths and their relation.
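As a variation on the tree approach, find can print one path per line relative to the starting directory, so the two listings stay directly comparable even though the root directory names differ. The sketch below assumes GNU find, since the -printf switch isn't part of POSIX. In addition, list_paths is a hypothetical helper, and the scratch directory stands in for /dir1:

```shell
#!/usr/bin/env bash
# Sketch: one relative path per line, with a type marker, in a stable order.
# list_paths is a hypothetical helper; -printf requires GNU find.
list_paths() {
  # %y is the file type (f, d, l, ...), %P is the path relative to the start directory
  find "$1" -mindepth 1 -printf '%y %P\n' | LC_ALL=C sort -k2
}

# Demo on a scratch directory standing in for /dir1
demo=$(mktemp -d)
mkdir -p "$demo/subdir"
touch "$demo/file1" "$demo/subdir/subfile"

# The listing file lives next to the directory, so it doesn't show up in its own output
list_paths "$demo" > "$demo.list"
cat "$demo.list"

# The same pipeline could run remotely, for example:
#   ssh xost "find /rdir -mindepth 1 -printf '%y %P\n' | LC_ALL=C sort -k2" > /rdir.list
```

Sorting in the C locale on both sides avoids spurious differences when the two hosts use different collation rules.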
5.2. Compare Results
At this point, we can simply choose a method to compare the files that contain directory tree data:
$ sdiff /dir1.tree /rdir.tree
/dir1                         | /rdir
├── file1                       ├── file1
├── file2                     <
├── file3                     <
└── subdir                      └── subdir
    └── subfile                     └── subfile
                              >     └── subfile2

1 directory, 4 files          | 1 directory, 3 files
Using sdiff usually allows for an easier side-by-side comparison of the results.
5.3. Single Command
Finally, we can combine all of the above into a single command:
$ sdiff <(tree /dir1) <(ssh xost 'tree /rdir')
Using process substitution, the compact command above yields the same results as earlier.
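The same idea can be generalized so that either side may be local or remote. The functions below are a sketch under a few assumptions: list_tree and compare_trees are hypothetical helpers, a remote side is written as host:/path, key-based SSH access is in place, and the remote path needs no shell quoting:

```shell
#!/usr/bin/env bash
# Sketch: compare two directory listings where either side may be local or remote.
# list_tree and compare_trees are hypothetical helpers; a remote side is written as
# host:/path and assumes key-based SSH access plus find and sort on that host.
list_tree() {
  local spec=$1
  case $spec in
    *:*)
      # Remote: run the listing on the other host (path must not need shell quoting)
      ssh "${spec%%:*}" "cd ${spec#*:} && find . -mindepth 1 | LC_ALL=C sort"
      ;;
    *)
      # Local: same listing, run in a subshell so the working directory is untouched
      (cd "$spec" && find . -mindepth 1 | LC_ALL=C sort)
      ;;
  esac
}

compare_trees() {
  # Process substitution feeds both listings to diff without temporary files
  diff <(list_tree "$1") <(list_tree "$2")
}

# Local demo; a remote call would look like: compare_trees /dir1 xost:/rdir
a=$(mktemp -d); b=$(mktemp -d)
touch "$a/file1" "$a/file2" "$b/file1"
result=$(compare_trees "$a" "$b")
printf '%s\n' "$result"
```

Because the helper treats any argument containing a colon as remote, a local path with a colon in its name would need different handling.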
6. Remote Directory Comparison via rsync
As usual, we can employ rsync for directory tasks.
Let’s use it on our example directories:
$ rsync --dry-run --delete --recursive --links --out-format=%n /dir1/ xost:/rdir/
file1
file2
file3
deleting subdir/subfile2
subdir/subfile
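Critically, rsync accepts remote paths like xost:/rdir/ directly. The source path should end with a / slash, so that rsync compares the contents of /dir1 instead of looking for a dir1 directory inside the destination.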
With its --dry-run or -n flag, rsync outputs what it would do without actually doing it, while --delete ensures we also see items that exist only at the destination. Further, we traverse directories with --recursive, handle symbolic links as links instead of skipping them via --links, and print only the name of each affected item (%n) through --out-format. By default, rsync considers a file changed when its size or modification time differs. Thus, we use rsync to compare directory contents.
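Since rsync works the same way with two local paths, the dry run can be tried safely before pointing it at a remote host. The wrapper below is a sketch: rsync_compare is a hypothetical helper that only adds the trailing slashes, and the scratch directories stand in for /dir1 and xost:/rdir:

```shell
#!/usr/bin/env bash
# Sketch: use an rsync dry run as a comparison report between two directories.
# rsync_compare is a hypothetical helper; either argument may be a remote host:/path/.
rsync_compare() {
  # --dry-run: report only; --delete: also list items missing from the source;
  # --out-format=%n: print just the name of each affected item
  rsync --dry-run --delete --recursive --links --out-format='%n' "${1%/}/" "${2%/}/"
}

# Local demo with scratch directories standing in for /dir1 and xost:/rdir
src=$(mktemp -d); dst=$(mktemp -d)
touch "$src/file2"
touch "$dst/extra"

if command -v rsync >/dev/null 2>&1; then
  report=$(rsync_compare "$src" "$dst")
  printf '%s\n' "$report"
fi
```

After the run, both directories are unchanged, which is the point of --dry-run: the report lists file2 as a pending transfer and extra as a pending deletion without performing either.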
If we’re also interested in the file contents, removing --out-format and adding the --itemize-changes flag of rsync along with --checksum will show changes per file:
$ rsync --dry-run --delete --recursive --links --checksum --itemize-changes /dir1/ xost:/rdir/
<fcsT...... file1
<f+++++++++ file2
<f+++++++++ file3
*deleting   subdir/subfile2
Although they are pretty self-explanatory, we can use the section about --itemize-changes in the rsync manual to decode all output prefixes:
YXcstpoguax  path/to/file
|||||||||||
||||||||||╰- x: extended attribute information changed
|||||||||╰-- a: ACL information changed
||||||||╰--- u: (reserved for future use)
|||||||╰---- g: group is different
||||||╰----- o: owner is different
|||||╰------ p: permissions are different
||||╰------- t: modification time is different
|||╰-------- s: size is different
||╰--------- c: checksum is different (for regular files), or
||              changed value (for symlinks, devices, and special files)
|╰---------- file type:
|            f: file,
|            d: directory,
|            L: symlink,
|            D: device,
|            S: special file (e.g., named socket, FIFO)
╰----------- update type:
             <: file should go to the remote host (send)
             >: file should go to the local host (receive)
             c: local change or creation of the item
             h: item is a hard link to another item (requires --hard-links)
             .: item is not being updated (does not include attributes)
             *: rest is a message (e.g., "deleting")
Just like our earlier example, understanding this output is the key to knowing the differences between directories.
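To make long reports easier to scan, the prefix can also be decoded in a script. The function below is a rough sketch in Bash: describe_itemized is a hypothetical helper that covers only the fields from the legend above and treats any character other than a dot or space as a flagged attribute:

```shell
#!/usr/bin/env bash
# Sketch: turn an --itemize-changes prefix into a short description.
# describe_itemized is a hypothetical helper covering only the fields listed above.
describe_itemized() {
  local code=$1 names='checksum size mtime perms owner group reserved acl xattr'
  local update kind attrs='' i ch name

  # First character: update type
  case ${code:0:1} in
    '<') update='send to remote' ;;
    '>') update='receive locally' ;;
    c)   update='local change or creation' ;;
    h)   update='hard link' ;;
    .)   update='not updated' ;;
    '*') echo "message: ${code:1}"; return ;;
    *)   update='unknown' ;;
  esac

  # Second character: file type
  case ${code:1:1} in
    f) kind='file' ;; d) kind='directory' ;; L) kind='symlink' ;;
    D) kind='device' ;; S) kind='special file' ;; *) kind='unknown type' ;;
  esac

  # Remaining nine characters: one attribute each, or all "+" for a new item
  if [ "${code:2}" = '+++++++++' ]; then
    attrs='new item'
  else
    i=2
    for name in $names; do
      ch=${code:$i:1}
      # A dot or space means "unchanged"; any other character flags that attribute
      case $ch in .|' '|'') ;; *) attrs="${attrs:+$attrs, }$name" ;; esac
      i=$((i + 1))
    done
  fi

  echo "$update; $kind; ${attrs:-no attribute changes}"
}

describe_itemized '<fcsT......'   # prints: send to remote; file; checksum, size, mtime
describe_itemized '<f+++++++++'   # prints: send to remote; file; new item
describe_itemized '*deleting'     # prints: message: deleting
```

In practice, the first whitespace-separated field of each report line would be passed to the helper, with the path kept separately.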
7. Summary
In this article, we explored how to perform a comparison between a local and a remote directory. We looked at several options, including mounts, SSH, and rsync.
In conclusion, depending on the scenario, mounts are usually the best way to work with local and remote directories at the same time, but there are alternatives.