Remote File Synchronization in Linux

1. Introduction

When doing administrative tasks like transferring files, backing up folders, or installing new software, we often need a flexible, efficient, and safe tool. rsync is a versatile tool that satisfies all of these needs and more. Therefore, it’s a good investment for a Linux system administrator or user to learn how to use rsync.

In this tutorial, we’re going to learn how to use rsync to synchronize files and folders. We’ll learn how to do this on our local system, and also on remote machines.

2. A Sample Directory Structure

First, let’s create a sample directory structure. We’ll use this to run our rsync commands:

$ mkdir dir1
$ echo test > dir1/text-file.txt
$ echo test2 > dir1/text-file2.txt
$ mkdir dir1/nested-dir
$ echo test3 > dir1/nested-dir/file3.txt

This’ll create three files, one in a nested directory inside dir1/. We don’t create dir2/ here, because rsync will create it on the first sync. Let’s check the result using the tree command:

$ tree dir1/
dir1/
├── nested-dir
│   └── file3.txt
├── text-file2.txt
└── text-file.txt

3. Listing Files

As our first example, we can just use rsync to list the files in a directory. That’ll get us familiar with rsync’s output format:

$ rsync dir1/
drwxrwxr-x 4,096 2021/04/04 11:38:24 .
-rw-rw-r-- 5 2021/04/04 10:02:32 text-file.txt
-rw-rw-r-- 5 2021/04/04 10:02:32 text-file2.txt
drwxrwxr-x 4,096 2021/04/04 11:38:24 nested-dir

It only lists the files that are in dir1/, ignoring the files in dir1/nested-dir/. To make rsync recursively walk directories, we use the -r flag:

$ rsync -r dir1/
drwxrwxr-x 4,096 2021/04/04 11:38:24 .
-rw-rw-r-- 5 2021/04/04 10:02:32 text-file.txt
-rw-rw-r-- 5 2021/04/04 10:02:32 text-file2.txt
drwxrwxr-x 4,096 2021/04/04 11:38:24 nested-dir
-rw-rw-r-- 6 2021/04/04 11:38:24 nested-dir/file3.txt

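Note that the trailing / on a source directory is important when using rsync. With it, rsync operates on the contents of the directory. Without it, rsync operates on the directory itself: rsync dir1 would list only the dir1 entry, and a copy would create dir1/ inside the target.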

4. Synchronizing Local Directories

rsync can be used to synchronize local directories. An important property of rsync is that it works in one direction only, from the source to the target. Therefore, if we want to sync two directories bidirectionally, we should run rsync twice, swapping the directory order.

Let’s sync dir1/ to dir2/:

$ rsync -havun dir1/ dir2/

Let’s see what these command-line flags (h, a, v, u, n) mean:

The -h flag generates human-readable output.
The -a flag activates the archive mode. This is a shortcut for -rlptgoD, which means:

(-r) recursively walk directories
(-l) copy symlinks as symlinks
(-p) preserve permissions
(-t) preserve modification times
(-g) preserve group
(-o) preserve owner
(-D) preserve devices and special files

The -v flag increases verbosity to print out the details of what is done.
The -u flag activates update. This will skip files that are newer on the receiver.
The -n flag does a dry run. A dry run only simulates the sync and lists the output without actually syncing anything. It’s useful to see what we’ll be doing before actually running the command.

By doing a dry run (-n) before doing an actual sync operation, we can catch user errors that may cause data loss. Therefore, when using rsync, doing a dry run is an important first step.

Let’s run the command and check the output:

$ rsync -havun dir1/ dir2/
sending incremental file list
created directory dir2
./
text-file.txt
text-file2.txt
nested-dir/
nested-dir/file3.txt

If we’re sure these files are the ones that we want to sync, we can remove the -n flag and run the actual sync:

$ rsync -havu dir1/ dir2/
sending incremental file list
created directory dir2
./
text-file.txt
text-file2.txt
nested-dir/
nested-dir/file3.txt


sent 358 bytes received 111 bytes 938.00 bytes/sec
total size is 16 speedup is 0.03

If we run the sync a second time, rsync finds nothing left to transfer. Therefore, if we do a dry run again, it won’t list any files for syncing:

$ rsync -havun dir1/ dir2/
sending incremental file list


sent 199 bytes received 13 bytes 424.00 bytes/sec
total size is 16 speedup is 0.08 (DRY RUN)

We can also use this to test that our sync is completed successfully or to compare local and remote folders.

5. Synchronizing a Remote Directory

To sync a remote directory, we can use:

$ rsync -havuz dir1/ user@host:~/dir2/

We added a new flag, -z, which tells rsync to compress the data during transfer. Because we’re sending our files over the network, compression can speed up the transfer, especially on slow links and with compressible data such as text.

We can also use rsync to sync a remote folder to a local one by just changing the parameter order:

$ rsync -havuz user@host:~/dir2/ dir1/

A limitation of rsync by design is that it can’t sync two remote directories. Therefore, to sync two remote directories, we should consider other options. We can either sync to our local system first, or we can first create an ssh connection to one of the remote systems and do the sync from there.

6. Synchronizing a Remote Directory Over SSH

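The -e flag selects the remote shell that rsync uses for the connection. Recent rsync versions already default to ssh for user@host: paths, but we can set it explicitly: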

$ rsync -havuz -e ssh dir1/ user@host:~/dir2/

We can also set the port used by ssh using the same method:

$ rsync -havuz -e 'ssh -p 8022' dir1/ user@host:~/dir2/

7. Synchronizing a Single File

If we want to sync only a single file using rsync, we specify the path to the file itself as the source, without a trailing slash:

$ rsync -havuz dir1/text-file.txt dir2/

8. Using Other Flags

rsync also has some other flags that we may find useful while syncing folders.

8.1. Ignoring Existing Files

The --ignore-existing flag only syncs the files that are missing in the target directory. Let’s try it:

$ rsync -havuz --ignore-existing dir1/ dir2/

8.2. Deleting Files That Are Missing in the Source Directory

We use the --delete flag for stricter synchronization. Let’s use it to delete files in the target directory if they don’t exist in the source directory:

$ rsync -havuz --delete dir1/ dir2/

8.3. Deleting Files in the Source Directory After Synchronization

If we’re doing a task like syncing backups, we may want to delete the synced files in the source directory once they’re successfully transferred to the target. To achieve this, we use the --remove-source-files flag:

$ rsync -havuz --remove-source-files dir1/ dir2/

8.4. Including and Excluding Files

Let’s determine which files we’re going to sync using the --include and --exclude parameters:

$ rsync -havuz --include '*.jpg' --exclude '*' dir1/ dir2/

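This command will include only the *.jpg files, excluding all others. Because the '*' pattern also excludes directories, rsync won’t descend into subdirectories. Adding --include '*/' before the other rules lets it find *.jpg files at any depth.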

8.5. Setting the Maximum Size of Files to Be Transferred

Let’s set the maximum size of files to include in the transfer using the --max-size parameter:

$ rsync -havuz --max-size='100K' dir1/ dir2/

This can be helpful in many cases, such as if we’re interested in transferring only scripts and configuration files but not large binary files.

8.6. Displaying Progress

For sync operations that take longer to complete, we may want to show progress as the sync is running. Let’s add a progress indicator:

$ rsync -havuz --progress dir1/ dir2/

8.7. Using Checksum

By default, rsync decides whether a file needs to be transferred by comparing its size and last modification time. If we use the --checksum flag, rsync will instead compare a checksum of the source and target files whenever their sizes match, regardless of modification time. That catches files whose content differs even though their size and timestamp are identical. In exchange, both sides must read every file in full, which costs additional disk I/O and computational resources.

$ rsync -havuz --checksum dir1/ dir2/

9. Conclusion

In this article, we learned how to use the rsync command to sync files and directories, both on our local system and between local and remote machines. We also looked at various command-line options and parameters that change the behavior of rsync.
