RSync - Remote Synchronization Protocol

Jakob Jenkov
Last update: 2022-07-16

RSync is a remote file (or data) synchronization protocol that enables you to synchronize a file stored on a local computer against a file stored on a remote computer - so that after synchronization the local and remote files are identical. If there are any differences between the local and remote file, RSync detects these differences and exchanges only the differences (+ merge instructions) between the local and remote computer, so the two files can be made identical.

Efficient Remote File Synchronization

As mentioned above, RSync is capable of synchronizing files without sending a whole file across the network.

In an implementation I've done, only data corresponding to about 2% of the total file size is exchanged, in addition to any new data in the file, of course. New data has to be sent across the wire byte for byte, but it could be sent zip-compressed if the amount of new data is large.
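The parameters behind the 2% figure are not given here. Purely as an illustration of how an overhead of that order can arise, the sketch below assumes one checksum per fixed-size block is exchanged. The block size (1024 bytes), the checksum size (20 bytes) and the names ChecksumOverhead and overheadRatio are all hypothetical, chosen for this example only.

```java
class ChecksumOverhead {

    // Fraction of the file size spent on checksums when one checksum of
    // checksumBytesPerBlock bytes is sent for every block of blockSize bytes.
    // Both parameters are assumptions for illustration, not values taken
    // from any particular RSync implementation.
    static double overheadRatio(int blockSize, int checksumBytesPerBlock) {
        return (double) checksumBytesPerBlock / blockSize;
    }

    public static void main(String[] args) {
        double ratio = overheadRatio(1024, 20);
        System.out.printf("Checksum overhead: %.2f%% of the file size%n", 100 * ratio);
    }
}
```

Larger blocks lower this overhead, but a single changed byte then forces a larger block to be re-sent as new data.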

The more similar the files are, the less data RSync will have to exchange to perform the synchronization.
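To make the idea of exchanging only differences plus merge instructions more concrete, here is a minimal sketch of block-based difference detection. It is not the implementation referred to above. The class BlockDeltaSketch, its methods and the block size are hypothetical names and values chosen for this example. The original rsync algorithm pairs a cheap rolling checksum with a stronger hash so that the scan is fast. This sketch skips that optimization and computes an MD5 checksum at every offset, which is slow but easier to follow.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Base64;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class BlockDeltaSketch {

    /** One merge instruction: copy a block of the old file, or insert literal bytes. */
    static final class Instruction {
        final int blockIndex;  // index of the old-file block to copy, or -1 for literal data
        final byte[] literal;  // new bytes to insert, or null for a copy instruction

        Instruction(int blockIndex, byte[] literal) {
            this.blockIndex = blockIndex;
            this.literal = literal;
        }
    }

    static String checksum(byte[] data, int offset, int length) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            md.update(data, offset, length);
            return Base64.getEncoder().encodeToString(md.digest());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Side holding the old file: one checksum per full block. Only these cross the network. */
    static Map<String, Integer> blockChecksums(byte[] oldFile, int blockSize) {
        Map<String, Integer> checksums = new HashMap<>();
        for (int i = 0; i + blockSize <= oldFile.length; i += blockSize) {
            checksums.putIfAbsent(checksum(oldFile, i, blockSize), i / blockSize);
        }
        return checksums;
    }

    /** Side holding the new file: finds known blocks and collects everything else as literals. */
    static List<Instruction> diff(byte[] newFile, Map<String, Integer> oldChecksums, int blockSize) {
        List<Instruction> instructions = new ArrayList<>();
        ByteArrayOutputStream literal = new ByteArrayOutputStream();
        int pos = 0;
        while (pos < newFile.length) {
            Integer match = null;
            if (pos + blockSize <= newFile.length) {
                // This sketch trusts a checksum match without comparing the bytes.
                match = oldChecksums.get(checksum(newFile, pos, blockSize));
            }
            if (match != null) {
                flush(literal, instructions);
                instructions.add(new Instruction(match, null));
                pos += blockSize;
            } else {
                literal.write(newFile[pos]);
                pos++;
            }
        }
        flush(literal, instructions);
        return instructions;
    }

    private static void flush(ByteArrayOutputStream literal, List<Instruction> instructions) {
        if (literal.size() > 0) {
            instructions.add(new Instruction(-1, literal.toByteArray()));
            literal.reset();
        }
    }

    /** Side holding the old file: rebuilds the new file from the old file and the instructions. */
    static byte[] apply(byte[] oldFile, List<Instruction> instructions, int blockSize) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Instruction instruction : instructions) {
            if (instruction.literal != null) {
                out.write(instruction.literal, 0, instruction.literal.length);
            } else {
                out.write(oldFile, instruction.blockIndex * blockSize, blockSize);
            }
        }
        return out.toByteArray();
    }

    /** Number of bytes that must be sent as new data. */
    static int literalBytes(List<Instruction> instructions) {
        int total = 0;
        for (Instruction instruction : instructions) {
            if (instruction.literal != null) total += instruction.literal.length;
        }
        return total;
    }

    public static void main(String[] args) {
        byte[] oldFile = "The quick brown fox jumps over the lazy dog".getBytes(StandardCharsets.UTF_8);
        byte[] newFile = "The quick brown cat jumps over the lazy dog!!".getBytes(StandardCharsets.UTF_8);
        int blockSize = 4;
        List<Instruction> delta = diff(newFile, blockChecksums(oldFile, blockSize), blockSize);
        byte[] rebuilt = apply(oldFile, delta, blockSize);
        System.out.println("Literal bytes sent: " + literalBytes(delta) + " of " + newFile.length);
        System.out.println("Rebuilt: " + new String(rebuilt, StandardCharsets.UTF_8));
    }
}
```

The more blocks the two files share, the more of the new file is covered by short copy instructions and the fewer literal bytes have to be sent.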

Efficient Way to Update an Old Version of a File to a Newer Version

The most obvious use case for RSync is when you have a local and a remote copy of a file in different versions, and you want to either:

  • Update the remote file to the local version.
  • Update the local file to the remote version.

Since two different versions of the same file are most likely quite similar, RSync can synchronize two such files reasonably efficiently.

Resumable Uploads or Downloads

Because of the way RSync works, it can also be used as an incremental download / upload protocol, allowing you to upload or download a file over many sessions. If the current upload or download fails, you can just resume it later.

If you start an upload or download and it fails part-way, e.g. due to a connection failure, you will have a complete file and a partial file. In the case of an upload, the partial file will be remote. In the case of a download, the partial file will be local. In both cases RSync will be able to detect the differences between the two files and exchange mostly the differences.

Obviously, if you know the two files are 100% identical in the parts already uploaded / downloaded, then it would be more efficient to resume the upload / download from the exact byte at which it failed. That would save the difference detection, merge instructions etc. of RSync. But RSync could solve the problem too.
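For comparison, the simpler alternative of resuming from an exact byte offset could look roughly like the sketch below. It assumes that the partial file is a byte-for-byte prefix of the complete file, and it does not verify that assumption. The class ResumeByOffset and its resume method are hypothetical names used only for this example, and both files are local here to keep the sketch self-contained.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class ResumeByOffset {

    /**
     * Appends the missing tail of the complete file to the partial file.
     * Assumes the partial file is an exact prefix of the complete file.
     * Returns the number of bytes copied.
     */
    static long resume(Path complete, Path partial) throws IOException {
        long alreadyTransferred = Files.exists(partial) ? Files.size(partial) : 0;
        long copied = 0;
        try (RandomAccessFile in = new RandomAccessFile(complete.toFile(), "r");
             OutputStream out = Files.newOutputStream(partial,
                     StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            in.seek(alreadyTransferred);  // skip the part that already arrived
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
                copied += read;
            }
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        Path complete = Files.createTempFile("complete", ".bin");
        Path partial = Files.createTempFile("partial", ".bin");
        byte[] data = "0123456789abcdefghijklmnopqrstuvwxyz".getBytes();
        Files.write(complete, data);
        Files.write(partial, java.util.Arrays.copyOf(data, 10));  // simulate a failed transfer
        System.out.println("Bytes copied on resume: " + resume(complete, partial));
        Files.delete(complete);
        Files.delete(partial);
    }
}
```

If the partial file might have been modified or corrupted, this shortcut silently produces a wrong result, which is where the difference detection of RSync becomes useful.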

This RSync tutorial explains the different parts of RSync on separate pages.

RSync Origins

RSync is also an executable (program) on Unix systems, which implements the RSync protocol. This tutorial focuses on the protocol itself, though. The Java implementation I describe at the end of this tutorial is not compatible with the Unix RSync implementation. However, it provides both client and server side tools, so you don't need the Unix RSync implementation to use it.

Here is the original description of RSync.

Here is the Wikipedia page on RSync.
