How does rsync work in this simple example?

Question

Let's say:

a is a 256 MB file containing random bytes
b is the same file except it has one additional leading byte 0

Thanks to this answer, I discovered that rsync is able to compute a "binary diff patch" between these two files:

rsync --only-write-batch=patch b a

In this example, the patch file is only 65 KB, which is very good.

In short, how did rsync detect that so few bytes were changed? I initially thought it would compare:

a[0:k] and b[0:k]
a[k:2k] and b[k:2k]
a[2k:3k] and b[2k:3k]
...
a[N-k:N] and b[N-k:N]

for various values of k, e.g. the biggest power of 2 possible (2^j), then if no match, 2^(j-1), then 2^(j-2), etc.

But for these files a and b, this would fail completely: since b is just a shifted by one byte, no aligned chunks would match at all! We would then expect the patch to be ... 256 MB.
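
To illustrate why the aligned comparison fails, here is a small Python sketch on a scaled-down version of the files (4096 random bytes instead of 256 MB, block size 64). The function name aligned_matches and the sizes are made up for this example; it only models the naive scheme described above, not anything rsync does.

```python
import random

def aligned_matches(a, b, k):
    """Count blocks of size k that are identical at the same offset in a and b."""
    n = min(len(a), len(b))
    return sum(a[i:i + k] == b[i:i + k] for i in range(0, n - k + 1, k))

# Scaled-down stand-ins for the two files in the question.
rng = random.Random(0)
a = bytes(rng.randrange(256) for _ in range(4096))
b = b"\x00" + a  # same content with one extra leading byte

same = aligned_matches(a, a, 64)     # every block matches itself
shifted = aligned_matches(a, b, 64)  # the one-byte shift misaligns every block
print(same, shifted)
```

With random content, a block of b can only equal the block of a at the same offset if all of its bytes happen to be identical, so the shifted comparison finds nothing to reuse.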

But rsync evidently works in a cleverer way. How does the algorithm work in this simple example, where b is a single byte followed by the content of a?

Basj · Accepted Answer · 2020-02-03T23:20:14.687

2

Perhaps someone who knows this better can post another answer, but after further research, the key to the rsync algorithm seems to be the rolling hash, which is detailed in the paragraph "Determining which parts of a file have changed".

Another useful read: https://moinakg.wordpress.com/tag/rolling-hash/

Another useful resource: http://tutorials.jenkov.com/rsync/overview.html
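
To make the rolling-hash idea concrete, here is a minimal Python sketch; it is not rsync's actual code. It assumes a two-part weak checksum in the style described for rsync (a plain byte sum and a position-weighted sum, both modulo 2^16) and uses SHA-256 purely as a stand-in strong hash to confirm candidates. The names weak, roll and find_matches, the block size 64 and the 4096-byte test data are all made up for illustration.

```python
import hashlib
import random

M = 1 << 16  # modulus for both parts of the weak checksum (assumption of this sketch)

def weak(block):
    """Compute the weak checksum (s1, s2) of a block from scratch."""
    n = len(block)
    s1 = sum(block) % M
    # The first byte gets weight n, the last byte gets weight 1.
    s2 = sum((n - i) * x for i, x in enumerate(block)) % M
    return s1, s2

def roll(s1, s2, out_byte, in_byte, n):
    """Slide a window of length n one byte to the right in O(1)."""
    s1 = (s1 - out_byte + in_byte) % M
    s2 = (s2 - n * out_byte + s1) % M
    return s1, s2

def strong(block):
    return hashlib.sha256(block).digest()

def find_matches(old, new, k):
    """Return (offset_in_new, block_index_in_old) for blocks of old found in new."""
    # Signatures of old's fixed, non-overlapping blocks.
    table = {}
    for idx in range(len(old) // k):
        block = old[idx * k:(idx + 1) * k]
        table.setdefault(weak(block), []).append((strong(block), idx))

    matches = []
    if len(new) < k:
        return matches
    pos = 0
    s1, s2 = weak(new[:k])
    while True:
        hit = None
        candidates = table.get((s1, s2))
        if candidates:
            # Weak checksums can collide, so confirm with the strong hash.
            digest = strong(new[pos:pos + k])
            for cand_digest, idx in candidates:
                if cand_digest == digest:
                    hit = idx
                    break
        if hit is not None:
            matches.append((pos, hit))
            pos += k  # jump past the matched block
            if pos + k > len(new):
                break
            s1, s2 = weak(new[pos:pos + k])
        else:
            if pos + k >= len(new):
                break
            # No match here: move the window by a single byte.
            s1, s2 = roll(s1, s2, new[pos], new[pos + k], k)
            pos += 1
    return matches

rng = random.Random(0)
a = bytes(rng.randrange(256) for _ in range(4096))
b = b"\x00" + a
matches = find_matches(a, b, 64)
print(len(matches), matches[:3])
```

Because the window over b can advance one byte at a time at constant cost, the one-byte shift stops mattering: on this toy input every 64-byte block of a is found in b at an offset shifted by one, so only the extra leading byte would have to be sent as literal data.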

edited Feb 03 '20 at 23:20

answered Feb 03 '20 at 21:22

Basj
