Let's say:
ais a 256 MB file containing random bytesbis the same file except it has one additional leading byte0
Thanks to this answer, I discovered that rsync is able to compute a "binary diff patch" between these two files:
rsync --only-write-batch=patch b a
In this example, the patch file is ... only 65 KB, so it's very good.
In short, how did rsync detect so few byes were changed? I initially thought it would compare:
- a[0:k] and b[0:k]
- a[k+1:2k] and b[k+1:2k]
- a[2k+1:3k] and b[2k+1:3k]
- ...
- a[N-k:N] and b[N-k:N]
for various values of k, e.g. the biggest power of 2 possible (2^j), then if no match, 2^(j-1), then 2^(j-2), etc.
But for these files a and b, it would totally fail because since b is just a shifted of one byte, there would be no similar chunks at all! Then we would expect the patch to be ... 256 MB.
But here it works in a more clever way, how did the algorithm work in this simple example b = a byte concatenated with the content of a ?

