i use rsync a lot, both at work [ backups, replication of content to various servers, ad-hoc copying ] and privately. it’s smart enough to avoid re-sending a whole file if it has only grown a bit [ as logs tend to do ] or if only a few bytes have changed at the source or the destination.
out-of-the-box rsync is not smart enough to detect that a given file was moved around within a directory structure. that happens a lot when i’m re-organizing my collection of photos:
root@srcserver:/tmp/test# ls -la
total 5420
drwxr-xr-x 2 root root 4096 Mar 2 09:30 .
drwxrwxrwt 16 root root 4096 Mar 2 09:30 ..
-rw-r--r-- 1 root root 966246 Sep 9 2010 IMG_3185.jpg
-rw-r--r-- 1 root root 1191165 Sep 9 2010 IMG_3318.jpg
-rw-r--r-- 1 root root 2090196 Sep 9 2010 IMG_3343.jpg
-rw-r--r-- 1 root root 1287526 Sep 9 2010 IMG_3369.jpg
root@dstserver:/tmp# rsync -av --progress root@srcserver:/tmp/test ./
receiving incremental file list
test/
test/IMG_3185.jpg
966,246 100% 4.54MB/s 0:00:00 (xfr#1, to-chk=3/5)
test/IMG_3318.jpg
1,191,165 100% 4.44MB/s 0:00:00 (xfr#2, to-chk=2/5)
test/IMG_3343.jpg
2,090,196 100% 6.11MB/s 0:00:00 (xfr#3, to-chk=1/5)
test/IMG_3369.jpg
1,287,526 100% 3.30MB/s 0:00:00 (xfr#4, to-chk=0/5)
sent 104 bytes received 5,536,784 bytes 3,691,258.67 bytes/sec
total size is 5,535,133 speedup is 1.00
root@srcserver:/tmp/test# mkdir somedir
root@srcserver:/tmp/test# mv IMG_3369.jpg somedir/
root@dstserver:/tmp# rsync --delete-after -av --progress root@srcserver:/tmp/test ./
receiving file list ... 6 files to consider
test/
test/somedir/
test/somedir/IMG_3369.jpg
1,287,526 100% 5.39MB/s 0:00:00 (xfr#1, to-chk=0/6)
0 files...
deleting test/IMG_3369.jpg
sent 48 bytes received 1,288,047 bytes 515,238.00 bytes/sec
total size is 5,535,133 speedup is 4.30
that’s not good – we had to re-transmit 1.2MB of content that was already present at the destination.
rsync has a --fuzzy parameter, but it does not solve this issue: "This option tells rsync that it should look for a basis file for any destination file that is missing. The current algorithm looks in the same directory as the destination file for either a file that has an identical size and modified-time, or a similarly-named file. If found, rsync uses the fuzzy basis file to try to speed up the transfer." [ Debian’s man page for rsync 3.2.7 ]. since it only looks in the same directory as the destination file, a file that was moved to another directory is not found.
this post explains in more detail how it’s implemented.
but.. don’t give up – there are a few options:
- https://github.com/m-manu/rsync-sidekick
- https://github.com/dparoli/hrsync based on https://lincolnloop.com/insights/detecting-file-moves-renames-rsync/
- related discussions: https://serverfault.com/questions/489289/handling-renamed-files-or-directories-in-rsync
- detect-renamed + detect-renamed-lax patches
i could not wrap my head around the first two, but the last one [ the patches ] did the trick and worked with the latest rsync, 3.4.1 when i tried it.
wget https://www.samba.org/ftp/rsync/rsync-3.4.1.tar.gz https://www.samba.org/ftp/rsync/rsync-patches-3.4.1.tar.gz
tar -xvf rsync-3.4.1.tar.gz
tar -xvf rsync-patches-3.4.1.tar.gz
cd rsync-3.4.1
patch -p1 -N < patches/detect-renamed.diff
patch -p1 -N < patches/detect-renamed-lax.diff
./configure && make
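before using the new binary it may be worth checking that both patches really went in. a minimal sketch, assuming the patched build lists its extra options in the --help output [ that is an assumption, not something the patches promise ]. has_option is a made-up helper name, and the option names other than --detect-moved are assumed from the patch names:

```shell
# hypothetical helper: succeeds if the given binary mentions the option
# somewhere in its --help text (plain substring match, so --detect-renamed
# will also match on the --detect-renamed-lax line)
has_option() {
    "$1" --help 2>&1 | grep -q -F -e "$2"
}

# meant to be run from inside rsync-3.4.1 after the build has finished
for opt in --detect-renamed --detect-renamed-lax --detect-moved; do
    if has_option ./rsync "$opt"; then
        echo "$opt: found"
    else
        echo "$opt: not found"
    fi
done
```

if any of them shows up as not found, the patch output is the first place to look: with -N, patch skips hunks it believes are already applied, so a partly applied patch is easy to miss.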
in return we get rsync with an extra flag, --detect-moved, which detects a moved file at the destination and does not retransmit it from the source if size, modification time and file name [ but not file location ] match. the content itself is not verified.
let’s try the earlier mentioned scenario with the patched rsync:
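to get a feeling for which files such matching would pick up, here is a rough sketch of the idea in plain shell. it is not how the patch is implemented: find_moved is a made-up helper, it compares only file name and size, and it works on two local directories rather than over the network:

```shell
# hypothetical helper, not part of rsync: list files that look "moved",
# i.e. present in SRC at a path that is missing in DST, while DST holds
# a file with the same name and size somewhere else.
# caveat: the name is passed to find as a pattern, so names containing
# glob characters like * or [ may match more than intended.
find_moved() {
    src=$1
    dst=$2
    ( cd "$src" && find . -type f ) | while IFS= read -r rel; do
        # same relative path already exists in DST: not a move
        [ -e "$dst/$rel" ] && continue
        name=$(basename "$rel")
        size=$(wc -c < "$src/$rel" | tr -d ' ')
        # look anywhere under DST for a same-name, same-size candidate
        find "$dst" -type f -name "$name" -size "${size}c" | head -n 1 |
        while IFS= read -r cand; do
            printf '%s <- %s\n' "$rel" "$cand"
        done
    done
}
```

called as find_moved /path/to/src /path/to/dst, it prints one line per source file that could be satisfied from a file already sitting elsewhere in the destination tree.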
root@srcserver:/tmp/test# mv IMG_3318.jpg somedir/
root@dstserver:/tmp# /tmp/src/rsync-3.4.1/rsync --delete-after --detect-moved -av --progress root@srcserver:/tmp/test ./
receiving file list ... 6 files to consider
test/
test/somedir/
test/somedir/IMG_3318.jpg
0 files...
deleting test/IMG_3318.jpg
sent 46 bytes received 172 bytes 145.33 bytes/sec
total size is 5,535,133 speedup is 25,390.52
this time rsync has sent much less data: 218 bytes in total instead of over 1.2MB! also – it was enough to have the patched rsync at the destination server, the machine i was pulling the data to; srcserver had Debian’s standard rsync.
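a side note on the "speedup" figure in the transcripts: it is the total size of the tree divided by the bytes that actually went over the wire [ sent + received ]. a small sketch that reproduces the three values seen above; speedup is just a helper name i picked for illustration:

```shell
# speedup = total size / (bytes sent + bytes received)
speedup() {
    LC_ALL=C awk -v total="$1" -v sent="$2" -v recv="$3" \
        'BEGIN { printf "%.2f\n", total / (sent + recv) }'
}

speedup 5535133 104 5536784   # first full copy          -> 1.00
speedup 5535133 48 1288047    # after mv, stock rsync    -> 4.30
speedup 5535133 46 172        # after mv, --detect-moved -> 25390.52
```

so the jump from 4.30 to 25,390.52 is simply the same 5.5MB tree synchronized with 218 bytes of traffic instead of roughly 1.2MB.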