parallel rsync

rsync remains my main tool for transferring backups or just moving data between servers. but it has some pain points – e.g. either rsync’s checksum calculation or the ssh process that carries the data can easily saturate a single CPU core before i run out of storage I/O or network bandwidth.

how to parallelize it – based on this blog post:

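#!/bin/bash -e
# usage: parallel-rsync.sh src/ dst
# give src a trailing "/" so the listed paths are relative to it,
# which is how --files-from resolves them

if [ $# -ne 2 ] ; then
        echo "syntax: parallel-rsync.sh src/ dst" >&2
        exit 2
fi
jobs=8
tmpdir=$(mktemp -d /tmp/parallel-rsync.XXXXXXXXXXX)
trap 'rm -rf "$tmpdir"' EXIT    # clean up even if a transfer fails
echo "using $tmpdir"
# dry run: keep regular files that would be transferred ('<' = sent to a
# remote host, '>' = received, incl. local copies), keep names with spaces
# intact, and deal them round-robin into $jobs list files
rsync --archive --dry-run --itemize-changes "$1" "$2" | grep -E '^[<>]f' | cut -d" " -f2- | split - -n r/$jobs "$tmpdir/list"
# one rsync per list file, $jobs at a time
ls "$tmpdir/list"* | parallel --lb -t -j $jobs rsync --archive --verbose --partial --progress --files-from {} "$1" "$2"
# final pass: directories, symlinks, attribute changes, anything missed
rsync --archive --verbose --progress "$1" "$2"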
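to see what the listing stage does without touching any data, the grep/cut/split part of the pipeline can be fed by hand. this is only a sketch: the itemize lines below are made-up samples in rsync's `%i %n%L` format (not captured output), it uses 2 chunks instead of 8, and it assumes GNU coreutils `split` (for `-n r/N`).

```shell
#!/bin/bash
# demo of the listing/splitting stage only; the itemize lines are
# hand-written samples, not real rsync output
set -e
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

printf '%s\n' \
  'cd+++++++++ photos/' \
  '>f+++++++++ photos/a.jpg' \
  '>f+++++++++ photos/b c.jpg' \
  '<f.st...... notes.txt' \
  'cL+++++++++ latest -> photos/a.jpg' \
  '>f+++++++++ video.mkv' |
  grep -E '^[<>]f' | cut -d' ' -f2- | split - -n r/2 "$tmpdir/list"

# show which file names ended up in which chunk
for f in "$tmpdir"/list*; do
  echo "== ${f##*/}"
  cat "$f"
done
```

the directory and symlink lines are dropped (the final plain rsync pass handles those), `photos/b c.jpg` stays whole thanks to `cut -f2-`, and the four files are dealt alternately: `listaa` gets `photos/a.jpg` and `notes.txt`, `listab` gets `photos/b c.jpg` and `video.mkv`.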
