i'm backing up ~90GB of mysqldumps in total each night. the more data there is, the bigger the pain becomes.
mysqldump
my original setup used the uncompressed output of:
mysqldump --defaults-file=/etc/mysql/debian.cnf --quick --skip-lock-tables --single-transaction --flush-logs --hex-blob --master-data=2 -A --skip-extended-insert
then archived with rdiff-backup. some more details here. i do know that a backup produced this way [with skip-extended-insert] is larger and takes more time to restore; more about that later.
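the archiving step itself is just rdiff-backup run over the directory with the dumps; a minimal sketch, with made-up paths and a retention period chosen only for illustration:

rdiff-backup /backup/mysql/dumps /backup/mysql/archive
rdiff-backup --remove-older-than 2W /backup/mysql/archive   # prune increments older than two weeks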
recently i found out that gzip / pigz has an --rsyncable option which produces a slightly larger archive that is more 'friendly' to the rsync algorithm [also used by rdiff-backup]. so i decided to compare the size of rdiff diffs for an uncompressed mysql dump and for one compressed with the --rsyncable option, to see if i can gain anything from the change.
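the compression itself is a one-liner with pigz; the filename below is just an example:

pigz --rsyncable --keep alldb.sql   # writes alldb.sql.gz; --keep leaves the original for the uncompressed comparison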
in my case using the best compression level [ -9 ] is not worth it. a 25GB file compressed with -9 came out at 10474588077B and took 5m22s on average to finish; the same file compressed with the default level was about 20MB larger [10494193501B] and took 5m7s. i took the average time of 3 pigz runs for both methods; most of the time the process was cpu bound.
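the level comparison boiled down to timing both runs on the same file and averaging; roughly this [again, the filename is an example]:

time pigz -9 --rsyncable -c alldb.sql > alldb.sql.9.gz    # best compression
time pigz --rsyncable -c alldb.sql > alldb.sql.def.gz     # default level [-6]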
information about the dumps from particular dates, their sizes and the sizes of the diffs generated by rdiff-backup [all sizes in bytes]:
| date | input: raw sql | input: bz2 | input: gz | input: rsyncable.gz | increment: raw sql | increment: bz2 | increment: gz | increment: rsyncable.gz |
|---|---|---|---|---|---|---|---|---|
| 12 | 26 804 890 664 | 8 811 112 277 | 10 329 919 622 | 10 484 961 505 | | | | |
| 13 | 26 808 113 131 | 8 811 291 622 | 10 331 311 920 | 10 486 382 591 | 3 716 517 795 | 8 812 764 367 | 10 331 856 487 | 3 824 695 504 |
| 14 | 26 818 029 468 | 8 814 098 013 | 10 334 963 258 | 10 490 095 367 | 3 224 401 389 | 8 812 943 745 | 10 333 249 046 | 3 403 160 952 |
| 15 | 26 818 132 819 | 8 814 100 947 | 10 334 998 479 | 10 490 120 905 | 51 622 629 | 8 814 248 439 | 10 334 867 739 | 56 785 799 |
| 16 | 26 823 520 086 | 8 816 488 488 | 10 337 628 727 | 10 492 799 195 | 2 194 848 425 | 8 815 753 598 | 10 336 834 625 | 2 355 854 983 |
| 17 | 26 831 125 141 | 8 819 823 312 | 10 341 419 832 | 10 496 643 886 | 6 618 091 371 | 8 818 141 586 | 10 339 567 038 | 4 293 817 628 |
| 18 | 26 795 688 890 | 8 806 741 963 | 10 325 784 290 | 10 480 737 096 | 3 690 809 910 | 8 821 477 034 | 10 343 358 854 | 3 800 270 362 |
| 19 | 26 797 195 049 | 8 806 555 214 | 10 326 417 028 | 10 481 487 569 | 3 822 050 888 | 8 808 393 234 | 10 327 720 381 | 3 928 010 024 |
| 20 | 26 803 479 158 | 8 810 942 424 | 10 329 400 507 | 10 484 489 179 | 3 812 420 943 | 8 808 206 449 | 10 328 353 239 | 3 931 321 186 |
| 21 | 26 834 035 279 | 8 818 676 083 | 10 338 926 030 | 10 494 198 724 | 3 660 097 166 | 8 812 594 481 | 10 331 337 276 | 3 938 756 318 |
| 22 | 26 834 013 616 | 8 818 614 552 | 10 338 914 292 | 10 494 193 501 | 51 043 642 | 7 389 340 006 | 8 821 785 673 | 61 001 180 |
total size of the rdiff archive [latest version+increments]:
| raw sql | bz2 | gz | rsyncable.gz |
|---|---|---|---|
| 57 675 940 544 | 95 532 500 216 | 112 167 867 459 | 40 087 890 366 |
so for my data, gzipping the mysqldump first with pigz --rsyncable makes the most sense from the final backup size point of view. using regular gzip or bz2 leads to output files that differ too much between dumps, which in turn produces very large diffs from rdiff-backup's rsync algorithm.
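in practice that just means putting pigz at the end of the mysqldump pipeline instead of writing raw sql to disk [the output filename is an example]:

mysqldump --defaults-file=/etc/mysql/debian.cnf --quick --skip-lock-tables --single-transaction --flush-logs --hex-blob --master-data=2 -A --skip-extended-insert | pigz --rsyncable > alldb.sql.gz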
as mentioned earlier we use the skip-extended-insert option for mysqldump and i'm somewhat torn whether to keep using it or not. pros of using skip-extended-insert:
- backups done with skip-extended-insert produce smaller rdiff diffs [for uncompressed dumps it's ~3.7GB instead of ~5.1GB, and similarly for gzipped backups with the rsyncable switch].
- backups taken this way are easier to grep; in our case restoring a single sql row is much more common than recovering the whole database [see the example after this list].
cons:
- recovery time is the biggest downside. in my tests restoring a 26GB backup taken with the skip-extended-insert option takes ~90 minutes; the same backup without skip-extended-insert restores in only 22 minutes.
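for the record, the grep convenience from the pros above looks more or less like this; the table name and row id are made up:

zgrep 'INSERT INTO `customers`' alldb.sql.gz | grep '(1234,'

and a full restore is simply the decompressed dump piped back into mysql:

pigz -dc alldb.sql.gz | mysql --defaults-file=/etc/mysql/debian.cnf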
vmware backups taken with ghettovcb
we use ghettoVCB to take weekly snapshots of vms running under vmware esxi. those backups can be large. so far i've been pbzip2'ing them and using rdiff-backup to keep the current and a single previous version. i've done some tests on a randomly selected snapshot of a windows 2012 server vm and compared the sizes [in bytes]:
| date | input: raw | input: bz2 | input: rsyncable.gz | diff: raw | diff: bz2 | diff: rsyncable.gz |
|---|---|---|---|---|---|---|
| 17 | 35 170 054 000 | 14 820 111 000 | 15 427 020 000 | | | |
| 24 | 39 063 046 000 | 15 484 366 000 | 16 193 240 000 | 8 049 706 000 | 14 810 731 000 | 11 461 074 000 |
and the total size of the rdiff archive:
| raw | bz2 | rsyncable.gz |
|---|---|---|
| 47 112 752 000 | 30 295 097 000 | 27 654 314 000 |
so here too pigz with the --rsyncable option seems to be the winner. at some point i should also take a look at xz.
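for completeness, the vm part is conceptually the same two steps as the mysql one; the paths below are made up and the retention call would need tuning to match the 'current + one previous' scheme:

pigz --rsyncable /backup/ghettovcb/win2012/win2012-flat.vmdk   # replaces the .vmdk with a .vmdk.gz
rdiff-backup /backup/ghettovcb /archive/ghettovcb
rdiff-backup --remove-older-than 2B /archive/ghettovcb   # 'B' counts backup sessions in rdiff-backup's time format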