i’m backing up ~90GB of mysqldumps in total each night. the more data, the bigger the pain.

mysqldump

my original setup used unpacked output of:

mysqldump --defaults-file=/etc/mysql/debian.cnf --quick --skip-lock-tables --single-transaction  --flush-logs --hex-blob  --master-data=2  -A --skip-extended-insert

then archived with rdiff-backup. some more details here. i do know that a backup produced this way, with skip-extended-insert, is larger and takes more time to restore; more about that later.

recently i’ve found out that gzip / pigz have an option, --rsyncable, which produces a slightly larger archive that is more ‘friendly’ to the rsync algorithm [also used by rdiff-backup]. so i decided to compare the size of rdiff diffs for an uncompressed mysql dump and for one compressed with the --rsyncable option, to see if i can gain anything from the change.
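a rough sketch of how such a comparison could be run with the rdiff tool from librsync. the file names and the delta_size helper are made up for illustration, and this is not necessarily the exact procedure behind the numbers further down:

```shell
# delta_size OLD NEW: print the size in bytes of the rdiff delta that
# turns OLD into NEW. hypothetical helper; needs rdiff from librsync.
delta_size() (
    tmp=$(mktemp -d) || exit 1
    # the signature describes OLD, the delta describes NEW relative to it
    rdiff signature "$1" "$tmp/sig" &&
        rdiff delta "$tmp/sig" "$2" "$tmp/delta" &&
        wc -c < "$tmp/delta"
    status=$?
    rm -rf "$tmp"
    exit "$status"
)

# example use (assumed file names, one dump per day):
#   pigz --rsyncable --keep dump-12.sql     # writes dump-12.sql.gz
#   pigz --rsyncable --keep dump-13.sql     # writes dump-13.sql.gz
#   delta_size dump-12.sql    dump-13.sql
#   delta_size dump-12.sql.gz dump-13.sql.gz
```

the smaller the number printed for a pair of consecutive dumps, the cheaper it is to keep that day as an increment.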

in my case the best compression level [ -9 ] is not worth it. compressing a 25GB file to 10474588077B with -9 took 5m22s on average. the same file compressed with the default options was ~20MB larger, 10494193501B, and took 5m7s. i averaged the time over 3 pigz runs for each method; for most of the time the process was cpu bound.
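the trade-off can be spelled out with plain shell arithmetic. the variable names are only illustrative; the figures are the ones quoted above:

```shell
# figures quoted above for the same 25GB dump
size_best=10474588077            # bytes, pigz -9
size_default=10494193501         # bytes, pigz default level
time_best=$(( 5 * 60 + 22 ))     # seconds, average of 3 runs
time_default=$(( 5 * 60 + 7 ))   # seconds, average of 3 runs

saved_bytes=$(( size_default - size_best ))
extra_seconds=$(( time_best - time_default ))
echo "-9 saves $saved_bytes bytes and costs $extra_seconds extra seconds"
```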


information about dumps from particular dates, their sizes and the sizes of diffs generated by rdiff [all sizes in bytes]:

	input size				rdiff increment
date	raw sql	bz2	gz	rsyncable.gz	raw sql	bz2	gz	rsyncable.gz
12	26 804 890 664	8 811 112 277	10 329 919 622	10 484 961 505
13	26 808 113 131	8 811 291 622	10 331 311 920	10 486 382 591	3 716 517 795	8 812 764 367	10 331 856 487	3 824 695 504
14	26 818 029 468	8 814 098 013	10 334 963 258	10 490 095 367	3 224 401 389	8 812 943 745	10 333 249 046	3 403 160 952
15	26 818 132 819	8 814 100 947	10 334 998 479	10 490 120 905	51 622 629	8 814 248 439	10 334 867 739	56 785 799
16	26 823 520 086	8 816 488 488	10 337 628 727	10 492 799 195	2 194 848 425	8 815 753 598	10 336 834 625	2 355 854 983
17	26 831 125 141	8 819 823 312	10 341 419 832	10 496 643 886	6 618 091 371	8 818 141 586	10 339 567 038	4 293 817 628
18	26 795 688 890	8 806 741 963	10 325 784 290	10 480 737 096	3 690 809 910	8 821 477 034	10 343 358 854	3 800 270 362
19	26 797 195 049	8 806 555 214	10 326 417 028	10 481 487 569	3 822 050 888	8 808 393 234	10 327 720 381	3 928 010 024
20	26 803 479 158	8 810 942 424	10 329 400 507	10 484 489 179	3 812 420 943	8 808 206 449	10 328 353 239	3 931 321 186
21	26 834 035 279	8 818 676 083	10 338 926 030	10 494 198 724	3 660 097 166	8 812 594 481	10 331 337 276	3 938 756 318
22	26 834 013 616	8 818 614 552	10 338 914 292	10 494 193 501	51 043 642	7 389 340 006	8 821 785 673	61 001 180

total size of the rdiff archive [latest version+increments]:

raw sql	bz2	gz	rsyncable.gz
57 675 940 544	95 532 500 216	112 167 867 459	40 087 890 366
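as a sanity check, the totals can be roughly reproduced by adding the latest dump [date 22] to all the increments from the table above. this is only a sketch with a made-up add_all helper; the sums land about 23KB below the reported totals, and i assume, without having verified it, that the remainder is bookkeeping overhead of the archive:

```shell
# add_all N...: print the sum of all arguments (hypothetical helper)
add_all() {
    total=0
    for n in "$@"; do
        total=$(( total + n ))
    done
    echo "$total"
}

# latest version first, then the increments for dates 13..22
raw_total=$(add_all 26834013616 \
    3716517795 3224401389 51622629 2194848425 6618091371 \
    3690809910 3822050888 3812420943 3660097166 51043642)
rsyncable_total=$(add_all 10494193501 \
    3824695504 3403160952 56785799 2355854983 4293817628 \
    3800270362 3928010024 3931321186 3938756318 61001180)

# difference from the totals reported above
raw_gap=$(( 57675940544 - raw_total ))
rsyncable_gap=$(( 40087890366 - rsyncable_total ))
echo "raw sql:      $raw_total (gap $raw_gap)"
echo "rsyncable.gz: $rsyncable_total (gap $rsyncable_gap)"
```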

so for my data, compressing the mysqldump with pigz --rsyncable first makes the most sense from the final backup size point of view. with regular gzip or bz2 the output files differ too much between consecutive dumps, so the rsync algorithm used by rdiff produces very large diffs, almost as large as the compressed files themselves.

as mentioned earlier we use the skip-extended-insert option for mysqldump and i’m somewhat torn whether to use it or not. pros of using skip-extended-insert:

backups done with skip-extended-insert produce smaller rdiff diffs [ for uncompressed dumps it’s ~3.7GB instead of ~5.1GB, similarly for gzip’ed backups with the rsyncable switch ].
backups taken this way are easier to grep; in our case restoring a single sql row is much more common than recovering the whole database

cons:

recovery time is the biggest downside. in my tests i can recover a 26GB backup taken with the skip-extended-insert option in ~90 minutes; a backup without skip-extended-insert takes only 22min to restore.
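to illustrate the grep argument: with skip-extended-insert every row gets its own INSERT statement on its own line, so a single row can be pulled out with a fixed-string grep. the table names and rows below are invented sample data, not taken from a real dump:

```shell
# build a tiny stand-in for a dump made with --skip-extended-insert
# (hypothetical tables and rows, one INSERT per line)
workdir=$(mktemp -d)
cat > "$workdir/dump.sql" <<'EOF'
INSERT INTO `users` VALUES (1,'alice');
INSERT INTO `users` VALUES (2,'bob');
INSERT INTO `orders` VALUES (2,'widget');
EOF

# -F treats the pattern as a literal string, so backticks and
# parentheses need no regex escaping
row=$(grep -F 'INSERT INTO `users` VALUES (2,' "$workdir/dump.sql")
echo "$row"

rm -rf "$workdir"
```

with extended inserts the same search would return one very long line holding many rows of the table, which then has to be picked apart by hand.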

vmware backups taken with ghettovcb

we use ghettoVCB to take weekly snapshots of vms running under vmware esxi. the backups can be large. so far i’ve been pbzip2’ing them and using rdiff to keep the current and a single previous version. i’ve done some tests on a randomly selected snapshot of a windows 2012 server vm and compared the sizes:

	input size			diff size
	raw	bz2	rsyncable.gz	raw	bz2	rsyncable.gz
17	35 170 054 000	14 820 111 000	15 427 020 000
24	39 063 046 000	15 484 366 000	16 193 240 000	8 049 706 000	14 810 731 000	11 461 074 000

and the total size of rdiff archive:

raw	bz2	rsyncable.gz
47 112 752 000	30 295 097 000	27 654 314 000
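these totals are simply the newer snapshot [24] plus the diff against the older one, which is quick to check in the shell. the variable names are only illustrative:

```shell
# newer snapshot + diff, using the figures from the table above
raw_total=$(( 39063046000 + 8049706000 ))
bz2_total=$(( 15484366000 + 14810731000 ))
rsyncable_total=$(( 16193240000 + 11461074000 ))

# how much smaller the rsyncable.gz archive is than the bz2 one
saved_vs_bz2=$(( bz2_total - rsyncable_total ))
echo "raw=$raw_total bz2=$bz2_total rsyncable.gz=$rsyncable_total"
echo "rsyncable.gz saves $saved_vs_bz2 compared to bz2"
```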

so here too pigz with the --rsyncable option seems to be the winner. at some point i should take a look at xz.
