mysqldump headaches

over the years i’ve set up multiple backup workflows; one of them is described here. part of each cycle involves taking database backups. for MySQL i’m mostly using mysqldump, and it comes with some headaches:

  • it’s single-threaded by nature – both for backup and restore [ unless you chop the dump file into pieces and feed them into mysql in parallel ]. i’m getting backup speeds of ~50-100MB/s and restore speeds of 5-20MB/s. this becomes a problem if you need to restore a database server with more than a few GBs of data,
  • it leads to random lock-ups of update statements, despite the --skip-lock-tables --single-transaction switches. i did not find good documentation of that behavior, but i’ve read that others have experienced a similar problem.
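the parallel-restore workaround from the first bullet can be sketched like this – splitting a dump into per-table chunks on the standard mysqldump table headers, then feeding the chunks to mysql concurrently. the dump content below is synthetic, and the paths and chunk naming are my own assumptions:

```shell
#!/bin/sh
# sketch: split a mysqldump file into per-table chunks so they can be
# restored in parallel. splits on the standard mysqldump
# "-- Table structure for table `x`" markers.
set -eu

# synthetic two-table dump, standing in for real mysqldump output
cat > /tmp/dump.sql <<'EOF'
-- Table structure for table `users`
CREATE TABLE `users` (id INT);
INSERT INTO `users` VALUES (1);
-- Table structure for table `orders`
CREATE TABLE `orders` (id INT);
INSERT INTO `orders` VALUES (2);
EOF

rm -rf /tmp/chunks
mkdir -p /tmp/chunks

# start a new chunk file each time a table header is seen
awk '/^-- Table structure for table/ {
       n++; file = sprintf("/tmp/chunks/%03d.sql", n)
     }
     n { print > file }' /tmp/dump.sql

ls /tmp/chunks
# each chunk could then be restored in parallel, e.g.:
#   ls /tmp/chunks/*.sql | xargs -P 4 -I{} sh -c 'mysql mydb < {}'
```

note that this only parallelizes across tables – one huge table still restores on a single thread, which is exactly the problem tools like mydumper/myloader solve by chunking rows as well.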

how about physical backup?

moving from logical backups to physical backups is one option. i could use:

  • xtrabackup from Percona – for both Oracle’s MySQL and MariaDB,
  • mariabackup – for MariaDB,
  • file-system-level snapshots taken from replication slaves; it’s something i’m doing already for the most important databases, using BTRFS and btrfs-snap.
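the last option can be sketched roughly as below. the paths, the STOP SLAVE / snapshot / START SLAVE ordering, and the assumption that the datadir lives on its own BTRFS subvolume are all mine – this defines the routine as a function without running it here:

```shell
#!/bin/sh
# sketch: file-system-level snapshot of a replica's datadir, as in the
# last bullet. hostless/local invocation; paths are placeholders.

snapshot_replica() {
  # pause replication so the data files stop changing under us
  mysql -e "STOP SLAVE; FLUSH TABLES;"

  # atomic, read-only BTRFS snapshot of the datadir subvolume
  btrfs subvolume snapshot -r /var/lib/mysql \
      "/snapshots/mysql-$(date +%Y%m%d-%H%M%S)"

  # resume replication; the replica catches up from binlogs
  mysql -e "START SLAVE;"
}
```

since BTRFS snapshots are atomic, even a snapshot taken without pausing replication should be crash-consistent for InnoDB – pausing just keeps the recovery-on-restore step shorter.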

but there’s value in logical backups – partial recovery of a specific table or row is usually quite easy and can be done with cat + grep, maybe awk. such a backup is usually not too fussy about being restored to a slightly different server version; if traditional data types are in use and there are no fancy indices, it’s even possible to recover from MariaDB to Oracle’s MySQL and vice versa.
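that grep-style selective recovery can look like this – the dump content is synthetic, standing in for a real plain-SQL dump file:

```shell
#!/bin/sh
# sketch: pull one table's rows (or a single row) back out of a
# plain-SQL dump with grep, as described above.
set -eu

cat > /tmp/full.sql <<'EOF'
INSERT INTO `users` VALUES (1,'alice');
INSERT INTO `orders` VALUES (10,1);
INSERT INTO `users` VALUES (2,'bob');
EOF

# recover every row of the `users` table
grep '^INSERT INTO `users`' /tmp/full.sql > /tmp/users-only.sql

# or just the row(s) for a single user
grep "'bob'" /tmp/full.sql > /tmp/bob-only.sql

cat /tmp/users-only.sql
```

one caveat: mysqldump’s default extended inserts pack many rows into one INSERT line, so row-level grepping works much better with --skip-extended-insert (at the cost of a slower restore).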

mydumper

mydumper sounds interesting; it’s an independent open source project, but Percona’s long-term employees are behind it – a good sign.

i gave it a try using version 0.12.3-2 and was surprised that streaming backup – officially released 4 months earlier – was throwing a lot of errors:

root@szctsql0.i:/tmp/b# mydumper --defaults-file=/etc/mysql/debian.cnf --database mysql --tables-list mysql.user --rows 1000000  --trx-consistency-only  --threads 12 --compress --stream > /dev/null

** (mydumper:1349422): WARNING **: 20:01:47.789: loading mydumper: Key file does not have group “mydumper”

** (mydumper:1349422): WARNING **: 20:01:47.789: Using trx_consistency_only, binlog coordinates will not be accurate if you are writing to non transactional tables.

** (mydumper:1349422): CRITICAL **: 20:01:47.802: Data transmited for export-20220518-200147/mysql-schema-create.sql.zst doesn't match. File size: 0 Transmited: 101

** (mydumper:1349422): CRITICAL **: 20:01:47.804: Data transmited for export-20220518-200147/mysql.user-schema.sql.zst doesn't match. File size: 0 Transmited: 745

** (mydumper:1349422): CRITICAL **: 20:01:47.804: Data transmited for export-20220518-200147/mysql.user.00000.sql.zst doesn't match. File size: 0 Transmited: 577

** (mydumper:1349422): CRITICAL **: 20:01:47.806: Data transmited for export-20220518-200147/metadata doesn't match. File size: 0 Transmited: 75

“critical” errors are not something you want to see in your backup routine. it looks like this issue was fixed in v0.12.3-3, but i’m still pondering how to migrate from mysqldump to mydumper without losing sleep wondering whether my backups are complete and uncorrupted.

i had a chance to chat with one of Percona’s consultants and asked him for generic backup advice; he was, understandably, very much on the safe side. his recommendations:

  • do both logical and physical backups – the first to allow selective recoveries, the second to be able to restore quickly,
  • have an automated backup verification pipeline [ for now i cannot justify putting so many resources into it, so this remains a pipe dream ]:
    1. set up binlogs on the database server you want to back up – so it can act as a primary server for replication, even if you don’t need any replication,
    2. take a consistent backup using the tool of your choice,
    3. restore the backup to another server,
    4. point this server at the server from step 1 and start replication,
    5. let it catch up to the server from step 1 by replaying the binary logs,
    6. use pt-table-checksum – a tool to check replication consistency – to confirm that the replica and primary servers have identical data.
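the pipeline steps above could be wired together roughly like this. hostnames, credentials, and the binlog coordinates are placeholders (with mydumper, the real coordinates come from the metadata file in the backup directory) – this only defines the steps as a function, it does not run them here:

```shell
#!/bin/sh
# sketch of the verification pipeline described above; all hosts and
# coordinates are placeholder assumptions, not a tested setup.

verify_backup() {
  # step 2: take a consistent logical backup of the primary
  mydumper --host primary.example --trx-consistency-only \
           --outputdir /backups/latest

  # step 3: restore it to a scratch replica
  myloader --host replica.example --directory /backups/latest \
           --overwrite-tables

  # steps 4/5: point the replica at the primary (coordinates taken
  # from the backup's metadata file) and let it replay binlogs
  mysql --host replica.example -e "
    CHANGE MASTER TO MASTER_HOST='primary.example',
                     MASTER_LOG_FILE='mysql-bin.000123',
                     MASTER_LOG_POS=4;
    START SLAVE;"

  # step 6: checksum primary vs replica data
  pt-table-checksum --host primary.example \
                    --replicate percona.checksums
}
```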
