i’ve been running a set of production MySQL databases on BTRFS since April 2016. BTRFS is not exactly known for its stellar performance when hosting databases or images of virtual machines due to its COW nature. why would i do it then? to have data snapshots and be able to ‘go back in time’ quickly and see how the databases looked like few minutes or hours ago. on BTRFS i host few replication slaves that apply all of the transactions from the master nodes, each night pt-table-checksum controls the data consistency. those slave servers are used only for snapshotting – there’s no client / analytics traffic hitting them; this also means that i can temporarily roll back to the earlier snapshot at will, without anyone noticing.
at first i’ve used RAID1 on 2x 1.6TB Intel s3500 drives, those did not provide enough IO. later i’ve moved to RAID10 on 4x 800GB Intel s3500 which were still not enough; few times a week we would get a replication lag of 30-60 minutes. to make things work i had to anti-tune both MySQL and BTRFS and get rid of durability:
- my.cnf: innodb_doublewrite=OFF
- my.cnf: innodb_support_xa=0
- /etc/fstab – added nobarrier
eventually i got what i asked for – a server crash [ likely due to out of disk space scenario… i did not properly monitor the free disk space on btrfs ]; after that crash MySQL data was corrupted. fortunately it’s just a slave.
despite the crash and need to restore from aster, i still think that the setup is very handy: it saved me few backup restores just to fetch few rows fat-fingered from the primary database. so.. i’ve thrown money on the problem – now the BTRFS holding slaves and their snapshots sit on NVMe card intel p4500, all durability settings are back to sane defaults. slaves keep up with their masters.
i’m using btrfs-snap, have 8 snapshots taken every 15 minutes, 24 hourly ones and 16 taken every 12h. according to btrfs fi usage this takes:
Data,single: Size:1.52TiB, Used:771.73GiB /dev/nvme0n1p1 1.52TiB
while the snapshotted has 268GB
you’ll notice that i went from redundant RAID1/RAID10 to a non-redundant NVMe. i copy the latest snapshot to another machine every 12h, in the worst case i’ll lose bunch of snapshots but even then i should not need to restore the slaves from master data, i’ll just let the replication to catch up.
One thought on “MySQL on BTRFS?”