{"id":112,"date":"2009-09-27T21:26:02","date_gmt":"2009-09-27T20:26:02","guid":{"rendered":"http:\/\/kudzia.eu\/b\/?p=112"},"modified":"2016-02-12T13:54:01","modified_gmt":"2016-02-12T12:54:01","slug":"mysql-replication-some-tips","status":"publish","type":"post","link":"https:\/\/kudzia.eu\/b\/2009\/09\/mysql-replication-some-tips\/","title":{"rendered":"mysql replication &#8211; some tips"},"content":{"rendered":"<p>so.. i have a couple of master machines and a few slaves. some of the slaves are offsite &#8211; 60ms away from the primary site.<br \/>\n<!--more--><br \/>\ni have 3 sets, each containing:<\/p>\n<ul>\n<li>a master<\/li>\n<li>a slave #1 connected with the master over lan<\/li>\n<li>a slave #2 connected with the master via vpn over the internet<\/li>\n<\/ul>\n<p>my.cnf on the master includes, in the [mysqld] section:<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\n# to save some bandwidth\r\nslave_compressed_protocol=true\r\n# i tend to generate the id from the last two bytes of the ipv4 address\r\nserver-id               = 1210\r\n# since i have data partitioned into 3 sets..:\r\nauto_increment_increment=3\r\nauto_increment_offset=3 # 1 for server #0, 2 for server #1, 3 for server #2; mysql ignores an offset larger than auto_increment_increment\r\n<\/pre>\n<p>my slave config:<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\n# to save the bandwidth\r\nslave_compressed_protocol=true\r\n# to make sure the slave tries to reconnect every 3 minutes rather than 24h... 
if the network connection stalls, e.g. because of a vpn flap\r\nslave-net-timeout = 180\r\nmaster-connect-retry = 180\r\nserver-id               = 1220\r\n# same as on the master:\r\nauto_increment_increment=3\r\nauto_increment_offset=3\r\n<\/pre>\n<p>i monitor each master-slave-slave set in two ways:<\/p>\n<ul>\n<li>i continuously update one of the tables to monitor replication lag<\/li>\n<li>every weekend i run a consistency check using mk-table-checksum from <a href=\"http:\/\/www.maatkit.org\/\">maatkit<\/a><\/li>\n<\/ul>\n<p>at the beginning i thought that &#8216;SHOW SLAVE STATUS&#8217; on the slave would give me reliable information about replication lag, but that was not the case &#8211; i saw situations when the slave needed 10-20 minutes to catch up, yet the lag reported in the status was 0. so.. on the master i&#8217;ve created a simple cron script run every minute:<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\nstart=`date +%s`\r\nd=0\r\nwhile &#x5B; $d -lt 60 ] ; do\r\n        echo &quot;update repl_test set t=NOW();&quot; | mysql --defaults-file=\/etc\/mysql\/debian.cnf repl_test\r\n        sleep 3\r\n        d=$(( `date +%s` - $start ))\r\ndone\r\n<\/pre>\n<p>while on the slave i use a nagios script to check that the content of my repl_test.repl_test table is not too old:<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\n#!\/bin\/bash\r\nif &#x5B; $# -ne 5 ] ; then\r\n        echo syntax: mysql_replication_lag host login pass database allowed_lag_in_sec\r\n        exit 2\r\nfi\r\nr=`echo &quot;select UNIX_TIMESTAMP(NOW())-UNIX_TIMESTAMP(t) from repl_test;&quot; | mysql -h$1 -u$2 -p$3 -N $4`\r\nif &#x5B; $r -lt $5 ] ; then\r\n        echo OK $r sec of lag\r\n        exit 0\r\nfi\r\necho critical $r sec of lag\r\nexit 2\r\n<\/pre>\n<p>quickly i&#8217;ve run into a funny scenario &#8211; the local clocks of the servers drifted&#8230; and i got negative replication lag, so now i sync all sql servers with an ntp server a couple of times per day.<\/p>\n<p>for the weekend 
consistency check i use:<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\nmk-table-checksum h=$master,u=debian-sys-maint,p=$pass h=$slave,u=debian-sys-maint,p=$pass | mk-checksum-filter\r\n<\/pre>\n<p>if the returned exit code is non-zero, the databases&#8217; content differs &#8211; in that case stdout will contain a list of the tables\/databases whose content does not match.<br \/>\nmk-table-checksum runs pretty fast and transfers only hashes of the data.. so i have no problem running checksums of multi-gigabyte tables over the internet in under a minute.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>so.. i have a couple of master machines and a few slaves. some of the slaves are offsite &#8211; 60ms away from the primary site.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[17],"tags":[12],"class_list":["post-112","post","type-post","status-publish","format-standard","hentry","category-tech","tag-mysql"],"_links":{"self":[{"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/posts\/112","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/comments?post=112"}],"version-history":[{"count":13,"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/posts\/112\/revisions"}],"predecessor-version":[{"id":2574,"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/posts\/112\/revisions\/2574"}],"wp:attachment":[{"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/media?parent=112"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/kudz
ia.eu\/b\/wp-json\/wp\/v2\/tags?post=112"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}