{"id":629,"date":"2011-05-01T21:02:30","date_gmt":"2011-05-01T20:02:30","guid":{"rendered":"http:\/\/kudzia.eu\/b\/?p=629"},"modified":"2013-05-30T12:03:56","modified_gmt":"2013-05-30T11:03:56","slug":"innodb_stats_sample_pages-to-save-a-day","status":"publish","type":"post","link":"https:\/\/kudzia.eu\/b\/2011\/05\/innodb_stats_sample_pages-to-save-a-day\/","title":{"rendered":"innodb_stats_sample_pages to save a day"},"content":{"rendered":"<p>i&#8217;ve run into nasty instability in mysql&#8217;s query execution plan joining few tables with 10s of millions of rows. mysql 5.1&#8217;s innodb_stats_sample_pages seem to solve the problem.<br \/>\n<!--more--><br \/>\nlong time ago i&#8217;ve set query execution time on the application server level to 20s [and with assumption that everything should finish much earlier]. data set was growing, queries executed in much shorter time&#8230; except few. those few did not bother me too much until their execution time exceeded 20sec and started causing sql exceptions. not cool.<\/p>\n<p>i&#8217;ve started frantic investigation and noticed that query execution plan differ between my shards that should [statistically speaking] contain very similar data. first thing i&#8217;ve tried was running <i>analyze table<\/i> command on all involved tables. i&#8217;ve repeated few times <i>analyze table<\/i> and <i>explain select.. <\/i> and got really worried &#8211; query plans were flipping forth and back from ok-ish to very bad:<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\nExecution plan with marked problem &#x5B;sum of values in column rows is proportional to the total query execution time]:\r\nmysql&gt; explain SELECT sql_no_cache \u2026. ;\r\n+----+-------------+----------------+--------+-----------------------------+-----------+---------+-------------------------------+--------+-------------------------------------------+\r\n| id | select_type | table          | type   | possible_keys               | key       | key_len | ref                           | rows   | Extra                                     |\r\n+----+-------------+----------------+--------+-----------------------------+-----------+---------+-------------------------------+--------+-------------------------------------------+\r\n|  1 | SIMPLE      | b0             | range  | PRIMARY,series_id           | PRIMARY   | 4       | NULL                          | 221867 | Using where; Using index; Using temporary |\r\n|  1 | SIMPLE      | nc             | eq_ref | PRIMARY,series_id           | PRIMARY   | 4       | data.b0.entity_id             |      1 | Using where                               |\r\n|  1 | SIMPLE      | _entity_schema | ref    | entity_id                   | entity_id | 4       | data.nc.entity_id             |      1 | Using where; Using index                  |\r\n|  1 | SIMPLE      | a0             | ref    | series_id                   | series_id | 4       | data.nc.entity_id             |      5 | Using where; Using index                  |\r\n|  1 | SIMPLE      | c0             | ref    | PRIMARY,series_id           | series_id | 4       | data._entity_schema.entity_id |      5 | Using where; Using index; Distinct        |\r\n|  1 | SIMPLE      | av0            | eq_ref | PRIMARY,attribute_name_id_2 | PRIMARY   | 4       | data.c0.attribute_value_id    |      1 | Using where; Distinct                     |\r\n+----+-------------+----------------+--------+-----------------------------+-----------+---------+-------------------------------+--------+-------------------------------------------+\r\n6 rows in set (0.01 sec)\r\n\r\nmysql&gt; analyze table ...\r\n\r\nmysql&gt; explain SELECT \u2026;\r\n+----+-------------+----------------+--------+-----------------------------+---------------------+---------+-------------------+------+-------------------------------------------+\r\n| id | select_type | table          | type   | possible_keys               | key                 | key_len | ref               | rows | Extra                                     |\r\n+----+-------------+----------------+--------+-----------------------------+---------------------+---------+-------------------+------+-------------------------------------------+\r\n|  1 | SIMPLE      | av0            | ref    | PRIMARY,attribute_name_id_2 | attribute_name_id_2 | 4       | const             |  135 | Using where; Using index; Using temporary |\r\n|  1 | SIMPLE      | c0             | ref    | PRIMARY,series_id           | PRIMARY             | 4       | data.av0.id       | 1146 | Using where; Using index                  |\r\n|  1 | SIMPLE      | nc             | eq_ref | PRIMARY,series_id           | PRIMARY             | 4       | data.c0.entity_id |    1 | Using where                               |\r\n|  1 | SIMPLE      | _entity_schema | ref    | entity_id                   | entity_id           | 4       | data.nc.entity_id |    1 | Using where; Using index                  |\r\n|  1 | SIMPLE      | a0             | ref    | series_id                   | series_id           | 4       | data.c0.entity_id |    7 | Using index                               |\r\n|  1 | SIMPLE      | b0             | ref    | PRIMARY,series_id           | series_id           | 4       | data.a0.entity_id |    7 | Using where; Using index; Distinct        |\r\n+----+-------------+----------------+--------+-----------------------------+---------------------+---------+-------------------+------+-------------------------------------------+\r\n6 rows in set (0.00 sec)\r\n\r\nmysql&gt;\r\n<\/pre>\n<p>in the first case sql query would execute in 0.17 sec, in second &#8211; 23-25 sec. that&#8217;s a lot of variation. <\/p>\n<p>so i&#8217;m all in panic. seeing how quickly <i>analyze table<\/i> is executed over few GB of data reminded me something i&#8217;ve read before about innodb and statistical approach to determine how many distinct values given index has. &#8216;random dives&#8217; term popped up in my head .. probably read somewhere on <a href=\"http:\/\/mysqlperformanceblog.com\">mysql performance blog<\/a>. i&#8217;ve started googling. mysql 5.0 with built-in innodb does not allow you to change how well index cardinality is estimated&#8230; it&#8217;s based on<a href=\"http:\/\/dev.mysql.com\/doc\/refman\/5.5\/en\/innodb-other-changes-statistics-estimation.html\"> 8 and only 8 checks<\/a>. <\/p>\n<p>fortunately <a href=\"http:\/\/wiki.debian.org\/DebianSqueeze\">new debian<\/a> is out and it comes with mysql 5.1 having innodb plugin. fortunately new innodb plugin has the <a href=\"http:\/\/dev.mysql.com\/doc\/refman\/5.5\/en\/innodb-parameters.html#sysvar_innodb_stats_sample_pages\">innodb_stats_sample_pages<\/a> parameter which<br \/>\nlets me tune how much work will the storage engine spend calculating index stats [that are later used to create query plans].<\/p>\n<p>i&#8217;ve run simple test &#8211; executed 200 times commands:<\/p>\n<ol>\n<li>analyze table &#8230;<\/li>\n<li>select sql_no_cache .. <\/li>\n<\/ol>\n<p>i&#8217;ve timed duration of the select and on my production environment with mysql 5.0 i got 175 results < 1sec, 25 results >20 sec.<\/p>\n<p>i&#8217;ve reconfigured my test environment [that was already upgraded to squeeze and mysql 5.1] and added sysvar_innodb_stats_sample_pages=1000 [dont ask me why 1000 &#8211; it seemed big enough] to my my.cnf. same test was repeated there and all queries executed in less than 1sec. time of executing analyze table rose significantly, but that&#8217;s not a problem.<\/p>\n<p>following pages were very helpful:<\/p>\n<ul>\n<li><a href=\"http:\/\/dev.mysql.com\/doc\/innodb-plugin\/1.0\/en\/innodb-other-changes-statistics-estimation.html\">innodb plugin 1.0 &#8211; Controlling Optimizer Statistics Estimation<\/a><\/li>\n<li><a href=\"http:\/\/dev.mysql.com\/doc\/refman\/5.5\/en\/innodb-other-changes-statistics-estimation.html\">mysql 5.5 &#8211; Controlling Optimizer Statistics Estimation<\/a><\/li>\n<li><a href=\"http:\/\/oysteing.blogspot.com\/2011\/04\/more-stable-query-execution-time-by.html\">More Stable Query Execution Times by Improving InnoDB Statistics<\/a><\/li>\n<li><a href=\"http:\/\/www.mysqlperformanceblog.com\/2008\/09\/02\/beware-of-running-analyze-in-production\/\">Beware of running ANALYZE in Production<\/a><\/li>\n<li><a href=\"http:\/\/bugs.mysql.com\/bug.php?id=12113\">http:\/\/bugs.mysql.com\/bug.php?id=12113<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>i&#8217;ve run into nasty instability in mysql&#8217;s query execution plan joining few tables with 10s of millions of rows. mysql 5.1&#8217;s innodb_stats_sample_pages seem to solve the problem.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[17],"tags":[25,12],"class_list":["post-629","post","type-post","status-publish","format-standard","hentry","category-tech","tag-innodb","tag-mysql"],"_links":{"self":[{"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/posts\/629","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/comments?post=629"}],"version-history":[{"count":4,"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/posts\/629\/revisions"}],"predecessor-version":[{"id":1821,"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/posts\/629\/revisions\/1821"}],"wp:attachment":[{"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/media?parent=629"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/categories?post=629"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/tags?post=629"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}