innodb_stats_sample_pages to save a day

i’ve run into nasty instability in mysql’s query execution plan joining few tables with 10s of millions of rows. mysql 5.1’s innodb_stats_sample_pages seem to solve the problem.

long time ago i’ve set query execution time on the application server level to 20s [and with assumption that everything should finish much earlier]. data set was growing, queries executed in much shorter time… except few. those few did not bother me too much until their execution time exceeded 20sec and started causing sql exceptions. not cool.

i’ve started frantic investigation and noticed that query execution plan differ between my shards that should [statistically speaking] contain very similar data. first thing i’ve tried was running analyze table command on all involved tables. i’ve repeated few times analyze table and explain select.. and got really worried – query plans were flipping forth and back from ok-ish to very bad:

Execution plan with marked problem [sum of values in column rows is proportional to the total query execution time]:
mysql> explain SELECT sql_no_cache …. ;
+----+-------------+----------------+--------+-----------------------------+-----------+---------+-------------------------------+--------+-------------------------------------------+
| id | select_type | table          | type   | possible_keys               | key       | key_len | ref                           | rows   | Extra                                     |
+----+-------------+----------------+--------+-----------------------------+-----------+---------+-------------------------------+--------+-------------------------------------------+
|  1 | SIMPLE      | b0             | range  | PRIMARY,series_id           | PRIMARY   | 4       | NULL                          | 221867 | Using where; Using index; Using temporary |
|  1 | SIMPLE      | nc             | eq_ref | PRIMARY,series_id           | PRIMARY   | 4       | data.b0.entity_id             |      1 | Using where                               |
|  1 | SIMPLE      | _entity_schema | ref    | entity_id                   | entity_id | 4       | data.nc.entity_id             |      1 | Using where; Using index                  |
|  1 | SIMPLE      | a0             | ref    | series_id                   | series_id | 4       | data.nc.entity_id             |      5 | Using where; Using index                  |
|  1 | SIMPLE      | c0             | ref    | PRIMARY,series_id           | series_id | 4       | data._entity_schema.entity_id |      5 | Using where; Using index; Distinct        |
|  1 | SIMPLE      | av0            | eq_ref | PRIMARY,attribute_name_id_2 | PRIMARY   | 4       | data.c0.attribute_value_id    |      1 | Using where; Distinct                     |
+----+-------------+----------------+--------+-----------------------------+-----------+---------+-------------------------------+--------+-------------------------------------------+
6 rows in set (0.01 sec)

mysql> analyze table ...

mysql> explain SELECT …;
+----+-------------+----------------+--------+-----------------------------+---------------------+---------+-------------------+------+-------------------------------------------+
| id | select_type | table          | type   | possible_keys               | key                 | key_len | ref               | rows | Extra                                     |
+----+-------------+----------------+--------+-----------------------------+---------------------+---------+-------------------+------+-------------------------------------------+
|  1 | SIMPLE      | av0            | ref    | PRIMARY,attribute_name_id_2 | attribute_name_id_2 | 4       | const             |  135 | Using where; Using index; Using temporary |
|  1 | SIMPLE      | c0             | ref    | PRIMARY,series_id           | PRIMARY             | 4       | data.av0.id       | 1146 | Using where; Using index                  |
|  1 | SIMPLE      | nc             | eq_ref | PRIMARY,series_id           | PRIMARY             | 4       | data.c0.entity_id |    1 | Using where                               |
|  1 | SIMPLE      | _entity_schema | ref    | entity_id                   | entity_id           | 4       | data.nc.entity_id |    1 | Using where; Using index                  |
|  1 | SIMPLE      | a0             | ref    | series_id                   | series_id           | 4       | data.c0.entity_id |    7 | Using index                               |
|  1 | SIMPLE      | b0             | ref    | PRIMARY,series_id           | series_id           | 4       | data.a0.entity_id |    7 | Using where; Using index; Distinct        |
+----+-------------+----------------+--------+-----------------------------+---------------------+---------+-------------------+------+-------------------------------------------+
6 rows in set (0.00 sec)

mysql>

in the first case sql query would execute in 0.17 sec, in second – 23-25 sec. that’s a lot of variation.

so i’m all in panic. seeing how quickly analyze table is executed over few GB of data reminded me something i’ve read before about innodb and statistical approach to determine how many distinct values given index has. ‘random dives’ term popped up in my head .. probably read somewhere on mysql performance blog. i’ve started googling. mysql 5.0 with built-in innodb does not allow you to change how well index cardinality is estimated… it’s based on 8 and only 8 checks.

fortunately new debian is out and it comes with mysql 5.1 having innodb plugin. fortunately new innodb plugin has the innodb_stats_sample_pages parameter which
lets me tune how much work will the storage engine spend calculating index stats [that are later used to create query plans].

i’ve run simple test – executed 200 times commands:

analyze table …
select sql_no_cache ..

i’ve timed duration of the select and on my production environment with mysql 5.0 i got 175 results < 1sec, 25 results >20 sec.

i’ve reconfigured my test environment [that was already upgraded to squeeze and mysql 5.1] and added sysvar_innodb_stats_sample_pages=1000 [dont ask me why 1000 – it seemed big enough] to my my.cnf. same test was repeated there and all queries executed in less than 1sec. time of executing analyze table rose significantly, but that’s not a problem.

following pages were very helpful:

Leave a Reply Cancel reply