当前位置:Gxlcms > 数据库问题 > MySQL SQL优化之in与range查询

MySQL SQL优化之in与range查询

时间:2021-07-01 10:21:17 帮助过:5人阅读

index dive与index statistics是MySQL优化器对开销代价的估算方法,前者统计速度慢但是能得到精准的值,后者统计速度快但是数据未必精准。

the optimizer can estimate the row count for each range using dives into the index or index statistics.

在MySQL5.7版本中将默认值从10修改成200目的是为了尽可能的保证范围等值运算(IN())执行计划尽量精准,因为IN()list的数量很多时候都是超过10的。

说在前面

今天文章的主题有两个:

  1. range查询与索引使用
  2. eq_range_index_dive_limit的说明

range查询与索引使用

SQL如下:

1 2 SELECT * FROM pre_forum_post WHERE tid=7932552 AND `invisible` IN(‘0‘,‘-2‘) ORDER BY dateline DESC LIMIT 10;

索引如下:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 +----------------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | +----------------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ | pre_forum_post | 0 | PRIMARY | 1 | tid | A | NULL | NULL | NULL | | BTREE | | | | pre_forum_post | 0 | PRIMARY | 2 | position | A | 25521392 | NULL | NULL | | BTREE | | | | pre_forum_post | 0 | pid | 1 | pid | A | 25521392 | NULL | NULL | | BTREE | | | | pre_forum_post | 1 | fid | 1 | fid | A | 1490 | NULL | NULL | | BTREE | | | | pre_forum_post | 1 | displayorder | 1 | tid | A | 880048 | NULL | NULL | | BTREE | | | | pre_forum_post | 1 | displayorder | 2 | invisible | A | 945236 | NULL | NULL | | BTREE | | | | pre_forum_post | 1 | displayorder | 3 | dateline | A | 25521392 | NULL | NULL | | BTREE | | | | pre_forum_post | 1 | first | 1 | tid | A | 880048 | NULL | NULL | | BTREE | | | | pre_forum_post | 1 | first | 2 | first | A | 1215304 | NULL | NULL | | BTREE | | | | pre_forum_post | 1 | new_auth | 1 | authorid | A | 1963184 | NULL | NULL | | BTREE | | | | pre_forum_post | 1 | new_auth | 2 | invisible | A | 1963184 | NULL | NULL | | BTREE | | | | pre_forum_post | 1 | new_auth | 3 | tid | A | 12760696 | NULL | NULL | | BTREE | | | | pre_forum_post | 1 | idx_dt | 1 | dateline | A | 25521392 | NULL | NULL | | BTREE | | | | pre_forum_post | 1 | mul_test | 1 | tid | A | 880048 | NULL | NULL | | BTREE | | | | pre_forum_post | 1 | mul_test | 2 | invisible | A | 945236 | NULL | NULL | | BTREE | | | | pre_forum_post | 1 | mul_test | 3 | dateline | A | 25521392 | NULL | NULL | | BTREE | | | | pre_forum_post | 1 | mul_test | 4 | pid | A | 25521392 | NULL | NULL | | BTREE | | | +----------------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

看下执行计划:

1 2 3 4 5 6 7 8 root@localhost 16:08:27 [ultrax]> explain SELECT * FROM pre_forum_post WHERE tid=7932552 AND `invisible` IN(‘0‘,‘-2‘) -> ORDER BY dateline DESC LIMIT 10; +----+-------------+----------------+-------+-------------------------------------------+--------------+---------+------+------+---------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+----------------+-------+-------------------------------------------+--------------+---------+------+------+---------------------------------------+ | 1 | SIMPLE | pre_forum_post | range | PRIMARY,displayorder,first,mul_test,idx_1 | displayorder | 4 | NULL | 54 | Using index condition; Using filesort | +----+-------------+----------------+-------+-------------------------------------------+--------------+---------+------+------+---------------------------------------+ 1 row in set (0.00 sec)

MySQL优化器认为这是一个range查询,那么(tid,invisible,dateline)这条索引中,dateline字段肯定用不上了,也就是说这个SQL最后的排序肯定会生成一个临时结果集,然后再结果集里面完成排序,而不是直接在索引中直接完成排序动作,于是我们尝试增加了一条索引。

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 root@localhost 16:09:06 [ultrax]> alter table pre_forum_post add index idx_1 (tid,dateline); Query OK, 20374596 rows affected, 0 warning (600.23 sec) Records: 0 Duplicates: 0 Warnings: 0   root@localhost 16:20:22 [ultrax]> explain SELECT * FROM pre_forum_post force index (idx_1) WHERE tid=7932552 AND `invisible` IN(‘0‘,‘-2‘) ORDER BY dateline DESC LIMIT 10; +----+-------------+----------------+------+---------------+-------+---------+-------+--------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+----------------+------+---------------+-------+---------+-------+--------+-------------+ | 1 | SIMPLE | pre_forum_post | ref | idx_1 | idx_1 | 3 | const | 120646 | Using where | +----+-------------+----------------+------+---------------+-------+---------+-------+--------+-------------+ 1 row in set (0.00 sec)   root@localhost 16:22:06 [ultrax]> SELECT sql_no_cache * FROM pre_forum_post WHERE tid=7932552 AND `invisible` IN(‘0‘,‘-2‘) ORDER BY dateline DESC LIMIT 10; ... 10 rows in set (0.40 sec)   root@localhost 16:23:55 [ultrax]> SELECT sql_no_cache * FROM pre_forum_post force index (idx_1) WHERE tid=7932552 AND `invisible` IN(‘0‘,‘-2‘) ORDER BY dateline DESC LIMIT 10; ... 10 rows in set (0.00 sec)

实验证明效果是极好的,其实不难理解,上面我们就说了in()在MySQL优化器里面是以多种组合方式来检索数据的,如果加了一个排序或者分组那势必只能在临时结果集上操作,也就是说索引里面即使包含了排序或者分组的字段依然是没用的。唯一不满的是MySQL优化器的选择依然不够靠谱。

总结下:在MySQL查询里面使用in(),除了要注意in()list的数量以及eq_range_index_dive_limit的值以外(具体见下),还要注意如果SQL包含排序/分组/去重等等就需要注意索引的使用

eq_range_index_dive_limit的说明

还是上面的案例,为什么idx_1无法直接使用?需要使用hint强制只用这个索引呢?这里我们首先看下eq_range_index_dive_limit的值。

1 2 3 4 5 6 7 root@localhost 22:38:05 [ultrax]> show variables like ‘eq_range_index_dive_limit‘; +---------------------------+-------+ | Variable_name | Value | +---------------------------+-------+ | eq_range_index_dive_limit | 2 | +---------------------------+-------+ 1 row in set (0.00 sec)

根据我们上面说的这种情况0 < eq_range_index_dive_limit <= N使用index statistics,那么接下来我们用OPTIMIZER_TRACE来一看究竟。

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 { "index": "displayorder", "ranges": [ "7932552 <= tid <= 7932552 AND -2 <= invisible <= -2", "7932552 <= tid <= 7932552 AND 0 <= invisible <= 0" ], "index_dives_for_eq_ranges": false, "rowid_ordered": false, "using_mrr": false, "index_only": false, "rows": 54, "cost": 66.81, "chosen": true } // index dive为false,最终chosen是true ... { "index": "idx_1", "ranges": [ "7932552 <= tid <= 7932552" ], "index_dives_for_eq_ranges": true, "rowid_ordered": false, "using_mrr": false, "index_only": false, "rows": 120646, "cost": 144776, "chosen": false, "cause": "cost" }

我们可以看到displayorder索引的cost是66.81,而idx_1的cost是120646,而最终MySQL优化器选择了displayorder这条索引。那么如果我们把eq_range_index_dive_limit设置>N是不是应该就会使用index dive计算方式,得到更准确的执行计划呢?

1 2 3 4 5 6 7 8 9 10 root@localhost 22:52:52 [ultrax]> set eq_range_index_dive_limit = 3; Query OK, 0 rows affected (0.00 sec)   root@localhost 22:55:38 [ultrax]> explain SELECT * FROM pre_forum_post WHERE tid=7932552 AND `invisible` IN(‘0‘,‘-2‘) ORDER BY dateline DESC LIMIT 10; +----+-------------+----------------+------+-------------------------------------------+-------+---------+-------+--------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+----------------+------+-------------------------------------------+-------+---------+-------+--------+-------------+ | 1 | SIMPLE | pre_forum_post | ref | PRIMARY,displayorder,first,mul_test,idx_1 | idx_1 | 3 | const | 120646 | Using where | +----+-------------+----------------+------+-------------------------------------------+-------+---------+-------+--------+-------------+ 1 row in set (0.00 sec)

optimize_trace结果如下

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 { "index": "displayorder", "ranges": [ "7932552 <= tid <= 7932552 AND -2 <= invisible <= -2", "7932552 <= tid <= 7932552 AND 0 <= invisible <= 0" ], "index_dives_for_eq_ranges": true, "rowid_ordered": false, "using_mrr": false, "index_only": false, "rows": 188193, "cost": 225834, "chosen": true } ... { "index": "idx_1", "ranges": [ "7932552 <= tid <= 7932552" ], "index_dives_for_eq_ranges": true, "rowid_ordered": false, "using_mrr": false, "index_only": false, "rows": 120646, "cost": 144776, "chosen": true } ... "cost_for_plan": 144775, "rows_for_plan": 120646, "chosen": true // 在备选索引选择中两条索引都被选择,在最后的逻辑优化中选在了代价最小的索引也就是idx_1

以上就是在等值范围查询中eq_range_index_dive_limit的值怎么影响MySQL优化器计算开销,从而影响索引的选择。另外我们可以通过profiling来看看优化器的统计耗时:

index dive

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 +----------------------+----------+ | Status | Duration | +----------------------+----------+ | starting | 0.000048 | | checking permissions | 0.000004 | | Opening tables | 0.000015 | | init | 0.000044 | | System lock | 0.000009 | | optimizing | 0.000014 | | statistics | 0.032089 | | preparing | 0.000022 | | Sorting result | 0.000003 | | executing | 0.000003 | | Sending data | 0.000101 | | end | 0.000004 | | query end | 0.000002 | | closing tables | 0.000009 | | freeing items | 0.000013 | | cleaning up | 0.000012 | +----------------------+----------+

index statistics

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 +----------------------+----------+ | Status | Duration | +----------------------+----------+ | starting | 0.000045 | | checking permissions | 0.000003 | | Opening tables | 0.000014 | | init | 0.000040 | | System lock | 0.000008 | | optimizing | 0.000014 | | statistics | 0.000086 | | preparing | 0.000016 | | Sorting result | 0.000002 | | executing | 0.000002 | | Sending data | 0.000016 | | Creating sort index | 0.412123 | | end | 0.000012 | | query end | 0.000004 | | closing tables | 0.000013 | | freeing items | 0.000023 | | cleaning up | 0.000015 | +----------------------+----------+

可以看到当eq_range_index_dive_limit加大使用index dive时,优化器统计耗时明显比ndex statistics方式来的长,但最终它使用了作出了更合理的执行计划。统计耗时0.032089s vs .000086s,但是SQL执行耗时却是约0.03s vs 0.41s。

附:如何使用optimize_trace

1 2 3 set optimizer_trace=‘enabled=on‘; select * from information_schema.optimizer_trace\G // 注:optimizer_trace建议只在session模式下开启调试即可

参考资料
http://dev.mysql.com/doc/refman/5.6/en/range-optimization.html
http://imysql.com/2014/08/05/a-fake-bug-with-eq-range-index-dive-limit.shtml
http://blog.163.com/li_hx/blog/static/18399141320147521735442/

MySQL SQL优化之in与range查询

标签:

人气教程排行