当前位置:Gxlcms > 数据库问题 > SQL SERVER中关于OR会导致索引扫描或全表扫描的浅析

SQL SERVER中关于OR会导致索引扫描或全表扫描的浅析

时间:2021-07-01 10:21:17 帮助过:5人阅读

 TABLE TEST
  1.   
  1. <span style="color: #0000ff">CREATE</span> <span style="color: #0000ff">TABLE</span> TEST (OBJECT_ID  <span style="color: #0000ff">INT</span>, NAME <span style="color: #0000ff">VARCHAR</span>(32));
  1.  
  1. <span style="color: #0000ff">CREATE</span> <span style="color: #0000ff">INDEX</span> PK_TEST <span style="color: #0000ff">ON</span> TEST(OBJECT_ID)
  1.   
  1. <span style="color: #0000ff">DECLARE</span> @<span style="color: #0000ff">Index</span> <span style="color: #0000ff">INT</span> =0;
  1.  
  1. <span style="color: #0000ff">WHILE</span> @<span style="color: #0000ff">Index</span> < 500000
  1. <span style="color: #0000ff">BEGIN</span>
  1.     INSERT <span style="color: #0000ff">INTO</span> TEST
  1.     <span style="color: #0000ff">SELECT</span> @<span style="color: #0000ff">Index</span>, <span style="color: #006080">‘kerry‘</span>+<span style="color: #0000ff">CAST</span>(@<span style="color: #0000ff">Index</span> <span style="color: #0000ff">AS</span> <span style="color: #0000ff">VARCHAR</span>(6));
  1.   
  1.     <span style="color: #0000ff">SET</span> @<span style="color: #0000ff">Index</span> = @<span style="color: #0000ff">Index</span> +1;
  1. <span style="color: #0000ff">END</span>
  1.  
  1.  
  1. <span style="color: #0000ff">UPDATE</span> <span style="color: #0000ff">STATISTICS</span> TEST <span style="color: #0000ff">WITH</span> FULLSCAN

 

场景1:如下所示,并不是所有的OR条件都会导致SQL走全表扫描。具体情况具体分析,不要套用教条。

  1. <span style="color: #0000ff">SELECT</span> * <span style="color: #0000ff">FROM</span> TEST <span style="color: #0000ff">WHERE</span> (OBJECT_ID =5 <span style="color: #0000ff">OR</span> OBJECT_ID = 105)

技术分享

 

场景2:加了条件1=1后,执行计划从索引查找(Index Seek)变为全表扫描(Table Scan),为什么会如此呢?个人理解为优化器将OR运算拆分为两个子集处理,由于一些原因,1=1这个条件导致优化器认定需要全表扫描才能完成1=1条件子集的计算处理(为了理解这个,煞费苦心,鉴于理论薄弱,如有错误或不足,敬请指出)。所以优化器在权衡代价后生成的执行计划最终选择了全表扫描(Table Scan)

  1. <span style="color: #0000ff">SELECT</span> * <span style="color: #0000ff">FROM</span> TEST <span style="color: #0000ff">WHERE</span> (1=1 <span style="color: #0000ff">OR</span> OBJECT_ID =105);

技术分享

 

场景3: 下面场景比较好理解,因为下面需要从500000条记录中取出499700条记录,而全表扫描(Table Scan)肯定是最优的选择,代价(Cost)最低。

  1. <span style="color: #0000ff">SELECT</span> * <span style="color: #0000ff">FROM</span> TEST <span style="color: #0000ff">WHERE</span> (OBJECT_ID >300 <span style="color: #0000ff">OR</span> OBJECT_ID =105);

 

场景4:这种场景跟场景2的情况本质是一样的。所以在此略过。其实类似这种写法也是实际情况中最常出现的情况,还在迷糊的同学,赶紧抛弃这种写法吧

  1. <span style="color: #0000ff">DECLARE</span> @OBJECT_ID <span style="color: #0000ff">INT</span> =150;
  1.  
  1. <span style="color: #0000ff">SELECT</span> * <span style="color: #0000ff">FROM</span> TEST <span style="color: #0000ff">WHERE</span> (@OBJECT_ID <span style="color: #0000ff">IS</span> <span style="color: #0000ff">NULL</span> <span style="color: #0000ff">OR</span> OBJECT_ID =@OBJECT_ID);

技术分享

 

聚集索引表单索引

在聚集索引表中,我们也依葫芦画瓢,准备实验测试的数据环境。

  1. <span style="color: #0000ff">DROP</span> <span style="color: #0000ff">TABLE</span> TEST
  1.   
  1. <span style="color: #0000ff">CREATE</span> <span style="color: #0000ff">TABLE</span> TEST (OBJECT_ID  <span style="color: #0000ff">INT</span>, NAME <span style="color: #0000ff">VARCHAR</span>(32));
  1.  
  1. <span style="color: #0000ff">CREATE</span> <span style="color: #0000ff">CLUSTERED</span> <span style="color: #0000ff">INDEX</span> PK_TEST <span style="color: #0000ff">ON</span> TEST(OBJECT_ID)
  1.   
  1. <span style="color: #0000ff">DECLARE</span> @<span style="color: #0000ff">Index</span> <span style="color: #0000ff">INT</span> =0;
  1.  
  1. <span style="color: #0000ff">WHILE</span> @<span style="color: #0000ff">Index</span> < 500000
  1. <span style="color: #0000ff">BEGIN</span>
  1.     INSERT <span style="color: #0000ff">INTO</span> TEST
  1.     <span style="color: #0000ff">SELECT</span> @<span style="color: #0000ff">Index</span>, <span style="color: #006080">‘kerry‘</span>+<span style="color: #0000ff">CAST</span>(@<span style="color: #0000ff">Index</span> <span style="color: #0000ff">AS</span> <span style="color: #0000ff">VARCHAR</span>(6));
  1.   
  1.     <span style="color: #0000ff">SET</span> @<span style="color: #0000ff">Index</span> = @<span style="color: #0000ff">Index</span> +1;
  1. <span style="color: #0000ff">END</span>
  1.  
  1.  
  1. <span style="color: #0000ff">UPDATE</span> <span style="color: #0000ff">STATISTICS</span> TEST <span style="color: #0000ff">WITH</span> FULLSCAN

 

场景1 :索引查找(Index Seek)

 

  1. <span style="color: #0000ff">SELECT</span> * <span style="color: #0000ff">FROM</span> TEST <span style="color: #0000ff">WHERE</span> (OBJECT_ID =5 <span style="color: #0000ff">OR</span> OBJECT_ID = 105)

 

场景2:聚集索引扫描(Clustered Index Scan)

技术分享

 

场景3:似乎与堆表有所不同。聚集索引表居然还是走聚集索引查找。

技术分享

 

场景4:OR导致聚集索引扫描

技术分享

 

如果堆表或聚集索引表上建立有联合索引,情况也大致如此,在此不做过多案例讲解。下面仅仅讲述一两个案例场景。

  1. <span style="color: #0000ff">DROP</span> <span style="color: #0000ff">TABLE</span> test1;
  1.  
  1. <span style="color: #0000ff">CREATE</span> <span style="color: #0000ff">TABLE</span> test1
  1.   (
  1.      a <span style="color: #0000ff">INT</span>,
  1.      b <span style="color: #0000ff">INT</span>,
  1.      c <span style="color: #0000ff">INT</span>,
  1.      d <span style="color: #0000ff">INT</span>,
  1.      e <span style="color: #0000ff">INT</span> 
  1.   )
  1.  
  1. <span style="color: #0000ff">DECLARE</span> @<span style="color: #0000ff">Index</span> <span style="color: #0000ff">INT</span> =0;
  1.  
  1. <span style="color: #0000ff">WHILE</span> @<span style="color: #0000ff">Index</span> < 10000
  1.   <span style="color: #0000ff">BEGIN</span> 
  1.       INSERT <span style="color: #0000ff">INTO</span> test1
  1.       <span style="color: #0000ff">SELECT</span> @<span style="color: #0000ff">Index</span>,
  1.              @<span style="color: #0000ff">Index</span>,
  1.              @<span style="color: #0000ff">Index</span>,
  1.              @<span style="color: #0000ff">Index</span>,
  1.              @<span style="color: #0000ff">Index</span> 
  1.  
  1.       <span style="color: #0000ff">SET</span> @<span style="color: #0000ff">Index</span> = @<span style="color: #0000ff">Index</span> + 1;
  1.   <span style="color: #0000ff">END</span> 
  1.  
  1. <span style="color: #0000ff">CREATE</span> <span style="color: #0000ff">INDEX</span> idx_test_n1
  1.   <span style="color: #0000ff">ON</span> test1(a, b, c, d)
  1.  
  1. <span style="color: #0000ff">UPDATE</span> <span style="color: #0000ff">STATISTICS</span> test1 <span style="color: #0000ff">WITH</span> fullscan;

SELECT * FROM TEST1 WHERE A=12 OR B> 500 OR C >100000

技术分享

 

因为结果集是几个条件的并集,最多只能在查找A=12的数据时用索引,其它几个条件都需要表扫描,那优化器就会选择直接走一遍表扫描,以最低的代价COST完成,所以索引就失效了。

 

那么如何优化查询语句含有的OR的SQL语句呢?方法无外乎有三种:

1:通过索引覆盖,使包含OR的SQL走索引查找(Index Seek)。但是这个只能满足部分场景,并不能解决所有这类SQL。这个Solution具有一定的局限性。

SELECT * FROM TEST1 WHERE A=12 OR B=500

技术分享

如果我们通过索引覆盖,在字段B上面也建立索引,那么下面OR查询也会走索引查找。

  1. <span style="color: #0000ff">CREATE</span> <span style="color: #0000ff">INDEX</span> IDX_TEST1_B <span style="color: #0000ff">ON</span> TEST1(B);
  1.  
  1. <span style="color: #0000ff">SELECT</span> * <span style="color: #0000ff">FROM</span> TEST1 <span style="color: #0000ff">WHERE</span> A=12 <span style="color: #0000ff">OR</span> B=500

技术分享

 

2:使用IN替换OR。 但是这个Solution也有很多局限性。在此不做过多阐述。

 

3:一般将OR的字句分解成多个查询,并且通过UNION ALL 或UNION连接起来。在联合索引或有索引覆盖的场景下。大部分情况下,UNION ALL的效率更高。但是并不是所有的UNION ALL都会比OR的SQL的代价(COST),特殊的情况或特殊的数据分布也会出现UNION ALL比OR代价要高的情况。例如,上面特殊的要求,从全表中取两条记录,如下所示

  1. <span style="color: #0000ff">SELECT</span> * <span style="color: #0000ff">FROM</span> TEST1 <span style="color: #0000ff">WHERE</span> A=12
  1.  
  1. <span style="color: #0000ff">UNION</span> <span style="color: #0000ff">ALL</span>
  1.  
  1. <span style="color: #0000ff">SELECT</span> * <span style="color: #0000ff">FROM</span> TEST1 <span style="color: #0000ff">WHERE</span> B=500

技术分享

 

UNON ALL语句的代价(Cost)要高与OR是因为它做了两次索引查找(Index Seek),而OR语句只做一次索引查找(Index Seek)就完成了。开销明显小一些,但是实际情况这类特殊情况比较少,实际情况的取数条件、数据都比这个简单案例要复杂得多。所以在大部分情况下,拆分为UNION ALL语句的效率要高于OR语句

另外一个案例,就是最上面实验的堆表TEST, 在字段OBJECT_ID上建有索引

  1. <span style="color: #0000ff">SELECT</span> * <span style="color: #0000ff">FROM</span> TEST <span style="color: #0000ff">WHERE</span> (OBJECT_ID >300 <span style="color: #0000ff">OR</span> OBJECT_ID =105);
  1.  
  1. <span style="color: #0000ff">SELECT</span> * <span style="color: #0000ff">FROM</span> TEST <span style="color: #0000ff">WHERE</span> OBJECT_ID >300
  1.  
  1. <span style="color: #0000ff">UNION</span> <span style="color: #0000ff">ALL</span>
  1.  
  1. <span style="color: #0000ff">SELECT</span> * <span style="color: #0000ff">FROM</span> TEST <span style="color: #0000ff">WHERE</span> OBJECT_ID =105;

技术分享

可以从下面看出两者开销不同的地方在于IO方面,两者开销之所以有区别,是因为第二个SQL多了一次扫描(索引查找)

技术分享

技术分享

 

总结:

    在实际开发环境中,OR这种写法确实会带来很多不确定性,尽量使用UNION 或IN替换OR。我们需要遵循一些规则,但是也不能认为它就是一成不变的,永为真理。具体场景、具体环境具体分析。要知其然知其所以然。在微软亚太区数据库技术支持组的官方博客中就有一个案例SQL Server性能问题案例解析 (3)也是OR引起的性能案例。 博客中有个观点,我觉得挺赞的:”需要注意的是,对于OR或UNION,并没有确定的孰优孰劣,使用时要进行测试才能确定。“ 。

SQL SERVER中关于OR会导致索引扫描或全表扫描的浅析

标签:

人气教程排行