Performance Analysis of Analytic Functions versus Table Joins

Date: 2021-07-01 10:21:17
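My colleagues' report SQL often needs to return the complete rows that hold the maximum or minimum value of some column within each group. Developers usually write this by first building a derived table that already contains the group key and the MAX or MIN value, and then either joining it back to the original table or using it as a subquery.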
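The transcripts in this article show an "Elapsed:" line, an execution plan and a statistics block, but no result rows. A minimal sketch of SQL*Plus settings that would produce output of that shape is given here. Only `set linesize 120` appears in the original session; the other two settings are an assumption about how it was captured.

```sql
-- Shown in the original session
SET LINESIZE 120

-- Assumed: prints the "Elapsed:" line after each statement
SET TIMING ON

-- Assumed: prints the plan and statistics but suppresses the result rows
SET AUTOTRACE TRACEONLY
```

AUTOTRACE needs a plan table and the PLUSTRACE role (or DBA privileges), so it may not work for every account.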

SQL> set linesize 120

SQL> SELECT r.sample_id,
            r.result_code,
            r.reason_code,
            r.begin_time
       FROM call.hf_dm_visit_record r,
            (  SELECT r.sample_id, MIN (r.begin_time) begin_time
                 FROM call.hf_dm_visit_record r
             GROUP BY r.sample_id) r1
      WHERE r.sample_id = r1.sample_id AND r.begin_time = r1.begin_time;
137540 rows selected.
Elapsed: 00:00:09.85
Execution Plan

----------------------------------------------------------

Plan hash value: 4064551521
---------------------------------------------------------------------------------------------------

| Id  | Operation            | Name               | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |

---------------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT     |                    |     3 |   474 |       |  5438   (2)| 00:01:06 |

|*  1 |  FILTER              |                    |       |       |       |            |          |

|   2 |   HASH GROUP BY      |                    |     3 |   474 |       |  5438   (2)| 00:01:06 |

|*  3 |    HASH JOIN         |                    |   555K|    83M|    20M|  5381   (1)| 00:01:05 |

|   4 |     TABLE ACCESS FULL| HF_DM_VISIT_RECORD |   274K|    17M|       |  1500   (1)| 00:00:19 |

|   5 |     TABLE ACCESS FULL| HF_DM_VISIT_RECORD |   274K|    23M|       |  1503   (2)| 00:00:19 |

---------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):

---------------------------------------------------
   1 - filter("R"."BEGIN_TIME"=MIN("R"."BEGIN_TIME"))

   3 - access("R"."SAMPLE_ID"="R"."SAMPLE_ID")
Statistics

----------------------------------------------------------

          0  recursive calls

          0  db block gets

      13338  consistent gets

       1951  physical reads

        172  redo size

    9797533  bytes sent via SQL*Net to client

     101351  bytes received via SQL*Net from client

       9171  SQL*Net roundtrips to/from client

          0  sorts (memory)

          0  sorts (disk)

     137540  rows processed
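This is the developers' original SQL. It can also be written with a subquery, but because the optimizer can unnest the subquery, the execution plan usually does not change. Since the plan is identical, it is not listed again.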

SELECT r.sample_id,
       r.result_code,
       r.reason_code,
       r.begin_time
  FROM call.hf_dm_visit_record r
 WHERE (sample_id, begin_time) IN
          (  SELECT r.sample_id, MIN (r.begin_time) begin_time
               FROM call.hf_dm_visit_record r
           GROUP BY r.sample_id)
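If we rewrite it with an analytic function, Oracle only needs to scan the hf_dm_visit_record table once, but it pays the sorting cost of a WINDOW SORT. Note that the query as run orders by sample_id, which is also the partition column, so the optimizer reduces it to ORDER BY NULL (see the predicate section) and the row kept for each sample_id is arbitrary. To be strictly equivalent to the MIN(begin_time) join, the window would have to order by begin_time.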

SQL> SELECT a.sample_id,
            a.result_code,
            a.reason_code,
            a.begin_time
       FROM (SELECT r.sample_id,
                    r.result_code,
                    r.reason_code,
                    r.begin_time,
                    ROW_NUMBER ()
                       OVER (PARTITION BY r.sample_id ORDER BY r.sample_id)
                       cnt
               FROM call.hf_dm_visit_record r) a
      WHERE cnt = 1;
137540 rows selected.
Elapsed: 00:00:12.19
Execution Plan

----------------------------------------------------------

Plan hash value: 679670933
-------------------------------------------------------------------------------------------------------

| Id  | Operation                | Name               | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |

-------------------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT         |                    |   274K|    46M|       |  6564   (1)| 00:01:19 |

|*  1 |  VIEW                    |                    |   274K|    46M|       |  6564   (1)| 00:01:19 |

|*  2 |   WINDOW SORT PUSHED RANK|                    |   274K|    20M|    24M|  6564   (1)| 00:01:19 |

|   3 |    TABLE ACCESS FULL     | HF_DM_VISIT_RECORD |   274K|    20M|       |  1503   (2)| 00:00:19 |

-------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):

---------------------------------------------------
   1 - filter("CNT"=1)

   2 - filter(ROW_NUMBER() OVER ( PARTITION BY "R"."SAMPLE_ID" ORDER BY  NULL )<=1)
Statistics

----------------------------------------------------------

          0  recursive calls

          0  db block gets

       6670  consistent gets

       1951  physical reads

          0  redo size

    9796204  bytes sent via SQL*Net to client

     101351  bytes received via SQL*Net from client

       9171  SQL*Net roundtrips to/from client

          1  sorts (memory)

          0  sorts (disk)

     137540  rows processed
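Looking at execution time, the join is faster than the analytic-function rewrite (9.85 s versus 12.19 s), and its cost is lower as well (5438 versus 6564). The join only loses on logical reads (13338 consistent gets versus 6670). One more point deserves attention: both HASH JOIN and WINDOW SORT PUSHED RANK carry a TempSpc figure, about 20M for the hash join and 24M for the window sort. TempSpc here is the optimizer's estimate of the temporary space the operation needs. In the actual run, the analytic query reports 1 sort in memory and 0 sorts on disk.
Most systems, however, judge performance by SQL response time, and by that measure the analytic-function rewrite is less efficient here than the join or the subquery. Articles online often claim that analytic functions perform better, but that is only part of the picture; the result depends on the actual data distribution.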
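The analytic query measured above orders its window by sample_id, so it does not necessarily return the row with the earliest begin_time. A sketch of a rewrite that should match the join semantically is shown here. It was not run against the original tables, so its plan and timings are unknown. The alias `rnk` is made up for illustration.

```sql
-- RANK() instead of ROW_NUMBER(): rows tied on the earliest begin_time
-- are all kept, which is what the join on MIN(begin_time) also does.
SELECT a.sample_id,
       a.result_code,
       a.reason_code,
       a.begin_time
  FROM (SELECT r.sample_id,
               r.result_code,
               r.reason_code,
               r.begin_time,
               RANK () OVER (PARTITION BY r.sample_id
                             ORDER BY r.begin_time) rnk
          FROM call.hf_dm_visit_record r) a
 WHERE a.rnk = 1;
```

Rows with a NULL sample_id, or groups whose begin_time values are all NULL, would still behave differently: the equality join discards them, while the analytic version keeps them. Because the window now has a real sort key, the WINDOW SORT may also cost more than the plan shown above.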
Now consider another SQL statement:

SQL> SELECT r.called_object
       FROM call.hf_script_callrecord r,
            (  SELECT r.called_object, MAX (r.begin_time) begin_time
                 FROM call.hf_script_callrecord r
             GROUP BY r.called_object) r1
      WHERE r.called_object = r1.called_object AND r.begin_time = r1.begin_time;
138246 rows selected.
Elapsed: 00:00:30.70
Execution Plan

----------------------------------------------------------

Plan hash value: 4009191755
-----------------------------------------------------------------------------------------------------

| Id  | Operation            | Name                 | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |

-----------------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT     |                      |    26 |  3796 |       |  8740  (14)| 00:01:45 |

|*  1 |  FILTER              |                      |       |       |       |            |          |

|   2 |   HASH GROUP BY      |                      |    26 |  3796 |       |  8740  (14)| 00:01:45 |

|*  3 |    HASH JOIN         |                      |  8534K|  1188M|    25M|  7706   (3)| 00:01:33 |

|   4 |     TABLE ACCESS FULL| HF_SCRIPT_CALLRECORD |   338K|    21M|       |  2434   (1)| 00:00:30 |

|   5 |     TABLE ACCESS FULL| HF_SCRIPT_CALLRECORD |   338K|    25M|       |  2434   (1)| 00:00:30 |

-----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):

---------------------------------------------------
   1 - filter("R"."BEGIN_TIME"=MAX("R"."BEGIN_TIME"))

   3 - access("R"."CALLED_OBJECT"="R"."CALLED_OBJECT")
Statistics

----------------------------------------------------------

          0  recursive calls

          0  db block gets

      20540  consistent gets

       6818  physical reads

          0  redo size

    5652618  bytes sent via SQL*Net to client

     101868  bytes received via SQL*Net from client

       9218  SQL*Net roundtrips to/from client

          0  sorts (memory)

          0  sorts (disk)

     138246  rows processed
This statement uses 20540 logical reads, the estimated TempSpc is 25M, and the cost is 8740.
SQL> SELECT *
       FROM (SELECT r.called_object,
                    ROW_NUMBER ()
                    OVER (PARTITION BY called_object ORDER BY called_object DESC)
                       cnt
               FROM call.hf_script_callrecord r)
      WHERE cnt = 1;
138199 rows selected.
Elapsed: 00:00:05.50
Execution Plan

----------------------------------------------------------

Plan hash value: 1404543553
---------------------------------------------------------------------------------------------------------

| Id  | Operation                | Name                 | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |

---------------------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT         |                      |   338K|    30M|       |  5099   (2)| 00:01:02 |

|*  1 |  VIEW                    |                      |   338K|    30M|       |  5099   (2)| 00:01:02 |

|*  2 |   WINDOW SORT PUSHED RANK|                      |   338K|  9268K|    11M|  5099   (2)| 00:01:02 |

|   3 |    TABLE ACCESS FULL     | HF_SCRIPT_CALLRECORD |   338K|  9268K|       |  2434   (1)| 00:00:30 |

---------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):

---------------------------------------------------
   1 - filter("CNT"=1)

   2 - filter(ROW_NUMBER() OVER ( PARTITION BY "CALLED_OBJECT" ORDER BY  NULL )<=1)
Statistics

----------------------------------------------------------

          0  recursive calls

          0  db block gets

      10275  consistent gets

        171  physical reads

        136  redo size

    6175932  bytes sent via SQL*Net to client

     101835  bytes received via SQL*Net from client

       9215  SQL*Net roundtrips to/from client

          1  sorts (memory)

          0  sorts (disk)

     138199  rows processed
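After the rewrite, logical reads drop to 10275, the cost drops to 5099, the estimated TempSpc drops from 25M to 11M, and the response time falls from about 30 seconds to about 5 seconds. Physical reads also fell from 6818 to 171, so part of the gain in elapsed time may come from blocks already cached by the first run. The row counts differ slightly (138246 versus 138199), presumably because the join returns every row tied at the maximum begin_time, whereas ROW_NUMBER returns exactly one row per called_object.
With the rewrite, response time improved and fewer system resources were consumed, so for this statement the analytic function is clearly the more efficient choice.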
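The two statements above return 138246 and 138199 rows respectively. One way to check whether ties on the latest begin_time explain the gap is sketched here. This query is not from the original article, and the aliases `rnk` and `tied_rows` are made up for illustration.

```sql
-- List every called_object that has more than one row at its latest
-- begin_time. NULLS LAST keeps NULL begin_time values from ranking first,
-- since Oracle sorts NULLs first by default in descending order.
SELECT called_object, COUNT (*) AS tied_rows
  FROM (SELECT r.called_object,
               RANK () OVER (PARTITION BY r.called_object
                             ORDER BY r.begin_time DESC NULLS LAST) rnk
          FROM call.hf_script_callrecord r)
 WHERE rnk = 1
 GROUP BY called_object
HAVING COUNT (*) > 1;
```

If this returns rows, the join and the ROW_NUMBER rewrite are not interchangeable for this data, and the choice between them is a question of semantics as well as speed.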
I also created a large table so that we can compare the resource consumption and response time of the two approaches:

SQL>create table t01 as select * from dba_objects;

SQL>insert into t01 select * from t01;

SQL>insert into t01 select * from t01;

SQL>insert into t01 select * from t01;

…

SQL> select count(*) from t01;
  COUNT(*)

----------

   3220992
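The table holds about 3.2 million rows. Each statement is run with event 10046 enabled, and the trace is then formatted with tkprof to show the resources consumed and the wait events over the whole execution.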
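The article does not show the exact tracing commands. A common way to enable event 10046 for the current session is sketched here. The trace level and the identifier string are assumptions; level 8 is chosen because it records wait events, which do appear in the tkprof report.

```sql
-- Assumed tag, only to make the trace file easier to find in udump
ALTER SESSION SET tracefile_identifier = 'join_vs_analytic';

-- Needed for meaningful cpu/elapsed figures in the trace
ALTER SESSION SET timed_statistics = TRUE;

-- Level 8 = SQL trace plus wait events
ALTER SESSION SET EVENTS '10046 trace name context forever, level 8';

-- ... run the statement under test here ...

ALTER SESSION SET EVENTS '10046 trace name context off';

-- Then, from the operating system shell (placeholder file names):
--   tkprof <trace_file>.trc <report>.txt
```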
SQL> select a.object_id,a.object_name,a.object_type
       from t01 a,
            (select max(object_id) col,object_type from t01 group by object_type) b
      where a.object_id=b.col and a.object_type=b.object_type;

SQL> select *
       from (select object_id,object_name,object_type,
                    max(object_id)over(partition by object_type) col
               from t01) a
      where a.object_id=a.col;
TKPROF: Release 10.2.0.4.0 - Production on Fri Jun 13 15:32:05 2014
Copyright (c) 1982, 2007, Oracle.  All rights reserved.
Trace file: g:\oracle\product\10.2.0\admin\ora10g\udump\ora10g_ora_8256.trc

Sort options: default
********************************************************************************

count    = number of times OCI procedure was executed

cpu      = cpu time in seconds executing

elapsed  = elapsed time in seconds executing

disk     = number of physical reads of buffers from disk

query    = number of buffers gotten for consistent read

current  = number of buffers gotten in current mode (usually for update)

rows     = number of rows processed by the fetch or execute call

--------------------------------------------------------------------------------
*** SESSION ID:(143.195) 2014-06-13 15:29:31.947
********************************************************************************
select a.object_id,a.object_name,a.object_type

from

 t01 a,(select max(object_id) col,object_type from t01 group by object_type)

  b where a.object_id=b.col and a.object_type=b.object_type
call     count       cpu    elapsed       disk      query    current        rows

------- ------  -------- ---------- ---------- ---------- ----------  ----------

Parse        1      0.00       0.00          0          0          0           0

Execute      1      0.00       0.00          0          0          0           0

Fetch      168      2.12       1.91          0      92607          0        2496

------- ------  -------- ---------- ---------- ---------- ----------  ----------

total      170      2.12       1.91          0      92607          0        2496
Misses in library cache during parse: 1

Optimizer mode: ALL_ROWS

Parsing user id: SYS
Rows     Row Source Operation

-------  ---------------------------------------------------

   2496  HASH JOIN  (cr=92607 pr=0 pw=0 time=2380108 us)

     39   VIEW  (cr=46223 pr=0 pw=0 time=1239182 us)

     39    HASH GROUP BY (cr=46223 pr=0 pw=0 time=1239142 us)

3220992     TABLE ACCESS FULL T01 (cr=46223 pr=0 pw=0 time=79 us)

3220992   TABLE ACCESS FULL T01 (cr=46384 pr=0 pw=0 time=30 us)
Elapsed times include waiting on following events:

  Event waited on                             Times   Max. Wait  Total Waited

  ----------------------------------------   Waited  ----------  ------------

  SQL*Net message to client                     169        0.00          0.00

  SQL*Net message from client                   169       19.92         32.62

********************************************************************************

select *

from

 (select object_id,object_name,object_type,max(object_id)over(partition by

  object_type) col from t01)a where a.object_id=a.col
call     count       cpu    elapsed       disk      query    current        rows

------- ------  -------- ---------- ---------- ---------- ----------  ----------

Parse        1      0.00       0.00          0          0          0           0

Execute      1      0.00       0.00          0          0          0           0

Fetch      168      7.28      23.42      71080      46223         15        2496

------- ------  -------- ---------- ---------- ---------- ----------  ----------

total      170      7.28      23.42      71080      46223         15        2496
Misses in library cache during parse: 0

Optimizer mode: ALL_ROWS

Parsing user id: SYS
Rows     Row Source Operation

-------  ---------------------------------------------------

   2496  VIEW  (cr=46223 pr=71080 pw=52672 time=289701960 us)

3220992   WINDOW SORT (cr=46223 pr=71080 pw=52672 time=130099916 us)

3220992    TABLE ACCESS FULL T01 (cr=46223 pr=0 pw=0 time=89 us)
Elapsed times include waiting on following events:

  Event waited on                             Times   Max. Wait  Total Waited

  ----------------------------------------   Waited  ----------  ------------

  SQL*Net message to client                     168        0.00          0.00

  direct path write temp                        251        0.24          1.98

  direct path read temp                       44876        0.26         14.55

  SQL*Net message from client                   168       23.36         25.61

********************************************************************************
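The join responds far faster than the analytic function here (1.91 s elapsed versus 23.42 s). The fatal problem for the analytic function is that its sort spilled to temp on disk: the tkprof-formatted 10046 trace file clearly shows direct path write temp and direct path read temp waits, and the WINDOW SORT row source reports pr=71080 and pw=52672. Because this ran on my own test machine (a T430), the analytic query also put a heavy load on the PC while it executed.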
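Besides reading the trace, the work-area statistics of a cursor also show whether a sort or hash operation spilled to temp. A minimal sketch against `v$sql_workarea` is given here. It is not part of the original test; `&sql_id` is a SQL*Plus substitution variable to be filled in with the statement's SQL_ID, and the column aliases are made up.

```sql
-- One row per work area (sort, hash join, hash group by, ...) of the cursor.
-- A last_execution value other than OPTIMAL means the operation did not fit
-- in memory and used temp space.
SELECT operation_type,
       operation_id,
       last_execution,
       ROUND (last_memory_used / 1024 / 1024)  AS last_mem_mb,
       ROUND (last_tempseg_size / 1024 / 1024) AS last_temp_mb
  FROM v$sql_workarea
 WHERE sql_id = '&sql_id';
```

The view only has data while the cursor is still in the shared pool, so it should be queried soon after the statement has run.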
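For this kind of request, the advantage of analytic functions is fewer table scans, at the price of a WINDOW SORT. The join scans the table more times and also pays for the join itself and the GROUP BY (both hash-based in the plans above). Neither approach is always better than the other; the choice has to be evaluated against the actual data distribution. Interested readers can test with SQL from their own production systems, or with simulated data.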