当前位置:Gxlcms > mysql > Oracle执行计划(3)-两表连接基数

Oracle执行计划(3)-两表连接基数

时间:2021-07-01 10:21:17 帮助过:46人阅读

Oracle 执行计划(3)- 两表连接基数 1 公式: 基数 = 连接选择率 * 过滤条件 1 基数 + 过滤条件 2 的基数 连接选择率 =((num_rows( 表 1)-num_nulls( 表 1 连接字段 ))/num_rows( 表 1))* ((num_rows( 表 2)-num_nulls( 表 2 连接字段 ))/num_rows( 表 2))

Oracle 执行计划(3)-两表连接基数

1 公式:

基数= 连接选择率*过滤条件1基数+过滤条件2的基数

连接选择率=((num_rows(1)-num_nulls(1连接字段))/num_rows(1))*

((num_rows(2)-num_nulls(2连接字段))/num_rows(2)) /

Greater(num_distinct(1连接字段),num_distinct(2连接字段))

create table t1 as
select  trunc(dbms_random.value(0,25)) filter1,
trunc(dbms_random.value(0,30)) join1,
lpad(rownum,10) v1,
rpad('x',100) padding1
from all_objects
where rownum<=10000;

 

create table t2 as
select  trunc(dbms_random.value(0,50)) filter2,
trunc(dbms_random.value(0,40)) join2,
lpad(rownum,10) v2,
rpad('x',100) padding2
from all_objects
where rownum<=10000;

 

select t1.v1,t2.v2

from t1,t2

where t1.join1=t2.join2

and t1.filter=1

and t2.filter2=2


已选择2259行。
已用时间:  00: 00: 00.03
执行计划
----------------------------------------------------------
Plan hash value: 2959412835
---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |  2000 | 56000 |    76   (3)| 00:00:01 |
|*  1 |  HASH JOIN         |      |  2000 | 56000 |    76   (3)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL| T2   |   200 |  2800 |    38   (3)| 00:00:01 |
|*  3 |   TABLE ACCESS FULL| T1   |   400 |  5600 |    38   (3)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."JOIN1"="T2"."JOIN2")
   2 - filter("T2"."FILTER2"=2)
   3 - filter("T1"."FILTER"=1)
统计信息
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        504  consistent gets
          0  physical reads
          0  redo size
      60032  bytes sent via SQL*Net to client
       2035  bytes received via SQL*Net from client
        152  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       2259  rows processed

select * from user_tab_col_statistics where table_name='T1'

TABLE_NAME

COLUMN_NAME

NUM_DISTINCT

DENSITY

NUM_NULLS

T1

FILTER

25

0.04

0

T1

JOIN1

30

0.0333333333333333

0

T1

V1

10000

0.0001

0

T1

PADDING

1

1

0

TABLE_NAME

COLUMN_NAME

NUM_DISTINCT

DENSITY

NUM_NULLS

SAMPLE_SIZE

T2

FILTER2

50

0.02

0

10000

T2

JOIN2

40

0.025

0

10000

T2

V2

10000

0.0001

0

10000

T2

PADDING2

1

1

0

10000

连接选择率=(10000-0)/10000)*(1000-0)/10000)/greater(30,40)=1/40

连接基数=1/40*(400*200)=2000

执行计划当中的 T2 ROWS=200,T1.ROWS=400 HASH JOIN.ROWS=2000
|*  1 |  HASH JOIN         |      |  2000 | 56000 |    76   (3)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL| T2   |   200 |  2800 |    38   (3)| 00:00:01 |
|*  3 |   TABLE ACCESS FULL| T1   |   400 |  5600 |    38   (3)| 00:00:01 |

2 包含空值情况下

Update t1 set join1=null where mod(to_number(v1),20)=0;

Update t2 set join2=null where mod(to_number(v2),30)=0;

SQL> analyze table t2 compute statistics;

SQL> analyze table t1 compute statistics;

select * from user_tab_col_statistics where table_name='T1'

TABLE_NAME

COLUMN_NAME

NUM_DISTINCT

DENSITY

NUM_NULLS

SAMPLE_SIZE

T1

FILTER

25

0.04

0

10000

T1

JOIN1

30

0.0333333333333333

500

10000

T1

V1

10000

0.0001

0

10000

T1

PADDING

1

1

0

10000

TABLE_NAME

COLUMN_NAME

NUM_DISTINCT

DENSITY

NUM_NULLS

SAMPLE_SIZE

T2

FILTER2

50

0.02

0

10000

T2

JOIN2

40

0.025

333

10000

T2

V2

10000

0.0001

0

10000

T2

PADDING2

1

1

0

10000

套公式 选择率=((10000-500)/10000)*((10000-333)/10000)/greater(30,40)

=9500/10000*9667/10000/40

=0.95*0.9667/40

=0.022959125

基数=200*400*0.022959125 =1836.73

执行计划:

已选择2042行。

已用时间: 00: 00: 00.03

执行计划

----------------------------------------------------------

Plan hash value: 2959412835
---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |  1837 | 51436 |    76   (3)| 00:00:01 |
|*  1 |  HASH JOIN         |      |  1837 | 51436 |    76   (3)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL| T2   |   200 |  2800 |    38   (3)| 00:00:01 |
|*  3 |   TABLE ACCESS FULL| T1   |   400 |  5600 |    38   (3)| 00:00:01 |
---------------------------------------------------------------------------

3 过滤基数

基数=基本选择率*(num_rows-nulls)

Update t1 set filter=null where mod(to_number(v1),50)=0;

Update t2 set filter2=null where mod(to_number(v2),100)=0;

200 rows updated

100 rows updated

T1.filter cardinatitly=1/25*(10000-200)=392

T2.FILTER2 CARDINATILTY=1/50(10000-100)=198

连接基数=392*198*0.022959125=1781.995

已选择2000行。

已用时间: 00: 00: 00.06

执行计划

----------------------------------------------------------

Plan hash value: 2959412835
---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |  1782 | 49896 |    76   (3)| 00:00:01 |
|*  1 |  HASH JOIN         |      |  1782 | 49896 |    76   (3)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL| T2   |   198 |  2772 |    38   (3)| 00:00:01 |
|*  3 |   TABLE ACCESS FULL| T1   |   392 |  5488 |    38   (3)| 00:00:01 |

4 多连接条件

select t3.v2,t4.v2
from t3,t4
where t3.join1=t4.join2
and t3.join2=t4.join2

连接公式:=(条件1选择率)*(条件2选择率)

不求证了!

5 范围连接选择率

1 Where t1.join1

2 Where t2.join1 between t1.join1-1 and t1.join1+1

1 选择率=5% 固定选择率

2 转化成绑定变量格式, 固定选择率相乘.5%*5%

6 不等连接选择率

Where t1.join1!=t2.join2

选择率 = 1-( t1.join1=t2.join2 选择率)

=1-1/40=39/40

7 AND OR 多连接条件

1 where t1.join1=t2.join1 and t1.join2=t2.join2

2 where t1.join1=t2.join1 OR t1.join2=t2.join2

可以参考单表基数的多谓词选择率

1 join1选择率*join2选择率

2 join1选择率+join2选择率- join1选择率*join2选择率

8 三表连接基数选择率

create table t3 as
select
trunc(dbms_random.value(0,50)) filter2,
trunc(dbms_random.value(
0,30)) join1,
trunc(dbms_random.value(
0,50)) join2,
lpad(rownum,
10) v2,
rpad(
'x',100) padding2
from all_objects
where rownum<=
10000;

然后重新生存 T1 T2 分析后执行语句

select t1.v1,t2.v2,t3.v2
from t1,t2,t3
where t1.join1=t2.join2
and t2.join2=t3.join1
and t1.filter1=
1
and t2.filter2=
1

1 先做T1T2的选择率和基数

前面已经获得2000

2 T2T3做连接

套用公式T2T3

选择率=(10000-0)/10000)*(10000-0)/10000)/greater(40,30)=1/40

基数=1/40*2000*10000=50,0000

注意 2000是第一个连接的基数,10000是T3无过滤条件的基数.

已用时间: 00: 00: 09.42

执行计划

----------------------------------------------------------

Plan hash value: 1184213596
----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |   500K|    19M|   123   (9)| 00:00:02 |
|*  1 |  HASH JOIN          |      |   500K|    19M|   123   (9)| 00:00:02 |
|*  2 |   HASH JOIN         |      |  2000 | 56000 |    76   (3)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL| T2   |   200 |  2800 |    38   (3)| 00:00:01 |
|*  4 |    TABLE ACCESS FULL| T1   |   400 |  5600 |    38   (3)| 00:00:01 |
|   5 |   TABLE ACCESS FULL | T3   | 10000 |   117K|    39   (3)| 00:00:01 |

----------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

1 - access("T2"."JOIN2"="T3"."JOIN1")

2 - access("T1"."JOIN1"="T2"."JOIN2")

3 - filter("T2"."FILTER2"=1)

4 - filter("T1"."FILTER1"=1)

9 传递闭包

create table t4 as
select
trunc(dbms_random.value(0,50)) filter2,
trunc(dbms_random.value(
0,40)) join1,
trunc(dbms_random.value(
0,40)) join2,
lpad(rownum,
10) v2,
rpad(
'x',100) padding2
from all_objects
where rownum<=
10000;

select t3.v2,t4.v2
from t3,t4
where t3.join1=t4.join1
and t3.join2=t4.join2
and t3.join1=20;

传递闭包是这么回事因为T3.JOIN1=20 并且T3.JOIN1=T4.JOIN1 T4.JOIN1=20;

执行计划

----------------------------------------------------------

Plan hash value: 920528290
---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |    52 |  1456 |    78   (3)| 00:00:01 |
|*  1 |  HASH JOIN         |      |    52 |  1456 |    78   (3)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL| T4   |   250 |  3500 |    39   (3)| 00:00:01 |
|*  3 |   TABLE ACCESS FULL| T3   |   333 |  4662 |    39   (3)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("T3"."JOIN1"="T4"."JOIN1" AND "T3"."JOIN2"="T4"."JOIN2")

2 - filter("T4"."JOIN1"=20)

3 - filter("T3"."JOIN1"=20)

实际上结果集行数是:1554 与52基数相差超大

因为 JOIN1选择率*JOIN2选择率=(10000-0)/10000)*(10000-0)/10000)/greater(30,40)

*(10000-0)/10000)*(10000-0)/10000)/greater(50,40)=1/40*1/50=1/2000

并且因为10G多列完备性检查,选择结果集最小选择率相乘 1/40*1/40=1/1600

基数=1/1600*10000/30*10000/40=52

基数=1/40*10000/30*10000/50=1/40*333*200=1665 与结果集相当.因为该版本没有消除掉连接条件.

人气教程排行