当前位置:Gxlcms > mysql > 初识全文索引

初识全文索引

时间:2021-07-01 10:21:17 帮助过:23人阅读

通常来说,全文索引大多用在OLAP环境当中,全文索引擅长于词汇的快速搜索。 一、全文索引和普通b_tree索引对比 SQL create tablet1 (id int,name varchar(10)); Table created. SQL create indext1_ind on t1(name); Index created. SQL create tablet2 as s

通常来说,全文索引大多用在OLAP环境当中,全文索引擅长于词汇的快速搜索。

一、全文索引和普通b_tree索引对比

SQL> create tablet1 (id int,name varchar(10));

Table created.

SQL> create indext1_ind on t1(name);

Index created.

SQL> create tablet2 as select * from t1;

Table created.

SQL> create indext2_ind on t2(name) indextype is ctxsys.context;

Index created.

SQL> select *from t1 where name like '%tom%';

ID NAME

--------------------

1 tom

2 tom tom

2 tom tom

Execution Plan

----------------------------------------------------------

Plan hash value:3589342044

--------------------------------------------------------------------------------------

| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |

--------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 1 | 10 | 2 (0)| 00:00:01 |

| 1 | TABLE ACCESS BY INDEX ROWID| T1 | 1 | 10 | 2 (0)| 00:00:01 |

|* 2 | INDEX FULL SCAN | T1_IND| 1 | | 1 (0)| 00:00:01 |

--------------------------------------------------------------------------------------

PredicateInformation (identified by operation id):

---------------------------------------------------

2 - filter("NAME" LIKE '%tom%' AND"NAME" IS NOT NULL)

Statistics

----------------------------------------------------------

0 recursive calls

0 db block gets

4 consistent gets

0 physical reads

0 redo size

676 bytes sent via SQL*Net to client

519 bytes received via SQL*Net from client

2 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

3 rows processed

SQL> select *from t2 where contains(name,'tom')>0;

ID NAME

--------------------

1 tom

2 tom tom

2 tom tom

Execution Plan

----------------------------------------------------------

Plan hash value:785228215

--------------------------------------------------------------------------------------

| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |

--------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 3 | 30 | 7 (0)| 00:00:01 |

| 1 | TABLE ACCESS BY INDEX ROWID| T2 | 3 | 30 | 7 (0)| 00:00:01 |

|* 2 | DOMAIN INDEX | T2_IND| | | 1 (0)| 00:00:01 |

--------------------------------------------------------------------------------------

PredicateInformation (identified by operation id):

---------------------------------------------------

2 -access("CTXSYS"."CONTAINS"("NAME",'tom')>0)

Statistics

----------------------------------------------------------

10 recursive calls

0 db block gets

14 consistent gets

0 physical reads

0 redo size

676 bytes sent via SQL*Net to client

519 bytes received via SQL*Net from client

2 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

3 rows processed

SQL> selectobject_name,object_type from user_objects order by 1;

OBJECT_NAME OBJECT_TYPE

-------------------------------------------------

--DR开头的四张表为全文索引的基表

DR$T2_IND$X INDEX

DRC$T2_IND$R INDEX

SYS_IL0000236119C00006$$ INDEX

SYS_IL0000236124C00002$$ INDEX

SYS_IOT_TOP_236122 INDEX

SYS_IOT_TOP_236128 INDEX

SYS_LOB0000236119C00006$$ LOB

OBJECT_NAME OBJECT_TYPE

-------------------------------------------------

SYS_LOB0000236124C00002$$ LOB

T1 TABLE

T1_IND INDEX

T2 TABLE

T2_IND INDEX

二、DML操作对全文索引的影响

以context方式创建的全文索引并不是基于事务的,默认情况下,即使一个dml操作提交,信息也不会更新到全文索引中。

1、insert 操作

SQL> create tablet(name varchar2(30));

Table created.

SQL> create indext_ind on t(name) indextype is ctxsys.context;

Index created.

SQL> insert intot values('i am an oracle dba');

1 row created.

SQL> commit;

insert数据已提交,我们看看全文索引是否已更新

SQL> setautotrace on

SQL> select *from t where name like '%dba%';

NAME

------------------------------

i am an oracle dba

Execution Plan

----------------------------------------------------------

Plan hash value:1601196873

--------------------------------------------------------------------------

| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |

--------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 1 | 17 | 2 (0)| 00:00:01 |

|* 1 | TABLE ACCESS FULL| T | 1 | 17 | 2 (0)| 00:00:01 |

--------------------------------------------------------------------------

PredicateInformation (identified by operation id):

---------------------------------------------------

1 - filter("NAME" IS NOT NULL AND"NAME" LIKE '%dba%')

Note

-----

- dynamic sampling used for this statement(level=2)

Statistics

----------------------------------------------------------

5 recursive calls

0 db block gets

15 consistent gets

0 physical reads

0 redo size

538 bytes sent via SQL*Net to client

520 bytes received via SQL*Net from client

2 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

1 rows processed

SQL> set line 200

SQL> select *from t where contains(name,'dba') >0;

no rows selected

Execution Plan

----------------------------------------------------------

Plan hash value:315187259

-------------------------------------------------------------------------------------

| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |

-------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 1 | 29 | 4 (0)| 00:00:01 |

| 1 | TABLE ACCESS BY INDEX ROWID| T | 1 | 29 | 4 (0)| 00:00:01 |

|* 2 | DOMAIN INDEX | T_IND| | | 4 (0)| 00:00:01 |

-------------------------------------------------------------------------------------

PredicateInformation (identified by operation id):

---------------------------------------------------

2 -access("CTXSYS"."CONTAINS"("NAME",'dba')>0)

Note

-----

- dynamic sampling used for this statement(level=2)

Statistics

----------------------------------------------------------

1829 recursive calls

0 db block gets

2696 consistent gets

30 physical reads

0 redo size

332 bytes sent via SQL*Net to client

509 bytes received via SQL*Net from client

1 SQL*Net roundtrips to/from client

164 sorts (memory)

0 sorts (disk)

0 rows processed

以上发现,全文索引并没有自动更新,而是把记录存放在线面的ctxsys.dr$pending表中。

SQL> setautotrace off

SQL> select *from ctxsys.dr$pending;

PND_CID PND_PID PND_ROWID PND_TIMESTAMP P

-------------------- ------------------ ------------------ -

1084 0 AABGmVAAEAAAADmAAA 03-APR-14 N

SQL> insert intot values('he is an oracle dba');

1 row created.

SQL> commit;

Commit complete.

SQL> select *from ctxsys.dr$pending;

PND_CID PND_PID PND_ROWID PND_TIMESTAMP P

-------------------- ------------------ ------------------ -

1084 0 AABGmVAAEAAAADmAAA 03-APR-14 N

1084 0 AABGmVAAEAAAADmAAB 03-APR-14 N

SQL> select *from t where contains(name,'dba') >0;

no rows selected

为了把信息同步到全文索引中,我们需要手工同步:

SQL> alter indext_ind rebuild parameters ('sync');

Index altered.

SQL> select *from t where contains(name,'dba') >0;

NAME

------------------------------

i am an oracle dba

he is an oracle dba

SQL> select *from ctxsys.dr$pending;

no rows selected

2、delete操作

SQL> select *from t;

NAME

------------------------------

i am an oracle dba

he is an oracle dba

SQL> delete fromt where name='he is an oracle dba';

1 row deleted.

SQL> select *from t where contains(name,'dba') >0;

NAME

------------------------------

i am an oracle dba

SQL> select *from ctxsys.dr$pending;

no rows selected

SQL> select *from ctxsys.dr$delete;

DEL_IDX_IDDEL_IXP_ID DEL_DOCID

-------------------- ----------

1084 0 2

这里我们看到全文索引立即生效了,至于ctxsys.dr$delete里面的数据是delete操作的中间状态,用来维护一个事物,无论事物提交或者回滚。

SQL> rollback;

Rollback complete.

SQL> select *from t where contains(name,'dba') >0;

NAME

------------------------------

i am an oracle dba

he is an oracle dba

SQL> select *from ctxsys.dr$delete;

no rows selected

3、update操作

update操作相当于delete+insert操作,所以默认情况下需要手动刷新全文索引。

SQL> update t setname='oracle dba' where name='i am an oracle dba';

1 row updated.

SQL> select *from ctxsys.dr$delete;

DEL_IDX_IDDEL_IXP_ID DEL_DOCID

-------------------- ----------

1084 0 1

SQL> select *from ctxsys.dr$pending;

PND_CID PND_PID PND_ROWID PND_TIMESTAMP P

-------------------- ------------------ ------------------ -

1084 0 AABGmVAAEAAAADmAAA 03-APR-14 N

SQL> select *from t where contains(name,'dba') > 0;

NAME

------------------------------

he is an oracle dba

SQL> alter indext_ind rebuild parameters ('sync');

Index altered.

SQL> select *from t where contains(name,'dba') > 0;

NAME

------------------------------

he is an oracle dba

oracle dba

由于全文索引创建对象大多是海量数据的表,dml操作如果实时更新会影响到系统性能。创建全文索引同步的三个选项:

manual:默认选项

every:在一个时间段后更新索引

on commitdml:在事务提交后更新索引

语法如下:

create index t_indon t(name) indextype is ctxsys.context parameters('sync (on commit)');

查看全文索引信息和性能的工具包ctx_report

人气教程排行