当前位置:Gxlcms > 数据库问题 > MySQL中InnoDB全文检索

MySQL中InnoDB全文检索

时间:2021-07-01 10:21:17 帮助过:5人阅读

GLOBAL innodb_ft_aux_table=test/fts_a;

可以在information_schema架构下的表INNODB_FT_INDEX_TABLE得到表fts_a中的分词信息。

对于InnoDB存储引擎而言,其总是在事务提交时将分词写入到FTS Index Cache,然后通过批量写入到磁盘。虽然InnoDB存储引擎通过一种延时的、批量的写入方式来提高数据库的性能,但是上述操作仅在事务提交时发生。

当数据库关闭时,在FTS Index Cache中的数据库会同步到磁盘上的Auxiliary Table中。如果当数据库发生宕机时,一些FTS Index Cache中的数据可能未同步到磁盘上,那么下次重启数据库时,当用户对表进行全文检索(查询、插入)时,InnoDB存储引擎会自动读取未完成的文档,然后进行分词操作,再将分词结果放到FTS Index Cache

为了支持全文检索,必须有一个列与word进行映射。在InnoDB中这个列被命名成FTS_DOC_ID,其类型为BIGINT UNSIGNED NOT NULL,并且InnoDB存储引擎自动会在该列加上一个名为FTS_DOC_ID_INDEX的Unique Index.这些操作由存储引擎自己完成,用户也可以在建表时自动添加FTS_DOC_ID,以及对应的Unique Index。由于列名FTS_DOC_ID聚友特殊意义,因此在创建时必须注意相应的类型,否则会报错

技术分享

可以看到,由于用户手动定义FTS_DOC_ID为INT,而非BIGINT因此在创建时候会抛出异常,应该将此处修改成对应的BIGINT即可

文档中的分词的插入操作是在事务提交时完成,但是对于删除操作,其在事务提交时,不删除磁盘Auxiliary Table的记录,而只是删除FTS Cache Index记录,对于Auxiliary Table中被删除的记录,存储引擎会记录其FTS DOCUMENT ID ,并将其保存在DELETE auxiliary table中,在设置参数innodb_ft_aux_table后,用户可以访问information_schema架构下的表INNODB_FT_DELETED来观察删除的FTS Document ID

由于文档的DML操作实际并不删除索引中的数据,相反还会在对应的DELETED表中插入记录,因此随着应用程序的允许,索引会变得越来越大,即使索引中的有些数据已经被删除,查询也不会选择这类记录,为此,InnoDB提供了一种方式,允许用户手工将已删除的记录从索引中彻底删除,这就是OPTIMIZE TABLE。因为OPTIMIZE TABLE还会进行一些其他的操作。如Cardinality重新统计,若用户希望对倒排索引进行操作,可以通过innodb_optimize_fulltext_only设置

  1. <span style="color: #0000ff;">SET</span> GLOBAL innodb_optimize_fulltext_only<span style="color: #808080;">=</span><span style="color: #800000; font-weight: bold;">1</span><span style="color: #000000;">;
  2. OPTIMIZE </span><span style="color: #0000ff;">TABLE</span> fts_a;

若被删除的文档很多,那么OPTIMIZE TABLE操作可能占用非常多的时间,会影响到程序并发性,并极大的降低用户的响应时间,用户可以通过参数innodb_ft_num_word_optimize来限制每次实际删除的分词数量,默认为2000

  1. <span style="color: #0000ff;">CREATE</span> <span style="color: #0000ff;">TABLE</span><span style="color: #000000;"> fts_a(
  2. FTS_DOC_ID </span><span style="color: #0000ff;">BIGINT</span> UNSIGNED AUTO_INCREMENT <span style="color: #808080;">NOT</span> <span style="color: #0000ff;">NULL</span><span style="color: #000000;">,
  3. body </span><span style="color: #0000ff;">TEXT</span><span style="color: #000000;">,
  4. </span><span style="color: #0000ff;">PRIMARY</span> <span style="color: #0000ff;">KEY</span><span style="color: #000000;">(FTS_DOC_ID)
  5. );
  6. </span><span style="color: #0000ff;">INSERT</span> <span style="color: #0000ff;">INTO</span> fts_a <span style="color: #0000ff;">SELECT</span> <span style="color: #0000ff;">NULL</span>,<span style="color: #ff0000;">‘</span><span style="color: #ff0000;">pease porridge in the post</span><span style="color: #ff0000;">‘</span><span style="color: #000000;">;
  7. </span><span style="color: #0000ff;">INSERT</span> <span style="color: #0000ff;">INTO</span> fts_a <span style="color: #0000ff;">SELECT</span> <span style="color: #0000ff;">NULL</span>,<span style="color: #ff0000;">‘</span><span style="color: #ff0000;">pease porridge hot,pease porridge cold</span><span style="color: #ff0000;">‘</span><span style="color: #000000;">;
  8. </span><span style="color: #0000ff;">INSERT</span> <span style="color: #0000ff;">INTO</span> fts_a <span style="color: #0000ff;">SELECT</span> <span style="color: #0000ff;">NULL</span>,<span style="color: #ff0000;">‘</span><span style="color: #ff0000;">Nine days old</span><span style="color: #ff0000;">‘</span><span style="color: #000000;">;
  9. </span><span style="color: #0000ff;">INSERT</span> <span style="color: #0000ff;">INTO</span> fts_a <span style="color: #0000ff;">SELECT</span> <span style="color: #0000ff;">NULL</span>,<span style="color: #ff0000;">‘</span><span style="color: #ff0000;">Some like it hot,some like it cold</span><span style="color: #ff0000;">‘</span><span style="color: #000000;">;
  10. </span><span style="color: #0000ff;">INSERT</span> <span style="color: #0000ff;">INTO</span> fts_a <span style="color: #0000ff;">SELECT</span> <span style="color: #0000ff;">NULL</span>,<span style="color: #ff0000;">‘</span><span style="color: #ff0000;">Some like it the pot</span><span style="color: #ff0000;">‘</span><span style="color: #000000;">;
  11. </span><span style="color: #0000ff;">INSERT</span> <span style="color: #0000ff;">INTO</span> fts_a <span style="color: #0000ff;">SELECT</span> <span style="color: #0000ff;">NULL</span>,<span style="color: #ff0000;">‘</span><span style="color: #ff0000;">Nine days old</span><span style="color: #ff0000;">‘</span><span style="color: #000000;">;
  12. </span><span style="color: #0000ff;">INSERT</span> <span style="color: #0000ff;">INTO</span> fts_a <span style="color: #0000ff;">SELECT</span> <span style="color: #0000ff;">NULL</span>,<span style="color: #ff0000;">‘</span><span style="color: #ff0000;">I like code days</span><span style="color: #ff0000;">‘</span><span style="color: #000000;">;
  13. </span><span style="color: #0000ff;">CREATE</span> FULLTEXT <span style="color: #0000ff;">INDEX</span> idx_fts <span style="color: #0000ff;">ON</span> fts_a(body);

查看数据

  1. mysql<span style="color: #808080;">></span> <span style="color: #0000ff;">select</span> <span style="color: #808080;">*</span> <span style="color: #0000ff;">from</span><span style="color: #000000;"> fts_a;
  2. </span><span style="color: #808080;">+</span><span style="color: #008080;">--</span><span style="color: #008080;">----------+----------------------------------------+</span>
  3. <span style="color: #808080;">|</span> FTS_DOC_ID <span style="color: #808080;">|</span> body <span style="color: #808080;">|</span>
  4. <span style="color: #808080;">+</span><span style="color: #008080;">--</span><span style="color: #008080;">----------+----------------------------------------+</span>
  5. <span style="color: #808080;">|</span> <span style="color: #800000; font-weight: bold;">1</span> <span style="color: #808080;">|</span> pease porridge <span style="color: #808080;">in</span> the post <span style="color: #808080;">|</span>
  6. <span style="color: #808080;">|</span> <span style="color: #800000; font-weight: bold;">2</span> <span style="color: #808080;">|</span> pease porridge hot,pease porridge cold <span style="color: #808080;">|</span>
  7. <span style="color: #808080;">|</span> <span style="color: #800000; font-weight: bold;">3</span> <span style="color: #808080;">|</span> Nine days old <span style="color: #808080;">|</span>
  8. <span style="color: #808080;">|</span> <span style="color: #800000; font-weight: bold;">4</span> <span style="color: #808080;">|</span> <span style="color: #808080;">Some</span> <span style="color: #808080;">like</span> it hot,<span style="color: #808080;">some</span> <span style="color: #808080;">like</span> it cold <span style="color: #808080;">|</span>
  9. <span style="color: #808080;">|</span> <span style="color: #800000; font-weight: bold;">5</span> <span style="color: #808080;">|</span> <span style="color: #808080;">Some</span> <span style="color: #808080;">like</span> it the pot <span style="color: #808080;">|</span>
  10. <span style="color: #808080;">|</span> <span style="color: #800000; font-weight: bold;">6</span> <span style="color: #808080;">|</span> Nine days old <span style="color: #808080;">|</span>
  11. <span style="color: #808080;">|</span> <span style="color: #800000; font-weight: bold;">7</span> <span style="color: #808080;">|</span> I <span style="color: #808080;">like</span> code days <span style="color: #808080;">|</span>
  12. <span style="color: #808080;">+</span><span style="color: #008080;">--</span><span style="color: #008080;">----------+----------------------------------------+</span>
  13. <span style="color: #800000; font-weight: bold;">7</span> rows <span style="color: #808080;">in</span> <span style="color: #0000ff;">set</span> (<span style="color: #800000; font-weight: bold;">0.00</span> sec)
  1. mysql<span style="color: #808080;">></span> <span style="color: #0000ff;">set</span> global innodb_ft_aux_table<span style="color: #808080;">=</span><span style="color: #ff0000;">‘</span><span style="color: #ff0000;">iot2/fts_a</span><span style="color: #ff0000;">‘</span><span style="color: #000000;">;
  2. Query OK, </span><span style="color: #800000; font-weight: bold;">0</span> rows affected (<span style="color: #800000; font-weight: bold;">0.00</span> sec)

mysql> SELECT * FROM information_schema.`INNODB_FT_INDEX_TABLE`;
+----------+--------------+-------------+-----------+--------+----------+
| WORD | FIRST_DOC_ID | LAST_DOC_ID | DOC_COUNT | DOC_ID | POSITION |
+----------+--------------+-------------+-----------+--------+----------+
| code | 7 | 7 | 1 | 7 | 7 |
| cold | 2 | 4 | 2 | 2 | 34 |
| cold | 2 | 4 | 2 | 4 | 30 |
| days | 3 | 7 | 3 | 3 | 5 |
| days | 3 | 7 | 3 | 6 | 5 |
| days | 3 | 7 | 3 | 7 | 12 |
| hot | 2 | 4 | 2 | 2 | 15 |
| hot | 2 | 4 | 2 | 4 | 13 |
| like | 4 | 7 | 3 | 4 | 5 |
| like | 4 | 7 | 3 | 4 | 17 |
| like | 4 | 7 | 3 | 5 | 5 |
| like | 4 | 7 | 3 | 7 | 2 |
| nine | 3 | 6 | 2 | 3 | 0 |
| nine | 3 | 6 | 2 | 6 | 0 |
| old | 3 | 6 | 2 | 3 | 10 |
| old | 3 | 6 | 2 | 6 | 10 |
| pease | 1 | 2 | 2 | 1 | 0 |
| pease | 1 | 2 | 2 | 2 | 0 |
| pease | 1 | 2 | 2 | 2 | 19 |
| porridge | 1 | 2 | 2 | 1 | 6 |
| porridge | 1 | 2 | 2 | 2 | 6 |
| porridge | 1 | 2 | 2 | 2 | 19 |
| post | 1 | 1 | 1 | 1 | 22 |
| pot | 5 | 5 | 1 | 5 | 17 |
| some | 4 | 5 | 2 | 4 | 0 |
| some | 4 | 5 | 2 | 4 | 17 |
| some | 4 | 5 | 2 | 5 | 0 |
+----------+--------------+-------------+-----------+--------+----------+
27 rows in set (0.00 sec)

可以看到每个word对应一个DOC_ID和POSITION。此外,还记录了FIRST_DOC_ID、LAST_DOC_ID、DOC_COUNT分别代表该word第一次出现文档的ID,最后一次出现的文档ID,以及该word在多少个文档中存在。

若此时执行下面的SQL语句,会删除FTS_DOC_ID为7的文档

  1. <span style="color: #0000ff;">DELETE</span> <span style="color: #0000ff;">FROM</span> fts_a <span style="color: #0000ff;">WHERE</span> FTS_DOC_ID<span style="color: #808080;">=</span><span style="color: #800000; font-weight: bold;">7</span>;

InnoDB存储引擎并不会直接删除索引中对应的记录,而是将删除的文档ID插入到DELETED表

  1. <span style="color: #0000ff;">SELECT</span> <span style="color: #808080;">*</span> <span style="color: #0000ff;">FROM</span> information_schema.`INNODB_FT_DELETED`;

技术分享

如果用户想要彻底删除倒排索引中该文档的分词信息,可以

  1. mysql<span style="color: #808080;">></span> <span style="color: #0000ff;">SET</span> GLOBAL innodb_optimize_fulltext_only<span style="color: #808080;">=</span><span style="color: #800000; font-weight: bold;">1</span><span style="color: #000000;">;
  2. Query OK, </span><span style="color: #800000; font-weight: bold;">0</span> rows affected (<span style="color: #800000; font-weight: bold;">0.00</span><span style="color: #000000;"> sec)
  3. mysql</span><span style="color: #808080;">></span> OPTIMIZE <span style="color: #0000ff;">TABLE</span><span style="color: #000000;"> fts_a;
  4. </span><span style="color: #808080;">+</span><span style="color: #008080;">--</span><span style="color: #008080;">----------+----------+----------+----------+</span>
  5. <span style="color: #808080;">|</span> <span style="color: #0000ff;">Table</span> <span style="color: #808080;">|</span> Op <span style="color: #808080;">|</span> Msg_type <span style="color: #808080;">|</span> Msg_text <span style="color: #808080;">|</span>
  6. <span style="color: #808080;">+</span><span style="color: #008080;">--</span><span style="color: #008080;">----------+----------+----------+----------+</span>
  7. <span style="color: #808080;">|</span> iot2.fts_a <span style="color: #808080;">|</span> optimize <span style="color: #808080;">|</span> status <span style="color: #808080;">|</span> OK <span style="color: #808080;">|</span>
  8. <span style="color: #808080;">+</span><span style="color: #008080;">--</span><span style="color: #008080;">----------+----------+----------+----------+</span>
  9. <span style="color: #800000; font-weight: bold;">1</span> row <span style="color: #808080;">in</span> <span style="color: #0000ff;">set</span> (<span style="color: #800000; font-weight: bold;">0.08</span><span style="color: #000000;"> sec)
  10. mysql</span><span style="color: #808080;">></span> <span style="color: #0000ff;">SELECT</span> <span style="color: #808080;">*</span> <span style="color: #0000ff;">FROM</span><span style="color: #000000;"> information_schema.`INNODB_FT_DELETED`;
  11. </span><span style="color: #808080;">+</span><span style="color: #008080;">--</span><span style="color: #008080;">------+</span>
  12. <span style="color: #808080;">|</span> DOC_ID <span style="color: #808080;">|</span>
  13. <span style="color: #808080;">+</span><span style="color: #008080;">--</span><span style="color: #008080;">------+</span>
  14. <span style="color: #808080;">|</span> <span style="color: #800000; font-weight: bold;">7</span> <span style="color: #808080;">|</span>
  15. <span style="color: #808080;">+</span><span style="color: #008080;">--</span><span style="color: #008080;">------+</span>
  16. <span style="color: #800000; font-weight: bold;">1</span> row <span style="color: #808080;">in</span> <span style="color: #0000ff;">set</span> (<span style="color: #800000; font-weight: bold;">0.00</span><span style="color: #000000;"> sec)
  17. mysql</span><span style="color: #808080;">></span> <span style="color: #0000ff;">SELECT</span> <span style="color: #808080;">*</span> <span style="color: #0000ff;">FROM</span><span style="color: #000000;"> information_schema.`INNODB_FT_BEING_DELETED`;
  18. </span><span style="color: #808080;">+</span><span style="color: #008080;">--</span><span style="color: #008080;">------+</span>
  19. <span style="color: #808080;">|</span> DOC_ID <span style="color: #808080;">|</span>
  20. <span style="color: #808080;">+</span><span style="color: #008080;">--</span><span style="color: #008080;">------+</span>
  21. <span style="color: #808080;">|</span> <span style="color: #800000; font-weight: bold;">7</span> <span style="color: #808080;">|</span>
  22. <span style="color: #808080;">+</span><span style="color: #008080;">--</span><span style="color: #008080;">------+</span>
  23. <span style="color: #800000; font-weight: bold;">1</span> row <span style="color: #808080;">in</span> <span style="color: #0000ff;">set</span> (<span style="color: #800000; font-weight: bold;">0.00</span> sec)

运行OPTIMIZE TABLE 可以将记录彻底删除,并且彻底删除的文档ID会记录到INNODB_FT_BEGIN_DELETED中。此外,由于7这个文档一倍删除,因此不允许在插入这个文档ID,否则会抛出异常

  1. mysql<span style="color: #808080;">></span> <span style="color: #0000ff;">INSERT</span> <span style="color: #0000ff;">INTO</span> fts_a <span style="color: #0000ff;">SELECT</span> <span style="color: #800000; font-weight: bold;">7</span>,<span style="color: #ff0000;">‘</span><span style="color: #ff0000;">I like this days</span><span style="color: #ff0000;">‘</span><span style="color: #000000;">;
  2. ERROR </span><span style="color: #800000; font-weight: bold;">182</span> (HY000): Invalid InnoDB FTS Doc ID

stopword列表(stopword list)是本节最后阐述的一个概念,其表示该列表中的word不需要对其进行索引分词操作。例如,对于the这个单词,由于其不具有具体的意义,因此将其视为stopword,InnoDB存储引擎有一张默认的stopword列表,在information_schema架构下,表名为INNODB_FT_DEFAULT_STOPWORD,默认为36个stopword可以通过参数innodb_ft_server_stopword_table来定义stopword列表,如

  1. mysql<span style="color: #808080;">></span> <span style="color: #0000ff;">CREATE</span> <span style="color: #0000ff;">TABLE</span><span style="color: #000000;"> innodb_ft_bug (
  2. </span><span style="color: #808080;">-></span> value <span style="color: #0000ff;">VARCHAR</span>(<span style="color: #800000; font-weight: bold;">18</span>) <span style="color: #808080;">NOT</span> <span style="color: #0000ff;">NULL</span> <span style="color: #0000ff;">DEFAULT</span> <span style="color: #ff0000;">‘‘</span>
  3. <span style="color: #808080;">-></span> ) ENGINE<span style="color: #808080;">=</span>INNODB <span style="color: #0000ff;">DEFAULT</span> CHARSET<span style="color: #808080;">=</span><span style="color: #000000;">utf8; #此处必须为utf8不然会碰到bug
  4. Query OK, </span><span style="color: #800000; font-weight: bold;">0</span> rows affected (<span style="color: #800000; font-weight: bold;">0.07</span><span style="color: #000000;"> sec)
  5. mysql</span><span style="color: #808080;">></span> <span style="color: #0000ff;">SET</span> GLOBAL innodb_ft_server_stopword_table<span style="color: #808080;">=</span><span style="color: #ff0000;">‘</span><span style="color: #ff0000;">iot2/innodb_ft_bug</span><span style="color: #ff0000;">‘</span><span style="color: #000000;">;
  6. Query OK, </span><span style="color: #800000; font-weight: bold;">0</span> rows affected (<span style="color: #800000; font-weight: bold;">0.00</span> sec)

遇到bug的情形

  1. mysql<span style="color: #808080;">></span> <span style="color: #0000ff;">CREATE</span> <span style="color: #0000ff;">TABLE</span> user_stopword(VALUE <span style="color: #0000ff;">VARCHAR</span>(<span style="color: #800000; font-weight: bold;">30</span>))ENGINE<span style="color: #808080;">=</span><span style="color: #000000;">INNODB;
  2. Query OK, </span><span style="color: #800000; font-weight: bold;">0</span> rows affected (<span style="color: #800000; font-weight: bold;">0.03</span><span style="color: #000000;"> sec)
  3. mysql</span><span style="color: #808080;">></span> <span style="color: #0000ff;">SET</span> GLOBAL innodb_ft_server_stopword_table<span style="color: #808080;">=</span><span style="color: #ff0000;">‘</span><span style="color: #ff0000;">iot2/user_stopword</span><span style="color: #ff0000;">‘</span><span style="color: #000000;">;
  4. ERROR </span><span style="color: #800000; font-weight: bold;">1231</span> (<span style="color: #800000; font-weight: bold;">42000</span>): Variable <span style="color: #ff0000;">‘</span><span style="color: #ff0000;">innodb_ft_server_stopword_table</span><span style="color: #ff0000;">‘</span> can<span style="color: #ff0000;">‘</span><span style="color: #ff0000;">t be set to the value of </span><span style="color: #ff0000;">‘</span>iot2<span style="color: #808080;">/</span>user_stopword<span style="color: #ff0000;">‘<br><br>观察错误日志提示<br></span>

InnoDB: invalid column name for stopword table iot2/user_stopword. Its first column must be named as ‘value‘.

  1. <span style="color: #ff0000;"> </span>

 

 

使用全文检索还有以下限制

每张表只能有一个全文检索的索引

由多列组合而成的全文检索的索引必须使用相同的字符集与排序规则

不支持没有单词界定符delimiter的语言,如中文 日文汉语等

 

 

 

 

MySQL中InnoDB全文检索

标签:

人气教程排行