您的位置:首页 > 数据库 > MySQL

MySQL 全文索引基础操作

2015-11-15 10:44 701 查看
当前测试的 mysql 版本: 5.0.95

MYSQL MYISAM引擎支持全文索引;MYSQL 5.6 后,InnoDB 也完全支持全文索引。

全文索引支持的字段类型包括 char ,varchar , text 。

创建测试表:(36行数据)

CREATE TABLE test.ft_tab (
id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
charname CHAR(10),
collatename VARCHAR(200),
description TEXT(500),
maxlen int
) ENGINE=InnoDB;

insert into test.ft_tab (charname,collatename,description,maxlen)
select CHARACTER_SET_NAME,DEFAULT_COLLATE_NAME,DESCRIPTION,MAXLEN
from information_schema.character_sets ;


5.0 中 InnoDB 引擎创建失败,改为 MYISAM 再创建:

mysql> CREATE FULLTEXT INDEX idx_ft_tab ON test.ft_tab(description);
ERROR 1214 (HY000): The used table type doesn't support FULLTEXT indexes
mysql> 
mysql> ALTER TABLE test.ft_tab ENGINE = MYISAM;
Query OK, 36 rows affected (0.02 sec)
Records: 36  Duplicates: 0  Warnings: 0

mysql> CREATE FULLTEXT INDEX idx_ft_tab ON test.ft_tab(description);
Query OK, 36 rows affected (0.01 sec)
Records: 36  Duplicates: 0  Warnings: 0


使用全文索引搜索数据:

mysql> explain select * from test.ft_tab where description like '%Windows%';
+----+-------------+--------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table  | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+--------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | ft_tab | ALL  | NULL          | NULL | NULL    | NULL |   36 | Using where | 
+----+-------------+--------+------+---------------+------+---------+------+------+-------------+
1 row in set (0.00 sec)

mysql> explain select * from test.ft_tab where match(description) against('Windows');
+----+-------------+--------+----------+---------------+------------+---------+------+------+-------------+
| id | select_type | table  | type     | possible_keys | key        | key_len | ref  | rows | Extra       |
+----+-------------+--------+----------+---------------+------------+---------+------+------+-------------+
|  1 | SIMPLE      | ft_tab | fulltext | idx_ft_tab    | idx_ft_tab | 0       |      |    1 | Using where | 
+----+-------------+--------+----------+---------------+------------+---------+------+------+-------------+
1 row in set (0.00 sec)


使用 IN BOOLEAN MODE 设置特定搜索:

#至少包含一个单词(或)
select * from test.ft_tab where match(description) against('Windows Central' IN BOOLEAN MODE);

#括号,至少包含一个单词(或)
select * from test.ft_tab where match(description) against('(Windows Central)' IN BOOLEAN MODE);
select * from test.ft_tab where match(description) against('+(Windows Central)' IN BOOLEAN MODE);

#括号/大于小于好,至少包含一个单词(或),大于号的排序最高,小于号的最低,同时存在的处中间
select * from test.ft_tab where match(description) against('+(>Windows <Central)' IN BOOLEAN MODE);

select * ,(match(description) against('+(>Windows <Central)' IN BOOLEAN MODE)) AS REF
from test.ft_tab where match(description) against('+(>Windows <Central)' IN BOOLEAN MODE);

#两个(多个)单词同时包含(与)
select * from test.ft_tab where match(description) against('+Windows +Central' IN BOOLEAN MODE);

#包含一个(多个)单词,且不包含另一个(多个)单词
select * from test.ft_tab where match(description) against('+Windows -Central' IN BOOLEAN MODE);

#包含一个(多个)单词,如果包含一个单词(如 Central),排序更高
select * from test.ft_tab where match(description) against('+Windows Central' IN BOOLEAN MODE);

select * ,(match(description) against('+Windows Central' IN BOOLEAN MODE)) AS REF
from test.ft_tab where match(description) against('+Windows Central' IN BOOLEAN MODE);

#双引号,内部作为一个词组查询
select * from test.ft_tab where match(description) against('"Windows Central"' IN BOOLEAN MODE);

#星号,模糊匹配,只在单词尾部
select * from test.ft_tab where match(description) against('Windows*' IN BOOLEAN MODE);
select * from test.ft_tab where match(description) against('Cen*' IN BOOLEAN MODE);

#包含一个(多个)单词,如果包含一个单词(如 ~Central),排序更低
select * from test.ft_tab where match(description) against('+Windows ~Central' IN BOOLEAN MODE);

select * ,(match(description) against('+Windows ~Central' IN BOOLEAN MODE)) AS REF
from test.ft_tab where match(description) against('+Windows ~Central' IN BOOLEAN MODE);


上面的一个查询为例,匹配度越高,匹配的结果值(如下字段 ref )越大。

mysql> select * ,(match(description) against('+(>Windows <Central)' IN BOOLEAN MODE)) AS REF
    -> from test.ft_tab where match(description) against('+(>Windows <Central)' IN BOOLEAN MODE);
+----+----------+---------------------+-----------------------------+--------+-------------------+
| id | charname | collatename         | description                 | maxlen | REF               |
+----+----------+---------------------+-----------------------------+--------+-------------------+
|  7 | latin2   | latin2_general_ci   | ISO 8859-2 Central European |      1 | 0.666666686534882 | 
| 18 | cp1250   | cp1250_general_ci   | Windows Central European    |      1 |  1.16666674613953 | 
| 26 | macce    | macce_general_ci    | Mac Central European        |      1 | 0.666666686534882 | 
| 28 | cp852    | cp852_general_ci    | DOS Central European        |      1 | 0.666666686534882 | 
| 30 | cp1251   | cp1251_general_ci   | Windows Cyrillic            |      1 |               1.5 | 
| 31 | cp1256   | cp1256_general_ci   | Windows Arabic              |      1 |               1.5 | 
| 32 | cp1257   | cp1257_general_ci   | Windows Baltic              |      1 |               1.5 | 
| 35 | cp932    | cp932_japanese_ci   | SJIS for Windows Japanese   |      2 |               1.5 | 
| 36 | eucjpms  | eucjpms_japanese_ci | UJIS for Windows Japanese   |      3 |               1.5 | 
+----+----------+---------------------+-----------------------------+--------+-------------------+


现在查询一个单词,却找不出结果。 词语 'DEC' 不属于过滤的单词,为什么查不出?

mysql> select * from test.ft_tab where match(description) against('DEC');
Empty set (0.00 sec)


查看全文索引变量,就知道了。最小的长度为 4 字符,4 个字符以下的都忽略不当作单词了!

mysql> show variables like '%ft%';
+--------------------------+----------------+
| Variable_name            | Value          |
+--------------------------+----------------+
| ft_boolean_syntax        | + -><()~*:""&| | 
| ft_max_word_len          | 84             | 
| ft_min_word_len          | 4              | 
| ft_query_expansion_limit | 20             | 
| ft_stopword_file         | (built-in)     | 
+--------------------------+----------------+


#变量说明

http://dev.mysql.com/doc/refman/5.0/en/server-system-variables.html

ft_boolean_syntax :IN BOOLEAN MODE 匹配的选项

ft_max_word_len :最长匹配的字符数

ft_min_word_len :最短匹配的字符数

ft_query_expansion_limit :使用WITH QUERY EXPANSION 搜索出的前N(20)个结果再次搜索

ft_stopword_file :停词表文件,用于设置不需要匹配的词表

#默认停止词表

http://dev.mysql.com/doc/refman/5.0/en/fulltext-stopwords.html

设置变量,允许查询3个长度的单词:

[root@localhost]# vi /etc/my.cnf
[mysqld]
ft_min_word_len=3

#重启服务,修复表
[root@localhost]# service mysqld restart

mysql> repair table test.ft_tab;


再查询,结果出来了!

mysql> select * from test.ft_tab where match(description) against('DEC');
+----+----------+-----------------+-------------------+--------+
| id | charname | collatename     | description       | maxlen |
+----+----------+-----------------+-------------------+--------+
|  2 | dec8     | dec8_swedish_ci | DEC West European |      1 | 
+----+----------+-----------------+-------------------+--------+


而词语“for” 却查询不出!被忽略掉了!

mysql> select * from test.ft_tab where match(description) against('for');
Empty set (0.00 sec)


变量 ft_stopword_file 没有指定文件(built-in),使用系统默认设置(参考:Full-Text Stopwords

现在手动设置:

[root@localhost]# vi /etc/my.cnf
[mysqld]
ft_stopword_file=/usr/share/mysql/ft_stopword.txt

[root@localhost]# service mysqld restart

mysql> repair table test.ft_tab;

mysql> show variables like '%ft%';
mysql> show variables like '%ft%';
+--------------------------+----------------------------------+
| Variable_name            | Value                            |
+--------------------------+----------------------------------+
| ft_boolean_syntax        | + -><()~*:""&|                   | 
| ft_max_word_len          | 84                               | 
| ft_min_word_len          | 3                                | 
| ft_query_expansion_limit | 20                               | 
| ft_stopword_file         | /usr/share/mysql/ft_stopword.txt | 
+--------------------------+----------------------------------+


再查询,结果出来了!

mysql> select * from test.ft_tab where match(description) against('for');
+----+----------+---------------------+---------------------------+--------+
| id | charname | collatename         | description               | maxlen |
+----+----------+---------------------+---------------------------+--------+
| 35 | cp932    | cp932_japanese_ci   | SJIS for Windows Japanese |      2 | 
| 36 | eucjpms  | eucjpms_japanese_ci | UJIS for Windows Japanese |      3 | 
+----+----------+---------------------+---------------------------+--------+


现在需要把 “for” 忽略掉,怎么设置呢?如果添加过滤词,编辑文件添加该字符:

[root@localhost]# vi /usr/share/mysql/ft_stopword.txt
'for','in','at'

[root@localhost]# service mysqld restart


再查询,找不到了!已经被忽略!~

mysql> select * from test.ft_tab where match(description) against('for');
Empty set (0.00 sec)


参考: Full-Text Search Functions
内容来自用户分享和网络整理,不保证内容的准确性,如有侵权内容,可联系管理员处理 点击这里给我发消息
标签: