您的位置：首页 > 其它

Elasticsearch初体验-创建Index，Document以及常见的ES查询

2020-08-21 08:54 573 查看

What is Elasticsearch?

You know, for search (and analysis)

Elasticsearch is the distributed search and analytics engine at the heart of the Elastic Stack.

Elasticsearch provides near real-time search and analytics for all types of data.

从官网介绍可以看出几个关键的字眼，Elasticsearch是分布式的搜索、存储和数据分析引擎。Elasticsearch为所有类型的数据提供近乎实时的搜索和分析。

它很强很好用。后面我会总结一些Elasticsearch的相关文章，本文只体验体验它的搜索功能。

准备工作

安装、启动Elasticsearch，kibana

版本：7.9

下载Elasticsearch和kibana，解压之后进入各自文件夹下的bin目录下，双击.bat文件启动即可，此处略过。

本文主要目的是看elasticsearch的查询功能有多爽，原理性问题后面再说。

验证Elasticsearch启动

Kibana启动成功：

另外，建议安装一个

elasticsearch-head

，它能帮助我们很直观的查看ES节点状态。

elasticsearch-head提供可视化的操作页面，对ElasticSearch搜索引擎进行各种设置和数据检索功能。

可以很直观的查看集群的健康状况，索引分配情况，还可以管理索引和集群以及提供方便快捷的搜索功能等等。

安装、启动

elasticsearch-head

：

1. 安装node，略
2. 解压glasticsearch-head-master，并进入该文件夹，修改Gruntfile.js，在connect.server.options添加hostname: '*'：
connect: {
server: {
options: {
hostname: '*',
port: 9100,
base: '.',
keepalive: true
}
}
}
3. npm安装：D:\Java\elasticsearch-head-master> npm install
4. 启动：D:\Java\elasticsearch-head-master> npm run start

D:\Java\elasticsearch-head-master> npm run start
> elasticsearch-head@0.0.0 start D:\Java\elasticsearch-head-master
> grunt server

Running "connect:server" (connect) task
Waiting forever...
Started connect web server on http://localhost:9100

ES创建索引

启动Kibana后，直接在Kibana Dev Tools控制台执行。

建立索引

# 建立city索引
PUT /city

# 创建的每个索引都可以具有与之关联的特定设置，这些设置在主体中定义
# number_of_shards的默认值为1
# number_of_replicas的默认值为1（即每个主分片一个副本）
PUT /test
{
"settings": {
"index": {
"number_of_shards": 3,
"number_of_replicas": 2
}
}
}

我们可以看到，建立的

city

索引，有1个分片，1个副本，这也是默认值；而

test

索引，通过

settings

配置了分片数为3，副本数为2。由图中也可以看出

test

有3个分片，每个分片有2个副本。

为什么副本都是unassigned的呢？这是因为ES不允许Primary和它的Replica放在同一个节点中，并且同一个节点不接受完全相同的两个Replica，而我本地只启动了一个ES节点。

删除索引

上图中，test和city中间有一个

ilm-history-2-000001

，我也不知道它是啥，要不把它删掉吧？

# 删除索引
DELETE /ilm-history-2-000001

# 执行结果
{
"acknowledged" : true
}

可以通过查询索引验证一下：

# 查询索引
GET /ilm-history-2-000001

{
"error" : {
"root_cause" : [
{
"type" : "index_not_found_exception",
"reason" : "no such index [ilm-history-2-000001]",
"resource.type" : "index_or_alias",
"resource.id" : "ilm-history-2-000001",
"index_uuid" : "_na_",
"index" : "ilm-history-2-000001"
}
],
"type" : "index_not_found_exception",
"reason" : "no such index [ilm-history-2-000001]",
"resource.type" : "index_or_alias",
"resource.id" : "ilm-history-2-000001",
"index_uuid" : "_na_",
"index" : "ilm-history-2-000001"
},
"status" : 404
}

也可以通过

elasticsearch-head

查看：

没有

ilm-history-2-000001

索引了。

插入数据

这个语法很简单，可参考如下：

POST /city/_doc/1
{
"city" : "En shi",
"province" :  "Hubei province",
"acreage" :  24111
}

POST /city/_doc/2
{
"city" : "E zhou",
"province" :  "Hu bei province",
"acreage" :  1594
}

POST /city/_doc/3
{
"city" : "Zheng zhou, China",
"province" :  "Henan province, capital",
"acreage" :  7446
}

POST /city/_doc/4
{
"city" : "Bei jing",
"province" :  "Beijing China, capital",
"acreage" :  16410
}

POST /city/_doc/5
{
"city" : "Nan jing",
"province" :  "Jiangsu province, capital",
"acreage" :  6587
}

POST /city/_doc/6
{
"city" : "Shen zhen",
"province" :  "Guangdong province",
"acreage" :  1997
}

POST /city/_doc/7
{
"city" : "Guang zhou",
"province" :  "Guangdong province, capital",
"acreage" :  7434
}

以上面的数据为基础，开始ES常用查询。

ES查询

Elasticsearch查询有很多，下面针对常用查询做一个总结。

Query_string

查询所有

GET /索引/_search

GET /city/_search

查询出所有的7条记录，并且

relation

类型为

eq

（equal），

max_score

为1.0（相关度分数）

带参数的查询

GET /索引/_search?q=xx:xx

GET /city/_search?q=city: Bei Jing

"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 2.8371272,
"hits" : [
{
"_index" : "city",
"_type" : "1",
"_id" : "4",
"_score" : 2.8371272,
"_source" : {
"city" : "Bei jing",
"province" : "Beijing China, capital",
"acreage" : 16410
}
},
{
"_index" : "city",
"_type" : "1",
"_id" : "5",
"_score" : 1.1631508,
"_source" : {
"city" : "Nan jing",
"province" : "Jiangsu province, capital",
"acreage" : 6587
}
}
]
}

这个查询把

city

中带

jing

的都查出来了，但相关度分数不一样。

分页，排序查询

GET /索引/_search?from=x&size=x&sort=xx:[asc|desc]

GET /city/_search?from=0&size=3&sort=acreage:asc

"hits" : {
"total" : {
"value" : 7,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "city",
"_type" : "1",
"_id" : "2",
"_score" : null,
"_source" : {
"city" : "E zhou",
"province" : "Hu bei province",
"acreage" : 1594
},
"sort" : [
1594
]
},
{
"_index" : "city",
"_type" : "1",
"_id" : "6",
"_score" : null,
"_source" : {
"city" : "Shen zhen",
"province" : "Guangdong province",
"acreage" : 1997
},
"sort" : [
1997
]
},
{
"_index" : "city",
"_type" : "1",
"_id" : "5",
"_score" : null,
"_source" : {
"city" : "Nan jing",
"province" : "Jiangsu province, capital",
"acreage" : 6587
},
"sort" : [
6587
]
}
]
}

查出来按面积

acreage

排序的前3条记录，总记录是7。还可以看出，相关度分数为

null

。

Query DSL

```
match_all
```
：匹配所有

GET /city/_search{
"query": {
"match_all": {}
}
}

和

GET /city/_search

不带

{}

查询结果一致。
2.

match

：xx字段包含xx

查询

city

字段中带

zhou

的：

GET /city/_search{
"query": {
"match": {
"city": "zhou"
}
}
}

# 结果截取
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 0.8266786,
"hits" : [
{
"_index" : "city",
"_type" : "1",
"_id" : "2",
"_score" : 0.8266786,
"_source" : {
"city" : "E zhou",
"province" : "Hu bei province",
"acreage" : 1594
}
},
{
"_index" : "city",
"_type" : "1",
"_id" : "3",
"_score" : 0.8266786,
"_source" : {
"city" : "Zheng zhou",
"province" : "Henan province, capital",
"acreage" : 7446
}
},
{
"_index" : "city",
"_type" : "1",
"_id" : "7",
"_score" : 0.8266786,
"_source" : {
"city" : "Guang zhou",
"province" : "Guangdong province, capital",
"acreage" : 7434
}
}
]
}

查出来3条结果，都有相关度分数。

```
sort
```
：排序，正序，倒序

查询

province

中包含

capital

的并且按照面积倒序排序：

GET /city/_search{
"query": {
"match": {
"province": "capital"
}
},
"sort": [
{
"acreage": {
"order": "desc"
}
}
]
}

```
multi_match
```
：根据多个字段查询一个关键词

查询

city

和

province

字段中包含

China

的：

GET /city/_search{
"query": {
"multi_match": {
"query": "China",
"fields": ["city", "province"]
}
}
}

```
_source
```
元数据：可以指定显示的字段

设置查询结果只显示

acreage

字段：

GET /city/_search{
"query": {
"multi_match": {
"query": "China",
"fields": ["city", "province"]
}
},
"_source": ["acreage"]
}

只显示了

acreage

字段。

分页（deep-paging）

按照

acreage

倒序排序，并分页，每页3条记录，查询第一页：

GET /city/_search{
"query": {
"match_all": {}
},
"sort": [
{
"acreage": {
"order": "desc"
}
}
],
"from": 0,
"size": 3
}

全文检索 Full-text queries

query-term & query-match

GET /city/_search{
"query": {
"term": {
"city": {
"value": "Guang zhou"
}
}
}
}

GET /city/_search{
"query": {
"match": {
"city": "Guang zhou"
}
}
}

query-term查询结果：

没有任何记录！！！再来看一下query-match查询结果：

有3条记录！！！

match和term区别

为什么会出现1那样的结果呢？因为query-term查询的term不会分词，会将

Guang zhou

当做一个整体进行操作，而match会进行分词，分成

Guang

和

zhou

，所以查询结果里面city包含

zhou

的都出来了！

全文检索

我们来看下面这个查询：

GET /city/_search{
"query": {
"match": {
"province": "Guangdong province, capital"
}
}
}

查出来7条记录，每条记录都有相关度分数，并且按照相关度分数由高到低排好序了！

验证一下分词：

GET /_analyze
{
"analyzer": "standard",
"text": "Guangdong province, capital"
}

结果：

{
"tokens" : [
{
"token" : "guangdong",
"start_offset" : 0,
"end_offset" : 9,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "province",
"start_offset" : 10,
"end_offset" : 18,
"type" : "<ALPHANUM>",
"position" : 1
},
{
"token" : "capital",
"start_offset" : 20,
"end_offset" : 27,
"type" : "<ALPHANUM>",
"position" : 2
}
]
}

由此可见，

Guangdong province, capital

被分成了

guangdong

，

province

，

capital

，ES会全文检索这些词，所以查出了所有包含

guangdong

，

province

，

capital

的document。

Phrase search 短语搜索

GET /city/_search{
"query": {
"match_phrase": {
"province": "Guangdong province, capital"
}
}
}

和全文检索相反，“GuangdongGuangdong province, capital”会作为一个短语去检索，应该会只查出一条记录：

看图，结果确实如此。

Query and filter 查询和过滤

bool

可以组合多个查询条件，bool查询也是采用more_matches_is_better的机制，因此满足must和should子句的
文档
（可理解为数据行）将会合并起来计算分值（相关度）。

must 必须满足

子句（查询）必须出现在匹配的文档中，并将有助于得分。

filte 过滤器不计算相关度分数，cache

子句（查询）必须出现在匹配的文档中。但是不像must，查询的相关度分数将被忽略。

Filter子句在filter上下文中执行，这意味着相关度得分被忽略，并且子句被考虑用于缓存。查询性能很高。

should 可能满足（SQL中的or）

子句（查询）应出现在匹配的文档中。也可以不在文档中。

must_not：必须不满足不计算相关度分数

子句（查询）不得出现在匹配的文档中。子句在过滤器上下文中执行，这意味着相关度得分被忽略，并且子句被视为用于缓存。由于忽略得分，得分将会返回数字0。

minimum_should_match

案例

在

city

这个index增加几条记录：

POST /city/_doc/8
{
"city" : "Fo shan",
"province" :  "Guangdong province",
"acreage" :  3797
}
POST /city/_doc/9
{
"city" : "Dong guan",
"province" :  "Guangdong province",
"acreage" :  2460
}
POST /city/_doc/10
{
"city" : "Hui zhou",
"province" :  "Guangdong province",
"acreage" :  11347
}
POST /city/_doc/11
{
"city" : "Mei zhou",
"province" :  "Guangdong province",
"acreage" :  15864
}

查询province包含
```
Guangdong province
```
，面积大于10000，并且city字段包含
```
hui
```
的document：

GET /city/_search{
"query": {
"bool": {
"must": [
{
"match": {
"city": "hui"
}
}
],
"filter": [
{
"match_phrase": {
"province": "Guangdong province"
}
},
{
"range": {
"acreage": {
"gt": 10000
}
}
}
]
}
}
}

这个执行过程是：

先对province中包含
```
Guangdong province
```
并且面积大于10000的进行筛选，符合条件的有2条记录：

再对city中包含
```
hui
```
的进行过滤，最终结果为：

bool多条件

查询city包含zhou不包含hui，province里包不包含guangdong都可以，面积要小于6000：

GET /city/_search{
"query": {
"bool": {
"must": [
{"match": {
"city": "zhou"
}}
],
"must_not": [
{"match": {
"city": "hui"
}}
],
"should": [
{"match": {
"province": "guangdong"
}}
],
"filter": [
{"range": {
"acreage": {
"lte": 6000
}
}}
]
}
}
}

嵌套查询

minimum_should_match

：参数指定should返回的文档必须匹配的子句的数量或百分比。

如果bool查询包含至少一个should子句，而没有must或 filter子句，则默认值为1。否则，默认值为0。

例如：

GET /city/_search{
"query": {
"bool": {
"must": [
{"match": {
"city": "zhou"
}}
],
"should": [
{"range": {
"acreage": {
"gt": 1500
}
}},
{
"range": {
"acreage": {
"gt": 5000
}
}
}
],
"minimum_should_match": 1
}
}
}

minimum_should_match

的意思是在

should

的子句中必须至少满足一个条件。

本例查询结果为：

"hits" : {
"total" : {
"value" : 5,
"relation" : "eq"
},
"max_score" : 2.7046783,
"hits" : [
{
"_index" : "city",
"_type" : "1",
"_id" : "7",
"_score" : 2.7046783,
"_source" : {
"city" : "Guang zhou",
"province" : "Guangdong province, capital",
"acreage" : 7434
}
},
{
"_index" : "city",
"_type" : "1",
"_id" : "10",
"_score" : 2.7046783,
"_source" : {
"city" : "Hui zhou",
"province" : "Guangdong province",
"acreage" : 11347
}
},
{
"_index" : "city",
"_type" : "1",
"_id" : "11",
"_score" : 2.7046783,
"_source" : {
"city" : "Mei zhou",
"province" : "Guangdong province",
"acreage" : 15864
}
},
{
"_index" : "city",
"_type" : "1",
"_id" : "3",
"_score" : 2.5874128,
"_source" : {
"city" : "Zheng zhou, China",
"province" : "Henan province, capital",
"acreage" : 7446
}
},
{
"_index" : "city",
"_type" : "1",
"_id" : "2",
"_score" : 1.7046783,
"_source" : {
"city" : "E zhou",
"province" : "Hu bei province",
"acreage" : 1594
}
}
]
}

查询出来的地市满足acreage大于1500或大于5000。

我的

看完点赞，养成习惯。举手之劳，赞有余香。

内容来自用户分享和网络整理，不保证内容的准确性，如有侵权内容，可联系管理员处理

标签：

相关文章推荐

新的分享

章节导航