
009-elasticsearch [Part 3]: Sample Data Import, URI Search, Query DSL, Query Overview (_source, match, must, should, etc.), Filters, Aggregations

2018-03-05 17:11

1. Sample data

Customer bank account records, in JSON:

{
"account_number": 0,
"balance": 16623,
"firstname": "Bradshaw",
"lastname": "Mckenzie",
"age": 29,
"gender": "F",
"address": "244 Columbus Place",
"employer": "Euron",
"email": "bradshawmckenzie@euron.com",
"city": "Hobucken",
"state": "CO"
}

Bulk-import 1,000 records

Test data file (accounts.json):

curl -H "Content-Type: application/json" -XPOST 'localhost:9200/bank/account/_bulk?pretty&refresh' --data-binary "@accounts.json"
curl 'localhost:9200/_cat/indices?v'

On Windows, replace the single quotes with double quotes.
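The _bulk endpoint expects newline-delimited JSON: an action line followed by the document itself, with a trailing newline at the end of the body. As a rough sketch of how a file like accounts.json could be generated, assuming each record's account_number doubles as its document ID (an assumption, not something stated above) and with to_bulk_ndjson being a hypothetical helper name:

```python
import json

def to_bulk_ndjson(accounts):
    """Serialize account dicts into the action/document line pairs used by _bulk."""
    lines = []
    for account in accounts:
        # Action line: index the document, using account_number as _id (assumption).
        lines.append(json.dumps({"index": {"_id": str(account["account_number"])}}))
        # Source line: the document itself, serialized on a single line.
        lines.append(json.dumps(account))
    # The bulk body has to end with a newline.
    return "\n".join(lines) + "\n"

sample = [
    {"account_number": 0, "balance": 16623, "firstname": "Bradshaw", "state": "CO"},
]
body = to_bulk_ndjson(sample)
print(body, end="")
```

The resulting text can be saved to a file and passed to curl with --data-binary, as in the command above.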

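2. Introduction to URI search

There are two basic ways to run a search: one sends the search parameters through the REST request URI, the other sends them in the REST request body.

2.1. Request URI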

GET /bank/_search?q=*&sort=account_number:asc&pretty

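Notes: the q=* parameter tells Elasticsearch to match all documents in the index.

The sort=account_number:asc parameter sorts the results by each document's account_number field, in ascending order.

The pretty parameter asks for pretty-printed JSON in the response.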
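The query string above can also be assembled programmatically. This is a minimal sketch using only the Python standard library; the function name uri_search_url and the default host localhost:9200 are assumptions for illustration, and pretty is sent as pretty=true rather than as a bare flag.

```python
from urllib.parse import urlencode

def uri_search_url(index, q="*", sort=None, pretty=True, host="localhost:9200"):
    """Build a URI search URL for the given index."""
    params = {"q": q}
    if sort:
        params["sort"] = sort
    if pretty:
        params["pretty"] = "true"
    # Keep '*' and ':' readable; other special characters are percent-encoded.
    return f"http://{host}/{index}/_search?" + urlencode(params, safe="*:")

url = uri_search_url("bank", sort="account_number:asc")
print(url)
```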

{
  "took": 31,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1000,
    "max_score": null,
    "hits": [
      {
        "_index": "bank",
        "_type": "account",
        "_id": "0",
        "_score": null,
        "_source": {
          "account_number": 0,
          "balance": 16623,
          "firstname": "Bradshaw",
          "lastname": "Mckenzie",
          "age": 29,
          "gender": "F",
          "address": "244 Columbus Place",
          "employer": "Euron",
          "email": "bradshawmckenzie@euron.com",
          "city": "Hobucken",
          "state": "CO"
        },
        "sort": [0]
      }
      //……
    ]
  }
}

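Response fields

took: the time Elasticsearch took to execute the search, in milliseconds

timed_out: whether the search timed out

_shards: how many shards were searched, and how many of them succeeded or failed

hits: the search results

hits.total: the total number of documents matching the search criteria
hits.hits: the actual array of search results (the first 10 documents by default)
hits.hits[].sort: the sort key of each result (absent when sorting by score)
hits.hits[]._score and hits.max_score: ignore these for now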
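To make the field descriptions concrete, here is a small sketch that reads those fields out of a parsed response. It assumes the response shape shown above, where hits.total is a plain number (newer Elasticsearch versions return an object there instead); summarize is a hypothetical helper name, and the JSON is a trimmed copy of the response above with most _source fields left out.

```python
import json

# Trimmed copy of the response above, with a single hit.
raw = """
{
  "took": 31,
  "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "failed": 0},
  "hits": {
    "total": 1000,
    "max_score": null,
    "hits": [
      {"_index": "bank", "_type": "account", "_id": "0", "_score": null,
       "_source": {"account_number": 0, "balance": 16623}, "sort": [0]}
    ]
  }
}
"""

def summarize(response):
    """Pull the commonly used fields out of a search response."""
    hits = response["hits"]
    return {
        "took_ms": response["took"],
        "timed_out": response["timed_out"],
        "total": hits["total"],
        "sources": [hit["_source"] for hit in hits["hits"]],
        "sort_keys": [hit.get("sort") for hit in hits["hits"]],
    }

summary = summarize(json.loads(raw))
print(summary["total"], summary["sources"][0]["balance"])
```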

2.2. Request body

When using the header tool (presumably the elasticsearch-head plugin), send the request as POST.

GET /bank/_search
{
"query": { "match_all": {} },
"sort": [
{ "account_number": "asc" }
]
}
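Outside a console, the same request-body search can be sent as a POST, which the _search endpoint also accepts. The following is a sketch that only builds the request object without sending it; build_search_request and the host localhost:9200 are assumptions for illustration.

```python
import json
import urllib.request

def build_search_request(index, body, host="localhost:9200"):
    """Build (but do not send) a POST request carrying the search body."""
    return urllib.request.Request(
        url=f"http://{host}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

request = build_search_request("bank", {
    "query": {"match_all": {}},
    "sort": [{"account_number": "asc"}],
})
print(request.get_method(), request.full_url)

# To actually send it against a running node:
#   with urllib.request.urlopen(request) as resp:
#       result = json.load(resp)
```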

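3. Introduction to the Query DSL

Elasticsearch provides a JSON-style domain-specific language for executing queries. This is called the Query DSL.

Note: when using the header tool (presumably the elasticsearch-head plugin), send these requests as POST.

3.1. Match all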

GET /bank/_search
{
"query": { "match_all": {} }
}

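The match_all part is simply the type of query we want to run. A match_all query searches for all documents in the specified index.

3.2. Limiting results with size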

GET /bank/_search
{
"query": { "match_all": {} },
"size": 1
}

Note that if size is not specified, it defaults to 10.

3.3. Pagination

This example runs a match_all and returns documents 11 through 20:

GET /bank/_search
{
"query": { "match_all": {} },
"from": 10,
"size": 10
}

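The from parameter (0-based) specifies which document index to start from, and the size parameter specifies how many documents to return starting at from. This is useful when implementing paging of search results. Note that if from is not specified, it defaults to 0.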
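The relationship between a 1-based page number and the from/size pair is simple arithmetic. A minimal sketch, with page_body as a hypothetical helper name:

```python
def page_body(page, page_size=10):
    """Return a match_all search body for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return {
        "query": {"match_all": {}},
        "from": (page - 1) * page_size,  # from is 0-based
        "size": page_size,
    }

second_page = page_body(2)
print(second_page)
```

Page 2 with the default page size yields from 10 and size 10, which is the request shown above.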

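3.4. Descending sort

This example runs a match_all, sorts the results by account balance in descending order, and returns the top 10 (default size) documents.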

GET /bank/_search
{
"query": { "match_all": {} },
"sort": { "balance": { "order": "desc" } }
}

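4. Query overview

4.1. Returning specific fields

Add a _source field to the request to choose which fields are returned. Conceptually this is somewhat similar to the field list in a SQL SELECT ... FROM statement.

By default, the full JSON document is returned as part of every search hit. This is referred to as the source (the _source field in the search hits). If we don't want the entire source document returned, we can request only a few fields from within the source.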

GET /bank/_search
{
"query": { "match_all": {} },
"_source": ["account_number", "balance"]
}
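The effect on each hit can be pictured as trimming the stored document down to the requested keys. The sketch below mimics that locally for top-level field names only (the real _source option also accepts wildcard patterns); filter_source is a hypothetical helper, not an Elasticsearch API.

```python
def filter_source(source, fields):
    """Keep only the listed top-level fields, mimicking "_source": [...]."""
    return {name: source[name] for name in fields if name in source}

doc = {
    "account_number": 0,
    "balance": 16623,
    "firstname": "Bradshaw",
    "lastname": "Mckenzie",
    "state": "CO",
}
trimmed = filter_source(doc, ["account_number", "balance"])
print(trimmed)
```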

4.2. match queries

Match all

"query": { "match_all": {} },

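The match query can be thought of as the basic fielded search query (that is, a search done against a specific field or set of fields).

//Matches the account with account_number 20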
GET /bank/_search
{
"query": { "match": { "account_number": 20 } }
}

//Matches accounts whose address contains the term "mill"
GET /bank/_search
{
"query": { "match": { "address": "mill" } }
}

//Matches accounts whose address contains the term "mill" or "lane"
GET /bank/_search
{
"query": { "match": { "address": "mill lane" } }
}

//Matches accounts whose address contains the exact phrase "mill lane"
GET /bank/_search
{
"query": { "match_phrase": { "address": "mill lane" } }
}
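The difference between match and match_phrase can be illustrated locally. This sketch approximates analysis by lowercasing and splitting on whitespace, which is a simplification of what Elasticsearch actually does; the addresses are made-up sample values, and the function names are hypothetical.

```python
def tokenize(text):
    """Very rough stand-in for an analyzer: lowercase and split on whitespace."""
    return str(text).lower().split()

def match(field_value, query):
    """True if any query term occurs in the field (match defaults to OR)."""
    field_terms = tokenize(field_value)
    return any(term in field_terms for term in tokenize(query))

def match_phrase(field_value, query):
    """True if the query terms occur consecutively and in the same order."""
    field_terms, phrase = tokenize(field_value), tokenize(query)
    if not phrase:
        return False
    return any(
        field_terms[i:i + len(phrase)] == phrase
        for i in range(len(field_terms) - len(phrase) + 1)
    )

addresses = [
    "198 Mill Lane",
    "990 Mill Road",
    "789 Madison Lane",
    "12 Lane Mill Court",
    "171 Putnam Avenue",
]
any_term = [a for a in addresses if match(a, "mill lane")]
exact_phrase = [a for a in addresses if match_phrase(a, "mill lane")]
print(any_term)
print(exact_phrase)
```

Only the address with "Mill" directly followed by "Lane" survives the phrase check, while the plain match keeps every address containing either term.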

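4.3. bool (boolean) queries

4.3.1. must == and

//Matches documents whose address contains both "mill" and "lane". This is not equivalent to match_phrase, which also requires the terms to be adjacent and in order; it behaves like a match query on "mill lane" with "operator": "and".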

GET /bank/_search
{
"query": {
"bool": {
"must": [
{ "match": { "address": "mill" } },
{ "match": { "address": "lane" } }
]
}
}
}

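The bool must clause specifies all the queries that must be true for a document to be considered a match.

4.3.2. should == or

//Matches documents whose address contains "mill" or "lane"; equivalent to "query": { "match": { "address": "mill lane" } }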
GET /bank/_search
{
"query": {
"bool": {
"should": [
{ "match": { "address": "mill" } },
{ "match": { "address": "lane" } }
]
}
}
}

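The bool should clause specifies a list of queries, at least one of which must be true for a document to be considered a match.

4.3.3. must_not == not

//Matches documents whose address contains neither "mill" nor "lane"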
GET /bank/_search
{
"query": {
"bool": {
"must_not": [
{ "match": { "address": "mill" } },
{ "match": { "address": "lane" } }
]
}
}
}

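The bool must_not clause specifies a list of queries, none of which may be true for a document to be considered a match.

4.3.4. Combining clauses

must, should, and must_not clauses can be combined inside a single bool query. Furthermore, bool queries can be nested inside any of these clauses to mimic arbitrarily complex multi-level boolean logic.

//Returns all accounts of anyone who is 40 years old but does not live in ID (Idaho)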
GET /bank/_search
{
"query": {
"bool": {
"must": [
{ "match": { "age": "40" } }
],
"must_not": [
{ "match": { "state": "ID" } }
]
}
}
}
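The clause semantics described in this section can be sketched as plain predicates. This is an illustrative model only: it ignores scoring, approximates term matching with lowercase whitespace splitting, and uses made-up accounts; term_match and bool_query are hypothetical names.

```python
def term_match(field, value):
    """Predicate: any term of `value` appears in doc[field] (rough match query)."""
    wanted = str(value).lower().split()
    def predicate(doc):
        present = str(doc.get(field, "")).lower().split()
        return any(term in present for term in wanted)
    return predicate

def bool_query(must=(), should=(), must_not=()):
    """Combine predicates the way a bool query combines clauses (no scoring)."""
    def predicate(doc):
        if not all(clause(doc) for clause in must):
            return False
        if any(clause(doc) for clause in must_not):
            return False
        # With no must clause, at least one should clause has to match;
        # otherwise should only influences scoring, which is not modelled here.
        if should and not must:
            return any(clause(doc) for clause in should)
        return True
    return predicate

accounts = [
    {"account_number": 1, "age": 40, "state": "ID", "address": "198 Mill Lane"},
    {"account_number": 2, "age": 40, "state": "CO", "address": "12 Lane Mill Court"},
    {"account_number": 3, "age": 29, "state": "CO", "address": "990 Mill Road"},
    {"account_number": 4, "age": 40, "state": "TX", "address": "171 Putnam Avenue"},
]

def ids(query):
    return [doc["account_number"] for doc in accounts if query(doc)]

mill, lane = term_match("address", "mill"), term_match("address", "lane")
both = ids(bool_query(must=[mill, lane]))
either = ids(bool_query(should=[mill, lane]))
neither = ids(bool_query(must_not=[mill, lane]))
forty_not_idaho = ids(bool_query(must=[term_match("age", "40")],
                                 must_not=[term_match("state", "ID")]))
print(both, either, neither, forty_not_idaho)
```

Account 2 satisfies the must pair even though its address does not contain the phrase "mill lane", which shows that must only requires both terms to be present.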

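5. Filters overview

First, some detail on the document score (the _score field in the search results). The score is a numeric value that is a relative measure of how well the document matches the search query we specified. The higher the score, the more relevant the document; the lower the score, the less relevant the document.

But queries do not always need to produce scores, in particular when they are only used for "filtering" the document set. Elasticsearch detects these situations and automatically optimizes query execution so that useless scores are not computed.

The bool query also supports filter clauses, which allow a query to restrict the documents matched by the other clauses without changing how scores are computed. As an example, consider the range query, which lets us filter documents by a range of values. It is generally used for numeric or date filtering.

5.1. range filter

//Finds accounts with a balance greater than or equal to 20000 and less than or equal to 30000.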
GET /bank/_search
{
"query": {
"bool": {
"must": { "match_all": {} },
"filter": {
"range": {
"balance": {
"gte": 20000,
"lte": 30000
}
}
}
}
}
}

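Breaking this down: the bool query contains a match_all query (the query part) and a range query (the filter part). Any other queries can be substituted into the query and filter parts. A range query makes sense as a filter here because the documents falling into the range all match "equally", that is, no document is more relevant than another.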
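As a local illustration of the inclusive bounds, the sketch below applies the same gte/lte check to a handful of made-up balances. in_range is a hypothetical helper; it models only the yes/no nature of a filter, not how Elasticsearch evaluates it.

```python
def in_range(value, gte=None, lte=None):
    """Inclusive range check mirroring the gte/lte bounds of a range filter."""
    if gte is not None and value < gte:
        return False
    if lte is not None and value > lte:
        return False
    return True

# Hypothetical account_number -> balance pairs.
balances = {0: 16623, 1: 20000, 2: 25571, 3: 30000, 4: 30001}
matching = [number for number, balance in balances.items()
            if in_range(balance, gte=20000, lte=30000)]
print(matching)
```

A document either passes the filter or it does not, so both boundary values (20000 and 30000) are included and no ranking is implied among the results.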

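6. Aggregations

Aggregations provide the ability to group data and extract statistics from it. The easiest way to think about aggregations is to equate them roughly with SQL GROUP BY and the SQL aggregate functions. In Elasticsearch, you can execute a search that returns hits and, in the same response, also returns aggregated results that are separate from the hits. This is powerful and efficient, because you can run queries and multiple aggregations and get the results of both (or either) operations in one shot, avoiding network round trips with a concise and simplified API.

6.1. group by, count

//Groups all accounts by state, then returns the top 10 (default) states sorted by count in descending order (also the default):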
GET /bank/_search
{
"size": 0,
"aggs": {
"group_by_state": {
"terms": {
"field": "state.keyword"
}
}
}
}

The SQL equivalent:

SELECT state, COUNT(*) FROM bank GROUP BY state ORDER BY COUNT(*) DESC
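The same group-and-count logic can be sketched in a few lines of Python. The bucket shape with key and doc_count follows the terms aggregation response format; the accounts are made-up sample data and group_by_state is a hypothetical helper name.

```python
from collections import Counter

def group_by_state(accounts, size=10):
    """Count accounts per state and return the top `size` buckets, largest first."""
    counts = Counter(account["state"] for account in accounts)
    return [{"key": state, "doc_count": n} for state, n in counts.most_common(size)]

accounts = [
    {"state": "ID"}, {"state": "TX"}, {"state": "ID"},
    {"state": "CO"}, {"state": "ID"}, {"state": "TX"},
]
buckets = group_by_state(accounts)
print(buckets)
```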

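A nested example: group accounts by age bracket (20-29, 30-39, 40-49), then by gender, and compute the average balance for each group: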

GET /bank/_search
{
"size": 0,
"aggs": {
"group_by_age": {
"range": {
"field": "age",
"ranges": [
{
"from": 20,
"to": 30
},
{
"from": 30,
"to": 40
},
{
"from": 40,
"to": 50
}
]
},
"aggs": {
"group_by_gender": {
"terms": {
"field": "gender.keyword"
},
"aggs": {
"average_balance": {
"avg": {
"field": "balance"
}
}
}
}
}
}
}
}
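To show what this nested aggregation computes, here is a local sketch over made-up accounts. It relies on the range aggregation treating from as inclusive and to as exclusive; the function name and the tuple-based result keys are illustrative choices, not the shape of the Elasticsearch response.

```python
from collections import defaultdict

AGE_RANGES = [(20, 30), (30, 40), (40, 50)]  # from is inclusive, to is exclusive

def average_balance_by_age_and_gender(accounts):
    """Bucket by age range, then by gender, and average the balance per bucket."""
    sums = defaultdict(lambda: [0, 0])  # (range, gender) -> [total balance, count]
    for account in accounts:
        for low, high in AGE_RANGES:
            if low <= account["age"] < high:
                entry = sums[((low, high), account["gender"])]
                entry[0] += account["balance"]
                entry[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}

accounts = [
    {"age": 29, "gender": "F", "balance": 16000},
    {"age": 25, "gender": "F", "balance": 20000},
    {"age": 30, "gender": "M", "balance": 30000},
    {"age": 40, "gender": "M", "balance": 10000},
    {"age": 50, "gender": "F", "balance": 99999},  # outside every range
]
result = average_balance_by_age_and_gender(accounts)
print(result)
```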

More aggregations: https://www.elastic.co/guide/en/elasticsearch/reference/5.4/search-aggregations.html
