
Elasticsearch: Search, Basic Usage with Java

2016-08-11 20:06
Please credit the source when reposting.

I. Search example

a) Preparing test data

curl -XPUT localhost:9200/my_index/my_type/_bulk -d '
{ "index": { "_id": 1 }}
{ "title": "The quick brown fox" ,  "age":"18"}
{ "index": { "_id": 2 }}
{ "title": "The quick brown fox jumps over the lazy dog" , "age":"20" }
{ "index": { "_id": 3 }}
{ "title": "The quick brown fox jumps over the quick dog" , "age":"19" }
{ "index": { "_id": 4 }}
{ "title": "Brown fox brown dog" , "age":"18" }
'


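b) Query parameters

Sample request: query all data under the index my_index and the type my_type.

from, size: used for pagination; start at offset 0 and fetch 10 records

sort: sort criteria

aggs: aggregation criteria; equivalent to aggregations

bool: combines multiple query conditions; explained later in this article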

curl -XPOST localhost:9200/my_index/my_type/_search?pretty=true -d '
{
"query": {
"bool": {
"must": [
{
"match_all": { }
}
],
"must_not": [ ],
"should": [ ]
}
},
"from": 0,
"size": 10,
"sort": [ ],
"aggs": { }
}
'


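Response:

took: time spent processing this request (in ms)

timed_out: whether the request timed out. Tip: if a query times out, the results gathered so far are returned instead of the query being aborted

_shards: shards involved in this request; here 5 shards in total, 5 successful, 0 failed

hits: query result information

hits.total: total number of records matching the query

hits.max_score: highest relevance score. This request has no query condition, so no relevance is computed and every record gets a score of 1 (_score=1)

hits.hits: the results returned by this query, i.e. the records from from up to min(from+size, hits.total)

hits.hits._score: relevance score of this record; 1 for every record here, for the same reason as above

hits.hits._source: the original data of each record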

{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 4,
"max_score" : 1.0,
"hits" : [ {
"_index" : "my_index",
"_type" : "my_type",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"title" : "The quick brown fox jumps over the lazy dog",
"age" : "20"
}
}, {
"_index" : "my_index",
"_type" : "my_type",
"_id" : "4",
"_score" : 1.0,
"_source" : {
"title" : "Brown fox brown dog",
"age" : "18"
}
}, {
"_index" : "my_index",
"_type" : "my_type",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"title" : "The quick brown fox",
"age" : "18"
}
}, {
"_index" : "my_index",
"_type" : "my_type",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"title" : "The quick brown fox jumps over the quick dog",
"age" : "19"
}
} ]
}
}
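
To make the from/size window concrete, here is a minimal sketch in plain Java with no Elasticsearch dependency. The names PageWindow and bounds are made up for illustration and are not part of any Elasticsearch API; the code only reproduces the from .. min(from+size, hits.total) arithmetic described above.

```java
/** Illustrative helper (hypothetical, not an Elasticsearch API). */
class PageWindow {

    /**
     * Returns {start, end} (end exclusive) of the hits that a search with the
     * given from/size would return when totalHits documents match.
     */
    static int[] bounds(int from, int size, long totalHits) {
        if (from < 0 || size < 0) {
            throw new IllegalArgumentException("from and size must be non-negative");
        }
        int start = (int) Math.min(from, totalHits);
        int end = (int) Math.min((long) from + size, totalHits);
        return new int[] { start, end };
    }

    public static void main(String[] args) {
        // 4 matching documents, as in the sample response above
        int[] first = bounds(0, 10, 4);
        System.out.println(first[0] + ".." + first[1]);   // 0..4
        int[] second = bounds(2, 10, 4);
        System.out.println(second[0] + ".." + second[1]); // 2..4
    }
}
```

With from=0 and size=10 against 4 matching documents, all 4 hits come back, which is what the sample response shows.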


c) Java query code

Client  client = ConnectionUtil.getLocalClient();
SearchRequestBuilder requestBuilder =
client.prepareSearch("my_index").setTypes("my_type")
.setFrom(0).setSize(10);
Log.debug(requestBuilder);

SearchResponse response = requestBuilder.get();
Log.debug(response);


II. Search and filter keywords

term, terms, range, exists, missing

match, match_all, multi_match

Highlighting, scroll, sorting

a) term

Mainly used for exact matching, e.g. on numbers, dates, booleans, or unanalyzed strings (not_analyzed)

{ "term": { "age":    26           }}
{ "term": { "date":   "2014-09-01" }}
{ "term": { "public": true         }}
{ "term": { "tag":    "full_text"  }}


Java code:

QueryBuilder ageBuilder = QueryBuilders.termQuery("age", "10");


b) terms

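Similar to term, but allows several match values. When several values are given, a document matches if it matches any of them; the values are combined with OR. The following finds records whose title contains dog or jumps: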

{
"terms": {
"title": [ "dog", "jumps" ]
}
}


Equivalent to:

"bool" : {
"should" : [ {
"term" : {
"title" : "dog"
}
}, {
"term" : {
"title" : "jumps"
}
} ]
}


Java code:

QueryBuilder builder = QueryBuilders.termsQuery("title", "dog", "jumps");
// equivalent to the termsQuery above
builder = QueryBuilders.boolQuery().should(QueryBuilders.termQuery("title", "dog")).should(QueryBuilders.termQuery("title", "jumps"));


c) range

Finds a batch of data within a specified range. It works on numbers, strings, dates, and so on.

Numeric:

{
"range": {
"age": {
"gte":  20,
"lt":   30
}
}
}


Date:

"range" : {
"timestamp" : {
"gt" : "2014-01-01 00:00:00",
"lt" : "2014-01-07 00:00:00"
}
}


When used on date fields, the range filter supports date math. For example, to find all documents from the last hour:

"range" : {
"timestamp" : {
"gt" : "now-1h"
}
}


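Date math can also be applied to actual dates, not just to a placeholder such as now. Append a double pipe || to the date and follow it with a date math expression.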

"range" : {
"timestamp" : {
"gt" : "2014-01-01 00:00:00",
"lt" : "2014-01-01 00:00:00||+1M" <1>
}
}


<1> Earlier than January 1, 2014 plus one month
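
The two date math expressions above can be reproduced with java.time to see which point in time they denote. This is only a client-side sketch of the arithmetic; the class name DateMathSketch and its methods are made up for illustration, and Elasticsearch evaluates such expressions itself on the server.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Illustrative only: mimics two date math expressions with java.time. */
class DateMathSketch {

    static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    /** Counterpart of "<date>||+1M": the anchor date plus one calendar month. */
    static String plusOneMonth(String date) {
        return LocalDateTime.parse(date, FMT).plusMonths(1).format(FMT);
    }

    /** Counterpart of "now-1h", relative to the supplied "now". */
    static LocalDateTime oneHourBefore(LocalDateTime now) {
        return now.minusHours(1);
    }

    public static void main(String[] args) {
        System.out.println(plusOneMonth("2014-01-01 00:00:00")); // 2014-02-01 00:00:00
        System.out.println(oneHourBefore(LocalDateTime.now()));
    }
}
```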

Range operators:

gt :: greater than

gte:: greater than or equal to

lt :: less than

lte:: less than or equal to

Java code:

QueryBuilders.rangeQuery("age").gte(18).lt(20);


When filtering strings, ranges are computed in dictionary (lexicographic) order. For example, these values are listed in dictionary order:

5, 50, 6, B, C, a, ab, abb, abc, b

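Tip: When filtering or searching with range, numeric and date fields are indexed in a way that makes range calculations very efficient. This is not the case for strings. To perform a range on a string field, Elasticsearch effectively runs a term operation for every term that falls within the range, which is much slower than a date or numeric range.

String ranges are fine on a field with low cardinality, that is, a small number of unique terms. The more unique terms there are, the slower the search.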
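
The dictionary ordering listed above is easy to check in plain Java: for ASCII values like these, the natural order of String gives the same sequence. The sketch below assumes nothing about Elasticsearch internals; LexicographicOrder and its methods are illustrative names only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Illustrative only: dictionary order of strings, as used by string ranges. */
class LexicographicOrder {

    /** Sorts a copy of the values by String's natural (character-by-character) order. */
    static List<String> sorted(List<String> values) {
        List<String> copy = new ArrayList<>(values);
        Collections.sort(copy);
        return copy;
    }

    /** True when lower <= value < upper in dictionary order, like a gte/lt string range. */
    static boolean inRange(String value, String lower, String upper) {
        return value.compareTo(lower) >= 0 && value.compareTo(upper) < 0;
    }

    public static void main(String[] args) {
        List<String> values =
                Arrays.asList("b", "abc", "6", "C", "50", "a", "B", "abb", "5", "ab");
        System.out.println(sorted(values));          // [5, 50, 6, B, C, a, ab, abb, abc, b]
        System.out.println(inRange("50", "5", "6")); // true: "50" sorts between "5" and "6"
    }
}
```

Note that "50" falls inside the range from "5" to "6", which is why string ranges on numeric-looking text often surprise people.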

d) exists, missing

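The exists and missing filters find documents that do or do not contain a given field, similar to the IS NOT NULL and IS NULL conditions in SQL.

Elasticsearch currently discourages the missing filter; use bool must_not + exists instead.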

{
"exists":   {
"field":    "title"
}
}

{
"missing":   {
"field":    "title"
}
}

"bool" : {
"must_not" : {
"exists" : {
"field" : "title"
}
}
}


Java code:

// exists
QueryBuilder builder = QueryBuilders.existsQuery("title");
// missing
builder = QueryBuilders.missingQuery("title");
// instead of missing
builder = QueryBuilders.boolQuery().mustNot(QueryBuilders.existsQuery("title"));


e) match, match_all, multi_match

match_all matches everything; no query condition is specified.

{
"match_all": {}
}


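It is often used in combination with filters or to merge query results.

match is the standard query; it can be used for both full-text and exact-value searches.

If you run a match query on a full-text field, the query string is run through the analyzer before the search is executed. In other words, with match on a string field both the search keywords and the indexed content are analyzed (depending on the configured analyzer), except for fields marked not_analyzed.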

{
"match": {
"tweet": "About Search"
}
}


If you use match on a field holding an exact value, such as a number, a date, a boolean, or a not_analyzed string, it searches for exactly the value you give:

{ "match": { "age":    26           }}
{ "match": { "date":   "2014-09-01" }}
{ "match": { "public": true         }}
{ "match": { "tag":    "full_text"  }}


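match parameters: type, operator, minimum_should_match

type values

boolean: analyze the text, then query

phrase: matches the words as an exact phrase. For example, title: "brown dog" finds titles that contain brown and dog directly next to each other, in that order

phrase_prefix: like phrase, but the last search term is matched as a prefix

Official documentation: The match_phrase_prefix is the same as match_phrase, except that it allows for prefix matches on the last term in the text

operator values

and: "brown dog" must contain brown and dog

or: "brown dog" must contain brown or dog

minimum_should_match: an integer or a percentage that controls precision. A value of 4 means 4 keywords must match; 50% means half of the keywords must match. It counts optional clauses, so it only has an effect together with the or operator; with and, every keyword is already required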

"match" : {
"title" : {
"query" : "BROWN DOG",
"type" : "boolean",
"operator" : "OR",
"minimum_should_match" : "50%"
}
}
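
As a rough illustration of how a minimum_should_match value translates into a number of keywords, the sketch below handles only the two simple forms mentioned above, a positive integer and a positive percentage, and rounds percentages down as the Elasticsearch documentation describes. The class MinimumShouldMatch and the method required are hypothetical names; negative values and combined specifications are deliberately left out.

```java
/** Illustrative only: simple positive forms of minimum_should_match. */
class MinimumShouldMatch {

    /**
     * Number of optional clauses that must match for a spec that is either a
     * positive integer ("4") or a positive percentage ("50%").
     * Percentages are rounded down. Other forms are not handled here.
     */
    static int required(String spec, int optionalClauses) {
        String s = spec.trim();
        int required;
        if (s.endsWith("%")) {
            int percent = Integer.parseInt(s.substring(0, s.length() - 1).trim());
            required = optionalClauses * percent / 100; // integer division rounds down
        } else {
            required = Integer.parseInt(s);
        }
        // never ask for more clauses than exist
        return Math.min(required, optionalClauses);
    }

    public static void main(String[] args) {
        System.out.println(required("50%", 2)); // 1
        System.out.println(required("50%", 3)); // 1
        System.out.println(required("4", 6));   // 4
    }
}
```

For the "BROWN DOG" request above there are two keywords, so 50% requires one of them to match, which is the same as plain or.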


The multi_match query runs a match query across several fields at once:

{
"multi_match": {
"query":    "full text search",
"fields":   [ "title", "body" ]
}
}


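tip:

1. The difference between match and term when searching strings

term looks up the value exactly as given, whereas match first runs it through the analyzer, whose tokenizer splits the search keywords into individual terms (tokens).

e.g. Searching the sample data for titles containing Jumps: term finds nothing, but match converts the keyword to jumps and finds the records. The outcome depends on the analyzer in use; the author used the built-in standard analyzer.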

http://localhost:9200/my_index/_analyze?pretty=true&field=title&text=Jumps

{
"tokens" : [ {
"token" : "jumps",
"start_offset" : 0,
"end_offset" : 5,
"type" : "<ALPHANUM>",
"position" : 0
} ]
}
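
The difference can be imitated without a running cluster. The sketch below uses a deliberately crude stand-in for the standard analyzer (split on non-alphanumeric characters, then lowercase); the real analyzer does more, and TermVsMatch with its methods is an illustrative name, not an Elasticsearch class. It only shows why the raw value Jumps misses while the analyzed value jumps hits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Illustrative only: why term misses "Jumps" while match finds it. */
class TermVsMatch {

    /** Crude stand-in for the standard analyzer: split on non-alphanumerics, lowercase. */
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    /** term: the search value is compared with the indexed tokens as-is. */
    static boolean termMatches(List<String> indexedTokens, String value) {
        return indexedTokens.contains(value);
    }

    /** match (operator or): the search text is analyzed first; any token may hit. */
    static boolean matchMatches(List<String> indexedTokens, String text) {
        for (String token : analyze(text)) {
            if (indexedTokens.contains(token)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> indexed = analyze("The quick brown fox jumps over the lazy dog");
        System.out.println(termMatches(indexed, "Jumps"));  // false
        System.out.println(matchMatches(indexed, "Jumps")); // true
    }
}
```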


Java code:

QueryBuilder builder = QueryBuilders.matchAllQuery();
builder = QueryBuilders.matchQuery("title", "Jumps");
builder = QueryBuilders.matchQuery("title", "BROWN DOG!").operator(MatchQueryBuilder.Operator.OR).type(MatchQueryBuilder.Type.BOOLEAN);
// multiMatchQuery takes the query text first, then the field names
builder = QueryBuilders.multiMatchQuery("full text search", "title", "body");


f) Highlighting

Not covered in this article.

g) Sorting

Similar to ORDER BY in a database

"sort": { "date": { "order": "desc" }}


Java code:

SearchRequestBuilder requestBuilder =
client.prepareSearch("my_index").setTypes("my_type")
.setFrom(0).setSize(10)
.addSort("age", SortOrder.DESC);


h) scroll

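scroll is similar to a database cursor and is used to hold a large result set for retrieval in batches.

A search request returns only a single page of results (10 records by default), whereas the scroll API can retrieve a large number of results (or even all of them) from a single search request, much like using a cursor on a traditional database.

Scrolling is not intended for real-time user requests but for processing large amounts of data.

Official documentation (https://www.elastic.co/guide/en/elasticsearch/reference/2.3/search-request-scroll.html):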

While a search request returns a single “page” of results, the scroll API can be used to retrieve large numbers of results (or even all results) from a single search request, in much the same way as you would use a cursor on a traditional database.

Scrolling is not intended for real time user requests, but rather for processing large amounts of data, e.g. in order to reindex the contents of one index into a new index with a different configuration.

Each scroll request returns a scroll_id, which must be passed to the scroll API to fetch the next batch of data.

Client client = ConnectionUtil.getLocalClient();
SearchRequestBuilder requestBuilder = client.prepareSearch("my_index").setTypes("my_type")
.setScroll(new TimeValue(20000))    // scroll keep-alive, in milliseconds
.setSize(2);
System.out.println(requestBuilder);
SearchResponse scrollResp = requestBuilder.get();
System.out.println("totalHits:" + scrollResp.getHits().getTotalHits());
while (true) {

String scrollId = scrollResp.getScrollId();
System.out.println("scrollId:" + scrollId);
SearchHits searchHits = scrollResp.getHits();

for (SearchHit hit : searchHits.getHits()) {
System.out.println(hit.getId() + "~" + hit.getSourceAsString());
}
System.out.println("=================");

// fetch the next batch using the scrollId
scrollResp = client.prepareSearchScroll(scrollId)
.setScroll(new TimeValue(20000)).execute().actionGet();
if (scrollResp.getHits().getHits().length == 0) {
break;
}
}


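III. Combined search

bool: compound query made of must, must_not, should

Weighting of search keywords

a) bool

bool has come up several times in the keyword descriptions above; this section explains it.

bool combines multiple conditions, and a bool can be nested inside another bool to build complex query conditions. It has the following operators:

must :: all of the conditions must match; equivalent to AND.

must_not :: none of the conditions may match; equivalent to NOT.

should :: at least one of the conditions should match; equivalent to OR.

Each of these parameters accepts either a single condition or an array of conditions: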

{
"bool": {
"must":     { "term": { "folder": "inbox" }},
"must_not": { "match": { "tag":    "spam"  }},
"should": [
{ "term": { "starred": true   }},
{ "range": { "date": { "gte": "2014-01-01" }}}
]
}
}


tip: a bool must contain at least one of must, must_not, and should

Java code:

// (price = 20 OR productID = "1234") AND (price != 30)
QueryBuilder queryBuilder = QueryBuilders.boolQuery()
.should(QueryBuilders.termQuery("price", "20"))
.should(QueryBuilders.termQuery("productId", "1234"))
.mustNot(QueryBuilders.termQuery("price", "30"));
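
The boolean logic in the comment above can be spelled out with plain Java predicates. This sketch ignores scoring entirely and only models which documents match; BoolSketch, term, query and doc are illustrative names, and documents are represented as simple string maps. It relies on the fact that a bool without a must clause needs at least one should clause to match.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

/** Illustrative only: matching logic of should + must_not, without scoring. */
class BoolSketch {

    /** Exact-value check on one field, loosely mirroring a term query. */
    static Predicate<Map<String, String>> term(String field, String value) {
        return doc -> value.equals(doc.get(field));
    }

    /** (price = 20 OR productId = "1234") AND (price != 30) */
    static Predicate<Map<String, String>> query() {
        Predicate<Map<String, String>> should =
                term("price", "20").or(term("productId", "1234"));
        Predicate<Map<String, String>> mustNot = term("price", "30");
        return should.and(mustNot.negate());
    }

    /** Builds a tiny document with the two fields used by the query. */
    static Map<String, String> doc(String price, String productId) {
        Map<String, String> d = new HashMap<>();
        d.put("price", price);
        d.put("productId", productId);
        return d;
    }

    public static void main(String[] args) {
        System.out.println(query().test(doc("20", "9999"))); // true
        System.out.println(query().test(doc("30", "1234"))); // false: excluded by must_not
        System.out.println(query().test(doc("10", "0000"))); // false: no should clause matches
    }
}
```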


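b) Keyword weighting: raising the query score

Suppose we want to search for documents containing "full-text search" but give more weight to documents that also mention "Elasticsearch" or "Lucene". That is, documents containing "Elasticsearch" or "Lucene" get a higher relevance score than those that do not, and so appear nearer the top of the results.

A simple bool query lets us write fairly complex logic like the following: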

"bool": {
"must": {
"match": {
"content": { (1)
"query":    "full text search",
"operator": "and"
}
}
},
"should": [ (2)
{ "match": { "content": "Elasticsearch" }},
{ "match": { "content": "Lucene"        }}
]
}


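(1) The content field must contain all three words full, text, and search.

(2) If the content field also contains "Elasticsearch" or "Lucene", the document receives a higher score.

In the example above, to make documents containing "Elasticsearch" score higher than those containing "Lucene", specify a boost value to control the weight. The default is 1; a boost greater than 1 raises the relative weight of that query clause.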

"bool": {
"must": {
"match": {  (1)
"content": {
"query":    "full text search",
"operator": "and"
}
}
},
"should": [
{ "match": {
"content": {
"query": "Elasticsearch",
"boost": 3 (2)
}
}},
{ "match": {
"content": {
"query": "Lucene",
"boost": 2 (3)
}
}}
]
}


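(1) These query clauses use the default boost of 1.

(2) This clause is the most important because it has the highest boost.

(3) This clause is more important than the first query clause, but less important than the "Elasticsearch" clause.

Java code: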

QueryBuilders.matchQuery("title", "Dog").boost(3);


Some of the content is excerpted from chapters 12 and 13 of http://es.xiaoleilu.com/

Appendix: complete Java code of the test class

package cn.com.axin.elasticsearch.qwzn.share;
import java.net.UnknownHostException;
import org.elasticsearch.action.search.SearchRequestBuilder;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.index.query.MatchQueryBuilder;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.SearchHits;
import org.elasticsearch.search.sort.SortOrder;
import cn.com.axin.elasticsearch.util.ConnectionUtil;
import cn.com.axin.elasticsearch.util.Log;

/**
* @Title
*
* @author
* @date 2016-8-11
*/
public class Search {

public static void main(String[] args) throws Exception {
// searchAll();
// execQuery(termSearch());
// execQuery(termsSearch());
// execQuery(rangeSearch());
// execQuery(existsSearch());
// execQuery(matchSearch());
execQuery(boolSearch());
// highlightedSearch();
// scroll();
//
}

/**
* @return
*/
private static QueryBuilder boolSearch() {
// age > 30 or last_name is Smith
QueryBuilder queryBuilder = QueryBuilders.boolQuery()
.should(QueryBuilders.rangeQuery("age").gt("30"))
.should(QueryBuilders.matchQuery("last_name", "Smith"));

// raise the query weight (boost)
// QueryBuilders.matchQuery("title", "Dog").boost(3);
// QueryBuilders.boolQuery().must(null);
// QueryBuilders.boolQuery().mustNot(null);

return queryBuilder;
}
private static void scroll() {

Client client = null;

try {
client = ConnectionUtil.getLocalClient(); // obtain the Client connection
SearchRequestBuilder requestBuilder = client.prepareSearch("my_index").setTypes("my_type")
// .setQuery(QueryBuilders.termQuery("age", "20"))
.setScroll(new TimeValue(20000)) // scroll keep-alive, in milliseconds
.setSize(2);
System.out.println(requestBuilder);

SearchResponse scrollResp = requestBuilder.get();
System.out.println("totalHits:" + scrollResp.getHits().getTotalHits());

while (true) {

String scrollId = scrollResp.getScrollId();
System.out.println("scrollId:" + scrollId);
SearchHits searchHits = scrollResp.getHits();

for (SearchHit hit : searchHits.getHits()) {
System.out.println(hit.getId() + "~" + hit.getSourceAsString());
}
System.out.println("=================");

// fetch the next batch using the scrollId
scrollResp = client.prepareSearchScroll(scrollId)
.setScroll(new TimeValue(20000)).execute().actionGet();
if (scrollResp.getHits().getHits().length == 0) {
break;
}
}

} catch (Exception e) {
e.printStackTrace();
} finally {
if (null != client) {
client.close();
}
}
}

/**
* @return
*/
private static void highlightedSearch() {
QueryBuilder builder = QueryBuilders.termsQuery("age", "18");

Client client = null;
try {
client = ConnectionUtil.getLocalClient();
SearchRequestBuilder requestBuilder =
client.prepareSearch("my_index").setTypes("my_type")
.setQuery(builder) // attach the query so there are matches to highlight
.setFrom(0).setSize(10)
.addHighlightedField("age");
// .addSort("age", SortOrder.DESC);
Log.debug(requestBuilder);

SearchResponse response = requestBuilder.get();
Log.debug(response);

} catch (UnknownHostException e) {
e.printStackTrace();
} finally {
if (null != client) {
client.close();
}
}
}
/**
* @return
*/
private static QueryBuilder matchSearch() {

QueryBuilder builder = QueryBuilders.matchAllQuery();

builder = QueryBuilders.matchQuery("title", "Jumps");

/*
type: boolean, analyze the text and then query
phrase: matches the words as an exact phrase
phrase_prefix: The match_phrase_prefix is the same as match_phrase,
except that it allows for prefix matches on the last term in the text
*/
builder = QueryBuilders.matchQuery("title", "BROWN DOG!").operator(MatchQueryBuilder.Operator.OR).type(MatchQueryBuilder.Type.BOOLEAN);
// multiMatchQuery takes the query text first, then the field names
builder = QueryBuilders.multiMatchQuery("full text search", "title", "body");

return builder;
}
/**
* @return
*/
private static QueryBuilder existsSearch() {

// exists
QueryBuilder builder = QueryBuilders.existsQuery("title");

// missing
builder = QueryBuilders.missingQuery("title");
// instead of missing
builder = QueryBuilders.boolQuery().mustNot(QueryBuilders.existsQuery("title"));

return builder;
}
/**
*
*/
private static QueryBuilder rangeSearch() {

// age >= 18 && age < 20
return QueryBuilders.rangeQuery("age").gte(18).lt(20);
}

private static QueryBuilder termSearch(){
QueryBuilder builder = QueryBuilders.termQuery("title", "brown");
return builder;
}
private static QueryBuilder termsSearch(){
QueryBuilder builder = QueryBuilders.termsQuery("title", "dog", "jumps");
// equivalent to the termsQuery above
builder = QueryBuilders.boolQuery().should(QueryBuilders.termQuery("title", "dog")).should(QueryBuilders.termQuery("title", "jumps"));
return builder;
}

private static void searchAll() {

Client client = null;
try {
client = ConnectionUtil.getLocalClient();
SearchRequestBuilder requestBuilder =
client.prepareSearch("my_index").setTypes("my_type")
.setFrom(0).setSize(10)
.addSort("age", SortOrder.DESC);
Log.debug(requestBuilder);

SearchResponse response = requestBuilder.get();
Log.debug(response);

} catch (UnknownHostException e) {
e.printStackTrace();
} finally {
if (null != client) {
client.close();
}
}
}

/**
* @param builder
* @throws UnknownHostException
*/
private static void execQuery(QueryBuilder builder)
throws UnknownHostException {
Client client = ConnectionUtil.getLocalClient();
try {
SearchRequestBuilder requestBuilder =
client.prepareSearch("my_index").setTypes("my_type")
.setExplain(true)
.setQuery(builder);
Log.debug(requestBuilder);

SearchResponse response = requestBuilder.get();
Log.debug(response);
} finally {
client.close(); // release the connection, as the other methods do
}
}

}


Code for obtaining the connection object

/**
* Get a connection to the local node (127.0.0.1:9300)
* @return
* @throws UnknownHostException
*/
public static Client getLocalClient() throws UnknownHostException {
return getClient("127.0.0.1", 9300, "es-stu");
}

/**
* Get a connection object
* @param host host IP
* @param port port
* @param clusterName cluster name
* @return
* @throws UnknownHostException
*/
private static Client getClient(String host, int port, String clusterName) throws UnknownHostException {
// settings
Builder builder = Settings.settingsBuilder();
// enable sniffing
builder.put("client.transport.sniff", true);
// cluster name
builder.put("cluster.name", clusterName);

Settings settings = builder.build();

TransportClient transportClient = TransportClient.builder().settings(settings).build();
Client client = transportClient.addTransportAddress(
new InetSocketTransportAddress(InetAddress.getByName(host), port));
// connect to multiple addresses
// transportClient.addTransportAddresses(transportAddress);

return client;
}