
Steps for implementing full-text search with Java Spring Boot and Elasticsearch (and the pitfalls to steer around)

2018-09-06 13:08

 

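Start a Spring Boot project.

First, I chose JestClient here to work with Elasticsearch.

Another option is ElasticsearchRepository, a JPA-like repository interface, but code written against it has to change as Elasticsearch versions change, so JestClient is my first choice.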

 

OK! Step 1: import the dependencies

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
    <version>1.5.4.RELEASE</version>
</dependency>

<dependency>
    <groupId>io.searchbox</groupId>
    <artifactId>jest</artifactId>
</dependency>

<dependency>
    <groupId>net.java.dev.jna</groupId>
    <artifactId>jna</artifactId>
</dependency>

 

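Note ①: the Elasticsearch version must match the Spring Boot version.

Here Spring Boot is 1.5.4, and the Elasticsearch starter dependency is also 1.5.4.

For the Spring Boot to Elasticsearch version mapping, see:

https://www.geek-share.com/detail/2736653120.html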

 

Step 2: configure the Elasticsearch service address in application.properties. We can only call the service once we are connected to this address.

[code]#elasticsearch
spring.elasticsearch.jest.uris=@elasticsearch.service@
spring.elasticsearch.jest.read-timeout=60000
spring.elasticsearch.jest.connection-timeout=60000

Note: @elasticsearch.service@ is a placeholder; its value is read from a property defined in the pom.xml file when the project is built.
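One pitfall with this kind of placeholder is that, if the build does not substitute it, the application starts with the literal text `@elasticsearch.service@` as its server address. The following is a minimal sketch of a sanity check, not code from the original project: the class name `JestUriCheck`, the method `resolvedUri`, and the sample address `http://localhost:9200` are all assumptions made for illustration. It uses only `java.util.Properties` from the JDK.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class JestUriCheck {

    // Hypothetical helper: returns the configured URI, or throws if the
    // property is missing or still holds an unsubstituted @...@ placeholder.
    static String resolvedUri(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String uri = props.getProperty("spring.elasticsearch.jest.uris");
        if (uri == null || uri.trim().isEmpty()) {
            throw new IllegalStateException("spring.elasticsearch.jest.uris is not set");
        }
        uri = uri.trim();
        if (uri.startsWith("@") && uri.endsWith("@")) {
            throw new IllegalStateException("placeholder was not substituted: " + uri);
        }
        return uri;
    }

    public static void main(String[] args) {
        // Sample value only; the real address comes from your own pom.xml.
        String filtered = "spring.elasticsearch.jest.uris=http://localhost:9200\n";
        System.out.println(resolvedUri(filtered));
    }
}
```

In a real project the same check could be run against the `application.properties` found in the build output rather than against an inline string.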

Next we need to connect to the service, which means obtaining a JestClient object to run queries with.

Step 3: obtain the JestClient object

[code]package com.webi.welive.util;

import io.searchbox.client.JestClient;
import io.searchbox.client.JestClientFactory;
import io.searchbox.client.config.HttpClientConfig;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Service;

/**
 * Title: Obtain the JestClient object<br>
 * Description: JestClientUtil<br>
 * Company: 韦博英语在线教育部<br>
 * CreateDate: 2018-06-14 14:29
 *
 * @author james.fxy
 */
@Service
public class JestClientUtil {

    // Static fields, filled in through the @Value setters below when Spring creates this bean.
    // getJestClient() must therefore only be called after the Spring context has started.
    private static String spring_elasticsearch_jest_uris;
    private static Integer spring_elasticsearch_jest_read_timeout;
    private static Integer Spring_elasticsearch_jest_connection_timeout;

    @Value("${spring.elasticsearch.jest.uris}")
    public void setSpring_elasticsearch_jest_uris(String spring_elasticsearch_jest_uris) {
        JestClientUtil.spring_elasticsearch_jest_uris = spring_elasticsearch_jest_uris;
    }

    @Value("${spring.elasticsearch.jest.read-timeout}")
    public void setSpring_elasticsearch_jest_read_timeout(Integer spring_elasticsearch_jest_read_timeout) {
        JestClientUtil.spring_elasticsearch_jest_read_timeout = spring_elasticsearch_jest_read_timeout;
    }

    @Value("${spring.elasticsearch.jest.connection-timeout}")
    public void setGetSpring_elasticsearch_jest_connection_timeout(Integer Spring_elasticsearch_jest_connection_timeout) {
        JestClientUtil.Spring_elasticsearch_jest_connection_timeout = Spring_elasticsearch_jest_connection_timeout;
    }

    /**
     * Title: Obtain a JestClient<br>
     * CreateDate: 2018/6/14 16:31<br>
     *
     * @return a new JestClient built from the configured address and timeouts
     * @author james.fxy
     */
    public static JestClient getJestClient() {
        JestClientFactory factory = new JestClientFactory();
        factory.setHttpClientConfig(new HttpClientConfig.Builder(spring_elasticsearch_jest_uris)
                .connTimeout(Spring_elasticsearch_jest_connection_timeout)
                .readTimeout(spring_elasticsearch_jest_read_timeout)
                .multiThreaded(true)
                .build());
        return factory.getObject();
    }
}

Step 4: use JestClient to work with Elasticsearch

① Choose the entity class we want to work with

[code]package com.webi.welive.lessonhomework.param;

import lombok.Data;
import org.springframework.data.elasticsearch.annotations.Document;

import java.util.List;

/**
 * Title: LessonHomeworkParam<br>
 * Description: LessonHomeworkParam<br>
 * Company: 韦博英语在线教育部<br>
 * CreateDate: 2018-06-09 11:39:42
 *
 * @author james.fxy
 */
@Data
@Document(indexName = "homework", type = "homeworktable")
public class LessonHomeworkParam {

    private Integer id;
    private String question;
    private String explain;
    private Boolean isEnabled;
    private Boolean isDeleted;
    private Integer createUserId;
    private Integer sequence;
    private Integer questionTypes;
    private HomeworkQuestionMediaParam homeworkQuestionsMediaParam;
    private List<HomeworkQuestionAnswerParam> homeworkQuestionAnswerParamList;

    public LessonHomeworkParam() {
        super();
    }

}

Note: in @Document(indexName = "homework", type = "homeworktable"), indexName and type correspond to the values you set when importing the data into Elasticsearch.

This is how I configured the import:

[code]input {
  jdbc {
    jdbc_connection_string => "jdbc:sqlserver://10.0.0.130:1433;databaseName=Webi_WeLiveDBTest;"
    jdbc_user => "speakhi_user"
    jdbc_password => "speakhi_user123"
    jdbc_driver_library => "/usr/share/logstash/mssql-jdbc-6.2.1.jre8.jar"
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    jdbc_paging_enabled => "true"
    statement => "SELECT * from Lesson_Homework where CreateTime < GETDATE()"
    schedule => "* * * * *"
    type => "lesson_homework"
  }
}

output {
  stdout {
    codec => rubydebug
  }
  if [type] == "lesson_homework" {
    elasticsearch {
      hosts => "10.0.0.13:9200"
      index => "homework"
      document_type => "homeworktable"
      document_id => "%{id}"
    }
  }
}

 

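As you can see, the index and document_type configured for the import into Elasticsearch correspond to indexName and type in the entity class's @Document annotation.

② Run the query with JestClient

The query is assembled by string concatenation and then executed, following the official documentation:

https://www.elastic.co/blog/found-java-clients-for-elasticsearch

For details on how to assemble the query, see:

https://segmentfault.com/a/1190000004429689

There are two ways to parse the returned data:

https://www.geek-share.com/detail/2715614002.html

I used manual parsing here, because I found that automatic parsing did not work well.

Note: the data was serialized to JSON (via Jackson) when it was imported into Elasticsearch,

so when parsing it we have to read each field using the type it has when it comes back from Elasticsearch.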

[code]/**
 * Description: full-text search for course homework (matches on question, with a filter condition)<br>
 * CreateDate: 2018/6/14 14:47<br>
 *
 * @param query the text to search for
 * @return com.mingyisoft.javabase.bean.CommonJsonObject<java.lang.Object>
 * @author james.fxy
 */
// indexName and typeName are not defined in this excerpt; presumably they are fields of the enclosing class.
public CommonJsonObject<Object> findAllHomeworkByElasticsearch(String query) throws Exception {

    CommonJsonObject<Object> json = new CommonJsonObject<>();
    JestClient jestClient = null;
    try {
        jestClient = JestClientUtil.getJestClient();
        // If the query text starts or ends with a double quote, strip that quote
        if (query.startsWith("\"")) {
            query = query.substring(1);
        }
        if (query.endsWith("\"")) {
            query = query.substring(0, query.length() - 1);
        }
        // The text is concatenated as-is: a quote or backslash left inside it would break the JSON
        String queryStr = " {\"query\": { \"match\": { \"question\":\"" + query + "\" } }\n" +
                "  ,\n" +
                "  \"post_filter\": {    \n" +
                "        \"term\" : {\n" +
                "            \"isdeleted\" : \"false\"\n" +
                "        }\n" +
                "    }\n" +
                "}";
        json = search(jestClient, indexName, typeName, queryStr);
    } catch (Exception e) {
        json.setCode(ErrorCodeEnum.ELASTIC_SEARCH_HAS_ERROR.getCode());
        json.setMsg(ErrorCodeEnum.ELASTIC_SEARCH_HAS_ERROR.getDescription());
        e.printStackTrace();
    } finally {
        // Release the client even when the search fails
        if (jestClient != null) {
            jestClient.shutdownClient();
        }
    }
    return json;
}
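Because the query body is built by string concatenation, any double quote or backslash remaining inside the user's text ends up inside the JSON and makes it invalid. The following is a minimal sketch of one way to guard against that with JDK classes only. The class `MatchQuerySketch` and its methods are hypothetical names, not part of the original project or of Jest, and the quote-stripping simply mirrors what the method above does.

```java
class MatchQuerySketch {

    // Strip one leading and one trailing double quote, if present.
    static String stripOuterQuotes(String s) {
        if (s.startsWith("\"")) {
            s = s.substring(1);
        }
        if (s.endsWith("\"")) {
            s = s.substring(0, s.length() - 1);
        }
        return s;
    }

    // Escape the characters that may not appear unescaped inside a JSON string.
    static String escapeJson(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // Same query shape as above: match on "question", post_filter on "isdeleted".
    static String buildQuery(String userInput) {
        String text = escapeJson(stripOuterQuotes(userInput));
        return "{\"query\":{\"match\":{\"question\":\"" + text + "\"}},"
                + "\"post_filter\":{\"term\":{\"isdeleted\":\"false\"}}}";
    }

    public static void main(String[] args) {
        // The inner quotes survive as escaped characters instead of breaking the JSON.
        System.out.println(buildQuery("\"say \"hi\"\""));
    }
}
```

If a JSON library is already on the classpath, building the query as an object and serializing it is another way to get the same escaping without writing it by hand.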

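We pass the jestClient and the query string along and use the jestClient to execute the query.

The process of parsing the result is shown below.

The result is in fact exactly what the queryStr string returns when run in Kibana; we parse that JSON and return it to the front end.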

[code]/**
 * Title: full-text search for homework<br>
 * Description: full-text search method<br>
 * CreateDate: 2018/6/14 14:44<br>
 *
 * @param jestClient the client used to execute the search
 * @param indexName  index name
 * @param typeName   index type
 * @param query      query body (JSON)
 * @return com.mingyisoft.javabase.bean.CommonJsonObject<java.lang.Object>
 * @throws Exception if the search request fails
 * @author james.fxy
 */
// Needs io.searchbox.core.Search, io.searchbox.client.JestResult,
// com.google.gson.JsonObject, com.google.gson.JsonArray, java.util.List and java.util.ArrayList.
public static CommonJsonObject<Object> search(JestClient jestClient, String indexName, String typeName, String query) throws Exception {
    CommonJsonObject<Object> json = new CommonJsonObject<>();

    Search search = new Search.Builder(query)
            .addIndex(indexName)
            .addType(typeName)
            .build();
    JestResult jr = jestClient.execute(search);
    // System.out.println("full-text search--" + jr.getJsonString());

    // Automatic parsing (left disabled):
    // List<SearchResult.Hit<LessonHomeworkParam, Void>> jrList =
    //         ((SearchResult) jr).getHits(LessonHomeworkParam.class);
    // for (SearchResult.Hit<LessonHomeworkParam, Void> hit : jrList) {
    //     lessonHomeworkParams.add(hit.source);
    // }

    // Manual parsing
    JsonObject jsonObject = jr.getJsonObject();
    JsonObject hitsobject = jsonObject.getAsJsonObject("hits");
    long took = jsonObject.get("took").getAsLong();
    long total = hitsobject.get("total").getAsLong();
    JsonArray jsonArray = hitsobject.getAsJsonArray("hits");

    System.out.println("took:" + took + "  " + "total:" + total);

    List<LessonHomeworkParam> lessonHomeworkParams = new ArrayList<LessonHomeworkParam>();

    for (int i = 0; i < jsonArray.size(); i++) {
        JsonObject jsonHitsObject = jsonArray.get(i).getAsJsonObject();

        // Get the returned fields
        JsonObject sourceObject = jsonHitsObject.get("_source").getAsJsonObject();

        // Populate a LessonHomeworkParam object (field names in _source are lowercase)
        LessonHomeworkParam lessonHomeworkParam = new LessonHomeworkParam();
        lessonHomeworkParam.setId(sourceObject.get("id").getAsInt());
        lessonHomeworkParam.setExplain(sourceObject.get("explain").getAsString());
        lessonHomeworkParam.setQuestion(sourceObject.get("question").getAsString());
        // lessonHomeworkParam.setCreateUserId(sourceObject.get("createuserid").getAsInt());
        lessonHomeworkParam.setQuestionTypes(sourceObject.get("questiontypes").getAsInt());
        lessonHomeworkParam.setIsDeleted(sourceObject.get("isdeleted").getAsBoolean());
        lessonHomeworkParam.setIsEnabled(sourceObject.get("isenabled").getAsBoolean());
        lessonHomeworkParams.add(lessonHomeworkParam);
    }
    json.setData(lessonHomeworkParams);
    return json;
}

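Let me show you what queryStr returns when it is run in Kibana.

The result returned in Kibana is exactly the data we parse in the Java code.

One thing to note here: the data stored in Elasticsearch has itself been serialized with Jackson.

Explanation of the query result in Kibana:

Data returned by a query in Kibana, for example: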

{
  "took": 40,
  "timed_out": false,
  "_shards": {
    "total": 27,
    "successful": 27,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1529,
    "max_score": 5.710427,
    "hits": [
      {
        "_index": "catalog",
        "_type": "catalogtable",
        "_id": "2406",
        "_score": 5.710427,
        "_source": {
          "@timestamp": "2018-06-25T01:23:00.031Z",
          "sequence": 8,
          "id": 2406,
          "isdeleted": false,
          "parentid": 2197,
          "createuserid": 10,
          "name": "2",
          "@version": "1",
          "type": "lesson_catalog",
          "createtime": "2018-06-23T08:20:35.183Z"
        }
      }
    ]
  }
}

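The took field is the time the operation took (in milliseconds); timed_out indicates whether the query timed out; the hits field holds the matched records. Within hits, total is the number of matching records, max_score is the highest relevance score,

and hits is the array of returned records.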
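To make the nesting concrete, here is a minimal sketch that walks a response of this shape and collects the `_source` object of each hit. It assumes the JSON has already been decoded into nested `Map` and `List` values, which is a stand-in chosen so the example runs on the JDK alone; Jest itself returns Gson objects, as in the search method above. The class name `SearchEnvelope` and the method `sources` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SearchEnvelope {

    // Collects the "_source" object of every entry in hits.hits.
    // Returns an empty list when either level of "hits" is absent.
    @SuppressWarnings("unchecked")
    static List<Map<String, Object>> sources(Map<String, Object> response) {
        List<Map<String, Object>> result = new ArrayList<>();
        Map<String, Object> hits = (Map<String, Object>) response.get("hits");
        if (hits == null) {
            return result;
        }
        List<Map<String, Object>> hitList = (List<Map<String, Object>>) hits.get("hits");
        if (hitList == null) {
            return result;
        }
        for (Map<String, Object> hit : hitList) {
            Object source = hit.get("_source");
            if (source instanceof Map) {
                result.add((Map<String, Object>) source);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // A cut-down version of the sample response, built by hand.
        Map<String, Object> source = new HashMap<>();
        source.put("id", 2406);
        source.put("isdeleted", false);

        Map<String, Object> hit = new HashMap<>();
        hit.put("_id", "2406");
        hit.put("_source", source);

        Map<String, Object> hits = new HashMap<>();
        hits.put("total", 1529);
        hits.put("hits", Arrays.asList(hit));

        Map<String, Object> response = new HashMap<>();
        response.put("took", 40);
        response.put("hits", hits);

        System.out.println(sources(response));
    }
}
```

Note that the outer `hits` is an object while the inner `hits` is the array; the manual parsing code reads them in the same order.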

 

 

When parsing the data we need to pay attention to type conversion,

as in the following code:

[code]
// Get the returned fields
JsonObject sourceObject = jsonHitsObject.get("_source").getAsJsonObject();

// Populate a LessonHomeworkParam object (field names in _source are lowercase)
LessonHomeworkParam lessonHomeworkParam = new LessonHomeworkParam();
lessonHomeworkParam.setId(sourceObject.get("id").getAsInt());
lessonHomeworkParam.setExplain(sourceObject.get("explain").getAsString());
lessonHomeworkParam.setQuestion(sourceObject.get("question").getAsString());
// lessonHomeworkParam.setCreateUserId(sourceObject.get("createuserid").getAsInt());
lessonHomeworkParam.setQuestionTypes(sourceObject.get("questiontypes").getAsInt());
lessonHomeworkParam.setIsDeleted(sourceObject.get("isdeleted").getAsBoolean());
lessonHomeworkParam.setIsEnabled(sourceObject.get("isenabled").getAsBoolean());

 

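The type conversions in the middle are the part we have to handle ourselves when parsing manually.

Note: the specific types are the ones listed under type in the original post's screenshot. We need to convert each value to the matching Java type before we can pull out the data we want.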
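A field that is missing from `_source` makes `sourceObject.get(...)` return null, so chained calls such as `getAsString()` fail with a NullPointerException. The sketch below shows one null-tolerant way to do the same conversions. It is an illustration under assumptions: `_source` is represented as a plain `Map<String, Object>` so the example needs nothing beyond the JDK, and `SourceFields`, `intField`, `boolField` and `stringField` are hypothetical names. The sample question text is made up.

```java
import java.util.HashMap;
import java.util.Map;

class SourceFields {

    // Integer-valued fields such as "id" may arrive as Integer, Long or Double
    // depending on the JSON decoder, so convert through Number.
    static Integer intField(Map<String, Object> source, String name) {
        Object v = source.get(name);
        return (v instanceof Number) ? Integer.valueOf(((Number) v).intValue()) : null;
    }

    // Accepts a real boolean or the strings "true"/"false"; null if absent.
    static Boolean boolField(Map<String, Object> source, String name) {
        Object v = source.get(name);
        if (v instanceof Boolean) {
            return (Boolean) v;
        }
        if (v instanceof String) {
            return Boolean.valueOf((String) v);
        }
        return null;
    }

    // Returns the value as text, or null if the field is absent.
    static String stringField(Map<String, Object> source, String name) {
        Object v = source.get(name);
        return v == null ? null : v.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> source = new HashMap<>();
        source.put("id", 2406L);
        source.put("isdeleted", false);
        source.put("question", "sample question text");

        System.out.println(intField(source, "id"));
        System.out.println(boolField(source, "isdeleted"));
        System.out.println(stringField(source, "question"));
        // "explain" is absent here, so this is null rather than an exception.
        System.out.println(stringField(source, "explain"));
    }
}
```

The lookups use the lowercase names (`isdeleted`, not `isDeleted`) because that is how the fields appear in the sample `_source`.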

 

 

That completes the Java code for Elasticsearch full-text search.

A separate blog post covers Elasticsearch's principles and configuration:

https://blog.csdn.net/weixin_37970049/article/details/80989617

 

 
