手写一个简单的ElasticSearch SQL转换器(一)
一.前言
之前有个需求,是使ElasticSearch支持使用SQL进行简单查询,较新版本的ES已经支持该特性(不过貌似还是实验性质的?) ,而且git上也有elasticsearch-sql
插件,之所以决定手写一个,主要有两点原因:
1. 目前用的ES版本较老
2. elasticsearch-sql虽好,但比较复杂,代码也不易维护
3. 练练手
二.技术选型
目前主流软件中通常使用ANTLR做词法语法分析,诸如著名的Hibernate,Spark,Hive等项目,之前因为工作原因也有所接触,不过如果只是解析标准SQL的话,
其实还有更好的选择,如使用Hibernate或阿里巴巴的数据库Druid(Druid采用了手写词法语法分析器的方案,这种方式当然比自动ANTLR生成的解析器性能高得多), 这里
我选择了第二种方案。
开始之前先看下我们可以通过Druid拿到的SQL语言的抽象语法树:
图片:https://www.jianshu.com/p/437aa22ea3ca
三.技术实现
首先我们创建一个SqlParser类,主流程都在parse方法中,该方法负责将一个SQL字符串解析(顺便说一句,Druid支持多种SQL方言,这里我选择了MySQL),
并返回SearchSourceBuilder对象,这是一个ElasticSearch提供的DSL构建器,以该对象作为参数,ES client端即可发起对ES 服务端搜索请求。
/** * * @author fred * */ public class SqlParser { private final static String dbType = JdbcConstants.MYSQL; private final static Logger logger = LoggerFactory.getLogger(SqlParser.class); private SearchSourceBuilder builder; public SqlParser(SearchSourceBuilder builder) { this.builder = builder; } /** * 将SQL解析为ES查询 */ public SearchSourceBuilder parse(String sql) throws Exception { if (Objects.isNull(sql)) { throw new IllegalArgumentException("输入语句不得为空"); } sql = sql.trim().toLowerCase(); List<SQLStatement> stmtList = SQLUtils.parseStatements(sql, dbType); if (Objects.isNull(stmtList) || stmtList.size() != 1) { throw new IllegalArgumentException("必须输入一句查询语句"); } // 使用Parser解析生成AST SQLStatement stmt = stmtList.get(0); if (!(stmt instanceof SQLSelectStatement)) { throw new IllegalArgumentException("输入语句须为Select语句"); } SQLSelectStatement sqlSelectStatement = (SQLSelectStatement) stmt; SQLSelectQuery sqlSelectQuery = sqlSelectStatement.getSelect().getQuery(); SQLSelectQueryBlock sqlSelectQueryBlock = (SQLSelectQueryBlock) sqlSelectQuery; SQLExpr whereExpr = sqlSelectQueryBlock.getWhere(); // 生成ES查询条件 BoolQueryBuilder bridge = QueryBuilders.boolQuery(); bridge.must(); QueryBuilder whereBuilder = whereHelper(whereExpr); // 处理where bridge.must(whereBuilder); SQLOrderBy orderByExpr = sqlSelectQueryBlock.getOrderBy(); // 处理order by if (Objects.nonNull(orderByExpr)) { orderByHelper(orderByExpr, bridge); } builder.query(bridge); return builder; }
主流程很简单,拿到SQL字符串后,直接通过Druid API将其转换为抽象语法树,我们要求输入语句必须为Select语句。接下来是对where语句和order by语句的处理,
目前的难点其实主要在于如何将where语句映射到ES查询中。
先从简单的看起,如何处理order by呢?SQL语句中 order by显然可以允许用户根据多字段排序,所以排序字段肯定是一个List<排序字段>,我们要做的就是将这个List映射到
SearchSourceBuilder对象中。见下面代码:
/** * 处理所有order by字段 * 4 * @param orderByExpr */ private void orderByHelper(SQLOrderBy orderByExpr, BoolQueryBuilder bridge) { List<SQLSelectOrderByItem> orderByList = orderByExpr.getItems(); // 待排序的列 for (SQLSelectOrderByItem sqlSelectOrderByItem : orderByList) { if (sqlSelectOrderByItem.getType() == null) { sqlSelectOrderByItem.setType(SQLOrderingSpecification.ASC); // 默认升序 } 12 String orderByColumn = sqlSelectOrderByItem.getExpr().toString(); builder.sort(orderByColumn, sqlSelectOrderByItem.getType().equals(SQLOrderingSpecification.ASC) ? SortOrder.ASC : SortOrder.DESC); } }
通过Druid的API,我们很容易拿到了SQL语句中所有的排序字段,我们逐个遍历这些字段,拿到排序的列名字面量和顺序,传递给SearchSourceBuilder的sort方法,需注意的
是, 如果原始SQL中没有指定字段是顺序,我们默认升序。
接下来我们处理稍微有点麻烦的where语句,因为SQL语句被解析成了语法树,很自然的我们想到使用递归方式进行处理。 而通常在处理递归问题的时候,
我习惯于从递归的base case开始考虑,where语句中的运算符根据Druid API中的定义主要分为以下三种:
1. 简单二元运算符:包括逻辑处理,如and, or 和大部分关系运算(后续会详细讲)
2. between或not between运算符:我们可以简单的将其映射成ES中的Range Query
3. in, not in 运算符: 可以简单的映射成ES中的Term Query
通过Druid,我们可以很方便的获取每种运算中的运算符与操作数
/** * 递归遍历“where”子树 * * @return */ private QueryBuilder whereHelper(SQLExpr expr) throws Exception { if (Objects.isNull(expr)) { throw new NullPointerException("节点不能为空!"); } BoolQueryBuilder bridge = QueryBuilders.boolQuery(); if (expr instanceof SQLBinaryOpExpr) { // 二元运算 SQLBinaryOperator operator = ((SQLBinaryOpExpr) expr).getOperator(); // 获取运算符 if (operator.isLogical()) { // and,or,xor return handleLogicalExpr(expr); } else if (operator.isRelational()) { // 具体的运算,位于叶子节点 return handleRelationalExpr(expr); } } else if (expr instanceof SQLBetweenExpr) { // between运算 SQLBetweenExpr between = ((SQLBetweenExpr) expr); boolean isNotBetween = between.isNot(); // between or not between ? String testExpr = between.testExpr.toString(); String fromStr = formatSQLValue(between.beginExpr.toString()); String toStr = formatSQLValue(between.endExpr.toString()); if (isNotBetween) { bridge.must(QueryBuilders.rangeQuery(testExpr).lt(fromStr).gt(toStr)); } else { bridge.must(QueryBuilders.rangeQuery(testExpr).gte(fromStr).lte(toStr)); } return bridge; } else if (expr instanceof SQLInListExpr) { // SQL的 in语句,ES中对应的是terms SQLInListExpr siExpr = (SQL 576 InListExpr) expr; boolean isNotIn = siExpr.isNot(); // in or not in? String leftSide = siExpr.getExpr().toString(); List<SQLExpr> inSQLList = siExpr.getTargetList(); List<String> inList = new ArrayList<>(); for (SQLExpr in : inSQLList) { String str = formatSQLValue(in.toString()); inList.add(str); } if (isNotIn) { bridge.mustNot(QueryBuilders.termsQuery(leftSide, inList)); } else { bridge.must(QueryBuilders.termsQuery(leftSide, inList)); } return bridge; } return bridge; }
上述第一种情况比较复杂,首先我们先看看运算符是逻辑运算的情况:
如下面的代码所示,如果运算符是逻辑运算符,我们需要对左右操作数分别递归,然后根据运算符类型归并结果:or可以映射成ES 中的Should,而and则映射成Must.
/** * 逻辑运算符,目前支持and,or * * @return * @throws Exception */ private QueryBuilder handleLogicalExpr(SQLExpr expr) throws Exception { BoolQueryBuilder bridge = QueryBuilders.boolQuery(); SQLBinaryOperator operator = ((SQLBinaryOpExpr) expr).getOperator(); // 获取运算符 SQLExpr leftExpr = ((SQLBinaryOpExpr) expr).getLeft(); SQLExpr rightExpr = ((SQLBinaryOpExpr) expr).getRight(); // 分别递归左右子树,再根据逻辑运算符将结果归并 QueryBuilder leftBridge = whereHelper(leftExpr); QueryBuilder rightBridge = whereHelper(rightExpr); if (operator.equals(SQLBinaryOperator.BooleanAnd)) { bridge.must(leftBridge).must(rightBridge); } else if (operator.equals(SQLBinaryOperator.BooleanOr)) { bridge.should(leftBridge).should(rightBridge); } return bridge; }
下面来讨论下第一种情况中,如果运算符是关系运算符的情况,我们知道,SQL中的关系运算主要就是一些比较运算符,诸如大于,小于,等于,Like等,这里我还加上了
正则搜索(不过貌似性能比较差,ES对正则搜索的限制颇多,不太建议使用)。
/** * 大于小于等于正则 * * @param expr * @return */ private QueryBuilder handleRelationalExpr(SQLExpr expr) { SQLExpr leftExpr = ((SQLBinaryOpExpr) expr).getLeft(); if (Objects.isNull(leftExpr)) { throw new NullPointerException("表达式左侧不得为空"); } String leftExprStr = leftExpr.toString(); String rightExprStr = formatSQLValue(((SQLBinaryOpExpr) expr).getRight().toString()); // TODO:表达式右侧可以后续支持方法调用 SQLBinaryOperator operator = ((SQLBinaryOpExpr) expr).getOperator(); // 获取运算符 QueryBuilder queryBuilder; switch (operator) { case GreaterThanOrEqual: queryBuilder = QueryBuilders.rangeQuery(leftExprStr).gte(rightExprStr); break; case LessThanOrEqual: queryBuilder = QueryBuilders.rangeQuery(leftExprStr).lte(rightExprStr); break; case Equality: queryBuilder = QueryBuilders.boolQuery(); TermQueryBuilder eqCond = QueryBuilders.termQuery(leftExprStr, rightExprStr); ((BoolQueryBuilder) queryBuilder).must(eqCond); break; case GreaterThan: queryBuilder = QueryBuilders.rangeQuery(leftExprStr).gt(rightExprStr); break; case LessThan: queryBuilder = QueryBuilders.rangeQuery(leftExprStr).lt(rightExprStr); break; case NotEqual: queryBuilder = QueryBuilders.boolQuery(); TermQueryBuilder notEqCond = QueryBuilders.termQuery(leftExprStr, rightExprStr); ((BoolQueryBuilder) queryBuilder).mustNot(notEqCond); break; case RegExp: // 对应到ES中的正则查询 queryBuilder = QueryBuilders.boolQuery(); RegexpQueryBuilder regCond = QueryBuilders.regexpQuery(leftExprStr, rightExprStr); ((BoolQueryBuilder) queryBuilder).mustNot(regCond); break; case NotRegExp: queryBuilder = QueryBuilders.boolQuery(); RegexpQueryBuilder notRegCond = QueryBuilders.regexpQuery(leftExprStr, rightExprStr); ((BoolQueryBuilder) queryBuilder).mustNot(notRegCond); break; case Like: queryBuilder = QueryBuilders.boolQuery(); MatchPhraseQueryBuilder likeCond = QueryBuilders.matchPhraseQuery(leftExprStr, rightExprStr.replace("%", "")); ((BoolQueryBuilder) queryBuilder).must(likeCond); break; case NotLike: queryBuilder = QueryBuilders.boolQuery(); MatchPhraseQueryBuilder notLikeCond = QueryBuilders.matchPhraseQuery(leftExprStr, rightExprStr.replace("%", "")); ((BoolQueryBuilder) queryBuilder).mustNot(notLikeCond); break; default: throw new IllegalArgumentException("暂不支持该运算符!" + operator.toString()); } return queryBuilder; }
到这里我们就完成了SQL转ES DSL的功能了(其实只是简单查询的转换),下面我们写几个Junit测试一下吧:
首先是简单的比较运算:
public void normalSQLTest() { String sql = "select * from test where time>= 1"; SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder(); try { searchSourceBuilder = new SqlParser(searchSourceBuilder).parse(sql); } catch (Exception e) { e.printStackTrace(); } System.out.println(searchSourceBuilder); SearchSourceBuilder builderToCompare = new SearchSourceBuilder(); QueryBuilder whereBuilder = QueryBuilders.rangeQuery("time").gte("1"); BoolQueryBuilder briage = QueryBuilders.boolQuery(); briage.must(); briage.must(whereBuilder); builderToCompare.query(briage); assertEquals(searchSourceBuilder,builderToCompare); }
下面是输出的ES 查询语句:
{ "query" : { "bool" : { "must" : [ { "range" : { "time" : { "from" : "1", "to" : null, "include_lower" : true, "include_upper" : true, "boost" : 1.0 } } } ], "disable_coord" : false, "adjust_pure_negative" : true, "boost" : 1.0 } } }
再来个带排序的:
@Test public void normalSQLWithOrderByTest() { String sql = "select * from test where time>= 1 order by time desc"; SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder(); try { searchSourceBuilder = new SqlParser(searchSourceBuilder).parse(sql); } catch (Exception e) { e.printStackTrace(); } System.out.println(searchSourceBuilder); SearchSourceBuilder builderToCompare = new SearchSourceBuilder(); QueryBuilder whereBuilder = QueryBuilders.rangeQuery("time").gte("1"); BoolQueryBuilder briage = QueryBuilders.boolQuery(); briage.must(); briage.must(whereBuilder); builderToCompare.sort("time",SortOrder.DESC); builderToCompare.query(briage); assertEquals(searchSourceBuilder,builderToCompare); }
between, in这些没什么区别,就不贴代码了,最后看看稍微复杂点儿,带逻辑运算的查询:
@Test public void sqlLogicTest() { String sql = "select * from test where raw_log not like"+"'%aaa' && b=1 or c=0"; SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder(); try { searchSourceBuilder = new SqlParser(searchSourceBuilder).parse(sql); } catch (Exception e) { e.printStackTrace(); } System.out.println(searchSourceBuilder); SearchSourceBuilder builderToCompare = new SearchSourceBuilder(); QueryBuilder builder =QueryBuilders.matchPhraseQuery("raw_log","aaa"); BoolQueryBuilder briage1 = QueryBuilders.boolQuery();//raw log not like briage1.mustNot(builder); BoolQueryBuilder briage2 = QueryBuilders.boolQuery(); //b=1 briage2.must(QueryBuilders.termQuery("b","1")); BoolQueryBuilder briage3 = QueryBuilders.boolQuery(); // not like and b=1 briage3.must(briage1).must(briage2); BoolQueryBuilder briage4 = QueryBuilders.boolQuery(); //c =0 briage4.must(QueryBuilders.termQuery("c","0")); BoolQueryBuilder briage5 = QueryBuilders.boolQuery(); // not like and b =1 or c =0 briage5.should(briage3).should(briage4); BoolQueryBuilder briage6 = QueryBuilders.boolQuery(); briage6.must(); briage6.must(briage5); builderToCompare.query(briage6); assertEquals(searchSourceBuilder,builderToCompare); }
下面是生成的查询语句:
{ "query" : { "bool" : { "must" : [ { "bool" : { "should" : [ { "bool" : { "must" : [ { "bool" : { "must_not" : [ { "match_phrase" : { "raw_log" : { "query" : "aaa", "slop" : 0, "boost" : 1.0 } } } ], "disable_coord" : false, "adjust_pure_negative" : true, "boost" : 1.0 } }, { "bool" : { "must" : [ { "term" : { "b" : { "value" : "1", "boost" : 1.0 } } } ], "disable_coord" : false, "adjust_pure_negative" : true, "boost" : 1.0 } } ], "disable_coord" : false, "adjust_pure_negative" : true, "boost" : 1.0 } }, { "bool" : { "must" : [ { "term" : { "c" : { "value" : "0", "boost" : 1.0 } } } ], "disable_coord" : false, "adjust_pure_negative" : true, "boost" : 1.0 } } ], "disable_coord" : false, "adjust_pure_negative" : true, "boost" : 1.0 } } ], "disable_coord" : false, "adjust_pure_negative" : true, "boost" : 1.0 } } }
四.总结
本篇文章主要讲述了如何使用Druid实现SQL语句转换ES DSL进行搜索的功能,后续文章中会陆续完善这个功能,实现诸如聚合查询,分页查询等功能。
- 手写一个简单的Servlet程序
- 利用反射手写有一个简单的SpringIOC框架
- 浅析Spring原理,手写一个简单的Spring
- 手写一个简单的RPC
- androidstudio简单上手写一个app
- 30分钟手写一个简单的RPC
- jquery下拉菜单怎么做,一个简单的纯手写例子
- 手写一个简单的Tomcat
- Rhyme/ 手写服务器 实现一个基于xml解析的简单的Tomcat服务器
- 神经网络和深度学习(二)——一个简单的手写数字分类网络
- vue用Object.defineProperty手写一个简单的双向绑定的示例
- Java: 动手写一个简单的线程池
- 面试手写一个简单的STL容器vector
- 利用Sklearn对MNIST手写数据集开始一个简单的二分类判别器项目(在这个过程中学习关于模型性能的评价指标,如accuracy,precision,recall,混淆矩阵)
- 神经网络与深度学习 1.3 神经网络的架构 1.4 一个简单的分类手写数字的网络架构
- 手写一个简单的WinForm程序(2)
- 手写一个简单的WinForm程序
- ajax简单手写了一个猜拳游戏
- 一个简单的掷骰子游戏
- 一个简单的AJAX请求类