ES Study Notes 5 - Search Relevance
2015-02-21 19:09
By default, results are returned sorted by relevance, with the most relevant docs first.
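First, a look at sorting. Besides the query, a search request can set from/size for paging and a sort clause:
{
  "query": { "match_all": {} },
  "from": 0,
  "size": 10,
  "sort": { "field1": "desc" }
}
The sort value can be a single field name, an array of fields, or an object giving the order per field:
"sort": "field1"
"sort": ["field1", "field2"]
"sort": { "field1": "desc" }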
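To make the combination of an array sort and from/size concrete, here is a small in-memory sketch in Python. It is only an illustration of the ordering semantics, not how Elasticsearch executes a search; the helper name sort_and_page and the documents are hypothetical. Documents are ordered by the first sort field, ties are broken by the next field, and from/size then cut out one page of hits.

```python
from typing import Any


def sort_and_page(docs: list[dict[str, Any]],
                  sort_spec: list[tuple[str, str]],
                  from_: int = 0,
                  size: int = 10) -> list[dict[str, Any]]:
    """Hypothetical helper: order docs by several fields, then return one page.

    sort_spec is a list of (field, order) pairs, e.g. [("field1", "asc"), ("field2", "desc")],
    mirroring "sort": [{"field1": "asc"}, {"field2": "desc"}].
    """
    result = list(docs)
    # Python's sort is stable, so sorting by the last key first and by the
    # first key last gives a multi-key ordering with ties broken in order.
    for field, order in reversed(sort_spec):
        result.sort(key=lambda d: d[field], reverse=(order == "desc"))
    # from/size select a window of the sorted hits.
    return result[from_:from_ + size]


docs = [
    {"id": 1, "field1": "b", "field2": 3},
    {"id": 2, "field1": "a", "field2": 1},
    {"id": 3, "field1": "b", "field2": 7},
    {"id": 4, "field1": "a", "field2": 5},
]
page = sort_and_page(docs, [("field1", "asc"), ("field2", "desc")], from_=0, size=3)
print([d["id"] for d in page])  # → [4, 2, 3]
```

Asking for from_=2 on the same sort would return the remaining hits, ids 3 and 1.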
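When a field holds several values (for example an array of dates), mode chooses which value each document is sorted by, such as min or max:
"sort": { "dates": { "order": "asc", "mode": "min" } }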
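The effect of mode can be mimicked in Python: each document's list of values is first reduced to one key, and the documents are then sorted by that key. This is a sketch over hypothetical in-memory documents, and the helper sort_multivalue is made up for illustration; only min and max are modelled here.

```python
from datetime import date

# Hypothetical documents whose "dates" field holds several values each.
docs = [
    {"id": "a", "dates": [date(2014, 9, 1), date(2014, 10, 5)]},
    {"id": "b", "dates": [date(2014, 8, 20), date(2014, 12, 1)]},
    {"id": "c", "dates": [date(2014, 9, 15)]},
]

# Each mode reduces a document's list of values to a single sort key.
MODES = {"min": min, "max": max}


def sort_multivalue(docs, field, order="asc", mode="min"):
    """Sketch of "sort": {field: {"order": order, "mode": mode}} on in-memory dicts."""
    reduce_values = MODES[mode]
    return sorted(docs, key=lambda d: reduce_values(d[field]), reverse=(order == "desc"))


print([d["id"] for d in sort_multivalue(docs, "dates", "asc", "min")])  # → ['b', 'a', 'c']
print([d["id"] for d in sort_multivalue(docs, "dates", "asc", "max")])  # → ['c', 'a', 'b']
```

The same documents come back in a different order depending on mode, which is why the mode matters whenever a sort field is multivalued.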
String sorting and multifields
Analyzed string fields are also multivalue fields, but sorting on them seldom gives you the results you want. If you analyze a string like "fine old art", it results in three terms. We probably want to sort alphabetically on the first term, then the second term, and so forth, but Elasticsearch doesn't have this information at its disposal at sort time.
Since an analyzed field cannot be sorted reliably, the solution is to define a mapping that indexes the field twice: tweet is analyzed for full-text search, while the sub-field tweet.raw is not_analyzed, so it holds the whole string as a single term that can be sorted on:
"tweet": { "type": "string", "analyzer": "english", "fields": { "raw": { "type": "string", "index": "not_analyzed" } } }
GET /_search { "query": { "match": { "tweet": "elasticsearch" } }, "sort": "tweet.raw" }
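A toy Python sketch of why the raw sub-field helps. It assumes a crude stand-in analyzer (lowercase and split on whitespace; the real english analyzer also stems and removes stopwords) and assumes that an ascending sort on a multivalue field uses each document's smallest term, as with "mode": "min". The titles and the function toy_analyze are hypothetical.

```python
def toy_analyze(text: str) -> list[str]:
    """Hypothetical stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()


titles = ["fine old art", "classic painting", "modern sculpture"]

# Sorting on the analyzed field: each doc holds several terms, and only one
# of them (here the smallest, as with "mode": "min") can serve as the key.
by_analyzed = sorted(titles, key=lambda t: min(toy_analyze(t)))

# Sorting on a not_analyzed "raw" copy: the whole string is a single term.
by_raw = sorted(titles)

print(by_analyzed)  # → ['fine old art', 'classic painting', 'modern sculpture']
print(by_raw)       # → ['classic painting', 'fine old art', 'modern sculpture']
```

"fine old art" jumps to the front on the analyzed field only because its smallest term is "art"; the raw copy sorts the titles the way a reader would expect.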
Relevance of search results
The standard similarity algorithm used in Elasticsearch is known as term frequency/inverse document frequency, or TF/IDF, which takes the following factors into account:
- Term frequency: the more often a term occurs in this document, the more relevant the document. How often does the term appear in the field? The more often, the more relevant. A field containing five mentions of the same term is more likely to be relevant than a field containing just one mention.
- Inverse document frequency: the more often a term occurs across other documents, the less relevant it is. How often does each term appear in the index? The more often, the less relevant. Terms that appear in many documents have a lower weight than more-uncommon terms.
- Field-length norm: the shorter the field, the higher the relevance. How long is the field? The longer it is, the less likely it is that words in the field will be relevant. A term appearing in a short title field carries more weight than the same term appearing in a long content field.
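The three factors can be sketched numerically. The formulas below are assumed to be those of Lucene's classic TF/IDF similarity (tf = √freq, idf = 1 + ln(numDocs / (docFreq + 1)), fieldNorm = 1 / √numTerms), and term_weight computes only their product, the per-field weight of a single term. Query normalization, coordination and boosts are ignored, so real scores will differ; the corpus is made up.

```python
import math


def tf(freq: int) -> float:
    # Term frequency: more occurrences give a higher weight, with diminishing returns.
    return math.sqrt(freq)


def idf(num_docs: int, doc_freq: int) -> float:
    # Inverse document frequency: terms found in many docs get a lower weight.
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def field_norm(num_terms: int) -> float:
    # Field-length norm: shorter fields give each term more weight.
    return 1.0 / math.sqrt(num_terms)


def term_weight(term: str, field_terms: list[str], corpus: list[list[str]]) -> float:
    """Simplified weight of one term in one field (no queryNorm, coord or boosts)."""
    freq = field_terms.count(term)
    doc_freq = sum(1 for doc in corpus if term in doc)
    return tf(freq) * idf(len(corpus), doc_freq) * field_norm(len(field_terms))


corpus = [
    "quick brown fox".split(),
    "quick brown fox jumps over the lazy dog quick".split(),
    "lazy dog".split(),
    "brown bear".split(),
]

# Field-length norm outweighs a higher frequency here: short field wins.
print(round(term_weight("quick", corpus[0], corpus), 3))  # → 0.743
print(round(term_weight("quick", corpus[1], corpus), 3))  # → 0.607
# IDF: the rare term "bear" outweighs the common term "brown" in the same field.
print(round(term_weight("bear", corpus[3], corpus), 3))   # → 1.197
print(round(term_weight("brown", corpus[3], corpus), 3))  # → 0.707
```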
Adding explain to a search request also returns, for every hit, the shard and the node that the document came from. This is useful to know because term and document frequencies are calculated per shard, rather than per index: relevance scores are computed shard by shard, not over the whole index.
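A small sketch of why per-shard statistics matter, again assuming the classic idf formula 1 + ln(numDocs / (docFreq + 1)). The six documents and the way they are split over two shards are hypothetical. The same term gets a different idf on each shard than it would over the whole index, so hits from different shards are scored against slightly different statistics; the effect is most visible when an index holds only a few documents.

```python
import math


def idf(num_docs: int, doc_freq: int) -> float:
    # Classic Lucene TF/IDF idf, assumed here for illustration.
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def doc_freq(term: str, docs: list[set[str]]) -> int:
    # Number of documents (each a set of terms) that contain the term.
    return sum(1 for d in docs if term in d)


# Hypothetical index of six documents split over two shards.
shard_0 = [{"honeymoon", "paris"}, {"honeymoon", "rome"}, {"honeymoon"}]
shard_1 = [{"paris"}, {"rome"}, {"honeymoon", "beach"}]
whole_index = shard_0 + shard_1

for name, docs in [("shard 0", shard_0), ("shard 1", shard_1), ("index", whole_index)]:
    print(name, round(idf(len(docs), doc_freq("honeymoon", docs)), 3))
# → shard 0 0.712
# → shard 1 1.405
# → index 1.182
```

A document matching "honeymoon" on shard 1 would therefore score higher than an otherwise identical document on shard 0.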
GET /_search?explain { "query" : { "match" : { "tweet" : "honeymoon" }} }
Remember: use explain only for debugging, and turn it off in production, because it adds significant overhead.
fielddata
To make sorting efficient, Elasticsearch loads all the values for the field that you want to sort on into memory. This is referred to as fielddata.
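A toy model of the idea behind fielddata. It assumes the inverted index of a field is stored as term → list of doc ids, which is what search needs, and builds the reverse doc id → values view that sorting needs. Building that view means walking every term of the field, which is why all of the field's values end up in memory. The names and data are hypothetical, and real fielddata uses far more compact structures.

```python
from collections import defaultdict

# Hypothetical inverted index for a "tag" field: term -> ids of docs containing it.
inverted = {
    "elasticsearch": [1, 3],
    "lucene": [2, 3],
    "search": [1, 2, 3],
}


def uninvert(index: dict[str, list[int]]) -> dict[int, list[str]]:
    """Build doc id -> sorted list of terms by walking every term in the field."""
    values: dict[int, list[str]] = defaultdict(list)
    for term in sorted(index):  # sorted terms keep each doc's list sorted too
        for doc_id in index[term]:
            values[doc_id].append(term)
    return dict(values)


fielddata = uninvert(inverted)
print(fielddata)
# Sorting docs ascending by their smallest term only needs this doc -> values view.
print(sorted(fielddata, key=lambda doc_id: min(fielddata[doc_id])))  # → [1, 3, 2]
```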