Elasticsearch _field_stats source code analysis
2018-02-03 11:00
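What _field_stats does: https://www.elastic.co/guide/en/elasticsearch/reference/5.6/search-field-stats.html
It returns statistics for the fields of an index, listed in the table below, and can also filter indices based on those statistics: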
Field statistics
The field stats api is supported on string based, number based and date based fields and can return the following statistics per field:
max_doc | The total number of documents. |
doc_count | The number of documents that have at least one term for this field, or -1 if this measurement isn’t available on one or more shards. |
density | The percentage of documents that have at least one value for this field. This is a derived statistic and is based on the max_doc and doc_count. |
sum_doc_freq | The sum of each term’s document frequency in this field, or -1 if this measurement isn’t available on one or more shards. Document frequency is the number of documents containing a particular term. |
sum_total_term_freq | The sum of the term frequencies of all terms in this field across all documents, or -1 if this measurement isn’t available on one or more shards. Term frequency is the total number of occurrences of a term in a particular document and field. |
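The table's two rules (a counter becomes -1 as soon as one shard cannot report it, and density is derived from max_doc and doc_count) can be sketched as follows. This is a minimal illustration, not the Elasticsearch implementation: the class and method names are hypothetical, and the integer percentage and the handling of an empty index are assumptions.

```java
// Hypothetical sketch; these names are illustrative and are not the Elasticsearch API.
final class FieldStatsSketch {

    /** Sums two per-shard counters; -1 ("not available") on either side wins. */
    static long mergeCounter(long a, long b) {
        if (a == -1 || b == -1) {
            return -1;
        }
        return a + b;
    }

    /**
     * Percentage of documents that have a value for the field.
     * Assumption: returns -1 when doc_count is unavailable or the index is empty.
     */
    static int density(long docCount, long maxDoc) {
        if (docCount < 0 || maxDoc <= 0) {
            return -1;
        }
        return (int) (docCount * 100 / maxDoc);
    }

    public static void main(String[] args) {
        long maxDoc = 120 + 80;                 // max_doc simply adds up across shards
        long docCount = mergeCounter(30, 20);   // both shards reported doc_count
        System.out.println(density(docCount, maxDoc));  // 50 of 200 documents
        System.out.println(mergeCounter(30, -1));       // one shard could not report
    }
}
```

Treating -1 as a sticky "unknown" keeps a partially known sum from being mistaken for a real total.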
Field stats index constraints (this is what Kibana uses when it plots data over a time range)
Field stats index constraints allow omitting all field stats for indices that don’t match the constraint. An index constraint can exclude indices' field stats based on the min_value and max_value statistic. This option is only useful if the level option is set to indices. Fields that are not indexed (not searchable) are always omitted when an index constraint is defined.
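The constraint check described above can be sketched as a range test on each index's min_value and max_value for the constrained field. This is a minimal illustration under stated assumptions: the class and method names are hypothetical, only the gte and lt comparisons are modelled, and the 2014 bounds are illustrative values for a date field.

```java
import java.time.Instant;

// Hypothetical sketch; not the Elasticsearch implementation.
final class IndexConstraintSketch {

    /**
     * Returns true when an index whose constrained field spans
     * [minValue, maxValue] satisfies: max_value >= gte AND min_value < lt.
     */
    static boolean matches(Instant minValue, Instant maxValue, Instant gte, Instant lt) {
        boolean maxOk = !maxValue.isBefore(gte); // max_value >= gte
        boolean minOk = minValue.isBefore(lt);   // min_value < lt
        return maxOk && minOk;
    }

    public static void main(String[] args) {
        Instant gte = Instant.parse("2014-01-01T00:00:00.000Z");
        Instant lt = Instant.parse("2015-01-01T00:00:00.000Z");

        // An index holding documents from March to September 2014 is kept.
        System.out.println(matches(
                Instant.parse("2014-03-01T00:00:00Z"),
                Instant.parse("2014-09-30T00:00:00Z"), gte, lt));

        // An index whose newest document is from 2013 is omitted.
        System.out.println(matches(
                Instant.parse("2013-01-01T00:00:00Z"),
                Instant.parse("2013-12-31T23:59:59Z"), gte, lt));
    }
}
```

Note that the two conditions together select indices whose value range overlaps the window, not only indices that lie entirely inside it.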
For example, index constraints can be useful to find out the min and max value of a particular property of your data in a time based scenario. The following request only returns field stats for the answer_count property for indices holding questions created in the year 2014:
POST _field_stats?level=indices
{
  "fields" : ["answer_count"],
  "index_constraints" : {
    "creation_date" : {
      "max_value" : {
        "gte" : "2014-01-01T00:00:00.000Z"
      },
      "min_value" : {
        "lt" : "2015-01-01T00:00:00.000Z"
      }
    }
  }
}
The corresponding part of the ES 5.5 source: elasticsearch/search/lookup/IndexField.java
import org.apache.lucene.search.CollectionStatistics;
import org.elasticsearch.common.util.MinimalMap;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/**
 * Script interface to all information regarding a field.
 * */
public class IndexField extends MinimalMap<String, IndexFieldTerm> {

    /*
     * TermsInfo Objects that represent the Terms are stored in this map when
     * requested. Information such as frequency, doc frequency and positions
     * information can be retrieved from the TermInfo objects in this map.
     */
    private final Map<String, IndexFieldTerm> terms = new HashMap<>();

    // the name of this field
    private final String fieldName;

    /*
     * The holds the current reader. We need it to populate the field
     * statistics. We just delegate all requests there
     */
    private final LeafIndexLookup indexLookup;

    /*
     * General field statistics such as number of documents containing the
     * field.
     */
    private final CollectionStatistics fieldStats;

    public IndexField(String fieldName, LeafIndexLookup indexLookup) throws IOException {
        assert fieldName != null;
        this.fieldName = fieldName;

        assert indexLookup != null;
        this.indexLookup = indexLookup;

        fieldStats = this.indexLookup.getIndexSearcher().collectionStatistics(fieldName);
    }

    /* get number of documents containing the field */
    public long docCount() throws IOException {
        return fieldStats.docCount();
    }

    /* get sum of the number of words over all documents that were indexed */
    public long sumttf() throws IOException {
        return fieldStats.sumTotalTermFreq();
    }

    /*
     * get the sum of doc frequencies over all words that appear in any document
     * that has the field.
     */
    public long sumdf() throws IOException {
        return fieldStats.sumDocFreq();
    }

    // ...
}
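To make the three counters returned by docCount(), sumdf() and sumttf() concrete, here is a toy computation over an in-memory list of field values. It is a sketch only: ToyFieldStats is a hypothetical class, splitting on whitespace stands in for real analysis, and nothing here calls Lucene.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy model of the per-field counters; it does not use Lucene.
final class ToyFieldStats {
    final long docCount;         // documents with at least one term in the field
    final long sumDocFreq;       // sum over all terms of their document frequency
    final long sumTotalTermFreq; // total number of term occurrences in the field

    ToyFieldStats(List<String> fieldValues) {
        long docs = 0;
        long df = 0;
        long ttf = 0;
        for (String value : fieldValues) {
            if (value == null || value.trim().isEmpty()) {
                continue; // the document has no term for this field
            }
            // Assumption: whitespace tokenization stands in for an analyzer.
            String[] tokens = value.trim().split("\\s+");
            Set<String> distinct = new HashSet<>(Arrays.asList(tokens));
            docs++;
            // Each distinct term in a document adds 1 to that term's doc frequency.
            df += distinct.size();
            // Every occurrence counts toward the total term frequency.
            ttf += tokens.length;
        }
        this.docCount = docs;
        this.sumDocFreq = df;
        this.sumTotalTermFreq = ttf;
    }

    public static void main(String[] args) {
        ToyFieldStats stats = new ToyFieldStats(Arrays.asList("a b a", "b c", ""));
        System.out.println(stats.docCount);         // 2: the empty value is skipped
        System.out.println(stats.sumDocFreq);       // 4: {a, b} + {b, c}
        System.out.println(stats.sumTotalTermFreq); // 5: 3 tokens + 2 tokens
    }
}
```

The repeated "a" in the first value raises sumTotalTermFreq but not sumDocFreq, which is the distinction between the two statistics.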