Getting Started with Solr: Reading Notes on the Official 6.0 Documentation, Part 5 (End of Part Two)

2016-06-13 14:57
Part Two: Documents, Fields, and Schema Design

Putting the Pieces Together

Put together the pieces covered so far and you have a well-formed schema.xml.
Choosing Appropriate Numeric Types

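If numeric fields are queried frequently (range queries in particular), use precisionStep="8" or simply leave the default. Queries get faster, but the index grows.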
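
As a rough sketch of this trade-off, the schema.xml fragment below declares two field types that differ only in precisionStep. The type names tint and int follow the naming used in Solr's sample schemas, and the price field is a made-up example, so treat all three names as assumptions.

```xml
<!-- precisionStep="8": each value is indexed at several precisions, so range queries are faster but the index is larger -->
<fieldType name="tint" class="solr.TrieIntField" precisionStep="8" positionIncrementGap="0"/>

<!-- precisionStep="0": no extra precision terms, so the index is smaller but range queries are slower -->
<fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>

<!-- Hypothetical field that is range-queried often, so it uses the tint type -->
<field name="price" type="tint" indexed="true" stored="true"/>
```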

Working With Text

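There are two approaches:
Use copyField to copy the fields you want searched into a single field that serves as the default search field.
Use copyField to copy one field into another that is processed differently, for example one copy tokenized for searching and another kept for sorting.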
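
Both approaches might look roughly like the schema.xml fragment below. The field names (title, author, text, title_sort) are hypothetical, and the text_general and string types are assumed to be defined as in Solr's sample schemas.

```xml
<!-- Hypothetical source fields -->
<field name="title" type="text_general" indexed="true" stored="true"/>
<field name="author" type="text_general" indexed="true" stored="true"/>

<!-- Approach 1: a catch-all field that can serve as the default search field -->
<field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>
<copyField source="title" dest="text"/>
<copyField source="author" dest="text"/>

<!-- Approach 2: the same title copied into an untokenized string field for sorting -->
<field name="title_sort" type="string" indexed="true" stored="false"/>
<copyField source="title" dest="title_sort"/>
```

The copy is taken from the original input value, before analysis, so each destination field applies its own processing.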

Related Topics
SchemaXML

DocValues

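DocValues record field values internally in a form that makes some operations, such as sorting and faceting, more efficient than traditional indexing.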
Why DocValues?
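Solr normally builds an inverted index for documents, which makes ordinary searches fast. For other features such as sorting, faceting, and highlighting, however, that structure is inefficient: the data they need is kept in memory and can be slow to load.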

Lucene 4 introduced DocValues to solve this problem. They improve efficiency and reduce memory use.
DocValue fields are now column-oriented fields with a document-to-value
mapping built at index time.
Enabling DocValues

Set docValues="true" when defining a field type or a field:
<field name="manu_exact" type="string" indexed="false" stored="false" docValues="true" />

Field types that support the docValues attribute:
StrField and UUIDField.
    If the field is single-valued (i.e., multi-valued is false), Lucene will use the SORTED type.
    If the field is multi-valued, Lucene will use the SORTED_SET type.
Any Trie* numeric fields and EnumField.
     If the field is single-valued (i.e., multi-valued is false), Lucene will use the NUMERIC type.
    If the field is multi-valued, Lucene will use the SORTED_SET type.

You could choose to keep everything in memory by specifying docValuesFormat="Memory" on a field type:

<fieldType name="string_in_mem_dv" class="solr.StrField" docValues="true"
docValuesFormat="Memory" />

Note that the docValuesFormat attribute may change in future releases.

Using DocValues
My understanding of this part is still shallow; it is worth revisiting later.
Sorting, Faceting & Functions
If docValues="true" for a field, then DocValues will automatically be used any time the field is used for sorting, faceting or Function Queries.

Retrieving DocValues During Search

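Whether a non-stored docValues field is returned depends on its useDocValuesAsStored attribute and on how the fl parameter selects fields. With useDocValuesAsStored="true", the field is returned like a stored field when fl uses *. With "false", it is returned only when fl names it explicitly.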

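Values returned from docValues differ from regular stored values in two ways:
1. Order: stored values come back in insertion order, docValues come back in sorted order.
2. For multi-valued fields, duplicate values are removed.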
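
A minimal sketch of the two settings in schema.xml follows. manu_exact is the field from the earlier example, while tags_dv is a hypothetical multi-valued field added only for illustration.

```xml
<!-- Not stored, but returned for fl=* because useDocValuesAsStored is true -->
<field name="manu_exact" type="string" indexed="false" stored="false" docValues="true" useDocValuesAsStored="true"/>

<!-- Hypothetical field: returned only when fl names it explicitly, e.g. fl=id,tags_dv -->
<field name="tags_dv" type="string" indexed="false" stored="false" docValues="true" multiValued="true" useDocValuesAsStored="false"/>
```

Under these definitions, a document indexed with tags_dv values b, a, b would come back with a, b: sorted, and with the duplicate removed.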

Schemaless Mode
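Schemaless mode combines several Solr features, all controlled via solrconfig.xml: managed schema, guessing of field value classes, and automatic addition of fields to the schema based on those classes.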

Using the Schemaless Example
curl http://localhost:8983/solr/gettingstarted/schema/fields 
Configuring Schemaless Mode

Enable Managed Schema
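Managed schema support is enabled by default, unless the configuration specifies that ClassicIndexSchemaFactory should be used. To configure it explicitly, use something like the following: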

<schemaFactory class="ManagedIndexSchemaFactory">
<bool name="mutable">true</bool>
<str name="managedSchemaResourceName">managed-schema</str>
</schemaFactory>
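
For contrast, a configuration that opts out of managed schema would declare the classic factory instead. A minimal sketch for solrconfig.xml:

```xml
<!-- Use a hand-edited schema.xml; the Schema API can then read the schema but not modify it -->
<schemaFactory class="ClassicIndexSchemaFactory"/>
```

With this factory in place, schemaless field addition is not available, because the schema cannot be changed at runtime.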
Define an UpdateRequestProcessorChain

An UpdateRequestProcessorChain lets Solr guess the type of a field, and you can also define a default field type yourself. An example in solrconfig.xml:
<updateRequestProcessorChain name="add-unknown-fields-to-the-schema">
<!-- UUIDUpdateProcessorFactory will generate an id if none is present in the
incoming document -->
<processor class="solr.UUIDUpdateProcessorFactory" />
<processor class="solr.LogUpdateProcessorFactory"/>
<processor class="solr.DistributedUpdateProcessorFactory"/>
<processor class="solr.RemoveBlankFieldUpdateProcessorFactory"/>
<processor class="solr.FieldNameMutatingUpdateProcessorFactory">
<str name="pattern">[^\w-\.]</str>
<str name="replacement">_</str>
</processor>
<processor class="solr.ParseBooleanFieldUpdateProcessorFactory"/>
<processor class="solr.ParseLongFieldUpdateProcessorFactory"/>
<processor class="solr.ParseDoubleFieldUpdateProcessorFactory"/>
<processor class="solr.ParseDateFieldUpdateProcessorFactory">
<arr name="format">
<str>yyyy-MM-dd'T'HH:mm:ss.SSSZ</str>
<str>yyyy-MM-dd'T'HH:mm:ss,SSSZ</str>
<str>yyyy-MM-dd'T'HH:mm:ss.SSS</str>
<str>yyyy-MM-dd'T'HH:mm:ss,SSS</str>
<str>yyyy-MM-dd'T'HH:mm:ssZ</str>
<str>yyyy-MM-dd'T'HH:mm:ss</str>
<str>yyyy-MM-dd'T'HH:mmZ</str>
<str>yyyy-MM-dd'T'HH:mm</str>
<str>yyyy-MM-dd HH:mm:ss.SSSZ</str>
<str>yyyy-MM-dd HH:mm:ss,SSSZ</str>
<str>yyyy-MM-dd HH:mm:ss.SSS</str>
<str>yyyy-MM-dd HH:mm:ss,SSS</str>
<str>yyyy-MM-dd HH:mm:ssZ</str>
<str>yyyy-MM-dd HH:mm:ss</str>
<str>yyyy-MM-dd HH:mmZ</str>
<str>yyyy-MM-dd HH:mm</str>
<str>yyyy-MM-dd</str>
</arr>
</processor>
<processor class="solr.AddSchemaFieldsUpdateProcessorFactory">
<str name="defaultFieldType">strings</str>
<lst name="typeMapping">
<str name="valueClass">java.lang.Boolean</str>
<str name="fieldType">booleans</str>
</lst>
<lst name="typeMapping">
<str name="valueClass">java.util.Date</str>
<str name="fieldType">tdates</str>
</lst>
<lst name="typeMapping">
<str name="valueClass">java.lang.Long</str>
<str name="valueClass">java.lang.Integer</str>
<str name="fieldType">tlongs</str>
</lst>
<lst name="typeMapping">
<str name="valueClass">java.lang.Number</str>
<str name="fieldType">tdoubles</str>
</lst>
</processor>
<processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

The update processor factories used in the chain above:

UUIDUpdateProcessorFactory
RemoveBlankFieldUpdateProcessorFactory
FieldNameMutatingUpdateProcessorFactory
ParseBooleanFieldUpdateProcessorFactory
ParseLongFieldUpdateProcessorFactory
ParseDoubleFieldUpdateProcessorFactory
ParseDateFieldUpdateProcessorFactory
AddSchemaFieldsUpdateProcessorFactory

Make the UpdateRequestProcessorChain the Default for the UpdateRequestHandler

Configure this in solrconfig.xml so that the update processor chain defined above is used by the UpdateRequestHandlers:

<initParams path="/update/**">
<lst name="defaults">
<str name="update.chain">add-unknown-fields-to-the-schema</str>
</lst>
</initParams>

Examples of Indexed Documents
Solr ships with an official configset that implements this:

solr6\server\solr\configsets\data_driven_schema_configs\conf

curl "http://localhost:8983/solr/gettingstarted/update?commit=true" -H "Content-type:application/csv" -d '
id,Artist,Album,Released,Rating,FromDistributor,Sold
44C,Old Shews,Mead for Walking,1988-08-13,0.01,14,0'

The fields in the schema afterwards, as returned by the schema/fields request shown earlier:

{
"responseHeader":{
"status":0,
"QTime":1},
"fields":[{
"name":"Album",
"type":"strings"}, // Field value guessed as String -> strings fieldType
{
"name":"Artist",
"type":"strings"}, // Field value guessed as String -> strings fieldType
{
"name":"FromDistributor",
"type":"tlongs"}, // Field value guessed as Long -> tlongs fieldType
{
"name":"Rating",
"type":"tdoubles"}, // Field value guessed as Double -> tdoubles fieldType
{
"name":"Released",
"type":"tdates"}, // Field value guessed as Date -> tdates fieldType
{
"name":"Sold",
"type":"tlongs"}, // Field value guessed as Long -> tlongs fieldType
{
"name":"_text_",
...
},
{
"name":"_version_",
...
},
{
"name":"id",
...
}]}

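This guesses field types and adds fields dynamically. In practice we usually want tighter control over the schema, so this feature is not very important.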
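
One way to keep that control, sketched here under assumptions rather than taken from the notes, is to declare the important fields explicitly so that their types are never guessed. The field names come from the CSV example above. The single-valued type names tdate, tdouble, and tlong are assumed to exist in the configset's schema.

```xml
<!-- Declared up front, so the add-unknown-fields chain leaves these fields alone -->
<field name="Released" type="tdate" indexed="true" stored="true"/>
<field name="Rating" type="tdouble" indexed="true" stored="true"/>
<field name="Sold" type="tlong" indexed="true" stored="true"/>
```

Fields that already exist in the schema keep their declared type. Only unknown fields go through type guessing.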

That wraps up a quick read through Part Two. To recap the main points:

Documents, Fields, and Schema Design

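Solr Field Types: how field types are defined, which properties they have, and their defaults. A field type may also define tokenizers, filters, and so on (these should be covered in detail later).
Defining Fields: how to define fields and what their properties mean.
Copying Fields: how to define copy fields and how to use them.
Dynamic Fields: what dynamic fields are for, how to define them, and their properties.
Schema API: manipulating the schema through the HTTP REST API.
Other Schema Elements: the remaining elements, namely the unique key, default operator, default search field, and the scoring-related class configuration.
Putting the Pieces Together: combining everything into a schema file.
DocValues: what docValues are, how they work, and when to use them.
Schemaless Mode: adding fields dynamically, with types guessed by the configured update request processor chain.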
