
[Unfinished] Learning Elasticsearch

2018-03-30 11:20, 246 views

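Getting started

Clusters

Clustering is transparent to clients

Cluster health has three states (green, yellow, red). Query it with: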

curl -XGET 'localhost:9200/_cluster/health?pretty'
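
The three states reflect shard allocation: green means every primary and replica shard is allocated, yellow means all primaries are allocated but some replicas are not, and red means at least one primary is missing. A minimal sketch of that relationship follows; the helper name and the input shape are assumptions made for illustration, not an Elasticsearch API.

```python
def cluster_health(shards):
    """Derive a health colour from a list of (kind, assigned) pairs.

    kind is "primary" or "replica"; assigned says whether the shard is
    currently allocated to a node. Hypothetical helper for illustration.
    """
    primaries_ok = all(assigned for kind, assigned in shards if kind == "primary")
    replicas_ok = all(assigned for kind, assigned in shards if kind == "replica")
    if not primaries_ok:
        return "red"      # at least one primary shard is unavailable
    if not replicas_ok:
        return "yellow"   # all primaries are up, but some replicas are not
    return "green"        # every primary and replica is allocated


# A single-node cluster cannot place replicas next to their primaries.
print(cluster_health([("primary", True), ("replica", False)]))
```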

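Indices

An index is where related data is stored

Shards

The low-level unit of work; each shard is, in essence, a complete search engine

A single instance of Lucene

A container for data: documents are stored in shards, and shards are allocated to the nodes in the cluster

As the cluster grows or shrinks, Elasticsearch automatically migrates shards between nodes so that data stays evenly distributed across the cluster

A shard is either a primary shard or a replica shard

Every document in an index belongs to exactly one primary shard, so the number of primary shards determines the maximum amount of data the index can hold.

A replica shard is just a copy of a primary shard

An index is a logical namespace that points to one or more physical shards

Example command that creates the blogs index with 3 primary shards and one replica of each: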

curl -XPUT 'localhost:9200/blogs?pretty' -H 'Content-Type: application/json' -d'
{
  "settings" : {
    "number_of_shards" : 3,
    "number_of_replicas" : 1
  }
}
'
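
As a quick check of what these settings imply: number_of_replicas counts copies per primary shard, so the total number of shards can be worked out as below. total_shards is a hypothetical helper, not part of any Elasticsearch client.

```python
def total_shards(number_of_shards, number_of_replicas):
    """Each primary shard plus its replica copies."""
    return number_of_shards * (1 + number_of_replicas)


# 3 primaries, each with 1 replica
print(total_shards(3, 1))  # 6
```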

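Data in, data out

Documents (storage)

Elasticsearch is a distributed document store (JSON)

In Elasticsearch, the term document has a specific meaning: it refers to the top-level, or root, object that is serialized into JSON and stored in Elasticsearch under a unique ID

Document metadata

_index: where the document is stored

_type: the class of object the document represents

_id: the unique identifier of the document

Other fields

_version: the version number

_source: the document content

Verbs

PUT

The PUT verb ("store this document at this URL")

POST

The POST verb ("store this document under this URL")

Indices and documents

Creating an index

Create the index website and store a document of type blog in it, choosing 123 as the ID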

curl -XPUT 'localhost:9200/website/blog/123?pretty' -H 'Content-Type: application/json' -d'
{
  "title": "My first blog entry",
  "text":  "Just trying this out...",
  "date":  "2014/01/01"
}
'

Create the index website and store a document of type blog in it, letting Elasticsearch generate the ID

curl -XPOST 'localhost:9200/website/blog/?pretty' -H 'Content-Type: application/json' -d'
{
  "title": "My second blog entry",
  "text":  "Still trying this out...",
  "date":  "2014/01/01"
}
'

The commands above both create the index (if it does not exist yet) and create a document in it

List all indices:

curl 'localhost:9200/_cat/indices?v'

Retrieving a document

Retrieve the document with ID 123 from the index website, type blog

curl -XGET 'localhost:9200/website/blog/123?pretty'

Retrieve part of a document, returning only the title and text fields

curl -XGET 'localhost:9200/website/blog/123?_source=title,text&pretty'

Return only the document content, without the metadata

curl -XGET 'localhost:9200/website/blog/123/_source?pretty'

Checking whether a document exists

Command

curl -i -XHEAD http://localhost:9200/website/blog/123

Judge by the status code: 200 OK if the document exists, 404 Not Found if it does not


Updating a document (this actually creates a new document whose version number is incremented by 1)

Update the document with ID 123 in the index website, type blog

curl -XPUT 'localhost:9200/website/blog/123?pretty' -H 'Content-Type: application/json' -d'
{
  "title": "My first blog entry",
  "text":  "I am starting to get the hang of this...",
  "date":  "2014/01/02"
}
'


In the response, _version is incremented by 1 and the created flag is set to false, because a document with the same index, type, and ID already exists.

{
  "_index" :   "website",
  "_type" :    "blog",
  "_id" :      "123",
  "_version" : 2,
  "created":   false
}
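
The versioning behaviour above can be modelled with a small in-memory sketch. This is not how Elasticsearch is implemented; the store dictionary and the index_doc function are illustrative names only.

```python
store = {}  # hypothetical stand-in for an index: (index, type, id) -> stored document


def index_doc(index, doc_type, doc_id, source):
    """Mimic PUT /index/type/id: replace the whole document and bump its version."""
    key = (index, doc_type, doc_id)
    previous = store.get(key)
    version = previous["_version"] + 1 if previous else 1
    store[key] = {"_version": version, "_source": dict(source)}
    return {"_index": index, "_type": doc_type, "_id": doc_id,
            "_version": version, "created": previous is None}


first = index_doc("website", "blog", "123", {"date": "2014/01/01"})
second = index_doc("website", "blog", "123", {"date": "2014/01/02"})
print(first["_version"], first["created"])    # 1 True
print(second["_version"], second["created"])  # 2 False
```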


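Updating with the POST method

The update API appears to modify the document in place, but Elasticsearch actually follows exactly the same process as described above

The only difference is that the update API does all of this within a single client request, instead of requiring separate get and index requests.

Creating a new document

The simplest way to make sure a new document is created is to use the POST form of the index request and let Elasticsearch generate a unique _id: POST /website/blog/

If we already have our own _id, we must tell Elasticsearch to accept the index request only when no document with the same _index, _type, and _id exists

The first way is the op_type query-string parameter: PUT /website/blog/123?op_type=create

The second way is to append /_create to the URL: PUT /website/blog/123/_create

With /_create, if a document with the same _index, _type, and _id already exists, Elasticsearch returns a 409 Conflict response code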
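
The create-only rule can be sketched as a check performed before writing. The Conflict exception and the create_doc function below are hypothetical stand-ins for the 409 response and the create request, not real client calls.

```python
class Conflict(Exception):
    """Stands in for the HTTP 409 Conflict response."""


def create_doc(store, index, doc_type, doc_id, source):
    """Store the document only if nothing exists under the same index, type, and ID."""
    key = (index, doc_type, doc_id)
    if key in store:
        raise Conflict(f"document {key} already exists")
    store[key] = {"_version": 1, "_source": dict(source)}
    return store[key]


docs = {}
create_doc(docs, "website", "blog", "123", {"title": "My first blog entry"})
try:
    create_doc(docs, "website", "blog", "123", {"title": "Another entry"})
except Conflict as err:
    print("409 Conflict:", err)
```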

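Deleting a document

Deleting a document does not remove it from disk immediately; it is only marked as deleted (and its version number is incremented by 1).

Handling conflicts

ES uses optimistic concurrency control

Optimistic concurrency control

Internal version numbers

The _version number is used to ensure that conflicting changes made by the application do not cause data loss

If the version given in the request is not the current version, the request fails

The update succeeds only when the version parameter equals the latest version of the document stored in ES

Command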

curl -XPUT 'localhost:9200/website/blog/1?version=2&pretty' -H 'Content-Type: application/json' -d'
{
  "title": "My first blog entry",
  "text":  "Starting to get the hang of this..."
}
'


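Success: 200 OK, and the response body shows _version incremented by one (to 3 when the request specified version=2)

Failure: a 409 Conflict HTTP response code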
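
The internal version check can be sketched as a compare-then-write step. The names below are illustrative assumptions; VersionConflict merely stands in for the 409 response.

```python
class VersionConflict(Exception):
    """Stands in for the HTTP 409 Conflict response."""


def index_if_version_matches(doc, new_source, version):
    """Write only if the caller saw the latest version, then bump the version."""
    if doc["_version"] != version:
        raise VersionConflict(
            f"current version {doc['_version']} differs from the one provided ({version})")
    doc["_source"] = dict(new_source)
    doc["_version"] += 1
    return doc["_version"]


doc = {"_version": 2, "_source": {"title": "My first blog entry"}}
print(index_if_version_matches(doc, {"title": "Edited once"}, version=2))  # 3
try:
    # A second writer still holding version 2 is now stale.
    index_if_version_matches(doc, {"title": "Edited by someone else"}, version=2)
except VersionConflict as err:
    print("409 Conflict:", err)
```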

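External version numbers

With external version numbers, ES does not check whether the current _version equals the version specified in the request; it checks whether the current _version is less than the specified version

Create a new blog post with an external version number of 5

curl -XPUT 'localhost:9200/website/blog/2?version=5&version_type=external&pretty' -H 'Content-Type: application/json' -d'
{
  "title": "My first external blog entry",
  "text":  "Starting to get the hang of this..."
}
'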
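
The external version rule differs from the internal one only in the comparison. A minimal sketch, with hypothetical names, where the accepted external version becomes the new _version of the document:

```python
class VersionConflict(Exception):
    """Stands in for the HTTP 409 Conflict response."""


def index_external(doc, new_source, external_version):
    """Accept the write only if the stored version is lower than the external one.

    doc is None when no document exists yet.
    """
    if doc is not None and doc["_version"] >= external_version:
        raise VersionConflict(
            f"current version {doc['_version']} is not lower than {external_version}")
    return {"_version": external_version, "_source": dict(new_source)}


post = index_external(None, {"title": "My first external blog entry"}, 5)
post = index_external(post, {"title": "Updated entry"}, 10)
try:
    index_external(post, {"title": "Replayed update"}, 10)  # not newer than 10
except VersionConflict as err:
    print("409 Conflict:", err)
```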


Partial updates to documents

The update steps: retrieve, modify, reindex

Add the fields tags and views to our blog post

curl -XPOST 'localhost:9200/website/blog/1/_update?pretty' -H 'Content-Type: application/json' -d'
{
  "doc" : {
    "tags" : [ "testing" ],
    "views": 0
  }
}
'
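
The retrieve, modify, reindex cycle can be sketched in memory as follows. The partial_update function is a hypothetical illustration, and it only does a shallow merge of top-level fields, which is a simplification.

```python
def partial_update(store, key, partial_doc):
    """Merge partial_doc into the stored document and save it as a new version."""
    current = store[key]                                  # 1. retrieve
    merged = {**current["_source"], **partial_doc}        # 2. modify (shallow merge)
    store[key] = {"_version": current["_version"] + 1,    # 3. reindex as a new version
                  "_source": merged}
    return store[key]


blog = {("website", "blog", "1"): {"_version": 3,
                                   "_source": {"title": "My first blog entry"}}}
updated = partial_update(blog, ("website", "blog", "1"),
                         {"tags": ["testing"], "views": 0})
print(updated)
```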


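Updating with a Groovy script

Update conflicts

If another process changes the document between the retrieve and reindex steps (so that the version number is incremented), the _version numbers no longer match and the update request fails

If it does not matter that the document has changed in the meantime, set the retry_on_conflict parameter to have the update retried (the default is 0)

Retry the update up to 5 times before returning a failure

curl -XPOST 'localhost:9200/website/pageviews/1/_update?retry_on_conflict=5&pretty' -H 'Content-Type: application/json' -d'
{
  "script" : "ctx._source.views+=1",
  "upsert": {
    "views": 0
  }
}
'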
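
The retry behaviour can be sketched as a loop around the retrieve, modify, reindex cycle. Everything here is a simulation under stated assumptions: the interfere hook is a hypothetical device that plays the role of another process writing between the retrieve and reindex steps.

```python
class VersionConflict(Exception):
    """Stands in for the HTTP 409 Conflict response."""


def increment_views(doc, retry_on_conflict=0, interfere=lambda doc: None):
    """Increment doc's views counter, retrying when the version changed underneath us."""
    for attempt in range(retry_on_conflict + 1):
        seen_version = doc["_version"]            # retrieve
        new_views = doc["_source"]["views"] + 1   # modify
        interfere(doc)                            # another writer may sneak in here
        if doc["_version"] == seen_version:       # reindex only if nothing changed
            doc["_source"]["views"] = new_views
            doc["_version"] += 1
            return attempt                        # number of retries that were needed
    raise VersionConflict(f"gave up after {retry_on_conflict} retries")


def make_interference(times):
    """Return a hook that modifies the document on its first `times` calls."""
    remaining = [times]

    def interfere(doc):
        if remaining[0] > 0:
            remaining[0] -= 1
            doc["_source"]["views"] += 1
            doc["_version"] += 1

    return interfere


doc = {"_version": 1, "_source": {"views": 0}}
retries = increment_views(doc, retry_on_conflict=5, interfere=make_interference(2))
print(retries, doc)
```

Because an increment does not depend on the order in which updates are applied, retrying is safe here; for changes where order matters, a retry could silently overwrite someone else's work.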


Retrieving multiple documents

Retrieve two documents identified by their metadata

curl -XGET 'localhost:9200/_mget?pretty' -H 'Content-Type: application/json' -d'
{
  "docs" : [
    {
      "_index" : "website",
      "_type" :  "blog",
      "_id" :    2
    },
    {
      "_index" : "website",
      "_type" :  "pageviews",
      "_id" :    1,
      "_source": "views"
    }
  ]
}
'


Cheaper in bulk

Just as mget lets us retrieve multiple documents at once, the bulk API lets us make multiple create, index, update, or delete requests in a single step

curl -XPOST 'localhost:9200/_bulk?pretty' -H 'Content-Type: application/json' -d'
{ "delete": { "_index": "website", "_type": "blog", "_id": "123" }}
{ "create": { "_index": "website", "_type": "blog", "_id": "123" }}
{ "title":    "My first blog post" }
{ "index":  { "_index": "website", "_type": "blog" }}
{ "title":    "My second blog post" }
{ "update": { "_index": "website", "_type": "blog", "_id": "123", "_retry_on_conflict" : 3} }
{ "doc" : {"title" : "My updated blog post"} }
'


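Each sub-request is executed independently, so the failure of one sub-request does not affect the success of the others

If any sub-request fails, the top-level errors flag is set to true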


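Bulk requests can also be issued against a specific index or type, for example /website/_bulk or /website/blog/_bulk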
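
The bulk body is newline-delimited JSON: one action line, optionally followed by one source line, with a newline after the last line. A minimal sketch of building such a body and of reading the errors flag follows; both helper names are assumptions, and sample_response is a hand-written, abridged example rather than real server output.

```python
import json


def build_bulk_body(actions):
    """actions is a list of (action_metadata, source_or_None) pairs."""
    lines = []
    for metadata, source in actions:
        lines.append(json.dumps(metadata))
        if source is not None:          # delete actions carry no source line
            lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"      # the final line must also end with a newline


def failed_items(response):
    """Return the items of a bulk response that carry an error."""
    if not response.get("errors"):
        return []
    return [item for item in response["items"]
            if any("error" in result for result in item.values())]


body = build_bulk_body([
    ({"delete": {"_index": "website", "_type": "blog", "_id": "123"}}, None),
    ({"index": {"_index": "website", "_type": "blog"}}, {"title": "My second blog post"}),
])
print(body, end="")

# Hand-written, abridged response used only to exercise failed_items.
sample_response = {
    "errors": True,
    "items": [
        {"delete": {"_id": "123", "status": 200}},
        {"create": {"_id": "124", "status": 409,
                    "error": {"reason": "document already exists"}}},
    ],
}
print(failed_items(sample_response))
```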

Searching documents

Using request parameters

curl 'localhost:9200/website/_search?q=*&pretty'

Using a request body

curl -XPOST 'localhost:9200/website/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query": { "match_all": {} }
}
'


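Understanding the internals

Distributed document store

Routing a document to a shard

When a document is created, Elasticsearch uses a formula to decide which shard it is stored in (shard 1 or shard 2, for example): shard = hash(routing) % number_of_primary_shards

routing is a variable value; it defaults to the document's _id but can be set to a custom value

number_of_primary_shards is the number of primary shards

The number of primary shards must be decided when the index is created and never changed afterwards: if it changed, every previously computed routing value would become invalid and documents could no longer be found

All document APIs (get, index, delete, bulk, update, and mget) accept a routing parameter

This parameter lets us customize the mapping of documents to shards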
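
The routing formula can be tried out with a short sketch. Note the loud assumption: zlib.crc32 is only a stand-in hash chosen because it ships with Python; Elasticsearch uses its own hash function (Murmur3), so the shard numbers below will not match a real cluster.

```python
import zlib


def pick_shard(routing, number_of_primary_shards):
    """shard = hash(routing) % number_of_primary_shards, with crc32 as a stand-in hash."""
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


# The same routing value always lands on the same shard.
print(pick_shard("123", 3), pick_shard("123", 3))

# Changing the number of primary shards moves many documents to a different
# shard, which is why the number is fixed when the index is created.
ids = [str(i) for i in range(1000)]
moved = sum(pick_shard(doc_id, 3) != pick_shard(doc_id, 4) for doc_id in ids)
print(f"{moved} of {len(ids)} documents would be looked up on the wrong shard")
```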


Tags: middleware, distributed, search engine
