
ES 2.0 Cluster Operations Command Reference

2016-07-14 10:36

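_cat commands

The _cat APIs report the current state of the cluster at three levels: shard, node, and cluster.

Common parameters

verbose: print the column headers. Request parameter: v

Example: curl 'localhost:9200/_cat/master?v'

help: describe each column of the command. Request parameter: help. Some columns are hidden by default; help lists every column the command can output.

Example: curl 'localhost:9200/_cat/master?help'

bytes: print size columns as raw numbers. Sizes such as the on-disk store size are shown in kb/mb/gb by default; with bytes=b they are printed as plain byte counts.

Example: curl 'localhost:9200/_cat/indices?bytes=b'

header: show only the listed columns. Request parameter: h

Example: curl 'localhost:9200/_cat/indices?h=i,tm' (shows the memory used by each index in the cluster)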
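The parameters above can be combined in one request. The following is a minimal sketch of a URL builder, not part of ES itself: the function name `cat_url` and its arguments are made up for illustration, and it assumes the node accepts `v=true` as equivalent to a bare `?v`.

```python
from urllib.parse import urlencode


def cat_url(endpoint, host="localhost:9200", verbose=False,
            columns=None, bytes_unit=None):
    """Build a _cat URL from the common parameters (hypothetical helper)."""
    params = {}
    if verbose:
        params["v"] = "true"             # assumed equivalent to a bare ?v
    if columns:
        params["h"] = ",".join(columns)  # h takes a comma-separated column list
    if bytes_unit:
        params["bytes"] = bytes_unit     # e.g. "b" for plain byte counts
    url = "http://{}/_cat/{}".format(host, endpoint)
    if params:
        # keep commas literal so the h list stays readable
        url += "?" + urlencode(params, safe=",")
    return url


print(cat_url("indices", verbose=True, columns=["i", "tm"]))
```

The resulting string can be passed to curl or to any HTTP client.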

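Segment details (/_cat/segments)

Shows the segments of every index: segment name, owning shard, memory and disk footprint, whether the segment has been committed to disk, whether it has been merged into a compound file, and so on. To limit the output to one index, use /_cat/segments/${index}. Example: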

> curl "localhost:9200/_cat/segments/idx1?v"
index shard prirep ip        segment generation docs.count docs.deleted  size size.memory committed searchable version compound
idx1  0     p      127.0.0.1 _a              10         17            0 3.7kb        2764 true      true       5.2.1   false
idx1  0     p      127.0.0.1 _b              11          2            0 2.9kb        2764 true      true       5.2.1   true
idx1  0     p      127.0.0.1 _c              12          2            0 2.9kb        2764 true      true       5.2.1   true
idx1  0     r      127.0.0.1 _a              10         16            0 3.6kb        2764 true      true       5.2.1   false
idx1  0     r      127.0.0.1 _b              11          3            0 2.9kb        2764 true      true       5.2.1   true
idx1  0     r      127.0.0.1 _c              12          2            0 2.9kb        2764 true      true       5.2.1   true
idx1  1     p      127.0.0.1 _a              10         17            0 3.7kb        2764 true      true       5.2.1   false
idx1  1     p      127.0.0.1 _b              11          2            0 2.9kb        2764 true      true       5.2.1   true
idx1  1     p      127.0.0.1 _c              12          2            0 2.9kb        2764 true      true       5.2.1   true
idx1  1     r      127.0.0.1 _a              10         16            0 3.6kb        2764 true      true       5.2.1   false
idx1  1     r      127.0.0.1 _b              11          3            0 2.9kb        2764 true      true       5.2.1   true
idx1  1     r      127.0.0.1 _c              12          2            0 2.9kb        2764 true      true       5.2.1   true
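A common follow-up is counting segments per shard copy, for example to judge whether a merge is overdue. The sketch below parses output captured with `?v`; the helper names are hypothetical, and the whitespace split assumes no column value contains spaces, which holds for the columns shown here. `SAMPLE` is a shortened copy of the output above.

```python
from collections import Counter

# Shortened copy of the /_cat/segments/idx1?v output shown above.
SAMPLE = """\
index shard prirep ip        segment generation docs.count docs.deleted  size size.memory committed searchable version compound
idx1  0     p      127.0.0.1 _a              10         17            0 3.7kb        2764 true      true       5.2.1   false
idx1  0     p      127.0.0.1 _b              11          2            0 2.9kb        2764 true      true       5.2.1   true
idx1  0     r      127.0.0.1 _a              10         16            0 3.6kb        2764 true      true       5.2.1   false
"""


def parse_cat_table(text):
    """Turn header-prefixed _cat output into a list of dicts, one per row."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    return [dict(zip(header, ln.split())) for ln in lines[1:]]


def segments_per_shard(text):
    """Count segments for each (index, shard, primary-or-replica) copy."""
    counts = Counter()
    for row in parse_cat_table(text):
        counts[(row["index"], row["shard"], row["prirep"])] += 1
    return dict(counts)


print(segments_per_shard(SAMPLE))
```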

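Index details (/_cat/indices)

Shows every index in the cluster: health and status, number of primary shards and replicas, document count, and so on (see help for the full column list). To limit the output to one index, use /_cat/indices/${index}. Example: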

> curl localhost:9200/_cat/indices?v
health status index    pri rep docs.count docs.deleted store.size pri.store.size
green  open   idx2       5   1        100            0     92.5kb         32.6kb
green  open   idx1       5   1        100            0     97.7kb         51.6kb
green  open   customer   5   1          0            0      1.5kb           780b
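When the output was captured without bytes=b, the human-readable sizes have to be converted before they can be summed or compared. A minimal sketch, assuming the suffixes are 1024-based and limited to b/kb/mb/gb/tb (the function name is hypothetical):

```python
# Longer suffixes first, so "kb" is matched before the bare "b".
UNITS = [("kb", 1024), ("mb", 1024 ** 2), ("gb", 1024 ** 3),
         ("tb", 1024 ** 4), ("b", 1)]


def size_to_bytes(text):
    """Convert a size such as '92.5kb' to an integer byte count."""
    text = text.strip().lower()
    for suffix, factor in UNITS:
        if text.endswith(suffix):
            return int(float(text[:-len(suffix)]) * factor)
    return int(text)  # no suffix: already a plain byte count


for size in ("92.5kb", "1.5kb", "780b"):
    print(size, size_to_bytes(size))
```

The conversion loses precision because the human-readable form is already rounded; requesting bytes=b from the node avoids that.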

Alias details (/_cat/aliases)

Shows every alias in the cluster, including the index behind each alias and its routing settings. To limit the output to one alias, use /_cat/aliases/${alias}. Example:

> curl '192.168.56.10:9200/_cat/aliases?v'
alias  index filter routing.index routing.search
alias2 test1 *      -            -
alias4 test1 -      2            1,2
alias1 test1 -      -            -
alias3 test1 -      1            1

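Shard details (/_cat/shards)

Shows the details of every shard: where it is placed, its current state (for a shard that failed to allocate, the reason is included), document count, disk usage, and access statistics (such as the number of successful and failed get requests and the time they took). To limit the output to one index, use /_cat/shards/${index}. Example: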

> curl "localhost:9200/_cat/shards/idx1?v"
index shard prirep state   docs  store ip        node
idx1  1     p      STARTED   21  9.8kb 127.0.0.1 node-1
idx1  1     r      STARTED   21  9.8kb 127.0.0.1 node-2
idx1  3     p      STARTED   18 12.4kb 127.0.0.1 node-1
idx1  3     r      STARTED   18 12.4kb 127.0.0.1 node-2
idx1  4     p      STARTED   23  9.9kb 127.0.0.1 node-1
idx1  4     r      STARTED   23  9.9kb 127.0.0.1 node-2
idx1  2     p      STARTED   17  9.5kb 127.0.0.1 node-1
idx1  2     r      STARTED   17  3.9kb 127.0.0.1 node-2
idx1  0     p      STARTED   21  9.8kb 127.0.0.1 node-1
idx1  0     r      STARTED   21  9.8kb 127.0.0.1 node-2

For a RELOCATING shard, the output shows both the source node and the target node. Example from the official documentation:

> curl 192.168.56.10:9200/_cat/shards | fgrep RELO
wiki1 0 r RELOCATING 3014 31.1mb 192.168.56.20 Commander Kraken -> 192.168.56.30 Frankie Raye
wiki1 1 r RELOCATING 3013 29.6mb 192.168.56.10 Stiletto -> 192.168.56.30 Frankie Raye
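Node names may contain spaces, as in the example above, so a plain whitespace split is not enough for these lines. The sketch below splits on the " -> " separator first; the function name and the returned keys are hypothetical, and the field order is taken from the default /_cat/shards columns shown above.

```python
def parse_relocation(line):
    """Split one RELOCATING line from /_cat/shards into source and target."""
    source, target = line.strip().split(" -> ")
    # The first seven fields never contain spaces; the rest is the node name.
    index, shard, prirep, state, docs, store, src_ip, src_node = source.split(None, 7)
    dst_ip, dst_node = target.split(None, 1)
    return {
        "index": index,
        "shard": int(shard),
        "prirep": prirep,
        "from": (src_ip, src_node),
        "to": (dst_ip, dst_node),
    }


line = "wiki1 0 r RELOCATING 3014 31.1mb 192.168.56.20 Commander Kraken -> 192.168.56.30 Frankie Raye"
print(parse_relocation(line))
```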

Per-node allocation (/_cat/allocation)

Shows an overview of shard allocation on each node. Example:

> curl localhost:9200/_cat/allocation?v

shards disk.used disk.avail disk.total disk.percent host      ip        node
     5    20.3gb    302.1gb    322.5gb            6 127.0.0.1 127.0.0.1 node-1
     5    20.3gb    302.1gb    322.5gb            6 127.0.0.1 127.0.0.1 node-2

Note: disk.used is the node's total disk usage, not just the space taken by shards.
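The disk.percent column lends itself to a simple capacity check. A minimal sketch, assuming output captured with `?v` and node names without spaces; the helper names and the threshold values are illustrative only.

```python
# Copy of the /_cat/allocation?v output shown above.
SAMPLE = """\
shards disk.used disk.avail disk.total disk.percent host      ip        node
5    20.3gb    302.1gb    322.5gb            6 127.0.0.1 127.0.0.1 node-1
5    20.3gb    302.1gb    322.5gb            6 127.0.0.1 127.0.0.1 node-2
"""


def parse_cat_table(text):
    """Turn header-prefixed _cat output into a list of dicts, one per row."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    return [dict(zip(header, ln.split())) for ln in lines[1:]]


def nodes_over(text, threshold_percent):
    """Return the nodes whose disk.percent is at or above the threshold."""
    return [row["node"] for row in parse_cat_table(text)
            if int(row["disk.percent"]) >= threshold_percent]


print(nodes_over(SAMPLE, 85))
```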

Custom node attributes (/_cat/nodeattrs)

Shows the custom attributes of each node. Example:

> curl 192.168.56.10:9200/_cat/nodeattrs

node       host    ip          attr  value
Black Bolt epsilon 192.168.1.8 rack  rack314
Black Bolt epsilon 192.168.1.8 azone us-east-1

Cluster health (/_cat/health)

Shows the current state of the cluster, including basics such as the number of data nodes and primary shards. Example:

> curl localhost:9200/_cat/health?v
epoch      timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1468399080 16:38:00  test-es green           2         2     30  15    0    0        0             0                  -                100.0%

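A status of green means the cluster is healthy; yellow means every primary shard is allocated but some replicas are not; red means some primary shards are unallocated.

Note: this command is useful for tracking recovery after a node failure. Example from the official documentation: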

> while true; do curl 192.168.56.10:9200/_cat/health; sleep 120; done
1384309446 18:24:06 foo red 3 3 20 20 0 0 1812 0
1384309566 18:26:06 foo yellow 3 3 950 916 0 12 870 0
1384309686 18:28:06 foo yellow 3 3 1328 916 0 12 492 0
1384309806 18:30:06 foo green 3 3 1832 916 4 0 0
^C
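The shell loop above runs until it is interrupted. A variant that stops once the cluster reaches a wanted status could look like the sketch below. `fetch_health` is a hypothetical callable that returns one line of /_cat/health output without the header; it is injected so the loop can be tried without a cluster, and the status is assumed to be the fourth field as in the output above.

```python
import time


def wait_for_status(fetch_health, wanted="green", interval=120,
                    max_polls=30, sleep=time.sleep):
    """Poll until the status column equals `wanted`.

    Returns the number of polls used, or None if max_polls was exhausted.
    """
    for attempt in range(1, max_polls + 1):
        fields = fetch_health().split()
        status = fields[3]  # epoch, timestamp, cluster, status, ...
        if status == wanted:
            return attempt
        sleep(interval)
    return None


# Replay the four lines of the official example instead of calling a cluster.
lines = iter([
    "1384309446 18:24:06 foo red 3 3 20 20 0 0 1812 0",
    "1384309566 18:26:06 foo yellow 3 3 950 916 0 12 870 0",
    "1384309686 18:28:06 foo yellow 3 3 1328 916 0 12 492 0",
    "1384309806 18:30:06 foo green 3 3 1832 916 4 0 0",
])
print(wait_for_status(lambda: next(lines), sleep=lambda seconds: None))
```

Against a real node, `fetch_health` would wrap an HTTP GET of /_cat/health and return the decoded response body.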

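Node status (/_cat/nodes)

Shows the current state of every node in the cluster: physical details (OS/JDK version, uptime, current memory/disk/file-descriptor usage, and so on) and request statistics (such as the number of successful and failed search/index operations). Example: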

>  curl "localhost:9200/_cat/nodes?v"
host      ip        heap.percent ram.percent load node.role master name
127.0.0.1 127.0.0.1            5          93 0.26 d         m      node-2
127.0.0.1 127.0.0.1            9          93 0.26 d         *      node-1

Master node (/_cat/master)

Shows the cluster's master node. Example:

> curl localhost:9200/_cat/master?v

id                     host      ip        node
i-SDbdpAQIaPv0J9SIoAvA 127.0.0.1 127.0.0.1 node-1

Fielddata memory usage (/_cat/fielddata)

Shows the fielddata memory used on each node of the cluster. Example:

> curl '192.168.56.10:9200/_cat/fielddata?v'
id                     host    ip            node          total   body    text
c223lARiSGeezlbrcugAYQ myhost1 10.20.100.200 Jessica Jones 385.6kb 159.8kb 225.7kb
waPCbitNQaCL6xC8VxjAwg myhost2 10.20.100.201 Adversary     435.2kb 159.8kb 275.3kb
yaDkp-G3R0q1AJ-HUEvkSQ myhost3 10.20.100.202 Microchip     284.6kb 109.2kb 175.3kb

The total column is the memory that fielddata occupies on that node.

Document count (/_cat/count)

Shows the number of documents in the cluster. To show the count for one index, use /_cat/count/${index}. Example:

> curl localhost:9200/_cat/indices?v
health status index    pri rep docs.count docs.deleted store.size pri.store.size
green  open   idx2       5   1        100            0     92.5kb         32.6kb
green  open   idx1       5   1        100            0     97.7kb         51.6kb
green  open   customer   5   1          0            0      1.5kb           780b
> curl localhost:9200/_cat/count?v
epoch      timestamp count
1468399423 16:43:43  200

> curl localhost:9200/_cat/count/idx1?v
epoch      timestamp count
1468399428 16:43:48  100

> curl localhost:9200/_cat/count/idx2?v
epoch      timestamp count
1468399430 16:43:50  100

The first two columns are the time the command ran; the third column, count, is the document count.
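The cluster-wide count in the example matches the sum of the docs.count column from /_cat/indices. The sketch below reproduces that check on the sample output above; the helper names are hypothetical and the parsing assumes output captured with `?v`.

```python
# Copy of the /_cat/indices?v output shown above.
INDICES = """\
health status index    pri rep docs.count docs.deleted store.size pri.store.size
green  open   idx2       5   1        100            0     92.5kb         32.6kb
green  open   idx1       5   1        100            0     97.7kb         51.6kb
green  open   customer   5   1          0            0      1.5kb           780b
"""


def parse_cat_table(text):
    """Turn header-prefixed _cat output into a list of dicts, one per row."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    return [dict(zip(header, ln.split())) for ln in lines[1:]]


def total_docs(indices_text):
    """Sum docs.count over all indices listed by /_cat/indices."""
    return sum(int(row["docs.count"]) for row in parse_cat_table(indices_text))


print(total_docs(INDICES))
```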

Pending tasks (/_cat/pending_tasks)

Shows the cluster's pending tasks. Example:

> curl 'localhost:9200/_cat/pending_tasks?v'
insertOrder timeInQueue priority source
1685       855ms HIGH     update-mapping [foo][t]
1686       843ms HIGH     update-mapping [foo][t]
1693       753ms HIGH     refresh-mapping [foo][[t]]
1688       816ms HIGH     update-mapping [foo][t]
1689       802ms HIGH     update-mapping [foo][t]
1690       787ms HIGH     update-mapping [foo][t]
1691       773ms HIGH     update-mapping [foo][t]
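To spot tasks that have been queued for too long, the timeInQueue column can be converted to milliseconds and filtered. A minimal sketch: the helper names are hypothetical, and only the ms/s/m/h suffixes are handled, which is an assumption about the values that appear in practice. The source column contains spaces, so the split is limited to the first three separators.

```python
# Copy of the /_cat/pending_tasks?v output shown above.
SAMPLE = """\
insertOrder timeInQueue priority source
1685       855ms HIGH     update-mapping [foo][t]
1686       843ms HIGH     update-mapping [foo][t]
1693       753ms HIGH     refresh-mapping [foo][[t]]
1688       816ms HIGH     update-mapping [foo][t]
1689       802ms HIGH     update-mapping [foo][t]
1690       787ms HIGH     update-mapping [foo][t]
1691       773ms HIGH     update-mapping [foo][t]
"""


def duration_ms(text):
    """Convert a duration such as '855ms' or '1.2s' to milliseconds."""
    # "ms" must be tested before "s" and "m".
    for suffix, factor in (("ms", 1), ("s", 1000), ("m", 60000), ("h", 3600000)):
        if text.endswith(suffix):
            return float(text[:-len(suffix)]) * factor
    raise ValueError("unsupported duration: " + text)


def slow_tasks(text, threshold_ms):
    """Return (insertOrder, source) for tasks queued longer than the threshold."""
    rows = [ln.split(None, 3) for ln in text.splitlines()[1:] if ln.strip()]
    return [(int(order), source) for order, waited, _priority, source in rows
            if duration_ms(waited) > threshold_ms]


for order, source in slow_tasks(SAMPLE, 800):
    print(order, source)
```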

Plugins per node (/_cat/plugins)

Shows the plugins installed on each node of the cluster. Example:

> curl "localhost:9200/_cat/plugins?v"
name   component version type url
node-2 head      master  s    /_plugin/head/
node-2 kopf      2.1.2   s    /_plugin/kopf/

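Recovery progress (/_cat/recovery)

Shows the recovery progress of every shard in the cluster. Changing the number of replicas, restoring a snapshot, and starting a node all trigger shard recovery.

Example 1 (recovery on node startup, from the official documentation):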

> curl -XGET 'localhost:9200/_cat/recovery?v'
index shard time type  stage source target files percent bytes    percent
wiki  0     73   store done  hostA  hostA  36    100.0%  24982806 100.0%
wiki  1     245  store done  hostA  hostA  33    100.0%  24501912 100.0%
wiki  2     230  store done  hostA  hostA  36    100.0%  30267222 100.0%

Example 2 (adding a replica, from the official documentation):

> curl -XPUT 'localhost:9200/wiki/_settings' -d'{"number_of_replicas":1}'
{"acknowledged":true}

> curl -XGET 'localhost:9200/_cat/recovery?v'
index shard time type    stage source target files percent bytes    percent
wiki  0     1252 store   done  hostA  hostA  4     100.0%  23638870 100.0%
wiki  0     1672 replica index hostA  hostB  4     75.0%   23638870 48.8%
wiki  1     1698 replica index hostA  hostB  4     75.0%   23348540 49.4%
wiki  1     4812 store   done  hostA  hostA  33    100.0%  24501912 100.0%
wiki  2     1689 replica index hostA  hostB  4     75.0%   28681851 40.2%
wiki  2     5317 store   done  hostA  hostA  36    100.0%  30267222 100.0%

Example 3 (restoring a snapshot, from the official documentation):

> curl -XPOST 'localhost:9200/_snapshot/imdb/snapshot_2/_restore'
{"acknowledged":true}

> curl -XGET 'localhost:9200/_cat/recovery?v'
index shard time type     stage repository snapshot files percent bytes percent
imdb  0     1978 snapshot done  imdb       snap_1   79    8.0%    12086 9.0%
imdb  1     2790 snapshot index imdb       snap_1   88    7.7%    11025 8.1%
imdb  2     2790 snapshot index imdb       snap_1   85    0.0%    12072 0.0%
imdb  3     2796 snapshot index imdb       snap_1   85    2.4%    12048 7.2%
imdb  4     819  snapshot init  imdb       snap_1   0     0.0%    0     0.0%
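While a recovery is running, usually only the rows whose stage is not done are of interest. The sketch below filters them from output captured with `?v`. The header repeats the name percent, so columns are located by position rather than loaded into a dict; the function name is hypothetical, and `SAMPLE` is the output of Example 2.

```python
# Copy of the /_cat/recovery?v output from Example 2 above.
SAMPLE = """\
index shard time type    stage source target files percent bytes    percent
wiki  0     1252 store done  hostA  hostA  4     100.0%  23638870 100.0%
wiki  0     1672 replica index hostA  hostB  4     75.0%   23638870 48.8%
wiki  1     1698 replica index hostA  hostB  4     75.0%   23348540 49.4%
wiki  1     4812 store done  hostA  hostA  33    100.0%  24501912 100.0%
wiki  2     1689 replica index hostA  hostB  4     75.0%   28681851 40.2%
wiki  2     5317 store done  hostA  hostA  36    100.0%  30267222 100.0%
"""


def unfinished_recoveries(text):
    """Return (index, shard, stage) for every row whose stage is not 'done'."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    # index, shard and stage are unique header names, unlike percent.
    i_index = header.index("index")
    i_shard = header.index("shard")
    i_stage = header.index("stage")
    return [(row[i_index], int(row[i_shard]), row[i_stage])
            for row in rows if row[i_stage] != "done"]


print(unfinished_recoveries(SAMPLE))
```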

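Thread pool statistics per node (/_cat/thread_pool)

Shows statistics for the different thread pools on each node, covering the thread pools behind all of ES's external requests. The metrics include the pool type, thread keep-alive time, active and maximum thread counts, queue size, and the number of current tasks. Example: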

> curl "localhost:9200/_cat/thread_pool?v"
host      ip        bulk.active bulk.queue bulk.rejected index.active index.queue index.rejected search.active search.queue search.rejected
127.0.0.1 127.0.0.1           0          0             0            0           0              0             0            0               0
127.0.0.1 127.0.0.1           0          0             0            0           0              0             0            0               0

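The local machine was serving no index/search/bulk requests at the time, so the active/queue/rejected metrics in the example are all 0.

The _nodes and _cluster commands will be added later.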
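On a loaded cluster the rejected columns are the ones to watch. A minimal sketch that reports every non-zero *.rejected value per host, assuming output captured with `?v`; the function name is hypothetical.

```python
# Copy of the /_cat/thread_pool?v output shown above.
SAMPLE = """\
host      ip        bulk.active bulk.queue bulk.rejected index.active index.queue index.rejected search.active search.queue search.rejected
127.0.0.1 127.0.0.1           0          0             0            0           0              0             0            0               0
127.0.0.1 127.0.0.1           0          0             0            0           0              0             0            0               0
"""


def rejections(text):
    """Return (host, column, count) for every non-zero *.rejected value."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    found = []
    for row in rows:
        record = dict(zip(header, row))
        for column, value in record.items():
            if column.endswith(".rejected") and int(value) > 0:
                found.append((record["host"], column, int(value)))
    return found


print(rejections(SAMPLE))
```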
