
Data Analysis Framework - ELK (Elasticsearch, Logstash, Kibana)

2017-12-22 21:47

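ELK is an open-source data analysis framework from Elastic. It is widely used today to build Log/Trace analysis platforms. The framework consists of three open-source projects: Elasticsearch, Logstash and Kibana. Together they form the common paradigm of a data analysis framework: Ingest Pipeline => Database & Indexing => Logical Frontend.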

Data analysis paradigm + data storage format

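1. Ingest Pipeline: converts raw data into the storage format the database expects.
2. Database & Indexing: stores data in a specific format and provides indexing and retrieval through an external interface.
3. Logical Frontend: sends retrieval requests according to application-specific logic and analyzes the data that comes back.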

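The data storage format is the key technical point of a database's storage function.

Traditional relational databases use a "table" model. The header (Column) fields are fixed when the table is created and are cumbersome to change afterwards, which has become increasingly inflexible for today's big-data applications. This gave rise to NoSQL databases that abandon tables altogether. MongoDB, for example, replaces the traditional table Record with JSON entries ({ xx : yy }).

Elasticsearch adopts the same "JSON document" approach as MongoDB. As the comparison below shows, Elasticsearch has no notion of a DB. Its top-level node, the Index, is directly analogous to a MySQL Table or a MongoDB Collection, and it stores formatted data, i.e. JSON entries.

Under Elasticsearch's scheme, every JSON record defines a Type and a Document ID, held in the _type and _id fields for later retrieval. The record data itself lives in the _source sub-field. (This is the 6.x-era model; mapping types were removed in later versions.)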

MySQL:         DB -> Table <=> Column & Record (TABLE-TYPE)
MongoDB:       DB -> Collection <=> { field : value } (JSON-TYPE)
Elasticsearch: Index <=> { _index : xx, _type : xx, _id : xx, _source : { field : value } } (JSON-TYPE)
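As a rough illustration of the Elasticsearch line in the comparison above, the following Python sketch turns one relational row (column names plus a record) into a document with the `_index`, `_type`, `_id` and `_source` layout.

The helper `row_to_es_document` is hypothetical and assumes the 6.x-era envelope. In practice you send only the `_source` part, and Elasticsearch adds the metadata fields itself. The values come from the `customer` index and `doc` type used in the curl examples later in this article.

```python
import json
from typing import Any, Dict, List


def row_to_es_document(index: str, doc_type: str, doc_id: Any,
                       columns: List[str], values: List[Any]) -> Dict[str, Any]:
    """Wrap one relational row as a 6.x-style Elasticsearch document (hypothetical helper)."""
    # Column & Record (TABLE-TYPE) -> { field : value } (JSON-TYPE)
    source = dict(zip(columns, values))
    return {
        "_index": index,      # roughly a MySQL Table / MongoDB Collection
        "_type": doc_type,    # mapping type (6.x era)
        "_id": str(doc_id),   # document ID used for later retrieval
        "_source": source,    # the record data itself
    }


doc = row_to_es_document("customer", "doc", 2,
                         ["name", "yesno", "goodbad"], ["John Doe", "yes", "bad"])
print(json.dumps(doc))
```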

ELK

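The user guides (UG) for Elastic products are detailed and very practical, and more than enough to show you how to do things. The software is written in cross-platform languages: Elasticsearch and Logstash run on the JVM, and Kibana runs on a bundled Node.js runtime. There is therefore nothing to install. Set up a Java environment, then download and unpack each package and run the program.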

https://www.elastic.co/guide/en/logstash/current/index.html

https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html

https://www.elastic.co/guide/en/kibana/current/index.html

1. Logstash

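Logstash is a data-processing pipeline tool. A pipeline has two required stages, Input and Output, and one optional intermediate stage, Filter. Logstash therefore processes data as Source -> Input -> Filter -> Output -> Destination.

The pipeline configuration can be passed directly on the command line with the "-e" flag, or written to a file that is passed with the "-f" flag. A sample configuration: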

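# Inputs: read a local file and listen for syslog messages on TCP and UDP port 13000
input {
  file { path => "/var/log/syslog" }
  tcp {
    port => 13000
    type => "syslog"
  }
  udp {
    port => 13000
    type => "syslog"
  }
}

# Filters: parse Apache combined-format lines, then use their timestamp as @timestamp
filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  date {
    match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}

# Outputs: print each event, append it to a file, and index it into Elasticsearch
output {
  stdout { codec => rubydebug }
  file {
    path => "/home/ce/file"
  }
  elasticsearch { hosts => ["localhost:9200"] }
}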
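The grok and date filters above can be approximated in plain Python to show what they do to a single event. This is only a sketch, under these assumptions:

- `COMBINED_RE` is a simplified, hypothetical stand-in for grok's more permissive COMBINEDAPACHELOG pattern.
- The field names follow the legacy (non-ECS) version of that pattern.
- The Joda format `dd/MMM/yyyy:HH:mm:ss Z` has been translated by hand into the equivalent `strptime` format.

```python
import re
from datetime import datetime, timezone

# Hypothetical, simplified stand-in for %{COMBINEDAPACHELOG}; the real grok pattern is more lenient.
COMBINED_RE = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>[A-Z]+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def apply_filters(message: str) -> dict:
    """Roughly mimic filter { grok { ... } date { ... } } for one event."""
    event = {"message": message}
    match = COMBINED_RE.match(message)
    if match is None:
        # Logstash tags events that grok cannot parse with _grokparsefailure.
        event["tags"] = ["_grokparsefailure"]
        return event
    event.update(match.groupdict())
    # date filter: Joda "dd/MMM/yyyy:HH:mm:ss Z" corresponds to strptime "%d/%b/%Y:%H:%M:%S %z".
    parsed = datetime.strptime(event["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
    # Logstash keeps @timestamp in UTC.
    event["@timestamp"] = parsed.astimezone(timezone.utc).isoformat()
    return event


line = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
        '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
        '"http://www.example.com/start.html" "Mozilla/4.08"')
event = apply_filters(line)
print(event["clientip"], event["response"], event["@timestamp"])
```

The date filter's job is to replace the ingestion time with the time recorded in the log line itself. This matters later in Kibana, where data is selected by time range.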

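Every plugin inside the input, filter and output sections of the configuration is optional. The simplest possible invocation reads events from standard input and prints them to standard output:

bin/logstash -e 'input { stdin { } } output { stdout { } }'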
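The Input -> Filter -> Output flow can be modeled as a chain of Python callables. This is a conceptual sketch, not how Logstash is implemented. Real Logstash runs filters and outputs in worker threads over batches of events, and it adds fields such as `@timestamp` to every event. `stdin_input` and `run_pipeline` are hypothetical names that mirror the minimal `stdin`/`stdout` command above.

```python
import io
from typing import Callable, Dict, Iterable, Iterator, List

Event = Dict[str, str]


def stdin_input(stream: Iterable[str]) -> Iterator[Event]:
    """Input stage: like stdin { }, each line becomes an event with a "message" field."""
    for line in stream:
        yield {"message": line.rstrip("\n")}


def run_pipeline(events: Iterable[Event],
                 filters: List[Callable[[Event], Event]],
                 output: Callable[[Event], None]) -> None:
    """Source -> Input -> Filter -> Output -> Destination, one event at a time."""
    for event in events:
        for apply_filter in filters:  # the filter stage is optional, so this list may be empty
            event = apply_filter(event)
        output(event)


collected: List[Event] = []
run_pipeline(stdin_input(io.StringIO("hello\nworld\n")), [], collected.append)
print(collected)
```

Passing the output as a callable keeps the stages independent. Swapping `collected.append` for `print` gives the stdout behaviour, and adding a function to `filters` plays the role of a filter plugin.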

2. Elasticsearch

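Elasticsearch is the core of the framework, responsible for formatted data storage and indexed search. As a database it resembles MongoDB: an Index holds a number of JSON entries. Its search is built on the open-source Lucene library and exposed through a RESTful interface that exchanges JSON.

After downloading and unpacking the Elasticsearch package, run bin/elasticsearch to start it. Elasticsearch makes heavy use of memory-mapped files, file descriptors and threads, so a few kernel parameters and user limits must be raised. Its bootstrap checks refuse to start a node that binds to a non-loopback address without them: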

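# apply immediately (as root)
sysctl -w vm.max_map_count=262144
# /etc/sysctl.conf: persist across reboots
vm.max_map_count=262144
# /etc/security/limits.conf: per-user limits, applied at the next login
* soft nofile 65536
* hard nofile 131072
* soft nproc 4096
* hard nproc 4096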
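Before starting the node, you could check these settings with a small script. The sketch below is Linux-only because it reads `/proc`. The thresholds are assumptions taken to be the Elasticsearch 6.x bootstrap-check minimums (262144 memory-map areas, 65535 file descriptors, 4096 threads); check the bootstrap-checks page for your version. `check_limits` and `read_current_limits` are hypothetical helper names.

```python
import resource
from pathlib import Path
from typing import List, Tuple

# Assumed Elasticsearch 6.x bootstrap-check minimums.
MIN_MAX_MAP_COUNT = 262144
MIN_NOFILE = 65535
MIN_NPROC = 4096


def _meets(value: int, minimum: int) -> bool:
    # getrlimit reports "unlimited" as resource.RLIM_INFINITY.
    return value == resource.RLIM_INFINITY or value >= minimum


def check_limits(max_map_count: int, nofile_soft: int, nproc_soft: int) -> List[str]:
    """Return a list of problems; an empty list means all three values are high enough."""
    problems = []
    if max_map_count < MIN_MAX_MAP_COUNT:
        problems.append(f"vm.max_map_count {max_map_count} < {MIN_MAX_MAP_COUNT}")
    if not _meets(nofile_soft, MIN_NOFILE):
        problems.append(f"nofile {nofile_soft} < {MIN_NOFILE}")
    if not _meets(nproc_soft, MIN_NPROC):
        problems.append(f"nproc {nproc_soft} < {MIN_NPROC}")
    return problems


def read_current_limits() -> Tuple[int, int, int]:
    """Read the current values for this process's user (Linux only)."""
    max_map_count = int(Path("/proc/sys/vm/max_map_count").read_text())
    nofile_soft, _ = resource.getrlimit(resource.RLIMIT_NOFILE)
    nproc_soft, _ = resource.getrlimit(resource.RLIMIT_NPROC)
    return max_map_count, nofile_soft, nproc_soft
```

Run it as the user that will start Elasticsearch, for example `print(check_limits(*read_current_limits()))`. The limits.conf values apply per user and per login session.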

Once started, Elasticsearch listens on 127.0.0.1 port 9200 by default. To accept external traffic, set network.host: 0.0.0.0 in config/elasticsearch.yml. The RESTful interface is then reachable at http://<IP>:9200. Some common requests:

curl -XGET 'localhost:9200/_cat/health?v'

curl -XGET 'localhost:9200/_cat/indices?v'

#Data Add
curl -XPUT 'localhost:9200/customer?pretty'

curl -XPUT 'localhost:9200/customer/doc/1?pretty' -H 'Content-Type: application/json' -d'
{ "name": "John Doe" }
'

curl -XPOST 'localhost:9200/customer/doc/_bulk?pretty' -H 'Content-Type: application/json' -d'
{"index":{"_id":"2"}}
{"name": "John Doe", "yesno": "yes", "goodbad": "bad"}
{"index":{"_id":"3"}}
{"name": "Jane Doe" ,"yesno": "yes", "goodbad": "good"}
{"index":{"_id":"4"}}
{"name": "Tom Doe" ,"yesno": "yesd", "goodbad": "goodd"}
'

curl -H 'Content-Type: application/x-ndjson' -XPOST 'localhost:9200/bank/account/_bulk?pretty' --data-binary @accounts.json
###accounts.json###
{"index":{"_id":"1"}}
{"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"}

curl -H 'Content-Type: application/x-ndjson' -XPOST 'localhost:9200/shakespeare/doc/_bulk?pretty' --data-binary @shakespeare_6.0.json
###shakespeare_6.0.json###
{"index":{"_index":"shakespeare","_id":0}}
{"type":"act","line_id":1,"play_name":"Henry IV", "speech_number":"","line_number":"","speaker":"","text_entry":"ACT I"}

curl -H 'Content-Type: application/x-ndjson' -XPOST 'localhost:9200/_bulk?pretty' --data-binary @logs.jsonl
###logs.jsonl###
{"index":{"_index":"logstash-2015.05.18","_type":"log"}}
{"@timestamp":"2015-05-18T09:03:25.877Z","ip":"185.124.182.126","extension":"gif","response":"404","@version":"1"}

#Data Delete
curl -XDELETE 'localhost:9200/customer?pretty'

#Data Get
curl -XGET 'localhost:9200/customer/doc/1?pretty'

#Data Search
curl -XGET 'localhost:9200/_search?pretty' -H 'Content-Type: application/json' -d'
{
"query": {
"match_all": {}
}
}
'

curl -XGET 'http://localhost:9200/_search?pretty' -H 'Content-Type: application/json' -d'
{
"query": {
"match": {"name":"Jane"}
}
}
'
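The same calls can be scripted. The following Python sketch uses only the standard library; `build_bulk_body`, `match_query` and `es_request` are hypothetical helper names. It builds the NDJSON body that `_bulk` expects: one action line and one source line per document, ending with a newline. It also shows how that body and a `match` search could be sent with `urllib`.

It assumes a 6.x node at localhost:9200 and the `doc` type used above. The search is sent with POST, which Elasticsearch accepts for `_search` as well as GET. `?refresh` makes the new documents searchable immediately, which is convenient for testing but not something to do on every bulk request.

```python
import json
import urllib.request
from typing import Dict, Iterable, Optional, Tuple

# Assumption: a local node on the default port, as in the curl examples above.
ES_URL = "http://localhost:9200"


def build_bulk_body(docs: Iterable[Tuple[str, Dict]]) -> str:
    """Build an NDJSON _bulk body: one action line, then one source line, per document."""
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        lines.append(json.dumps(source))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


def match_query(field: str, text: str) -> str:
    """Same search body as the curl "match" example."""
    return json.dumps({"query": {"match": {field: text}}})


def es_request(method: str, path: str, body: Optional[str] = None,
               content_type: str = "application/json") -> Dict:
    """Send one request to the REST API and decode the JSON reply (needs a running node)."""
    data = body.encode("utf-8") if body is not None else None
    req = urllib.request.Request(ES_URL + path, data=data, method=method)
    if data is not None:
        req.add_header("Content-Type", content_type)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


bulk_body = build_bulk_body([
    ("2", {"name": "John Doe", "yesno": "yes", "goodbad": "bad"}),
    ("3", {"name": "Jane Doe", "yesno": "yes", "goodbad": "good"}),
])
print(bulk_body, end="")
# Against a running node (not executed here):
# es_request("POST", "/customer/doc/_bulk?refresh", bulk_body, "application/x-ndjson")
# es_request("POST", "/customer/_search", match_query("name", "Jane"))
```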

3. Kibana

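Kibana is the web frontend of the architecture, used to search data and visualize the results. After downloading and unpacking the package, first edit the config/kibana.yml configuration file. It sets the address of the Elasticsearch RESTful interface and the IP that the Kibana web server listens on: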

server.host: "0.0.0.0"
# Kibana 6.x setting; Kibana 7.x and later use elasticsearch.hosts instead
elasticsearch.url: "http://localhost:9200"

Run bin/kibana to start the Kibana service. It listens on port 5601 by default, so open http://<ip>:5601 to reach the Kibana web UI.

a. Management - Index Patterns

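An Index Pattern matches the names of Indices in the Elasticsearch database. With the wildcard * a single pattern can match several Indices; for example, logstash-* matches logstash-2015.05.18. Once an Index Pattern has been created, the data in the matching Indices can be searched on the Discover page.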
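For patterns that use only `*`, Python's `fnmatch.fnmatchcase` is a reasonable approximation of which indices a pattern selects. This is an assumption for illustration, not Kibana's own matching code; fnmatch also gives `?` and `[...]` special meaning. The index list below is partly hypothetical (only `logstash-2015.05.18`, `shakespeare`, `bank` and `customer` appear in the examples above).

```python
from fnmatch import fnmatchcase
from typing import List


def match_index_pattern(pattern: str, indices: List[str]) -> List[str]:
    """Return the index names a pattern such as "logstash-*" would select (approximation)."""
    return [name for name in indices if fnmatchcase(name, pattern)]


indices = ["logstash-2015.05.18", "logstash-2015.05.19", "shakespeare", "bank", "customer"]
print(match_index_pattern("logstash-*", indices))
```

This is why date-suffixed index names such as logstash-YYYY.MM.DD work well: one pattern covers every day's index.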

b. Discover

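This page displays the data in the selected Index. Note the time range (TimeRange) picker in the top-right corner: if the page reports No Data Found, widen the time range. By default only the last 15 minutes are shown, so older data, such as the May 2015 sample logs loaded above, stays hidden until the range covers it.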

c. Visualize

This page builds charts (bar, line, pie and more) from the data in selected fields of an Index, and it is quite powerful.
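Kibana visualizations are built on Elasticsearch aggregations. The Python below is a sketch under these assumptions:

- Dynamic mapping gives string fields a `.keyword` sub-field that can be aggregated.
- `by_value` is an arbitrary aggregation name.
- `terms_agg_request` and `simulate_terms_agg` are hypothetical helpers.

It builds the kind of terms-aggregation body that a pie chart of `yesno` might send. It then computes locally what the buckets would contain for the three customer documents indexed earlier.

```python
import json
from collections import Counter
from typing import Dict, List


def terms_agg_request(field: str, size: int = 10) -> Dict:
    """Search body for a terms aggregation; "size": 0 skips the hits and returns only buckets."""
    return {"size": 0, "aggs": {"by_value": {"terms": {"field": field, "size": size}}}}


def simulate_terms_agg(docs: List[Dict], field: str, size: int = 10) -> List[Dict]:
    """Count field values locally; buckets ordered by doc_count desc (ties by key for determinism)."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered[:size]]


customers = [
    {"name": "John Doe", "yesno": "yes", "goodbad": "bad"},
    {"name": "Jane Doe", "yesno": "yes", "goodbad": "good"},
    {"name": "Tom Doe", "yesno": "yesd", "goodbad": "goodd"},
]
print(json.dumps(terms_agg_request("yesno.keyword")))
print(simulate_terms_agg(customers, "yesno"))
```

Each bucket becomes one slice or bar in the chart, which is why aggregations work on exact values (`keyword` fields) rather than analyzed text.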

d. Dashboard

This page combines the visualizations created in the previous step into a dashboard, each shown as an independent panel.

PS: Dev Tools - Console

This tool works like curl: it sends RESTful requests to Elasticsearch. It is mostly used for debugging and is more convenient than running curl directly.
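To see the correspondence between the two, a Console request such as `GET /customer/doc/1?pretty` (method and path, optionally followed by a JSON body) maps directly onto the curl style used earlier. The converter below is a hypothetical helper written only to illustrate that mapping. It assumes the node is at localhost:9200 and that the JSON body contains no single quotes, which would break the shell quoting.

```python
import json
from typing import Optional


def console_to_curl(request_line: str, body: Optional[dict] = None,
                    host: str = "localhost:9200") -> str:
    """Turn a Console-style "METHOD /path" request into an equivalent curl command."""
    method, path = request_line.split(maxsplit=1)
    path = path.lstrip("/")
    cmd = f"curl -X{method.upper()} '{host}/{path}'"
    if body is not None:
        # Assumes json.dumps(body) contains no single quotes.
        cmd += " -H 'Content-Type: application/json' -d'" + json.dumps(body) + "'"
    return cmd


print(console_to_curl("GET /customer/doc/1?pretty"))
print(console_to_curl("GET /_search", {"query": {"match": {"name": "Jane"}}}))
```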
