09 JobManager High Availability (HA) Installation
2015-11-11
JobManager High Availability (HA)
The JobManager is the coordinator of each Flink deployment. It is responsible for both scheduling and resource management. By default, there is a single JobManager instance per Flink cluster. This creates a single point of failure (SPOF): if the JobManager crashes, no new programs can be submitted and running programs fail.
With JobManager High Availability, you can run multiple JobManager instances per Flink cluster and thereby circumvent the SPOF.
The general idea of JobManager high availability is that there is a single leading JobManager at any time and multiple standby JobManagers to
take over leadership in case the leader fails. This guarantees that there is no single point of failure and programs can make progress as soon as a standby JobManager has taken leadership. There is no explicit distinction
between standby and master JobManager instances. Each JobManager can take the role of master or standby.
As an example, consider a setup with three JobManager instances: one acts as the leader, and the other two are standbys ready to take over leadership if the leader fails.
Configuration
To enable JobManager High Availability you have to set the recovery mode to zookeeper, configure a ZooKeeper quorum, and set up a masters file with all JobManager hosts and their web UI ports.
Flink leverages ZooKeeper for distributed coordination between all running JobManager instances. ZooKeeper is a separate service from Flink, which provides highly reliable distributed coordination via leader election and light-weight consistent state storage. Check out ZooKeeper’s Getting Started Guide for more information about ZooKeeper.
Setting Flink’s recovery mode to zookeeper in conf/flink-conf.yaml enables high availability mode. Additionally, you have to configure a ZooKeeper quorum in the same configuration file. In high availability mode, all Flink components try to connect to a JobManager via coordination through ZooKeeper.
Recovery mode (required): The recovery mode has to be set to zookeeper in conf/flink-conf.yaml in order to enable high availability mode.
recovery.mode: zookeeper
ZooKeeper quorum (required): A ZooKeeper quorum is a replicated group of ZooKeeper servers, which provide the distributed coordination service.
recovery.zookeeper.quorum: address1:2181[,...],addressX:2181
Each addressX:port refers to a ZooKeeper server, which is reachable by Flink at the given address and port.
The following configuration keys are optional:
recovery.zookeeper.path.root: /flink [default]: ZooKeeper directory to use for coordination
TODO Add client configuration keys
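Putting the required and optional keys above together, the HA-related portion of conf/flink-conf.yaml might look like the following sketch (the hostnames are placeholders, not real addresses):

```yaml
# HA-related keys in conf/flink-conf.yaml (hostnames are placeholders)
recovery.mode: zookeeper
recovery.zookeeper.quorum: address1:2181,address2:2181,address3:2181
recovery.zookeeper.path.root: /flink   # optional, /flink is the default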
Starting an HA-cluster
In order to start an HA-cluster, configure the masters file in conf/masters:
masters file: The masters file contains all hosts, on which JobManagers are started, and the ports to which the web user interface binds.
jobManagerAddress1:webUIPort1
[...]
jobManagerAddressX:webUIPortX
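To illustrate the file format, a minimal sketch of how a script can split each masters line into host and web UI port at the colon (this is not the actual logic of the provided startup scripts; the file name masters.example and the echo are illustrative assumptions):

```shell
#!/bin/sh
# Sketch only: iterate over a masters-style file and split each line
# into the JobManager host and its web UI port at the colon.
# "masters.example" is a throwaway file created for this illustration.

cat > masters.example <<'EOF'
localhost:8081
localhost:8082
EOF

while IFS=: read -r host webuiport; do
    # A real startup script would launch a JobManager on ${host};
    # here we only print what would happen.
    echo "would start jobmanager on ${host} (web UI port ${webuiport})"
done < masters.example

rm -f masters.example
```

The provided scripts do the real work; the point here is only that each line carries exactly one host:port pair.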
After configuring the masters file and the ZooKeeper quorum, you can use the provided cluster startup scripts as usual. They will start an HA-cluster. Keep in mind that the ZooKeeper quorum has to be running when you call the scripts.
Running ZooKeeper
If you don’t have a running ZooKeeper installation, you can use the helper scripts, which ship with Flink. There is a ZooKeeper configuration template in conf/zoo.cfg. You can configure the hosts to run ZooKeeper on with the server.X entries, where X is a unique ID of each server:
server.X=addressX:peerPort:leaderPort
[...]
server.Y=addressY:peerPort:leaderPort
The script bin/start-zookeeper-quorum.sh will start a ZooKeeper server on each of the configured hosts. The started processes start ZooKeeper servers via a Flink wrapper, which reads the configuration from conf/zoo.cfg and makes sure to set some required configuration values for convenience. In production setups, it is recommended to manage your own ZooKeeper installation.
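For reference, a minimal conf/zoo.cfg for a three-node quorum might look like the sketch below. The quorum-related server.X lines follow the format described above; the remaining keys are standard ZooKeeper settings, and the hosts and dataDir are placeholders:

```
# Standard ZooKeeper settings (values are illustrative)
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/tmp/zookeeper
clientPort=2181
# One server.X entry per quorum member, with a unique ID X
server.0=addressA:2888:3888
server.1=addressB:2888:3888
server.2=addressC:2888:3888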
Example: Start and stop a local HA-cluster with 2 JobManagers
Configure recovery mode and ZooKeeper quorum in conf/flink-conf.yaml:
recovery.mode: zookeeper
recovery.zookeeper.quorum: localhost
Configure masters in
conf/masters:
localhost:8081
localhost:8082
Configure the ZooKeeper server in conf/zoo.cfg (currently it is only possible to run a single ZooKeeper server per machine):
server.0=localhost:2888:3888
Start ZooKeeper quorum:
$ bin/start-zookeeper-quorum.sh
Starting zookeeper daemon on host localhost.
Start an HA-cluster:
$ bin/start-cluster-streaming.sh
Starting HA cluster (streaming mode) with 2 masters and 1 peers in ZooKeeper quorum.
Starting jobmanager daemon on host localhost.
Starting jobmanager daemon on host localhost.
Starting taskmanager daemon on host localhost.
Stop ZooKeeper quorum and cluster:
$ bin/stop-cluster.sh
Stopping taskmanager daemon (pid: 7647) on localhost.
Stopping jobmanager daemon (pid: 7495) on host localhost.
Stopping jobmanager daemon (pid: 7349) on host localhost.
$ bin/stop-zookeeper-quorum.sh
Stopping zookeeper daemon (pid: 7101) on host localhost.