Parallelism of a Storm Topology
2016-03-10 16:05
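Concepts
A topology can contain one or more workers (running in parallel on different machines), so a worker process executes a subset of a topology, and a worker belongs to exactly one topology.
A worker can contain one or more executors. Each component (spout or bolt) corresponds to at least one executor, so an executor executes a subset of a component, and an executor belongs to exactly one component.
A task is the object that holds the actual processing logic. An executor thread can run one or more tasks.
By default, however, each executor runs only one task, so we tend to think of a task as an execution thread. It is not.
The task count represents the maximum parallelism. The number of tasks of a component never changes, but the number of executors of a component can change.
When the number of tasks is greater than the number of executors, the number of executors is the actual parallelism.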
A worker process executes a subset of a topology.
A worker process belongs to a specific topology and may run one or more executors for one or more components (spouts or bolts) of this topology.
A running topology consists of many such processes running on many machines within a Storm cluster.
An executor is a thread that is spawned by a worker process. It may run one or more tasks for the same component (spout or bolt).
A task performs the actual data processing — each spout or bolt that you implement in your code executes as many tasks across the cluster.
The number of tasks for a component is always the same throughout the lifetime of a topology, but the number of executors (threads) for a component can change over time. This means that the following condition holds true:
#threads ≤ #tasks.
By default, the number of tasks is set to be the same as the number of executors, i.e. Storm will run one task per thread.
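The relationship between tasks and executors can be sketched in plain Java, without any Storm dependency. The class `TaskLayout` and its method are hypothetical names for illustration, and the even round-robin split is an assumption about how tasks are spread over threads, not Storm's actual scheduling code.

```java
import java.util.Arrays;

class TaskLayout {
    // Returns how many tasks each executor runs, assuming an even split.
    // Enforces the condition from the text: #threads <= #tasks.
    static int[] tasksPerExecutor(int numTasks, int numExecutors) {
        if (numExecutors < 1 || numExecutors > numTasks) {
            throw new IllegalArgumentException("need 1 <= #executors <= #tasks");
        }
        int[] layout = new int[numExecutors];
        for (int task = 0; task < numTasks; task++) {
            layout[task % numExecutors]++; // assign tasks round-robin
        }
        return layout;
    }

    public static void main(String[] args) {
        // 4 tasks on 2 executors: two tasks per thread.
        System.out.println(Arrays.toString(tasksPerExecutor(4, 2))); // [2, 2]
        // Default case: one task per executor.
        System.out.println(Arrays.toString(tasksPerExecutor(4, 4))); // [1, 1, 1, 1]
    }
}
```

The task count stays fixed while the executor count varies, which is why the same four tasks can be packed into two threads or spread over four.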
Configuring the parallelism of a topology
The following sections give an overview of the various configuration options and how to set them in your code. There is more than one way of setting these options though, and the table lists only some of them.
Storm currently has the following order of precedence for configuration settings:
defaults.yaml < storm.yaml < topology-specific configuration < internal component-specific configuration < external component-specific configuration
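One way to read this precedence order is as a layered merge in which each later source overrides the earlier ones. The sketch below is a plain-Java model of that idea only. The class `ConfigPrecedence`, the method `resolve`, and the sample values are hypothetical, and it does not reproduce how Storm actually loads its configuration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ConfigPrecedence {
    // Merges configuration layers given from lowest to highest precedence.
    // A key set in a later layer overrides the same key in earlier layers.
    static Map<String, Object> resolve(List<Map<String, Object>> layersLowToHigh) {
        Map<String, Object> merged = new LinkedHashMap<>();
        for (Map<String, Object> layer : layersLowToHigh) {
            merged.putAll(layer);
        }
        return merged;
    }

    public static void main(String[] args) {
        // Sample values are made up for illustration.
        Map<String, Object> defaultsYaml = Map.of("topology.workers", 1);
        Map<String, Object> stormYaml = Map.of("topology.workers", 2);
        Map<String, Object> topologyConf = Map.of("topology.workers", 4);

        Map<String, Object> effective =
                resolve(List.of(defaultsYaml, stormYaml, topologyConf));
        System.out.println(effective.get("topology.workers")); // 4
    }
}
```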
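Parallelism can be configured in several places in Storm, with the precedence shown above.
Specifically, the settings are:
Number of worker processes: can be set in the configuration files or in code. A worker is an execution process, so to benefit from parallelism the number of workers should be at least the number of machines.
Number of executors: the number of concurrent threads for a component. It can only be set in code (through the parallelism argument of setBolt and setSpout), for example setBolt("green-bolt", new GreenBolt(), 2).
Number of tasks: optional. By default tasks map 1:1 to executors, and the number can also be set with setNumTasks().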
Number of worker processes
Description: How many worker processes to create for the topology across machines in the cluster.
Configuration option: TOPOLOGY_WORKERS
How to set in your code (examples):
Config#setNumWorkers
Number of executors (threads)
Description: How many executors to spawn per component.
Configuration option: ?
How to set in your code (examples):
TopologyBuilder#setSpout()
TopologyBuilder#setBolt()
Note that as of Storm 0.8 the parallelism_hint parameter now specifies the initial number of executors (not tasks!) for that bolt.
Number of tasks
Description: How many tasks to create per component.
Configuration option: TOPOLOGY_TASKS
How to set in your code (examples):
ComponentConfigurationDeclarer#setNumTasks()
Here is an example code snippet to show these settings in practice:
topologyBuilder.setBolt("green-bolt", new GreenBolt(), 2)
               .setNumTasks(4)
               .shuffleGrouping("blue-spout");
In the above code we configured Storm to run the bolt GreenBolt with an initial number of two executors and four associated tasks. Storm will run two tasks per executor (thread). If you do not explicitly configure the number of tasks, Storm will run by default one task per executor.
Example of a running topology
The following illustration shows what a simple topology would look like in operation. The topology consists of three components: one spout called BlueSpout and two bolts called GreenBolt and YellowBolt. The components are linked such that BlueSpout sends its output to GreenBolt, which in turn sends its own output to YellowBolt.
Config conf = new Config();
conf.setNumWorkers(2); // use two worker processes

topologyBuilder.setSpout("blue-spout", new BlueSpout(), 2); // set parallelism hint to 2

topologyBuilder.setBolt("green-bolt", new GreenBolt(), 2)
               .setNumTasks(4) // set the number of tasks to 4
               .shuffleGrouping("blue-spout");

topologyBuilder.setBolt("yellow-bolt", new YellowBolt(), 6)
               .shuffleGrouping("green-bolt");

StormSubmitter.submitTopology(
    "mytopology",
    conf,
    topologyBuilder.createTopology()
);
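The figure and the code make this clear: setSpout and setBolt together define 2 + 2 + 6 = 10 executor threads.
setNumWorkers sets 2 workers, so Storm spreads the executors evenly and runs 5 executors on each worker.
green-bolt is defined with 4 tasks, so each of its 2 executors runs 2 tasks.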
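The arithmetic for this example can be checked with a few lines of plain Java. The class `ExampleTopologyMath` and its helpers are hypothetical names. The even spread of executors over workers is taken from the description above, and the task totals assume the default of one task per executor wherever setNumTasks is not called.

```java
class ExampleTopologyMath {
    // Adds up parallelism hints or task counts across components.
    static int sum(int... values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // Executors per worker, assuming they divide evenly across workers.
    static int executorsPerWorker(int totalExecutors, int numWorkers) {
        return totalExecutors / numWorkers;
    }

    public static void main(String[] args) {
        int totalExecutors = sum(2, 2, 6); // blue-spout, green-bolt, yellow-bolt
        int totalTasks = sum(2, 4, 6);     // green-bolt has setNumTasks(4)
        System.out.println(totalExecutors);                        // 10
        System.out.println(executorsPerWorker(totalExecutors, 2)); // 5
        System.out.println(totalTasks);                            // 12
        System.out.println(4 / 2); // tasks per green-bolt executor: 2
    }
}
```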
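How to change the parallelism of a running topology
Storm can change (increase or decrease) the number of worker processes and the number of executors without restarting the topology. This is called rebalancing, and it can be done through the Storm web UI or with the storm rebalance command, as in the example below.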
A nifty feature of Storm is that you can increase or decrease the number of worker processes and/or executors without being required to restart the cluster or the topology. The act of doing so is called rebalancing.
You have two options to rebalance a topology:
Use the Storm web UI to rebalance the topology.
Use the CLI tool storm rebalance as described below.
Here is an example of using the CLI tool:
# Reconfigure the topology "mytopology" to use 5 worker processes,
# the spout "blue-spout" to use 3 executors and
# the bolt "yellow-bolt" to use 10 executors.
$ storm rebalance mytopology -n 5 -e blue-spout=3 -e yellow-bolt=10

http://www.51studyit.com/html/notes/20140329/45.html
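Since the task count of a component is fixed for the lifetime of a topology and #threads ≤ #tasks must hold, a rebalance request is worth checking against the task counts first. The sketch below assumes that a request above the task count is capped at the task count. That cap is an assumption derived from the condition stated earlier, not behavior confirmed by this text, and `RebalanceCheck` is a hypothetical helper.

```java
class RebalanceCheck {
    // Returns the executor count that can actually be in effect,
    // assuming requests above the fixed task count are capped.
    static int effectiveExecutors(int requestedExecutors, int numTasks) {
        if (requestedExecutors < 1 || numTasks < 1) {
            throw new IllegalArgumentException("counts must be positive");
        }
        return Math.min(requestedExecutors, numTasks);
    }

    public static void main(String[] args) {
        // Task counts from the example topology: blue-spout has 2 tasks
        // and yellow-bolt has 6, both from the 1:1 default.
        System.out.println(effectiveExecutors(3, 2));  // blue-spout=3   -> 2
        System.out.println(effectiveExecutors(10, 6)); // yellow-bolt=10 -> 6
        // With setNumTasks(4), green-bolt could grow from 2 to 4 executors.
        System.out.println(effectiveExecutors(4, 4));  // 4
    }
}
```

Under this assumption, a component that should scale up later needs a task count, set with setNumTasks, that is higher than its initial number of executors.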