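"Winning in the Era of Cloud Computing and Big Data"
Spark Asia-Pacific Research Institute, 100-Session Public Lecture Series [Session 15: Interactive Q&A]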
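Q1: What is the relationship between AppClient, the Worker, and the Master?
In Standalone mode, AppClient represents the application on the client machine when SparkContext.runJob is invoked. It is responsible for tasks such as registerApplication.
Once the application has registered, the Master sends a message to the client via Akka to start the Driver.
The Driver then manages the Tasks and controls the Executors on the Workers so that they work together.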
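Q2: Is there a big difference between Spark's shuffle and Hadoop's shuffle?
Spark's shuffle is a shuffle in a fairly strict sense: it is defined by the dependencies between RDD operations, where, along the lineage, the elements of each partition of a parent RDD are handed to multiple partitions of the child RDD.
In Hadoop, shuffle is a comparatively loose concept: once the Mapper phase ends and its data is handed to the Reducers, a shuffle takes place, and the first of the Reducer's three phases is the shuffle.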
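The "each parent partition feeds multiple child partitions" idea can be illustrated with a small, self-contained sketch. This is plain Python rather than Spark code; the function names `shuffle` and `reduce_by_key` and the sample data are assumptions made for illustration, and real Spark shuffles also involve serialization, spilling, and network transfer that are not modeled here.

```python
from collections import defaultdict

def shuffle(parent_partitions, num_child_partitions):
    """Redistribute (key, value) records from every parent partition
    into child partitions chosen by the hash of the key."""
    children = [[] for _ in range(num_child_partitions)]
    for partition in parent_partitions:
        for key, value in partition:
            # The same key always lands in the same child partition,
            # so one parent partition may feed several children.
            children[hash(key) % num_child_partitions].append((key, value))
    return children

def reduce_by_key(partition):
    """Sum the values of each key inside a single child partition."""
    totals = defaultdict(int)
    for key, value in partition:
        totals[key] += value
    return dict(totals)

# Three parent partitions of (word, 1) pairs (hypothetical sample data).
parents = [
    [("a", 1), ("b", 1)],
    [("a", 1), ("c", 1)],
    [("b", 1), ("a", 1)],
]

children = shuffle(parents, num_child_partitions=2)

result = {}
for child in children:
    result.update(reduce_by_key(child))

print(result)
```

Because every record with a given key ends up in exactly one child partition, each child can be reduced independently, which is what makes this a wide (shuffle) dependency rather than a narrow one.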
Q3: How does Spark handle HA?
In Standalone mode, Worker nodes are automatically highly available. For Master HA, ZooKeeper is generally used:
Utilizing ZooKeeper to provide leader election and some state storage, you can launch multiple Masters in your cluster connected to the same ZooKeeper instance. One will be elected “leader” and the others will remain in standby mode. If the current leader dies, another Master will be elected, recover the old Master’s state, and then resume scheduling. The entire recovery process (from the time the first leader goes down) should take between 1 and 2 minutes. Note that this delay only affects scheduling new applications; applications that were already running during Master failover are unaffected.
For YARN and Mesos modes, the cluster manager (for example, YARN's ResourceManager) generally also relies on ZooKeeper for HA.
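For the Standalone case, a minimal sketch of the settings involved might look like the following. The property names `spark.deploy.recoveryMode`, `spark.deploy.zookeeper.url`, and `spark.deploy.zookeeper.dir` come from the Spark standalone documentation; the host names, ports, and the helper functions `master_url` and `daemon_java_opts` are hypothetical and only serve to show how the pieces fit together.

```python
def master_url(masters):
    """Build a standalone master URL that lists every Master,
    in the form spark://host1:port1,host2:port2."""
    return "spark://" + ",".join(f"{host}:{port}" for host, port in masters)

def daemon_java_opts(zk_url, zk_dir="/spark"):
    """Compose the value of SPARK_DAEMON_JAVA_OPTS that switches the
    Masters to ZooKeeper-based recovery. zk_dir is assumed to be /spark."""
    props = {
        "spark.deploy.recoveryMode": "ZOOKEEPER",
        "spark.deploy.zookeeper.url": zk_url,
        "spark.deploy.zookeeper.dir": zk_dir,
    }
    return " ".join(f"-D{name}={value}" for name, value in props.items())

# Hypothetical hosts: two Masters and a three-node ZooKeeper ensemble.
masters = [("master1", 7077), ("master2", 7077)]
url = master_url(masters)
opts = daemon_java_opts("zk1:2181,zk2:2181,zk3:2181")

print(url)
print(f'SPARK_DAEMON_JAVA_OPTS="{opts}"')
```

The options string would typically go into spark-env.sh on each Master, while applications and Workers use the multi-Master URL so they can find whichever Master is currently the leader.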