
Introduction to Spark Cluster Installation

2016-06-19 00:39
 
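(1) Some questions beginners have about Spark: http://aperise.iteye.com/blog/2302481
(2) Setting up a Spark development environment: http://aperise.iteye.com/blog/2302535
(3) Introduction to Spark Standalone cluster installation: http://aperise.iteye.com/blog/2305905
(4) Reading and writing HDFS, Redis, and HBase from spark-shell: http://aperise.iteye.com/blog/2324253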




1. Overview of Spark cluster modes


    1.1 Cluster managers supported by Spark

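        In Hadoop, MapReduce is the programming model developers use to write programs. A finished MapReduce program has to run as a distributed computation, so Hadoop also provides YARN, a resource management and scheduling framework that distributes MapReduce programs across the cluster nodes and manages the resources they need.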

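        Spark is similar. Its programming model is the RDD (Resilient Distributed Dataset), and a Spark program is built from a series of RDDs. Spark also ships its own YARN-like resource management and scheduling framework, the Standalone cluster manager. It is not limited to that, though: Spark applications can also run on other cluster managers such as Apache Mesos and Hadoop YARN, and Spark provides scripts for launching a Standalone cluster on Amazon EC2. See the official documentation at http://spark.apache.org/docs/1.6.0/cluster-overview.html for details.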



 


    1.2 Overview of Spark cluster deployment modes

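        As I understand it, in Spark Standalone Mode a computation made up of a series of RDDs is managed and scheduled by Spark itself, without depending on Apache Mesos or Hadoop YARN. See the official documentation at http://spark.apache.org/docs/1.6.0/spark-standalone.html for details.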

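        In this mode a cluster can be installed in three ways:
Plain Spark cluster: a single master manages all worker nodes. If the master goes down or misbehaves, the whole computation stops. The master is a single point of failure, and nothing can be recovered after a failure.
Spark HA cluster based on the local file system: there is still a single master managing all worker nodes, but a local directory is configured as a recovery directory. Registration data for workers and running applications is written there, so after the master goes down and is started again, the earlier jobs can be recovered from that directory.
Spark HA cluster based on ZooKeeper: several masters are started, but only one is active and manages all the workers. When the active master goes down or misbehaves, ZooKeeper's coordination service elects a new master from the standby masters registered with it, and the new master takes over and resumes the running jobs.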
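
The three variants differ mainly in the spark.deploy.recoveryMode setting passed to the Spark daemons. As a rough sketch, the mapping could be written as a small shell function. The name recovery_opts and its arguments are hypothetical helpers, not part of Spark; the property names are the ones used in the spark-env.sh files later in this post.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a SPARK_DAEMON_JAVA_OPTS value for a recovery mode.
recovery_opts() {
  local mode="$1"
  case "$mode" in
    NONE)
      # Plain cluster: no recovery state is kept.
      echo "-Dspark.deploy.recoveryMode=NONE"
      ;;
    FILESYSTEM)
      # $2: local recovery directory on the master
      echo "-Dspark.deploy.recoveryMode=FILESYSTEM -Dspark.deploy.recoveryDirectory=$2"
      ;;
    ZOOKEEPER)
      # $2: ZooKeeper connect string, $3: ZooKeeper path for Spark's state
      echo "-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=$2 -Dspark.deploy.zookeeper.dir=$3"
      ;;
    *)
      echo "unknown recovery mode: $mode" >&2
      return 1
      ;;
  esac
}

recovery_opts FILESYSTEM /home/hadoop/sparkexecutedata
```

The printed string is what would go inside the quotes of export SPARK_DAEMON_JAVA_OPTS="..." in spark-env.sh.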


2. Installing the Spark cluster modes


    2.1 Cluster installation environment



 


    2.2 Preparation before installing the Spark cluster


        1) Disable the firewall

Firewall operations on CentOS 7

# Start the firewall on CentOS 7

systemctl start firewalld.service

# Restart the firewall on CentOS 7

systemctl restart firewalld.service

# Stop the firewall on CentOS 7

systemctl stop firewalld.service 

# Prevent the firewall from starting at boot on CentOS 7

systemctl disable firewalld.service 

# Check the firewall status on CentOS 7

firewall-cmd --state

# Open firewall ports

vi /etc/sysconfig/iptables

-A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 6379 -j ACCEPT

-A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 6380 -j ACCEPT

-A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 6381 -j ACCEPT

-A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 16379 -j ACCEPT

-A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 16380 -j ACCEPT

-A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 16381 -j ACCEPT

         Here I simply turn the firewall off by running the following commands as root:

systemctl stop firewalld.service 

systemctl disable firewalld.service

 


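        2) Disable SELinux

        Purpose: the Spark master manages the worker nodes over SSH. With SELinux left enabled this can fail, because SELinux can restrict passwordless SSH login.

        Edit /etc/selinux/config. Before the change: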

# This file controls the state of SELinux on the system.

# SELINUX= can take one of these three values:

# enforcing - SELinux security policy is enforced.

# permissive - SELinux prints warnings instead of enforcing.

# disabled - No SELinux policy is loaded.

SELINUX=enforcing

# SELINUXTYPE= can take one of these two values:

# targeted - Targeted processes are protected,

# minimum - Modification of targeted policy. Only selected processes are protected. 

# mls - Multi Level Security protection.

SELINUXTYPE=targeted

          After the change:

# This file controls the state of SELinux on the system.

# SELINUX= can take one of these three values:

# enforcing - SELinux security policy is enforced.

# permissive - SELinux prints warnings instead of enforcing.

# disabled - No SELinux policy is loaded.

#SELINUX=enforcing

SELINUX=disabled

# SELINUXTYPE= can take one of these two values:

# targeted - Targeted processes are protected,

# minimum - Modification of targeted policy. Only selected processes are protected. 

# mls - Multi Level Security protection.

#SELINUXTYPE=targeted

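          Run the following command to switch SELinux to permissive mode immediately (SELINUX=disabled itself takes full effect only after a reboot):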

setenforce 0
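
The same edit can be scripted instead of made by hand in vi. A minimal sketch, assuming GNU sed as shipped with CentOS; disable_selinux_in is a hypothetical helper name. Unlike the manual edit above, it leaves the SELINUXTYPE line alone, and it is tried here on a scratch file rather than on /etc/selinux/config.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set SELINUX=disabled in the given SELinux config file.
disable_selinux_in() {
  local cfg="$1"
  # Rewrite only the uncommented SELINUX= line; comment lines and
  # SELINUXTYPE= do not match the pattern and stay untouched.
  sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$cfg"
}

# Try it on a scratch copy first.
tmp="$(mktemp)"
printf '%s\n' '# SELINUX= can take one of these three values:' 'SELINUX=enforcing' 'SELINUXTYPE=targeted' > "$tmp"
disable_selinux_in "$tmp"
grep '^SELINUX=' "$tmp"
rm -f "$tmp"
```

On a real node the function would be called as root with /etc/selinux/config as its argument.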

 


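        3) Hostname configuration

        Purpose: the IP addresses of machines in a Spark cluster may change and interrupt services between nodes, so it is best to configure the cluster by hostname.

        Edit /etc/hosts on every machine and map the hostnames as follows: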

127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4

::1 localhost localhost.localdomain localhost6 localhost6.localdomain6

192.168.185.31 hadoop31

192.168.185.32 hadoop32

192.168.185.33 hadoop33

192.168.185.34 hadoop34

192.168.185.35 hadoop35

          On CentOS 7 each machine's own hostname is set in /etc/hostname. Taking the hadoop31 node as an example, the file looks like this:

#localdomain

hadoop31
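
The host entries for the five nodes follow a regular pattern, so they can be generated rather than typed. A small sketch, assuming the last octet of each IP matches the number in the hostname as it does in this post; gen_hosts_entries is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print /etc/hosts entries for hadoop31..hadoop35.
gen_hosts_entries() {
  local n
  for n in 31 32 33 34 35; do
    printf '192.168.185.%s hadoop%s\n' "$n" "$n"
  done
}

# Review the output, then append it to /etc/hosts as root if it looks right.
gen_hosts_entries
```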

 


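        4) Create the hadoop user and group

        Purpose: from here on the Spark cluster is managed by a dedicated hadoop user, so that other users cannot shut it down by mistake.

# As root, create the hadoop user and group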

groupadd hadoop 

useradd -g hadoop hadoop 

# Set the user's password

passwd hadoop

 


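        5) Passwordless SSH login for the hadoop user

        Purpose: the Spark master manages the worker nodes by logging in to them over SSH, and an ordinary SSH login asks for a password. So that the master can conveniently manage hundreds or thousands of workers, its public key is copied to each worker to allow key-based login without a password. Here I set up passwordless login between all master and worker machines.

# First switch to the hadoop user created above; I am working on hadoop31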

ssh hadoop31

su hadoop 

# Generate an RSA public/private key pair; this must be run on every node in the cluster, just press Enter at each prompt

ssh-keygen -t rsa 

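# When you log in to a remote machine over SSH, the remote side checks your key against the public keys in the target user's .ssh/authorized_keys, here /home/hadoop/.ssh/authorized_keys (filled with /home/hadoop/.ssh/id_rsa.pub from the other machines). Running the following commands on the master node alone is enough to get passwordless SSH between the master and worker nodes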

cd /home/hadoop/.ssh/ 

# First append the master node's public key to authorized_keys

cat id_rsa.pub>>authorized_keys 

# Then append the slave nodes' public keys to authorized_keys; I am working on hadoop31

ssh hadoop@192.168.185.32 cat /home/hadoop/.ssh/id_rsa.pub>> authorized_keys 

ssh hadoop@192.168.185.33 cat /home/hadoop/.ssh/id_rsa.pub>> authorized_keys 

ssh hadoop@192.168.185.34 cat /home/hadoop/.ssh/id_rsa.pub>> authorized_keys 

ssh hadoop@192.168.185.35 cat /home/hadoop/.ssh/id_rsa.pub>> authorized_keys 

# The permissions of /home/hadoop/.ssh/authorized_keys must be set as follows

chmod 600 /home/hadoop/.ssh/authorized_keys 

# Distribute the master node's authorized_keys to the other slave nodes

scp -r /home/hadoop/.ssh/authorized_keys hadoop@192.168.185.32:/home/hadoop/.ssh/ 

scp -r /home/hadoop/.ssh/authorized_keys hadoop@192.168.185.33:/home/hadoop/.ssh/ 

scp -r /home/hadoop/.ssh/authorized_keys hadoop@192.168.185.34:/home/hadoop/.ssh/ 

scp -r /home/hadoop/.ssh/authorized_keys hadoop@192.168.185.35:/home/hadoop/.ssh/
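
OpenSSH also ships ssh-copy-id, which appends the local public key to a remote user's authorized_keys. A minimal sketch, assuming that tool is installed and that one-way login from the master to every node is enough (the steps above set up login between all nodes instead). NODES and copy_key_commands are hypothetical names, and the function only prints the commands so they can be reviewed first.

```shell
#!/usr/bin/env bash
# Node names from this post; adjust to your own cluster.
NODES="hadoop31 hadoop32 hadoop33 hadoop34 hadoop35"

# Hypothetical helper: print one ssh-copy-id command per node for the given user.
copy_key_commands() {
  local user="$1" node
  for node in $NODES; do
    echo "ssh-copy-id ${user}@${node}"
  done
}

copy_key_commands hadoop
# To run them (each asks for the password once): copy_key_commands hadoop | sh
```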

 


        6) JDK installation

        Purpose: Spark needs a Java runtime. Install it as follows:

# Switch to the hadoop user

su hadoop

# Download jdk-7u65-linux-x64.gz into /home/hadoop/java and unpack it

cd /home/hadoop/java

tar -zxvf jdk-7u65-linux-x64.gz

# Edit /home/hadoop/.bashrc with vi and append the following at the end of the file

export JAVA_HOME=/home/hadoop/java/jdk1.7.0_65 

export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar 

export PATH=$PATH:$JAVA_HOME/bin 

# Apply the settings in /home/hadoop/.bashrc

source /home/hadoop/.bashrc

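          Many people put these settings in the system-wide /etc/profile. I do not recommend that: if someone downgrades or removes the Java environment there, the cluster breaks. It is better to set the environment variables in the .bashrc of the user that manages the Spark cluster.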

 


        7) ZooKeeper installation

         Purpose: used later by the ZooKeeper-based Spark HA mode

# 1. Log in as the hadoop user, then download and unpack ZooKeeper 3.4.6

su hadoop

cd /home/hadoop 

tar -zxvf zookeeper-3.4.6.tar.gz 

# 2. Configure /etc/hosts on every node in the cluster as follows:

127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4

::1 localhost localhost.localdomain localhost6 localhost6.localdomain6

192.168.185.31 hadoop31 

192.168.185.32 hadoop32 

192.168.185.33 hadoop33 

192.168.185.34 hadoop34 

192.168.185.35 hadoop35

# 3. Create the ZooKeeper data directory on every node in the cluster

ssh hadoop31

cd /home/hadoop

# ZooKeeper data directory

mkdir -p /opt/hadoop/zookeeper

ssh hadoop32

cd /home/hadoop

# ZooKeeper data directory

mkdir -p /opt/hadoop/zookeeper

ssh hadoop33

cd /home/hadoop

# ZooKeeper data directory

mkdir -p /opt/hadoop/zookeeper

ssh hadoop34

cd /home/hadoop

# ZooKeeper data directory

mkdir -p /opt/hadoop/zookeeper

ssh hadoop35

cd /home/hadoop

# ZooKeeper data directory

mkdir -p /opt/hadoop/zookeeper

# 4. Configure zoo.cfg

ssh hadoop31

cd /home/hadoop/zookeeper-3.4.6/conf

cp zoo_sample.cfg zoo.cfg

vi zoo.cfg

# with the following content

initLimit=10 

syncLimit=5 

dataDir=/opt/hadoop/zookeeper 

clientPort=2181

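# Keep only the 3 most recent snapshots; without auto-purge every snapshot is kept, which takes up a lot of disk space over time

autopurge.snapRetainCount=3

# In hours: purge old snapshot data once an hour

autopurge.purgeInterval=1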

server.1=hadoop31:2888:3888 

server.2=hadoop32:2888:3888 

server.3=hadoop33:2888:3888

server.4=hadoop34:2888:3888 

server.5=hadoop35:2888:3888 
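
ZooKeeper stays available only while a majority of the configured servers is running, so an ensemble of five servers like the one listed here can lose two nodes. A small sketch of that arithmetic; zk_tolerated_failures is a hypothetical helper, not a ZooKeeper command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: how many servers may fail while a majority remains.
# Usage: zk_tolerated_failures <ensemble-size>
zk_tolerated_failures() {
  local n="$1"
  local quorum=$(( n / 2 + 1 ))   # smallest majority of n servers
  echo $(( n - quorum ))
}

zk_tolerated_failures 5
```

The same calculation shows why an even-sized ensemble adds little: four servers tolerate one failure, the same as three.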

# 5. From hadoop31, copy the installation to the other nodes

scp -r /home/hadoop/zookeeper-3.4.6 hadoop@hadoop32:/home/hadoop/ 

scp -r /home/hadoop/zookeeper-3.4.6 hadoop@hadoop33:/home/hadoop/ 

scp -r /home/hadoop/zookeeper-3.4.6 hadoop@hadoop34:/home/hadoop/ 

scp -r /home/hadoop/zookeeper-3.4.6 hadoop@hadoop35:/home/hadoop/ 

# 6. Set myid on every node in the cluster; it must be a number matching the node's server.N entry in zoo.cfg

ssh hadoop31 

echo "1" > /opt/hadoop/zookeeper/myid 

ssh hadoop32 

echo "2" > /opt/hadoop/zookeeper/myid 

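ssh hadoop33

echo "3" > /opt/hadoop/zookeeper/myid

ssh hadoop34

echo "4" > /opt/hadoop/zookeeper/myid

ssh hadoop35

echo "5" > /opt/hadoop/zookeeper/myid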
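
Each myid has to agree with the node's server.N line in zoo.cfg, so the value can be read from that file instead of being typed per node. A sketch under that assumption; myid_for_host is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: look up a host's myid from the server.N lines of zoo.cfg.
# Usage: myid_for_host <path-to-zoo.cfg> <hostname>
myid_for_host() {
  local cfg="$1" host="$2"
  # A line looks like "server.3=hadoop33:2888:3888"; print the 3 for hadoop33.
  sed -n "s/^server\.\([0-9][0-9]*\)=${host}:.*/\1/p" "$cfg"
}

# Example on each node, with the paths used in this post:
# myid_for_host /home/hadoop/zookeeper-3.4.6/conf/zoo.cfg "$(hostname)" > /opt/hadoop/zookeeper/myid
```

The function prints nothing for a host that is not listed, so it is worth checking that the resulting myid file is not empty.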

# 7. How to start ZooKeeper on a node

ssh hadoop31

/home/hadoop/zookeeper-3.4.6/bin/zkServer.sh start

# 8. How to stop ZooKeeper on a node

ssh hadoop31

/home/hadoop/zookeeper-3.4.6/bin/zkServer.sh stop 

# 9. How to check ZooKeeper's status on a node

ssh hadoop31

/home/hadoop/zookeeper-3.4.6/bin/zkServer.sh status 

# 10. How to browse the data stored in ZooKeeper with the client from a node

ssh hadoop31

/home/hadoop/zookeeper-3.4.6/bin/zkCli.sh -server hadoop31:2181,hadoop32:2181,hadoop33:2181,hadoop34:2181,hadoop35:2181

 


        8) Scala installation

# On all five machines, download scala-2.11.7.tgz into /home/hadoop/java and unpack it

cd /home/hadoop/java
wget http://downloads.typesafe.com/scala/2.11.7/scala-2.11.7.tgz

tar -zxvf scala-2.11.7.tgz

# Edit /home/hadoop/.bashrc and add the following

export SCALA_HOME=/home/hadoop/java/scala-2.11.7

export PATH=$PATH:$SCALA_HOME/bin

# Apply the settings

source /home/hadoop/.bashrc

# Check that Scala was installed correctly

scala -version

     


        9) Install spark-1.6.0-bin-hadoop2.6

# Download spark-1.6.0-bin-hadoop2.6.tgz into /home/hadoop and unpack it

cd /home/hadoop
wget http://d3kbcqa49mib13.cloudfront.net/spark-1.6.0-bin-hadoop2.6.tgz
tar -zxf spark-1.6.0-bin-hadoop2.6.tgz

# Edit /home/hadoop/.bashrc and add the following

export SPARK_HOME=/home/hadoop/spark-1.6.0-bin-hadoop2.6

export PATH=$PATH:$SPARK_HOME/bin

# Apply the settings above

source /home/hadoop/.bashrc

 


    2.3 Installing a plain Spark cluster


        1)spark-env.sh

        Copy /home/hadoop/spark-1.6.0-bin-hadoop2.6/conf/spark-env.sh.template to /home/hadoop/spark-1.6.0-bin-hadoop2.6/conf/spark-env.sh and append the following at the end:

export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=NONE"

export SPARK_HISTORY_OPTS="-Dspark.history.retainedApplications=10 -Dspark.history.fs.logDirectory=hdfs://bigdatacluster-ha/historyserverforspark"

export SCALA_HOME=/home/hadoop/java/scala-2.11.7

export JAVA_HOME=/home/hadoop/java/jdk1.7.0_65

#export SPARK_MASTER_IP=hadoop31

#export SPARK_MASTER_PORT=7077

export SPARK_WORKER_MEMORY=8g

export SPARK_WORKER_CORES=4

export SPARK_WORKER_INSTANCES=4

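        SPARK_WORKER_INSTANCES starts 4 worker processes on each slave node, SPARK_WORKER_CORES lets each worker process use at most 4 CPU cores, and SPARK_WORKER_MEMORY lets each worker process use at most 8G of memory. After the services start you will therefore see 4 worker processes on every slave node, together claiming up to 32G of memory and 16 cores. Each machine must have more than 16 cores and more than 32G of memory in total, because some cores and memory have to be left for the operating system and other programs.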
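
The per-node totals are just the product of the three settings. A quick sketch of the arithmetic, using the values from the spark-env.sh above; SPARK_WORKER_MEMORY_GB is a hypothetical variable holding the numeric part of SPARK_WORKER_MEMORY=8g.

```shell
#!/usr/bin/env bash
# Values from the spark-env.sh in this section.
SPARK_WORKER_INSTANCES=4
SPARK_WORKER_CORES=4
SPARK_WORKER_MEMORY_GB=8   # hypothetical: numeric part of SPARK_WORKER_MEMORY=8g

# Per-node totals that the worker processes may claim.
total_cores=$((SPARK_WORKER_INSTANCES * SPARK_WORKER_CORES))
total_mem_gb=$((SPARK_WORKER_INSTANCES * SPARK_WORKER_MEMORY_GB))

echo "cores per node: ${total_cores}"
echo "memory per node: ${total_mem_gb}G"
```

Comparing these totals with the output of nproc and free -g on a node is a simple way to check that enough is left over for the operating system.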

        The line export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=NONE" above is optional, because NONE is the default.

        hdfs://bigdatacluster-ha/historyserverforspark above is the HDFS directory where Spark stores its execution history. My Hadoop installation is a ZooKeeper-based HA setup, which I described in http://aperise.iteye.com/admin/blogs/2305809. The directory has to be created on HDFS first:

hdfs dfs -mkdir -p /historyserverforspark

 


        2)slaves

# Edit /home/hadoop/spark-1.6.0-bin-hadoop2.6/conf/slaves so that it contains:

hadoop31

hadoop32

hadoop33

hadoop34

hadoop35

 


        3) Distribute the installation to the other machines

# I run these commands on hadoop31

ssh hadoop31

scp -r /home/hadoop/java/scala-2.11.7 hadoop@hadoop32:/home/hadoop/java/

scp -r /home/hadoop/java/scala-2.11.7 hadoop@hadoop33:/home/hadoop/java/

scp -r /home/hadoop/java/scala-2.11.7 hadoop@hadoop34:/home/hadoop/java/

scp -r /home/hadoop/java/scala-2.11.7 hadoop@hadoop35:/home/hadoop/java/

scp -r /home/hadoop/spark-1.6.0-bin-hadoop2.6 hadoop@hadoop32:/home/hadoop/

scp -r /home/hadoop/spark-1.6.0-bin-hadoop2.6 hadoop@hadoop33:/home/hadoop/

scp -r /home/hadoop/spark-1.6.0-bin-hadoop2.6 hadoop@hadoop34:/home/hadoop/

scp -r /home/hadoop/spark-1.6.0-bin-hadoop2.6 hadoop@hadoop35:/home/hadoop/
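
These scp commands differ only in the source directory and the target node, so they can be generated in a loop. A sketch under that assumption; dist_commands is a hypothetical helper that only prints the commands, so they can be reviewed before being piped to a shell.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the scp commands that copy one directory to each node.
# Usage: dist_commands <local-dir> <remote-parent-dir> <node>...
dist_commands() {
  local src="$1" dest="$2" node
  shift 2
  for node in "$@"; do
    echo "scp -r ${src} hadoop@${node}:${dest}"
  done
}

dist_commands /home/hadoop/java/scala-2.11.7 /home/hadoop/java/ hadoop32 hadoop33 hadoop34 hadoop35
dist_commands /home/hadoop/spark-1.6.0-bin-hadoop2.6 /home/hadoop/ hadoop32 hadoop33 hadoop34 hadoop35
```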

 


        4) Start Spark

# Start the master on the master node and the workers on the slave nodes

ssh hadoop31

cd /home/hadoop/spark-1.6.0-bin-hadoop2.6

sbin/start-all.sh

 


        5) Connect with spark-shell

        Here there is a single master, on hadoop31. From any machine, connect to Spark with spark-shell as follows:

cd /home/hadoop/spark-1.6.0-bin-hadoop2.6/

bin/spark-shell --master spark://hadoop31:7077

 


    2.4 Installing a Spark HA cluster based on the local file system


        1)spark-env.sh

        Copy /home/hadoop/spark-1.6.0-bin-hadoop2.6/conf/spark-env.sh.template to /home/hadoop/spark-1.6.0-bin-hadoop2.6/conf/spark-env.sh and append the following at the end:

export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=FILESYSTEM -Dspark.deploy.recoveryDirectory=/home/hadoop/sparkexecutedata"

export SPARK_HISTORY_OPTS="-Dspark.history.retainedApplications=10 -Dspark.history.fs.logDirectory=hdfs://bigdatacluster-ha/historyserverforspark"

export SCALA_HOME=/home/hadoop/java/scala-2.11.7

export JAVA_HOME=/home/hadoop/java/jdk1.7.0_65

#export SPARK_MASTER_IP=hadoop31

#export SPARK_MASTER_PORT=7077

export SPARK_WORKER_MEMORY=8g

export SPARK_WORKER_CORES=4

export SPARK_WORKER_INSTANCES=4

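        SPARK_WORKER_INSTANCES starts 4 worker processes on each slave node, SPARK_WORKER_CORES lets each worker process use at most 4 CPU cores, and SPARK_WORKER_MEMORY lets each worker process use at most 8G of memory. After the services start you will therefore see 4 worker processes on every slave node, together claiming up to 32G of memory and 16 cores. Each machine must have more than 16 cores and more than 32G of memory in total, because some cores and memory have to be left for the operating system and other programs.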

        Above, spark.deploy.recoveryMode=FILESYSTEM must be set and the recovery data directory configured with spark.deploy.recoveryDirectory=/home/hadoop/sparkexecutedata. Create that directory on every machine:

ssh hadoop31

mkdir -p /home/hadoop/sparkexecutedata

ssh hadoop32

mkdir -p /home/hadoop/sparkexecutedata

ssh hadoop33

mkdir -p /home/hadoop/sparkexecutedata

ssh hadoop34

mkdir -p /home/hadoop/sparkexecutedata

ssh hadoop35

mkdir -p /home/hadoop/sparkexecutedata

 

        hdfs://bigdatacluster-ha/historyserverforspark above is the HDFS directory where Spark stores its execution history. My Hadoop installation is a ZooKeeper-based HA setup, which I described in http://aperise.iteye.com/admin/blogs/2305809. The directory has to be created on HDFS first:

hdfs dfs -mkdir -p /historyserverforspark

 


         2)slaves

# Edit /home/hadoop/spark-1.6.0-bin-hadoop2.6/conf/slaves so that it contains:

hadoop31

hadoop32

hadoop33

hadoop34

hadoop35

 


        3) Distribute the installation to the other machines

# I run these commands on hadoop31

ssh hadoop31

scp -r /home/hadoop/java/scala-2.11.7 hadoop@hadoop32:/home/hadoop/java/

scp -r /home/hadoop/java/scala-2.11.7 hadoop@hadoop33:/home/hadoop/java/

scp -r /home/hadoop/java/scala-2.11.7 hadoop@hadoop34:/home/hadoop/java/

scp -r /home/hadoop/java/scala-2.11.7 hadoop@hadoop35:/home/hadoop/java/

scp -r /home/hadoop/spark-1.6.0-bin-hadoop2.6 hadoop@hadoop32:/home/hadoop/

scp -r /home/hadoop/spark-1.6.0-bin-hadoop2.6 hadoop@hadoop33:/home/hadoop/

scp -r /home/hadoop/spark-1.6.0-bin-hadoop2.6 hadoop@hadoop34:/home/hadoop/

scp -r /home/hadoop/spark-1.6.0-bin-hadoop2.6 hadoop@hadoop35:/home/hadoop/

 


        4) Start Spark

# Start the master on the master node and the workers on the slave nodes

ssh hadoop31

cd /home/hadoop/spark-1.6.0-bin-hadoop2.6

sbin/start-all.sh

 


        5) Connect with spark-shell

        Here there is a single master, on hadoop31. From any machine, connect to Spark with spark-shell as follows:

cd /home/hadoop/spark-1.6.0-bin-hadoop2.6/

bin/spark-shell --master spark://hadoop31:7077

 

 


    2.5 Installing a Spark HA cluster based on ZooKeeper


        1)spark-env.sh

        Copy /home/hadoop/spark-1.6.0-bin-hadoop2.6/conf/spark-env.sh.template to /home/hadoop/spark-1.6.0-bin-hadoop2.6/conf/spark-env.sh and append the following at the end:

export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=hadoop31:2181,hadoop32:2181,hadoop33:2181,hadoop34:2181,hadoop35:2181 -Dspark.deploy.zookeeper.dir=/spark-zk-path"

export SPARK_HISTORY_OPTS="-Dspark.history.retainedApplications=10 -Dspark.history.fs.logDirectory=hdfs://bigdatacluster-ha/historyserverforspark"

export SCALA_HOME=/home/hadoop/java/scala-2.11.7

export JAVA_HOME=/home/hadoop/java/jdk1.7.0_65

#export SPARK_MASTER_IP=hadoop31

#export SPARK_MASTER_PORT=7077

export SPARK_WORKER_MEMORY=8g

export SPARK_WORKER_CORES=4

export SPARK_WORKER_INSTANCES=4

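        SPARK_WORKER_INSTANCES starts 4 worker processes on each slave node, SPARK_WORKER_CORES lets each worker process use at most 4 CPU cores, and SPARK_WORKER_MEMORY lets each worker process use at most 8G of memory. After the services start you will therefore see 4 worker processes on every slave node, together claiming up to 32G of memory and 16 cores. Each machine must have more than 16 cores and more than 32G of memory in total, because some cores and memory have to be left for the operating system and other programs.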

        Above, spark.deploy.recoveryMode=ZOOKEEPER must be set, together with the ZooKeeper addresses spark.deploy.zookeeper.url=hadoop31:2181,hadoop32:2181,hadoop33:2181,hadoop34:2181,hadoop35:2181 and spark.deploy.zookeeper.dir=/spark-zk-path, where /spark-zk-path is the directory in ZooKeeper under which Spark stores its data.

        hdfs://bigdatacluster-ha/historyserverforspark above is the HDFS directory where Spark stores its execution history. My Hadoop installation is a ZooKeeper-based HA setup, which I described in http://aperise.iteye.com/admin/blogs/2305809. The directory has to be created on HDFS first:

hdfs dfs -mkdir -p /historyserverforspark

 


         2)slaves

# Edit /home/hadoop/spark-1.6.0-bin-hadoop2.6/conf/slaves so that it contains:

hadoop31

hadoop32

hadoop33

hadoop34

hadoop35

 


        3) Distribute the installation to the other machines

# I run these commands on hadoop31

ssh hadoop31

scp -r /home/hadoop/java/scala-2.11.7 hadoop@hadoop32:/home/hadoop/java/

scp -r /home/hadoop/java/scala-2.11.7 hadoop@hadoop33:/home/hadoop/java/

scp -r /home/hadoop/java/scala-2.11.7 hadoop@hadoop34:/home/hadoop/java/

scp -r /home/hadoop/java/scala-2.11.7 hadoop@hadoop35:/home/hadoop/java/

scp -r /home/hadoop/spark-1.6.0-bin-hadoop2.6 hadoop@hadoop32:/home/hadoop/

scp -r /home/hadoop/spark-1.6.0-bin-hadoop2.6 hadoop@hadoop33:/home/hadoop/

scp -r /home/hadoop/spark-1.6.0-bin-hadoop2.6 hadoop@hadoop34:/home/hadoop/

scp -r /home/hadoop/spark-1.6.0-bin-hadoop2.6 hadoop@hadoop35:/home/hadoop/

 


        4) Start ZooKeeper

ssh hadoop31

/home/hadoop/zookeeper-3.4.6/bin/zkServer.sh start

ssh hadoop32

/home/hadoop/zookeeper-3.4.6/bin/zkServer.sh start

ssh hadoop33

/home/hadoop/zookeeper-3.4.6/bin/zkServer.sh start

ssh hadoop34

/home/hadoop/zookeeper-3.4.6/bin/zkServer.sh start

ssh hadoop35

/home/hadoop/zookeeper-3.4.6/bin/zkServer.sh start

 


        5) Start Spark

# Start the master on the master node and the workers on the slave nodes

ssh hadoop31

cd /home/hadoop/spark-1.6.0-bin-hadoop2.6

sbin/start-all.sh

 


        6) Start a standby Spark master

        There can be several masters; each additional one only has to be started separately, for example on hadoop32:

# Start the standby master

ssh hadoop32

cd /home/hadoop/spark-1.6.0-bin-hadoop2.6

sbin/start-master.sh

 


        7) Connect with spark-shell

        Here there are two masters, on hadoop31 and hadoop32. From any machine, connect to Spark with spark-shell as follows:

cd /home/hadoop/spark-1.6.0-bin-hadoop2.6/

bin/spark-shell --master spark://hadoop31:7077,hadoop32:7077
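
With more than one master, the --master value is a single spark:// URL listing every master as host:port, separated by commas. A small sketch that builds such a URL from a list of hosts; master_url is a hypothetical helper name, and 7077 is the master port used throughout this post.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a spark:// URL listing every master as host:port.
# Usage: master_url <port> <host>...
master_url() {
  local port="$1" host list=""
  shift
  for host in "$@"; do
    # Prefix a comma only when the list already has an entry.
    list="${list:+${list},}${host}:${port}"
  done
  echo "spark://${list}"
}

master_url 7077 hadoop31 hadoop32
```

The result can be passed straight to spark-shell, for example bin/spark-shell --master "$(master_url 7077 hadoop31 hadoop32)".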

 