Spark 3.0.0 on YARN (with Hadoop 3.2) distributed installation - a pitfall guide
2020-06-03 05:30
1 Install Scala on CentOS
1. Download:
https://downloads.lightbend.com/scala/2.11.12/scala-2.11.12.tgz
2. Extract it into /usr/local:
tar -xvzf scala-2.11.12.tgz -C /usr/local
3. Configure the environment variables:
vim /etc/profile

# scala
export SCALA_HOME=/usr/local/scala-2.11.12
export PATH=$SCALA_HOME/bin:$PATH

Run source /etc/profile to apply the changes. Configuration complete.
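As a quick sanity check of what the profile change does, the sketch below applies the same PATH edit and prints the first PATH entry; on a real node you would instead run source /etc/profile followed by scala -version.

```shell
# Sketch of the PATH change made in /etc/profile (paths assumed from above).
SCALA_HOME=/usr/local/scala-2.11.12
PATH="$SCALA_HOME/bin:$PATH"
# The Scala bin directory should now be the first PATH entry:
echo "${PATH%%:*}"
```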
2 Install Spark
The cluster consists of three nodes:
dockerapache-01
dockerapache-02
dockerapache-03 # master
The following steps are performed on the master node.
1. Download Spark:
http://spark.apache.org/downloads.html
Since my Hadoop version is 3.1.3, I chose the package pre-built for Hadoop 3.2 to try.
2. Upload the package to the master node and extract it into /usr/local:
tar -xvzf spark-3.0.0-preview2-bin-hadoop3.2.tgz -C /usr/local
3. Configure conf/spark-env.sh (the -D options in SPARK_MASTER_OPTS must be space-separated, and the ZooKeeper URL should list all three nodes):
cp spark-env.sh.template spark-env.sh
vim spark-env.sh

SPARK_CONF_DIR=/usr/local/spark-3.0.0-preview2-bin-hadoop3.2/conf
HADOOP_CONF_DIR=/usr/local/hadoop-3.1.3/etc/hadoop
YARN_CONF_DIR=/usr/local/hadoop-3.1.3/etc/hadoop
SPARK_MASTER_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=DockerApache-01:2181,DockerApache-02:2181,DockerApache-03:2181 -Dspark.deploy.zookeeper.dir=/spark"
4. Configure conf/slaves:
cp slaves.template slaves
vim slaves

dockerapache-01
dockerapache-02
dockerapache-03
Copy the Spark directory from the master node to the other nodes (run this from /usr/local so that `pwd` resolves to the right target path):
scp -r /usr/local/spark-3.0.0-preview2-bin-hadoop3.2/ DockerApache-02:`pwd`
scp -r /usr/local/spark-3.0.0-preview2-bin-hadoop3.2/ DockerApache-01:`pwd`
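The two scp commands above can also be wrapped in a loop. This is only a sketch, assuming passwordless SSH and the hostnames used in this guide:

```shell
# Sketch: copy the Spark directory to every non-master node in a loop.
# Assumes passwordless SSH and the hostnames from this guide.
distribute_spark() {
  local src=/usr/local/spark-3.0.0-preview2-bin-hadoop3.2
  for host in DockerApache-01 DockerApache-02; do
    scp -r "$src" "$host:/usr/local/"
  done
}
# distribute_spark   # run on the master node
```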
Start Spark
In the sbin directory on the master node:
./start-all.sh
Run jps on each of the three machines: every node should show a Worker process, and the master node should additionally show a Master process.
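To avoid logging in to each machine by hand, the jps check can be scripted. A minimal sketch, assuming passwordless SSH and these hostnames:

```shell
# Sketch: run jps on every node over SSH and show only Spark daemons.
# Assumes passwordless SSH and the hostnames used in this guide.
check_daemons() {
  for host in dockerapache-01 dockerapache-02 dockerapache-03; do
    echo "== $host =="
    ssh "$host" jps | grep -E 'Master|Worker' || echo "no Spark daemon found"
  done
}
# check_daemons   # run from the master node
```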
Start the client (YARN)
In the bin directory on the master node:
./spark-shell --master yarn --deploy-mode client
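Once the shell comes up, a tiny job that touches the executors makes a useful smoke test. This is my own suggestion, not from the original post; spark-shell reads Scala from stdin, so the job can be piped in:

```shell
# Hypothetical smoke test: pipe a one-line job into spark-shell on YARN
# to confirm executors actually start (run from Spark's bin directory).
echo 'println(sc.parallelize(1 to 100).sum)' | ./spark-shell --master yarn --deploy-mode client
# A working cluster prints 5050.0 among the log output.
```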
Problems
Error when starting spark-shell:
2020-04-27 10:23:49,894 ERROR cluster.YarnClientSchedulerBackend: YARN application has exited unexpectedly with state UNDEFINED! Check the YARN application logs for more details.
2020-04-27 10:23:49,895 ERROR cluster.YarnClientSchedulerBackend: Diagnostics message: Shutdown hook called before final status was reported.
2020-04-27 10:23:49,919 ERROR spark.SparkContext: Error initializing SparkContext.
2020-04-27 10:23:49,936 ERROR client.TransportClient: Failed to send RPC 9166854326805066924 to /10.28.3.12:39182: java.nio.channels.ClosedChannelException
java.nio.channels.ClosedChannelException
Fix: add the following to Hadoop's yarn-site.xml to disable YARN's physical- and virtual-memory checks (which otherwise kill the container), then restart YARN:
<property>
  <name>yarn.nodemanager.pmem-check-enabled</name>
  <value>false</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>
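Disabling both checks removes the safety net entirely. A gentler alternative (my suggestion, not from the original troubleshooting) is to keep the checks but raise the virtual-memory allowance:

```xml
<!-- Alternative sketch: instead of disabling the checks, allow more
     virtual memory per MB of physical memory (YARN's default is 2.1). -->
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>4</value>
</property>
```

Either way, restart the NodeManagers after changing yarn-site.xml.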