Setting Up a Spark Cluster
2016-07-12 18:36
1 Building Spark
1.1 Download the source code
```shell
git clone git://github.com/apache/spark.git -b branch-1.6
```
1.2 Modify the pom file
Add a profile for cdh5.0.2, as follows:

```xml
<profile>
  <id>cdh5.0.2</id>
  <properties>
    <hadoop.version>2.3.0-cdh5.0.2</hadoop.version>
    <hbase.version>0.96.1.1-cdh5.0.2</hbase.version>
    <flume.version>1.4.0-cdh5.0.2</flume.version>
    <zookeeper.version>3.4.5-cdh5.0.2</zookeeper.version>
  </properties>
</profile>
```
1.3 Build

```shell
build/mvn -Pyarn -Pcdh5.0.2 -Phive -Phive-thriftserver -Pnative -DskipTests package
```

If the command above fails because maven.twttr.com is blocked, append the hosts entry `199.16.156.89 maven.twttr.com` to /etc/hosts and run it again.
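The workaround as shell (appending to /etc/hosts requires root; the IP is the one noted above):

```shell
# maven.twttr.com may be unreachable in some networks; map it directly.
echo "199.16.156.89 maven.twttr.com" | sudo tee -a /etc/hosts
```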
2 Setting Up the Spark Cluster [Spark on YARN]
2.1 Edit the configuration files
spark-env.sh:

```shell
export SPARK_SSH_OPTS="-p9413"
export HADOOP_CONF_DIR=/opt/hadoop/hadoop-cluster/modules/hadoop-2.3.0-cdh5.0.2/etc/hadoop
export SPARK_EXECUTOR_INSTANCES=1
export SPARK_EXECUTOR_CORES=4
export SPARK_EXECUTOR_MEMORY=1G
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HADOOP_HOME/lib/native/
```

slaves:

```
192.168.3.211 hadoop-dev-211
192.168.3.212 hadoop-dev-212
192.168.3.213 hadoop-dev-213
192.168.3.214 hadoop-dev-214
```
2.2 Cluster layout and startup

Cluster layout:

```
hadoop-dev-211  Master, Worker
hadoop-dev-212  Worker
hadoop-dev-213  Worker
hadoop-dev-214  Worker
```

Start the Master:

```shell
sbin/start-master.sh
```

Start the Workers:

```shell
sbin/start-slaves.sh
```
2.3 View the web UI
Once the cluster is up, the standalone Master web UI listens on port 8080 by default (e.g. http://hadoop-dev-211:8080), and each Worker's UI on port 8081.
3 Integrating Hive
Copy hive-site.xml and hive-log4j.properties into Spark's conf directory.
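With the files in place, the integration can be sanity-checked from spark-shell (a sketch; it assumes the metastore configured in hive-site.xml is reachable):

```scala
// In a -Phive build, spark-shell's sqlContext is a HiveContext,
// so SHOW TABLES runs against the metastore from hive-site.xml.
sqlContext.sql("SHOW TABLES").show()
```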
4 Spark Examples
4.1 Load MySQL data into Hive

Step 1: start spark-shell:

```shell
bin/spark-shell --jars lib_managed/jars/hadoop-lzo-0.4.17.jar \
  --driver-class-path /opt/hadoop/hadoop-cluster/modules/apache-hive-1.2.1-bin/lib/mysql-connector-java-5.6-bin.jar
```

Step 2: read the MySQL data:

```scala
val jdbcDF = sqlContext.read.format("jdbc").options(Map(
  "url" -> "jdbc:mysql://hadoop-dev-212:3306/hive",
  "dbtable" -> "VERSION",
  "user" -> "hive",
  "password" -> "123456")).load()
```

Step 3: save it as a Hive table:

```scala
jdbcDF.saveAsTable("test")
```
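As a quick check of the steps above (a sketch run in the same spark-shell session; the DataFrame and table names follow the example):

```scala
// Schema and rows pulled from MySQL over JDBC.
jdbcDF.printSchema()
jdbcDF.show()

// After saveAsTable, the data is queryable through Hive SQL.
sqlContext.sql("SELECT * FROM test").show()
```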