Configuring a Spark Cluster on Linux
2016-02-16 14:08
Environment:
Linux
Spark 1.6.0
Hadoop 2.2.0
I. Install Scala (on every machine)
1. Download scala-2.11.0.tgz
Place it under /opt and extract it: tar -zxvf scala-2.11.0.tgz
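A compact sketch of the download-and-extract step; the archive URL is an assumption (not from the original post), so substitute whatever Scala mirror you normally use:
cd /opt
# fetch and unpack Scala 2.11.0 (URL is an assumption)
wget http://www.scala-lang.org/files/archive/scala-2.11.0.tgz
tar -zxvf scala-2.11.0.tgz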
2. As the hadoop user:
vim /etc/profile
3. Add the Scala path to the profile file:
export SCALA_JAVA=/opt/scala-2.11.0
export PATH=$PATH:$SCALA_JAVA/bin
4. Make the configuration take effect:
source /etc/profile
5. Verify that Scala installed successfully:
[hadoop@testhdp01 ~]$ scala -version
Scala code runner version 2.10.1 -- Copyright 2002-2013, LAMP/EPFL
Success.
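To save logging into each machine, the check can be scripted; a minimal sketch assuming passwordless SSH and the hostnames used later in this post:
# run from any node; prints the Scala version installed on each machine
for host in testhdp01 testhdp02 testhdp03; do
  ssh $host "scala -version"
done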
II. Install Spark
1. Build Spark 1.6.0 (I tried building on Linux many times without success, so I built it on a Mac instead).
Official build instructions: http://spark.apache.org/docs/latest/building-spark.html
Enter the Spark directory and run one of the following (each is an alternative way to build; make-distribution.sh additionally produces a deployable .tgz):
build/mvn -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 -DskipTests clean package
./make-distribution.sh --name custom-spark --tgz -Psparkr -Phadoop-2.2 -Phive -Phive-thriftserver -Pyarn
mvn -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 -Phive -Phive-thriftserver -DskipTests clean package
To build with IntelliJ IDEA, see:
https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools#UsefulDeveloperTools-IDESetup
2. Configure Spark
cd /opt/spark-1.6.0-bin-hadoop2.2.0/conf
cp spark-env.sh.template spark-env.sh
cp slaves.template slaves
vim spark-env.sh
Add the following:
export SCALA_HOME=/opt/scala-2.10.1
export JAVA_HOME=/opt/jdk1.7.0_51
export SPARK_MASTER_IP=192.168.22.7
export HADOOP_HOME=/opt/hadoop-2.2.0
export SPARK_HOME=/opt/spark-1.6.0-bin-hadoop2.2.0
export SPARK_LIBRARY_PATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib:$HADOOP_HOME/lib/native
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop/
export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native
export SPARK_JAR=$SPARK_HOME/lib/spark-assembly-1.6.0-hadoop2.2.0.jar
3. Configure Spark to support Hive
vim spark-env.sh
export HIVE_HOME=/opt/apache-hive-0.13.0
export SPARK_CLASSPATH=$HIVE_HOME/lib/mysql-connector-java-5.1.26.jar:$SPARK_CLASSPATH
Copy apache-hive-0.13.1-bin/conf/hive-site.xml to $SPARK_HOME/conf:
cp /opt/apache-hive-0.13.0/conf/hive-site.xml conf/
Create a hive.sh file in the /etc/profile.d directory and add the environment variable settings:
#!/bin/bash
export HIVE_HOME=/opt/apache-hive-0.13.0
export PATH=$HIVE_HOME/bin:$PATH
Make the environment variables take effect:
source /etc/profile.d/hive.sh
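At this point Hive support can be smoke-tested; a minimal sketch, assuming the metastore configured in hive-site.xml is reachable and that this Hive-enabled build exposes a HiveContext as the default sqlContext (which Spark 1.6 built with -Phive does):
# pipe a one-line query into spark-shell via a here-document
$SPARK_HOME/bin/spark-shell <<'EOF'
sqlContext.sql("show tables").collect().foreach(println)
EOF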
4. Configure the cluster
Enter Spark's conf directory:
vim slaves
Delete localhost and add the names of the worker nodes:
testhdp02
testhdp03
Configure the Spark system environment (all three worker nodes must be configured):
sudo su - root
vim /etc/profile
export SPARK_HOME=/opt/spark-1.6.0-bin-hadoop2.2.0
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
5. Package the configured Spark and send it to the worker nodes, as sketched below.
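A minimal sketch of this step, assuming passwordless SSH to the workers and the paths used above:
# run on the master: pack the configured Spark and push it to each worker
cd /opt
tar -zcf spark-1.6.0-bin-hadoop2.2.0.tgz spark-1.6.0-bin-hadoop2.2.0
for host in testhdp02 testhdp03; do
  scp spark-1.6.0-bin-hadoop2.2.0.tgz $host:/opt/
  ssh $host "cd /opt && tar -zxf spark-1.6.0-bin-hadoop2.2.0.tgz"
done
# then start the master and all workers in one go
$SPARK_HOME/sbin/start-all.sh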
Once the cluster is up, the master web UI is available at http://192.168.22.7:8080/
III. Troubleshooting
Start the shell:
bin/spark-shell
and run:
val textFile = sc.textFile("README.md")
textFile.count()
The following error appears:
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
    ... 61 more
Caused by: java.lang.IllegalArgumentException: Compression codec com.hadoop.compression.lzo.LzoCodec not found.
    at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:135)
    at org.apache.hadoop.io.compress.CompressionCodecFactory.<init>(CompressionCodecFactory.java:175)
    at org.apache.hadoop.mapred.TextInputFormat.configure(TextInputFormat.java:45)
    ... 66 more
Caused by: java.lang.ClassNotFoundException: Class com.hadoop.compression.lzo.LzoCodec not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1626)
    at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:128)
    ... 68 more
Solution:
The Hadoop configuration references com.hadoop.compression.lzo.LzoCodec, but the jar providing that class is not on Spark's classpath, so the codec lookup fails. Modify the spark-env.sh file and append the Hadoop library directories to SPARK_CLASSPATH:
export SCALA_HOME=/opt/scala-2.10.1
export JAVA_HOME=/opt/jdk1.7.0_51
export SPARK_MASTER_IP=192.168.22.7
export HADOOP_HOME=/opt/hadoop-2.2.0
export SPARK_HOME=/opt/spark-1.6.0-bin-hadoop2.2.0
export SPARK_LIBRARY_PATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib:$HADOOP_HOME/lib/native
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop/
export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native
export SPARK_JAR=$SPARK_HOME/lib/spark-assembly-1.6.0-hadoop2.2.0.jar
export SPARK_CLASSPATH=$SPARK_CLASSPATH:$HADOOP_HOME/share/hadoop/yarn/*:$HADOOP_HOME/share/hadoop/yarn/lib/*:$HADOOP_HOME/share/hadoop/common/*:$HADOOP_HOME/share/hadoop/common/lib/*:$HADOOP_HOME/share/hadoop/hdfs/*:$HADOOP_HOME/share/hadoop/hdfs/lib/*:$HADOOP_HOME/share/hadoop/mapreduce/*:$HADOOP_HOME/share/hadoop/mapreduce/lib/*:$HADOOP_HOME/share/hadoop/tools/lib/*:$SPARK_HOME/lib/*
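After restarting spark-shell, the failing example should now run cleanly; a quick re-check (assumes README.md sits in the launch directory, or in the HDFS home directory if the path resolves to HDFS):
$SPARK_HOME/bin/spark-shell <<'EOF'
val textFile = sc.textFile("README.md")
println(textFile.count())
EOF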