
Spark installation: running spark-shell on Hadoop YARN

2017-04-22
Abstract: see http://www.jianshu.com/p/ca08f2f5ec50 for details.

1. Spark architecture diagram
![Spark architecture diagram](https://static.oschina.net/uploads/img/201704/22200950_0Fu6.png)

2. Downloading and installing Scala
a. Official archive: http://www.scala-lang.org/files/archive/
b. Pick a version, copy its link, and download it with wget:
wget http://www.scala-lang.org/files/archive/scala-2.11.6.tgz
c. Extract it:
tar xvf scala-2.11.6.tgz
sudo mv scala-2.11.6 /usr/local/scala    # move Scala to /usr/local
d. Set the environment variables:
sudo gedit  ~/.bashrc
export SCALA_HOME=/usr/local/scala
export PATH=$PATH:$SCALA_HOME/bin
source ~/.bashrc   # apply the changes
e. Start Scala:
hduser@master:~$ scala
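If the PATH above is set correctly, this drops you into the Scala REPL. A minimal sanity check inside the REPL (just a sketch, nothing setup-specific):
println(util.Properties.versionString)   // should print "version 2.11.6" for the download above
:quit                                    // leave the REPL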
3. Installing Spark
a. Official site: http://spark.apache.org/downloads.html
b. Choose release 1.4, package type "Pre-built for Hadoop 2.6 and later", copy the link, and download it with wget:
c. wget http://d3kbcqa49mib13.cloudfront.net/spark-1.4.0-bin-hadoop2.6.tgz
d. Extract it and move it to /usr/local/spark/
e. Edit the environment variables:
f. sudo gedit  ~/.bashrc
export SPARK_HOME=/usr/local/spark
export PATH=$PATH:$SPARK_HOME/bin
g. source ~/.bashrc   # apply the changes
4. Starting the spark-shell interactive session
hduser@master:~$ spark-shell
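spark-shell creates a SparkContext named sc automatically. A quick sketch to confirm what the shell connected to (the exact output depends on your setup):
sc.version   // the Spark version, e.g. 1.4.0
sc.master    // the master URL; with no --master flag this is typically local[*]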
5. Starting Hadoop (start-all.sh, then verify the daemons with jps)
6. Running spark-shell locally
a. spark-shell --master local[4]
b. Read a local file:
val textFile = sc.textFile("file:/usr/local/spark/README.md")
textFile.count
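Building on the two lines above, a minimal word-count sketch in the same session (it assumes README.md exists at the path passed to textFile):
val words = textFile.flatMap(line => line.split(" "))     // split every line into words
val counts = words.map(w => (w, 1)).reduceByKey(_ + _)    // count occurrences of each word
counts.take(10).foreach(println)                          // print the first 10 (word, count) pairs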
7. Running spark-shell on Hadoop YARN
SPARK_JAR=/usr/local/spark/lib/spark-assembly-1.4.0-hadoop2.6.0.jar      HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop MASTER=yarn-client /usr/local/spark/bin/spark-shell
The same command, broken down:
SPARK_JAR=/usr/local/spark/lib/spark-assembly-1.4.0-hadoop2.6.0.jar   # path to the Spark assembly jar
HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop                          # Hadoop configuration directory
MASTER=yarn-client                                                    # run in yarn-client mode
/usr/local/spark/bin/spark-shell                                      # full path of the spark-shell binary to run
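Once the YARN shell is up, a quick sketch to confirm the mode and read something from HDFS (the HDFS path below is a made-up placeholder; substitute a file that actually exists on your cluster):
sc.master                                      // should print yarn-client
val f = sc.textFile("/user/hduser/test.txt")   // placeholder path, resolved against HDFS via HADOOP_CONF_DIR
f.count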
8. Building a Spark Standalone cluster environment
a. cp /usr/local/spark/conf/spark-env.sh.template /usr/local/spark/conf/spark-env.sh   # copy the template before editing it
b. Configure spark-env.sh:
c. sudo gedit /usr/local/spark/conf/spark-env.sh
export SPARK_MASTER_IP=master      # hostname/IP of the master
export SPARK_WORKER_CORES=1        # CPU cores used by each worker
export SPARK_WORKER_MEMORY=600m    # memory used by each worker
export SPARK_WORKER_INSTANCES=1    # number of worker instances per node
# Keep a close eye on your memory:
# with hadoop + spark running across several VMs, 8 GB of host RAM is not enough; the stack is very memory-hungry,
# and virtualization adds a sizeable resource overhead.
d. SSH into data1 and data2 and create the spark directory:
sudo mkdir /usr/local/spark
sudo chown hduser:hduser /usr/local/spark
# run the two commands above on both data1 and data2
e. Copy Spark from master to data1 and data2:
sudo scp -r /usr/local/spark hduser@data1:/usr/local
sudo scp -r /usr/local/spark hduser@data2:/usr/local
f. Edit the slaves file:
sudo gedit /usr/local/spark/conf/slaves
data1
data2
9. Running spark-shell on Spark Standalone
a. Start the Spark Standalone cluster:
/usr/local/spark/sbin/start-all.sh
b. Run:
spark-shell --master spark://master:7077
c. Check the Spark Standalone web UI at http://master:8080/
d. Stop the Spark Standalone cluster:
/usr/local/spark/sbin/stop-all.sh
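Before stopping the cluster, a minimal check inside the spark-shell session from step 9b (a sketch that only uses data generated on the fly, so no input files are assumed):
sc.master                              // should print spark://master:7077
val nums = sc.parallelize(1 to 1000)   // distribute a small dataset across the workers
nums.sum                               // should return 500500.0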
10. Command reference (shell history)
152  scala
153  jps
154  wget http://d3kbcqa49mib13.cloudfront.net/spark-1.4.0-bin-hadoop2.6.tgz
155  ping www.baidu.com
156  ssh data3
157  ssh data2
158  ssh data1
159  jps
160  start-all.sh
161  jps
162  spark-shell
163  spark-shell --master local[4]
164  SPARK_JAR=/usr/local/spark/lib/spark-assembly-1.4.0-hadoop2.6.0.jar  HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop MASTER=yarn-client /usr/local/spark/bin/spark-shell
165  ssh data2
166  ssh data1
167  cd /usr/local/hadoop/etc/hadoop/
168  ll
169  sudo gedit masters
170  sudo gedit slaves
171  sudo gedit /etc/hosts
172  sudo gedit hdfs-site.xml
173  sudo rm -rf /usr/local/hadoop/hadoop_data/hdfs
174  mkdir -p /usr/local/hadoop/hadoop_data/hdfs/namenode
175  sudo chown -R hduser:hduser /usr/local/hadoop
176  hadoop namenode -format
177  start-all.sh
178  jps
179  spark-shell
180  SPARK_JAR=/usr/local/spark/lib/spark-assembly-1.4.0-hadoop2.6.0.jar  HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop MASTER=yarn-client /usr/local/spark/bin/spark-shell
181  ssh data1
182  ssh data2
183  ssh data1
184  start-all.sh
185  jps
186  cp /usr/local/spark/conf/spark-env.sh.template /usr/local/spark/conf/spark-env.sh
187  sudo gedit /usr/local/spark/conf/spark-env.sh
188  sudo scp -r /usr/local/spark hduser@data1:/usr/local
189  sudo scp -r /usr/local/spark hduser@data2:/usr/local
190  sudo gedit /usr/local/spark/conf/slaves
191  /usr/local/spark/sbin/start-all.sh
192  spark-shell --master spark://master:7077
193  /usr/local/spark/sbin/stop-all.sh
194  jps
195  stop-all.sh
196  history
Tags: hadoop spark YARN