
Setting Up a Spark 1.2.0 Single-Machine Environment

2015-04-26 12:57
1. Prepare the two files

scala-2.11.4.tgz
spark-1.2.0-bin-hadoop1.tgz

Spark download: wget http://d3kbcqa49mib13.cloudfront.net/spark-1.2.0-bin-hadoop1.tgz
Scala download: wget http://downloads.typesafe.com/scala/2.11.4/scala-2.11.4.tgz?_ga=1.254444288.920772718.1430024679
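
Before extracting, a quick integrity check catches truncated downloads; listing a few entries from each archive is enough (tar will error out on a corrupt file):

tar -tzf scala-2.11.4.tgz | head -3
tar -tzf spark-1.2.0-bin-hadoop1.tgz | head -3
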
2. Extract the archives

tar -zvxf scala-2.11.4.tgz
tar -zvxf spark-1.2.0-bin-hadoop1.tgz
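
The rest of this guide assumes both directories live under /usr/local/cdh/spark. A minimal sketch, assuming you downloaded and extracted in some other working directory:

mkdir -p /usr/local/cdh/spark
mv scala-2.11.4 spark-1.2.0-bin-hadoop1 /usr/local/cdh/spark/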


3. Configure the Scala environment variables

Append the following to /etc/profile:
export SCALA_HOME=/usr/local/cdh/spark/scala-2.11.4
export PATH=$PATH:$SCALA_HOME/bin
When done, save and exit, then run source /etc/profile.

Verify:
[root@localhost scala-2.11.4]# scala -version
Scala code runner version 2.11.4 -- Copyright 2002-2013, LAMP/EPFL
[root@localhost scala-2.11.4]# scala
Welcome to Scala version 2.11.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_67).
Type in expressions to have them evaluated.
Type :help for more information.

scala>
If you see the prompt above, Scala is installed correctly.

4. Configure the Spark environment variables

Append the following to /etc/profile:

export SPARK_HOME=/usr/local/cdh/spark/spark-1.2.0-bin-hadoop1
export PATH=$PATH:$SPARK_HOME/bin
When done, save and exit, then run source /etc/profile.

5. Edit the Spark configuration file

My Spark is extracted at:
/usr/local/cdh/spark/spark-1.2.0-bin-hadoop1
[root@localhost spark-1.2.0-bin-hadoop1]# cd conf/
[root@localhost conf]# ls
fairscheduler.xml.template  log4j.properties.template  metrics.properties.template  slaves.template  spark-defaults.conf.template  spark-env.sh.template
[root@localhost conf]# cp spark-env.sh.template spark-env.sh
[root@localhost conf]# vi spark-env.sh
Append the following at the end (martin is this machine's hostname):
export SCALA_HOME=/usr/local/cdh/spark/scala-2.11.4
export SPARK_MASTER_IP=martin
export SPARK_WORKER_MEMORY=2G
export JAVA_HOME=/usr/local/cdh/jdk1.7
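
Note that sbin/start-slaves.sh (used in the next step) starts one worker per host listed in conf/slaves. For this single-machine setup a file containing just localhost is enough; a minimal sketch:

[root@localhost conf]# echo localhost > slaves
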
6. Start the services

Start the master:
[root@localhost spark-1.2.0-bin-hadoop1]# sbin/start-master.sh
Test: open http://martin:8080 in a browser; you should see the master web UI.
Then start a worker:

[root@localhost spark-1.2.0-bin-hadoop1]# sbin/start-slaves.sh spark://martin:7077
localhost: starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/cdh/spark/spark-1.2.0-bin-hadoop1/sbin/../logs/spark-root-org.apache.spark.deploy.worker.Worker-1-localhost.localdomain.out
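
To confirm both daemons are up, jps (shipped with the JDK) should list a Master and a Worker process; the PIDs below are illustrative:

[root@localhost spark-1.2.0-bin-hadoop1]# jps
2752 Master
2890 Worker
3011 Jps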

7. Enter interactive mode

Connect the shell to the running master:

[root@localhost spark-1.2.0-bin-hadoop1]# ./bin/spark-shell --master spark://martin:7077
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/04/25 21:26:54 INFO SecurityManager: Changing view acls to: root
15/04/25 21:26:54 INFO SecurityManager: Changing modify acls to: root
15/04/25 21:26:54 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
15/04/25 21:26:54 INFO HttpServer: Starting HTTP Server
15/04/25 21:26:55 INFO Utils: Successfully started service 'HTTP class server' on port 39275.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.2.0
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_67)
Type in expressions to have them evaluated.
Type :help for more information.
15/04/25 21:27:04 WARN Utils: Your hostname, localhost.localdomain resolves to a loopback address: 127.0.0.1; using 192.168.41.190 instead (on interface eth1)
15/04/25 21:27:04 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
15/04/25 21:27:04 INFO SecurityManager: Changing view acls to: root
15/04/25 21:27:04 INFO SecurityManager: Changing modify acls to: root
15/04/25 21:27:04 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
15/04/25 21:27:05 INFO Slf4jLogger: Slf4jLogger started
15/04/25 21:27:05 INFO Remoting: Starting remoting
15/04/25 21:27:06 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@martin:35821]
15/04/25 21:27:06 INFO Utils: Successfully started service 'sparkDriver' on port 35821.
15/04/25 21:27:06 INFO SparkEnv: Registering MapOutputTracker
15/04/25 21:27:06 INFO SparkEnv: Registering BlockManagerMaster
15/04/25 21:27:06 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20150425212706-43cd
15/04/25 21:27:06 INFO MemoryStore: MemoryStore started with capacity 267.3 MB
15/04/25 21:27:07 INFO HttpFileServer: HTTP File server directory is /tmp/spark-005e80bd-70fe-4be9-88c5-a60de6233c68
15/04/25 21:27:07 INFO HttpServer: Starting HTTP Server
15/04/25 21:27:07 INFO Utils: Successfully started service 'HTTP file server' on port 54723.
15/04/25 21:27:07 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/04/25 21:27:07 INFO SparkUI: Started SparkUI at http://martin:4040
15/04/25 21:27:08 INFO Executor: Using REPL class URI: http://192.168.41.190:39275
15/04/25 21:27:08 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@martin:35821/user/HeartbeatReceiver
15/04/25 21:27:08 INFO NettyBlockTransferService: Server created on 57860
15/04/25 21:27:08 INFO BlockManagerMaster: Trying to register BlockManager
15/04/25 21:27:08 INFO BlockManagerMasterActor: Registering block manager localhost:57860 with 267.3 MB RAM, BlockManagerId(<driver>, localhost, 57860)
15/04/25 21:27:08 INFO BlockManagerMaster: Registered BlockManager
15/04/25 21:27:09 INFO SparkILoop: Created spark context..
Spark context available as sc.

scala>

(The shell reports Scala 2.10.4 even though we installed 2.11.4: the Spark 1.2 prebuilt binaries bundle their own Scala 2.10 REPL, independent of the system Scala from step 3.)
8. Word-count test

Prepare a file.txt and put it in the corresponding directory.
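
If you don't have a test file handy, here is a minimal sketch that creates one (the contents are arbitrary; the counts in the log below came from the author's own file, so yours will differ):

mkdir -p /usr/temp
echo "Hello World this is my book" > /usr/temp/file.txt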

val file = sc.textFile("/usr/temp/file.txt")  // read the file as an RDD of lines
val count = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)  // split lines into words, pair each with 1, sum per word
count.collect()  // gather the (word, count) pairs back to the driver
15/04/25 21:52:40 INFO SparkContext: Starting job: collect at <console>:17
15/04/25 21:52:40 INFO DAGScheduler: Registering RDD 7 (map at <console>:14)
15/04/25 21:52:40 INFO DAGScheduler: Got job 0 (collect at <console>:17) with 1 output partitions (allowLocal=false)
15/04/25 21:52:40 INFO DAGScheduler: Final stage: Stage 1(collect at <console>:17)
15/04/25 21:52:40 INFO DAGScheduler: Parents of final stage: List(Stage 0)
15/04/25 21:52:40 INFO DAGScheduler: Missing parents: List(Stage 0)
15/04/25 21:52:40 INFO DAGScheduler: Submitting Stage 0 (MappedRDD[7] at map at <console>:14), which has no missing parents
15/04/25 21:52:40 INFO MemoryStore: ensureFreeSpace(3544) called with curMem=75526, maxMem=280248975
15/04/25 21:52:40 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 3.5 KB, free 267.2 MB)
15/04/25 21:52:40 INFO MemoryStore: ensureFreeSpace(2501) called with curMem=79070, maxMem=280248975
15/04/25 21:52:40 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 2.4 KB, free 267.2 MB)
15/04/25 21:52:40 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on localhost:51130 (size: 2.4 KB, free: 267.3 MB)
15/04/25 21:52:40 INFO BlockManagerMaster: Updated info of block broadcast_2_piece0
15/04/25 21:52:40 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:838
15/04/25 21:52:40 INFO DAGScheduler: Submitting 1 missing tasks from Stage 0 (MappedRDD[7] at map at <console>:14)
15/04/25 21:52:40 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
15/04/25 21:52:40 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, PROCESS_LOCAL, 1320 bytes)
15/04/25 21:52:40 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
15/04/25 21:52:40 INFO HadoopRDD: Input split: file:/usr/local/cdh/spark/spark-1.2.0-bin-hadoop1/Desktop/file.txt:0+61
15/04/25 21:52:41 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1895 bytes result sent to driver
15/04/25 21:52:41 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 355 ms on localhost (1/1)
15/04/25 21:52:41 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
15/04/25 21:52:41 INFO DAGScheduler: Stage 0 (map at <console>:14) finished in 0.652 s
15/04/25 21:52:41 INFO DAGScheduler: looking for newly runnable stages
15/04/25 21:52:41 INFO DAGScheduler: running: Set()
15/04/25 21:52:41 INFO DAGScheduler: waiting: Set(Stage 1)
15/04/25 21:52:41 INFO DAGScheduler: failed: Set()
15/04/25 21:52:41 INFO DAGScheduler: Missing parents for Stage 1: List()
15/04/25 21:52:41 INFO DAGScheduler: Submitting Stage 1 (ShuffledRDD[8] at reduceByKey at <console>:14), which is now runnable
15/04/25 21:52:41 INFO MemoryStore: ensureFreeSpace(2112) called with curMem=81571, maxMem=280248975
15/04/25 21:52:41 INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 2.1 KB, free 267.2 MB)
15/04/25 21:52:41 INFO MemoryStore: ensureFreeSpace(1544) called with curMem=83683, maxMem=280248975
15/04/25 21:52:41 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 1544.0 B, free 267.2 MB)
15/04/25 21:52:41 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on localhost:51130 (size: 1544.0 B, free: 267.3 MB)
15/04/25 21:52:41 INFO BlockManagerMaster: Updated info of block broadcast_3_piece0
15/04/25 21:52:41 INFO SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:838
15/04/25 21:52:41 INFO DAGScheduler: Submitting 1 missing tasks from Stage 1 (ShuffledRDD[8] at reduceByKey at <console>:14)
15/04/25 21:52:41 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
15/04/25 21:52:41 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, localhost, PROCESS_LOCAL, 1056 bytes)
15/04/25 21:52:41 INFO Executor: Running task 0.0 in stage 1.0 (TID 1)
15/04/25 21:52:41 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
15/04/25 21:52:41 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 4 ms
15/04/25 21:52:41 INFO Executor: Finished task 0.0 in stage 1.0 (TID 1). 1168 bytes result sent to driver
15/04/25 21:52:41 INFO DAGScheduler: Stage 1 (collect at <console>:17) finished in 0.126 s
15/04/25 21:52:41 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 131 ms on localhost (1/1)
15/04/25 21:52:41 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
15/04/25 21:52:41 INFO DAGScheduler: Job 0 finished: collect at <console>:17, took 1.103095 s
res0: Array[(String, Int)] = Array((this,1), (is,1), (Hello,1), (haoop,1), (home,1), (book;,1), ("",1), (World,1), (j2ee,1), (JAVA,1), (HADOOP,1), (my,1))
9. Stop the services

[root@localhost spark-1.2.0-bin-hadoop1]# sbin/stop-master.sh
stopping org.apache.spark.deploy.master.Master
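
Note that stop-master.sh stops only the master; the worker started in step 6 keeps running. Stop it with the companion script:

[root@localhost spark-1.2.0-bin-hadoop1]# sbin/stop-slaves.sh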