Setting Up a Single-Machine Spark 1.2.0 Environment
2015-04-26 12:57
1. Prepare the two files:

scala-2.11.4.tgz
spark-1.2.0-bin-hadoop1.tgz

Download Spark:
wget http://d3kbcqa49mib13.cloudfront.net/spark-1.2.0-bin-hadoop1.tgz

Download Scala:
wget http://downloads.typesafe.com/scala/2.11.4/scala-2.11.4.tgz?_ga=1.254444288.920772718.1430024679
2. Extract both archives:

tar -zvxf scala-2.11.4.tgz
tar -zvxf spark-1.2.0-bin-hadoop1.tgz
3. Configure the Scala environment variables. Add the following to /etc/profile:

export SCALA_HOME=/usr/local/cdh/spark/scala-2.11.4
export PATH=$PATH:$SCALA_HOME/bin

Save, exit, and run source /etc/profile to apply the changes.
Verify the installation:

[root@localhost scala-2.11.4]# scala -version
Scala code runner version 2.11.4 -- Copyright 2002-2013, LAMP/EPFL
[root@localhost scala-2.11.4]# scala
Welcome to Scala version 2.11.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_67).
Type in expressions to have them evaluated.
Type :help for more information.

scala>

If you see this prompt, Scala was installed successfully.
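Beyond checking the version banner, you can confirm the REPL actually evaluates code by pasting in a throwaway expression. This is just an illustrative snippet (any expression will do):

```scala
// Paste into the scala> prompt: squares of 1..5, joined into a string.
object ReplCheck {
  def main(args: Array[String]): Unit = {
    val squares = (1 to 5).map(n => n * n)
    println(squares.mkString(","))  // prints 1,4,9,16,25
  }
}
```

If the REPL echoes the result, the Scala toolchain is working.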
4. Configure the Spark environment variables. Add the following to /etc/profile:

export SPARK_HOME=/usr/local/cdh/spark/spark-1.2.0-bin-hadoop1
export PATH=$PATH:$SPARK_HOME/bin

Save, exit, and run source /etc/profile to apply the changes.
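Once both variables are sourced, any JVM process (including the scala REPL) should see them. A minimal sketch for checking this from Scala, assuming the variable names used in steps 3 and 4:

```scala
// Print the two environment variables configured above; "<not set>"
// indicates the shell profile was not sourced in this session.
object EnvCheck {
  def main(args: Array[String]): Unit = {
    Seq("SCALA_HOME", "SPARK_HOME").foreach { name =>
      println(s"$name=${sys.env.getOrElse(name, "<not set>")}")
    }
  }
}
```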
5. Edit the Spark configuration files

My Spark was extracted to /usr/local/cdh/spark/spark-1.2.0-bin-hadoop1:

[root@localhost spark-1.2.0-bin-hadoop1]# cd conf/
[root@localhost conf]# ls
fairscheduler.xml.template  log4j.properties.template  metrics.properties.template  slaves.template  spark-defaults.conf.template  spark-env.sh.template
[root@localhost conf]# cp spark-env.sh.template spark-env.sh
[root@localhost conf]# vi spark-env.sh

Append the following at the end:

export SCALA_HOME=/usr/local/cdh/spark/scala-2.11.4
export SPARK_MASTER_IP=martin
export SPARK_WORKER_MEMORY=2G
export JAVA_HOME=/usr/local/cdh/jdk1.7

6. Start the services

Start the master:

[root@localhost spark-1.2.0-bin-hadoop1]# sbin/start-master.sh

Test it by visiting martin:8080 in a browser. Then start the workers:

[root@localhost spark-1.2.0-bin-hadoop1]# sbin/start-slaves.sh spark://martin:7077
7. Enter the interactive shell:

master=spark://martin:7077 ./bin/spark-shell

localhost: starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/cdh/spark/spark-1.2.0-bin-hadoop1/sbin/../logs/spark-root-org.apache.spark.deploy.worker.Worker-1-localhost.localdomain.out
[root@localhost spark-1.2.0-bin-hadoop1]# master=spark://martin:7077 ./bin/spark-shell
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/04/25 21:26:54 INFO SecurityManager: Changing view acls to: root
15/04/25 21:26:54 INFO SecurityManager: Changing modify acls to: root
15/04/25 21:26:54 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
15/04/25 21:26:54 INFO HttpServer: Starting HTTP Server
15/04/25 21:26:55 INFO Utils: Successfully started service 'HTTP class server' on port 39275.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.2.0
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_67)
Type in expressions to have them evaluated.
Type :help for more information.
15/04/25 21:27:04 WARN Utils: Your hostname, localhost.localdomain resolves to a loopback address: 127.0.0.1; using 192.168.41.190 instead (on interface eth1)
15/04/25 21:27:04 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
15/04/25 21:27:04 INFO SecurityManager: Changing view acls to: root
15/04/25 21:27:04 INFO SecurityManager: Changing modify acls to: root
15/04/25 21:27:04 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
15/04/25 21:27:05 INFO Slf4jLogger: Slf4jLogger started
15/04/25 21:27:05 INFO Remoting: Starting remoting
15/04/25 21:27:06 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@martin:35821]
15/04/25 21:27:06 INFO Utils: Successfully started service 'sparkDriver' on port 35821.
15/04/25 21:27:06 INFO SparkEnv: Registering MapOutputTracker
15/04/25 21:27:06 INFO SparkEnv: Registering BlockManagerMaster
15/04/25 21:27:06 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20150425212706-43cd
15/04/25 21:27:06 INFO MemoryStore: MemoryStore started with capacity 267.3 MB
15/04/25 21:27:07 INFO HttpFileServer: HTTP File server directory is /tmp/spark-005e80bd-70fe-4be9-88c5-a60de6233c68
15/04/25 21:27:07 INFO HttpServer: Starting HTTP Server
15/04/25 21:27:07 INFO Utils: Successfully started service 'HTTP file server' on port 54723.
15/04/25 21:27:07 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/04/25 21:27:07 INFO SparkUI: Started SparkUI at http://martin:4040
15/04/25 21:27:08 INFO Executor: Using REPL class URI: http://192.168.41.190:39275
15/04/25 21:27:08 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@martin:35821/user/HeartbeatReceiver
15/04/25 21:27:08 INFO NettyBlockTransferService: Server created on 57860
15/04/25 21:27:08 INFO BlockManagerMaster: Trying to register BlockManager
15/04/25 21:27:08 INFO BlockManagerMasterActor: Registering block manager localhost:57860 with 267.3 MB RAM, BlockManagerId(<driver>, localhost, 57860)
15/04/25 21:27:08 INFO BlockManagerMaster: Registered BlockManager
15/04/25 21:27:09 INFO SparkILoop: Created spark context..
Spark context available as sc.

scala>

8. Word count test
Prepare a file named file.txt, place it in the appropriate directory, and run:

val file=sc.textFile("/usr/temp/file.txt")
val count=file.flatMap(line=>line.split(" ")).map(word=>(word,1)).reduceByKey(_+_)
count.collect()

15/04/25 21:52:40 INFO SparkContext: Starting job: collect at <console>:17
15/04/25 21:52:40 INFO DAGScheduler: Registering RDD 7 (map at <console>:14)
15/04/25 21:52:40 INFO DAGScheduler: Got job 0 (collect at <console>:17) with 1 output partitions (allowLocal=false)
15/04/25 21:52:40 INFO DAGScheduler: Final stage: Stage 1(collect at <console>:17)
15/04/25 21:52:40 INFO DAGScheduler: Parents of final stage: List(Stage 0)
15/04/25 21:52:40 INFO DAGScheduler: Missing parents: List(Stage 0)
15/04/25 21:52:40 INFO DAGScheduler: Submitting Stage 0 (MappedRDD[7] at map at <console>:14), which has no missing parents
15/04/25 21:52:40 INFO MemoryStore: ensureFreeSpace(3544) called with curMem=75526, maxMem=280248975
15/04/25 21:52:40 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 3.5 KB, free 267.2 MB)
15/04/25 21:52:40 INFO MemoryStore: ensureFreeSpace(2501) called with curMem=79070, maxMem=280248975
15/04/25 21:52:40 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 2.4 KB, free 267.2 MB)
15/04/25 21:52:40 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on localhost:51130 (size: 2.4 KB, free: 267.3 MB)
15/04/25 21:52:40 INFO BlockManagerMaster: Updated info of block broadcast_2_piece0
15/04/25 21:52:40 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:838
15/04/25 21:52:40 INFO DAGScheduler: Submitting 1 missing tasks from Stage 0 (MappedRDD[7] at map at <console>:14)
15/04/25 21:52:40 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
15/04/25 21:52:40 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, PROCESS_LOCAL, 1320 bytes)
15/04/25 21:52:40 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
15/04/25 21:52:40 INFO HadoopRDD: Input split: file:/usr/local/cdh/spark/spark-1.2.0-bin-hadoop1/Desktop/file.txt:0+61
15/04/25 21:52:41 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1895 bytes result sent to driver
15/04/25 21:52:41 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 355 ms on localhost (1/1)
15/04/25 21:52:41 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
15/04/25 21:52:41 INFO DAGScheduler: Stage 0 (map at <console>:14) finished in 0.652 s
15/04/25 21:52:41 INFO DAGScheduler: looking for newly runnable stages
15/04/25 21:52:41 INFO DAGScheduler: running: Set()
15/04/25 21:52:41 INFO DAGScheduler: waiting: Set(Stage 1)
15/04/25 21:52:41 INFO DAGScheduler: failed: Set()
15/04/25 21:52:41 INFO DAGScheduler: Missing parents for Stage 1: List()
15/04/25 21:52:41 INFO DAGScheduler: Submitting Stage 1 (ShuffledRDD[8] at reduceByKey at <console>:14), which is now runnable
15/04/25 21:52:41 INFO MemoryStore: ensureFreeSpace(2112) called with curMem=81571, maxMem=280248975
15/04/25 21:52:41 INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 2.1 KB, free 267.2 MB)
15/04/25 21:52:41 INFO MemoryStore: ensureFreeSpace(1544) called with curMem=83683, maxMem=280248975
15/04/25 21:52:41 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 1544.0 B, free 267.2 MB)
15/04/25 21:52:41 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on localhost:51130 (size: 1544.0 B, free: 267.3 MB)
15/04/25 21:52:41 INFO BlockManagerMaster: Updated info of block broadcast_3_piece0
15/04/25 21:52:41 INFO SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:838
15/04/25 21:52:41 INFO DAGScheduler: Submitting 1 missing tasks from Stage 1 (ShuffledRDD[8] at reduceByKey at <console>:14)
15/04/25 21:52:41 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
15/04/25 21:52:41 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, localhost, PROCESS_LOCAL, 1056 bytes)
15/04/25 21:52:41 INFO Executor: Running task 0.0 in stage 1.0 (TID 1)
15/04/25 21:52:41 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
15/04/25 21:52:41 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 4 ms
15/04/25 21:52:41 INFO Executor: Finished task 0.0 in stage 1.0 (TID 1). 1168 bytes result sent to driver
15/04/25 21:52:41 INFO DAGScheduler: Stage 1 (collect at <console>:17) finished in 0.126 s
15/04/25 21:52:41 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 131 ms on localhost (1/1)
15/04/25 21:52:41 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
15/04/25 21:52:41 INFO DAGScheduler: Job 0 finished: collect at <console>:17, took 1.103095 s
res0: Array[(String, Int)] = Array((this,1), (is,1), (Hello,1), (haoop,1), (home,1), (book;,1), ("",1), (World,1), (j2ee,1), (JAVA,1), (HADOOP,1), (my,1))

9. Stop the services:

[root@localhost spark-1.2.0-bin-hadoop1]# sbin/stop-master.sh
stopping org.apache.spark.deploy.master.Master
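The flatMap/map/reduceByKey chain from the word count in step 8 can be sanity-checked with plain Scala collections, no Spark needed. This is a sketch: groupBy plus a per-key sum stands in for Spark's reduceByKey, and the two sample lines are made up (not the file.txt from the run above):

```scala
// Local equivalent of: file.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
object WordCountCheck {
  def main(args: Array[String]): Unit = {
    val lines = Seq("Hello World", "Hello Spark")  // hypothetical sample input
    val counts = lines
      .flatMap(_.split(" "))            // split each line into words
      .map(word => (word, 1))           // pair each word with a count of 1
      .groupBy(_._1)                    // group pairs by word (stand-in for the shuffle)
      .map { case (w, ps) => (w, ps.map(_._2).sum) }  // sum counts per word
    println(counts.toSeq.sortBy(_._1).mkString(", "))
  }
}
```

Running this prints each distinct word with its count, matching the shape of the res0 array Spark returned above.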