
Developing Spark Cluster Computing with Eclipse and Maven

2016-01-23 21:51
1. Following the earlier posts, set up the Spark on YARN cluster, i.e. with both Hadoop and Spark installed and working. Start Hadoop and YARN:

/usr/local/hadoop/sbin/start-all.sh

This starts the HDFS and YARN daemons; running jps should list them:

6661 NameNode
7163 ResourceManager
7300 NodeManager
7012 SecondaryNameNode
7512 Jps
6795 DataNode


2. Open Eclipse and create a Maven project.

3. Edit pom.xml to add the Spark dependency. The provided scope keeps Spark classes out of the packaged jar, since spark-submit supplies them at runtime:

<dependency> <!-- Spark dependency -->
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>1.6.0</version>
  <scope>provided</scope>
</dependency>

4. Right-click the project and choose Run As > Maven install.

Maven will now download the dependency jars into the local repository.

5. Call Spark from the main class App.java:

package com.fei.simple_project;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

/**
 * Simple Spark job: counts the lines in README.md that contain "a" and "b".
 */
public class App
{
    public static void main( String[] args )
    {
        // Relative path: resolved against the HDFS home directory
        // when the default filesystem is HDFS (see step 8 below).
        String logFile = "README.md";
        SparkConf conf = new SparkConf().setAppName("Simple Application");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // Load the file as an RDD of lines and cache it, since it is scanned twice.
        JavaRDD<String> logData = sc.textFile(logFile).cache();

        // Count the lines containing the letter "a".
        long numAs = logData.filter(new Function<String, Boolean>() {
            public Boolean call(String s) { return s.contains("a"); }
        }).count();

        // Count the lines containing the letter "b".
        long numBs = logData.filter(new Function<String, Boolean>() {
            public Boolean call(String s) { return s.contains("b"); }
        }).count();

        System.out.println("Hello World!");
        System.out.println("Lines with a: " + numAs + ", lines with b: " + numBs);
    }
}
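
With Java 8, the anonymous Function classes can be replaced by lambdas, since Function has a single abstract method. A sketch (assumes the Maven compiler source/target is raised to 1.8):

// Equivalent filters written as lambdas
long numAs = logData.filter(s -> s.contains("a")).count();
long numBs = logData.filter(s -> s.contains("b")).count();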


6. Run As > Maven install again to rebuild with the new code.

This generates the jar under the target directory, named after the artifactId and version.

7. Submit the job with spark-submit (wrapped here in a spark.sh script). Note that --master local[4] runs the job locally with four threads; to run on the YARN cluster you would pass --master yarn instead:

/usr/local/spark/bin/spark-submit --class "com.fei.simple_project.App" --master local[4] /home/tizen/share/working-dir/spark/simple-project/target/simple-project-0.0.1-SNAPSHOT.jar
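
For quick tests from inside Eclipse, the master can also be set in code rather than on the command line (a sketch; hard-coding the master is discouraged for real cluster runs, because it overrides whatever spark-submit passes):

// Local-debugging variant of the SparkConf from App.java
SparkConf conf = new SparkConf()
        .setAppName("Simple Application")
        .setMaster("local[4]");  // run locally with 4 worker threads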


8. Check the output:

Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://namenode:9000/user/tizen/README.md
at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:285)


This means README.md has not yet been uploaded to the cluster: the relative path was resolved against the HDFS home directory, hdfs://namenode:9000/user/tizen/, where the file does not exist.
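
To be explicit about where Spark reads from, the path can carry a scheme. A sketch (the HDFS URI matches the error above; the local path is hypothetical):

// Read explicitly from HDFS:
JavaRDD<String> fromHdfs = sc.textFile("hdfs://namenode:9000/user/tizen/README.md");
// Or read from the local file system instead:
JavaRDD<String> fromLocal = sc.textFile("file:///home/tizen/README.md");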

9. Upload the file to your HDFS home directory:

hdfs dfs -put README.md README.md
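
The same upload can also be done from Java through the Hadoop FileSystem API (a minimal sketch with a hypothetical Upload class, assuming a matching hadoop-client dependency and that the cluster's core-site.xml is on the classpath, so the default filesystem is HDFS):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class Upload {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // reads core-site.xml
        FileSystem fs = FileSystem.get(conf);       // hdfs://namenode:9000
        // Copy the local README.md into the HDFS home directory (/user/<name>/).
        fs.copyFromLocalFile(new Path("README.md"), new Path("README.md"));
        fs.close();
    }
}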

10. Run spark.sh again:

16/01/23 21:49:23 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
16/01/23 21:49:23 INFO DAGScheduler: ResultStage 1 (count at App.java:25) finished in 0.047 s
16/01/23 21:49:23 INFO DAGScheduler: Job 1 finished: count at App.java:25, took 0.155340 s
Hello World!
Lines with a: 58, lines with b: 26
16/01/23 21:49:23 INFO SparkContext: Invoking stop() from shutdown hook
16/01/23 21:49:23 INFO SparkUI: Stopped Spark web UI at http://192.168.0.101:4040
16/01/23 21:49:23 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
The counts come out as expected: 58 lines contain "a" and 26 contain "b".

While the job is running, the Spark web UI can be checked at:

http://192.168.0.101:4040