Developing Spark cluster computing with Eclipse and Maven
2016-01-23 21:51
1. Following the earlier posts in this series, set up the Spark on YARN cluster, i.e. get both Hadoop and Spark running:
/usr/local/hadoop/sbin/start-all.sh
This starts the Hadoop and YARN daemons.
Running jps confirms the daemons are up:
6661 NameNode
7163 ResourceManager
7300 NodeManager
7012 SecondaryNameNode
7512 Jps
6795 DataNode
2. Open Eclipse and create a Maven project.
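If you prefer the command line to Eclipse's wizard, the same skeleton (including the App.java stub shown in step 5) can be generated with the quickstart archetype. This is a sketch; the groupId here is inferred from the package name used later in this post, not copied from the original project:

mvn archetype:generate -DgroupId=com.fei.simple_project -DartifactId=simple-project \
    -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false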
3. Edit pom.xml and add the Spark dependency:
<dependency> <!-- Spark dependency -->
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>1.6.0</version>
  <scope>provided</scope>
</dependency>
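For context, a minimal pom.xml around this dependency might look like the following. This is a sketch: the groupId, artifactId, and version are inferred from the package and jar names used later in this post, not copied from the original project. The provided scope keeps Spark's jars out of the application jar, since spark-submit supplies them at runtime.

<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.fei.simple_project</groupId>
  <artifactId>simple-project</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <dependencies>
    <!-- Spark core, compiled against Scala 2.10; provided by the cluster at runtime -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>1.6.0</version>
      <scope>provided</scope>
    </dependency>
  </dependencies>
</project>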
4. Click Run As → Maven install.
Maven now downloads the dependency jars into the local repository.
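The same build also works from a plain terminal in the project directory:

cd simple-project
mvn install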
5. Call Spark from the App.java main class:
package com.fei.simple_project;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

/**
 * Counts the lines in README.md that contain "a" and that contain "b".
 */
public class App {
    public static void main(String[] args) {
        String logFile = "README.md";
        SparkConf conf = new SparkConf().setAppName("Simple Application");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // Load the file as an RDD of lines and cache it, since we scan it twice
        JavaRDD<String> logData = sc.textFile(logFile).cache();

        long numAs = logData.filter(new Function<String, Boolean>() {
            public Boolean call(String s) { return s.contains("a"); }
        }).count();

        long numBs = logData.filter(new Function<String, Boolean>() {
            public Boolean call(String s) { return s.contains("b"); }
        }).count();

        System.out.println("Hello World!");
        System.out.println("Lines with a: " + numAs + ", lines with b: " + numBs);
    }
}
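As a side note, Spark's Java Function interface has a single abstract method, so if the project is built with Java 8 the two anonymous classes above can be written as lambdas. A drop-in sketch for the two filter calls:

// Equivalent to the anonymous Function classes above (Java 8+)
long numAs = logData.filter(s -> s.contains("a")).count();
long numBs = logData.filter(s -> s.contains("b")).count();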
6. Click Run As → Maven install again.
This generates the application jar under the target directory.
7. Run spark.sh (presumably a small wrapper script around the spark-submit command shown next):
/usr/local/spark/bin/spark-submit \
  --class "com.fei.simple_project.App" \
  --master local[4] \
  /home/tizen/share/working-dir/spark/simple-project/target/simple-project-0.0.1-SNAPSHOT.jar
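Note that --master local[4] runs the job in a single local JVM with four worker threads, which is handy for a first test. To actually run on the YARN cluster set up in step 1, a sketch of the submission (assuming HADOOP_CONF_DIR points at the cluster's Hadoop configuration) would be:

/usr/local/spark/bin/spark-submit \
  --class "com.fei.simple_project.App" \
  --master yarn \
  --deploy-mode cluster \
  /home/tizen/share/working-dir/spark/simple-project/target/simple-project-0.0.1-SNAPSHOT.jar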
8. Check the execution result:
Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://namenode:9000/user/tizen/README.md
    at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:285)
This means we have not yet uploaded README.md to the cluster: sc.textFile("README.md") resolves the relative path against the HDFS home directory, /user/tizen.
9. Upload it:
hdfs dfs -put README.md README.md
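To confirm the upload before resubmitting (optional):

hdfs dfs -ls README.md
hdfs dfs -cat README.md | head -n 3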
10. Run spark.sh again:
16/01/23 21:49:23 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
16/01/23 21:49:23 INFO DAGScheduler: ResultStage 1 (count at App.java:25) finished in 0.047 s
16/01/23 21:49:23 INFO DAGScheduler: Job 1 finished: count at App.java:25, took 0.155340 s
Hello World!
Lines with a: 58, lines with b: 26
16/01/23 21:49:23 INFO SparkContext: Invoking stop() from shutdown hook
16/01/23 21:49:23 INFO SparkUI: Stopped Spark web UI at http://192.168.0.101:4040
16/01/23 21:49:23 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
As the log shows, the job finished and the counts came out: 58 lines contain "a" and 26 contain "b".
Check the Spark web UI at:
http://192.168.0.101:4040
Note that the driver's web UI on port 4040 is only available while the application is running.