
Getting started with Spark: a simple cogroup example (Java)

2017-03-02 13:45
Maven dependency:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.6.0</version>
</dependency>

import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.VoidFunction;

import scala.Tuple2;

public class CoGroup {
    public static void main(String[] args) {
        /**
         * Create the SparkConf object, which holds the application's runtime configuration.
         * setMaster sets the URL of the cluster master to connect to;
         * "local" runs Spark locally in a single JVM.
         */
        SparkConf conf = new SparkConf().setAppName("My first spark").setMaster("local");
        /**
         * Create the JavaSparkContext.
         * SparkContext is the single entry point to all Spark functionality:
         * it initializes the core components Spark needs to run and registers
         * the application with the master.
         */
        JavaSparkContext sc = new JavaSparkContext(conf);
        /**
         * Initialize the student list: (id, name).
         */
        List<Tuple2<Integer, String>> nameList = Arrays.asList(
                new Tuple2<Integer, String>(1, "xiaoming"),
                new Tuple2<Integer, String>(2, "feifei"),
                new Tuple2<Integer, String>(3, "katong"));
        /**
         * Initialize the score list: (id, score). Ids may appear more than once.
         */
        List<Tuple2<Integer, Integer>> scoreList = Arrays.asList(
                new Tuple2<Integer, Integer>(1, 90),
                new Tuple2<Integer, Integer>(2, 80),
                new Tuple2<Integer, Integer>(1, 70),
                new Tuple2<Integer, Integer>(3, 60),
                new Tuple2<Integer, Integer>(2, 80),
                new Tuple2<Integer, Integer>(1, 70));
        // Turn the lists into pair RDDs
        JavaPairRDD<Integer, String> names = sc.parallelizePairs(nameList);
        JavaPairRDD<Integer, Integer> scores = sc.parallelizePairs(scoreList);
        // cogroup: for each key, collect the values from both RDDs into two groups
        JavaPairRDD<Integer, Tuple2<Iterable<String>, Iterable<Integer>>> cogroup = names.cogroup(scores);
        // Print each key together with its two groups of values
        cogroup.foreach(new VoidFunction<Tuple2<Integer, Tuple2<Iterable<String>, Iterable<Integer>>>>() {
            public void call(Tuple2<Integer, Tuple2<Iterable<String>, Iterable<Integer>>> record) throws Exception {
                System.out.println(record._1 + "  " + record._2._1 + "  " + record._2._2);
            }
        });
        // Shut down the context
        sc.close();
    }
}
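With the sample data above, cogroup groups everything by student id: key 1 pairs xiaoming with the scores 90, 70, 70; key 2 pairs feifei with 80, 80; and key 3 pairs katong with 60 (the printing order of the keys may vary). As a follow-up, the sketch below is my own addition, not part of the original example: it shows one way to turn the cogroup result into an average score per student with mapToPair. The variable names are made up, and it assumes the "cogroup" RDD from the listing above is in scope.

// requires: import org.apache.spark.api.java.function.PairFunction;
JavaPairRDD<String, Double> averages = cogroup.mapToPair(
        new PairFunction<Tuple2<Integer, Tuple2<Iterable<String>, Iterable<Integer>>>, String, Double>() {
            public Tuple2<String, Double> call(
                    Tuple2<Integer, Tuple2<Iterable<String>, Iterable<Integer>>> record) throws Exception {
                // In this data set each id has at most one name
                java.util.Iterator<String> nameIt = record._2._1.iterator();
                String name = nameIt.hasNext() ? nameIt.next() : "unknown";
                long sum = 0;
                long count = 0;
                for (Integer score : record._2._2) {
                    sum += score;
                    count++;
                }
                double avg = count == 0 ? 0.0 : (double) sum / count;
                return new Tuple2<String, Double>(name, avg);
            }
        });
// Collect to the driver and print, e.g. xiaoming -> 76.67, feifei -> 80.0, katong -> 60.0
for (Tuple2<String, Double> t : averages.collect()) {
    System.out.println(t._1 + " -> " + t._2);
}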