Spark join operation
2016-05-15 17:13
// Imports needed by the method below (place the method inside any class)
import java.util.Arrays;
import java.util.List;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.VoidFunction;
import scala.Tuple2;

// Use the join operator to relate two RDDs.
// join matches elements by key and returns a JavaPairRDD.
// The first type parameter of that JavaPairRDD is the key type of the two input JavaPairRDDs, because the join is performed on the key.
// The second type parameter is Tuple2<V1, V2>, whose two type parameters are the value types of the original RDDs.
// Each element of the RDD returned by join is one pair that was matched on its key.
public static void myJoin() {
    SparkConf conf = new SparkConf()
            .setAppName("join")
            .setMaster("local");
    // Create the JavaSparkContext
    JavaSparkContext sc = new JavaSparkContext(conf);

    // (student id, student name)
    List<Tuple2<Integer, String>> studentList = Arrays.asList(
            new Tuple2<Integer, String>(1, "leo"),
            new Tuple2<Integer, String>(2, "jack"),
            new Tuple2<Integer, String>(3, "tom"));
    // (student id, score)
    List<Tuple2<Integer, Integer>> scoreList = Arrays.asList(
            new Tuple2<Integer, Integer>(2, 90),
            new Tuple2<Integer, Integer>(1, 100),
            new Tuple2<Integer, Integer>(3, 60));

    JavaPairRDD<Integer, String> students = sc.parallelizePairs(studentList);
    JavaPairRDD<Integer, Integer> scores = sc.parallelizePairs(scoreList);

    // Join on the student id: (id, (name, score))
    JavaPairRDD<Integer, Tuple2<String, Integer>> studentScores = students.join(scores);

    studentScores.foreach(
            new VoidFunction<Tuple2<Integer, Tuple2<String, Integer>>>() {
                private static final long serialVersionUID = 1L;

                @Override
                public void call(Tuple2<Integer, Tuple2<String, Integer>> t)
                        throws Exception {
                    System.out.println("student id: " + t._1);
                    System.out.println("student name: " + t._2._1);
                    System.out.println("student score: " + t._2._2);
                    System.out.println("===============================");
                }
            });

    sc.close();
}
Output:
student id: 1
student name: leo
student score: 100
===============================
student id: 3
student name: tom
student score: 60
===============================
student id: 2
student name: jack
student score: 90
===============================
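What happens if one of the records is removed?
Only the pairs whose key was matched by the join are returned (here, id 1 no longer appears):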
student id: 3
student name: tom
student score: 60
===============================
student id: 2
student name: jack
student score: 90
===============================
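The behaviour above is that of an inner join: a key survives only if it occurs on both sides. As a rough illustration of that rule, here is a minimal plain-Java sketch that runs without Spark. It is not how Spark implements join; the class InnerJoinSketch and the helper innerJoin are hypothetical names introduced for this example, and it assumes the removed record was the score for student 1.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class InnerJoinSketch {

    // Hypothetical helper (not part of Spark): joins two lists of (key, value)
    // pairs and keeps only the keys that occur on both sides.
    public static <K, V, W> List<Map.Entry<K, Map.Entry<V, W>>> innerJoin(
            List<Map.Entry<K, V>> left, List<Map.Entry<K, W>> right) {
        // Index the right side by key; one key may carry several values.
        Map<K, List<W>> rightByKey = new HashMap<K, List<W>>();
        for (Map.Entry<K, W> r : right) {
            List<W> values = rightByKey.get(r.getKey());
            if (values == null) {
                values = new ArrayList<W>();
                rightByKey.put(r.getKey(), values);
            }
            values.add(r.getValue());
        }

        List<Map.Entry<K, Map.Entry<V, W>>> joined =
                new ArrayList<Map.Entry<K, Map.Entry<V, W>>>();
        for (Map.Entry<K, V> l : left) {
            // Left keys with no match on the right are dropped.
            List<W> matches = rightByKey.get(l.getKey());
            if (matches == null) {
                continue;
            }
            for (W w : matches) {
                Map.Entry<V, W> pair = new SimpleEntry<V, W>(l.getValue(), w);
                joined.add(new SimpleEntry<K, Map.Entry<V, W>>(l.getKey(), pair));
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<Map.Entry<Integer, String>> students = Arrays.asList(
                new SimpleEntry<Integer, String>(1, "leo"),
                new SimpleEntry<Integer, String>(2, "jack"),
                new SimpleEntry<Integer, String>(3, "tom"));
        // Assumption: the score for student 1 is the record that was removed.
        List<Map.Entry<Integer, Integer>> scores = Arrays.asList(
                new SimpleEntry<Integer, Integer>(2, 90),
                new SimpleEntry<Integer, Integer>(3, 60));

        for (Map.Entry<Integer, Map.Entry<String, Integer>> e : innerJoin(students, scores)) {
            System.out.println("student id: " + e.getKey()
                    + ", name: " + e.getValue().getKey()
                    + ", score: " + e.getValue().getValue());
        }
    }
}
```

The sketch emits matches in the order of the left-hand list, whereas the Spark output above appears in a different order because the elements are processed per partition.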