【Spark Java API】Transformation(8)—fullOuterJoin、leftOuterJoin、rightOuterJoin
2016-08-20 10:57
543 查看
fullOuterJoin
官方文档描述:
Perform a full outer join of `this` and `other`. For each element (k, v) in `this`, the resulting RDD will either contain all pairs (k, (Some(v), Some(w))) for w in `other`, or the pair (k, (Some(v), None)) if no elements in `other` have key k. Similarly, for each element (k, w) in `other`, the resulting RDD will either contain all pairs (k, (Some(v), Some(w))) for v in `this`, or the pair (k, (None, Some(w))) if no elements in `this` have key k. Uses the given Partitioner to partition the output RDD.
函数原型:
def fullOuterJoin[W](other: JavaPairRDD[K, W]): JavaPairRDD[K, (Optional[V], Optional[W])] def fullOuterJoin[W](other: JavaPairRDD[K, W], numPartitions: Int) : JavaPairRDD[K, (Optional[V], Optional[W])] def fullOuterJoin[W](other: JavaPairRDD[K, W], partitioner: Partitioner) : JavaPairRDD[K, (Optional[V], Optional[W])]
源码分析:
def fullOuterJoin[W](other: RDD[(K, W)], partitioner: Partitioner) : RDD[(K, (Option[V], Option[W]))] = self.withScope { this.cogroup(other, partitioner).flatMapValues { case (vs, Seq()) => vs.iterator.map(v => (Some(v), None)) case (Seq(), ws) => ws.iterator.map(w => (None, Some(w))) case (vs, ws) => for (v <- vs.iterator; w <- ws.iterator) yield (Some(v), Some(w)) } }
从源码中可以看出,fullOuterJoin() 与 join() 类似,首先进行 cogroup(), 得到
<K, (Iterable[V1], Iterable[V2])>类型的 MappedValuesRDD,然后对 Iterable[V1] 和 Iterable[V2] 做笛卡尔集,注意在V1,V2中添加了None,并将集合 flat() 化。
实例:
List<Integer> data = Arrays.asList(1, 2, 4, 3, 5, 6, 7); final Random random = new Random(); JavaRDD<Integer> javaRDD = javaSparkContext.parallelize(data); JavaPairRDD<Integer,Integer> javaPairRDD = javaRDD.mapToPair(new PairFunction<Integer, Integer, Integer>() { @Override public Tuple2<Integer, Integer> call(Integer integer) throws Exception { return new Tuple2<Integer, Integer>(integer,random.nextInt(10)); } }); //全关联 JavaPairRDD<Integer,Tuple2<Optional<Integer>,Optional<Integer>>> fullJoinRDD = javaPairRDD.fullOuterJoin(javaPairRDD); System.out.println(fullJoinRDD); JavaPairRDD<Integer,Tuple2<Optional<Integer>,Optional<Integer>>> fullJoinRDD1 = javaPairRDD.fullOuterJoin(javaPairRDD,2); System.out.println(fullJoinRDD1); JavaPairRDD<Integer,Tuple2<Optional<Integer>,Optional<Integer>>> fullJoinRDD2 = javaPairRDD.fullOuterJoin(javaPairRDD, new Partitioner() { @Override public int numPartitions() { return 2; } @Override public int getPartition(Object key) { return (key.toString()).hashCode()%numPartitions(); } }); System.out.println(fullJoinRDD2);
leftOuterJoin
官方文档描述:
Perform a left outer join of `this` and `other`. For each element (k, v) in `this`, the resulting RDD will either contain all pairs (k, (v, Some(w))) for w in `other`, or the pair (k, (v, None)) if no elements in `other` have key k. Uses the given Partitioner to partition the output RDD.
函数原型:
def leftOuterJoin[W](other: JavaPairRDD[K, W]): JavaPairRDD[K, (V, Optional[W])] def leftOuterJoin[W](other: JavaPairRDD[K, W], numPartitions: Int) : JavaPairRDD[K, (V, Optional[W])] def leftOuterJoin[W](other: JavaPairRDD[K, W], partitioner: Partitioner): JavaPairRDD[K, (V, Optional[W])]
源码分析:
def leftOuterJoin[W]( other: RDD[(K, W)], partitioner: Partitioner): RDD[(K, (V, Option[W]))] = self.withScope { this.cogroup(other, partitioner).flatMapValues { pair => if (pair._2.isEmpty) { pair._1.iterator.map(v => (v, None)) } else { for (v <- pair._1.iterator; w <- pair._2.iterator) yield (v, Some(w)) } } }
从源码中可以看出,leftOuterJoin() 与 fullOuterJoin() 类似,首先进行 cogroup(), 得到
<K, (Iterable[V1], Iterable[V2])>类型的 MappedValuesRDD,然后对 Iterable[V1] 和 Iterable[V2] 做笛卡尔集,注意在V1中添加了None,并将集合 flat() 化。
实例:
List<Integer> data = Arrays.asList(1, 2, 4, 3, 5, 6, 7); final Random random = new Random(); JavaRDD<Integer> javaRDD = javaSparkContext.parallelize(data); JavaPairRDD<Integer,Integer> javaPairRDD = javaRDD.mapToPair(new PairFunction<Integer, Integer, Integer>() { @Override public Tuple2<Integer, Integer> call(Integer integer) throws Exception { return new Tuple2<Integer, Integer>(integer,random.nextInt(10)); } }); //左关联 JavaPairRDD<Integer,Tuple2<Integer,Optional<Integer>>> leftJoinRDD = javaPairRDD.leftOuterJoin(javaPairRDD); System.out.println(leftJoinRDD); JavaPairRDD<Integer,Tuple2<Integer,Optional<Integer>>> leftJoinRDD1 = javaPairRDD.leftOuterJoin(javaPairRDD,2); System.out.println(leftJoinRDD1); JavaPairRDD<Integer,Tuple2<Integer,Optional<Integer>>> leftJoinRDD2 = javaPairRDD.leftOuterJoin(javaPairRDD, new Partitioner() { @Override public int numPartitions() { return 2; } @Override public int getPartition(Object key) { return (key.toString()).hashCode()%numPartitions(); } }); System.out.println(leftJoinRDD2);
rightOuterJoin
官方文档描述:
Perform a right outer join of `this` and `other`. For each element (k, w) in `other`, the resulting RDD will either contain all pairs (k, (Some(v), w)) for v in `this`, or the pair (k, (None, w)) if no elements in `this` have key k. Uses the given Partitioner to partition the output RDD.
函数原型:
def rightOuterJoin[W](other: JavaPairRDD[K, W]): JavaPairRDD[K, (Optional[V], W)] def rightOuterJoin[W](other: JavaPairRDD[K, W], numPartitions: Int) : JavaPairRDD[K, (Optional[V], W)] def rightOuterJoin[W](other: JavaPairRDD[K, W], partitioner: Partitioner): JavaPairRDD[K, (Optional[V], W)]
源码分析:
def rightOuterJoin[W](other: RDD[(K, W)], partitioner: Partitioner) : RDD[(K, (Option[V], W))] = self.withScope { this.cogroup(other, partitioner).flatMapValues { pair => if (pair._1.isEmpty) { pair._2.iterator.map(w => (None, w)) } else { for (v <- pair._1.iterator; w <- pair._2.iterator) yield (Some(v), w) } } }
从源码中可以看出,rightOuterJoin() 与 fullOuterJoin() 类似,首先进行 cogroup(), 得到
<K, (Iterable[V1], Iterable[V2])>类型的 MappedValuesRDD,然后对 Iterable[V1] 和 Iterable[V2] 做笛卡尔集,注意在V2中添加了None,并将集合 flat() 化。
实例:
List<Integer> data = Arrays.asList(1, 2, 4, 3, 5, 6, 7); final Random random = new Random(); JavaRDD<Integer> javaRDD = javaSparkContext.parallelize(data); JavaPairRDD<Integer,Integer> javaPairRDD = javaRDD.mapToPair(new PairFunction<Integer, Integer, Integer>() { @Override public Tuple2<Integer, Integer> call(Integer integer) throws Exception { return new Tuple2<Integer, Integer>(integer,random.nextInt(10)); } }); //右关联 JavaPairRDD<Integer,Tuple2<Optional<Integer>,Integer>> rightJoinRDD = javaPairRDD.rightOuterJoin(javaPairRDD); System.out.println(rightJoinRDD); JavaPairRDD<Integer,Tuple2<Optional<Integer>,Integer>> rightJoinRDD1 = javaPairRDD.rightOuterJoin(javaPairRDD,2); System.out.println(rightJoinRDD1); JavaPairRDD<Integer,Tuple2<Optional<Integer>,Integer>> rightJoinRDD2 = javaPairRDD.rightOuterJoin(javaPairRDD, new Partitioner() { @Override public int numPartitions() { return 2; } @Override public int getPartition(Object key) { return (key.toString()).hashCode()%numPartitions(); } }); System.out.println(rightJoinRDD2);
相关文章推荐
- Spark算子:RDD键值转换操作(5)–leftOuterJoin、rightOuterJoin、subtractByKey
- Spark算子:RDD键值转换操作(5)–leftOuterJoin、rightOuterJoin、subtractByKey
- 3.3 Spark RDD键值转换操作5-leftOuterJoin、rightOuterJoin、subtractByKey
- Spark算子--leftOuterJoin和rightOuterJoin
- Spark算子:RDD键值转换操作(5)–leftOuterJoin、rightOuterJoin、subtractByKey
- Sql语句中的inner join ,left outer join ,right outer join ,full join 的理解
- Spark算子:RDD键值转换操作(5)–leftOuterJoin、rightOuterJoin、subtractByKey
- 终于找到一个有助理解left/right/full outer join的例子
- Linq表连接大全(INNER JOIN、LEFT OUTER JOIN、RIGHT OUTER JOIN、FULL OUTER JOIN、CROSS JOIN)
- Linq表连接大全(INNER JOIN、LEFT OUTER JOIN、RIGHT OUTER JOIN、FULL OUTER JOIN、CROSS JOIN)
- OCP-1Z0-051 第132题 LEFT OUTER JOIN,RIGHT OUTER JOIN,FULL OUTER JOIN的用法
- Spark编程之基本的RDD算子之join,rightOuterJoin, leftOuterJoin
- left (outer) join , right (outer) join, full (outer) join, (inner) join, cross join 区别
- Linq表连接大全(INNER JOIN、LEFT OUTER JOIN、RIGHT OUTER JOIN、FULL OUTER JOIN、CROSS JOIN)
- 终于找到一个有助理解left/right/full outer join的例子
- Oracle外连接(left/right/full outer join)语法详解
- Spark pyspark rdd连接函数之join、leftOuterJoin、rightOuterJoin和fullOuterJoin介绍
- Linq语句实现(INNER JOIN、LEFT OUTER JOIN、RIGHT OUTER JOIN、FULL OUTER JOIN、CROSS JOIN)
- Linq表连接大全(INNER JOIN、LEFT OUTER JOIN、RIGHT OUTER JOIN、FULL OUTER JOIN、CROSS JOIN)
- [摘]终于找到一个有助理解left/right/full outer join的例子