Spark Learning 21: Spark's groupByKey
2017-08-24 13:39
1. The code
package groupByKey;

import java.util.Arrays;
import java.util.List;

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.sql.SparkSession;

public class GroupByKey {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("lcc_java_read_hbase_register_to_table")
                .master("local[*]")
                .getOrCreate();
        JavaSparkContext sc = new JavaSparkContext(spark.sparkContext());

        // Group integers by parity; the key is "偶数" (even) or "奇数" (odd).
        List<Integer> datas = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9);
        JavaRDD<Integer> dataRDD = sc.parallelize(datas);
        JavaPairRDD<Object, Iterable<Integer>> javapairrdd =
                dataRDD.groupBy(new Function<Integer, Object>() {
                    @Override
                    public Object call(Integer v1) throws Exception {
                        return (v1 % 2 == 0) ? "偶数" : "奇数"; // even : odd
                    }
                });
        javapairrdd.collect().forEach(System.out::println);

        // The same grouping written as one chained expression.
        sc.parallelize(datas)
                .groupBy(new Function<Integer, Object>() {
                    @Override
                    public Object call(Integer v1) throws Exception {
                        return (v1 % 2 == 0) ? "偶数" : "奇数"; // even : odd
                    }
                })
                .collect()
                .forEach(System.out::println);

        // keyBy builds (key, value) pairs, then groupByKey collects the values
        // for each key into an Iterable.
        List<String> datas2 = Arrays.asList("dog", "tiger", "lion", "cat", "spider", "eagle");
        JavaRDD<String> dataRDD2 = sc.parallelize(datas2);
        sc.parallelize(datas2)
                .keyBy(v1 -> v1.length())
                .groupByKey()
                .collect()
                .forEach(System.out::println);

        // The key function may return mixed types: a String tag "过滤"
        // ("filtered") for short words, the word length (Integer) otherwise.
        JavaPairRDD<Object, Iterable<String>> javapairrdd2 =
                dataRDD2.groupBy(new Function<String, Object>() {
                    @Override
                    public Object call(String v1) throws Exception {
                        System.out.println("=======v1======" + v1);
                        if (v1.length() < 5) {
                            return "过滤"; // "filtered"
                        } else {
                            return v1.length();
                        }
                    }
                });
        javapairrdd2.collect().forEach(System.out::println);

        sc.close();
    }
}
The output is as follows:
(偶数,[2, 4, 6, 8])
(奇数,[1, 3, 5, 7, 9])
(偶数,[2, 4, 6, 8])
(奇数,[1, 3, 5, 7, 9])
(4,[lion])
(6,[spider])
(3,[dog, cat])
(5,[tiger, eagle])
=======v1======cat
=======v1======spider
=======v1======eagle
=======v1======dog
=======v1======tiger
=======v1======lion
(6,[spider])
(过滤,[dog, lion, cat])
(5,[tiger, eagle])
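The shape of what groupBy/groupByKey returns — each key mapped to the collection of all values that produced it — can be illustrated without a Spark cluster. The sketch below is a plain-Java analogy (the class name `GroupByLocal` and the English keys "even"/"odd" are my own choices, not from the Spark example): `Collectors.groupingBy` builds the same key-to-list structure locally that the RDD version builds across partitions. Unlike an RDD, of course, everything here lives in one JVM's memory.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class GroupByLocal {
    // Group integers by parity, mirroring the key function in the RDD example.
    static Map<String, List<Integer>> group(List<Integer> datas) {
        return datas.stream().collect(Collectors.groupingBy(
                v -> (v % 2 == 0) ? "even" : "odd",
                TreeMap::new,          // sorted keys, so the printed order is stable
                Collectors.toList()));
    }

    public static void main(String[] args) {
        System.out.println(group(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9)));
        // prints {even=[2, 4, 6, 8], odd=[1, 3, 5, 7, 9]}
    }
}
```

The key difference in practice: the local version materializes one map, while Spark's groupByKey shuffles every value across the network to the partition owning its key, which is why the grouped RDD output above can appear in any partition order.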