Spark RDD operators (10): PairRDD Action operations countByKey and collectAsMap
2017-04-26 22:11
countByKey
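def countByKey(): Map[K, Long]
Counts the number of elements for each key and returns the result to the driver as a local Map. For example, for the RDD {(1, 2), (2, 4), (2, 5), (3, 4), (3, 5), (3, 6)}, rdd.countByKey returns {(1, 1), (2, 2), (3, 3)}: key 1 appears once, key 2 twice and key 3 three times. Because the whole map is loaded into the driver's memory, it should only be used when the number of distinct keys is expected to be small.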
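To make the semantics concrete without a cluster, here is a minimal Spark-free sketch in plain Java (8 or later) that computes the same per-key counts for the example data on a local list. The class name CountByKeySketch and the helpers pair and sample are hypothetical, and java.util.Map.Entry stands in for scala.Tuple2. It only illustrates what the result looks like, not how Spark computes it across partitions.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class CountByKeySketch {

    // Hypothetical helper: builds a key-value pair (stands in for scala.Tuple2)
    static <K, V> Map.Entry<K, V> pair(K key, V value) {
        return new SimpleEntry<>(key, value);
    }

    // Same data as the example RDD: {(1,2),(2,4),(2,5),(3,4),(3,5),(3,6)}
    static List<Map.Entry<Integer, Integer>> sample() {
        return Arrays.asList(pair(1, 2), pair(2, 4), pair(2, 5),
                pair(3, 4), pair(3, 5), pair(3, 6));
    }

    // Local analogue of countByKey: how many pairs carry each key (values are ignored)
    static <K, V> Map<K, Long> countByKey(List<Map.Entry<K, V>> pairs) {
        return pairs.stream()
                .collect(Collectors.groupingBy(Map.Entry::getKey, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(countByKey(sample()));
    }
}
```

Note that the values 2, 4, 5, 6 play no role in the result; only how often each key occurs matters.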
Scala example
scala> val rdd = sc.parallelize(Array((1, 2), (2, 4), (2, 5), (3, 4), (3, 5), (3, 6)))
scala> val countbyKeyRDD = rdd.countByKey()
countbyKeyRDD: scala.collection.Map[Int,Long] = Map(1 -> 1, 2 -> 2, 3 -> 3)
Java example
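import java.util.Arrays;
import java.util.Map;

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import scala.Tuple2;

// sc is an existing JavaSparkContext
JavaRDD<Tuple2<Integer, Integer>> tupleRDD = sc.parallelize(Arrays.asList(
        new Tuple2<>(1, 2), new Tuple2<>(2, 4), new Tuple2<>(2, 5),
        new Tuple2<>(3, 4), new Tuple2<>(3, 5), new Tuple2<>(3, 6)));
JavaPairRDD<Integer, Integer> mapRDD = JavaPairRDD.fromJavaRDD(tupleRDD);

// countByKey returns Map<Integer, Long> in Spark 2.x (Spark 1.x declares it as Map<Integer, Object>)
Map<Integer, Long> countByKeyMap = mapRDD.countByKey();
for (Integer key : countByKeyMap.keySet()) {
    System.out.println("(" + key + ", " + countByKeyMap.get(key) + ")");
}
/* Output (the iteration order of the returned map is not guaranteed):
(1, 1)
(3, 3)
(2, 2)
*/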
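countByKey returns an ordinary java.util.Map, so the printing order in the Java example, (1, 1) (3, 3) (2, 2), is just the order that map happened to iterate in. If sorted output is wanted, one option is to copy the map into a java.util.TreeMap before printing. A minimal sketch, assuming the Spark 2.x return type Map<Integer, Long>; the hand-built HashMap stands in for the result of mapRDD.countByKey(), and SortedCountOutput and formatSorted are hypothetical names:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class SortedCountOutput {

    // Copies the counts into a TreeMap so iteration follows ascending key order
    // instead of whatever order the hash-based map happens to use
    static <K extends Comparable<K>> String formatSorted(Map<K, Long> counts) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<K, Long> e : new TreeMap<>(counts).entrySet()) {
            sb.append("(").append(e.getKey()).append(", ").append(e.getValue()).append(") ");
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Stand-in for the Map returned by mapRDD.countByKey()
        Map<Integer, Long> counts = new HashMap<>();
        counts.put(3, 3L);
        counts.put(1, 1L);
        counts.put(2, 2L);
        System.out.println(formatSorted(counts));
    }
}
```

Sorting happens on the driver after the action has finished, which is cheap here precisely because countByKey is only meant for small key sets.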
collectAsMap
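Collects a pair (key-value) RDD to the driver as a Map. The result is an ordinary map rather than a multimap, so if a key occurs more than once only one of its values is kept; in the example below, keys 2 and 3 keep only 5 and 6. As with countByKey, the whole result is loaded into the driver's memory. Using the same RDD as above:
Scala example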
scala> val rdd = sc.parallelize(Array((1, 2), (2, 4), (2, 5), (3, 4), (3, 5), (3, 6)))
scala> rdd.collectAsMap()
res1: scala.collection.Map[Int,Int] = Map(2 -> 5, 1 -> 2, 3 -> 6)
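The one-value-per-key behaviour of collectAsMap can be mimicked locally by putting each pair into a HashMap in order, so a later value overwrites an earlier one. This plain-Java sketch (hypothetical class and helper names, Map.Entry in place of scala.Tuple2) assumes last-one-wins over the collected order, which matches the Scala output above. Spark's documentation only promises that a single value per key is kept, so it is safer not to depend on which one survives.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CollectAsMapSketch {

    // Hypothetical helper: builds a key-value pair (stands in for scala.Tuple2)
    static <K, V> Map.Entry<K, V> pair(K key, V value) {
        return new SimpleEntry<>(key, value);
    }

    // Same data as the example RDD
    static List<Map.Entry<Integer, Integer>> sample() {
        return Arrays.asList(pair(1, 2), pair(2, 4), pair(2, 5),
                pair(3, 4), pair(3, 5), pair(3, 6));
    }

    // Local analogue of collectAsMap: put() each pair in order,
    // so a later value for the same key replaces the earlier one
    static <K, V> Map<K, V> collectAsMap(List<Map.Entry<K, V>> pairs) {
        Map<K, V> result = new HashMap<>();
        for (Map.Entry<K, V> p : pairs) {
            result.put(p.getKey(), p.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(collectAsMap(sample()));
    }
}
```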
Java example
JavaRDD<Tuple2<Integer, Integer>> tupleRDD = sc.parallelize(Arrays.asList(
        new Tuple2<>(1, 2), new Tuple2<>(2, 4), new Tuple2<>(2, 5),
        new Tuple2<>(3, 4), new Tuple2<>(3, 5), new Tuple2<>(3, 6)));
JavaPairRDD<Integer, Integer> mapRDD = JavaPairRDD.fromJavaRDD(tupleRDD);
Map<Integer, Integer> collectMap = mapRDD.collectAsMap();
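If every value per key is needed rather than just one, the pairs have to be grouped first, for example with mapRDD.groupByKey().collectAsMap(), which in the Java API gives a Map<Integer, Iterable<Integer>> (and, like collectAsMap itself, pulls everything into the driver). The plain-Java sketch below only mimics the shape of that result with java.util.stream.Collectors; the class and helper names are hypothetical and no Spark dependency is involved.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupThenCollectSketch {

    // Hypothetical helper: builds a key-value pair (stands in for scala.Tuple2)
    static <K, V> Map.Entry<K, V> pair(K key, V value) {
        return new SimpleEntry<>(key, value);
    }

    // Same data as the example RDD
    static List<Map.Entry<Integer, Integer>> sample() {
        return Arrays.asList(pair(1, 2), pair(2, 4), pair(2, 5),
                pair(3, 4), pair(3, 5), pair(3, 6));
    }

    // Local analogue of groupByKey().collectAsMap(): every value is kept,
    // gathered into a list per key, instead of one value surviving per key
    static <K, V> Map<K, List<V>> groupThenCollect(List<Map.Entry<K, V>> pairs) {
        return pairs.stream().collect(Collectors.groupingBy(
                Map.Entry::getKey,
                Collectors.mapping(Map.Entry::getValue, Collectors.toList())));
    }

    public static void main(String[] args) {
        System.out.println(groupThenCollect(sample()));
    }
}
```

Compared with collectAsMap, this keeps 4 and 5 for key 2 and 4, 5 and 6 for key 3, at the cost of shipping every value to the driver.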