Spark API: collectAsMap
2017-02-04 14:02
collectAsMap() returns a HashMap containing the key-value pairs from every partition of the RDD; if a key appears more than once, later elements overwrite earlier ones.
/**
* Return the key-value pairs in this RDD to the master as a Map.
*
* Warning: this doesn't return a multimap (so if you have multiple values to the same key, only
* one value per key is preserved in the map returned)
*
* @note this method should only be used if the resulting data is expected to be small, as
* all the data is loaded into the driver's memory.
*/
def collectAsMap(): Map[K, V] = self.withScope {
  val data = self.collect()
  val map = new mutable.HashMap[K, V]
  map.sizeHint(data.length)
  data.foreach { pair => map.put(pair._1, pair._2) }
  map
}
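The warning above has a practical consequence: with duplicate keys, collectAsMap() silently keeps only one value per key. If every value is needed on the driver, a minimal sketch of one workaround, assuming the grouped result is still small enough to collect (the object name is illustrative):

import org.apache.spark.{SparkConf, SparkContext}

object CollectAsMultimapSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("multimap").setMaster("local"))
    val rdd = sc.parallelize(List((1, "a"), (1, "b"), (1, "c"), (2, "d"), (2, "e"), (3, "g")))

    // groupByKey keeps every value; collectAsMap then sees each key exactly once
    val multimap = rdd.groupByKey().collectAsMap()
    multimap.foreach(println(_))
    // e.g. (2,CompactBuffer(d, e)) (1,CompactBuffer(a, b, c)) (3,CompactBuffer(g))

    sc.stop()
  }
}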
package com.dt.spark.main.RDDLearn.RDDCollectAsMap

import org.apache.spark.{SparkContext, SparkConf}

/**
 * Created by hjw on 17/2/4.
 */
/*
collect (commonly used)
Gathers the RDD's data into an Array on the driver; only suitable when the data set is small.

collectAsMap()
Returns a HashMap containing the key-value pairs from every partition of the RDD; if a key appears more than once, later elements overwrite earlier ones (see the collectAsMap source above).
*/
object RDDCollectAsMapTest {
  def main(args: Array[String]) {
    val conf = new SparkConf()
    conf.setAppName("test")
    conf.setMaster("local")
    val sc = new SparkContext(conf)

    val rdd = sc.parallelize(List((1, "a"), (1, "b"), (1, "c"), (2, "d"), (2, "e"), (3, "g")))

    // Duplicate keys collapse: only one value per key survives
    val rddMap = rdd.collectAsMap()
    rddMap.foreach(println(_))
    // (2,e)
    // (1,c)
    // (3,g)

    sc.stop()
  }
}
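Both collect() and collectAsMap() load the whole RDD into the driver's memory, as the @note above says. When only a preview is needed, a hedged sketch of a bounded alternative (the object name and the limit of 2 are illustrative):

import org.apache.spark.{SparkConf, SparkContext}

object BoundedPreviewSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("preview").setMaster("local"))
    val rdd = sc.parallelize(List((1, "a"), (2, "d"), (3, "g")))

    // take(n) ships at most n pairs to the driver, unlike collectAsMap()
    val preview: Map[Int, String] = rdd.take(2).toMap
    preview.foreach(println(_))

    sc.stop()
  }
}

take(n) scans only as many partitions as needed to satisfy the limit, so for small n it avoids touching the full data set.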