The difference between groupByKey and reduceByKey
2017-10-19 07:43
If we compare the results of the two transformations (“groupByKey” and “reduceByKey”), we get the same output, so you may be wondering what the difference between them is. The “reduceByKey” transformation first combines the values for each key within each partition, so each partition holds only one value per key. After the shuffle, in the reduce phase, the executors apply the same operation to those partial results; in my case that operation is a sum, lambda x, y: x + y.
Source: Databricks
The “groupByKey” transformation, by contrast, does not combine the values for each key within a partition. It shuffles all of the data directly and only then merges the values for each key. Because “groupByKey” has to shuffle far more data to produce the answer, it is better to use “reduceByKey” when a large amount of data would otherwise be shuffled.
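The difference in shuffle volume can be illustrated with a small pure-Python simulation. This is only a sketch of the idea, not Spark itself: the two "partitions" are plain lists, the sample data is made up, and the helper names simulate_reduce_by_key and simulate_group_by_key are hypothetical.

```python
from collections import defaultdict
from operator import add

# Hypothetical sample data: two partitions of (key, value) pairs.
partitions = [
    [("a", 1), ("b", 1), ("a", 1)],
    [("a", 1), ("b", 1), ("b", 1), ("a", 1)],
]


def simulate_reduce_by_key(partitions, func):
    """Combine values per key inside each partition, then merge the partials."""
    shuffled = []
    for part in partitions:
        local = {}
        for key, value in part:
            local[key] = func(local[key], value) if key in local else value
        # After the local combine, each partition sends at most one record per key.
        shuffled.extend(local.items())
    result = {}
    for key, value in shuffled:
        result[key] = func(result[key], value) if key in result else value
    return result, len(shuffled)


def simulate_group_by_key(partitions):
    """Send every record through the shuffle, then collect values per key."""
    shuffled = [pair for part in partitions for pair in part]
    groups = defaultdict(list)
    for key, value in shuffled:
        groups[key].append(value)
    return dict(groups), len(shuffled)


reduced, reduce_shuffled = simulate_reduce_by_key(partitions, add)
grouped, group_shuffled = simulate_group_by_key(partitions)
summed = {key: sum(values) for key, values in grouped.items()}

print("reduceByKey-style:", reduced, "records shuffled:", reduce_shuffled)
print("groupByKey-style: ", summed, "records shuffled:", group_shuffled)
```

Both approaches give the same per-key sums, but the simulated reduceByKey path moves 4 records across the shuffle while the groupByKey path moves all 7. In PySpark the corresponding calls would be rdd.reduceByKey(lambda x, y: x + y) and rdd.groupByKey().mapValues(sum).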
Reference: https://www.analyticsvidhya.com/blog/2016/10/using-pyspark-to-perform-transformations-and-actions-on-rdd/