Scala's fold family of functions and the Spark RDD fold operation, explained
2014-12-19 11:48
Scala's fold family of functions is convenient to use; this post compares and summarizes them.
fold
Definition of fold:
def fold[A1 >: A](z: A1)(op: (A1, A1) ⇒ A1): A1
Folds the elements of this traversable or iterator using the specified associative binary operator.
The order in which operations are performed on elements is unspecified and may be nondeterministic.
A1
a type parameter for the binary operator, a supertype of A.
z
a neutral element for the fold operation; may be added to the result an arbitrary number of times, and must not change the result (e.g., Nil for list concatenation, 0 for addition, or 1 for multiplication.)
op
a binary operator that must be associative
returns
the result of applying fold operator op between all the elements and z
The order in which fold applies the operator is unspecified, and A1 is a supertype of A. This is a useful property for parallel computation, because intermediate fold results may be combined at the supertype level.
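As a minimal sketch, the associativity requirement can be seen with a plain collection (standard Scala, no Spark needed):

```scala
val xs = List(1, 2, 3, 4, 5)

// 0 is the neutral element for addition, and + is associative,
// so the result does not depend on how fold groups the operations.
val sum = xs.fold(0)(_ + _)  // 15

// The same associativity means partial results can be combined in
// any grouping: (1 + 2) + (3 + 4 + 5) == (1 + 2 + 3) + (4 + 5).
```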
foldLeft
Definition of foldLeft:
def foldLeft[B](z: B)(op: (B, A) ⇒ B): B
Applies a binary operator to a start value and all elements of this sequence, going left to right.
Note: will not terminate for infinite-sized collections.
B
the result type of the binary operator.
z
the start value.
returns
the result of inserting op between consecutive elements of this sequence, going left to right with the start value z on the left:
op(...op(z, x_1), x_2, ..., x_n)
where x_1, ..., x_n are the elements of this sequence.
foldLeft applies the operator strictly from left to right, and, as the types show (the accumulator type B is unrelated to the element type A), it is not suited to parallel execution.
foldLeft has a symbolic alias: /:
def /:(z: B)(op: (B, A) => B): B = foldLeft(z)(op)
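A small foldLeft sketch; note that the accumulator type may differ from the element type, which is exactly what rules out order-free parallel evaluation:

```scala
val xs = List(1, 2, 3)

// foldLeft threads the accumulator strictly left to right:
// ((0 + 1) + 2) + 3
val sum = xs.foldLeft(0)(_ + _)  // 6

// B (String) differs from A (Int): each step needs the previous
// accumulator, so the computation is inherently sequential.
val s = xs.foldLeft("")((acc, x) => acc + x.toString)  // "123"
```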
foldRight
foldRight (on List) first reverses the data and then calls foldLeft. Correspondingly, foldRight also has a symbolic alias: :\
def foldRight(z: B)(op: (A, B) => B): B =
reversed.foldLeft(z)((x, y) => op(y, x))
def :\[B](z: B)(op: (A, B) => B): B = foldRight(z)(op)
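A quick sketch contrasting the two directions with a non-associative operator (subtraction), for which foldLeft and foldRight give different results:

```scala
val xs = List(1, 2, 3)

// foldRight associates to the right: 1 - (2 - (3 - 0)) = 2
val r = xs.foldRight(0)(_ - _)

// foldLeft associates to the left: ((0 - 1) - 2) - 3 = -6
val l = xs.foldLeft(0)(_ - _)
```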
What is the difference between fold/foldLeft and reduce?
reduce can be parallelized, while foldLeft cannot; this matters a great deal for distributed computing frameworks, and is why Spark and similar systems provide a reduce operation.
Spark, however, has no foldLeft (it does have an RDD fold), because:
foldLeft requires the elements to be processed in order and permits an operator where x op y != y op x, whereas reduce (and Spark's RDD fold) requires an operator that is associative and commutative, so partial results from different partitions can be merged in any order.
There is a good explanation of this question on Stack Overflow:
http://stackoverflow.com/questions/25158780/difference-between-reduce-and-foldleft-fold-in-functional-programming-particula?lq=1
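The parallel pattern behind reduce and Spark's fold can be sketched without Spark itself: split the data into "partitions", fold each partition independently, then merge the partial results. This hand-rolled sketch (not Spark's actual implementation) only yields the right answer because the operator is associative and commutative:

```scala
val data = (1 to 100).toList

// Split into "partitions", as a distributed framework would.
val partitions: List[List[Int]] = data.grouped(25).toList

// Fold each partition independently with the same neutral element,
// then merge the partial results. Valid only because + is
// associative and commutative and 0 is its neutral element.
val partials = partitions.map(_.fold(0)(_ + _))
val total    = partials.fold(0)(_ + _)  // 5050, same as data.sum
```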