How to use RDD.treeAggregate
2015-11-23 16:21
Original link: http://stackoverflow.com/questions/29860635/how-to-interpret-rdd-treeaggregate
In the Spark source code, the function runMiniBatchSGD in GradientDescent contains the following snippet:
    val (gradientSum, lossSum, miniBatchSize) = data.sample(false, miniBatchFraction, 42 + i) // sample the data
      .treeAggregate((BDV.zeros[Double](n), 0.0, 0L))(
        seqOp = (c, v) => {
          // c: (grad, loss, count), v: (label, features)
          val l = gradient.compute(v._2, v._1, bcWeights.value, Vectors.fromBreeze(c._1))
          (c._1, c._2 + l, c._3 + 1)
        },
        combOp = (c1, c2) => {
          // c: (grad, loss, count)
          (c1._1 += c2._1, c1._2 + c2._2, c1._3 + c2._3)
        })
Someone on Stack Overflow gave the following explanation:
treeAggregate is a specialized implementation of aggregate that iteratively applies the combine function to a subset of partitions.
This avoids returning all partial results to the driver, where a single-pass reduce would take place, as happens with the classic aggregate.
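To make the difference concrete, here is a minimal sketch in plain Scala with no Spark dependency. Partitions are modeled as nested sequences, and the names `TreeVsFlatSketch`, `partials`, `flatCombine` and `treeCombine` are hypothetical. The grouping strategy is a simplification for illustration only, not Spark's actual scheduling.

```scala
object TreeVsFlatSketch {
  // Each inner Seq stands in for one RDD partition (an assumption for illustration).
  type Partitions = Seq[Seq[Int]]

  // Step 1 (same in both variants): every partition is reduced to one partial sum.
  def partials(parts: Partitions): Seq[Long] =
    parts.map(_.foldLeft(0L)(_ + _))

  // Classic aggregate: all partial results go to the "driver", which combines
  // them in one pass. Returns (result, number of values the driver combined).
  def flatCombine(ps: Seq[Long]): (Long, Int) =
    (ps.foldLeft(0L)(_ + _), ps.size)

  // Tree-style: partial results are first combined in groups of `fanIn`
  // (standing in for work done on executors), so the driver sees fewer values.
  def treeCombine(ps: Seq[Long], fanIn: Int): (Long, Int) = {
    require(fanIn >= 2, "fanIn must be at least 2")
    val intermediate = ps.grouped(fanIn).map(_.foldLeft(0L)(_ + _)).toSeq
    (intermediate.foldLeft(0L)(_ + _), intermediate.size)
  }
}
```

Both variants return the same result; only the number of values that reach the final combining step differs, which is the point of the tree-shaped variant.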
For all practical purposes, treeAggregate follows the same principle as aggregate, which is explained in the answer "Explain the aggregate functionality in Python", except that it takes an extra parameter indicating the depth of the partial aggregation level.
Let me try to explain what’s going on here specifically:
For aggregate, we need a zero value, a combiner function (seqOp) and a reduce function (combOp). aggregate uses currying to specify the zero value independently of the combine and reduce functions.
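The shape of that signature can be sketched without Spark. The helper `aggregateSketch` below is hypothetical and only mimics the curried form `aggregate(zero)(seqOp, combOp)`; here it computes a (sum, count) pair over some made-up sample data.

```scala
object AggregateSketch {
  // Hypothetical stand-in for RDD.aggregate: the zero value sits in its own
  // parameter list (currying), the two functions in the following one.
  def aggregateSketch[T, U](parts: Seq[Seq[T]])(zero: U)(
      seqOp: (U, T) => U,
      combOp: (U, U) => U): U = {
    // seqOp folds the elements of one partition into an accumulator of type U.
    val perPartition = parts.map(_.foldLeft(zero)(seqOp))
    // combOp merges the per-partition accumulators into the final result.
    perPartition.foldLeft(zero)(combOp)
  }

  // Made-up data: three "partitions" of doubles.
  val data: Seq[Seq[Double]] = Seq(Seq(1.0, 2.0), Seq(3.0), Seq(4.0, 5.0, 6.0))

  // The accumulator is (sum, count); the tuple is destructured on the left-hand side.
  val (sum, count) = aggregateSketch(data)((0.0, 0L))(
    (acc, x) => (acc._1 + x, acc._2 + 1),
    (a, b) => (a._1 + b._1, a._2 + b._2)
  )
}
```

Because the zero value comes first in its own parameter list, the compiler already knows the accumulator type when it reaches the two function literals, so their parameters need no type annotations.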
We can then dissect the call above as follows, which hopefully makes it easier to understand:
    val Zero: (BDV[Double], Double, Long) = (BDV.zeros[Double](n), 0.0, 0L)

    // (??, ??) stands for the RDD's element type, i.e. the (label, features) pair.
    val combinerFunction: ((BDV[Double], Double, Long), (??, ??)) => (BDV[Double], Double, Long) =
      (c, v) => {
        // c: (grad, loss, count), v: (label, features)
        val l = gradient.compute(v._2, v._1, bcWeights.value, Vectors.fromBreeze(c._1))
        (c._1, c._2 + l, c._3 + 1)
      }

    val reducerFunction: ((BDV[Double], Double, Long), (BDV[Double], Double, Long)) => (BDV[Double], Double, Long) =
      (c1, c2) => {
        // c: (grad, loss, count)
        (c1._1 += c2._1, c1._2 + c2._2, c1._3 + c2._3)
      }
Then we can rewrite the call to treeAggregate in a more digestible form:
    val (gradientSum, lossSum, miniBatchSize) =
      data.sample(false, miniBatchFraction, 42 + i).treeAggregate(Zero)(combinerFunction, reducerFunction)
This form will ‘extract’ the resulting tuple into the named values gradientSum, lossSum and miniBatchSize for further use.
Note that treeAggregate takes an additional parameter, depth, which is declared with a default value of depth = 2. Since it is not provided in this particular call, that default value is used.
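How depth influences the number of intermediate levels can be sketched as follows. This is a simplified model loosely based on the heuristic used inside Spark's RDD.treeAggregate; treat that as an assumption, since the exact rule may differ between Spark versions. `DepthSketch` and `partitionCounts` are hypothetical names.

```scala
object DepthSketch {
  // Returns the number of partial results remaining after each intermediate
  // level, starting with the initial partition count. The last entry is what
  // the driver finally combines.
  def partitionCounts(numPartitions: Int, depth: Int): List[Int] = {
    require(numPartitions >= 1 && depth >= 1)
    // Fan-in per level: roughly the depth-th root of the partition count, at least 2.
    val scale = math.max(math.ceil(math.pow(numPartitions.toDouble, 1.0 / depth)).toInt, 2)
    var n = numPartitions
    val counts = scala.collection.mutable.ListBuffer(n)
    // Add a level only while it meaningfully shrinks the number of partials.
    while (n > scale + math.ceil(n.toDouble / scale)) {
      n /= scale
      counts += n
    }
    counts.toList
  }
}
```

Under this model, 90 partitions with the default depth = 2 go through one intermediate level that leaves 9 partial results for the driver, whereas depth = 1 adds no intermediate level at all.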