
A Python Implementation of Spark's HashPartitioner

2016-05-26 00:00 · 651 views

Abstract: A Python implementation of Spark's HashPartitioner logic, so you can compute from Python which partition a given key will land in.

Spark's default partitioner is org.apache.spark.HashPartitioner, whose source looks like this:

```scala
class HashPartitioner(partitions: Int) extends Partitioner {
  require(partitions >= 0, s"Number of partitions ($partitions) cannot be negative.")

  def numPartitions: Int = partitions

  def getPartition(key: Any): Int = key match {
    case null => 0
    case _ => Utils.nonNegativeMod(key.hashCode, numPartitions)
  }

  override def equals(other: Any): Boolean = other match {
    case h: HashPartitioner =>
      h.numPartitions == numPartitions
    case _ =>
      false
  }

  override def hashCode: Int = numPartitions
}
```
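For reference, Utils.nonNegativeMod only fixes up Java's signed remainder. A minimal Python sketch (the function name here is mine, not Spark's) might look like:

```python
def non_negative_mod(x, mod):
    # Java's % truncates toward zero, so in Java x % mod can be
    # negative; Spark shifts such results into [0, mod). Python's %
    # already floors, so for a positive modulus a plain x % mod
    # gives the same answer -- the branch below is kept only to
    # mirror the Java code.
    r = x % mod
    return r + mod if r < 0 else r
```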

So to compute a key's partition from Python, we only need to reproduce Java's hashCode and then take a non-negative modulo, just as Utils.nonNegativeMod does above.

For string keys, hashCode can be reproduced like this:

```python
def java_string_hashcode(s):
    # Reproduces Java's String.hashCode: h = 31 * h + char for each
    # character, with 32-bit signed integer overflow semantics.
    h = 0
    for c in s:
        h = (31 * h + ord(c)) & 0xFFFFFFFF
    # Re-interpret the unsigned 32-bit value as a signed int,
    # matching the sign Java would produce.
    return ((h + 0x80000000) & 0xFFFFFFFF) - 0x80000000
```
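Putting the two pieces together, a hypothetical get_partition helper (the name is mine, not Spark's) for string keys could look like this; java_string_hashcode is repeated so the snippet stands on its own:

```python
def java_string_hashcode(s):
    # Same function as above: Java's String.hashCode with 32-bit
    # signed overflow semantics.
    h = 0
    for c in s:
        h = (31 * h + ord(c)) & 0xFFFFFFFF
    return ((h + 0x80000000) & 0xFFFFFFFF) - 0x80000000

def get_partition(key, num_partitions):
    # Mirrors HashPartitioner.getPartition for string keys: null
    # keys go to partition 0, everything else to
    # nonNegativeMod(hashCode, numPartitions). Python's % is
    # already non-negative for a positive modulus, so it can stand
    # in for Utils.nonNegativeMod directly.
    if key is None:
        return 0
    return java_string_hashcode(key) % num_partitions
```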

Verification

The original post compared the output of the Scala implementation and the Python implementation side by side (the screenshots are not reproduced here).
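With the screenshots gone, a self-contained cross-check against String.hashCode values that are easy to confirm in any JVM REPL can stand in for the comparison:

```python
def java_string_hashcode(s):
    # Same function as above.
    h = 0
    for c in s:
        h = (31 * h + ord(c)) & 0xFFFFFFFF
    return ((h + 0x80000000) & 0xFFFFFFFF) - 0x80000000

# Values easy to confirm in a JVM REPL:
#   "".hashCode()    == 0
#   "abc".hashCode() == 96354
# "Aa" and "BB" are a classic collision: both hash to 2112, so
# HashPartitioner always places them in the same partition.
checks = {"": 0, "abc": 96354, "Aa": 2112, "BB": 2112}
for s, expected in checks.items():
    assert java_string_hashcode(s) == expected, (s, expected)
print("all hashCode checks passed")
```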
