
A Python Implementation of Spark's HashPartitioner

2016-05-26 00:00 · 651 views

Abstract: A Python implementation of Spark's HashPartitioner logic, so you can compute from Python which partition a given key will land in.

Spark's default partitioner is org.apache.spark.HashPartitioner, whose source looks like this:

```scala
class HashPartitioner(partitions: Int) extends Partitioner {
  require(partitions >= 0, s"Number of partitions ($partitions) cannot be negative.")

  def numPartitions: Int = partitions

  def getPartition(key: Any): Int = key match {
    case null => 0
    case _ => Utils.nonNegativeMod(key.hashCode, numPartitions)
  }

  override def equals(other: Any): Boolean = other match {
    case h: HashPartitioner =>
      h.numPartitions == numPartitions
    case _ =>
      false
  }

  override def hashCode: Int = numPartitions
}
```
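For reference, Utils.nonNegativeMod only fixes up Java's signed remainder. A minimal Python sketch (the function name here is mine, not Spark's) might look like:

```python
def non_negative_mod(x, mod):
    # Java's % truncates toward zero, so in Java x % mod can be
    # negative; Spark shifts such results into [0, mod). Python's %
    # already floors, so for a positive modulus a plain x % mod
    # gives the same answer -- the branch below is kept only to
    # mirror the Java code.
    r = x % mod
    return r + mod if r < 0 else r
```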

So to compute a key's partition from Python, we only need to reproduce Java's hashCode and then take a non-negative modulo, just as Utils.nonNegativeMod does above.

For string keys, hashCode can be reproduced like this:

```python
def java_string_hashcode(s):
    # Reproduces Java's String.hashCode: h = 31 * h + char for each
    # character, with 32-bit signed integer overflow semantics.
    h = 0
    for c in s:
        h = (31 * h + ord(c)) & 0xFFFFFFFF
    # Re-interpret the unsigned 32-bit value as a signed int,
    # matching the sign Java would produce.
    return ((h + 0x80000000) & 0xFFFFFFFF) - 0x80000000
```
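Putting the two pieces together, a hypothetical get_partition helper (the name is mine, not Spark's) for string keys could look like this; java_string_hashcode is repeated so the snippet stands on its own:

```python
def java_string_hashcode(s):
    # Same function as above: Java's String.hashCode with 32-bit
    # signed overflow semantics.
    h = 0
    for c in s:
        h = (31 * h + ord(c)) & 0xFFFFFFFF
    return ((h + 0x80000000) & 0xFFFFFFFF) - 0x80000000

def get_partition(key, num_partitions):
    # Mirrors HashPartitioner.getPartition for string keys: null
    # keys go to partition 0, everything else to
    # nonNegativeMod(hashCode, numPartitions). Python's % is
    # already non-negative for a positive modulus, so it can stand
    # in for Utils.nonNegativeMod directly.
    if key is None:
        return 0
    return java_string_hashcode(key) % num_partitions
```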

Verification

The original post compared the output of the Scala implementation and the Python implementation side by side (the screenshots are not reproduced here).
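With the screenshots gone, a self-contained cross-check against String.hashCode values that are easy to confirm in any JVM REPL can stand in for the comparison:

```python
def java_string_hashcode(s):
    # Same function as above.
    h = 0
    for c in s:
        h = (31 * h + ord(c)) & 0xFFFFFFFF
    return ((h + 0x80000000) & 0xFFFFFFFF) - 0x80000000

# Values easy to confirm in a JVM REPL:
#   "".hashCode()    == 0
#   "abc".hashCode() == 96354
# "Aa" and "BB" are a classic collision: both hash to 2112, so
# HashPartitioner always places them in the same partition.
checks = {"": 0, "abc": 96354, "Aa": 2112, "BB": 2112}
for s, expected in checks.items():
    assert java_string_hashcode(s) == expected, (s, expected)
print("all hashCode checks passed")
```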
