
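Spark sorting: secondary sort in Scala with a custom key. Compared with the verbose Java version, the Scala one is elegant and concise!

2016-02-28 21:20

Secondary sort in Spark, implemented in Scala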

【Input data file】

2 3

4 1

3 2

4 3

8 7

2 1

【Output】Sorted in descending order

8 7

4 3

4 1

3 2

2 3

2 1


【Source files】SecondarySortApp.scala, SecondarySortKey.scala

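class SecondarySortKey
Defines the ordering method compare: keys are compared by first, and by second only when the first values are equal.

class SecondarySortKey(val first: Int, val second: Int)
    extends Ordered[SecondarySortKey]
    with Serializable {
  // Order by first; fall back to second when the first values are equal.
  def compare(other: SecondarySortKey): Int = {
    if (this.first - other.first != 0) {
      this.first - other.first
    } else {
      this.second - other.second
    }
  }
}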
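The subtraction in compare works for small values like the sample data, but it can overflow when the two Ints are far apart (for example Int.MinValue and 1). A minimal sketch of an overflow-safe variant follows. The names SafeSecondarySortKey and SafeKeyDemo are hypothetical and not part of the original source, and the sort runs on a plain Scala List so that it can be tried without Spark.

```scala
// Hypothetical overflow-safe variant of SecondarySortKey (not from the original post).
class SafeSecondarySortKey(val first: Int, val second: Int)
    extends Ordered[SafeSecondarySortKey] with Serializable {

  // Compare by first, then by second, without subtracting the values.
  def compare(other: SafeSecondarySortKey): Int = {
    val byFirst = Integer.compare(this.first, other.first)
    if (byFirst != 0) byFirst else Integer.compare(this.second, other.second)
  }
}

object SafeKeyDemo {
  // Sort (first, second) pairs in descending key order on a local collection.
  def sortedDescending(pairs: List[(Int, Int)]): List[(Int, Int)] =
    pairs
      .map { case (a, b) => new SafeSecondarySortKey(a, b) }
      .sortWith((x, y) => x > y) // `>` comes from Ordered and delegates to compare
      .map(k => (k.first, k.second))
}
```

Calling SafeKeyDemo.sortedDescending on the six sample pairs should give the same order as the output listed at the top of this post.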

SecondarySortApp

1. Read in each line of data

val lines = sc.textFile("G://IMFBigDataSpark2016//tesdata//helloSpark.txt", 1) // read the local file, requesting a single partition

2. Map each line to a (K, V) tuple: the key is a SecondarySortKey holding the first and second numbers of the line, and the value is the whole line

val pairWithSortKey = lines.map(line => (
  new SecondarySortKey(line.split(" ")(0).toInt, line.split(" ")(1).toInt), line
))
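Step 2 assumes every line holds exactly two integers separated by a single space; any other line makes toInt or the array index throw. A minimal sketch of a more forgiving parser is shown below. LineParser and parseKey are hypothetical names introduced here for illustration, not part of the original source.

```scala
import scala.util.Try

object LineParser {
  // Hypothetical helper: returns Some((first, second)) for lines such as "4 3",
  // tolerating leading, trailing, and repeated whitespace.
  // Returns None when the line does not hold exactly two integers.
  def parseKey(line: String): Option[(Int, Int)] =
    line.trim.split("\\s+") match {
      case Array(a, b) => Try((a.toInt, b.toInt)).toOption
      case _           => None
    }
}
```

In the pipeline, the two parsed numbers could then be passed to new SecondarySortKey(first, second), with lines that yield None skipped or logged, depending on how strict the job needs to be.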

3. Sort pairWithSortKey by key in descending order

val sorted = pairWithSortKey.sortByKey(false)

4. Each sortedLine in the sorted result is a (K, V) pair; keep only the value sortedLine._2, that is, the original line

val sortedResult = sorted.map(sortedLine => sortedLine._2)

5. Collect the results and print them.

sortedResult.collect().foreach(println)
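Steps 2 to 5 can also be tried on a local collection, which is a quick way to check the ordering logic before running a Spark job. The sketch below is not the author's code: it keys each line by a plain (Int, Int) tuple and relies on the lexicographic Ordering that the Scala standard library provides for tuples, instead of the custom SecondarySortKey class. TupleKeySortDemo is a hypothetical name. On an RDD keyed by (Int, Int), sortByKey(false) should pick up the same implicit tuple ordering, though the original post does not show that.

```scala
object TupleKeySortDemo {
  // Sample input from the top of this post, one record per line.
  val lines: List[String] = List("2 3", "4 1", "3 2", "4 3", "8 7", "2 1")

  // Key each line by (first, second), sort by key in descending order,
  // then keep only the original line.
  def sortedLines(input: List[String]): List[String] =
    input
      .map { line =>
        val fields = line.split(" ")
        ((fields(0).toInt, fields(1).toInt), line)
      }
      .sortBy(_._1)(Ordering[(Int, Int)].reverse)
      .map(_._2)

  def main(args: Array[String]): Unit =
    sortedLines(lines).foreach(println)
}
```

The tuple key is shorter to write; a dedicated key class such as SecondarySortKey becomes more useful when the comparison rule is not plain lexicographic order, for example ascending on one field and descending on the other.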

Source code:

package com.dt.spark

class SecondarySortKey(val first: Int, val second: Int) extends Ordered[SecondarySortKey] with Serializable {
  def compare(other: SecondarySortKey): Int = {
    if (this.first - other.first != 0) {
      this.first - other.first
    } else {
      this.second - other.second
    }
  }
}

package com.dt.spark

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

/*
 * Taught by instructor Wang Jialin http://weibo.com/ilovepains
 */
object SecondarySortApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf() // create the SparkConf object
    conf.setAppName("SecondarySortApp!") // application name, shown in the monitoring UI while the program runs
    conf.setMaster("local") // run locally; no Spark cluster installation is needed
    val sc = new SparkContext(conf)

    val lines = sc.textFile("G://IMFBigDataSpark2016//tesdata//helloSpark.txt", 1) // read the local file, requesting a single partition

    // Key each line by a SecondarySortKey built from its two numbers; the value is the line itself.
    val pairWithSortKey = lines.map(line => (
      // val splited = line.split(" ")
      new SecondarySortKey(line.split(" ")(0).toInt, line.split(" ")(1).toInt), line
    ))

    val sorted = pairWithSortKey.sortByKey(false) // false = descending order
    val sortedResult = sorted.map(sortedLine => sortedLine._2) // keep only the original line
    sortedResult.collect().foreach(println)
  }
}
