Simple usage of Spark's aggregateByKey
2017-07-25 22:23
Problem: find the records whose key appears only once. This is easy with groupByKey or reduceByKey; here it is solved with aggregateByKey instead.
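For comparison, the groupByKey approach mentioned above can be sketched on a plain Scala `Seq` instead of an RDD, so it runs without Spark. The object name `GroupByKeyBaseline` and the helper `keysSeenOnce` are illustrative, and the records are the article's input data. On an RDD the same idea would read roughly `rdd.groupByKey().filter(_._2.size == 1).mapValues(_.head)`.

```scala
// Plain-Scala sketch of the groupByKey idea on an in-memory Seq (not an RDD):
// gather all values per key, then keep only keys that have exactly one value.
object GroupByKeyBaseline {
  // Sample records taken from the article's input data.
  val records: Seq[(String, Long)] = Seq(
    ("asdfgh", 546346L), ("retr", 4567L), ("asdfgh", 7685678L),
    ("ghj", 2345L), ("asd", 234L), ("hadoop", 435L),
    ("ghj", 23454L), ("asdfgh", 54675L), ("asdfgh", 546759878L),
    ("asd", 234L), ("asdfgh", 5467598782L)
  )

  def keysSeenOnce(data: Seq[(String, Long)]): Map[String, Long] =
    data
      .groupBy(_._1) // key -> every (key, value) pair for that key
      .collect { case (k, group) if group.size == 1 => k -> group.head._2 }
      .toMap

  def main(args: Array[String]): Unit =
    println(keysSeenOnce(records)) // Map iteration order is not guaranteed
}
```

The result contains exactly `hadoop -> 435` and `retr -> 4567`. The drawback of groupByKey on a real RDD is that every value of every key is shuffled and buffered before filtering, which is what the aggregateByKey version avoids.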
Input data (each line holds a key and a value separated by a tab):
asdfgh	546346
retr	4567
asdfgh	7685678
ghj	2345
asd	234
hadoop	435
ghj	23454
asdfgh	54675
asdfgh	546759878
asd	234
asdfgh	5467598782
Code:
import org.apache.spark.{SparkConf, SparkContext}

object AaidTest {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("AaidTest").setMaster("local")
    val sc = new SparkContext(conf)
    sc.textFile("D://sparkmllibData/sparkml/mllibdata/arrregation.txt")
      .map { line =>
        // each line is "key<TAB>value"
        val fields = line.split("\t")
        (fields(0), fields(1).toLong)
      }
      // zero value 0L means "no value seen yet"; seqOp is also passed as combOp
      .aggregateByKey(0L)(seqOp, seqOp)
      // -1L marks keys that occurred more than once
      .filter(pair => pair._2 != -1L)
      .collect()
      .foreach(println)
    sc.stop()
  }

  // U: result aggregated so far for the key (0L = nothing yet, -1L = duplicate)
  // v: the next value for the key
  def seqOp(U: Long, v: Long): Long = {
    println("seqOp")
    println("U=" + U)
    println("v=" + v)
    var count: Int = 0
    if (U != 0L) {
      count += 1
    }
    if (v != 0L) {
      count += 1
    }
    if (count > 1) {
      -1L
    } else {
      v
    }
  }
}
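To see why passing `seqOp` as both arguments works, here is a minimal local model of `aggregateByKey(zeroValue)(seqOp, combOp)`, assuming its documented contract: inside each partition the values of a key are folded with seqOp starting from the zero value, and the partial results of the same key from different partitions are merged with combOp. `AggregateByKeyModel` and `aggregateByKeyLocal` are hypothetical names, each inner `Seq` stands in for a partition, and this is not Spark's implementation. The trace that follows shows 11 seqOp calls for 11 records, which suggests the job ran with a single partition, so combOp was never invoked there; the two-partition call in this sketch exercises it.

```scala
// Hypothetical local model of aggregateByKey(zero)(seqOp, combOp) on plain
// Scala collections: each inner Seq plays the role of one partition.
object AggregateByKeyModel {
  // Sample records taken from the article's input data.
  val records: Seq[(String, Long)] = Seq(
    ("asdfgh", 546346L), ("retr", 4567L), ("asdfgh", 7685678L),
    ("ghj", 2345L), ("asd", 234L), ("hadoop", 435L),
    ("ghj", 23454L), ("asdfgh", 54675L), ("asdfgh", 546759878L),
    ("asd", 234L), ("asdfgh", 5467598782L)
  )

  // Same decision logic as the article's seqOp, minus the println calls.
  def seqOp(u: Long, v: Long): Long = {
    val count = (if (u != 0L) 1 else 0) + (if (v != 0L) 1 else 0)
    if (count > 1) -1L else v
  }

  // Step 1: inside each partition, fold every key's values with seq, starting from zero.
  // Step 2: merge the per-partition results of the same key with comb.
  def aggregateByKeyLocal(parts: Seq[Seq[(String, Long)]], zero: Long)(
      seq: (Long, Long) => Long, comb: (Long, Long) => Long): Map[String, Long] =
    parts
      .flatMap(_.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).foldLeft(zero)(seq) })
      .groupBy(_._1)
      .map { case (k, partials) => k -> partials.map(_._2).reduce(comb) }

  def uniqueKeys(parts: Seq[Seq[(String, Long)]]): Map[String, Long] =
    aggregateByKeyLocal(parts, 0L)(seqOp, seqOp).filter { case (_, u) => u != -1L }

  def main(args: Array[String]): Unit = {
    println(uniqueKeys(Seq(records)))                          // one partition
    println(uniqueKeys(Seq(records.take(6), records.drop(6)))) // two partitions, combOp is used
  }
}
```

Both calls yield only `hadoop -> 435` and `retr -> 4567`. In the two-partition case `asd` has 234 in each half and `ghj` has one value in each half, so they are only recognised as duplicates when combOp merges the partials. This works as long as no real value is 0: every partial result is then either a real value or -1L, both nonzero, so merging two partials of the same key always counts as a duplicate.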
Output:
seqOp
U=0
v=546346
seqOp
U=0
v=4567
seqOp
U=546346
v=7685678
seqOp
U=0
v=2345
seqOp
U=0
v=234
seqOp
U=0
v=435
seqOp
U=2345
v=23454
seqOp
U=-1
v=54675
seqOp
U=-1
v=546759878
seqOp
U=234
v=234
seqOp
U=-1
v=5467598782
(hadoop,435)
(retr,4567)
As expected, only the keys that occur exactly once, hadoop and retr, remain.
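One caveat: the seqOp above uses 0L to mean "no value yet" and -1L to mean "seen more than once", so a key whose real value is 0 or -1 is misclassified. For example, a key with the values 0 and 5 is reported as unique with value 5, and a key with the single value -1 is dropped. A sketch that avoids the sentinels keeps a (count, value) pair per key; on an RDD it would look roughly like `aggregateByKey((0L, 0L))((acc, v) => (acc._1 + 1, v), (a, b) => (a._1 + b._1, b._2)).filter(_._2._1 == 1)`. The runnable version below again models partitions with plain Scala `Seq`s and compares both accumulators; `CountingAccumulator`, `aggregateLocal` and the sample keys are hypothetical.

```scala
// Sketch: sentinel-free (count, value) accumulator vs. the article's 0L/-1L sentinels.
// Plain Scala collections stand in for an RDD; each inner Seq is one hypothetical partition.
object CountingAccumulator {
  type Acc = (Long, Long) // (number of values seen for the key, last value seen)

  val zero: Acc = (0L, 0L)
  def seqOp(acc: Acc, v: Long): Acc = (acc._1 + 1, v)   // fold one more value into a partial result
  def combOp(a: Acc, b: Acc): Acc = (a._1 + b._1, b._2) // merge two partial results of the same key

  // The article's seqOp logic (0L = nothing yet, -1L = duplicate), kept for comparison.
  def sentinelOp(u: Long, v: Long): Long = {
    val count = (if (u != 0L) 1 else 0) + (if (v != 0L) 1 else 0)
    if (count > 1) -1L else v
  }

  // Local model: fold each key's values per partition with seq, then merge partials with comb.
  def aggregateLocal[U](parts: Seq[Seq[(String, Long)]], z: U)(
      seq: (U, Long) => U, comb: (U, U) => U): Map[String, U] =
    parts
      .flatMap(_.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).foldLeft(z)(seq) })
      .groupBy(_._1)
      .map { case (k, partials) => k -> partials.map(_._2).reduce(comb) }

  // Keep keys whose total count is exactly 1.
  def uniqueByCount(parts: Seq[Seq[(String, Long)]]): Map[String, Long] =
    aggregateLocal(parts, zero)(seqOp, combOp).collect { case (k, (1L, v)) => k -> v }.toMap

  // Keep keys whose sentinel result is not -1L, as the article's filter does.
  def uniqueBySentinel(parts: Seq[Seq[(String, Long)]]): Map[String, Long] =
    aggregateLocal(parts, 0L)(sentinelOp, sentinelOp).filter { case (_, u) => u != -1L }

  // Made-up data containing the problematic values 0 and -1.
  val parts: Seq[Seq[(String, Long)]] = Seq(
    Seq(("a", 0L), ("b", -1L), ("c", 7L)),
    Seq(("a", 5L), ("c", 7L), ("d", 42L))
  )

  def main(args: Array[String]): Unit = {
    println(uniqueBySentinel(parts))
    println(uniqueByCount(parts))
  }
}
```

On this data `uniqueBySentinel` returns `a -> 5` and `d -> 42`, while `uniqueByCount` returns `b -> -1` and `d -> 42`, which is the correct answer. The price is a slightly larger accumulator (two Longs per key instead of one).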