Writing wordCount in Scala
2017-02-09 00:00
val lines = List("hello tom hello jerry", "hello jerry", "hello kitty")
Method 1 (sum the counts in each group with foldLeft):
lines.flatMap(_.split(" ")).map((_, 1)).groupBy(_._1).mapValues(_.foldLeft(0)(_+_._2))
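For reference, Method 1 can be packaged as a small standalone program. This is a sketch under two assumptions. First, the object name `WordCountMethod1` is illustrative. Second, it targets Scala 2.13 or later, where `Map.mapValues` is deprecated and returns a lazy `MapView`. For that reason it sums each group with an ordinary `map` over the (word, pairs) entries, which also produces the same `Map[String, Int]` on Scala 2.12.

```scala
// Method 1 as a standalone program (sketch).
// Assumption: the pair-wise `map` below stands in for `mapValues`,
// which is deprecated from Scala 2.13 onward (it returns a MapView there).
object WordCountMethod1 {
  val lines = List("hello tom hello jerry", "hello jerry", "hello kitty")

  val counts: Map[String, Int] =
    lines
      .flatMap(_.split(" "))         // split every line into words
      .map((_, 1))                   // pair each word with a count of 1
      .groupBy(_._1)                 // group the pairs by word
      .map { case (word, pairs) =>   // sum the 1s in each group
        word -> pairs.foldLeft(0)(_ + _._2)
      }

  def main(args: Array[String]): Unit =
    println(counts)
}
```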
Method 2 (count via the size of each group, then sort by count in descending order):
lines.flatMap(_.split(" ")).map((_, 1)).groupBy(_._1).map(t=>(t._1, t._2.size)).toList.sortBy(_._2).reverse
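Method 2 can be sketched the same way. The object name `WordCountMethod2` is illustrative. This version also swaps `.sortBy(_._2).reverse` for `.sortBy(-_._2)`, which sorts on the negated count and so yields descending order without the extra `reverse`.

```scala
// Method 2 as a standalone program (sketch): count with the size of
// each group, then sort by count in descending order.
object WordCountMethod2 {
  val lines = List("hello tom hello jerry", "hello jerry", "hello kitty")

  val sorted: List[(String, Int)] =
    lines
      .flatMap(_.split(" "))
      .map((_, 1))
      .groupBy(_._1)
      .map(t => (t._1, t._2.size))   // group size = number of occurrences
      .toList
      .sortBy(-_._2)                 // negated key gives descending order

  def main(args: Array[String]): Unit =
    println(sorted)
}
```

Words with equal counts (tom and kitty here) come out in whatever order the grouped `Map` iterated them. Scala does not guarantee that order, so code should not rely on the relative position of ties.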
Step-by-step explanation (following Method 1):
1:lines.flatMap(_.split(" "))
------ Split each line on spaces and flatten the pieces into a single list of words ------
Result: res16: List[String] = List(hello, tom, hello, jerry, hello, jerry, hello, kitty)
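Comparing `map` and `flatMap` shows why step 1 needs `flatMap`. In this sketch (the object name `Step1FlatMap` is illustrative), `map` produces one `Array[String]` per line. `flatMap` concatenates those arrays into the single `List[String]` shown above.

```scala
// Step 1 sketch: `map` keeps one Array per line, `flatMap` flattens them.
object Step1FlatMap {
  val lines = List("hello tom hello jerry", "hello jerry", "hello kitty")

  // A list of 3 arrays, one per input line
  val nested: List[Array[String]] = lines.map(_.split(" "))

  // The same words flattened into one List[String] of 8 elements
  val words: List[String] = lines.flatMap(_.split(" "))

  def main(args: Array[String]): Unit = {
    // mkString is needed because arrays do not print their contents
    println(nested.map(_.mkString("[", ",", "]")))
    println(words)
  }
}
```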
2:lines.flatMap(_.split(" ")).map((_, 1))
------ Map each word to a tuple (word, 1) ------
Result: res17: List[(String, Int)] = List((hello,1), (tom,1), (hello,1), (jerry,1), (hello,1), (jerry,1), (hello,1), (kitty,1))
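The `(_, 1)` in step 2 is placeholder syntax for an anonymous function. The sketch below writes it both ways to show that they build the same list. The object name `Step2Pairs` is illustrative.

```scala
// Step 2 sketch: the placeholder lambda `(_, 1)` is shorthand for
// `word => (word, 1)`; both build one (word, 1) tuple per word.
object Step2Pairs {
  val words = List("hello", "tom", "hello", "jerry", "hello", "jerry", "hello", "kitty")

  val short: List[(String, Int)] = words.map((_, 1))
  val explicit: List[(String, Int)] = words.map(word => (word, 1))

  def main(args: Array[String]): Unit =
    println(short == explicit)
}
```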
3:lines.flatMap(_.split(" ")).map((_, 1)).groupBy(_._1)
------ Group the tuples by word ------
Result: res18: scala.collection.immutable.Map[String,List[(String, Int)]] = Map(tom -> List((tom,1)), kitty -> List((kitty,1)), jerry -> List((jerry,1), (jerry,1)), hello -> List((hello,1), (hello,1), (hello,1), (hello,1)))
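The grouped result is an ordinary immutable `Map`, and the printed order (tom, kitty, jerry, hello above) is not guaranteed. The following minimal sketch looks groups up by key instead of relying on that order. The object name `Step3GroupBy` is illustrative.

```scala
// Step 3 sketch: groupBy(_._1) yields Map[word, List of that word's pairs].
// Map iteration order is unspecified, so entries are looked up by key.
object Step3GroupBy {
  val pairs: List[(String, Int)] =
    List("hello", "tom", "hello", "jerry", "hello", "jerry", "hello", "kitty").map((_, 1))

  val grouped: Map[String, List[(String, Int)]] = pairs.groupBy(_._1)

  def main(args: Array[String]): Unit = {
    println(grouped("jerry"))   // List((jerry,1), (jerry,1))
    println(grouped.keySet)
  }
}
```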
4:lines.flatMap(_.split(" ")).map((_, 1)).groupBy(_._1).mapValues(_.foldLeft(0)(_+_._2))
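------ Take each value of the map (the list of tuples for one word) and add up the counts ------
Explanation of mapValues(_.foldLeft(0)(_+_._2)): the first underscore is one map value, e.g. List((jerry,1), (jerry,1)) -----
the second underscore is the running total accumulated by foldLeft, starting from 0 -----
the third underscore is the current tuple, e.g. (jerry,1) -----
Result: res19: scala.collection.immutable.Map[String,Int] = Map(tom -> 1, kitty -> 1, jerry -> 2, hello -> 4)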
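The three underscores in step 4 can be made explicit by writing the fold with named parameters. This sketch applies the fold to the jerry group. It also uses `scanLeft`, which keeps every intermediate accumulator, to show the running total. The object name `Step4FoldLeft` is illustrative.

```scala
// Step 4 sketch: the same fold as `_.foldLeft(0)(_ + _._2)`, with each
// underscore replaced by a named parameter.
object Step4FoldLeft {
  // 1st underscore: one map value, i.e. the list of tuples for "jerry"
  val jerryPairs: List[(String, Int)] = List(("jerry", 1), ("jerry", 1))

  // acc  = 2nd underscore: the running total, starting from 0
  // pair = 3rd underscore: the current (word, 1) tuple
  val total: Int = jerryPairs.foldLeft(0)((acc, pair) => acc + pair._2)

  // Every intermediate value of acc, including the initial 0
  val trace: List[Int] = jerryPairs.scanLeft(0)((acc, pair) => acc + pair._2)

  def main(args: Array[String]): Unit =
    println(s"total = $total, trace = $trace")
}
```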