
Writing a WordCount Program in spark-shell

2017-05-15 22:48

Start HDFS

See: http://blog.csdn.net/zengmingen/article/details/53006541

Start Spark

Installation: http://blog.csdn.net/zengmingen/article/details/72123717

spark-shell: http://blog.csdn.net/zengmingen/article/details/72162821

Prepare the data

vi wordcount.txt

hello zeng
hello miao
hello gen
hello zeng
hello wen
hello biao
zeng miao gen
zeng wen biao
lu ting ting
zhang xiao zhu
chang sheng xiang qi lai
zhu ye su ai ni


Upload it to HDFS

hdfs dfs -put wordcount.txt /

Write the code

In Scala, at the spark-shell prompt:

sc.textFile("hdfs://nbdo1:9000/wordcount.txt").
  flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).
  saveAsTextFile("hdfs://nbdo1:9000/out")
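To see what each stage of the chain produces without a cluster, the same flatMap / map / reduceByKey pipeline can be sketched on an ordinary Scala collection (reduceByKey is approximated locally with groupBy plus a sum; the sample lines are a subset of wordcount.txt):

```scala
object WordCountSketch {
  def main(args: Array[String]): Unit = {
    // Stand-ins for the RDD's elements (lines of the input file)
    val lines = Seq("hello zeng", "hello miao", "zeng miao gen")

    val counts = lines
      .flatMap(_.split(" "))                              // one element per word
      .map((_, 1))                                        // pair each word with a count of 1
      .groupBy(_._1)                                      // local stand-in for reduceByKey...
      .map { case (w, pairs) => (w, pairs.map(_._2).sum) } // ...sum the 1s per word

    counts.toSeq.sortBy(-_._2).foreach(println)
  }
}
```

On the real RDD, reduceByKey does this same per-key summation, but shuffles the pairs across the cluster instead of grouping in local memory.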

Result



Addendum:

To save the result as a single file, collapse the RDD to one partition before saving: coalesce(1, true) forces a shuffle into a single partition (equivalent to repartition(1)), so saveAsTextFile writes just one part file.

Code:

sc.textFile("hdfs://nbdo1:9000/wordcount.txt").
  flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).
  coalesce(1, true).saveAsTextFile("hdfs://nbdo1:9000/out2")
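The reason this yields one file: saveAsTextFile emits one part-NNNNN file per partition. A rough sketch of that layout, with partitions modeled as nested sequences (the counts match the sample wordcount.txt above, but the partition split and file names are purely illustrative):

```scala
object CoalesceSketch {
  def main(args: Array[String]): Unit = {
    // Suppose reduceByKey left the pairs spread over three partitions:
    val partitions = Seq(
      Seq("(hello,6)", "(zeng,4)"),
      Seq("(miao,2)", "(gen,2)", "(wen,2)"),
      Seq("(biao,2)", "(lu,1)")
    )

    // saveAsTextFile would write one output file per partition:
    partitions.zipWithIndex.foreach { case (p, i) =>
      println(f"part-$i%05d: ${p.mkString(" ")}")
    }

    // coalesce(1, true) shuffles everything into a single partition,
    // so the same save call writes exactly one part-00000 file:
    val coalesced = Seq(partitions.flatten)
    println(s"after coalesce(1, true): ${coalesced.length} output file(s)")
  }
}
```

Note the shuffle=true flag matters: without it, coalesce may narrow partitions lazily and push the whole upstream computation onto a single task.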

Result


