[Spark][Python] WordCount Example
2017-09-28 21:18
[training@localhost ~]$ hdfs dfs -cat cats.txt
The cat on the mat
The aardvark sat on the sofa
[training@localhost ~]$
Load the file and split each line into words; flatMap flattens the per-line word lists into a single RDD of words:
mydata001 = sc.textFile('cats.txt')
mydata002 = mydata001.flatMap(lambda line: line.split(" "))
In [12]: mydata002.take(1)
Out[12]: [u'The']
In [13]: mydata002.take(2)
Out[13]: [u'The', u'cat']
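The difference between map and flatMap here can be shown with a minimal plain-Python sketch (no Spark required), assuming the same two lines of input as cats.txt:

```python
# Plain-Python sketch of map vs. flatMap on this input (not Spark code).
# map keeps one word list per line; flatMap flattens them into one sequence.
lines = ["The cat on the mat", "The aardvark sat on the sofa"]

mapped = [line.split(" ") for line in lines]                   # map: list of lists
flat_mapped = [w for line in lines for w in line.split(" ")]   # flatMap: one flat list

print(mapped[0])        # ['The', 'cat', 'on', 'the', 'mat']
print(flat_mapped[:2])  # ['The', 'cat']
```

This is why mydata002.take(2) returns individual words rather than whole-line word lists.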
Map each word to a (word, 1) pair:
mydata003 = mydata002.map(lambda word: (word, 1))
In [10]: mydata003.take(1)
Out[10]: [(u'The', 1)]
In [11]: mydata003.take(2)
Out[11]: [(u'The', 1), (u'cat', 1)]
Sum the counts for each distinct word with reduceByKey:
mydata004 = mydata003.reduceByKey(lambda x, y: x + y)
In [15]: mydata004.take(1)
Out[15]: [(u'on', 2)]
In [16]: mydata004.take(2)
Out[16]: [(u'on', 2), (u'mat', 1)]
In [17]: mydata004.take(3)
Out[17]: [(u'on', 2), (u'mat', 1), (u'sofa', 1)]
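The whole pipeline (flatMap, then map, then reduceByKey) can be summarized with a minimal plain-Python sketch, assuming the same input; reduceByKey simply merges the values of identical keys with the supplied function, here addition:

```python
# Plain-Python sketch of the full wordcount pipeline (not Spark code),
# assuming the same two-line input as cats.txt.
lines = ["The cat on the mat", "The aardvark sat on the sofa"]

# flatMap + map: one (word, 1) pair per word occurrence
pairs = [(word, 1) for line in lines for word in line.split(" ")]

# reduceByKey(lambda x, y: x + y): sum the values of identical keys
counts = {}
for word, n in pairs:
    counts[word] = counts.get(word, 0) + n

print(counts["on"])   # 2
print(counts["cat"])  # 1
```

Note that in the Spark session above, take() returns elements in whatever order the partitions yield them, so ('on', 2) coming first does not reflect the order of the input.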