Spark--Python development examples
2016-07-26 08:26
-Spark Python development---------------
cond.py
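def isFirstMinute(line):
    # Field 0 holds an HH:MM:SS timestamp; string comparison orders it correctly only if it is zero-padded
    return line.split('\t')[0] < '00:01:00'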
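cond.py relies on lexicographic string comparison, which matches chronological order only when every timestamp is zero-padded to the same width. The sketch below assumes field 0 of each record is an H:M:S time (the sample records are made up) and contrasts the original predicate with a hypothetical `isFirstMinuteParsed` helper that converts the time to seconds first.

```python
def isFirstMinute(line):
    # Original predicate from cond.py: plain string comparison on field 0
    return line.split('\t')[0] < '00:01:00'


def isFirstMinuteParsed(line):
    # Hypothetical stricter variant: parse H:M:S into seconds before comparing.
    # Raises ValueError if field 0 is not three colon-separated integers.
    hours, minutes, seconds = (int(part) for part in line.split('\t')[0].split(':'))
    return hours * 3600 + minutes * 60 + seconds < 60


padded = '00:00:30\tGET\t/index.html'    # zero-padded hour: both predicates agree
unpadded = '0:00:30\tGET\t/index.html'   # unpadded hour: string comparison misfires
print(isFirstMinute(padded), isFirstMinuteParsed(padded))      # True True
print(isFirstMinute(unpadded), isFirstMinuteParsed(unpadded))  # False True
```

The parsed version does more work per record and fails loudly on malformed lines, which may or may not be preferable to silently returning False.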
---------------
sort.py
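from pyspark import SparkContext
sc = SparkContext("spark://server1:8888", "Python Sort", pyFiles=['cond.py'])
data = sc.textFile("hdfs://server1:9000/user/cc/reduced/")
# Keep records with exactly 5 tab-separated fields and count how often each value of field 1 occurs
counts = data.filter(lambda line: len(line.split('\t')) == 5).map(lambda line: (line.split('\t')[1], 1)).reduceByKey(lambda x, y: x + y)
# Swap to (count, value) so sortByKey(False) orders by count descending, then swap back to (value, count)
ranked = counts.map(lambda pair: (pair[1], pair[0])).sortByKey(False).map(lambda pair: (pair[1], pair[0]))
print(ranked.take(10))  # or write the full result: ranked.saveAsTextFile("hdfs://server1:9000/result")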
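To check what sort.py computes without a cluster, the same filter / count / sort-by-count / take steps can be mirrored in plain Python with `collections.Counter`. This is a local sketch, not Spark code: `top_values` and the sample records are hypothetical, and records are assumed to be tab-separated with the value of interest in field 1.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_values(lines: Iterable[str], n: int = 10) -> List[Tuple[str, int]]:
    """Local, single-machine analogue of the sort.py pipeline."""
    # filter(): keep only records with exactly 5 tab-separated fields
    rows = (line.split('\t') for line in lines)
    well_formed = (fields for fields in rows if len(fields) == 5)
    # map((field1, 1)) + reduceByKey(+): count occurrences of field 1
    counts = Counter(fields[1] for fields in well_formed)
    # swap / sortByKey(False) / swap back / take(n): the n most frequent values
    return counts.most_common(n)


sample = [
    '00:00:01\t10.0.0.1\tGET\t/a\t200',
    '00:00:02\t10.0.0.2\tGET\t/b\t200',
    '00:00:03\t10.0.0.1\tGET\t/c\t404',
    'malformed line',
]
print(top_values(sample, 2))  # [('10.0.0.1', 2), ('10.0.0.2', 1)]
```

Ties may come out in a different order than in Spark, where `sortByKey` does not guarantee any particular order among equal counts. Within PySpark itself, `rdd.takeOrdered(10, key=lambda pair: -pair[1])` or `rdd.sortBy(lambda pair: pair[1], ascending=False)` would likely express the same ranking without the two swapping `map` calls.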
------------------
wc.py
from pyspark import SparkContext
from cond import isFirstMinute
# pyFiles ships cond.py to the executors so that isFirstMinute can be called inside filter()
sc = SparkContext("spark://server1:8888", "Python Analysis", pyFiles=['cond.py'])
data = sc.textFile("hdfs://server1:9000/user/cc/reduced/")
# Inline alternative: data.filter(lambda line: line.split('\t')[0] < '00:01:00')
fltData = data.filter(isFirstMinute)
print('first minute : ' + str(fltData.count()))
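Because the filter condition in wc.py lives in a separate module function, it can be exercised locally before the job is submitted. Below is a minimal plain-Python sketch of filtering with that predicate and counting the matches; `count_matching` and the sample records are hypothetical, and timestamps are assumed to be zero-padded as in cond.py.

```python
from typing import Callable, Iterable


def isFirstMinute(line: str) -> bool:
    # Same predicate as cond.py: field 0 is a zero-padded HH:MM:SS timestamp
    return line.split('\t')[0] < '00:01:00'


def count_matching(lines: Iterable[str], predicate: Callable[[str], bool]) -> int:
    """Hypothetical local stand-in for rdd.filter(predicate).count()."""
    return sum(1 for line in lines if predicate(line))


sample = ['00:00:10\tGET', '00:00:59\tGET', '00:01:00\tGET', '00:02:30\tGET']
print('first minute : ' + str(count_matching(sample, isFirstMinute)))  # first minute : 2
```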
-----------------------------------------