您的位置:首页 > 编程语言 > Python开发

第一个python实现的mapreduce程序

2017-05-13 21:42 375 查看
map:

# !/usr/bin/env python

import sys

for line in sys.stdin:
line = line.strip()
words = line.split()

for word in words:
print ("%s\t%s") % (word, 1)


reduce:

#!/usr/bin/env python
import operator
import sys
current_word = None
curent_count = 0
word = None
for line in sys.stdin:
line = line.strip()
word, count = line.split('\t', 1)
try:
count = int(count)
except ValueError:
continue
if current_word == word:
curent_count += count
else:
if current_word:
print '%s\t%s' % (current_word,curent_count)
current_word=word
curent_count=count

if current_word==word:
print '%s\t%s' % (current_word,curent_count)


测试:

[root@node1 input]# echo "foo foo quux labs foo bar zoo zoo hying" | /home/hadoop/input/max_map.py | sort | /home/hadoop/input/max_reduce.py




执行:可将其写入脚本文件

//注意\-file之间一定不能空格
hadoop jar /hadoop64/hadoop-2.7.1/share/hadoop/tools/lib/hadoop-*streaming*.jar -D stream.non.zero.exit.is.failure=false \-file /home/hadoop/input/max_map.py -mapper /home/hadoop/input/max_map.py \-file /home/hadoop/input/max_reduce.py  -reducer /home/hadoop/input/max_reduce.py \-input /input/temperature/ -output /output/temperature


内容来自用户分享和网络整理,不保证内容的准确性,如有侵权内容,可联系管理员处理 点击这里给我发消息
标签:  mapreduce python