Running Hadoop from Python
2015-09-06 22:14
Date helper function
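from datetime import date, timedelta

def last_n_days(current_date=None, n=0):
    # Resolve the default at call time; a date.today() default is fixed at import.
    if current_date is None:
        current_date = date.today()
    if n in (0, 1):
        # n of 0 or 1 returns a single date string (today or yesterday).
        return str(current_date - timedelta(days=n))
    # Otherwise return the n days before current_date as strings, oldest first.
    return [str(current_date - timedelta(days=x)) for x in range(n, 0, -1)]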
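Note that the helper returns a plain string when n is 0 or 1 and a list otherwise, so code that loops over the result with n=1 would iterate over characters rather than dates. The sketch below shows one possible variant that always returns a list. The name last_n_days_list and the fixed reference date are assumptions made for illustration; they are not part of the original script.

```python
from datetime import date, timedelta


def last_n_days_list(current_date=None, n=1):
    """Hypothetical variant: always return a list of date strings, oldest first."""
    if current_date is None:
        current_date = date.today()
    # range(n, 0, -1) counts n, n-1, ..., 1, so the oldest day comes first.
    return [str(current_date - timedelta(days=x)) for x in range(n, 0, -1)]


if __name__ == "__main__":
    ref = date(2015, 9, 6)  # fixed reference date, for a reproducible example
    print(last_n_days_list(ref, n=1))  # ['2015-09-05']
    print(last_n_days_list(ref, n=3))  # ['2015-09-03', '2015-09-04', '2015-09-05']
```

With this variant n=0 gives an empty list instead of today's date, which is a behavior change to weigh before swapping it in.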
Generating the shell command
# -*- coding: utf-8 -*-
import subprocess

# Dates of the previous 7 days, oldest first (last_n_days is defined above).
file_list = last_n_days(n=7)
mapper = "mapper.py"
reducer = "reducer.py"
# One -input argument per day.
input_files = " ".join(
    "-input /dm/qq/userinfo_qq/{date}-*/qq_guid.txt".format(date=each_date)
    for each_date in file_list)
output = "/dm/qq/merge"

mr_cmd = """hadoop jar /opt/cloudera/parcels/CDH-4.2.0-1.cdh4.2.0.p0.10/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.0.0-mr1-cdh4.2.0.jar \
    -output {output} \
    -mapper 'python {mapper}' \
    -reducer 'python {reducer}' \
    -file {mapper} \
    -file {reducer} \
    {input_files}""".format(output=output, mapper=mapper, reducer=reducer, input_files=input_files)

if __name__ == "__main__":
    print(mr_cmd)
    # mr_cmd is one shell command string, so it is passed with shell=True.
    subprocess.call(mr_cmd, shell=True)
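As an alternative to a single command string, the same streaming job could be assembled as an argument list, which avoids shell quoting and keeps the local shell from touching the glob in the input paths. This is only a sketch: build_streaming_argv is a hypothetical helper, and the jar name in the example is a placeholder for the real streaming jar path on the cluster.

```python
def build_streaming_argv(jar, inputs, output, mapper, reducer):
    """Hypothetical helper: return the hadoop streaming command as a list."""
    argv = [
        "hadoop", "jar", jar,
        "-output", output,
        "-mapper", "python " + mapper,    # one argument, no shell quotes needed
        "-reducer", "python " + reducer,
        "-file", mapper,
        "-file", reducer,
    ]
    # Add one -input pair per path.
    for path in inputs:
        argv += ["-input", path]
    return argv


if __name__ == "__main__":
    argv = build_streaming_argv(
        jar="hadoop-streaming.jar",  # placeholder, not a real path
        inputs=["/dm/qq/userinfo_qq/2015-09-05-*/qq_guid.txt"],
        output="/dm/qq/merge",
        mapper="mapper.py",
        reducer="reducer.py",
    )
    print(argv)
    # On a machine with hadoop installed, the job could then be started with:
    # import subprocess; subprocess.call(argv)
```

Passing a list to subprocess.call runs the program directly without a shell, so each element reaches hadoop exactly as written.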