Oozie fork: running multiple MapReduce actions in parallel, an example
2014-10-22 15:46
This workflow demonstrates Oozie's fork/join control nodes: firstjob runs first; then secondjob and thirdjob run in parallel on its output; finally finalejob (a WordCount) merges the two intermediate results.
<workflow-app name="test7" xmlns="uri:oozie:workflow:0.4">
    <start to="firstjob"/>
    <action name="firstjob">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <job-xml>/shareScripts/xxmapred-site.xml</job-xml>
            <configuration>
                <property>
                    <name>mapreduce.job.map.class</name>
                    <value>com.besttone.hbase.demo.Identity$IdentityMapper</value>
                </property>
                <property>
                    <name>mapreduce.job.reduce.class</name>
                    <value>com.besttone.hbase.demo.Identity$IdentityReducer</value>
                </property>
                <property>
                    <name>mapreduce.input.fileinputformat.inputdir</name>
                    <value>${inputDir}</value>
                </property>
                <property>
                    <name>mapreduce.output.fileoutputformat.outputdir</name>
                    <value>/user/${wf:user()}/${wf:id()}/temp1</value>
                </property>
                <property>
                    <name>mapred.mapper.new-api</name>
                    <value>true</value>
                </property>
                <property>
                    <name>mapred.reducer.new-api</name>
                    <value>true</value>
                </property>
                <property>
                    <name>mapreduce.job.reduces</name>
                    <value>1</value>
                </property>
            </configuration>
        </map-reduce>
        <ok to="fork"/>
        <error to="kill"/>
    </action>
    <fork name="fork">
        <path start="secondjob"/>
        <path start="thirdjob"/>
    </fork>
    <action name="secondjob">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <job-xml>/shareScripts/xxmapred-site.xml</job-xml>
            <configuration>
                <property>
                    <name>mapreduce.job.map.class</name>
                    <value>com.besttone.hbase.demo.Identity$IdentityMapper</value>
                </property>
                <property>
                    <name>mapreduce.job.reduce.class</name>
                    <value>com.besttone.hbase.demo.Identity$IdentityReducer</value>
                </property>
                <property>
                    <name>mapreduce.input.fileinputformat.inputdir</name>
                    <value>/user/${wf:user()}/${wf:id()}/temp1</value>
                </property>
                <property>
                    <name>mapreduce.output.fileoutputformat.outputdir</name>
                    <value>/user/${wf:user()}/${wf:id()}/temp2</value>
                </property>
                <property>
                    <name>mapred.mapper.new-api</name>
                    <value>true</value>
                </property>
                <property>
                    <name>mapred.reducer.new-api</name>
                    <value>true</value>
                </property>
                <property>
                    <name>mapreduce.job.reduces</name>
                    <value>1</value>
                </property>
            </configuration>
        </map-reduce>
        <ok to="join"/>
        <error to="kill"/>
    </action>
    <action name="thirdjob">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <job-xml>/shareScripts/xxmapred-site.xml</job-xml>
            <configuration>
                <property>
                    <name>mapreduce.job.map.class</name>
                    <value>com.besttone.hbase.demo.Identity$IdentityMapper</value>
                </property>
                <property>
                    <name>mapreduce.job.reduce.class</name>
                    <value>com.besttone.hbase.demo.Identity$IdentityReducer</value>
                </property>
                <property>
                    <name>mapreduce.input.fileinputformat.inputdir</name>
                    <value>/user/${wf:user()}/${wf:id()}/temp1</value>
                </property>
                <property>
                    <name>mapreduce.output.fileoutputformat.outputdir</name>
                    <value>/user/${wf:user()}/${wf:id()}/temp3</value>
                </property>
                <property>
                    <name>mapred.mapper.new-api</name>
                    <value>true</value>
                </property>
                <property>
                    <name>mapred.reducer.new-api</name>
                    <value>true</value>
                </property>
                <property>
                    <name>mapreduce.job.reduces</name>
                    <value>1</value>
                </property>
            </configuration>
        </map-reduce>
        <ok to="join"/>
        <error to="kill"/>
    </action>
    <join name="join" to="finalejob"/>
    <action name="finalejob">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${nameNode}${outputDir}"/>
            </prepare>
            <job-xml>/shareScripts/xxmapred-site.xml</job-xml>
            <configuration>
                <property>
                    <name>mapreduce.job.map.class</name>
                    <value>com.besttone.hbase.demo.WordCount$TokenizerMapper</value>
                </property>
                <property>
                    <name>mapreduce.job.reduce.class</name>
                    <value>com.besttone.hbase.demo.WordCount$IntSumReducer</value>
                </property>
                <property>
                    <name>mapreduce.job.combine.class</name>
                    <value>com.besttone.hbase.demo.WordCount$IntSumReducer</value>
                </property>
                <property>
                    <name>mapreduce.job.output.key.class</name>
                    <value>org.apache.hadoop.io.Text</value>
                </property>
                <property>
                    <name>mapreduce.job.output.value.class</name>
                    <value>org.apache.hadoop.io.IntWritable</value>
                </property>
                <property>
                    <name>mapreduce.input.fileinputformat.inputdir</name>
                    <value>/user/${wf:user()}/${wf:id()}/temp2,/user/${wf:user()}/${wf:id()}/temp3</value>
                </property>
                <property>
                    <name>mapreduce.output.fileoutputformat.outputdir</name>
                    <value>${outputDir}</value>
                </property>
                <property>
                    <name>mapred.mapper.new-api</name>
                    <value>true</value>
                </property>
                <property>
                    <name>mapred.reducer.new-api</name>
                    <value>true</value>
                </property>
                <property>
                    <name>mapreduce.job.reduces</name>
                    <value>1</value>
                </property>
            </configuration>
        </map-reduce>
        <ok to="end"/>
        <error to="kill"/>
    </action>
    <kill name="kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
wordcount.jar, deployed in the workflow application's lib/ directory, contains the mapper and reducer classes referenced in the configuration above (Identity$IdentityMapper, Identity$IdentityReducer, WordCount$TokenizerMapper, and WordCount$IntSumReducer).
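To submit the workflow, a job.properties file along these lines can be used. This is a minimal sketch: the host names, ports, and HDFS paths below are illustrative assumptions, not values from the original post; only the parameter names (nameNode, jobTracker, inputDir, outputDir) are the ones the workflow above actually references.

```properties
# Illustrative job.properties (host names and paths are assumptions)
nameNode=hdfs://namenode-host:8020
jobTracker=jobtracker-host:8032
inputDir=/user/${user.name}/input
outputDir=/user/${user.name}/output
# HDFS directory containing workflow.xml and lib/wordcount.jar
oozie.wf.application.path=${nameNode}/user/${user.name}/apps/test7
```

With the application directory (workflow.xml plus lib/wordcount.jar) uploaded to HDFS, the job would be started with something like `oozie job -oozie http://oozie-host:11000/oozie -config job.properties -run`.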