Writing a WordCount Program in the "Eight-Legged Essay" (Fixed-Template) Style
2016-11-01 21:59
Project Configuration
Copy the configuration files into the project, under /opt/tools/workspace/bigdata-hdfs/src/main/resources:

cp /opt/modules/hadoop-2.5.0/etc/hadoop/core-site.xml /opt/tools/workspace/bigdata-hdfs/src/main/resources
cp /opt/modules/hadoop-2.5.0/etc/hadoop/hdfs-site.xml /opt/tools/workspace/bigdata-hdfs/src/main/resources
cp /opt/modules/hadoop-2.5.0/etc/hadoop/log4j.properties /opt/tools/workspace/bigdata-hdfs/src/main/resources

With these files on the classpath, new Configuration() in the driver automatically picks up the cluster's HDFS address and site settings.
Create an input directory (the relative path resolves to /user/beifeng/input for user beifeng):
bin/hdfs dfs -mkdir input
Upload the test data to the HDFS input directory:
bin/hdfs dfs -put /opt/datas/wc.input /user/beifeng/input
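A quick check that the upload landed where expected (not in the original post):

bin/hdfs dfs -ls /user/beifeng/input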
Write the code
package com.ibeifeng.bigdata.senior.hadoop.mapreduce;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountMapReduce {

    // step 1 : Mapper class
    public static class WordCountMapper extends
            Mapper<LongWritable, Text, Text, IntWritable> {

        // the word being emitted
        private Text mapOutputKey = new Text();
        // each occurrence counts as one
        private IntWritable mapOutputValue = new IntWritable(1);

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {

            // debug output: byte offset of the line and its contents
            System.out.println("map-in-0-key: " + key.get() + " -- "
                    + "map-in-value: " + value.toString());

            // line value: the <key, value> pair for one line of the file
            String lineValue = value.toString();

            // split the line into words on spaces
            String[] strs = lineValue.split(" ");

            // iterate: emit each word in the array as a <word, 1> pair
            for (String str : strs) {
                // set the map output key
                mapOutputKey.set(str);
                // write the output pair
                context.write(mapOutputKey, mapOutputValue);
            }
        }
    }

    // step 2 : Reducer class
    public static class WordCountReducer extends
            Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable outputValue = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values,
                Context context) throws IOException, InterruptedException {

            // temporary variable for the running total
            int sum = 0;

            // accumulate the values from the iterator; when the loop
            // finishes, sum holds the word's total count
            for (IntWritable value : values) {
                sum += value.get();
            }

            // set and write the output value
            outputValue.set(sum);
            context.write(key, outputValue);
        }
    }

    // step 3 : Driver
    public int run(String[] args) throws Exception {
        Configuration configuration = new Configuration();
        Job job = Job.getInstance(configuration, this.getClass()
                .getSimpleName());
        job.setJarByClass(WordCountMapReduce.class);

        // set job input
        Path inpath = new Path(args[0]);
        FileInputFormat.addInputPath(job, inpath);

        // output
        Path outpath = new Path(args[1]);
        FileOutputFormat.setOutputPath(job, outpath);

        // Mapper
        job.setMapperClass(WordCountMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        // Reducer
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // submit the job and wait for completion
        boolean isSuccess = job.waitForCompletion(true);
        return isSuccess ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // fall back to hard-coded HDFS paths when no arguments are passed,
        // so paths given on the command line are not ignored
        if (args.length < 2) {
            args = new String[] {
                    // argument 1: input path
                    "hdfs://hadoop-senior01.ibeifeng.com:8020/user/beifeng/input",
                    // argument 2: output path
                    "hdfs://hadoop-senior01.ibeifeng.com:8020/user/beifeng/output3" };
        }

        // run the job
        int status = new WordCountMapReduce().run(args);
        System.exit(status);
    }
}
Build the jar
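The original post does not show the packaging command. Assuming the project is a standard Maven project (the src/main/resources layout suggests it), one way to produce the jar used below is roughly this; the target jar name and the jars/ directory are assumptions inferred from the run command:

cd /opt/tools/workspace/bigdata-hdfs
mvn clean package -DskipTests
cp target/bigdata-hdfs-*.jar /opt/modules/hadoop-2.5.0/jars/mr-wc.jar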
Run on YARN
Command:
bin/yarn jar jars/mr-wc.jar /user/beifeng/input /user/beifeng/output3
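Once the job completes, the result can be inspected with a quick check like the following (part-r-00000 is the default reducer output file name, matched here with a wildcard):

bin/hdfs dfs -cat /user/beifeng/output3/part*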
Using the WordCount program as an example, here is how MapReduce analyzes data in parallel.
Three map tasks read the three lines of the input file in parallel; each map call splits its line into words and emits every word as a
<key, value> pair.
input:
hadoop mapreduce
hadoop yarn
hadoop hdfs
Map output:
<hadoop,1> <mapreduce,1> <hadoop,1> <yarn,1> <hadoop,1> <hdfs,1>
The reduce side sorts and merges the map results and finally produces the word counts.
Sort:
<hadoop,1> <hadoop,1> <hadoop,1> <hdfs,1> <mapreduce,1> <yarn,1>
Group (values for the same key are merged into one list before reduce is called):
<hadoop, list(1,1,1)> <mapreduce, list(1)> <yarn, list(1)> <hdfs, list(1)>
Reduce output:
<hadoop,3> <hdfs,1> <mapreduce,1> <yarn,1>
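The grouping above is done by the framework before reduce is called. A true Combiner, by contrast, is an optional map-side pre-aggregation that runs before the shuffle. The driver shown earlier does not set one, but because summing is associative and the reducer's input and output types match, WordCountReducer could be reused as a combiner. A one-line sketch, not part of the original code:

// optional map-side pre-aggregation: turns e.g. three <hadoop,1> pairs
// emitted by one mapper into a single <hadoop,3> before the shuffle
job.setCombinerClass(WordCountReducer.class);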
Two articles on MapReduce source-code analysis:
http://blog.csdn.net/recommender_system/article/details/42029311
http://www.tuicool.com/articles/v6VNza