Notes on Running the WordCount Example on Hadoop 2.7.4
2018-01-11 11:25
1. Source code:
package com.mapred.core;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class WordCount {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        //FileSystem fs = FileSystem.get(new URI("hdfs://192.168.70.128:9000"), conf); // think of it as a client-to-server connection
        // new Job(conf) is deprecated in 2.7.4; use the factory method instead
        Job job = Job.getInstance(conf);
        // identify the jar containing the job classes
        job.setJarByClass(WordCount.class);
        // input path (first command-line argument)
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        // wire up the mapper
        job.setMapperClass(WordCountMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);
        // wire up the reducer
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        // output path (second command-line argument; must not already exist)
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // submit the job and wait for it to finish
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

class WordCountMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // split each input line on single spaces and emit (word, 1)
        String[] words = value.toString().split(" ");
        for (String word : words) {
            context.write(new Text(word), new LongWritable(1));
        }
    }
}

class WordCountReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        // sum the 1s emitted for each word
        long sum = 0;
        for (LongWritable value : values) {
            sum += value.get();
        }
        context.write(key, new LongWritable(sum));
    }
}
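The run log in step 7 below prints a WARN: "Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this." A minimal sketch of the driver rewritten that way, so that generic options like -D are parsed before run() is called (the class name ToolWordCount is just illustrative; it reuses the mapper and reducer above):

package com.mapred.core;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class ToolWordCount extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // getConf() already carries any -D options parsed by ToolRunner
        Job job = Job.getInstance(getConf());
        job.setJarByClass(ToolWordCount.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        job.setMapperClass(WordCountMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner strips generic options (-D, -files, ...) before calling run()
        System.exit(ToolRunner.run(new Configuration(), new ToolWordCount(), args));
    }
}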
2. Notes on the run steps:
Steps to run WordCount:

1. Package the project as a jar, e.g. mapredProject.jar.

2. Upload mapredProject.jar to the /soft directory.

3. Create the input data file input.txt in the /soft directory. Viewing it with more /soft/input.txt shows one word per line:

lengend
i
am
a
hero
i
am
a
fool
i
am
a
apple
but
you
are
a
bastard

(The file also contains one blank line, for 19 lines in total; this matches the Map input records=19 counter in step 7 and explains the empty key in step 9.)

4. Create the input directory in HDFS:

hadoop fs -mkdir -p /wocount/in

5. Upload /soft/input.txt into that HDFS directory:

hadoop fs -put /soft/input.txt /wocount/in

6. Run the WordCount job, passing the input and output directories as the two arguments:

hadoop jar /soft/mapredProject.jar /wocount/in /wocount/output

7. Wait for it to finish; the console shows progress. As an example, one run produced:

[hadoop@node1 soft]$ hadoop jar /soft/mapredProject.jar /wocount/in /wocount/output
18/01/10 21:22:13 INFO client.RMProxy: Connecting to ResourceManager at node1/192.168.209.129:8032
18/01/10 21:22:14 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
18/01/10 21:22:17 INFO input.FileInputFormat: Total input paths to process : 1
18/01/10 21:22:17 INFO mapreduce.JobSubmitter: number of splits:1
18/01/10 21:22:18 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1515211219380_0005
18/01/10 21:22:21 INFO impl.YarnClientImpl: Submitted application application_1515211219380_0005
18/01/10 21:22:21 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1515211219380_0005/
18/01/10 21:22:21 INFO mapreduce.Job: Running job: job_1515211219380_0005
18/01/10 21:23:18 INFO mapreduce.Job: Job job_1515211219380_0005 running in uber mode : false
18/01/10 21:23:18 INFO mapreduce.Job:  map 0% reduce 0%
18/01/10 21:23:51 INFO mapreduce.Job:  map 100% reduce 0%
18/01/10 21:24:10 INFO mapreduce.Job:  map 100% reduce 100%
18/01/10 21:24:12 INFO mapreduce.Job: Job job_1515211219380_0005 completed successfully
18/01/10 21:24:13 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=264
		FILE: Number of bytes written=242009
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=171
		HDFS: Number of bytes written=76
		HDFS: Number of read operations=6
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters
		Launched map tasks=1
		Launched reduce tasks=1
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=30472
		Total time spent by all reduces in occupied slots (ms)=15714
		Total time spent by all map tasks (ms)=30472
		Total time spent by all reduce tasks (ms)=15714
		Total vcore-milliseconds taken by all map tasks=30472
		Total vcore-milliseconds taken by all reduce tasks=15714
		Total megabyte-milliseconds taken by all map tasks=31203328
		Total megabyte-milliseconds taken by all reduce tasks=16091136
	Map-Reduce Framework
		Map input records=19
		Map output records=19
		Map output bytes=220
		Map output materialized bytes=264
		Input split bytes=103
		Combine input records=0
		Combine output records=0
		Reduce input groups=12
		Reduce shuffle bytes=264
		Reduce input records=19
		Reduce output records=12
		Spilled Records=38
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=249
		CPU time spent (ms)=2860
		Physical memory (bytes) snapshot=288591872
		Virtual memory (bytes) snapshot=4164571136
		Total committed heap usage (bytes)=141230080
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=68
	File Output Format Counters
		Bytes Written=76

8. List the output directory with hadoop fs -ls /wocount/output. A sample listing:

Found 2 items
-rw-r--r--   2 hadoop supergroup          0 2018-01-10 21:24 /wocount/output/_SUCCESS
-rw-r--r--   2 hadoop supergroup         76 2018-01-10 21:24 /wocount/output/part-r-00000

9. View the output file listed in step 8 with hadoop fs -cat /wocount/output/part-r-00000. Each line holds a word and its count, tab-separated:

	1
a	4
am	3
apple	1
are	1
bastard	1
but	1
fool	1
hero	1
i	3
lengend	1
you	1

The bare "1" on the first line belongs to an empty-string key, produced by the blank line in input.txt; see the note below.
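Why does a blank line yield a token at all? In Java, calling split(" ") on an empty string returns a one-element array containing the empty string, so the mapper still emits one (word="", count=1) pair for the blank line. A minimal standalone check (the class name SplitCheck is just illustrative):

public class SplitCheck {
    public static void main(String[] args) {
        // "".split(" ") does NOT return an empty array
        String[] parts = "".split(" ");
        System.out.println(parts.length);          // prints 1
        System.out.println("[" + parts[0] + "]");  // prints []
    }
}

Guarding the mapper's loop with something like if (!word.isEmpty()) would drop that record from the output.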
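Also note that FileOutputFormat refuses to start a job whose output directory already exists, so to rerun the example, delete the old output first:

hadoop fs -rm -r /wocount/output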
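One small optimization the counters in step 7 point at: Combine input records=0, meaning all 19 map output records were shuffled to the reducer unaggregated. Because this reduce is a pure sum (associative and commutative), the reducer class could also be registered as a combiner. A sketch of the one-line change in the driver:

// in main(), alongside setReducerClass; valid here because partial
// sums computed on the map side still combine correctly in the reducer
job.setCombinerClass(WordCountReducer.class);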