Hadoop 2.5.1 Study Notes 3: On the Combiner
2014-11-07 00:00
Take the previous example and add a Combiner class:
public static class Combiner extends Reducer<Text, Text, Text, Text> {
    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // Sum the numeric string values emitted by the map tasks for this key.
        long count = 0;
        for (Text val : values) {
            count += Long.parseLong(val.toString());
        }
        context.write(key, new Text(String.valueOf(count)));
    }
}
Then register it on the job: job.setCombinerClass(Combiner.class);
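For context, a driver that wires this Combiner in might look like the sketch below. This is only an assumed configuration: the class names Driver, WordCountMapper, and Reduce, and the input/output paths, are hypothetical, and the Mapper is assumed to emit (Text, Text) pairs whose values are numeric strings, matching the Combiner above.

```java
// Hypothetical driver sketch; class names and paths are illustrative.
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "combiner-demo");
job.setJarByClass(Driver.class);
job.setMapperClass(WordCountMapper.class);   // assumed mapper
job.setCombinerClass(Combiner.class);        // the Combiner shown above
job.setReducerClass(Reduce.class);           // assumed reducer
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
```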
Compare the efficiency of the two runs. Without the Combiner:
14/11/07 14:49:25 INFO mapreduce.Job: Counters: 38
File System Counters
FILE: Number of bytes read=52642504
FILE: Number of bytes written=95200714
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=608036374
HDFS: Number of bytes written=423
HDFS: Number of read operations=22
HDFS: Number of large read operations=0
HDFS: Number of write operations=5
Map-Reduce Framework
Map input records=2923923
Map output records=2923923
Map output bytes=20467464
Map output materialized bytes=26315322
Input split bytes=212
Combine input records=0
Combine output records=0
Reduce input groups=38
Reduce shuffle bytes=26315322
Reduce input records=2923923
Reduce output records=38
Spilled Records=5847846
Shuffled Maps =2
Failed Shuffles=0
Merged Map outputs=2
GC time elapsed (ms)=252
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
Total committed heap usage (bytes)=1150484480
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=236907275
File Output Format Counters
Bytes Written=423
With the Combiner:
14/11/07 16:04:49 INFO mapreduce.Job: Counters: 38
File System Counters
FILE: Number of bytes read=16224
FILE: Number of bytes written=704061
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=608036374
HDFS: Number of bytes written=423
HDFS: Number of read operations=22
HDFS: Number of large read operations=0
HDFS: Number of write operations=5
Map-Reduce Framework
Map input records=2923923
Map output records=2923923
Map output bytes=20467464
Map output materialized bytes=523
Input split bytes=212
Combine input records=2923923
Combine output records=39
Reduce input groups=38
Reduce shuffle bytes=523
Reduce input records=39
Reduce output records=38
Spilled Records=78
Shuffled Maps =2
Failed Shuffles=0
Merged Map outputs=2
GC time elapsed (ms)=281
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
Total committed heap usage (bytes)=1154875392
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=236907275
File Output Format Counters
Bytes Written=423
The first run (no Combiner) took 28 seconds; the second (with the Combiner) took 21 seconds.
The counters explain the gap: the Combiner pre-aggregates each map task's output, collapsing Combine input records=2923923 to Combine output records=39, so Reduce shuffle bytes drops from 26315322 to 523 and Spilled Records falls from 5847846 to 78. Far less data is spilled to local disk and moved across the shuffle.
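The reason the same summing logic can double as a Combiner is that addition is associative and commutative: summing per-map partial sums yields the same totals as summing every raw record at the reducer. The plain-Java sketch below (no Hadoop dependencies; all names are illustrative) demonstrates this equivalence.

```java
import java.util.*;

public class CombinerDemo {
    // Sum values per key, the same logic the reducer/combiner applies.
    static Map<String, Long> sumByKey(List<Map.Entry<String, Long>> records) {
        Map<String, Long> out = new HashMap<>();
        for (Map.Entry<String, Long> r : records) {
            out.merge(r.getKey(), r.getValue(), Long::sum);
        }
        return out;
    }

    public static void main(String[] args) {
        // Two hypothetical map tasks, each emitting (word, 1) pairs.
        List<Map.Entry<String, Long>> map1 = List.of(
            Map.entry("a", 1L), Map.entry("b", 1L), Map.entry("a", 1L));
        List<Map.Entry<String, Long>> map2 = List.of(
            Map.entry("a", 1L), Map.entry("b", 1L));

        // Without a combiner: the reducer sees all five raw records.
        List<Map.Entry<String, Long>> all = new ArrayList<>();
        all.addAll(map1);
        all.addAll(map2);
        Map<String, Long> direct = sumByKey(all);

        // With a combiner: each map side pre-aggregates, and the reducer
        // only sees the partial sums (at most one record per key per map).
        List<Map.Entry<String, Long>> partials = new ArrayList<>();
        sumByKey(map1).forEach((k, v) -> partials.add(Map.entry(k, v)));
        sumByKey(map2).forEach((k, v) -> partials.add(Map.entry(k, v)));
        Map<String, Long> combined = sumByKey(partials);

        // Addition is associative and commutative, so both paths agree.
        System.out.println(direct.equals(combined)); // true
        System.out.println(direct.get("a"));         // 3
    }
}
```

Logic like averaging would not survive this two-stage aggregation unchanged, which is why a reducer is only safe to reuse as a combiner when its operation has these algebraic properties.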