
Hadoop 2.5.1 Learning Notes 3: The Combiner

2014-11-07
If we add a Combiner class to the earlier example:

// A Combiner runs locally on each map task's output; its input and output
// types must both match the map output (and reduce input) key/value types.
public static class Combiner extends Reducer<Text, Text, Text, Text> {
    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        long count = 0;
        // Sum the partial counts the mapper emitted for this key.
        for (Text val : values) {
            count += Long.parseLong(val.toString());
        }
        context.write(key, new Text(String.valueOf(count)));
    }
}

Then register it on the job with job.setCombinerClass(Combiner.class);
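For context, the combiner is wired into the driver next to the mapper and reducer. A sketch of such a driver follows; the class names Map, Reduce, and WordCount and the use of command-line paths are assumptions carried over from a typical word-count setup, not confirmed by this post:

```java
// Hypothetical driver sketch, assuming a Mapper named Map, a Reducer named
// Reduce, and the Combiner class defined above.
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "word count with combiner");
job.setJarByClass(WordCount.class);
job.setMapperClass(Map.class);
job.setCombinerClass(Combiner.class); // local pre-aggregation on each map task
job.setReducerClass(Reduce.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
```

When the reduce function is both associative and commutative (as a sum is), many jobs simply pass the Reducer class itself to setCombinerClass.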

Compare the efficiency of the two runs. Without the Combiner:
14/11/07 14:49:25 INFO mapreduce.Job: Counters: 38
	File System Counters
		FILE: Number of bytes read=52642504
		FILE: Number of bytes written=95200714
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=608036374
		HDFS: Number of bytes written=423
		HDFS: Number of read operations=22
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=5
	Map-Reduce Framework
		Map input records=2923923
		Map output records=2923923
		Map output bytes=20467464
		Map output materialized bytes=26315322
		Input split bytes=212
		Combine input records=0
		Combine output records=0
		Reduce input groups=38
		Reduce shuffle bytes=26315322
		Reduce input records=2923923
		Reduce output records=38
		Spilled Records=5847846
		Shuffled Maps =2
		Failed Shuffles=0
		Merged Map outputs=2
		GC time elapsed (ms)=252
		CPU time spent (ms)=0
		Physical memory (bytes) snapshot=0
		Virtual memory (bytes) snapshot=0
		Total committed heap usage (bytes)=1150484480
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=236907275
	File Output Format Counters
		Bytes Written=423

With the Combiner enabled:
14/11/07 16:04:49 INFO mapreduce.Job: Counters: 38
	File System Counters
		FILE: Number of bytes read=16224
		FILE: Number of bytes written=704061
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=608036374
		HDFS: Number of bytes written=423
		HDFS: Number of read operations=22
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=5
	Map-Reduce Framework
		Map input records=2923923
		Map output records=2923923
		Map output bytes=20467464
		Map output materialized bytes=523
		Input split bytes=212
		Combine input records=2923923
		Combine output records=39
		Reduce input groups=38
		Reduce shuffle bytes=523
		Reduce input records=39
		Reduce output records=38
		Spilled Records=78
		Shuffled Maps =2
		Failed Shuffles=0
		Merged Map outputs=2
		GC time elapsed (ms)=281
		CPU time spent (ms)=0
		Physical memory (bytes) snapshot=0
		Virtual memory (bytes) snapshot=0
		Total committed heap usage (bytes)=1154875392
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=236907275
	File Output Format Counters
		Bytes Written=423

The first run took 28 seconds; the second took 21 seconds. The counters show why: with the Combiner, Reduce shuffle bytes dropped from 26,315,322 to 523 and Reduce input records from 2,923,923 to 39, because each map task summed its output locally before the shuffle. Spilled Records fell from 5,847,846 to 78 for the same reason.
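The gap in the counters comes entirely from local pre-aggregation: each map task collapses its own output before anything crosses the network. A minimal Hadoop-free sketch of that effect, using hypothetical data and a plain Java map in place of the framework:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CombineEffect {
    // Sum counts per key, as a combiner would do for one map task's output.
    static Map<String, Long> combine(List<Map.Entry<String, Long>> records) {
        Map<String, Long> out = new HashMap<>();
        for (Map.Entry<String, Long> r : records) {
            out.merge(r.getKey(), r.getValue(), Long::sum);
        }
        return out;
    }

    public static void main(String[] args) {
        // One map task emits many (word, 1) pairs over a few distinct keys...
        List<Map.Entry<String, Long>> mapOutput = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            mapOutput.add(Map.entry("word" + (i % 3), 1L));
        }
        Map<String, Long> combined = combine(mapOutput);
        // ...but only one record per distinct key survives the combine step.
        System.out.println(mapOutput.size() + " records -> " + combined.size());
    }
}
```

The same logic explains the measured drop from 2,923,923 shuffled records to 39: record count after combining is bounded by the number of distinct keys per map task, not by input size.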
Tags: Hadoop