Hadoop TeraSort benchmark
2013-02-25 18:36
Hardware configuration:
node configuration: 2*4-core 16GB-ram 4*1T-storage
node number: 11
Software configuration (everything else at defaults):
replication: 1
---------------------------------
Parameters tuned during the tests:
mapred.tasktracker.map.tasks.maximum=4 (8 cores per node; one is left
for the DataNode and TaskTracker)
mapred.tasktracker.reduce.tasks.maximum=3
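In 0.20-era Hadoop these per-TaskTracker slot limits live in mapred-site.xml; a minimal sketch with the values used in this test:

```xml
<!-- mapred-site.xml: per-node task slots as configured for this benchmark -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>3</value>
</property>
```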
---------------------------------
Parameters varied to test performance:
HDFS block size: 64 MB -> 128 MB
And:
<property>
  <name>mapred.map.tasks</name>
  <value>2</value>
  <description>The default number of map tasks per job.
  Ignored when mapred.job.tracker is "local".
  </description>
</property>
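The block-size change mentioned above is an hdfs-site.xml setting; a sketch using the 0.20-era key (dfs.block.size, in bytes — 67108864 is the 64 MB default):

```xml
<!-- hdfs-site.xml: raise the HDFS block size from 64 MB to 128 MB -->
<property>
  <name>dfs.block.size</name>
  <value>134217728</value>
  <description>Block size for newly written files (128 MB).</description>
</property>
```

Existing files keep the block size they were written with, so the input must be regenerated after the change.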
[bin/hadoop fs -rmr terasort/input-GB001]
bin/hadoop jar hadoop-0.20.2-examples.jar teragen 10000000 terasort/input-GB001
Generating 10000000 using 2 maps with step of 5000000
10/07/27 12:27:39 INFO mapred.JobClient: Running job: job_201007271223_0003
10/07/27 12:27:40 INFO mapred.JobClient:  map 0% reduce 0%
10/07/27 12:27:54 INFO mapred.JobClient:  map 53% reduce 0%
10/07/27 12:28:00 INFO mapred.JobClient:  map 100% reduce 0%
10/07/27 12:28:02 INFO mapred.JobClient: Job complete: job_201007271223_0003
10/07/27 12:28:02 INFO mapred.JobClient: Counters: 6
10/07/27 12:28:02 INFO mapred.JobClient:   Job Counters
10/07/27 12:28:02 INFO mapred.JobClient:     Launched map tasks=2
10/07/27 12:28:02 INFO mapred.JobClient:   FileSystemCounters
10/07/27 12:28:02 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=1000000000
10/07/27 12:28:02 INFO mapred.JobClient:   Map-Reduce Framework
10/07/27 12:28:02 INFO mapred.JobClient:     Map input records=10000000
10/07/27 12:28:02 INFO mapred.JobClient:     Spilled Records=0
10/07/27 12:28:02 INFO mapred.JobClient:     Map input bytes=10000000
10/07/27 12:28:02 INFO mapred.JobClient:     Map output records=10000000
teragen tests:
hadoop jar hadoop/hadoop-*-examples.jar teragen 10 terasort/input-KB001
15s
hadoop jar hadoop/hadoop-*-examples.jar teragen 10000 terasort/input-MB001
13s
hadoop jar hadoop/hadoop-*-examples.jar teragen 10000000 terasort/input-GB001
22s
hadoop jar hadoop/hadoop-*-examples.jar teragen 20000000 terasort/input-GB002
34s
hadoop jar hadoop/hadoop-*-examples.jar teragen 30000000 terasort/input-GB003
46s
hadoop jar hadoop/hadoop-*-examples.jar teragen 40000000 terasort/input-GB004
55s
hadoop jar hadoop/hadoop-*-examples.jar teragen 50000000 terasort/input-GB005
70s
hadoop jar hadoop/hadoop-*-examples.jar teragen 100000000 terasort/input-GB010
122s   (mapred.map.tasks=2)
066s   (mapred.map.tasks=4)
048s   (mapred.map.tasks=6)
045s   (mapred.map.tasks=8)
041s   (mapred.map.tasks=9)
038s   (mapred.map.tasks=10)
034s   (mapred.map.tasks=11)  <- node number
034s   (mapred.map.tasks=12)
034s   (mapred.map.tasks=13)
030s   (mapred.map.tasks=14)
030s   (mapred.map.tasks=15)
030s   (mapred.map.tasks=16)
028s   (mapred.map.tasks=20)
028s   (mapred.map.tasks=22)  <- 2 CPUs x 11 nodes = 22 CPUs
028s   (mapred.map.tasks=23)
028s   (mapred.map.tasks=24)
028s   (mapred.map.tasks=25)
028±1s (mapred.map.tasks=26)
028±1s (mapred.map.tasks=27)
028±1s (mapred.map.tasks=28)
028s   (mapred.map.tasks=30)
029s   (mapred.map.tasks=35)
030±1s (mapred.map.tasks=44)  <- available map slots = 4 maps x 11 nodes
043s   (mapred.map.tasks=100)
067s   (mapred.map.tasks=200)
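One way to read the sweep above is as generation throughput: the 100,000,000-row input is 10 GB (at 100 bytes/row), so the plateau around 28 s corresponds to roughly 350 MB/s across the cluster, and both too few maps (2) and far too many (200) cost throughput. A small sketch over a few of the measured points:

```python
# Throughput for the 10 GB teragen run at selected mapred.map.tasks settings,
# using the timings measured above.
DATA_MB = 100_000_000 * 100 / 1e6   # 10,000 MB

timings = {2: 122, 11: 34, 22: 28, 44: 30, 100: 43, 200: 67}  # map.tasks -> seconds

for maps, secs in sorted(timings.items()):
    print(f"mapred.map.tasks={maps:3d}: {DATA_MB / secs:6.1f} MB/s")
```

The best settings cluster between the CPU count (22) and the slot count (44); past that, per-task startup overhead dominates.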
------------------------------------------------------------------------------------
bin/hadoop fs -cat terasort/input-GB001/part-00000
.t^#\|v$2\
0AAAAAAAAAABBBBBBBBBBCCCCCCCCCCDDDDDDDDDDEEEEEEEEEEFFFFFFFFFFGGGGGGGGGGHHHHHHHH
75@~?'WdUF
1IIIIIIIIIIJJJJJJJJJJKKKKKKKKKKLLLLLLLLLLMMMMMMMMMMNNNNNNNNNNOOOOOOOOOOPPPPPPPP
w[o||:N&H,
2QQQQQQQQQQRRRRRRRRRRSSSSSSSSSSTTTTTTTTTTUUUUUUUUUUVVVVVVVVVVWWWWWWWWWWXXXXXXXX
------------------------------------------------------------------------------------
bin/hadoop jar hadoop-0.20.2-examples.jar terasort terasort/input-GB001 terasort/output-GB001
10/07/27 00:11:05 INFO terasort.TeraSort: starting
10/07/27 00:11:05 INFO mapred.FileInputFormat: Total input paths to process : 2
10/07/27 00:11:06 INFO util.NativeCodeLoader: Loaded the native-hadoop library
10/07/27 00:11:06 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
10/07/27 00:11:06 INFO compress.CodecPool: Got brand-new compressor
Making 1 from 100000 records
Step size is 100000.0
10/07/27 00:11:06 INFO mapred.JobClient: Running job: job_201007270004_0003
10/07/27 00:11:07 INFO mapred.JobClient:  map 0% reduce 0%
10/07/27 00:11:21 INFO mapred.JobClient:  map 50% reduce 0%
10/07/27 00:11:24 INFO mapred.JobClient:  map 100% reduce 0%
10/07/27 00:11:33 INFO mapred.JobClient:  map 100% reduce 14%
10/07/27 00:11:36 INFO mapred.JobClient:  map 100% reduce 25%
10/07/27 00:11:39 INFO mapred.JobClient:  map 100% reduce 33%
10/07/27 00:11:54 INFO mapred.JobClient:  map 100% reduce 69%
10/07/27 00:11:57 INFO mapred.JobClient:  map 100% reduce 74%
10/07/27 00:12:00 INFO mapred.JobClient:  map 100% reduce 79%
10/07/27 00:12:03 INFO mapred.JobClient:  map 100% reduce 83%
10/07/27 00:12:06 INFO mapred.JobClient:  map 100% reduce 88%
10/07/27 00:12:09 INFO mapred.JobClient:  map 100% reduce 93%
10/07/27 00:12:15 INFO mapred.JobClient:  map 100% reduce 100%
10/07/27 00:12:17 INFO mapred.JobClient: Job complete: job_201007270004_0003
10/07/27 00:12:17 INFO mapred.JobClient: Counters: 19
10/07/27 00:12:17 INFO mapred.JobClient:   Job Counters
10/07/27 00:12:17 INFO mapred.JobClient:     Launched reduce tasks=1
10/07/27 00:12:17 INFO mapred.JobClient:     Rack-local map tasks=4
10/07/27 00:12:17 INFO mapred.JobClient:     Launched map tasks=16
10/07/27 00:12:17 INFO mapred.JobClient:     Data-local map tasks=12
10/07/27 00:12:17 INFO mapred.JobClient:   FileSystemCounters
10/07/27 00:12:17 INFO mapred.JobClient:     FILE_BYTES_READ=2382257412
10/07/27 00:12:17 INFO mapred.JobClient:     HDFS_BYTES_READ=1000057358
10/07/27 00:12:17 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=3402255956
10/07/27 00:12:17 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=1000000000
10/07/27 00:12:17 INFO mapred.JobClient:   Map-Reduce Framework
10/07/27 00:12:17 INFO mapred.JobClient:     Reduce input groups=10000000
10/07/27 00:12:17 INFO mapred.JobClient:     Combine output records=0
10/07/27 00:12:17 INFO mapred.JobClient:     Map input records=10000000
10/07/27 00:12:17 INFO mapred.JobClient:     Reduce shuffle bytes=951549114
10/07/27 00:12:17 INFO mapred.JobClient:     Reduce output records=10000000
10/07/27 00:12:17 INFO mapred.JobClient:     Spilled Records=33355441
10/07/27 00:12:17 INFO mapred.JobClient:     Map output bytes=1000000000
10/07/27 00:12:17 INFO mapred.JobClient:     Map input bytes=1000000000
10/07/27 00:12:17 INFO mapred.JobClient:     Combine input records=0
10/07/27 00:12:17 INFO mapred.JobClient:     Map output records=10000000
10/07/27 00:12:17 INFO mapred.JobClient:     Reduce input records=10000000
10/07/27 00:12:17 INFO terasort.TeraSort: done
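The counters above expose the shuffle cost of the sort: Spilled Records is about 3.3x the 10,000,000 map output records, meaning each record hit local disk more than three times during spill and merge, and local file I/O (FILE_BYTES_READ ~2.4 GB, FILE_BYTES_WRITTEN ~3.4 GB) dwarfs the 1 GB dataset. A quick check of those ratios:

```python
# Shuffle/spill amplification for the 1 GB terasort run, from the counters above.
map_output_records = 10_000_000
spilled_records    = 33_355_441
file_bytes_written = 3_402_255_956
hdfs_bytes_written = 1_000_000_000

spill_factor     = spilled_records / map_output_records
write_amplification = file_bytes_written / hdfs_bytes_written
print(f"spill factor: {spill_factor:.2f}x, local write amplification: {write_amplification:.2f}x")
```

A larger io.sort.mb would reduce the number of spill passes; that knob was left at its default here.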
hadoop jar hadoop-0.20.2-examples.jar terasort terasort/input~KB001 terasort/output~KB001
22s (2 maps)
hadoop jar hadoop-0.20.2-examples.jar terasort terasort/input~MB001 terasort/output~MB001
22s (2 maps; run back-to-back as a batch, so the ~1s of connection setup is saved)
hadoop jar hadoop-0.20.2-examples.jar terasort terasort/input~GB001 terasort/output~GB001
76s = 22s + 54s (16 maps)
hadoop jar hadoop-0.20.2-examples.jar terasort terasort/input~GB002 terasort/output~GB002
136s = 22s + 114s (30 maps)
hadoop jar hadoop-0.20.2-examples.jar terasort terasort/input~GB003 terasort/output~GB003
187s = 22s + 165s (46 maps)
hadoop jar hadoop-0.20.2-examples.jar terasort terasort/input~GB004 terasort/output~GB004
250s = 22s + 228s (60 maps)
hadoop jar hadoop-0.20.2-examples.jar terasort terasort/input~GB005 terasort/output~GB005
307s = 22s + 285s (76 maps)
hadoop jar hadoop-0.20.2-examples.jar terasort terasort/input~GB010 terasort/output~GB010
793s = 22s + 771s (150 maps)
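After subtracting the ~22 s fixed job overhead shown in the breakdowns above, sort time grows roughly linearly up to 5 GB (about 54-57 s per GB) but degrades at 10 GB (about 77 s per GB), plausibly because the single reduce wave and spill/merge traffic stop fitting in memory. A small sketch of the per-GB cost:

```python
# Sort time per GB after removing the ~22 s fixed job overhead noted above.
runs = {1: 76, 2: 136, 3: 187, 4: 250, 5: 307, 10: 793}  # GB -> total seconds
OVERHEAD_S = 22

for gb, total in sorted(runs.items()):
    print(f"{gb:2d} GB: {(total - OVERHEAD_S) / gb:5.1f} s/GB")
```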