Hadoop: Testing KMeans (Part 2): Analyzing the Output

2013-05-28 12:55
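Last time I posted "Hadoop: Testing KMeans (Part 1): Running the Sample Source Code". This time I will analyze the output of the whole MapReduce run. The test data file is still the 15 data points from Part 1: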

(20,30) (50,61) (20,32) (50,64) (59,67) (24,34) (19,39) (20,32) (50,65) (50,77) (20,30) (20,31) (20,32) (50,64) (50,67)

First, here is a flowchart of the program as I understand it. Pay particular attention to the <key, value> pairs going in and out of each stage.



Now let's analyze the output. Lines wrapped in --***-- are println output that I added to the program.

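--main::start-- // entering the main function of KMeans

--CenterInitial::run-- // entering CenterInitial.java to initialize the cluster centers: CenterInitial centerInitial = new CenterInitial();

CenterInitial::The initial center is:(50,61) (50,64) (20,30) // initially, K distinct points are chosen at random as centers and stored in the center file on HDFS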
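The comment above says the initial centers are K distinct points chosen at random. A minimal plain-Java sketch of that selection (no Hadoop), assuming points are handled as strings like "(20,30)"; `CenterInitialSketch` and `pickInitialCenters` are hypothetical names, not the original CenterInitial code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Random;

// Hypothetical stand-in for CenterInitial: picks k distinct points at random.
class CenterInitialSketch {
    static List<String> pickInitialCenters(List<String> points, int k, long seed) {
        // Drop duplicates first: the data repeats points such as (20,32), and two
        // identical centers would compete for exactly the same points.
        List<String> distinct = new ArrayList<>(new LinkedHashSet<>(points));
        if (k > distinct.size()) {
            throw new IllegalArgumentException("k is larger than the number of distinct points");
        }
        // A fixed seed keeps the choice reproducible; the original may seed differently.
        Collections.shuffle(distinct, new Random(seed));
        return new ArrayList<>(distinct.subList(0, k));
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList(
            "(20,30)", "(50,61)", "(20,32)", "(50,64)", "(59,67)", "(24,34)", "(19,39)", "(20,32)",
            "(50,65)", "(50,77)", "(20,30)", "(20,31)", "(20,32)", "(50,64)", "(50,67)");
        System.out.println(pickInitialCenters(data, 3, 42L));
    }
}
```

Which three points come out depends on the seed, so they will not necessarily match the (50,61) (50,64) (20,30) in this log.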

// After initialization the job is started and enters the Map --> Reduce phase

13/05/28 11:31:33 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

13/05/28 11:31:33 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.

13/05/28 11:31:33 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).

13/05/28 11:31:33 INFO input.FileInputFormat: Total input paths to process : 1

13/05/28 11:31:33 WARN snappy.LoadSnappy: Snappy native library not loaded

13/05/28 11:31:33 INFO mapred.JobClient: Running job: job_local_0001

13/05/28 11:31:33 INFO util.ProcessTree: setsid exited with exit code 0

13/05/28 11:31:33 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@6754d6

13/05/28 11:31:33 INFO mapred.MapTask: io.sort.mb = 100

13/05/28 11:31:33 INFO mapred.MapTask: data buffer = 79691776/99614720

13/05/28 11:31:33 INFO mapred.MapTask: record buffer = 262144/327680

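// Entering KMapper.java. The first method called is setup, which reads in the initial cluster centers and stores them in center, a field of the KMapper class. Why does the framework call setup automatically? The Hadoop API documentation explains: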

/* The framework first calls setup(org.apache.hadoop.mapreduce.Mapper.Context),

followed by map(Object, Object, Context) for each key/value pair in the InputSplit.

Finally cleanup(Context) is called.*/
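The documented call order can be illustrated without Hadoop. The classes below are hypothetical simplifications of `org.apache.hadoop.mapreduce.Mapper` (the real methods also take a `Context`); in the real class the default `run(Context)` does essentially the same thing: `setup`, a loop over `context.nextKeyValue()` calling `map`, then `cleanup`. Only that call order is meant to match:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for org.apache.hadoop.mapreduce.Mapper.
abstract class MiniMapper<K, V> {
    protected void setup() {}                     // e.g. load the current centers
    protected abstract void map(K key, V value);  // called once per input record
    protected void cleanup() {}

    // Same call order as the documented Mapper lifecycle.
    public final void run(Iterable<? extends Map.Entry<K, V>> records) {
        setup();
        for (Map.Entry<K, V> record : records) {
            map(record.getKey(), record.getValue());
        }
        cleanup();
    }
}

// Records each call, much like the --Mapper::...-- println lines in the log.
class TracingMapper extends MiniMapper<Long, String> {
    final List<String> trace = new ArrayList<>();

    @Override protected void setup() { trace.add("setup"); }
    @Override protected void map(Long offset, String line) { trace.add("map@" + offset); }
    @Override protected void cleanup() { trace.add("cleanup"); }
}
```

Fed a single record such as (0, "(20,30) (50,61) ..."), the trace is setup, map@0, cleanup. That matches this run: the input file is a single line, so map() is called once with key 0 (Map input records=1 in the counters).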


--Mapper::setup--start--

--Mapper::setup--end--

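--Mapper::map--start-- // Once setup returns, map is called. Debugging shows that map's input <key, value> is <0, the 15 data points in the cluster file>. This is the default <key, value> the framework reads: 0 is the byte offset of the file's only line, which is why the counters below show Map input records=1. map processes it and emits the following <key, value> pairs: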

center[pos]:(20,30)outvalue:(20,30)

center[pos]:(50,61)outvalue:(50,61)

center[pos]:(20,30)outvalue:(20,32)

center[pos]:(50,64)outvalue:(50,64)

center[pos]:(50,64)outvalue:(59,67)

center[pos]:(20,30)outvalue:(24,34)

center[pos]:(20,30)outvalue:(19,39)

center[pos]:(20,30)outvalue:(20,32)

center[pos]:(50,64)outvalue:(50,65)

center[pos]:(50,64)outvalue:(50,77)

center[pos]:(20,30)outvalue:(20,30)

center[pos]:(20,30)outvalue:(20,31)

center[pos]:(20,30)outvalue:(20,32)

center[pos]:(50,64)outvalue:(50,64)

center[pos]:(50,64)outvalue:(50,67)

// The output shows that each emitted <key, value> pair has one of the 15 data points as its value, and as its key the center closest to that point.
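The map step can be sketched in plain Java as below. The original distance function is not shown in this post; the sketch assumes squared Euclidean distance, which reproduces every assignment in the log above. `KMapStepSketch` and its methods are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the map step: parse the line, then assign each point
// to its nearest center using squared Euclidean distance (an assumption).
class KMapStepSketch {
    private static final Pattern POINT = Pattern.compile("\\(([-\\d.]+),([-\\d.]+)\\)");

    // Parses "(20,30) (50,61) ..." into {x, y} pairs; tolerates missing spaces.
    static List<double[]> parsePoints(String line) {
        List<double[]> points = new ArrayList<>();
        Matcher m = POINT.matcher(line);
        while (m.find()) {
            points.add(new double[] {Double.parseDouble(m.group(1)), Double.parseDouble(m.group(2))});
        }
        return points;
    }

    // Index of the closest center; on a tie the lower index wins.
    static int nearestCenter(double[] p, List<double[]> centers) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < centers.size(); i++) {
            double dx = p[0] - centers.get(i)[0];
            double dy = p[1] - centers.get(i)[1];
            double d = dx * dx + dy * dy;
            if (d < bestDist) {
                bestDist = d;
                best = i;
            }
        }
        return best;
    }
}
```

For example, with centers (50,61) (50,64) (20,30), the point (59,67) gets index 1, i.e. (50,64): its squared distance is 81 + 9 = 90, versus 81 + 36 = 117 for (50,61).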

--Mapper::map--end-- // map finished

13/05/28 11:31:33 INFO mapred.MapTask: Starting flush of map output

13/05/28 11:31:33 INFO mapred.MapTask: Finished spill 0

13/05/28 11:31:33 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting

13/05/28 11:31:34 INFO mapred.JobClient: map 0% reduce 0%

13/05/28 11:31:36 INFO mapred.LocalJobRunner:

13/05/28 11:31:36 INFO mapred.Task: Task 'attempt_local_0001_m_000000_0' done.

13/05/28 11:31:36 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@78bc3b

13/05/28 11:31:36 INFO mapred.LocalJobRunner:

13/05/28 11:31:36 INFO mapred.Merger: Merging 1 sorted segments

13/05/28 11:31:36 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 272 bytes

13/05/28 11:31:36 INFO mapred.LocalJobRunner:

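--KReducer::reduce--start--(20,30)org.apache.hadoop.mapreduce.ReduceContext$ValueIterable@12c3327 // reduce starts. Its input key is the cluster center (20,30), and its value is the 8 data points that map assigned to this center. There are three cluster centers, so reduce() is called three times, all inside the single reduce task attempt_local_0001_r_000000_0

(20.375,32.5) // the new center of this group, printed just before this reduce call ends

key:(20,30)outval+center:(20,30) (20,32) (24,34) (19,39) (20,32) (20,30) (20,31) (20,32) (20.375,32.5) // this is the <key, value> output by reduce: the group's points followed by its new center

--KReducer::reduce--end-- // As I understand it, reduce merges the data points that belong to the input key, with one call per center; stepping through in a debugger also shows the three centers being handled by separate reduce calls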
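The new center printed by each reduce call is the mean of the points in its group. A sketch of that computation; it uses `float` on the assumption that the original KReducer does too, which would explain printed values such as 67.333336 rather than 67.33333333333333. `KReduceStepSketch` is a hypothetical name:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the per-group work inside reduce(): average the points.
class KReduceStepSketch {
    static float[] centroid(List<float[]> points) {
        if (points.isEmpty()) {
            throw new IllegalArgumentException("a reduce group always has at least one value");
        }
        float sumX = 0f;
        float sumY = 0f;
        for (float[] p : points) {
            sumX += p[0];
            sumY += p[1];
        }
        return new float[] {sumX / points.size(), sumY / points.size()};
    }

    public static void main(String[] args) {
        List<float[]> group = Arrays.asList(
            new float[] {20, 30}, new float[] {20, 32}, new float[] {24, 34}, new float[] {19, 39},
            new float[] {20, 32}, new float[] {20, 30}, new float[] {20, 31}, new float[] {20, 32});
        float[] c = centroid(group);
        System.out.println("(" + c[0] + "," + c[1] + ")"); // prints (20.375,32.5)
    }
}
```

As a check: the x values sum to 163 and the y values to 260, and 163/8 = 20.375 and 260/8 = 32.5, the center printed above.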

--KReducer::reduce--start--(50,61)org.apache.hadoop.mapreduce.ReduceContext$ValueIterable@12c3327 // As above: the second reduce call, merging the data points whose center is (50,61)

(50.0,61.0)

key:(50,61)outval+center:(50,61) (50.0,61.0)

--KReducer::reduce--end--

--KReducer::reduce--start--(50,64)org.apache.hadoop.mapreduce.ReduceContext$ValueIterable@12c3327 // As above: the third reduce call, merging the data points whose center is (50,64)

(51.5,67.333336)

key:(50,64)outval+center:(50,65) (50,64) (59,67) (50,77) (50,67) (50,64) (51.5,67.333336)

--KReducer::reduce--end--

13/05/28 11:31:36 INFO mapred.Task: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting

13/05/28 11:31:36 INFO mapred.LocalJobRunner:

13/05/28 11:31:36 INFO mapred.Task: Task attempt_local_0001_r_000000_0 is allowed to commit now

13/05/28 11:31:36 INFO output.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to hdfs://192.168.56.171:9000/ouput

13/05/28 11:31:37 INFO mapred.JobClient: map 100% reduce 0%

13/05/28 11:31:39 INFO mapred.LocalJobRunner: reduce > reduce

13/05/28 11:31:39 INFO mapred.Task: Task 'attempt_local_0001_r_000000_0' done.

13/05/28 11:31:40 INFO mapred.JobClient: map 100% reduce 100%

13/05/28 11:31:40 INFO mapred.JobClient: Job complete: job_local_0001

13/05/28 11:31:40 INFO mapred.JobClient: Counters: 22

13/05/28 11:31:40 INFO mapred.JobClient: File Output Format Counters

13/05/28 11:31:40 INFO mapred.JobClient: Bytes Written=187

13/05/28 11:31:40 INFO mapred.JobClient: FileSystemCounters

13/05/28 11:31:40 INFO mapred.JobClient: FILE_BYTES_READ=598

13/05/28 11:31:40 INFO mapred.JobClient: HDFS_BYTES_READ=532

13/05/28 11:31:40 INFO mapred.JobClient: FILE_BYTES_WRITTEN=81540

13/05/28 11:31:40 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=235

13/05/28 11:31:40 INFO mapred.JobClient: File Input Format Counters

13/05/28 11:31:40 INFO mapred.JobClient: Bytes Read=121

13/05/28 11:31:40 INFO mapred.JobClient: Map-Reduce Framework

13/05/28 11:31:40 INFO mapred.JobClient: Map output materialized bytes=276

13/05/28 11:31:40 INFO mapred.JobClient: Map input records=1

13/05/28 11:31:40 INFO mapred.JobClient: Reduce shuffle bytes=0

13/05/28 11:31:40 INFO mapred.JobClient: Spilled Records=30

13/05/28 11:31:40 INFO mapred.JobClient: Map output bytes=240

13/05/28 11:31:40 INFO mapred.JobClient: Total committed heap usage (bytes)=258342912

13/05/28 11:31:40 INFO mapred.JobClient: CPU time spent (ms)=0

13/05/28 11:31:40 INFO mapred.JobClient: SPLIT_RAW_BYTES=107

13/05/28 11:31:40 INFO mapred.JobClient: Combine input records=0

13/05/28 11:31:40 INFO mapred.JobClient: Reduce input records=15 // 15 input records

13/05/28 11:31:40 INFO mapred.JobClient: Reduce input groups=3 // three reduce groups, one per center

13/05/28 11:31:40 INFO mapred.JobClient: Combine output records=0

13/05/28 11:31:40 INFO mapred.JobClient: Physical memory (bytes) snapshot=0

13/05/28 11:31:40 INFO mapred.JobClient: Reduce output records=3

13/05/28 11:31:40 INFO mapred.JobClient: Virtual memory (bytes) snapshot=0

13/05/28 11:31:40 INFO mapred.JobClient: Map output records=15
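These counters tie the two phases together: the 15 map output records reach reduce as 15 input records in 3 groups. A rough sketch of that grouping, using a `TreeMap` as a stand-in for Hadoop's sort and shuffle. It assumes the keys are `Text`, which Hadoop orders by their bytes; for these ASCII keys that is the same order `String.compareTo` gives, which is why (20,30) is reduced before (50,61) and (50,64). `ShuffleSketch` is a hypothetical name:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical stand-in for sort + shuffle: group <center, point> pairs by key,
// with keys in sorted order. Hadoop does not guarantee the order of values
// within a group; this sketch simply keeps arrival order.
class ShuffleSketch {
    static TreeMap<String, List<String>> groupByKey(List<String[]> mapOutput) {
        TreeMap<String, List<String>> groups = new TreeMap<>();
        for (String[] pair : mapOutput) {
            groups.computeIfAbsent(pair[0], key -> new ArrayList<>()).add(pair[1]);
        }
        return groups;
    }
}
```

Given the 15 pairs from the map output above, it produces 3 keys with 8, 1 and 6 values. The real run did not keep arrival order inside a group: the (50,64) group reached reduce as (50,65) (50,64) (59,67) ..., not in map output order.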

13/05/28 11:31:40 INFO mapred.JobClient: Running job: job_local_0001

13/05/28 11:31:40 INFO mapred.JobClient: Job complete: job_local_0001

13/05/28 11:31:40 INFO mapred.JobClient: Counters: 22

13/05/28 11:31:40 INFO mapred.JobClient: File Output Format Counters

13/05/28 11:31:40 INFO mapred.JobClient: Bytes Written=187

13/05/28 11:31:40 INFO mapred.JobClient: FileSystemCounters

13/05/28 11:31:40 INFO mapred.JobClient: FILE_BYTES_READ=598

13/05/28 11:31:40 INFO mapred.JobClient: HDFS_BYTES_READ=532

13/05/28 11:31:40 INFO mapred.JobClient: FILE_BYTES_WRITTEN=81540

13/05/28 11:31:40 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=235

13/05/28 11:31:40 INFO mapred.JobClient: File Input Format Counters

13/05/28 11:31:40 INFO mapred.JobClient: Bytes Read=121

13/05/28 11:31:40 INFO mapred.JobClient: Map-Reduce Framework

13/05/28 11:31:40 INFO mapred.JobClient: Map output materialized bytes=276

13/05/28 11:31:40 INFO mapred.JobClient: Map input records=1

13/05/28 11:31:40 INFO mapred.JobClient: Reduce shuffle bytes=0

13/05/28 11:31:40 INFO mapred.JobClient: Spilled Records=30

13/05/28 11:31:40 INFO mapred.JobClient: Map output bytes=240

13/05/28 11:31:40 INFO mapred.JobClient: Total committed heap usage (bytes)=258342912

13/05/28 11:31:40 INFO mapred.JobClient: CPU time spent (ms)=0

13/05/28 11:31:40 INFO mapred.JobClient: SPLIT_RAW_BYTES=107

13/05/28 11:31:40 INFO mapred.JobClient: Combine input records=0

13/05/28 11:31:40 INFO mapred.JobClient: Reduce input records=15

13/05/28 11:31:40 INFO mapred.JobClient: Reduce input groups=3

13/05/28 11:31:40 INFO mapred.JobClient: Combine output records=0

13/05/28 11:31:40 INFO mapred.JobClient: Physical memory (bytes) snapshot=0

13/05/28 11:31:40 INFO mapred.JobClient: Reduce output records=3

13/05/28 11:31:40 INFO mapred.JobClient: Virtual memory (bytes) snapshot=0

13/05/28 11:31:40 INFO mapred.JobClient: Map output records=15

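--NewCenter::run--start-- // The new-center step begins. It first reads the reduce output file /part-r-00000, i.e. the reduce output <key, value> pairs explained above:

(20,30) (20,30) (20,32) (24,34) (19,39) (20,32) (20,30) (20,31) (20,32) (20.375,32.5)

(50,61) (50,61) (50.0,61.0)

(50,64) (50,65) (50,64) (59,67) (50,77) (50,67) (50,64) (51.5,67.333336)

// It computes the new cluster centers and overwrites the initial center file with them

(20.375,32.5) (50.0,61.0) (51.5,67.333336)

--NewCenter::run--end-- // The new centers are done. Control returns to main, which checks the change in the centers against a threshold; the threshold is not yet met, so the while loop runs another map --> reduce iteration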
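Each reduce output line starts with the old center (the key) and ends with the new one, so the NewCenter step plus the check in main can be sketched as: take the last point of each line as the new center, and add up how far each center moved. The post does not show main's actual threshold test; the sum of squared moves, the threshold value, and the names `NewCenterSketch`, `newCenters`, `totalShift` and `converged` are all assumptions:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of NewCenter plus the convergence check in main.
class NewCenterSketch {
    private static final Pattern POINT = Pattern.compile("\\(([-\\d.]+),([-\\d.]+)\\)");

    // All points on one line of part-r-00000: the key (old center) first, the new center last.
    static List<double[]> points(String line) {
        List<double[]> result = new ArrayList<>();
        Matcher m = POINT.matcher(line);
        while (m.find()) {
            result.add(new double[] {Double.parseDouble(m.group(1)), Double.parseDouble(m.group(2))});
        }
        if (result.size() < 2) {
            throw new IllegalArgumentException("expected 'old center ... new center': " + line);
        }
        return result;
    }

    // The new centers, one per reduce output line.
    static List<double[]> newCenters(List<String> lines) {
        List<double[]> centers = new ArrayList<>();
        for (String line : lines) {
            List<double[]> pts = points(line);
            centers.add(pts.get(pts.size() - 1));
        }
        return centers;
    }

    // Sum over all lines of the squared distance between old and new center.
    static double totalShift(List<String> lines) {
        double shift = 0;
        for (String line : lines) {
            List<double[]> pts = points(line);
            double[] oldC = pts.get(0);
            double[] newC = pts.get(pts.size() - 1);
            double dx = newC[0] - oldC[0];
            double dy = newC[1] - oldC[1];
            shift += dx * dx + dy * dy;
        }
        return shift;
    }

    static boolean converged(List<String> lines, double threshold) {
        return totalShift(lines) <= threshold;
    }
}
```

Pairing old and new centers within one line, rather than by position in the center file, avoids depending on the order in which centers were written (the first center file was (50,61) (50,64) (20,30), but reduce output is sorted by key). With this design, a center that attracts no points produces no reduce line and simply disappears.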

13/05/28 11:31:40 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.

13/05/28 11:31:40 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).

13/05/28 11:31:40 INFO input.FileInputFormat: Total input paths to process : 1

13/05/28 11:31:40 INFO mapred.JobClient: Running job: job_local_0002

13/05/28 11:31:40 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1884a40

13/05/28 11:31:40 INFO mapred.MapTask: io.sort.mb = 100

13/05/28 11:31:40 INFO mapred.MapTask: data buffer = 79691776/99614720

13/05/28 11:31:40 INFO mapred.MapTask: record buffer = 262144/327680

// The next map --> reduce iteration begins

--Mapper::setup--start--

--Mapper::setup--end--

--Mapper::map--start--

center[pos]:(20.375,32.5)outvalue:(20,30)

center[pos]:(50.0,61.0)outvalue:(50,61)

center[pos]:(20.375,32.5)outvalue:(20,32)

center[pos]:(50.0,61.0)outvalue:(50,64)

center[pos]:(51.5,67.333336)outvalue:(59,67)

center[pos]:(20.375,32.5)outvalue:(24,34)

center[pos]:(20.375,32.5)outvalue:(19,39)

center[pos]:(20.375,32.5)outvalue:(20,32)

center[pos]:(51.5,67.333336)outvalue:(50,65)

center[pos]:(51.5,67.333336)outvalue:(50,77)

center[pos]:(20.375,32.5)outvalue:(20,30)

center[pos]:(20.375,32.5)outvalue:(20,31)

center[pos]:(20.375,32.5)outvalue:(20,32)

center[pos]:(50.0,61.0)outvalue:(50,64)

center[pos]:(51.5,67.333336)outvalue:(50,67)

--Mapper::map--end--

13/05/28 11:31:40 INFO mapred.MapTask: Starting flush of map output

13/05/28 11:31:40 INFO mapred.MapTask: Finished spill 0

13/05/28 11:31:40 INFO mapred.Task: Task:attempt_local_0002_m_000000_0 is done. And is in the process of commiting

.......

--KReducer::reduce--start--(20.375,32.5)org.apache.hadoop.mapreduce.ReduceContext$ValueIterable@1712651

(20.375,32.5)

key:(20.375,32.5)outval+center:(20,30) (20,32) (24,34) (19,39) (20,32) (20,30) (20,31) (20,32) (20.375,32.5)

--KReducer::reduce--end--

--KReducer::reduce--start--(50.0,61.0)org.apache.hadoop.mapreduce.ReduceContext$ValueIterable@1712651

(50.0,63.0)

key:(50.0,61.0)outval+center:(50,61) (50,64) (50,64) (50.0,63.0)

--KReducer::reduce--end--

--KReducer::reduce--start--(51.5,67.333336)org.apache.hadoop.mapreduce.ReduceContext$ValueIterable@1712651

(52.25,69.0)

key:(51.5,67.333336)outval+center:(50,65) (59,67) (50,77) (50,67) (52.25,69.0)

--KReducer::reduce--end--

13/05/28 11:31:43 INFO mapred.Task: Task:attempt_local_0002_r_000000_0 is done. And is in the process of commiting

13/05/28 11:31:43 INFO mapred.LocalJobRunner:

13/05/28 11:31:43 INFO mapred.Task: Task attempt_local_0002_r_000000_0 is allowed to commit now

13/05/28 11:31:43 INFO output.FileOutputCommitter: Saved output of task 'attempt_local_0002_r_000000_0' to hdfs://192.168.56.171:9000/ouput

.......

--NewCenter::run--start--

(20.375,32.5) (20,30) (20,32) (24,34) (19,39) (20,32) (20,30) (20,31) (20,32) (20.375,32.5)

(50.0,61.0) (50,61) (50,64) (50,64) (50.0,63.0)

(51.5,67.333336) (50,65) (59,67) (50,77) (50,67) (52.25,69.0)

(20.375,32.5) (50.0,63.0) (52.25,69.0)

--NewCenter::run--end--

13/05/28 11:31:47 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.

13/05/28 11:31:47 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).

13/05/28 11:31:47 INFO input.FileInputFormat: Total input paths to process : 1

13/05/28 11:31:47 INFO mapred.JobClient: Running job: job_local_0003

13/05/28 11:31:47 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@11ef443

13/05/28 11:31:47 INFO mapred.MapTask: io.sort.mb = 100

13/05/28 11:31:47 INFO mapred.MapTask: data buffer = 79691776/99614720

13/05/28 11:31:47 INFO mapred.MapTask: record buffer = 262144/327680

--Mapper::setup--start--

--Mapper::setup--end--

--Mapper::map--start--

center[pos]:(20.375,32.5)outvalue:(20,30)

center[pos]:(50.0,63.0)outvalue:(50,61)

center[pos]:(20.375,32.5)outvalue:(20,32)

center[pos]:(50.0,63.0)outvalue:(50,64)

center[pos]:(52.25,69.0)outvalue:(59,67)

center[pos]:(20.375,32.5)outvalue:(24,34)

center[pos]:(20.375,32.5)outvalue:(19,39)

center[pos]:(20.375,32.5)outvalue:(20,32)

center[pos]:(50.0,63.0)outvalue:(50,65)

center[pos]:(52.25,69.0)outvalue:(50,77)

center[pos]:(20.375,32.5)outvalue:(20,30)

center[pos]:(20.375,32.5)outvalue:(20,31)

center[pos]:(20.375,32.5)outvalue:(20,32)

center[pos]:(50.0,63.0)outvalue:(50,64)

center[pos]:(52.25,69.0)outvalue:(50,67)

--Mapper::map--end--

13/05/28 11:31:47 INFO mapred.MapTask: Starting flush of map output

13/05/28 11:31:47 INFO mapred.MapTask: Finished spill 0

13/05/28 11:31:47 INFO mapred.Task: Task:attempt_local_0003_m_000000_0 is done. And is in the process of commiting

13/05/28 11:31:48 INFO mapred.JobClient: map 0% reduce 0%

13/05/28 11:31:50 INFO mapred.LocalJobRunner:

13/05/28 11:31:50 INFO mapred.Task: Task 'attempt_local_0003_m_000000_0' done.

.......

--KReducer::reduce--start--(20.375,32.5)org.apache.hadoop.mapreduce.ReduceContext$ValueIterable@35bb0f

(20.375,32.5)

key:(20.375,32.5)outval+center:(20,30) (20,32) (24,34) (19,39) (20,32) (20,30) (20,31) (20,32) (20.375,32.5)

--KReducer::reduce--end--

--KReducer::reduce--start--(50.0,63.0)org.apache.hadoop.mapreduce.ReduceContext$ValueIterable@35bb0f

(50.0,63.5)

key:(50.0,63.0)outval+center:(50,61) (50,65) (50,64) (50,64) (50.0,63.5)

--KReducer::reduce--end--

--KReducer::reduce--start--(52.25,69.0)org.apache.hadoop.mapreduce.ReduceContext$ValueIterable@35bb0f

(53.0,70.333336)

key:(52.25,69.0)outval+center:(59,67) (50,77) (50,67) (53.0,70.333336)

--KReducer::reduce--end--

13/05/28 11:31:50 INFO mapred.Task: Task:attempt_local_0003_r_000000_0 is done. And is in the process of commiting

13/05/28 11:31:50 INFO mapred.LocalJobRunner:

13/05/28 11:31:50 INFO mapred.Task: Task attempt_local_0003_r_000000_0 is allowed to commit now

13/05/28 11:31:50 INFO output.FileOutputCommitter: Saved output of task 'attempt_local_0003_r_000000_0' to hdfs://192.168.56.171:9000/ouput

13/05/28 11:31:51 INFO mapred.JobClient: map 100% reduce 0%

13/05/28 11:31:53 INFO mapred.LocalJobRunner: reduce > reduce

13/05/28 11:31:53 INFO mapred.Task: Task 'attempt_local_0003_r_000000_0' done.

.......

13/05/28 11:31:54 INFO mapred.JobClient: Map output records=15

--NewCenter::run--start--

(20.375,32.5) (20,30) (20,32) (24,34) (19,39) (20,32) (20,30) (20,31) (20,32) (20.375,32.5)

(50.0,63.0) (50,61) (50,65) (50,64) (50,64) (50.0,63.5)

(52.25,69.0) (59,67) (50,77) (50,67) (53.0,70.333336)

(20.375,32.5) (50.0,63.5) (53.0,70.333336)

--NewCenter::run--end--

13/05/28 11:31:54 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.

13/05/28 11:31:54 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).

13/05/28 11:31:54 INFO input.FileInputFormat: Total input paths to process : 1

13/05/28 11:31:54 INFO mapred.JobClient: Running job: job_local_0004

13/05/28 11:31:54 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1958bf9

13/05/28 11:31:54 INFO mapred.MapTask: io.sort.mb = 100

13/05/28 11:31:54 INFO mapred.MapTask: data buffer = 79691776/99614720

13/05/28 11:31:54 INFO mapred.MapTask: record buffer = 262144/327680

--Mapper::setup--start--

--Mapper::setup--end--

--Mapper::map--start--

center[pos]:(20.375,32.5)outvalue:(20,30)

center[pos]:(50.0,63.5)outvalue:(50,61)

center[pos]:(20.375,32.5)outvalue:(20,32)

center[pos]:(50.0,63.5)outvalue:(50,64)

center[pos]:(53.0,70.333336)outvalue:(59,67)

center[pos]:(20.375,32.5)outvalue:(24,34)

center[pos]:(20.375,32.5)outvalue:(19,39)

center[pos]:(20.375,32.5)outvalue:(20,32)

center[pos]:(50.0,63.5)outvalue:(50,65)

center[pos]:(53.0,70.333336)outvalue:(50,77)

center[pos]:(20.375,32.5)outvalue:(20,30)

center[pos]:(20.375,32.5)outvalue:(20,31)

center[pos]:(20.375,32.5)outvalue:(20,32)

center[pos]:(50.0,63.5)outvalue:(50,64)

center[pos]:(50.0,63.5)outvalue:(50,67)

--Mapper::map--end--

13/05/28 11:31:54 INFO mapred.MapTask: Starting flush of map output

.......

13/05/28 11:31:57 INFO mapred.LocalJobRunner:

--KReducer::reduce--start--(20.375,32.5)org.apache.hadoop.mapreduce.ReduceContext$ValueIterable@6db724

(20.375,32.5)

key:(20.375,32.5)outval+center:(20,30) (20,32) (24,34) (19,39) (20,32) (20,30) (20,31) (20,32) (20.375,32.5)

--KReducer::reduce--end--

--KReducer::reduce--start--(50.0,63.5)org.apache.hadoop.mapreduce.ReduceContext$ValueIterable@6db724

(50.0,64.2)

key:(50.0,63.5)outval+center:(50,61) (50,65) (50,64) (50,67) (50,64) (50.0,64.2)

--KReducer::reduce--end--

--KReducer::reduce--start--(53.0,70.333336)org.apache.hadoop.mapreduce.ReduceContext$ValueIterable@6db724

(54.5,72.0)

key:(53.0,70.333336)outval+center:(59,67) (50,77) (54.5,72.0)

--KReducer::reduce--end--

13/05/28 11:31:57 INFO mapred.Task: Task:attempt_local_0004_r_000000_0 is done. And is in the process of commiting

.......

13/05/28 11:32:01 INFO mapred.JobClient: Map output records=15

--NewCenter::run--start--

(20.375,32.5) (20,30) (20,32) (24,34) (19,39) (20,32) (20,30) (20,31) (20,32) (20.375,32.5)

(50.0,63.5) (50,61) (50,65) (50,64) (50,67) (50,64) (50.0,64.2)

(53.0,70.333336) (59,67) (50,77) (54.5,72.0)

(20.375,32.5) (50.0,64.2) (54.5,72.0)

--NewCenter::run--end--

13/05/28 11:32:01 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.

.......

13/05/28 11:32:01 INFO mapred.MapTask: record buffer = 262144/327680

--Mapper::setup--start--

--Mapper::setup--end--

--Mapper::map--start--

center[pos]:(20.375,32.5)outvalue:(20,30)

center[pos]:(50.0,64.2)outvalue:(50,61)

center[pos]:(20.375,32.5)outvalue:(20,32)

center[pos]:(50.0,64.2)outvalue:(50,64)

center[pos]:(54.5,72.0)outvalue:(59,67)

center[pos]:(20.375,32.5)outvalue:(24,34)

center[pos]:(20.375,32.5)outvalue:(19,39)

center[pos]:(20.375,32.5)outvalue:(20,32)

center[pos]:(50.0,64.2)outvalue:(50,65)

center[pos]:(54.5,72.0)outvalue:(50,77)

center[pos]:(20.375,32.5)outvalue:(20,30)

center[pos]:(20.375,32.5)outvalue:(20,31)

center[pos]:(20.375,32.5)outvalue:(20,32)

center[pos]:(50.0,64.2)outvalue:(50,64)

center[pos]:(50.0,64.2)outvalue:(50,67)

--Mapper::map--end--

13/05/28 11:32:01 INFO mapred.MapTask: Starting flush of map output

13/05/28 11:32:01 INFO mapred.MapTask: Finished spill 0

13/05/28 11:32:01 INFO mapred.Task: Task:attempt_local_0005_m_000000_0 is done. And is in the process of commiting

13/05/28 11:32:02 INFO mapred.JobClient: map 0% reduce 0%

13/05/28 11:32:04 INFO mapred.LocalJobRunner:

13/05/28 11:32:04 INFO mapred.Task: Task 'attempt_local_0005_m_000000_0' done.

13/05/28 11:32:04 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@c06258

13/05/28 11:32:04 INFO mapred.LocalJobRunner:

13/05/28 11:32:04 INFO mapred.Merger: Merging 1 sorted segments

13/05/28 11:32:04 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 348 bytes

13/05/28 11:32:04 INFO mapred.LocalJobRunner:

--KReducer::reduce--start--(20.375,32.5)org.apache.hadoop.mapreduce.ReduceContext$ValueIterable@cffc79

(20.375,32.5)

key:(20.375,32.5)outval+center:(20,30) (20,32) (24,34) (19,39) (20,32) (20,30) (20,31) (20,32) (20.375,32.5)

--KReducer::reduce--end--

--KReducer::reduce--start--(50.0,64.2)org.apache.hadoop.mapreduce.ReduceContext$ValueIterable@cffc79

(50.0,64.2)

key:(50.0,64.2)outval+center:(50,61) (50,65) (50,64) (50,67) (50,64) (50.0,64.2)

--KReducer::reduce--end--

--KReducer::reduce--start--(54.5,72.0)org.apache.hadoop.mapreduce.ReduceContext$ValueIterable@cffc79

(54.5,72.0)

key:(54.5,72.0)outval+center:(59,67) (50,77) (54.5,72.0)

--KReducer::reduce--end--

13/05/28 11:32:04 INFO mapred.Task: Task:attempt_local_0005_r_000000_0 is done. And is in the process of commiting

.......

13/05/28 11:32:08 INFO mapred.JobClient: Map output records=15

--NewCenter::run--start--

(20.375,32.5) (20,30) (20,32) (24,34) (19,39) (20,32) (20,30) (20,31) (20,32) (20.375,32.5)

(50.0,64.2) (50,61) (50,65) (50,64) (50,67) (50,64) (50.0,64.2)

(54.5,72.0) (59,67) (50,77) (54.5,72.0)

(20.375,32.5) (50.0,64.2) (54.5,72.0)

--NewCenter::run--end--

Iterator: 5 // finally, the number of iterations is printed
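The whole run can be cross-checked without Hadoop. The sketch below repeats assign, average, compare in a single process, assuming squared Euclidean distance and a hypothetical threshold of 1e-9 on the summed squared center moves. Started from the same initial centers (50,61) (50,64) (20,30), by my hand calculation it stops after 5 passes at (50.0,64.2) (54.5,72.0) (20.375,32.5), matching Iterator: 5 and the last centers above. It uses `double` rather than `float`, so intermediate values can differ in the last digits (67.33333333333333 instead of 67.333336). `LocalKMeansSketch` and `KMeansResult` are hypothetical names:

```java
// Hypothetical single-process version of the driver loop: each pass stands in
// for one MapReduce job (map = assign, reduce = average, NewCenter = compare).
final class KMeansResult {
    final double[][] centers;
    final int iterations;

    KMeansResult(double[][] centers, int iterations) {
        this.centers = centers;
        this.iterations = iterations;
    }
}

class LocalKMeansSketch {
    static KMeansResult run(double[][] points, double[][] initialCenters, double threshold, int maxIterations) {
        int k = initialCenters.length;
        double[][] centers = new double[k][];
        for (int i = 0; i < k; i++) {
            centers[i] = initialCenters[i].clone(); // do not modify the caller's array
        }
        for (int iter = 1; iter <= maxIterations; iter++) {
            // "Map": assign every point to its nearest center and accumulate sums.
            double[][] sums = new double[k][2];
            int[] counts = new int[k];
            for (double[] p : points) {
                int c = nearest(p, centers);
                sums[c][0] += p[0];
                sums[c][1] += p[1];
                counts[c]++;
            }
            // "Reduce" + NewCenter: move each center to its group mean and add up
            // the squared moves. An empty group keeps its old center (a choice of this sketch).
            double shift = 0;
            for (int i = 0; i < k; i++) {
                if (counts[i] == 0) {
                    continue;
                }
                double nx = sums[i][0] / counts[i];
                double ny = sums[i][1] / counts[i];
                double dx = nx - centers[i][0];
                double dy = ny - centers[i][1];
                shift += dx * dx + dy * dy;
                centers[i] = new double[] {nx, ny};
            }
            if (shift <= threshold) {
                return new KMeansResult(centers, iter);
            }
        }
        return new KMeansResult(centers, maxIterations);
    }

    // Squared Euclidean distance; on a tie the lower index wins.
    private static int nearest(double[] p, double[][] centers) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < centers.length; i++) {
            double dx = p[0] - centers[i][0];
            double dy = p[1] - centers[i][1];
            double d = dx * dx + dy * dy;
            if (d < bestDist) {
                bestDist = d;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] points = {{20, 30}, {50, 61}, {20, 32}, {50, 64}, {59, 67}, {24, 34}, {19, 39}, {20, 32},
                             {50, 65}, {50, 77}, {20, 30}, {20, 31}, {20, 32}, {50, 64}, {50, 67}};
        double[][] initial = {{50, 61}, {50, 64}, {20, 30}};
        KMeansResult r = run(points, initial, 1e-9, 100);
        System.out.println("Iterator: " + r.iterations);
        for (double[] c : r.centers) {
            System.out.println("(" + c[0] + "," + c[1] + ")");
        }
    }
}
```

The fifth pass is the one that detects convergence: it reassigns the points exactly as the fourth pass did, so every center moves by zero.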

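That covers the complete output of the KMeans source code. Lines wrapped in --***-- are println output I added to the program, which makes the intermediate results easier to analyze. Personally, though, I find that stepping through the program in a debugger shows the processing flow and the value of every variable more quickly. To sum up, there are four points to note about this source code:

1. The framework calls setup automatically before it calls map; the reason is explained above.

2. Understand that this program runs one map task, whose map() is called once because the input is a single line, and one reduce task, whose reduce() is called three times, once per center, because I set k to 3.

3. Pay attention to what the <key, value> pairs are for the input and output of map and of reduce. That is the key to understanding the overall structure of the program, and it also helps when modifying the source to fit your own needs.

4. Beginners like me also need to understand how files are read from and written to HDFS; see my previous note: Reading and Writing Data in Hadoop via the FileSystem API

I have only recently started working with Hadoop, so if any of this analysis is wrong, I welcome discussion so we can improve together!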