Running the WordCount Program on Hadoop

Hadoop ships with hadoop-mapreduce-examples-2.6.0.jar under share/hadoop/mapreduce (relative to the Hadoop installation directory), which can run a WordCount job.

This walkthrough uses Hadoop 2.6.0; note that the examples jar is stored at a different path in other Hadoop versions.
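
If you are unsure where the jar lives in your installation, you can simply search for it; running the jar with no arguments also prints the list of bundled example programs (output abbreviated below; /home/sky/hadoop is this walkthrough's install path, yours may differ):

[root@localhost hadoop]# find /home/sky/hadoop -name 'hadoop-mapreduce-examples-*.jar'
/home/sky/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar
[root@localhost hadoop]# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar
An example program must be given as the first argument.
Valid program names are:
  ...
  wordcount: A map/reduce program that counts the words in the input files.
  ...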

The steps are as follows:

1. Create a test folder on the local disk; here it is created under /home/sky

[root@localhost sky]# cd /home/sky/
[root@localhost sky]# mkdir test
[root@localhost sky]# ls
Desktop    eclipse  Music     Public   spark      Videos
Documents  hadoop   mysql     pycharm  Templates  workspace
Downloads  hive     Pictures  scala    test
[root@localhost sky]#

2. Create the two files to be counted, test1.txt and test2.txt

[root@localhost sky]# cd test/
[root@localhost test]# echo "hello world,hello hadoop" > test1.txt
[root@localhost test]# echo "hello world,hello spark" > test2.txt
[root@localhost test]# cat test1.txt
hello world,hello hadoop


3. Create an input directory on HDFS and list it

[root@localhost test]# cd /home/sky/hadoop/
[root@localhost hadoop]# bin/hadoop fs -mkdir /input
[root@localhost hadoop]# bin/hadoop fs -ls /
Found 5 items
drwxr-xr-x   - root supergroup          0 2015-10-26 19:47 /home
drwxr-xr-x   - root supergroup          0 2015-10-29 21:56 /input
drwxr-xr-x   - root supergroup          0 2015-10-29 20:54 /output
drwx-wx-wx   - root supergroup          0 2015-10-29 20:57 /tmp
drwxr-xr-x   - root supergroup          0 2015-10-28 21:32 /user
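
As an aside (not part of the original steps): in Hadoop 2.6.0, -mkdir also accepts -p to create parent directories in one go, and -ls -R lists a tree recursively. A quick sketch on a throwaway path, removed afterwards so it does not affect the steps below:

[root@localhost hadoop]# bin/hadoop fs -mkdir -p /demo/a/b
[root@localhost hadoop]# bin/hadoop fs -ls -R /demo
[root@localhost hadoop]# bin/hadoop fs -rm -r /demo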


4. Upload the test files from the local disk to the newly created /input directory on HDFS

[root@localhost hadoop]# bin/hadoop fs -put /home/sky/test/test*.txt /input
[root@localhost hadoop]# bin/hadoop fs -ls /input
Found 2 items
-rw-r--r--   1 root supergroup         25 2015-10-29 22:02 /input/test1.txt
-rw-r--r--   1 root supergroup         24 2015-10-29 22:02 /input/test2.txt
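
A quick sanity check (not in the original steps) before launching the job is to read one of the files back out of HDFS:

[root@localhost hadoop]# bin/hadoop fs -cat /input/test1.txt
hello world,hello hadoop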

5. Run the job and watch its progress

[root@localhost hadoop]# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /input /output/wordcount2
15/10/29 22:07:43 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/10/29 22:07:46 INFO input.FileInputFormat: Total input paths to process : 2
15/10/29 22:07:46 INFO mapreduce.JobSubmitter: number of splits:2
15/10/29 22:07:47 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1445687376888_0005
15/10/29 22:07:48 INFO impl.YarnClientImpl: Submitted application application_1445687376888_0005
15/10/29 22:07:48 INFO mapreduce.Job: The url to track the job: http://localhost:8088/proxy/application_1445687376888_0005/
15/10/29 22:07:48 INFO mapreduce.Job: Running job: job_1445687376888_0005
15/10/29 22:08:03 INFO mapreduce.Job: Job job_1445687376888_0005 running in uber mode : false
15/10/29 22:08:03 INFO mapreduce.Job:  map 0% reduce 0%
15/10/29 22:08:23 INFO mapreduce.Job:  map 100% reduce 0%
15/10/29 22:08:34 INFO mapreduce.Job:  map 100% reduce 100%
15/10/29 22:08:35 INFO mapreduce.Job: Job job_1445687376888_0005 completed successfully
15/10/29 22:08:35 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=91
		FILE: Number of bytes written=316819
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=253
		HDFS: Number of bytes written=39
		HDFS: Number of read operations=9
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters 
		Launched map tasks=2
		Launched reduce tasks=1
		Data-local map tasks=2
		Total time spent by all maps in occupied slots (ms)=34517
		Total time spent by all reduces in occupied slots (ms)=8834
		Total time spent by all map tasks (ms)=34517
		Total time spent by all reduce tasks (ms)=8834
		Total vcore-seconds taken by all map tasks=34517
		Total vcore-seconds taken by all reduce tasks=8834
		Total megabyte-seconds taken by all map tasks=35345408
		Total megabyte-seconds taken by all reduce tasks=9046016
	Map-Reduce Framework
		Map input records=2
		Map output records=6
		Map output bytes=73
		Map output materialized bytes=97
		Input split bytes=204
		Combine input records=6
		Combine output records=6
		Reduce input groups=4
		Reduce shuffle bytes=97
		Reduce input records=6
		Reduce output records=4
		Spilled Records=12
		Shuffled Maps =2
		Failed Shuffles=0
		Merged Map outputs=2
		GC time elapsed (ms)=645
		CPU time spent (ms)=3820
		Physical memory (bytes) snapshot=505044992
		Virtual memory (bytes) snapshot=6221623296
		Total committed heap usage (bytes)=355999744
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=49
	File Output Format Counters 
		Bytes Written=39
[root@localhost hadoop]#
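
One thing to watch: a MapReduce job refuses to start if its output directory already exists, which is presumably why a fresh subdirectory /output/wordcount2 is used here. To rerun the job with the same output path, delete the old results first:

[root@localhost hadoop]# bin/hadoop fs -rm -r /output/wordcount2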


6. View the results (note: WordCount splits words on whitespace, which is why "world,hello" appears as a single token)

[root@localhost hadoop]# bin/hdfs dfs -cat /output/wordcount2/*
hadoop	1
hello	2
spark	1
world,hello	2
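
If you want the comma treated as a separator, one workaround (a sketch, not part of the original walkthrough) is to normalize the files before uploading, for example with tr:

[root@localhost hadoop]# tr ',' ' ' < /home/sky/test/test1.txt
hello world hello hadoop

Re-uploading files cleaned this way and rerunning the job would give hello 4, world 2, hadoop 1, spark 1.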