Running the WordCount Example on Hadoop
2015-10-29 21:48
Hadoop ships hadoop-mapreduce-examples-2.6.0.jar in its share/hadoop/mapreduce directory, and this jar can run the WordCount job.
This walkthrough uses Hadoop 2.6.0; in other Hadoop versions the examples jar may sit under a different path.
The steps are as follows:
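If you are on a different version and are not sure where the jar lives, you can search for it. A minimal sketch, assuming the installation directory is /home/sky/hadoop as in this walkthrough, which here prints the path used below:
[root@localhost hadoop]# find /home/sky/hadoop -name "hadoop-mapreduce-examples-*.jar"   # adjust the path to your own install
/home/sky/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar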
Step 1: Create a test folder on the local disk; I created it under /home/sky.
[root@localhost sky]# cd /home/sky/
[root@localhost sky]# mkdir test
[root@localhost sky]# ls
Desktop    eclipse  Music     Public   spark      Videos
Documents  hadoop   mysql     pycharm  Templates  workspace
Downloads  hive     Pictures  scala    test
[root@localhost sky]#
Step 2: Create the two files to be counted, test1.txt and test2.txt.
[root@localhost sky]# cd test/
[root@localhost test]# echo "hello world,hello hadoop" > test1.txt
[root@localhost test]# echo "hello world,hello spark" > test2.txt
[root@localhost test]# cat test1.txt
hello world,hello hadoop
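As a quick local sanity check (my own addition, not in the original session), wc -c reports the file sizes; they should match the byte counts HDFS reports after the upload in step 4 (25 and 24 bytes) and the job's Bytes Read=49 counter in step 5:
[root@localhost test]# wc -c test*.txt   # local byte counts; compare with the HDFS listing in step 4
25 test1.txt
24 test2.txt
49 total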
Step 3: Create an /input directory on HDFS and check it.
[root@localhost test]# cd /home/sky/hadoop/
[root@localhost hadoop]# bin/hadoop fs -mkdir /input
[root@localhost hadoop]# bin/hadoop fs -ls /
Found 5 items
drwxr-xr-x   - root supergroup          0 2015-10-26 19:47 /home
drwxr-xr-x   - root supergroup          0 2015-10-29 21:56 /input
drwxr-xr-x   - root supergroup          0 2015-10-29 20:54 /output
drwx-wx-wx   - root supergroup          0 2015-10-29 20:57 /tmp
drwxr-xr-x   - root supergroup          0 2015-10-28 21:32 /user
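If /input already exists from an earlier run, the plain -mkdir above will report an error; Hadoop 2.x also accepts a -p flag that does not fail when the directory is already there. A hedged variant:
[root@localhost hadoop]# bin/hadoop fs -mkdir -p /input   # -p: do not fail if /input already exists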
Step 4: Upload the local test files into the newly created /input directory on HDFS.
[root@localhost hadoop]# bin/hadoop fs -put /home/sky/test/test*.txt /input
[root@localhost hadoop]# bin/hadoop fs -ls /input
Found 2 items
-rw-r--r-- 1 root supergroup 25 2015-10-29 22:02 /input/test1.txt
-rw-r--r-- 1 root supergroup 24 2015-10-29 22:02 /input/test2.txt
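Optionally, confirm that the contents arrived intact (an extra check that is not part of the original session):
[root@localhost hadoop]# bin/hadoop fs -cat /input/test1.txt
hello world,hello hadoop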
Step 5: Run the job and watch its progress.
[root@localhost hadoop]# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /input /output/wordcount2
15/10/29 22:07:43 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/10/29 22:07:46 INFO input.FileInputFormat: Total input paths to process : 2
15/10/29 22:07:46 INFO mapreduce.JobSubmitter: number of splits:2
15/10/29 22:07:47 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1445687376888_0005
15/10/29 22:07:48 INFO impl.YarnClientImpl: Submitted application application_1445687376888_0005
15/10/29 22:07:48 INFO mapreduce.Job: The url to track the job: http://localhost:8088/proxy/application_1445687376888_0005/
15/10/29 22:07:48 INFO mapreduce.Job: Running job: job_1445687376888_0005
15/10/29 22:08:03 INFO mapreduce.Job: Job job_1445687376888_0005 running in uber mode : false
15/10/29 22:08:03 INFO mapreduce.Job:  map 0% reduce 0%
15/10/29 22:08:23 INFO mapreduce.Job:  map 100% reduce 0%
15/10/29 22:08:34 INFO mapreduce.Job:  map 100% reduce 100%
15/10/29 22:08:35 INFO mapreduce.Job: Job job_1445687376888_0005 completed successfully
15/10/29 22:08:35 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=91
        FILE: Number of bytes written=316819
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=253
        HDFS: Number of bytes written=39
        HDFS: Number of read operations=9
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=2
        Launched reduce tasks=1
        Data-local map tasks=2
        Total time spent by all maps in occupied slots (ms)=34517
        Total time spent by all reduces in occupied slots (ms)=8834
        Total time spent by all map tasks (ms)=34517
        Total time spent by all reduce tasks (ms)=8834
        Total vcore-seconds taken by all map tasks=34517
        Total vcore-seconds taken by all reduce tasks=8834
        Total megabyte-seconds taken by all map tasks=35345408
        Total megabyte-seconds taken by all reduce tasks=9046016
    Map-Reduce Framework
        Map input records=2
        Map output records=6
        Map output bytes=73
        Map output materialized bytes=97
        Input split bytes=204
        Combine input records=6
        Combine output records=6
        Reduce input groups=4
        Reduce shuffle bytes=97
        Reduce input records=6
        Reduce output records=4
        Spilled Records=12
        Shuffled Maps =2
        Failed Shuffles=0
        Merged Map outputs=2
        GC time elapsed (ms)=645
        CPU time spent (ms)=3820
        Physical memory (bytes) snapshot=505044992
        Virtual memory (bytes) snapshot=6221623296
        Total committed heap usage (bytes)=355999744
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=49
    File Output Format Counters
        Bytes Written=39
[root@localhost hadoop]#
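One thing to keep in mind: MapReduce refuses to start if the output directory already exists, so to rerun the job you must either choose a fresh output path (as done here with /output/wordcount2) or remove the old one first, for example:
[root@localhost hadoop]# bin/hadoop fs -rm -r /output/wordcount2   # only if you intend to reuse this output path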
Step 6: View the result (note: words are split on spaces, which is why "world,hello" shows up as a single token).
[root@localhost hadoop]# bin/hdfs dfs -cat /output/wordcount2/*
hadoop	1
hello	2
spark	1
world,hello	2
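Because the example splits on whitespace only, the comma in "world,hello" stays inside the token. You can reproduce the same counts locally with standard shell tools (a quick cross-check, not part of the original run):
[root@localhost test]# cat test1.txt test2.txt | tr -s ' ' '\n' | sort | uniq -c   # split on spaces, count tokens
      1 hadoop
      2 hello
      1 spark
      2 world,hello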