Compiling and Running WordCount.java Step by Step
2013-10-12 15:58
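WordCount is the classic introductory example for learning Hadoop. The steps below compile, package, and run the WordCount program.
1. In the directory where Hadoop 1.0.4 was unpacked, the WordCount.java source file is at the following location:
src/examples/org/apache/hadoop/examples/WordCount.java
2. Create a folder named dev, and copy WordCount.java into dev/wordcount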
ubuntu@ubuntu:~/dev/wordcount$ pwd
/home/ubuntu/dev/wordcount
ubuntu@ubuntu:~/dev/wordcount$ ls
bin compile.txt WordCount.java
3. In dev/wordcount, create a bin folder, then compile WordCount.java so that the generated class files are written into bin
javac -classpath /home/ubuntu/hadoop-1.0.4/hadoop-core-1.0.4.jar:/home/ubuntu/hadoop-1.0.4/lib/commons-cli-1.2.jar -d bin WordCount.java
4. Package the generated class files into a jar
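jar -cvf WordCount.jar *.class
Run this command from inside bin. Note that *.class only matches if the class files sit directly in bin, which is the case when the "package org.apache.hadoop.examples;" line has been removed from the copied WordCount.java before compiling. If the package line is kept, javac -d bin writes the class files under bin/org/apache/hadoop/examples instead, so the jar has to be built from that directory tree and the main class in step 7 has to be given as org.apache.hadoop.examples.WordCount.
5. Under bin, create an input folder and create two input files in it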
ubuntu@ubuntu:~/dev/wordcount/bin/input$ ls
words-1.txt words-2.txt
ubuntu@ubuntu:~/dev/wordcount/bin/input$ cat words-1.txt
i am a student!
how are you?
my name is lily.
ubuntu@ubuntu:~/dev/wordcount/bin/input$ cat words-2.txt
i am a student!
how are you?
she is lily
he is my brother
ubuntu@ubuntu:~/dev/wordcount/bin/input$
6. On HDFS, create the input and output folders, and upload the two input files to the input folder
ubuntu@ubuntu:~/dev/wordcount/bin$ hadoop fs -mkdir /tmp/input
ubuntu@ubuntu:~/dev/wordcount/bin$ hadoop fs -mkdir /tmp/output
ubuntu@ubuntu:~/dev/wordcount/bin/input$ hadoop fs -put words-1.txt /tmp/input
ubuntu@ubuntu:~/dev/wordcount/bin/input$ hadoop fs -put words-2.txt /tmp/input
7. Run the WordCount program
ubuntu@ubuntu:~/dev/wordcount/bin$ hadoop jar WordCount.jar WordCount /tmp/input /tmp/output/result
13/01/24 08:09:37 INFO input.FileInputFormat: Total input paths to process : 2
13/01/24 08:09:38 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/01/24 08:09:38 WARN snappy.LoadSnappy: Snappy native library not loaded
13/01/24 08:09:38 INFO mapred.JobClient: Running job: job_201301240711_0003
13/01/24 08:09:39 INFO mapred.JobClient: map 0% reduce 0%
13/01/24 08:10:13 INFO mapred.JobClient: map 100% reduce 0%
13/01/24 08:10:34 INFO mapred.JobClient: map 100% reduce 100%
13/01/24 08:10:39 INFO mapred.JobClient: Job complete: job_201301240711_0003
13/01/24 08:10:39 INFO mapred.JobClient: Counters: 29
13/01/24 08:10:39 INFO mapred.JobClient: Job Counters
13/01/24 08:10:39 INFO mapred.JobClient: Launched reduce tasks=1
13/01/24 08:10:39 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=56253
13/01/24 08:10:39 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/01/24 08:10:39 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/01/24 08:10:39 INFO mapred.JobClient: Launched map tasks=2
13/01/24 08:10:39 INFO mapred.JobClient: Data-local map tasks=2
13/01/24 08:10:39 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=18108
13/01/24 08:10:39 INFO mapred.JobClient: File Output Format Counters
13/01/24 08:10:39 INFO mapred.JobClient: Bytes Written=96
13/01/24 08:10:39 INFO mapred.JobClient: FileSystemCounters
13/01/24 08:10:39 INFO mapred.JobClient: FILE_BYTES_READ=251
13/01/24 08:10:39 INFO mapred.JobClient: HDFS_BYTES_READ=320
13/01/24 08:10:39 INFO mapred.JobClient: FILE_BYTES_WRITTEN=65235
13/01/24 08:10:39 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=96
13/01/24 08:10:39 INFO mapred.JobClient: File Input Format Counters
13/01/24 08:10:39 INFO mapred.JobClient: Bytes Read=104
13/01/24 08:10:39 INFO mapred.JobClient: Map-Reduce Framework
13/01/24 08:10:39 INFO mapred.JobClient: Map output materialized bytes=257
13/01/24 08:10:39 INFO mapred.JobClient: Map input records=7
13/01/24 08:10:39 INFO mapred.JobClient: Reduce shuffle bytes=257
13/01/24 08:10:39 INFO mapred.JobClient: Spilled Records=48
13/01/24 08:10:39 INFO mapred.JobClient: Map output bytes=204
13/01/24 08:10:39 INFO mapred.JobClient: CPU time spent (ms)=7650
13/01/24 08:10:39 INFO mapred.JobClient: Total committed heap usage (bytes)=247275520
13/01/24 08:10:39 INFO mapred.JobClient: Combine input records=25
13/01/24 08:10:39 INFO mapred.JobClient: SPLIT_RAW_BYTES=216
13/01/24 08:10:39 INFO mapred.JobClient: Reduce input records=24
13/01/24 08:10:39 INFO mapred.JobClient: Reduce input groups=15
13/01/24 08:10:39 INFO mapred.JobClient: Combine output records=24
13/01/24 08:10:39 INFO mapred.JobClient: Physical memory (bytes) snapshot=301699072
13/01/24 08:10:39 INFO mapred.JobClient: Reduce output records=15
13/01/24 08:10:39 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1129721856
13/01/24 08:10:39 INFO mapred.JobClient: Map output records=25
ubuntu@ubuntu:~/dev/wordcount/bin$
8. View the results
ubuntu@ubuntu:~$ hadoop fs -ls /tmp/output/result
Found 3 items
-rw-r--r-- 1 ubuntu supergroup 0 2013-01-24 08:10 /tmp/output/result/_SUCCESS
drwxr-xr-x - ubuntu supergroup 0 2013-01-24 08:09 /tmp/output/result/_logs
-rw-r--r-- 1 ubuntu supergroup 96 2013-01-24 08:10 /tmp/output/result/part-r-00000
ubuntu@ubuntu:~$ hadoop fs -cat /tmp/output/result/part-r-00000
a	2
am	2
are	2
brother	1
he	1
how	2
i	2
is	3
lily	1
lily.	1
my	2
name	1
she	1
student!	2
you?	2
ubuntu@ubuntu:~$
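To sanity-check the result without a cluster, the counting logic can be approximated in plain Java. The sketch below is not the Hadoop example itself: the class name LocalWordCount and its count method are hypothetical, and it only imitates what the job does, namely splitting each line on whitespace with StringTokenizer (the map step) and summing the occurrences of each word (the combine and reduce steps). A TreeMap stands in for the sorted key order of the reducer output. Because punctuation is not stripped, "lily" and "lily." stay separate keys, just as in part-r-00000.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

/** Local, Hadoop-free sketch of the WordCount logic (hypothetical class name). */
public class LocalWordCount {

    /**
     * Counts whitespace-separated tokens across all lines.
     * Tokenizing imitates the map step, summing per word imitates
     * the reduce step, and the TreeMap keeps the keys sorted.
     */
    public static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer tokens = new StringTokenizer(line);
            while (tokens.hasMoreTokens()) {
                // Equivalent to emitting (word, 1) and adding it to the running sum.
                counts.merge(tokens.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            // contents of words-1.txt
            "i am a student!", "how are you?", "my name is lily.",
            // contents of words-2.txt
            "i am a student!", "how are you?", "she is lily", "he is my brother");
        // Print word<TAB>count, one pair per line, like the job's text output.
        for (Map.Entry<String, Integer> entry : count(lines).entrySet()) {
            System.out.println(entry.getKey() + "\t" + entry.getValue());
        }
    }
}
```

Run on the two sample files, this prints the same 15 word/count pairs as the job output above. The total of 25 tokens corresponds to the "Map output records=25" counter, and the 15 distinct keys correspond to "Reduce input groups=15".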