Getting started with Hadoop: a small WordCount example
2016-08-20 10:51
Contents:
1. Create the project
2. Create the project directory structure
3. Write the Java files
3.1 WCMapper.java
3.2 WCReducer.java
3.3 WordCount.java
4. Package the code into a jar
5. Upload the file 1.txt
6. Two ways to run the jar
6.1 Run the jar with hadoop jar (the way it is done in real work)
6.2 Another way to run the program
7. The WordCount example is complete
8. Some basic HDFS operations
List a file
View the contents of a file
Delete everything under a directory
Upload a local file to HDFS
1. Create the project

File -> New -> Other -> Map/Reduce -> Map/Reduce Project -> Next -> enter the project name -> Finish
2. Create the project directory structure
3. Write the Java files
3.1 WCMapper.java
package hadoop.example.wordcount;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WCMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Receive one line of input
        String line = value.toString();
        // Split the line into words
        String[] words = line.split(" ");
        // Emit a count of 1 for every word, wrapped in Hadoop writable types
        for (String w : words) {
            context.write(new Text(w), new LongWritable(1));
        }
    }
}
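The map method above splits on a single space, which is enough for the sample file 1.txt. A slightly more defensive variant (just a sketch, not part of the original code) splits on any run of whitespace and skips empty tokens; it is a drop-in replacement for the map method of WCMapper:

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // \\s+ matches any run of spaces or tabs, so "hello  tom" still yields two words
        for (String w : value.toString().split("\\s+")) {
            if (!w.isEmpty()) {
                // Same output as before: (word, 1)
                context.write(new Text(w), new LongWritable(1));
            }
        }
    }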
3.2 WCReducer.java
package hadoop.example.wordcount;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WCReducer extends Reducer<Text, LongWritable, Text, LongWritable> {

    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        // Define a counter for this word
        long counter = 0;
        // Loop over all the 1s emitted by the mappers for this key and sum them
        for (LongWritable l : values) {
            counter += l.get();
        }
        // Write out the word and its total count
        context.write(key, new LongWritable(counter));
    }
}
3.3 WordCount.java
package hadoop.example.wordcount;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    public static void main(String[] args)
            throws IOException, ClassNotFoundException, InterruptedException {
        // Build a Job object
        Job job = Job.getInstance(new Configuration());

        // Name a class inside our jar so Hadoop knows which jar to ship
        job.setJarByClass(WordCount.class);

        // Configure the Mapper
        job.setMapperClass(WCMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);
        FileInputFormat.setInputPaths(job, new Path("/root/workplace/hdfs/wdcount/1.txt"));

        // Configure the Reducer
        job.setReducerClass(WCReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileOutputFormat.setOutputPath(job, new Path("/root/workplace/hdfs/wdcount/output"));

        // Submit the job and print progress information until it finishes
        job.waitForCompletion(true);
    }
}
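The input and output paths above are hardcoded, and the job log later in section 6.1 warns "Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same." A possible variant (a sketch, not part of the original tutorial; the class name WordCountTool is made up here) takes the paths from the command line via ToolRunner and also registers WCReducer as a combiner so counts are pre-aggregated on the map side:

package hadoop.example.wordcount;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCountTool extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // args[0] = input path, args[1] = output path
        Job job = Job.getInstance(getConf());
        job.setJarByClass(WordCountTool.class);

        job.setMapperClass(WCMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);

        // Optional: reuse the reducer as a combiner, since it only sums counts
        job.setCombinerClass(WCReducer.class);

        job.setReducerClass(WCReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);

        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner parses generic options (-D, -files, ...) before calling run()
        System.exit(ToolRunner.run(new Configuration(), new WordCountTool(), args));
    }
}

A hypothetical invocation would then look like: hadoop jar wc.jar hadoop.example.wordcount.WordCountTool /root/workplace/hdfs/wdcount/1.txt /root/workplace/hdfs/wdcount/output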
4. Package the code into a jar

Project -> Export -> JAR file -> Next -> under "JAR file" choose the directory to export the jar to -> Next -> Next -> for "Main class" select the class whose main method this jar should run -> Finish
5. Upload the file 1.txt
Contents of 1.txt:

hello tom
hello jerry
hello kitty
hello world
hello tom

Upload the file:

[root@centos ~]# hadoop dfs -put /root/workplace/wdcount /root/workplace/hdfs
Warning: $HADOOP_HOME is deprecated.

List the file:

[root@centos ~]# hadoop dfs -ls /root/workplace/hdfs/wdcount/1.txt
Warning: $HADOOP_HOME is deprecated.
Found 1 items
-rw-r--r--   1 root supergroup         57 2016-08-20 09:06 /root/workplace/hdfs/wdcount/1.txt

View the file contents:

[root@centos ~]# hadoop dfs -cat /root/workplace/hdfs/wdcount/1.txt
Warning: $HADOOP_HOME is deprecated.
hello tom
hello jerry
hello kitty
hello world
hello tom
6. Two ways to run the jar
6.1 Run the jar with hadoop jar (the way it is done in real work)
[root@centos wdcount]# hadoop jar /root/workplace/wdcount/wc.jar
Warning: $HADOOP_HOME is deprecated.

16/08/20 09:20:34 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
16/08/20 09:20:35 INFO input.FileInputFormat: Total input paths to process : 1
16/08/20 09:20:35 INFO util.NativeCodeLoader: Loaded the native-hadoop library
16/08/20 09:20:35 WARN snappy.LoadSnappy: Snappy native library not loaded
16/08/20 09:20:35 INFO mapred.JobClient: Running job: job_201608192017_0008
16/08/20 09:20:36 INFO mapred.JobClient:  map 0% reduce 0%
16/08/20 09:20:43 INFO mapred.JobClient:  map 100% reduce 0%
16/08/20 09:20:52 INFO mapred.JobClient:  map 100% reduce 33%
16/08/20 09:20:54 INFO mapred.JobClient:  map 100% reduce 100%
16/08/20 09:20:56 INFO mapred.JobClient: Job complete: job_201608192017_0008
16/08/20 09:20:56 INFO mapred.JobClient: Counters: 29
16/08/20 09:20:56 INFO mapred.JobClient:   Job Counters
16/08/20 09:20:56 INFO mapred.JobClient:     Launched reduce tasks=1
16/08/20 09:20:56 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=8719
16/08/20 09:20:56 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
16/08/20 09:20:56 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
16/08/20 09:20:56 INFO mapred.JobClient:     Launched map tasks=1
16/08/20 09:20:56 INFO mapred.JobClient:     Data-local map tasks=1
16/08/20 09:20:56 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=10405
16/08/20 09:20:56 INFO mapred.JobClient:   File Output Format Counters
16/08/20 09:20:56 INFO mapred.JobClient:     Bytes Written=38
16/08/20 09:20:56 INFO mapred.JobClient:   FileSystemCounters
16/08/20 09:20:56 INFO mapred.JobClient:     FILE_BYTES_READ=162
16/08/20 09:20:56 INFO mapred.JobClient:     HDFS_BYTES_READ=177
16/08/20 09:20:56 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=110365
16/08/20 09:20:56 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=38
16/08/20 09:20:56 INFO mapred.JobClient:   File Input Format Counters
16/08/20 09:20:56 INFO mapred.JobClient:     Bytes Read=57
16/08/20 09:20:56 INFO mapred.JobClient:   Map-Reduce Framework
16/08/20 09:20:56 INFO mapred.JobClient:     Map output materialized bytes=162
16/08/20 09:20:56 INFO mapred.JobClient:     Map input records=5
16/08/20 09:20:56 INFO mapred.JobClient:     Reduce shuffle bytes=162
16/08/20 09:20:56 INFO mapred.JobClient:     Spilled Records=20
16/08/20 09:20:56 INFO mapred.JobClient:     Map output bytes=136
16/08/20 09:20:56 INFO mapred.JobClient:     Total committed heap usage (bytes)=158797824
16/08/20 09:20:56 INFO mapred.JobClient:     CPU time spent (ms)=2290
16/08/20 09:20:56 INFO mapred.JobClient:     Combine input records=0
16/08/20 09:20:56 INFO mapred.JobClient:     SPLIT_RAW_BYTES=120
16/08/20 09:20:56 INFO mapred.JobClient:     Reduce input records=10
16/08/20 09:20:56 INFO mapred.JobClient:     Reduce input groups=5
16/08/20 09:20:56 INFO mapred.JobClient:     Combine output records=0
16/08/20 09:20:56 INFO mapred.JobClient:     Physical memory (bytes) snapshot=263684096
16/08/20 09:20:56 INFO mapred.JobClient:     Reduce output records=5
16/08/20 09:20:56 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=3726540800
16/08/20 09:20:56 INFO mapred.JobClient:     Map output records=10

After the job finishes, check the result:

[root@centos wdcount]# hadoop dfs -ls /root/workplace/hdfs/wdcount/output
Warning: $HADOOP_HOME is deprecated.
Found 3 items
-rw-r--r--   1 root supergroup          0 2016-08-20 09:20 /root/workplace/hdfs/wdcount/output/_SUCCESS
drwxr-xr-x   - root supergroup          0 2016-08-20 09:20 /root/workplace/hdfs/wdcount/output/_logs
-rw-r--r--   1 root supergroup         38 2016-08-20 09:20 /root/workplace/hdfs/wdcount/output/part-r-00000

[root@centos wdcount]# hadoop dfs -cat /root/workplace/hdfs/wdcount/output/part-r-00000
Warning: $HADOOP_HOME is deprecated.
hello   5
jerry   1
kitty   1
tom     2
world   1
[root@centos wdcount]#
6.2 Another way to run the program

Instead of submitting the jar, the WordCount main class can be run directly (for example from the IDE). As the job id job_local288352385_0001 in the console output below shows, the job then executes in the LocalJobRunner.
Console output:

16/08/20 10:32:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/08/20 10:32:12 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
16/08/20 10:32:12 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
16/08/20 10:32:12 INFO input.FileInputFormat: Total input paths to process : 1
16/08/20 10:32:13 WARN snappy.LoadSnappy: Snappy native library not loaded
16/08/20 10:32:13 INFO mapred.JobClient: Running job: job_local288352385_0001
16/08/20 10:32:13 INFO mapred.LocalJobRunner: Waiting for map tasks
16/08/20 10:32:13 INFO mapred.LocalJobRunner: Starting task: attempt_local288352385_0001_m_000000_0
16/08/20 10:32:13 INFO util.ProcessTree: setsid exited with exit code 0
16/08/20 10:32:13 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@68fa8cf9
16/08/20 10:32:13 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/root/workplace/hdfs/wdcount/1.txt:0+57
16/08/20 10:32:14 INFO mapred.MapTask: io.sort.mb = 100
16/08/20 10:32:14 INFO mapred.MapTask: data buffer = 79691776/99614720
16/08/20 10:32:14 INFO mapred.MapTask: record buffer = 262144/327680
16/08/20 10:32:14 INFO mapred.MapTask: Starting flush of map output
16/08/20 10:32:14 INFO mapred.MapTask: Finished spill 0
16/08/20 10:32:14 INFO mapred.Task: Task:attempt_local288352385_0001_m_000000_0 is done. And is in the process of commiting
16/08/20 10:32:14 INFO mapred.LocalJobRunner:
16/08/20 10:32:14 INFO mapred.Task: Task 'attempt_local288352385_0001_m_000000_0' done.
16/08/20 10:32:14 INFO mapred.LocalJobRunner: Finishing task: attempt_local288352385_0001_m_000000_0
16/08/20 10:32:14 INFO mapred.LocalJobRunner: Map task executor complete.
16/08/20 10:32:14 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@384f27a1
16/08/20 10:32:14 INFO mapred.LocalJobRunner:
16/08/20 10:32:14 INFO mapred.Merger: Merging 1 sorted segments
16/08/20 10:32:14 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 158 bytes
16/08/20 10:32:14 INFO mapred.LocalJobRunner:
16/08/20 10:32:14 INFO mapred.Task: Task:attempt_local288352385_0001_r_000000_0 is done. And is in the process of commiting
16/08/20 10:32:14 INFO mapred.LocalJobRunner:
16/08/20 10:32:14 INFO mapred.Task: Task attempt_local288352385_0001_r_000000_0 is allowed to commit now
16/08/20 10:32:14 INFO output.FileOutputCommitter: Saved output of task 'attempt_local288352385_0001_r_000000_0' to hdfs://localhost:9000/root/workplace/hdfs/wdcount/output
16/08/20 10:32:14 INFO mapred.LocalJobRunner: reduce > reduce
16/08/20 10:32:14 INFO mapred.Task: Task 'attempt_local288352385_0001_r_000000_0' done.
16/08/20 10:32:14 INFO mapred.JobClient:  map 100% reduce 100%
16/08/20 10:32:14 INFO mapred.JobClient: Job complete: job_local288352385_0001
16/08/20 10:32:14 INFO mapred.JobClient: Counters: 22
16/08/20 10:32:14 INFO mapred.JobClient:   File Output Format Counters
16/08/20 10:32:14 INFO mapred.JobClient:     Bytes Written=38
16/08/20 10:32:14 INFO mapred.JobClient:   File Input Format Counters
16/08/20 10:32:14 INFO mapred.JobClient:     Bytes Read=57
16/08/20 10:32:14 INFO mapred.JobClient:   FileSystemCounters
16/08/20 10:32:14 INFO mapred.JobClient:     FILE_BYTES_READ=510
16/08/20 10:32:14 INFO mapred.JobClient:     HDFS_BYTES_READ=114
16/08/20 10:32:14 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=136042
16/08/20 10:32:14 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=38
16/08/20 10:32:14 INFO mapred.JobClient:   Map-Reduce Framework
16/08/20 10:32:14 INFO mapred.JobClient:     Reduce input groups=5
16/08/20 10:32:14 INFO mapred.JobClient:     Map output materialized bytes=162
16/08/20 10:32:14 INFO mapred.JobClient:     Combine output records=0
16/08/20 10:32:14 INFO mapred.JobClient:     Map input records=5
16/08/20 10:32:14 INFO mapred.JobClient:     Reduce shuffle bytes=0
16/08/20 10:32:14 INFO mapred.JobClient:     Physical memory (bytes) snapshot=0
16/08/20 10:32:14 INFO mapred.JobClient:     Reduce output records=5
16/08/20 10:32:14 INFO mapred.JobClient:     Spilled Records=20
16/08/20 10:32:14 INFO mapred.JobClient:     Map output bytes=136
16/08/20 10:32:14 INFO mapred.JobClient:     Total committed heap usage (bytes)=258482176
16/08/20 10:32:14 INFO mapred.JobClient:     CPU time spent (ms)=0
16/08/20 10:32:14 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=0
16/08/20 10:32:14 INFO mapred.JobClient:     SPLIT_RAW_BYTES=120
16/08/20 10:32:14 INFO mapred.JobClient:     Map output records=10
16/08/20 10:32:14 INFO mapred.JobClient:     Combine input records=0
16/08/20 10:32:14 INFO mapred.JobClient:     Reduce input records=10
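The job id job_local288352385_0001 and the mapred.LocalJobRunner messages show that this run executed the tasks in the submitting JVM while still reading the input from HDFS (hdfs://localhost:9000). One way to set that up explicitly in the driver, shown below as a sketch only (the property names are the Hadoop 1.x ones and the NameNode address is the one assumed throughout this tutorial), is to configure the Configuration before building the Job. This would be a drop-in replacement for the first line of WordCount.main:

        Configuration conf = new Configuration();
        // File system operations go to the HDFS instance used in this tutorial
        conf.set("fs.default.name", "hdfs://localhost:9000");
        // Run the map and reduce tasks locally in this JVM instead of submitting to a JobTracker
        conf.set("mapred.job.tracker", "local");
        Job job = Job.getInstance(conf);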
7. The WordCount example is complete
8. Some basic HDFS operations

Delete a directory:

[root@centos ~]# hadoop dfs -rmr /root/workplace/wdcount
Warning: $HADOOP_HOME is deprecated.
Deleted hdfs://localhost:9000/root/workplace/wdcount

List a file:

[root@centos ~]# hadoop dfs -ls /root/workplace/hdfs/wdcount/1.txt
Warning: $HADOOP_HOME is deprecated.
Found 1 items
-rw-r--r--   1 root supergroup         57 2016-08-20 09:06 /root/workplace/hdfs/wdcount/1.txt

View the contents of a file:

[root@centos ~]# hadoop dfs -cat /root/workplace/hdfs/wdcount/1.txt
Warning: $HADOOP_HOME is deprecated.
hello tom
hello jerry
hello kitty
hello world
hello tom

Delete everything under a directory:

[root@centos ~]# hadoop dfs -rm /root/workplace/wdcount/*
Warning: $HADOOP_HOME is deprecated.
Deleted hdfs://localhost:9000/root/workplace/wdcount/1.txt
Deleted hdfs://localhost:9000/root/workplace/wdcount/1.txt~
Deleted hdfs://localhost:9000/root/workplace/wdcount/wc.jar

Upload a local file to HDFS:

[root@centos ~]# hadoop dfs -put /root/workplace/wdcount/1.txt /root/workplace/hdfs/wdcount
Warning: $HADOOP_HOME is deprecated.
[root@centos ~]# hadoop dfs -ls /root/workplace/hdfs/wdcount
Warning: $HADOOP_HOME is deprecated.
Found 1 items
-rw-r--r--   1 root supergroup         57 2016-08-20 09:02 /root/workplace/hdfs/wdcount
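The same operations can also be done from Java through the HDFS FileSystem API. Below is a minimal sketch (the class name HdfsOps is made up here; it assumes the cluster configuration, i.e. core-site.xml pointing at hdfs://localhost:9000, is on the classpath):

package hadoop.example.wordcount;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsOps {

    public static void main(String[] args) throws Exception {
        // Picks up fs.default.name / fs.defaultFS from the configuration on the classpath
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Upload a local file (equivalent to: hadoop dfs -put ...)
        fs.copyFromLocalFile(new Path("/root/workplace/wdcount/1.txt"),
                             new Path("/root/workplace/hdfs/wdcount/1.txt"));

        // List a path (equivalent to: hadoop dfs -ls ...)
        for (FileStatus status : fs.listStatus(new Path("/root/workplace/hdfs/wdcount"))) {
            System.out.println(status.getPath() + "  " + status.getLen() + " bytes");
        }

        // Delete a directory recursively (equivalent to: hadoop dfs -rmr ...)
        fs.delete(new Path("/root/workplace/wdcount"), true);

        fs.close();
    }
}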