Big Data Offline Computing with Hadoop 2.x - Study Notes (3): HDFS Write Analysis and MapReduce
2019-01-14 09:28
Copyright notice: this is an original article by the author; reproduction without permission is prohibited. https://blog.csdn.net/u012292754/article/details/85599316
1 HDFS Write Analysis
```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TestHDFS {

    public void testWrite() throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("/test/a.txt");
        // create() opens an output stream to the NameNode-assigned pipeline:
        // the client streams packets to the first DataNode, which forwards
        // them to the remaining replicas
        FSDataOutputStream fout = fs.create(path);
        fout.write("Hello World".getBytes());
        fout.close();
    }
}
```
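During the write, HDFS splits the stream into fixed-size blocks (128 MB by default in Hadoop 2.x) and replicates each block (default factor 3). A minimal sketch of that arithmetic, assuming the default values:

```java
public class HdfsBlockMath {

    // Hadoop 2.x defaults; both are configurable (dfs.blocksize, dfs.replication)
    static final long BLOCK_SIZE = 128L * 1024 * 1024;
    static final int REPLICATION = 3;

    // number of blocks a file of the given size occupies (last block may be partial)
    static long blockCount(long fileSize) {
        return (fileSize + BLOCK_SIZE - 1) / BLOCK_SIZE;
    }

    // total bytes stored across the cluster, counting every replica
    static long physicalBytes(long fileSize) {
        return fileSize * REPLICATION;
    }

    public static void main(String[] args) {
        // "Hello World" is 11 bytes: one block, 33 bytes across the cluster
        System.out.println(blockCount(11));                 // 1
        System.out.println(physicalBytes(11));              // 33
        // a 300 MB file spans 3 blocks (128 + 128 + 44 MB)
        System.out.println(blockCount(300L * 1024 * 1024)); // 3
    }
}
```

Note that a partial last block only occupies its actual data size on disk, not the full 128 MB.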
2 MR
file:///d:/wc (the / before d: denotes the root of the local file system)
2.1 Word Count
WCMapper.java
```java
package mr;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

public class WCMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    // reused across calls to avoid allocating per record
    Text keyOut = new Text();
    IntWritable valueOut = new IntWritable();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // key is the byte offset of the line; value is the line itself
        String[] arr = value.toString().split(" ");
        for (String s : arr) {
            // emit (word, 1) for every token
            keyOut.set(s);
            valueOut.set(1);
            context.write(keyOut, valueOut);
        }
    }
}
```
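One caveat with `split(" ")`: consecutive spaces produce empty tokens, which this mapper would count as a "word". A quick check in plain Java (no Hadoop needed), suggesting `split("\\s+")` as a sturdier choice:

```java
import java.util.Arrays;

public class SplitDemo {

    public static void main(String[] args) {
        String line = "Hello  World"; // two spaces between the words

        // split(" ") keeps an empty token between the two spaces
        System.out.println(Arrays.toString(line.split(" ")));
        // → [Hello, , World]

        // split("\\s+") collapses runs of whitespace into one separator
        System.out.println(Arrays.toString(line.split("\\s+")));
        // → [Hello, World]
    }
}
```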
WCReducer.java
```java
package mr;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

public class WCReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // sum the 1s the mappers emitted for this word
        int count = 0;
        for (IntWritable iw : values) {
            count = count + iw.get();
        }
        context.write(key, new IntWritable(count));
    }
}
```
WCApp.java
```java
package mr;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class WCApp {

    public static void main(String[] args)
            throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);

        // args[0] = input path, args[1] = output path (must not already exist)
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setJobName("WCApp");
        job.setJarByClass(WCApp.class);

        job.setMapperClass(WCMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        job.setReducerClass(WCReducer.class);
        job.setNumReduceTasks(1);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // true = print job progress to the console; exit code reflects success
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```
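To see why the three classes fit together, the same map → shuffle/sort → reduce flow can be simulated in plain Java, with a TreeMap standing in for the framework's group-by-key step. This is a sketch of the data flow only; in the real job the phases run distributed on the cluster:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountSim {

    public static Map<String, Integer> wordCount(List<String> lines) {
        // shuffle/sort: the framework groups map output by key;
        // a sorted map reproduces that grouping locally
        Map<String, List<Integer>> grouped = new TreeMap<>();

        // map phase: emit one (word, 1) pair per token, as WCMapper does
        for (String line : lines) {
            for (String word : line.split(" ")) {
                grouped.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
            }
        }

        // reduce phase: sum the values for each key, as WCReducer does
        Map<String, Integer> counts = new TreeMap<>();
        grouped.forEach((word, ones) ->
                counts.put(word, ones.stream().mapToInt(Integer::intValue).sum()));
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(List.of("Hello World", "Hello Hadoop")));
        // {Hadoop=1, Hello=2, World=1}
    }
}
```

On a real cluster the job is submitted with something like `hadoop jar wc.jar mr.WCApp file:///d:/wc file:///d:/out` (jar name and output path here are placeholders); the output directory must not exist before the run.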