KeyValueTextInputFormat in MapReduce
2015-07-26 15:24
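If a line contains the separator, the text before the first separator becomes the key and the text after it becomes the value; if the line contains no separator, the whole line becomes the key and the value is empty.
KeyValueTextInputFormat is a good fit when every line of the input consists of two columns separated by a tab.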
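The splitting rule described above can be sketched in plain Java with no Hadoop dependency. This is only an illustration of the rule, not Hadoop's own implementation; the class name KeyValueSplitSketch and the method split are hypothetical names made up for this example.

```java
public class KeyValueSplitSketch {

    /**
     * Splits one input line at the FIRST occurrence of the separator.
     * Returns a two-element array: {key, value}.
     */
    public static String[] split(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos == -1) {
            // No separator: the whole line is the key, the value is empty.
            return new String[] {line, ""};
        }
        // Text before the separator is the key, everything after it is the value.
        return new String[] {line.substring(0, pos), line.substring(pos + 1)};
    }

    public static void main(String[] args) {
        String[] samples = {"hello\tworld", "a\tb\tc", "no-separator"};
        for (String line : samples) {
            String[] kv = split(line, '\t');
            System.out.println("key=[" + kv[0] + "] value=[" + kv[1] + "]");
        }
    }
}
```

Note that for a line with more than one tab, such as the second sample, only the first tab splits the line; the remaining tabs stay inside the value.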
Code example:
package com.bigdata.hadoop.mapred;

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.KeyValueLineRecordReader;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyKeyValueTextInputFormatApp {
    private static final String INPUT_PATH = "hdfs://hadoop1:9000/dir1/hello";
    private static final String OUTPUT_PATH = "hdfs://hadoop1:9000/dir1/out";

    public static void main(String[] args) throws Exception {
        Configuration configuration = new Configuration();
        // The separator defaults to \t; it is set explicitly here for clarity.
        configuration.set(KeyValueLineRecordReader.KEY_VALUE_SEPERATOR, "\t");

        // Note: this constructor is deprecated in Hadoop 2.x in favor of Job.getInstance(...).
        Job job = new Job(configuration, MyKeyValueTextInputFormatApp.class.getSimpleName());

        // Delete the output directory if it already exists, otherwise the job fails.
        final FileSystem fileSystem = FileSystem.get(new URI(OUTPUT_PATH), configuration);
        fileSystem.delete(new Path(OUTPUT_PATH), true);

        job.setJarByClass(MyKeyValueTextInputFormatApp.class);
        FileInputFormat.setInputPaths(job, INPUT_PATH);

        // Parse the input with KeyValueTextInputFormat;
        // the separator between key and value defaults to \t.
        job.setInputFormatClass(KeyValueTextInputFormat.class);

        job.setMapperClass(MyMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);

        // Map-only job: no reduce phase.
        job.setNumReduceTasks(0);
        FileOutputFormat.setOutputPath(job, new Path(OUTPUT_PATH));

        job.waitForCompletion(true);
    }

    // Simple output: emit the key and the value of each record, each with a count of 1.
    public static class MyMapper extends Mapper<Text, Text, Text, LongWritable> {
        @Override
        protected void map(Text key, Text value, Context context)
                throws IOException, InterruptedException {
            context.write(new Text(key), new LongWritable(1));
            context.write(new Text(value), new LongWritable(1));
        }
    }
}