Hadoop Source Code Notes: the MapReduce Input Path (InputFormat, InputSplit, RecordReader)
2016-09-14 17:35
MapReduce input path
abstract class InputSplit
abstract class InputFormat:
getSplits: splits the input (files, a database, or SequenceFiles) into a list of InputSplits
createRecordReader: returns the RecordReader for a single split out of that list
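The two-method contract can be sketched without Hadoop on the classpath. The mock below is an assumption-laden stand-in (the real `FileInputFormat.getSplits` also consults block locations, handles multiple files, and allows the last split to grow up to a 1.1x slop factor); here a split is just a `(start, length)` byte range:

```java
import java.util.ArrayList;
import java.util.List;

// Hadoop-free sketch of the InputFormat contract: getSplits() chops the
// input into byte ranges; createRecordReader() would then hand back a
// reader for one split. Split is a simplified stand-in, not the real
// org.apache.hadoop.mapreduce.lib.input.FileSplit.
public class SplitSketch {
    // a split is just (start offset, length) into one file
    static final class Split {
        final long start, length;
        Split(long start, long length) { this.start = start; this.length = length; }
    }

    // mirror of the FileInputFormat loop: emit splitSize-byte chunks,
    // with the final chunk taking whatever remains
    static List<Split> getSplits(long fileLength, long splitSize) {
        List<Split> splits = new ArrayList<>();
        long bytesRemaining = fileLength;
        while (bytesRemaining > 0) {
            long len = Math.min(splitSize, bytesRemaining);
            splits.add(new Split(fileLength - bytesRemaining, len));
            bytesRemaining -= len;
        }
        return splits;
    }

    public static void main(String[] args) {
        // a 300-unit file with a 128-unit split size -> 128, 128, 44
        for (Split s : getSplits(300L, 128L)) {
            System.out.println(s.start + "+" + s.length);
        }
    }
}
```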
FileInputFormat
TextInputFormat extends FileInputFormat
```java
// compute the size of a split
protected long computeSplitSize(long blockSize, long minSize, long maxSize) {
    return Math.max(minSize, Math.min(maxSize, blockSize));
}
```
createRecordReader:
```java
String delimiter = context.getConfiguration().get("textinputformat.record.delimiter");
byte[] recordDelimiterBytes = null;
if (null != delimiter)
    recordDelimiterBytes = delimiter.getBytes(Charsets.UTF_8);
return new LineRecordReader(recordDelimiterBytes);
```
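The clamp in `computeSplitSize` means the split size defaults to the HDFS block size, but `mapreduce.input.fileinputformat.split.minsize` / `.maxsize` can pull it in either direction. A standalone copy of the one-liner makes the three cases easy to check:

```java
public class SplitSizeDemo {
    // standalone copy of FileInputFormat.computeSplitSize:
    // clamp the block size between the configured min and max split sizes
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long block = 128L << 20;                                         // 128 MB HDFS block
        System.out.println(computeSplitSize(block, 1L, Long.MAX_VALUE)); // defaults -> block size
        System.out.println(computeSplitSize(block, 1L, 64L << 20));      // small maxSize shrinks splits
        System.out.println(computeSplitSize(block, 256L << 20, Long.MAX_VALUE)); // large minSize grows them
    }
}
```

Shrinking splits raises map-task parallelism on a few large files; growing them reduces task overhead when blocks are small.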
LineRecordReader:
```java
// We always read one extra line, which lies outside the upper
// split limit i.e. (end - 1)
while (getFilePosition() <= end || in.needAdditionalRecordAfterSplit()) {
```
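The "one extra line" rule pairs with the skip in `initialize()`: a split that does not start at offset 0 discards its first (possibly partial) line, because the previous split's reader already consumed it by reading past its own `end`. A toy simulation (my own simplification, operating on a String instead of a stream) shows how the two rules assign every line to exactly one split:

```java
import java.util.ArrayList;
import java.util.List;

// Simulates LineRecordReader's line-to-split assignment:
// - a split not starting at 0 skips its first line (the previous split owns it)
// - a split keeps reading while a line *starts* at or before `end`,
//   hence it reads one line that extends past the split boundary
public class SplitLineDemo {
    static List<String> readSplit(String data, long start, long end) {
        List<String> lines = new ArrayList<>();
        int pos = (int) start;
        if (start != 0) {
            // skip to the byte after the next newline, mirroring initialize()
            int nl = data.indexOf('\n', pos);
            pos = (nl < 0) ? data.length() : nl + 1;
        }
        while (pos <= end && pos < data.length()) {
            int nl = data.indexOf('\n', pos);
            if (nl < 0) nl = data.length();
            lines.add(data.substring(pos, nl));
            pos = nl + 1;
        }
        return lines;
    }

    public static void main(String[] args) {
        String data = "aa\nbb\ncc\n";
        System.out.println(readSplit(data, 0, 3)); // [aa, bb] -- "bb" crosses the boundary
        System.out.println(readSplit(data, 3, 9)); // [cc]     -- "bb" skipped, already read
    }
}
```

Note that even when a split boundary falls exactly on a line start, the line belongs to the previous split: its position satisfied `pos <= end`, and the next split unconditionally skips.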
LineReader:
```java
private int bufferSize = DEFAULT_BUFFER_SIZE;
private InputStream in;
private byte[] buffer;

// read one buffer at a time
private int readCustomLine(Text str, int maxLineLength, int maxBytesToConsume)
```
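The point of buffer-at-a-time reading is that a multi-byte custom delimiter can straddle two buffer fills, so the match position must survive a refill. The toy reader below is an assumption: it returns Strings instead of filling a `Text`, and uses a naive restart after a failed partial match (no KMP-style backtracking), but the refill mechanics mirror `readCustomLine`:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Simplified sketch of LineReader.readCustomLine(): scan for a multi-byte
// record delimiter while refilling a small fixed buffer, so a delimiter
// split across two fills is still matched.
public class CustomLineReader {
    private final InputStream in;
    private final byte[] delim;     // custom record delimiter, e.g. "||"
    private final byte[] buffer;    // stands in for LineReader's internal buffer
    private int bufferLength = 0, bufferPosn = 0;

    CustomLineReader(InputStream in, byte[] delim, int bufferSize) {
        this.in = in; this.delim = delim; this.buffer = new byte[bufferSize];
    }

    // Returns the next record without its delimiter, or null at EOF.
    String readLine() throws IOException {
        StringBuilder sb = new StringBuilder();
        int matched = 0;            // delimiter bytes matched so far (survives refills)
        boolean sawAny = false;
        while (true) {
            if (bufferPosn >= bufferLength) {   // buffer exhausted: refill
                bufferLength = in.read(buffer);
                bufferPosn = 0;
                if (bufferLength <= 0) {        // EOF: flush any half-matched delimiter
                    for (int i = 0; i < matched; i++) sb.append((char) delim[i]);
                    return (sawAny || matched > 0) ? sb.toString() : null;
                }
            }
            sawAny = true;
            byte b = buffer[bufferPosn++];
            if (b == delim[matched]) {
                if (++matched == delim.length) return sb.toString();  // full delimiter
            } else {
                // false start: emit the partially matched delimiter bytes
                for (int i = 0; i < matched; i++) sb.append((char) delim[i]);
                matched = (b == delim[0]) ? 1 : 0;
                if (matched == 0) sb.append((char) b);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // tiny 4-byte buffer forces the "||" delimiter to straddle a refill
        CustomLineReader r = new CustomLineReader(
                new ByteArrayInputStream("a|b||c||".getBytes()), "||".getBytes(), 4);
        System.out.println(r.readLine()); // a|b
        System.out.println(r.readLine()); // c
        System.out.println(r.readLine()); // null
    }
}
```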