
Hadoop source code study: the MapReduce input flow (InputFormat, InputSplit, RecordReader)

2016-09-14 17:35

MapReduce input flow



abstract class InputSplit

abstract class InputFormat:

getSplits: splits the input (files, database tables, SequenceFiles) into logical InputSplits
createRecordReader: returns the RecordReader for one of those splits


FileInputFormat

TextInputFormat extends FileInputFormat

// Compute the split size
protected long computeSplitSize(long blockSize, long minSize,
                                long maxSize) {
  return Math.max(minSize, Math.min(maxSize, blockSize));
}
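To see what this formula yields in practice, here is a small standalone sketch in plain Java with no Hadoop dependency. The class name SplitSizeDemo and the sample sizes (a 128 MB block, 64 MB and 256 MB limits) are assumptions chosen for illustration; only the one-line formula is taken from the excerpt.

```java
class SplitSizeDemo {
    // Same formula as the computeSplitSize excerpt from FileInputFormat.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long blockSize = 128 * mb; // assumed block size for this example

        // Tiny minSize, huge maxSize: the split size equals the block size.
        System.out.println(computeSplitSize(blockSize, 1L, Long.MAX_VALUE) / mb); // 128
        // maxSize below the block size: splits shrink to maxSize.
        System.out.println(computeSplitSize(blockSize, 1L, 64 * mb) / mb); // 64
        // minSize above the block size: splits grow to minSize.
        System.out.println(computeSplitSize(blockSize, 256 * mb, Long.MAX_VALUE) / mb); // 256
    }
}
```

In other words, the block size is used unless it falls outside the range [minSize, maxSize], and minSize wins if the two limits conflict.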

createRecordReader:

String delimiter = context.getConfiguration().get(
    "textinputformat.record.delimiter");
byte[] recordDelimiterBytes = null;
if (null != delimiter)
  recordDelimiterBytes = delimiter.getBytes(Charsets.UTF_8);
return new LineRecordReader(recordDelimiterBytes);
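
The delimiter lookup can be tried out without a cluster. The following is a minimal sketch in which a plain Map stands in for Hadoop's Configuration (an assumption made so the example runs on the JDK alone); the class and method names are hypothetical, and only the property name comes from the excerpt.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

class DelimiterDemo {
    // Property name taken from the createRecordReader excerpt.
    static final String KEY = "textinputformat.record.delimiter";

    // A Map stands in for Hadoop's Configuration (assumption for this sketch).
    static byte[] delimiterBytes(Map<String, String> conf) {
        String delimiter = conf.get(KEY);
        // null means "no custom delimiter": the reader then uses its default line handling.
        return delimiter == null ? null : delimiter.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(delimiterBytes(conf) == null); // true

        conf.put(KEY, "||");
        System.out.println(delimiterBytes(conf).length); // 2
    }
}
```

In a real job the equivalent step would be calling set("textinputformat.record.delimiter", ...) on the job's Configuration before submission.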


LineRecordReader:

// We always read one extra line, which lies outside the upper
// split limit i.e. (end - 1)
while (getFilePosition() <= end || in.needAdditionalRecordAfterSplit()) {
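
Why the loop uses "<=" and reads one line past the split end can be shown with a small simulation. This is a simplified sketch, not the Hadoop implementation: it works on an in-memory byte array, handles only "\n" line endings, and ignores the needAdditionalRecordAfterSplit() case. The class and method names are hypothetical. The rule it models is that every reader except the first skips its first (possibly partial) line, so the previous reader must read on past its own end to pick that line up.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class SplitBoundaryDemo {
    // Index just past the next '\n' at or after pos, or data.length if there is none.
    static int endOfLine(byte[] data, int pos) {
        while (pos < data.length) {
            if (data[pos++] == '\n') {
                return pos;
            }
        }
        return data.length;
    }

    // Records emitted by the reader responsible for the byte range [start, end).
    static List<String> readSplit(byte[] data, int start, int end) {
        List<String> records = new ArrayList<>();
        int pos = start;
        if (start != 0) {
            // Not the first split: skip the first line, the previous reader emits it.
            pos = endOfLine(data, pos);
        }
        // "<=": a line starting exactly at 'end' is read here, because the
        // next reader will skip its first line unconditionally.
        while (pos <= end && pos < data.length) {
            int next = endOfLine(data, pos);
            int textEnd = (data[next - 1] == '\n') ? next - 1 : next;
            records.add(new String(data, pos, textEnd - pos, StandardCharsets.UTF_8));
            pos = next;
        }
        return records;
    }

    // Cuts data into fixed-size splits and concatenates what each reader emits.
    static List<String> readAll(byte[] data, int splitSize) {
        List<String> all = new ArrayList<>();
        for (int start = 0; start < data.length; start += splitSize) {
            int end = Math.min(start + splitSize, data.length);
            all.addAll(readSplit(data, start, end));
        }
        return all;
    }

    public static void main(String[] args) {
        byte[] data = "aa\nbbbb\ncc\ndddd\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(readSplit(data, 0, 8));  // [aa, bbbb, cc]
        System.out.println(readSplit(data, 8, 16)); // [dddd]
        System.out.println(readAll(data, 6));       // [aa, bbbb, cc, dddd]
    }
}
```

In the first call the line "cc" starts exactly at offset 8, the end of the first split. The first reader still emits it, and the second reader skips it, so each line is produced exactly once regardless of where the split boundaries fall.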


LineReader:

private int bufferSize = DEFAULT_BUFFER_SIZE;
private InputStream in;
private byte[] buffer;
private int readCustomLine(Text str, int maxLineLength, int maxBytesToConsume)
// reads the input one buffer at a time
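
The buffer-at-a-time idea can be sketched as follows. This is an illustrative stand-in, not Hadoop's readCustomLine: it returns a String instead of filling a Text and returning a byte count, it has no maxLineLength or maxBytesToConsume limits, and it detects the delimiter by checking the tail of the record collected so far, which is a simpler technique than the one in the real class. All names are hypothetical, and the delimiter is assumed to be non-empty.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BufferedDelimiterReader {
    private final InputStream in;
    private final byte[] buffer;
    private final byte[] delimiter;
    private int bufferLength = 0; // number of valid bytes currently in buffer
    private int bufferPosn = 0;   // next unread position in buffer

    BufferedDelimiterReader(InputStream in, int bufferSize, byte[] delimiter) {
        this.in = in;
        this.buffer = new byte[bufferSize];
        this.delimiter = delimiter;
    }

    // Returns the next record without its delimiter, or null at end of stream.
    String readRecord() throws IOException {
        byte[] record = new byte[64];
        int length = 0;
        while (true) {
            if (bufferPosn >= bufferLength) {
                // Buffer exhausted: refill it from the underlying stream.
                bufferLength = in.read(buffer);
                bufferPosn = 0;
                if (bufferLength <= 0) {
                    bufferLength = 0;
                    return length == 0 ? null : new String(record, 0, length, StandardCharsets.UTF_8);
                }
            }
            if (length == record.length) {
                record = Arrays.copyOf(record, length * 2);
            }
            record[length++] = buffer[bufferPosn++];
            if (endsWith(record, length, delimiter)) {
                return new String(record, 0, length - delimiter.length, StandardCharsets.UTF_8);
            }
        }
    }

    // True if the first 'length' bytes of data end with suffix.
    static boolean endsWith(byte[] data, int length, byte[] suffix) {
        if (length < suffix.length) {
            return false;
        }
        for (int i = 0; i < suffix.length; i++) {
            if (data[length - suffix.length + i] != suffix[i]) {
                return false;
            }
        }
        return true;
    }

    // Convenience helper: reads every record from an in-memory string.
    static List<String> readAll(String input, int bufferSize, String delimiter) {
        BufferedDelimiterReader reader = new BufferedDelimiterReader(
                new ByteArrayInputStream(input.getBytes(StandardCharsets.UTF_8)),
                bufferSize, delimiter.getBytes(StandardCharsets.UTF_8));
        List<String> records = new ArrayList<>();
        try {
            String rec;
            while ((rec = reader.readRecord()) != null) {
                records.add(rec);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    public static void main(String[] args) {
        // A 4-byte buffer forces the second "||" to straddle two buffer refills.
        System.out.println(readAll("alpha||beta||gamma", 4, "||")); // [alpha, beta, gamma]
    }
}
```

The point of the small buffer in the usage line is that a delimiter may be split across two reads, so the match state has to survive a refill. Here that state lives in the record being accumulated.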