您的位置:首页 > 大数据 > Hadoop

[hadoop2.7.1]I/O之SequenceFile最新API编程实例(写入、读取)

2020-04-22 01:41 676 查看


写操作


根据上一篇的介绍,在hadoop2.x之后,hadoop中的SequenceFile.Writer将会逐渐摒弃大量的createWriter()重载方法,而整合为更为简洁的

createWriter()
方法,除了配置参数外,其他的参数统统使用SequenceFile.Writer.Option来替代,具体有:


新的API里提供的option参数:


FileOption
FileSystemOption
StreamOption
BufferSizeOption
BlockSizeOption
ReplicationOption
KeyClassOption
ValueClassOption
MetadataOption
ProgressableOption
CompressionOption

这些参数能够满足各种不同的需要,参数之间不存在顺序关系,这样减少了代码编写工作量,更为直观,便于理解,下面先来看看这个方法,后边将给出一个具体实例。

  • createWriter

    public static org.apache.hadoop.io.SequenceFile.Writer createWriter(Configuration conf,
    org.apache.hadoop.io.SequenceFile.Writer.Option... opts)
    throws IOException
    Create a new Writer with the given options.
    Parameters:
    conf
    - the configuration to use
    opts
    - the options to create the file with
    Returns:
    a new Writer
    Throws:
    IOException

权威指南第四版中提供了一个SequenceFileWriteDemo实例:

// cc SequenceFileWriteDemo Writing a SequenceFile
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// vv SequenceFileWriteDemo
public class SequenceFileWriteDemo {

private static final String[] DATA = {
"One, two, buckle my shoe",
"Three, four, shut the door",
"Five, six, pick up sticks",
"Seven, eight, lay them straight",
"Nine, ten, a big fat hen"
};

public static void main(String[] args) throws IOException {
String uri = args[0];
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(URI.create(uri), conf);
Path path = new Path(uri);

IntWritable key = new IntWritable();
Text value = new Text();
SequenceFile.Writer writer = null;
try {
writer = SequenceFile.createWriter(fs, conf, path,
key.getClass(), value.getClass());

for (int i = 0; i < 100; i++) {
key.set(100 - i);
value.set(DATA[i % DATA.length]);
System.out.printf("[%s]\t%s\t%s\n", writer.getLength(), key, value);
writer.append(key, value);
}
} finally {
IOUtils.closeStream(writer);
}
}
}
// ^^ SequenceFileWriteDemo
[/code]

对于上面实例中的
createWriter()
方法用整合之后的最新的方法来改写一下,代码如下:

package org.apache.hadoop.io;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.SequenceFile.Writer;
import org.apache.hadoop.io.SequenceFile.Writer.FileOption;
import org.apache.hadoop.io.SequenceFile.Writer.KeyClassOption;
import org.apache.hadoop.io.SequenceFile.Writer.ValueClassOption;
import org.apache.hadoop.io.Text;

public class THT_testSequenceFile2 {

private static final String[] DATA = { "One, two, buckle my shoe",
"Three, four, shut the door", "Five, six, pick up sticks",
"Seven, eight, lay them straight", "Nine, ten, a big fat hen" };

public static void main(String[] args) throws IOException {
// String uri = args[0];
String uri = "file:///D://B.txt";
Configuration conf = new Configuration();
Path path = new Path(uri);

IntWritable key = new IntWritable();
Text value = new Text();
SequenceFile.Writer writer = null;
SequenceFile.Writer.FileOption option1 = (FileOption) Writer.file(path);
SequenceFile.Writer.KeyClassOption option2 = (KeyClassOption) Writer.keyClass(key.getClass());
SequenceFile.Writer.ValueClassOption option3 = (ValueClassOption) Writer.valueClass(value.getClass());

try {

writer = SequenceFile.createWriter( conf, option1,option2,option3,Writer.compression(CompressionType.RECORD));

for (int i = 0; i < 10; i++) {
key.set(1 + i);
value.set(DATA[i % DATA.length]);
System.out.printf("[%s]\t%s\t%s\n", writer.getLength(), key,
value);
writer.append(key, value);
}
} finally {
IOUtils.closeStream(writer);
}
}
}
[/code]

运行结果如下:


2015-11-06 22:15:05,027 INFO  compress.CodecPool (CodecPool.java:getCompressor(153)) - Got brand-new compressor [.deflate]
[128]	1	One, two, buckle my shoe
[173]	2	Three, four, shut the door
[220]	3	Five, six, pick up sticks
[264]	4	Seven, eight, lay them straight
[314]	5	Nine, ten, a big fat hen
[359]	6	One, two, buckle my shoe
[404]	7	Three, four, shut the door
[451]	8	Five, six, pick up sticks
[495]	9	Seven, eight, lay them straight
[545]	10	Nine, ten, a big fat hen
[/code]

生成的文件:




读操作


新的API里提供的option参数:


FileOption -表示读哪个文件
InputStreamOption
StartOption
LengthOption -按照设置的长度变量来决定读取的字节
BufferSizeOption
OnlyHeaderOption


根据最新的API直接上源码:


package org.apache.hadoop.io;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.SequenceFile.Reader;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

public class THT_testSequenceFile3 {

public static void main(String[] args) throws IOException {
//String uri = args[0];
String uri = "file:///D://B.txt";
Configuration conf = new Configuration();
Path path = new Path(uri);
SequenceFile.Reader.Option option1 = Reader.file(path);
SequenceFile.Reader.Option option2 = Reader.length(174);//这个参数表示读取的长度

SequenceFile.Reader reader = null;
try {
reader = new SequenceFile.Reader(conf,option1,option2);
Writable key = (Writable) ReflectionUtils.newInstance(
reader.getKeyClass(), conf);
Writable value = (Writable) ReflectionUtils.newInstance(
reader.getValueClass(), conf);
long position = reader.getPosition();
while (reader.next(key, value)) {
String syncSeen = reader.syncSeen() ? "*" : "";
System.out.printf("[%s%s]\t%s\t%s\n", position, syncSeen, key,
value);
position = reader.getPosition(); // beginning of next record
}
} finally {
IOUtils.closeStream(reader);
}
}
}
[/code]

我这儿设置了一个读取长度的参数,只读到第174个字节那,所以运行结果如下:


2015-11-06 22:53:00,602 INFO  compress.CodecPool (CodecPool.java:getDecompressor(181)) - Got brand-new decompressor [.deflate]
[128]	1	One, two, buckle my shoe
[173]	2	Three, four, shut the door
[/code]






转载于:https://my.oschina.net/tht/blog/658519

  • 点赞
  • 收藏
  • 分享
  • 文章举报
chenxia5645 发布了0 篇原创文章 · 获赞 0 · 访问量 60 私信 关注
内容来自用户分享和网络整理,不保证内容的准确性,如有侵权内容,可联系管理员处理 点击这里给我发消息
标签: