
Building an Inverted Index in Parallel with MapReduce

2015-10-19 11:25

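MapReduce can be used to build an inverted index in parallel. Suppose the input is a set of text files and the output is a list of tuples, where each tuple consists of a word and the list of files that contain that word. The conventional approach has to join all of this data together, and it performs that join in memory. With a large amount of data, however, the join can exhaust the available memory. An intermediate store such as a database can hold the data instead, but that makes the job much slower.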
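To make the memory problem concrete, here is a minimal sketch of the conventional in-memory approach, assuming the file contents are already loaded as strings. The class `InMemoryIndex` and its `build` method are hypothetical names for illustration. Every distinct word and every (word, file) posting lives on the heap at the same time, which is exactly what stops scaling once the vocabulary and number of files grow large.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration of the naive, single-machine approach:
// the whole word -> files mapping is built in memory.
public class InMemoryIndex {

    // files: file name -> file contents (assumed to fit in memory)
    static Map<String, Set<String>> build(Map<String, String> files) {
        Map<String, Set<String>> index = new TreeMap<>();
        for (Map.Entry<String, String> file : files.entrySet()) {
            // Split on whitespace; skip the empty token produced by leading spaces
            for (String token : file.getValue().split("\\s+")) {
                if (token.isEmpty()) {
                    continue;
                }
                // Every posting is held on the heap until the whole input is processed
                index.computeIfAbsent(token, k -> new TreeSet<>()).add(file.getKey());
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, String> files = new TreeMap<>();
        files.put("a.txt", "cat sat on the mat");
        files.put("b.txt", "the cat ran");
        System.out.println(build(files));
    }
}
```

Running `main` prints `{cat=[a.txt, b.txt], mat=[a.txt], on=[a.txt], ran=[b.txt], sat=[a.txt], the=[a.txt, b.txt]}`. The result is correct, but memory use grows with the total size of the index rather than with the size of a single line or group.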

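A better approach is to tokenize each line and write intermediate files that contain one word per line, then sort these intermediate files, and finally open all of the sorted intermediate files and call a function once for each distinct word. This is exactly the approach MapReduce takes: the mapper emits (word, document) pairs, the framework sorts and groups them by word, and the reducer is called once per word. The code is as follows: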

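// Imports assumed: Hadoop's new (org.apache.hadoop.mapreduce) API and
// Apache Commons Lang 3 for StringUtils (whitespace split, join).
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

import org.apache.commons.lang3.StringUtils;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

// Hypothetical enclosing class; the original only shows the two nested classes.
public class InvertedIndex {

    // Mapper: emits (word, documentID) for every token in every input line
    public static class Map extends Mapper<LongWritable, Text, Text, Text> {

        private Text documentID;
        private final Text word = new Text();

        @Override
        protected void setup(Context context) {
            // Use the name of the file this split belongs to as the document ID
            String filename = ((FileSplit) context.getInputSplit()).getPath().getName();
            documentID = new Text(filename);
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Split the line on whitespace and emit each token with its document ID
            for (String token : StringUtils.split(value.toString())) {
                word.set(token);
                context.write(word, documentID);
            }
        }
    }

    // Reducer: called once per word with all document IDs emitted for it
    public static class Reduce extends Reducer<Text, Text, Text, Text> {

        private final Text docIds = new Text();

        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            Set<String> uniqueDocIds = new HashSet<>();
            for (Text docId : values) {
                // Hadoop reuses the same Text instance while iterating,
                // so store a copy of its contents, not the reference
                uniqueDocIds.add(docId.toString());
            }
            // Output: word -> comma-separated list of distinct documents
            docIds.set(StringUtils.join(uniqueDocIds, ","));
            context.write(key, docIds);
        }
    }
}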
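The map, sort, and per-word reduce flow described above can be simulated in a single process to see what each phase produces, without a Hadoop cluster. This is a minimal sketch, not Hadoop's actual shuffle: the class `LocalInvertedIndex` and its methods are hypothetical, the "intermediate files" are replaced by an in-memory list of pairs, and a `TreeSet` is used instead of the reducer's `HashSet` only so the output order is deterministic.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical single-process simulation of map -> sort -> reduce.
public class LocalInvertedIndex {

    // "Map" phase: one (word, docId) pair per token, like the Mapper above
    static List<Map.Entry<String, String>> mapPhase(Map<String, List<String>> docs) {
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        for (Map.Entry<String, List<String>> doc : docs.entrySet()) {
            for (String line : doc.getValue()) {
                for (String token : line.trim().split("\\s+")) {
                    if (!token.isEmpty()) {
                        pairs.add(new SimpleEntry<>(token, doc.getKey()));
                    }
                }
            }
        }
        return pairs;
    }

    static Map<String, String> run(Map<String, List<String>> docs) {
        List<Map.Entry<String, String>> pairs = mapPhase(docs);
        // "Sort" phase: order pairs by word so equal words become adjacent
        pairs.sort(Map.Entry.comparingByKey());
        // "Reduce" phase: handle each run of equal words as one group
        Map<String, String> output = new LinkedHashMap<>();
        int i = 0;
        while (i < pairs.size()) {
            String word = pairs.get(i).getKey();
            TreeSet<String> uniqueDocIds = new TreeSet<>();
            while (i < pairs.size() && pairs.get(i).getKey().equals(word)) {
                uniqueDocIds.add(pairs.get(i).getValue());
                i++;
            }
            output.put(word, String.join(",", uniqueDocIds));
        }
        return output;
    }

    public static void main(String[] args) {
        Map<String, List<String>> docs = new LinkedHashMap<>();
        docs.put("a.txt", List.of("the cat sat", "the mat"));
        docs.put("b.txt", List.of("a cat ran"));
        run(docs).forEach((word, ids) -> System.out.println(word + "\t" + ids));
    }
}
```

Running `main` prints one tab-separated line per word in sorted order, for example `cat	a.txt,b.txt`; `the` maps to just `a.txt` even though it occurs twice in that file, which is the deduplication the reducer's set performs. Only one group is held in memory at a time during the reduce loop, which is the property that lets MapReduce avoid the in-memory join.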
