
SpatialHadoop 2.3 Source Code Reading (8): RTree Index Generation (2)

2015-12-22 15:04
This chapter covers the concrete MapReduce implementation.

1. Map

/**
 * The map class maps each object to the cell with maximum overlap.
 * @author Ahmed Eldawy
 *
 */
public static class RepartitionMapNoReplication<T extends Shape> extends MapReduceBase
    implements Mapper<Rectangle, T, IntWritable, T> {
  /**List of cells used by the mapper*/
  private CellInfo[] cellInfos;

  /**Used to output intermediate records*/
  private IntWritable cellId = new IntWritable();

  @Override
  public void configure(JobConf job) {
    try {
      cellInfos = SpatialSite.getCells(job);
      super.configure(job);
    } catch (IOException e) {
      throw new RuntimeException("Error loading cells", e);
    }
  }

  /**
   * Map function
   * @param dummy
   * @param shape
   * @param output
   * @param reporter
   * @throws IOException
   */
  public void map(Rectangle cellMbr, T shape,
      OutputCollector<IntWritable, T> output, Reporter reporter)
      throws IOException {
    Rectangle shape_mbr = shape.getMBR();
    if (shape_mbr == null)
      return;
    double maxOverlap = -1.0;
    int bestCell = -1;
    // Only send shape to output if its lowest corner lies in the cellMBR
    // This ensures that a replicated shape in an already partitioned file
    // doesn't get send to output from all partitions
    if (!cellMbr.isValid() || cellMbr.contains(shape_mbr.x1, shape_mbr.y1)) {
      for (int cellIndex = 0; cellIndex < cellInfos.length; cellIndex++) {
        Rectangle overlap = cellInfos[cellIndex].getIntersection(shape_mbr);
        if (overlap != null) {
          double overlapArea = overlap.getWidth() * overlap.getHeight();
          if (bestCell == -1 || overlapArea > maxOverlap) {
            maxOverlap = overlapArea;
            bestCell = cellIndex;
          }
        }
      }
    }
    if (bestCell != -1) {
      cellId.set((int) cellInfos[bestCell].cellId);
      output.collect(cellId, shape);
    } else {
      LOG.warn("Shape: "+shape+" doesn't overlap any partitions");
    }
  }
}
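The Map class can be roughly divided into two parts: the configure method and the map method.

The main job of the configure method is to obtain the CellInfo array described in the previous chapter.

The rest of this section focuses on the map method. The line numbers below count from the first line of the listing above.

Line 35 (Rectangle shape_mbr = shape.getMBR();): gets the minimum bounding rectangle of the input shape.

Line 43 (the if statement): unlike in SpatialHadoop 2.3 Source Code Reading (6): Grid Index Generation (2), cellMbr here has the value Rectangle: (NaN,0.0)-(0.0,0.0), which is not a valid rectangle, so the first half of the if condition (!cellMbr.isValid()) is always true.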
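
The short-circuit behaviour of that guard can be illustrated with a small standalone sketch. It does not use the SpatialHadoop classes: InvalidMbrDemo, isValid and passesGuard are hypothetical names, a rectangle is represented as a plain double[] {x1, y1, x2, y2}, and the rule that a rectangle is invalid when its x1 is NaN is an assumption made for this illustration, not something confirmed by the listing above.

```java
// Standalone sketch; not the SpatialHadoop Rectangle class.
class InvalidMbrDemo {
    /** Assumption for this sketch: a rectangle is invalid when x1 is NaN. */
    static boolean isValid(double[] mbr) {
        return !Double.isNaN(mbr[0]);
    }

    /** Point-in-rectangle test on {x1, y1, x2, y2}, upper edges exclusive. */
    static boolean contains(double[] mbr, double x, double y) {
        return x >= mbr[0] && x < mbr[2] && y >= mbr[1] && y < mbr[3];
    }

    /** Same shape as the guard in map(): !cellMbr.isValid() || cellMbr.contains(x, y). */
    static boolean passesGuard(double[] cellMbr, double x, double y) {
        return !isValid(cellMbr) || contains(cellMbr, x, y);
    }

    public static void main(String[] args) {
        double[] invalid = {Double.NaN, 0.0, 0.0, 0.0}; // the value seen in this job
        double[] valid = {0.0, 0.0, 10.0, 10.0};        // an already partitioned input
        System.out.println(passesGuard(invalid, 5.0, 7.0));
        System.out.println(passesGuard(valid, 5.0, 7.0));
        System.out.println(passesGuard(valid, 20.0, 20.0));
    }
}
```

With an invalid cellMbr the containment test is never decisive, so every shape proceeds to the cell search. Only a valid cellMbr, as produced by an already partitioned input, can filter a shape out.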

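Lines 44-51: iterate over all cells and find the one whose intersection with the input shape has the largest area; the shape is assigned to that cell.

Line 45: gets the rectangle where the current cell and the input shape intersect, or null if they do not intersect.

Lines 47-50: if they intersect, compare the overlap area with the largest one found so far and update the best cell when the new area is larger.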
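
The selection loop described above can be tried out in isolation with the following sketch. Rect is a simplified stand-in for the Rectangle and CellInfo classes, and MaxOverlapDemo and bestCell are hypothetical names. The intersection test used here (strict inequalities that still accept a zero-area point) is an assumption of this sketch and may differ from the real getIntersection.

```java
// Standalone sketch of max-overlap cell selection; not the SpatialHadoop classes.
class MaxOverlapDemo {
    /** Axis-aligned rectangle (x1, y1)-(x2, y2). */
    static final class Rect {
        final double x1, y1, x2, y2;

        Rect(double x1, double y1, double x2, double y2) {
            this.x1 = x1; this.y1 = y1; this.x2 = x2; this.y2 = y2;
        }

        /** Returns the overlapping rectangle, or null when there is no overlap. */
        Rect getIntersection(Rect o) {
            boolean intersects = x2 > o.x1 && o.x2 > x1 && y2 > o.y1 && o.y2 > y1;
            if (!intersects) return null;
            return new Rect(Math.max(x1, o.x1), Math.max(y1, o.y1),
                            Math.min(x2, o.x2), Math.min(y2, o.y2));
        }

        double area() { return (x2 - x1) * (y2 - y1); }
    }

    /** Returns the index of the cell with the largest overlap, or -1 if none overlaps. */
    static int bestCell(Rect[] cells, Rect shapeMbr) {
        double maxOverlap = -1.0;
        int best = -1;
        for (int i = 0; i < cells.length; i++) {
            Rect overlap = cells[i].getIntersection(shapeMbr);
            if (overlap != null) {
                double overlapArea = overlap.area();
                // best == -1 lets the first overlapping cell win even with zero area
                if (best == -1 || overlapArea > maxOverlap) {
                    maxOverlap = overlapArea;
                    best = i;
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Rect[] cells = { new Rect(0, 0, 10, 10), new Rect(10, 0, 20, 10) };
        // Overlap with cell 0 is 2 x 4 = 8, with cell 1 it is 6 x 4 = 24.
        System.out.println(bestCell(cells, new Rect(8, 2, 16, 6)));
    }
}
```

Because each shape ends up in exactly one cell, a shape that straddles a cell boundary is not replicated, which matches the NoReplication part of the mapper's name.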

Lines 55-57: output the ID of the cell the shape was assigned to (cellInfos[bestCell].cellId) together with the shape itself.

2. Reduce

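The Reduce class is exactly the same as the reduce in SpatialHadoop 2.3 Source Code Reading (6): Grid Index Generation (2); see that chapter for details.

3. OutputCommitter

The Committer class is exactly the same as the Committer in SpatialHadoop 2.3 Source Code Reading (6): Grid Index Generation (2); see that chapter for details.

The committer merges all generated files whose names contain _master into a single file, producing the _master.grid file.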
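
As a rough illustration of that merge step, the sketch below concatenates every file in a directory whose name contains _master into one output file. It is not the SpatialHadoop committer: it works on the local file system with java.nio instead of the Hadoop FileSystem API, and the names MasterMergeDemo and mergeMasterFiles are hypothetical, as is the choice to leave the partial files in place afterwards.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Local-file-system sketch of merging partial master files; not the real committer.
class MasterMergeDemo {
    /** Concatenates all files in outDir whose name contains "_master" into mergedName. */
    static Path mergeMasterFiles(Path outDir, String mergedName) throws IOException {
        List<Path> parts;
        try (Stream<Path> files = Files.list(outDir)) {
            parts = files
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().contains("_master"))
                // skip a merged file left over from an earlier run
                .filter(p -> !p.getFileName().toString().equals(mergedName))
                .sorted()
                .collect(Collectors.toList());
        }
        Path merged = outDir.resolve(mergedName);
        try (OutputStream out = Files.newOutputStream(merged)) {
            for (Path part : parts) {
                Files.copy(part, out); // append this part's bytes to the merged file
            }
        }
        return merged;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("master-merge-demo");
        // Hypothetical partial file names and contents, for illustration only
        Files.write(dir.resolve("part-00000_master"), "cell-1\n".getBytes());
        Files.write(dir.resolve("part-00001_master"), "cell-2\n".getBytes());
        Path merged = mergeMasterFiles(dir, "_master.grid");
        System.out.print(new String(Files.readAllBytes(merged)));
    }
}
```

Sorting the partial files before concatenation keeps the merged file deterministic across runs.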