
spatialhadoop2.3 Source Code Reading (8): RTree Index Generation (1)

2015-12-21 19:59
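SpatialHadoop builds its indexes in the class edu.umn.cs.spatialHadoop.operations.Repartition. Its main method, its repartition method, and the first and third parts of repartitionMapReduce are the same as described in "spatialhadoop2.3 Source Code Reading (5): Grid Index Generation (1)". This article focuses on the second part of repartitionMapReduce; the relevant code is as follows: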

/**
 * Create rectangles that together pack all points in sample such that
 * each rectangle contains roughly the same number of points. In other words
 * it tries to balance number of points in each rectangle.
 * Works similar to the logic of bulkLoad but does only one level of
 * rectangles.
 * @param samples
 * @param gridInfo - Used as a hint for number of rectangles per row or column
 * @return
 */
public static Rectangle[] packInRectangles(GridInfo gridInfo, final Point[] sample) {
    Rectangle[] rectangles = new Rectangle[gridInfo.columns * gridInfo.rows];
    int iRectangle = 0;
    // Sort in x direction
    final IndexedSortable sortableX = new IndexedSortable() {
        @Override
        public void swap(int i, int j) {
            Point temp = sample[i];
            sample[i] = sample[j];
            sample[j] = temp;
        }

        @Override
        public int compare(int i, int j) {
            if (sample[i].x < sample[j].x)
                return -1;
            if (sample[i].x > sample[j].x)
                return 1;
            return 0;
        }
    };

    // Sort in y direction
    final IndexedSortable sortableY = new IndexedSortable() {
        @Override
        public void swap(int i, int j) {
            Point temp = sample[i];
            sample[i] = sample[j];
            sample[j] = temp;
        }

        @Override
        public int compare(int i, int j) {
            if (sample[i].y < sample[j].y)
                return -1;
            if (sample[i].y > sample[j].y)
                return 1;
            return 0;
        }
    };

    final QuickSort quickSort = new QuickSort();

    quickSort.sort(sortableX, 0, sample.length);
    for (int i = 0; i < sample.length; i++) {
        System.out.println(sample[i]);
    }
    int xindex1 = 0;
    double x1 = gridInfo.x1;
    for (int col = 0; col < gridInfo.columns; col++) {
        int xindex2 = sample.length * (col + 1) / gridInfo.columns;

        // Determine extents for all rectangles in this column
        double x2 = col == gridInfo.columns - 1 ?
                gridInfo.x2 : sample[xindex2 - 1].x;

        // Sort all points in this column according to its y-coordinate
        quickSort.sort(sortableY, xindex1, xindex2);

        // Create rectangles in this column
        double y1 = gridInfo.y1;
        for (int row = 0; row < gridInfo.rows; row++) {
            int yindex2 = xindex1 + (xindex2 - xindex1) * (row + 1) / gridInfo.rows;
            double y2 = row == gridInfo.rows - 1 ? gridInfo.y2 : sample[yindex2 - 1].y;

            rectangles[iRectangle++] = new Rectangle(x1, y1, x2, y2);
            y1 = y2;
        }

        xindex1 = xindex2;
        x1 = x2;
    }
    return rectangles;
}


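Line numbers in the notes below count from the opening /** of the listing above.

Line 12: allocates the rectangles array that the method eventually returns.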

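Lines 15-50: define the two sort helpers, sortableX and sortableY, which compare and swap sample points by x and by y respectively.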

Lines 52-57: sort all sampled points by x coordinate in ascending order, then print the sorted points.

Line 60: the outer loop iterates over the columns along the x axis.

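Line 61: splits the points evenly by columns; that is, the x-sorted sample is divided into gridInfo.columns slices, and each loop iteration handles one slice: the points at indices from xindex1 (inclusive) to xindex2 (exclusive) of the sample array.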
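
To make the index arithmetic concrete, here is a small self-contained sketch that applies the same integer formula to a made-up input. The class SplitDemo and the method upperBounds are hypothetical names used only for this illustration; they are not part of SpatialHadoop.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of SpatialHadoop.
class SplitDemo {
    // Returns the exclusive end index of each slice, using the same
    // integer formula as the listing: n * (col + 1) / columns.
    static int[] upperBounds(int n, int columns) {
        int[] bounds = new int[columns];
        for (int col = 0; col < columns; col++) {
            bounds[col] = n * (col + 1) / columns;
        }
        return bounds;
    }

    public static void main(String[] args) {
        // 10 points in 3 columns give the slices [0,3), [3,6), [6,10)
        System.out.println(Arrays.toString(upperBounds(10, 3)));
    }
}
```

Because of the integer division, slice sizes differ by at most one point, and the last bound always equals the number of points, so no point is left out.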

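Lines 64-65: determine x2, the right edge of the column. This is the x coordinate of the last point in the slice, sample[xindex2-1].x, or gridInfo.x2 for the last column.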

Line 68: sorts the points between xindex1 and xindex2 by y coordinate in ascending order.
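
The listing sorts through Hadoop's QuickSort and an IndexedSortable, which is why it needs the swap/compare objects. For readers who only want to see the effect of sorting a sub-range, the sketch below does something comparable with the standard library's range overload of Arrays.sort. The P class and the sample data are assumptions made for this example, and this is not how SpatialHadoop itself performs the sort.

```java
import java.util.Arrays;
import java.util.Comparator;

// Illustration only; RangeSortDemo and P are hypothetical names.
class RangeSortDemo {
    // Minimal stand-in for SpatialHadoop's Point (an assumption for this sketch).
    static final class P {
        final double x, y;
        P(double x, double y) { this.x = x; this.y = y; }
    }

    // Sorts pts[from, to) by y ascending and leaves the rest untouched.
    static void sortByY(P[] pts, int from, int to) {
        Arrays.sort(pts, from, to, Comparator.comparingDouble((P p) -> p.y));
    }

    public static void main(String[] args) {
        P[] pts = { new P(1, 9), new P(2, 5), new P(3, 7), new P(4, 1) };
        sortByY(pts, 0, 2); // only the first "column" (indices 0 and 1) is reordered
        for (P p : pts) {
            System.out.println(p.x + "," + p.y);
        }
    }
}
```

Points outside the range keep their positions, which is what lets the outer loop process one column at a time without disturbing the x ordering between columns.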

Line 72: the inner loop iterates over the rows along the y axis.

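Line 73: splits the points between xindex1 and xindex2 evenly by rows; that is, the y-sorted slice is divided into gridInfo.rows parts, and each iteration handles one part. The part ends at index yindex2 (exclusive). The code has no yindex1 variable: the start of the part is simply the previous iteration's yindex2 (xindex1 for the first row).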

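Line 74: determines y2, the upper y bound of the cell: sample[yindex2 - 1].y, or gridInfo.y2 for the last row. At this point x1, x2, y1 and y2 are all known.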

Line 76: creates the Rectangle (x1,y1)-(x2,y2) for the current cell and stores it in rectangles.

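Overall, the algorithm works roughly as follows. Sort all points by x in ascending order and split them into equal-sized slices (columns). Sort each slice by y in ascending order and split it again into equal-sized parts (rows). Each part corresponds to one cell and holds roughly the same number of sample points. Because the points are sorted, the cell boundaries can be read directly from the points at the split positions: x2 and y2 come from the last point of the slice and of the part, while x1 and y1 are the x2 and y2 of the previous column and row (starting from gridInfo.x1 and gridInfo.y1). The last column and the last row extend to gridInfo.x2 and gridInfo.y2, so the cells together cover the whole extent. Each cell is then described by (x1,y1)-(x2,y2).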
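
The whole procedure can be tried without a Hadoop setup using the following sketch, which re-implements the same steps on plain arrays. It is an illustration under assumptions, not SpatialHadoop code: the class PackSketch, the method pack, and the array layouts (points as {x, y}, extent and rectangles as {x1, y1, x2, y2}) are all made up here, and Arrays.sort stands in for Hadoop's QuickSort.

```java
import java.util.Arrays;
import java.util.Comparator;

// Illustrative re-implementation on plain arrays; names are hypothetical.
class PackSketch {
    // points[i] = {x, y}; extent = {x1, y1, x2, y2}.
    // Returns columns * rows rectangles, each as {x1, y1, x2, y2}.
    // Assumes at least columns * rows points, as the indexing needs non-empty slices.
    static double[][] pack(double[][] points, double[] extent, int columns, int rows) {
        double[][] pts = points.clone(); // shallow copy, keeps the caller's order intact
        double[][] rects = new double[columns * rows][];
        int k = 0;
        // Sort everything by x
        Arrays.sort(pts, Comparator.comparingDouble((double[] p) -> p[0]));
        int xi1 = 0;
        double x1 = extent[0];
        for (int col = 0; col < columns; col++) {
            int xi2 = pts.length * (col + 1) / columns;
            // Right edge: last point of the slice, or the extent for the last column
            double x2 = (col == columns - 1) ? extent[2] : pts[xi2 - 1][0];
            // Sort this column's slice by y
            Arrays.sort(pts, xi1, xi2, Comparator.comparingDouble((double[] p) -> p[1]));
            double y1 = extent[1];
            for (int row = 0; row < rows; row++) {
                int yi2 = xi1 + (xi2 - xi1) * (row + 1) / rows;
                double y2 = (row == rows - 1) ? extent[3] : pts[yi2 - 1][1];
                rects[k++] = new double[] { x1, y1, x2, y2 };
                y1 = y2; // next row starts where this one ends
            }
            xi1 = xi2;
            x1 = x2; // next column starts where this one ends
        }
        return rects;
    }

    public static void main(String[] args) {
        double[][] pts = { {1, 1}, {2, 8}, {3, 3}, {4, 6}, {6, 2}, {7, 9}, {8, 4}, {9, 7} };
        double[][] rects = pack(pts, new double[] {0, 0, 10, 10}, 2, 2);
        for (double[] r : rects) {
            System.out.println(Arrays.toString(r));
        }
    }
}
```

With these eight points on a 2 x 2 grid over (0,0)-(10,10), the left column ends at x = 4 and is split at y = 3, while the right column is split at y = 4, so each of the four cells receives two sample points.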