您的位置：首页 > 其它

hbase region lookups流程以及rpc线程卡死问题分析

2012-12-02 22:31 691 查看

未完待更新

1. hbase分片后的数据查找依靠 region lookups 完成：

1）

ZooKeeper中/hbase/root-region-server保存了-ROOT-表所在的服务器地址

2）hbase采用两张系统表来支持分片数据查找

-ROOT-表（.META.表所在的服务器地址, 一般系统只有一行记录）

.META.表（用户表所在的服务器地，如:'table_test_a'各分片所在的服务器地址）

2. hbase region lookups流程（以在'table_test_a'表, rowkey为'123456'的get为例子）

1）

首先， hbase client到ZooKeeper中获取-ROOT-表的所在位置

2）

再上-ROOT-表所在服务器查询-ROOT-表的.META.表所在的服务器地址

3）

再上.META.表所在的服务器查询用户表'table_test_a'分片信息

4）

根据用户表'table_test_a'分片信息，计算出rowkey为'123456'所在的服务地址

（备注：为了提升性能，hbase client端会缓存分片信息）

5）代码入口： HRegionLocation getRegionLocation(byte [] tableName, byte [] row, boolean reload)

6) HBase Definitive Guide : Region Lookups 图

3. hbase client rpc线程卡死问题分析，已在0.94.1中修复（0.91, 0.92 版本 fix了很多0.94.0中的bug）。

1）hbase region lookups 死锁问题

详细见 @慢半拍de刀刀博客 http://www.cnblogs.com/shenguanpu/archive/2012/12/02/2798217.html
2）两个patch地址，

通过避免嵌套重试循环来解决rpc线程卡死: https://issues.apache.org/jira/browse/HBASE-6326
通过等待-root-的region地址设置到root region tracker 来避免deadlock问题： https://issues.apache.org/jira/browse/HBASE-6115
4. HTabe.get ： retry connect引发rpc线程卡死问题

1）retry connect ： withRetries

public Result get(final Get get) throws IOException {
return new ServerCallable<Result>(connection, tableName, get.getRow(), operationTimeout) {
public Result call() throws IOException {
return server.get(location.getRegionInfo().getRegionName(), get);
}
}.withRetries();
}

2）region lookups获取location，并根据location连接上region server。

/**
* Connect to the server hosting region with row from tablename.
* @param reload Set this to true if connection should re-find the region
* @throws IOException e
*/
public void connect(final boolean reload) throws IOException {
this.location = connection.getRegionLocation(tableName, row, reload);
this.server = connection.getHRegionConnection(location.getHostname(),
location.getPort());
}

3） getRegionLocation 调用locateRegion('用户表') , locateRegion('用户表')的核心流程是去.META.表中查询某用户表分片信息, 从而调用到locateRegionInMeta

private HRegionLocation locateRegion(final byte [] tableName,
final byte [] row, boolean useCache)
throws IOException {
....
if (Bytes.equals(tableName, HConstants.ROOT_TABLE_NAME)) {
...
ServerName servername = this.rootRegionTracker.getRootRegionLocation();
...
} ...
} else {
// Region not in the cache - have to go to the meta RS
return locateRegionInMeta(HConstants.META_TABLE_NAME, tableName, row,
useCache, userRegionLock);
}
}

4）完整一次的'hbase region lookups流程', 会进入locateRegion('-ROOT-') 从而调用到 rootRegionTracker.getRootRegionLocation

(retry流程不会取cache数据，而是先对regionLockObject加锁并prefetchRegionCache的metaScan, MetaScanner是完整一次的'hbase region lookups流程' )

/*
* Search one of the meta tables (-ROOT- or .META.) for the HRegionLocation
* info that contains the table and row we're seeking.
*/
private HRegionLocation locateRegionInMeta(final byte [] parentTable,
final byte [] tableName, final byte [] row, boolean useCache,
Object regionLockObject)
throws IOException {
......

// This block guards against two threads trying to load the meta
// region at the same time. The first will load the meta region and
// the second will use the value that the first one found.
synchronized (regionLockObject) {
// If the parent table is META, we may want to pre-fetch some
// region info into the global region cache for this table.
if (Bytes.equals(parentTable, HConstants.META_TABLE_NAME) &&
(getRegionCachePrefetch(tableName)) )  {
prefetchRegionCache(tableName, row);
}
......
}
......
}
}

4）完整一次的'hbase region lookups流程', 会进入locateRegion('-ROOT-') 从而调用到 rootRegionTracker.getRootRegionLocation

private HRegionLocation locateRegion(final byte [] tableName,
final byte [] row, boolean useCache)
throws IOException {
....
if (Bytes.equals(tableName, HConstants.ROOT_TABLE_NAME)) {
...
ServerName servername = this.rootRegionTracker.getRootRegionLocation();
...
} ...
} else {
// Region not in the cache - have to go to the meta RS
return locateRegionInMeta(HConstants.META_TABLE_NAME, tableName, row,
useCache, userRegionLock);
}
}

5）RootRegionTracker 的getData，从zookeeper取数据引发异常，从而 abort流程

public synchronized byte [] getData(boolean refresh) {
if (refresh) {
try {
this.data = ZKUtil.getDataAndWatch(watcher, node);
} catch(KeeperException e) {
abortable.abort("Unexpected exception handling getData", e);
}
}
return this.data;
}

6） abort流程（deadlock bug所在: abortable实际为HConnectionManager.HConnectionImplementation对象）

private synchronized void ensureZookeeperTrackers()
throws ZooKeeperConnectionException {
...
if (rootRegionTracker == null) {
rootRegionTracker = new RootRegionTracker(zooKeeper, this);
rootRegionTracker.start();
}
}

内容来自用户分享和网络整理，不保证内容的准确性，如有侵权内容，可联系管理员处理

标签：

相关文章推荐

新的分享

章节导航