
HBase 0.1.0 Put Flow: Source Code Analysis

2012-12-02 23:52

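The put path is relatively simple: following the LSM-Tree design, a write first goes into memory and is later written to disk in batches.

HBase is implemented the same way.

First, every put in the client's batch operation is extracted and stored in a sorted map (via localput):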
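To make the LSM-Tree idea concrete, here is a minimal sketch of a write buffer that accepts puts in memory and moves them out in sorted batches. It is not HBase code: the class TinyLsmStore, its flushThreshold parameter, and the in-memory list standing in for on-disk files are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

/** Toy LSM-style store (hypothetical): writes land in a sorted buffer, flushed in batches. */
class TinyLsmStore {
    private final int flushThreshold;   // assumed limit: max entries kept in memory
    private TreeMap<String, String> memBuffer = new TreeMap<>();
    // Each flushed batch stands in for one immutable sorted file on disk.
    private final List<SortedMap<String, String>> flushedBatches = new ArrayList<>();

    TinyLsmStore(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    /** A put only touches memory; a flush happens once the buffer is full. */
    void put(String key, String value) {
        memBuffer.put(key, value);
        if (memBuffer.size() >= flushThreshold) {
            flush();
        }
    }

    /** Moves the whole buffer out as one sorted batch and starts a fresh buffer. */
    void flush() {
        if (memBuffer.isEmpty()) {
            return;
        }
        flushedBatches.add(memBuffer);
        memBuffer = new TreeMap<>();
    }

    /** Reads check memory first, then the batches from newest to oldest. */
    String get(String key) {
        String value = memBuffer.get(key);
        if (value != null) {
            return value;
        }
        for (int i = flushedBatches.size() - 1; i >= 0; i--) {
            value = flushedBatches.get(i).get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    int bufferedEntries() { return memBuffer.size(); }

    int flushedBatchCount() { return flushedBatches.size(); }
}
```

With a threshold of 3, the first two puts stay in memory and the third one triggers a flush, so the cost of writing a batch is paid once per batch rather than once per put.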

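Text row = b.getRow();
// Lock the row for the duration of the batch
long lockid = obtainRowLock(row);

// Use the current time unless the caller supplied an explicit timestamp
long commitTime =
    (timestamp == LATEST_TIMESTAMP) ? System.currentTimeMillis() : timestamp;

try {
  List<Text> deletes = null;
  for (BatchOperation op: b) {
    HStoreKey key = new HStoreKey(row, op.getColumn(), commitTime);
    byte[] val = null;
    if (op.isPut()) {
      val = op.getValue();
      // A put may not carry the reserved "deleted" marker as its value
      if (HLogEdit.isDeleted(val)) {
        throw new IOException("Cannot insert value: " + val);
      }
    } else {
      if (timestamp == LATEST_TIMESTAMP) {
        // Save off these deletes
        if (deletes == null) {
          deletes = new ArrayList<Text>();
        }
        deletes.add(op.getColumn());
      } else {
        // A delete at an explicit timestamp is written as a delete marker
        val = HLogEdit.deleteBytes.get();
      }
    }
    if (val != null) {
      // Stage the edit under this row lock
      localput(lockid, key, val);
    }
  }
  // ... the try block continues with the snippet shown next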

localput stores the data of the batch's put operations in targetColumns, keyed by the row lock id; the update method then applies those edits to the memcache of each HStore:

TreeMap<HStoreKey, byte[]> edits =
    this.targetColumns.remove(Long.valueOf(lockid));
if (edits != null && edits.size() > 0) {
  update(edits);
}
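The staging pattern above (collect edits under a lock id, then remove and apply them in one go) can be sketched on its own. This is an illustration under assumptions, not the HBase implementation: EditStaging, stage, and drain are hypothetical names, and plain String column names replace HStoreKey.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: stages edits per row lock, then hands them over as one sorted batch. */
class EditStaging {
    // lock id -> edits collected under that lock, sorted by column name
    private final Map<Long, TreeMap<String, byte[]>> targetColumns = new HashMap<>();

    /** Plays the role of localput: remember one edit under the given lock id. */
    synchronized void stage(long lockId, String column, byte[] value) {
        targetColumns.computeIfAbsent(lockId, id -> new TreeMap<>()).put(column, value);
    }

    /** Removes and returns everything staged under the lock id, or null if nothing was staged. */
    synchronized TreeMap<String, byte[]> drain(long lockId) {
        return targetColumns.remove(lockId);
    }
}
```

Because drain removes the entry, a second call for the same lock id returns null, which matches the null check that guards the call to update in the excerpt.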

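update proceeds in the following steps:

1. Append a record of this operation to the HLog.

2. Iterate over the edits; for each one, take the key and value and add their size to the memcache size counter.

3. Add the key and value to the memcache.

4. If the size exceeds memcacheFlushSize, request a flush.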

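// Step 1: write the edits to the HLog before touching memory
this.log.append(regionInfo.getRegionName(),
    regionInfo.getTableDesc().getName(), updatesByColumn);

long size = 0;
for (Map.Entry<HStoreKey, byte[]> e: updatesByColumn.entrySet()) {
  HStoreKey key = e.getKey();
  byte[] val = e.getValue();
  // Step 2: add the size of this edit to the region-wide counter
  size = this.memcacheSize.addAndGet(key.getSize() +
      (val == null ? 0 : val.length));
  // Step 3: add the edit to the memcache of the HStore for its column family
  stores.get(HStoreKey.extractFamily(key.getColumn())).add(key, val);
}
// Step 4: ask for a flush once the accumulated size passes the threshold
if (this.flushListener != null && size > this.memcacheFlushSize) {
  // Request a cache flush
  this.flushListener.flushRequested(this);
}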
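The four steps can be tried out in isolation with the following sketch. It is a simplification under stated assumptions: MiniRegion is a hypothetical class, a List of strings stands in for the HLog, a single TreeMap stands in for the per-family HStore memcaches, and a Runnable stands in for the flush listener.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of the update steps: log first, then memory, then a flush request. */
class MiniRegion {
    private final List<String> log = new ArrayList<>();               // stand-in for the HLog
    private final TreeMap<String, byte[]> memcache = new TreeMap<>(); // stand-in for the HStore memcaches
    private final AtomicLong memcacheSize = new AtomicLong();
    private final long memcacheFlushSize;                             // assumed threshold in bytes
    private final Runnable flushListener;                             // may be null

    MiniRegion(long memcacheFlushSize, Runnable flushListener) {
        this.memcacheFlushSize = memcacheFlushSize;
        this.flushListener = flushListener;
    }

    synchronized void update(TreeMap<String, byte[]> edits) {
        // Step 1: record every edit in the log before touching memory.
        for (String key : edits.keySet()) {
            log.add(key);
        }
        long size = 0;
        for (Map.Entry<String, byte[]> e : edits.entrySet()) {
            byte[] val = e.getValue();
            // Step 2: grow the size counter by key length plus value length.
            size = memcacheSize.addAndGet(e.getKey().length() + (val == null ? 0 : val.length));
            // Step 3: store the edit in memory.
            memcache.put(e.getKey(), val);
        }
        // Step 4: only request a flush; the flush itself is left to the listener.
        if (flushListener != null && size > memcacheFlushSize) {
            flushListener.run();
        }
    }

    synchronized int logLength() { return log.size(); }

    long currentSize() { return memcacheSize.get(); }
}
```

Note that the sketch, like the excerpt, only signals that a flush is wanted. The counter is not reset here, because resetting it would be the job of whatever performs the flush.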

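Summary:

1. Apart from the HLog append, a put only writes to an in-memory sorted map before returning, so it is very fast. This is one of the strengths of HBase's LSM-Tree design.

2. Writing to the memcache takes a read lock first, but apparently no write lock. Why is that? Is it because the sorted map inside the memcache is created with Collections.synchronizedSortedMap?

3. The put path writes the HLog first and memory second, so any data that has reached memory is already recorded in the log and can be considered safe.

4. Every write to the memcache checks whether the flush size has been reached; if it has, a flush is requested.

5. A flush writes the in-memory data to the HDFS file system. Writing files in periodic batches like this is very efficient, and between flushes the memcache takes no I/O away from reads, which leaves more I/O capacity for read operations and improves read performance.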
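On the question raised in point 2, one plausible reading, offered here as an assumption rather than something verified against the 0.1.0 source, is that the read/write lock protects the swap of the map while the synchronized map protects individual insertions. The sketch below shows that pattern with hypothetical names (SnapshotableCache, add, snapshot): many writers share the read lock, and only the snapshot takes the write lock.

```java
import java.util.Collections;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical sketch: adders share the read lock, a snapshot takes the write lock. */
class SnapshotableCache {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    // The synchronized wrapper makes each individual put thread-safe.
    private SortedMap<String, String> active =
        Collections.synchronizedSortedMap(new TreeMap<String, String>());

    /** Many threads may add at once: the read lock is shared. */
    void add(String key, String value) {
        lock.readLock().lock();
        try {
            active.put(key, value);
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Swaps in a fresh map; the write lock guarantees no add is in progress. */
    SortedMap<String, String> snapshot() {
        lock.writeLock().lock();
        try {
            SortedMap<String, String> old = active;
            active = Collections.synchronizedSortedMap(new TreeMap<String, String>());
            return old;
        } finally {
            lock.writeLock().unlock();
        }
    }

    int size() {
        lock.readLock().lock();
        try {
            return active.size();
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

Under this reading, the "read" lock does not mean the operation is a read of the data. It means the operation can run concurrently with other operations of the same kind, while the map swap needs exclusive access.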
