
Is there a limit to the number of columns in an HBase row?

2011-03-04 09:28
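Someone on Quora asked the question below. I had read it before without paying much attention to the comments, and I had also tested it myself and confirmed that a row does not get split even when it becomes very large. Rereading it today, I found that it actually contains quite a lot of information: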

I am wondering whether I should have lots of rows or lots of columns in a row. Which is better if I have to index them as well?
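Todd gives three main reasons not to cram a large number of columns into a single row under one column family (CF): row locks, region distribution, and performance. So if you don't need atomic operations across many cells, prefer using more rows.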
Source: http://www.quora.com/Is-there-a-limit-to-the-number-of-columns-in-an-HBase-row
Todd Lipcon, HBase committer

There's not really a limit. Here are some things to consider:

Lock granularity
When you do an operation within a row, the RegionServer code briefly holds a lock on that row while applying the mutation.
On the plus side, this means that you can act atomically on several columns - concurrent readers will either see the entire update or won't see the update at all. They shouldn't (barring one or two bugs we're still stomping on) see a partial update.
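To make the all-or-nothing guarantee concrete, here is a minimal sketch of a toy in-memory table with one lock per row. `ToyRowStore` and `count_torn_reads` are hypothetical names, not HBase classes. A multi-column put is applied while the row lock is held; for simplicity the toy also makes readers take the same lock, which is an assumption of the sketch and not a description of how the RegionServer actually isolates reads.

```python
import threading


class ToyRowStore:
    """Hypothetical in-memory table with one lock per row (not HBase code)."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the two maps below
        self._locks = {}                # row key -> lock for that row
        self._rows = {}                 # row key -> {column: value}

    def _row(self, key):
        # Create the row and its lock on first use.
        with self._guard:
            if key not in self._locks:
                self._locks[key] = threading.Lock()
                self._rows[key] = {}
            return self._locks[key], self._rows[key]

    def put(self, key, cells):
        # Every column in the mutation is applied while the row lock is held.
        lock, row = self._row(key)
        with lock:
            row.update(cells)

    def get(self, key):
        # The toy reader takes the same lock, so it never sees half a mutation.
        lock, row = self._row(key)
        with lock:
            return dict(row)


def count_torn_reads(iterations=2000):
    """A writer sets columns a and b to the same value; a reader checks they match."""
    store = ToyRowStore()
    store.put("row1", {"a": 0, "b": 0})

    def writer():
        for i in range(1, iterations + 1):
            store.put("row1", {"a": i, "b": i})

    t = threading.Thread(target=writer)
    t.start()
    torn = 0
    for _ in range(iterations):
        snap = store.get("row1")
        if snap["a"] != snap["b"]:
            torn += 1
    t.join()
    return torn, store.get("row1")


print(count_torn_reads())  # (0, {'a': 2000, 'b': 2000})
```

The same lock that makes the update atomic is also what serializes every writer touching that row, which is the throughput cost described next.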
On the minus side, this means that the throughput of write operations within a single row is limited (probably a few hundred per second).
We're currently working on some optimizations for specific cases like increment so that multiple incrementers can "line up" behind the lock and then batch their addition together into a single transaction.
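The "line up behind the lock and batch" idea can be sketched as a combining queue. This is an assumption about the general technique, not the code HBase shipped: `BatchingCounter` is hypothetical, and a plain `threading.Lock` stands in for the row lock. Each caller first enqueues its delta, then whoever holds the lock drains everything queued so far and applies the sum as one combined write.

```python
import threading


class BatchingCounter:
    """Hypothetical counter illustrating 'line up behind the lock, then batch'."""

    def __init__(self):
        self.value = 0
        self.writes = 0                    # combined writes actually applied
        self._pending = []                 # deltas queued by waiting callers
        self._queue_lock = threading.Lock()
        self._row_lock = threading.Lock()  # stands in for the row lock

    def increment(self, delta):
        # 1. Join the line: record the delta before contending for the row lock.
        with self._queue_lock:
            self._pending.append(delta)
        # 2. Whoever holds the row lock drains everything queued so far and
        #    applies it as one write. A caller whose delta was already drained
        #    by an earlier lock holder finds an empty batch and skips the write.
        with self._row_lock:
            with self._queue_lock:
                batch, self._pending = self._pending, []
            if batch:
                self.value += sum(batch)
                self.writes += 1


def run(threads=8, per_thread=500):
    counter = BatchingCounter()

    def worker():
        for _ in range(per_thread):
            counter.increment(1)

    workers = [threading.Thread(target=worker) for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return counter.value, counter.writes


value, writes = run()
print(value)            # 4000: no increment is lost
print(writes <= value)  # True: some writes may carry several deltas
```

Only the total is deterministic; how many deltas share a write depends on thread scheduling, so under heavy contention fewer lock acquisitions do useful work for more callers.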
Region distribution
The unit of load balancing and distribution is the region, and a row will never be split across regions. So, no matter how hot a row is, it will always be served by a single server. If the data were split across many rows, you could force a split in between two hot rows to distribute load between two hosts.
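A small model shows why a tall layout gives the balancer something to split. It is a sketch under assumptions: regions are modeled as sorted, contiguous key ranges whose split keys are inclusive start keys, and the row keys (`sensor42`, `sensor42#0000`, ...) are hypothetical. A single wide row always maps to one region, while many narrow rows can be divided between two regions by a split key placed between two hot rows.

```python
import bisect
from typing import Iterable, List


def region_for(row_key: str, split_keys: List[str]) -> int:
    """Index of the region serving row_key.

    Regions are contiguous sorted key ranges; split_keys[i] is the (inclusive)
    start key of region i + 1. Each row key maps to exactly one region, so a
    row can never be spread over two servers.
    """
    return bisect.bisect_right(split_keys, row_key)


def regions_touched(row_keys: Iterable[str], split_keys: List[str]) -> List[int]:
    return sorted({region_for(k, split_keys) for k in row_keys})


wide = ["sensor42"]                                # all cells in one row
tall = [f"sensor42#{i:04d}" for i in range(1000)]  # one row per reading
split = [tall[len(tall) // 2]]                     # split between two hot rows

print(split)                         # ['sensor42#0500']
print(regions_touched(wide, split))  # [0]
print(regions_touched(tall, split))  # [0, 1]
```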
Bugs
In earlier versions of HBase there were some bugs where we would accidentally load or deserialize an entire row into RAM. So if your row was very large (hundreds of MBs), you may have run into serious performance issues, OutOfMemoryErrors (OOMEs), etc. I think most of these bugs have since been squashed, and the RegionServer (RS) does a smart job of only loading the necessary columns, but it's something to be aware of.
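The difference between "load the whole row" and "load only what is needed" can be sketched with lazy iteration. The names below (`stored_row`, `project`, `in_batches`) are hypothetical, and a generator stands in for a wide row on disk. In the Java client the rough counterparts are restricting a `Get` to specific columns with `Get.addColumn` and capping columns per `Result` with `Scan.setBatch`, but this sketch does not model their exact behavior.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Set, Tuple

Cell = Tuple[str, bytes]  # (column qualifier, value)


def stored_row(n_columns: int) -> Iterator[Cell]:
    """Hypothetical stand-in for a wide row on disk: cells are produced lazily."""
    for i in range(n_columns):
        yield (f"col{i:06d}", b"x" * 100)


def project(cells: Iterable[Cell], wanted: Set[str]) -> Iterator[Cell]:
    """Pass through only the requested columns, without materializing the row."""
    return (cell for cell in cells if cell[0] in wanted)


def in_batches(cells: Iterable[Cell], batch_size: int) -> Iterator[List[Cell]]:
    """Yield at most batch_size cells at a time, so memory is bounded by the batch."""
    it = iter(cells)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


sizes = [len(b) for b in in_batches(stored_row(100_000), batch_size=1_000)]
print(len(sizes), max(sizes))  # 100 1000

hits = [q for q, _ in project(stored_row(100_000), {"col000007", "col099999"})]
print(hits)  # ['col000007', 'col099999']
```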
Summary
In summary, if you don't need to do atomic operations across multiple cells, it's probably better to use a "tall" data layout (many rows with few columns each) rather than a "wide" one.
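One common way to go tall is to move the column qualifier into the row key. The sketch below assumes a hypothetical `#` separator that never appears inside entity ids, and emulates reading one entity back with a range over `[entity#, entity$)`, mirroring a start/stop-row scan over sorted keys. The trade-off follows directly from the answer: each narrow row gets its own lock and can land in a different region, but updates spanning several qualifiers are no longer atomic.

```python
from typing import Dict, List, Tuple

SEP = "#"  # hypothetical separator; must not appear inside entity ids


def wide_to_tall(entity: str, columns: Dict[str, str]) -> List[Tuple[str, str]]:
    """Turn one wide row (entity -> many qualifiers) into narrow rows keyed
    entity + SEP + qualifier, sorted like row keys in a table."""
    return sorted((f"{entity}{SEP}{q}", v) for q, v in columns.items())


def prefix_scan(rows: List[Tuple[str, str]], entity: str) -> List[Tuple[str, str]]:
    """Emulate a scan from start row 'entity#' up to (excluding) 'entity$'."""
    start = entity + SEP
    stop = entity + chr(ord(SEP) + 1)
    return [(k, v) for k, v in rows if start <= k < stop]


table = sorted(
    wide_to_tall("user1", {"2011-03-01": "login", "2011-03-04": "post"})
    + wide_to_tall("user2", {"2011-03-02": "login"})
)
print(prefix_scan(table, "user1"))
# [('user1#2011-03-01', 'login'), ('user1#2011-03-04', 'post')]
```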