Is there a limit to the number of columns in an HBase row?
2011-03-04 09:28
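Someone on Quora asked the question below. I had read it before without paying much attention to the comments, and I had even tested it myself and confirmed that a row is never split, however large it grows. Reading it again today, I realized the answer already contains a lot of information: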
I am wondering if I should have lot of rows or lot of columns in a row? which is better if I have to index them as well
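Todd explains from three angles why packing a large number of columns into a single row's column family (CF) is not recommended: row locks, region distribution, and performance. So if you don't need to operate atomically on many cells, prefer multiple rows.
Source: http://www.quora.com/Is-there-a-limit-to-the-number-of-columns-in-an-HBase-row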
Todd Lipcon, HBase committer
There's not really a limit. Here are some things to consider:
Lock granularity
When you do an operation within a row, the RegionServer code briefly holds a lock on that row while applying the mutation.
On the plus side, this means that you can act atomically on several columns - concurrent readers will either see the entire update or won't see the update at all. They shouldn't (barring one or two bugs we're still stomping on) see a partial update.
On the minus side, this means that the throughput of write operations within a single row is limited (probably a few hundred per second).
We're currently working on some optimizations for specific cases like increment so that multiple incrementers can "line up" behind the lock and then batch their addition together into a single transaction.
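To make the lock granularity point concrete, here is a toy in-memory model written in Python. It is not HBase code: the class `ToyRegion` and its methods are hypothetical names made up for this sketch. It only illustrates the idea that every mutation to a row runs under that row's lock, so a multi-column update becomes visible all at once, while writes to the same row have to queue up behind each other. The toy also takes the lock for reads, which is a simplification and not a claim about how HBase serves reads.

```python
import threading
from collections import defaultdict


class ToyRegion:
    """Toy model of row-level locking (hypothetical, not the HBase API)."""

    def __init__(self):
        self._rows = defaultdict(dict)             # row key -> {column: value}
        self._locks = defaultdict(threading.Lock)  # one lock per row key
        self._locks_guard = threading.Lock()       # protects the lock table itself

    def _lock_for(self, row_key):
        # Look up (or create) the lock that belongs to this row.
        with self._locks_guard:
            return self._locks[row_key]

    def put(self, row_key, columns):
        """Apply all column updates to one row while holding that row's lock."""
        with self._lock_for(row_key):
            self._rows[row_key].update(columns)

    def get(self, row_key):
        """Return a copy of the row, taken under the same row lock."""
        with self._lock_for(row_key):
            return dict(self._rows[row_key])


region = ToyRegion()
# Both columns are written under one lock acquisition, so a reader
# sees either neither of them or both of them.
region.put("user42", {"cf:name": "alice", "cf:email": "alice@example.com"})
snapshot = region.get("user42")
```

Writes to different row keys use different locks and do not block each other in this model, which is the intuition behind spreading hot data over many rows.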
Region distribution
The unit of load balancing and distribution is the region, and a row will never be split across regions. So, no matter how hot a row is, it will always be served by a single server. If the data were split across many rows, you could force a split in between two hot rows to distribute load between two hosts.
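A small Python sketch of the region argument, under simplifying assumptions: regions are modeled as contiguous ranges of sorted row keys, with an inclusive start key and an exclusive end key, and a split can only happen at a row key. The function `region_for`, the key format `user42|0001`, and the split point are all hypothetical and chosen only for illustration.

```python
from bisect import bisect_right


def region_for(row_key, split_points):
    """Return the index of the region whose key range contains row_key.

    split_points is a sorted list of row keys. Region i holds the keys k
    with split_points[i-1] <= k < split_points[i].
    """
    return bisect_right(split_points, row_key)


# Wide layout: a single row holds every event for user42.
wide_keys = ["user42"]
# Tall layout: one row per event, with the event id appended to the key.
tall_keys = ["user42|0001", "user42|0002", "user42|0003", "user42|0004"]

# Hypothetical split point placed between two of the tall rows.
split_points = ["user42|0003"]

wide_regions = {region_for(k, split_points) for k in wide_keys}
tall_regions = {region_for(k, split_points) for k in tall_keys}
```

With this split point the wide row lands in one region only, while the tall rows are spread over two regions and could therefore be served by two different hosts.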
Bugs
In prior versions of HBase there were some bugs where we would accidentally load or deserialize an entire row into RAM. So if your row is very large (100s of MBs) you may have run into serious performance issues, OOMEs, etc. I think most of these bugs are since squashed, and the RS does a smart job of only loading the necessary columns, but it's something to be aware of.
Summary
In summary, if you don't need to do atomic operations across multiple cells, probably better to make a "tall" data layout.
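As a rough illustration of what a "tall" layout means, the Python sketch below turns one wide row into many narrow rows by moving the column qualifier into the row key. The helper names `wide_to_tall` and `scan_prefix` and the `|` separator are assumptions of this sketch, not part of any HBase API. Note the trade-off from the answer above: once the cells live in separate rows, they can no longer be updated atomically together.

```python
def wide_to_tall(row_key, columns, sep="|"):
    """Turn one wide row into a sorted list of (tall_row_key, value) pairs."""
    return [
        (f"{row_key}{sep}{qualifier}", value)
        for qualifier, value in sorted(columns.items())
    ]


def scan_prefix(tall_rows, row_key, sep="|"):
    """Collect every tall row that belongs to the original row key."""
    prefix = row_key + sep
    return [(key, value) for key, value in tall_rows if key.startswith(prefix)]


# One wide row with three columns...
wide_row = {"0003": "logout", "0001": "login", "0002": "click"}

# ...becomes three tall rows whose keys sort next to each other.
tall_rows = wide_to_tall("user42", wide_row)
events = scan_prefix(tall_rows, "user42")
```

Because the tall keys share the prefix `user42|`, reading everything for one user becomes a prefix scan over adjacent rows instead of a read of a single large row.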