Facebook's New Real-time Messaging System: HBase to Store 135+ Billion Messages a Month
You may have read somewhere that Facebook has introduced a new Social Inbox integrating email, IM, SMS, text messages, and on-site Facebook messages. All in all, they need to store over 135 billion messages a month. Where do they store all that stuff? Facebook's Kannan Muthukkaruppan gives the surprise answer in The Underlying Technology of Messages: HBase. HBase beat out MySQL, Cassandra, and a few others.
Why a surprise? Facebook created Cassandra, and it was purpose-built for an inbox-type application, but they found Cassandra's eventual consistency model wasn't a good match for their new real-time Messages product. Facebook also has an extensive MySQL infrastructure, but they found performance suffered as data sets and indexes grew larger. And they could have built their own, but they chose HBase.
HBase is a scale-out table store supporting very high rates of row-level updates over massive amounts of data. Exactly what is needed for a Messaging system. HBase is also a column-based key-value store built on the BigTable model. It's good at fetching rows by key or scanning ranges of rows and filtering. Also what is needed for a Messaging system. Complex queries are not supported, however. Those are generally handed over to an analytics tool like Hive, which Facebook created to make sense of their multi-petabyte data warehouse. Hive is based on Hadoop's file system, HDFS, which is also used by HBase.
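The access pattern described above (row-level updates, fetching a row by key, scanning a key range with a filter) can be sketched with a toy in-memory table. This is only an illustration of the idea, not the HBase client API: the `ToySortedTable` class, its method names, and the `user42:0001` row-key layout are all hypothetical choices made for the example.

```python
import bisect


class ToySortedTable:
    """Toy in-memory stand-in for a table whose rows are sorted by key.

    Hypothetical illustration only; this is not how HBase is implemented
    and not its client API.
    """

    def __init__(self):
        self._keys = []   # row keys, kept in sorted order
        self._rows = {}   # row key -> {column: value}

    def put(self, row_key, column, value):
        # Row-level update: create the row on first write, then set one column.
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key):
        # Point lookup by key; returns None when the row is absent.
        return self._rows.get(row_key)

    def scan(self, start_key, stop_key, row_filter=None):
        # Range scan over [start_key, stop_key), optionally filtered.
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_left(self._keys, stop_key)
        for key in self._keys[lo:hi]:
            row = self._rows[key]
            if row_filter is None or row_filter(row):
                yield key, row


table = ToySortedTable()
table.put("user42:0001", "body", "hello")
table.put("user42:0002", "body", "lunch?")
table.put("user42:0002", "read", True)
table.put("user77:0001", "body", "hi")

# Fetch a single row by key.
one_row = table.get("user42:0001")

# Keys sharing the "user42:" prefix sort together, so one range scan
# returns that user's rows (";" is the character right after ":").
user42_rows = list(table.scan("user42:", "user42;"))

# The same scan, keeping only rows not marked as read.
unread = list(table.scan("user42:", "user42;", lambda row: not row.get("read")))
```

Because rows are kept in key order, everything belonging to one key prefix sits in one contiguous range, which is why key lookups and range scans are cheap while arbitrary queries across columns are not.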
Facebook chose HBase because they monitored their usage and figured out what they really needed. What they needed was a system that could handle two types of data patterns:
- A short set of temporal data that tends to be volatile
- An ever-growing set of data that rarely gets accessed
Makes sense. You read what's current in your inbox once and then rarely, if ever, take a look at it again. These patterns are so different that one might expect two different systems to be used, but apparently HBase works well enough for both. How they handle generic search functionality isn't clear, as that's not a strength of HBase, though it does integrate with various search systems.
Some key aspects of their system:
- HBase:
  - Has a simpler consistency model than Cassandra.
  - Has very good scalability and performance for their data patterns.
  - Is the most feature-rich option for their requirements: automatic load balancing and failover, compression support, multiple shards per server, etc.
- HDFS, the filesystem used by HBase, supports replication, end-to-end checksums, and automatic rebalancing.
- Facebook's operational teams have a lot of experience using HDFS because Facebook is a big user of Hadoop and Hadoop uses HDFS as its distributed file system.
I wouldn't sleep on the idea that Facebook's existing experience with HDFS/Hadoop/Hive was a big adoption driver for HBase. It's the dream of any product to partner with another very popular product in the hope of being pulled in as part of the ecosystem. That's what HBase has achieved. Given how HBase covers a nice spot in the persistence spectrum--real-time, distributed, linearly scalable, robust, BigData, open-source, key-value, column-oriented--we should see it become even more popular, especially with its anointment by Facebook.
Related Articles
- Integrating Hive and HBase by Carl Steinbach
- 1 Billion Reasons Why Adobe Chose HBase
- HBase Architecture 101 - Write-ahead-Log by Lars George
- HBase Architecture 101 - Storage by Lars George
- BigTable Model with Cassandra and HBase by Ricky Ho
- New Facebook Chat Feature Scales To 70 Million Users Using Erlang