
[Translation] Embracing NoSQL Databases

2010-04-13 21:40

Saying Yes to NoSQL


Original author: John Quinn (Digg: http://digg.com/users/doofdoofsf, Twitter: http://twitter.com/doofdoofsf). Translator: 阿木 (http://waymangood.spaces.live.com)

The last six months have been exciting for Digg's engineering team. We're working on a soup-to-nuts rewrite. Not only are we rewriting all our application code, but we're also rolling out a new client and server architecture. And if that doesn't sound like a big enough challenge, we're replacing most of our infrastructure components and moving away from LAMP.
(Translator's note: LAMP stands for Linux + Apache + MySQL + Perl/PHP/Python, a set of open source software commonly used to build dynamic websites and servers. Each component is an independent program, but because they are so often deployed together they have become increasingly compatible and now form a powerful web application platform.)

Perhaps our most significant infrastructure change is abandoning MySQL in favor of a NoSQL alternative. To someone like me who's been building systems almost exclusively on relational databases for almost 20 years, this feels like a bold move.

What's Wrong with MySQL?

Our primary motivation for moving away from MySQL is the increasing difficulty of building a high-performance, write-intensive application on a data set that is growing quickly, with no end in sight. This growth has forced us into horizontal and vertical partitioning strategies that have eliminated most of the value of a relational database, while still incurring all the overhead.
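To make the partitioning trade-off concrete, here is a minimal sketch in Python. It is not Digg's code: the shard count, the `shard_for` helper, and the in-memory dictionaries standing in for MySQL servers are all assumptions made for illustration. It shows how, once rows are spread across shards by one key, a question that a single relational database would answer with one join or index lookup turns into a scatter-gather loop in the application.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical shard count

# Each dict stands in for one MySQL server holding a slice of a "diggs" table.
shards = [dict() for _ in range(NUM_SHARDS)]


def shard_for(user_id: int) -> int:
    """Map a user id to a shard with a stable hash (horizontal partitioning)."""
    digest = hashlib.md5(str(user_id).encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS


def record_digg(user_id: int, story_id: int) -> None:
    """Write by partition key: touches exactly one shard."""
    shards[shard_for(user_id)].setdefault(user_id, set()).add(story_id)


def stories_dugg_by(user_id: int) -> set:
    """Read by partition key: also touches exactly one shard."""
    return set(shards[shard_for(user_id)].get(user_id, set()))


def users_who_dugg(story_id: int) -> set:
    """Read by any other column: every shard is queried and merged in the app."""
    result = set()
    for shard in shards:
        for user_id, stories in shard.items():
            if story_id in stories:
                result.add(user_id)
    return result


record_digg(1, 100)
record_digg(2, 100)
record_digg(3, 200)
print(sorted(users_who_dugg(100)))  # [1, 2]
```

Lookups by the partition key stay cheap, but joins, secondary lookups, and cross-shard transactions all move into application code, which is the sense in which the relational features stop paying for themselves.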

Relational database technology can be a blunt instrument and we're motivated to find a tool that matches our specific needs closely. Our domain area, news, doesn't exact strict consistency requirements, so (according to Brewer's theorem) relaxing this allows gains in availability and partition tolerance (i.e. operations completing, even in degraded system states). We're confident that our engineers can implement application level consistency controls much more efficiently than MySQL does generically.
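The post does not describe Digg's actual consistency settings, so the following is only a sketch of the general trade-off, using the Dynamo-style quorum model as an assumption. The names `N` (replicas per key), `w` (replicas that must acknowledge a write), and `r` (replicas consulted on a read) are illustrative. The code enumerates every possible write set and read set and checks whether a read can miss the latest write entirely.

```python
from itertools import combinations

N = 3  # replicas per key (assumed for this sketch)


def read_can_be_stale(w: int, r: int, n: int = N) -> bool:
    """True if some set of w written replicas and r read replicas do not overlap."""
    replicas = range(n)
    for written in combinations(replicas, w):
        for read in combinations(replicas, r):
            if not set(written) & set(read):
                return True  # this read missed every replica holding the new value
    return False


def min_live_replicas(w: int, r: int) -> int:
    """Replicas that must be reachable for both reads and writes to complete."""
    return max(w, r)


for w, r in [(1, 1), (2, 2), (3, 1)]:
    print(f"W={w} R={r} stale_possible={read_can_be_stale(w, r)} "
          f"needs_live={min_live_replicas(w, r)}")
```

With three replicas, W=1 and R=1 keeps working when two replicas are unreachable but can return stale data, while W=2 and R=2 always overlaps on at least one replica at the cost of needing two of them alive. Relaxing consistency is what buys the extra availability.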

As our system grows, it's important for us to span multiple data centers for redundancy and network performance and to add capacity or replace failed nodes with no downtime. We plan to continue using commodity hardware, and to continue assuming that it will fail regularly. All of this is increasingly difficult with MySQL.

Choosing an Alternative
Digg is committed to the use and development of open source software and we're keen to avoid the cost of proprietary large-scale storage solutions. We were inspired by Google and Amazon's broad use of their non-relational BigTable and Dynamo systems. We evaluated all the usual open source NoSQL suspects. After considerable debate, we decided to go with Cassandra.
(Translator's note: for background on Cassandra, see http://baike.baidu.com/view/1350234.htm?fr=ala0_1_1)

Simplistically, Cassandra is a distributed database with a BigTable data model running on a Dynamo-like infrastructure. It is column-oriented and allows for the storage of relatively structured data. It has a fully decentralized model; every node is identical and there is no single point of failure. It's also extremely fault tolerant; data is replicated to multiple nodes and across data centers. Cassandra is also very elastic; read and write throughput increase linearly as new machines are added.
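The decentralized, replicated layout described above can be illustrated with a toy consistent-hashing ring. This is a sketch of the general Dynamo-style idea, not Cassandra's implementation: the `Ring` class, the `token` helper, the node names, and the use of md5 are all assumptions chosen to keep the example small.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Place a string on the ring; md5 is used only because it spreads values evenly."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy Dynamo-style ring in which every node is an identical peer."""

    def __init__(self, nodes, replication_factor=3):
        self.replication_factor = replication_factor
        self._tokens = []  # sorted list of (token, node_name)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        """Join the ring; no other node has to be reconfigured."""
        bisect.insort(self._tokens, (token(node), node))

    def replicas_for(self, key: str) -> list:
        """Walk clockwise from the key's position and collect distinct nodes."""
        wanted = min(self.replication_factor, len(self._tokens))
        start = bisect.bisect_left(self._tokens, (token(key), ""))
        owners = []
        for step in range(len(self._tokens)):
            _, node = self._tokens[(start + step) % len(self._tokens)]
            if node not in owners:
                owners.append(node)
            if len(owners) == wanted:
                break
        return owners


ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
before = ring.replicas_for("story:12345")
ring.add_node("node-e")
after = ring.replicas_for("story:12345")
print(before, after)
```

Each key lives on several nodes, so losing one machine does not lose data, and adding a machine changes at most one entry in any key's replica list. That is the property that lets capacity be added without taking the cluster down.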

We experimented on our live site, replacing a relatively high-scale MySQL component with a Cassandra alternative. These tests went well. You can read more about these experiments here.

Where We Are

At the time of writing, we've reimplemented most of Digg's functionality using Cassandra as our primary datastore. We've supplemented Cassandra-based indexing with full-text, relational, and graph indexing systems. We're getting used to dealing with eventual consistency.

We've been working on Cassandra itself too. We've made massive performance improvements: increased comparator speed, added better compaction threading, reduced logging overhead, added row-level caching and implemented multi-get capability. We've also implemented native atomic counters using Zookeeper (you can probably guess why we were motivated to add that feature :) ).
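The post says only that the counters are coordinated through Zookeeper, not how. As an illustration of why a counter needs coordination at all, here is a small single-process sketch. The `LockedCounter` class and `digg_many` helper are hypothetical, and `threading.Lock` is a local stand-in for a distributed lock, not a description of Digg's implementation.

```python
import threading


class LockedCounter:
    """Counter whose read-modify-write is guarded by a lock.

    threading.Lock is a local stand-in for a distributed lock; this is an
    assumption of the sketch, not how Digg's Zookeeper-based counters work.
    """

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def increment(self, amount: int = 1) -> int:
        with self._lock:  # only one writer at a time may read, add, and store
            self._value += amount
            return self._value

    @property
    def value(self) -> int:
        with self._lock:
            return self._value


def digg_many(counter: LockedCounter, times: int) -> None:
    """Simulate one client registering many diggs on the same story."""
    for _ in range(times):
        counter.increment()


counter = LockedCounter()
workers = [threading.Thread(target=digg_many, args=(counter, 10_000)) for _ in range(8)]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
print(counter.value)  # 80000
```

Without the lock, two writers can read the same old value and each store old + 1, losing an update. In an eventually consistent store the same race exists between replicas, which is why a counter needs some form of coordination that plain column writes do not.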

We've tested and improved the operational capabilities of Cassandra: we upgraded its rack-aware capability, added slow query logging, improved the bulk import functionality, and implemented Scribe support for improved logging. We've also done a ton of operational testing.

We're open sourcing all our work on Cassandra.

What's Next?

Currently our main focus is getting Digg's latest release into general availability, but we'll continue to lead the way in championing Cassandra's development and adoption.
(Translator's note: GA, or general availability, refers to the generally released version of the software.)

If you're interested in joining a world-class team using cutting-edge NoSQL technology at scale, check out jobs.digg.com.
Take it easy,
(Translator's note: quite the polished piece of product placement; even the Spring Festival Gala can't match it.)

Original article: http://about.digg.com/node/564