
Breaking the First Normal Form Requirement: Michael Rys on XML in SQL Server 2005 (an interview with Dr. Michael Rys, Microsoft SQL Server program manager)

2009-05-12 13:08



Michael Rys on XML in SQL Server 2005


Michael Rys is one of two program managers responsible for the XML features in SQL Server. He also represents Microsoft on the W3C XQuery Working Group, and on the ANSI working group for SQL. I asked him what is distinctive about the XML support in SQL Server 2005, as opposed to that found in rival database management systems. As Rys acknowledges, "all the three major vendors, Oracle, IBM and Microsoft, are moving towards second or third generation support, which means that you get full XML native support, native in the sense that you actually preserve the whole XML document, and provide some form of XQuery support over the data." However, a design goal with SQL Server is that XML is just another datatype. "You have the exact same programming model. With our approach you can evolve the structure of the data over time seamlessly." The idea, explains Rys, is that "you can arbitrarily manage your data regardless of whether it fits into a relational framework or not. And we try to make it really easy to use the functionality in a performant and scaleable way."
That sounds good, but it raises the question of when it is appropriate to stuff XML into the database. On the face of it, it breaks the tidy and disciplined relational structure. Is there a risk that using an XML data type will complicate data management?
"I think it makes it harder to model the data," says Rys. "People have to get down to understanding what it means to model hierarchical data in a relational context. Actually, this discussion has been going on since the early 1980s, at least in the research community. In the early 1980s some people came up with what is called the non first normal form database, which basically is doing nested tables. Some people said that in order to do clear design in the database you have to have first normal form. Other people, including the professor I did my PhD with, said no, the first normal form is not required for doing clear database design. You can use second normal form and third normal form and BCNF [Boyce-Codd Normal Form] and all these other normal forms, without necessarily having to have a first normal form available. But it adds another dimension to your modelling, which means that you have to be very careful to understand how you can utilise it.
"If your data fits the relational model, you’re probably better off still taking the XML and shredding it into relational form, because all the data really is relational data and you want to process it that way.
"On the other hand, if you actually have a collection of values, if you have object structures of things that you want to treat almost as an atomic item, but which you might want to query to find individual values, then having the ability to manage that in an accessible way inside the database is good. Look at arrays. If you have data that fits into an array paradigm, like a series of measurements for example, you might want to manage that series of measurements as a unique whole. Trying to do that design on the relational level is more problematic than if you can store it within one of these non first normal form datatypes."
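The two choices Rys describes can be sketched side by side. The following is only an illustration under stated assumptions: Python's standard `xml.etree.ElementTree` and an in-memory `sqlite3` database stand in for a real database engine, and the documents, table names, and helper functions are all hypothetical. It does not use SQL Server's XML datatype or XQuery.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical input: regular, row-like data that fits the relational model.
ORDERS_XML = """
<orders>
  <order id="1" customer="Contoso"><total>120.50</total></order>
  <order id="2" customer="Fabrikam"><total>75.00</total></order>
</orders>
"""

# Hypothetical input: a series of measurements treated as one unit.
SERIES_XML = "<series sensor='t1'><m>20.1</m><m>20.4</m><m>19.8</m></series>"


def shred_orders(conn, xml_text):
    """Shred each <order> element into one relational row."""
    root = ET.fromstring(xml_text)
    rows = [
        (int(o.get("id")), o.get("customer"), float(o.findtext("total")))
        for o in root.findall("order")
    ]
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    return len(rows)


def store_series_whole(conn, xml_text):
    """Keep the measurement series together as a single column value."""
    conn.execute("INSERT INTO readings (doc) VALUES (?)", (xml_text,))


def series_values(conn):
    """Query into the stored document to recover the individual values."""
    (doc,) = conn.execute("SELECT doc FROM readings").fetchone()
    return [float(m.text) for m in ET.fromstring(doc).findall("m")]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
conn.execute("CREATE TABLE readings (doc TEXT)")

shredded = shred_orders(conn, ORDERS_XML)
store_series_whole(conn, SERIES_XML)

# Shredded data is queried relationally; the series is queried as a document.
order_total = conn.execute("SELECT SUM(total) FROM orders").fetchone()[0]
values = series_values(conn)
print(shredded, order_total, values)
```

The orders become ordinary rows that plain SQL can aggregate, while the measurement series stays one value that is opened only when its individual readings are needed.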

A new way to store data in SQL Server


What this means is that the XML datatype is more than merely a more convenient way to store XML. It means that SQL Server can effectively model a wider range of real-world data. In some ways it is catching up with Oracle and DB2. Oracle supports a nested table column type, while DB2 has a similar feature called a structured type, so both these database managers already support non first normal form to an extent. However, the XML datatype is far richer than these. Of course you could always store XML, or any type you like, in a SQL Server text or binary column, but in doing so you severely limit the capability to index and search the data efficiently.
Departing from the pure relational model may be a culture shock. Rys notes that our typical approach to modeling data does not take account of all possibilities. "The modelling approach has to be extended to take into account that if you have an object that has no separate functional dependencies to things outside that object, you might want to treat it separately. So the question becomes, do you need a new theory of normalization? There are researchers working on that. These researchers often only look at modelling functional dependencies within an XML document, but I think the interesting part is where you have multiple different documents as well as relational data and you want to try to find out how to best model it. That’s an interesting research topic. But you still can start out with relational modelling, look at your XML documents and determine whether you want to break them up because they fit the relational model, or keep them together because they are more like objects."
One implication of the XML datatype is that storing objects of an arbitrary size in the database will become more common. Traditionally, the advice has been to store such objects in the file system for better performance. I asked Rys whether these technical issues have been solved.
"I don’t think all the technical problems have been solved, but certainly Microsoft, and also I think the other major relational database vendors, are working hard to make management of really large objects inside the database as good as managing smaller text or binary objects. If you look at SQL Server 2005, it now has new nvarchar(max) and varbinary(max) extensions of the varchar and varbinary types to replace these dreadful ntext and image types that we are deprecating. But you will still have the issue that if the row gets bigger than 8K, or whatever your buffer size is, you will have to deal with in-row and out-of-row storage of the data. In return you are gaining all the benefits of storing it in the database, which means you get concurrent access against the data, backup and recovery, logging of operations, and being able to undo operations on such types, which you don’t get if you keep them out of the database.
"However, if your main goal is just to have fast file-type access to these large objects, then obviously the file system is still more performant. One of the main problems people have in general with large objects is that it’s hard to unlock the information in the data. With XML being a structured format, with a query language, you have additional benefits from putting the XML into the database versus keeping it on the file system. You can query into the data, and potentially even do partial updates, which will be much harder to do if you have it outside the database in the file system."

Use cases for XML data


I asked Rys for some examples of when you might want to use the XML features of SQL Server 2005.
"There are three or four major scenarios that I see. The first, which is still probably about the 60% scenario, is that you use XML on the wire. You have your messages that might be SOAP or XML over HTTP, and you need to unpack the data. So you need a mechanism to get at the data, shred it out, and put it back into relational form; or going the other way, to take your relational data and put it into XML form.
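The "other way" in this first scenario, taking relational data and putting it into XML form for the wire, might look roughly like the sketch below. It is an assumption-laden illustration: the `customers` table, its columns, and the `rows_to_xml` helper are invented for the example, and Python's `sqlite3` and `xml.etree.ElementTree` stand in for whatever the database itself would provide.

```python
import sqlite3
import xml.etree.ElementTree as ET


def rows_to_xml(conn):
    """Compose one <customers> document from relational rows."""
    root = ET.Element("customers")
    query = "SELECT id, name, city FROM customers ORDER BY id"
    for cid, name, city in conn.execute(query):
        # One element per row; the key becomes an attribute,
        # the remaining columns become child elements.
        cust = ET.SubElement(root, "customer", id=str(cid))
        ET.SubElement(cust, "name").text = name
        ET.SubElement(cust, "city").text = city
    return ET.tostring(root, encoding="unicode")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Contoso", "Seattle"), (2, "Fabrikam", "Zurich")],
)

message = rows_to_xml(conn)
print(message)
```

The resulting string is what would travel in a SOAP body or an XML-over-HTTP payload; the receiving side would shred it back into rows.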
"The interesting part is that often all or part of that message doesn’t really fit the relational model that well. Then people want to manage their XML messages as XML documents. That might simply be because they are doing routing, and don’t want to decompose and recompose the data; or it might be because the XML structure is really not relational.
"A third scenario is where people have actual business documents in XML form, for example from Microsoft Office, InfoPath, OpenOffice, or Adobe documents. Many of these documents have useful structural information, and it is useful to use XPath expressions on them.
"Another scenario that I see is the ad-hoc use of XML, where you use XML to model something because the relational model doesn’t give you the capability. Examples of that are open-ended property bags, where you have an XML column where you store name-value pairs which are so instance-specific that it doesn’t make sense to put them into a column format. The rest of the structured information is clustered together using columns. You have your normal structural aspect and then you have an XML column that contains the varying properties.
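A minimal sketch of the property-bag layout follows, assuming a hypothetical `product` table whose fixed attributes are ordinary columns and whose instance-specific name-value pairs sit in one XML text column. The `<props>`/`<p name="...">` shape and both helper functions are made up for illustration, and the lookup uses the limited XPath subset of Python's `xml.etree.ElementTree` rather than XQuery.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
# Fixed, structured attributes are ordinary columns; the varying ones
# live in a single XML text column (the "property bag").
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, props TEXT)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?)",
    [
        (1, "Laptop", "<props><p name='ram_gb'>16</p><p name='color'>grey</p></props>"),
        (2, "Desk", "<props><p name='width_cm'>140</p></props>"),
        (3, "Phone", "<props><p name='color'>black</p></props>"),
    ],
)


def get_property(props_xml, name):
    """Return the value of one named property, or None if it is absent."""
    node = ET.fromstring(props_xml).find(f"p[@name='{name}']")
    return None if node is None else node.text


def products_with(conn, name):
    """List (product name, property value) for rows whose bag has the property."""
    result = []
    for pname, props in conn.execute("SELECT name, props FROM product ORDER BY id"):
        value = get_property(props, name)
        if value is not None:
            result.append((pname, value))
    return result


colors = products_with(conn, "color")
print(colors)
```

Each row carries only the properties that apply to it, so no column has to exist for `ram_gb` or `width_cm`. This sketch parses every row in application code; a database with native XML support would evaluate the path expression inside the engine, ideally with an index.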
"A further use is to store programming objects, or at least the state of programming objects, in the database. It’s very hard to represent the state of lots of varying objects in the database if you’re a programmer and want to change the schema all the time. Previously you might have put them into a binary blob, and then couldn’t access any of the properties in the database without getting the whole object back.
"You could use the CLR [Common Language Runtime] support that we have now in SQL Server 2005 to do that, but to be able to do that you have to know beforehand exactly what objects you want to put into the database, and you will not be able to manage heterogeneous sets of such objects. The CLR, at least in this release, also has a slight limitation which the XML data type doesn’t have. CLR objects can only be 8K. If people decide to store their objects in an XML data type, they still can access the properties using XQuery, while not having to bother about very strict object registration requirements. You just put the state of the object in the database, and keep the programming logic on the mid-tier."
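The idea of storing only the state of programming objects as XML, while the logic stays on the mid-tier, can be sketched as below. Everything here is hypothetical: the `Cart` and `Session` classes, the `to_xml` and `read_property` helpers, and the flat one-element-per-field encoding. The list `stored` merely stands in for an XML column holding a heterogeneous set of objects.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields


@dataclass
class Cart:  # hypothetical mid-tier object
    owner: str
    items: int
    coupon: str


@dataclass
class Session:  # a second, differently shaped object
    owner: str
    expires: str


def to_xml(obj):
    """Serialize an object's state: one child element per dataclass field."""
    root = ET.Element(type(obj).__name__)
    for f in fields(obj):
        ET.SubElement(root, f.name).text = str(getattr(obj, f.name))
    return ET.tostring(root, encoding="unicode")


def read_property(state_xml, name):
    """Read one property from stored state without rebuilding the object."""
    return ET.fromstring(state_xml).findtext(name)


# A heterogeneous set of object states, as they might sit in one XML column.
stored = [
    to_xml(Cart("alice", 3, "SPRING")),
    to_xml(Session("bob", "2005-07-20")),
]

owners = [read_property(doc, "owner") for doc in stored]
print(stored[0])
print(owners)
```

No class has to be registered with the store in advance, and a property shared by different object types (`owner` here) can be read across all of them, which a binary blob would not allow without fetching and deserializing each object.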

XQuery and XPath Standards


The XQuery 1.0 and XPath 2.0 standards are critically important to SQL Server, but will they be finalised in time for its release? "No," says Rys. "We are going to ship before the standard gets its final recommendation. In April 2005 the W3C moved into what’s called the “last call” phase. This phase will take up until the summer. There is a summer break in August, and then the next phase is the Candidate Recommendation. During this phase the different vendors and software developers are invited to build prototypes based on the specification, and to make sure that individual features that the specification describes are interoperable, in the sense that they return the same results for the same syntax. That phase will probably take at least 4 to 5 months. Then you will have the inevitable discussions inside the working group whether certain tests should be included or not, and what the expected result for a certain test should be, and to determine what are the exit criteria out of the Candidate Recommendation phase. I would expect this to take 5 months. Assuming that we might get into Candidate Recommendation by October, then getting to the Proposed Recommendation phase will probably be late Q1 2006."
This being the case, how will SQL Server be updated to achieve full conformity? "We tried to scope the XQuery support for this release to a subset of the specification that we felt was stable. Many of the more controversial features of the spec we have not implemented. We have also not implemented many of the functions, especially all the date-time functions. The goal that I have with the XQuery support in SQL Server is that we are shipping the subset with SQL Server 2005 now, and for the next release the standard will be fixed and finalised. We will still provide backward compatibility support for people that have been building on SQL Server 2005, in case something which we have supported changes."

Performance


Finally I asked Rys about the performance implications of using an XML data type versus shredding XML data into columns. "The general recommendation today is if your data fits the relational framework, and you want to have relational type queries over the data, then it is more efficient to shred it into relational form, and run your queries there. Even though we have indexes on our XML structures, the query expressions are internally more complex than if you query relational content. In an XML data type, even if you have fairly structured data, it is still such that you are not guaranteed that a specific element is necessarily always at the same place. So it is a bit more effort to execute those queries.
"The sweet spot for XML is not replacing relational processing. The sweet spot is to enable you to query data that you either had a hard time shredding before, or that you couldn’t query at all before."
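Rys's remark that a specific element is not guaranteed to sit at the same place can be illustrated with a small sketch. The three documents and both helper functions are invented for the example, and Python's `xml.etree.ElementTree` is used only as a stand-in; it says nothing about how SQL Server's XML indexes or query plans actually work.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents: the same <price> element at three different depths.
DOCS = [
    "<order><price>10</price></order>",
    "<order><item><price>12</price></item></order>",
    "<order><bundle><item><price>7</price></item></bundle></order>",
]


def price_fixed_path(doc):
    """Look only at one fixed location, as a relational column lookup would."""
    return ET.fromstring(doc).findtext("price")


def price_anywhere(doc):
    """Search all descendants, since the element's position may vary."""
    return ET.fromstring(doc).findtext(".//price")


fixed = [price_fixed_path(d) for d in DOCS]
anywhere = [price_anywhere(d) for d in DOCS]
print(fixed)
print(anywhere)
```

The fixed-path lookup finds the value only in the first document, while the descendant search finds it in all three at the cost of walking more of each tree. That extra walking is the kind of effort a relational column, whose position is fixed by the schema, never has to spend.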


Copyright Tim Anderson 20th July 2005. All rights reserved

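Translator's note (from the Chinese edition of this page): this was a first attempt at translation and likely contains many errors; the reader's indulgence is requested.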

Original article:
http://www.itwriting.com/sqlxml.php