XML and Relational Databases

Smart Access, June 2003

Peter Vogel        

One course that I teach is a four-day class on relational database design. The last time that I taught it, one participant asked about “XML databases” and whether they would replace relational databases. Before I can answer that, I need to answer the embedded question: “What is an XML database?”

My assumption is that an XML database is a database that stores each XML document as a single unit and lets you retrieve documents by using some combination of XPath and the still-developing XQuery specification. Will these databases replace relational databases? No.
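To make that working definition concrete, here is a minimal Python sketch of the idea: whole documents are stored as units and retrieved with a path expression. The document ids, the order markup, and the find_documents helper are all invented for illustration, and the standard library's ElementTree supports only a limited subset of XPath, so this is a toy rather than a picture of any real XML database product.

```python
import xml.etree.ElementTree as ET

# Hypothetical store: each XML document is kept whole, keyed by an invented id.
DOCUMENTS = {
    "order-1": "<order customer='Acme'><item sku='A1' qty='2'/><item sku='B7' qty='1'/></order>",
    "order-2": "<order customer='Globex'><item sku='A1' qty='5'/></order>",
}


def find_documents(xpath):
    """Return the ids of the stored documents that contain a match for xpath."""
    matches = []
    for doc_id, text in DOCUMENTS.items():
        root = ET.fromstring(text)
        # findall accepts ElementTree's limited XPath subset.
        if root.findall(xpath):
            matches.append(doc_id)
    return matches


# Retrieve every document that mentions SKU B7.
print(find_documents(".//item[@sku='B7']"))
```

Note that the unit of retrieval is the whole document: the query finds orders that contain a matching item, not the item rows themselves.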

I’m a consultant by profession (though I also do teaching and technical writing), so I’m not used to giving flat “Yes” or “No” answers. In this case, I feel pretty confident, though.

To begin with, XML documents simply aren’t as flexible a data storage mechanism as relational databases. The fundamental data structure in an XML document is hierarchical. I worked on hierarchical databases back in the dawn of time, and I’m here to say that the reason we’re all using relational databases now is that relational is better. It’s not because of performance, either: relational databases are inherently slower than hierarchical databases, and it’s only the steady increase in processor speed that has made them acceptable. Relational is better.

In addition, the fundamental unit of storage for XML is the document. This makes XML databases less homogeneous, more “lumpy,” and less flexible than relational databases. One of the reasons that well-designed relational databases are so wonderful is that they let you combine, mix, and slice and dice your data in so many ways. This is because a fully normalized relational database stores data at the lowest level of granularity: the tuple (or row of data in a table).
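As a small illustration of that slicing and dicing, the following Python sketch uses an in-memory SQLite database. The three tables and their sample rows are assumptions made up for this example. The point is only that the same rows answer two unrelated questions without any restructuring of the stored data.

```python
import sqlite3

# Hypothetical normalized schema with invented sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customers(id));
    CREATE TABLE order_items (order_id INTEGER REFERENCES orders(id),
                              sku TEXT, qty INTEGER);
    INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO orders VALUES (10, 1), (11, 2);
    INSERT INTO order_items VALUES (10, 'A1', 2), (10, 'B7', 1), (11, 'A1', 5);
""")

# Slice one way: total quantity per product, ignoring who ordered it.
by_sku = conn.execute(
    "SELECT sku, SUM(qty) FROM order_items GROUP BY sku ORDER BY sku"
).fetchall()

# Slice another way: total quantity per customer, ignoring the product.
by_customer = conn.execute("""
    SELECT c.name, SUM(i.qty)
    FROM customers c
    JOIN orders o ON o.customer_id = c.id
    JOIN order_items i ON i.order_id = o.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

print(by_sku)
print(by_customer)
```

With the same data held as one XML document per order, the per-product total would mean opening every document and walking its items, because the document boundary follows the order rather than the product.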

Does that mean that there are a whole bunch of stupid people out there talking about XML databases who don’t know what they’re talking about, while I do know what I’m talking about? No, I’m not egotistical enough to believe that.

To begin with, there is a need to store whole XML documents. In applications, the primary purpose of XML is to provide a method of communication. In the future, applications will communicate by exchanging XML documents. Even databases will send and receive data as XML documents. As a result, it may be necessary for audit or legal reasons to store the XML document that you just received or that you’re about to send to a business partner. This will be in addition to (not instead of) storing the data that makes up the document in a relational database.

The usual objection here is that, “If the data is in the XML document, why take it out of the document and store it in the relational database? Why not just store the document?” The answer is that we extract the data from the XML documents for the same reason that we extract data from forms, from flat files, and from our users’ brains: because inside a relational database the data is usable—in its original format, the data is not.
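A minimal sketch of that extraction step, again in Python with SQLite, might look like the following. The order markup, the table names, and the store_order helper are hypothetical. The sketch keeps the received document intact for audit purposes and also copies its data into ordinary tables, where it becomes queryable.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical document received from a business partner.
RECEIVED = ("<order id='10' customer='Acme'>"
            "<item sku='A1' qty='2'/><item sku='B7' qty='1'/></order>")


def create_schema(conn):
    """Create invented tables: one for raw documents, two for extracted data."""
    conn.executescript("""
        CREATE TABLE received_documents (order_id INTEGER, body TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT);
        CREATE TABLE order_items (order_id INTEGER, sku TEXT, qty INTEGER);
    """)


def store_order(conn, xml_text):
    """Keep the document as received and extract its data into rows."""
    root = ET.fromstring(xml_text)
    order_id = int(root.get("id"))
    # The untouched document, kept for audit or legal reasons.
    conn.execute("INSERT INTO received_documents VALUES (?, ?)",
                 (order_id, xml_text))
    # The same data, extracted into relational form.
    conn.execute("INSERT INTO orders VALUES (?, ?)",
                 (order_id, root.get("customer")))
    conn.executemany(
        "INSERT INTO order_items VALUES (?, ?, ?)",
        [(order_id, item.get("sku"), int(item.get("qty")))
         for item in root.findall("item")],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
create_schema(conn)
store_order(conn, RECEIVED)
items = conn.execute(
    "SELECT sku, qty FROM order_items ORDER BY sku"
).fetchall()
print(items)
```

Once the rows exist, the data can be joined and summarized alongside everything else in the database, while the stored copy remains available if anyone needs to see exactly what arrived.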

The second reason that people discuss XML databases is that they aren’t building applications. Instead, these people are generating content and, typically, unstructured content. The material that makes up this newsletter is a perfect example of unstructured content. Storing this data in an XML database makes sense because, more and more, this content will be generated by tools that output XML. The next versions of Microsoft Word and Excel are excellent examples of this: the user will be able to create documents in the Microsoft proprietary format or in a more open XML format. This data doesn’t easily break down into the table, row, and column model of the relational database.

XML databases for unstructured content are a good thing. With all documents stored in a common text-files-with-tags format, it’s considerably easier to build links among those documents. And, since databases now communicate by sending XML documents, it’s possible to extend those links from unstructured data into existing structured data stores. This is where tools like Microsoft’s InfoPath become important.

When I used to have a job (my pre-consultant days) and worked on mainframes, I used to talk about how personal computers were going to change the world. My mainframe-based coworkers used to tell me that mainframes would be around forever. Of course, they were right. There are still lots of mainframes in the world. Where there aren’t mainframes, there are “servers” big and powerful enough to be classified as mainframes.

 
