Access to XML

<< Click to Display Table of Contents >>

Navigation:  Imports, Exports and XML >

Access to XML

Michael Corning            
 
The eXtensible Markup Language (XML) provides a powerful way to embed information about data into text. In this article, Michael introduces you to XML and shows how it can be used with Access.
 
The Internet and the Web continue to exert a highly visible and potentially profound influence on current and future business practice. The recent ratification of the eXtensible Markup Language (XML) marks the beginning of the next phase of the evolution of the Web and its impact on software development and business. This article will introduce the fundamentals of this important technology. I'll show you how to use XML in a three-tier client/server architecture with Microsoft Access as the server (or data) tier, Microsoft Internet Information Server and ActiveX Data Objects in the middle tier, and Internet Explorer 5.0 in the client tier.
 
XML

XML exposes an object model that can use declarative programming all the way from the client to the server. XML can do things that HTML can't, including binding data to elements of your Web page. XML can even be used to define other languages, like the eXtensible Style Language (XSL). The combination of XML and XSL allows you to display data in richer and more flexible ways than HTML. In addition, XML is the basis for XML-Data Schemas (currently a technology preview product). Schemas can be used to specify the precise structure of the data passed between client and server using an XML syntax. In essence, XML enables the Web to describe itself. With XML, a whole new world emerges within the World Wide Web.
 
It's probably easiest to describe XML by comparing it to HTML. Most of the few dozen HTML tags are descriptive: They tell the Web browser how to display data. The <P> tag tells the browser that the following data belongs in a new paragraph. The <OL> tag says the following lines (defined by <LI> tags) should be displayed in a numbered list. Even the actual display of individual characters can be specified with the <FONT> tag.
 
A few HTML tags, however, also control processing. For example, the <A> tag tells the browser how to behave -- go to another page (<A HREF>) or another location in the current page (<A NAME>). Still, HTML was made mainly for displaying data.
 
XML does what HTML can't: It provides meaning for data. With XML (and an XML parser), <OrderID>1000</OrderID> is more than merely the number 1000; it represents an OrderID field for a specific order transaction in a data set somewhere. As I noted previously, XML can be used to define other languages (indeed, HTML is really a subset of XML). XML can be used to tell the browser's XML parser how to confirm the validity of the data formatted by XML.
 
So there are two key ideas to take away from this. One is that text coming into the browser is just that to the machine: just text. However, when this incoming text is wrapped with XML tags, then the application can tell what the incoming text means and can act accordingly. The second important point to appreciate is that XML itself can be used to process XML. I think these two features are on par with each other in their significance to the programming community.
 
XML techniques
 
Let me turn now to a discussion of how Access and Jet fit into this emerging picture. In the rest of this article, I'll show you how Jet can provide the data to an XML-aware application. From there, I'll go on to show you a simple way to display this XML data.
 
At this writing, both the current versions of Jet and ADO are agnostic with respect to XML. Jet simply provides data as it always has; it doesn't care about who consumes it or how. The upcoming version of ADO (2.1) lets you save recordsets to disk by using XML to format the data. Listing 1 shows a sample ADO recordset saved in XML format. Notice how the dataset begins with the XML that defines the format of the recordset and then is followed by the XML containing the actual data. Effectively, the XML defines the XML used in the recordset in Listing 1.
 

Listing 1. A recordset saved in XML.

 

<xml 

  xmlns:s='uuid:BDC6E3F0-6DA3-11d1-A2A3-00AA00C14882'

  xmlns:dt='uuid:C2F41010-65B3-11d1-A29F-00AA00C14882'

  xmlns:rs='urn:schemas-microsoft-com:rowset'

  xmlns:z='#RowsetSchema'>

<s:Schema id='RowsetSchema'>

   <s:ElementType name='row' content='eltOnly'>

      <s:attribute type='CustomerID'/>

      <s:attribute type='CompanyName'/>

       ...

      <s:attribute type='Phone'/>

      <s:attribute type='Fax'/>

      <s:extends type='rs:rowbase'/>

   </s:ElementType>

   <s:AttributeType name='CustomerID' 

       rs:number='1' rs:nullable='true'

       rs:write='true'>

      <s:datatype dt:type='string' 

       dt:maxLength='5' rs:maybenull='true'/>

   </s:AttributeType>

   <s:AttributeType name='CompanyName' 

       rs:number='2' rs:nullable='true' 

       rs:write='true'>

      <s:datatype dt:type='string' 

       dt:maxLength='40' rs:maybenull='true'/>

   </s:AttributeType>

  ...

</s:Schema>

<rs:data>

   <z:row CustomerID='ALFKI' 

       CompanyName='Alfreds Futterkiste' 

       ContactName='Maria Anders'

       ContactTitle='Sales Representative' 

       Address='Obere Str. 57' City='Berlin'

       PostalCode='12209' Country='Germany'

       Phone='030-0074321' Fax='030-0076545'/>

   <z:row CustomerID='ANATR' 

       CompanyName='Ana Trujillo Emparedados y helados'

       ContactName='Ana Trujillo' ContactTitle='Owner'

       Address='Avda. de la Constitucion 2222' 

       City='Mexico D.F.' PostalCode='05021'

       Country='Mexico' Phone='(5) 555-4729'

       Fax='(5) 555-3745'/>

</rs:data>

</xml>

 
 
Formatting XML
 
There are two approaches to formatting recordset data for XML. The ADO Save method uses the first of the two techniques to render XML data and is referred to as the attributes technique. With the attributes technique, each "row" of XML data is a single tag with attributes for each field of the record. The following XML line shows a customer record as a single tag. Within that tag, each field (customerID, companyName, contactName) appears as an attribute of the tag:
 

<customer customerID="MPC" companyName="Microsoft"

    contactName="Michael Corning" />

 
 
Here's some Access code that, passed a table name, generates XML attribute formatted records for every record in the table:
 

Sub WriteXML(strTableName As String)

    Dim dbs As Database

    Dim tbl As TableDef

    Dim fld As Field

    Dim rec As Recordset

    Dim strXML As String

 

    Set dbs = CurrentDb()

    Set rec = dbs.OpenRecordset(strTableName)

    Set tbl = dbs.TableDefs(strTableName)

    Open "c:\sample.html" For Output As #1

    Do Until rec.EOF

      strXML = "<" & tbl.Name & " "

      For Each fld In tbl.Fields

        strXML = strXML & fld.Name & "=""" & _

           rec(fld.Name) & """ " 

      Next

      strXML = strXML & "/>"

      Print #1, strXML

      rec.MoveNext

    Loop

    Close #1

End Sub

 
 
The second technique uses tags for each individual field. The attributes technique is more compact (there are fewer tags) but less flexible than the tag technique because attributes can't have attributes and attributes can't contain XML tags. The attributes technique is often handy for highly structured row-based data -- for example, data stored in a table. Since any data that has an OLE DB Provider can be rendered as table-formatted data, the attribute format is very useful for structured data.
 
As I said, the alternative technique uses XML tags for each data field. The tag technique is far more flexible and is often more appropriate for data that's less structured than relational data. The tag technique is more flexible because you can define any tag you want and place it anywhere in the data schema. In addition, each tag can have its own set of attributes. Should you break out of the row-oriented approach to data, keep the tag technique in mind.
 
Where XML?
 
Both the attributes technique and the tag technique work when the XML processing will be done on the client using a browser that includes an XML parser. This doesn't mean that you can't do complex XML processing on the server. There are at least four scenarios that would require XML to be processed on the server:

XML as the "poor man's heterogeneous join." With XML, you can retrieve data from disparate data sources into a single universal data format. With XML data from SQL Server, Access, even Oracle, Sybase, or DB2 can be accessed by a single ASP page and the aggregated data sent to the client as a single data stream of XML-enhanced text.

If you want your application to reach the broadest audience with a browser-neutral solution, then only HTML should be returned to the client. If the application must also do complex processing or rich display (assuming the client supports Cascading Style Sheets), then the application developer can use both XML and XSL on the server to process the data and generate HTML for the client.

If complex processing of tree-structured data needs to be done on the middle tier, then you should research the W3C Compliant XML object model. The XML object model is exposed through the XML parser, an Automation component that can be used by any COM-compliant tool. The XML parser produces a tree structure of data that's easily navigated, manipulated, and translated using the methods and properties of the XML object model.

If an application needs to communicate with other servers before it sends data to the client, or if the application's middle tier must process data returned from the client, then the incoming data will already be formatted as XML, and that data can be accessed with the XML object model.

 
At this writing, only Internet Explorer 5.0 fully supports the recently ratified Document Object Model (DOM) and XML specifications as well as the features specified in the latest findings of the XSL Working Group. Indeed, to follow along with some of the things in this article requires Internet Explorer 5.0 beta 2. Internet Explorer 5.0 provides the user with the richest possible experience in a Web browser. There isn't enough space in this article to describe how to exploit the full feature set of Internet Explorer 5.0, but I'll give you a flavor for what you can do. Specifically, the following example uses ADO and an ADO Data Source Object (DSO) for XML to bind and navigate recordsets. One of the wonderful things about ADO is that if a DSO exists for some data, then you can process that data using the ADO objects.
 
An example
 
The following example creates a simple hierarchical recordset with data taken from the Northwind.MDB Customer and Orders table. An ASP page, CustomerOrders.ASP, reads the data from the database and writes XML to a page called CustomerOrders.HTM file. Figure 1 shows the output displayed by Internet Explorer 5.0 with its built-in XML parser.

199903_MC1 Figure 1
 
The complete listing for CustomerOrders.ASP is included in the accompanying Download file. It's written in JScript and might look a little foreign to developers accustomed to VBScript. The resulting XML is well-formed, but, in the absence of a DTD, it can't be validated (see the sidebar, "Valid and Well-Formed XML"). The XML produced looks something like this:
 

<CUSTOMER CustomerId="001" CompanyName="ALKI" 

       ContactName="Peter">

<ORDER OrderId="100" OrderDate="1/1/95">

<ORDER OrderId="101" OrderDate="1/5/95">

</CUSTOMER>

<CUSTOMER CustomerId="002" CompanyName="SAMM"

       ContactName="Jane">

<ORDER OrderId="110" OrderDate="7/8/95">

<ORDER OrderId="115" OrderDate="12/5/95">

</CUSTOMER>

 
 
When read by I.E. 5.0, these tags are processed by the ADO DSO for XML. The DSO creates a recordset from the XML data. In fact, the recordset demonstrates another feature of ADO, as it's hierarchical: One recordset is nested inside another. Since there are multiple Customer tags, a recordset called Customer is defined that has the fields CustomerId, CompanyName, and ContactName. The Customer recordset will also contain an Order field. In turn, since there are multiple occurrences of the Order tag, the Order field will contain yet another recordset called Order. This Order recordset will have fields for OrderId and OrderDate. Each Order recordset is attached to the correct customer record in the nested data structures.
 
So, when the XML is finished processing, an ADO hierarchical recordset will have been created. This recordset is stored in memory, ready for the databinding that will display it in familiar HTML <TABLE> tags. Figure 2 compares the resulting hierarchical view of data with the traditional relational view. Incidentally, other than memory, there's no limit to the amount of nesting you can use.

Click to see full size Figure 2
 
Databinding
 
Now that I have a nice little hierarchical recordset, it's time to bind the data to HTML. The resulting Web page is shown in Figure 3. To see the order data for a customer, simply select the customer row. The recordset navigation buttons on the right can be used to move forward and backward one two-record page at a time (the buttons on the left to go to the top and bottom of the recordset). I haven't fully bullet-proofed the navigation routines, but you should find that they meet the needs of this demo.

199903_MC3 Figure 3
 
 
All you need to bind an HTML table to a hierarchical recordset generated by ADO from XML is to ensure that each child recordset is bound to its own HTML <TABLE> tag using the tag's DATAFLD parameter (the table's DATASRC parameter specifies the name of the ADO provider for XML: xmldso). Here are the HTML tags that will display the Order recordset that was created from the XML:
 

<TABLE WIDTH="100%" CELLSPACING="0" BORDER="1" 

  DATASRC="#xmldso" DATAFLD="order">

  <THEAD><TR><TH>Order ID</TH>

       <TH>Order Date</TH></TR>

  </THEAD>

  <TBODY>

    <TR>

      <TD ALIGN="middle" VALIGN="top"> 

       <SPAN DATAFLD="orderid"></SPAN> </TD>

      <TD ALIGN="middle" VALIGN="top"> 

       <SPAN DATAFLD="orderdate"></SPAN> </TD>

    </TR>

  </TBODY>

</TABLE>

 
 
The databinding in the preceding HTML is what I referred to earlier as declarative programming. With declarative programming, you simply specify what you want and let the operating system do all the heavy lifting for you.
 
Summary
 
What I've given you is a very cursory introduction to one of the hottest new technologies to make the Web go 'round. I've sketched the ways XML can be used in and between all three tiers of a client/server application. Access can continue to provide data, but when the data is returned to the client or middle tier, it can now be formatted as XML. In this article, I showed you one technique you can use to manually structure your Access data as XML. In the process, I also explained what it means to have XML that's "well-formed" and what it takes to be "valid" XML. I then demonstrated a simple application of XML to create hierarchical data that can be displayed with very simple HTML.
 
I've included in the accompanying Download file all the files, code, and sample pages that I've used in this article. I hope that you have as much fun exploring this technology as I've had creating this article.
 
 
 
Your download file is called XMLASP.ZIP  in the file SA199903_down.zip

This can be purchased with all the other downloads on this page

 

 
 
Sidebar: Valid and Well-Formed XML
 
The XML I show in this article is well-formed but hasn't been validated. Well-formed XML is defined as XML that uses either end tags or "empty" tags to mark all the data. All XML tags must begin with alphabetic characters or an underscore. Where text is enclosed between two tags, the </ characters must begin the end tag that follows the enclosed text. If all the data is specified by attributes inside the tag (an "empty tag"), then /> marks the end of the tag.
 
So, for example, the <name> tag in the code that follows contains data between its tags, so the text is terminated with the </name> end tag. The "empty" <hrData> tag, on the other hand, uses attributes to contain all its data, so there's no separate end tag and the tag ends with the characters />.
 
XML data is validated with a Document Type Definition (DTD) or XML Schema, which specifies all the tags that are going to be used. However, the XML parser isn't aware of this restriction. So when the XML parser sees the /> characters, it knows that the <hrData> tag is self-contained. In addition, well-formed XML is case-sensitive, and all values assigned to attributes must be quoted (for instance, Name=Me is bad, Name="Me" is good). Finally, XML can be nested if the tags are fully enclosed. This XML is well-formed but, in the absence of a DTD or Schema, not validated:
 

<customer id="MPC">

  <name>Michael Corning</name>

  <hrData position="Escalation Engineer"

      manager="Kurt Philips"/>

</customer>

 
 
The following text isn't well-formed because the <name> tags aren't fully nested inside the <customer> tags, and the id attribute in the <customer> tag isn't quoted:
 

<customer id=MPC>

  <name>Michael Corning

  <hrData position="Escalation Engineer"

      manager="Kurt Philips"/>

</customer>

  </name>

 

Other Articles On The Site You May Wish To Read

 

Access 2003 and XML

Put XML to Use in Access 2000+

Taking Outlook and XML to Task