XML representation of a relational database

A relational database consists of a set of tables, where each table is a set of records. A record in turn is a set of fields and each field is a pair field-name/field-value. All records in a particular table have the same number of fields with the same field-names.

This article describes an application of (a simple subset of) XML that can be used to represent such a database.

The relational data-model also defines certain constraints on the tables and defines operations on them. We are not concerned with the constraints and operations here. In other words, we are not trying to create a query language or a data-definition language, just a language that captures the data in a database or in a particular view of the database.

Several such languages are possible, of course, and it not hard to come up with alternative and equally valid ones as the one described below.

Introduction

The description of the database above suggests a simple nesting of fields inside records inside tables inside databases. Here is an example of a single database with two tables:

<!doctype mydata "http://www.w3.org/mydata">
<mydata>

<authors>
<author>
<name>Robert Roberts</name>
<address>10 Tenth St, Decapolis</address>
<editor>Ella Ellis</editor>
<ms type="blob">ftp://docs/rr-10</ms>
<born>1960/05/26</born>
</author>

<author>
<name>Tom Thomas</name>
<address>2 Second Av, Duo-Duo</address>
<editor>Ella Ellis</editor>
<ms type="blob">ftp://docs/tt-2</ms>
</author>

<author>
<name>Mark Marks</name>
<address>1 Premier, Maintown</address>
<editor>Ella Ellis</editor>
<ms type="blob">ftp://docs/mm-1</ms>
</author>
</authors>

<editors>
<editor>
<name>Ella Ellis</name>
<telephone>7356</telephone>
</editor>
</editors>

</mydata>

The format is verbose, since XML is verbose. On the other hand, it compresses well with standard compression tools. It is also easy to print the database (or a part of it) with standard XML browsers and a simple style sheet.

The database

A relational can be modeled as a hierarchy of depth four: the database consists of a set of tables, which in turn consist of records, which in turn consist of fields.

We can model the database with a document node and its associated element node:

<!doctype name "url">
<name>
table1
table2
...
tablen
</name>

The name is arbitrary. The url is optional, but can be used to point to information about the database. We don't define what it points to. [Or should we?]

The order of the tables is also arbitrary, since a relational database defines no ordering on them.

The table

Each table of the database is represented by an element node with the records as its children:

<name>
record1
record2
...
recordm
</name>

The name is the name of the table. The order of the records is arbitrary, since the relational data model defines no ordering on them.

The record

A record is also represented by an element node, with its fields as children:

<name>
field1
field2
...
fieldm
</name>

The name is arbitrary, since the relational data model doesn't define a name for a record type. However, in XML it cannot be omitted. One scheme is to re-use the name of the table, or, if the table has a name that is a plural, to use the singular form (`persons' -> `person', `parts' -> `part').

The order of the fields is again immaterial.

The field

A field is represented as an element node with a data node as its only child:

<name type="t">
d
</name

If d is omitted, it means the value of the fields is the empty string.

The value of t indicates the type of the value (such as string, number, boolean, date). [Should we give a complete list?] If the type attribute is omitted, the type is assumed to be `string.'

Null values

Null values are represented by the absence of the field.

Note that this is different from leaving the field empty, which indicates that the field contains a string of length zero. Null values have special properties in relational databases. For example, two fields both with null values are not equal (in contrast to two fields with zero-length strings, which are).

Strong typing

Tim Bray has written a proposal for adding strong typing to XML, using a set of fixed attributes. The above example would get attributes declared as follows:

<!doctype mydata "http://www.w3.org/mydata">
<mydata>

<authors>
<?xml default name
    xml-sqltype="varchar"
    xml-sqlsize="40"
?>
<?xml default address
    xml-sqltype="varchar"
    xml-sqlsize="40"
?>
...
<?xml default born
    xml-sqltype="date"
    xml-sqlmin="1900/01/01"
    xml-sqlmax="1990/01/01"
?>
...
</authors>

<editors>
<?xml default name
    xml-sqltype="varchar"
    xml-sqlsize="40"
?>
...
</editors>

</mydata>

etc. This will allow an application that knows about these attributes to check the content of each field.

References

The relational data model is explained in many books. Two examples are:

XML is currently a draft and `The XML data model' is one proposal for developing that draft:

Postscript

If you like to edit XML by hand, with the help of a text editor, you can of course use some pretty-printing to make things easier to recognize. Here is one example:

<!doctype mydata "http://www.w3.org/mydata">
<mydata>

<AUTHORS>

<author>
<name                >Robert Roberts<          /name>
<address             >10 Tenth St, Decapolis<  /address>
<editor              >Ella Ellis<              /editor>
<ms type="blob"      >ftp://docs/rr-10<        /ms>
</author>

<author>
<name                >Tom Thomas<              /name>
<address             >2 Second Av, Duo-Duo<    /address>
<editor              >Ella Ellis<              /editor>
<ms type="blob"      >ftp://docs/tt-2<         /ms>
</author>

<author>
<name                >Mark Marks<              /name>
<address             >1 Premier, Maintown<     /address>
<editor              >Ella Ellis<              /editor>
<ms type="blob"      >ftp://docs/mm-1<         /ms>
</author>

</AUTHORS>

<EDITORS>

<editor>
<name                >Ella Ellis<              /name>
<telephone           >7356<                    /telephone>
</editor>

</EDITORS>

</mydata>


Bert Bos
Last modified: $Date: 1997/07/11 19:39:01 $