Introduction to XML Schemas

24 Aug 2005

Ivan Herman, W3C

Reminder: Role of DTDs in XML

<?xml version="1.0"?>
  <Staff>
    <Employee>
      <Name>
        <Surname>Herman</Surname>
        <FirstName>Ivan</FirstName>
      </Name>
      <Title>Dr</Title>
      <Dept>INS</Dept>
      <Started>01.10.88</Started>
      <Email>ivan@w3.org</Email>
      <Tel>4163</Tel>
      <Fax>4199</Fax>
      <Building>Secondary</Building>
      <Room>C112</Room>
    </Employee>
  </Staff>

In traditional XML, the possible tags are defined by a DTD (Document Type Definition).

DTD Type Definition

<!ELEMENT Staff (Employee)*>
<!ELEMENT Employee (Name,Title,Dept,
   Started,Email?,Tel,Fax?,Building,Room)>
<!ELEMENT Name (Surname, FirstName+)>
<!ELEMENT Surname (#PCDATA)>
<!ELEMENT FirstName (#PCDATA)>
<!ELEMENT Title (#PCDATA)>
<!ELEMENT Dept (#PCDATA)>
<!ELEMENT Started (#PCDATA)>
<!ELEMENT Email (#PCDATA)>
<!ELEMENT Tel (#PCDATA)>
<!ELEMENT Fax (#PCDATA)>
<!ELEMENT Building (#PCDATA)>
<!ELEMENT Room (#PCDATA)>
<!ATTLIST Tel CDATA #REQUIRED>
<!ATTLIST Fax CDATA #REQUIRED>
<!ATTLIST Building CDATA #REQUIRED>

What is wrong with DTDs?

So XML made them optional (well-formedness may suffice)!

Schemas

Like DTDs, but better:

Schema Version for our DTD

<element name="Staff" type="Staff_type"/>
<complexType name="Staff_type">
  <sequence>
    <element name="Employee" type="Employee_type"
      minOccurs="0" maxOccurs="unbounded"/>
  </sequence>
</complexType>
<complexType name="Employee_type">
  <sequence>
    <element name="Name"  type="Name_type"/>
    <element name="Title" type="string"/>
    ...
    <element name="Email" type="string"
      minOccurs="0" maxOccurs="1"/>
  </sequence>
</complexType>
<complexType name="Name_type">
  <sequence>
    <element name="Surname"   type="string"/>
    <element name="FirstName" type="string"
      minOccurs="1" maxOccurs="unbounded"/>
  </sequence>
</complexType>
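
The fragments on these slides leave out the enclosing schema element and the namespace prefixes for brevity. A rough sketch of what a complete schema document for the DTD could look like follows. The xsd prefix, the absence of a target namespace, and the use of string for every leaf element are assumptions made here, not taken from the slides:

```xml
<?xml version="1.0"?>
<!-- Sketch only: complete schema document for the Staff example.
     Element names and occurrence rules follow the DTD shown earlier. -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Staff" type="Staff_type"/>
  <xsd:complexType name="Staff_type">
    <xsd:sequence>
      <!-- (Employee)* in the DTD: zero or more -->
      <xsd:element name="Employee" type="Employee_type"
                   minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:complexType name="Employee_type">
    <xsd:sequence>
      <xsd:element name="Name"     type="Name_type"/>
      <xsd:element name="Title"    type="xsd:string"/>
      <xsd:element name="Dept"     type="xsd:string"/>
      <xsd:element name="Started"  type="xsd:string"/>
      <!-- Email? and Fax? in the DTD: optional -->
      <xsd:element name="Email"    type="xsd:string" minOccurs="0"/>
      <xsd:element name="Tel"      type="xsd:string"/>
      <xsd:element name="Fax"      type="xsd:string" minOccurs="0"/>
      <xsd:element name="Building" type="xsd:string"/>
      <xsd:element name="Room"     type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:complexType name="Name_type">
    <xsd:sequence>
      <xsd:element name="Surname"   type="xsd:string"/>
      <!-- FirstName+ in the DTD: one or more -->
      <xsd:element name="FirstName" type="xsd:string" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
```

With the schema elements prefixed and no target namespace, unprefixed references such as Staff_type resolve to the types defined in this document, while built-in types are written as xsd:string.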

Schema Terminology

Same Schema with Anonymous Types

  <element name="Staff">
    <complexType>
      <sequence>
        <element name="Employee" type="Employee_type"
            minOccurs="0" maxOccurs="unbounded"/>
      </sequence>
    </complexType>
  </element>
  <complexType name="Employee_type">
    <sequence>
      <element name="Name">
        <complexType>
          <sequence>
            <element name="Surname"   type="string"/>
            <element name="FirstName" type="string"
               minOccurs="1" maxOccurs="unbounded"/>
          </sequence>
        </complexType>
      </element>
      <element name="Title" type="string"/>
      ...
    </sequence>
  </complexType>

You can repeat type names!

Groups

If you want to share parts of a definition, you can use groups:

<group name="Name_group">
  <sequence>
    <element name="Surname"   type="string"/>
    <element name="FirstName" type="string"
       minOccurs="1" maxOccurs="unbounded"/>
  </sequence>
</group>
<complexType name="Employee_type">
  <sequence>
    <group ref="Name_group"/>
    <element name="Title" type="string"/>
    ...
  </sequence>
</complexType>

Any Order

The DTD statement:

  <!ELEMENT Employee (Name,Title,Dept,...

defines a fixed order. Whereas:

 <complexType name="Employee_type">
   <all>
     <element name="Name">
       <complexType>
         <sequence>
           <element name="Surname"   type="string"/>
           <element name="FirstName" type="string"
              minOccurs="1" maxOccurs="unbounded"/>
         </sequence>
       </complexType>
     </element>
     <element name="Title" type="string"/>
     ...
   </all>
 </complexType>

allows any order

Content models

Define the content of complex types. It can be defined using:

Content model specifiers: sequence, all, choice
Special attributes: minOccurs, maxOccurs
Groups
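
The slides show sequence and all in use, but not choice. A minimal sketch of a choice content model, assuming a hypothetical Contact element that carries either a Tel or an Email child (the names Contact and Contact_type are invented for illustration):

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical type: exactly one of the two children must appear -->
  <xsd:complexType name="Contact_type">
    <xsd:choice>
      <xsd:element name="Tel"   type="xsd:string"/>
      <xsd:element name="Email" type="xsd:string"/>
    </xsd:choice>
  </xsd:complexType>
  <xsd:element name="Contact" type="Contact_type"/>
</xsd:schema>
```

Under these assumptions <Contact><Tel>4163</Tel></Contact> would be valid, while a Contact holding both children, or neither, would not.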

Adding attributes

<element name="Employee">
  <complexType>
    ...
    <attribute name="ssn" type="string"/>
  </complexType>
</element>

This allows the XML fragment:

<Employee ssn="123456">...</Employee>

There are also attribute groups to simplify the specification.
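
As a sketch of how an attribute group might look: the group name Id_attributes and the badge attribute are assumptions added for illustration, and only ssn comes from the example above.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical group bundling attributes that several elements share -->
  <xsd:attributeGroup name="Id_attributes">
    <xsd:attribute name="ssn"   type="xsd:string" use="required"/>
    <xsd:attribute name="badge" type="xsd:string"/>
  </xsd:attributeGroup>
  <xsd:element name="Employee">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Title" type="xsd:string"/>
      </xsd:sequence>
      <!-- Attribute declarations and group references follow the content model -->
      <xsd:attributeGroup ref="Id_attributes"/>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```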

Datatypes

Until now, we have copied the DTD, with some simplification.

But, one can:

The ability to define complex data is one of the main strengths of schemas!

Built-in simple datatypes

Schemas have a number of built-in simple datatypes:

E.g., we could have said:

  <element name="Room" type="positiveInteger"/>
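
A few more built-in types, sketched against elements of the Staff example. The choice of type for each element is an assumption made here, and Homepage is a hypothetical element that does not appear in the DTD:

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- xsd:date expects the form YYYY-MM-DD, so the sample value "01.10.88"
       would have to be rewritten, e.g. as 1988-10-01 if it is read as
       day.month.year -->
  <xsd:element name="Started"  type="xsd:date"/>
  <!-- digits only, greater than zero -->
  <xsd:element name="Tel"      type="xsd:positiveInteger"/>
  <!-- hypothetical element holding a URI -->
  <xsd:element name="Homepage" type="xsd:anyURI"/>
</xsd:schema>
```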

Restriction on simple datatypes

E.g., if the room numbers run from 100 to 500:

<element name="Room" type="Room_Type"/>
<simpleType name="Room_Type">
  <restriction base="integer">
    <minInclusive value="100"/>
    <maxInclusive value="500"/>
  </restriction>
</simpleType>

Examples for restrictions

Examples for regular expression usage

For example, the type:

<simpleType name="Dutch_ZIP_Code">
  <restriction base="string">
    <pattern value="\d{4} {0,1}[A-Z]{2}"/>
  </restriction>
</simpleType>

defines Dutch ZIP codes of the form "1183 NW" or "1183NW".

Use of enumeration

CWI has two buildings:

<simpleType name="Building_Type">
  <restriction base="string">
    <enumeration value="Main"/>
    <enumeration value="Secondary"/>
  </restriction>
</simpleType>

I.e., the following is valid:

<Building>Main</Building>

but

<Building>Third</Building>

is not (although it is well-formed XML and valid against the DTD).

List types

Lists of simple types can be constructed by:

  <element name="Buildings" type="Building_List"/>
  <simpleType name="Building_List">
    <list itemType="Building_Type"/>
  </simpleType>

Which allows for:

  <Buildings>Main Main Secondary Main Main Secondary</Buildings>

But the following is illegal:

 <Buildings>Main Other Secondary Main Main Secondary</Buildings>

Union types

Some phone numbers can be 0800-7663 or 0800-SOME:

<simpleType name="OnlyNumbers">
  <restriction base="string">
    <pattern value="\d{4}-\d{4}"/>
  </restriction>
</simpleType>
<simpleType name="NumbersAndNames">
  <restriction base="string">
    <pattern value="\d{4}-[A-Z]{4}"/>
  </restriction>
</simpleType>
<simpleType name="Number">
  <union memberTypes="OnlyNumbers NumbersAndNames"/>
</simpleType>

Type Hierarchies

Example for extension

<complexType name="Address">
  <sequence>
   <element name="name"   type="string"/>
   <element name="street" type="string"/>
   <element name="city"   type="string"/>
  </sequence>
</complexType>
<complexType name="USAddress">
  <complexContent>
    <extension base="Address">
      <sequence>
        <element name="state" type="USState"/>
        <element name="zip"   type="positiveInteger"/>
      </sequence>
    </extension>
  </complexContent>
</complexType>

I.e., additional elements are added to the address (US state and ZIP).

USAddress can now be used wherever an Address is expected.
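
In an instance document the substitution is signalled with xsi:type. A sketch, assuming a hypothetical declaration <element name="shipTo" type="Address"/> and assuming that the USState type (not shown on the slides) accepts "CA"; the address values are placeholders:

```xml
<!-- shipTo is declared with type Address; xsi:type announces that this
     occurrence actually uses the derived type USAddress -->
<shipTo xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:type="USAddress">
  <name>Jane Doe</name>
  <street>1 Main Street</street>
  <city>Anytown</city>
  <state>CA</state>
  <zip>12345</zip>
</shipTo>
```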

Example for restriction

<complexType name="Name">
    <sequence>
      <element name="Surname"   type="string"/>
      <element name="FirstName" type="string"
         minOccurs="1" maxOccurs="unbounded"/>
    </sequence>
</complexType>
<complexType name="SimpleName">
  <complexContent>
    <restriction base="Name">
      <sequence>
        <element name="Surname"   type="string"/>
        <element name="FirstName" type="string"
           minOccurs="1" maxOccurs="1"/>
      </sequence>
    </restriction>
  </complexContent>
</complexType>

I.e., only one first name is allowed.

Substitution Groups

A looser form of type replacement than subtypes

<element name="Room" type="string"/>
<complexType name="Staff_type">
  <sequence>
    ...
    <element ref="Room" minOccurs="1"/>
    ...
  </sequence>
</complexType>
  ...
<simpleType name="B_Room_Type">
  <restriction base="string">
    <pattern value="[A-Z][0-9]+"/>
  </restriction>
</simpleType>
<element name="B_Room" type="B_Room_Type" substitutionGroup="Room"/>

makes the following fragment legal:

<Staff>...<B_Room>M379</B_Room>...</Staff>

Combination of schemas

One can:

include: textual include of another schema
import: refer to another schema through namespaces
redefine: get a type definition from another schema and redefine it (this is not subtyping, the type name remains the same!)
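
A sketch of the three mechanisms side by side. The file names names.xsd, xhtml.xsd and rooms.xsd are hypothetical, and rooms.xsd is assumed to define the Room_Type shown earlier:

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- include: pull in the declarations of another schema document -->
  <xsd:include schemaLocation="names.xsd"/>
  <!-- import: make the components of another namespace available -->
  <xsd:import namespace="http://www.w3.org/1999/xhtml"
              schemaLocation="xhtml.xsd"/>
  <!-- redefine: take Room_Type from rooms.xsd and narrow it under the
       same name; the redefinition must be based on the type itself -->
  <xsd:redefine schemaLocation="rooms.xsd">
    <xsd:simpleType name="Room_Type">
      <xsd:restriction base="Room_Type">
        <xsd:maxInclusive value="300"/>
      </xsd:restriction>
    </xsd:simpleType>
  </xsd:redefine>
</xsd:schema>
```

All three must appear at the top of the schema, before any element or type definitions.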

Combination of specifications

Parts of a document can be defined elsewhere, e.g.:

<element name="htmlExample">
  <complexType>
    <sequence>
      <any namespace="http://www.w3.org/1999/xhtml"
          minOccurs="1" maxOccurs="unbounded"
          processContents="skip"/>
    </sequence>
    <anyAttribute namespace="http://www.w3.org/1999/xhtml"/> 
  </complexType>
</element>

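The <htmlExample> element may contain any elements from the XHTML namespace! With processContents="skip", that content only has to be well-formed; it is not validated.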

Specifying of Uniqueness

You can define uniqueness of certain values:

<unique name="dummy1">
  <selector xpath="regions/zip"/>
  <field xpath="@code"/>
</unique>

The code attribute must be unique among the zip elements that are children of regions.
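
A uniqueness constraint is written inside an element declaration, after that element's type. A sketch of where it could sit, assuming a hypothetical enclosing element named country; only regions, zip and code come from the slide:

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="country">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="regions">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="zip" maxOccurs="unbounded">
                <xsd:complexType>
                  <xsd:attribute name="code" type="xsd:string"/>
                </xsd:complexType>
              </xsd:element>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
    <!-- The selector path is evaluated relative to the country element -->
    <xsd:unique name="dummy1">
      <xsd:selector xpath="regions/zip"/>
      <xsd:field xpath="@code"/>
    </xsd:unique>
  </xsd:element>
</xsd:schema>
```

Under these assumptions, two zip elements with the same code value inside one country element would make the document invalid.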

Conclusions