<?xml version="1.0"?>
<Staff>
<Employee>
<Name>
<Surname>Herman</Surname>
<FirstName>Ivan</FirstName>
</Name>
<Title>Dr</Title>
<Dept>INS</Dept>
<Started>01.10.88</Started>
<Email>ivan@w3.org</Email>
<Tel>4163</Tel>
<Fax>4199</Fax>
<Building>Secondary</Building>
<Room>C112</Room>
</Employee>
</Staff>
In traditional XML, possible tags are defined by a DTD (Document Type Definition):
<!ELEMENT Staff (Employee)*>
<!ELEMENT Employee (Name,Title,Dept,
Started,Email?,Tel,Fax?,Building,Room)>
<!ELEMENT Name (Surname, FirstName+)>
<!ELEMENT Surname (#PCDATA)>
<!ELEMENT FirstName (#PCDATA)>
<!ELEMENT Title (#PCDATA)>
<!ELEMENT Dept (#PCDATA)>
<!ELEMENT Started (#PCDATA)>
<!ELEMENT Email (#PCDATA)>
<!ELEMENT Tel (#PCDATA)>
<!ELEMENT Fax (#PCDATA)>
<!ELEMENT Building (#PCDATA)>
<!ELEMENT Room (#PCDATA)>
<!ATTLIST Tel CDATA #REQUIRED>
<!ATTLIST Fax CDATA #REQUIRED>
<!ATTLIST Building CDATA #REQUIRED>
So XML made DTDs optional (well-formedness may suffice)!
XML Schemas are like DTDs, but better:
<element name="Staff" type="Staff_type"/>
<complexType name="Staff_type">
  <sequence>
    <element name="Employee" type="Employee_type"
             minOccurs="1" maxOccurs="unbounded"/>
  </sequence>
</complexType>
<complexType name="Employee_type">
  <sequence>
    <element name="Name" type="Name_type"/>
    <element name="Title" type="string"/>
    ...
    <element name="Email" type="string"
             minOccurs="0" maxOccurs="1"/>
  </sequence>
</complexType>
<complexType name="Name_type">
  <sequence>
    <element name="Surname" type="string"/>
    <element name="FirstName" type="string"
             minOccurs="1" maxOccurs="unbounded"/>
  </sequence>
</complexType>
Types can also be defined anonymously, nested inside the element they describe:
<element name="Staff">
  <complexType>
    <sequence>
      <element name="Employee" type="Employee_type"
               minOccurs="1" maxOccurs="unbounded"/>
    </sequence>
  </complexType>
</element>
<complexType name="Employee_type">
  <sequence>
    <element name="Name">
      <complexType>
        <sequence>
          <element name="Surname" type="string"/>
          <element name="FirstName" type="string"
                   minOccurs="1" maxOccurs="unbounded"/>
        </sequence>
      </complexType>
    </element>
    <element name="Title" type="string"/>
    ...
  </sequence>
</complexType>
You can repeat type names!
If you want to share parts of a definition, you can use groups:
<group name="Name_group">
<sequence>
<element name="Surname" type="string"/>
<element name="FirstName" type="string"
minOccurs="1" maxOccurs="unbounded"/>
</sequence>
</group>
<complexType name="Employee_type">
<sequence>
<group ref="Name_group"/>
<element name="Title" type="string"/>
...
</sequence>
</complexType>
The DTD statement:
<!ELEMENT Employee (Name,Title,Dept,...
defines a fixed order. Whereas:
<complexType name="Employee_type">
  <all>
    <element name="Name">
      <complexType>
        <sequence>
          <element name="Surname" type="string"/>
          <element name="FirstName" type="string"
                   minOccurs="1" maxOccurs="unbounded"/>
        </sequence>
      </complexType>
    </element>
    <element name="Title" type="string"/>
    ...
  </all>
</complexType>
allows any order.
The content of complex types can be defined using:
sequence, all, choice
minOccurs, maxOccurs
(within all, minOccurs or maxOccurs values above 1 do not exist!)
<element name="Employee">
  <complexType>
    ...
    <attribute name="ssn" type="string"/>
  </complexType>
</element>
This allows the XML statement:
<Employee ssn="123456">...</Employee>
There are also attribute groups to simplify the specification.
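A minimal sketch of an attribute group follows. The names Employee_attributes and badge are hypothetical and not part of the original example, and the xsd: prefix is bound explicitly so that the fragment can stand alone:

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical group bundling attributes shared by several elements -->
  <xsd:attributeGroup name="Employee_attributes">
    <xsd:attribute name="ssn" type="xsd:string" use="required"/>
    <xsd:attribute name="badge" type="xsd:positiveInteger"/>
  </xsd:attributeGroup>

  <xsd:element name="Employee">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Title" type="xsd:string"/>
      </xsd:sequence>
      <!-- Attribute declarations and group references follow the content model -->
      <xsd:attributeGroup ref="Employee_attributes"/>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

Any other complex type can reference the same group, so the attribute list is written only once.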
Until now, we have copied the DTD, with some simplification. But schemas can do more.
The ability to define complex data is one of the main strengths of schemas!
Schemas have a number of built-in simple datatypes. E.g., we could have said:
<element name="Room" type="positiveInteger"/>
E.g., if the room numbers lie between 100 and 500:
<element name="Room" type="Room_Type"/>
<simpleType name="Room_Type">
  <restriction base="integer">
    <minInclusive value="100"/>
    <maxInclusive value="500"/>
  </restriction>
</simpleType>
length, minLength, maxLength for strings
minInclusive, maxInclusive, minExclusive for numbers
duration, period for time related types
pattern (regular expressions) for strings
enumeration for all
For example, the type:
<simpleType name="Dutch_ZIP_Code">
<restriction base="string">
<pattern value="\d{4} {0,1}[A-Z]{2}"/>
</restriction>
</simpleType>
defines Dutch ZIP codes of the form "1183 NW" or "1183NW".
CWI has two buildings:
<simpleType name="Building_Type">
  <restriction base="string">
    <enumeration value="Main"/>
    <enumeration value="Secondary"/>
  </restriction>
</simpleType>
I.e., the following is valid:
<Building>Main</Building>
but
<Building>Third</Building>
is not (although it is well-formed XML and valid for the DTD).
Lists of simple types can be constructed by:
<element name="Buildings" type="Building_List"/>
<simpleType name="Building_List">
  <list itemType="Building_Type"/>
</simpleType>
which allows for:
<Buildings>Main Main Secondary Main Main Secondary</Buildings>
But the following is illegal:
<Buildings>Main Other Secondary Main Main Secondary</Buildings>
Some phone numbers can be 0800-7663 or 0800-SOME:
<simpleType name="OnlyNumbers">
<restriction base="string">
<pattern value="\d{4}-\d{4}"/>
</restriction>
</simpleType>
<simpleType name="NumbersAndNames">
<restriction base="string">
<pattern value="\d{4}-[A-Z]{4}"/>
</restriction>
</simpleType>
<simpleType name="Number">
  <union memberTypes="OnlyNumbers NumbersAndNames"/>
</simpleType>
<complexType name="Address">
<sequence>
<element name="name" type="string"/>
<element name="street" type="string"/>
<element name="city" type="string"/>
</sequence>
</complexType>
<complexType name="USAddress">
<complexContent>
<extension base="Address">
<sequence>
<element name="state" type="USState"/>
<element name="zip" type="positiveInteger"/>
</sequence>
</extension>
</complexContent>
</complexType>
I.e., additional elements are added to the address (US State and Zip).
USAddress can now be used wherever an Address is expected.
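In an instance document, the substitution is usually signalled with the xsi:type attribute. The sketch below assumes a hypothetical element shipTo declared with type Address, and a USState type that accepts two-letter codes; neither is defined in the example above, and the values are placeholders:

```xml
<!-- Assumes a hypothetical declaration: <element name="shipTo" type="Address"/> -->
<shipTo xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:type="USAddress">
  <!-- Elements inherited from Address, in their original order -->
  <name>Jane Doe</name>
  <street>123 Example Street</street>
  <city>Palo Alto</city>
  <!-- Elements added by the extension; USState is assumed to allow "CA" -->
  <state>CA</state>
  <zip>94301</zip>
</shipTo>
```

Without xsi:type, a validator would check shipTo against plain Address and reject the extra state and zip elements.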
<complexType name="Name">
<sequence>
<element name="Surname" type="string"/>
<element name="FirstName" type="string"
minOccurs="1" maxOccurs="unbounded"/>
</sequence>
</complexType>
<complexType name="SimpleName">
<complexContent>
<restriction base="Name">
<sequence>
<element name="Surname" type="string"/>
<element name="FirstName" type="string"
minOccurs="1" maxOccurs="1"/>
</sequence>
</restriction>
</complexContent>
</complexType>
I.e., only one first name is allowed.
Substitution groups are a looser form of type replacement than subtypes:
<element name="Room" type="string"/>
<element name="Staff">
  ...
  <element ref="Room" minOccurs="1"/>
  ...
</element>
...
<element name="B_Room" substitutionGroup="Room">
  <simpleType>
    <restriction base="string">
      <pattern value="[A-Z][0-9]+"/>
    </restriction>
  </simpleType>
</element>
makes the following fragment legal:
<Staff>...<B_Room>M379</B_Room>...</Staff>
One can:
(this is not subtyping, the type name remains the same!)
Parts of the document may be defined elsewhere, e.g.:
<element name="htmlExample">
<complexType>
<sequence>
<any namespace="http://www.w3.org/1999/xhtml"
minOccurs="1" maxOccurs="unbounded"
processContents="skip"/>
</sequence>
<anyAttribute namespace="http://www.w3.org/1999/xhtml"/>
</complexType>
</element>
The <htmlExample> element may contain any elements from the XHTML namespace!
namespace can be set to, e.g., ##any (any well-formed XML could be used)
processContents can be set to, e.g., strict (schema validation of the content, too)
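As a sketch of these two settings combined, the hypothetical element anyExample below (the name is not in the original) accepts elements from any namespace but requires each of them to validate against its own declaration. The xsd: prefix is bound explicitly so that the fragment can stand alone:

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="anyExample">
    <xsd:complexType>
      <xsd:sequence>
        <!-- ##any: elements from any namespace (or none) are accepted.
             strict: each one must have a declaration and validate against it. -->
        <xsd:any namespace="##any" processContents="strict"
                 minOccurs="0" maxOccurs="unbounded"/>
      </xsd:sequence>
      <!-- lax: attributes are validated only when a declaration can be found -->
      <xsd:anyAttribute namespace="##any" processContents="lax"/>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

The three values of processContents (skip, lax, strict) trade openness against the amount of checking done on the foreign content.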
You can define uniqueness of certain values:
<unique name="dummy1">
  <selector xpath="regions/zip"/>
  <field xpath="@code"/>
</unique>
The code attribute must be unique for the zip elements that are children of regions.
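A uniqueness constraint is written inside the element declaration whose content it scopes. The sketch below places the constraint in a complete schema; the enclosing element country and the attribute type are assumptions made for illustration, and the xsd: prefix is bound explicitly so that the fragment can stand alone:

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- "country" is a hypothetical enclosing element -->
  <xsd:element name="country">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="regions">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="zip" maxOccurs="unbounded">
                <xsd:complexType>
                  <xsd:attribute name="code" type="xsd:string"/>
                </xsd:complexType>
              </xsd:element>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
    <!-- Identity constraints follow the type; paths are relative to country -->
    <xsd:unique name="dummy1">
      <xsd:selector xpath="regions/zip"/>
      <xsd:field xpath="@code"/>
    </xsd:unique>
  </xsd:element>
</xsd:schema>
```

Under this sketch, a country whose regions hold two zip elements with different code values is accepted, while two zip elements sharing the same code value are rejected.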