SWAD-Europe Deliverable 5.1: Schema Technology Survey

Project name:
Semantic Web Advanced Development for Europe (SWAD-Europe)
Project Number:
Workpackage name:
5. Integration with XML Technology
Workpackage description:
Deliverable title:
SWAD-Europe: Schema Technology Survey
Stephen Buswell, Dan Brickley, Brian Matthews
This report surveys the state of schema annotation and mapping technology. It takes a practical approach by targeting the work to the needs of developers, providing background to support our attempts to answer frequently asked questions on this subject. The report first reviews previous work on 'bridging languages', giving an overview of the major approaches and uses that to motivate further technical work to progress the state of the art in this area.

Snapshot release for discussion and editorial work. Further revisions are planned during WP4.

Comments on this document are welcome and should be sent to the public-esw@w3.org list. An archive of this list is available at http://lists.w3.org/Archives/Public/public-esw/

This report is part of SWAD-Europe Work package 5: Integration with XML Technology and addresses the topic of Schema annotation, and the relationship(s) between RDF and XML technologies. The variety of so-called 'schema languages' for the Web has caused some confusion. This document attempts to place them in context, and explore the state of the art in tools for mapping data between the different approaches.

To do this, we need to draw on a variety of examples. The following diagram depicts a very simple RDF Schema, as well as some instance data that uses it. The example was originally created for the RDF Schema specification, and is used here as a basis for explaining the RDF 'world view', and contrasting that with the perspective implicit in XML, XML DTDs and XML-based schema languages.

RDF Schema example 1:

[Figure: RDF schema example 1]

"Bridging Languages", An Introduction

XML encodings of information models fall largely into two groups:

This paper discusses ways in which some meaning can be attached to, or inferred from, XML structures. It then looks at two languages, Schema Adjunct Framework (SAF) and Meaning Definition Language (MDL), which build on this approach.

NB: Part of this analysis is based on ideas found in Robert Worden's paper on MDL [MDL].

A Simple Model of Meanings

Broadly speaking, when we describe our universe of discourse, we want to make statements of three types:

These three concepts can be found in many modelling paradigms from UML to Entity-Relationship diagrams. In RDF Schema the single concept 'property' covers both 'Attribute/Value' and 'Relationship'.
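
As a minimal sketch of this point, the Python fragment below writes an Attribute/Value statement and a Relationship statement in the same (subject, property, value) shape. The identifiers (ex:name, ex:sells and so on) and the value_is_resource flag are assumptions made for illustration; they are not taken from any schema discussed in this report.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """One (subject, property, value) statement in the RDF style."""
    subject: str
    prop: str
    value: str
    value_is_resource: bool  # True when the value names another object


def classify(statement: Statement) -> str:
    """Say which of the two modelling notions a property statement plays."""
    return "Relationship" if statement.value_is_resource else "Attribute/Value"


# Hypothetical identifiers, for illustration only.
statements = [
    Statement("ex:ChateauVerpriced", "ex:name", "Chateau Verpriced", False),
    Statement("ex:BristolBottlers", "ex:sells", "ex:ViellesBottes", True),
]

kinds = [classify(s) for s in statements]
for s, kind in zip(statements, kinds):
    print(f"{s.subject} {s.prop} {s.value!r}: {kind}")
```

Both statements use the same structure; only the nature of the value (a literal or another object) separates the two notions.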

The implied semantics of XML structures

'Structural' XML (DTD, XSD and the like) does not explicitly encode the information in the manner discussed above. However, inspecting XML instances, we can see some patterns.

Objects and Instances

In general, objects are represented by XML elements:
<winegrower name="Chateau Verpriced">

</winegrower>
implies the existence of a winegrower object. This could also be represented as, for example:
<organisation orgtype="winegrower" name="Chateau Verpriced">

</organisation>
where the orgtype attribute is used to select a subclass of a more generic superclass, or even:
<organisation name="Chateau Verpriced">
        <orgtype>winegrower</orgtype>

</organisation>
where the orgtype subelement is used to select a subclass of a more generic superclass. Note that not every XML element necessarily corresponds to an object. In the following, 'name' corresponds to an Attribute/Value:
<winegrower>
        <name>Chateau Verpriced</name>

</winegrower>

Note that one cannot distinguish between this and the preceding example relying purely on the syntax of the source document - some higher-level interpretation is always required. A further complication is that there is not necessarily a simple relation between elements and objects: this may be context-dependent. Here the type of object represented by 'organisation' is modified by the context:
<winegrowers>
    <organisation name="Chateau Verpriced" />
    <organisation name="Chateau Verdrawn" />
</winegrowers>
<winemerchants>
    <organisation name="Cheap+Cheerful" />
    <organisation name="Rough+Ready" />
</winemerchants>
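
The interpretation rules illustrated by these examples can be sketched in Python as follows. This is an illustration under stated assumptions, not a general algorithm: the rule order, the CONTEXT_TYPES table and the 'winegrowers' wrapper element name are hypothetical, chosen only to mirror the samples above.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from an enclosing element to the type of its children.
CONTEXT_TYPES = {"winegrowers": "winegrower", "winemerchants": "winemerchant"}


def object_type(element, parent=None):
    """Guess which type of object an element represents (illustrative rules)."""
    if element.tag != "organisation":
        return element.tag                      # the element name is the type
    if element.get("orgtype"):
        return element.get("orgtype")           # an attribute selects the subclass
    child = element.find("orgtype")
    if child is not None and child.text:
        return child.text.strip()               # a subelement selects the subclass
    if parent is not None and parent.tag in CONTEXT_TYPES:
        return CONTEXT_TYPES[parent.tag]        # the context selects the subclass
    return element.tag


samples = [
    '<winegrower name="Chateau Verpriced"/>',
    '<organisation orgtype="winegrower" name="Chateau Verpriced"/>',
    '<organisation name="Chateau Verpriced"><orgtype>winegrower</orgtype></organisation>',
]
direct = [object_type(ET.fromstring(s)) for s in samples]

merchants = ET.fromstring(
    '<winemerchants><organisation name="Cheap+Cheerful"/>'
    '<organisation name="Rough+Ready"/></winemerchants>'
)
by_context = [object_type(org, parent=merchants) for org in merchants]
```

All three direct encodings yield the same type, and the same 'organisation' element yields a different type once its parent is taken into account. None of this can be read off the syntax alone; the rules have to be supplied from outside the document.
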
So in general we can say something like "An element with name E represents an object of type T ", where this may be further qualified by

Attribute Values

In the majority of cases, object attribute values are represented by the contents of XML attributes or subelements. The semantics of the following are indistinguishable:

<name>Vielles Bottes</name>

<wine name="Vielles Bottes" />

There may be a level of conditionality, for example in the generic uncommitted-schema style:

    <wine-prop prop-name="name" prop-value="Vielles Bottes" />
    <wine-prop prop-name="colour" prop-value="noir" />

Here the meaning of the 'prop-value' attribute depends on the contents of 'prop-name' attribute.
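
A short Python sketch shows how a consumer might recover ordinary attribute/value pairs from this generic style. The enclosing <wine> element is an assumption added so that the fragment is a well-formed document.

```python
import xml.etree.ElementTree as ET

# The <wine> wrapper is assumed; the wine-prop elements follow the text above.
WINE_XML = """
<wine>
    <wine-prop prop-name="name" prop-value="Vielles Bottes"/>
    <wine-prop prop-name="colour" prop-value="noir"/>
</wine>
"""


def read_properties(wine):
    """Use each prop-name to decide what the paired prop-value means."""
    return {p.get("prop-name"): p.get("prop-value") for p in wine.findall("wine-prop")}


props = read_properties(ET.fromstring(WINE_XML))
print(props)
```

The element and attribute names carry no domain meaning here; the property names only appear as data, so a schema for this structure cannot constrain them individually.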


Relationships

Relationships can be represented in XML structures in various ways. The simplest of these is nesting:

<winemerchant name="Bristol Bottlers">
        <wine>
                <name>Vielles Bottes</name>
        </wine>
        <wine>
                <name>Weston's Finest</name>
        </wine>
</winemerchant>

Here the nesting establishes a relationship between the "Bristol Bottlers" winemerchant and the wines they sell. Note that there is no fixed semantics to the parent-child element relationship - we could as easily list the winemerchants by wine, as in:

<wine>
        <name>Vielles Bottes</name>
        <winemerchant name="Bristol Bottlers" />
        <winemerchant name="Bath Brewers" />
</wine>
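
The following Python sketch extracts the same 'sells' relationship from both nestings. The enclosing <wine> elements and the name of the relationship are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Merchant-centred nesting (the <wine> wrappers are assumed).
BY_MERCHANT = """<winemerchant name="Bristol Bottlers">
    <wine><name>Vielles Bottes</name></wine>
    <wine><name>Weston's Finest</name></wine>
</winemerchant>"""

# Wine-centred nesting of the same kind of information.
BY_WINE = """<wine>
    <name>Vielles Bottes</name>
    <winemerchant name="Bristol Bottlers"/>
    <winemerchant name="Bath Brewers"/>
</wine>"""


def sells(root):
    """Return (merchant, wine) pairs, whichever way round the nesting is."""
    if root.tag == "winemerchant":
        return {(root.get("name"), w.findtext("name")) for w in root.findall("wine")}
    if root.tag == "wine":
        wine = root.findtext("name")
        return {(m.get("name"), wine) for m in root.findall("winemerchant")}
    raise ValueError(f"unexpected root element: {root.tag}")


pairs_by_merchant = sells(ET.fromstring(BY_MERCHANT))
pairs_by_wine = sells(ET.fromstring(BY_WINE))
```

The direction of the relationship is fixed by the interpreting code, not by the parent-child structure, which is why both documents can contribute pairs of the same form.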

Section to write on relationships implied by shared values (strings, ID-IDREF, URI, ..)

The Schema Adjunct Framework

The Schema Adjunct Framework (SAF) [SAF] tries to extend the structural model of a document given by the schema with additional information about the 'meaning' of pieces of information within the instance. In the SAF, this meaning is specified by adding information about the processing which should be applied when a particular item is received. One could see this as a definition of meaning in terms of the operational semantics of the target system.

Such information includes mappings to relational databases, indexing parameters for native XML databases, business rules for additional validation, internationalization and localization parameters, or parameters used for presentation and input forms. Some of this information is used for domain-specific validation, some to provide information for domain-specific processing.

Information items are selected by means of XPath expressions; the processing information is given by reference to an external schema which represents the processing functionality of the target system.

SAF Examples (from the SAF draft)

<schema-adjunct target="http://www.example.com/pat-admit.xsd"
        xmlns:sql="http://www.example.com/sql-map.xsd" ...>


        <element context='admission'> 
        <element context='age'> 
        <attribute context='admission/@id'> 

Here we are specifying storage/retrieval rules for the information in a relational database. Note that we give rules at all levels: document, element, attribute. The 'context' attribute selects the instance data by means of an XPath expression.
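
The selection step can be sketched in Python as below. This is not the SAF processing model: the instance document, the rule notes and the select helper are all hypothetical, and only single-step contexts of the form 'name' or 'name/@attr' are handled, because the standard library's ElementTree supports only a subset of XPath. The three context values are those of the example above.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document for the pat-admit schema.
INSTANCE = '<admission id="A17"><patient>J. Smith</patient><age>42</age></admission>'

# (kind, context, note): contexts come from the example; notes are invented.
RULES = [
    ("element", "admission", "store as a table row"),
    ("element", "age", "store as an integer column"),
    ("attribute", "admission/@id", "use as the row key"),
]


def select(root, kind, context):
    """Find the instance items that a rule's context applies to."""
    if kind == "attribute":
        owner, attr = context.split("/@")
        return [e.get(attr) for e in root.iter(owner) if attr in e.attrib]
    return [e.tag for e in root.iter(context)]


root = ET.fromstring(INSTANCE)
plan = [(context, note, select(root, kind, context)) for kind, context, note in RULES]
for context, note, items in plan:
    print(f"{context}: {note} -> {items}")
```

A real adjunct processor would evaluate full XPath expressions and hand the selected items to the target system; the sketch only shows how rules held outside the schema are matched to instance data.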

Meaning Definition Language

Schema languages such as XSD Schema and RELAX NG are concerned with the structure of XML documents. UML, DAML+OIL and RDF Schema are concerned with meaning. Meaning Definition Language (MDL) is a bridge between structure and meaning. Using MDL, an XML language designer can express how the structure of an XML document conveys its meaning.

Meaning Definition Language (MDL) [MDL][XMD] is a SAF implementation which extends the ideas behind SAF by specifying the processing rules for the schema instance information in terms of a formal representation of the knowledge by means of a UML class model or XSD Schema.

MDL provides:

Literature Overview

This literature review is intended to provide a useful central starting point for locating resources on the Web relevant to the workpackage. It was developed by taking a base set of resources and following the links, verifying links lead to the correct resource and removing those links which lead to subject matter too far removed from the subject of the workpackage (a necessarily subjective decision). The process is then repeated with the next set of links. This list makes no claim to be inclusive.

Resources have been classified under 6 top-level categories as below. Necessarily, many resources discuss issues related to more than one category; a more detailed analysis is given by the associated cross-reference table Resource Cross Reference:

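| Subject Area | XML Core Technology | Schemas | DTD | XSD | XDR | SOX | Schematron | DSD | RELAX NG | Bridging | SAF | MDL | Interop | KN. Rep | RDF | DAML+OIL | OWL | Topic Maps | Tools & Methods |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Extensible Markup Language (XML) 1.0 | x |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |
| XML Schema Part 0: Primer |   | x |   | x |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |
| XML Schema Part 1: Structures |   | x |   | x |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |
| XML Schema Part 2: Datatypes |   | x |   | x |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |
| XML-Namespaces Namespaces in XML | x |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |
| XML Information Set | x |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |
| XML Transformations (XSLT) Version 1.0 | x |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |
| XML Path Language (XPath) Version 1.0 | x |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |
| Document Definition Markup Language (DDML) Specification - Version 1.0 |   | x |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |
| Document Content Description for XML |   | x |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |
| XML-Data |   | x |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |
| XML-Data Reduced (XDR) |   | x |   |   | x |   |   |   |   |   |   |   |   |   |   |   |   |   |   |
| Document Structure Description 1.0 |   | x |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |
| RELAX |   | x |   |   |   |   |   |   | x |   |   |   |   |   |   |   |   |   |   |
| RELAX NG |   | x |   |   |   |   |   |   | x |   |   |   |   |   |   |   |   |   |   |
| Schema for Object-Oriented XML 2.0 |   | x |   |   |   | x |   |   |   |   |   |   |   |   |   |   |   |   |   |
| RDF Model and Syntax |   |   |   |   |   |   |   |   |   |   |   |   |   | x | x |   |   |   |   |
| RDF Schemas |   |   |   |   |   |   |   |   |   |   |   |   |   | x | x |   |   |   |   |
| An RDF Schema for the XML Information Set |   | x |   |   |   |   |   |   |   |   |   |   | x |   | x |   |   |   |   |
| Schema Adjunct Framework |   |   |   |   |   |   |   |   |   | x | x |   | x |   |   |   |   |   |   |
| Meaning Definition Language |   |   |   |   |   |   |   |   |   | x | x |   | x |   |   |   |   |   |   |
| Schematron |   | x |   |   |   |   | x |   |   |   |   |   |   |   |   |   |   |   |   |
| ISO/IEC 13250:2000 Topic Maps |   |   |   |   |   |   |   |   |   |   |   |   |   | x |   |   |   | x |   |
| XML Topic Maps (XTM) 1.0 |   |   |   |   |   |   |   |   |   |   |   |   |   | x |   |   |   | x |   |
| Comparative Analysis of 6 XML schema languages |   | x | x | x | x | x | x | x |   |   |   |   |   |   |   |   |   |   |   |
| Describing Your Data: DTDs and XML Schemas |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |
| Schematron Tutorial |   |   |   |   |   |   | x |   |   |   |   |   |   |   |   |   |   |   |   |
| Validating XML with schematron |   |   |   |   |   |   | x |   |   |   |   |   |   |   |   |   |   |   |   |
| Schemachine |   |   |   |   |   |   | x |   |   |   |   |   |   |   |   |   |   |   |   |
| XML Pipeline Definition Language Version 1.0 | x |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |
| The 'Cambridge Communique' |   |   |   |   |   |   |   |   |   |   |   |   | x |   | x |   |   |   |   |
| Markup Languages: Comparison and Examples - XML, RDF, DAML |   |   |   |   |   |   |   |   |   |   |   |   | x |   | x | x |   |   |   |
| Dublin Core in RDF/XML |   |   |   |   |   |   |   |   |   |   |   |   | x |   | x |   |   |   |   |
| DTD's for the Dublin Core Element Set | x |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |
| Harvesting RDF from XLINKs |   |   |   |   |   |   |   |   |   |   |   |   | x |   | x |   |   |   |   |
| Language Comparisons - XML, RDF, DAML+OIL, OWL |   |   |   |   |   |   |   |   |   |   |   |   | x | x | x | x | x |   |   |
| XML Processing: position paper |   |   |   |   |   |   |   |   |   |   |   |   | x |   | x |   |   |   |   |
| Bridging the Gap between RDF and XML |   |   |   |   |   |   |   |   |   |   |   |   | x |   | x |   |   |   |   |
| Connecting XML, RDF and Web Technologies for Representing Knowledge on the Semantic Web | x |   |   | x |   |   |   |   | x |   |   |   | x |   | x |   |   |   |   |
| Why RDF model is different from the XML model |   |   |   |   |   |   |   |   |   |   |   |   | x |   | x |   |   |   |   |
| The Yin Yang Web: XML Syntax and RDF Semantics |   |   |   |   |   |   |   |   |   |   |   |   | x | x | x |   |   |   |   |
| Comparison of ontology languages |   |   |   |   |   |   |   |   |   |   |   |   | x | x | x | x |   |   |   |
| Topic maps, RDF, DAML, OIL |   |   |   |   |   |   |   |   |   |   |   |   | x | x | x | x |   |   |   |
| A Topic Map Data Model: An Infoset-based Proposal |   |   |   |   |   |   |   |   |   |   |   |   |   | x |   |   |   | x |   |
| tolog: A topic map query language |   |   |   |   |   |   |   |   |   |   |   |   |   | x |   |   |   | x |   |
| On the integration of Topic Map data and RDF data |   |   |   |   |   |   |   |   |   |   |   |   | x |   | x |   |   | x |   |
| RDF and TopicMaps: An Exercise in Convergence |   |   |   |   |   |   |   |   |   |   |   |   | x |   | x |   |   | x |   |
| XML Topic Maps through RDF glasses |   |   |   |   |   |   |   |   |   |   |   |   | x |   | x |   |   | x |   |
| Ontology Development 101: A Guide to Creating Your First Ontology |   |   |   |   |   |   |   |   |   |   |   |   |   | x |   |   |   |   | x |
| Ontology Editing Tools |   |   |   |   |   |   |   |   |   |   |   |   |   | x |   |   |   |   | x |
| Report on ontology tools |   |   |   |   |   |   |   |   |   |   |   |   |   | x |   |   |   |   | x |
| Evaluation of Ontology-based Tools |   |   |   |   |   |   |   |   |   |   |   |   |   | x |   |   |   |   | x |
| Common European Research Information Format (CERIF) |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   | x |
| Architectural Principles of the World Wide Web | x |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   | x |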

Topics Covered

Language Specifications

Extensible Markup Language (XML) 1.0


XML Schema Part 0: Primer


XML Schema Part 1: Structures


XML Schema Part 2: Datatypes


XML-Namespaces Namespaces in XML


XML namespaces provide a simple method for qualifying element and attribute names used in Extensible Markup Language documents by associating them with namespaces identified by URI references.

XML Information Set


This specification provides a set of definitions for use in other specifications that need to refer to the information in an XML document.

XML Transformations (XSLT) Version 1.0


XML Path Language (XPath) Version 1.0


XPath is a language for addressing parts of an XML document, designed to be used by both XSLT and XPointer.

Document Definition Markup Language (DDML) Specification, Version 1.0


Document Content Description for XML




XML-Data


This paper describes an XML vocabulary for schemas, that is, for defining and documenting object classes. It can be used for classes which are strictly syntactic (for example, XML) or those which indicate concepts and relations among concepts (as used in relational databases, KR graphs and RDF). The former are called "syntactic schemas;" the latter "conceptual schemas."

XML-Data Reduced (XDR)


Document Structure Description 1.0


See also "DSD: A Schema Language for XML", N. Klarlund, A. Møller, M. I. Schwartzbach, Proc. 3rd ACM Workshop on Formal Methods in Software Practice, 2000.

RELAX (REgular LAnguage description for XML)




RELAX NG


RELAX NG, the next generation schema language for XML: clean, simple and powerful.

Schema for Object-Oriented XML 2.0


RDFMS Resource Description Framework (RDF) Model and Syntax


RDF Schema Resource Description Framework (RDF) Schemas


An RDF Schema for the XML Information Set


This W3C Note defines an RDF schema for the XML Infoset.

Schema Adjunct Framework


Meaning Definition Language




Schematron


An XML Structure Validation Language using Patterns in Trees.

ISO/IEC 13250:2000 Topic Maps


XML Topic Maps (XTM) 1.0


This specification provides a model and grammar for representing the structure of information resources used to define topics, and the associations (relationships) between topics. Names, resources, and relationships are said to be characteristics of abstract subjects, which are called topics. Topics have their characteristics within scopes: i.e. the limited contexts within which the names and resources are regarded as their name, resource, and relationship characteristics. One or more interrelated documents employing this grammar is called a "topic map."

Schema Language Reviews, Validation Issues

Comparative Analysis of 6 XML schema languages


As XML is emerging as the data format of the internet era, there is a substantial increase in the amount of data in XML format. To better describe such XML data structures and constraints, several XML schema languages have been proposed. This paper presents a comparative analysis of six noteworthy XML schema languages: XML DTD; XML Schema; XDR; SOX; Schematron; DSD.

Describing Your Data: DTDs and XML Schemas


Document Type Definitions and XML Schemas both provide descriptions of document structures. The emphasis is on making those descriptions readable to automated processors such as parsers, editors, and other XML-based tools. They may also carry information for human consumption, describing what different elements should contain, how they should be used, and what interactions may take place between parts of a document.

Schematron Tutorial

http://www.zvon.org/HTMLonly/SchematronTutorial/General/contents.html

The Schematron is a simple and powerful Structural Schema Language.

Validating XML with schematron


Schematron is an XML schema language, and it can be used to validate XML. (Requires familiarity with XML 1.0, DTDs, XSLT, and XPath.)



Schemachine


This note proposes a possible framework for supporting modular XML validation.

XML Pipeline Definition Language Version 1.0


This Note describes the features and syntax for XML Pipeline Definition Language. Pipeline is an XML vocabulary for describing the processing relationships between XML resources. A pipeline document specifies the inputs and outputs to XML processes and a pipeline controller uses this document to figure out the chain of processing that must be executed in order to get a particular result.

Semantic-Syntactic Relationship, Differences, Interoperability

The 'Cambridge Communique'


This discusses the architectural relationship between the schema work being undertaken within XML and RDF activities.

Markup Languages: Comparison and Examples - XML, RDF, DAML


Contains a comparison table showing the tradeoffs and differences among these markup languages.

Dublin Core in RDF/XML


The Dublin Core Metadata Element Set V1.1 (DCMES) can be represented in many syntax formats. This document explains how to encode the DCMES in RDF/XML, provides a DTD to validate the documents and describes a method to link them from web pages.

DTD's for the Dublin Core Element Set


Harvesting RDF from XLINKs


Both XLink and RDF provide a way of asserting relations between resources. RDF is primarily for describing resources and their relations, while XLink is primarily for specifying and traversing hyperlinks. However, the overlap between the two is sufficient that a mapping from XLink links to statements in an RDF model can be defined. Such a mapping allows XLink elements to be harvested as a source of RDF statements. XLink links (hereafter, "links") thus provide an alternate syntax for RDF information that may be useful in some situations.

This Note specifies such a mapping, so that links can be harvested and RDF statements generated. The purpose of this harvesting is to create RDF models that, in some sense, represent the intent of the XML document.

Language Comparisons - XML, RDF, DAML+OIL, OWL


Contains a table summarizing the differentiating language features available in XML, RDF, DAML+OIL, and OWL.

XML Processing: position paper


This paper outlines how RDF fits into the XML family of specifications; how RDF software components and vocabularies might relate to the XML processing environment. XML Documents represent the XML Infoset; RDF graphs represent what those Infosets are trying to tell us about objects, their inter-relationships and properties.

Bridging the Gap between RDF and XML


The convoluted syntax of the RDF 1.0 specification is a major obstacle for the broad acceptance of RDF. The goal of this proposal is to allow every "legacy" XML document to have an RDF model. The advantages of this approach include:

  1. The semantics of XML documents can be made explicit. Both structural and semantic markup can coexist in the same document.
  2. RDF can be used to annotate existing XML documents.
  3. "RDF-enabled" XML can still be rendered and transformed using XSLT.
  4. Using small changes in XML DTDs, meaningful RDF documents can be produced from original XML documents. But every XML document (even those without DTDs) has a default RDF interpretation.

Connecting XML, RDF and Web Technologies for Representing Knowledge on the Semantic Web


In order to represent knowledge for it to be usable web-wide and in interoperable ways, it should be done using well-known and appropriate web technologies. These include XML and RDF. RDF can be used with many XML technologies such as XML Namespaces, XML Schema, RELAX NG, XSLT and is related to many more.

This paper describes how these technologies are best used together, their relationships and where each of them can be appropriately applied.

Why RDF model is different from the XML model


This note is an attempt to answer the question, "Why should I use RDF - why not just XML?". This note assumes the XML data model in all its complexity, and the RDF syntax as in RDF Model and Syntax, in all its complexity. It doesn't try to map one directly onto the other -- it expresses the RDF model using XML.

The Yin Yang Web: XML Syntax and RDF Semantics


XML is the W3C standard document format for writing and exchanging information on the Web. RDF is the W3C standard model for describing the semantics and reasoning about information on the Web. RDF and XML are based on two different paradigms.

This paper develops a model-theoretic semantics for the XML XQuery 1.0 and XPath 2.0 Data Model, which provides a unified model for both XML and RDF. This unified model can serve as the basis for Web applications that deal with both data and semantics. The paper shows how the RDF world can take advantage of XML query languages, and how the XML world can take advantage of the reasoning capabilities available for RDF.

Knowledge Representation Language Reviews

Comparison of ontology languages


Ontology overview from Motorola Labs, with a comparison of ontology languages.

Topic maps, RDF, DAML, OIL


This paper provides quick introductions to each of the technologies, highlighting the similarities and differences between them. These technologies all come from very different backgrounds, and tend to be presented in very different ways, and yet on closer examination their anatomies turn out to be surprisingly similar.

The starting point of the paper is the observation that all these technologies provide a restricted set of mechanisms for making statements about the Universe of Discourse. The introductions to the technologies are done by describing how concepts such as 'thing', 'relationships between things', 'properties of things', 'types of things', 'names of things', 'kinds of things' and so on are represented within each technology.

After comparing the technologies it is shown how each of these technologies relate to one another and to what extent they build on each other or compete with each other. More importantly, the paper also shows how data can be moved from one representation to another, and how tools implementing the various technologies can be made to work together.

A Topic Map Data Model: An Infoset-based Proposal


This document defines an abstract model for topic maps which makes explicit the implicit data models of ISO 13250 and XTM 1.0. It also defines a processing model for XTM 1.0 based on the data model.

tolog: A topic map query language


This paper describes a query language for topic maps.

On the integration of Topic Map data and RDF data


RDF and TopicMaps: An Exercise in Convergence


XML Topic Maps through RDF glasses


The information represented in a topic map, expressed in one of the XML interchange syntaxes for topic maps, can, at some level of detail, be translated into information that is expressed in one of the XML interchange syntaxes for RDF information. The translated information can then be used in the context of RDF applications that would otherwise not be able to use it.

Bridging Language Reviews

Meaning Definition Language


Schema languages such as XSD Schema and RELAX NG are concerned with the structure of XML documents. UML, DAML+OIL and RDF Schema are concerned with meaning. Meaning Definition Language (MDL) is a bridge between structure and meaning. Using MDL, an XML language designer can express how the structure of an XML document conveys its meaning.

Schema Adjunct Framework


Schema adjuncts are a mechanism for extending XML schema languages, and for providing information from such extensions to programs that process XML instances. To process XML instances for a given schema, many environments need additional information which is typically not available in the schema itself. Such information includes mappings to relational databases, indexing parameters for native XML databases, business rules for additional validation, internationalization and localization parameters, or parameters used for presentation and input forms.

The Schema Adjunct Framework is an XML-based language used to associate domain-specific data with schemas and their instances, effectively extending the power of existing XML schema languages such as DTDs or XML Schema.

Ontologies, Methodologies, Tools, Architecture

Ontology Development 101: A Guide to Creating Your First Ontology


Ontology Editing Tools


This survey covers software tools that have ontology editing capabilities and are in use today. The tools may be useful for building ontology schemas (terminological component) alone or together with instance data. Ontology browsers without an editing focus and other types of ontology building tools are not included. Otherwise, the objective was to identify as broad a cross-section of editing software as possible. The editing tools are not necessarily production level development tools, and some may offer only limited functionality and user support.

Report on ontology tools


OntoWeb report on a comparative study of 11 ontology editors plus several other ontology tools.

Evaluation of Ontology-based Tools


Common European Research Information Format (CERIF)


The CERIF-SW project brings together the expertise of research institutions in developing distributed information systems over heterogeneous data sources, harmonizing databases, providing access to research data, developing Knowledge Management and Semantic Web solutions, and creating and using metadata. Its aims are to develop a set of tools for building the next generation of research information systems, and to use new technologies to integrate a set of European Research Information Systems for advanced research information retrieval.

Architectural Principles of the World Wide Web


The World Wide Web is a networked information system. Web Architecture is the set of principles that all agents in the system follow to create the large-scale effect of a shared information space. Identification, data formats, and protocols are the main technical components of Web Architecture, but the large-scale effect depends on social behavior as well.

This document strives to establish a reference set of principles for Web architecture.

RSS 1.0+ Sample Scenario (draft)

Some in the RSS community would prefer to use a non-RDF serialization for RSS feeds, to avoid the perceived syntactic burden of using RDF (e.g. serialization rules). Others have been creating extension vocabularies for use in RSS feeds, to augment the basic structure of a feed with additional information about the channel, the documents mentioned in the channel, or the things described by those documents. The goal here will be to explore the applicability of schema annotation to this problem: is it possible to deploy mixed-namespace RSS using annotated XML schemas, instead of RDF's XML syntax? If so, what does this mean for the practicalities of defining extension vocabularies for use in RSS? For example, are those vocabularies (e.g. Dublin Core, Creative Commons) also re-usable in non-RSS RDF documents?

reagleMIT thinks a response to DanC is scaling the SW: most of the RSS people won't care and won't do the work, but the SW folks need to spread some pixie-dust and suck in/import whole realms of XML apps into the SW for them (and part of this importation is probably "this chunk is ambiguous").

WSDL Sample Scenario

See EricP's work on this, and other...

P3P: Brian's mapping

Apple plist files

PropertyList-1.0.dtd , and XSLT to convert it to RDF. Also sample data.

RDF calendar tests

See RDF Calendar workspace, esp test file collection: xcal to rdf scenario.


[MDL] A Meaning Definition Language, R.Worden, Charteris 2001 http://www.charteris.com/mdl/

[SAF] Schema Adjunct Framework http://www.tibco.com/solutions/products/extensibility/resources/saf_dec2000.htm

[XMD] Professional XML Meta Data, Ahmed et al, Wrox. ISBN 1-861004-51-6 (MDL - ch 8)

Resource Description Framework (RDF) Model and Syntax Specification, O. Lassila and R. Swick, Editors. World Wide Web Consortium. 22 February 1999. This version is http://www.w3.org/TR/1999/REC-rdf-syntax-19990222. The latest version of RDF M&S is available at http://www.w3.org/TR/REC-rdf-syntax.
RDF Vocabulary Description Language 1.0: RDF Schema, D. Brickley, R.V. Guha, Editors. World Wide Web Consortium. W3C Working Draft, work in progress, 30 April 2002. This version is http://www.w3.org/TR/2002/WD-rdf-schema-20020430/. The latest version is available at http://www.w3.org/TR/rdf-schema/.
Storing RDF in a relational database, Sergey Melnik, Stanford University, 2000-2001
The Syntactic Web - Syntax and Semantics on the Web, Jonathan Robie, Software AG, USA in proceedings XML Conference, December 9-14 2001, Orlando Florida, USA.
RDF Site Summary (RSS) 1.0, RSS-DEV Working Group, 2001-05-30

Further References


This section tracks links, content and ideas that should be integrated into the document. For now I'm just hoarding links I don't want to miss in the final report. (Mail me if I've missed anything obvious from this survey.)

B References - Tools and Projects