W3C NOTE-rdfarch

W3C Data Formats

W3C NOTE 29-October-1997

This document: http://www.w3.org/TR/NOTE-rdfarch

Author: Tim Berners-Lee, W3C  <timbl@w3.org> and the W3C team.


Status of This Document

This document is a response to a frequently asked question about the relationship between the various specifications for data formats. It is a simple explanation rather than a technical document. It will not be updated over time, and it carries no endorsement as an official specification itself. Please see the linked parts of the W3C web for details and for the current status of developments at any time.

Abstract

XML is increasingly being adopted as a common syntax for expressing structure in data. Now the Resource Description Framework (RDF), a layer on top of XML, provides a common basis for expressing semantics. Applications which allow programs to combine data logically will be built using RDF (and therefore XML), and this will enhance the modularity and extensibility of the Web. This is essential to its rapid future growth, multiplying together the strengths of new, independently developed applications.


Data Format Architecture

This note gives an overview of some of the W3C data format specifications, and the relationships between them.

Figure: HTML layered on SGML; P3 etc. layered on RDF, which is layered on XML.

XML replaces SGML and allows the expression of structure; RDF allows the expression of semantics.

Expressing Structure

The SGML standard gave text processing applications a common way of expressing the structure of data, even when different document types had different structures.

HyperText Markup Language (HTML) is one particular application based on SGML. Others include many document types defined for the CALS initiative, for example.

W3C's new Extensible Markup Language (XML) provides the same function as SGML in a simpler and more powerful way. Future text markup from W3C will be built on XML rather than SGML. This may even apply to future versions of HTML, depending on technical work on backward compatibility and on a transition strategy.

Metadata

Increasingly, the Web is being used for machine-understandable content.  XML's arrival is very timely for this, as it provides a common syntax for a large number of new data formats.

Many of these applications have more in common than simply being structured data. The data represents machine-understandable assertions about objects on or off the web. This "Metadata" will allow huge amounts of information in databases and existing applications to be put on the web, not just for human browsing but for machine understanding: searching, reasoning and analysing.
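As a rough illustration only, and not a format defined by this note, a machine-understandable assertion can be modelled as a subject (the object described), a property, and a value. The URIs, property names and values below are hypothetical placeholders.

```python
from typing import List, NamedTuple


class Assertion(NamedTuple):
    """One machine-understandable statement about an object on or off the web."""
    subject: str  # URI identifying the object described (hypothetical below)
    prop: str     # URI identifying the property asserted (hypothetical below)
    value: str    # a literal value or another URI


# Hypothetical identifiers, chosen only for this sketch.
REPORT = "http://example.org/report.html"       # an object on the web
BOOK = "http://example.org/catalogue/book-42"   # stands for a printed book, off the web
AUTHOR = "http://example.org/terms#author"
DATE = "http://example.org/terms#date"
TITLE = "http://example.org/terms#title"

assertions = [
    Assertion(REPORT, AUTHOR, "J. Smith"),
    Assertion(REPORT, DATE, "1997-10-29"),
    Assertion(BOOK, TITLE, "A Hypothetical Book"),
]


def values(store: List[Assertion], subject: str, prop: str) -> List[str]:
    """A simple machine query: every value asserted for `prop` of `subject`."""
    return [a.value for a in store if a.subject == subject and a.prop == prop]


print(values(assertions, REPORT, AUTHOR))  # ['J. Smith']
```

Because each assertion names its subject and property explicitly, a program can search the data directly instead of interpreting a page laid out for human readers.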

Expressing semantics

The "Resource Description Framework" (RDF) is a set of specifications which allow metadata applications to be combined and to share a common way of expressing the semantics they have in common. RDF is a further layer on top of XML.

Currently there is not only a large industry in applications that put information from legacy information systems onto the web, but also an industry in applications that surf the web and, programmed with some idea of how the web pages were automatically generated, retrieve the information and reconvert it into hard, well-defined, machine-processable data. RDF will allow this long route to be short-circuited, and allow programs to gather data directly.

Well-defined extensibility

With the benefit of experience, RDF makes a great step forward from specifications such as HTML and HTTP. These specifications have been extended significantly during their evolution to date. The process has been one of the experimental addition of new tags. The significance or importance of new tags has never been evident to software not involved in the experiment, with a resulting danger of serious incompatibility.

RDF has the goal of allowing documents to be written in a mixture of old standard vocabularies and specific new experimental or proprietary vocabularies, but with a well-defined way of knowing what is important, what can be ignored, and how old software can deduce or download an understanding of a new vocabulary. This will hopefully allow powerful combinations of applications when, for example, documents can be made which combine, in a well-defined way, concepts from banking, engineering and legal vocabularies. The power of the web as an expressive medium will become the product (rather than the sum) of the individual developments.
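A minimal sketch of the idea of mixing vocabularies, under stated assumptions: each property name is qualified by a hypothetical vocabulary URI, so terms from different vocabularies cannot collide, and combining two documents about the same resource is simply the union of their statements. The rule that an older program may skip statements from vocabularies it does not know is an assumption made for this sketch. The note does not specify how a document marks terms that must not be ignored.

```python
# Hypothetical vocabulary namespaces: a term is a namespace URI plus a local name.
BANK = "http://example.org/vocab/banking#"
LAW = "http://example.org/vocab/legal#"
EXPERIMENT = "http://example.org/vocab/experimental#"

LOAN = "http://example.org/loan/7"  # hypothetical resource both documents describe

# Each statement is (subject, property, value).
doc_a = {
    (LOAN, BANK + "amount", "10000"),
    (LOAN, BANK + "currency", "USD"),
}
doc_b = {
    (LOAN, LAW + "jurisdiction", "Massachusetts"),
    (LOAN, EXPERIMENT + "riskScore", "0.2"),  # a new, experimental term
}

# Namespace-qualified names do not clash, so combining documents is a set union.
combined = doc_a | doc_b


def understood(statements, known_namespaces):
    """Return the statements an older program understands.

    Assumption for this sketch only: statements whose property comes from an
    unknown vocabulary are optional and may be skipped.
    """
    return {
        s for s in statements
        if any(s[1].startswith(ns) for ns in known_namespaces)
    }


old_view = understood(combined, [BANK, LAW])
print(len(combined), len(old_view))  # 4 3
```

The older program still sees the banking and legal statements intact; only the experimental term it cannot interpret is left aside.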

Furthermore, by allowing experimental vocabularies to intermix with standard vocabularies without threatening the integrity of the system, RDF will give a surer and faster deployment path for new ideas, free from the need for so many standards meetings.

Transition to RDF

The plan is for W3C's existing specifications such as PICS 1.0, written before XML and RDF, to make the transition to a 2.0 version defined in terms of RDF and therefore XML. This will make it possible to mix PICS labels with information about privacy (for example) from the P3P project, which will use RDF. Many new applications will, like P3P, be directly built on RDF. The Consortium encourages existing applications to make the transition to RDF in order to take advantage of this extensibility and of the power of combination with other applications.

We hope this overview clarifies the relationships between these specifications. For more information, please see the linked information on the W3C web site, http://www.w3.org.

The W3C team, October 1997


References

Links in the text, for those reading this on paper, are to:

HTML http://www.w3.org/MarkUp
Metadata http://www.w3.org/Metadata
P3P http://www.w3.org/P3
PICS http://www.w3.org/PICS
RDF http://www.w3.org/Metadata/RDF
SGML http://www.w3.org/MarkUp/SGML
XML http://www.w3.org/XML
W3C http://www.w3.org/