Notes on RDF and XLink as linked information systems

Initial version: 2000-02-02
Current version: 2000-09-08
Author: Dan Brickley <danbri@w3.org>

Status:
These are informal notes concerning the kind of application that might be facilitated by interoperability between XLink and the Resource Description Framework (RDF). They represent only the author's view of these issues, and are made available as input to any RDF Interest Group discussions on this topic.

Recent changes:

Removed some material on the details of XLink. Other documents do a better job in that area.

Overview

The purpose of this document is to provide a broad overview of some areas in which HTML, XHTML, XLink and RDF linking systems might need to interoperate. It does not specify an RDF to XLink mapping. For recent work in this area see the discussion documents section of the RDF Interest Group home page.

This document explores the representation of typed links using examples such as those enumerated in the HTML specifications. While simple HTML links are semantically uninformative (ie. all we're told is that these two things are connected in some way), the XML Linking Language raises the prospect of a Web built from more informative relationships. By mining the Web for XLink-described relationships it should be possible to build RDF models that aggregate linking information from multiple sources. To achieve this we need to map from XLink constructs to the RDF data model. This document serves to motivate (rather than specify) such a mapping, by articulating a scenario in which various kinds of linking information might usefully be aggregated.

Context: RDF, XML Linking, HTML

Resource Description Framework (RDF)

RDF provides a syntax-neutral data model based around the notion of URI-named links between URI-named resources. The RDF approach uses this common model to represent attributes of, and relationships between, Web-identifiable resources. RDF uses the term 'property' in both cases to describe these named relationships or links. RDF properties can be used to describe anything name-able with a URI (this includes, but is not limited to, XML documents).

XML Linking Language (XLink)

The XML linking work is intended to provide support for simple and complex inter-document linking constructs within and between XML documents and other Web resources (such as images). The XML Linking Language (XLink) allows elements to be inserted into XML documents in order to create and describe links between resources.

Hypertext Markup Language (HTML)

The HTML 4.01 specification defines a widely used document format for the Web. HTML 4.01 is not expressed in XML; the reformulation of HTML4 in XML 1.0 is known as XHTML. The XHTML 1.0 specification is a fairly literal reflection of HTML into an XML representation. Both flavours of HTML provide (at least) two mechanisms for representing linking information.

The A element: The 'A HREF=' construct in HTML provides the basic mechanism that links most Web pages today. As currently deployed, these links are almost invariably untyped, in that we know nothing of the nature of the link other than that the two resources are connected in some way. Despite this, the HTML 4.0x specifications define 'rel' and 'rev' attributes that allow a space-separated list of (pre-arranged) link types to be associated with individual hyperlinks, ie. types that describe the relationship from the current document to the anchor specified by the href attribute ('rel'), or the reverse relationship from that anchor to the current document ('rev').
The LINK element: The header section of an HTML document can also contain assertions about the (typed) links that hold between the current document and other Web resources. These are 'document relationships' in that they do not have a specific origin point within a document. This mechanism can be used for style sheets, simple metadata and sitemap-style information (eg. links to 'next page in sequence', 'contents page', 'copyright' etc.). A basic set of typed links is defined in the HTML specifications (4.0 section 6.12, 4.01 section 6.12).

The HTML link types defined in section 6.12 of the HTML 4.0x specifications include: alternate, stylesheet, start, next, prev, contents, index, glossary, copyright, chapter, section, subsection, appendix, help, bookmark.
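
As a rough illustration of how the 'rel' and 'rev' information carried by A and LINK elements might be harvested, here is a minimal Python sketch. It is not part of any specification: the class and function names, the sample markup and the example.org base URI are all invented for this note, and a real harvester would also need to deal with BASE elements, character encodings and malformed markup.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class TypedLinkParser(HTMLParser):
    """Collect (link_type, source, target) triples from A and LINK elements."""

    def __init__(self, base_uri):
        super().__init__()
        self.base_uri = base_uri
        self.triples = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            return
        target = urljoin(self.base_uri, href)
        # 'rel' types describe the link from this document to the target.
        for link_type in (attributes.get("rel") or "").lower().split():
            self.triples.append((link_type, self.base_uri, target))
        # 'rev' types describe the reverse link, from the target to this document.
        for link_type in (attributes.get("rev") or "").lower().split():
            self.triples.append((link_type, target, self.base_uri))


def extract_typed_links(html_text, base_uri):
    """Return the typed links found in html_text; untyped links are ignored."""
    parser = TypedLinkParser(base_uri)
    parser.feed(html_text)
    parser.close()
    return parser.triples


# Hypothetical markup for a page that sits at http://example.org/b.html
SAMPLE = """<html><head>
<link rel="contents" href="a.html">
<link rel="next" href="c.html">
</head><body>
<a href="help.html" rel="help">Help</a>
<a href="elsewhere.html">an untyped link</a>
</body></html>"""

for triple in extract_typed_links(SAMPLE, "http://example.org/b.html"):
    print(triple)
```

The sketch treats the document itself as the source of every link, which matches the 'document relationship' reading of LINK and loses the specific origin point of an A element.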

Working Scenario

An example scenario can be used to illustrate the kinds of application made possible by the XML/RDF/HTML linking models.

Consider four hypothetical documents, a.html, b.html, c.html, d.html. The first (a.html) is the front page of some collection of documents, and serves as an overview of the contents provided by b.html, c.html and d.html. These three sub-pages are intended to be read in sequence: b.html, then c.html, then d.html.

Note: the ad-hoc representation of links/relations/properties used here has been adopted as a neutral representation independent of HTML, XML or RDF syntax. When we write next( a.html, b.html) this means that there is a link of type 'next' from the resource whose (relative) URI is a.html to the resource b.html. Prefixes such as 'html4:' and 'dc:' are an informal indication of the importance of URI-names, ie. namespaces. Readers should assume that such prefixes are associated with URIs, and that relative URI references have a full expanded equivalent. These examples are mostly couched at the level of document-to-document linking, but should apply quite naturally to XPointer-based linking scenarios.

We can describe the 'table of contents' role played by a.html as a 'contents' link from the sub-pages b.html, c.html and d.html to the overview document a.html:

html4:contents (b.html, a.html)
html4:contents (c.html, a.html)
html4:contents (d.html, a.html)

We can also represent the sequencing of these documents using HTML4 typed links. Since 'next' and 'prev' are inverse relations, the 'prev' statements don't tell us anything new, but here we show both:

html4:next ( b.html, c.html )
html4:next ( c.html, d.html )

html4:prev ( c.html, b.html )
html4:prev ( d.html, c.html )
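
Because 'next' and 'prev' are inverses, either set of statements can be computed from the other. The following Python sketch shows one way this might be done; the tuple layout (property, source, target) simply mirrors the ad-hoc notation used in this note, and the function name is invented for illustration.

```python
def add_inverse(triples, forward="html4:next", backward="html4:prev"):
    """Return the given triples plus the inverse statement implied by each one."""
    result = set(triples)
    for link_type, source, target in triples:
        if link_type == forward:
            result.add((backward, target, source))
        elif link_type == backward:
            result.add((forward, target, source))
    return result


# Only the 'next' links are stated; the 'prev' links are derived.
stated = {
    ("html4:next", "b.html", "c.html"),
    ("html4:next", "c.html", "d.html"),
}

for triple in sorted(add_inverse(stated)):
    print(triple)
```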

We might also relate one or more of these pages to a copyright or help document using the HTML link types. In a similar fashion, many other relationships (eg. indicating critiques or other annotations, authorship and other metadata) might be defined in application-specific link vocabularies.

html4:help ( a.html, help.html )
html4:copyright ( a.html, fair_usage_statement.html )

These documents also have other metadata (titles, descriptions, relationships to authors, alternate language and format versions, critiques etc). Since RDF models all of this in a common format, we can represent (and query) such facts without concern for details of syntax and data format. Here we show a few other pieces of information that RDF might represent using simple URI-named relationships between URI-named resources (and strings):

dc:title ( a.html, "The front page, which we'll call A" )
dc:description ( a.html, "A fictional document lacking real content.")
dc:creator ( a.html, web:registry.org/persons/danbri )
ldap:surname( web:registry.org/persons/danbri, "Brickley" )

dc:description( help.html, "This is the help document.")
skeptic:critique( c.html, http://dodgystuff.com/fictionaldoc.html )
dc:title( c.html, "A critique of a dodgy web page")
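
To make the point about querying concrete, here is a small Python sketch that holds some of the facts above as (property, subject, value) tuples and answers a question that spans two of them. The layout and the helper names are assumptions made for this note only; they are not an RDF API.

```python
# A few of the scenario's facts, written as (property, subject, value).
FACTS = [
    ("html4:contents", "b.html", "a.html"),
    ("html4:contents", "c.html", "a.html"),
    ("html4:next", "b.html", "c.html"),
    ("dc:title", "a.html", "The front page, which we'll call A"),
    ("dc:creator", "a.html", "web:registry.org/persons/danbri"),
    ("ldap:surname", "web:registry.org/persons/danbri", "Brickley"),
    ("dc:title", "c.html", "A critique of a dodgy web page"),
]


def match(facts, prop=None, subject=None, value=None):
    """Return every fact agreeing with the given fields; None matches anything."""
    return [
        (p, s, v)
        for (p, s, v) in facts
        if prop in (None, p) and subject in (None, s) and value in (None, v)
    ]


def creator_surnames(facts, document):
    """Follow dc:creator from a document, then ldap:surname from each creator."""
    surnames = []
    for _, _, person in match(facts, prop="dc:creator", subject=document):
        for _, _, surname in match(facts, prop="ldap:surname", subject=person):
            surnames.append(surname)
    return surnames


print(creator_surnames(FACTS, "a.html"))
print(match(FACTS, prop="html4:contents", value="a.html"))
```

The same two functions work whether a fact originally came from an HTML LINK element, an A element or some other source, which is the sense in which the common model frees the query from syntax.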

RDF Representation of Web links

The information sketched above has a trivial mapping to the RDF data model. Link types (ie. relationships, attributes) such as dc:title, html4:help, skeptic:critique etc are RDF Properties. RDF Properties must be identifiable using a Web identifier (ie. have a URI name).
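
The prefixed names used in this note stand in for full URIs. A minimal Python sketch of the expansion follows. The 'dc' entry is the Dublin Core element set namespace; the 'html4' and 'skeptic' URIs are purely hypothetical placeholders, since this note does not assign URIs to those vocabularies.

```python
NAMESPACES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "html4": "http://example.org/ns/html4#",      # hypothetical URI
    "skeptic": "http://example.org/ns/skeptic#",  # hypothetical URI
}


def expand(name, namespaces=NAMESPACES):
    """Turn a prefixed name such as 'dc:title' into the URI of the property."""
    prefix, separator, local_name = name.partition(":")
    if not separator or prefix not in namespaces:
        raise KeyError(f"no namespace URI known for {name!r}")
    return namespaces[prefix] + local_name


for name in ("dc:title", "html4:help", "skeptic:critique"):
    print(name, "->", expand(name))
```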

While these names are not required to be dereferenceable, it is common for RDF applications to choose identifiers from a URI scheme that has some means for 'asking the Web' about the identified resource. In practice this has to date meant that most RDF properties have URIs in the 'http:' scheme. Since HTTP content negotiation allows a Web service to expose different views of the same resource, the URIs of these properties can in principle be used to acquire a variety of additional information about the named property. This can prove useful for internationalisation, accessibility, vocabulary mapping and other applications. Much of this information can be described using RDF models, which in turn might be represented in various ways using XML.
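
As an illustration of 'asking the Web' about a property, the following Python sketch prepares (but deliberately does not send) an HTTP request whose Accept header states a preference for an RDF/XML view over an HTML one. The function name and the choice of media types are assumptions made for this note; whether any given server honours such a preference is entirely up to that server.

```python
from urllib.request import Request


def build_rdf_request(property_uri):
    """Prepare a GET request that prefers RDF/XML but will accept HTML."""
    return Request(
        property_uri,
        headers={"Accept": "application/rdf+xml, text/html;q=0.5"},
        method="GET",
    )


request = build_rdf_request("http://purl.org/dc/elements/1.1/title")
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))
# Passing the request to urllib.request.urlopen() would actually send it.
```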

The RDF data model provides a common representation that allows us to aggregate information from various syntaxes, including those not initially targeted at RDF. For example, the 'working scenario' sketched above can be instantiated using HTML 4.01 syntax, yet provide unambiguous data that can be modelled, stored and queried by RDF-based systems. The goal of this discussion note is to explore a similar mapping for Web links that are described using the XML Linking Language.

RDF mapping strategy

To map data structures into the RDF model, it is necessary to agree upon a way in which the information model of the target domain can be projected onto the simple 'directed labelled graph' model of RDF. The main requirement here is that there be a URI name for the relationship types (ie. links, properties, attributes) used.

XML Links as RDF models

Conceptually, XML Linking Language is attempting something similar to RDF. Both systems aim to improve the Web's support for informative linking between documents. XLink is concerned more with the way in which such information can be embedded into actual XML documents and used for Web navigation; RDF is concerned more with the abstract information content that is being represented.

These are complementary goals. For example, RDF systems should be able to exploit XLink-aware software agents to acquire information such as that sketched in our working scenario. XLink-aware browsers should be able to exploit RDF databases to manifest annotations, cross-references, sitemap and other metadata services. Wherever we have Web linking data that fits the template of [resource], [typed-relationship], [resource], we should have data that can serve equally well in RDF and XML contexts.
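
The aggregation this implies can be sketched in a few lines of Python. The idea is that once every resource is named by an absolute URI, statements harvested from different sources fall into a single set, and a link stated twice counts once. The source URIs, the example.org host and the function names below are all invented for illustration; the sketch covers only resource-to-resource links, not string-valued properties.

```python
from urllib.parse import urljoin


def absolutize(triples, base_uri):
    """Resolve the source and target of each link against the base URI."""
    return {
        (link_type, urljoin(base_uri, source), urljoin(base_uri, target))
        for link_type, source, target in triples
    }


def aggregate(*sources):
    """Merge (base_uri, triples) pairs into one set of absolute-URI triples."""
    graph = set()
    for base_uri, triples in sources:
        graph |= absolutize(triples, base_uri)
    return graph


# Links harvested from an HTML page and, hypothetically, from a separate
# XML document that describes links between the same pages.
from_html = (
    "http://example.org/docs/b.html",
    [("html4:next", "b.html", "c.html"), ("html4:contents", "b.html", "a.html")],
)
from_xml = (
    "http://example.org/docs/links.xml",
    [("html4:next", "b.html", "c.html"), ("html4:next", "c.html", "d.html")],
)

for triple in sorted(aggregate(from_html, from_xml)):
    print(triple)
```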

Web identifiers (URIs) provide the architectural glue that can hook together diverse uses of the same data on the Web. RDF is built around little more than the notion of linked resources and URIs for identification. XML Namespaces provide for the use of URI names for XML syntactic constructs (elements, attributes etc).

Worked Example in XLink

[this section removed; the feb-2000 version was obsolete and likely to cause confusion]

Recent Changes (CVS log)

$Log: Overview.html,v $
Revision 1.10  2000/09/08 19:37:29  danbri
removed dated paragraph about capabilities of  an earlier XLink spec.

Revision 1.9  2000/09/08 18:53:44  danbri
ready to go, slimmed down version based around simple example. No XLink details yet. Nor
RDFViz pictures...


danbri@w3.org

$Date: 2000/09/08 19:37:29 $