CURIE Syntax 1.0

1 Motivation

A convention is increasingly used in which URIs are expressed in XML using QNames. Since a QName is usually much shorter than the URI it stands for, this is a very useful device. However, a major problem is that the definition of a QName [NAMESPACES-IN-XML-QNAMES] does not allow all possible URIs to be expressed. (For the definition of the XML Schema datatype for QNames see [XML-SCHEMA-QNAME].)

A specific example of the problem this causes comes from the IPTC [IPTC]. They wanted to use attributes in their mark-up to carry metadata in their documents, and as a consequence sought to make extensive use of QNames to keep the amount of data being transferred as small as possible. In other words, instead of sending lots of long URIs, QNames were to be used to abbreviate them. However, the purpose of QNames in XML is to provide a way for an element name that contains a colon to be interpreted as a name qualified by a namespace (see [NAMESPACES-IN-XML-QNAMES]). For this reason, the definition is such that the part after the colon must be a valid element name, making an example such as the following invalid:

iptc:10112244

This is not a valid QName simply because '10112244' is not a valid element name: an element name cannot begin with a digit. Yet the whole reason for using a QName in the first place was to abbreviate the URI, and not to create a namespace-qualified element name. This gives rise to an interesting problem: the definition of a QName insists on the use of valid XML element names, but an increasingly common use of QNames is as a means to abbreviate URIs, and unfortunately the two are in conflict with each other.
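The conflict can be illustrated with a small check. The following is only a sketch: the function name looks_like_qname is hypothetical, and the pattern is an ASCII-only approximation of the name rules in [NAMESPACES-IN-XML-QNAMES], which in reality also admit many non-ASCII characters.

```python
import re

# ASCII-only approximation of an XML name without colons (an assumption:
# the real production also allows many non-ASCII characters).
NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def looks_like_qname(value: str) -> bool:
    """Return True if value is 'local' or 'prefix:local' with valid name parts."""
    parts = value.split(":")
    if len(parts) > 2:
        return False  # a QName has at most one colon
    # Every part must be a non-empty name that does not start with a digit.
    return all(NCNAME_ASCII.fullmatch(part) is not None for part in parts)
```

Under this approximation, looks_like_qname("iptc:10112244") is rejected because the part after the colon starts with a digit, while a value such as "iptc:subject" is accepted.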

This note suggests that we overcome this by simply creating a new data type whose purpose is specifically to allow for the abbreviation of URIs in exactly this way. This type is called a "CURIE" or a "Compact URI", and QNames are a subset of CURIEs.

1.1 Existing Uses of CURIEs

Although they are not currently called CURIEs, the technique described here is in widespread usage. However, taken literally, QNames would not support many of the examples that we would find 'in the wild'--that these examples work at all is mainly because systems and authors take a very lax approach to QNames.

In other words, the principle used in QNames--that of replacing a namespace prefix with the URI it stands for, thereby producing a longer URI--is widely used, but little checking is done on the element part to ensure that the string is a valid element name. However, this does mean that CURIEs can be easily used in a number of places, since there is already a large amount of 'mind-share'. Current uses include:

1.1.1 Wikis

Many Wikis support a feature where a prefix like isbn can be substituted for something like:

http://www.amazon.com/?isbn=

or:

http://www.barnesandnoble.com/?q=

When a Wiki author wants to make use of this, they can simply enter:

Go and buy T. V. Raman's [[isbn:0321154991][book on XForms]].

and the Wiki software will automatically generate:

Go and buy T. V. Raman's <a href="http://www.amazon.com/?isbn=0321154991">book on XForms</a>.
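As a rough sketch of how Wiki software might perform this substitution: the prefix table, the function name expand_wiki_links and the regular expression below are all assumptions made for illustration, and do not describe any particular Wiki engine.

```python
import re
from html import escape

# Hypothetical prefix table, using the URL from the example above.
PREFIXES = {"isbn": "http://www.amazon.com/?isbn="}

# Matches [[prefix:reference][link text]].
WIKI_LINK = re.compile(r"\[\[([A-Za-z]+):([^\]\[]*)\]\[([^\]\[]*)\]\]")


def expand_wiki_links(text: str) -> str:
    """Replace each [[prefix:reference][label]] with an HTML anchor."""

    def replace(match: re.Match) -> str:
        prefix, reference, label = match.groups()
        base = PREFIXES.get(prefix)
        if base is None:
            return match.group(0)  # unknown prefix: leave the text untouched
        href = escape(base + reference, quote=True)
        return f'<a href="{href}">{escape(label)}</a>'

    return WIKI_LINK.sub(replace, text)
```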

Note:

This is not technically subliminal advertising, since the advert is still on your screen.

2 Usage

CURIEs can be used in exactly the same way as QNames, except that the format of the strings before and after the colon is looser. In all cases a parsed CURIE produces a URI. Parsing involves replacing the namespace prefix with the namespace it represents, and then simply appending the part after the colon. There is one additional rule: if there is no colon present, then the current default XML namespace is prepended. This should become clearer with examples.

2.1 Examples

All of the following are valid CURIEs--even though they are not valid QNames--and they take advantage of the fact that the part after the colon no longer needs to conform to the rules for element names:

home:#start
joseki:
google:xforms or 'xml forms'

The part before the colon cannot be quite as lax as the part after, since it must conform to the syntax for a namespace prefix. However, given that a simple string substitution takes place, the following are also all valid:

:#start
:
:xforms or 'xml forms'

Note how ":" falls out as an abbreviation for the default XML namespace.
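The substitution described in this section can be sketched as follows. This is a minimal illustration, not a normative algorithm: the function name expand_curie is hypothetical, and the namespace URIs are invented placeholders, since this note does not say what home or joseki are bound to. No escaping or validation of the resulting URI is attempted.

```python
# Hypothetical bindings; the note does not define these namespaces.
NAMESPACES = {
    "home": "http://example.org/home",
    "joseki": "http://example.org/joseki/",
}
DEFAULT_NAMESPACE = "http://example.org/default"


def expand_curie(curie: str, namespaces: dict, default_namespace: str) -> str:
    """Expand a CURIE into a URI by plain string substitution."""
    prefix, colon, reference = curie.partition(":")
    if not colon:
        # No colon present: prepend the default namespace to the whole string.
        return default_namespace + curie
    if prefix == "":
        # Empty prefix, as in ":#start", also means the default namespace.
        return default_namespace + reference
    if prefix not in namespaces:
        raise KeyError(f"undeclared prefix: {prefix!r}")
    return namespaces[prefix] + reference
```

With these assumed bindings, "home:#start" expands to "http://example.org/home#start", and ":" on its own expands to the default namespace unchanged.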

2.2 Ambiguities Between CURIEs and URIs

There will be situations where it is desirable for an attribute to be able to contain both CURIEs and URIs. For example, in XHTML the href attribute allows a URI to be specified that will be navigated to on user action, but it would also be useful to be able to abbreviate this URI using the compact syntax. The problem is that it is not possible to be completely sure whether you have a CURIE or a URI. For example, a link to an email address can be expressed like this:

Why not <a href="smtp:contactus@example.com">drop us a line</a>.

There is no way to be sure whether this is a normal URI or a CURIE. Therefore the syntax for carrying a CURIE when there is any possibility of ambiguity is to enclose the CURIE in square brackets, as in the following example:

<html xmlns:wiki="http://en.wikipedia.org/wiki/">
    <head>...</head>
    <body>
        <p>
            Find out more about <a href="[wiki:Thales]">Thales</a>.
        </p>
    </body>
</html>
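A processor could apply the bracket convention along the following lines. This is a sketch under stated assumptions: the function name resolve_href is hypothetical, the prefix map is supplied by the caller rather than read from xmlns declarations, and the default-namespace case is left out for brevity.

```python
# Prefix binding taken from the xmlns:wiki declaration in the example above.
NAMESPACES = {"wiki": "http://en.wikipedia.org/wiki/"}


def resolve_href(value: str, namespaces: dict) -> str:
    """Treat [prefix:reference] as a CURIE and anything else as a plain URI."""
    if value.startswith("[") and value.endswith("]"):
        prefix, colon, reference = value[1:-1].partition(":")
        if colon and prefix in namespaces:
            return namespaces[prefix] + reference
        raise ValueError(f"cannot expand CURIE: {value!r}")
    # Not bracketed: return the URI exactly as written.
    return value
```

Here "[wiki:Thales]" is expanded against the wiki namespace, whereas an unbracketed value containing a colon is passed through untouched.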

Note:

Not only does this abbreviate the URI, but it also makes it possible to change a whole group of URIs to point to some other source, simply by changing the namespace definition. For example, consider the following mark-up:

<html xmlns:wiki="http://en.wikipedia.org/wiki/">
    <head>...</head>
    <body>
        <p>
            Thales had a profound influence on other Greek thinkers and therefore on Western history.
            Some believe <a href="[wiki:Anaximander]">Anaximander</a> was a pupil of Thales. Early
            sources report that one of Anaximander's more famous pupils,
            <a href="[wiki:Pythagoras]">Pythagoras</a>, visited Thales as a young man, and that Thales
            advised him to travel to Egypt to further his philosophical and mathematical studies.
        </p>
    </body>
</html>

Given that all references to Wiki entries are based on the namespace defined in xmlns:wiki, simply changing this namespace changes the base for all Wiki references within the document. It is not difficult to see how, by extending this principle, a user can begin to take control of their own browsing experience. For example, a document might contain a reference to a company, with links to news about the company, financial information and details on key directors. By using CURIEs to express those links it is possible to use different sources for the information, even to the extent that they could be overridden by the user:

<html xmlns:finance="..."
      xmlns:news="..."
      xmlns:people="...">
    <head>...</head>
    <body>
        <p>
            Ballmer told German publication <i>Manager</i> that previous reports about partnership talks
            with <b>Time Warner's</b> (nyse: <a href="[finance:TWX]" class="maintkrlink">TWX</a> -
            <a href="[news:TWX]">news</a> - <a href="[people:TWX]">people</a>) AOL unit were "an unconfirmed
            rumor", but that the software leviathan is indeed keen to expand its internet presence.
        </p>
    </body>
</html>
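The effect of re-binding a prefix can be sketched in a few lines. The example reuses the wiki prefix from the earlier mark-up, because the finance, news and people namespaces are not specified in this note; the alternative source http://wiki.example.org/ and the function name expand_all are hypothetical.

```python
def expand_all(curies, namespaces):
    """Expand each prefix:reference string against the given prefix map."""
    expanded = []
    for curie in curies:
        prefix, _, reference = curie.partition(":")
        expanded.append(namespaces[prefix] + reference)
    return expanded


links = ["wiki:Anaximander", "wiki:Pythagoras"]

# Binding declared by the document author.
document_namespaces = {"wiki": "http://en.wikipedia.org/wiki/"}

# Hypothetical user override pointing the same prefix at another source.
user_namespaces = {**document_namespaces, "wiki": "http://wiki.example.org/"}
```

Calling expand_all(links, user_namespaces) re-bases every link at once, without touching the links themselves.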

3 Bibliography

RDF-SYNTAX: RDF/XML Syntax and Grammar (See http://www.w3.org/TR/rdf-syntax-grammar/.)
XML-SCHEMA-QNAME: XML Schema Part 2: Datatypes Second Edition: Section 3.2.18 QName (See http://www.w3.org/TR/xmlschema-2/#QName.)
NAMESPACES-IN-XML-QNAMES: Namespaces in XML: Section 3: Qualified Names (See http://www.w3.org/TR/REC-xml-names/#dt-qname.)
IPTC: International Press Telecommunications Council (See http://www.iptc.org/.)
RDFHTML: RDF-in-HTML Task Force (See http://w3.org/2001/sw/BestPractices/HTML/.)
SWBPD-WG: Semantic Web Best Practices and Deployment Working Group (See http://w3.org/2001/sw/BestPractices/.)
HTML-WG: HTML Working Group (See http://w3.org/MarkUp/Group/.)

A compact syntax for expressing URIs

note 27 October 2005
