The aim of this document is to outline a syntax for expressing URIs in abbreviated form.
This is an internal draft produced by the RDF-in-HTML task force [RDFHTML], a joint task force of the Semantic Web Best Practices and Deployment Working Group [SWBPD-WG] and HTML Working Group [HTML-WG].
Last Modified: 2005-10-27
A convention that is increasingly common is to express URIs in XML using QNames. Since a QName is generally much shorter than the URI it expresses, this is a very useful device. However, a major problem is that, because of the purpose for which QNames were originally defined [NAMESPACES-IN-XML-QNAMES], they cannot express every possible URI. (For the definition of the XML Schema datatype for QNames see [XML-SCHEMA-QNAME].)
A specific example of the problem this causes comes from the IPTC [IPTC]. They would like to use attributes in their mark-up to carry metadata in their documents, and so sought to make extensive use of QNames to keep the amount of data being transferred as small as possible. In other words, instead of sending lots of long URIs, they planned to use QNames to abbreviate them. However, the purpose of QNames in XML is to allow an element name that contains a colon to be interpreted as a different, namespace-qualified name (see [NAMESPACES-IN-XML-QNAMES]). For this reason, the definition requires the part after the colon to be a valid element name, which makes an example such as the following invalid:
This is not a valid QName simply because '10112244' is not a valid element name. Yet the whole reason for using a QName in the first place was to abbreviate the URI, not to create a namespace-qualified element name. This gives rise to an interesting problem: the definition of a QName insists on the use of valid XML element names, but QNames are increasingly used simply as a means of abbreviating URIs, and unfortunately the two purposes conflict.
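The reason '10112244' fails is that an XML name may not begin with a digit. The Python sketch below approximates the check with an ASCII-only regular expression for the NCName production (the name allowed on each side of a QName's colon). The real production also admits many non-ASCII characters, and both the function name and the prefix ex are hypothetical, chosen only for illustration.

```python
import re

# ASCII-only approximation of the XML NCName production: a letter or
# underscore, then letters, digits, '.', '-' or '_'. The real production
# also allows many non-ASCII characters, which this sketch omits.
NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_qname(value: str) -> bool:
    """Return True if value has the form local or prefix:local, both NCNames."""
    parts = value.split(":")
    if len(parts) > 2:
        return False
    return all(NCNAME.fullmatch(part) for part in parts)


print(is_qname("ex:10112244"))      # False: local part starts with a digit
print(is_qname("ex:item10112244"))  # True
```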
This note suggests overcoming this by creating a new data type whose purpose is specifically to allow URIs to be abbreviated in exactly this way. This type is called a "CURIE", or "Compact URI", and every valid QName is also a valid CURIE.
Although they are not currently called CURIEs, the technique described here is in widespread use. Taken literally, however, the QName rules would not support many of the examples that we find 'in the wild'; they are supported mainly because systems and authors take a very lax approach to QNames.
In other words, the principle used in QNames, that of substituting a URI for a namespace prefix and thereby producing a longer URI, is widely used, but little checking is done on the part after the colon to ensure that it is a valid element name. This existing practice does mean that CURIEs can easily be used in a number of places, since there is already a large amount of 'mind-share'. Current uses include:
Many Wikis support a feature where a prefix like isbn can be substituted for something like:
http://www.amazon.com/?isbn=
When a Wiki author wants to make use of this, they can simply enter:
Go and buy T. V. Raman's [[isbn:0321154991][book on XForms]].
and the Wiki software will automatically generate:
Go and buy T. V. Raman's <a href="http://www.amazon.com/?isbn=0321154991">book on XForms</a>.
Note: This is not technically subliminal advertising, since the advert is still on your screen.
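The Wiki behaviour described above amounts to a lookup-and-concatenate step. The following Python sketch assumes the [[prefix:reference][text]] link syntax from the example and a hypothetical prefix table containing only the isbn entry; real Wiki engines each have their own syntax and configuration, so this is an illustration rather than any particular engine's implementation.

```python
import re

# Hypothetical prefix table; only the 'isbn' entry comes from the example.
PREFIXES = {"isbn": "http://www.amazon.com/?isbn="}

# Matches [[prefix:reference][link text]] in Wiki source.
LINK = re.compile(r"\[\[([^:\]]+):([^\]]*)\]\[([^\]]*)\]\]")


def render_links(source: str) -> str:
    """Replace each [[prefix:reference][text]] with an HTML anchor."""
    def to_anchor(match: re.Match) -> str:
        prefix, reference, text = match.groups()
        if prefix not in PREFIXES:
            return match.group(0)  # unknown prefix: leave the markup untouched
        # Text is inserted without HTML escaping to keep the sketch short.
        return f'<a href="{PREFIXES[prefix]}{reference}">{text}</a>'
    return LINK.sub(to_anchor, source)


print(render_links("Go and buy T. V. Raman's [[isbn:0321154991][book on XForms]]."))
```

Unknown prefixes are left as plain text here; a real engine might instead report an error or fall back to a default.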
CURIEs can be used in exactly the same way as QNames, except that the rules for the strings before and after the colon are looser. In all cases, parsing a CURIE produces a URI: the namespace URI represented by the prefix is substituted for the prefix itself, and the part after the colon is then simply appended. There is one additional rule: if there is no colon present, the current default XML namespace is prepended. This should become clearer with examples.
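A minimal Python sketch of that expansion rule, assuming the in-scope namespace declarations are available as a mapping from prefix to namespace URI. Treating an empty prefix as the default namespace follows the ":" examples in this note; raising KeyError for an undeclared prefix is an assumption, since the note does not say what should happen. The function name and the example.org URIs are hypothetical.

```python
from typing import Mapping


def expand_curie(curie: str, namespaces: Mapping[str, str], default_ns: str) -> str:
    """Expand a CURIE into a URI by plain string substitution.

    namespaces maps declared prefixes to namespace URIs; default_ns is the
    current default XML namespace. An undeclared prefix raises KeyError.
    """
    if ":" not in curie:
        return default_ns + curie  # no colon: prepend the default namespace
    prefix, reference = curie.split(":", 1)  # split at the first colon only
    namespace = default_ns if prefix == "" else namespaces[prefix]
    return namespace + reference  # the reference is appended unchanged


ns = {"home": "http://www.example.org/home/"}
default = "http://www.example.org/default/"
print(expand_curie("home:#start", ns, default))  # http://www.example.org/home/#start
print(expand_curie("about", ns, default))        # http://www.example.org/default/about
print(expand_curie(":#start", ns, default))      # http://www.example.org/default/#start
```

Nothing checks the reference part: whatever follows the colon, including spaces or a leading '#', is copied straight into the result, which is precisely the looseness described above.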
All of the following are valid CURIEs, even though they are not valid QNames; they take advantage of the fact that the part after the colon no longer needs to conform to the rules for element names:
home:#start
joseki:
google:xforms or 'xml forms'
The part before the colon cannot be quite as lax as the part after, since it must conform to the syntax for a namespace prefix. However, given that a simple string substitution takes place, the prefix may also be left empty, so the following are all valid too:
:#start
:
:xforms or 'xml forms'
Note how ":" falls out as an abbreviation for the default XML namespace.
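The syntactic rule can be checked independently of any namespace declarations, because only the prefix is constrained. The Python sketch below is a hypothetical validator that uses an ASCII-only approximation of the NCName production for the prefix and accepts anything after the colon. An ordinary absolute URI such as http://www.w3.org/ also passes, because 'http' is a perfectly good prefix; that is exactly the ambiguity discussed next.

```python
import re

# ASCII-only approximation of the NCName production for namespace prefixes.
NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_curie(value: str) -> bool:
    """Check CURIE syntax: only the prefix (if any) is constrained.

    A value without a colon is accepted (it resolves against the default
    namespace); otherwise the prefix must be empty or an NCName, and the
    part after the first colon may be anything at all.
    """
    if ":" not in value:
        return True
    prefix = value.split(":", 1)[0]
    return prefix == "" or NCNAME.fullmatch(prefix) is not None


examples = ["home:#start", "joseki:", "google:xforms or 'xml forms'",
            ":#start", ":", ":xforms or 'xml forms'"]
print(all(is_curie(e) for e in examples))  # True
print(is_curie("1st:thing"))               # False: prefix starts with a digit
print(is_curie("http://www.w3.org/"))      # True: 'http' is a valid prefix
```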
There will be situations where it is desirable for an attribute to be able to contain both CURIEs and URIs. For example, in XHTML the href attribute holds a URI that is navigated to when the user activates the link, but it would also be useful to be able to abbreviate this URI using the compact syntax. The problem is that it is not always possible to tell whether a value is a CURIE or a URI. For example, a link to an email address can be expressed like this:
Why not <a href="mailto:firstname.lastname@example.org">drop us a line</a>.
There is no way to be sure whether this is a normal URI or a CURIE. Therefore, wherever there is any possibility of ambiguity, a CURIE is carried by enclosing it in square brackets, as in the following example:
<html xmlns:wiki="http://en.wikipedia.org/wiki/">
  <head>...</head>
  <body>
    <p>
      Find out more about <a href="[wiki:Thales]">Thales</a>.
    </p>
  </body>
</html>
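A user agent following this convention only needs to look for the brackets. The Python sketch below (a hypothetical helper, not part of any existing API) expands a bracketed value using the in-scope namespace declarations and returns any unbracketed value untouched, even one such as wiki:Thales that happens to look like a CURIE. Unprefixed CURIEs inside brackets are deliberately left out to keep it short.

```python
from typing import Mapping


def resolve_href(value: str, namespaces: Mapping[str, str]) -> str:
    """Return the URI that an href-style attribute value refers to.

    A value enclosed in square brackets is a CURIE and is expanded using the
    in-scope namespace declarations; any other value is taken to be a URI.
    """
    if len(value) >= 2 and value.startswith("[") and value.endswith("]"):
        prefix, colon, reference = value[1:-1].partition(":")
        if not colon:
            # Unprefixed CURIEs (default namespace) are outside this sketch.
            raise ValueError(f"expected [prefix:reference], got {value!r}")
        return namespaces[prefix] + reference
    return value  # no brackets: an ordinary URI, used exactly as written


ns = {"wiki": "http://en.wikipedia.org/wiki/"}
print(resolve_href("[wiki:Thales]", ns))  # http://en.wikipedia.org/wiki/Thales
print(resolve_href("wiki:Thales", ns))    # wiki:Thales (treated as a URI)
```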
Not only does a bracketed CURIE such as [wiki:Thales] abbreviate the URI, it also makes it possible to change a whole group of URIs to point to some other source, simply by changing the namespace definition. For example, consider the following mark-up:
<html xmlns:wiki="http://en.wikipedia.org/wiki/">
  <head>...</head>
  <body>
    <p>
      Thales had a profound influence on other Greek thinkers and therefore on
      Western history. Some believe <a href="[wiki:Anaximander]">Anaximander</a>
      was a pupil of Thales. Early sources report that one of Anaximander's more
      famous pupils, <a href="[wiki:Pythagoras]">Pythagoras</a>, visited Thales as
      a young man, and that Thales advised him to travel to Egypt to further his
      philosophical and mathematical studies.
    </p>
  </body>
</html>
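To see the effect of changing the declaration, the Python sketch below reads namespace declarations with the standard library's xml.etree.ElementTree.iterparse (whose "start-ns" events report each (prefix, URI) pair) and resolves every bracketed href. It is a simplification: the markup is a cut-down, well-formed version of the example above, declarations are collected in one flat dictionary with no scoping, and http://wiki.example.org/ is a hypothetical alternative source.

```python
import io
import xml.etree.ElementTree as ET


def resolved_links(markup: str) -> list[str]:
    """Return the href of every <a> element, expanding bracketed CURIEs."""
    namespaces: dict[str, str] = {}  # flat map; real scoping is ignored here
    links: list[str] = []
    source = io.BytesIO(markup.encode("utf-8"))
    for event, item in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            prefix, uri = item  # e.g. ("wiki", "http://en.wikipedia.org/wiki/")
            namespaces[prefix] = uri
        elif item.tag == "a":
            href = item.get("href", "")
            if href.startswith("[") and href.endswith("]"):
                prefix, _, reference = href[1:-1].partition(":")
                href = namespaces[prefix] + reference
            links.append(href)
    return links


PAGE = """<html xmlns:wiki="{ns}"><body><p>
<a href="[wiki:Anaximander]">Anaximander</a>,
<a href="[wiki:Pythagoras]">Pythagoras</a>
</p></body></html>"""

# Same markup, two different declarations for the wiki prefix.
print(resolved_links(PAGE.format(ns="http://en.wikipedia.org/wiki/")))
print(resolved_links(PAGE.format(ns="http://wiki.example.org/")))
```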
Given that all references to Wiki entries are based on the namespace declared by the xmlns:wiki attribute, simply changing this declaration changes the base for all Wiki references within the document. It is not difficult to see how, by extending this principle, a user can begin to get control of their own browsing experience. For example, a document might contain a reference to a company, with links to news about the company, financial information and details of key directors. By using CURIEs to express those links, it is possible to use different sources for the information, even to the extent that those sources could be overridden. Consider the following mark-up:
<html xmlns:finance="..." xmlns:news="..." xmlns:people="...">
  <head>...</head>
  <body>
    <p>
      Ballmer told German publication <i>Manager</i> that previous reports about
      partnership talks with <b>Time Warner's</b> (nyse:
      <a href="[finance:TWX]" class="maintkrlink">TWX</a> -
      <a href="[news:TWX]">news</a> -
      <a href="[people:TWX]">people</a>) AOL unit were "an unconfirmed rumor",
      but that the software leviathan is indeed keen to expand its internet presence.
    </p>
  </body>
</html>
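One way a browser might give the user this control is to let user-supplied prefix mappings take precedence over those declared in the document. The Python sketch below assumes exactly that policy, which this note does not prescribe. Because the document's namespace URIs are elided ("...") in the example, every URI in the sketch is hypothetical, as are the function names.

```python
from typing import Mapping


def effective_namespaces(document: Mapping[str, str],
                         user: Mapping[str, str]) -> dict[str, str]:
    """Merge prefix mappings; where both define a prefix, the user's wins."""
    return {**document, **user}


def expand(curie: str, namespaces: Mapping[str, str]) -> str:
    """Expand prefix:reference by string substitution (brackets already removed)."""
    prefix, _, reference = curie.partition(":")
    return namespaces[prefix] + reference


# Hypothetical URIs standing in for the elided declarations in the example.
document_ns = {
    "finance": "http://finance.example.com/quote?s=",
    "news": "http://news.example.com/company/",
    "people": "http://people.example.com/company/",
}
# The user prefers a different news source; the other prefixes are unchanged.
user_ns = {"news": "http://news.example.net/search?q="}

ns = effective_namespaces(document_ns, user_ns)
for curie in ["finance:TWX", "news:TWX", "people:TWX"]:
    print(expand(curie, ns))
```

Giving the user the last word is only one possible design; a browser could equally restrict overrides to prefixes the user has explicitly configured, leaving all other document declarations in force.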