Using Qualified Names (QNames) as Identifiers in Content -- Review Version

1 Preface

This TAG Finding documents a portion of the web architecture where conflicting requirements and design goals intersect. It is a simple matter of fact that specifications which have chosen one set of design criteria interoperate less well with specifications that have chosen a different set.

Given that there are existing specifications which exhibit incompatible designs and strong arguments in favor of each design, the TAG elects not to assert architectural principles that would be in direct conflict with some significant set of specifications.

It's possible that these issues could be addressed in the scope of some larger, more global redesign of, for example, XML, but no short-term solution presents itself.

2 QNames as Identifiers

This finding is concerned with the use of qualified names (QNames) as identifiers; that is, with the contexts in which a name containing a colon can be understood to be a QName.

A related TAG issue, rdfmsQnameUriMapping-6, concerns the mechanism by which one can (or cannot) construct a URI for a particular QName. We do not consider that issue in this finding.

3 QNames in XML

Qualified names were introduced by [XML Namespaces]. They were defined for element and attribute names (only) and provide a mechanism for concisely identifying a {URI, local-name} pair. For example, in the following document:

<?xml version='1.0'?>
<doc xmlns:x="http://example.com/ns/foo">
<x:p/>
</doc>

The QName "x:p" is a concise, unambiguous name for the {URI, local-name} pair {"http://example.com/ns/foo", "p"}.

When used solely in element and attribute names, all QNames are identified by the XML processor and can logically be replaced by the URI/local-name pair they identify.
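
As a rough illustration of this replacement, the sketch below parses the example document with Python's standard xml.etree.ElementTree module, which reports each element name in an expanded "{URI}local-name" form rather than as a prefixed QName. The helper name expanded_name is an assumption introduced here for illustration only.

```python
import xml.etree.ElementTree as ET

DOC = """<?xml version='1.0'?>
<doc xmlns:x="http://example.com/ns/foo">
<x:p/>
</doc>"""


def expanded_name(tag):
    """Split ElementTree's '{URI}local' tag form into a (URI, local-name) pair.

    Hypothetical helper: names in no namespace get None as the URI.
    """
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return (uri, local)
    return (None, tag)


root = ET.fromstring(DOC)
# Every element name has already been resolved by the parser; the prefix
# "x" no longer appears anywhere in these pairs.
pairs = [expanded_name(el.tag) for el in root.iter()]
```

Here the prefix itself plays no further role once parsing is complete, which is exactly what stops being true when QNames appear in content.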

4 QNames in Other Contexts

At the request of the XML Schema Working Group, the XML Core Working Group is producing an erratum to [XML Namespaces] to clarify the meaning of colons in other contexts.

In particular, this erratum makes it clear that entity names, processing instruction targets, and notation names are not QNames and they may not include any colons. Documents that do not satisfy this constraint are not namespace well-formed. Furthermore, the values of attributes of type ID, IDREF(S), ENTITY(IES), and NOTATION are also forbidden from containing colons. Documents that do not satisfy this constraint are not namespace valid.

A colon that introduces a namespace validity or namespace well-formedness error into a document does not introduce a QName. In other words, the term "identifier" in this finding is not related to XML identifiers of type ID since they cannot be QNames.

4.1 QNames in Other Specifications

Other specifications, starting with [XSLT], have taken QNames and employed them in contexts other than element and attribute names. Specifically, QNames have been used in attribute values and element content.

For example, in the following document, "x:p" is understood to be a QName even though it appears in an attribute value, not an element or attribute name.

<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:x="http://example.com/ns/foo"
                version="1.0">

<xsl:output method="html"/>

<xsl:template match="x:p">
  <p>
    <xsl:apply-templates/>
  </p>
</xsl:template>

</xsl:stylesheet>

In attribute values and element content, QNames are often used to identify a particular element type; they are, in principle, using QNames as they were intended. However, some specifications use QNames as shortcuts for unique identifiers derived from a {URI, local-name} pair that have no relationship to element or attribute types.

Using a QName as a shortcut for a {URI, local-name} pair is often convenient, but it carries a price. There is no single, accepted way to convert QNames into {URI, local-name} pairs or vice versa. Different specifications have chosen different algorithms. This means that even if a QName can be identified in content, it may be difficult or impossible to determine what {URI, local-name} pair it represents. The mapping depends on the context in which it occurs; at the very least, therefore, it is important for specifications to identify the mapping algorithm that they have chosen.

Specifications that use QNames to represent {URI, local-name} pairs MUST describe the algorithm that is used to map between them.
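
To make the point concrete, here is a minimal sketch of one possible mapping algorithm. It is not taken from any particular specification: the function name resolve_qname, the unprefixed_uses_default parameter, and the namespace URI http://example.com/ns/default are all assumptions chosen for illustration. The parameter stands in for a decision that a specification would have to state, namely whether an unprefixed name picks up the default namespace.

```python
def resolve_qname(qname, bindings, unprefixed_uses_default):
    """Map a QName found in content to a (URI, local-name) pair.

    bindings: dict mapping prefixes to namespace URIs; the key "" holds
    the default namespace, if any.
    unprefixed_uses_default: hypothetical policy switch for names that
    have no prefix.
    """
    prefix, sep, local = qname.partition(":")
    if not sep:
        # No colon: the whole string is the local name.
        uri = bindings.get("") if unprefixed_uses_default else None
        return (uri, qname)
    if prefix not in bindings:
        raise KeyError(f"no namespace binding for prefix {prefix!r}")
    return (bindings[prefix], local)


bindings = {
    "": "http://example.com/ns/default",   # hypothetical default namespace
    "x": "http://example.com/ns/foo",
}
prefixed = resolve_qname("x:p", bindings, True)
with_default = resolve_qname("p", bindings, True)
without_default = resolve_qname("p", bindings, False)
```

Under these assumptions the same string "p" yields two different pairs depending on the policy, which is why the algorithm has to be written down.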

We observe also that there is an overlap in the lexical space of QNames and URIs.

Specifications that use QNames to represent {URI, local-name} pairs SHOULD NOT allow both forms in attribute values or element content where they would be indistinguishable.
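
The overlap can be shown with two deliberately simplified patterns. Both regular expressions below are approximations written for this sketch (the QName pattern is ASCII-only and the URI pattern checks little more than the presence of a scheme), and the function name possible_readings is an assumption.

```python
import re

# ASCII-only approximation of a prefixed QName (prefix ":" local part);
# the real production also admits many non-ASCII name characters.
QNAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*:[A-Za-z_][A-Za-z0-9._-]*")

# Generic URI shape: a scheme, a colon, then anything without whitespace.
URI_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*:\S*")


def possible_readings(value):
    """Return which of the two lexical forms the value could be."""
    readings = []
    if QNAME_RE.fullmatch(value):
        readings.append("QName")
    if URI_RE.fullmatch(value):
        readings.append("URI")
    return readings


ambiguous = possible_readings("data:myInteger")
uri_only = possible_readings("http://example.com/ns/foo")
qname_only = possible_readings("x_y:p")
```

A value such as "data:myInteger" satisfies both patterns, so a format that accepted either form in the same position would have no lexical basis for telling them apart.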

4.2 Namespace Bindings

Some specifications rely on the in-scope namespace bindings in the XML document to associate prefixes with namespace names. Other specifications rely on application-specific mechanisms.

Using the in-scope namespace bindings has the advantage that it theoretically allows a generic processor to interpret QNames in content without having to be aware of any application-specific mechanisms. The alternative, where every specification defines its own mechanism, could clearly lead to a badly fragmented web.

However, there is at least one application where a compelling argument has been made for requiring an alternative mechanism for defining namespace bindings. That application is [XPointer Framework].

It is an architectural principle of URIs that they be context-independent. It follows that the QNames that appear in an XPointer must not refer to in-scope namespaces as this would make transcription impossible in the general case.

We must therefore accept that there are some applications which use in-scope namespaces and some which use their own mechanisms.
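
One way for a pointer to remain context-independent is to carry its prefix bindings inside the pointer string itself. The sketch below assumes declarations of the form xmlns(prefix=URI) placed before the part that uses the prefix; that syntax, the sample pointer, and the function name pointer_bindings are assumptions made for illustration, and escaping rules are ignored entirely.

```python
import re

# Matches one hypothetical in-pointer declaration: xmlns(prefix=URI).
XMLNS_PART = re.compile(r"xmlns\(\s*([^\s=()]+)\s*=\s*([^\s()]+)\s*\)")


def pointer_bindings(pointer):
    """Collect prefix bindings declared inside the pointer string itself."""
    return {prefix: uri for prefix, uri in XMLNS_PART.findall(pointer)}


pointer = "xmlns(x=http://example.com/ns/foo)xpointer(//x:p)"
bindings = pointer_bindings(pointer)
```

Because the bindings travel with the string, the pointer means the same thing wherever it is transcribed, at the cost of a mechanism that a generic XML processor knows nothing about.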

5 Architectural Observations

The TAG makes the following observations:

Whatever the architectural ramifications of using QNames as identifiers in contexts other than XML element and attribute names, it is already established practice.
It is simply not practical to suggest that this usage should be forbidden on architectural grounds.
Using QNames in untyped (#PCDATA or xs:string) attribute values or element content places an additional burden on the processor that was not anticipated by [XML Namespaces].
If QNames are only used in element and attribute names, the processor can fully resolve all of the prefixes as it parses. This gives it the freedom to discard the prefix-to-URI mappings when they go out of scope. A serializer, presented with an object model that conforms to [XML Namespaces], can manufacture new prefixes on the fly. (In practice, users expect most prefixes to be preserved through transformations, so things aren't quite this simple for most developers, but this is still theoretically the case.)
As soon as QNames may appear in element or attribute values, the processor must retain all of the prefix-to-URI mappings (and any API must expose these mappings). This is necessary because some subsequent micro-parser, in the course of examining some content, may encounter a token that it recognizes as a QName and need to find its {URI, local-name} pair.
In our previous XSL example, from the perspective of the XML processor, there are no qualified names that use the x: prefix. However, when the XSL processor examines the match attribute on xsl:template, it must be able to resolve the x: prefix.
QNames that appear by themselves in attribute values or element content, in other words in contexts that could be typed as xs:QName ([XML Datatypes]) in a schema, could in principle be identified by the schema processor.
For example, given:
```
<elem ref="data:myInteger"/>
```
If schema validation reveals that the following component applies to this instance of the elem element:
```
<xs:complexType name="elemType">
  <xs:complexContent>
    <xs:restriction base="xs:anyType">
      <xs:attribute name="ref" type="xs:QName"/>
    </xs:restriction>
  </xs:complexContent>
</xs:complexType>
```
The schema processor can determine that data:myInteger is a QName and must therefore be a concise name for the {URI, local-name} pair consisting of the in-scope namespace URI for the prefix data and the local-name myInteger.
Perhaps the most common use of QNames in untyped values at the moment is in locations where XPath expressions may occur. As XPath is reused in more and more specifications, it may eventually be reasonable to define an XPath data type to identify all of these values in a way that makes them accessible to higher-level parsers.
Some specifications rely on mechanisms other than in-scope namespaces to associate prefixes with namespace names. In general, therefore, even when all of the in-scope namespace declarations are available, there may still be prefixes which can only be known in an application-dependent manner.
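
The burden described in these observations, retaining prefix-to-URI mappings so that a later micro-parser can resolve a QName found in content, can be sketched with the namespace events of Python's xml.etree.ElementTree.iterparse. The function name qnames_in_attribute is an assumption, the stylesheet is a condensed copy of the earlier XSL example, and the sketch treats the whole attribute value as a single QName, which is a simplification of what an XSL processor actually has to parse.

```python
import io
import xml.etree.ElementTree as ET

STYLESHEET = b"""<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:x="http://example.com/ns/foo"
                version="1.0">
<xsl:template match="x:p"><p><xsl:apply-templates/></p></xsl:template>
</xsl:stylesheet>"""


def qnames_in_attribute(xml_bytes, attr_name):
    """Resolve QName-valued attributes against the bindings in scope."""
    in_scope = []   # stack of (prefix, uri) bindings currently in scope
    found = []
    events = ("start-ns", "end-ns", "start")
    for event, payload in ET.iterparse(io.BytesIO(xml_bytes), events=events):
        if event == "start-ns":
            in_scope.append(payload)        # payload is (prefix, uri)
        elif event == "end-ns":
            in_scope.pop()                  # one binding leaves scope
        else:
            value = payload.get(attr_name)  # payload is the element
            if value and ":" in value:
                prefix, local = value.split(":", 1)
                # Search the most recently declared bindings first.
                uri = next(
                    (u for p, u in reversed(in_scope) if p == prefix), None
                )
                found.append((uri, local))
    return found


resolved = qnames_in_attribute(STYLESHEET, "match")
```

The in_scope stack is the extra state that a processor concerned only with element and attribute names would have been free to discard.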

Almost every specification that has wrestled with what XML documents mean or how two putatively identical documents can be compared has stumbled over the issues associated with QNames in content. Some of these issues are described in detail in [Canonical XML], but they are also known to have occurred in the development of RDF, XML Query, security specifications such as [XML-DSig Core], and elsewhere.

6 Architectural Statement

In so far as the identification mechanism of the Web is the URI and QNames are not URIs, it is a mistake to use a QName for identification when a URI would serve equally well.

That said, the TAG recognizes that there are sometimes pragmatic reasons for choosing short, lexical representations of more complex names and accepts that QNames are an established mechanism for doing so. Further, it must be observed that some things are identified by QNames: element and attribute names, types in W3C XML Schema, etc.

Where there is a compelling reason to use QNames instead of URIs for identification, it is imperative that specifications provide a mapping between QNames and URIs, if such a mapping is possible.

Finally, we observe that a whole class of interpretation problems can be avoided if the use of QNames can be restricted to contexts where their use is natural and unambiguous (element and attribute names, simple content of type xs:QName, etc.) and we encourage developers to employ such restrictions wherever possible.

7 References

Canonical XML: John Boyer, editor. Canonical XML Version 1.0. World Wide Web Consortium, 2001. (See http://www.w3.org/TR/xml-c14n.)
XPointer Framework: Paul Grosso, Eve Maler, Jonathan Marsh, Norman Walsh, editors. XPointer Framework. World Wide Web Consortium, 2002. (See http://www.w3.org/TR/xptr-framework/.)
XML Datatypes: Paul V. Biron and Ashok Malhotra, editors. XML Schema Part 2: Datatypes. World Wide Web Consortium, 2000. (See http://www.w3.org/TR/xmlschema-2/.)
XML Namespaces: Tim Bray, Dave Hollander, Andrew Layman, editors. Namespaces in XML. World Wide Web Consortium, 1999. (See http://www.w3.org/TR/REC-xml-names/.)
XML-DSig Core: Donald Eastlake, Joseph Reagle, and David Solo, editors. XML-Signature Syntax and Processing. World Wide Web Consortium, 2002. (See http://www.w3.org/TR/xmldsig-core/.)
XSLT: James Clark, editor. XSL Transformations (XSLT) Version 1.0. World Wide Web Consortium, 1999. (See http://www.w3.org/TR/xslt.)


[Editor’s Draft] TAG Finding 06 January 2004
