This is a draft document created to aid discussion of these issues by the W3C Technical Architecture Group. It has not been reviewed or approved by anyone.

Introduction

When URIs contain fragment identifiers, those fragment identifiers are interpreted according to the media type of the representation that is retrieved when the URI is dereferenced. The Generic Syntax for URIs [[!URI]] states:

The semantics of a fragment identifier are defined by the set of representations that might result from a retrieval action on the primary resource. The fragment's format and resolution is therefore dependent on the media type [RFC2046] of a potentially retrieved representation, even though such a retrieval is only performed if the URI is dereferenced. If no such representation exists, then the semantics of the fragment are considered unknown and are effectively unconstrained. Fragment identifier semantics are independent of the URI scheme and thus cannot be redefined by scheme specifications.

Individual media types may define their own restrictions on or structures within the fragment identifier syntax for specifying different types of subsets, views, or external references that are identifiable as secondary resources by that media type. If the primary resource has multiple representations, as is often the case for resources whose representation is selected based on attributes of the retrieval request (a.k.a., content negotiation), then whatever is identified by the fragment should be consistent across all of those representations. Each representation should either define the fragment so that it corresponds to the same secondary resource, regardless of how it is represented, or should leave the fragment undefined (i.e., not found).

This dependence of fragment identifier interpretation on media type causes problems when a media type "inherits" fragment identifier semantics from multiple "parent" media types. For example, as well as defining its own method of interpreting fragment identifiers, SVG [[SVG11]] has the media type image/svg+xml and therefore will inherit the common fragment identifier semantics for all images (Media Fragments URI 1.0 [[MEDIA-FRAGMENTS]]) and for all XML documents (XML Media Types Draft). If RDF is embedded within the SVG, fragment identifiers might have RDF semantics and be used to refer to real-world things pictured within the SVG; and if fragment identifiers are interpreted by scripts embedded within the SVG, they have yet another semantic: they encode application state.

This finding uses SVG as an example, but conflicts between different uses of fragment identifiers are not limited to SVG; they appear in other media types as well, in particular within (X)HTML which also inherits from the XML Media Types draft, contains active content, and may be used to carry data interpreted according to RDF semantics.

Fragment Structures

This section looks at various fragment structures that apply to SVG. A fragment structure is a definition of the syntax, semantics and processing requirements for a class of fragment identifiers. This section uses as an example a simple bar chart at http://example.org/potter, which has an SVG representation:

<svg xmlns="http://www.w3.org/2000/svg" width="300" height="225" viewBox="0 0 300 225">
  <g stroke="grey" stroke-width="40">
    <line id="harry" x1="50" x2="50" y1="300" y2="50" />
    <line id="hermione" x1="100" x2="100" y1="300" y2="0" />
    <line id="ron" x1="150" x2="150" y1="300" y2="100" />
    <line id="hagrid" x1="200" x2="200" y1="300" y2="50" />
    <line id="dumbledore" x1="250" x2="250" y1="300" y2="150" />
  </g>
</svg>
			

which appears as:

Media Fragment URIs

The Media Fragment URIs specification defines a fragment structure for image/*, video/* and audio/* media types. It covers the identification of spatial areas, time segments, tracks and named segments.

Under this specification, the area covering the first two lines within the example bar chart can be addressed using a URI like:

http://example.org/potter#xywh=25,0,100,225

which would identify the area highlighted here:

The syntax for fragment identifiers defined as part of this specification is:

namevalues = namevalue *( "&" namevalue )
namevalue  = name [ "=" value ]
name       = fragment - "&" - "="
value      = fragment - "&"

; defined in RFC 3986
fragment      = *( pchar / "/" / "?" )
pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
pct-encoded   = "%" HEXDIG HEXDIG
sub-delims    = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="

; defined in RFC 5234
ALPHA         =  %x41-5A / %x61-7A   ; A-Z / a-z
DIGIT         =  %x30-39 ; 0-9
HEXDIG        =  DIGIT / "A" / "B" / "C" / "D" / "E" / "F"
				

This syntax essentially allows anything within a fragment identifier, although applications that follow the specification will attempt to interpret any fragment identifier on an image, audio or video representation as a set of name[=value] pairs separated by ampersands.
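
As an illustration of how permissive this syntax is, the following is a minimal sketch of splitting a fragment identifier into name[=value] pairs. The function name parseNameValues is hypothetical and is not taken from the specification; percent-decoding with decodeURIComponent is a simplification of the processing the specification describes.

```javascript
// Split a fragment identifier into name[=value] pairs, following the
// namevalues grammar quoted above. parseNameValues is a hypothetical name.
function parseNameValues(fragment) {
  if (fragment === "") {
    return [];
  }
  return fragment.split("&").map(function (pair) {
    var eq = pair.indexOf("=");
    // A pair without "=" is a bare name with no value.
    var name = eq === -1 ? pair : pair.slice(0, eq);
    var value = eq === -1 ? null : pair.slice(eq + 1);
    return {
      name: decodeURIComponent(name),
      value: value === null ? null : decodeURIComponent(value)
    };
  });
}

console.log(parseNameValues("xywh=25,0,100,225"));
console.log(parseNameValues("hermione"));
```

Note that a fragment such as hermione, which other fragment structures treat as an element name, is still accepted here as a name with no value.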

Named segments under this specification are addressable with the syntax id=id. Thus, the URI:

http://example.org/potter#id=hermione

could (assuming an application that recognises id attributes in SVG as naming segments addressable through fragment structures defined in the Media Fragment URI draft) identify the second bar within the bar chart, which has been labelled as hermione.

XML Media Types

The XML Media Types Draft defines (among other things) syntax and processing for fragment identifiers for */*+xml media types. It states (emphasis added):

A family of specifications define fragment identifiers for XML media types. A modular syntax and semantics of fragment identifiers for the XML media types is as defined by the [XPointerFramework] W3C Recommendation. It allows simple names, and more complex constructions based on named schemes. The syntax of a fragment identifier part of any URI or IRI with a retrieved media type governed by the specification MUST conform to the syntax specified in [XPointerFramework]. Conformant applications MUST interpret such fragment identifiers as designating that part of the retrieved representation specified by [XPointerFramework] and whatever other specifications define any XPointer schemes used. Conformant applications MUST support the 'element' scheme as defined in [XPointerElement].

A registry of XPointer schemes [XPtrReg] is maintained at the W3C. Unregistered schemes SHOULD NOT be used.

When an XML-based MIME media type follows the naming convention '+xml', the fragment identifier syntax for this media type MAY restrict the syntax to a specified subset of schemes, but MUST support barenames and 'element' scheme pointers. It MAY further allow other registered schemes such as the xmlns scheme and other schemes.

If [XPointerFramework] and [XPointerElement] are inappropriate for some XML-based media type, it SHOULD NOT follow the naming convention '+xml'.

The XML Media Types draft thus defers the interpretation of fragment identifiers for */*+xml media types to XPointer. XPointer specifies the syntax:

[1]   	Pointer        ::=   	Shorthand | SchemeBased
[2]   	Shorthand      ::=   	NCName
[3]   	SchemeBased    ::=   	PointerPart (S? PointerPart)*
[4]   	PointerPart    ::=   	SchemeName '(' SchemeData ')'
[5]   	SchemeName     ::=   	QName
[6]   	SchemeData     ::=   	EscapedData*
[7]   	EscapedData    ::=   	NormalChar | '^(' | '^)' | '^^' | '(' SchemeData ')'
[8]   	NormalChar     ::=   	UnicodeChar - [()^]
[9]   	UnicodeChar    ::=   	[#x0-#x10FFFF]
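
The grammar above can be read as a two-way choice between a shorthand pointer and a sequence of pointer parts. The following sketch classifies a fragment identifier accordingly. It is an approximation under stated assumptions: parsePointer is a hypothetical name, and the NCName and QName checks are restricted to ASCII, which is narrower than the real productions.

```javascript
// ASCII-only approximations of NCName and QName (assumption: the real
// productions admit many more Unicode characters).
var NCNAME = /^[A-Za-z_][A-Za-z0-9._-]*$/;
var QNAME = /^[A-Za-z_][A-Za-z0-9._:-]*$/;

// Returns a description of the pointer, or null if the fragment does not
// match the XPointer Framework syntax as approximated here.
function parsePointer(fragment) {
  if (NCNAME.test(fragment)) {
    return { kind: "shorthand", name: fragment };
  }
  var parts = [];
  var i = 0;
  while (i < fragment.length) {
    // Skip optional whitespace between pointer parts.
    while (i < fragment.length && /\s/.test(fragment[i])) i++;
    if (i >= fragment.length) break;
    var open = fragment.indexOf("(", i);
    if (open === -1) return null;
    var scheme = fragment.slice(i, open);
    if (!QNAME.test(scheme)) return null;
    // Scan to the matching ")" while honouring ^( ^) and ^^ escapes.
    var depth = 1;
    var j = open + 1;
    while (j < fragment.length && depth > 0) {
      var c = fragment[j];
      if (c === "^") {
        var next = fragment[j + 1];
        if (next !== "(" && next !== ")" && next !== "^") return null;
        j += 2;
        continue;
      }
      if (c === "(") depth++;
      else if (c === ")") depth--;
      j++;
    }
    if (depth !== 0) return null;
    parts.push({ scheme: scheme, data: fragment.slice(open + 1, j - 1) });
    i = j;
  }
  return parts.length > 0 ? { kind: "scheme-based", parts: parts } : null;
}

console.log(parsePointer("hermione"));
console.log(parsePointer("element(/1/1/2)"));
console.log(parsePointer("xywh=25,0,100,225"));
```

Under this sketch a Media Fragment URI fragment such as xywh=25,0,100,225 is neither a shorthand pointer nor a scheme-based pointer, so it is rejected.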
				

For example, because SVG is XML, the second of the line elements in the SVG bar chart can be addressed using:

http://example.org/potter#element(/1/1/2)

This is highlighted in the following XML:

<svg xmlns="http://www.w3.org/2000/svg" width="300" height="225" viewBox="0 0 300 225">
  <g stroke="grey" stroke-width="40">
    <line id="harry" x1="50" x2="50" y1="300" y2="50" />
    <line id="hermione" x1="100" x2="100" y1="300" y2="0" /> 
    <line id="ron" x1="150" x2="150" y1="300" y2="100" />
    <line id="hagrid" x1="200" x2="200" y1="300" y2="50" />
    <line id="dumbledore" x1="250" x2="250" y1="300" y2="150" />
  </g>
</svg>
				

The scheme used within a scheme-based XPointer determines what it identifies; the element() scheme used above, for example, identifies element nodes. The XPointer Framework specification also states:

A shorthand pointer, formerly known as a barename, consists of an NCName alone. It identifies at most one element in the resource's information set; specifically, the first one (if any) in document order that has a matching NCName as an identifier.

This defines the semantics of a simple fragment identifier, such that a URI such as:

http://example.org/potter#hermione

means an element within the XML information set, in this case the second line element node.
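
To make the relationship between the two pointer forms concrete, the following sketch resolves both a shorthand pointer and an element() child sequence against a small stand-in for the example document. The plain-object tree and the names doc, findById and findByChildSequence are assumptions made for illustration; a real processor would work on a parsed XML information set, and the element() scheme also allows forms not handled here (such as a name followed by a child sequence).

```javascript
// Minimal stand-in for the element structure of the example bar chart.
var doc = {
  name: "svg",
  children: [{
    name: "g",
    children: [
      { name: "line", id: "harry", children: [] },
      { name: "line", id: "hermione", children: [] },
      { name: "line", id: "ron", children: [] },
      { name: "line", id: "hagrid", children: [] },
      { name: "line", id: "dumbledore", children: [] }
    ]
  }]
};

// Shorthand pointer: the first element in document order with a matching id.
function findById(node, id) {
  if (node.id === id) return node;
  for (var i = 0; i < node.children.length; i++) {
    var hit = findById(node.children[i], id);
    if (hit) return hit;
  }
  return null;
}

// element() child sequence such as "/1/1/2": the first step selects the
// document element, and each later step selects the n-th child element,
// counting from 1.
function findByChildSequence(root, data) {
  var steps = data.split("/").slice(1).map(Number);
  if (steps.length === 0 || steps[0] !== 1) return null;
  var node = root;
  for (var i = 1; i < steps.length; i++) {
    node = node.children[steps[i] - 1];
    if (!node) return null;
  }
  return node;
}

// Both pointers arrive at the same element of the stand-in tree.
console.log(findById(doc, "hermione") === findByChildSequence(doc, "/1/1/2"));
```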

SVG Fragment Identifiers

SVG itself describes how fragment identifiers can be used to identify views on SVG content. It says:

An SVG fragment identifier can come in two forms:

  • Shorthand bare name form of addressing (e.g., MyDrawing.svg#MyView). This form of addressing, which allows addressing an SVG element by its ID, is compatible with the fragment addressing mechanism for older versions of HTML.
  • SVG view specification (e.g., MyDrawing.svg#svgView(viewBox(0,200,1000,1000))). This form of addressing specifies the desired view of the document (e.g., the region of the document to view, the initial zoom level) completely within the SVG fragment specification. The contents of the SVG view specification are the five parameter specifications, viewBox(...), preserveAspectRatio(...), transform(...), zoomAndPan(...) and viewTarget(...), whose parameters have the same meaning as the corresponding attributes on a ‘view’ element, or, in the case of transform(...), the same meaning as the corresponding attribute has on a ‘g’ element).

SVG's fragment identifiers are conformant with XPointer: they follow the same syntax and are defined in the terms given in XPointer. For example, the URI:

http://example.org/potter#hermione

in this case will address the element with the ID hermione, highlighted here:

<svg xmlns="http://www.w3.org/2000/svg" width="300" height="225" viewBox="0 0 300 225">
  <g stroke="grey" stroke-width="40">
    <line id="harry" x1="50" x2="50" y1="300" y2="50" />
    <line id="hermione" x1="100" x2="100" y1="300" y2="0" /> 
    <line id="ron" x1="150" x2="150" y1="300" y2="100" />
    <line id="hagrid" x1="200" x2="200" y1="300" y2="50" />
    <line id="dumbledore" x1="250" x2="250" y1="300" y2="150" />
  </g>
</svg>
				

When fragment identifiers of this kind are used, the CSS :target pseudo-class enables the identified element to be highlighted. For example, the SVG:

<svg xmlns="http://www.w3.org/2000/svg" width="300" height="225" viewBox="0 0 300 225">
  <style type="text/css">
    line:target { stroke: red; }
  </style>
  <g stroke="grey" stroke-width="40">
    <line id="harry" x1="50" x2="50" y1="300" y2="50" />
    <line id="hermione" x1="100" x2="100" y1="300" y2="0" />
    <line id="ron" x1="150" x2="150" y1="300" y2="100" />
    <line id="hagrid" x1="200" x2="200" y1="300" y2="50" />
    <line id="dumbledore" x1="250" x2="250" y1="300" y2="150" />
  </g>
</svg>
				

means that the URI

http://example.org/potter#hermione

is displayed with the second line (identified as hermione) stroked in red:

SVG introduces an svgView() XPointer scheme that is used to describe views onto SVG images; one possible argument is viewBox(), which selects a particular area of an image in the same way as the xywh parameter defined for Media Fragment URIs described above. Thus the URI:

http://example.org/potter#svgView(viewBox(25,0,100,225))

identifies the area of the chart that covers the first two bars.
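
The overlap between the two fragment structures can be sketched as a single function that reads a rectangle from either form. This is an illustration only: regionFromFragment is a hypothetical name, only the two exact shapes used in this document are recognised, and treating viewBox() user units as equivalent to xywh pixels is an assumption that holds for the example chart because its viewBox matches its width and height.

```javascript
// Read a rectangle from either an xywh= media fragment or an
// svgView(viewBox(...)) fragment. Returns null for anything else.
function regionFromFragment(fragment) {
  var m =
    /^xywh=(\d+),(\d+),(\d+),(\d+)$/.exec(fragment) ||
    /^svgView\(viewBox\((\d+),(\d+),(\d+),(\d+)\)\)$/.exec(fragment);
  if (!m) return null;
  return {
    x: Number(m[1]),
    y: Number(m[2]),
    width: Number(m[3]),
    height: Number(m[4])
  };
}

console.log(regionFromFragment("xywh=25,0,100,225"));
console.log(regionFromFragment("svgView(viewBox(25,0,100,225))"));
```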

Active Content

SVG, like HTML, enables scripts to be embedded within documents and to respond to events such as clicks on particular parts of the content. Active content can read the document location and base the behaviour of the script on the fragment identifier.

For example, the following SVG parses the fragment identifier that's used to access the bar chart and uses it to highlight one of the bars:

<svg xmlns="http://www.w3.org/2000/svg" width="300" height="225" viewBox="0 0 300 225" onload="highlight()">
  <script type="application/ecmascript">
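    function highlight () {
      // The fragment is a 1-based bar number; the element list is 0-based.
      var index = parseInt(document.location.hash.substring(1), 10);
      if (!isNaN(index)) {
        var element = document.getElementsByTagName('line')[index - 1];
        if (element) {
          element.setAttribute('stroke', 'red');
        }
      }
    }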
  </script>
  <g stroke="grey" stroke-width="40">
    <line id="harry" x1="50" x2="50" y1="300" y2="50" />
    <line id="hermione" x1="100" x2="100" y1="300" y2="0" />
    <line id="ron" x1="150" x2="150" y1="300" y2="100" />
    <line id="hagrid" x1="200" x2="200" y1="300" y2="50" />
    <line id="dumbledore" x1="250" x2="250" y1="300" y2="150" />
  </g>
</svg>
				

The URI:

http://example.org/potter#2

thus highlights the second bar within the bar chart.

In this case, the recognised syntax of the fragment identifier is determined by the script, which recognises any numeric fragment identifier between one and five. The fragment identifier has no declarative semantics -- there is no specification that says what it means -- but in effect this script supports the identification of a bar of the bar chart through a fragment identifier.

Semantic Content

SVG allows extensions; any element in a different namespace will be ignored by SVG processors. This facility can be used to embed semantic content through RDF/XML.

The following example contains some RDF/XML which makes some basic assertions about the resource

http://example.org/potter#hermione

This resource has been identified with a fragment identifier within an SVG document like this:

<svg xmlns="http://www.w3.org/2000/svg" width="300" height="225" viewBox="0 0 300 225">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/">
    <foaf:Person rdf:about="#hermione">
      <foaf:name>Hermione Granger</foaf:name>
    </foaf:Person>
  </rdf:RDF>
  <g stroke="grey" stroke-width="40">
    <line x1="50" x2="50" y1="300" y2="50" />
    <line x1="100" x2="100" y1="300" y2="0" />
    <line x1="150" x2="150" y1="300" y2="100" />
    <line x1="200" x2="200" y1="300" y2="50" />
    <line x1="250" x2="250" y1="300" y2="150" />
  </g>
</svg>
				

In semantic content, fragment identifiers can mean anything. In this particular example, we can tell from the RDF that the above URI means the person named Hermione Granger. It is common practice when using RDF for URIs that include fragment identifiers to be used to refer to things that are described by the document retrieved at the base URI, as this makes it easy to serve RDF content.
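
The relationship between the relative reference in rdf:about and the full URI can be sketched with the standard URL API. The variable names are illustrative; the base URI is the one used for the example chart throughout this document.

```javascript
// Resolve the relative reference used in rdf:about against the document URI.
var absolute = new URL("#hermione", "http://example.org/potter").href;
console.log(absolute); // http://example.org/potter#hermione

// The document to retrieve is the same URI with the fragment removed,
// since the fragment is not sent to the server.
var retrieval = new URL(absolute);
retrieval.hash = "";
var documentUri = retrieval.href;
console.log(documentUri); // http://example.org/potter
```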

Problems and Recommendations

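As discussed above, fragment identifiers used on SVG documents can be interpreted in multiple ways: under the Media Fragment URIs specification, under XPointer (through the XML Media Types draft), under SVG's own definition of fragment identifiers, by active content, and by embedded semantic content.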

Just like URIs in general, fragment identifiers have two roles:

  1. They are used to name a secondary resource associated with the primary resource.
  2. They are processed, resulting in particular behaviour on the part of a client application.

Using different fragment identifiers to name different things is important in semantic web applications, as it enables statements to be made about those things. For these applications, it's important that different names are used for different things (and the same name for the same thing), so that statements can be interpreted correctly.

Other applications are more concerned about being able to process a fragment identifier as intended. In the SVG example above this might mean clipping an image to a particular area, highlighting a portion of that image or selecting an element from the SVG for further processing. The processing of a given URI might be built into the client application, configurable for a given file (as in the :target CSS pseudo-class) or might be entirely controlled through a script within the file.

Problems with fragment identifiers being dependent on media type thus fall into two general categories: problems arising at the semantic level, when the same fragment identifier means different things according to different specifications; and at the processing level, when different processors process the same fragment identifier in different ways.

A second consideration is the interaction between fragment identifiers and content negotiation. By their very nature, different fragment identifiers have a different potential for being used across representations. For example:

Multiple Fragment Identifiers

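The previous section described various ways in which fragment identifiers might be used to refer to different secondary resources. In several cases there are multiple methods of referring to the same secondary resource. For example, #hermione and #element(/1/1/2) both address the second line element, while #xywh=25,0,100,225 and #svgView(viewBox(25,0,100,225)) both address the area covering the first two bars.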

Having multiple fragment identifiers that refer to the same secondary resource is a problem in two ways. First, at the semantic level, it means that statements made about one of the fragment identifier variants do not carry over as they should to the others. For example, the RDF statements about #hermione:

<#hermione> a eg:XMLelement ;
  eg:nodeName "line" ;
  .
				

should also be true of #element(/1/1/2). If someone wishes to make statements about these resources, they have to make a decision about which variant to use.

Canonical Fragment Identifiers

Fragment structures which provide multiple ways of addressing the same secondary resource should indicate which fragment identifier is canonical and should be used for making statements about that secondary resource.

Second, at the processing level, different applications will only recognise particular fragment structures. For example, a browser might support the Media Fragment URIs specification and thus support the xywh= fragment identifier, whereas an XML pipeline processor might have built-in knowledge of SVG and the XPointer svgView() scheme. When a third party wants to provide a link to an area of an SVG image, they have to make a choice about which of these fragment structures to use:

http://example.org/potter#xywh=25,0,100,225

or

http://example.org/potter#svgView(viewBox(25,0,100,225))

This choice may be based on:

Whichever is chosen, the resulting URI is only usable by a subset of applications despite there being a perfectly good alternative that is supported by a different subset. Thus these URIs are not universal.

Generic Fragment Structures

Fragment structures should be defined at levels that anticipate content negotiation. For example, the semantics of the svgView() fragment identifiers could be meaningfully applied to all image formats. Were a similar scheme developed in future, it should be defined for all images rather than a particular image format.

A radical move would be to provide a method of giving fallback fragment identifiers within a URI. XPointer does this already (a pointer can contain several pointer parts in sequence, and if the first does not identify anything, the next is tried instead), but its approach is not general enough to cover other fragment structures.

Inconsistent Semantics and Processing

The validity and meaning of a given fragment identifier can be different under each of the interpretations described previously. For example:

However, the impact of these inconsistencies is largely theoretical from the standpoint of a single application. For semantic web applications, it's the statements about a URI (including a URI with a fragment identifier) that provide information about its meaning, and not a description within a media type definition. Other applications recognise the media types and fragment identifiers that they have been programmed to recognise: an application that understands Media Fragment URIs is likely to simply ignore the contradictory definition of how SVG fragment identifiers should be interpreted as given in the image/svg+xml media type definition.

Fragment Identifiers in Media Type Definition

Media type definitions should avoid 'must' language when describing supported fragment identifiers as in practice it is likely to be ignored. Instead, they should provide pointers to any known fragment structures that might be applied to that media type and give warnings of any contradictions between them.

Generous Fragment Structure Definitions

Fragment structures should be defined in ways that enable other processors to ignore fragment identifiers that use different fragment structures. Invalid fragment identifiers should result in a warning rather than an error.
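
One possible reading of this recommendation is sketched below: a processor tries each fragment structure it knows and, if none matches, reports a warning rather than failing. The registry, the structure names and the interpret function are all hypothetical, and the patterns are deliberately loose approximations of the fragment structures described earlier.

```javascript
// Hypothetical registry of fragment structures a processor understands.
// Each entry only decides whether a fragment looks like its own syntax.
var structures = [
  { name: "media-fragment", test: function (f) { return /^(xywh|t|track|id)=/.test(f); } },
  { name: "svg-view", test: function (f) { return /^svgView\(.*\)$/.test(f); } },
  { name: "barename", test: function (f) { return /^[A-Za-z_][A-Za-z0-9._-]*$/.test(f); } }
];

// Returns the first matching structure, or a warning if none matches.
// Nothing is thrown: an unrecognised fragment is simply left uninterpreted.
function interpret(fragment) {
  for (var i = 0; i < structures.length; i++) {
    if (structures[i].test(fragment)) {
      return { structure: structures[i].name, warning: null };
    }
  }
  return {
    structure: null,
    warning: "unrecognised fragment identifier: " + fragment
  };
}

console.log(interpret("xywh=25,0,100,225"));
console.log(interpret("hermione"));
console.log(interpret("2"));
```

In this sketch the script-defined fragment 2 from the active content example is not claimed by any declarative structure, so the processor warns and leaves it for the script to handle.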

Acknowledgements

Many thanks to Robin Berjon for ReSpec.js.