Best Practices for Fragment Identifiers and Media Type Definitions

According to the relevant specifications, fragment identifiers (fragids) are interpreted based on the media type of a representation. Media type definitions therefore have to provide details about how fragids are interpreted for that media type. This document recommends best practices for the authors of media type definitions, for the authors of structured syntax suffix registrations (such as +xml), for the authors of specifications that define fragid structures, and for authors who publish documents that are intended to be used with fragids or who refer to fragments within documents using URIs with fragids.

Media type registrations should ensure that fragids matching syntax "inherited" from top-level types such as image/* and from +suffix registrations such as +xml are always interpreted in the same way as specified for that generic processing. If the possible syntaxes of "inherited" fragid structures overlap and may provide inconsistent meanings or processing for the same fragid, media type registrations should not adopt the +suffix. Where media type registrations reuse fragid structures that overlap, media type registrations should specify which take priority in resolving a given fragid. Media type registrations should also reserve the use of plain name fragids for local identifiers within content, and specify any restrictions on the interpretation of fragids by scripts. They should avoid defining new fragid structures within the registration document itself, and should avoid constraining how applications handle fragids that do not resolve.

Structured syntax suffix registrations are based on a metaformat which usually will have its own media type registration. The +suffix registration should define the same fragid rules as are used in that media type registration. Further, it should specify that any fragids that do not resolve according to these rules should be handled in the way specified by the specific media type adopting that +suffix.

The designers of fragid structures (such as XPointer) should avoid syntactic overlaps with existing fragid structures and ensure that fragids can be used across formats with similar semantics.

Publishers should ensure that any addressable structures within documents that are served through content negotiation are consistent across content-negotiated variants. They should also ensure that scripts handle fragids consistently with the fragid rules for the relevant media type. Authors referring to URIs with fragids should avoid using fragids that are specific to a particular document format (such as XPointer, which is specific to XML) unless they can ascertain that the base URI only serves one representation.

Introduction

Fragment identifiers (fragids) within URIs are used in three main ways:

to jump to, highlight, or zoom in to a particular piece of content when displaying a larger document
to identify a piece of content for extraction, for example for embedding within another document
to provide an identifier for either a piece of content or something described within a document that can be used as the basis of annotation

When URIs contain fragids, they are interpreted based on the media type of the representation that is retrieved when the URI is requested. The Generic Syntax for URIs [[!URI]] states:

The semantics of a fragment identifier are defined by the set of representations that might result from a retrieval action on the primary resource. The fragment's format and resolution is therefore dependent on the media type [[RFC2046]] of a potentially retrieved representation, even though such a retrieval is only performed if the URI is dereferenced. If no such representation exists, then the semantics of the fragment are considered unknown and are effectively unconstrained. Fragment identifier semantics are independent of the URI scheme and thus cannot be redefined by scheme specifications.

Individual media types may define their own restrictions on or structures within the fragment identifier syntax for specifying different types of subsets, views, or external references that are identifiable as secondary resources by that media type. If the primary resource has multiple representations, as is often the case for resources whose representation is selected based on attributes of the retrieval request (a.k.a., content negotiation), then whatever is identified by the fragment should be consistent across all of those representations. Each representation should either define the fragment so that it corresponds to the same secondary resource, regardless of how it is represented, or should leave the fragment undefined (i.e., not found).

Media Type Specifications and Registration Procedures includes a "Fragment identifier considerations" section within the template for registering media types and says:

Media type registrations can specify how applications should interpret fragment identifiers (specified in section 3.5 of [[RFC3986]]) associated with the media type.

Media types are encouraged to adopt fragment identifier schemes that are used with semantically similar media types. In particular, media types that use a named structured syntax with a registered "+suffix" MUST follow whatever fragment identifier rules are given in the structured syntax suffix registration.

Problems arise when a media type wishes to adopt several fragid structures because of its similarity with other media types and/or its use of a metaformat. For example, as well as defining its own method of interpreting fragids, SVG [[SVG11]] has the media type image/svg+xml and therefore must follow the rules for fragids that are common to all XML documents (XML Media Types Draft). As an image format, it should also use the common fragid structure for images (Media Fragments URI 1.0 [[MEDIA-FRAGMENTS]]). If RDF is embedded within the SVG through RDF/XML or RDFa, fragids might additionally have RDF semantics and be used to refer to real-world things pictured within the SVG. Finally, if fragids are interpreted by scripts embedded within the SVG, they may have yet another purpose: to encode local application state. This is described in detail in section B. Analysis.

SVG is only one example of a media type in which conflicts between different uses of fragids occur. XHTML, which also uses the +xml suffix, contains scripts that may interpret fragids and may be used to carry data interpreted according to RDF semantics.

This document recommends some Best Practices for those registering media types, those registering structured syntax suffixes, the authors of fragid structures and individual document authors. Other issues with using fragment identifiers, such as Unicode normalisation, internationalisation and the use of fragids with JavaScript APIs, are not discussed in this document.

4. Best Practices for Media Type Registrations

Individual media type registrations define how fragids should be interpreted when found in documents of that media type. These registrations must balance the following goals:

enable consistent processing of fragids by applications that are aware of the specific media type and by generic processors for types that share the same metaformat (if applicable)
facilitate content negotiation between documents of the media type and other media types that might be used for representations of the same resource
if the media type supports scripting, enable publishers to use fragids to encode application state where appropriate

Generic processors may process documents of a particular media type without knowing about the specific rules that apply to that media type as specified in its registration. For example, a browser might always attempt to display any text/* document as text, or any application/*+xml document using XML syntax highlighting. The interpretation of fragids by media-type-aware applications should match the behaviour of these generic processors, so that the same fragment is identified whether or not the application has built-in knowledge of the media type.

As specified in Media Type Specifications and Registration Procedures [RFC6838], media type registrations that adopt a registered +suffix, such as +xml or +json, must follow whatever fragid rules are specified in the +suffix registration. This ensures that there is consistency in processing between generic processors that understand the metaformat and those that are aware of the specific media type that uses the +suffix. Similar considerations also apply, however, to other fragid structures that may be used by generic processors, for example those that perform generic processing based on the top-level type (text, image and so on).

For example, the media type registration for application/rdf+xml must include fragid rules that adhere to those specified in the +suffix registration for +xml. If an application/rdf+xml document contained an element with a @xml:id attribute with the value me then the fragid #me would be interpreted as referring to that element by generic XML processors. It would be inconsistent for other applications to interpret the #me fragid to refer to a person, and the media type registration for application/rdf+xml should not allow such an interpretation.
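The generic behaviour described in this example can be sketched as follows. This is only an illustration of how a generic XML processor might resolve a plain name fragid against xml:id attributes; the function name resolve_plain_name and the sample document are hypothetical, and real processors also take DTD-declared ID attributes into account.

```python
import xml.etree.ElementTree as ET

# Qualified name that ElementTree uses for the xml:id attribute.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

def resolve_plain_name(xml_text, fragid):
    """Return the element whose xml:id equals fragid, or None."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        if element.get(XML_ID) == fragid:
            return element
    return None

# Hypothetical RDF/XML document in which an element carries xml:id="me".
DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description xml:id="me" rdf:about="http://example.org/people/alice"/>
</rdf:RDF>"""

target = resolve_plain_name(DOC, "me")
print(target.tag if target is not None else "no such fragment")
```

Under this sketch, #me identifies the element itself, which is the interpretation that a media-type-aware application would need to share.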

Another source of constraints on fragid structures supported by a media type is support for content negotiation. When multiple representations with different media types are served for the same URL, fragids should be used consistently across those documents, either identifying content with the same semantic content in each representation, or giving an error.

For example, it might be anticipated that documents in ABC Music Notation with a media type of text/vnd.abc would frequently be served up through content negotiation alongside documents in MusicXML with a media type of application/vnd.recordare.musicxml+xml. People referring to music may want to reference particular bars within the musical score using fragids. To enable this to happen, both media types would need to support the same fragid structure, so that the same bar could be identified regardless of which content negotiated representation was served up. In this example, the two formats do not share the same metaformat and are not within the same top-level type: the need for consistency arises because the two formats have the same semantic content.

With multiple potential fragid structures to comply with, there's the potential for the syntaxes of those structures to overlap with each other, which means that any given fragid might:

identify the same part of the document under both interpretations
only identify a part of the document under one interpretation (the fragid being an error under the alternative interpretation)
identify different parts of the document under the different interpretations

The last of these possibilities is problematic, and it can be hard for someone writing a URI reference to know which of these three categories a given fragid falls into. In addition, if the base document is changed after a given URI reference is created, a fragid might switch category, and suddenly become problematic without the creator of the URI reference being aware of the error.
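The three categories above can be made concrete with a small sketch. It assumes each fragid structure is modelled as a resolver function that returns a description of the identified fragment, or None when the fragid is an error under that reading; the names classify_fragid, GENERIC and SPECIFIC, and the lookup tables themselves, are hypothetical.

```python
def classify_fragid(fragid, resolve_a, resolve_b):
    """Classify a fragid by comparing two interpretations of it.

    Each resolver returns an opaque description of the fragment it
    identifies, or None when the fragid is an error under that reading.
    """
    a, b = resolve_a(fragid), resolve_b(fragid)
    if a is None and b is None:
        return "unresolved"
    if a is None or b is None:
        return "single interpretation"
    return "same fragment" if a == b else "conflict"

# Hypothetical lookup tables standing in for two fragid structures.
GENERIC = {"intro": "section[1]", "fig1": "svg[1]"}
SPECIFIC = {"intro": "section[1]", "fig1": "figure[2]", "bar3": "measure[3]"}

for fragid in ("intro", "bar3", "fig1"):
    print(fragid, "->", classify_fragid(fragid, GENERIC.get, SPECIFIC.get))
```

A check of this kind only reports the category at the time it is run; as noted above, a later change to the document can move a fragid into the conflicting category.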

For these reasons, it is best if media types avoid syntactic conflicts between fragid structures. When syntactic overlaps occur due to a requirement to support different types of generic processing (i.e. to support generic processing based on the top-level type and generic processing based on a +suffix), the media type registrant should ensure that every fragid of the overlapping syntax identifies the same fragment under each; if that is not possible, the media type should not use the +suffix.

Best Practice 1: Media type registrations should ensure consistent generic processing of ambiguous fragids

If two or more fragid structures used by different generic processors applicable to the media type use the same or overlapping syntax, any fragids that syntactically match both SHOULD have the same semantic meaning and be processed in the same way by those generic processors.

There may also be syntactic overlaps between fragid structures that address application-specific fragments (as in the music notation example above), and between application-specific fragid structures and generic fragid structures. Applications that are aware of the application-specific fragid structures will know about and can therefore follow guidance within the media type registration about how to interpret fragids, so fragids that follow the syntax of more than one application-specific fragid structure can be resolved in a predictable and consistent manner across applications. The media type registration simply has to specify how this happens.

Best Practice 2: Media type registrations should specify how to resolve ambiguous fragids

If there's the possibility for a fragid to comply with the syntax of multiple fragid structures used by the media type, and the fragid would identify different fragments in those cases, the registration SHOULD specify which interpretation should be used by applications that understand the media type.

Plain names are a common type of fragid structure. A plain name fragid is a fragid that is used to identify a named structure within a document, such as one identified by an @id attribute in HTML, a @xml:id attribute in XML or the name of a function within a Python program. These fragids are opaque to processors and as such they do not normally include punctuation characters, though this depends on the language: in XML, for example, they usually match the NCName production from XML Namespaces [[XML-NAMES11]], which means they can contain hyphens (-) and periods (.).

Plain name fragids are usually created by human authors but may also be generated by applications. They provide a good method of identifying content that is equivalent across content-negotiated variants of a document, for example paragraphs of text in French and Chinese that contain the same semantic content. Plain name fragids that do not identify a portion of a document are frequently used in Semantic Web applications as a way of providing an identifier for something described by the document.

Best Practice 3: Media type registrations should reserve plain name fragids

If the media type includes structures that can be given local names or identifiers, plain name fragids SHOULD be reserved for addressing those structures.

Some media types support active content, whereby scripts provided by the publisher are used to manipulate the document while it is being viewed. Depending on the scripting support in the media type, such scripts may use the fragid to encode application state (see Identifying Application State for details). The presence of a script does not change what fragment a given fragid identifies, but individual scripts may extend the space of meaningful fragids for a particular document, by virtue of interpreting those fragids in code.

As described in section 7. Best Practices for Document Authors, the developers of scripts need to ensure that when a fragid's behaviour is defined by the media type of a document, the script handles it consistently with that definition. Media type registrations therefore need to make it easy for such developers to understand how fragids will be interpreted by other applications, and what syntax can be used by script developers to encode application state. For example, in HTML, hash-bang URIs, in which the fragid starts with #!, are commonly reserved for interpretation by scripts.

Best Practice 4: Media type registrations should define active content processing of fragids

If the media type supports active content (scripts), the registration SHOULD specify any constraints on how scripts may process fragids adhering to known fragid structures. The registration MAY define a reserved syntax for fragids that are intended to be interpreted by scripts.
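As an illustration of a reserved syntax, the sketch below routes a URI either to script handling or to media-type processing, using the hash-bang convention mentioned above. The function name fragid_handler and the example URIs are hypothetical; the rule that a leading "!" marks script-reserved fragids is the HTML convention described in the text, not something a particular registration is known to mandate.

```python
from urllib.parse import urldefrag

def fragid_handler(uri):
    """Decide who should interpret the fragid of a URI.

    Fragids beginning with "!" (written "#!" in the URI) are treated as
    reserved for scripts; everything else is left to the rules of the
    media type.
    """
    _, fragid = urldefrag(uri)
    if not fragid:
        return "none"
    if fragid.startswith("!"):
        return "script"
    return "media-type"

print(fragid_handler("http://example.org/app#!/inbox/42"))
print(fragid_handler("http://example.org/doc#section-2"))
```

Keeping the two spaces syntactically disjoint means a script never has to guess whether a fragid such as #section-2 was meant for it.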

Aside from specifying support for plain name fragids and any fragid syntax reserved for use by scripts, individual media type registrations should not contain the specifications for media-type-specific fragid structures. Instead, registrants should consider creating a separate specification for the fragid structure, following the guidelines in section 6. Best Practices for Fragid Structures, and referencing that specification from the media type registration. This ensures that other media types with similar content can easily reference and reuse the same fragid structure.

Best Practice 5: Media type registrations should avoid embedding fragid structure specifications

Media type registrations SHOULD reference external fragid structure specifications where they exist, and the registrant SHOULD create such specifications if required, rather than embedding the definitions of fragid structures within media type registrations.

It is possible for a given fragid used with a document to be an error in two ways:

the fragid might not match the syntax of any of the fragid structures used by the media type
the fragid might match the syntax but not resolve to a fragment of the document (for example because there is no named structure with a given plain name identifier)

There are several legitimate reasons for fragids to error in these ways. Fragids, particularly plain name fragids, are sometimes used within URIs to identify things described by the document, rather than a fragment within the document. Active content may interpret otherwise unrecognised fragids. A given document may have a content-negotiated variant for which the fragid is meaningful. Thus the purpose of a media type registration is to define how recognised fragids are to be resolved, not to constrain the syntax of fragids used for a given document. The behaviour of an application faced with a fragid that does not resolve to a fragment for whatever reason should be implementation defined.

5. Best Practices for +Suffix Registrations

+Suffix registrations are designed to enable generic processing of media types that share a metaformat such as XML (+xml) and JSON (+json), that is, processing that does not rely on knowledge of the details of a specific media type, such as identifying elements within XML using XPointer. +Suffix registrations should describe the generic processing of fragids within documents that use the metaformat.

The processing of fragids for media types that adopt a +suffix and the media type for the metaformat itself should be identical. For example, fragids for +xml media types should be processed in the same way as fragids for the application/xml media type and fragids for +json should be processed in the same way as fragids for application/json. This ensures that generic processors designed for the generic media type can be used with the media types that adopt the +suffix.

Best Practice 6: +Suffix registrations should process fragids in the same way as for the associated media type

+Suffix registrations SHOULD define fragid processing rules that are consistent with their associated media type.

As described in section 4. Best Practices for Media Type Registrations, individual media types are required by Media Type Specifications and Registration Procedures [RFC6838] to follow the fragid rules given in the registrations for any +suffixes that they adopt. They may need to adopt several fragid structures to support other generic processing or for consistency with other types with which they share semantics. For example, image/svg+xml needs to follow the generic fragid processing specified by the +xml registration as well as the generic processing of fragids used to identify portions of images.

Fragid rules in +suffix registrations should therefore be focused on the generic processing of fragids for the metaformat. They should not specify the behaviour of fragids that fall outside those generic fragid structures, because if they did it would be hard for media types to adopt fragid structures aside from those specified by the +suffix registration.

Best Practice 7: +Suffix registrations should enable additional fragids to be processed according to media type

+Suffix registrations SHOULD NOT classify as errors fragids that do not match the defined fragid syntax for the +suffix or that do not resolve to a fragment, or constrain what they identify; instead the +suffix registration SHOULD say that such fragids are resolved according to rules in the registration of the specific media type that adopts the +suffix.

As described in section 4. Best Practices for Media Type Registrations, plain name fragids are commonly used within media types to address named structures within a document. The ways in which these structures are named may be specified at the metaformat level, or at the specific media type level, or both. For example, XML itself defines a mechanism for naming elements within a document (using xml:id attributes [[XML-ID]] and the ID attribute type), but an XML-based markup language such as RDF/XML may specify an alternative semantics for plain name fragids, such as their RDF semantics.

Following the two best practices above ensures that plain name fragids which identify fragments through the generic processing of the metaformat have a consistent semantics based on that processing, while the semantics of those that do not identify a fragment according to that generic processing can be determined by the individual media type.
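The layering described here can be sketched as a two-stage lookup. The sketch assumes the generic stage reports an unmatched fragid by returning None rather than raising an error, so that the media-type-specific stage can still assign a meaning; all names and the sample tables are hypothetical.

```python
def resolve_generic(named_elements, fragid):
    """Generic metaformat processing: None means "no fragment", not an error."""
    return named_elements.get(fragid)

def resolve_for_media_type(named_elements, described_things, fragid):
    """Apply the generic rules first, then the media type's own rules."""
    element = resolve_generic(named_elements, fragid)
    if element is not None:
        # The generic interpretation always wins when it resolves.
        return ("fragment", element)
    if fragid in described_things:
        # Fallback defined by the specific media type, e.g. RDF semantics.
        return ("described thing", described_things[fragid])
    return ("unresolved", None)

# Hypothetical data for an RDF/XML-like document.
NAMED_ELEMENTS = {"me": "element with xml:id='me'"}
DESCRIBED_THINGS = {"alice": "the person described by the document"}

for fragid in ("me", "alice", "bob"):
    print(fragid, "->", resolve_for_media_type(NAMED_ELEMENTS, DESCRIBED_THINGS, fragid))
```

What an application does with the final "unresolved" case is left open here, in line with the recommendation that such behaviour be implementation defined.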

7. Best Practices for Document Authors

There are two categories of usage of fragids by document authors: publishing documents that contain addressable content or use scripts that process fragids, and using fragids within referenced URIs to address content on documents published by others.

7.1 Best Practices for Publishers

Fragids are not passed to servers for processing, but publishers can influence the interpretation of fragids within documents they publish, typically by naming structures within them so that they can be addressed through plain name fragids. As described in [[WEBARCH]], when publishers do this with content-negotiated resources, they should make sure that each plain name fragid identifies a set of fragments with consistent semantics across the content-negotiated representations.

Best Practice 10: Publishers should name structures consistently across content-negotiated representations

Publishers SHOULD ensure that structures identifiable with the same fragid in two content-negotiated representations have the same semantics. Equally, where two structures in content-negotiated representations have the same semantics, they SHOULD be addressable through the same fragid.

Note that this best practice does not imply that every structure that is addressable within one content-negotiated representation must have an equivalent structure addressable by the same fragid in all other content-negotiated representations. It is likely that some fragids will have meaning only in one of the content-negotiated representations, for example because they are interpreted by a script within an HTML representation but not in any others.
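A publisher could get a first indication of naming consistency by comparing the plain names available in two variants. The sketch below assumes one HTML variant (names from id attributes) and one XML variant (names from xml:id attributes); the helper names and sample documents are hypothetical. It can only compare names: whether identically named structures actually share semantics still needs human review.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

class IdCollector(HTMLParser):
    """Collect the values of id attributes seen in an HTML document."""
    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.ids.add(value)

def html_ids(text):
    collector = IdCollector()
    collector.feed(text)
    collector.close()
    return collector.ids

def xml_ids(text):
    return {el.get(XML_ID) for el in ET.fromstring(text).iter() if el.get(XML_ID)}

def compare_variants(html_text, xml_text):
    h, x = html_ids(html_text), xml_ids(xml_text)
    return {"shared": h & x, "html only": h - x, "xml only": x - h}

# Hypothetical content-negotiated variants of one resource.
HTML_VARIANT = '<html><body><h1 id="intro">Intro</h1><p id="p1">Text</p><div id="menu"></div></body></html>'
XML_VARIANT = '<doc><section xml:id="intro"/><para xml:id="p1"/><para xml:id="p2"/></doc>'

print(compare_variants(HTML_VARIANT, XML_VARIANT))
```

Names that appear in only one variant are not necessarily a problem, as the note above explains; they are simply the ones worth a second look.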

Publishers are also responsible for any scripts called from documents that they publish. Scripts can enhance the display of fragments within documents, for example by smoothly scrolling to the relevant area of the document, highlighting it, or zooming in or out to the selected area. Scripts may also process and alter fragids as a way of managing application state, as described in Identifying Application State. Scripts can also be used to map a fragid on a document that contains an embedded resource (such as an image or video) into a fragid that applies to that embedded resource. Scripts can use the fragid of the location navigated to within an iframe to alter the display of the embedding document.

Scripts do not change what a fragid identifies, but they can change what users see. For users viewing a document, it is helpful if scripts handle fragids in a way that is consistent with how they are resolved based on the media type. For example, given an HTML document, it would be confusing if a fragid #example were interpreted by a script as meaning that all instances of the word example should be highlighted, rather than scrolling to the HTML anchor named example. As noted in the best practices for media type registrations, those registrations may specify constraints on how fragids are handled by scripts, and may specify a reserved syntax for scripts that wish to use fragids to store application state.

Best Practice 11: Publishers should ensure scripts adhere to the constraints specified within media type registrations

Scripts SHOULD adhere to any constraints that are placed on their behaviour in the appropriate media type registration.

7.2 Best Practices for Referrers

Different fragids are understood by different processors and have different longevity and utility across content-negotiated representations of a resource:

plain name fragids are typically author-generated, and therefore are relevant for so long as the structure that is named can be found within the document; referencing them can be useful when dealing with content-negotiated resources as publishers should ensure that they are consistently used across representations, but they generally require interpretation by processors that are aware of the particular media type of the representation
referencing semantic fragid structures such as Media Fragment URIs is particularly useful when addressing content-negotiated resources or resources that do not support plain name fragids; they are also likely to be more robust than syntax-based fragids in the face of changes, but like plain name fragids usually require interpretation by processors that are aware of the specific media type of the representation
syntax-based fragid structures such as XPointer are useful when targeting a representation that uses a known format, particularly as they can be used by generic processors without knowledge of the underlying media type; they can be fragile in the face of changes to the underlying document, however, and do not generally work across content-negotiated variants (and when they do, they are likely to identify different fragments)

In general, plain name and semantic fragids are much more useful than syntax-based fragids, the exception being when specifically targeting fragments of a document in a particular format for processing.

Best Practice 12: URI references should not use syntax-based fragids with content-negotiated documents

Authors SHOULD NOT use URIs with syntax-based fragids unless they can ascertain (through server documentation, an HTTP OPTIONS request or other methods) that the base URI addresses a resource with a single representation format.

A. Acknowledgements

Many thanks to Robin Berjon, Tim Berners-Lee, Marcos Caceres, Richard Cyganiak, Sebastian Hellmann, Yehuda Katz, Yves Lafon, Chris Lilley, Peter Linss, Ashok Malhotra, Larry Masinter, Noah Mendelsohn, Jonathan Rees, Alex Russell and Henry Thompson for their comments, and to Robin Berjon for ReSpec.js.

Existing Fragment Identifier Structures

This appendix details existing fragment identifier structures at time of writing. These are listed here with the aims of:

helping authors to avoid using existing fragid syntax within their scripts in incompatible ways (particularly where there is the possibility of embedded resources or content negotiation)
helping creators of fragid structures to avoid overlaps with existing fragid structures
identifying common syntactic patterns within fragid structures

Media Fragment URIs

The Media Fragment URIs specification [[MEDIA-FRAGMENTS]] defines a fragid structure for images, videos and audio. It covers identification of spatial areas, time segments, tracks or named segments.

Note that these kinds of resources are likely to be embedded within HTML pages. Authors of HTML pages that embed images, video or audio and that want to support addressing areas within images, times within video and so on should use this fragment identifier syntax for the HTML page.

The syntax for fragids understood by applications that adhere to the Media Fragment URIs specification is:

namevalues = namevalue *( "&" namevalue )
namevalue  = name [ "=" value ]
name       = fragment - "&" - "="
value      = fragment - "&"

The names within this syntax whose interpretation is defined within the specification are:

id is used to address named structures within the media
track is used to denote one or more tracks in the media
t is used to identify times within the media
xywh is used to identify a rectangular area within the media

The legal values for each of these are dependent on the name.
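The name/value syntax above can be split apart with a few lines of code. This is a minimal sketch: the function name parse_media_fragment is hypothetical, and the sketch separates the four names listed above from any others without validating the values, whose legal forms depend on the name.

```python
from urllib.parse import unquote

# Names whose interpretation is defined by the specification, as listed above.
DEFINED_NAMES = {"id", "track", "t", "xywh"}

def parse_media_fragment(fragid):
    """Split a fragid into (name, value) pairs, grouped by whether the
    name is one of the defined dimensions."""
    defined, other = [], []
    for part in fragid.split("&"):
        if not part:
            continue
        name, _, value = part.partition("=")
        pair = (unquote(name), unquote(value))
        (defined if pair[0] in DEFINED_NAMES else other).append(pair)
    return defined, other

print(parse_media_fragment("t=10,20&xywh=160,120,320,240&foo=bar"))
```

Pairs with unrecognised names are kept rather than rejected, so that a script or another fragid structure can still give them a meaning.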

XPointer

XPointer [[XPTR-FRAMEWORK]] defines a fragid structure that is used to address points, elements and ranges within XML. It has a generic syntax described by the EBNF:

[1]   	Pointer        ::=   	Shorthand | SchemeBased
[2]   	Shorthand      ::=   	NCName
[3]   	SchemeBased    ::=   	PointerPart (S? PointerPart)*
[4]   	PointerPart    ::=   	SchemeName '(' SchemeData ')'
[5]   	SchemeName     ::=   	QName
[6]   	SchemeData     ::=   	EscapedData*
[7]   	EscapedData    ::=   	NormalChar | '^(' | '^)' | '^^' | '(' SchemeData ')'
[8]   	NormalChar     ::=   	UnicodeChar - [()^]
[9]   	UnicodeChar    ::=   	[#x0-#x10FFFF]

The two XPointer Schemes that are part of the core of XPointer are:

element() is used to address elements through XML identifiers and child element counts
xmlns() is used to bind prefixes to XML namespaces

These and other XPointer schemes are registered within the XPointer Scheme Registry.
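The distinction between shorthand and scheme-based pointers in the grammar above can be sketched with a small recogniser. It is deliberately simplified: NCName is approximated with ASCII characters only, the input is assumed to be already percent-decoded, and the function name parse_pointer is hypothetical. It does not interpret scheme data.

```python
import re

NCNAME = r"[A-Za-z_][A-Za-z0-9._-]*"  # ASCII-only approximation of NCName
SHORTHAND = re.compile(NCNAME)
SCHEME_NAME = re.compile(rf"(?:{NCNAME}:)?{NCNAME}")  # approximation of QName

def parse_pointer(pointer):
    """Return ("shorthand", name), ("scheme-based", [(scheme, data), ...]),
    or None if the pointer matches neither form."""
    if SHORTHAND.fullmatch(pointer):
        return ("shorthand", pointer)
    parts, pos = [], 0
    while pos < len(pointer):
        while pos < len(pointer) and pointer[pos] in " \t\r\n":
            pos += 1  # optional whitespace between pointer parts
        if pos == len(pointer):
            break
        m = SCHEME_NAME.match(pointer, pos)
        if not m or m.end() >= len(pointer) or pointer[m.end()] != "(":
            return None
        pos = m.end() + 1
        depth, start = 1, pos
        while pos < len(pointer) and depth:
            ch = pointer[pos]
            if ch == "^":  # circumflex escapes: ^(, ^) and ^^
                if pointer[pos + 1:pos + 2] not in ("(", ")", "^"):
                    return None
                pos += 2
                continue
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            pos += 1
        if depth:
            return None  # unbalanced parentheses
        parts.append((m.group(0), pointer[start:pos - 1]))
    return ("scheme-based", parts) if parts else None

print(parse_pointer("intro"))
print(parse_pointer("xmlns(x=http://example.org/ns) xpointer(//x:a)"))
```

A recogniser like this is also a quick way to see which fragids from other structures, such as xywh=1,2,3,4, fall outside XPointer syntax altogether.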

JSON Pointer

RFC6901: JavaScript Object Notation (JSON) Pointer defines a fragid structure for JSON documents, with URL-escaped versions of the syntax:

json-pointer    = *( "/" reference-token )
reference-token = *( unescaped / escaped )
unescaped       = %x00-2E / %x30-7D / %x7F-10FFFF
   ; %x2F ('/') and %x7E ('~') are excluded from 'unescaped'
escaped         = "~" ( "0" / "1" )
  ; representing '~' and '/', respectively

This effectively covers all fragids that start with a forward slash ( / ) character, and specifies a path into the JSON document.
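Resolving such a path can be sketched as follows. The sketch assumes the fragid is percent-decoded first and then evaluated token by token, with ~1 and ~0 unescaped in that order; the function name resolve_json_pointer is hypothetical, and the "-" array token for "past the end" is not handled.

```python
import json
import re
from urllib.parse import unquote

def resolve_json_pointer(document, fragid):
    """Evaluate a JSON Pointer fragid against a parsed JSON document."""
    pointer = unquote(fragid)
    if pointer == "":
        return document  # the empty pointer identifies the whole document
    if not pointer.startswith("/"):
        raise ValueError("not a JSON Pointer: " + fragid)
    target = document
    for token in pointer[1:].split("/"):
        # Unescape "~1" before "~0" so that "~01" becomes "~1", not "/".
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(target, list):
            if not re.fullmatch(r"0|[1-9][0-9]*", token):
                raise ValueError("invalid array index: " + token)
            target = target[int(token)]
        else:
            target = target[token]
    return target

DOC = json.loads('{"foo": ["bar", "baz"], "a/b": 1, "m~n": 8, "c%d": 2}')
print(resolve_json_pointer(DOC, "/foo/0"))
print(resolve_json_pointer(DOC, "/a~1b"))
print(resolve_json_pointer(DOC, "/c%25d"))
```

Missing keys or out-of-range indexes surface here as KeyError or IndexError; an application would decide for itself how to treat such unresolved fragids.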

RFC6901 does not specify that JSON Pointer fragid structures should be used with JSON or JSON-based media types. Developers who wish to enable content negotiation between JSON-based media types and HTML should avoid directly using JSON Pointer syntax, as this is likely to conflict with fragids used in frameworks such as Ember.js.

PDF Fragid Syntax

[[RFC3778]] defines the application/pdf media type and includes a section that summarises the fragids that are supported for PDF documents. The syntax for these is a sequence of name=value pairs separated by either ampersands (&) or hash signs (#), where each pair is taken as a further action on the basis of the previous pair. The supported names are:

nameddest opens a PDF at a named location or view
page identifies a particular page
zoom zooms to a particular zoom level and to a particular region of the selected page
view identifies a particular portion of the page
viewrect identifies a particular zoomed in portion of the page as a rectangle within it
highlight highlights a rectangular portion of the page
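The pair structure described above can be sketched as a simple splitter that preserves the order of the pairs, since each pair acts on the result of the previous one. The function name parsePdfFragid is hypothetical, and the example values are illustrative only; the sketch does not validate the value syntax of any parameter.

```javascript
// Minimal sketch: split a PDF fragid into ordered name=value pairs.
// parsePdfFragid is a hypothetical helper, not defined by RFC 3778.
function parsePdfFragid(fragid) {
  return fragid
    .replace(/^#/, '')
    .split(/[&#]/)          // pairs are separated by '&' or '#'
    .filter(Boolean)        // ignore empty segments
    .map((pair) => {
      const eq = pair.indexOf('=');
      return eq === -1
        ? { name: pair, value: '' }
        : { name: pair.slice(0, eq), value: pair.slice(eq + 1) };
    });
}

// Order matters: go to the page first, then apply the zoom.
console.log(parsePdfFragid('#page=3&zoom=200'));
```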

Text/Plain Fragid Syntax

URI Fragment Identifiers for the text/plain Media Type defines a fragid structure for text/plain documents. The syntax for these structures is:

text-fragment   =  text-scheme 0*( ";" integrity-check )
text-scheme     =  ( char-scheme / line-scheme )
char-scheme     =  "char=" ( position / range )
line-scheme     =  "line=" ( position / range )
integrity-check =  ( length-scheme / md5-scheme )
                    [ "," mime-charset ]
position        =  number
range           =  ( position "," [ position ] ) / ( "," position )
number          =  1*( DIGIT )
length-scheme   =  "length=" number
md5-scheme      =  "md5=" md5-value
md5-value       =  32HEXDIG

This defines a set of fragids that follow the basic pattern of a number of name=value parts separated by semi-colons. The possible names are:

char indicates a point between characters or a range of characters within the text
line identifies a line within the text
length gives the length of the text, and is used as an integrity check to make sure the file hasn't altered since the text was edited
md5 gives the MD5 hash of the text, again used as an integrity check to make sure the file hasn't altered since the text was edited
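The grammar above can be exercised with a small parser. This is a sketch under stated assumptions: parseTextPlainFragid is a hypothetical name, the mime-charset is accepted without validation, and an omitted range bound is reported as null rather than given a default.

```javascript
// Minimal sketch of the text/plain fragid grammar shown above.
// parseTextPlainFragid is a hypothetical helper name.
function parseTextPlainFragid(fragid) {
  const [scheme, ...checks] = fragid.replace(/^#/, '').split(';');
  // Groups: 1 = char|line, 2 = start, 3 = "," plus optional end, 4 = end, 5 = end of a ",end" range
  const m = /^(char|line)=(?:(\d+)(,(\d+)?)?|,(\d+))$/.exec(scheme);
  if (!m) return null;
  const num = (s) => (s === undefined ? null : Number(s));
  const isRange = m[3] !== undefined || m[5] !== undefined;
  const result = {
    scheme: m[1],
    isRange,
    start: num(m[2]),
    // For a single position, start and end are the same point.
    end: isRange ? num(m[4] !== undefined ? m[4] : m[5]) : num(m[2]),
    checks: [],
  };
  for (const check of checks) {
    // integrity-check = ( length-scheme / md5-scheme ) [ "," mime-charset ]
    const c = /^(length|md5)=([^,]+)(?:,(.+))?$/.exec(check);
    const valid = c !== null &&
      (c[1] === 'length' ? /^\d+$/ : /^[0-9A-Fa-f]{32}$/).test(c[2]);
    if (!valid) return null;
    result.checks.push({ name: c[1], value: c[2], charset: c[3] || null });
  }
  return result;
}

console.log(parseTextPlainFragid('#line=10,20;length=9876,UTF-8'));
```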

Text/CSV Fragid Syntax

The URI Fragment Identifiers for the text/csv Media Type Internet Draft adopts a similar syntax to that used for text/plain:

csv-fragment =  rowsel / colsel / cellsel
rowsel       =  "row=" singlespec 0*( ";" singlespec)
colsel       =  "col=" singlespec 0*( ";" singlespec)
cellsel      =  "cell=" cellspec 0*( ";" cellspec)
singlespec   =  position [ "-" position ]
cellspec     =  cellrow "," cellcol [ "-" cellrow "," cellcol ]
cellrow      =  position
cellcol      =  position
position     =  number / "*"
number       =  1*( DIGIT )

This again uses a name=value syntax, but the values can contain semi-colons. The permitted names are:

row designates the row or range of rows in the CSV file
col identifies a column or range of columns in the CSV file
cell identifies a cell or range of cells in the CSV file
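A short sketch may help show how the semi-colon works differently here than in text/plain: it separates multiple selections within the value rather than separate name=value parts. The function name parseCsvFragid is hypothetical and the sketch only checks syntax; it does not select anything from a CSV file.

```javascript
// Minimal sketch of the text/csv fragid grammar shown above.
// parseCsvFragid is a hypothetical helper name.
function parseCsvFragid(fragid) {
  const m = /^(row|col|cell)=(.+)$/.exec(fragid.replace(/^#/, ''));
  if (!m) return null;
  const pos = '(?:\\d+|\\*)';                                // position = number / "*"
  const single = new RegExp(`^${pos}(?:-${pos})?$`);         // singlespec
  const cell = new RegExp(`^${pos},${pos}(?:-${pos},${pos})?$`); // cellspec
  const pattern = m[1] === 'cell' ? cell : single;
  const specs = m[2].split(';');                             // ";" separates specs within the value
  return specs.every((s) => pattern.test(s))
    ? { selector: m[1], specs }
    : null;
}

console.log(parseCsvFragid('#row=5-7;10'));   // two row selections: rows 5 to 7, and row 10
console.log(parseCsvFragid('#cell=4,1-6,*')); // one cell range
```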

Hash-Bang Fragids

While not specified formally, there is a common practice of using fragment identifiers that start with #! (hash-bang fragids) for fragids that are interpreted by scripts within an HTML page.

Analysis

This appendix looks at various fragid structures that apply to SVG and uses as an example a simple bar chart at http://example.org/potter, which has an SVG representation:

<svg xmlns="http://www.w3.org/2000/svg" width="150px" height="120px" viewBox="0 0 300 225">
  <g stroke="grey" stroke-width="40">
    <line id="harry" x1="50" x2="50" y1="300" y2="50" />
    <line id="hermione" x1="100" x2="100" y1="300" y2="0" />
    <line id="ron" x1="150" x2="150" y1="300" y2="100" />
    <line id="hagrid" x1="200" x2="200" y1="300" y2="50" />
    <line id="dumbledore" x1="250" x2="250" y1="300" y2="150" />
  </g>
</svg>

which appears as:

Media Fragment URIs

The Media Fragment URIs specification [[MEDIA-FRAGMENTS]] defines a fragid structure for images, videos and audio. It covers the identification of spatial areas, time segments, tracks and named segments.

Under that specification, the area covering the first two lines within the example bar chart can be addressed using a URI like:

http://example.org/potter#xywh=25,0,100,225

which would identify the area highlighted here:

The syntax for fragids defined as part of that specification is:

namevalues = namevalue *( "&" namevalue )
namevalue  = name [ "=" value ]
name       = fragment - "&" - "="
value      = fragment - "&"
; defined in RFC 3986
fragment      = *( pchar / "/" / "?" )
pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
pct-encoded   = "%" HEXDIG HEXDIG
sub-delims    = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
; defined in RFC 5234
ALPHA         =  %x41-5A / %x61-7A   ; A-Z / a-z
DIGIT         =  %x30-39 ; 0-9
HEXDIG        =  DIGIT / "A" / "B" / "C" / "D" / "E" / "F"

This syntax essentially allows anything within a fragid, although applications that follow the specification will attempt to interpret any fragid on an image, audio or video representation as a set of name[=value] pairs separated by ampersands.

Named segments under this specification are addressable with the syntax id=id, where the value is the name of the segment. Thus, the URI:

http://example.org/potter#id=hermione

could (assuming an application that recognises id attributes in SVG as naming segments addressable through fragid structures defined in the Media Fragment URI specifications) identify the second bar within the bar chart, which has been labelled as hermione.
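The name[=value] interpretation described in this section can be sketched as follows. The helper names parseMediaFragment and parseXywh are hypothetical; the sketch assumes that a repeated name simply overwrites the earlier value, and it handles only the xywh dimension in detail.

```javascript
// Minimal sketch: read a fragid as name[=value] pairs separated by ampersands.
// parseMediaFragment and parseXywh are hypothetical helper names.
function parseMediaFragment(uri) {
  const fragid = new URL(uri).hash.slice(1);
  const pairs = {};
  for (const nv of fragid.split('&')) {
    if (nv === '') continue;
    const eq = nv.indexOf('=');
    const name = decodeURIComponent(eq === -1 ? nv : nv.slice(0, eq));
    const value = eq === -1 ? '' : decodeURIComponent(nv.slice(eq + 1));
    pairs[name] = value; // assumption: a later occurrence replaces an earlier one
  }
  return pairs;
}

// Interpret an xywh value as a rectangle, with an optional pixel: or percent: prefix.
function parseXywh(value) {
  const m = /^(?:(pixel|percent):)?(\d+),(\d+),(\d+),(\d+)$/.exec(value || '');
  if (!m) return null;
  const [x, y, w, h] = m.slice(2).map(Number);
  return { unit: m[1] || 'pixel', x, y, w, h };
}

const area = parseMediaFragment('http://example.org/potter#xywh=25,0,100,225');
console.log(parseXywh(area.xywh)); // { unit: 'pixel', x: 25, y: 0, w: 100, h: 225 }
console.log(parseMediaFragment('http://example.org/potter#id=hermione')); // { id: 'hermione' }
```

Note that a plain name such as #hermione also parses under this scheme, as a name with no value, which is one reason the same fragid can be read differently by different specifications.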

XML Media Types

The XML Media Types Draft defines (among other things) syntax and processing for fragids for */*+xml media types. It states (emphasis added):

A family of specifications define fragment identifiers for XML media types. A modular syntax and semantics of fragment identifiers for the XML media types is as defined by the [XPointerFramework] W3C Recommendation. It allows simple names, and more complex constructions based on named schemes. The syntax of a fragment identifier part of any URI or IRI with a retrieved media type governed by the specification MUST conform to the syntax specified in [XPointerFramework]. Conformant applications MUST interpret such fragment identifiers as designating that part of the retrieved representation specified by [XPointerFramework] and whatever other specifications define any XPointer schemes used. Conformant applications MUST support the 'element' scheme as defined in [XPointerElement].

A registry of XPointer schemes [XPtrReg] is maintained at the W3C. Unregistered schemes SHOULD NOT be used.

When an XML-based MIME media type follows the naming convention '+xml', the fragment identifier syntax for this media type MAY restrict the syntax to a specified subset of schemes, but MUST support barenames and 'element' scheme pointers. It MAY further allow other registered schemes such as the xmlns scheme and other schemes.

If [XPointerFramework] and [XPointerElement] are inappropriate for some XML-based media type, it SHOULD NOT follow the naming convention '+xml'.

The XML Media Types draft thus defers the interpretation for fragids for */*+xml media types to XPointer. XPointer specifies the syntax:

[1]   	Pointer        ::=   	Shorthand | SchemeBased
[2]   	Shorthand      ::=   	NCName
[3]   	SchemeBased    ::=   	PointerPart (S? PointerPart)*
[4]   	PointerPart    ::=   	SchemeName '(' SchemeData ')'
[5]   	SchemeName     ::=   	QName
[6]   	SchemeData     ::=   	EscapedData*
[7]   	EscapedData    ::=   	NormalChar | '^(' | '^)' | '^^' | '(' SchemeData ')'
[8]   	NormalChar     ::=   	UnicodeChar - [()^]
[9]   	UnicodeChar    ::=   	[#x0-#x10FFFF]

For example, because SVG is XML, the second of the line elements in the SVG bar chart can be addressed using:

http://example.org/potter#element(/1/1/2)

This is highlighted in the following XML:

<svg xmlns="http://www.w3.org/2000/svg" width="150px" height="120px" viewBox="0 0 300 225">
  <g stroke="grey" stroke-width="40">
    <line id="harry" x1="50" x2="50" y1="300" y2="50" />
    <line id="hermione" x1="100" x2="100" y1="300" y2="0" /> 
    <line id="ron" x1="150" x2="150" y1="300" y2="100" />
    <line id="hagrid" x1="200" x2="200" y1="300" y2="50" />
    <line id="dumbledore" x1="250" x2="250" y1="300" y2="150" />
  </g>
</svg>

The scheme used within a scheme-based XPointer determines what it identifies; the element() XPointer scheme used above, for example, identifies element nodes. The XPointer Framework specification also states:

A shorthand pointer, formerly known as a barename, consists of an NCName alone. It identifies at most one element in the resource's information set; specifically, the first one (if any) in document order that has a matching NCName as an identifier.

This defines the semantics of a simple fragment identifier, such that a URI such as:

http://example.org/potter#hermione

means an element within the XML information set, in this case the second line element node.
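The two addressing styles in this section, child sequences and identifiers, can be sketched against a hand-written tree that stands in for the parsed bar chart. Everything here is an illustrative assumption: the tree shape, the helper names findById and resolveElementScheme, and the ASCII-only name pattern.

```javascript
// Hand-written stand-in for the parsed SVG document; not produced by an XML parser.
const line = (id) => ({ name: 'line', id, children: [] });
const chart = {
  children: [{
    name: 'svg',
    children: [{
      name: 'g',
      children: ['harry', 'hermione', 'ron', 'hagrid', 'dumbledore'].map(line),
    }],
  }],
};

// First element in document order with a matching identifier.
function findById(node, id) {
  if (node.id === id) return node;
  for (const child of node.children) {
    const hit = findById(child, id);
    if (hit) return hit;
  }
  return null;
}

// Resolve the data of an element() pointer: an optional name followed by
// a child sequence of one-based steps, e.g. "/1/1/2" or "hermione".
function resolveElementScheme(root, data) {
  const m = /^([A-Za-z_][\w.-]*)?((?:\/[1-9][0-9]*)*)$/.exec(data);
  if (!m || data === '') return null;
  let node = m[1] ? findById(root, m[1]) : root;
  for (const step of m[2].split('/').slice(1)) {
    if (!node) return null;
    node = node.children[Number(step) - 1] || null; // steps count from one
  }
  return node;
}

console.log(resolveElementScheme(chart, '/1/1/2').id); // hermione
console.log(resolveElementScheme(chart, 'hermione') === resolveElementScheme(chart, '/1/1/2')); // true
```

Under these assumptions, the scheme-based pointer element(/1/1/2) and the shorthand pointer hermione arrive at the same element node.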

SVG Fragment Identifiers

SVG itself describes how fragids can be used to identify views on SVG content. It says:

An SVG fragment identifier can come in two forms:

Shorthand bare name form of addressing (e.g., MyDrawing.svg#MyView). This form of addressing, which allows addressing an SVG element by its ID, is compatible with the fragment addressing mechanism for older versions of HTML.

SVG view specification (e.g., MyDrawing.svg#svgView(viewBox(0,200,1000,1000))). This form of addressing specifies the desired view of the document (e.g., the region of the document to view, the initial zoom level) completely within the SVG fragment specification. The contents of the SVG view specification are the five parameter specifications, viewBox(...), preserveAspectRatio(...), transform(...), zoomAndPan(...) and viewTarget(...), whose parameters have the same meaning as the corresponding attributes on a ‘view’ element, or, in the case of transform(...), the same meaning as the corresponding attribute has on a ‘g’ element).

SVG's fragids are conformant with XPointer: they follow the same syntax and are defined in the terms given in XPointer. For example, the URI:

http://example.org/potter#hermione

in this case will address the element with the ID hermione, highlighted here:

<svg xmlns="http://www.w3.org/2000/svg" width="150px" height="120px" viewBox="0 0 300 225">
  <g stroke="grey" stroke-width="40">
    <line id="harry" x1="50" x2="50" y1="300" y2="50" />
    <line id="hermione" x1="100" x2="100" y1="300" y2="0" /> 
    <line id="ron" x1="150" x2="150" y1="300" y2="100" />
    <line id="hagrid" x1="200" x2="200" y1="300" y2="50" />
    <line id="dumbledore" x1="250" x2="250" y1="300" y2="150" />
  </g>
</svg>

When fragids of this kind are used, SVG uses CSS's :target pseudo-class which enables the identified element to be highlighted. For example, the SVG:

<svg xmlns="http://www.w3.org/2000/svg" width="150px" height="120px" viewBox="0 0 300 225">
  <style type="text/css">
    line:target { stroke: red; }
  </style>
  <g stroke="grey" stroke-width="40">
    <line id="harry" x1="50" x2="50" y1="300" y2="50" />
    <line id="hermione" x1="100" x2="100" y1="300" y2="0" />
    <line id="ron" x1="150" x2="150" y1="300" y2="100" />
    <line id="hagrid" x1="200" x2="200" y1="300" y2="50" />
    <line id="dumbledore" x1="250" x2="250" y1="300" y2="150" />
  </g>
</svg>

means that the URI

http://example.org/potter#hermione

is displayed with the second line (identified as hermione) stroked in red:

SVG introduces an svgView() XPointer scheme that is used to describe views onto SVG images; one possible argument is viewBox(), which selects a particular area of an image in the same way as the xywh parameter defined for Media Fragment URIs described above. Thus the URI:

http://example.org/potter#svgView(viewBox(25,0,100,225))

identifies the area of the chart that covers the first two bars.
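The correspondence between the two notations can be sketched by extracting the viewBox() argument and re-expressing it as an xywh parameter. The helper names viewBoxFromSvgView and toXywh are hypothetical, the other svgView() parameters are ignored, and the conversion assumes that both notations are measured in the same coordinate system.

```javascript
// Minimal sketch: pull the viewBox() argument out of an svgView() fragid.
// viewBoxFromSvgView and toXywh are hypothetical helper names.
function viewBoxFromSvgView(fragid) {
  const m = /^#?svgView\(.*?viewBox\(([^)]*)\).*\)$/.exec(fragid);
  if (!m) return null;
  // viewBox values may be separated by commas and/or whitespace.
  const nums = m[1].split(/[\s,]+/).filter(Boolean).map(Number);
  if (nums.length !== 4 || nums.some(Number.isNaN)) return null;
  const [x, y, width, height] = nums;
  return { x, y, width, height };
}

// Assumption: the viewBox units coincide with the units used for xywh.
function toXywh(box) {
  return `xywh=${box.x},${box.y},${box.width},${box.height}`;
}

const box = viewBoxFromSvgView('#svgView(viewBox(25,0,100,225))');
console.log(toXywh(box)); // xywh=25,0,100,225
```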

Active Content

SVG, like HTML, enables scripts to be embedded within documents and to respond to events such as clicks on particular parts of the content. Active content can read the document location and base the behaviour of the script on the fragid.

For example, the following SVG parses the fragid that's used to access the bar chart and uses it to highlight one of the bars:

<svg xmlns="http://www.w3.org/2000/svg" width="150px" height="120px" viewBox="0 0 300 225" onload="highlight()">
  <script type="application/ecmascript">
    function highlight () {
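      // Fragids number the bars from one, but the collection is zero-indexed.
      var index = parseInt(document.location.hash.substring(1), 10);
      if (index >= 1) {
        var element = document.getElementsByTagName('line')[index - 1];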
        if (element) {
          element.setAttribute('stroke', 'red');
        }
      }
    }
  </script>
  <g stroke="grey" stroke-width="40">
    <line id="harry" x1="50" x2="50" y1="300" y2="50" />
    <line id="hermione" x1="100" x2="100" y1="300" y2="0" />
    <line id="ron" x1="150" x2="150" y1="300" y2="100" />
    <line id="hagrid" x1="200" x2="200" y1="300" y2="50" />
    <line id="dumbledore" x1="250" x2="250" y1="300" y2="150" />
  </g>
</svg>

The URI:

http://example.org/potter#2

thus highlights the second bar within the bar chart.

In this case, the recognised syntax of the fragid is determined by the script, which recognises any numeric fragid between one and five. The fragid has no declarative semantics -- there is no specification that says what it means -- but in effect this script supports the identification of a bar of the bar chart through a fragid.

Semantic Content

SVG allows extensions; any element in a different namespace will be ignored by SVG processors. This facility can be used to embed semantic content through RDF/XML.

The following example contains some RDF/XML which makes some basic assertions about the resource

http://example.org/potter#hermione

This resource has been identified with a fragid within an SVG document like this:

<svg xmlns="http://www.w3.org/2000/svg" width="150px" height="120px" viewBox="0 0 300 225">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/">
    <foaf:Person rdf:about="#hermione">
      <foaf:name>Hermione Granger</foaf:name>
    </foaf:Person>
  </rdf:RDF>
  <g stroke="grey" stroke-width="40">
    <line x1="50" x2="50" y1="300" y2="50" />
    <line x1="100" x2="100" y1="300" y2="0" />
    <line x1="150" x2="150" y1="300" y2="100" />
    <line x1="200" x2="200" y1="300" y2="50" />
    <line x1="250" x2="250" y1="300" y2="150" />
  </g>
</svg>

In this example, the id attributes have been removed from the line elements. If they were still present as in the previous examples, the fragid #hermione would be interpreted as a line element by XML processors and as a Person by RDF processors.

In semantic content, fragids can mean anything. In this particular example, we can tell from the RDF that the above URI means the person named Hermione Granger. It is common practice when using RDF for URIs that include fragids to be used to refer to things that are described by the document retrieved at the base URI, as this makes it easy to serve RDF content.
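The relationship between the rdf:about="#hermione" reference and the full URI can be sketched with the standard URL API, which resolves the relative reference against the document's base URI. The helper name splitFragid is hypothetical; it separates a URI into the part a client would retrieve and the fragid that is interpreted afterwards.

```javascript
// Resolve the relative reference used in rdf:about against the document URI.
const base = 'http://example.org/potter';
const subject = new URL('#hermione', base).href;
console.log(subject); // http://example.org/potter#hermione

// splitFragid is a hypothetical helper: the fragid is never sent to the server,
// so a client retrieves the document URI and interprets the fragid locally.
function splitFragid(uri) {
  const u = new URL(uri);
  const fragid = u.hash.slice(1);
  u.hash = '';
  return { document: u.href, fragid };
}

console.log(splitFragid(subject)); // { document: 'http://example.org/potter', fragid: 'hermione' }
```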
