As the Web has evolved from a Web of documents to a Web of applications, the use of the # sign in URIs has evolved correspondingly. Originally introduced as a static "fragment identifier" to identify locations in a document, it is now used in many more complex ways: for example, by SVG and PDF to select from and render documents, and as arguments that are interpreted by client-side Web applications, such as the actual URI of a video to be played by a video player, or the position and zoom of a map. Unlike query parameters preceded by ?, the characters in the URI bar after the # can be changed without causing the page to be reloaded.
Such uses of the "fragment identifier" have interesting and different properties, and the usage differs from the way it is described in existing specifications. Recently added functionality in [HTML5] (history.pushState() and history.replaceState()) allows browser history to be changed without causing a page reload, thereby providing an alternative to the use of fragment identifiers to identify application state.
This document explores the issues that arise from these new uses of fragment identifiers and attempts to define best practices. We argue that, in many cases, the use of query parameters along with the new HTML5 functionality mentioned above is preferable to fragment identifiers to identify application state.
This document has been developed for discussion by the W3C Technical Architecture Group and is being published as a Public Working Draft in order to get additional input from the Web community.
Publication of this draft finding does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time.
2 Use Case Scenarios
2.1 Addressing Into Multimedia Streams -- CNN
2.1.1 Things to Note
2.1.2 Extrapolating From This Pattern
2.2 Creating URIs for Media Fragments
2.3 Interaction State and Browser History
2.4 AJAX Libraries And State Management
2.5 Other Examples
2.6 The Naked Hash-Ref
3 Managing Browser History
4 Architectural Questions
5 Recommended Best Practices
6 Should Existing Specifications be Updated to Cover New Usage?
[RFC 3986] defines the character string following the ? sign in a URI as the "query component". The character string following the # sign is known as the "fragment identifier" and is used to address specific locations in a document. Nearly twenty years later, the Web has built a strong set of conventions around how URI parameters are used. As transactional applications began moving onto the Web in the late 1990s, query parameters formed a core building block for how application state was communicated between client and server. In this phase of Web evolution, clients were still comparatively simple, and client-side URI parameters did not move beyond the use of fragment identifiers to identify parts of the document.
[RFC 3986] defines the use of fragment identifiers thus: "The fragment identifier component of a URI allows indirect identification of a secondary resource by reference to a primary resource and additional identifying information. The identified secondary resource may be some portion or subset of the primary resource, some view on representations of the primary resource, or some other resource defined or described by those representations."
This document explores the issues that arise in this context, and attempts to define best practices that help:
In this document we use the term "URI" to include URLs, URIs and IRIs.
This section discusses several Web applications that identify application state using fragment identifiers or query parameters and discusses the consequences of that choice.
When publishing multimedia streams, there is often a need to address into specific points in the multimedia stream, e.g., by using a time-index. The simplest means of doing this is to pass in the start-time as a query parameter in the URI to the server, e.g.
http://www.example.com/media.stream?start=03:06:09 and have the server start streaming the content starting at 3 hours, 6 minutes and 9 seconds into the content. This creates distinct URIs for each point in the media stream and these URIs can be used to bookmark locations of interest.
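As a rough illustration of the server-side approach, the sketch below shows how a server might turn the start parameter into an offset in seconds. The function name start_offset_seconds and the assumption that the value is always written as hh:mm:ss are ours, chosen for illustration; they are not taken from any specification.

```python
from urllib.parse import urlsplit, parse_qs


def start_offset_seconds(uri: str) -> int:
    """Return the offset requested via ?start=hh:mm:ss, or 0 if it is absent.

    Hypothetical helper: assumes the value is always three colon-separated
    integer fields (hours, minutes, seconds).
    """
    query = parse_qs(urlsplit(uri).query)
    values = query.get("start")
    if not values:
        return 0
    hours, minutes, seconds = (int(part) for part in values[0].split(":"))
    return hours * 3600 + minutes * 60 + seconds


offset = start_offset_seconds("http://www.example.com/media.stream?start=03:06:09")
print(offset)  # 3 hours, 6 minutes and 9 seconds expressed in seconds
```

Because the parameter travels in the query component, it reaches the server with every request, which is what allows the server to begin streaming at the requested point.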
It is also possible to leverage client-side parameters encoded as part of the URI (using a #), where this pseudo fragment identifier is used by client-side scripts as an argument to be passed to an appropriate locator function. Consider the following example taken from cnn.com:
<a href="http://www.cnn.com/video/#/video/tech/2008/02/19/vo.aus.sea.spider.ap"> Giant sea spider filmed deep underwater </a>
CNN uses links like the above for all the topical video segments that are published on its site. The URI in this case has the following components:
The browser does a GET on the URI leading up to the #.
The fragment identifier in this use case is intentionally referred to as a client parameter.
Treating it as a regular fragment identifier in this usage would result in incorrectly inferring that the URI for the video resource being addressed is http://www.cnn.com/video/.
This would result in all the video links on the CNN site getting the same URI.
Thus, the entire URI in this case is
A consumer of this URI who goes looking for an id within the Response that matches the #-suffix of this URI will fail.
The reported Content-Type for the resource is text/html. However, the behavior of the #-suffix in this case is not defined by the HTML specification.
As used, the #-suffix is a first-class client parameter in that it gets consumed by a script that is served as part of the HTML document returned by the server upon receiving a GET request.
This embedded script examines the URI available to it as script variable content.location, strips off the # and uses the rest of the suffix as an argument to a function that generates the actual URI.
Having constructed this content URI, the script then proceeds to instruct the browser to play the media at the newly constructed location.
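The split between what the server sees and what the script sees can be sketched as follows. This is an illustration only: the second link is made up, the helper names are ours, and the function CNN uses to turn the client parameter into the actual content URI is not known to us, so it is not modelled.

```python
from urllib.parse import urldefrag

links = [
    "http://www.cnn.com/video/#/video/tech/2008/02/19/vo.aus.sea.spider.ap",
    # Hypothetical second link, invented for this sketch.
    "http://www.cnn.com/video/#/video/hypothetical/other.clip",
]


def retrieval_uri(uri: str) -> str:
    """The part of the URI that is sent to the server in the GET request."""
    return urldefrag(uri).url


def client_parameter(uri: str) -> str:
    """The #-suffix with the leading '#' stripped, as seen by the script."""
    return urldefrag(uri).fragment


for link in links:
    print(retrieval_uri(link), "->", client_parameter(link))
```

Both links are retrieved from the same URI; only the client parameter, which never reaches the server, tells them apart.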
The CNN example cited above is not unique with respect to its use of a # within the URI for encoding parameters to the receiving application. It shows that in a world of dynamic documents, the traditional fragment identifier need no longer be an idref value that addresses an existing node in the serialized HTML making up the HTTP Response. In addition to possibly being a static idref, the fragment identifier in the URI, following the pattern demonstrated here and in other use cases discussed in this document, generalizes to the following:
An idref to a dynamically generated node.
Also, modifying the behaviour using only the fragment identifier allows caching and pushing content through CDNs, which is a nice property.
The Media Fragments Working Group at W3C is developing a specification to address spatial and temporal media fragments on the Web: [Media Fragments URI 1.0]. As the title of the specification states, the objective is to create URIs for media fragments. This is done by using fragment identifiers. For example: http://www.example.org/video.ogv#t=60,100. In this case, the user agent knows that the primary resource is http://www.example.org/video.ogv and that it is expected to display the portion of the primary resource that relates to the fragment #t=60,100, i.e. seconds 60 to 100. The syntax is very rich and allows media fragments to be identified along a number of different dimensions.
The fragment identifiers are constructed as name-value pairs separated by the & character.
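To make the name-value structure concrete, here is a minimal sketch of how a user agent might read the temporal dimension from such a fragment. It assumes the pairs are separated by & and handles only plain seconds; the function name temporal_range is ours, and the other time formats and dimensions allowed by [Media Fragments URI 1.0] are deliberately not handled.

```python
from urllib.parse import urlsplit, parse_qsl


def temporal_range(uri: str):
    """Return (start, end) in seconds from a 't=start,end' pair, or None.

    Simplified sketch: a missing start defaults to 0.0 and a missing end
    is reported as None, meaning "until the end of the media".
    """
    fragment = urlsplit(uri).fragment
    dimensions = dict(parse_qsl(fragment))
    if "t" not in dimensions:
        return None
    start_text, _, end_text = dimensions["t"].partition(",")
    start = float(start_text) if start_text else 0.0
    end = float(end_text) if end_text else None
    return start, end


print(temporal_range("http://www.example.org/video.ogv#t=60,100"))
```

The primary resource is still http://www.example.org/video.ogv; the parsed range only tells the user agent which portion of it to present.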
The relationship between the primary resource and the secondary resource is, in these cases, quite complicated. Selection of the fragment may involve decompression, mapping fragment ranges to byte offsets and other complex calculations. Depending on the format and the capabilities of the user agent, some of this may be done at the user agent and some at the server. Retrieval of the primary resource may be accomplished using several requests to conserve bandwidth and adapt to network conditions.
In some cases, for optimization reasons or because the user agent cannot perform the fragment to byte mapping, only the bytes required for the fragment are retrieved directly from the server. HTTP header extensions are used to accomplish this.
In summary: the [Media Fragments URI 1.0] specification does create URIs for media fragments using fragment identifiers but the derivation of the secondary resource from the primary resource is complex and specialized to the type of the media stream and the storage format. This is discussed further in 6 Should Existing Specifications be Updated to Cover New Usage?.
A variety of methods are available in Web Architecture to save application state. Cookies store information on the client-side that is sent along with the GET request. Similarly, data can be stored on the server-side -- in a database, for example, identified by a cookie -- and can be used to change the details of the GET request. There are also specifications under development (See [Web Storage]) that extend the cookie mechanism in several directions. These specifications allow large amounts of data to be stored on the client and can also be used to encode application state.
These mechanisms, however, encode private application states.
In some cases, an application may want to allow selected states to be made public
and shareable. For this we require a URI, appropriately decorated with client-side and server-side parameters. The challenge in designing a mechanism to encode state
is to preserve the familiar user experience especially to
make the back button do the right thing. For live examples of this design pattern, see GMail and Google Maps both of which take extreme care to ensure that the user's expectations of
Web interaction are preserved. These applications use
iframe proxies to achieve the desired effect.
A very early interactive Web application was the [Xerox Parc Map Viewer]. When you bring up the application it shows you, by default, a map of the world. If you select a spot on the map it changes to show you a map centered on the selected spot. Users can interact with the map in various ways: pan, zoom, select degree of detail, etc. Each interaction is encoded as a parameter in a URI which is sent back to the server. The server generates a new map from this URI and refreshes the page. The format of the URI is:
If you point your browser at
http://maps.google.com an HTML page is returned.
A script in the page determines the default location, depending on whether Geolocation is enabled, etc.
Based on this, an XHR request is made to the server to pull down the tiles for the default map.
If you work with Google Maps you will notice that even after you have customized the map by entering a different location, adding nearby attractions or scrolling, panning or zooming, the address bar has not changed: it still says http://maps.google.com/. If you want a link to the displayed map, you click the "Link" button on the right and it gives you a URI to the map displayed. For example:
Notice the structure of this URI: it includes the address as well as other parameters. The URI that Google Maps creates for the customized map has a long query parameter string but no fragment identifier. Thus, the Web paradigm is preserved: maps are displayed as documents, the back button works, and if you use the "Link" button, each map has a distinct URI that can be transmitted in email or an instant message and used to regenerate the map.
processes your mail. For example,
identifies the inbox and a specific piece of mail in the inbox. If a piece of mail
is not selected, the fragment identifier merely identifies the inbox:
Note that this "URI" only works for your mailbox. It cannot be mailed to someone
else and be used by them to address into their mailbox.
AJAX applications use features of Dynamic HTML (DHTML) to create highly reactive user experiences. Updates to the Web user interface in response to user actions no longer require a full page reload. Consequently, the user can perform a sequence of interaction steps while remaining on the same page at least as seen from the browser's perspective of
content.location. This makes for a good user experience; however,
additional facilities must be provided for the following:
Today, many of the details of AJAX programming have been abstracted away by higher level toolkits such as [Dojo AJAX Toolkit] and [google-gwt]. Management of interaction state and browser history is one of the key affordances implemented in these libraries. History mechanisms in AJAX libraries like GWT and Dojo share a lot in common, and the approach can be traced back to Really Simple History (RSH).
The basic premise is to keep track of the application's internal state in the URI fragment identifier. Here the mantra "give everything a URI" can beneficially be extended to Web applications that use active content. This works because updating the fragment identifier to change the state doesn't typically cause the page to be reloaded. This approach has several benefits:
Recently added functionality in [HTML5] (history.pushState() and history.replaceState()) now allows browser history to be changed without causing a page reload. See below.
One of the techniques that is used to provide this functionality is to open a number of frames within a browser window. In such an architecture, parent and child frames are allowed to change each other's location URI as long as the frames contain information from the same domain or have agreed to collaborate by some other means. Otherwise, changing a frame's location URI opens up a cross-site scripting vulnerability.
If the frames can collaborate, then one of the frames, say the parent, passes data to the child via a fragment identifier by resetting the child's location URI. Thus, given a parent frame P and a child frame C, where the location URIs U_P and U_C may come from different domains, the parent frame might pass data to the child by resetting its location URI to U_C#data; the child picks up this data by polling for changes in its location URI. This technique is used in Comet Programming. As an example, the [Dojo AJAX Toolkit] uses an IFrame proxy to enable cross-domain XML HTTP Requests. This is a useful technique when writing cross-site mashups. As an example, see XKCD and AxsJAX, a cross-site mashup that mashes together XKCD comics with their associated transcripts to create a speech-friendly XKCD experience.
When applications are built from Web parts, there is often a need to configure them when the application is launched. Traditional applications would call these default start-up or command-line options. We see the equivalent emerging for configuring desktop gadgets and widgets where command-line options are passed in via URI parameters. In this context, the URI is the Web command-line. For one sample implementation and its associated usage, see Using URIs To Pass Parameters To The Web. Dave Raggett's HTMLSlidy uses URIs of the form
...#(nn) to address into a deck of slides.
Similarly, [Superfeedr] allows you to subscribe to a fragment of a
document using a fragment identifier, for example,
http://www.nytimes.com/weather#.wCurrent%20.summary. [Addrable] uses a fragment identifier to
address into a dataset consisting of comma-separated values. In many such examples
the pattern is so obvious and intuitive that the user can just type in the URI.
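The following sketch illustrates how two of these fragment forms might be read by the receiving application. The helper names and the example.org slide URI are hypothetical; the sketch only recognises the ...#(nn) form described above and percent-decodes the fragment in the Superfeedr-style example, and makes no claim about how HTMLSlidy or [Superfeedr] actually implement this.

```python
import re
from urllib.parse import urlsplit, unquote


def slide_number(uri: str):
    """Return nn from a '#(nn)' fragment, or None if the fragment has another form."""
    match = re.fullmatch(r"\((\d+)\)", urlsplit(uri).fragment)
    return int(match.group(1)) if match else None


def decoded_fragment(uri: str) -> str:
    """Return the percent-decoded fragment, e.g. a selector-like string."""
    return unquote(urlsplit(uri).fragment)


# The slide URI below is a made-up example of the '#(nn)' form.
print(slide_number("http://www.example.org/talk.html#(12)"))
print(decoded_fragment("http://www.nytimes.com/weather#.wCurrent%20.summary"))
```

In both cases the fragment acts as an argument to the application rather than as the id of an element in the retrieved document.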
In some situations a single
# sign is used as the value of the
reproducible states using fragment identifiers. The characters following the hash
is also desirable to change the URI in the address bar so URIs can be emailed and
bookmarked and the back-button works as expected.
location.href property can be used to get and set
the URI and the
can be used to get and set various parts of the URI.
These methods work in conjunction with the
window.onpopstate event. See, for example, Manipulating Browser History.
The onhashchange event enables code that reacts to
changes in the fragment identifier.
Changes to any part of the URI, other than the fragment identifier, cause the page
to be reloaded.
More recently, [HTML5] introduced the history.pushState() and history.replaceState() methods,
which allow you to add and modify history entries, respectively.
These new facilities have an interesting consequence: the path, query or fragment identifier portion of a URI in the address
bar can be changed without causing a page reload. Thus, an application
can choose to identify a state with a URI that includes query parameters or a different path instead
of changing the fragment identifier. This has several advantages.
In situations where the URI is used in another context or with another tool, the entire URI is transmitted to the server and it can
respond in a manner tailored to the capabilities of the client and do something
history.pushState() and history.replaceState() are relatively new and browser uptake is changing as we speak. Currently, they are
supported by Google Chrome, Mozilla Firefox and Apple Safari but not by Microsoft IE9
or Opera. This is a problem because if you
use these facilities your website will not work correctly with IE9 or Opera, or at least not yet.
The design patterns discussed in this document use fragment identifiers to
an application that wants to identify certain states that can be bookmarked
and reproduced must decide whether to mint URIs using fragment identifiers or
query parameters. Before [HTML5] introduced history.pushState() and history.replaceState(),
the answer was clear: fragment identifiers could be changed without causing a page
reload. Most of the use cases discussed here chose that route. The disadvantage is that
fragment identifiers are not visible to search engines and web crawlers. As we have discussed,
Google Maps goes to great lengths to get around this limitation by providing the
"Link" button, which mints URIs for map states using query parameters.
scripting turned off due to security or performance concerns? For the same URI,
is likely to be very different from user-agents that do. Notice, further, that the HTTP Response headers do not give the client any indication that this is likely to be so.
Until now, URIs have been equally useful to browsers and other kinds of user agents
with different capabilities. This pattern demonstrates a case where the behavior for
the same URI is different depending on whether scripting is supported.
A user-agent with scripting disabled that receives a URI of the form
and sees a
text/html might incorrectly
assume that the URI for this video resource is
[Jeni Tennison's Blog] discusses in detail the behavior of two applications
that use fragment identifiers in this way and shows how their behavior is different for agents
that do and do not support scripting. She also quotes statistics that 2% of users do
We could choose to ignore this philistine minority
but there is a deeper problem. Search engines, web crawlers and the like do not
scanned by the Google search engine, for example. To get around this, Google introduced the
#! convention, which converts fragment identifiers into server-side
parameters. This is discussed below.
With the introduction of history.pushState() and history.replaceState(), the choice is less clear. Using these facilities, states can be identified by URIs that include query parameters (or URI paths) and these URIs can be pushed onto the history stack and bookmarked without causing a page reload. This has several advantages. In copy-and-paste situations the entire URI is sent to the server and the server can decide, based on the client's capabilities, how to respond. Search engines and web crawlers work as expected. The only problem is that, as discussed above, these features are currently not supported by all browsers.
RECOMMENDATIONS FOR APPLICATION DEVELOPERS:
Applications should be designed so that, as the state of the resource and the display changes, sharable, reproducible states are identified by URIs.
This is a special case of "use URIs to name things". There are things that happen when you manipulate the user interface. You can name the resulting state with a URI or you can name it some other way. If you use a URI then you have a control you can move out of the page.
Once the state is identified with a URI, the address bar should be updated to reflect the change, as discussed above in 3 Managing Browser History, so that the user's Web experience is preserved.
The URI that is generated to identify application states can either use query parameters or URI paths or it can use fragment identifiers.
history.pushState() and history.replaceState() have to be used to manage the browser history. This has the advantage that if the URI is moved and reused the entire URI is transmitted to the server and the server can make intelligent decisions based on the capabilities of the client. It has the disadvantage that these new capabilities may not be supported by all browsers and that may limit the reach of the website.
#! convention. See [Making AJAX Applications Crawlable]. If you use this convention then client-side fragment identifier parameters are converted into server-side parameters for the search engine. So far, this special convention works only for Googlebots. If history.pushState() and history.replaceState() become generally accepted it may wither away. [Jeni Tennison's Blog] discusses the #! convention in more detail.
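As a rough sketch of the conversion a crawler performs under this convention, the code below rewrites a #! URI into a URI carrying the same information in an _escaped_fragment_ query parameter, which is our understanding of the scheme described in [Making AJAX Applications Crawlable]. The function name crawler_uri and the exact set of characters left unescaped are assumptions made for this illustration.

```python
from urllib.parse import urlsplit, urlunsplit, quote

# Characters left unescaped in the fragment value. '#', '%', '&', '+' and
# spaces are escaped. This particular safe list is an assumption of the sketch.
_SAFE = "=/!,:;@$'()*"


def crawler_uri(uri: str) -> str:
    """Move a '#!' fragment into an _escaped_fragment_ query parameter."""
    parts = urlsplit(uri)
    if not parts.fragment.startswith("!"):
        return uri  # not a #! URI, so there is nothing to convert
    escaped = quote(parts.fragment[1:], safe=_SAFE)
    extra = "_escaped_fragment_=" + escaped
    # Append to any existing query rather than replacing it.
    query = parts.query + "&" + extra if parts.query else extra
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


print(crawler_uri("http://www.example.com/ajax.html#!key=value"))
```

The converted URI has no fragment, so the state that was previously visible only to client-side script is now transmitted to the server in full.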
For HTML and XML the fragment identifier processing rules are defined in [RFC 2854] and [XPointer] respectively. Essentially, the fragment identifier is a pointer into a document. If the value of the fragment identifier equals the id of an element in the document, then it identifies that element.
The media fragments case is a bit different. Most of the media type specifications do not specify fragment identifier processing rules. Thus, they do not explicitly violate [RFC 3986] although the relationship between primary and secondary resources is quite a bit more complex than [RFC 3986] describes. In some cases they fetch the fragment directly from the server rather than selecting it on the client.
RECOMMENDATIONS FOR MEDIA TYPE REGISTRATIONS:
All media type specifications and registrations, especially new types, must specify fragment identifier semantics for both static use and use in active content as appropriate. The text/html and application/xhtml+xml media types defined for HTML5 need to define the use of fragment identifiers with active content.
Extend the definition in the media types that accept "active content", like HTML and SVG, to acknowledge the fact that fragment identifiers might also be used (if not in contradiction with the 'static' use of those fragment identifiers) for programmatic purposes. The media type registration needs to say (for active content) how fragment identifiers are used as parameters by the active content and how they may be used to identify the portion of the state that is reproducible and can be referenced externally.
As the Web has evolved from showing things to doing things, i.e. from showing documents to running applications where the applications use code running on the client to construct the display by retrieving and manipulating bits of information gathered from several sources, the need has arisen to identify application states that are reproducible and can be shared. This paper recommends that these states should be identified by URIs and, further, that the browser history should be managed so that users' expectations and the Web paradigm are preserved. The URIs for the application states can be constructed in a couple of different ways and this paper discusses the pros and cons of the different approaches.