Copyright © 2008 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
Designers of URLs have traditionally used ? to encode server-side parameters. At its inception, the Web also introduced fragment identifiers (preceded by #) as a means of addressing specific locations in a document.
As highly interactive applications get built using Web parts, there is an increasing need for encoding
interaction state as part of the URL. The Web is beginning to discover and codify design patterns based on fragment identifiers for many of these use cases.
This draft finding is being prepared in response to TAG issue #60. This document explores the issues that arise in this context, and attempts to define best practices that help:
Create URLs for intermediate pages in a Web application so that the back button does the right thing
Enable clients to address into specific points in a stream of content, e.g., video.
The goal of this finding is to initially collect the various usage scenarios that are leading to innovative uses of client-side URL parameters, along with the solutions that have been developed by the Web community. When this exercise is complete, this finding will conclude by ensuring that these design patterns are mutually compatible. If some of these usage patterns are identified as being in conflict, we will recommend best practices that help side-step such conflicts. We encourage the wider Web community to point us at emerging usage scenarios and design patterns so that we maximize our chances of arriving at a final finding that helps move forward the architecture of the Web in a self-consistent manner.
This document has been developed for discussion by the W3C Technical Architecture Group. This version, dated February 11, 2008, is the first draft of this finding and has been prepared for discussion at the upcoming Vancouver F2F.
Publication of this draft finding does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time.
Please send comments on this finding to the publicly archived TAG mailing list www-tag@w3.org (archive).
1 Introduction
2 Use Case Scenarios
2.1 Addressing Into Multimedia Streams
2.1.1 Things To Note
2.1.2 Extrapolating From This Pattern
2.1.3 Architectural Questions
2.2 Interaction State And Browser History
2.3 AJAX Libraries And State Management
2.4 Web Command Lines
2.5 Passing Data Among Frames
2.6 The Naked Hash-Ref
3 Recommended Best Practices
4 Conclusions
5 Open Issues
6 References
At the beginning of the Web, we decided to encode server-side URL parameters with a ?. At the same time, the Web adopted # to attach fragment identifiers to URLs so that user-agents could address into specific locations in an HTML document.
Nearly 20 years later, the Web has built a strong set of conventions around how URL parameters are used.
As transactional applications began moving onto the Web in the late 1990s,
server-side parameters formed a core building block for how application state was communicated
between client and server. In this phase of Web evolution, clients were still comparatively simple, and client-side URL parameters
did not move beyond the use of fragment identifiers.
But with Web 2.0 applications increasingly moving
traditional client-side applications to the Web, a variety of design patterns are now emerging for how
client-side URL parameters are used to influence client interaction.
The need to remain consistent with the prevalent Web architecture
has seen these design patterns build on the existing mechanism of fragment identifiers in URLs.
This finding enumerates the various emerging patterns along with their associated use cases as a means of documenting existing practice on the Web.
This section enumerates the various usage scenarios that are leading to innovative uses of client-side URL parameters on the Web.
When publishing multimedia streams, there is often a need
to address into specific points in the multimedia stream,
e.g., by using a time-index. The simplest means of doing this
is to pass in the start-time as a server-side parameter in
the URL, e.g.,
http://www.example.com/media.stream?start=03:06:09
and have the server begin streaming at 3
hours, 6 minutes and 9 seconds into the content. This has the
side benefit of creating a distinct URL for each
point in the media stream; such URLs can be used to
bookmark locations of interest.
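As a rough illustration of the server-side variant, the sketch below converts the start parameter from the example URL into an offset in seconds. The function name parseStartOffset is hypothetical, and the sketch assumes the value is written as hh:mm:ss and is not percent-encoded; a real streaming server may accept other formats.

```typescript
// Hypothetical helper: extract the "start" query parameter (hh:mm:ss)
// and convert it to an offset in seconds.
// Returns null when the parameter is absent or not of the form hh:mm:ss.
function parseStartOffset(url: string): number | null {
  // Ignore any fragment; only the part before "#" is sent to the server.
  const hashIndex = url.indexOf("#");
  const withoutFragment = hashIndex < 0 ? url : url.slice(0, hashIndex);

  const queryIndex = withoutFragment.indexOf("?");
  if (queryIndex < 0) return null;

  for (const pair of withoutFragment.slice(queryIndex + 1).split("&")) {
    if (pair.indexOf("start=") !== 0) continue;
    const value = pair.slice("start=".length);
    const match = /^(\d{1,2}):([0-5]\d):([0-5]\d)$/.exec(value);
    if (match === null) return null;
    return Number(match[1]) * 3600 + Number(match[2]) * 60 + Number(match[3]);
  }
  return null;
}
```

For the example URL above, the function yields 11169 seconds (3 × 3600 + 6 × 60 + 9).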
It is also possible to leverage client-side parameters encoded
as part of the URL (using a #), where this
pseudo fragment identifier is used by client-side
scripts as an argument to be passed to an appropriate
locator function. Consider the following example
taken from cnn.com:
<a href="http://www.cnn.com/video/#/video/tech/2008/02/19/vo.aus.sea.spider.ap"> Giant sea spider filmed deep underwater</a>
CNN uses links like the above for all the topical video segments that are published on its site. The URL in this case has the following components:
Component | Value |
---|---|
Protocol | http |
Host | www.cnn.com |
Path | video |
Client Param | #/video/tech/2008/02/19/vo.aus.sea.spider.ap |
The browser is expected to do a GET of the URL leading up to the fragment; the processing application, in this case the JavaScript embedded in the HTML Response, then processes the portion of the URL following the #.
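The division of labor described above can be sketched as a simple split at the first #. The names splitAtFragment, requestUrl and clientParam are illustrative only; the sketch returns the client parameter without its leading #.

```typescript
interface SplitUrl {
  requestUrl: string;         // the part the browser actually requests with GET
  clientParam: string | null; // the part after "#", left to client-side script
}

// Split a URL at the first "#". Nothing after the "#" is sent to the server.
function splitAtFragment(url: string): SplitUrl {
  const hashIndex = url.indexOf("#");
  if (hashIndex < 0) {
    return { requestUrl: url, clientParam: null };
  }
  return {
    requestUrl: url.slice(0, hashIndex),
    clientParam: url.slice(hashIndex + 1),
  };
}
```

Applied to the CNN link, requestUrl is http://www.cnn.com/video/ and clientParam is /video/tech/2008/02/19/vo.aus.sea.spider.ap.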
The fragment identifier has been intentionally identified as a client parameter.
Treating it as a regular fragment identifier in this usage would lead one to incorrectly infer that the URL for the video resource being addressed is http://www.cnn.com/video.
This would result in all the video links on the CNN site getting the same URL.
Thus, the entire URL in this case is http://www.cnn.com/video/#/video/tech/2008/02/19/vo.aus.sea.spider.ap
A consumer of this URL who goes looking for an id within the Response that matches the #-suffix of this URL will fail.
The reported Content-Type for the resource is text/html. However, the behavior of the #-suffix in this case is not defined by the HTML specification.
As used, the #-suffix is a first-class client parameter in that it gets consumed by a script that is served as part of the HTML document returned by the server upon receiving a GET request.
This embedded script examines the URL available to it as script variable content.location, strips off the #, and uses the remainder as an argument to a function that generates the actual content URL.
Having constructed this content URL, the script then proceeds to instruct the browser to play the media at the newly constructed location.
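The last two steps can be sketched as follows. This is not CNN's actual script: resolveMediaUrl is a hypothetical name, the page URL is passed in as a string rather than read from the browser, and generateContentUrl stands in for whatever site-specific mapping produces the media location, which the text above does not describe.

```typescript
// Sketch of the embedded script's logic, under the assumptions above.
// `locationHref` stands in for the page URL the script reads;
// `generateContentUrl` is a hypothetical, site-specific mapping.
function resolveMediaUrl(
  locationHref: string,
  generateContentUrl: (clientParam: string) => string
): string | null {
  const hashIndex = locationHref.indexOf("#");
  // No "#", or nothing after it: there is no client parameter to act on.
  if (hashIndex < 0 || hashIndex === locationHref.length - 1) {
    return null;
  }
  // Strip off the "#" and hand the remainder to the generator function.
  const clientParam = locationHref.slice(hashIndex + 1);
  return generateContentUrl(clientParam);
}
```

A caller would then instruct the player to load the returned URL; when the result is null, the page has no client parameter and would fall back to its default content.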
The CNN example cited above is not unique with respect to its
use of #
within the URL for encoding parameters to
the receiving application.
It shows that in a world of dynamic documents,
the traditional fragment identifier need no longer be an idref
value that addresses an existing node in the serialized HTML making up the HTTP Response.
Under the pattern demonstrated here, the fragment identifier in the URL, in addition to being a static idref, generalizes to the following:
An idref to a dynamically generated node.
A parameter to be consumed by the application that is delivered as the HTTP Response to the original GET request.
This section enumerates some of the questions raised by the described design pattern:
What if the returned HTML contains an element that has the same fragment ID as the one being used as a client-side parameter?
What should the correct behavior be in the face of such conflicts?
(1) Scroll down to that element
(2) Play the video
(3) Show an error message
(4) Do nothing
Until now, URLs have been equally useful to browsers and non-browser consumers. This pattern demonstrates a case where the URL inferred by browsers differs from the one inferred by non-browsers. A non-browser that receives a URL as in the above, and sees a Content-Type of text/html, might assume (incorrectly) that the URL for this video resource is http://www.cnn.com/video.html.
A related question about the meaning of a fragment identifier arises when one considers content negotiation. For instance:
a) get application/rdf+xml "http://example.com/exp/#something"
b) get text/html "http://example.com/exp/#something"
Given that the fragment identifier leads to a subsequent request, who should process the error response if one should be raised by that subsequent request?
Interaction state and browser history, AKA making the back button do the right thing.
For live examples of this design pattern, see Google Maps,
which takes extreme care to ensure that the back button works as the user would expect.
Google Maps uses iframe proxies to achieve the desired effect.
AJAX applications use features of Dynamic HTML (DHTML) to
create highly reactive user experiences. Updates to the Web
user interface in response to user actions no longer require a
full page reload. Consequently, the user can perform a sequence
of interaction steps while remaining on the same
page, at least as seen from the browser's perspective of
content.location. This makes for a good user
experience, but it complicates the following:
Recording key points in the interaction flow, e.g., for bookmarking.
Providing intuitive behavior for the browser's history mechanism.
Snapshotting interaction state so that one can return to a partially completed task at a later time.
Today, many of the details of AJAX programming have been abstracted away by higher-level toolkits such as Dojo [dojo] and GWT [google-gwt]. Management of interaction state and browser history is one of the key affordances implemented in these libraries. History mechanisms in AJAX libraries like GWT and Dojo have a lot in common, and the approach can be traced back to Really Simple History (RSH). The mechanism described here has also been adopted by a recent update to GMail.
The basic premise
is to keep track of the application's internal state
in the URL fragment
identifier. This works because updating the fragment doesn't typically cause
the page to be reloaded.
This approach has several benefits:
It's about the only way to control the browser's history reliably.
It provides good feedback to the user.
It's bookmarkable, i.e., the user can create a bookmark to the current state and save it, email it, or whatever.
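A minimal sketch of this premise follows, assuming the application state can be flattened into string key/value pairs. The key=value&key=value encoding and the names AppState, encodeState and decodeState are assumptions made for illustration; the libraries mentioned above each use their own conventions.

```typescript
// Hypothetical state shape: a flat map of string keys to string values.
type AppState = Record<string, string>;

// Serialize the state into a fragment body. Keys are sorted so that
// equal states always produce the same bookmarkable URL.
function encodeState(state: AppState): string {
  return Object.keys(state)
    .sort()
    .map((key) => encodeURIComponent(key) + "=" + encodeURIComponent(state[key]))
    .join("&");
}

// Rebuild the state from a fragment, with or without its leading "#".
function decodeState(fragment: string): AppState {
  const state: AppState = {};
  const body = fragment.charAt(0) === "#" ? fragment.slice(1) : fragment;
  if (body === "") {
    return state;
  }
  for (const pair of body.split("&")) {
    const eq = pair.indexOf("=");
    const key = eq < 0 ? pair : pair.slice(0, eq);
    const value = eq < 0 ? "" : pair.slice(eq + 1);
    state[decodeURIComponent(key)] = decodeURIComponent(value);
  }
  return state;
}
```

In a browser, the application would write the encoded string into the fragment of its location after each significant interaction step, and call decodeState on load to restore the state named by a bookmarked or shared URL.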
When applications can be built of Web parts, there is a need to configure them at the point the application is launched.
Traditional applications would call these default start-up or command-line options.
We see the equivalent emerging for configuring desktop gadgets and widgets,
where command-line options are passed in via URL parameters; in this context, the URL is the Web command-line.
For one sample implementation and its associated usage, see Using URLs To Pass Parameters To The Web.
Dave Raggett's HTMLSlidy uses URLs of the form ...#(nn) to address into a deck of slides.
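As a small illustration of reading such a parameter, the sketch below extracts the slide number from a URL ending in #(nn). It is not taken from the slide tool's own source; parseSlideNumber is a hypothetical name, and the sketch assumes nn is a decimal number.

```typescript
// Hypothetical helper: read the slide number from a URL ending in "#(nn)".
// Returns null when the URL carries no fragment of that form.
function parseSlideNumber(url: string): number | null {
  const match = /#\((\d+)\)$/.exec(url);
  return match === null ? null : Number(match[1]);
}
```

A presentation script could use the result to decide which slide to show when the page loads, falling back to the first slide when the result is null.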
Web applications that use multiple frames often need
to pass data between them. This problem gets even more
interesting when the child frame displays content from a
domain different from that of its parent. In this case,
the parent and child frames do not share any script
context, since sharing one would open a cross-site scripting
hole. A common technique, used where the parent
and child have mutually agreed to collaborate, is for the
parent to pass data to the child via a fragment
identifier by resetting the child's location
URL. Thus, given a parent frame P and a
child frame C, where the location URLs U_P and U_C come from different
domains, the parent frame might pass data to the child by
resetting the child's location URL to U_C#data; the
child picks up this data by polling for changes in its
location URL. This technique is common in Comet
Programming. As an example, the Dojo AJAX
toolkit uses an IFrame proxy to enable
cross-domain XML HTTP Requests. This is a useful
technique when writing cross-site mashups. As an example,
see XKCD and AxsJAX, a cross-site mashup that mashes
together XKCD comics with their associated transcripts to
create a speech-friendly XKCD experience.
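The child's side of this exchange can be sketched as a poller that compares the current fragment with the last one it saw. The names fragmentOf and createFragmentPoller are hypothetical, and the location is supplied through a readLocation callback so that the sketch does not depend on a browser; in a real child frame that callback would read the frame's own location, and poll would be invoked on a timer.

```typescript
// Return the part of a URL after the first "#", or "" if there is none.
function fragmentOf(url: string): string {
  const hashIndex = url.indexOf("#");
  return hashIndex < 0 ? "" : url.slice(hashIndex + 1);
}

// Build a poll function for the child frame. Each call compares the
// current fragment with the last one seen and reports any change.
function createFragmentPoller(
  readLocation: () => string,
  onData: (data: string) => void
): () => void {
  let lastFragment = fragmentOf(readLocation());
  return function poll(): void {
    const current = fragmentOf(readLocation());
    if (current !== lastFragment) {
      lastFragment = current;
      onData(current);
    }
  };
}
```

The parent's side is simply a reassignment of the child's location URL to U_C#data. Note that with this sketch, sending the same data twice in a row goes unnoticed by the child, since the fragment does not change.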
As the final item in the usage scenarios as seen on the
Web, this section documents the use of a single # sign as the value of the href
attribute on HTML anchors. This can be thought of as a
relative URL with a null fragment
identifier. Web sites wishing to override the
default-target behavior of anchors use this when
attaching a JavaScript event-handler to anchor elements for
mouse-clicks. The only justification for placing a naked # as the value of the href attribute
appears to be to avoid anything showing up on the browser status
bar as the user activates the link. Note that this idiom also
creates significant hurdles for non-mouse users of the Web.