Why does the address bar show the tempolink instead of the permalink?

An important feature of HTTP is the temporary redirect, where a resource can have a “permanent” URI while its content moves from place to place over time. For example,
http://purl.org/syndication/history/1.0 remains a constant name for that resource even though its location (as specified by a second URI) changes from time to time.

If this is such a useful feature, then why does the browser address
bar show the temporary URI instead of the permanent one? After all,
the permanent one is the one you want to copy and paste to email, to
bookmark, to place in HTML documents, and so on. The HTTP
specification says to hang on to the permanent link (“since the
redirection MAY be altered on occasion, the client SHOULD continue to
use the Request-URI for future requests.”). Tim Berners-Lee says the
same thing in User Agent watch points (1998): “It is
important that when a user agent follows a “Found” [302] link that the
user does not refer to the second (less persistent) URI. Whether
copying down the URI from a window at the top of a document, or making
a link to the document, or bookmarking it, the reference should
(except in very special cases) be to the original URI.”
Karl Dubost amplifies this in his 2001-2003 W3C Note Common User
Agent Problems: “Do not
treat HTTP temporary redirects as permanent redirects…. Wrong: User
agents usually show the user (in the user interface) the URI that is
the result of a temporary (302 or 307) redirect, as they would do for
a permanent (301) redirect.”
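The rule in these quotes can be sketched in a few lines. This is only an illustration, not any browser's actual logic: the function name uri_to_reference and the representation of a redirect chain as (status, location) pairs are assumptions made for the example, and 308 is included alongside 301 as the later permanent counterpart of 307.

```python
# Status codes that signal a permanent move. 301 is the one named in the
# quotes above; 308 (RFC 7538) is its later method-preserving counterpart.
PERMANENT = {301, 308}


def uri_to_reference(request_uri, hops):
    """Return the URI a client should keep for bookmarks and links.

    hops is the redirect chain actually followed, as (status, location)
    pairs in order. This representation is hypothetical, chosen for the sketch.
    """
    persistent = request_uri
    for status, location in hops:
        if status in PERMANENT:
            # A permanent redirect replaces the reference.
            persistent = location
        else:
            # A temporary redirect (302, 303, 307): keep the URI we have and
            # ignore whatever follows, since it hangs off a temporary hop.
            break
    return persistent


if __name__ == "__main__":
    purl = "http://purl.org/syndication/history/1.0"
    # The target below is a made-up location for illustration.
    print(uri_to_reference(purl, [(302, "http://example.org/current-home")]))
```

With a single 302 hop the function returns the original purl.org URI, which is the one the quoted texts say should be copied, linked, and bookmarked.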

So why do browsers ignore the RFC and these repeated admonitions?
Possibly due to lack of awareness of the issue, but more likely
because the status quo is seen as protecting the user. If
the original URI (the permalink) were shown we might have the following scenario:

  1. an attacker discovers a way
    to establish a 3xx redirect from
    http://w3.org/resources/looksgood to
    http://phishingsite.org/pretendtobew3 – either because w3.org
    is being careless, or because of a conscious decision to deed part
    of its URI space to other parties

  2. user sees address bar = http://w3.org/resources/looksgood with
    content X, and concludes that the X is attributable
    to the resource http://w3.org/resources/looksgood

  3. user treats the http://w3.org/ prefix as an informal credential
    and treats the http://w3.org/resources/looksgood content as
    coming from W3C (without any normative justification; they just
    do) when in fact it’s a phishing site pretending to be W3C

  4. user enters their W3C password into phishing form, etc.

Were the user to observe address bar = http://phishingsite.org/pretendtobew3 with the same content, she
might suspect an attack and decline to enter a password.

An attacker might use an explicit redirection service similar to the one provided by purl.org, or might exploit a redirect script that takes a URL as part of the
query string, e.g.
http://w3.org/redirect?uri=http://phishingsite.org/pretendtobew3 .
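For the query-string case, one way a server could avoid lending its name to arbitrary targets is to check the requested target against a list of hosts it is willing to vouch for. The following is a minimal sketch under that assumption; the allow-list contents and the function names are hypothetical, and the parameter name uri is taken from the example URL above.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical allow-list: hosts this server is prepared to redirect to.
ALLOWED_HOSTS = {"w3.org", "www.w3.org"}


def redirect_target(request_url):
    """Extract the 'uri' query parameter from a redirect-script URL, if any."""
    values = parse_qs(urlsplit(request_url).query).get("uri")
    return values[0] if values else None


def is_allowed_target(target, allowed_hosts=ALLOWED_HOSTS):
    """Accept only absolute http(s) URLs whose host is on the allow-list."""
    parts = urlsplit(target)
    return parts.scheme in ("http", "https") and parts.hostname in allowed_hosts


if __name__ == "__main__":
    url = "http://w3.org/redirect?uri=http://phishingsite.org/pretendtobew3"
    target = redirect_target(url)
    print(target, is_allowed_target(target))
```

A script that refuses targets failing this check would not produce the redirect in step 1 of the scenario, though it does nothing about redirects the site owner grants deliberately.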

This line of reasoning is documented in the Wikipedia article URL redirection and its references and
in Mozilla bug 68423.

There are two possible objections. One is that the server in these
cases is in error – it shouldn’t have allowed the redirects if it
didn’t really mean for the content source to speak on behalf of the
original resource (similar to an iframe or img element). The other is
that the user is in error – s/he shouldn’t be making authorization
decisions based on the displayed URI; other evidence such as a
certificate should be demanded. Unfortunately, while correct in
theory, neither of these considerations is very compelling.

If browser projects are unwilling to change address bar behavior – and
it seems unlikely that they will – is there any other remedy?

Perhaps some creative UI design might help. Displaying the permalink
in addition to the tempolink might be nice, so that it could be
selected (somehow) for bookmarking, but that might be confusing and
take too much screen real estate. One possible partial solution would
be an enhancement to the bookmark creation dialog. In Firefox on
selecting “Bookmark This Page” one sees a little panel with text
fields “name” and “tags” and pull-down “folder”. What if, in the case
of a redirection, there were an additional control that gave the
option of bookmarking the permalink URI in place of the substitute
URI? With further thought I bet someone could devise a solution that would work for URI copy/paste as well.

(Thanks to Dan Connolly, other TAG members, and David Wood for their
help with this note.)

Default Prefix Declaration

1. Disclaimer

The ideas behind the proposal presented here are neither
particularly new nor particularly mine. I’ve made the effort to
write this down so anyone wishing to refer to ideas in this space
can say “Something along the lines of [this posting]” rather than
“Something, you know, like, uhm, what we talked about, prefix
binding, media-type-based defaulting, that stuff”.

2. Introduction

Criticism of XML as an appropriate mechanism for enabling distributed
extensibility for the Web typically targets two issues:

  1. Syntactic complexity
  2. API complexity

Of these, the first is arguably the more significant, because
the number of authors exceeds the number of developers by a large
margin. Accordingly, this proposal attempts to address the first
problem, by providing a defaulting mechanism for namespace prefix
bindings which covers the 99% case.

3. The proposal

  • Define a trivial XML language which provides a means to
    associate prefixes with namespace names (URIs);
  • Invoking from HTML: Define a link relation dpd for use in the (X)HTML
  • Invoking from XML: Define a processing instruction xml-dpd and/or an
    attribute xml:dpd for use at the top of XML
  • Defaulting by Media Type: Implement a registry which maps from media
    types to a published dpd file;
  • Define a precedence, which operates on a per-prefix basis,
    namely xmlns: >> explicit invocation >> application
    built-in default >> media-type-based default, and a semantics
    in terms of namespace information items
    or appropriate data-model equivalent on the
    document element.
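The per-prefix precedence can be sketched with four prefix-to-URI mappings consulted in order. This is an illustration of the ordering only, not part of the proposal: the function name, the use of plain dictionaries, and the override URI in the usage example are all assumptions.

```python
from collections import ChainMap


def effective_bindings(xmlns, explicit, builtin, media_default):
    """Combine four prefix->URI mappings, highest precedence first.

    ChainMap looks each key up in the first mapping that contains it,
    so the precedence applies prefix by prefix rather than all-or-nothing.
    """
    return ChainMap(xmlns, explicit, builtin, media_default)


if __name__ == "__main__":
    bindings = effective_bindings(
        xmlns={"svg": "http://example.org/my-svg"},  # hypothetical in-document override
        explicit={},
        builtin={},
        media_default={
            "svg": "http://www.w3.org/2000/svg",
            "xf": "http://www.w3.org/2002/xforms",
        },
    )
    print(bindings["svg"])  # taken from the xmlns: layer
    print(bindings["xf"])   # falls through to the media-type default
```

Here an in-document xmlns: declaration for one prefix leaves the media-type defaults for every other prefix untouched.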

4. Why

XML namespaces provide two essentially distinct mechanisms for
‘owning’ names, that is, preventing what would otherwise be a name
collision by associating names in some way with some additional
distinguishing characteristic:

  1. By prefixing the name, and binding the prefix to a particular
    URI;
  2. By declaring that within a particular subtree,
    unprefixed names are associated with a particular URI.

In XML namespaces as they stand today, the association with a
URI is done via a namespace declaration
which takes the form of an attribute, and whose impact is scoped to
the subtree rooted at the owner element of that attribute.

Liam Quin has proposed an additional, out-of-band and defaultable,
approach to the association for unprefixed names, using
patterns to identify the subtrees where particular URIs apply. I’ve
borrowed some of his ideas about how to connect documents to prefix
binding definitions.

The approach presented here is similar-but-different, in that its primary
goal is to enable out-of-band and defaultable associations of namespaces
to names with prefixes, with whole-document scope. The
advantages of focussing on prefixed names in this way are:

  • Ad-hoc extensibility mechanisms typically use prefixes.
    The HTML5 specification already has at least two of these:
    aria- and data-;
  • Prefixed names are more robust in the face of arbitrary
    cut-and-paste operations;
  • Authors are used to them: for example, XSLT stylesheets and W3C
    XML Schema documents almost always use explicit prefixes;
  • Prefix binding information can be very simple: just a set of
    pairs of prefix and URI.

Provision is also made for optionally specifying a binding for the default namespace at the document element, primarily for the media type registry case, where it makes sense to associate a primary namespace with a media type.

5. Example

If this proposal were adopted, and a dpd document for use in HTML 4.01 or XHTML1:

<dpd ns="http://www.w3.org/1999/xhtml">
  <pd p="xf" ns="http://www.w3.org/2002/xforms"/>
  <pd p="svg" ns="http://www.w3.org/2000/svg"/>
  <pd p="ml" ns="http://www.w3.org/1998/Math/MathML"/>
</dpd>

was registered against the text/html media type, the following would result in a DOM with html and body elements in the XHTML namespace and an input element in the XForms namespace:

<xf:input ref="xyzzy">...</xf:input>
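To make the intended effect concrete, here is a rough sketch of how a consumer might read such a dpd document and expand element names against it. The element and attribute names (dpd, pd, p, ns) come from the example above; the function names and the idea of returning (namespace, local name) pairs are assumptions for illustration, not something the proposal specifies.

```python
import xml.etree.ElementTree as ET

DPD = """<dpd ns="http://www.w3.org/1999/xhtml">
  <pd p="xf" ns="http://www.w3.org/2002/xforms"/>
  <pd p="svg" ns="http://www.w3.org/2000/svg"/>
  <pd p="ml" ns="http://www.w3.org/1998/Math/MathML"/>
</dpd>"""


def load_dpd(text):
    """Return (default namespace, {prefix: namespace}) from a dpd document."""
    root = ET.fromstring(text)
    prefixes = {pd.get("p"): pd.get("ns") for pd in root.findall("pd")}
    return root.get("ns"), prefixes


def expand(qname, default_ns, prefixes):
    """Map 'prefix:local' or 'local' to a (namespace URI, local name) pair."""
    prefix, sep, local = qname.partition(":")
    if not sep:
        # Unprefixed name: use the default namespace from the dpd element.
        return (default_ns, qname)
    return (prefixes[prefix], local)


if __name__ == "__main__":
    default_ns, prefixes = load_dpd(DPD)
    for name in ("html", "body", "xf:input"):
        print(name, expand(name, default_ns, prefixes))
```

Under these assumptions html and body land in the XHTML namespace and xf:input in the XForms namespace, matching the outcome described for the text/html registration.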

Orthogonality of Specifications


The general principle of platform design is that platforms consist of a set of standard interfaces. Standard interfaces allow substitution of components across the interface boundary, while independence of interfaces allows evolution of the interfaces themselves. In a PC, for example, the disk bus interface allows many different disk vendors to offer disk products independent of the model of display or keyboard, and the orthogonality of interfaces allows the interfaces themselves to evolve. If the display interface were linked to the disk interface too tightly, it wouldn’t be possible to evolve ISA to SATA without updating VGA.

In the web platform, the three important interfaces are transport, format and reference, and the current definitions of those interfaces are HTTP, HTML and URI. The interfaces are standard, allowing many different implementations: the HTTP standard lets you use HTTP servers from many vendors, the HTML standard lets you use many different HTML authoring tools or template systems, and the URI specification allows identification of many different components.

While HTTP is the current “common denominator” protocol that all web agents are expected to speak, the web should continue to work if web content is delivered by other protocols: FTP, shared file systems, email, instant messaging, and so forth. HTTP as it has evolved has severe difficulties, and designing a Web that only works with HTTP as it is currently implemented and deployed would be unfortunate. We should work harder to reduce the dependencies and isolate them.

HTML is the ‘lingua franca’, the common language that all agents are currently expected to be able to produce, process, read and interpret (or at least a well-defined subset of it). Having a common language is important for interoperability, but the web should also work for other formats: extensions to HTML including scripting and DOM APIs, but also other formats and application environments such as XHTML, Java, PDF, Flash, Silverlight, XForms, 3D objects, SVG, other XML languages and so forth. Certainly HTML as it has evolved is overly complex for the purposes for which it is designed.

The URI is the fundamental element of reference, but the URI itself is evolving to deal with internationalization, reference to session state, IRIs, LEIRIs, HREFs and so forth. Many applications use URIs and IRIs, not just the formats described above but other protocols and locations, including databases, directories, messaging, archiving, peer-to-peer sharing and so forth.

The web is just one of many communication applications on the global Internet; for web browsing to integrate well with the rest of distributed networking, web components should be independent of the application, and work well with messaging, instant messaging, news feeds, and so on.

A sign of a breakdown of this architectural principle would be for a specification of a format (say HTML) to attempt to redefine, for its purposes, the protocol (say HTTP) or the method of reference (URI). The specifications should be independent, or at least their dependencies should be isolated and minimized. If those other elements of the web architecture are incorrect, need to evolve to meet current practice or have flaws in their definitions, they need to evolve independently, so that orthogonality of the specifications and reusability of the components are promoted.

There may well be reasons to link some features of HTML to the fact that it is delivered over an interactive protocol, but linking HTML directly to HTTP in a way that features would work only for HTTP and not for any other protocol with similar features – that would be unfortunate. It might not matter in the short-term (that’s all we have right now) but it is harmful to the long-term evolution of the web.

(Should go without saying, but just in case: this is a personal post, not reviewed by the TAG)

Language semantics and operational meaning

W3C and other standards organizations are in the business of defining languages — conventions that organizations can choose to follow — and not in mandating operational behavior — telling organizations and participants in the network how they are supposed to behave. Organizations (implementors, operators, administrators, software developers) are free to choose which standards they adopt, and what their operational behavior will be.

In some posts on the www-tag mailing list, I was trying to point out the risks in defining languages such that the "meaning" of the language depends on operational behavior. In some ways, of course, this is a fallacy: in general, what an utterance "means" in some operational way depends on what the speaker intends and how the listener will interpret the utterance.

However, as an organization, W3C can, and should, define languages in which the meaning is defined in the document, in terms of abstractions rather than in terms of operational behavior. The result is more robust standards, those that have wider applicability, that can be used for more purposes, and that create a more vibrant and extensible web.

Search Engines take on Structured Data

Structured data on the web got a boost this week, with Google’s announcement of Rich Snippets and Rich Snippets in Custom Search. Structured data at such a large scale raises at least three issues:

  1. Syntax
  2. Vocabulary
  3. Policy

Google’s documentation shows support for both microformats and RDFa. It follows the hReview microformat syntax with small vocabulary changes (name vs fn). Support for RDFa syntax, in theory, means support for vocabularies that anyone makes; but in practice, Google is starting with a clean slate: data-vocabulary.org. That’s a place to start, though it doesn’t provide synergy with anyone who uses FOAF or Dublin Core or the like to share their data.

The policy questions are perhaps the most difficult. Structured data is a pointy instrument; if anyone can say anything about anything, surely the system will be gamed and defrauded. Google’s rollout is one step at a time, starting with some trusted sites and an application process to get your site added. The O’Reilly interview with Guha and Hansson is an interesting look at where they hope to go after this first step; if you’re curious about how this fits in to HTML standards, see Sam Ruby’s microdata.

While issues remain–there are syntactic i’s to dot and t’s to cross and even larger policy issues to work out–between Google’s rollout and Yahoo’s SearchMonkey and the UK Central Office of Information rollout, it seems that the industry is ready to take on the challenges of using structured data in search engines.

Data interchange problems come in all sizes

I had a pretty small data interchange problem the other day: I just
wanted to archive some play lists that I had compiled using various
music player daemon (mpd) clients.
The mpd server stores playlists as simple m3u files,
i.e. line-oriented files with a path to the media file on each line. But
that’s too fragile for archive and interchange purposes.
I had a similar problem a while back with iTunes playlists. In that episode,
I chose hAudio, an HTML dialect in progress in the microformats, as my target.
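Reading the m3u side of the problem takes very little code, which is part of why the format is so common and so fragile. The following is a minimal sketch of a reader for the line-oriented format described above, not the m3uin.py code itself; the function name and the sample paths are made up, and skipping lines that begin with # (as used by extended m3u files) is an assumption about the input.

```python
def parse_m3u(text):
    """Return the media file paths listed in an m3u playlist, in order."""
    paths = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and '#' directive/comment lines (extended m3u).
        if not line or line.startswith("#"):
            continue
        paths.append(line)
    return paths


if __name__ == "__main__":
    # Hypothetical playlist content; real paths depend on the music library.
    sample = """#EXTM3U
artist-a/album-x/01-first-track.mp3

artist-b/album-y/07-another-track.ogg
"""
    for path in parse_m3u(sample):
        print(path)
```

Everything beyond the bare path (album, artist, title) has to come from somewhere else, such as file tags or directory naming conventions, which is the gap a richer interchange format fills.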

Unfortunately, hAudio changed out from under me between when I
started and when I finished. So this time, a simple search found the
music ontology and I tried it
with RDFa, which
lets you use any RDF vocabulary in HTML*.
I’m mostly pleased with the results:

  1. from A Song’s Best Friend_ The Very Best Of John Denver [Disc 1]

    by John Denver

    Poems, Prayers And Promises
  2. from WOW Worship (orange)

    by Compilations

    Did you Feel the Mountains Tremble
  3. from Family Music Party

    by Trout Fishing In America

    Back When I Could Fly

The album names come before the track names because I didn’t read
enough of the RDFa primer when I
was coding; RDFa includes @rev as well as @rel
for reversing subject/object order.
See the advogato episode on m3uin.py
for details about the code.

The Music Ontology was developed by a handful of people who
staked out a claim in URI space
(http://musicontology.org/...) and happily took comments from
as big a review community as they could manage, but they had no
obligation to get a really global consensus. The microformats process
is intended to reach a global consensus so that staking out a claim in
URI space is superfluous; it works well given certain initial
conditions about how common the problem is and availability of pre-web
designs to draw from. Perhaps playlists (and media syndication, as
hAudio seems to be expanding in scope to hMedia) will eventually reach
these conditions, but the music ontology already meets my needs, since
I’m the sort who doesn’t mind declaring my data vocabulary with URIs.

My view of Web architecture is shaped by episodes such as this
one. While giga-scale deployment is always impressive and definitely
something we should design for, small scale deployment is just as
important. The Web spread, initially, not because of global phenomena
such as Wikipedia and Facebook but because you didn’t need
your manager’s permission to try it out; you didn’t even
need a domain name; you could just run it on your LAN
or even on just one machine with no server at all.

In an Oct 2008 tech plenary session on web architecture,
Henri Sivonen said:

I see the Web as the public Web that people can access. The resources you can
navigate publicly. I define Web as the information space accessible to
the public via a browser. If a mobile operator operates behind
walls, this is not part of the Web.

I can’t say that I agree with that perspective. I’m no great fan of
walled gardens either, but freedom means freedom to do things we don’t
like as well as freedom to do things we do like. And architecture and
policy should have a sort of church-and-state separation between them.

Plus, data interchange happens not just at planetary scale, but
also within mobile devices, across devices, and across communities
and enterprises of all shapes and sizes.

* I’ve gone a little outside the scope of current
standards; RDFa has only been specified for use in modular XHTML, with
the application/xhtml+xml media type, so far.

Once more into Versioning — this time with HTML

The W3C TAG has worked on the general issue of “versioning” for many years, and many TAG members may be worn out on the issue.
However, undeterred by past history, I’m taking another run at it, this time trying to look specifically at the issues around versioning of HTML, CSS, JavaScript and other parts of the standard web browser landscape.
Part of what’s new (I think) is looking at the cost/benefits around deployment. See the www-tag mailing list archive for the HTML and versioning threads.

Palm webOS approach to HTML extensibility: x-mojo-*

I got pretty excited about the iPhone,
and even more about the openness of Android and the G1, and then I
learn that the Palm Pre developer platform is basically just the open
web platform: HTML, CSS, and JavaScript.

Just after the mobile buzz at Web Directions North and the TAG declared victory on how to build The Self-Describing Web with URI-based Extensibility, I get some details on how Palm is building on the open web platform:

A widget is declared within your HTML as an empty div with an x-mojo-element attribute.

<div x-mojo-element="ToggleButton" id="my-toggle"></div>

Oh great; x- tokens… aren’t those passé by now?

The suggestion in the HTML 5 draft is data-* attributes. The ARIA draft suggests @role. The Palm design looks like new information for issue-41, Decentralized-extensibility, in the HTML WG.

Anybody know how frozen the Palm design is? Or if they looked at ARIA, data-* or URI-based namespaces?

JavaScript required for basic textual info? TRY AGAIN

Sam says he’s Online and Airborne. “Needless to say, this is seriously cool.” I’ll say! But when I follow the link to details from the service provider, I get:

Sorry. You must have JavaScript enabled to view this page. Click the
BACK button below or enable JavaScript in your browser preferences and
click TRY AGAIN.

Let’s turn that around, shall we? Sorry, if you’re a network provider and you want my business, read up on unobtrusive JavaScript (aka the rule of least power), go BACK to work on your web site design and TRY AGAIN.