Redirecting URLs with fragment IDs

Note 2 August 2011

This version: http://www.w3.org/People/Bos/redirect
Latest version: […]
Author: Bert Bos (W3C) <bert@w3.org>

Abstract

This is a proposal for (1) a way to express in XHTML that the target of a link can be found elsewhere (“rel=redirect”), (2) a way to use such an enhanced XHTML document as a generic redirection document for any protocol (“application/redirect+xml”) and (3) an extension to HTTP/1.1 that uses this format to improve the information returned by 3XX response codes. The principle is that an HTTP server returns not just a traditional redirect response for a resource that changed locations (3XX code and Location header), but also a document with more detailed information, from which a conforming client can learn more than from the Location header alone. The technique is not limited to HTTP.

Introduction and scope

HTTP allows a server to respond to a request with a “redirect,” a response that doesn't contain the requested resource, but a URL where the resource can be found instead. This avoids broken links when a document is moved to a new location, and it also allows the creation of short URLs or “human-readable” URLs as alternatives for long and complex ones. But this facility has limitations.

The redirection is a feature of the protocol. HTTP has it, but FTP, e.g., does not. This means that a request to an HTTP server can be redirected to an FTP server, but an FTP server cannot redirect a request to an HTTP server.

Another limitation is that the fragment IDs of URLs cannot be redirected. E.g., if an HTML document had an element with ID=chapter1 and the document is rewritten so that the element now has ID=c1, there is no way for the server to tell the client that http://example.org/foo#chapter1 can now be found at http://example.org/foo#c1.

Another example is when a resource with several parts, indicated by http://example.org/foo#part1, …#part2, etc. is split in two or more separate resources. Think, e.g., of a document with news items. At some point the document becomes too long and the items have to be split over multiple files. But the HTTP redirect cannot inform the client that #part1 is still in http://example.org/foo1, while #part7 is now found in http://example.org/foo2.

This specification defines a way to use XHTML to define redirects for URLs with fragment identifiers. It can be used in three ways:

If some or all of the targets (ID attributes) in an existing XHTML document are renamed or moved to other documents, mark-up can be added so that the XHTML user agent that reads the document knows the new names and locations of the targets that no longer exist in this document.
If a document (in any format, not just XHTML) is moved to a new location, possibly split over multiple locations or merged into documents at other locations, a document can be put in its old location that defines for each target in the old document where to find it now.
In HTTP, the 3XX status codes can be accompanied by additional information that defines the new names and locations of any targets that do not exist in the new location of the document.

The technology defined in this specification uses <link> and <a> elements in XHTML documents to define redirections, which has some advantages but also limitations:

It is easy to enhance an existing XHTML document if one of its target anchors changes. It only requires an extra <a> or <link> element with the normal, familiar syntax.
It allows self-documenting redirection documents, by adding other HTML elements as documentation. The redirection document is a functional XHTML document on its own.
It does not allow specifying a pattern to redirect a large or infinite number of fragment identifiers. E.g., there is no way to express that all fragments of the form “abc-something” are redirected to “xyz-something” or that “p1”, “p2”, “p3”… are redirected to “p0”, “p2”, “p3”…
It does not allow redirecting fragment IDs that are expressions, such as an XPointer with an XPath expression. All fragment IDs are matched as strings. No attempt is made to establish the language the fragment ID is written in, evaluate expressions or find equivalent expressions.

Syntax of redirection documents

A redirection document must be an XHTML file [[reference]]. Each <link> or <a> element in the document that has

an “id” attribute,
an “href” attribute and
a “rel” attribute that contains the keyword “redirect” as one of its values,

defines a redirection from #fragment to url, where fragment is the value of the “id” attribute and url is the absolute URL corresponding to the URL reference that is the value of the “href” attribute.

Each <link> element in the document that has

a non-empty "title" attribute,
an “href” attribute and
a “rel” attribute that contains the keyword “redirect” as one of its values,

defines a redirection from #fragment to url, where fragment is the value of the “title” attribute and url is the absolute URL corresponding to the URL reference that is the value of the “href” attribute.

Note that a <link> element may have both an “id” and a “title” and can thus define two redirections.

If there are multiple redirections for the same “#fragment”, all but the first one (in document order) are ignored.

In addition, if there is a <link> or <a> element in the document that has an “href” attribute and a “rel” attribute that contains the keyword “redirect” as one of its values, but has no “id” and no “title” attributes, then the first such element defines a redirection from the document's URL to the absolute URL corresponding to the value of the “href” attribute.

If the target URL of a redirection is a URL that is itself redirected, the target of the first redirection is defined to be the target of the second one. This is recursive.

Note that the context of the elements (ancestor elements, descendant elements, preceding or following elements, other attributes) has no influence, with the exception of the <base> element, if the document has one.
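
The rules above can be sketched in code. The following Python fragment is only an illustration, not part of this specification: the function names are hypothetical, the document is assumed to be well-formed XML, and “rel” keywords are compared case-sensitively for simplicity.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin


def local_name(tag):
    """Strip an XML namespace such as '{http://www.w3.org/1999/xhtml}'."""
    return tag.rsplit("}", 1)[-1]


def extract_redirections(xhtml_text, document_url):
    """Return (fragment_redirects, document_redirect) for one document.

    fragment_redirects maps a fragment (without '#') to an absolute URL.
    document_redirect is the target for the document's own URL, or None.
    """
    root = ET.fromstring(xhtml_text)
    # root.iter() walks the tree in document order.
    elements = [el for el in root.iter() if isinstance(el.tag, str)]

    # A <base> element, if present, changes how hrefs are resolved.
    base_url = document_url
    for el in elements:
        if local_name(el.tag) == "base" and el.get("href") is not None:
            base_url = urljoin(document_url, el.get("href"))
            break

    fragment_redirects = {}
    document_redirect = None
    for el in elements:
        name = local_name(el.tag)
        href = el.get("href")
        if name not in ("link", "a") or href is None:
            continue
        if "redirect" not in el.get("rel", "").split():
            continue
        target = urljoin(base_url, href)
        element_id = el.get("id")
        title = el.get("title")
        if element_id is None and title is None:
            # Only the first such element redirects the document itself.
            if document_redirect is None:
                document_redirect = target
            continue
        if element_id is not None:
            # setdefault keeps the first redirection for a fragment.
            fragment_redirects.setdefault(element_id, target)
        if name == "link" and title:
            # Only <link> may use a non-empty "title" as the fragment.
            fragment_redirects.setdefault(title, target)
    return fragment_redirects, document_redirect
```

The sketch does not follow chains of redirections; it only collects the redirections that a single document declares.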

Here is an example of a document that has both “normal” links and redirected links:

<html lang="en">
<head>
<title>The oak</title>
<link rel="redirect" id="branch" href="#p1" />
<link rel="redirect" id="root" href="#p2" />
<link rel="stylesheet" href="green.css" />
</head>
<body>
<h1>The oak</h1>
<p>The oak has <a href="#p1">branches</a>…</p>
<p id="p1">Branches of the oak…</p>
<p id="p2">The oak's roots…</p>
</body>
</html>

Assuming the URL of this document is http://example.org/tree/oak, it defines the following two redirects:

Old	New
http://example.org/tree/oak#branch	http://example.org/tree/oak#p1
http://example.org/tree/oak#root	http://example.org/tree/oak#p2

The other <link> and <a> elements do not have “rel=redirect” and thus do not define redirects.

Here is an example of a document with a <base> element:

<html lang="en">
<head>
<title>Redirections</title>
<base href="http://example.org/new/doc"/>
</head>
<body>
<p>This document moved to <a href="">here.</a> The new internal
targets are:</p>
<ul>
<li><a rel="redirect" id="ch1" href="#intro">Introduction</a></li>
<li><a rel="redirect" id="ch2" href="#a-long-day">A long day</a></li>
<li><a rel="redirect" id="ch3" href="#sleep">Sleep</a></li>
</ul>
</body>
</html>

It contains four <a> elements, three of which define redirects. If the document itself is “http://example.org/book”, the redirections are:

Old	New
http://example.org/book#ch1	http://example.org/new/doc#intro
http://example.org/book#ch2	http://example.org/new/doc#a-long-day
http://example.org/book#ch3	http://example.org/new/doc#sleep

Here is an example with some renamed anchors and some anchors redirected to different documents. This might be a document that once contained everything about its subject in one file, but has now been split into three files:

<html lang="en">
<head>
<title>The oak</title>
<link rel="stylesheet" href="green"/>
<link rel="redirect" id="branch" href="branch"/>
<link rel="redirect" id="root" href="other#root"/>
</head>
<body>
<h1>The leaves of an oak</h1>
<p>The leaves are attached to <a href="#branch">branches</a>…</p>
<p>The <a href="#root">root</a> and
the <a href="other#stem">stem</a>…</p>
</body>
</html>

Assuming the URL of this document is “http://example.org/tree/oak”, the redirections are:

Old	New
http://example.org/tree/oak#branch	http://example.org/tree/branch
http://example.org/tree/oak#root	http://example.org/tree/other#root

Note that the document itself contains links to anchors that are redirected.

This document contains three <link> elements that define redirections. The second element is the target of the first, and the first is the target of the third:

<html>
<head>
<link id="gh9" rel="redirect" href="#py"/>
<link id="py" rel="redirect" href="http://example.org/newdoc#pyi"/>
<link id="ye71" rel="redirect" href="#gh9"/>
</head>
<body>
<p>…</p>
</body>
</html>

The three redirections defined by this document thus all have the same target. If the document itself is “http://example.org/doc”:

Old	New
http://example.org/doc#gh9	http://example.org/newdoc#pyi
http://example.org/doc#py	http://example.org/newdoc#pyi
http://example.org/doc#ye71	http://example.org/newdoc#pyi
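
A client could compute such a table by following each redirection until it reaches a URL that is not itself redirected. The Python sketch below is illustrative only. It assumes that all redirections are already available as a mapping from absolute old URLs to absolute new URLs (targets in other documents would have to be fetched first), and both the function name and the hop limit are choices of this example rather than requirements of this specification.

```python
def resolve_chain(redirects, url, max_hops=20):
    """Follow redirections until reaching a URL that is not redirected.

    `redirects` maps absolute old URLs to absolute new URLs. The hop
    limit and the loop check guard against cycles such as A -> B -> A.
    """
    seen = {url}
    for _ in range(max_hops):
        if url not in redirects:
            return url
        url = redirects[url]
        if url in seen:
            raise ValueError("redirection loop at " + url)
        seen.add(url)
    raise ValueError("too many redirections")
```

Applied to the three redirections of the document above, each of the old URLs resolves to http://example.org/newdoc#pyi.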

This document defines not only redirections for fragment identifiers, but also for the document's URL itself:

<html lang="en">
<head>
<title>Redirects</title>
<base href="../new/doc"/>
<link rel="redirect" href="" />
<link rel="redirect" id="branch" href="#p1" />
<link rel="redirect" id="root" href="#p2" />
</head>
<body>
<p>Please, see <a href="">over here</a></p>
</body>
</html>

If this document's URL is http://example.org/old/doc, this defines three redirections:

Old	New
http://example.org/old/doc	http://example.org/new/doc
http://example.org/old/doc#branch	http://example.org/new/doc#p1
http://example.org/old/doc#root	http://example.org/new/doc#p2

This example uses <link> elements with a “title” attribute to redirect fragment identifiers that cannot be specified in an “id” attribute (because such attributes in XHTML must match the syntax of XML identifiers):

<html>
<head>
<base href="../new/fig" />
<link rel="redirect" href=""/>
<link rel="redirect" title="xywh=100,100,200,70" href="#xywh=90,90,200,70"/>
<link rel="redirect" title="xywh=100,200,200,70" href="#xywh=90,190,200,70"/>
</head>
<body>
<p>This image moved to <a href="">here.</a></p>
</body>
</html>

If the document has URL “http://example.org/fig”, the resulting redirections are:

Old	New
http://example.org/fig	http://example.org/new/fig
http://example.org/fig#xywh=100,100,200,70	http://example.org/new/fig#xywh=90,90,200,70
http://example.org/fig#xywh=100,200,200,70	http://example.org/new/fig#xywh=90,190,200,70

Use of rel=redirect in application/xhtml+xml and text/xhtml+xml

This specification recommends new behavior for user agents that render XHTML documents. If a document is determined to be of type application/xhtml+xml or text/xhtml+xml (e.g., because it was delivered over HTTP with a corresponding Content-Type header), such a user agent SHOULD determine the list of redirections defined by the document and then, whenever it is asked to display a fragment, display the result of the redirection instead.

Only redirections for fragments are used: If the document defines a redirection for the URL without a fragment, that redirection is ignored for this purpose.

E.g., a browser typically scrolls a document so that the element with the target anchor is as near to the top of the viewport as possible. If the anchor is redirected to another fragment in the same document, it should scroll to display that latter fragment.

If the target anchor is redirected to another document, it should load and display that other document.

If there is a CSS style rule for ':target' (i.e., to highlight the target of the current URL), that rule matches the element that is the result of the redirection, not the element that defines the redirection. (But if there is a CSS rule for '#p1' and the element with ID p1 is an <a> element with rel=redirect, it is still that <a> element that is styled, and not the element it links to.)

Use of rel=redirect in application/redirect+xml

This specification defines a new Media Type “application/redirect+xml” (see the formal definition below). This is syntactically an XHTML document, but is treated differently.

If a user agent is asked to process a resource identified by a URL (with or without a fragment identifier) and resolving the URL results in a document of type application/redirect+xml, it SHOULD look in that document for the redirection that corresponds to the original URL and process that (unless it is configured not to follow redirections).

If there is no redirection that exactly matches the original URL, but there is a redirection for the URL without a fragment identifier, then the user agent SHOULD use the target of that redirection, without its own fragment ID, if any, and with the fragment ID of the original URL appended.

The redirection in the above cases is considered to be a temporary one (i.e., similar to the 307 code in HTTP). The user agent should assume that next time it resolves the original URL, it may get different redirects.

If there is no redirection that matches exactly and no redirection for the URL without a fragment ID either, the user agent SHOULD signal an error (resource not found) and MAY display the redirection document as an XHTML document.

In this example, a user agent resolves the URL http://example.org/fig.png and gets this document with Media Type application/redirect+xml:

<html>
<head>
<link rel="redirect" href="http://example.org/new.png" />
<link rel="redirect" title="xywh=10,10,50,50"
  href="http://example.org/new.png#xywh=10,60,50,50" />
</head>
<body>
</body>
</html>

If the original resource it wanted was http://example.org/fig.png, it should now resolve http://example.org/new.png instead. If the original resource was http://example.org/fig.png#xywh=10,10,50,50, it should now use http://example.org/new.png#xywh=10,60,50,50 (because of the exact match in the redirection document). If the original resource was http://example.org/fig.png#xywh=1,2,150,50, it should now use http://example.org/new.png#xywh=1,2,150,50 (no exact match, so the original fragment identifier is used again).

One might think that, if the fragment xywh=10,10,50,50 is now found at xywh=10,60,50,50, the slightly smaller fragment xywh=10,10,49,49 will be at xywh=10,60,49,49. However, there is no way to express such a rule and a user agent should not assume any such rule is implied.
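
The lookup described in this section can be sketched as follows. This Python fragment is an illustration under stated assumptions, not a normative algorithm: the function name is hypothetical, the redirections are assumed to have been extracted already into a mapping from fragments to absolute URLs plus an optional document-level target, and the “resource not found” error is represented by a LookupError.

```python
from urllib.parse import urldefrag


def lookup_redirection(original_url, fragment_redirects, document_redirect):
    """Pick the redirection for original_url, or raise LookupError.

    fragment_redirects maps fragments (without '#') to absolute URLs.
    document_redirect is the target for the URL without a fragment, or None.
    """
    _, fragment = urldefrag(original_url)
    if not fragment:
        if document_redirect is not None:
            return document_redirect
    elif fragment in fragment_redirects:
        # Exact match: use the declared target as it is.
        return fragment_redirects[fragment]
    elif document_redirect is not None:
        # No exact match: reuse the original fragment on the new URL.
        target, _ = urldefrag(document_redirect)
        return target + "#" + fragment
    raise LookupError("resource not found: no matching redirection")
```

Fragments are compared as plain strings, so “xywh=10,10,49,49” is not treated as related to “xywh=10,10,50,50”.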

The Media Type is defined as follows:

To: ietf-types@iana.org
Subject: Registration of media type application/redirect+xml

Type name: application

Subtype name: redirect+xml

Required parameters: (none)

Optional parameters: charset

Encoding considerations: Any IANA-registered charset, default is
utf-8.

Security considerations: See below.

Interoperability considerations: Redirection documents can be deployed
on HTTP servers in such a way that they do not affect clients that do
not implement the format. (Those clients obviously do not benefit from
them either.) See explanation in text above. The specification
includes rules for interpreting invalid documents.

Published specification: This document.

Applications that use this media type: Redirection documents are
expected to be useful for Web clients that use HTTP or other protocols
to retrieve resources whose URIs can have fragment IDs.

Additional information: (none)

Magic number(s): none (A redirection document is syntactically an XHTML document. See http://www.iana.org/assignments/media-types/application/xhtml+xml)

File extension(s): .redirect
Macintosh file type code(s): REDI
URI fragment/anchor identifier(s): (none)

Person & email address to contact for further information: Bert Bos
<bert@w3.org>

Intended usage: COMMON

Restrictions on usage: In the case of HTTP, the format should only be
returned with 3XX responses, not with a 200 response.

Author: Bert Bos <bert@w3.org>

Change controller: Bert Bos <bert@w3.org>

Use of rel=redirect with HTTP 3xx responses

This specification defines new behavior for HTTP clients. If a client receives a response with a status code of 301, 302, 303, 307 or 308, and the response includes a body of type application/xhtml+xml or application/redirect+xml, the client SHOULD use that body as described in the previous section (i.e., by treating it as application/redirect+xml), with two exceptions:

The Location header in the HTTP response defines the redirection for the URL without a fragment identifier and any redirection for the URL without a fragment identifier in the body is ignored.
The redirection is treated as defined by the status code, i.e., it is not always treated as a 307 response.

Note that, unlike in the previous section, there is always a redirection for the URL without a fragment identifier and thus never an error.
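
A client might combine the Location header and the response body as in the following Python sketch. It is illustrative only: the function name and the shape of its arguments are assumptions of this example, and the fragment redirections are assumed to have been extracted from the body beforehand. Any document-level redirection in the body is deliberately not passed in, because the Location header takes its place.

```python
from urllib.parse import urldefrag, urljoin

REDIRECT_CODES = {301, 302, 303, 307, 308}
PERMANENT_CODES = {301, 308}


def http_redirect_target(request_url, status, location, fragment_redirects):
    """Return (target_url, is_permanent) for a 3XX response.

    fragment_redirects maps fragments (without '#') to absolute URLs
    taken from the response body.
    """
    if status not in REDIRECT_CODES:
        raise ValueError("not a redirect status: %d" % status)
    base, fragment = urldefrag(request_url)
    location_url = urljoin(base, location)
    if not fragment:
        target = location_url
    elif fragment in fragment_redirects:
        target = fragment_redirects[fragment]
    else:
        # No exact match: append the original fragment to the Location.
        target = urldefrag(location_url)[0] + "#" + fragment
    # The status code, not the body, decides how the redirect is treated.
    return target, status in PERMANENT_CODES
```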

Old/alternative ideas

(A simple, dedicated text format is easier to parse and extend, if necessary.)

The character encoding depends on the protocol with which it is transported from server to client. E.g., in HTTP, the character encoding is ISO-8859-1, unless there is an explicit parameter in the HTTP headers.

If the protocol does not specify a character encoding (e.g., the FTP protocol), the document must be in UTF-8.

The document consists of one or more lines. There are three kinds of lines: a line that contains one or two URI references, a line that contains a comment (starting with “##”), and a line that contains only white space.

The first line must be a comment line that starts with the exact string “##redirect” optionally followed by a space or tab and zero or more arbitrary characters. (This is to help identify the file in the absence of external metadata.)

In formal syntax (see [RFC 2234]):

redirection-document = magic *( comment / empty-line / redirection )
magic = "##redirect" [ 1*WSP *any_char ] NL
comment = *WSP "##" *any_char NL
empty-line = *WSP NL
redirection = *WSP [ from 1*WSP ] to *WSP NL
from = 1*non-space
to = 1*non-space
any_char = %x1-9 / %xB-C / %xE-FFFFF
NL = LF / CR / CRLF
non-space = %x21-FFFFF

Example 1:

##redirect    -*- redirect -*-
## Made by Pete on 2011-08-02

  ## Sections
#sec1.1     chap-1.html#sect-1
#sec1.2     chap-1.html#sect-2
#sec2.1     chap-2.html#sect-1
#sec2.2     chap-2.html#sect-2

  ## Other
book#def1   chap-1.html#def1
book#def2   chap-2.html#def1

Example 2:

##redirect

## Default:
../foo

## Others:
#x1 ../foo#one
#x2 ../foo#two

Comment lines and empty lines must be ignored. Both the from and the to must be URI references. They are relative to a base URI which is the URI of the redirection document itself.

E.g., assume a client wants to retrieve http://example.org/path1#part1. It sends a request for /path1 to the HTTP server at example.org. The server returns a redirection document with this content:

##redirect
#part7   http://info.example.org/path2#p7
#part1   /other/path

From this, the client learns that the fragment #part1 is now a resource on its own, viz., http://example.org/other/path.

A redirection without a from is a redirection for the base URI itself.

Multiple redirections for the same URI are allowed, but all but the first are ignored.

Each error in the syntax causes the line with the error to be ignored. Errors can be caused by input that doesn't match the above grammar, or by from and to tokens that do not match the grammar of URI references. [See ref...]
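
A parser for this text format could look like the following Python sketch. It is an illustration, not a reference implementation: the function name is hypothetical, the input is assumed to be already decoded to a string, and the tokens are not checked against the grammar of URI references (a complete implementation would ignore lines whose tokens fail that check).

```python
import re
from urllib.parse import urljoin


def parse_redirect_text(text, base_url):
    """Map absolute old URLs to absolute new URLs.

    base_url is the URI of the redirection document itself.
    """
    lines = re.split(r"\r\n|\r|\n", text)
    magic = lines[0]
    if not magic.startswith("##redirect"):
        raise ValueError("missing ##redirect magic line")
    rest = magic[len("##redirect"):]
    if rest and rest[0] not in " \t":
        raise ValueError("missing ##redirect magic line")

    redirects = {}
    for line in lines[1:]:
        stripped = line.strip(" \t")
        if not stripped or stripped.startswith("##"):
            continue  # empty line or comment
        tokens = re.split(r"[ \t]+", stripped)
        if len(tokens) == 1:
            old, new = "", tokens[0]  # no "from": redirects the base URI
        elif len(tokens) == 2:
            old, new = tokens
        else:
            continue  # syntax error: ignore this line only
        # Keep only the first redirection for any given URI.
        redirects.setdefault(urljoin(base_url, old), urljoin(base_url, new))
    return redirects
```

For the document above, retrieved from http://example.org/path1, the result maps http://example.org/path1#part1 to http://example.org/other/path.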

Usage with HTTP

An HTTP server may respond with a 3XX redirect with a redirection document in the body of the response. In that case, the server must also include a Content-Type header indicating the media type (“text/redirect,” see below) of the returned body.

If the redirection resource contains redirection lines without a from token, the client must ignore them, in favor of the Location header of the HTTP response.

HTTP clients that understand 3XX responses, but not redirection documents, will simply ignore the redirection document. However, (older) clients that do not understand 3XX responses either may display the redirection document to the user.

Clients must not indicate in their Accept headers that they understand the “text/redirect” type (except as part of a wildcard pattern “text/*” or “*/*”).

ISSUE: Is there any useful meaning we can assign to a request that does include text/redirect in the Accept header?

A redirection document may include redirects for other URIs than the one requested by the client, but it is recommended that clients ignore those when they are returned as part of an HTTP 3XX response. Trusting them may be a security risk, unless the client knows it can trust the server. (The client might trust the server, e.g., if the server is actually a local cache server under the control of the same user as the client.)

HTTP servers must not return a redirection document with a 200 response code. A client that receives a redirection document with a 200 response code should not interpret it as a redirection. (It may instead display it as text or do what it normally does with text documents.)

Missing redirects

The following is not part of the definition of redirection documents, but a general recommendation for clients to handle fragment IDs and redirections.

If a client is looking for foo#bar (where foo is a URL without a fragment ID and #bar is a fragment ID) and the server does not return a redirect for foo#bar, but does return a redirect foo2 for foo, where foo2 is a URL without a fragment ID, then we recommend that the client assume that the redirect for foo#bar is foo2#bar.

E.g., if the client is looking for http://example.com/foo#p4 and the HTTP server at example.com returns

HTTP/1.1 307 Temporary Redirect
Location: http://example.com/bar
Content-Type: text/redirect

##redirect
#p1 bar#part1
#p2 bar#part2
#p3 bar#part3

then the client should assume that the complete redirect is http://example.com/bar#p4, i.e., the redirected URI with the original fragment ID added.

Media Type registration

To: ietf-types@iana.org
Subject: Registration of media type text/redirect

Type name: text

Subtype name: redirect

Required parameters: (none)

Optional parameters: charset

Encoding considerations: Any IANA-registered charset, default is
utf-8, except when transported over HTTP/1.0 or HTTP/1.1, in which
case the default is iso-8859-1.

Security considerations: See below.

Interoperability considerations: Redirection documents can be deployed
on HTTP servers in such a way that they do not affect clients that do
not implement the format. (Those clients obviously do not benefit from
them either.) See explanation in text above. The specification
includes rules for interpreting invalid documents.

Published specification: This document.

Applications that use this media type: Redirection documents are
expected to be useful for Web clients that use HTTP or other protocols
to retrieve resources whose URIs can have fragment IDs.

Additional information: (none)

Magic number(s): “##redirect ” as the first 11 characters of the resource
(in the resource's stated character encoding), where the final
character can be a space, a tab, a line feed or a carriage return.

File extension(s): .redirect
Macintosh file type code(s): REDI
URI fragment/anchor identifier(s): (none)

Person & email address to contact for further information: Bert Bos
<bert@w3.org>

Intended usage: COMMON

Restrictions on usage: In the case of HTTP, the format should only be
returned with 3XX responses, not with a 200 response.

Author: Bert Bos <bert@w3.org>

Change controller: Bert Bos <bert@w3.org>

Security considerations

If a server returns redirects for URIs other than the one the client requested, the client should verify that the server has authority over those other URIs. In case of doubt, it is better to ignore those redirects.
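
How a client verifies authority is not defined here. One conservative approach, shown in the Python sketch below, is to keep only redirects whose old URI has the same scheme, host and port as the URI that was requested. Treating “same origin” as a stand-in for authority is an assumption of this example, as are the function names and the table of default ports.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def origin(url):
    """Return (scheme, host, port), filling in well-known default ports."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def trusted_redirects(redirects, requested_url):
    """Drop redirects for URIs outside the origin of requested_url.

    `redirects` maps absolute old URIs to absolute new URIs.
    """
    asked = origin(requested_url)
    return {old: new for old, new in redirects.items()
            if origin(old) == asked}
```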

Redirects can form loops (URI A redirects to URI B, which redirects back to URI A). A client should take care it does not end up in an infinite loop.

Redirection documents are typically small (one or two KiB), but a server may return an extremely large document as well, on purpose or because of a bug, and the client should take care it does not exhaust local memory.

A server that returns a redirection document may thereby disclose to a client the existence of resources that the client didn't know existed. (But the redirection has no effect on whether the client can access those resources: if, e.g., it needed a password to access a resource, it still needs a password if it tries to access the same resource after a redirect.)

A client that acts on a redirect discloses very little additional information to a server: it discloses the fact that it understands the redirection format and it discloses what fragment ID it was looking for in the first place.

In HTTP, and possibly other protocols, a redirection document may be sent to a client in compressed form. As with all compressed resources, the client should take care that uncompressing the document does not exhaust local memory.
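
One way to bound memory use while uncompressing is sketched below in Python, using the standard zlib module. The function name and the 64 KiB limit are arbitrary choices of this example, and the sketch assumes gzip or zlib (deflate) compression; it does not check whether the compressed stream is complete.

```python
import zlib


def bounded_decompress(data, limit=64 * 1024):
    """Decompress gzip or zlib data, refusing output larger than `limit`."""
    # Adding 32 to the window size lets zlib detect the header format.
    decompressor = zlib.decompressobj(wbits=zlib.MAX_WBITS | 32)
    # Ask for at most one byte more than the limit, so that an oversized
    # document is detected without ever being expanded in full.
    out = decompressor.decompress(data, limit + 1)
    if len(out) > limit:
        raise ValueError("redirection document exceeds size limit")
    return out
```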

The redirection document itself may be protected (e.g., with HTTP authentication) but redirect to resources that are not protected. The client should not send passwords or other sensitive information to the server it is redirected to, unless it knows it is safe to do so. (This is no different from existing redirections in HTTP.)