Copyright © 2008 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
This document provides guidance to content transformation proxies and content providers as to how to inter-work when delivering Web content.
This document is an editors' copy that has no official standing.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
Publication as a Group Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document has been produced by the Content Transformation Task Force of the Mobile Web Best Practices Working Group as part of the Mobile Web Initiative. Please send comments on this document to the Working Group's public email list public-bpwg-ct@w3.org, a publicly archived mailing list.
This document was produced under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of patent disclosures made in connection with this document; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) with respect to this specification must disclose the information in accordance with section 6 of the W3C Patent Policy.
1 Introduction
1.1 Purpose
1.2 Audience
1.3 Scope
2 Terminology
2.1 Types of Proxy
2.2 Types of Transformation
2.3 Interpretation of RFC 2119 Key Words
3 Requirements
3.1 Summary of Requirements
3.2 Control of the Behavior of the Proxy
3.2.1 Control by the User
3.2.2 Control by Server
3.2.3 Control by Administrative or Other Arrangements.
3.2.4 Resolving Conflicting Instructions
4 Behavior of Components
4.1 Proxy Treatment of Request
4.1.1 no-transform directive in Request
4.1.2 Proxy Decision to Transform
4.1.3 Proxy Indication of its Presence to Server
4.1.4 Altering Header Values
4.2 Server Response to Proxy
4.3 Proxy Receipt and Forwarding of Response from Server
4.4 Proxy Response to User Agent
5 Testing
A References
B Scope for Future Work
B.1 POWDER
B.2 link HTTP Header
B.3 Amendments to HTTP
B.4 Inter Proxy Communication
C Acknowledgments
From the point of view of this document, Content Transformation is the manipulation in various ways, by proxies, of requests made to and content delivered by an origin server with a view to making it more suitable for mobile presentation.
The W3C MWI BPWG neither approves nor disapproves of Content Transformation, but recognizes that it is being deployed widely across mobile data access networks. The deployments diverge widely from one another, with many non-standard HTTP implications, and there is no well-understood means either of identifying the presence of such transforming proxies or of controlling their actions. This document establishes a framework to allow that to happen.
The overall objective of this document is to provide a means, as far as is practical, for users to be provided with at least a "functional user experience" [Device Independence Glossary] of the Web, when mobile, taking into account the fact that an increasing number of content providers create experiences specially tailored to the mobile context which they do not wish to be altered by third parties. Equally it takes into account the fact that there remain a very large number of Web sites that do not provide a functional user experience when perceived on many mobile devices.
The document describes how the origin server may request conforming transforming proxies not to alter HTTP requests and responses, as well as describing control options that a transforming proxy offers users.
A more extensive discussion of the requirements for these guidelines can be found in "Content Transformation Landscape" [CT Landscape].
The audience for this document is creators of Content Transformation proxies, purchasers and operators of such proxies and content providers whose services may be accessed by means of such proxies.
The recommendations in this document refer only to "Web browsing" - i.e. access by user agents that are intended primarily for interaction by users with HTML Web pages (Web browsers).
Note:
The document is not intended as guidelines for delivery of WAP/WML. Some of its recommendations may, in some circumstances, disrupt the delivery of WML.
The BPWG is not chartered to create new technology; its role is to advise on best practice for use of existing technology. In satisfying Content Transformation requirements, existing HTTP headers, directives and behaviors must be respected, and as far as is practical, no extensions to [RFC 2616 HTTP] are to be used.
Alteration of HTTP requests and responses is not prohibited by HTTP other than in the circumstances referred to in [RFC 2616 HTTP] Section 13.5.2.
HTTP defines two types of proxy: transparent proxies and non-transparent proxies. As discussed in [RFC 2616 HTTP] Section 1.3, Terminology:
[Definition: "A transparent proxy is a proxy that does not modify the request or response beyond what is required for proxy authentication and identification."]
[Definition: "A non-transparent proxy is a proxy that modifies the request or response in order to provide some added service to the user agent, such as group annotation services, media type transformation, protocol reduction, or anonymity filtering."] "Except where either transparent or non-transparent behavior is explicitly stated, the HTTP proxy requirements apply to both types of proxies."
This document elaborates the behavior of non-transparent proxies, when used for Content Transformation in the context discussed in [CT Landscape].
Editorial Note: The BPWG requests feedback on the degree to which it is necessary to distinguish between Content Transformation proxies that interact with user agents using HTTP, and other types of arrangements where a (proprietary) client application interacts with an in-network component using other techniques.
Transforming proxies can carry out a wide variety of operations. In this document we categorize these operations as follows:
Alteration of Requests
Transforming proxies process requests in a number of ways, especially replacement of various request headers to avoid HTTP 406 Status responses (the server can not provide content in the format requested) and at user request.
Alteration of Responses
There are three classes of operation on responses:
Restructuring content
[Definition: Restructuring content is a process whereby the original layout is altered so that content is added or removed or where the spatial or navigational relationship of parts of content is altered, e.g. by linearization or pagination. It includes also rewriting of URIs so that subsequent requests route via the proxy.]
Recoding content
[Definition: Recoding content is a process whereby the layout of the content remains the same, but details of its encoding may be altered. Examples include re-encoding HTML as XHTML, correcting invalid markup in HTML, conversion of images between formats (but not, for example, reducing animations to static images). ]
Optimizing content
[Definition: Optimizing content includes removing redundant white space, re-compressing images (without loss of fidelity) and compressing for transfer.]
This document is not normative and it is inappropriate to claim conformance to it. Implementors of this Recommendation who wish to promote effective interoperability of Web content will, however, interpret the key words must, must not, required, shall, shall not, should, should not, recommended, may, and optional in this Recommendation as described in [RFC 2119].
The purpose of this section is to summarize the communication requirements of actors (transforming proxies, origin servers, and to some extent users) to communicate with each other. The relevant scenario involving a content transformation proxy is as follows:
Note:
The interactions of several transformation proxies are encompassed by this document, but only in a rudimentary form.
The needs of these actors are as follows:
The user agent needs to be able to tell the Content Transformation proxy and the origin server:
what type of mobile device and what user agent is being used;
that all Content Transformation should be avoided.
The Content Transformation proxy needs to be able to tell the origin server:
that some degree of Content Transformation (restructuring and recoding) can be performed;
that Content Transformation will be carried out unless requested not to;
that content is being requested on behalf of something else and what that something else is;
that the request headers have been altered (e.g. additional content types inserted).
The origin server needs to be able to tell the Content Transformation proxy:
that it varies its presentation according to device type and other factors;
that it is permissible (or not) to perform Content Transformation of various kinds;
that it has media-specific representations;
that it is unable or unwilling to deal with the request in its present form.
The Content Transformation proxy needs to be able to tell the user agent:
that it has applied transformations of various kinds to the content.
The Content Transformation proxy needs to be able to interact with the user:
to allow the user to disable its features;
to alert the user to the fact that it has transformed content and to allow access to an untransformed representation of the content.
A transforming proxy as described in this document must offer a level of control to users and to origin servers with which it communicates.
Transforming proxies should provide to their users:
an indication that the content being viewed has been transformed for mobile presentation;
an option to view the original, unmodified content.
They may also provide the ability for their users to make a persistent or semi-persistent expression of preferences. Examples of such settings are "Never transform", "Always request desktop presentation", "Transform only if necessary to avoid mis-operation" and "Compress where possible".
Editorial Note: The BPWG is studying how to clarify the scope of "persistent" and "semi-persistent".
Transforming proxies must allow origin servers to control the Content Transformation process. The control mechanisms include use of HTTP conventions as discussed in the following section (4 Behavior of Components).
The preferences of users and of servers may be ascertained by means outside the scope of this document, for example:
the use by transforming proxies of a disallow list of Web sites for which Content Transformation is known to diminish the user experience of content or be ineffective;
the use by transforming proxies of an allow list of Web sites for which Content Transformation is known to be necessary;
terms and conditions of service, as agreed between the user and the Content Transformation service provider.
Note:
Allow and disallow lists generally cause intractable problems for content providers since there is no mechanism for them to establish which lists they should be on, nor any generic mechanism through which they can check or change their status.
There is the possibility for conflict between the desires of the content provider and the desires of users of that content. This document sets out to provide a framework within which, for matters of presentation, the desires of the content provider are usually accommodated but, where necessary, the user may expressly override those desires with their preferences.
no-transform directive in Request
If the request contains a Cache-Control: no-transform directive the proxy must forward the request unaltered to the server, other than to comply with transparent HTTP behavior and in particular to add a Via HTTP header.
Note:
An example of the use of Cache-Control: no-transform is the issuing of asynchronous HTTP requests, perhaps by means of XMLHttpRequest [XHR], which may include such a directive in order to prevent transformation of both the request and the response.
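A minimal sketch of how a proxy might detect the directive before deciding whether it may alter a request. The function name and the plain-dict representation of headers are assumptions made for illustration; the parsing is simplified and does not handle quoted directive arguments that themselves contain commas.

```python
def has_no_transform(headers):
    """Return True if the Cache-Control header carries a no-transform directive.

    `headers` is assumed to be a dict keyed by canonical header names.
    """
    value = headers.get("Cache-Control", "")
    # Directive names are case-insensitive; split the comma-separated list.
    directives = [part.strip().lower() for part in value.split(",")]
    return "no-transform" in directives


request_headers = {"Cache-Control": "no-cache, no-transform"}
# When this is True the request is forwarded unaltered, apart from
# transparent behavior such as adding a Via header.
must_forward_unaltered = has_no_transform(request_headers)
```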
Irrespective of the presence of a no-transform directive:
the proxy should add the IP address of the initiator of the request to the end of a comma separated list in an X-Forwarded-For HTTP header;
the proxy must behave transparently unless it is able positively to determine that the user agent is a Web browser. The mechanism by which the proxy recognizes the user agent as a Web browser should use evidence from the HTTP request, in particular the User-Agent and Accept headers.
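The X-Forwarded-For handling described above could look roughly like the following. This is only a sketch: the function name and the dict-based header model are illustrative assumptions, and the addresses are documentation examples.

```python
def append_forwarded_for(headers, initiator_ip):
    """Return a copy of `headers` with `initiator_ip` appended to X-Forwarded-For."""
    out = dict(headers)
    existing = out.get("X-Forwarded-For", "").strip()
    # Append to the end of the comma separated list, or start a new list.
    out["X-Forwarded-For"] = f"{existing}, {initiator_ip}" if existing else initiator_ip
    return out


forwarded = append_forwarded_for({"X-Forwarded-For": "192.0.2.1"}, "198.51.100.7")
```

The input dict is copied rather than modified, so the received headers remain available if the proxy later needs them.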
If there is no no-transform directive present in the request the proxy should analyze whether it intends to offer transformation services by referring to:
any knowledge it has of user preferences;
any knowledge it has of user agent capabilities (including linearization and zoom);
any prior knowledge it has of server behavior, derived from previous interaction with the server - and in particular to preserve the consistency of user experience across a sequence of related requests;
the HTTP method of the request.
Proxies should not alter HTTP requests unless:
unaltered headers would result in the user's request being rejected by the origin server;
an unaltered request body is not consistent with the origin server's requirements in respect of Internet content type or character encoding (as may happen, for example, if the proxy has transformed an HTML form that results in this request);
the user has specifically requested a restructured version of a desktop presentation.
Note:
Rejection of a request by a server is taken to mean both a HTTP 406 Status as well as HTTP 200 Status, with content indicating that the request cannot be handled - e.g. "Your browser is not supported"
Editorial Note: The BPWG is studying heuristics for determining when a response with a 200 Status should be treated as a response with a 406 Status.
Proxies should not intervene in methods other than GET, POST, HEAD and PUT.
User agents sometimes issue HTTP HEAD requests in order to determine if a resource is of a type and/or size that they are capable of handling. A transforming proxy may convert a HEAD request into a GET request if it requires the response body to determine the characteristics of the transformed response that it would return were the user agent subsequently to issue a GET request for that content.
In this case, the proxy should (providing such action is in accordance with normal HTTP caching rules) cache the response so that it is not required to send a second GET request to the server.
If, as a result of carrying out this analysis the proxy remains unaware of the server's preferences and capabilities it should:
Issue a request with unaltered headers and examine the response (see 4.4 Proxy Response to User Agent);
If it is still in doubt, issue a request with altered headers
The proxy must not issue a POST/PUT request with altered headers when the response to the unaltered POST/PUT request has HTTP status code 200 (in other words, it may only send the altered request for a POST/PUT request when the unaltered one was refused with a 406 status).
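The ordering rule above (unaltered request first, altered request only if doubt remains, and for POST/PUT only after a 406) can be expressed as a small predicate. The function and parameter names are hypothetical; how a proxy decides that it is "still in doubt" is not defined here and is passed in as a flag.

```python
def may_send_altered_request(method, unaltered_status, still_in_doubt):
    """Decide whether a follow-up request with altered headers is permitted.

    `unaltered_status` is the status code returned for the unaltered request.
    """
    if not still_in_doubt:
        return False
    if method.upper() in ("POST", "PUT"):
        # POST/PUT may be re-sent altered only if the unaltered one got a 406.
        return unaltered_status == 406
    return True
```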
The theoretical idempotency of GET requests is not always respected by servers. In order, as far as possible, to avoid mis-operation of such content, proxies should avoid issuing duplicate requests and specifically should not issue duplicate requests for comparison purposes.
The proxy must (in accordance with RFC 2616) include a Via HTTP header indicating its presence.
The proxy should indicate its presence and capabilities by including a comment in the Via HTTP header consisting of the URI "http://www.w3.org/2008/04/ct".
Editorial Note1k: Need to put something at the end of the rainbow in case the URI is ever resolved.
When forwarding Via headers proxies should not alter them in any way.
Note:
According to [RFC 2616 HTTP] Section 14.45, Via header comments "may be removed by any recipient prior to forwarding the message". However, the justification for removing such comments is based on memory limitations of early proxies, and most modern proxies do not suffer such limitations.
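As an illustration, a proxy might build its Via entry with the URI as a parenthesized comment and append it after any Via entries it received, leaving those untouched. The host name, the function name and the dict-based header model are assumptions for the example.

```python
CT_URI = "http://www.w3.org/2008/04/ct"  # URI given in this section


def add_via(headers, received_by, protocol="1.1"):
    """Return a copy of `headers` with this proxy appended to the Via header."""
    out = dict(headers)
    # RFC 2616 Via syntax: received-protocol, received-by, optional (comment).
    entry = f"{protocol} {received_by} ({CT_URI})"
    existing = out.get("Via")
    # Received entries (including their comments) are forwarded unchanged.
    out["Via"] = f"{existing}, {entry}" if existing else entry
    return out


via_headers = add_via({"Via": "1.0 upstream.example (cache)"}, "ct-proxy.example")
```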
If it has altered HTTP headers the proxy must include copies of the unaltered headers in the request in the form "X-Device-"<original header name>. For example, if it has altered the User-Agent header, an X-Device-User-Agent header must be added with the value of the received User-Agent header.
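A short sketch of the X-Device- convention, assuming headers are held in a dict. The function name and the example User-Agent strings are made up; the treatment of headers that the proxy inserts but that were absent from the original request (no X-Device- copy is made) is an assumption, since the text does not address that case.

```python
def alter_headers(original, replacements):
    """Apply `replacements` and preserve each altered value as X-Device-<name>."""
    out = dict(original)
    for name, new_value in replacements.items():
        old_value = original.get(name)
        if old_value is not None and old_value != new_value:
            # Keep a copy of the value as received from the user agent.
            out["X-Device-" + name] = old_value
        out[name] = new_value
    return out


received = {"User-Agent": "PhoneBrowser/1.0", "Accept": "text/html"}
sent = alter_headers(received, {"User-Agent": "DesktopBrowser/5.0"})
```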
Servers should respond with a 406 HTTP Status if a request cannot be satisfied with content that meets the criteria specified by values of request HTTP headers (and not a 200 Status).
If it is capable of varying its presentation it should take account of user agent capabilities and formulate an appropriate experience according to those criteria.
If the server varies its presentation according to examination of received HTTP headers then it must include a Vary HTTP header indicating this to be the case. If, in addition to, or instead of HTTP headers, the server varies its presentation on other factors (source IP Address ...) then it must, in accordance with [RFC 2616 HTTP], include a Vary header containing the value '*'.
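On the server side, the choice of Vary value described above might be sketched as follows. The function and parameter names are hypothetical; returning None to mean "send no Vary header" is a convention of this example only.

```python
def vary_value(request_headers_used, other_factors=False):
    """Return the Vary header value for a response, or None if none is needed."""
    if other_factors:
        # Selection depends on something other than request headers.
        return "*"
    # Remove duplicates while keeping the order in which names were given.
    names = list(dict.fromkeys(request_headers_used))
    return ", ".join(names) if names else None
```

For example, a server that selects its presentation from User-Agent and Accept alone would send `Vary: User-Agent, Accept`, whereas one that also looks at the source IP address would send `Vary: *`.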
If the server has distinct presentations that vary according to presentation media, then the medium for which the presentation is intended should be indicated.
Editorial Note: The BPWG is studying the use of the link element of HTML which is used for this purpose. It is noted that the link element is not available in formats other than HTML, and it is noted that there is currently active discussion about the use of the Link HTTP header, which would serve this purpose well.
If the server creates a specific user experience according to device characteristics or presentation media types it should inhibit transformation of the response by including a Cache-Control: no-transform directive.
The server must include a Cache-Control: no-transform directive if one is received from the user agent.
Note:
Including a Cache-Control: no-transform directive can disrupt the behavior of WAP/WML proxies, because it can inhibit such proxies from converting WML to WMLC.
Note:
If the server is unable to add a Cache-Control HTTP header, it may, in HTML documents, add a meta HTTP-Equiv element containing Cache-Control: no-transform.
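The server-side rules for no-transform can be sketched as a function that computes the response Cache-Control value. All names here are illustrative assumptions, and the markup shown for the meta fallback is the conventional HTML form rather than something this document prescribes.

```python
# Conventional markup for the meta fallback (an assumption, see lead-in).
META_NO_TRANSFORM = '<meta http-equiv="Cache-Control" content="no-transform">'


def response_cache_control(request_cache_control, device_specific, existing=""):
    """Return the Cache-Control value the server should send.

    no-transform is added when the request carried it or when the response
    is tailored to the device or presentation medium.
    """
    directives = [d.strip() for d in existing.split(",") if d.strip()]
    requested = "no-transform" in [
        d.strip().lower() for d in request_cache_control.split(",")
    ]
    present = any(d.lower() == "no-transform" for d in directives)
    if (requested or device_specific) and not present:
        directives.append("no-transform")
    return ", ".join(directives)
```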
Servers may base their actions on knowledge of behavior of specific transforming proxies, as identified in a Via header, but should not choose a Content-Type for their response based on their assumptions about the heuristic behavior of any intermediaries (e.g. a server should not choose Content-Type: application/vnd.wap.xhtml+xml solely on the basis that it suspects that proxies will not transform content of this type).
If HTTP header fields were altered in the request then the proxy must be prepared to re-issue the request in an unaltered form on receipt of a Vary header in the response indicating that the server offers variants of its presentation according to any of the HTTP header fields that have been modified.
Editorial Note: The BPWG is aware that more precision may be needed in the above statement. If a transforming proxy has followed the guidelines in this document, then it should not receive a response with a Vary header if it has not already received such a response to a request with unaltered headers.
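One way to read the Vary rule above is as a set intersection between the field names listed in Vary and the fields the proxy altered. The sketch below uses hypothetical names; how a Vary value of '*' should be treated is not stated in the text, so this example only matches explicitly named fields.

```python
def should_reissue_unaltered(vary_header, altered_header_names):
    """Return True if the response varies on a header the proxy altered."""
    # Header field names are case-insensitive.
    varied = {name.strip().lower() for name in vary_header.split(",") if name.strip()}
    altered = {name.lower() for name in altered_header_names}
    # Note: a Vary value of "*" names no field and therefore never matches here.
    return bool(varied & altered)
```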
If the response includes a Warning: 214 Transformation Applied the proxy must not apply further transformation.
If the response is an HTML response and it contains a <link rel="alternate" media="handheld" /> element, the CT-proxy SHOULD request and return the referenced resource, unless the resource referenced is the current resource (1k) as determined by [unresolved discussion] ....
In the absence of a Vary or no-transform directive (or a meta HTTP-Equiv element containing Cache-Control: no-transform) the proxy should apply heuristics to the content to determine whether it is appropriate to restructure or recode it (in the presence of such directives, heuristics should not be used).
Examples of heuristics:
The server has previously shown that it is contextually aware, even if the present response does not indicate this;
the Content-Type or other aspects of the response are known to be specific to the device or class of device (e.g. for HTML documents the DOCTYPE is "-//OMA//DTD XHTML Mobile 1.2//EN", "-//WAPFORUM//DTD XHTML Mobile 1.1//EN", "-//WAPFORUM//DTD XHTML Mobile 1.0//EN", "-//W3C//DTD XHTML Basic 1.1//EN" or "-//W3C//DTD XHTML Basic 1.0//EN");
the user agent has linearization or zoom capabilities or other features which allow it to present the content unaltered;
the URI of the response (following redirection or as indicated by the Content-Location HTTP header) indicates that the resource is intended for mobile use (e.g. the domain is *.mobi, wap.*, m.*, mobile.* or the leading portion of the path is /m/ or /mobile/);
the response contains client-side scripts that may mis-operate if the resource is restructured;
the response is an HTML response and it includes <link> elements specifying alternatives according to presentation media type.
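Two of the heuristics listed above, the DOCTYPE check and the URI check, lend themselves to a short sketch. The function names are hypothetical, the DOCTYPE test is a plain substring search rather than a real HTML parse, and the patterns are limited to the examples given in the list.

```python
from urllib.parse import urlsplit

# Public identifiers listed in this section as examples of mobile DOCTYPEs.
MOBILE_DOCTYPES = (
    "-//OMA//DTD XHTML Mobile 1.2//EN",
    "-//WAPFORUM//DTD XHTML Mobile 1.1//EN",
    "-//WAPFORUM//DTD XHTML Mobile 1.0//EN",
    "-//W3C//DTD XHTML Basic 1.1//EN",
    "-//W3C//DTD XHTML Basic 1.0//EN",
)


def has_mobile_doctype(document_start):
    """Return True if the start of the document names a mobile DOCTYPE."""
    return any(doctype in document_start for doctype in MOBILE_DOCTYPES)


def uri_suggests_mobile(uri):
    """Return True if the host or leading path matches the example patterns."""
    parts = urlsplit(uri)
    labels = (parts.hostname or "").lower().split(".")
    if labels[-1] == "mobi":
        return True  # *.mobi
    if len(labels) > 1 and labels[0] in ("wap", "m", "mobile"):
        return True  # wap.*, m.*, mobile.*
    path = parts.path.lower()
    return path.startswith("/m/") or path.startswith("/mobile/")
```

A match on either function would count as evidence against restructuring; neither is conclusive on its own.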
A proxy should strive for the best possible user experience that the user agent supports. It should only alter the format, layout, dimensions etc. to match the specific capabilities of the user agent. For example, when resizing images, they should only be reduced so that they are suitable for the specific user agent, and this should not be done on a generic basis.
If the proxy alters the content then it must add a Warning: 214 Transformation Applied HTTP header.
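A sketch of adding and detecting the 214 warning, assuming dict-based headers. The value follows the RFC 2616 Warning syntax (warn-code, warn-agent, quoted warn-text); the agent name and function names are illustrative, and the detection regex is a simplification that could be misled by a quoted warn-text containing ", 214 ".

```python
import re

# Matches a 214 warn-code at the start of any comma separated warning value.
_WARN_214 = re.compile(r"(?:^|,)\s*214\s")


def add_transformation_warning(headers, agent="ct-proxy.example"):
    """Return a copy of `headers` with a 214 warning appended."""
    out = dict(headers)
    value = f'214 {agent} "Transformation applied"'
    existing = out.get("Warning")
    out["Warning"] = f"{existing}, {value}" if existing else value
    return out


def already_transformed(headers):
    """Return True if the response already carries a 214 warning."""
    return bool(_WARN_214.search(headers.get("Warning", "")))
```

A downstream proxy would call `already_transformed` on a received response and, if it returns True, apply no further transformation.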
If the proxy alters the content then the altered content should validate according to an appropriate published formal grammar.
If the response contains links whose URIs have the scheme https, the proxy may rewrite them so that it can transform the content only if it meets the following provision. If the proxy does rewrite such links, it must advise the user of the security implications of doing so and must provide the option to avoid decryption and transformation of the resources the links refer to.
If the response includes a Cache-Control: no-transform directive then the response must remain unaltered other than to comply with transparent HTTP behavior and other than as noted below.
If the proxy determines that the resource as currently represented is likely to cause serious mis-operation of the user agent then it may, with the user's explicit prior consent, warn the user and provide links to both transformed and unaltered versions of the resource.
The BPWG believes that POWDER [POWDER] will represent a powerful mechanism by which a server may express transformation preferences. Future work in this area may recommend the use of POWDER.
link HTTP Header
The BPWG believes that the link HTTP header, which was removed from recent drafts of HTTP and which is under discussion for re-introduction, would represent a more general and flexible mechanism than use of the HTML link element, as discussed in this recommendation.
The BPWG believes that amendments to HTTP are needed to improve the interoperability of transforming proxies. For example, HTTP does not provide a way to distinguish between prohibition of any kind of transformation, and the prohibition only of restructuring (and not recoding or compression). At present HTTP does not provide a mechanism for recoding altered header values.
The editors acknowledge contributions of various kinds from members of the MWI BPWG Content Transformation Task Force.
The editor acknowledges significant written contributions from: