Copyright © 2008 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
This document provides guidance to content transformation proxies and content providers as to how to inter-work when delivering Web content.
This document is an editors' copy that has no official standing.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
Publication as a Group Working Draft of a proposed normative Recommendation does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document has been produced by the Content Transformation Task Force of the Mobile Web Best Practices Working Group as part of the Mobile Web Initiative. Please send comments on this document to the Working Group's public email list public-bpwg-ct@w3.org, a publicly archived mailing list.
This document was produced under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of patent disclosures made in connection with this document; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) with respect to this specification must disclose the information in accordance with section 6 of the W3C Patent Policy.
1 Introduction
1.1 Purpose
1.2 Audience
1.3 Scope
1.4 Summary of Requirements
2 Terminology
2.1 Types of Proxy
2.2 Types of Transformation
2.3 Interpretation of RFC 2119 Key Words
3 Behavior of Components
3.1 Proxy Forwarding of Request
3.1.1 Applicable HTTP Methods
3.1.2 no-transform directive in Request
3.1.3 Treatment of Requesters that are not Web browsers
3.1.4 Serving Cached Responses
3.1.5 Alteration of HTTP Header Values
3.1.5.1 Content Tasting
3.1.5.2 Avoiding "Unacceptable" Responses
3.1.5.3 User Selection of Restructured Experience
3.1.5.4 Sequence of Requests
3.1.5.5 Original Headers
3.1.6 Additional HTTP Headers
3.1.6.1 Proxy Treatment of Via Header
3.2 Server Response to Proxy
3.2.1 Use of HTTP 406 Status
3.2.2 Server Origination of Cache-Control: no-transform
3.2.3 Varying Representations
3.2.3.1 Use of Vary HTTP Header
3.2.3.2 Indication of Intended Presentation Media Type of Representation
3.3 Proxy Forwarding of Response to User Agent
3.3.1 Receipt of Cache-Control: no-transform
3.3.2 Receipt of Warning: 214 Transformation Applied
3.3.3 Server Rejection of HTTP Request
3.3.4 Receipt of Vary HTTP Header
3.3.5 Link to "handheld" Representation
3.3.6 Proxy Decision to Transform
3.3.6.1 Alteration of Response
3.3.6.2 HTTPS Link Re-writing
4 Testing
A References
B Example Transformation Interactions (Non-Normative)
B.1 Basic Operation
B.2 Optimization based on Previous Server Interaction
B.3 Optimization based on Previous Server Interaction, Server has Changed its Operation
B.4 Server Response Indicating that this Representation is Intended for the Target Device
B.5 Server Response Indicating that another Representation is Intended for the Target Device
C Applicability to Transforming Solutions which are Out of Scope (Non-Normative)
D Scope for Future Work (Non-Normative)
D.1 POWDER
D.2 link HTTP Header
D.3 Sources of Device Information
D.4 Inter Proxy Communication
D.5 Amendment to and Refinement of HTTP
E Acknowledgments (Non-Normative)
From the point of view of this document, Content Transformation is the manipulation in various ways, by proxies, of requests made to and content delivered by an origin server with a view to making it more suitable for mobile presentation.
The W3C MWI BPWG neither approves nor disapproves of Content Transformation, but recognizes that it is being deployed widely across mobile data access networks. The deployments diverge widely from one another, with many non-standard HTTP implications, and there is no well-understood means either of identifying the presence of such transforming proxies or of controlling their actions. This document establishes a framework to allow that to happen.
The overall objective of this document is to provide a means, as far as is practical, for users to be provided with at least a "functional user experience" [Device Independence Glossary] of the Web, when mobile, taking into account the fact that an increasing number of content providers create experiences specially tailored to the mobile context which they do not wish to be altered by third parties. Equally it takes into account the fact that there remain a very large number of Web sites that do not provide a functional user experience when perceived on many mobile devices.
The audience for this document is creators of Content Transformation proxies, purchasers and operators of such proxies and content providers whose services may be accessed by means of such proxies.
The recommendations in this document refer only to "Web browsing" - i.e. access by user agents that are intended primarily for interaction by users with HTML Web pages (Web browsers) using HTTP. Clients that interact with proxies using mechanisms other than HTTP (and that typically involve the download of a special client) are out of scope, and are considered to be a distributed user agent. Proxies which are operated in the control of or under the direction of the operator of an origin server are similarly considered to be a distributed origin server and hence out of scope.
Note:
The document is not intended as guidelines for delivery of WAP/WML. Some of its recommendations may, in some circumstances, disrupt the delivery of WML.
The BPWG is not chartered to create new technology - its role is to advise on best practice for use of existing technology. In satisfying Content Transformation requirements, existing HTTP headers, directives and behaviors must be respected, and as far as is practical, no extensions to [RFC 2616 HTTP] are to be used.
This section summarizes what the actors (users, user agents, transforming proxies and origin servers) need to communicate to each other. It is recognised that several transformation proxies may be present but their interactions are not discussed in detail. The relevant scenario is as follows:
The needs of these actors are as follows:
The user agent needs to be able to tell the Content Transformation proxy and the origin server:
what type of mobile device and which user agent is being used;
that all Content Transformation should be avoided.
The Content Transformation proxy needs to be able to tell the origin server:
that some degree of Content Transformation (restructuring and recoding) can be performed;
that content is being requested on behalf of something else and what that something else is;
that the request headers have been altered and what the original ones were.
The origin server needs to be able to tell the Content Transformation proxy:
that it varies the representation of its responses according to device type and other factors;
that it is not permissible to perform Content Transformation;
that it has media-specific representations;
that it is unable or unwilling to deal with the request in its present form.
The Content Transformation proxy needs to be able to tell the user agent:
that it has applied transformations of various kinds to the content.
The Content Transformation proxy needs to be able to interact with the user:
to allow the user to disable its features;
to alert the user to the fact that it has transformed content and to allow access to an untransformed representation of the content.
Note:
A more extensive discussion of the requirements for these guidelines can be found in "Content Transformation Landscape" [CT Landscape].
Alteration of HTTP requests and responses is not prohibited by HTTP other than in the circumstances referred to in [RFC 2616 HTTP] Section 13.5.2.
HTTP defines two types of proxy: transparent proxies and non-transparent proxies. As discussed in [RFC 2616 HTTP] Section 1.3, Terminology:
[Definition: "A transparent proxy is a proxy that does not modify the request or response beyond what is required for proxy authentication and identification."] [Definition: "A non-transparent proxy is a proxy that modifies the request or response in order to provide some added service to the user agent, such as group annotation services, media type transformation, protocol reduction, or anonymity filtering."] "Except where either transparent or non-transparent behavior is explicitly stated, the HTTP proxy requirements apply to both types of proxies."
This document elaborates the behavior of non-transparent proxies, when used for Content Transformation in the context discussed in [CT Landscape].
Transforming proxies can carry out a wide variety of operations. In this document we categorize these operations as follows:
Alteration of Requests
Transforming proxies process requests in a number of ways, especially by replacing various request headers, either to avoid HTTP 406 Status responses (if a server cannot provide content that is compatible with the original HTTP request headers) or at user request.
Alteration of Responses
There are three classes of operation on responses:
Restructuring content
[Definition: Restructuring content is a process whereby the original layout is altered so that content is added or removed or where the spatial or navigational relationship of parts of content is altered, e.g. by linearization or pagination. It includes also rewriting of URIs so that subsequent requests route via the proxy handling this response.]
Recoding content
[Definition: Recoding content is a process whereby the layout of the content remains the same, but details of its encoding may be altered. Examples include re-encoding HTML as XHTML, correcting invalid markup in HTML, conversion of images between formats (but not, for example, reducing animations to static images). ]
Optimizing content
[Definition: Optimizing content includes removing redundant white space, re-compressing images (without loss of fidelity) and compressing for transfer.]
The key words must, must not, required, shall, shall not, should, should not, recommended, may, and optional in this Recommendation have the meaning defined in [RFC 2119].
Editorial Note1l: @@TODO need to have a conformance statement and re-introduce text where sections are non-normative?
Proxies should not intervene in methods other than GET, POST, HEAD and PUT.
User agents sometimes issue HTTP HEAD requests in order to determine if a resource is of a type and/or size that they are capable of handling. A transforming proxy may convert a HEAD request into a GET request if it requires the response body to determine the characteristics of the transformed response that it would return were the user agent subsequently to issue a GET request for that content.
If the HTTP method is altered from HEAD to GET, proxies should (providing such action is in accordance with normal HTTP caching rules) cache the response so that a second GET request for the same content is not required.
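The two rules above on applicable methods can be sketched as follows. This is a minimal illustration only: the function names and the `needs_body` flag are assumptions made for the example and do not come from this document.

```python
# Methods in which the guidelines allow a transforming proxy to intervene.
INTERVENTION_METHODS = {"GET", "POST", "HEAD", "PUT"}


def may_intervene(method: str) -> bool:
    """Return True if the proxy may intervene in a request using this method."""
    return method.upper() in INTERVENTION_METHODS


def outbound_method(method: str, needs_body: bool) -> str:
    """Convert HEAD to GET only when the proxy needs the response body to
    work out the characteristics of the transformed response.
    `needs_body` is a hypothetical flag supplied by the caller."""
    if method.upper() == "HEAD" and needs_body:
        return "GET"
    return method.upper()
```

A proxy that converts HEAD to GET in this way would then cache the GET response, where normal HTTP caching rules permit, so that a later GET from the user agent does not trigger a second request to the server.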
no-transform directive in Request
If the request contains a Cache-Control: no-transform directive, proxies must forward the request unaltered to the server, other than to comply with transparent HTTP behavior and as noted below (see 3.1.6 Additional HTTP Headers).
Note:
An example of the use of Cache-Control: no-transform is the issuing of asynchronous HTTP requests, perhaps by means of XMLHttpRequest [XHR], which may include such a directive in order to prevent transformation of both the request and the response.
Proxies must act as though a no-transform directive is present (see 3.1.2 no-transform directive in Request) unless they are able positively to determine that the user agent is a Web browser. The mechanism by which a proxy recognizes the user agent as a Web browser should use evidence from the HTTP request, in particular the User-Agent and Accept headers.
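A minimal sketch of the request-side check might look like the following. The header lookup and the no-transform test follow the text above; the `looks_like_browser` heuristic is purely hypothetical, since this document does not prescribe how a proxy recognizes a Web browser beyond using evidence such as the User-Agent and Accept headers.

```python
def header(headers: dict, name: str) -> str:
    """Case-insensitive header lookup; returns "" when the header is absent."""
    for key, value in headers.items():
        if key.lower() == name.lower():
            return value
    return ""


def has_no_transform(headers: dict) -> bool:
    """True if the Cache-Control header carries a no-transform directive."""
    directives = [d.strip().lower()
                  for d in header(headers, "Cache-Control").split(",")]
    return "no-transform" in directives


def looks_like_browser(headers: dict) -> bool:
    """Hypothetical heuristic (an assumption for illustration only):
    a browser-like User-Agent token plus an Accept header mentioning HTML."""
    user_agent = header(headers, "User-Agent").lower()
    accept = header(headers, "Accept").lower()
    return ("mozilla" in user_agent or "opera" in user_agent) and "html" in accept


def must_forward_unaltered(headers: dict) -> bool:
    """Act as though no-transform is present unless the requester is
    positively identified as a Web browser."""
    return has_no_transform(headers) or not looks_like_browser(headers)
```

Note the default: a requester that cannot be identified is treated exactly as if it had sent no-transform.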
Proxies should follow standard HTTP procedures in respect of caching and should use cached copies of resources where this is in accordance with those procedures.
In some circumstances, proxies may paginate responses and where this is the case a request may be for a subsequent page of a previously requested resource.
Editorial Note1l: This hasn't been discussed before, what should we say about serving subsequent parts of a response that is set to "no-cache" or whose expiry time has elapsed?
Other than to comply with transparent HTTP operation, proxies should not modify request headers unless:
the user would be prohibited from accessing content as a result of the server responding that the request is "unacceptable" (see 3.3.3 Server Rejection of HTTP Request);
the user has specifically requested a restructured desktop experience;
the request is part of a sequence of requests to the same Web site and either it is technically infeasible not to adjust the request because of earlier interaction, or because doing so preserves consistency of user experience.
Note:
In this section, the concept of "Web site" is used (rather than "origin server") as some origin servers host many different Web sites. Since the concept of "Web site" is not strictly defined, proxies should use heuristics including comparisons of domain name to assess whether resources form part of the same "Web site".
These circumstances are detailed in the following sections.
The theoretical idempotency of GET requests is not always respected by servers. In order, as far as possible, to avoid mis-operation of such content, proxies should avoid issuing duplicate requests and specifically should not issue duplicate requests for comparison purposes.
A proxy may reissue a request with altered HTTP header values if a previous request with unaltered values resulted in the origin server rejecting the request as "unacceptable" (see 3.3.3 Server Rejection of HTTP Request). A proxy may apply heuristics of various kinds to assess, in advance of sending unaltered header values, whether the request is likely to cause a "request unacceptable" response. If it determines that this is likely then it may alter header values without sending unaltered values in advance, providing that it subsequently assesses the response as described under (@@@ below) and is prepared to reissue the request with unaltered headers, and alter its subsequent behavior in respect of the Web site so that unaltered headers are sent.
A proxy must not issue a POST/PUT request with altered headers when the response to the unaltered POST/PUT request has HTTP status code 200 (in other words, it may only send the altered request for a POST/PUT request when the unaltered one resulted in an HTTP 406 response, and not a "request unacceptable" response).
Proxies may offer users an option to choose to view a restructured experience even when a Web site offers a choice of user experience. If a user has made such a choice then proxies may alter header values when requesting resources in order to reflect that choice, but must, on receipt of an indication from a Web site that it offers alternative representations (see @@ use of link header), inform the user of that and allow them to select an alternative representation.
Proxies should assume that by default users will wish to receive a representation prepared by the Web site. Proxies must assess whether a user's expressed preference is still valid if a Web site changes its choice of representations (see @@).
Editorial Note1l: Do we need something in here about how a proxy should not stand in the way of the Web site offering the user direct choice of representation?
When requesting resources that form part of the representation of a resource (e.g. style sheets, images), proxies should make the request for such resources with the same headers as the request for the resource from which they are referenced.
Proxies may, for the sake of consistency, request linked resources that form part of the same Web site as a previously requested resource with the same headers as the resource from which they are referenced.
When requesting linked resources that do not form part of the same Web site as the resource from which they are linked, proxies should not base their choice of headers on a consistency of presentation premise.
When forwarding an HTTP request with altered HTTP headers, proxies must include in the request copies of the unaltered header values in the form "X-Device-"<original header name>. For example, if the User-Agent header has been altered, an X-Device-User-Agent header must be added with the value of the received User-Agent header.
Note:
The X-Device- prefix was chosen primarily on the basis that this is an already existing convention. It is noted that the values encoded in such headers may not ultimately derive from a device; they are merely received headers. The treatment of received X-Device headers, which may occur where there are multiple transforming proxies, is undefined (see D Scope for Future Work).
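The X-Device- convention described above can be sketched as a small helper. The function name and the dictionary representation of headers are assumptions made for illustration; header names are matched by exact spelling to keep the example short.

```python
def alter_headers(received: dict, replacements: dict) -> dict:
    """Return the headers to forward: replacements are applied, and each
    altered header's received value is preserved under
    "X-Device-" + <original header name>."""
    forwarded = dict(received)
    for name, new_value in replacements.items():
        if name in received and received[name] != new_value:
            # Keep a copy of the unaltered value for the origin server.
            forwarded["X-Device-" + name] = received[name]
        forwarded[name] = new_value
    return forwarded
```

For example, replacing a received `User-Agent` of `PhoneBrowser/1.0` (a made-up value) yields a forwarded request that carries both the new `User-Agent` and `X-Device-User-Agent: PhoneBrowser/1.0`. Headers that are not altered gain no X-Device- copy.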
Irrespective of the presence of a no-transform directive:
proxies should add the IP address of the initiator of the request to the end of a comma separated list in an X-Forwarded-For HTTP header;
proxies must include a Via HTTP header (see 3.1.6.1 Proxy Treatment of Via Header).
Proxy Treatment of Via Header
Proxies must (in compliance with RFC 2616) include a Via HTTP header indicating their presence and should indicate their conformance to this Recommendation by including a comment in the Via HTTP header consisting of the URI "http://www.w3.org/ns/ct".
When forwarding Via headers proxies should not alter them in any way.
Note:
According to [RFC 2616 HTTP] Section 14.45, Via header comments "may be removed by any recipient prior to forwarding the message". However, the justification for removing such comments is based on memory limitations of early proxies; most modern proxies do not suffer such limitations.
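As a sketch of the additional headers discussed above, the following helper appends the initiator's IP address to X-Forwarded-For and adds a Via entry carrying the conformance URI as a comment. The function name, the `proxy_host` parameter and the "1.1" protocol version are assumptions for illustration; header names are matched by exact spelling.

```python
CT_URI = "http://www.w3.org/ns/ct"


def add_proxy_headers(headers: dict, client_ip: str, proxy_host: str) -> dict:
    """Return a copy of `headers` with X-Forwarded-For and Via updated."""
    out = dict(headers)

    # Append the initiator's IP address to the end of the comma separated list.
    prior_xff = out.get("X-Forwarded-For")
    out["X-Forwarded-For"] = f"{prior_xff}, {client_ip}" if prior_xff else client_ip

    # Leave received Via values untouched and append this proxy's own entry,
    # with the conformance URI as a comment in parentheses.
    own_via = f"1.1 {proxy_host} ({CT_URI})"
    prior_via = out.get("Via")
    out["Via"] = f"{prior_via}, {own_via}" if prior_via else own_via
    return out
```

With a received `Via: 1.0 upstream` and a hypothetical proxy host `ct.example.net`, the forwarded value becomes `1.0 upstream, 1.1 ct.example.net (http://www.w3.org/ns/ct)`.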
Servers should respond with an HTTP 406 Status (and not an HTTP 200 Status) if a request cannot be satisfied with content that meets the criteria specified by values of the HTTP request headers.
Server Origination of Cache-Control: no-transform
Servers must include a Cache-Control: no-transform directive if one is received in the HTTP request.
Servers should include a Cache-Control: no-transform directive if, for any reason, they wish to inhibit transformation of the response.
Note:
Including a Cache-Control: no-transform directive can disrupt the behavior of WAP/WML proxies, because it can inhibit such proxies from converting WML to WMLC.
Note:
If a server is unable to add a Cache-Control HTTP header, in HTML documents it should add a meta HTTP-Equiv element containing Cache-Control: no-transform.
Editorial Note1l: I think this is here to be symmetrical with the section below on proxies taking account of it, however, this really applies to legacy servers that can do nothing else, so does it belong here? How much could a legacy server actually comply with this spec? Perhaps, if anywhere, this needs to be moved to an appendix on "legacy servers" to accompany the other "things that out of scope components should do anyway?"
Servers should take account of user agent capabilities and formulate an appropriate experience according to those capabilities. Servers should provide a means for users to select among available representations, should default to the last selected representation and should provide a means of changing the selection.
Editorial Note1l: Above wording changed and elaborated from 1k, to be consistent with discussion at F2F and to proposed Best Practice in BP2. We did not resolve a mechanism by which servers should indicate that they are offering user choice in this way.
Use of Vary HTTP Header
If a server varies its representation according to examination of received HTTP headers then it must include a Vary HTTP header indicating this to be the case. If, in addition to, or instead of HTTP headers, a server varies its representation based on other factors (e.g. source IP Address) then it must, in accordance with [RFC 2616 HTTP], include a Vary header containing the value '*'.
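The rule for choosing a Vary value can be illustrated with a short sketch. The function and parameter names are assumptions made for the example.

```python
from typing import List, Optional


def vary_value(headers_examined: List[str], uses_other_factors: bool) -> Optional[str]:
    """Choose the Vary header value for a response.

    headers_examined: request headers the server inspected when selecting
        the representation.
    uses_other_factors: True if selection also (or only) depended on
        something other than request headers, e.g. the source IP address.
    """
    if uses_other_factors:
        return "*"
    if headers_examined:
        return ", ".join(headers_examined)
    return None  # the representation does not vary; no Vary header needed
```

A server that selects on `User-Agent` and `Accept` alone would send `Vary: User-Agent, Accept`; as soon as any non-header factor is involved, the value becomes `*`.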
Servers may base their actions on knowledge of behavior of specific transforming proxies, as identified in a Via header, but should not choose an Internet content type for a response based on an assumption or heuristics about behavior of any intermediaries (e.g. a server should not choose Content-Type: application/vnd.wap.xhtml+xml solely on the basis that it suspects that proxies will not transform content of this type).
If a server has distinct representations that vary according to the target presentation media type, it should inhibit transformation of the response by including a Cache-Control: no-transform directive (see 3.2.2 Server Origination of Cache-Control: no-transform).
In HTML content it should indicate the medium for which the representation is intended by including link elements as follows:
Include a link element identifying the target presentation media types of this representation by setting the media attribute and set the href attribute to a valid local reference (i.e. use the fragment identifier added to the URI of the document being served to point to a valid target within the document);
Include link elements identifying the target presentation media types of other available representations by setting the media attribute to indicate those representations and set the href attribute to a URI without a fragment identifier.
Note:
The presence of the second usage of the link element discussed above in the absence of a link element conforming to the first usage does not indicate that this representation is or is not formatted for the presentation media types listed.
Note:
Some examples of the use of the link element are included below in B Example Transformation Interactions.
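The two usages of the link element can be sketched with a small generator. This is an illustration only: the function name is hypothetical, and the use of rel="alternate" is an assumption borrowed from the element quoted under 3.3.5 Link to "handheld" Representation.

```python
from html import escape
from typing import List, Tuple


def link_elements(this_media: str, this_fragment: str,
                  alternatives: List[Tuple[str, str]]) -> List[str]:
    """Build link elements for an HTML document head.

    this_media / this_fragment: media type(s) of the representation being
        served and the id of a valid target within the same document.
    alternatives: (media, uri) pairs for other available representations;
        these URIs must not carry a fragment identifier.
    """
    # First usage: local reference marks this representation's own media.
    lines = [f'<link rel="alternate" media="{escape(this_media)}" '
             f'href="#{escape(this_fragment)}" />']
    # Second usage: one element per other available representation.
    for media, uri in alternatives:
        if "#" in uri:
            raise ValueError("alternative URIs must not contain a fragment identifier")
        lines.append(f'<link rel="alternate" media="{escape(media)}" '
                     f'href="{escape(uri)}" />')
    return lines
```

For a handheld representation with a desktop alternative, this produces one element with `href="#top"` (assuming the document contains a target named `top`) and one pointing at the desktop URI.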
If the response includes a Cache-Control: no-transform directive then the response must remain unaltered other than to comply with transparent HTTP behavior and other than as follows.
If a proxy determines that the resource as currently represented is likely to cause serious mis-operation of the user agent then it may advise the user that this is the case and must provide the option for the user to continue with unaltered content.
If the response includes a Warning: 214 Transformation Applied HTTP header, proxies must not apply further transformation.
For compatibility with servers that do not implement this Recommendation (see 3.2.1 Use of HTTP 406 Status), a proxy may treat responses with an HTTP 200 Status as though they were responses with an HTTP 406 Status if it has determined that the content (e.g. "Your browser is not supported") is equivalent to one with an HTTP 406 Status.
Receipt of Vary HTTP Header
If, in response to an HTTP request with altered headers that was not preceded by an HTTP request with unaltered headers, a proxy receives a response containing a Vary header referring to one of the altered headers then it should request the resource again and update whatever heuristics it uses so that unaltered headers are always presented first for this resource.
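The check described above might be sketched as follows. The function and parameter names are assumptions for illustration. The treatment of `Vary: *` is not spelled out in the text above, so this sketch deliberately leaves it unhandled.

```python
from typing import Iterable, Optional


def should_rerequest_unaltered(vary: Optional[str],
                               altered_headers: Iterable[str],
                               unaltered_sent_first: bool) -> bool:
    """Return True if the proxy should request the resource again
    because the response's Vary header names a header that the proxy
    altered without first trying the unaltered value."""
    if unaltered_sent_first or not vary:
        return False
    varied = {field.strip().lower() for field in vary.split(",")}
    return any(name.lower() in varied for name in altered_headers)
```

When this returns True, the proxy would also update its heuristics so that unaltered headers are presented first for this resource in future.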
If the response is an HTML response and it contains a <link rel="alternate" media="handheld" /> element, the proxy should request and process the referenced resource, unless the resource referenced is the current resource as determined by the presence of link elements as discussed under 3.2.3.2 Indication of Intended Presentation Media Type of Representation.
In the absence of a Vary or no-transform directive (or a meta HTTP-Equiv element containing Cache-Control: no-transform) proxies should apply heuristics to the response to determine whether it is appropriate to restructure or recode it (in the presence of such directives, heuristics should not be used).
Examples of heuristics:
The Web site (@@sic) has previously shown that it is contextually aware, even if the present response does not indicate this;
a claim of mobileOK Basic™ ??? conformance is indicated;
the Content-Type or other aspects of the response are known to be specific to the device or class of device (e.g. for HTML documents the DOCTYPE is "-//OMA//DTD XHTML Mobile 1.2//EN", "-//WAPFORUM//DTD XHTML Mobile 1.1//EN", "-//WAPFORUM//DTD XHTML Mobile 1.0//EN", "-//W3C//DTD XHTML Basic 1.1//EN" or "-//W3C//DTD XHTML Basic 1.0//EN");
the user agent has linearization or zoom capabilities or other features which allow it to present the content unaltered;
the URI of the response (following redirection or as indicated by the Content-Location HTTP header) indicates that the resource is intended for mobile use (e.g. the domain is *.mobi, wap.*, m.*, mobile.* or the leading portion of the path is /m/ or /mobile/);
the response contains client-side scripts that may mis-operate if the resource is restructured;
the response is an HTML response and it includes <link> elements specifying alternatives according to presentation media type.
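Two of the heuristics listed above, the URI patterns and the DOCTYPE identifiers, lend themselves to a short sketch. The function names are assumptions for illustration, and the pattern and identifier lists simply mirror the examples given in the list; a real proxy would likely combine these with the other heuristics.

```python
from urllib.parse import urlsplit

# Public identifiers named in the list above as specific to mobile devices.
MOBILE_DOCTYPES = {
    "-//OMA//DTD XHTML Mobile 1.2//EN",
    "-//WAPFORUM//DTD XHTML Mobile 1.1//EN",
    "-//WAPFORUM//DTD XHTML Mobile 1.0//EN",
    "-//W3C//DTD XHTML Basic 1.1//EN",
    "-//W3C//DTD XHTML Basic 1.0//EN",
}


def uri_suggests_mobile(uri: str) -> bool:
    """True if the host or leading path segment matches the example patterns."""
    parts = urlsplit(uri)
    host = (parts.hostname or "").lower()
    path = parts.path.lower()
    return (host.endswith(".mobi")
            or host.startswith(("wap.", "m.", "mobile."))
            or path.startswith(("/m/", "/mobile/")))


def doctype_suggests_mobile(public_id: str) -> bool:
    """True if the DOCTYPE public identifier is one of the listed mobile ones."""
    return public_id in MOBILE_DOCTYPES
```

A match on either test suggests that the response is already intended for mobile presentation and so should probably be left alone.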
A proxy should strive for the best possible user experience that the user agent supports. It should only alter the format, layout, dimensions etc. to match the specific capabilities of the user agent. For example, when resizing images, they should only be reduced so that they are suitable for the specific user agent, and this should not be done on a generic basis.
If a proxy alters the response then it must add a Warning: 214 Transformation Applied HTTP header.
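Adding the warning, and detecting one added by an earlier proxy (see 3.3.2 Receipt of Warning: 214 Transformation Applied), can be sketched as follows. The function names and the `agent` parameter are assumptions for illustration. The value layout follows the warn-code, warn-agent, quoted warn-text form of [RFC 2616 HTTP], and splitting on commas is a simplification that ignores commas inside quoted text.

```python
def add_transformation_warning(headers: dict, agent: str) -> dict:
    """Return a copy of `headers` with a 214 warning appended."""
    out = dict(headers)
    value = f'214 {agent} "Transformation applied"'
    prior = out.get("Warning")
    out["Warning"] = f"{prior}, {value}" if prior else value
    return out


def already_transformed(headers: dict) -> bool:
    """True if any Warning value carries warn-code 214."""
    warning = headers.get("Warning", "")
    return any(part.strip().startswith("214 ") for part in warning.split(","))
```

A proxy would call `already_transformed` on a received response before deciding to transform it, and `add_transformation_warning` after altering it.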
If a proxy alters a response body then the altered content should validate according to an appropriate published formal grammar.
If the response contains links whose URIs have the scheme https, the proxy may rewrite them so that it can transform the content only if it meets the following provisions.
If a proxy does rewrite such links, it must advise the user of the security implications of doing so and must provide the option to avoid decryption and transformation of the resources the links refer to.
If the proxy re-writes HTTPS links, replacement links must have the scheme https.
Editorial Note1l: @@todo: some examples of common scenarios
Request resource with original headers
- if the response is a 406 response, re-request with altered headers (unless the 406 response contains no-transform)
- if the response is a 200 response
-- if the response contains vary: User-Agent, an appropriate link element or header, or no-transform, forward it
-- otherwise assess (by unspecified means) whether the 200 response is a bogus one
--- if it is not, forward it
--- if it is, re-request with altered headers
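The decision outline above can be expressed as a small function. This is a sketch under stated assumptions: the function and parameter names are hypothetical, the "bogus 200" assessment is passed in as a flag because the outline leaves its means unspecified, and status codes other than 200 and 406 are simply forwarded, which the outline does not address.

```python
def next_action(status: int, vary: str, cache_control: str,
                has_link_alternatives: bool, looks_bogus: bool) -> str:
    """Decide what to do with the response to a request that was sent
    with the original headers. Returns "forward" or "re-request"
    (the latter meaning: request again with altered headers)."""
    no_transform = "no-transform" in cache_control.lower()
    if status == 406:
        # Re-request unless the 406 response itself carries no-transform.
        return "forward" if no_transform else "re-request"
    if status == 200:
        varies_on_user_agent = "user-agent" in vary.lower()
        if varies_on_user_agent or has_link_alternatives or no_transform:
            return "forward"
        # Otherwise rely on the (unspecified) assessment of a bogus 200.
        return "re-request" if looks_bogus else "forward"
    # Assumption: other status codes are outside the outline; forward them.
    return "forward"
```

For instance, a 200 response with `Vary: User-Agent` is forwarded even if its body looks like a rejection page, because the server has signalled that it is aware of the user agent.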
There are a number of well-known examples of solutions that seem to their users as though they are using a browser, but because the client software communicates with an in-network component using proprietary protocols and techniques, it is the combination of the client and the in-network component that is regarded as the HTTP User Agent. The communication between the client and the in-network component is therefore out of scope of this document.
Additionally, where some kind of administrative arrangement exists between a transforming proxy and an origin server for the purposes of transforming content on the origin server's behalf, this is also out of scope of this document.
In both of the above cases, it is recommended that, when forwarding requests to origin servers, proxies adhere to the provisions of this document in respect of providing information about the device and the original IP address.
The BPWG believes that POWDER will represent a powerful mechanism by which a server may express transformation preferences. Future work in this area may recommend the use of POWDER to provide a mechanism for origin servers to indicate more precisely which alternatives they have and what transformation they are willing to allow on them, and in addition to provide for Content Transformation proxies to indicate which services they are able to perform.
Editorial Note: Wondering if this needs to go per discussion on mobileOK
link HTTP Header
The BPWG believes that the link HTTP header, which was removed from recent drafts of HTTP and which is under discussion for re-introduction, would represent a more general and flexible mechanism than use of the HTML link element, as discussed in this recommendation.
The process of adapting content at the origin server, or transforming it in a proxy is likely to have a dependency on a repository of device descriptions. An origin server's willingness to allow a transforming proxy to transform content may depend on its evaluation of the trustworthiness of device description data that is being used. There is scope for enhancement of the trust relationship by some means of indicating this.
There is scope for further work to define how multiple proxies may inter-operate. A common case of multiple proxies is where a network provider transforming proxy and a search engine transforming proxy are both present.
The BPWG believes that amendments to HTTP are needed to improve the interoperability of transforming proxies. For example, HTTP does not provide a way to distinguish between prohibition of any kind of transformation, and the prohibition only of restructuring (and not recoding or compression). At present HTTP does not provide a mechanism for recoding altered header values.
Editorial Note1l: I can't remember what the last sentence means.
A number of mechanisms exist in HTTP which might be exploited given more precise definition of their operation, for example the OPTIONS method and the HTTP 300 (Multiple Choices) Status.
The editor acknowledges contributions of various kinds from members of the Mobile Web Best Practices Working Group Content Transformation Task Force.
The editor acknowledges significant written contributions from: