Copyright © 2007 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
This document is the Guidelines referred to in the Charter of the W3C Mobile Web Initiative Best Practices Working Group Content Transformation Task Force.
Its purpose is to provide guidance to implementors of components of the delivery context as to how to communicate their intentions and capabilities in respect of content transformation.
This document is an editors' copy that has no official standing.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
Publication as a Group Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document has been produced by the Content Transformation Task Force of the Mobile Web Best Practices Working Group as part of the Mobile Web Initiative. Please send comments on this document to the Working Group's public email list public-bpwg-ct@w3.org, a publicly archived mailing list.
This document was produced under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of patent disclosures made in connection with this document; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) with respect to this specification must disclose the information in accordance with section 6 of the W3C Patent Policy.
Work needed on this draft:
1. Work Bryan's suggestions into the text
I think the following require mention at least as heuristics, if not recommended practice as they do play a role:
a) a priori knowledge of device characteristics, as gleaned from a DDR; b) administrative arrangements, white lists etc.; c) heuristics, such as knowing which content types and DTDs are specifically mobile, looking for the presence of "handheld" in style sheets and @media attributes, looking for mobileOK labels; d) User interaction
In reference to one of Bryan's contributions, user interaction needs more thought and discussion - on the one hand we don't want to interrupt the user experience with excise tasks, yet on the other, in the end, the user must act to signal their intentions and this needs noting. E.g. there could be a note that the host should provide interactions that allow the user to have a choice of presentations and so should the proxy and the client, for that matter.
Another as yet unopened Pandora's box is that the discussion and proposed text below looks at the issues primarily from the point of view of "varying presentation from Thematically consistent URIs". What hasn't, as yet, been explored is how it all works if there is a common entry point to a site (Thematically consistent URI for a home page) which then dispatches via redirect to media specific versions. This is possibly rather more common than the previous case (e.g. redirect to example.com/mobile - or rather better, imo, example.mobi). Naturally, there will also be varying presentation even within a redirected solution. This whole area needs further thought.
Whatever we come up with does of course have to deal with conforming and non conforming and transforming and non-transforming proxies. There isn't, as yet, a use case analysis, it is a bit too soon for that, I think.
The philosophy here should be in line with existing HTTP practice, which is to fall back to safe behavior. Thus, when trying to distinguish reformatting behavior from recoding behavior, the objective is to fall back to "safe" known HTTP/1.1 practice for non-conforming (unaware) participants and say things like:
Cache-Control: no-transform, allow-reencode
as this will result in a stricter interpretation by unaware participants. This behavior is discussed in detail in HTTP section 14.9.6 (reproduced below in this note for your convenience and see Sean's detailed list of references to points in the HTTP spec that need to be included also).
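The fall-back argument can be illustrated with a minimal Python sketch. It assumes that allow-reencode is a cache-extension token that HTTP/1.1 does not define, and the function names are illustrative only. A participant that does not know the extension recognizes only no-transform and therefore applies the stricter interpretation.

```python
def parse_cache_control(value):
    """Split a Cache-Control field value into {directive: argument or None}.

    Simplification: splits on commas, so quoted arguments that contain
    commas are not handled.
    """
    directives = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, arg = item.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def may_reencode(value, understands_extension):
    """Return True if a proxy may re-encode the entity body.

    'allow-reencode' is a hypothetical extension directive. A proxy that
    does not understand it ignores it and simply honors no-transform.
    """
    directives = parse_cache_control(value)
    if "no-transform" not in directives:
        return True
    return understands_extension and "allow-reencode" in directives


header = "no-transform, allow-reencode"
print(may_reencode(header, understands_extension=False))  # unaware proxy
print(may_reencode(header, understands_extension=True))   # aware proxy
```

The unaware proxy never sees a weaker instruction than no-transform, which is the safe behavior the paragraph above asks for.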
This, of course, immediately introduces the question as to whether we are overstepping the mark in introducing such extensions, and I think we need to be clear about that before going further. On the one hand HTTP makes it clear, in explaining how to introduce extensions, that it expects such extensions to be introduced. On the other hand, we do typically take a conservative approach and say if it is not in the IANA registry then it's not an existing protocol and therefore beyond our scope. Introducing extensions to existing header values, to my mind, falls short of introducing new headers, though it's not clear that we can do what we need to without introducing new headers, going through IANA registration and so on.
I think that we are going to need to do that and suggest we speak to this point tomorrow on our call, if necessary by joining forces with a group that is actually chartered to "invent new protocols". The alternative being a much more insipid document that only gets to a small subset of the problem.
I'd also like to bring the group's attention to the following RFCs:
RFC 2506 Media Feature Tag Registration Procedure
RFC 2295 Transparent Content Negotiation
RFC 2296 Remote Variant Selection Algorithm
RFC 2295 is experimental, but actually gets to some of the points we want to make, though doesn't exactly address what we are doing. It's rather a lengthy and detailed read, and has a lot of features that we don't need. It does, however, introduce a couple of headers and field values which have been IANA registered. Also, the main points of the negotiation are implemented in Apache in mod_negotiation (see [APACHE]).
[APACHE] http://httpd.apache.org/docs/2.2/content-negotiation.html
IANA registration is probably a bit of a nuisance, and may be something we don't need to do - e.g. it would seem that the q parameter for content type and much else is not registered. For those of you who fancy a bit of train spotting, I think you'll find registered things at [IANA], though I confess I find this all a bit impenetrable and difficult to navigate.
[IANA] http://www.iana.org/numbers.html
I have tried to take into account the contributions and discussions on the list, especially those threads starting at the following points. Some are quite lengthy threads and can be followed with the "Next in Thread" link:
Magnus's original proposal for 2.1 [1], elaborated in the text below
[1] http://lists.w3.org/Archives/Public/public-bpwg-ct/2007Sep/att-0014/00-part
Sean Patterson's original proposal for 2.3 [2], points included in the text and included verbatim
[2] http://lists.w3.org/Archives/Public/public-bpwg-ct/2007Sep/0029.html
Aaron's contribution for section 2.3 [3], points included in the text and included verbatim
[3] http://lists.w3.org/Archives/Public/public-bpwg-ct/2007Sep/0025.html
Pointer to ISSUE-222 TAG Finding on Alternative Representations
[4] http://lists.w3.org/Archives/Public/public-bpwg-ct/2007Oct/0011.html
Pointer to ISSUE-223 (Jo's CT Shopping List): Various Items to Consider for the CT Guidelines
[5] http://lists.w3.org/Archives/Public/public-bpwg-ct/2007Oct/0012.html
Pointer to ACTION-575 Techniques for Guidelines Document
[6] http://lists.w3.org/Archives/Public/public-bpwg-ct/2007Oct/0023.html
Scope of CT Guidelines
[7] http://lists.w3.org/Archives/Public/public-bpwg-ct/2007Oct/0041.html
3. Find a solution to how to communicate the original user agent headers. This can be done in at least the following ways:
a) by including X-Original-xxx headers;
b) by including a request body that contains either the original headers or the headers the proxy would have sent if it had replaced the original headers;
c) by requesting twice, first with the original headers, then if permitted/desirable with the modified headers [understood the limitations wrt GET only, but that is true of any of these solutions, I think], then caching the inference as to whether to modify headers in future requests;
d) by decorating the modified headers to allow inference as to what their previous value was (e.g. as suggested in the draft sent as response to ACTION-581, adding a parameter where a content type has been inserted in the Accept header).
1 Introduction
1.1 Purpose
1.2 Scope
2 Requirements
2.1 general requirements
2.1.1 preferences
2.1.2 provision of highest-quality content
2.1.3 detection of CT-awareness
2.1.4 user agent identification and capabilities disclosure
2.1.5 original representation availability
2.2 CT proxy serving CT-unaware CP and browser
2.2.1 CT-unaware browser user selection of content representation
2.2.2 CT-unaware browser user selection of original content representation
2.2.3 CT-unaware browser user selection of alternate content representation
2.3 CT proxy serving CT-aware CP and CT-unaware browser
2.3.1 CP directives
2.4 CT proxy capabilities disclosure to CP
2.5 CT proxy serving CT-aware CP and CT-aware browser
2.5.1 browser directives
2.5.2 CT proxy capabilities disclosure to CT-aware browser
2.5.3 CT actions disclosure to CT-aware browser
2.5.4 CT-aware browser selection of original content representation
2.5.5 CT-aware browser selection of alternate content representation
2.6 security considerations
2.7 non-browser user agents
3 Guidelines
3.1 Objectives
3.2 Types of Proxy
3.3 Types of Transformation
3.4 Alteration of HTTP Requests and Responses
3.5 Control by Client/User
3.6 Control by Server
4 Behavior of Components
4.1 Client Request to Proxy
4.2 Proxy Request to Server
4.2.1 Alternative 1
4.2.2 Alternative 2
4.3 Server Response to Proxy
4.4 Proxy Receipt of Response from Server
4.5 Proxy Response to Client
4.6 Client Action on Receipt of Response
4.7 Encoding of [@@new] Features
5 Use Case Analysis
6 Testing
7 Conformance
A Scope for Future Work (Non-Normative)
B References (Non-Normative)
C Acknowledgments (Non-Normative)
The background to the requirements for these guidelines is discussed in "Content Transformation Landscape" [CT-Landscape]. This document provides ...
These are pasted verbatim from Bryan's original note
CT = Content Transformation, CP = Content Provider(s)
An entity that is "CT-aware" is assumed to be specifically designed to use or provide CT service per these guidelines. A "CT proxy" is assumed to be CT-aware. A "non-CT proxy" is assumed to be CT-unaware. Browsers and CP may be CT-aware or CT-unaware.
A CT proxy may enable a user or user agent to select preferences for CT service features.
A CT proxy that offers preference selection shall be capable of retaining the selections.
When selecting a content representation by default, CT proxies shall provide the highest-quality representation compatible with the browser.
"Compatible" in this requirement means a representation that the browser supports, and results in a usable user experience.
A CT proxy shall be capable of detecting CT-awareness in CP and browsers.
A CT proxy may enable a user to select preferences for user agent identification and capabilities disclosure to CP.
A CT proxy shall forward requests to CP without affecting user agent identification or capabilities disclosure, except as necessary to provide a user-selected content representation, or as otherwise specified by user preferences.
A CT proxy shall be capable of providing CT service to CT-unaware CP and browsers.
A CT proxy may enable a CT-unaware browser user to select a preference for a content representation from among those available through the proxy.
A CT proxy that offers user-selection of content representations should be capable of user selection of such preferences for specific domains and globally for all domains.
A CT proxy that offers user-selection of content representations should be capable of offering the user the ability to switch representations when viewing a page.
A CT proxy should support the ability of a CT-unaware browser user to select the original representation for a CP response.
A CT proxy shall support the disclosure of available alternate representations for a CP response to a CT-unaware browser user.
A CT proxy shall support the ability of a CT-unaware browser user to select an alternate representation for a CP response.
A CT proxy shall recognize and honor CP directives for supported CT services.
As an exception to the previous requirement, a CT proxy should deny CP directives that would result in dangerous markup being sent to the browser.
A CT proxy may enable a user to select preferences for error handling related to CP directives.
A CT proxy shall disclose its CT capabilities to CT-aware CP without affecting user agent identification or capabilities disclosure.
A CT proxy shall recognize and honor browser directives for supported CT services.
As an exception to the previous requirement, a CT proxy should deny browser directives that would result in dangerous markup being sent to the browser.
A CT proxy shall disclose its CT capabilities to CT-aware browsers without affecting CP-provided headers.
A CT proxy shall disclose CT actions taken on CP responses to CT-aware browsers.
A CT proxy shall support the disclosure of the original representation for a CP response to a CT-aware browser.
A CT proxy shall support the ability of a CT-aware browser to select the original representation for a CP response.
A CT proxy shall support the disclosure of available alternate representations for a CP response to a CT-aware browser.
A CT proxy shall support the ability of a CT-aware browser to select an alternate representation for a CP response.
A CT proxy shall not rewrite secure links as a way to enable CT service for those links, without the consent of the CP and user.
A CT proxy that does not support or is not allowed to provide CT service for secure links should disclose to the user that the CT service will be unavailable for those links.
From here on, taken from text in email response to ACTION-581
The purpose of this section is to explore the need for actors (clients, proxy servers, gateways, origin servers, etc) to communicate with each other, and also suggest guidelines for doing so. The relevant scenario involving a content transformation proxy is as follows:
client browser <---HTTP---> content transformation proxy <---HTTP---> origin server
There may be other scenarios as well but they will initially be ignored for the sake of simplicity. The needs of these three actors are as follows:
The client browser needs to be able to tell the content transformation proxy:
what media-type (presentation format e.g. desktop, handheld) is desired.
that all content transformation should be avoided, or that reformatting is allowed/desired
what type of mobile device and what user agent is being used
that the device has (zoom, linearize, keyhole) presentation capabilities [@@??]
The content transformation proxy needs to be able to tell the origin server:
that some degree of content transformation (re-coding and reformatting) can be performed
that content transformation will be carried out unless the proxy is instructed not to.
that content is being requested on behalf of something else.
about the delivery context (for example mobile device type and user agent).
that the request headers have been altered (e.g. additional content types inserted) [??]
The origin server needs to be able to tell the content transformation proxy:
that content is already optimized and no additional transformation is required (or that it should not be restructured but may be recoded)
that it's OK to perform additional content transformation.[??]
that it varies its presentation
that it has media-specific presentations
that it can't or doesn't wish to handle this request in its present form
that request headers should/should not be modified
The content transformation proxy needs to be able to tell the client browser:
the status of the content: it is reformatted/recoded/untouched;
where to find the original content if it has been transformed. [@@ should this read "how", or do we suppose that there are "magic" mechanisms/URIs for by-passing proxies?]
In satisfying these requirements existing HTTP headers and directives and behaviors must be respected. However, not all of the features required can be achieved without extensions to the behaviors defined in [RFC 2616]. Knowing that many actors will be unaware of any HTTP extensions, special consideration needs to go into making sure that the fall-back behavior - i.e. strict adherence to HTTP/1.1 - is "safe". For example, if there is no standard way for a client browser to specify that all content transformation should be avoided in a request, then we must define a default behavior for a well-behaved content transformation proxy that receives a request from such a client.
[@@ other principles behind what we are trying to do - e.g. noting Sean's point that there is a wide diversity of different devices that all fall under the simple appellation of "handheld".]
HTTP defines two types of proxy: transparent proxies and non-transparent proxies. As discussed in Section 1.3 [HTTP], Terminology:
A "transparent proxy" is a proxy that does not modify the request or response beyond what is required for proxy authentication and identification. A "non-transparent proxy" is a proxy that modifies the request or response in order to provide some added service to the user agent, such as group annotation services, media type transformation, protocol reduction, or anonymity filtering. Except where either transparent or non-transparent behavior is explicitly stated, the HTTP proxy requirements apply to both types of proxies.
This document elaborates the behaviour of non-transparent proxies when used for content transformation in the context discussed in [CT-Landscape]. Such proxies are henceforward referred to as transforming proxies.
Transforming proxies can carry out a wide variety of operations. To carry out an exhaustive survey of those operations and to discuss means of server or client side control of them is beyond the scope of this document. In this document we categorize this rich vocabulary of possible operations into two types:
1) Alteration of Request Headers
2) Alteration of Responses
Alteration of responses is further sub-categorized into
a) restructuring content;
b) recoding content;
c) optimizing content.
Restructuring content is a process whereby the original layout is altered so that content is added or removed or where the spatial or navigational relationship of parts of content is altered, e.g. by linearization or pagination.
Recoding content is a process whereby the layout of the content remains the same, but details of its encoding may be altered. Examples include re-encoding HTML as XHTML, correcting invalid markup in HTML, conversion of images between formats (but not, for example, reducing animations to static images).
Optimizing content means removing redundant white space, recompressing images (without loss of fidelity), zipping for transfer ...
Alteration of HTTP requests and responses is not prohibited by HTTP other than in the circumstances referred to in [HTTP] section 13.5.2. This document describes how the Client and the Destination Server may require conforming transforming proxies not to alter HTTP requests and responses.
A transforming proxy gains knowledge of whether a user requests alteration of requests and responses by:
a) Administrative arrangements between the provider of the proxy and the end user;
b) As a result of the request containing an indication that changing the request headers must not be carried out;
c) Direct interaction with the User;
d) Other means.
A transforming proxy gains knowledge of whether a server permits alteration of requests and responses by:
e) Administrative arrangements between the provider of the server and the provider of the proxy;
f) For requests, by having previously received an indication from the origin server as a response to a request [for a resource on the path that this request is in scope of] that transformation of headers is not permissible;
g) For responses, as a result of the response containing indications as to the server's intentions - including mobileOK labels;
h) Other means.
Aside from b), f) and g) above, these techniques are generally out of scope of this document; however, use of knowledge gleaned from sources other than HTTP is referred to below.
Transforming proxies should allow the overriding of standing administrative arrangements on a request by request and response by response basis.
The client may request that the Content-Type and Content-Encoding must not be altered in the response by setting the Cache-Control: no-transform directive.
The client may add a [@@preserve-headers directive] to indicate that transforming proxies must not alter other aspects of the request headers, except as permitted by HTTP/1.1 to allow correct operation of caching functions [want to say that do not affect transparency, but that is probably not technically exact]. The [@@preserve-headers directive] may only be present in addition to the no-transform Cache-Control directive.
The client may add an [@@allow-recode directive] to the Cache-Control: no-transform directive, indicating that the proxy may change the format of the response but not restructure the content.
The client may add an [@@allow-compress] to the Cache-Control: no-transform directive, meaning that a proxy may remove redundant white space, recompress images or change the Content-Encoding (to use gzip, from identity, for example).
The client may also add a [@@preferred-medium directive] indicating a preference for a presentation style. The [@@preferred-medium directive] has the form media=presentation-format (as described in RFC ..., current values of the presentation-format directive are taken from IANA ... and include "screen" and "handheld").
[It would be nice if the client were able to indicate what type of presentational capabilities it has, for example, zoom, linearize, keyhole ... @@@ client-feature indication]
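As an illustration only, the client directives described above could be assembled as follows. Every token other than no-transform is a placeholder for a directive this draft has not yet named, and carrying the media= preference inside Cache-Control is an assumption of this sketch, not something the text decides.

```python
def client_cache_control(preserve_headers=False, allow_recode=False,
                         allow_compress=False, medium=None):
    """Build a Cache-Control request value that starts from no-transform.

    The extension tokens (preserve-headers, allow-recode, allow-compress,
    media=...) are hypothetical names used for illustration.
    """
    directives = ["no-transform"]  # the extensions only qualify no-transform
    if preserve_headers:
        directives.append("preserve-headers")
    if allow_recode:
        directives.append("allow-recode")
    if allow_compress:
        directives.append("allow-compress")
    if medium is not None:
        directives.append("media=" + medium)
    return ", ".join(directives)


print("Cache-Control: " + client_cache_control(allow_recode=True, medium="handheld"))
```

Because no-transform is always emitted first, a proxy that ignores the unknown tokens still sees a valid and stricter HTTP/1.1 directive.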
If the request contains a Cache-Control: no-transform directive [@@or any of the other directives specified in previous section] the proxy must forward the request unaltered to the server.
If there are no [@@ such directives] present in the request from the client, and there is no indication from a downstream proxy that it intends to transform [@@ see I will transform below] the proxy should analyze whether it intends to offer transformation services by referring to any administrative arrangements that are in place with the user of the client, or the server, and any a priori knowledge it has of client capabilities [@@ from a DDR and so on]. Knowing that the client has available a linearization or zoom capability the proxy should not offer to restructure content. Knowing that the client supports a broad range of formats the proxy should not offer to recode content.
If as a result of this deliberation it intends to restructure, the proxy must indicate this by including a [@@@ I will transform (restructure / reformat / compress)] - [@@ and even if it doesn't it may indicate its potential for restructuring or recoding or compressing content [@@by means of ...]].
The proxy must include a Via HTTP header indicating its presence.
Proxies must not intervene in https and should not intervene in methods other than GET and HEAD.
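The request-side rules above can be summarized in a small decision function. This is a sketch under stated assumptions: the function name is hypothetical, the "should not" for methods other than GET and HEAD is treated as a hard rule for simplicity, and only the standard no-transform directive is examined.

```python
def may_alter_request(scheme, method, cache_control=""):
    """Return True if a transforming proxy may alter this request."""
    directives = {
        item.strip().split("=")[0].lower()
        for item in cache_control.split(",")
        if item.strip()
    }
    if scheme.lower() == "https":
        return False  # proxies must not intervene in https
    if method.upper() not in ("GET", "HEAD"):
        return False  # simplification: "should not" treated as "must not"
    if "no-transform" in directives:
        return False  # request must be forwarded unaltered
    return True


print(may_alter_request("http", "GET"))
print(may_alter_request("http", "GET", "no-transform"))
print(may_alter_request("https", "GET"))
```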
When altering the Accept HTTP header, the proxy should indicate any formats that it intends to recode for delivery by assigning a lower q factor (indicated by the q parameter) than those natively supported and should, in addition,[@@extension] add a further transform parameter indicating that the format is not natively supported by the client.
e.g. Accept: image/jpeg, image/gif, image/png;q=0.7;[@@transform]
When altering the User-Agent HTTP Header the proxy must indicate this change by adding a [@@ User Agent Modified indication with the Original User-Agent indicated]
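The draft does not yet say how the original User-Agent is to be indicated. Purely as an illustration, the sketch below uses an X-Original-User-Agent header in the style of the X-Original-xxx option listed under "Work needed on this draft"; the header name and the function name are assumptions.

```python
def replace_user_agent(headers, new_user_agent):
    """Return a copy of headers with User-Agent replaced.

    The original value is preserved in X-Original-User-Agent, a
    hypothetical header name that this draft does not define.
    """
    altered = dict(headers)  # leave the caller's headers untouched
    original = altered.get("User-Agent")
    if original is not None and original != new_user_agent:
        altered["X-Original-User-Agent"] = original
    altered["User-Agent"] = new_user_agent
    return altered


request = {"User-Agent": "ExamplePhone/1.0", "Accept": "text/html"}
print(replace_user_agent(request, "ExampleDesktop/9.0"))
```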
If other HTTP header fields are altered then the proxy must be prepared to re-issue the request as received from the client on receipt of a Vary header in the response indicating that the server offers variants of its presentation according to any of the HTTP header fields that have been modified.
When altering the Accept HTTP header, the proxy should indicate any formats that it intends to recode for delivery by assigning a lower q factor (indicated by the q parameter) than those natively supported.
e.g. Accept: image/jpeg, image/gif, image/png;q=0.7
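A minimal sketch of the q-factor convention follows. The function name and the default of 0.7 are illustrative choices taken from the example above; formats the proxy would recode are appended with a lower q value than the formats the client supports natively.

```python
def build_accept(native_types, recoded_types, q=0.7):
    """Build an Accept value that marks proxy-recoded formats with a lower q."""
    # HTTP allows at most three decimal places in a q value.
    q_text = ("%.3f" % q).rstrip("0").rstrip(".")
    parts = list(native_types)
    for media_type in recoded_types:
        if media_type not in native_types:
            parts.append("%s;q=%s" % (media_type, q_text))
    return ", ".join(parts)


print("Accept: " + build_accept(["image/jpeg", "image/gif"], ["image/png"]))
```

A server that honors q values will then prefer a format the client handles natively whenever one is available.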
If other HTTP header fields are altered then the proxy must be prepared to re-issue the request as received from the client on receipt of a Vary header in the response indicating that the server offers variants of its presentation according to any of the HTTP header fields that have been modified.
If the server varies its presentation according to examination of received HTTP Headers then it must include a Vary HTTP header indicating this to be the case. If, in addition to, or instead of HTTP headers, the server varies its presentation on other factors (source IP Address ...) then it must include a * as one of the fields in the Vary response.
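The rule above might be sketched as follows. The function name is hypothetical. The sketch follows the wording of this draft in adding * alongside named fields; note that the RFC 2616 grammar defines * as an alternative to a list of field names, so a stricter server could send * on its own.

```python
def vary_value(headers_examined, other_factors=False):
    """Return a Vary field value, or None if the response does not vary.

    headers_examined: request header names used to choose the presentation.
    other_factors: True if something outside the headers (for example the
    source IP address) also influenced the choice.
    """
    fields = []
    seen = set()
    for name in headers_examined:
        if name.lower() not in seen:  # field names are case-insensitive
            seen.add(name.lower())
            fields.append(name)
    if other_factors:
        fields.append("*")
    return ", ".join(fields) if fields else None


print(vary_value(["User-Agent", "Accept"]))
print(vary_value(["User-Agent"], other_factors=True))
```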
The server must include a no-transform directive if one is received from the client. If it is capable of varying its presentation it should take account of client capabilities [@@as derived from a DDR etc.] and formulate an appropriate experience according to those criteria.
If the server has distinct presentations according to its perception of the presentation media, then the medium for which the presentation is intended should be indicated [@@using the ...]
If the client has requested a specific presentation using the [@@ directive] the server should provide a presentation of that kind. e.g. if the server would ordinarily provide a handheld experience but the client requests a screen experience the screen experience should be provided. And vice versa, of course.
If the server creates a specific user experience for certain presentation media types it should inhibit transformation of the response by including a no-transform directive. The server should not prohibit recoding or compression of its content unless it has specific reasons not to allow it [including that this has been requested by the client] and hence should in general add a [@@allow-recoding or allow-compression] directive when adding a no-transform directive.
Note that including a no-transform directive may [@@should actually] disrupt the behaviour of WAP/WML proxies, because this inhibits such proxies from converting WML to WMLC (because this is a content-encoding behavior). Adding [@@allow-recoding] or [@@allow-compression] is unlikely to be recognized in the short-term by such proxies which predate these guidelines.
Servers may base their actions on a priori knowledge of behaviour of transforming proxies, when they are identified in a Via header.
The server should not choose a Content-Type for its response based on its assumptions about the heuristic behavior of any intermediaries. (e.g. it should not choose content-type: application/vnd.wap.xhtml+xml solely on the basis that it suspects that transforming proxies will apply heuristics that make them not restructure it).
If servers provide only limited variants of presentation they should consider providing a rich presentation and allowing a transforming proxy to reduce this - which may result in a richer experience for the user than providing a basic handheld experience only, say.
406 Response - Note that some clients (MSIE for instance) don't display the body of a 406 response; this is in contravention of HTTP/1.1 as far as I can see.
Vary headers in 406 response - restrict to the one(s) that have caused the 406.
In general, successful responses should be made with 200 OK and Vary: User-Agent, Accept, Accept-Language etc. For example, MS doesn't want you to do updates except with IE, so they should say 406 with Vary: User-Agent (but note that IE doesn't display the body of 406 responses).
Servers should respond with a 406 not a 200 if they can't handle the request and should indicate that they permit header alteration in that 406. Servers should provide information about alternative representations by using the Vary header (if the alternatives are available from the same URI) or using link information if alternative representations are handled by different URIs. [This restricts to HTML for now. If link headers are reinstated in HTTP then this becomes a more universal mechanism. Open question as to whether SVG or WICD etc. support any such notion]
[@@300 Response - could this be used as a signal from the server to say that it understands the protocol? A la RFC 2295]
If the proxy has altered any of the HTTP request headers, and it receives a Vary response from the server it should re-make the request with the original headers and forward the subsequent response without restructuring it, irrespective of the contents of the subsequent response. The proxy should take note of this and should not vary headers for subsequent requests, unless requests are subsequently received with the Vary header [@@ + note on backoff below]
[@@note that loop detection and elimination is needed here]
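One way to picture the re-issue behavior, together with a simple guard against loops, is sketched below. All names are hypothetical. The sketch re-issues only when the Vary header names a field the proxy actually changed (or is *), assumes response headers are held in a dict keyed exactly as "Vary", and does not model the backoff mentioned above.

```python
def changed_fields(original, altered):
    """Lower-cased names of header fields whose values differ."""
    names = set(original) | set(altered)
    return {n.lower() for n in names if original.get(n) != altered.get(n)}


def fetch_with_fallback(send, host, original, altered, hosts_not_to_alter):
    """Send altered headers once; fall back to the original headers on Vary.

    send(headers) is a caller-supplied function returning
    (status, response_headers, body). Returns (response, may_restructure).
    """
    if host in hosts_not_to_alter:
        return send(original), False  # server already shown to vary
    changed = changed_fields(original, altered)
    if not changed:
        return send(original), True
    response = send(altered)
    vary = response[1].get("Vary", "")
    varied = {f.strip().lower() for f in vary.split(",") if f.strip()}
    if "*" in varied or varied & changed:
        hosts_not_to_alter.add(host)  # remember for subsequent requests
        # Exactly one re-issue with the original headers; its response is
        # forwarded as is, so the exchange cannot loop.
        return send(original), False
    return response, True
```

A caller would pass its own send function and keep hosts_not_to_alter (a set) for as long as it wishes to remember the server's behavior.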
If the response includes a Warning: 214 Transformation Applied the proxy must not apply further transformation.
If the response includes a Cache-Control: no-transform directive that is not modified by [@@ other directives on recoding] then the response must be forwarded to the client unaltered.
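The two checks above can be sketched as a single predicate. The function name is hypothetical, header values are parsed by splitting on commas (a simplification that ignores commas inside quoted warning text), and the as yet unnamed recoding directives are not modeled.

```python
def response_transform_allowed(response_headers):
    """Return False if the response must be forwarded to the client unaltered."""
    warning = response_headers.get("Warning", "")
    # The warn-code is the first token of each warning value.
    warn_codes = [w.split()[0] for w in warning.split(",") if w.split()]
    if "214" in warn_codes:
        return False  # a transformation has already been applied upstream
    cache_control = response_headers.get("Cache-Control", "")
    directives = {
        item.strip().split("=")[0].lower()
        for item in cache_control.split(",")
        if item.strip()
    }
    return "no-transform" not in directives


print(response_transform_allowed({"Cache-Control": "max-age=60, no-transform"}))
```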
In the absence of a Vary or no-transform directive the proxy should apply heuristics to the content to determine whether it is appropriate to restructure or recode it (in the presence of such directives, heuristics should not be used.)
The server has previously shown that it is contextually aware, even if the present response does not indicate this - modified by a need for the proxy to be aware that the server has changed its behavior and is no longer aware in that way
the content-type is known to be specific to the device or class of device e.g. application/vnd.wap.xhtml+xml
examination of the content reveals that it is of a specific type appropriate to the device or class of device e.g. DOCTYPE XHTML-MP or WBMP or [@@mobile video] [@@ note Sean's extensive list of heuristics that should be included as an informative example?]
The response is an HTML response and it includes <link> elements specifying alternat(iv)es according to media type [or that such links are included as HTTP headers] or that the content has a mobileOK label.
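The heuristics listed above could be approximated along the following lines. This is a rough sketch, not a recommended rule set: the function name, the set of content types, and the simple string and regular-expression tests are all assumptions, and mobileOK labels are not checked.

```python
import re

# Content types treated as device-specific in this sketch (illustrative list).
MOBILE_CONTENT_TYPES = {
    "application/vnd.wap.xhtml+xml",
    "image/vnd.wap.wbmp",
}


def looks_mobile_specific(content_type, body="", server_known_aware=False):
    """Guess whether a response is already tailored to the device."""
    if server_known_aware:
        return True  # server has previously shown contextual awareness
    media_type = content_type.split(";")[0].strip().lower()
    if media_type in MOBILE_CONTENT_TYPES:
        return True
    text = body.lower()
    if "-//wapforum//dtd xhtml mobile" in text:
        return True  # XHTML-MP DOCTYPE public identifier
    # <link> element whose media attribute mentions handheld
    if re.search(r'<link\b[^>]*\bmedia\s*=\s*["\'][^"\']*handheld', text):
        return True
    return False


print(looks_mobile_specific("text/html", '<link rel="alternate" media="handheld" href="/m">'))
```

A proxy using such a test would refrain from restructuring when it returns True.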
If the proxy alters the content then it must add a Warning: 214 Transformation Applied HTTP Header
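For illustration, adding the warning might look like this. The function name and the host name in the example are assumptions; the value follows the HTTP/1.1 Warning syntax of warn-code, warn-agent and quoted warn-text.

```python
def add_transformation_warning(headers, warn_agent):
    """Return a copy of headers with a 214 warning appended."""
    updated = dict(headers)
    value = '214 %s "Transformation applied"' % warn_agent
    existing = updated.get("Warning")
    # Keep any warnings already present and append the new one.
    updated["Warning"] = existing + ", " + value if existing else value
    return updated


print(add_transformation_warning({"Content-Type": "text/html"}, "proxy.example"))
```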
All ... must be tested for deleterious effects ... [@@TBD]
Providers of transforming proxies should make available interfaces that facilitate testing of Web sites accessed through them. [@@ though how they should make known how to do this and what administrative arrangements would be needed are both probably out of scope]
The editors acknowledge contributions of various kinds from members of the MWI BPWG Content Transformation Task Force.
The editors acknowledge significant written contributions from: