Copyright © 2008 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
This document is the Guidelines referred to in the Charter of the W3C Mobile Web Initiative Best Practices Working Group Content Transformation Task Force.
Its purpose is to provide guidance to implementors of components of the delivery context as to how to communicate their intentions and capabilities in respect of content transformation.
This document is an editors' copy that has no official standing.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
Publication as a Group Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document has been produced by the Content Transformation Task Force of the Mobile Web Best Practices Working Group as part of the Mobile Web Initiative. Please send comments on this document to the Working Group's public email list public-bpwg-ct@w3.org, a publicly archived mailing list.
This document was produced under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of patent disclosures made in connection with this document; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) with respect to this specification must disclose the information in accordance with section 6 of the W3C Patent Policy.
Fourth Editor's Draft
See [todo] for a list of to-dos.
1 Introduction
1.1 Purpose
1.2 Scope
1.3 Audience
2 Guidelines
2.1 Summary of Requirements
2.2 Objectives
2.3 Types of Proxy
2.4 Proxy States
2.5 Types of Transformation
2.6 Control by User and Client of the Behavior of the Proxy
2.7 Control by Server
2.8 Control by Administrative or Other Arrangements
3 Behavior of Components
3.1 Client Origination of Request
3.2 Proxy Receipt, Forwarding or response to a Request
3.2.1 Alternative 1
3.2.2 Alternative 2
3.2.3 Proxy Interaction with the User when Active
3.3 Server Response to Proxy
3.4 Proxy Receipt and Forwarding of Response from Server
3.5 Proxy Response to Client
3.6 Client Action on Receipt of Response
3.7 Summary of Proposed Features
4 Use Case Analysis
5 Testing
6 Conformance
A Scope for Future Work (Non-Normative)
B References (Non-Normative)
C Acknowledgments (Non-Normative)
D To Do (Non-Normative)
From the point of view of this document Content Transformation is the manipulation in various ways, by proxies, of content as delivered by an origin server with a view to making it more suitable for mobile presentation.
The W3C MWI BPWG neither approves nor disapproves of Content Transformation, but recognizes that it is being deployed widely across mobile data access networks. The deployments diverge widely from one another, with many non-standard HTTP implications, and there is no well-understood means either of identifying the presence of such transforming proxies or of controlling their actions.
This document establishes a framework to allow that to happen.
A more extensive discussion of the requirements for these guidelines is given in "Content Transformation Landscape" [CT-Landscape].
Needs more work.
Rev 1e: Note that this is now beyond the
scope we agreed, but there is still stuff here we want to
say.
The purpose of this section is to summarize the requirements of the actors (user agents, transforming proxies, origin servers and, to some extent, users) to communicate with each other. The relevant scenario involving a content transformation proxy is as follows:
user <---HTTP---> user agent <---HTTP---> one or more content transformation proxies <---HTTP---> origin server
Note:
The interactions of several transformation proxies are encompassed by this document, but only in a rudimentary form.
The needs of these actors are as follows:
The user agent needs to be able to tell the content transformation proxy [@@and the origin server]:
what type of mobile device and what user agent is being used;
what media-type (presentation format e.g. desktop, handheld) the user desires;
that all content transformation should be avoided, or that reformatting is allowed/desired.
The content transformation proxy needs to be able to tell the origin server:
that some degree of content transformation (re-coding and reformatting) can be performed;
that content transformation will be carried out unless instructed not to;
that content is being requested on behalf of something else [@@?? and what that something else is];
that the request headers have been altered (e.g. additional content types inserted).
The origin server needs to be able to tell the content transformation proxy:
that it varies its presentation according to device type and other factors;
that it's permissible or otherwise to perform content transformation of various kinds;
that it has media-specific representations;
that it is unable or unwilling to deal with the request in its present form.
The content transformation proxy needs to be able to tell the client browser:
that it has applied transformations of various kinds to the content;
how to access the untransformed representation of the content.
The content transformation proxy needs to be able to interact with the user:
to allow the user to disable its features;
to alert the user to the fact that it has transformed content and to allow access to an untransformed representation of the content.
Rev 1e:
In satisfying these requirements, existing HTTP headers, directives and behaviors must be respected and, as far as is practical, no extensions are suggested to the behaviors defined in [RFC 2616]. Knowing that many actors will be unaware of any HTTP extensions, special consideration needs to go into making sure that the fall-back behavior - i.e. strict adherence to HTTP/1.1 - is "safe". For example, if there is no standard way for a client browser to specify that all content transformation should be avoided in a request, then we must define a default behavior for a well-behaved content transformation proxy that receives a request from such a client.
[@@ other principles behind what we are trying to do - e.g. noting Sean's point that there is a wide diversity of different devices that all fall under the simple appellation of "handheld".]
Alteration of HTTP requests and responses is not prohibited by HTTP other than in the circumstances referred to in [HTTP] section 13.5.2. This document describes how the Client and the Destination Server may require conforming transforming proxies not to alter HTTP requests and responses.
HTTP defines two types of proxy: transparent proxies and non-transparent proxies. As discussed in Section 1.3 [HTTP], Terminology:
"A [ Definition : transparent proxy] is a
proxy that does not modify the request or response beyond what is
required for proxy authentication and identification. A [ Definition : non-transparent proxy] is a
proxy that modifies the request or response in order to provide
some added service to the user agent, such as group annotation
services, media type transformation, protocol reduction, or
anonymity filtering. Except where either transparent or
non-transparent behavior is explicitly stated, the HTTP proxy
requirements apply to both types of proxies."
This document elaborates the behavior of non-transparent proxies, when used for content transformation in the context discussed in [Content Transformation Landscape] and henceforward referred to as transforming proxies.
Rev 1e: I don't know if this is needed any more
A transforming proxy is viewed as being in one of two states in
respect to a client and a server. In the active [ Definition :active] state it may transform content and
manipulate HTTP headers. In the passive
[ Definition :passive] state it behaves like a transparent proxy and
behaves as though a Cache-Control: no-transform
directive were present on every request and every response, with
the possible exception that - only with the consent of both the
user and the content provider - content which it has been
determined would cause serious mis-operation of the client, such as
causing it to crash, may be minimally transformed to prevent that
mis-operation.
Note:
In practice, the passive state may be achieved by the proxy being by-passed.
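The two states can be pictured with a small sketch. It is illustrative only: the `ProxyState` type and the `effective_cache_control` helper are hypothetical names, not defined by these guidelines. The sketch covers only the "behaves as though a Cache-Control: no-transform directive were present" part of the passive state, and leaves out the exception for content that would cause serious mis-operation.

```python
from enum import Enum


class ProxyState(Enum):
    """Hypothetical model of the two proxy states described above."""
    ACTIVE = "active"
    PASSIVE = "passive"


def effective_cache_control(state, cache_control):
    """Return the Cache-Control value the proxy should act on.

    In the passive state the proxy treats every message as if it carried
    no-transform; in the active state the value is used as received.
    """
    directives = [d.strip() for d in (cache_control or "").split(",") if d.strip()]
    present = [d.lower() for d in directives]
    if state is ProxyState.PASSIVE and "no-transform" not in present:
        directives.append("no-transform")
    return ", ".join(directives)
```

A passive proxy that receives `max-age=60` would act on `max-age=60, no-transform`, while an active proxy would act on the value as received.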
Transforming proxies can carry out a wide variety of operations. To carry out an exhaustive survey of those operations and to discuss means of server or client side control of them is beyond the scope of this document. In this document we categorize this rich vocabulary of possible operations into two types:
Alteration of Request Headers;
Alteration of Responses.
Alteration of responses is further sub-categorized into
Restructuring content including rewriting URIs;
Recoding content;
Optimizing content.
Restructuring content is a process whereby the original layout
is altered so that content is added or removed or where the spatial
or navigational relationship of parts of content is altered, e.g.
by linearization or pagination. It also includes rewriting of URIs
so that subsequent requests are routed via the proxy.
Recoding content is a process whereby the layout of the content remains the same, but details of its encoding may be altered. Examples include re-encoding HTML as XHTML, correcting invalid markup in HTML, conversion of images between formats (but not, for example, reducing animations to static images).
Optimizing content means removing redundant white space, re-compressing images (without loss of fidelity), zipping for transfer ...
Rev 1e: This section needs to go other than to comment that the proxy SHOULD offer a level of control to the user.
A transforming proxy as described in this document must offer a level of control to users irrespective of whether their browser is aware of the conventions of this document. It interacts with the user at least in a limited way, to allow itself to be turned off, and may offer other features via interaction. Interaction of this kind is essential where the user's client does not implement the HTTP conventions of this document. The interaction may supplement those HTTP conventions.
In addition to allowing control by means of HTTP conventions and interaction with the user, transforming proxies may also allow control by other means such as administrative arrangements.
A transforming proxy gains knowledge of whether a server permits
alteration of requests and responses in a number of ways. For
requests, by having previously received from the origin server [for
a resource on the path that this request is in scope of] an
indication of what degree of content transformation is permissible.
For responses, as a result of the response containing indications as
to the server's intentions - such indications include use of HTTP
conventions and site labelling (for example, mobileOK labels).
Rev 1e: All from this section removed, other than the following.
The client may request that the Content-Type and Content-Encoding not be altered in the response by including the Cache-Control: no-transform directive in its request.
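As an illustration, a client might attach the directive as follows. This is a minimal sketch using Python's standard `urllib.request`; the function name `build_untransformed_request` and the URL in the usage note are assumptions made for the example, and no request is actually sent.

```python
import urllib.request


def build_untransformed_request(url):
    """Build (but do not send) a GET request that asks intermediaries
    not to transform the response."""
    request = urllib.request.Request(url)
    # Cache-Control: no-transform is the HTTP/1.1 directive referred to above.
    request.add_header("Cache-Control", "no-transform")
    return request
```

For example, `build_untransformed_request("http://example.com/")` yields a request object that can be passed to `urllib.request.urlopen`.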
Rev 1e: Most removed from this section
Irrespective of the presence of the no-transform
directive, the proxy must behave transparently (q.v.) if it detects
that the user agent is not a browser [@@open question as to how it
does that].
If the request contains a no-transform directive
for a resource that has already been served to the client, the proxy
may respond with a cached untransformed copy of
the resource, provided that serving that response is in accordance
with the cache-control directives that the server attached to the
untransformed response.
If the request contains a [@@reload-untransformed] directive the proxy must not
forward the request to the server and must respond with an
untransformed cached copy of the resource, irrespective of
cache-control directives attached to the resource. [@@this needs
careful consideration but I think is probably OK. A number of
justifications come to mind. A related issue is the reuse by
browsers of images that appear multiple times in a page, but the
images prohibit caching - in practice the same image appears to be
re-used.] If the proxy is unable to respond with the untransformed
resource it must respond with an HTTP 406 response to indicate
this.
If the request contains a Cache-Control: no-transform
directive [@@or any of the other directives
specified in previous section] the proxy
must forward the request unaltered to the server,
other than to comply with transparent HTTP behavior and in particular to
add a Via HTTP header.
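The forwarding rule for a request carrying no-transform could be sketched as below. The helper names and the proxy pseudonym used in the usage note are hypothetical, and headers are assumed to arrive as a plain dictionary with canonical capitalization. The Via value follows the RFC 2616 form of protocol version followed by the receiving host or a pseudonym.

```python
def has_no_transform(headers):
    """True if the Cache-Control header carries the no-transform directive."""
    value = headers.get("Cache-Control", "")
    return any(d.strip().lower() == "no-transform" for d in value.split(","))


def forward_headers_with_via(headers, proxy_name, protocol_version="1.1"):
    """Return a copy of the headers, unchanged except for an appended Via entry."""
    forwarded = dict(headers)
    via_entry = f"{protocol_version} {proxy_name}"
    existing = forwarded.get("Via")
    # Proxies append to any Via header added by earlier intermediaries.
    forwarded["Via"] = f"{existing}, {via_entry}" if existing else via_entry
    return forwarded
```

A proxy would call `forward_headers_with_via(headers, "ct-proxy.example")` whenever `has_no_transform(headers)` is true, and leave every other header field as received.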
Rev 1e: The USD1M question is, do we specify what the following looks like?
The proxy must (in accordance with RFC 2616) include a Via HTTP header indicating its
presence, and must indicate its capabilities
using the [@@transforming-proxy-capability] mechanism.
If there is no no-transform directive [@@ transformation
related directives] present in the request from the client, and there is no indication from a
downstream proxy that it intends to transform [@@ see I will
transform below], the proxy should analyze
whether it intends to offer transformation services by referring to
any administrative arrangements that are in place with the user of
the client, or the server, and any a priori knowledge it has of
client capabilities [@@ from a DDR and so on]. Knowing that the
client has available a linearization or zoom capability and/or
supports a broad range of content formats, the proxy should
not offer to recode
content.
If, as a result of this deliberation, it intends to (restructure / reformat / compress), the proxy must indicate this by including a [@@@ I will transform (restructure / reformat / compress)] - [@@ and even if it doesn't it may indicate its potential for restructuring or recoding or compressing content [@@by means of ...]].
Proxies must not intervene in HTTPS requests and should not intervene in methods other than [@@we have an open question here as to which methods are applicable].
A proxy should not alter HTTP requests unless not doing so would result in the user's request being rejected by the origin server (this includes HTTP 406 status as well as HTTP 200 status, saying that the request cannot be handled - e.g. "Your browser is not supported").
Rev 1e: The following section needs more discussion before I can edit it
When altering the Accept HTTP header, the proxy should indicate any formats that it intends to recode for delivery by assigning a lower q factor (indicated by the q parameter) than those natively supported and should, in addition,[@@extension] add a further transform parameter indicating that the format is not natively supported by the client.
e.g. Accept: image/jpeg, image/gif, image/png;q=0.7;[@@transform]
When altering the User-Agent HTTP Header the proxy must indicate this change by adding a [@@ User Agent Modified indication with the Original User-Agent indicated]
If other HTTP header fields are altered then the proxy must be prepared to re-issue the request as received from the client on receipt of a Vary header in the response indicating that the server offers variants of its presentation according to any of the HTTP header fields that have been modified.
When altering the Accept HTTP header, the proxy should indicate any formats that it intends to recode for delivery by assigning a lower q factor (indicated by the q parameter) than those natively supported.
e.g. Accept: image/jpeg, image/gif, image/png;q=0.7
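The Accept alteration described above might look like the following sketch. The function name `extend_accept` and the default q value are assumptions for illustration. Formats the proxy can recode are appended with a lower q factor, and formats the client already lists are left untouched.

```python
def extend_accept(client_accept, recoded_types, q=0.7):
    """Append media types the proxy can recode, marked with a lower q factor.

    client_accept: the Accept header value received from the client.
    recoded_types: media types the proxy would recode for delivery.
    """
    native = [t.strip() for t in client_accept.split(",") if t.strip()]
    # Compare on the bare media type, ignoring any parameters.
    native_types = {t.split(";")[0].strip().lower() for t in native}
    extra = [f"{t};q={q}" for t in recoded_types if t.lower() not in native_types]
    return ", ".join(native + extra)
```

Called as `extend_accept("image/jpeg, image/gif", ["image/png"])`, the sketch produces the header value shown in the example above.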
Rev 1e: All the following to go?
If neither a no-transform nor a [@@ct-aware]
directive is present in a request from the client, the proxy
must offer a means of interacting with it to
provide at least basic control (the ability to transition the proxy
to passive role) of its services. Proxies may in
addition offer user interaction when those directives are present -
i.e. interaction with the user is not prohibited if the client is
aware of the conventions of this document, unless the client has
requested disabling them.
Proxies must offer, via this interaction, a means of retrieving the untransformed copy of the resource [@@need to be careful what we mean here, this assumes that the proxy has stuck an interaction bar above/below the content and that this is meaningful - i.e. that there is some notion of "current" resource.]. It must also allow itself to be rendered passive for the current domain and for all future requests [2.2].
The proxy may offer features via this interaction which go beyond transitioning to passive operation; these should be offered on the same basis (current domain, all future requests, etc.).
If content has been transformed proxies must indicate this to the user via this interaction and should provide a means of retrieving the original content. Proxies should provide a cached untransformed copy of the response in fulfilling this requirement. [@@see section 4.1]
The following para is probably unnecessary
Servers should distinguish URIs that are intended for access only by HTTPS from those that are intended for insecure access in order to be able to detect and reject requests that should have been submitted by HTTPS but have been re-written in contravention of its directives.
If the server varies its presentation according to examination of received HTTP Headers then it must include a Vary HTTP header indicating this to be the case. If, in addition to, or instead of HTTP headers, the server varies its presentation on other factors (source IP Address ...) then it must include a * as one of the fields in the Vary response.
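A server-side sketch of this rule follows. The function name and parameters are hypothetical. It follows the wording above in listing `*` alongside field names; note that RFC 2616 defines `*` as a Vary value that stands on its own, so an implementation may prefer to send `Vary: *` alone in that case.

```python
def vary_header(headers_examined, uses_other_factors=False):
    """Build a Vary header value for a response.

    headers_examined: request header names the server inspected when
        choosing the presentation (e.g. "User-Agent", "Accept").
    uses_other_factors: True if something outside the request headers,
        such as the source IP address, also influenced the choice.
    """
    fields = []
    for name in headers_examined:
        if name not in fields:  # keep order, drop duplicates
            fields.append(name)
    if uses_other_factors:
        fields.append("*")
    return ", ".join(fields) if fields else None
```

A return value of `None` means the presentation did not vary and no Vary header is needed.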
The server must include a no-transform directive if one is received from the client. If it is capable of varying its presentation it should take account of client capabilities [@@as derived from a DDR etc.] and formulate an appropriate experience according to those criteria.
If the server has distinct presentations according to its
perception of the presentation media, then the medium for which the
presentation is intended should be indicated
[@@using the ... This is going to be something like the link headers, I
think]. If the client has requested
a specific presentation using the [@@ directive] the server should
provide a presentation of that kind, e.g. if the server would
ordinarily provide a handheld experience but the client requests a
screen experience, the screen experience should be
provided. And vice versa, of course.
If the server creates a specific user experience for certain
presentation media types it should inhibit
transformation of the response by including a no-transform
directive. The server should not prohibit
recoding or compression of its content unless it has specific
reasons not to allow it [including that this has been requested by
the client] and hence should in general add a [@@allow-recoding or
allow-compression] directive when adding a no-transform directive.
If the response contains URIs with the scheme https and the server
is content to allow the scheme to be re-written as the http scheme
then it must indicate this using the [@@allow-https-rewrite]
directive, otherwise rewriting is inhibited.
Note that including a no-transform directive may [@@
should actually] disrupt the behavior of WAP/WML
proxies, because this inhibits such proxies from converting WML to
WMLC (because this is a content-encoding behavior). Adding [@@allow-recoding] or [@@allow-compression] is
unlikely to be recognized in the short-term by such proxies which
predate these guidelines.
Servers may base their actions on a priori knowledge of behavior of transforming proxies, when they are identified in a Via header.
The server should not choose a Content-Type for its response based on its assumptions about the heuristic behavior of any intermediaries. (e.g. it should not choose content-type: application/vnd.wap.xhtml+xml solely on the basis that it suspects that transforming proxies will apply heuristics that make them not restructure it).
If servers provide only limited variants
of presentation they should consider providing a rich presentation
and allowing a transforming proxy to reduce this - which may result
in a richer experience for the user than providing a basic handheld
experience only, say.
406 Response - Note that some clients (MSIE
for instance) don't display the body of a 406 response; this is in
contravention of HTTP/1.1 as far as I can see. [@@ Vary headers in 406 response - restrict to the
one(s) that have caused the 406.?? In general,
successful responses are done with 200 OK Vary: User-Agent,
Accept, Accept-Language etc. e.g. MS doesn't want you to do updates
except with IE, so they should say 406 Vary: User-Agent (but note
that IE doesn't display the body of 406 responses)]
Servers should respond with a 406 not a 200 if they can't handle
the request and should indicate that they
permit header alteration in that 406. Servers should provide information about
alternative representations by using the Vary header (if the
alternatives are available from the same URI) or using link
information if alternative representations are handled by different
URIs. [This restricts to HTML for now. If link headers are reinstated
in HTTP then this becomes a more universal mechanism. Open question
as to whether SVG or WICD etc. support any such notion]
[@@300 Response - could this be used as a signal from the server to say that it understands the protocol? A la RFC 2295]
If the proxy has altered any of the HTTP request headers, and it receives a Vary response from the server it should re-make the request with the original headers and forward the subsequent response without restructuring it, irrespective of the contents of the subsequent response. The proxy should take note of this and should not vary headers for subsequent requests, unless requests are subsequently received with no Vary header [@@ + note on back off below]
[@@note that loop detection and elimination is needed here]
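The decision to re-make a request could be sketched as follows. This is only one reading of the text: it re-issues when the Vary header names a field the proxy actually altered (or is `*`), and it uses a simple flag as a stand-in for the loop detection noted above. The function and parameter names are hypothetical.

```python
def should_reissue(altered_headers, response_vary, already_reissued=False):
    """Decide whether to re-make the request with the client's original headers.

    altered_headers: names of request headers the proxy changed.
    response_vary: value of the Vary response header, or None if absent.
    already_reissued: True if this response is itself the result of a
        re-issued request (guards against looping).
    """
    if already_reissued or not altered_headers or not response_vary:
        return False
    varied = {f.strip().lower() for f in response_vary.split(",") if f.strip()}
    if "*" in varied:
        # The server varies on unspecified factors; treat any alteration as relevant.
        return True
    return any(h.lower() in varied for h in altered_headers)
```

A proxy that altered `User-Agent` and received `Vary: User-Agent, Accept` would re-issue the request once and forward the second response without restructuring it.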
If the response includes a Warning: 214 Transformation Applied the proxy must not apply further transformation.
If the response includes a Cache-Control: no-transform directive
that is not modified by [@@ other directives
on recoding] then the response must be
forwarded to the client unaltered other than in the respects noted
for transparent operation of HTTP proxies as
specified in RFC2616, and in particular the addition of a
Via HTTP header [@@which
includes, or is in addition to a [@@transforming-proxy-capability]
...].
In the absence of a Vary or no-transform directive the proxy should apply heuristics to the content to determine whether it is appropriate to restructure or recode it (in the presence of such directives, heuristics should not be used.)
The server has previously shown that it is contextually aware, even if the present response does not indicate this - modified by a need for the proxy to be aware that the server has changed its behavior and is no longer aware in that way
the content-type is known to be specific to the device or class of device e.g. application/vnd.wap.xhtml+xml
examination of the content reveals that it is of a specific type appropriate to the device or class of device e.g. DOCTYPE XHTML-MP or WBMP or [@@mobile video] [@@ note Sean's extensive list of heuristics that should be included as an informative example?]
The response is an HTML response and it includes <link> elements specifying alternatives according to media type [or that such links are included as HTTP headers] or that the content has a mobileOK label.
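A few of the signals listed above could be checked as in the sketch below. It assumes the signals are read as indications that the content is already suited to the device, which is an interpretation rather than something the text states outright. The function name, the 1024-character inspection window and the constant names are hypothetical, and the check is far from a complete set of heuristics (WBMP, mobile video, link HTTP headers and mobileOK labels are not covered).

```python
import re

# Content types known to be specific to a class of device (example from the text).
MOBILE_CONTENT_TYPES = {"application/vnd.wap.xhtml+xml"}
# Public identifier prefix used by XHTML Mobile Profile DOCTYPE declarations.
XHTML_MP_MARKER = "-//WAPFORUM//DTD XHTML Mobile"
# A <link> element that carries a media attribute.
LINK_WITH_MEDIA = re.compile(r"<link\b[^>]*\bmedia\s*=", re.IGNORECASE)


def looks_mobile_ready(content_type, body, server_known_aware=False):
    """Return True if any of the sketched signals is present."""
    media_type = content_type.split(";")[0].strip().lower()
    if server_known_aware:
        return True
    if media_type in MOBILE_CONTENT_TYPES:
        return True
    if XHTML_MP_MARKER in body[:1024]:
        return True
    if media_type == "text/html" and LINK_WITH_MEDIA.search(body):
        return True
    return False
```

Under this reading, a proxy would leave a response alone when `looks_mobile_ready` returns true and consider restructuring or recoding otherwise.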
If the proxy alters the content then it must add a Warning: 214 Transformation Applied HTTP Header. [@@ should this be elaborated to say what kind of transformation?]
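Adding and detecting the warning might be sketched as follows. The helper names and the host name in the usage note are hypothetical. The header value follows the RFC 2616 form of warn-code, warn-agent and quoted warn-text, and splitting on commas is a simplification that ignores commas inside quoted text.

```python
def add_transformation_warning(headers, proxy_host):
    """Return a copy of the response headers with a 214 warning appended."""
    updated = dict(headers)
    warning = f'214 {proxy_host} "Transformation applied"'
    existing = updated.get("Warning")
    updated["Warning"] = f"{existing}, {warning}" if existing else warning
    return updated


def already_transformed(headers):
    """True if an upstream intermediary has already added a 214 warning."""
    value = headers.get("Warning", "")
    return any(part.strip().startswith("214 ") for part in value.split(","))
```

A proxy would check `already_transformed(headers)` before transforming, and call `add_transformation_warning(headers, "ct-proxy.example")` after altering the content.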
If the response contains URIs with the
scheme https
the proxy must not rewrite
them unless [@@er actually this should be a discussion of
intercepting links that were https and either a) informing the user
of them being now insecure, or alternatively that content
transformation is not going to be applied so they may get
garbage].
If the proxy has transformed (reformatted) the content but not rewritten https links it should annotate those links to indicate that transformation service is not available on them.
A proxy should strive for the best possible user experience that the client supports. It should only alter the format, layout, dimensions etc. to match the specific capabilities of the client. For example, when resizing images, they should only be reduced so that they are suitable for the specific client, and this should not be done on a generic basis.
Rev 1e: Did we say we would remain silent on this?
In the passive mode (as well as in the active mode), if the proxy determines that the resource as currently represented is likely to cause serious mis-operation of the client then the proxy may transform the resource but only sufficiently to alter the specific aspect of the content that is likely to cause mis-operation. Proxies must not exhibit this behavior unless this has been specifically allowed by both the server and the user. [@@ either by persistent registration of preferences, or by use of the [@@correct dangerous content] directive.]
All ... must be tested for deleterious effects ... [@@TBD]
Providers of transforming proxies should make available interfaces that facilitate testing of Web sites accessed through them. [@@ though how they should make known how to do this and what administrative arrangements would be needed are both probably out of scope]
The editors acknowledge contributions of various kinds from members of the MWI BPWG Content Transformation Task Force .
The editor acknowledges significant written contributions from:
Work needed on this draft:
There could be a note that the host should provide
interactions that allow the user to have a choice of presentations
and so should the proxy and the client, for that matter.
Another as yet unopened Pandora's box is that the discussion and proposed text looks at the issues primarily from the point of view of "varying presentation from Thematically consistent URIs". What hasn't, as yet, been explored is how it all works if there is a common entry point to a site (Thematically consistent URI for a home page) which then dispatches via redirect to media specific versions. This is possibly rather more common than the previous case (e.g. redirect to example.com/mobile - or rather better, imo, example.mobi). Naturally, there will also be varying presentation even within a redirected solution. This whole area needs further thought.
We need a discussion as to what extent we should be drawing up an RFC to do what we want to do. On the one hand HTTP makes it clear, in explaining how to introduce extensions that it expects such extensions to be introduced. On the other hand, we do typically take a conservative approach and say if it is not in the IANA registry then it's not an existing protocol and therefore beyond our scope. Introducing extensions to existing header values, to my mind falls short of introducing new headers. Though it's not clear that we can do what we need to if we don't do that, go through IANA registration and