Content Transformation Guidelines 1c

Group Working Draft 18 January 2008

This version:
http://www.w3.org/2005/MWI/BPWG/Group/TaskForces/CT/editors-drafts/Guidelines/080118
Latest version:
http://www.w3.org/2005/MWI/BPWG/Group/TaskForces/CT/editors-drafts/Guidelines/latest
Previous version:
http://www.w3.org/2005/MWI/BPWG/Group/TaskForces/CT/editors-drafts/Guidelines/071124
Editor:
Jo Rabin, dotMobi

Abstract

This document is the Guidelines referred to in the Charter of the W3C Mobile Web Initiative Best Practices Working Group Content Transformation Task Force.

Its purpose is to provide guidance to implementors of components of the delivery context as to how to communicate their intentions and capabilities in respect of content transformation.

Status of this Document

This document is an editors' copy that has no official standing.

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

Publication as a Group Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document has been produced by the Content Transformation Task Force of the Mobile Web Best Practices Working Group as part of the Mobile Web Initiative. Please send comments on this document to the Working Group's public email list public-bpwg-ct@w3.org, a publicly archived mailing list.

This document was produced under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of patent disclosures made in connection with this document; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) with respect to this specification must disclose the information in accordance with section 6 of the W3C Patent Policy.

Revision Description

Second Editor's Draft

See [todo] for a list of to-do items.

Table of Contents

1 Introduction
    1.1 Purpose
    1.2 Scope
    1.3 Audience
2 Guidelines
    2.1 Requirements
    2.2 Objectives
    2.3 Types of Proxy
    2.4 Proxy States
    2.5 Types of Transformation
    2.6 Control by User and Client
    2.7 Control by Server
3 Behavior of Components
    3.1 Client Origination of Request
    3.2 Proxy Receipt, Forwarding or response to a Request
        3.2.1 Alternative 1
        3.2.2 Alternative 2
        3.2.3 Proxy Interaction with the User when Active
    3.3 Server Response to Proxy
    3.4 Proxy Receipt and Forwarding of Response from Server
    3.5 Proxy Response to Client
    3.6 Client Action on Receipt of Response
    3.7 Encoding of [@@new] Features
4 Use Case Analysis
5 Testing
6 Conformance

Appendices

A Scope for Future Work (Non-Normative)
B References (Non-Normative)
C Acknowledgments (Non-Normative)
D (Non-Normative)


1 Introduction

2 Guidelines

2.1 Requirements

I'm not actually sure how useful this is any more, I think it may have served its purpose in stimulating debate and thought.

The purpose of this section is to explore the need for actors (clients, proxy servers, gateways, origin servers, etc) to communicate with each other, and also suggest guidelines for doing so. The relevant scenario involving a content transformation proxy is as follows:

There may be other scenarios as well but they will initially be ignored for the sake of simplicity. The needs of these three actors are as follows:

  1. The client browser needs to be able to tell the content transformation proxy:

    1. what media-type (presentation format e.g. desktop, handheld) is desired.

    2. that all content transformation should be avoided, or that reformatting is allowed/desired

    3. what type of mobile device and what user agent is being used

    4. that the device has (zoom, linearize, keyhole) presentation [@@??]

  2. The content transformation proxy needs to be able to tell the origin server:

    1. that some degree of content transformation (re-coding and reformatting) can be performed

    2. Content transformation will be carried out unless instructed not to.

    3. that content is being requested on behalf of something else.

    4. about the delivery context (for example mobile device type and user agent).

    5. That the request headers have been altered (e.g. additional content types inserted) [??]

  3. The origin server needs to be able to tell the content transformation proxy:

    1. that content is already optimized and no additional transformation is required (or that it should not be restructured but may be recoded)

    2. that it's OK to perform additional content transformation.[??]

    3. That it varies its presentation

    4. That it has media-specific presentations

    5. I can't/don't wish to handle this request in its present form

    6. That request headers should/should not be modified

  4. The content transformation proxy needs to be able to tell the client browser:

    1. the status of the content: it is reformatted/recoded/untouched;

    2. where to find the original content if it has been transformed. [@@ should this read "how", or do we suppose that there are "magic" mechanisms/URIs for by-passing proxies?]

2.4 Proxy States

A transforming proxy is viewed as being in one of two states with respect to a client and a server. In the active state it may transform content and manipulate HTTP headers. In the passive state it behaves like a transparent proxy, as though a Cache-Control: no-transform directive were present on every request and every response. The one possible exception, which requires the consent of both the user and the content provider, is content that has been determined to cause serious mis-operation of the client (such as causing it to crash); such content may be minimally transformed to prevent that mis-operation.

This must steer clear of recommending a deviation from the HTTP spec which I don't think is acceptable. However there are parallels with the operation of e.g. child protection mechanisms?

Note:

In practice, the passive state may be achieved by the proxy being by-passed.
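The two states can be pictured with a minimal Python sketch. The names ProxyState and may_transform are hypothetical and are not defined by this document; the sketch only models whether transformation is permitted at all, and ignores any no-transform directives that would further constrain a proxy in the active state.

```python
from enum import Enum


class ProxyState(Enum):
    ACTIVE = "active"    # may transform content and manipulate HTTP headers
    PASSIVE = "passive"  # acts as if no-transform were on every message


def may_transform(state: ProxyState,
                  serious_misoperation: bool = False,
                  user_consents: bool = False,
                  provider_consents: bool = False) -> bool:
    """Return True if the proxy may alter the content at all."""
    if state is ProxyState.ACTIVE:
        return True
    # Passive state: only the minimal fix for serious mis-operation,
    # and only with the consent of both the user and the content provider.
    return serious_misoperation and user_consents and provider_consents
```

In the passive state the function returns True only when all three conditions hold, which mirrors the exception described above.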

3 Behavior of Components

3.1 Client Origination of Request

The client may request that the Content-Type and Content-Encoding must not be altered in the response by setting the Cache-Control: no-transform directive.

The client may add an [@@allow-recode directive] to the Cache-Control: no-transform directive, indicating that the proxy may change the format of the response but not restructure the content.

The client may add a [@@allow-https-rewrite=yes] parameter to the [@@allow-recode] directive to allow the rewriting of the https scheme portion of any URIs - the default value is "no" in the absence of this parameter or in the absence of the [@@allow-recode directive].

The client may add an [@@allow-compress] to the Cache-Control: no-transform directive, meaning that a proxy may remove redundant white space, re-compress images or change the Content-Encoding (to use gzip, from identity, for example).

The client may also add a [@@preferred-medium directive] indicating a preference for a presentation style. The [@@preferred-medium directive] has the form media=presentation-format (as described in RFC ...; current values of the presentation-format directive are taken from IANA ... and include "screen" and "handheld").

In the absence of any of the previous directives elaborating the no-transform directive, the client should indicate that it understands the conventions of this document by including a [@@ct-proxy-aware directive].
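As an illustration only, the following Python sketch assembles a Cache-Control request header value from the directives discussed in this section. The tokens allow-recode, allow-compress, media= and ct-proxy-aware are taken from the [@@] placeholders above and are assumptions: none of them is a registered cache directive, and their final names and syntax are undecided. The function name build_cache_control is hypothetical. The sketch assumes a client that always sends no-transform, and it omits the [@@allow-https-rewrite] parameter because its syntax is not yet specified.

```python
def build_cache_control(allow_recode=False, allow_compress=False,
                        preferred_medium=None):
    """Build a Cache-Control value for a client that sends no-transform.

    All directive names other than no-transform are placeholders
    from this draft, not registered HTTP directives.
    """
    directives = ["no-transform"]
    if allow_recode:
        directives.append("allow-recode")        # placeholder name
    if allow_compress:
        directives.append("allow-compress")      # placeholder name
    if preferred_medium is not None:
        directives.append(f"media={preferred_medium}")  # e.g. "handheld"
    if len(directives) == 1:
        # No elaborating directive present: signal awareness instead.
        directives.append("ct-proxy-aware")      # placeholder name
    return ", ".join(directives)
```

For example, build_cache_control(allow_recode=True, preferred_medium="handheld") yields the value "no-transform, allow-recode, media=handheld".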

[It would be nice if the client were able to indicate what type of presentational capabilities it has, for example, zoom, linearize, keyhole ... @@@ client-feature indication]

[It would also be useful if the client were able to turn off the User Interaction features of the proxy via HTTP]

3.2 Proxy Receipt, Forwarding or response to a Request

If the incoming request contains neither a no-transform nor an [@@allow-transform] directive and the proxy intends to enter the active mode, the proxy must provide the ability for the user to interact with it in some way in order to instruct it to enter the passive mode. Whether the proxy does this by responding to the incoming request, decorating the response or by some other means is beyond the scope of this document.[@@see 4.2.3 below for requirements of this interaction]

Irrespective of the presence of the no-transform directive, the proxy must behave transparently (q.v.) if it detects that the user agent is not a browser [@@open question as to how it does that].

If the request contains a no-transform directive for a resource that has already been served to the client, it may respond with a cached untransformed copy of the resource, providing that serving that response is in accordance with the cache-control directives that the server attached to the untransformed response.

If the request contains a [@@reload-untransformed] directive the proxy must not forward the request to the server and must respond with an untransformed cached copy of the resource, irrespective of cache-control directives attached to the resource. [@@this needs careful consideration but I think is probably OK. A number of justifications come to mind. A related issue is the reuse by browsers of images that appear multiple times in a page, but the images prohibit caching - in practice the same image appears to be re-used.] If the proxy is unable to respond with the untransformed resource it must respond with an HTTP 406 response to indicate this.

If the request contains a Cache-Control: no-transform directive [@@or any of the other directives specified in previous section] the proxy must forward the request unaltered to the server, other than to comply with transparent HTTP behavior and in particular to add a Via HTTP header.

The proxy must (in accordance with compliance to RFC 2616) include a Via HTTP header indicating its presence, and must indicate its capabilities using the [@@transforming-proxy-capability] mechanism.
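A minimal Python sketch of the Via requirement follows. It assumes headers are held in a plain dict with canonical header names as keys; the function name add_via and the pseudonym shown in the usage note are hypothetical. The [@@transforming-proxy-capability] mechanism is not modeled, since it is not yet defined.

```python
def add_via(headers: dict, pseudonym: str,
            protocol_version: str = "1.1") -> dict:
    """Return a copy of headers with this proxy appended to Via.

    RFC 2616 lists intermediaries in the order they were traversed,
    so the new entry goes at the end of any existing value.
    """
    out = dict(headers)  # leave the caller's dict untouched
    hop = f"{protocol_version} {pseudonym}"
    existing = out.get("Via")
    out["Via"] = f"{existing}, {hop}" if existing else hop
    return out
```

Calling add_via({"Via": "1.0 fred"}, "ct-proxy.example") gives a Via value of "1.0 fred, 1.1 ct-proxy.example".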

If there are no [@@ transformation related directives] present in the request from the client, and there is no indication from a downstream proxy that it intends to transform [@@ see I will transform below] the proxy should analyze whether it intends to offer transformation services by referring to any administrative arrangements that are in place with the user of the client, or the server, and any a priori knowledge it has of client capabilities [@@ from a DDR and so on]. If the proxy knows that the client has a linearization or zoom capability available and/or supports a broad range of content formats, it should not offer to recode content.

If as a result of this deliberation it intends to (restructure / reformat / compress) the proxy must indicate this by including a [@@@ I will transform (restructure / reformat / compress)] - [@@ and even if it doesn't it may indicate its potential for restructuring or recoding or compressing content [@@by means of ...]].

Proxies must not intervene in HTTPS requests and should not intervene in methods other than GET and HEAD.
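The rule above can be expressed as a small Python check. This is a sketch under the assumption that the proxy knows the request scheme and method; the function name intervention_level and the three return labels are hypothetical.

```python
def intervention_level(scheme: str, method: str) -> str:
    """Classify whether a proxy may intervene in a request.

    Returns "must-not" for HTTPS, "should-not" for methods other
    than GET and HEAD, and "permitted" otherwise.
    """
    if scheme.lower() == "https":
        return "must-not"
    if method.upper() not in ("GET", "HEAD"):
        return "should-not"
    return "permitted"
```

The HTTPS test comes first because it is the stronger (must not) requirement and applies regardless of method.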

A proxy should not alter HTTP requests unless not doing so would result in the user's request being rejected by the origin server (this includes HTTP 406 status as well as HTTP 200 status, saying that the request cannot be handled - e.g. "Your browser is not supported").

3.3 Server Response to Proxy

Servers should distinguish URIs that are intended for access only by HTTPS from those that are intended for insecure access in order to be able to detect and reject requests that should have been submitted by HTTPS but have been re-written in contravention of its directives.

If the server varies its presentation according to examination of received HTTP Headers then it must include a Vary HTTP header indicating this to be the case. If, in addition to, or instead of HTTP headers, the server varies its presentation on other factors (source IP Address ...) then it must include a * as one of the fields in the Vary response.
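A minimal Python sketch of how a server might compute its Vary value is shown below. The function name build_vary is hypothetical. Note that RFC 2616 defines the Vary value as either "*" or a list of field names, so this sketch emits "*" on its own when factors other than request headers are involved, rather than mixing it with header names.

```python
def build_vary(headers_examined, varies_on_other_factors=False):
    """Return a Vary header value, or None if presentation is fixed.

    headers_examined: names of request headers the server inspected.
    varies_on_other_factors: True if presentation also depends on
    something outside the request headers (e.g. source IP address).
    """
    if varies_on_other_factors:
        return "*"
    if headers_examined:
        return ", ".join(headers_examined)
    return None
```

For a server that inspects User-Agent and Accept only, build_vary(["User-Agent", "Accept"]) gives "User-Agent, Accept".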

The server must include a no-transform directive if one is received from the client. If it is capable of varying its presentation it should take account of client capabilities [@@as derived from a DDR etc.] and formulate an appropriate experience according to those criteria.

If the server has distinct presentations according to its perception of the presentation media, then the medium for which the presentation is intended should be indicated [@@using the ...]

If the client has requested a specific presentation using the [@@ directive] the server should provide a presentation of that kind. e.g. if the server would ordinarily provide a handheld experience but the client requests a screen experience the screen experience should be provided. And vice versa, of course.

If the server creates a specific user experience for certain presentation media types it should inhibit transformation of the response by including a no-transform directive. The server should not prohibit recoding or compression of its content unless it has specific reasons not to allow it [including that this has been requested by the client] and hence should in general add a [@@allow-recoding or allow-compression] directive when adding a no-transform directive.

If the response contains URIs with the scheme https and the server is content to allow the scheme to be re-written as the http scheme then it must indicate this using the [@@allow-https-rewrite] directive, otherwise rewriting is inhibited.

Note that including a no-transform directive may [@@should actually] disrupt the behavior of WAP/WML proxies, because this inhibits such proxies from converting WML to WMLC (because this is a content-encoding behavior). Adding [@@allow-recoding] or [@@allow-compression] is unlikely to be recognized in the short-term by such proxies which predate these guidelines.

Servers may base their actions on a priori knowledge of behavior of transforming proxies, when they are identified in a Via header.

The server should not choose a Content-Type for its response based on its assumptions about the heuristic behavior of any intermediaries. (e.g. it should not choose content-type: application/vnd.wap.xhtml+xml solely on the basis that it suspects that transforming proxies will apply heuristics that make them not restructure it).

If servers provide only limited variants of presentation they should consider providing a rich presentation and allowing a transforming proxy to reduce this - which may result in a richer experience for the user than providing a basic handheld experience only, say.

406 Response - Note that some clients (MSIE for instance) don't display the body of a 406 response, this is in contravention of HTTP/1.1 as far as I can see.

Vary headers in 406 response - restrict to the one(s) that have caused the 406.

In general, successful responses should be served with 200 OK and Vary: User-Agent, Accept, Accept-Language etc. For example, MS doesn't want you to do updates except with IE, so they should say 406 with Vary: User-Agent (but note that IE doesn't display the body of 406 responses).

Servers should respond with a 406 not a 200 if they can't handle the request and should indicate that they permit header alteration in that 406. Servers should provide information about alternative representations by using the Vary header (if the alternatives are available from the same URI) or using link information if alternative representations are handled by different URIs. [This restricts to HTML for now. If link headers are reinstated in HTTP then this becomes a more universal mechanism. Open question as to whether SVG or WICD etc. support any such notion]

[@@300 Response - could this be used as a signal from the server to say that it understands the protocol? A la RFC 2295]

3.5 Proxy Response to Client

If the response includes a Warning: 214 Transformation Applied the proxy must not apply further transformation.

If the response includes a Cache-Control: no-transform directive that is not modified by [@@ other directives on recoding] then the response must be forwarded to the client unaltered other than in the respects noted for transparent operation of HTTP proxies as specified in RFC2616, and in particular the addition of a Via HTTP header [@@which includes, or is in addition to a [@@transforming-proxy-capability] ...].
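The two rules above might be checked as in the following Python sketch. It assumes response headers are held in a dict with canonical names, and it deliberately ignores the [@@] placeholder directives that could modify no-transform, since they are not yet defined. The function name response_forbids_transformation is hypothetical, and the comma splitting is a simplification that would misparse a warn-text containing a comma.

```python
def response_forbids_transformation(headers: dict) -> bool:
    """True if the response must be forwarded without transformation."""
    # Collect Cache-Control directive names, ignoring any "=value" part.
    cache_control = headers.get("Cache-Control", "")
    directives = {
        part.strip().split("=", 1)[0].lower()
        for part in cache_control.split(",")
        if part.strip()
    }
    if "no-transform" in directives:
        return True
    # A Warning value begins with a three-digit warn-code; 214 means
    # a transformation has already been applied upstream.
    warning = headers.get("Warning", "")
    return any(w.strip().startswith("214 ") for w in warning.split(","))
```

A proxy would call this before any restructuring or recoding step and forward the response unaltered (other than adding Via) when it returns True.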

In the absence of a Vary or no-transform directive the proxy should apply heuristics to the content to determine whether it is appropriate to restructure or recode it (in the presence of such directives, heuristics should not be used.)

  • The server has previously shown that it is contextually aware, even if the present response does not indicate this - modified by a need for the proxy to be aware that the server has changed its behavior and is no longer aware in that way

  • the content-type is known to be specific to the device or class of device e.g. application/vnd.wap.xhtml+xml

  • examination of the content reveals that it is of a specific type appropriate to the device or class of device e.g. DOCTYPE XHTML-MP or WBMP or [@@mobile video] [@@ note Sean's extensive list of heuristics that should be included as an informative example?]

  • The response is an HTML response and it includes <link> elements specifying alternatives according to media type [or that such links are included as HTTP headers] or that the content has a mobileOK label.
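Some of the signals listed above can be sketched in Python as follows. This is an illustration, not a recommended heuristic set: the draft does not say how the signals are to be combined, and the sketch simply treats any one of them as evidence that the content is already aimed at the device. The names looks_device_specific, MOBILE_CONTENT_TYPES and the regular expressions are hypothetical, and the checks for previous server behavior, mobile video and mobileOK labels are not modeled.

```python
import re

# Content types taken to be specific to mobile devices (assumed list).
MOBILE_CONTENT_TYPES = {
    "application/vnd.wap.xhtml+xml",
    "image/vnd.wap.wbmp",
}
# Matches an XHTML Mobile Profile DOCTYPE declaration.
XHTML_MP_DOCTYPE = re.compile(r"<!DOCTYPE[^>]*//DTD XHTML Mobile",
                              re.IGNORECASE)
LINK_TAG = re.compile(r"<link\b[^>]*>", re.IGNORECASE)


def looks_device_specific(content_type: str, body: str) -> bool:
    """True if any of the sketched signals is present."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in MOBILE_CONTENT_TYPES:
        return True
    if XHTML_MP_DOCTYPE.search(body):
        return True
    # <link> elements offering alternatives by media type.
    for tag in LINK_TAG.findall(body):
        lowered = tag.lower()
        if "alternate" in lowered and "media=" in lowered:
            return True
    return False
```

The string matching on link elements is crude; a real implementation would parse the markup rather than search it.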

If the proxy alters the content then it must add a Warning: 214 Transformation Applied HTTP Header. [@@ should this be elaborated to say what kind of transformation?]
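A minimal Python sketch of adding the warning follows. In RFC 2616 a Warning value has the form warn-code, warn-agent, then a quoted warn-text, so the sketch needs an agent name; the function name mark_transformed and the agent shown in the usage note are hypothetical.

```python
def mark_transformed(headers: dict, agent: str) -> dict:
    """Return a copy of headers carrying a 214 warning from this proxy."""
    out = dict(headers)  # leave the caller's dict untouched
    warning = f'214 {agent} "Transformation applied"'
    existing = out.get("Warning")
    # Preserve any warnings already attached by upstream components.
    out["Warning"] = f"{existing}, {warning}" if existing else warning
    return out
```

For example, mark_transformed({}, "ct-proxy.example") produces a Warning value of 214 ct-proxy.example "Transformation applied".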

If the proxy has transformed (reformatted) the content but not rewritten https links it should annotate those links to indicate that transformation service is not available on them.

A proxy should strive for the best possible user experience that the client supports. It should only alter the format, layout, dimensions etc. to match the specific capabilities of the client. For example, when resizing images, they should only be reduced so that they are suitable for the specific client, and this should not be done on a generic basis.
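The image resizing example can be made concrete with a short Python sketch. It assumes the proxy knows the maximum image dimensions of the specific client (for instance from a device description repository); the function name fit_within is hypothetical.

```python
def fit_within(width: int, height: int,
               max_width: int, max_height: int) -> tuple:
    """Scale an image down to fit the client, preserving aspect ratio.

    The scale factor is capped at 1.0, so images that already fit
    are returned at their original size and are never enlarged.
    """
    scale = min(max_width / width, max_height / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))
```

A 1024 by 768 image destined for a client limited to 240 by 320 becomes 240 by 180, while a 100 by 50 image is left alone.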

In the passive mode (as well as in the active mode), if the proxy determines that the resource as currently represented is likely to cause serious mis-operation of the client then the proxy may transform the resource but only sufficiently to alter the specific aspect of the content that is likely to cause mis-operation. Proxies must not exhibit this behavior unless this has been specifically allowed by both the server and the user. [@@ either by persistent registration of preferences, or by use of the [@@correct dangerous content] directive.]

4 Use Case Analysis

Client Proxy Server

Unaware, Unaware, Unaware etc.

[@@TBD]

5 Testing

All ... must be tested for deleterious effects ... [@@TBD]

Providers of transforming proxies should make available interfaces that facilitate testing of Web sites accessed through them. [@@ though how they should make known how to do this and what administrative arrangements would be needed are both probably out of scope]

6 Conformance

A Scope for Future Work (Non-Normative)

A placeholder for the bits we couldn't do

B References (Non-Normative)

BestPractices
Mobile Web Best Practices 1.0 Basic Guidelines, Jo Rabin, Charles McCathieNevile (eds), W3C Proposed Recommendation, 2 November 2006 (See http://www.w3.org/TR/mobile-bp/.)
CT-Landscape
Content Transformation Landscape 1.0, Jo Rabin, Andrew Swainston (eds), W3C Working Draft 25 October 2007 (See http://www.w3.org/TR/ct-landscape/.)
HTTP
Hypertext Transfer Protocol -- HTTP/1.1 Request for Comments: 2616, R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, T. Berners-Lee, June 1999 (See http://tools.ietf.org/html/rfc2616.)

C Acknowledgments (Non-Normative)

The editors acknowledge contributions of various kinds from members of the MWI BPWG Content Transformation Task Force.

The editor acknowledges significant written contributions from:

D (Non-Normative)

Work needed on this draft:

There could be a note that the host should provide interactions that allow the user to have a choice of presentations and so should the proxy and the client, for that matter.

Another as yet unopened Pandora's box is that the discussion and proposed text looks at the issues primarily from the point of view of "varying presentation from Thematically consistent URIs". What hasn't, as yet, been explored is how it all works if there is a common entry point to a site (Thematically consistent URI for a home page) which then dispatches via redirect to media specific versions. This is possibly rather more common than the previous case (e.g. redirect to example.com/mobile - or rather better, imo, example.mobi). Naturally, there will also be varying presentation even within a redirected solution. This whole area needs further thought.

We need a discussion as to what extent we should be drawing up an RFC to do what we want to do. On the one hand HTTP makes it clear, in explaining how to introduce extensions, that it expects such extensions to be introduced. On the other hand, we do typically take a conservative approach and say if it is not in the IANA registry then it's not an existing protocol and therefore beyond our scope. Introducing extensions to existing header values, to my mind, falls short of introducing new headers, though it's not clear that we can do what we need to if we don't do that, go through IANA registration and so on.

We need to discuss what relationship, if any, this has to the following RFCs:

RFC 2295 is experimental, but actually gets to some of the points we want to make, though doesn't exactly address what we are doing. It's rather a lengthy and detailed read, and has a lot of features that we don't need. It does, however, introduce a couple of headers and field values which have been IANA registered. Also, the main points of the negotiation are implemented in Apache in mod_negotiation (see [APACHE]).

[APACHE] http://httpd.apache.org/docs/2.2/content-negotiation.html

IANA registration is probably a bit of a nuisance, and may be something we don't need to do - e.g. it would seem that the q parameter for content type and much else is not registered. For those of you who fancy a bit of train spotting, I think you'll find registered things at [IANA], though I confess I find this all a bit impenetrable and difficult to navigate.

[IANA] http://www.iana.org/numbers.html