Content Transformation Guidelines 1g

Group Working Draft 13 March 2008

This version:
http://www.w3.org/2005/MWI/BPWG/Group/TaskForces/CT/editors-drafts/Guidelines/080313
Latest version:
http://www.w3.org/2005/MWI/BPWG/Group/TaskForces/CT/editors-drafts/Guidelines/latest
Previous version:
http://www.w3.org/2005/MWI/BPWG/Group/TaskForces/CT/editors-drafts/Guidelines/080227
Editor:
Jo Rabin, dotMobi

Abstract

This document is the Guidelines referred to in the Charter of the W3C Mobile Web Initiative Best Practices Working Group Content Transformation Task Force.

Its purpose is to provide guidance to implementors of components of the delivery context as to how to communicate their intentions and capabilities in respect of content transformation.

Status of this Document

This document is an editors' copy that has no official standing.

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

Publication as a Group Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document has been produced by the Content Transformation Task Force of the Mobile Web Best Practices Working Group as part of the Mobile Web Initiative. Please send comments on this document to the Working Group's public email list public-bpwg-ct@w3.org, a publicly archived mailing list.

This document was produced under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of patent disclosures made in connection with this document; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) with respect to this specification must disclose the information in accordance with section 6 of the W3C Patent Policy.

Revision Description

Fifth Editor's Draft

See [todo] for a list of to-do items.

Table of Contents

1 Introduction
    1.1 Purpose
    1.2 Scope
    1.3 Audience
2 Guidelines
    2.1 Summary of Requirements
    2.2 Objectives
    2.3 Types of Proxy
    2.4 Types of Transformation
    2.5 Control of the Behavior of the Proxy
        2.5.1 Control by the User
        2.5.2 Control by Server
        2.5.3 Control by Administrative or Other Arrangements.
3 Behavior of Components
    3.1 Proxy Treatment of Request
        3.1.1 no-transform directive in Request
        3.1.2 Proxy Decision to Transform
        3.1.3 Proxy Indication of its Presence to Server
        3.1.4 Altering Header Values
    3.2 Server Response to Proxy
    3.3 Proxy Receipt and Forwarding of Response from Server
    3.4 Proxy Response to User Agent
4 Use Case Analysis
5 Testing
6 Conformance

Appendices

A References
B Scope for Future Work (Non-Normative)
C Acknowledgments (Non-Normative)
D To Do (Non-Normative)


1 Introduction

1.2 Scope

Rev 1g: New in this rev

The recommendations in this document refer only to "Web browsing" - i.e. access by user agents that are intended primarily for access to HTML based Web pages (Web browsers).

The BPWG is not chartered to create new technology; its role is to advise on best practice for the use of existing technology. It is therefore not in scope to propose the creation of new HTTP headers. Thus, although interactions between user agents and content transformation proxies might benefit from an enhanced repertoire of HTTP headers, this document does not define any such headers, and discusses such interactions using existing headers in only a limited way.

2 Guidelines

2.1 Summary of Requirements

Rev 1g: This requires revision before publication.

The purpose of this section is to summarize what the actors (transforming proxies, origin servers and, to some extent, users) need to communicate to each other. The relevant scenario involving a content transformation proxy is as follows:

The needs of these actors are as follows:

  1. The user agent needs to be able to tell the content transformation proxy [@@and the origin server]:

    1. what type of mobile device and what user agent is being used;

    2. what media-type (presentation format e.g. desktop, handheld) the user desires;

    3. that all content transformation should be avoided, or that reformatting is allowed/desired.

  2. The content transformation proxy needs to be able to tell the origin server:

    1. that some degree of content transformation (re-coding and reformatting) can be performed;

    2. that content transformation will be carried out unless instructed not to;

    3. that content is being requested on behalf of something else [@@?? and what that something else is];

    4. that the request headers have been altered (e.g. additional content types inserted).

  3. The origin server needs to be able to tell the content transformation proxy:

    1. that it varies its presentation according to device type and other factors;

    2. that it's permissible or otherwise to perform content transformation of various kinds;

    3. that it has media-specific representations;

    4. that it is unable or unwilling to deal with the request in its present form.

  4. The content transformation proxy needs to be able to tell the client browser:

    1. that it has applied transformations of various kinds to the content;

    2. how to access the untransformed representation of the content.

  5. The content transformation proxy needs to be able to interact with the user:

    1. to allow the user to disable its features;

    2. to alert the user to the fact that it has transformed content and to allow access to an untransformed representation of the content.

Rev 1g: Moved from the following section

In satisfying content transformation requirements existing HTTP headers, directives and behaviors must be respected, and as far as is practical, no extensions to HTTP [HTTP] are to be used.

2.2 Objectives

Rev 1g The overall objective of this document is to provide a means, as far as is practical, for users to be provided with at least a [Device Independence Glossary] "functional user experience" of the Web, taking into account the fact that an increasing number of content providers create experiences specially tailored to the mobile context which they do not wish to be altered by third parties. Equally it takes into account the fact that there remain a very large number of Web sites that do not provide a functional user experience when perceived on many mobile devices.

Rev 1g The document describes how the origin server may request conforming transforming proxies not to alter HTTP requests and responses, as well as describing control options that a transforming proxy should offer users.

Rev 1g There is the possibility for conflict between the desires regarding content transformation of the content provider and the desires of users of that content. This document sets out to provide a framework within which, for matters of presentation, the desires of the content provider are usually accommodated but, where necessary, the user may override those desires with their preferences.

Rev 1g: Is this where the output of Sean Patterson's ACTION-678 might reasonably go? [What is the difference between a CT proxy and Opera Mini?]

Rev 1g: Can the following go now?

[@@ other principles behind what we are trying to do - e.g. noting Sean's point that there is a wide diversity of different devices that all fall under the simple appellation of "handheld".]

2.3 Types of Proxy

Alteration of HTTP requests and responses is not prohibited by HTTP other than in the circumstances referred to in [HTTP] Section 13.5.2.

HTTP defines two types of proxy: transparent proxies and non-transparent proxies. As discussed in [HTTP] Section 1.3, Terminology:

"A [Definition: transparent proxy] is a proxy that does not modify the request or response beyond what is required for proxy authentication and identification. A [Definition: non transparent proxy] is a proxy that modifies the request or response in order to provide some added service to the user agent, such as group annotation services, media type transformation, protocol reduction, or anonymity filtering. Except where either transparent or non-transparent behavior is explicitly stated, the HTTP proxy requirements apply to both types of proxies."

This document elaborates the behavior of non-transparent proxies, when used for content transformation in the context discussed in [CT-Landscape].

3 Behavior of Components

3.1 Proxy Treatment of Request

3.1.1 no-transform directive in Request

If the request contains a Cache-Control: no-transform directive the proxy must forward the request unaltered to the server, other than to comply with transparent HTTP behavior and in particular to add a Via HTTP header.
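As an illustration only, the following minimal Python sketch shows one way a proxy might honor this rule. The function names, the `alter` callback and the `ct-proxy.example` identifier are hypothetical, and the sketch assumes header names have already been normalized to the spellings used as dictionary keys.

```python
def has_no_transform(headers):
    """Return True if the Cache-Control value carries the no-transform directive."""
    value = headers.get("Cache-Control", "")
    directives = [d.strip().lower() for d in value.split(",")]
    return "no-transform" in directives


def prepare_request(headers, alter=None, proxy_id="1.1 ct-proxy.example"):
    """Return the headers to forward to the server.

    `alter` is a hypothetical callable that rewrites headers; it is applied
    only when the request does not carry no-transform. The Via header is
    appended in every case, as transparent HTTP behavior requires.
    """
    out = dict(headers)
    if alter is not None and not has_no_transform(out):
        out = alter(out)
    existing = out.get("Via")
    out["Via"] = f"{existing}, {proxy_id}" if existing else proxy_id
    return out
```

With no-transform present, the only difference between the received and forwarded headers in this sketch is the added Via entry.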

Rev 1g: Bryan has ACTION-632 to "propose some recommendation on user agent detection".

Rev 1g: Martin has ACTION-679 to rewrite this to make it firmer along the lines of "Unless the proxy can be sure that the user agent is a browser ..."

Irrespective of the presence of the no-transform directive, the proxy must behave transparently (q.v.) if it detects that the user agent is not a browser.

3.1.2 Proxy Decision to Transform

If there is no no-transform directive present in the request the proxy should analyze whether it intends to offer transformation services by referring to:

Proxies should not alter HTTP requests unless the unaltered headers would result in the user's request being rejected by the origin server and unless the user has specifically requested a restructured version of a desktop oriented experience.

Rev 1g: We noted at the F2F that devices issue HEAD requests in some circumstances and Martin has ACTION-682 to write a proposal on how to treat them. Also Rob noted at the F2F that Openwave tokenizes URIs - I'm uncertain whether this means the list of methods here needs discussion, or whether this has an effect further on in the document. Finally, I am wondering if we need separate sections for each of the methods.

Proxies should not intervene in methods other than GET, POST, HEAD and PUT.
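A proxy could express this restriction as a simple membership test. The sketch below is illustrative; the names are hypothetical and the method list simply mirrors the sentence above, which is itself still under discussion.

```python
# Methods in which this draft allows a proxy to intervene.
INTERVENTION_METHODS = frozenset({"GET", "POST", "HEAD", "PUT"})


def may_intervene(method):
    """Return True if the request method is one the proxy may intervene in."""
    # HTTP method names are case-sensitive, so no case folding is applied.
    return method in INTERVENTION_METHODS
```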

Proxies should not alter request bodies. Rev 1g: Rob's ACTION-680 (discussion of forms that have been broken apart) and Francois's ACTION-681 (clarify with Aaron his point about character encoding) both need to be taken into account here.

Knowing that the browser has available a linearization or zoom capability and/or supports a broad range of content formats the proxy should not Rev 1g: [??] restructure or recode content.

Rev 1f: Reflects uncertain outcome of discussion 2008-02-26

If, as a result of carrying out this analysis, the proxy remains unaware of the server's preferences and capabilities it should:

Rev 1g: We dropped HEAD request but Francois has ACTION-710 (trackbot didn't actually pick this one up, so I raised it just now) to see what effect HEAD has

Rev 1g: Also I said something about pushing this up to the TAG as an issue, which we might choose to do.

Rev 1g: Further on this, Rotan (observing by IRC) raised the question of whether OPTIONS could be used - I infer he meant "can it be used in place of the HEAD?". Seems like a good idea, but to what extent is this "new technology" and to what extent is it feasible for content providers to implement it?

Rev 1g: Also we need something in here about the notion of "session" that Rob raised, i.e. if you have started transforming in a particular way it's important to continue to do so. I think we need to be clearer as to the scope of this and whether it overrides anything else we are saying.

Note:

I think a note here on the supposed idempotency of GET is needed, reflecting Rob's comment that there are circumstances in which a POST can't be generated even though you'd like to, e.g. confirmation from email

Rev 1g: Also a note discouraging the issuing of two GETs for the purpose of comparison of responses

3.2 Server Response to Proxy

Rev 1f: what are we saying the server should do with the POWDER received from the proxy?

If the server varies its presentation according to examination of received HTTP Headers then it must include a Vary HTTP header indicating this to be the case. If, in addition to, or instead of HTTP headers, the server varies its presentation on other factors (source IP Address ...) then it must include a * as one of the fields in the Vary response.
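The following Python sketch shows how a server might assemble such a Vary value. The function name and parameters are hypothetical. Appending "*" to a list of field names follows the wording of this draft; note that the RFC 2616 grammar gives "*" as an alternative to a field-name list, so an implementation may prefer to send "*" on its own.

```python
def vary_header(inspected_headers, uses_non_header_factors=False):
    """Build a Vary value from the request headers the server examined.

    inspected_headers: names of request headers that influenced the response.
    uses_non_header_factors: True if something other than headers
    (for example the source IP address) also influenced it.
    """
    fields = []
    seen = set()
    for name in inspected_headers:
        key = name.lower()  # header field names are case-insensitive
        if key not in seen:
            seen.add(key)
            fields.append(name)
    if uses_non_header_factors:
        fields.append("*")
    return ", ".join(fields)
```

For example, a server that inspects User-Agent and Accept would send `Vary: User-Agent, Accept`.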

Rev 1f

The server should indicate, using [@@ref POWDER] and the vocabulary defined in Appendix [@@] its capabilities and its defaults in respect of content it serves. [@@ note that you should be able to get all the info for a server by following links from a single resource]

The server must include a no-transform directive if one is received from the user agent. If it is capable of varying its presentation it should take account of user agent capabilities [@@as derived from a DDR etc.] and formulate an appropriate experience according to those criteria.

If the server has distinct presentations according to its perception of the presentation media, then the medium for which the presentation is intended should be indicated [@@using the ...] This is going to be something like the link headers, I think

If the server creates a specific user experience for certain presentation media types it should inhibit transformation of the response by including a no-transform directive.

Servers may base their actions on a priori knowledge of behavior of transforming proxies, when they are identified in a Via header.

The server should not choose a Content-Type for its response based on its assumptions about the heuristic behavior of any intermediaries (e.g. it should not choose Content-Type: application/vnd.wap.xhtml+xml solely on the basis that it suspects that transforming proxies will apply heuristics that make them not restructure it).

[@@ Vary headers in 406 response - restrict to the one(s) that have caused the 406.??]

Servers should respond with a 406, not a 200, if they can't handle the request. Servers should provide information about alternative representations by using the Vary header (if the alternatives are available from the same URI) or by using link information if alternative representations are handled by different URIs. [This restricts to HTML for now. If link headers are reinstated in HTTP then this becomes a more universal mechanism. Open question as to whether SVG or WICD etc. support any such notion]

[@@300 Response - could this be used as a signal from the server to say that it understands the protocol? A la RFC 2295]

3.4 Proxy Response to User Agent

If the response includes a Warning: 214 Transformation Applied the proxy must not apply further transformation.

If the response includes a Cache-Control: no-transform directive then the response must be forwarded to the user agent unaltered other than in the respects noted for transparent operation of HTTP proxies as specified in RFC2616, and in particular the addition of a Via HTTP header.
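The two rules above can be sketched as a single check that a proxy runs before transforming a response. This is an illustrative Python sketch, not a complete parser: the function name is hypothetical, headers are assumed to arrive as (name, value) pairs, and only the first warn-code on each Warning line is inspected.

```python
def response_forbids_transformation(header_lines):
    """Return True if the response must be forwarded without transformation.

    header_lines: iterable of (name, value) pairs as received from the server.
    """
    for name, value in header_lines:
        lname = name.lower()
        if lname == "cache-control":
            directives = {d.strip().lower() for d in value.split(",")}
            if "no-transform" in directives:
                return True
        elif lname == "warning":
            # Simplification: look only at the first warn-code on the line.
            parts = value.split()
            if parts and parts[0] == "214":
                return True
    return False
```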

Rev 1f: Follows from saying the server should label itself

If the response contains references to one or more Web Description Resources (WDRs) indicating server preferences for the treatment of its content, the proxy should retrieve those WDRs, analyze them as to how they pertain to this resource, and act in accordance with the server's expressed preferences. Proxies should retain such WDRs for future reference in accordance with the policies, if any, described in those WDRs.

In the absence of a Vary header, a no-transform directive or a WDR, the proxy should apply heuristics to the content to determine whether it is appropriate to restructure or recode it (in the presence of any of these, heuristics should not be used).

  • The server has previously shown that it is contextually aware, even if the present response does not indicate this - modified by a need for the proxy to be aware that the server has changed its behavior and is no longer aware in that way

  • the content-type is known to be specific to the device or class of device e.g. application/vnd.wap.xhtml+xml

  • examination of the content reveals that it is of a specific type appropriate to the device or class of device e.g. DOCTYPE XHTML-MP or WBMP or [@@mobile video] [@@ note Sean's extensive list of heuristics that should be included as an informative example?]

  • The response is an HTML response and it includes <link> elements specifying alternatives according to media type [or that such links are included as HTTP headers] or that the content has a mobileOK label.
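Some of the indications listed above can be sketched in code. The Python below is a rough illustration under stated assumptions: the function name, the set of content types and the string tests are hypothetical choices, not a list endorsed by this document, and the first bullet (the server's previous behavior) is not covered.

```python
import re

# Assumed examples of content types specific to mobile devices.
MOBILE_CONTENT_TYPES = {"application/vnd.wap.xhtml+xml", "image/vnd.wap.wbmp"}

# Matches a <link> element whose media attribute mentions "handheld".
HANDHELD_LINK = re.compile(
    r'<link\b[^>]*\bmedia\s*=\s*["\'][^"\']*handheld', re.IGNORECASE
)


def looks_mobile_specific(content_type, body_prefix):
    """Return True if the response appears to be tailored to mobile devices.

    content_type: the Content-Type header value of the response.
    body_prefix: the first part of the response body, decoded as text.
    """
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in MOBILE_CONTENT_TYPES:
        return True
    head = body_prefix.lower()
    # The XHTML-MP public identifier contains "DTD XHTML Mobile".
    if "<!doctype" in head and "//dtd xhtml mobile" in head:
        return True
    return bool(HANDHELD_LINK.search(body_prefix))
```

A proxy using a check of this kind would leave the response unrestructured whenever it returns True.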

If the proxy alters the content then it must add a Warning: 214 Transformation Applied HTTP Header. [@@ should this be elaborated to say what kind of transformation?]
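In RFC 2616 a Warning value consists of a warn-code, a warn-agent and a quoted warn-text. A minimal Python sketch of adding the header follows; the function name and the `ct-proxy.example` warn-agent are hypothetical.

```python
def add_transformation_warning(header_lines, warn_agent="ct-proxy.example"):
    """Return a new header list with a 214 Warning appended.

    header_lines: iterable of (name, value) pairs for the response.
    warn_agent: host name or pseudonym of the proxy adding the warning.
    """
    warning = f'214 {warn_agent} "Transformation applied"'
    return list(header_lines) + [("Warning", warning)]
```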

Rev 1g: The following is an adaptation (sic) of Andrew's ACTION-633

If the response contains links whose URIs have the scheme https the proxy may rewrite them so that it can transform the content. If the proxy rewrites such links, it must advise the end user of the security implications of doing so and must provide the option to avoid transformation of the resources the links refer to.

If the proxy has transformed (reformatted) the content but not rewritten https links, it should annotate those links to indicate that transformation service is not available on them.

A proxy should strive for the best possible user experience that the user agent supports. It should only alter the format, layout, dimensions etc. to match the specific capabilities of the user agent. For example, when resizing images, they should only be reduced so that they are suitable for the specific user agent, and this should not be done on a generic basis.
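The image example can be made concrete with a small calculation. The Python sketch below is an assumption about how a proxy might compute target dimensions; the function name and parameters are hypothetical, and the device limits would come from whatever capability information the proxy has for the specific user agent.

```python
def fit_within(width, height, max_width, max_height):
    """Return image dimensions reduced to fit the device, preserving aspect ratio.

    The scale factor is capped at 1.0, so an image that already fits is
    returned unchanged and is never enlarged.
    """
    scale = min(max_width / width, max_height / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For instance, an 800x600 image sent to a device with a 240x320 display area would be reduced to 240x180, while a 100x50 image would be left as it is.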

Rev 1e: Did we say we would remain silent on this?

Rev 1g: Francois made some proposals under ACTION-625 for dangerous content but no one has responded

If the proxy determines that the resource as currently represented is likely to cause serious mis-operation of the user agent then the proxy may transform the resource but only sufficiently to alter the specific aspect of the content that is likely to cause mis-operation. Proxies must not exhibit this behavior unless this has been specifically allowed by both the server and the user. [@@ either by persistent registration of preferences, or by use of the [@@correct dangerous content] directive.]

4 Use Case Analysis

Client Proxy Server

Unaware, Unaware, Unaware etc.

[@@TBD]

5 Testing

Rev 1g: As resolved at F2F in Korea

Providers of transforming proxies should make available interfaces that facilitate testing of Web sites accessed through them.

6 Conformance

A References

BestPractices
Mobile Web Best Practices 1.0 Basic Guidelines, Jo Rabin, Charles McCathieNevile (eds), W3C Proposed Recommendation, 2 November 2006 (See http://www.w3.org/TR/mobile-bp/)
CT-Landscape
Content Transformation Landscape 1.0, Jo Rabin, Andrew Swainston (eds), W3C Working Draft 25 October 2007 (See http://www.w3.org/TR/ct-landscape/)
HTTP
Hypertext Transfer Protocol -- HTTP/1.1 Request for Comments: 2616, R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, T. Berners-Lee, June 1999 (See http://tools.ietf.org/html/rfc2616)
Device Independence Glossary
W3C Glossary of Terms for Device Independence, Rhys Lewis (ed), W3C Working Draft 18 January 2005

B Scope for Future Work (Non-Normative)

A placeholder for the bits we couldn't do

C Acknowledgments (Non-Normative)

The editors acknowledge contributions of various kinds from members of the MWI BPWG Content Transformation Task Force.

The editor acknowledges significant written contributions from:

D To Do (Non-Normative)

Work needed on this draft:

There could be a note that the host should provide interactions that allow the user to have a choice of presentations and so should the proxy and the User Agent, for that matter.

Another as yet unopened Pandora's box is that the discussion and proposed text looks at the issues primarily from the point of view of "varying presentation from Thematically consistent URIs". What hasn't, as yet, been explored is how it all works if there is a common entry point to a site (Thematically consistent URI for a home page) which then dispatches via redirect to media specific versions. This is possibly rather more common than the previous case (e.g. redirect to example.com/mobile - or rather better, imo, example.mobi). Naturally, there will also be varying presentation even within a redirected solution. This whole area needs further thought.

We need to discuss what relationship, if any, this has to the following RFCs:

RFC 2295 is experimental, but actually gets to some of the points we want to make, though doesn't exactly address what we are doing. It's rather a lengthy and detailed read, and has a lot of features that we don't need. It does, however, introduce a couple of headers and field values which have been IANA registered. Also, the main points of the negotiation are implemented in Apache in mod_negotiation (see [APACHE]).

[APACHE] http://httpd.apache.org/docs/2.2/content-negotiation.html

This draft (1f) has introduced the notion of POWDER to describe the proxy and the server. It would seem that two vocabularies are needed, one to describe the CT proxy and one to describe server preferences.

All ... must be tested for deleterious effects ... [@@TBD]