URI Schemes and Web Protocols

Draft TAG Finding 12 June 2005

This version:
http://www.w3.org/2001/tag/doc/schemeProtocols-2005-06-12
Latest version:
http://www.w3.org/2001/tag/doc/schemeProtocols
Previous version:
Editor:
Noah Mendelsohn, IBM Corp. <Noah_Mendelsohn@us.ibm.com>

This document is also available in these non-normative formats: XML.


Abstract

This document discusses the deployment of protocols and URI schemes for use on the World Wide Web. Implications for correct configuration of user agents, servers and proxy gateways are discussed. Guidelines are also provided for deciding whether a new protocol or scheme is merited, and for maximizing interoperability of new protocols with those that are already widely deployed.

Status of this Document

This document is an editor's copy that has no official standing.

This document has been produced by the W3C Technical Architecture Group (TAG). This finding addresses TAG schemeProtocols-49.

This version of the document is a very preliminary sketch of a possible finding. Essentially, it is a snapshot of the editor's work-in-progress, made available so that the TAG will have something to discuss informally at the 14 June 2005 Face to Face meeting.

This document builds on and complements information from [RFC 2717] and [RFC 2718]. At the time of this writing, an Internet Draft has been submitted that would revise and subsume both of those RFCs (see [RFC 2717bis]). Although such drafts are not suitable for normative reference, this finding is intended to be consistent with the directions signalled in those revisions. When and if a revision to the RFCs becomes accepted, the TAG intends to republish this finding with the appropriate references and with any necessary changes to content.

Editorial note 
Is this the right way to handle the reference to 2717bis/2718bis?

Additional TAG findings, both accepted and in draft state, may also be available. The TAG may incorporate this and other findings into future versions of the [AWWW].

The terms MUST, SHOULD, and SHOULD NOT are used in this document in accordance with [RFC 2119].

Editorial note 
Should a finding like this actually make use of the formal rfc2119 terminology?

Please send comments on this finding to the publicly archived TAG mailing list www-tag@w3.org (archive).

Table of Contents

1 Preface
2 Terminology
3 URI Assignment and Protocols
4 Gateway Proxies
5 Selection of Protocols by User Agents
6 Protocol design: consistency of operations and formats (To Be Supplied)
7 References


1 Preface

Of the many ways in which the Web can be extended, the provision for new URI schemes may be the most fundamental. URI schemes are often created to signal the use of new protocols for transfer of resource-related information. Embodied in such protocols are operations, such as GET, POST and DELETE in the case of HTTP, which determine the sorts of interactions that are possible with a given resource. The operations in turn determine the format and typing of information exchanged on the Web.

Precisely because so many aspects of resource naming and interaction are subject to change through introduction of URI schemes, new schemes can undermine the interoperability of the Web. [AWWW] explains why unnecessary proliferation of URI schemes must be avoided: "While Web architecture allows the definition of new schemes, introducing a new scheme is costly. Many aspects of URI processing are scheme-dependent, and a large amount of deployed software already processes URIs of well-known schemes. Introducing a new URI scheme requires the development and deployment not only of client software to handle the scheme, but also of ancillary agents such as gateways, proxies, and caches. See [RFC 2718] for other considerations and costs related to URI scheme design. ... Because of these costs, if a URI scheme exists that meets the needs of an application, designers should use it rather than invent one."

Conversely, the introduction of new protocols, operations, and data formats may sometimes be essential if the Web is to continue as a universal information space integrating the broadest possible range of resources. Already a wide variety of information is being shared through peer-to-peer protocols, many of which are not well integrated on the Web, and there is reason to believe that the highest quality multimedia feeds may not be best distributed and controlled through HTTP. Immersive user interfaces may require interaction protocols that are more flexible than those in widespread use on the Web today, and so on. For such reasons, it is important to explore the tradeoffs involved in deploying new schemes, protocols and operations. This finding attempts to provide useful guidelines for the introduction and use of URI schemes and their associated protocols.

[RFC 2718] sets out guidelines and caveats for the creation of new URI schemes. This finding provides complementary information relating to the deployment of resources using such schemes, the design of protocols, the choice of operations to be supported, and the suggested behaviour of user agents. Indeed, this finding for the most part avoids restating principles already covered in [RFC 2717] and [RFC 2718]; readers are encouraged to become familiar with them before proceeding.

2 Terminology

This finding is intended to cover a broad range of information formats and protocols including client/server, peer-to-peer, streaming multimedia, multicast, etc. For convenience, the following client/server-oriented terminology is used except in cases where the extension to other protocols is unlikely to be clear:

User agent
A programmed entity, typically software, that uses a protocol to retrieve or update the state of a remote resource.
Server
A programmed entity, typically software, that maintains the state of a resource. Servers implement network protocols that support requests to retrieve (representations of) and/or to update the resource's state.
In the case of HTTP, state is exchanged in the form of media-typed resource representations, but other protocols may implement other abstractions or use different formats or typing systems.

3 URI Assignment and Protocols

For schemes such as http and ftp, the association of a URI to a resource is defined in terms of the corresponding protocol. Thus, the resource identified by http://example.org/resource1 is by definition the one for which representations are returned (GET) or updated (PUT) when that URI is supplied as the HTTP Request-URI (see [RFC 2616]). Unless otherwise stated, this finding deals only with such protocol-associated URI schemes.
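As an illustration of this association, the following minimal sketch (Python, standard library only) derives the request line and Host header that a user agent might send when dereferencing an http URI directly at the origin server. The function name origin_request is hypothetical. The sketch shows only the form in which the URI is split into an absolute path and a Host header, and it ignores details such as userinfo, percent-encoding and IPv6 literals.

```python
from urllib.parse import urlsplit


def origin_request(uri, method="GET"):
    """Return the request line and Host header for a direct HTTP request.

    Hypothetical helper for illustration; handles only simple http URIs.
    """
    parts = urlsplit(uri)
    if parts.scheme != "http":
        raise ValueError("this sketch handles only the http scheme")
    # When talking directly to the origin server, the Request-URI is the
    # absolute path plus any query; the authority moves to the Host header.
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return [f"{method} {path} HTTP/1.1", f"Host: {parts.netloc}"]


if __name__ == "__main__":
    for line in origin_request("http://example.org/resource1"):
        print(line)
```

Under these assumptions, the same URI yields the same request target whether the operation is a GET or a PUT, which is the sense in which the protocol defines what the URI identifies.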

Subtleties arise when such URIs are employed without deploying a server for the resource. For example, it is common to use XML namespace names based on the http scheme even when no server is providing representations for that namespace. Deploying such a server is desirable, but is not required by Web architecture. When there is no such server, the URI chosen SHOULD be consistent with eventual server deployment. So, in the case of HTTP, it is inappropriate to base a URI on a DNS name that is not registered, because the DNS name might later be assigned to an organization that would use it for a purpose inconsistent with serving representations of the resource. Similar considerations apply for other schemes and their associated protocols.

4 Gateway Proxies

In the simplest case, the protocol associated with the URI directly connects the user agent to the resource provider.

[Figure: direct connection to resource provider]

The Web also allows for gateway proxies, which convert from one protocol to another. In such cases, the server offers the resource using one protocol, but the user agent accesses it through another. For example, HTTP can be used as a proxy for the FTP protocol.

[Figure: proxy/gateway connection to resource provider]

The following considerations apply to the implementation of gateway proxies:

On the Web, URI names are typically used "on the wire" as the means by which protocols identify resources to be accessed. HTTP, for example, uses a URI as its Request-URI. When one such protocol is used as a proxy to another, the two "hops" may or may not use the same URI. When they differ, there are two URIs identifying the same resource. (Strictly speaking, the two URIs name the resource and the proxy of the resource respectively, but for many practical purposes the effect is similar to having two names for the resource itself.) [AWWW] explains the disadvantages of assigning more than one URI to a single resource. For those reasons, protocols intended for use with gateways SHOULD be designed to avoid the requirement to generate such duplicate URI names. HTTP, for example, provides for the use of non-http scheme URIs as Request-URIs; accordingly, the same URI can often be used on both "hops". Conversely, URI duplication may be unavoidable when the gateway protocol demands naming with a particular scheme.
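The point about reusing one URI on both "hops" can be sketched as follows. This is a minimal illustration in Python, not a gateway implementation: the helper name gateway_request_line and the example host and path are assumptions made for the sketch. It shows an ftp-scheme URI being passed unchanged, in absolute form, as the Request-URI of an HTTP request addressed to a gateway proxy.

```python
from urllib.parse import urlsplit


def gateway_request_line(uri, method="GET"):
    """Request line sent to an HTTP gateway proxy for a resource URI.

    Hypothetical helper: the URI is passed through unchanged in absolute
    form, so no second, gateway-specific URI has to be minted.
    """
    if not urlsplit(uri).scheme:
        raise ValueError("an absolute URI with a scheme is required")
    return f"{method} {uri} HTTP/1.1"


if __name__ == "__main__":
    # The same ftp URI names the resource on both hops.
    print(gateway_request_line("ftp://example.org/pub/file.txt"))
```

A gateway protocol that could carry only names from its own scheme would instead force the user agent to construct a second URI for the same resource, which is the duplication the paragraph above advises against.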

Editorial note 
Does this section use an appropriate mix of RFC 2119 "MUST"s and "SHOULD"s vs. more informal guidance?

5 Selection of Protocols by User Agents

This section discusses the means by which a user agent can select an appropriate protocol for accessing a resource.

The specification for a URI scheme determines the normative association of URIs from that scheme to resources. For protocol-based URIs, that association is typically defined in terms of the protocol (see section 3, URI Assignment and Protocols). In such cases, a user agent can determine a protocol based on inspection of the URI, and in the common case where there is one protocol associated with a scheme, the scheme name directly determines the protocol. It is, for example, always acceptable for a user agent to attempt an HTTP connection to a resource named with the http scheme.

The means by which user agents determine that a gateway protocol is to be used are specific to each user agent. Using the example above, a user agent would require some configuration to indicate that ftp-scheme resources were in fact to be accessed using the HTTP protocol. This is similar to the other sorts of proxy configuration that are commonly required of Web browsers.
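The selection logic described in this section might look roughly like the sketch below. It is illustrative only: the table NATIVE_PROTOCOLS, the function select_protocol and the gateways mapping are hypothetical stand-ins for whatever configuration a particular user agent provides, and the proxy authority shown is invented for the example.

```python
from urllib.parse import urlsplit

# Protocols this hypothetical user agent can speak natively, keyed by scheme.
NATIVE_PROTOCOLS = {"http": "HTTP", "ftp": "FTP"}


def select_protocol(uri, gateways=None):
    """Return (protocol, next_hop) for a URI.

    `gateways` maps a scheme to the authority of an HTTP gateway proxy;
    it stands in for user-agent-specific proxy configuration.
    """
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()
    gateways = gateways or {}
    if scheme in gateways:
        # Configuration overrides the default: reach the resource through
        # the gateway using HTTP, whatever the URI's own scheme is.
        return ("HTTP", gateways[scheme])
    if scheme in NATIVE_PROTOCOLS:
        # Common case: the scheme name directly determines the protocol.
        return (NATIVE_PROTOCOLS[scheme], parts.netloc)
    raise ValueError(f"no protocol known for scheme {scheme!r}")


if __name__ == "__main__":
    print(select_protocol("http://example.org/resource1"))
    print(select_protocol("ftp://example.org/pub/file.txt",
                          gateways={"ftp": "proxy.example.net:8080"}))
```

Without any gateway configuration the scheme alone decides the protocol; with an entry for ftp, the same URI is instead fetched over HTTP from the configured gateway.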

6 Protocol design: consistency of operations and formats (To Be Supplied)

This section has not been written. The paragraphs below are placeholders with reminders of possible topics to be covered.

To be supplied: explain that it's much easier to support new protocols in a user agent if the operations of that protocol are even generally similar to those of HTTP or other widely deployed protocols. So, a high-definition streaming video protocol may not support exactly an HTTP GET, but if it supports something in the same spirit then a browser can probably provide a fairly consistent navigation experience as one goes from a Web page to a movie and back.

To be supplied: similarly, if a peer-to-peer protocol supports retrieval of media-typed octet streams, then browsers can use existing renderers, caches, etc. This will link to the AWWW good practice note on reusing formats.

To be supplied: operations on the wire vs. operations at the endpoint. In HTTP, GET is visible both as a browser operation and on the wire. In peer-to-peer, you might have a very compatible operation at the browser that turns into all sorts of strange traffic on the wire. That's still a good thing to go for: if you can simulate as much of the HTTP "endpoint API" as possible, then you get a lot of browser compatibility, even if the on-the-wire protocols are radically different.

7 References

AWWW
I. Jacobs, N. Walsh. Architecture of the World Wide Web. W3C. December, 2004. (See http://www.w3.org/TR/webarch/.)
RFC 2119
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. IETF. March, 1997. (See http://www.ietf.org/rfc/rfc2119.txt.)
RFC 2717
R. Petke, I. King. Registration Procedures for URL Scheme Names. IETF. November, 1999. (See http://www.ietf.org/rfc/rfc2717.txt.)
RFC 2616
R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, T. Berners-Lee. Hypertext Transfer Protocol -- HTTP/1.1. IETF. June, 1999. (See http://www.ietf.org/rfc/rfc2616.txt.)
RFC 2717bis
T. Hansen, T. Hardie, L. Masinter. Guidelines and Registration Procedures for new URI Schemes. February, 2005. (See http://ietfreport.isoc.org/all-ids/draft-hansen-2717bis-2718bis-uri-guidelines-03.txt.)
RFC 2718
L. Masinter, H. Alvestrand, D. Zigmond, R. Petke. Guidelines for new URL Schemes. IETF. November, 1999. (See http://www.ietf.org/rfc/rfc2718.txt.)
RFC 3986
T. Berners-Lee, R. Fielding, L. Masinter. Uniform Resource Identifier (URI): Generic Syntax. IETF. January, 2005. (See http://www.ietf.org/rfc/rfc3986.txt.)