INTERNET-DRAFT URTP                                     Lewis Girod, MIT
draft-ietf-http-URTP                                    Benjie Chen, MIT
                                                 H. Frystyk Nielsen, W3C
                                                       John Mallery, MIT
Expires: XXXX                                Wednesday, October 14, 1998

                 URTP - URI Resolver Transport Protocol

Status of this Document

This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress".

To learn the current status of any Internet-Draft, please check the "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast).

Distribution of this document is unlimited. Please send comments to the mailing list. This list is archived at "http://lists.w3.org/Archives/Public/ietf-http-ext/".

The contribution of World Wide Web Consortium (W3C) staff is part of the W3C HTTP Activity (see "http://www.w3.org/Protocols/Activity").

Abstract

In order for the Web to continue to grow and prosper, information publishers must have stable URIs [1] readily available to identify resources. In this context, "stable" can be interpreted along different axes, such as persistence over time, accessibility, and locality. Although ensuring a high degree of URI stability is a social engineering task, it is equally important that the Web infrastructure support the evolution of protocols, transports, and access models without requiring changes to already deployed URIs. Currently, extensibility is primarily achieved by inventing new URI schemes, which has serious implications for the long-term interoperability of the Web.
This document proposes a lightweight extension to HTTP [4], based on the HTTP Extension Framework [7], hereafter known as "URTP". URTP encourages the use of stable URIs while supporting protocol evolution without changes to already deployed URIs or URI schemes. The extension neither defines nor requires a particular resolution mechanism. Rather, it defines a simple mechanism for carrying URI resolver information using HTTP as a transport, with support for both implicit and explicit resolution of URIs.

Table of Contents

   1. Introduction
      1.1 Terminology
      1.2 Purpose of URTP
      1.3 Requirements
      1.4 Operational Overview
   2. Notational Conventions
   3. URTP Extension Identifier
   4. 350 Resolution Delegated
   5. Resolver Location
   6. Resolver Control Directives
      6.1 Fragment
   7. Convergence Errors
   8. Security Considerations
   9. References
   10. Resolution Summary
   11. Examples
      11.1 Example 1
      11.2 Example 2

1. Introduction

Numerous generic URI resolver mechanisms have been designed and implemented over the years.
Except for DNS, which can be seen as a partial URI resolver, none of them has become widely used on the Web, mainly because of the bootstrapping problem of deploying a global resolver mechanism into the Web. This proposal breaks the problem down by separating the resolver transport protocol from the resolver mechanism and specifying only the former. The URI resolver protocol defined in this document, known as URTP, is a simple extension to HTTP that allows HTTP to be used as a URI resolver protocol while imposing as few restrictions as possible on the URI resolution mechanism.

1.1 Terminology

This document uses the following terminology, which is not new to the Web community but which we define explicitly here for clarity:

fragment or view
   A string at the end of a URI which identifies, within a Web document, a part or view to which one refers. The view, which is a function of the media type, is separated from the rest of the URI by a crosshatch ("#") character.

renderer
   An application that takes an input stream and produces a representation as output. The representation is typically a function of the media type of the input data, the requested view, and possibly stylistic information provided by style sheets.

resolver
   An application that translates a URI into another URI or, if it is the authoritative resolver, directly into the requested resource.

resolution
   The sequence of operations performed by one or more resolvers. This sequence is a nested set of operations that may result in an entity being generated and returned to the requester.

1.2 Purpose of URTP

Many existing URI schemes contain some implicit information about how to access a particular resource, typically as a function of the URI access scheme, such as "http:" or "news:". Some of these schemes are location dependent, some are not; some schemes may be intended to be persistent, others may not, and so on.
The problem is that these characteristics often change and evolve over time. The following are well-known characteristics of a URI that often change and evolve independently of one another:

Access Mechanism (how can I get it?)
   A single resource may be available through different access protocols supported by the party serving the resource. These access protocols may or may not be compatible: HTTP/0.9, HTTP/1.0, and HTTP/1.1 are backwards compatible protocols, but HTTP running on top of SSL is not, even though it in fact uses HTTP as one of its access protocols. Lacking the ability to express and negotiate protocol stacks forces the introduction of new access schemes like "shttp:", causing serious deployment and evolvability problems.

Persistence (for how long can I get it?)
   The persistence of a URI, that is, the period of time during which the URI exists, often depends on the contents, or may be obtained through contractual agreements based on delegation of the URI space. Note that this is different from the max-age cache control directive in HTTP/1.1, which states how long an HTTP response is cacheable, not how long the URI itself persists. For example, a URI pointing to today's newspaper is not expected to change, but its contents are. Being able to express additional information about the persistence of a URI would be of great potential benefit to search services, indexes, etc.

Locality (where can I get it?)
   Global availability of resources may be obtained by local mirroring of parts of the URI space. Mirroring can either be supported directly in the URI scheme, as is the case with "news:", or be implemented in an ad hoc manner, as is often the case with HTTP-based mirroring.
   Two typical mechanisms for mirroring "http:" URIs are to make the mirrors explicit in the document ("This document can also be obtained from…"), or to use multihomed-host DNS tricks based on the AS number of the requester; both have considerable drawbacks.

In a more abstract sense, the above examples, along with features such as content negotiation, can be summarized as follows: under which circumstances can a URI be compared to itself as well as to other URIs, what are the semantics of the comparison, and how can this relationship be expressed without changing the URI itself?

The purpose of URTP is to allow relationship information such as the examples above to be returned by a URI resolver mechanism, instead of having to encode the information into the URI itself, often using ad hoc mechanisms. The advantage is that much more stable names can be introduced into the Web. The list of parameters does not end here, of course: other information that may be passed around using URTP includes the interfaces the resource supports, privacy policies, pricing policies, content ratings, etc.

1.3 Requirements

In order for one or more resolvers to be able to use URTP for URI resolution, it is required:

   o  that the protocol is independent of specific URI schemes;

   o  that the protocol can transfer both authoritative and non-authoritative information provided by resolvers;

   o  that the protocol is independent of the format of the information passed around by the resolvers;

   o  that resolver information is identified using URIs, like any other resource on the Web;

   o  that resolvers can detect infinite resolution loops and decide whether a resolution is likely to converge;

   o  that the protocol can work without breaking existing Web applications.

Note that it is not a requirement for the resolver protocol to provide a mechanism for finding a resolver that offers URI resolution services.
Within HTTP, this is equivalent to the problem of a client finding a proxy for accessing the Web. Although this is a very important problem, it should be solved for all uses of HTTP proxies and not only for special cases.

1.4 Operational Overview

A resource that is discovered through the resolution process responds with a representation of itself, based on the user preferences, application capabilities, and view indicated in the request. This representation is returned to the client as a response entity and forms the input stream to the renderer. The resolution process can be described recursively as

   renderer ( resolution ( Request-URI, view ) )

where the Request-URI, the view, and the resolver may change while iterating through the resolution process. Applying this model to HTTP itself makes it clear that the origin server of an HTTP URL is also the authoritative resolver for that URL. It also means that, since HTTP can carry arbitrary URIs in a request, an HTTP server can act as a resolver for arbitrary URIs.

Currently, HTTP requests do not include the view identifier of the URI. However, because the view is a function of the media type of the response entity, it may not be defined for every representation that the resource can return to the client; it is therefore important that the view be passed along in the resolution process.

URTP is intended to be used as follows:

   o  A URTP-compliant client issues an HTTP request to its preferred or local resolver server, with the URI to be resolved as the Request-URI. The client indicates that it understands URTP using the HTTP Extension Framework [7];

   o  If the resolver server is non-authoritative, it can either
      a) reply with information about where to find an alternative resolver, which may or may not be authoritative; or
      b) proxy or gateway the request itself;

   o  If the resolver is authoritative, it replies as it would normally have done.
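The client side of this usage pattern, together with the 350 handling and loop detection described in sections 4, 5 and 7, can be sketched as a loop. This is a minimal, non-normative Python sketch under several assumptions: the transport is abstracted as a hypothetical send callable, the Resolver field is assumed to be parsed already into a list of absolute addresses, the Request-URI and view are held fixed across hops (the model above allows both to change), loop detection is approximated by refusing to contact the same resolver twice, and the names Response, resolve, run_with_table and max_hops are illustrative inventions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Response:
    """The parts of an HTTP response that this sketch inspects."""
    status: int
    resolvers: List[str] = field(default_factory=list)  # parsed Resolver field-value
    body: str = ""


# Hypothetical transport: sends one URTP request for request_uri (with the
# Opt: "urn:specs:URTP" header and, if given, a fragment directive) to the
# resolver at the given address and returns its response.
Transport = Callable[[str, str, Optional[str]], Response]


class ResolutionError(Exception):
    """Raised when resolution stops without an answer."""


def resolve(request_uri: str, fragment: Optional[str], first_resolver: str,
            send: Transport, max_hops: int = 16) -> Response:
    """Follow 350 (Resolution Delegated) responses until a resolver answers."""
    visited = {first_resolver}
    resolver = first_resolver
    for _ in range(max_hops):
        response = send(resolver, request_uri, fragment)
        if response.status != 350:
            # Authoritative reply (or an ordinary HTTP error): stop here.
            return response
        if not response.resolvers:
            # Empty Resolver field-value: no resolver could be found.
            raise ResolutionError("no resolver could be found")
        # Choosing among several resolvers is left to the client; this
        # sketch simply takes the first one it has not already contacted.
        fresh = [r for r in response.resolvers if r not in visited]
        if not fresh:
            # Every offered resolver was already tried: treat it as a loop.
            raise ResolutionError("resolution loop detected at " + resolver)
        resolver = fresh[0]
        visited.add(resolver)
    raise ResolutionError("no convergence after %d hops" % max_hops)


def run_with_table(table: Dict[str, Response], start: str) -> str:
    """Drive resolve() against canned responses keyed by resolver address."""
    try:
        result = resolve("urn:cid:9802032044@thebe.lcs.mit.edu", None, start,
                         lambda address, uri, frag: table[address])
        return "%d %s" % (result.status, result.body)
    except ResolutionError as err:
        return "error: " + str(err)
```

With a table in which "http://urn.org/" answers 350 naming "http://thebe.lcs.mit.edu/", and that address answers 200, run_with_table follows one delegation and returns the 200 response, roughly mirroring Example 2 in section 11. The sketch does not model the user confirmation that section 4 asks for before following a 350 response to a request other than GET or HEAD.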
This extension makes no promise that the resolution converges towards a resource, nor that a given URI will be understood by the resolver (see section 10 for more details).

2. Notational Conventions

This specification uses the same notational conventions and basic parsing constructs as RFC 2068 [4]. In particular, the BNF constructs "token", "quoted-string", and "field-name" in this document are to be interpreted as described in RFC 2068 [4].

For definitive information on URI syntax and semantics, see "Uniform Resource Identifiers (URI): Generic Syntax and Semantics" [10]. This specification adopts the definitions of "URI-reference" and "uric" from that specification.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [5].

3. URTP Extension Identifier

The URI used to identify this extension within the HTTP Extension Framework [7] is defined as

   URTP-specification-URN = "urn:specs:URTP"

This identifier uniquely identifies the URTP extension and MUST NOT be used for any other purpose.

4. 350 Resolution Delegated

The resolution process has been delegated to one or more alternative resolvers, indicated in the response using the Resolver header field (see section 5). A 350 response can be issued by any resolver receiving a request to resolve a URI. Note that this differs from most other HTTP status codes in that it is issued by whichever resolver handles the request, which may not be within the same trust domain as the origin server for the Request-URI. We expect that document signing will help establish the trust required between two parties to communicate the appropriate resolver information, independent of any a priori trust relationship (see section 8).

The response MAY include an entity containing additional metadata about the resource.
If that entity is accessible from a separate location, this SHOULD be indicated using the Content-Location header field (see [4]). 350 responses are cacheable unless indicated otherwise.

If the 350 status code is received in response to a request other than GET or HEAD, the user agent SHOULD NOT automatically redirect the request unless it can be confirmed by the user, since this might change the conditions under which the request was issued.

5. Resolver Location

The Resolver response-header field MUST be included in 350 (Resolution Delegated) response messages (see section 4). The field value is a list of resolver addresses, each indicating the location of a next resolver in the resolution chain. Normally the list contains at least one address; an empty list has the special meaning described below.

   resolver         = "Resolver" ":" #resolver-address
   resolver-address = <"> URI-reference <">

If a resolver-address is a relative URI, it is interpreted relative to the Request-URI. If more than one resolver-address is provided, the client SHOULD try to determine which of the resolvers is the optimal one, for example based on connectivity or trust. If the Resolver field-value is empty, no resolver could be found that could resolve the URI, and the resolution process stops.

6. Resolver Control Directives

The Resolver-Ctrl general-header field is used to specify directives that MUST be obeyed by the recipient. The directives typically override default resolution behavior or contain additional information intended to guide the resolution in a certain direction. Resolver control directives are unidirectional in that the presence of a directive in a request does not imply that the same directive should be given in the response.
   resolver-control            = "Resolver-Ctrl" ":" 1#resolver-directive
   resolver-directive          = resolver-request-directive
                               | resolver-response-directive
   resolver-request-directive  = "fragment" "=" <"> *uric <">
                               | resolver-extension
   resolver-response-directive = resolver-extension
   resolver-extension          = token [ "=" ( token | quoted-string ) ]

Relative URIs in resolver control directives are interpreted relative to the base URI of the message. As the parameters used in the resolution process may change (see section 1.4), the resolver control directives may also change.

6.1 Fragment

The fragment resolver directive can be used to pass the fragment identifier of the Request-URI to the resolver. As mentioned in section 1.4, the fragment identifier may change as a result of the resolution and hence SHOULD be presented to the resolver.

7. Convergence Errors

If a server detects an unrecoverable error in the resolution process, it SHOULD return a 350 (Resolution Delegated) response to the client with an empty Resolver field-value. This can, for example, be the case if it detects a resolution loop or does not know how to resolve the Request-URI.

If a client detects a resolution loop by inspecting the Resolver header field, it SHOULD immediately stop the resolution process and report an error.

8. Security Considerations

Possible conflicts between the HTTP trust model and the 350 response raise security concerns. In short, 350 responses without security extensions are responses from untrusted resolvers. Measures such as loop avoidance should be applied to detect and prevent denial-of-service attacks.

Implementations of URTP should follow the security restrictions of the environment in which the resolver operates. For example, resolvers on firewalls operating under both single-step and delegation proxy behaviors may be required to filter out resolution requests from outside the firewall that attempt to reach an internal resource.
Such requests, in most cases, are not allowed. However, it is essential that such proxying resolvers forward resolution requests from internal clients to the outside world, unless an organization intends to mirror resolution services for all URI namespaces internally.

Many security concerns of URI resolution, such as the authenticity of resolution information, are problems that require further study. These considerations are beyond the scope of this document.

9. References

   [1]  T. Berners-Lee, "Universal Resource Identifiers in WWW: A Unifying Syntax for the Expression of Names and Addresses of Objects on the Network as used in the World-Wide Web", RFC 1630, CERN, June 1994.

   [2]  T. Berners-Lee, L. Masinter, M. McCahill, "Uniform Resource Locators (URL)", RFC 1738, CERN, Xerox PARC, University of Minnesota, December 1994.

   [3]  T. Berners-Lee, R. Fielding, H. Frystyk, "Hypertext Transfer Protocol -- HTTP/1.0", RFC 1945, W3C/MIT, UC Irvine, W3C/MIT, May 1996.

   [4]  R. Fielding, J. Gettys, J. C. Mogul, H. Frystyk, T. Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2068, UC Irvine, DEC W3C/MIT, DEC, W3C/MIT, W3C/MIT, January 1997.

   [5]  S. Bradner, "Key words for use in RFCs to Indicate Requirement Levels", RFC 2119, Harvard University, March 1997.

   [6]  R. Daniel, M. Mealling, "Resolution of Uniform Resource Identifiers using the Domain Name System", RFC 2168, June 1997.

   [7]  H. F. Nielsen, P. Leach, S. Lawrence, "HTTP Extension Framework", draft-http-ext-mandatory, March 23, 1998. Work in progress.

   [8]  R. Daniel, "A Trivial Convention for using HTTP in URN Resolution", RFC 2169, June 1997.

   [9]  K. Sollins, "Architectural Principles of Uniform Resource Name Resolution", RFC 2276, MIT, September 1997.

   [10] T. Berners-Lee, R. Fielding, L. Masinter, "Uniform Resource Identifiers (URI): Generic Syntax and Semantics", draft-fielding-uri-syntax, March 1998. Work in progress.

10. Resolution Summary

The following section summarizes the operations performed by a URI resolver. The summary is intended as a guide and index to the text, but is necessarily cryptic and incomplete.

When a resolver receives a resolution request for a URI, it SHOULD attempt to resolve the URI, making use of any resolver control information provided by the client (section 6). If the resolver is authoritative, it replies as it would normally have done. If the resolver cannot itself complete the resolution, it can do one of two things:

   1. Respond with a 350 (Resolution Delegated) status code, indicating that the resolution has been delegated to alternative resolvers (section 4). Redirection and delegation information MUST be conveyed via the Resolver header field (section 5). An empty Resolver field-value indicates that no resolver could be found.

   2. Proxy or gateway the request to an upstream resolver and return the proxied response to the client. This mechanism can also be used to interoperate with non-URTP-compliant applications that may still be using a resolver mechanism.

11. Examples

11.1 Example 1

A URTP client issues a request to "http://metadata.org" in order to look for metainformation about the resource "urn:cid:9802032044@www.w3.org":

   GET urn:cid:9802032044@www.w3.org HTTP/1.1
   Host: metadata.org
   Opt: "urn:specs:URTP"
   Resolver-Ctrl: fragment="top"

The resolver returns information about where to find the authoritative resolver for this URI, the Request-URI to use, and the metadata that the resolver has about the resource:

   HTTP/1.1 350 Resolution Delegated
   Resolver: "http://www.w3.org/Protocols/#top"
   Content-Type: application/rdf
   …

11.2 Example 2

A client wants to resolve "urn:cid:9802032044@thebe.lcs.mit.edu".
It sends a resolution request to "http://urn.org":

   GET urn:cid:9802032044@thebe.lcs.mit.edu HTTP/1.1
   Host: urn.org
   Opt: "urn:specs:URTP"
   …

The resolver at "http://urn.org" determines that the URI can be resolved using another resolver, and sends back a 350 response:

   HTTP/1.1 350 Resolution Delegated
   Resolver: "http://thebe.lcs.mit.edu/;scope=urn%3Acid%3A"
   Content-Type: application/rdf
   …

To continue the resolution process, the client makes another resolution request, this time to "http://thebe.lcs.mit.edu":

   GET urn:cid:9802032044@thebe.lcs.mit.edu HTTP/1.1
   Host: thebe.lcs.mit.edu
   Opt: "urn:specs:URTP"
   Resolver-Ctrl: hint="http://thebe.lcs.mit.edu/;scope=urn%3Acid%3A"

The resolver at thebe.lcs.mit.edu is the authoritative resolver for the URI. It returns the requested entity:

   HTTP/1.1 200 OK
   …
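The header handling a client performs between the two hops of Example 2 can be sketched in Python. This is a minimal, non-normative sketch under stated assumptions: the function names (parse_resolver, format_resolver_ctrl, follow_up_request) are hypothetical; the parser accepts only the simple list form of the Resolver field from section 5 (no empty list elements); and, as in the example, the client copies the chosen resolver address into a "hint" directive, which is a resolver-extension not defined by this document.

```python
import re
from typing import Dict, List
from urllib.parse import urljoin, urlsplit

# Section 5: resolver = "Resolver" ":" #resolver-address, where each address
# is a quoted URI-reference. A URI-reference cannot contain '"', so matching
# quoted strings is sufficient for this simplified parser.
_ADDRESS = re.compile(r'"([^"]*)"')
_ADDRESS_LIST = re.compile(r'\s*(?:"[^"]*"\s*(?:,\s*"[^"]*"\s*)*)?')


def parse_resolver(field_value: str, request_uri: str) -> List[str]:
    """Return resolver addresses; an empty list means resolution stops."""
    if not _ADDRESS_LIST.fullmatch(field_value):
        raise ValueError("malformed Resolver field-value: %r" % field_value)
    # Relative addresses are interpreted relative to the Request-URI.
    # urljoin only resolves against hierarchical schemes such as "http:";
    # against a "urn:" Request-URI it returns a relative reference unchanged.
    return [urljoin(request_uri, a) for a in _ADDRESS.findall(field_value)]


def quote(value: str) -> str:
    """Render value as an HTTP quoted-string, escaping '\\' and '"'."""
    return '"' + value.replace('\\', '\\\\').replace('"', '\\"') + '"'


def format_resolver_ctrl(directives: Dict[str, str]) -> str:
    """Build a Resolver-Ctrl field-value such as fragment="top"."""
    return ", ".join("%s=%s" % (name, quote(value))
                     for name, value in directives.items())


def follow_up_request(request_uri: str, resolver_address: str,
                      directives: Dict[str, str]) -> str:
    """Compose the next URTP request, addressed to the chosen resolver."""
    lines = ["GET %s HTTP/1.1" % request_uri,
             "Host: %s" % urlsplit(resolver_address).netloc,
             'Opt: "urn:specs:URTP"']
    if directives:
        lines.append("Resolver-Ctrl: " + format_resolver_ctrl(directives))
    return "\r\n".join(lines) + "\r\n\r\n"
```

Applied to Example 2, parse_resolver on the field-value of the 350 response yields the single address "http://thebe.lcs.mit.edu/;scope=urn%3Acid%3A", and follow_up_request with {"hint": that address} produces the second request shown above, with CRLF line endings. Because URI-references may legally contain commas, the sketch matches quoted strings rather than splitting the field-value on ",".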