INTERNET-DRAFT Wire                                     Lewis Girod, MIT
draft-ietf-http-wire                                    Benjie Chen, MIT
                                               H. Frystyk Nielsen, W3C
                                                      John Mallery, MIT
Expires: XXXX                                     Monday, May 11, 1998


               WIRE - Web Identifier Resolver Extension

Status of this Document

   This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress".

   To learn the current status of any Internet-Draft, please check the "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast).

   Distribution of this document is unlimited. Please send comments to the mailing list. This list is archived at "http://lists.w3.org/Archives/Public/ietf-http-ext/".

   The contribution of World Wide Web Consortium (W3C) staff is part of the W3C HTTP Activity (see "http://www.w3.org/Protocols/Activity").

Abstract

   For a distributed information system like the Web to continue to grow and prosper, publishers must be provided with stable URIs [1] to identify resources. In this context, "stable" can be interpreted along different axes, such as time, accessibility, and locality. Although ensuring a high degree of URI stability is largely a social engineering task, it is equally important that the Web infrastructure support the evolution of protocols, transports, and access models without requiring changes to already deployed URIs.
   Currently, such evolution is possible only through the invention of new URI schemes, which has serious implications for the long-term interoperability of the Web.

   This document proposes a lightweight extension to HTTP [4], hereafter known as "WIRE", based on the HTTP Extension Framework [7]. WIRE encourages the use of stable URIs while supporting protocol evolution without changes to already deployed URIs or URI schemes. The extension neither defines nor requires a particular resolution mechanism. Rather, it defines a simple mechanism for carrying URI resolver information, using HTTP as a transport, with support for both implicit and explicit resolution of URIs.

Table of Contents

   1. Introduction
      1.1 Terminology
      1.2 Purpose of WIRE
      1.3 Requirements
      1.4 Operational Overview
   2. Notational Conventions
   3. WIRE Extension Identifier
   4. Resolver Control Directives
      4.1 Fragment
      4.2 No-Forward
      4.3 Hint
   5. Resolver Location
   6. 350 Resolution Delegated
   7. Convergence Errors
   8. Security Considerations
   9. References
   10. Acknowledgments
   11. Resolution Summary
   12. Examples
      12.1 HTTP Resolution
      12.2 URN Resolution

1. Introduction

   Numerous generic URI resolver mechanisms have been designed and implemented over the years. Except for DNS, which can be seen as a partial URI resolver, none of them has become widely used on the Web, mainly because of the bootstrapping problem of deploying a global resolver mechanism into the Web. This proposal breaks the problem down by separating the resolver protocol from the resolver mechanism and specifying only the former. The URI resolver protocol defined in this document, known as WIRE, is a simple extension to HTTP that allows HTTP to be used as a URI resolver protocol while imposing as few restrictions as possible on the URI resolution mechanism.

1.1 Terminology

   In this document, we use the following terminology, which is not new to the Web community but which we define explicitly here for clarity:

   fragment or view
      A string at the end of a URI which identifies, within a Web document, a part or view to which one refers. The view, which is a function of the media type, is separated from the URI by a crosshatch ("#") character.

   renderer
      An application that takes an input stream and produces a representation as output. The representation is a function of the media type of the input data, the requested view, and possibly stylistic information provided by style sheets, etc.

   resolver
      An application that translates a URI into another URI or, if it is the authoritative resolver, directly into the requested resource.

   resolution
      The sequenced, possibly nested, set of operations performed by one or more resolvers, which may result in an entity being generated and returned to the requestor.
1.2 Purpose of WIRE

   Many existing URI schemes contain some information about how to access the resource, typically as a function of the URI access scheme, such as "http:", "news:", etc. Some of these schemes are location dependent and some are not; some are intended to be persistent and others are not, and so on. The problem is that these characteristics often change and evolve over time. The following are well-known characteristics of a URI that often change and evolve independently of each other:

   Access Mechanism (how can I get it?)
      A single resource may be available through different access protocols supported by the party serving the resource. These access protocols may or may not be compatible: HTTP/0.9, HTTP/1.0, and HTTP/1.1 are backwards compatible protocols, but HTTP running on top of SSL is not, even though it uses HTTP as one of its access protocols. The inability to express and negotiate protocol stacks forces the creation of new access schemes like "shttp:", causing serious deployment and evolvability problems.

   Persistence (for how long can I get it?)
      The persistence of a URI, that is, the time period during which the URI exists, often depends on the contents or may be obtained through contractual agreements based on delegation of the URI space. Note that this is different from the max-age cache control directive in HTTP/1.1, which states how long an HTTP response is cachable, not how long the URI itself persists. For example, a URI pointing to today's newspaper is not expected to change, but its contents are. Being able to express additional information about the persistence of a URI would greatly benefit search services, indexes, etc.

   Locality (where can I get it?)
      Global availability of resources may be obtained by local mirroring of parts of the URI space.
      Mirroring can either be supported directly by the URI scheme, as is the case for "news:", or be implemented in an ad hoc manner, as is often the case for HTTP-based mirroring. Two typical mechanisms for mirroring "http:" URIs are either to make the mirror explicit in the document ("This document can also be obtained from…"), or to use multihomed-host DNS tricks based on the AS number of the requestor; both have considerable drawbacks.

   In many ways, the examples above of frequently encountered problems in characterizing URIs, along with other problems introduced by content negotiation etc., boil down to a single consideration: under which circumstances can a URI be compared to itself as well as to other URIs, what are the semantics of the comparison, and how can this relationship be expressed without changing the URI itself?

   The list of parameters does not end here, of course. Other information that may be passed around using WIRE includes the interfaces the resource supports, privacy policies, pricing policies, content ratings, etc.

   The purpose of WIRE is to allow relationship information such as the examples above to be returned by a URI resolver mechanism instead of being encoded into the URI itself, often using ad hoc mechanisms. The advantage is that much more stable names can be introduced into the Web.
1.3 Requirements

   For one or more resolvers to be able to use WIRE for URI resolution, it is required:

      o That the protocol is independent of specific URI schemes;

      o That the protocol can transfer authoritative and non-authoritative information provided by resolvers;

      o That the protocol is independent of the format of the information passed around by the resolvers;

      o That it is possible to address that information within the URI space itself, so that this information may be accessed like any other resource;

      o That it is possible for resolvers to detect infinite resolution loops and to decide whether a resolution is likely to converge;

      o That the protocol can work without breaking existing applications on the Web.

   Note that it is not a requirement for the resolver protocol to provide a mechanism for finding the first resolver providing URI resolution services. Within HTTP, this is equivalent to the problem of a client finding a proxy for accessing the Web. Although this is a very important problem, it should be solved for all uses of HTTP proxies and not only for special cases.

1.4 Operational Overview

   Based on the user preferences, application capabilities, and view indicated in the request, a resource discovered through the resolution process responds with a representation of itself. This representation is returned to the client as a response entity and forms the input stream to the renderer. The resolution process can be described recursively as

      renderer ( resolution ( Request-URI, view ) )

   where the Request-URI, the view, and the resolver may change while iterating through the resolution process. Applying this model to HTTP itself makes it clear that the origin server of an HTTP URL is also the authoritative resolver for that URL, and that a server can be a resolver for arbitrary URIs.

   Currently, HTTP requests do not include the view identifier of the URI.
   However, because the view is a function of the media type of the response entity, and therefore may not be defined for every representation that the resource can return to the client, it is important that the view be passed along in the resolution process.

   WIRE is intended to be used as follows:

      o A WIRE-compliant client issues an HTTP request to its favored or local resolver server, with the URI to be resolved as the Request-URI. The client indicates that it understands WIRE using the HTTP Extension Framework [7];

      o If the resolver server is non-authoritative, it can either a) reply with information about where to find an alternative resolver, which may or may not be authoritative; b) proxy or gateway the request itself; or c) report an error if it cannot resolve the URI or detects that the resolution will not converge;

      o If the resolver is authoritative, it replies as it normally would.

   This resolution process can be used with any HTTP method, although the methods typically used are expected to be GET, HEAD, and OPTIONS. This extension makes no promise that the resolution converges towards a resource, nor that a URI is guaranteed to be understood by the resolver (see section 11 for more details).

2. Notational Conventions

   This specification uses the same notational conventions and basic parsing constructs as RFC 2068 [4]. In particular, the BNF constructs "token", "quoted-string", and "field-name" in this document are to be interpreted as described in RFC 2068 [4].

   For definitive information on URI syntax and semantics, see "Uniform Resource Identifiers (URI): Generic Syntax and Semantics" [10]. This specification adopts the definitions of "URI-reference" and "uric" from that specification.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [5].

3. WIRE Extension Identifier

   The URI used to identify this extension within the HTTP Extension Framework [7] is defined as

      WIRE-specification-URN = "urn:specs:wire"

   This identifier uniquely identifies the WIRE extension and MUST NOT be used for any other purpose.

4. Resolver Control Directives

   The Res-Ctrl general-header field is used to specify directives that MUST be obeyed by the recipient. The directives typically override default resolution behavior or contain additional information intended to guide the resolution in a certain direction. Resolver control directives are unidirectional in that the presence of a directive in a request does not imply that the same directive should be given in the response.

      resolver-control            = "Res-Ctrl" ":" 1#resolver-directive

      resolver-directive          = resolver-request-directive
                                  | resolver-response-directive

      resolver-request-directive  = "fragment" "=" *uric
                                  | "no-forward"
                                  | "hint" "=" <"> URI-reference <">
                                  | resolver-extension

      resolver-response-directive = resolver-extension

      resolver-extension          = token [ "=" ( token | quoted-string ) ]

   Relative URIs in resolver control directives are interpreted relative to the base URI of the message. As the parameters used in the resolution process may change (see section 1.4), the resolver control directives may also change.

4.1 Fragment

   The fragment resolver directive can be used to pass the fragment identifier of the Request-URI to the resolver. As mentioned in section 1.4, the fragment identifier may change as a result of the resolution and hence SHOULD be presented to the resolver.

4.2 No-Forward

   The no-forward directive implies that the resolver MUST NOT forward the request as either a proxy or a gateway. Note that the Max-Forwards HTTP request-header field MAY be used to limit the number of proxies or gateways that can forward the request to the next upstream server.
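   The Res-Ctrl grammar in section 4 is an ordinary HTTP comma-separated list, except that the hint directive carries a quoted URI that may itself contain commas. The following Python sketch shows one way a recipient might split a Res-Ctrl field value into directives. The function name parse_res_ctrl is hypothetical and not part of this specification; it keeps directive names as received, strips the quotes around quoted values, and, as a simplification, does not handle backslash-escaped quotes (quoted-pair) inside quoted strings.

```python
from typing import List, Optional, Tuple


def parse_res_ctrl(value: str) -> List[Tuple[str, Optional[str]]]:
    """Split a Res-Ctrl field value into (directive-name, value) pairs.

    Hypothetical helper: commas inside quoted strings (such as a hint URI)
    are not treated as list separators; quoted-pair escapes are not handled.
    """
    items: List[str] = []
    current: List[str] = []
    in_quotes = False
    for ch in value:
        if ch == '"':
            in_quotes = not in_quotes
        if ch == "," and not in_quotes:
            items.append("".join(current))
            current = []
        else:
            current.append(ch)
    items.append("".join(current))

    directives: List[Tuple[str, Optional[str]]] = []
    for item in items:
        item = item.strip()
        if not item:
            continue  # the HTTP "#" list rule permits empty elements
        name, sep, arg = item.partition("=")
        if not sep:
            directives.append((name.strip(), None))  # e.g. no-forward
            continue
        arg = arg.strip()
        if len(arg) >= 2 and arg[0] == '"' and arg[-1] == '"':
            arg = arg[1:-1]  # drop the quotes around a hint URI
        directives.append((name.strip(), arg))
    return directives


# The hint from section 12.2, combined with no-forward:
print(parse_res_ctrl('no-forward, hint="http://thebe.lcs.mit.edu/;scope=urn%3Acid%3A"'))
# prints [('no-forward', None), ('hint', 'http://thebe.lcs.mit.edu/;scope=urn%3Acid%3A')]
```

   Returning an ordered list of pairs rather than a dictionary preserves unknown resolver-extension directives, and any repeated directives, for the caller to interpret.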
4.3 Hint

   Certain types of URN resolvers require the use of state, in the form of hints, during resolution. This allows a URN resolver to skip delegation steps that would otherwise be necessary to obtain the hint. Resolution hints can be passed using the hint resolver directive and are encoded as URIs. Apart from the URI encoding rules, the syntax and semantics of resolution hints are specific to the resolution system indicated by the hint. For the purpose of caching and loop avoidance, two hints are lexically equivalent if they are octet-by-octet equal after applying URI normalization rules.

5. Resolver Location

   The Res-Loc response-header field MUST be included in 350 (Resolution Delegated) response messages (see section 6). The field value consists of at least one resolver, indicating where to find the next resolver and, optionally, the Request-URI to use when issuing the delegated request to that resolver.

      resolver-location = "Res-Loc" ":" 1#resolver
      resolver          = <"> URI-reference <"> [ ";" request ]
      request           = "request" "=" <"> URI-reference <">

   The request parameter indicates the Request-URI to use when issuing the delegated request. If no request parameter is present, the delegated request SHOULD use the same Request-URI as the original request. If the resolver is a relative URI, it is interpreted relative to the Request-URI.

   If more than one resolver is provided, the client SHOULD try to determine which of the resolvers is optimal, for example based on connectivity, trust, etc.

6. 350 Resolution Delegated

   The resolution process has been delegated to one or more alternative resolvers indicated in the response using the Res-Loc header field. A 350 response can be issued by any resolver receiving a request for resolving a URI.
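   A client receiving a 350 response has to extract the candidate resolvers from the Res-Loc field defined in section 5. The following Python sketch, using the hypothetical helper name parse_res_loc, returns each listed resolver together with the Request-URI to use in the delegated request: the request parameter when present, otherwise the original Request-URI. Following section 5, a relative resolver URI is resolved against the Request-URI, here with the standard urllib.parse.urljoin; the request parameter is returned as given, since section 5 does not say how a relative request value is to be interpreted. Backslash-escaped quotes and empty list elements are not handled.

```python
import re
from typing import List, Tuple
from urllib.parse import urljoin

# One resolver element: a quoted URI-reference, optionally followed by
# ;request="URI-reference", then a list separator or the end of the value.
_RESOLVER = re.compile(
    r'\s*"([^"]*)"\s*(?:;\s*request\s*=\s*"([^"]*)")?\s*(?:,|$)'
)


def parse_res_loc(value: str, request_uri: str) -> List[Tuple[str, str]]:
    """Return (resolver URI, delegated Request-URI) pairs from a Res-Loc value.

    Hypothetical helper; quoted-pair escapes and empty list elements are
    not handled.
    """
    pairs: List[Tuple[str, str]] = []
    pos = 0
    while pos < len(value):
        match = _RESOLVER.match(value, pos)
        if match is None:
            raise ValueError(f"malformed Res-Loc value at offset {pos}")
        resolver, request = match.group(1), match.group(2)
        # A relative resolver URI is interpreted relative to the Request-URI.
        resolver = urljoin(request_uri, resolver)
        # Without a request parameter, the delegated request reuses the
        # original Request-URI.
        pairs.append((resolver, request if request is not None else request_uri))
        pos = match.end()
    return pairs


# The 350 response from section 12.1:
print(parse_res_loc('"http://www.w3.org"; request="/Protocols/"',
                    "http://www.w3.org/Protocols/"))
# prints [('http://www.w3.org', '/Protocols/')]
```

   Choosing among several returned resolvers, for example by connectivity or trust as section 5 suggests, is left to the caller.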
   Note that the 350 status code differs from most other HTTP status codes in that it is issued by whichever resolver serves it, and that resolver may not be within the same trust domain as the origin server of the Request-URI. We expect that document signing will help establish the required trust between two parties, allowing them to communicate the appropriate resolver information independently of any a priori trust relationship (see section 8).

   A resolver issuing a 350 response MUST indicate that it was contacted by using the Via HTTP header field (see [4]). Clients receiving a 350 response MUST inspect the Via header field before issuing a delegated request, in order to ensure that the request will not create an infinite resolution loop.

   The response MAY include an entity containing additional metadata about the resource. If that entity is accessible from a separate location, this SHOULD be indicated using the Content-Location header field (see [4]).

   350 responses are cachable unless indicated otherwise.

   If the 350 status code is received in response to a request other than GET or HEAD, the user agent SHOULD NOT automatically redirect the request unless it can be confirmed by the user, since this might change the conditions under which the request was issued.

7. Convergence Errors

   If a server detects an unrecoverable error in the resolution process, it SHOULD return a 418 (Not Proxying) response to the client. This can, for example, be the case if it detects a resolution loop or does not know how to resolve the Request-URI.

   If a client detects a resolution loop by inspecting the Via header field, it SHOULD immediately stop the resolution process and report an error.

8. Security Considerations

   Possible conflicts between the HTTP trust model and the 350 response raise security concerns. In short, 350 responses without security extensions are responses from untrusted resolvers.
   Measures such as loop avoidance should be applied to detect and prevent denial-of-service attacks.

   Implementations of WIRE should follow the security restrictions of the environment in which the resolver operates. For example, resolvers on firewalls operating under both single-step and delegation proxy behaviors may be required to filter out resolution requests from outside the firewall that attempt to use an internal resource. Such requests are, in most cases, not allowed. However, it is essential that such proxying resolvers forward resolution requests from internal clients to the outside world, unless an organization intends to mirror resolution services for all URI namespaces internally.

   Many security concerns of URI resolution, such as the authenticity of resolution information, are problems that require further study. These considerations are beyond the scope of this document.

9. References

   [1] T. Berners-Lee, "Universal Resource Identifiers in WWW. A Unifying Syntax for the Expression of Names and Addresses of Objects on the Network as used in the World-Wide Web", RFC 1630, CERN, June 1994.

   [2] T. Berners-Lee, L. Masinter, M. McCahill, "Uniform Resource Locators (URL)", RFC 1738, CERN, Xerox PARC, University of Minnesota, December 1994.

   [3] T. Berners-Lee, R. Fielding, H. Frystyk, "Hypertext Transfer Protocol -- HTTP/1.0", RFC 1945, W3C/MIT, UC Irvine, W3C/MIT, May 1996.

   [4] R. Fielding, J. Gettys, J. C. Mogul, H. Frystyk, T. Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2068, U.C. Irvine, DEC W3C/MIT, DEC, W3C/MIT, W3C/MIT, January 1997.

   [5] S. Bradner, "Key words for use in RFCs to Indicate Requirement Levels", RFC 2119, Harvard University, March 1997.

   [6] R. Daniel, M. Mealling, "Resolution of Uniform Resource Identifiers using the Domain Name System", RFC 2168, June 1997.

   [7] H. F. Nielsen, P. Leach, S. Lawrence, "HTTP Extension Framework", draft-http-ext-mandatory, March 23, 1998. Work in progress.

   [8] R. Daniel, "A Trivial Convention for using HTTP in URN Resolution", RFC 2169, June 1997.

   [9] K. Sollins, "Architectural Principles of Uniform Resource Name Resolution", RFC 2276, MIT, September 1997.

   [10] T. Berners-Lee, R. Fielding, L. Masinter, "Uniform Resource Identifiers (URI): Generic Syntax and Semantics", draft-fielding-uri-syntax, March 1998. Work in progress.

10. Acknowledgments

   The motivation for this work stemmed from several directions. John Mallery's experience implementing the PDI namespace for URNs indicated that the THTTP [8] specification did not adequately cover error messages and redirection. Discussions with John Mallery and Henrik Frystyk Nielsen led to the initial formulation of this protocol specification, in an effort to rework THTTP into an official HTTP extension. Henrik Frystyk Nielsen also provided invaluable ideas and feedback on modifying the original design to fit the generic ideas of URI resolution. Karen Sollins and Dorothy Curtis also provided many insightful ideas and feedback on the general resolution architecture and on this document.

11. Resolution Summary

   This section summarizes the operations performed by a URI resolver. The summary is intended as a guide and index to the text, but is necessarily cryptic and incomplete.

   When a resolver receives a resolution request for a URI, it SHOULD attempt to resolve the URI, making use of any resolver control information provided by the client (section 4). If the resolver is authoritative, it replies as it normally would. If the resolver cannot complete the resolution itself, it can do one of three things:

      1. Respond with a 350 (Resolution Delegated) status code indicating that the resolution has been delegated to alternate resolvers (section 6). Redirection and delegation information can be conveyed via the Res-Loc header field (section 5).
         A 350 response MUST include a Via header field so that the client can detect any resolution loops;

      2. Respond with a 418 (Not Proxying) status code indicating that the resolution could not be completed and the request could not be fulfilled. (This code should be in HTTP/1.1!);

      3. Proxy or gateway the request to an upstream resolver and return the proxied response to the client. This mechanism can also be used to interoperate with non-WIRE-compliant applications that may still be using a resolver mechanism.

12. Examples

12.1 HTTP Resolution

   A WIRE client issues a request to "http://metadata.org" in order to look for metainformation about the resource "http://www.w3.org/Protocols/". Using the no-forward directive, it indicates that it does not want the resolver to proxy the request:

      OPTIONS http://www.w3.org/Protocols/ HTTP/1.1
      Host: metadata.org
      Opt: "urn:specs:wire"
      Res-Ctrl: no-forward

   The resolver returns information about where to find the authoritative resolver for this URI, the Request-URI to use, and the metadata that the resolver has about the resource:

      HTTP/1.1 350 Resolution Delegated
      Res-Loc: "http://www.w3.org"; request="/Protocols/"
      Content-Type: application/rdf
      …

12.2 URN Resolution

   A client wants to resolve "urn:cid:9802032044@thebe.lcs.mit.edu".
   It sends a resolution request to http://urn.org:

      GET urn:cid:9802032044@thebe.lcs.mit.edu HTTP/1.1
      Host: urn.org
      Opt: "urn:specs:wire"
      …

   The resolver at http://urn.org determines that the URI can be resolved using another resolver, and sends back a 350 response:

      HTTP/1.1 350 Resolution Delegated
      Res-Loc: "http://thebe.lcs.mit.edu/;scope=urn%3Acid%3A"
      Content-Type: application/rdf
      …

   To continue the resolution process, the client makes another resolution request, this time to "http://thebe.lcs.mit.edu":

      GET urn:cid:9802032044@thebe.lcs.mit.edu HTTP/1.1
      Host: thebe.lcs.mit.edu
      Opt: "urn:specs:wire"
      Res-Ctrl: hint="http://thebe.lcs.mit.edu/;scope=urn%3Acid%3A"

   The resolver at thebe.lcs.mit.edu is the authoritative resolver for the URI. It returns the requested entity:

      HTTP/1.1 200 OK
      …
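   The client side of sections 6, 7, and 11 can be sketched as a loop that follows 350 responses until a resolver answers with a final status. The following Python sketch assumes a GET or HEAD request (section 6 discourages automatic delegation for other methods) and uses hypothetical names throughout: send stands in for a real HTTP client, and demo_transport replays the two exchanges of section 12.2, with a Via field added as section 6 requires. It always takes the first resolver listed in Res-Loc, does not attach the hint directive shown above, and uses a deliberately conservative loop check: delegation to any host already contacted, directly or according to a Via field, is treated as a loop. Via parsing is simplified to the received-by host of each element.

```python
import re
from typing import Callable, Dict, List, Set, Tuple
from urllib.parse import urljoin, urlsplit

# A response is modelled as (status code, header dict). The transport is a
# hypothetical callable standing in for a real HTTP client: it sends a WIRE
# request for request_uri to the given resolver and returns the response.
Response = Tuple[int, Dict[str, str]]
Transport = Callable[[str, str], Response]

# First element of a Res-Loc value: quoted resolver URI plus optional request.
_FIRST_RESOLVER = re.compile(r'\s*"([^"]*)"(?:\s*;\s*request\s*=\s*"([^"]*)")?')


class ResolutionError(Exception):
    """The resolution failed, looped, or did not converge."""


def via_hosts(via: str) -> List[str]:
    """Return the received-by host of each Via element, e.g. '1.1 urn.org'."""
    hosts = []
    for element in via.split(","):
        fields = element.split()
        if len(fields) >= 2:
            hosts.append(fields[1].split(":")[0].lower())  # drop any port
    return hosts


def resolve(send: Transport, resolver: str, request_uri: str,
            max_steps: int = 10) -> Response:
    contacted: Set[str] = set()
    for _ in range(max_steps):
        contacted.add(urlsplit(resolver).hostname or resolver)
        status, headers = send(resolver, request_uri)
        if status == 418:  # Not Proxying: the resolution cannot complete
            raise ResolutionError(f"{resolver} could not resolve {request_uri}")
        if status != 350:
            return status, headers  # e.g. the authoritative resolver's answer
        contacted.update(via_hosts(headers.get("Via", "")))
        match = _FIRST_RESOLVER.match(headers.get("Res-Loc", ""))
        if match is None:
            raise ResolutionError("350 response without a usable Res-Loc field")
        next_resolver = urljoin(request_uri, match.group(1))
        # Conservative loop check: never delegate to a host already involved.
        if (urlsplit(next_resolver).hostname or next_resolver) in contacted:
            raise ResolutionError(f"resolution loop detected at {next_resolver}")
        resolver = next_resolver
        if match.group(2) is not None:
            request_uri = match.group(2)
    raise ResolutionError("resolution did not converge")


def demo_transport(resolver: str, request_uri: str) -> Response:
    """Hypothetical replay of section 12.2 (Via added as section 6 requires)."""
    host = urlsplit(resolver).hostname
    if host == "urn.org":
        return 350, {"Via": "1.1 urn.org",
                     "Res-Loc": '"http://thebe.lcs.mit.edu/;scope=urn%3Acid%3A"'}
    if host == "thebe.lcs.mit.edu":
        return 200, {}
    return 418, {}


print(resolve(demo_transport, "http://urn.org",
              "urn:cid:9802032044@thebe.lcs.mit.edu"))
# prints (200, {})
```

   The host-based check may reject a legitimate re-delegation to the same host with a different Request-URI; a less conservative client could compare resolver and Request-URI pairs instead, at the cost of relying less on the Via information that section 6 mandates.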