URIs, Addressability, and the use of HTTP GET

TAG Finding 10 June 2002 (Revised 22 September 2002)

This version: http://www.w3.org/2001/tag/doc/get7
Latest version: http://www.w3.org/2001/tag/doc/whenToUseGet
Editor: Dan Connolly

Abstract

An important principle of Web architecture is that all important resources be identifiable by URI. This finding discusses the importance of using GET for safe operations on the Web, so that those resources may be identified by a URI. The finding also discusses some practical limitations to this general principle.

Status of this document

Note: This document has been superseded by the 22 September 2003 version of this finding.

This document has been produced by the W3C Technical Architecture Group (TAG). This finding addresses TAG issue whenToUseGet-7.

This finding was accepted by the TAG at its 10 June 2002 teleconference. The TAG originally reached consensus on this finding at its 20 May 2002 teleconference. At its 16 Dec 2002 teleconference, the TAG agreed to add a publication date to this document, consistent with the TAG's expectation that findings no longer be modified in place.

Additional TAG findings, both approved and in draft state, may also be available. The TAG expects to incorporate this and other findings into a Web Architecture Document that will be published according to the process of the W3C Recommendation Track.

The terms MUST, SHOULD, and SHOULD NOT are used in this document in accordance with RFC 2119 [RFC2119].

Please send comments on this finding to the publicly archived TAG mailing list www-tag@w3.org (archive).

Relevant Principles of Web Architecture
Use GET for addressability
Dereferencing URIs is safe
Obligations of confidentiality, payment, and licensing terms
Limitations
Myths, Bugs, and Ephemeral Limitations
Ongoing work on SOAP
References
Acknowledgments

1. Relevant Principles of Web Architecture

All important resources SHOULD be identifiable by URI.
Dereferencing URIs is safe; i.e. agents do not incur obligations by following links.

1.1 Relevant good practice

Safe operations (read, query, view, ask, lookup, etc.) on HTTP resources SHOULD be implemented using GET because that allows the result documents to be identified by URI, while using POST does not. It is also useful for a server to use a safe operation, since clients can take advantage of the safety guarantees made by the protocol.
If you use GET for operations with side-effects, you make your system insecure.

2. Use GET for addressability

It is possible to share information using Web technologies without giving that information a URI, but it's not optimal. For example, a product catalog can be built using an HTML form where the client provides a product number to the server in an HTTP POST request, and information about the product comes back in the response. But that design does not allow the client to make a link to the information about the product, bookmark it, or use it with any of the many Web technologies (e.g., XSLT's document() function, RDF assertions, XLink, etc.) that depend on information being URI-addressable.
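The difference can be made concrete with a minimal sketch in Python. It assumes a hypothetical catalog endpoint (http://example.org/catalog) and a hypothetical form field named product; neither comes from a real service. With a GET form, the product number ends up in the URI itself, so the result can be linked, bookmarked, or handed to any URI-based tool.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical catalog endpoint and field name, chosen only for illustration.
CATALOG_BASE = "http://example.org/catalog"


def product_uri(product_number: str) -> str:
    """Build the URI that a GET form submission would produce for one product."""
    return f"{CATALOG_BASE}?{urlencode({'product': product_number})}"


def product_from_uri(uri: str) -> str:
    """Recover the product number from such a URI, as a server might."""
    return parse_qs(urlsplit(uri).query)["product"][0]


uri = product_uri("A-1042")
print(uri)                    # http://example.org/catalog?product=A-1042
print(product_from_uri(uri))  # A-1042
```

With a POST form, the same product number would travel in the request body, and no such URI would exist for the result.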

HTML forms that use the GET method provide a URI for each combination of inputs. Section 17.13.1 of the HTML 4.01 Recommendation [HTML401] states (and the text goes back to HTML 2.0):

The "get" method should be used when the form is idempotent (i.e., causes no side-effects). Many database searches have no visible side-effects and make ideal applications for the "get" method.

Unfortunately, the term idempotent is misused there, and the term side-effects is stretched from its use in the design of programming languages. Section 9.1.1 of the HTTP 1.1 specification [RFC2616] is more precise on the matter:

Implementors should be aware that the software represents the user in their interactions over the Internet, and should be careful to allow the user to be aware of any actions they might take which may have an unexpected significance to themselves or others.

In particular, the convention has been established that the GET and HEAD methods SHOULD NOT have the significance of taking an action other than retrieval. These methods ought to be considered "safe". This allows user agents to represent other methods, such as POST, PUT and DELETE, in a special way, so that the user is made aware of the fact that a possibly unsafe action is being requested.

Naturally, it is not possible to ensure that the server does not generate side-effects as a result of performing a GET request; in fact, some dynamic resources consider that a feature. The important distinction here is that the user did not request the side-effects, so therefore cannot be held accountable for them.

2.1. Security considerations

If you use GET for operations with side-effects, you make your system insecure. For example, a malicious Web page publisher outside a firewall might put a URI in a page that, when dereferenced unwittingly by someone inside the firewall, could activate a function on another system within the firewall.
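One way a server could guard against this is sketched below in Python. The action names and the registry are hypothetical, and the check only ensures that merely following a link cannot trigger an action with side-effects; it is not presented as a complete defense against cross-site attacks.

```python
# Hypothetical registry: each action records whether it is safe (retrieval only).
ACTIONS = {
    "view_status": {"safe": True},
    "shutdown": {"safe": False},
}

# Methods that HTTP treats as safe.
SAFE_METHODS = {"GET", "HEAD"}


def allowed(method: str, action: str) -> bool:
    """Return True if the action may run in response to this HTTP method."""
    if ACTIONS[action]["safe"]:
        return True
    # Actions with side-effects are refused when requested with a safe method,
    # so dereferencing a URI planted in a page cannot activate them.
    return method.upper() not in SAFE_METHODS


print(allowed("GET", "view_status"))  # True
print(allowed("GET", "shutdown"))     # False
print(allowed("POST", "shutdown"))    # True
```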

3. Dereferencing URIs is safe

To elaborate on the principle that following links is safe, consider the following two designs for mailing list subscription confirmation.

Design 1:

1. The user sends a subscribe message to an administrative mailbox (mylist-request@example.org).
2. The list processing software sends an email response to the user, requesting that the user confirm the subscription request, and including a link to a confirmation page.
3. The user follows this link to the confirmation page, and finds a "[Confirm] your subscription" form, with method="POST".
4. The user activates the [Confirm] form control.
5. The list processing software confirms the subscription.

Design 2 (incorrect):

1. Same as step 1 of Design 1.
2. Same as step 2 of Design 1.
3. The user follows the link to the confirmation page and is informed "your subscription is confirmed".

The latter design performs an unsafe operation (list subscription) in response to a request with a safe method (following the link from the mail message with GET). If the user's mail agent pre-fetched pages to speed up browsing, the subscription would be confirmed without the knowledge and consent of the user. The HTTP specification makes it clear that the fault lies with the server in this case; the user's mail agent is free to follow links without incurring obligations.
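Design 1 can be sketched as a single request handler in Python. The function name, the /confirm path, and the token parameter are assumptions made for illustration; the point is only that GET returns the form and changes nothing, while POST performs the subscription.

```python
from html import escape
from urllib.parse import quote


def handle_confirmation(method: str, token: str, confirmed: set) -> tuple:
    """Design 1: GET (or HEAD) only displays the form; POST confirms."""
    if method in ("GET", "HEAD"):
        # Safe request: build the confirmation form, change no state.
        action = escape(f"/confirm?token={quote(token, safe='')}", quote=True)
        form = (
            f'<form method="POST" action="{action}">'
            '<button type="submit">Confirm</button> your subscription</form>'
        )
        return 200, form
    if method == "POST":
        # Unsafe request, explicitly made by the user: record the subscription.
        confirmed.add(token)
        return 200, "Your subscription is confirmed."
    return 405, "Method not allowed."


confirmed = set()
# A pre-fetching mail agent follows the link with GET: nothing is confirmed.
handle_confirmation("GET", "abc123", confirmed)
print(confirmed)  # set()
# The user activates the form control, which sends POST.
handle_confirmation("POST", "abc123", confirmed)
print(confirmed)  # {'abc123'}
```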

4. Obligations of confidentiality, payment, and licensing terms

This is not to say that there are never any obligations related to following links; only that the obligations must be accepted some other way than requesting to follow a link.

Obligations of confidentiality can be established in a straightforward manner as follows:

1. The client requests access to the materials.
2. The server declines, with an "authorization required" notice, and a link to an account application form.
3. The client follows the link to the form, and applies for an account, agreeing to the terms and conditions in a POST request (or by fax or postal mail, for that matter).
4. The server provides credentials in response.
5. The client re-requests the materials, providing credentials.
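The exchange above can be sketched in Python as two server-side functions. The in-memory account store, the function names, and the /account-form path are hypothetical; a real deployment would use HTTP authentication and persistent storage. The obligation is accepted in the POST, never by the GET.

```python
import secrets

# Hypothetical in-memory store of credentials issued after terms were accepted.
ACCOUNTS = set()


def get_materials(credentials=None) -> tuple:
    """GET on the materials: safe, and incurs no obligation for the client."""
    if credentials not in ACCOUNTS:
        # Decline with an "authorization required" status and point to the form.
        return 401, "Authorization required; apply at /account-form"
    return 200, "confidential materials"


def post_application(accepts_terms: bool) -> tuple:
    """POST of the account form: this is where the terms are accepted."""
    if not accepts_terms:
        return 400, None
    credentials = secrets.token_hex(8)
    ACCOUNTS.add(credentials)
    return 200, credentials


print(get_materials()[0])                 # 401
status, creds = post_application(True)
print(get_materials(creds)[0])            # 200
```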

Web sites that say "by following the link to ABC, you agree to the following terms and conditions" do not account for the fact that anyone (in particular, a search service) can make another link to ABC, and anyone who follows this other link to ABC may never have seen the terms and conditions.

5. Limitations

Web application design should be informed by the above principles, but also by the relevant limitations.

The W3C HTML validation service provides an example. The norm is that validation requests are done by reference: the form uses GET, which gives the results a URI for bookmarks, links, etc. But the service also allows clients to upload a document for validation. In that case, the form uses POST, since:

the document to be validated might be confidential; any link to the results of validating it would divulge its contents.
a URI that encoded the entire document would be at least as large as the document, and there's little or no use in linking to it, since the results will always be the same.
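The size argument can be checked with a short Python sketch. The validator endpoint and the parameter names uri and fragment are placeholders, not the interface of the actual W3C service.

```python
from urllib.parse import urlencode

# Hypothetical validator endpoint; the parameter names are assumptions too.
VALIDATOR = "http://validator.example.org/check"


def by_reference(document_uri: str) -> str:
    """Validation by reference: a short, linkable URI for the results."""
    return f"{VALIDATOR}?{urlencode({'uri': document_uri})}"


def by_value(document_text: str) -> str:
    """What a URI would look like if it had to carry the whole document."""
    return f"{VALIDATOR}?{urlencode({'fragment': document_text})}"


doc = "<!DOCTYPE html><title>t</title><p>Hello, world</p>" * 100
print(by_reference("http://example.org/page.html"))
# http://validator.example.org/check?uri=http%3A%2F%2Fexample.org%2Fpage.html
print(len(by_value(doc)) >= len(doc))  # True: escaping never shrinks the text
```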

Whether or not GET with HTTP is used for the initial access, supplying a URI for subsequent access to the same information, e.g., using Content-Location, is useful.

The case of large parameters to a safe operation is not directly addressed by HTTP as it is presently deployed. A QUERY or "safe POST" or "GET with BODY" method has been discussed (e.g., at the December 1996 IETF meeting) but no consensus has emerged.

WebDAV [RFC2518] uses a different HTTP method, PROPFIND (section 8.1 PROPFIND), for querying properties of resources; unfortunately, this provides no URI for the results of these queries.

5.1 Internationalization

Designers of HTML forms that accept non-ASCII characters have been challenged by some implementation limitations and gaps in specifications. Implementation limitations are length-related. Section 17.13.4 of HTML 4.01 [HTML401] on multipart/form-data says:

The content type "application/x-www-form-urlencoded" is inefficient for sending large quantities of binary data or text containing non-ASCII characters.

This inefficiency is due to the octet-to-%hh escape conversion, combined with the fact that many characters need more than one octet to be encoded. But while somewhat inefficient, this is not a real obstacle to using GET for non-ASCII characters.
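The size of this overhead can be seen in a small Python sketch. It assumes UTF-8 as the character encoding and uses an arbitrary three-character sample string.

```python
from urllib.parse import quote

# Three CJK characters, written as escapes so the source stays ASCII.
text = "\u65e5\u672c\u8a9e"

# Each character becomes three UTF-8 octets, and each octet becomes "%hh".
encoded = quote(text, safe="", encoding="utf-8")
print(encoded)                   # %E6%97%A5%E6%9C%AC%E8%AA%9E
print(len(text))                 # 3 characters
print(len(text.encode("utf-8"))) # 9 octets
print(len(encoded))              # 27 characters in the URI
```

Three characters grow to 27 in the URI: a ninefold expansion, but still far from any practical length limit for typical form input.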

A more serious problem is that the mapping between characters and octets is not clearly specified beyond US-ASCII; refer to section 2.1 of the URI specification [RFC2396]. For query parts (parts after the '?') resulting from filling in an HTML form, the default is to use the character encoding of the form. The definition of the accept-charset attribute on the form element in HTML 4.01 [HTML401] says:

The default value for this attribute is the reserved string "UNKNOWN". User agents may interpret this value as the character encoding that was used to transmit the document containing this FORM element.

The general direction to address this limitation is to converge to using UTF-8 for the mapping between characters and octets. The use of UTF-8 is already defined in various specifications, and we expect it to be adopted in future specifications and further deployed in due course. For instance, we expect XForms to specify that the encoding to be used in query parts is always UTF-8.
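The ambiguity, and why converging on UTF-8 removes it, can be illustrated with a short Python sketch. The sample word and the choice of ISO-8859-1 as the competing encoding are assumptions for illustration.

```python
from urllib.parse import quote, unquote

word = "Z\u00fcrich"  # contains one non-ASCII character, u with diaeresis

# The same character maps to different octets under different encodings.
as_latin1 = quote(word, encoding="iso-8859-1")
as_utf8 = quote(word, encoding="utf-8")
print(as_latin1)  # Z%FCrich
print(as_utf8)    # Z%C3%BCrich

# A server that guesses the wrong encoding misreads the query value.
misread = unquote(as_utf8, encoding="iso-8859-1")
print(misread == word)  # False

# When both sides agree on UTF-8, the value round-trips.
print(unquote(as_utf8, encoding="utf-8") == word)  # True
```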

6. Myths, Bugs, and Ephemeral Limitations

While Web application design must take into account the limitations of technology that is widely deployed at present, it should not treat these as architectural invariants. Some limitations are likely to fade away as bugs are fixed and the scope of interoperable specifications expands.

Myth: Search services will not index anything with a "?" in the URI. This was a heuristic to avoid infinite loops in some search service crawlers, but it was not an architectural constraint, and modern search services use more sophisticated heuristics to avoid loops.

Myth: URIs cannot be longer than 256 characters. This was a limitation in some server implementations, and while servers continue to have limitations to prevent denial-of-service attacks, they are generally at least 4000 characters, and they evolve as the legitimate uses of application developers evolve.

7. Ongoing work on SOAP

The use of HTTP for typical safe remote operations is not addressed by SOAP specifications as of this writing. For instance, from section 8.4.1.1.1 Requesting State of SOAP Adjuncts [SOAPADJUNCTS]:

HTTP Method: POST (the use of other HTTP methods is currently undefined in this binding).

Initial investigations into requirements and a proposed solution (SOAP HTTP GET Binding Version 0.1, Orchard, May 2002) suggest this limitation is straightforward to address; meanwhile, "the oft-quoted stock quote example" (Overview section) is misleading, since it suggests that HTTP POST is appropriate for this safe operation.

WSDL 1.1 [WSDL] provides a binding to HTTP GET, which makes it possible to respect the principle of using GET for safe operations. However, to represent safety in a more straightforward manner, it should be a property of operations themselves, not just a feature of bindings.

8. References

[HTML401]: "HTML 4.01 Specification", D. Raggett, A. Le Hors, I. Jacobs, 24 Dec 1999. This W3C Recommendation is available at: http://www.w3.org/TR/1999/REC-html401-19991224
[RFC2119]: "RFC2119: Key words for use in RFCs to Indicate Requirement Levels", S. Bradner, March 1997. Available at: http://www.ietf.org/rfc/rfc2119.txt
[RFC2396]: "RFC2396: Uniform Resource Identifiers (URI): Generic Syntax", T. Berners-Lee, R. Fielding, L. Masinter, August 1998. Available at: http://www.ietf.org/rfc/rfc2396.txt
[RFC2518]: "RFC2518: HTTP Extensions for Distributed Authoring -- WEBDAV", Y. Goland, E. Whitehead, A. Faizi, S. Carter, D. Jensen, February 1999. Available at: http://www.ietf.org/rfc/rfc2518.txt
[RFC2616]: "RFC2616: Hypertext Transfer Protocol -- HTTP/1.1", R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, T. Berners-Lee, June 1999. Available at: http://www.ietf.org/rfc/rfc2616.txt
[SOAPADJUNCTS]: "SOAP Version 1.2 Part 2: Adjuncts", M. Gudgin, M. Hadley, J-J. Moreau, H. Frystyk-Nielsen, 17 December 2001. This W3C Working Draft is available at: http://www.w3.org/TR/2001/WD-soap12-part2-20011217/
[WSDL]: "Web Services Description Language (WSDL) 1.1", E. Christensen, F. Curbera, G. Meredith, S. Weerawarana, 15 March 2001. This W3C Note is available at: http://www.w3.org/TR/2001/NOTE-wsdl-20010315. Please note that this document is the result of a W3C Member Submission and does not represent consensus within W3C.

9. Acknowledgments

Thanks to David Orchard, Larry Masinter, Paul Prescod, Roy Fielding, Martin Dürst, and others for their feedback in response to the 15 April 2002 call for review.

Last modified: 2003/09/22 20:28:18 by ijacobs. Revision: 1.38