DRAFT Findings on when to use GET to make resources addressable (whenToUseGet-7)

OK, I've taken a stab at integrating feedback
received since 15 Feb:

  DRAFT Findings on when to use GET to make resources addressable

  DRAFT by Dan Connolly, for the TAG
  $Revision: 1.11 $ of $Date: 2002/05/01 21:30:59 $
  by $Author: connolly $
  http://www.w3.org/2001/tag/doc/get7


plaintext copy follows, for convenience...


   [1]W3C [2]TAG [3]findings

      [1] http://www.w3.org/
      [2] http://www.w3.org/2001/tag/
      [3] http://www.w3.org/2001/tag/findings

        DRAFT Findings on when to use GET to make resources addressable

   ref. issue [4]whenToUseGet-7

      [4] http://www.w3.org/2001/tag/ilist#whenToUseGet-7


    DRAFT by Dan Connolly, for the TAG
    $Revision: 1.11 $ of $Date: 2002/05/01 21:30:59 $ by $Author:
    connolly $

   Two principles are central to the design of Web sites and
   applications:
     * All important resources should be identifiable by URI.
     * Following references in the web is safe; i.e. agents do not incur
       obligations by following links.

   It's possible to share information using Web technologies without
   giving the information a URI, but it's not optimal. For example, a
   product catalog can be built using an HTML form where the client
   provides a product number to the server in an HTTP POST request, and
   information about the product comes back in the response. But that
   design does not allow the client to make a link to the information
   about the product, bookmark it, or use it with any of the many Web
   technologies (e.g., XSLT's document() function, RDF assertions,
   XLink, ...) that depend on information being URI addressable.
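
   As a rough illustration of the alternative, the sketch below builds
   the URI that a GET form submission would produce for one product,
   and recovers the form input from it again. The host
   catalog.example.org, the path /product, and the field name "number"
   are made up for this example.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical catalog endpoint; not taken from any real service.
CATALOG = "http://catalog.example.org/product"

def product_uri(number):
    """Return the URI a GET form submission would yield for one product."""
    return CATALOG + "?" + urlencode({"number": number})

def product_number(uri):
    """Recover the form input from such a URI, as a server would."""
    return parse_qs(urlsplit(uri).query)["number"][0]

uri = product_uri("A-1234")
print(uri)                  # a string that can be linked to or bookmarked
print(product_number(uri))  # the original form input
```

   Because the product number travels in the URI rather than in a
   request body, the same string can be stored, linked, and passed to
   other tools.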

   HTML forms that use the GET method provide a URI for each combination
   of inputs. The relevant section of the HTML specification is:

     The "get" method should be used when the form is idempotent (i.e.,
     causes no side-effects). Many database searches have no visible
     side-effects and make ideal applications for the "get" method.

     [5]17.13.1 Form submission method of HTML 4.01 (text has been in
     HTML spec back to [6]HTML 2.0)

      [5] http://www.w3.org/TR/1999/REC-html401-19991224/interact/forms.html#h-17.13.1
      [6] http://www.w3.org/MarkUp/html-spec/html-spec_8.html#SEC8.2.2

   Unfortunately, the term [7]idempotent is misused there, and the term
   [8]side-effects is stretched from its use in the design of programming
   languages. The HTTP 1.1 specification is more precise on the matter:

      [7] http://wombat.doc.ic.ac.uk/foldoc/foldoc.cgi?idempotent
      [8] http://wombat.doc.ic.ac.uk/foldoc/foldoc.cgi?query=side+effect

     Implementors should be aware that the software represents the user
     in their interactions over the Internet, and should be careful to
     allow the user to be aware of any actions they might take which may
     have an unexpected significance to themselves or others.

     In particular, the convention has been established that the GET and
     HEAD methods SHOULD NOT have the significance of taking an action
     other than retrieval. These methods ought to be considered "safe".
     This allows user agents to represent other methods, such as POST,
     PUT and DELETE, in a special way, so that the user is made aware of
     the fact that a possibly unsafe action is being requested.

     Naturally, it is not possible to ensure that the server does not
     generate side-effects as a result of performing a GET request; in
     fact, some dynamic resources consider that a feature. The important
     distinction here is that the user did not request the side-effects,
     so therefore cannot be held accountable for them.


    [9]9.1.1 Safe Methods, HTTP 1.1

      [9] http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html#sec9.1.1
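
   The difference between "safe" and "idempotent" (HTTP 1.1 treats them
   separately, in sections 9.1.1 and 9.1.2) can be sketched with a toy
   in-memory store. The store, its paths, and the function names below
   are hypothetical; they only mimic the behaviour of the methods.

```python
# A toy resource store; paths and contents are made up for illustration.
store = {"/doc/1": "hello"}

def get(path):
    """Safe: retrieval only, the store is never modified."""
    return store.get(path)

def delete(path):
    """Idempotent but not safe: repeating it changes nothing further."""
    store.pop(path, None)

def post_append(path, text):
    """Neither safe nor idempotent: each repetition changes state again."""
    store[path] = store.get(path, "") + text

before = dict(store)
get("/doc/1")
get("/doc/1")
assert store == before        # retrieval left the state alone

post_append("/doc/2", "x")
post_append("/doc/2", "x")    # a second append has a further effect

delete("/doc/1")
after_one = dict(store)
delete("/doc/1")
assert store == after_one     # a second delete has no further effect
```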

   To elaborate on the principle of following links being safe, consider
   the following two designs for mailing list subscription confirmation:

   In the first case:
    1. The user sends a subscribe message to an administrative mailbox
       (mylist-request@example.org).
    2. The list processing software requests confirmation by email,
       including a link to a confirmation page
    3. The user visits the confirmation page, and finds a "[Confirm] your
       subscription" form, with method="POST".
    4. The user activates the [Confirm] form control.
    5. The list processing software confirms the subscription.

   In the second case:
    1. as above
    2. as above
    3. The user visits the confirmation page and sees "your subscription
       is confirmed". The list processing software confirms the
       subscription.

   The latter design performed an unsafe operation (list subscription)
   in response to a request with a safe method (following the link from
   the mail message with GET). If the user's mail agent pre-fetched
   pages to speed up browsing, the subscription would be confirmed
   without the knowledge and consent of the user. The HTTP specification
   makes it clear that the fault is with the server in this case; the
   user's mail agent is free to follow links without incurring
   obligations.
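
   The first design can be sketched as a toy request handler in which
   GET only describes the pending action and POST performs it. This is
   a minimal sketch, not real list software: the handle() function, the
   subscribers set, and the use of the address as the only parameter
   are assumptions made for illustration.

```python
# Hypothetical state of the list processing software.
subscribers = set()

def handle(method, address):
    """Toy handler for a confirmation page; returns (status, body)."""
    if method in ("GET", "HEAD"):
        # Safe: show the form, but do not change any state.
        return 200, "Confirm your subscription for %s? [Confirm]" % address
    if method == "POST":
        # Unsafe action, taken only on an explicit request from the user.
        subscribers.add(address)
        return 200, "Your subscription is confirmed."
    return 405, "Method not allowed"

# A pre-fetching mail agent follows the link: nothing is subscribed.
handle("GET", "user@example.org")
print("user@example.org" in subscribers)

# The user activates the [Confirm] control: the subscription happens.
handle("POST", "user@example.org")
print("user@example.org" in subscribers)
```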

Obligations of confidentiality, payment, and licensing terms

   This is not to say that there are never any obligations related to
   following links; only that the obligations must be accepted some
   other way than by requesting to follow a link.

   For confidential materials, a straightforward design is:
    1. The client requests access to the materials
    2. The server declines, with an "authorization required" notice, and
       a link to an account application form
    3. The client follows the link to the form, and applies for an
       account, agreeing to the terms and conditions in a POST request
       (or by fax or postal mail, for that matter)
    4. The server provides credentials in response
    5. The client re-requests the materials, providing credentials
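
   The five steps above can be sketched as follows. The function names,
   the /apply path, and the in-memory accounts table are hypothetical;
   a real deployment would rely on an HTTP authentication scheme rather
   than a bare token argument.

```python
import secrets

# Hypothetical account table: credential -> account holder.
accounts = {}

def get_materials(credential=None):
    """Steps 1, 2 and 5: a GET that incurs no obligation by itself."""
    if credential not in accounts:
        return 401, "Authorization required; apply at /apply"
    return 200, "confidential materials"

def post_application(name, accepts_terms):
    """Steps 3 and 4: the terms are accepted in an explicit POST."""
    if not accepts_terms:
        return 400, None
    credential = secrets.token_hex(8)
    accounts[credential] = name
    return 200, credential

status, _ = get_materials()                        # declined
_, credential = post_application("Alice", True)    # obligation accepted
status, body = get_materials(credential)           # access granted
print(status, body)
```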

   Web sites that say "by following the link to ABC, you agree to XYZ
   terms and conditions" do not account for the fact that anyone (in
   particular, a search service) can make another link to ABC, and
   anyone who follows this other link to ABC may never have seen the
   terms and conditions.

Limitations

   Web application design should be informed not only by the principles
   above, but also by the relevant limitations.

   The [10]W3C HTML validation service provides an example: the norm is
   that validation requests are done by reference; the form uses GET,
   which gives the results a URI for bookmarks, links, etc; but the
   service also allows clients to upload a document for validation. In
   that case, the form uses POST, since
     * the document to be validated might be confidential; any link to
       the results of validating it would divulge its contents
     * a URI that encoded the entire document would be at least as large
       as the document, and there's little or no use in linking to it,
       since the results will always be the same

     [10] http://validator.w3.org/
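
   A client-side sketch of that choice, under assumed names: the
   service address validator.example.org/check and the "uri" parameter
   are placeholders, not a description of the actual W3C service.

```python
from urllib.parse import urlencode

# Hypothetical validation service endpoint.
SERVICE = "http://validator.example.org/check"

def plan_request(doc_uri=None, doc_body=None):
    """Return (method, request URI, request body) for a validation."""
    if doc_uri is not None:
        # By reference: the result gets a URI of its own.
        return "GET", SERVICE + "?" + urlencode({"uri": doc_uri}), None
    # By upload: the document stays out of the URI.
    return "POST", SERVICE, doc_body

print(plan_request(doc_uri="http://example.org/page.html"))
print(plan_request(doc_body="<html>...</html>"))
```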

   Whether or not GET with HTTP is used for the initial access,
   supplying a URI for subsequent access to the same information, e.g.,
   using Content-Location, is useful.
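
   For instance, a server answering a POST might name a URI for the
   result in its response headers. In this sketch the /reports/ path
   and the respond_to_post() helper are invented for illustration.

```python
def respond_to_post(report_id, body):
    """Build a toy response: (status, headers, body)."""
    headers = {
        "Content-Type": "text/html",
        # A URI at which the same information can be retrieved later.
        "Content-Location": "/reports/%s" % report_id,
    }
    return 200, headers, body

status, headers, body = respond_to_post(42, "<p>No errors found.</p>")
print(headers["Content-Location"])
```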

Myths and transitional limitations

   Myth: search services won't index anything with a ? in the URI anyway
          This was a heuristic to avoid infinite loops in some search
          service crawlers, but it was not an architectural constraint,
          and modern search services use more sophisticated heuristics
          to avoid loops.

   Myth: URIs cannot be longer than 256 characters
          This was a limitation in some server implementations, and
          while servers continue to have limitations to prevent
          denial-of-service attacks, they are generally at least 4000
          characters, and they evolve as the legitimate uses of
          application developers evolve.

   Designers of HTML forms that accept non-western characters have been
   challenged by various implementation limitations and gaps in
   specifications. For example:

     The content type "application/x-www-form-urlencoded" is inefficient
     for sending large quantities of binary data or text containing
     non-ASCII characters.


    [11]multipart/form-data in [12]HTML 4.01

     [11] http://www.w3.org/TR/html401/interact/forms.html#h-17.13.4.2
     [12] http://www.w3.org/TR/html401/
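
   The overhead is easy to see in a small sketch. It assumes the text
   is encoded as UTF-8 before percent-escaping, which is what Python's
   urlencode() does by default; the field name "q" is arbitrary.

```python
from urllib.parse import urlencode

text = "日本語"                   # three non-ASCII characters
raw = text.encode("utf-8")        # bytes before form encoding
encoded = urlencode({"q": text})  # application/x-www-form-urlencoded

# Each non-ASCII byte becomes a three-character %XX escape.
print(len(raw), len(encoded))
print(encoded)
```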

   We expect these limitations to be addressed in future specifications
   (@@e.g. XForms?) and deployed in due course.

Acknowledgements

   Thanks to David Orchard, Larry Masinter, Paul Prescod, Roy Fielding,
   and others for their feedback in response to the [13]15Apr call for
   review.

     [13] http://lists.w3.org/Archives/Public/www-tag/2002Apr/0150.html

Related work

     * Nielsen's [14]1997 rant:

     [14] http://www.useit.com/alertbox/9708a.html

     There is not much you can do to get users to bookmark your site,
     except making it possible to do so: no URL-eating frames, and no
     weird one-time-only links that do not work for subsequent visits.
     * [15]The Power of the URL-Line By Jon Udell August 20, 2001
     * (@@cite stats about the popularity of the back button)

     [15] http://www.byte.com/documents/s=1113/byt20010816s0002/0820_udell.html

     Safety here is regarded as a relative term. Although safety has
     been defined as "freedom from those conditions that can cause
     death, injury, occupational illness, or damage to or loss of
     equipment or property" [MIL-STD-882B 1984], it is generally
     recognized that this is unrealistic; by this definition any system
     that presents an element of risk is unsafe. ... Unfortunately, the
     question of "How safe is safe enough?" has no simple answer.


    Leveson, Nancy G. [16]Software safety: why, what and how, ACM
    Computing Surveys, June 1986, pages 125-163.

     [16] http://doi.acm.org/10.1145/7474.7528


-- 
Dan Connolly, W3C http://www.w3.org/People/Connolly/

Received on Wednesday, 1 May 2002 17:42:39 UTC