Hypertext Markup Language (HTML)
Tim Berners-Lee, CERN
Internet Draft
Expires 14 January 1994
14 July 1993

Hypertext Transfer Protocol (HTTP)
A Stateless Search, Retrieve and Manipulation Protocol

Status of this memo

This document is an Internet Draft. Internet Drafts are working documents of the Internet Engineering Task Force (IETF), its Areas, and its Working Groups. Note that other groups may also distribute working documents as Internet Drafts.

Internet Drafts are working documents valid for a maximum of six months. Internet Drafts may be updated, replaced, or obsoleted by other documents at any time. It is not appropriate to use Internet Drafts as reference material or to cite them other than as a "working draft" or "work in progress".

This document is a DRAFT specification of a protocol in use on the internet and to be proposed as an Internet standard. Discussion of this protocol takes place on the www-talk@info.cern.ch mailing list; to subscribe, mail to www-talk-request@info.cern.ch.

Distribution of this memo is unlimited.

Abstract

HTTP is a protocol with the lightness and speed necessary for a distributed collaborative hypermedia information system. It is a generic stateless object-oriented protocol, which may be used for many similar tasks such as name servers, and distributed object-oriented systems, by extending the commands, or "methods", used. A feature of HTTP is the negotiation of data representation, allowing systems to be built independently of the development of new advanced representations.

Note: This specification

This HTTP protocol is an upgrade on the original protocol as implemented in the earliest WWW releases. It is back-compatible with that more limited protocol.
This specification includes the following parts:

   The Request Methods
   A list of headers in the request message
   Status codes
   A list of headers on any object transmitted
   Format negotiation algorithm
   The HTTP Registration Authority
   References

The following notes form recommended practice, not part of the specification:

   Servers tolerating clients
   Clients tolerating servers

Purpose

When many sources of networked information are available to a reader, and when a discipline of reference between different sources exists, it is possible to rapidly follow references between units of information which are provided at different remote locations. As response times should ideally be of the order of 100ms in, for example, a hypertext jump, this requires a fast, stateless, information retrieval protocol.

Practical information systems require more functionality than simple retrieval, including search, front-end update and annotation. This protocol allows an open-ended set of methods to be used. It builds on the discipline of reference provided by the Universal Resource Identifier (URI), which as a name (URN, RFCxxxx) or address (URL, RFCxxxx) allows the object of the method to be specified.

Reference is made to the Multipurpose Internet Mail Extensions (MIME, RFC1341), which are used to allow objects to be transmitted in an open variety of representations.

Overall operation

On the internet, the communication takes place over a TCP/IP connection. This does not preclude this protocol being implemented over any other protocol on the internet or other networks. In these cases, the mapping of the HTTP request and response structures onto the transport data units of the protocol in question is outside the scope of this specification. It should not, however, be at all complicated.
The protocol is basically stateless, a transaction consisting of:

Connection
The establishment of a connection by the client to the server; when using TCP/IP, port 80 is the well-known port.

Request
The sending, by the client, of a request message to the server.

Response
The sending, by the server, of a response to the client.

Close
The closing of the connection by either or both parties.

The format of the request and response parts is defined in this specification. Whilst header information defined in this specification is sent in the ISO Latin-1 character set in CRLF terminated lines, object transmission in binary is possible.

Character sets

In all cases in HTTP where RFC822 characters are allowed, these may be extended to use the full ISO Latin 1 character set. 8-bit transmission is always used.

REQUEST

The request is sent with a first line containing the method to be applied to the object requested, the identifier of the object, and the protocol version in use, followed by further information encoded in the RFC822 header style. The format of the request is:

   Request = SimpleRequest | FullRequest
   SimpleRequest = GET URI CrLf
   FullRequest = Method UR ProtocolVersion CrLf [*] [ ] = ProtocolVersion = HTTP/V1.0 URI = = : = MIME-conforming-message

The UR is the Uniform Resource Locator (URL) as defined in the specification, or may be a Uniform Resource Name (URN) once a specification for this is settled, for servers which support URN resolution.

Unless the server is being used as a gateway, a partial URL should be given, with the assumptions of the protocol (HTTP:) and server (the server) being obvious.

Note. The rest of an HTTP URL after the host name and optional port number is completely opaque to the client: the client may make no deductions about the object from its URL.

Protocol Version

The Protocol/Version field defines the format of the rest of the request. At the moment only HTRQ is defined.
If the protocol version is not specified, the server assumes that the browser uses HTTP version 0.9.

Uniform Resource Identifier

This is a string identifying the object. It contains no blanks. It may be a Uniform Resource Locator [URL] defining the address of an object as described in RFCxxxx, or it may be a representation of the name of an object (URN, Universal Resource Name) where that object has been registered in some name space. At the time of writing, no suitable naming system exists, but this protocol will accept such names so long as they are distinguishable from the existing URL name spaces.

Methods

The Method field indicates the method to be performed on the object identified by the URL. More details are with the list of method names below.

Request Headers

These are RFC822 format headers with special field names given in the list below, as well as any other HTTP object headers or MIME headers.

Data

The data (if any) sent with an HTTP request is in a format and encoding defined by the object header fields, the default being "text/plain" type with "8bit" encoding. Note that while all the other information in the request (just as in the reply) is in ISO Latin1 with lines delimited by Carriage Return/Line Feed pairs, the data may contain 8-bit binary data.

TERMINATION

The delimiting of the message is determined by the Content-Length: field. If this is present, then the message contains the specified number of bytes. If it is not specified, then the message must be terminated by a CrLf . CrLf sequence. This sequence may not be followed by any other data. (Note: This allows the receiver to check only the end part of each received buffer for the start of the termination sequence.) Any occurrence of the sequence CrLf . within the data itself is converted to CrLf . . on transmission and converted back on reception.

This section on termination only applies to data sent with the request.
It is not required for data in the reply, when connection closure by the server is used to indicate the end of the data.

See also: note on server tolerance for back-compatibility, etc.

Methods

The Method field indicates the method to be performed on the object identified by the URL. The methods GET and HEAD below are always supported. The list of other methods acceptable by the object is returned in response to either of these two requests.

This list may be extended from time to time by a process of registration with the design authority. Method names are case sensitive. Currently specified methods are as follows:

GET
means retrieve whatever data is identified by the URI, so where the URI refers to a data-producing process, or a script which can be run by such a process, it is this data which will be returned, and not the source text of the script or process. Also used for searches.

HEAD
is the same as GET but returns only HTTP headers and no document body.

CHECKOUT
Similar to GET but locks the object against update by other people. The lock may be broken by a higher authority or on timeout: in this case a future CHECKIN will fail.

SHOWMETHOD
Returns a description (perhaps a form) for a given method when applied to the given object. The method name is specified in a For-Method: field. (TBS)

PUT
specifies that the data in the body section is to be stored under the supplied URL. The URL must already exist. The new contents of the document are the data part of the request. POST and REPLY should be used for creating new documents.

POST
Creates a new object linked to the specified object. The message-id field of the new object may be set by the client or else will be given by the server. A URL will be allocated by the server and returned to the client. The new document is the data part of the request.
It is considered to be subordinate to the specified object, in the way that a file is subordinate to a directory containing it, or a news article is subordinate to a newsgroup to which it is posted.

REPLY
The same as POST, except that the new object is considered to be on an equal footing to the specified object.

CHECKIN
Similar to PUT, but releases the lock set on the object. Fails if no lock has been set by CHECKOUT.

TEXTSEARCH
The object may be queried with a text string. The search form of the GET method is used to query the object.

SPACEJUMP
The object will accept a query whose terms are the coordinates of a point within the object. The method is implemented using GET with a derived URL.

(Some of these methods require more detailed specification.)

GET

A representation of the object is transferred to the client. Some URIs refer to specific variants of an object, and some refer to objects with many variants. In the latter case, the representations, encodings, and languages acceptable may be specified in the header request fields, and may affect the particular value which is returned.

Other possible replies allow a set of URIs to be returned to the client, who may use them to retrieve the object. This allows name servers to be implemented using HTTP, and also forwarding addresses to be given when objects have been moved.

SHOWMETHOD

When an object can support more operations than are defined in this specification, SHOWMETHOD allows a client to understand the interface to that operation sufficiently to allow the user to perform it interactively.

Required parameter field
For-Method: This field contains only the method name about which the client is inquiring.

Preconditions
The method name specified in the For-Method field must have been previously issued in an "Allowed:" field returned with the given object. The client should specify an Accept: field which includes at least one form language which it wants to be able to interpret the result.
Postcondition
SHOWMETHOD returns, if possible, a form in a representation acceptable to the client. This form will contain instructions for ordering the operation, and fields for the parameters.

SPACEJUMP

This method is similar to the TEXTSEARCH method, but instead of the search criterion being a text string, it is a set of coordinates defining a point within the image. The semantics of the operation are not defined here. Typically, the user clicks on a point within the image with a mouse or other pointing device.

Two or more coordinates are supplied, in the order x, y, z, t. All coordinates are scaled so that 0 represents the bottom left hand point and 1.0 represents the top right hand point. The z axis direction follows the normal right-hand rule, that is, it extends toward the viewer when the x and y axes are flat as in the normal two-dimensional representation. In the case of a time-occupying object, 0 represents the starting instant, and 1.0 represents the finishing instant.

The method is implemented using GET with a derived URL.

TEXTSEARCH

This is a simple form of search. The text is assumed to derive from the requesting user, and is in no special format. The exact algorithm to be applied is not defined in this specification, but techniques such as vocabulary proximity matching between the request data portion and the contents or titles of documents, keyword matching, stemming, and the use of a thesaurus are quite appropriate.

Whilst this method name is given as a flag to specify that the function is available, the search form of the GET method is in fact used to query the object.

HTTP Request fields

These header lines are sent by the client in an HTTP protocol transaction. All lines are RFC822 format headers. The list of headers is terminated by an empty line.

FROM:
In Internet mail format, this gives the name of the requesting user. This field may be used for logging purposes and an insecure form of access protection.
The interpretation of this field is that the request is being performed on behalf of the person given, who accepts responsibility for the method performed.

The Internet mail address in this field does not have to correspond to the internet host which issued the request. (For example, when a request is passed through a gateway, then the original issuer's address should be used.)

The mail address should, if possible, be a valid mail address, whether or not it is in fact an internet mail address or the internet mail representation of an address on some other mail system.

ACCEPT:
This field contains a comma-separated list of representation schemes (MIME compatible Content-Type values) which will be accepted in the response to this request.

The set given may of course vary from request to request from the same user.

This field may be wrapped onto several lines according to RFC822, and more than one occurrence of the field is also allowed, with the significance being the same as if all the entries had been in one field. The format of each entry in the list is (/ meaning "or"):

   = Accept: *[ ; ] = *[ , ] = = = q / mxs / mxb =

See the appendix on the negotiation algorithm as a function and penalty model. If no Accept: field is present, then it is assumed that text/plain and text/html are accepted.

Example

   Accept: text/plain; text/html
   Accept: text/x-dvi, q=.8, mxb=100000, mxs=5.0; text/x-c

ACCEPT-ENCODING:
Similar to Accept, but lists the Content-Encoding types which are acceptable in the response.

   = Accept-Encoding: *[ , ] = *[ , ]

Example

   Accept-Encoding: x-compress; x-zip

ACCEPT-LANGUAGE:
Similar to Accept, but lists the Language values which are preferable in the response. A response in an unspecified language is not illegal. See also: Language.

Language coding TBS. (ISO standard xxxx)

USER-AGENT:
This line, if present, gives the software program used by the original client. This is for statistical purposes and the tracing of protocol violations. It should be included.
The first white space delimited word must be the software product name, with an optional slash and version designator. Other products which form part of the user agent may be put as separate words.

   = User-Agent: + = [/] =

Example:

   User-Agent: LII-Cello/1.0 libwww/2.5

REFERER:
This optional header field allows the client to specify, for the server's benefit, the address (URI) of the document (or element within the document) from which the URI in the request was obtained.

This allows a server to generate lists of back-links to documents, for interest, logging, etc. It allows bad links to be traced for maintenance.

If a partial URI is given, then it should be parsed relative to the URI of the object of the request.

Example:

   Referer: http://info.cern.ch/hypertext/DataSources/Overview.html

AUTHORIZATION:
This line, if present, contains authorization information. The format is To Be Specified (TBS). The format of this field is in extensible form. The first word is a specification of the authorisation system in use. Proposals have been as follows (and see current one for implementation by Ari):

User/Password scheme

   Authorization: user fred:mypassword

The scheme name is "user". The second word is a user name (typically derived from a USER environment variable or prompted for), with an optional password separated by a colon (as in the URL syntax for FTP). Without a password, this provides very low level security. With the password, it provides low-level security as used by unmodified FTP, Telnet, etc.

Kerberos

   Authorization: kerberos kerberosauthenticationsparameters

The format of the kerberosauthenticationsparameters is to be specified.

CHARGETO:
This line, if present, contains account information for the costs of the application of the method requested. The format is TBS. The format of this field must be in extensible form. The first word starts with a specification of the namespace in which the account is. (This is similar to extensible URL definition.)
No namespaces are currently defined. Namespaces will be registered with the registration authority.

The format of the rest of the line is a function of the charging system, but it is recommended that this include a maximum cost whose payment is authorized by the client for this transaction, and a cost unit.

Note: Server tolerance of bad clients

Whilst it is seen as appropriate for testing parsers to check full conformance to this specification, it is recommended that operational parsers be tolerant of deviations.

In particular, lines should be regarded as terminated by the Line Feed, and the preceding Carriage Return character ignored.

Any HTTP Header Field Name which is not recognised should be ignored in operational parsers.

It is recommended that servers use URIs free of "variant" characters whose representation differs in some of the national variant character sets, punctuation characters, and spaces. This will make URIs easier to handle by humans when the need (such as debugging, or transmission through non hypertext systems) arises.

RESPONSE

The response from the server shall start with the following syntax (see also: note on client tolerance):

   ::= ::= 3* ::= 3* ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 ::= *

The first field identifies the HyperText Transfer Protocol version being used by the server. For the version described by this document it is "HTTP/1.0" (without the quotes).

The <status code> gives the coded results of the attempt to understand and satisfy the request. It is a three digit ASCII decimal number.

The final field gives an explanation for a human reader, except where noted for particular status codes.

Fields on the status line are delimited by a single blank (parsers should accept any amount of white space). The possible values of the status code are listed below.

Response headers

The headers on returned objects are RFC822 format headers with special field names given below, as well as any MIME conforming headers, notably the Content-Type field.
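
As an illustration of the status line described above, and of the tolerance recommended for parsers, the following Python sketch splits a status line into version, status code and explanation. It is a minimal sketch and not part of the specification: the names StatusLine and parse_status_line are hypothetical, and returning None for a line that does not look like a status line is an assumption about how a caller might detect a non-conforming response.

```python
import re
from typing import NamedTuple, Optional


class StatusLine(NamedTuple):
    version: str   # e.g. "HTTP/1.0"
    code: int      # three digit status code
    reason: str    # explanation for a human reader (may be empty)


# Version token, white space, exactly three ASCII digits, optional explanation.
# Any run of SP or TAB is accepted between fields.
_STATUS_RE = re.compile(r"^(HTTP/\S+)[ \t]+([0-9]{3})(?:[ \t]+(.*))?$")


def parse_status_line(raw: bytes) -> Optional[StatusLine]:
    """Parse one status line; return None if it does not match (assumption)."""
    # Header text is ISO Latin-1.
    line = raw.decode("latin-1")
    # A line is terminated by Line Feed; a preceding Carriage Return is ignored.
    if line.endswith("\n"):
        line = line[:-1]
    if line.endswith("\r"):
        line = line[:-1]
    match = _STATUS_RE.match(line)
    if match is None:
        return None
    reason = (match.group(3) or "").strip()
    return StatusLine(match.group(1), int(match.group(2)), reason)
```

A caller might treat a None result as a sign that the server is not sending a status line at all and handle the response separately.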
Response data

Additional information may follow, in the format of a MIME message body. The significance of the data depends on the status code.

The Content-Type used for the data may be any Content-Type which the client has expressed his ability to accept, or text/plain, or text/html. That is, one can always assume that the client can handle text/plain and text/html.

Status codes

The values of the numeric status code to HTTP requests are as follows. The data sections of Error, Forward and Redirection responses may be used to contain human-readable diagnostic information.

SUCCESS 2XX

These codes indicate success. The body section, if present, is the object returned by the request. It is a MIME format object, and may only be in text/plain, text/html or one of the formats specified as acceptable in the request.

OK 200
The request was fulfilled.

CREATED 201
Following a POST command, this indicates success, but the textual part of the response line indicates the URI by which the newly created document should be known.

ERROR 4XX, 5XX

The 4xx codes are intended for cases in which the client seems to have erred, and the 5xx codes for the cases in which the server is aware that the server has erred. It is impossible to distinguish these cases in general, so the difference is only informational.

The body section may contain a document describing the error in human readable form. The document is in MIME format, and may only be in text/plain, text/html or one of the formats specified as acceptable in the request.

Bad request 400
The request had bad syntax or was inherently impossible to satisfy.

Unauthorized 401
The parameter to this message gives a specification of authorization schemes which are acceptable. The client should retry the request with a suitable Authorization header.

PaymentRequired 402
The parameter to this message gives a specification of charging schemes acceptable.
The client may retry the request with a suitable ChargeTo header.

Forbidden 403
The request is for something forbidden. Authorization will not help.

Not found 404
The server has not found anything matching the URL given.

Internal Error 500
The server encountered an unexpected condition which prevented it from fulfilling the request.

Not implemented 501
The server does not support the facility required.

REDIRECTION 3XX

The codes in this section indicate action to be taken (normally automatically) by the client in order to fulfill the request.

Moved 301
The data requested has been assigned a new URI; the change is permanent. (N.B. this is an optimisation, which must, pragmatically, be included in this definition. Browsers with link editing capability should automatically relink to the new reference, where possible.)

The response contains one or more header lines of the form

   Location: String CrLf

which specify alternative addresses for the object in question. The String is an optional comment field.

Found 302
The data requested actually resides under a different URL; however, the redirection may be altered on occasion (when making links to these kinds of document, the browser should default to using the URI of the redirection document, but have the option of linking to the final document) as for "Forward".

The response format is the same as for Moved.

Method 303

   Method: body-section

Like the Found response, this suggests that the client go try another network address. In this case, a different method may be used too, rather than GET.

The body-section contains the parameters to be used for the method. This allows a document to be a pointer to a complex query operation.

The body may be preceded by the following additional fields as listed.

Object Headers

The header fields given with or in relation to objects in HTTP are as follows. All are optional. The order of header lines within the HTTP header has no significance.
However, those fields which are not MIME fields should occur before the MIME fields, so that the MIME fields and following form a valid MIME document. This is not mandatory.

Any header fields which are not understood should be ignored.

(TBS in more detail)

ALLOWED: *METHOD
Lists the set of requests which the requesting user is allowed to issue for this URL. If this header line is omitted, the default allowed methods are "GET HEAD".

Example of use:

   Allow: GET HEAD PUT

PUBLIC: *METHOD
As "Allow" but lists those requests which anyone may use. If omitted, the default is "GET" only.

Example of use:

   Public: GET HEAD TEXTSEARCH

CONTENT-LENGTH: INT
Implies that the body is binary and should be read directly from the communications link, without parsing lines, etc. When the data is part of the request, it prevents the escaping and de-escaping of the termination sequence.

@@@ This should be part of the MIME header, as it applies to any binary encoded part. Note HTTP is the first internet protocol to allow MIME "binary" encoding. In MIME, the use of Content-Length is currently allowed only for external messages.

CONTENT-TYPE:
As defined in MIME, except:

Extra non-MIME types
It is reasonable to put strict limits on transfer formats for mail, where there is no guarantee that the receiver will understand a weird format. However, in HTTP one knows that the receiver will be able to receive it because it will have been sent in the Accept: field. There is therefore a lot to be gained from a very complete registry of well-defined types for HTTP which may nevertheless not be recommended for mail. In this case, the content-type list for HTTP may be a superset of the MIME list.

The x- convention for experimental types is of course still available as well.

Type parameters
Parameters on the content type are extremely useful for describing resolutions, colour depths, etc. They will allow a client to specify in the Accept: field the resolution of its device.
This may allow the server to economise greatly on transmission time by reducing the resolution of an image, for example.

These parameters are to be specified when types are registered. @@ TBS.

DATE: DATE
Creation date of object. (or last modified, and separately have a Created: field?) Format as in RFC850 but GMT MUST BE USED.

EXPIRES: DATE
Gives the date after which the information given ceases to be valid and should be retrieved again. This allows control of caching mechanisms, and also allows for the periodic refreshing of displays of volatile data. Format as for Date:. This does NOT imply that the original object will cease to exist.

LAST-MODIFIED: DATE
Last time object was modified, i.e. the date of this version if the document is a "living document". Format as for Date:.

MESSAGE-ID: URI
A unique identifier for the message. As in RFC850, except that the unlimited lifetime of HTTP objects requires that the Message-ID be unique in all time, not just in two years.

A document may only have one Message-ID. No two documents, even if different versions of the same live document, may have the same Message-ID.

VERSION-URI: 1*URI
This gives a URI with which the object may be found. There is no guarantee that the object can be retrieved using the URI specified. However, it is guaranteed that if an object is successfully retrieved using that URI it will be the same unmodified object as this one.

Multiple occurrences of this field give alternative access names or addresses for the live document.

LIVE-URI: 1*URI
This gives a URI with which the most recent version of an object may be found. There is no guarantee that the object can be retrieved using the URI specified. However, it is guaranteed that if an object is successfully retrieved using that URI it will be the same object or a more recent version of the same object.

Multiple occurrences of this field give alternatives which should refer to the same live object.
LANGUAGE: CODE
The language code is the ISO code for the language in which the document is written. If the language is not known, this field should be omitted.

The language code is an ISO 639 language code with an optional ISO 3166 country code to specify a national variant.

Example

   Language: en_UK

means that the content of the message is in British English, while

   Language: en

means that the language is English in one of its forms. (@@ If a document is in more than one language, for example requires both Greek, Latin and French to be understood, should this be representable?)

See also: Accept-Language.

COST: TBS
The cost of retrieving the object is given. This is the cost of access of a copyright work. Format of units to be specified. Currently refers to an unspecified charging scheme to be agreed out of band between parties.

Note: Client tolerance of bad servers

Servers not implementing the specification as written are not HTTP compliant. Servers should always be made completely compliant. However, clients should also tolerate deviant servers where possible.

BACK COMPATIBILITY

In order that clients using the HTTP protocol should be able to communicate with servers using the protocol originally implemented in the W3 data model, clients should tolerate responses which do not start with a numeric version number and response code. In this case, they should assume that the rest of the response is a document body in type text/html.

WHITE SPACE

Clients should be tolerant in parsing response status lines; in particular, they should accept any sequence of white space (SP and TAB) characters between fields.

Lines should be regarded as terminated by the Line Feed, and the preceding Carriage Return character ignored.

HTTP NEGOTIATION ALGORITHM

This note defines the significance of the q, mxb and mxs values optionally sent in the Accept: field of the HTTP protocol request message.
It is assumed that there is a certain value of the presentation of the document, optimally rendered using all the information available in its original source. It is further assumed that one can allocate a number between 0 and 1 to represent the loss of value which occurs when a document is rendered into a representation with loss of information. Whilst this is a very subjective measurement, and in fact largely a function of the document in question, the approximation is made that one can define this "degradation" figure as a function of merely the representation involved.

The next assumption is that the other cost to the user of viewing the document is a function of the time taken for presentation. We first assume that the cost is linear in time, and then assume that the time is linear in the size of the message.

The final net value to the user can therefore be written

   presented_value = initial_value * total_degradation - a - b * size

for a document in a given incoming representation. Suppose we normalize the initial value of the document to be 1. The server may judge that the value in a particular format is less than 1 if a conversion on the server side has lost information. The total degradation is then the product of any degradation due to conversions internal to the server, and the degradation "q" sent in the Accept field. If q is not sent, it defaults to 1.

The values of a and b have components from processing time on the server, network delays, and processing time on the client. These delays are not additive, as a good system will pipeline the processing, and whilst the result may be linear in message size, calculation of it in advance is not simple. The amount of pipelining and the loads on machines and network are all difficult to predict, so a very rough assumption must be made.

We make the client responsible for taking into account network delays.
The client will in fact be in a better position to do this, as the client will after one transaction be aware of the round-trip time.

We assume that the delays imposed by the server and by the client (including network) are additive. We assume that the client's delay is proportional to message size.

The three parameters given by the client to the server are:

q
The degradation (quality) factor between 0 and 1. If omitted, 1 is assumed.

mxb
The size of message (in bytes) which, even if immediately available from the server, will cause the value to the reader to become zero.

mxs
The delay (in seconds) which, even for a very small message with no length-related penalty, will cause the value to the reader to become zero.

These parameters are chosen in part because they are easy to visualize as the largest tolerable delay and size. If not sent, mxb and mxs default to infinity.

The server may optimize the presented value for the user when deciding what to return. The hope is that fine decisions will not have to be made, as in most cases the results for different formats will be very different, and there will be a clear winner.

A suitable algorithm is that the assumed value v of a document of initial value u, delivered to the network after a delay t, whose transfer length on the net is b bytes, is

   v = u * q - b/mxb - t/mxs

Note that t is the time from the arrival of the request to the first byte being available on the net.

[[See also: Design issues discussions around this point.]]

Note: The cost of retrieval time

The assumption that the cost to the user associated with a certain retrieval time is linear in that time is wildly inaccurate. The real function could be very dependent on circumstances (for example, going to infinity at a deadline). A better general approximation might be logarithmic for large time delays, and linear for small ones, like a*log(b*t+1), which has two parameters.
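
The linear value function v = u * q - b/mxb - t/mxs suggested in this note can be sketched in Python as follows. This is an illustration only, under stated assumptions: the names AcceptEntry, Variant, presented_value and choose are hypothetical, the Accept: field is assumed to have been parsed already into a mapping from content type to its q, mxb and mxs values, mxb and mxs are assumed to be greater than zero, and refusing candidates whose value is not positive is an assumption rather than something this note requires.

```python
import math
from typing import Dict, Iterable, NamedTuple, Optional


class AcceptEntry(NamedTuple):
    """Parameters sent by the client for one content type."""
    q: float = 1.0          # degradation factor, defaults to 1
    mxb: float = math.inf   # size in bytes at which the value reaches zero
    mxs: float = math.inf   # delay in seconds at which the value reaches zero


class Variant(NamedTuple):
    """One representation the server could return (hypothetical structure)."""
    content_type: str
    u: float   # value after server-side conversions (1.0 means no loss)
    b: int     # transfer length in bytes
    t: float   # seconds from request arrival to first byte available


def presented_value(variant: Variant, entry: AcceptEntry) -> float:
    # v = u * q - b/mxb - t/mxs; dividing by infinity contributes no penalty.
    return variant.u * entry.q - variant.b / entry.mxb - variant.t / entry.mxs


def choose(variants: Iterable[Variant],
           accept: Dict[str, AcceptEntry]) -> Optional[Variant]:
    """Return the acceptable variant with the highest positive value, if any."""
    best: Optional[Variant] = None
    best_value = 0.0  # assumption: a value of zero or less is not worth sending
    for variant in variants:
        entry = accept.get(variant.content_type)
        if entry is None:
            continue  # type not listed by the client
        value = presented_value(variant, entry)
        if value > best_value:
            best, best_value = variant, value
    return best
```

For instance, with text/x-dvi accepted at q=0.8, mxb=100000, mxs=5.0, a lossless 10000-byte DVI variant available after 0.5 seconds scores about 0.8 - 0.1 - 0.1 = 0.6, and so beats a text/plain variant whose server-side conversion has reduced its value to 0.5.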
REGISTRATION AUTHORITY

The HTTP Registration Authority is responsible for maintaining lists of:

   Charge account name spaces (see ChargeTo: field above)
   Authorization schemes (see Authorization: field above)
   Data format names (as MIME Content-Types)
   Data encoding names (as MIME Content-Encoding)

It is proposed that the Internet Assigned Numbers Authority or their successors take this role.

Unregistered values may be used for experimental purposes if they start with "X-".

REFERENCES

RFC822
"Standard for ARPA Internet Text Messages", David H. Crocker. Describes Internet mail message format.

RFC850
"Standard for Interchange of USENET Messages". This RFC uses some field names in common with this specification, and is relevant reading.

RFC977
"Network News Transfer Protocol", Kantor and Lapsley.

RFC1341
Multipurpose Internet Mail Extensions (MIME), Nathaniel Borenstein and Ned Freed, Internet RFC 1341, 1992.

URL
Universal Resource Locators. RFCxxx. Currently available by anonymous FTP from info.cern.ch as /pub/ietf/url3.{ps,txt}.

MIME and PEM
Internet Draft only