Protocol Modules as State Machines
Part of the libwww thread model is to keep track of the current state of
the communication interface to the network. As an example, this section describes
the current implementation of the HTTP module
and how it has been implemented as a state machine. The HTTP module is based
on the HTTP 1.0 specification but is backwards compatible with version 0.9.
The major difference from the implementation before version 3.0 of the
Library is that this version is a state machine based on the state diagram
illustrated below. This design has several advantages even though
the HTTP protocol is stateless by nature.
The individual states and the transitions between them are explained in the
following sections.
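As a rough orientation, the states named in this section can be sketched as an enumeration with a table of allowed transitions. This is a minimal Python sketch, not the Library's C code: the names `State`, `TRANSITIONS`, and `advance` are hypothetical, and the transition table is an assumption drawn from the state descriptions in this section, not a transcription of the state diagram.

```python
from enum import Enum, auto


class State(Enum):
    BEGIN = auto()
    NEED_CONNECTION = auto()
    NEED_REQUEST = auto()
    SENT_REQUEST = auto()
    NEED_ACCESS_AUTHORIZATION = auto()
    REDIRECTION = auto()
    NO_DATA = auto()
    NEED_BODY = auto()
    GOT_DATA = auto()
    ERROR = auto()


# Assumed transitions, inferred from the prose descriptions of each state.
TRANSITIONS = {
    State.BEGIN: {State.NEED_CONNECTION},
    State.NEED_CONNECTION: {State.NEED_REQUEST, State.ERROR},
    State.NEED_REQUEST: {State.SENT_REQUEST, State.ERROR},
    State.SENT_REQUEST: {
        State.NEED_ACCESS_AUTHORIZATION,
        State.REDIRECTION,
        State.NO_DATA,
        State.NEED_BODY,
        State.ERROR,
    },
    # A 401 closes the connection, so a retry needs a new connection.
    State.NEED_ACCESS_AUTHORIZATION: {State.NEED_CONNECTION, State.ERROR},
    # A redirection generates a new request, which starts from the beginning.
    State.REDIRECTION: {State.BEGIN, State.ERROR},
    State.NEED_BODY: {State.GOT_DATA, State.ERROR},
    State.NO_DATA: {State.BEGIN},
    State.GOT_DATA: {State.BEGIN},
    # A repeated request starts in the initial state.
    State.ERROR: {State.BEGIN},
}


def advance(current, target):
    """Return target if the move is allowed, otherwise raise ValueError."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target
```

Keeping the transitions in one table makes it easy to reject a move that the protocol handling should never make, such as jumping from BEGIN straight to GOT_DATA.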
-
BEGIN State
-
This state is the idle state or initial state where the HTTP module awaits
a new request passed from the application.
-
NEED_CONNECTION State
-
The HTTP module is now ready to set up a connection to the remote host.
The connection is always initiated by a connect system call. To
minimize access to the Domain Name Server, the host names of all
previously visited hosts are stored in a local host cache, as explained in section
"DNS Cache and Host Name Canonicalization". The
cache handles multihomed hosts in a special way: it measures the
time it takes to actually make a connection to one of the IP addresses. This
time is stored together with the specific IP address and the host name in
the cache, and on the next connection to the same host the IP address with
the fastest connect time is chosen.
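The connect-time bookkeeping described above could look roughly like the following Python sketch. The class `HostCache` and the helper `timed_connect` are hypothetical names chosen for illustration, and the Library's actual cache layout is not shown in this section.

```python
import socket
import time


class HostCache:
    """Remembers, per host name, the connect time measured for each IP address."""

    def __init__(self):
        self._times = {}  # host name -> {ip address: seconds}

    def record(self, host, ip, seconds):
        self._times.setdefault(host, {})[ip] = seconds

    def fastest(self, host):
        """Return the IP address with the lowest recorded connect time, or None."""
        timings = self._times.get(host)
        if not timings:
            return None
        return min(timings, key=timings.get)


def timed_connect(cache, host, ip, port=80, timeout=10.0):
    """Connect to one address of a host and store how long the connect took."""
    start = time.monotonic()
    sock = socket.create_connection((ip, port), timeout=timeout)
    cache.record(host, ip, time.monotonic() - start)
    return sock
```

On a later request, a caller would ask `cache.fastest(host)` first and only fall back to a name server lookup when it returns `None`.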
-
NEED_REQUEST State
-
The HTTP request is what the application
sends to the remote HTTP server just after the connection is established.
The request consists of an HTTP header line, a set of HTTP headers, and possibly
a data object to be posted to the server. The header line has the following
format:
<METHOD> <URI> <HTTP-VERSION> CRLF
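To make the format concrete, here is a minimal Python sketch that assembles a request from the three parts listed above. The function name `build_request` and its parameters are hypothetical, and the example header is only an illustration.

```python
def build_request(method, uri, headers=None, body=b"", version="HTTP/1.0"):
    """Return the bytes of a request: header line, headers, blank line, body."""
    lines = [f"{method} {uri} {version}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # Every line ends in CRLF; an empty line separates the headers from the body.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


request = build_request("GET", "/index.html", {"Accept": "text/html"})
```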
-
SENT_REQUEST State
-
When the request has been sent, the module waits until the server responds
or the connection times out in case of an error.
As the module does not know whether the remote server is an HTTP 0.9 server
or an HTTP 1.0 server, it must look at the first part of the response to figure out
which version of HTTP is returned. The reason is that an HTTP 0.9 response
does not contain an HTTP header line. The server simply starts to
send the requested data object as soon as the GET request is handled.
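One way to make this distinction is to check whether the response begins with a status line. The Python sketch below assumes that a response starting with `HTTP/` carries a status line and that anything else is a 0.9 response consisting of data only; the function name `classify_response` is hypothetical and the exact check used by the HTTP module is not given in this section.

```python
def classify_response(first_bytes):
    """Return (version, status) guessed from the first bytes of a response.

    A response that does not begin with "HTTP/" is treated as an HTTP 0.9
    response, which is the data object itself and has no status code.
    """
    if not first_bytes.startswith(b"HTTP/"):
        return "HTTP/0.9", None
    status_line = first_bytes.split(b"\r\n", 1)[0].decode("ascii", "replace")
    parts = status_line.split(None, 2)
    version = parts[0]
    status = int(parts[1]) if len(parts) > 1 and parts[1].isdigit() else None
    return version, status
```

In the 0.9 case the bytes that were inspected are already part of the data object, so they must be handed on to whatever consumes the body and not discarded.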
-
NEED_ACCESS_AUTHORIZATION State
-
If a 401 Unauthorized status code is returned,
the module asks the user for a user ID and a password; see also the
"HTTP Basic Access Authorization Scheme".
The connection is closed before the user is asked for the user ID and password,
so any new request initiated upon a 401 status
code causes a new connection to be established. This is done in order
to avoid having the connection hanging around while the application
is waiting for user input.
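Once the user has supplied the credentials, the retried request carries them in an `Authorization` header. Under the Basic scheme the value is the Base64 encoding of the user ID and password joined by a colon. The following Python sketch shows only that encoding step; the function name is hypothetical and the prompt itself is left to the application.

```python
import base64


def basic_authorization_header(user_id, password):
    """Return an Authorization header line for the Basic scheme."""
    credentials = f"{user_id}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return f"Authorization: Basic {token}"
```

Base64 is an encoding, not encryption, so the Basic scheme offers no confidentiality on its own.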
-
REDIRECTION State
-
The remote server returns a redirection status
code if the URI has been moved, either temporarily or permanently, to another
location, possibly on another HTTP server or any other service, for example
FTP or Gopher. The HTTP module supports both the permanent and the temporary
redirection codes returned from the server:
-
301 Moved
-
The load procedure is called recursively on a 301 redirection code. The new
URI is passed back to the user as information via the
Error and Information module,
and a new request is generated. The new request can use any
access scheme accepted
in a URI. An upper limit on the number of redirections has been defined (10 by default)
in order to avoid infinite loops.
-
302 Found
-
The functionality is the same as for a 301 Moved return status. A clever
application can use the returned URI to change the document in which the
URI originates so that the URI points to the new location.
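The handling of both redirection codes, including the upper limit, can be sketched as follows. This Python sketch uses a loop where the text describes a recursive call of the load procedure; the effect is the same. The names `load`, `fetch`, `notify`, and `TooManyRedirections` are hypothetical, and `fetch` stands in for a complete request that returns a status code, the new location (if any), and the body.

```python
MAX_REDIRECTIONS = 10  # default limit mentioned in the text


class TooManyRedirections(Exception):
    pass


def load(uri, fetch, notify=print, max_redirections=MAX_REDIRECTIONS):
    """Follow 301 and 302 responses until a final response or the limit is hit.

    fetch(uri) must return a (status, location, body) tuple.
    """
    for _ in range(max_redirections + 1):
        status, location, body = fetch(uri)
        if status not in (301, 302):
            return uri, status, body
        # Tell the user where the request is going before it is repeated.
        notify(f"Redirected ({status}) from {uri} to {location}")
        uri = location
    raise TooManyRedirections(f"gave up after {max_redirections} redirections")
```

Returning the final URI alongside the response lets an application update the link in the originating document if it wishes to.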
-
-
NO_DATA State
-
When a return code indicates that no data object or resource follows the
HTTP headers, the HTTP module can terminate the request and pass control back
to the application.
-
NEED_BODY State
-
If a body is included in the response from the server, the module must prepare
to read the data from the network and direct it to the destination set up
by the application. This is done by setting up a stream stack with the required
conversions.
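The idea of a stream stack can be illustrated with a chain of small objects, each of which converts the data it receives and hands the result to the next one. This Python sketch is only an analogy for the concept: `Sink`, `Converter`, and `build_stream_stack` are hypothetical names, and the two conversions in the usage example are placeholders, not conversions the Library is known to provide.

```python
class Sink:
    """Final destination set up by the application; collects what arrives."""

    def __init__(self):
        self.chunks = []

    def put(self, data):
        self.chunks.append(data)

    def close(self):
        pass


class Converter:
    """One layer of the stack: converts a chunk and passes it downstream."""

    def __init__(self, convert, target):
        self.convert = convert
        self.target = target

    def put(self, data):
        self.target.put(self.convert(data))

    def close(self):
        self.target.close()


def build_stream_stack(conversions, sink):
    """Wrap the sink so that the first conversion in the list is applied first."""
    stream = sink
    for convert in reversed(conversions):
        stream = Converter(convert, stream)
    return stream


sink = Sink()
stack = build_stream_stack([bytes.upper, lambda b: b.replace(b"\t", b" ")], sink)
for chunk in (b"ab\t", b"cd"):  # data arrives from the network in pieces
    stack.put(chunk)
stack.close()
result = b"".join(sink.chunks)
```

Because the module only ever writes to the top of the stack, the same reading code works whatever conversions the application has asked for.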
-
GOT_DATA State
-
When the data object has been passed through the stream stack, the HTTP module
terminates the request and hands control back to the application.
-
ERROR or FAILURE State
-
If a fatal error occurs at any point in the request handling, the request
is aborted and the connection closed. All information about the error is
passed back to the application via the
Error and Information module.
As the HTTP protocol is stateless, all errors between the client
and the server are fatal. If the failed request is to be repeated, it starts
again in the initial state.
Henrik Frystyk Nielsen,
libwww@w3.org,
@(#) $Id: HTTPFeatures.html,v 1.16 1996/12/09 03:20:54 jigsaw Exp $