Programming the Web: An Application-Oriented Language for Hypermedia Service Programming

David A. Ladd
J. Christopher Ramming

Abstract
MAWL is an application language for programming interactive World Wide Web services. The language is small, because no construct was introduced without compelling justification; as with yacc [9], general-purpose computation is done in a host language. MAWL offers conveniences such as control abstraction, persistent state management, synchronization, and shared memory. In addition, the MAWL compiler performs static checking designed to prevent common Web programming errors. In this paper we discuss the design and engineering of MAWL in the context of our general language design philosophy. We also include an appendix of commentary on several short MAWL programs.
Keywords
Programming Languages, Application-Oriented Languages, World Wide Web

Introduction

The scope and diversity of the World Wide Web (the Web) are expanding daily. Much of the popularity of the Web is undoubtedly due to the simplicity and robustness of its underlying protocol, HTTP [3]. The source of this simplicity is the fact that HTTP is a stateless protocol: no HTTP transaction is defined in terms of the transactions that precede it. Because HTTP was originally designed for straightforward hypertext document publishing, the stateless nature of HTTP has been acceptable.

Now that Web browsers and servers are ubiquitous on the internet, it is worthwhile to exploit the medium by using it to provide interactive services in addition to document serving. Although a stateless protocol is sufficient for serving stand-alone documents, it is an inconvenient basis for interactive services with an inherently sequential structure, such as banking transactions and ticket reservations.

In addition to obstacles posed by a stateless protocol, Web services must typically be constructed without the basic building blocks of modern programming practice. Lacking these building blocks, Web programmers must continually reimplement basic programming constructs. For example, memory, including some notion of a program counter, must be managed explicitly. Moreover, programmers must also address problems that are peculiar to Web service programming. For example, all Web programmers must guard against a user's failure to fill in required form fields. Also, because servers handle numerous requests concurrently, programmers are compelled to invent methods to protect resources such as shared files from conflicting simultaneous requests.

In order to provide quality services, Web programmers need appropriate reusable abstractions. Unfortunately, the languages that are commonly used to build Web services (Tcl [19], Perl [22], awk [1], and various shells [12]) offer little in the way of static analysis or guarantees. General-purpose languages such as ML [16], [21], and Eiffel [17], which as a matter of policy guard against dynamically discovered errors, do not specifically address common Web-programming errors, such as dangling URLs and incorrect HTML. Therefore, it is worthwhile to explore systems that both make available reusable abstractions and detect Web-programming errors at compile time.

Web programmers encounter three major problems: first, a deficiency in the underlying protocol; second, a lack of reusable solutions to common Web programming problems; and third, the need for a way to uncover Web-specific programming errors at compile time---independently of execution---rather than at run time. MAWL is a language designed to address these issues.

MAWL Services and Varieties of State

 

A Web site typically comprises a set of hypertext documents connected to each other and to the rest of the Web by hyperlinks. The most elementary Web programming is simply the design of Web pages using appropriate layout marks. Here the programmer's burden is essentially to produce correct HTML. But for interactive services, the real task is to specify complex interactions with multiple simultaneous users interacting over a network. Such services happen to use Web browsers and HTML as a convenient and universally available user interface.

A MAWL service consists of a MAWL program and documents written in an extension of HTML (MHTML). The MAWL compiler processes the MAWL program together with its MHTML documents. We draw a sharp distinction between MHTML markups and MAWL programs, and present separate syntaxes. Figure 1 describes how MHTML differs from HTML; the balance of the grammar rules given in the figures apply to MAWL program code. A MAWL program consists of one or more related sessions, which are similar to the procedures of other languages. Each session offers an entry point to the service.

We distinguish between a session and a service because many services are organized around resources that can be used in several ways. For instance, our department's Web-based book-ordering service is loosely organized around a database of book vendors and order records. The bookbot has several administrative functions as well as several user-related entry points. Each of these functions or entry points corresponds to a MAWL session, and collectively these sessions are referred to as a service.

MAWL services are distinguished by the fact that they are stateful, by which we mean that information is stored between HTTP transactions and used to affect subsequent transactions. In MAWL, state consists of a collection of variables with certain scoping and persistence attributes. This allows us to categorize MAWL services according to the kind of variables that they use. Local variables are variables that are visible only within a certain lexical region of a service description; global variables are variables that are available to any session in a service. Local variables are distinguished as being either static or automatic.

If a new copy of a local variable is associated with each interactive session then the variable is called automatic. For example, the program counter is a (hidden) variable that the system uses to implement sequencing. User-defined automatic variables can be used to hold intermediate results during the course of a session, and are an essential ingredient of dialogues.

A session is defined to be a sequence of MAWL statements, some of which involve presenting HTML forms and awaiting the replies. As in imperative computer programs, a program counter indicates the point within the session at which execution is taking place. Users cannot arbitrarily skip ahead or go back in a session. The most primitive application of a session is sequencing; that is, controlling the order in which HTML documents are presented. An advertiser-supported service may, for instance, insist that every fourth page be an advertisement. Without a notion of session there is no way to force users to step through the commercial messages.
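The notion of a session with a hidden program counter can be illustrated outside MAWL. The following Python sketch is not MAWL code and does not describe how the MAWL compiler works; the names ad_supported_session and Dialogue are hypothetical. It assumes that a generator's suspended frame can stand in for the program counter, and it enforces the rule that every fourth document served is an advertisement.

```python
from typing import Iterator, List, Optional


def ad_supported_session(pages: List[str]) -> Iterator[str]:
    """Yield one document per step. The generator's suspended frame plays
    the role of the session's hidden program counter."""
    served = 0
    for page in pages:
        if served % 4 == 3:      # the 4th, 8th, ... document is an advertisement
            yield "AD"
            served += 1
        yield page
        served += 1


class Dialogue:
    """Hypothetical driver: each call to next_page() stands for one HTTP
    transaction that resumes the session where it last stopped."""

    def __init__(self, pages: List[str]) -> None:
        self._session = ad_supported_session(pages)

    def next_page(self) -> Optional[str]:
        # None signals that the session has run to completion.
        return next(self._session, None)
```

Because the caller can only ask for the next page, it has no way to skip the advertisement or jump ahead; the order is fixed by the session body.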

Sequences in which the content depends on the history of a session are called dialogues. User authentication---for example, password-authentication on directories---is a particular kind of dialogue that has been provided for in HTTP.

Services with more complex dialogues---for example, the popular touchtone telephone service through which one can find out where and when specific films are showing, and then purchase tickets---are becoming increasingly common in the telephone network. Such services should be equally available on the Web. MAWL's sessions and user-defined local automatic variables form a basis for such services.

Although many services can be constructed using only variables that persist for the duration of a session (automatic variables), static variables, that is, variables that persist forever, are also useful. For instance, it is common practice to keep track of how many times a particular page has been served; in this case one would specify this quantity as a static local variable. Another application of static local variables is annotation, in which users of a particular service might add comments that will be visible to the next user.
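The difference between automatic and static variables can be sketched in a general-purpose language. The Python below is an illustration only, with hypothetical class names Service and Session; it is single-threaded and keeps the static value in memory, so it models neither concurrency nor persistence across restarts.

```python
from typing import List


class Service:
    """Holds state that outlives any one session (a 'static' variable)."""

    def __init__(self) -> None:
        self.pages_served = 0            # static: one copy for the whole service

    def start_session(self) -> "Session":
        return Session(self)


class Session:
    """Holds state private to one interactive session ('automatic' variables)."""

    def __init__(self, service: Service) -> None:
        self._service = service
        self.answers: List[str] = []     # automatic: a fresh copy per session

    def submit(self, answer: str) -> int:
        """Record an answer and return the service-wide page count."""
        self.answers.append(answer)
        self._service.pages_served += 1
        return self._service.pages_served
```

Two sessions started from the same Service see the same page count but never see each other's answers.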

In order to implement Web-based multiple-user services such as ``chat'' programs, through which people can communicate in real time, it is necessary for a server to mediate between two or more concurrent sessions. This is done using global variables---variables that are available to any session in a service. Global variables lead to a new class of collaborative Web services, such as shared editing and shared code inspection.

The design of an application language can be divided into two parts: the part meant to address general goals---goals that can be associated with any application---and the part that is special to the particular application domain. Our experience has resulted in several major design decisions of both kinds.

General Language Design Philosophy

A number of issues arise when designing application-specific languages. One question involves domain-specific errors and how to avoid them; this leads to choices about what should be implicit in all programs and what should be explicitly addressed by each programmer. Another question is how to handle general-purpose computing.

Error avoidance

Programming is an inherently error-prone activity; therefore we prefer languages that guard against common errors over those which do not. When feasible we make design choices that either eliminate the possibility of error or reduce it through effective compile-time analysis.

Sometimes entire classes of bugs can be eliminated by a method we call implicit specification. If a language does not offer constructs of a specific, troublesome kind, then errors associated with that construct can never be blamed on the programmer. The language ML, for example, does not offer constructs for memory allocation and freeing: memory is managed without any explicit help from the programmer, and memory errors simply cannot occur.

On the other hand, the details of a programming problem often must be specified explicitly. Whenever an aspect of programming must be specified explicitly, errors can occur. The crucial issue is whether these errors will be detected statically or dynamically. Sometimes a language can be restricted in order to obtain compile-time safety, and other times it cannot.

For instance, in a language that had a single type, say ``string,'' one would be forced to encode all values as strings. Such encodings cannot in general be analyzed at compile time, leading to dynamic errors and the need for careful testing.

For this reason, we prefer languages with expressive typing constructs. Such languages enable users to make distinctions that compilers can analyze, so that a wide class of errors can be uncovered at compile time.

In designing MAWL, we sought to handle certain programming problems implicitly. When that was impossible or undesirable, we strove to offer checks that would uncover bugs as early as possible in the software development cycle.

Approach to general-purpose computing

An application programming language must address problems specific to a domain, but in most application domains, some measure of general-purpose computing is unavoidable. At issue is the way in which general-purpose computing is to be managed in application-oriented contexts.

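We consider three general strategies: extending an existing general-purpose language with application-specific constructs; designing an application language that carries its own general-purpose computing constructs; and keeping the application language small while deferring general-purpose computation to a host language.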

We have adopted the extension approach in earlier work [13], but the results have not been entirely satisfactory. One drawback is that there is no universally accepted language for general-purpose computing, so any choice will be the wrong one. Another pitfall is that the extended language is tied to (some particular implementation of) the base language, and any changes in the base language need to be reflected in the extended language.

Numerous application languages have been designed with their own general-purpose computing constructs (PostScript is a good example [8]). To learn a variety of syntaxes for similar language constructs is irritating, and subtle semantic variations can pose serious problems. Numerous application languages, each with its own general-purpose computing constructs, would be difficult to sustain.

Therefore we favor the third approach, which has been used to good effect by successful application languages such as yacc [9] and make [5]. On the principle of parsimony, we included no constructs in MAWL that were easily obtained from general-purpose computing languages. We strove for an application language in which each construct is clearly related to a design goal that could not be easily fulfilled by a general-purpose computing language. We have instead offered, in the spirit of yacc, an interface that should be easily adapted to the general-purpose-computing language of a user's choice.

Languages vs. libraries

Another approach to supporting application-specific programming is to eliminate the special-purpose language constructs entirely, and write special-purpose application libraries in a general-purpose language.

It is commonly held that library design and language design are equivalent, or, stated in more general terms, that the abstraction facilities built into existing general-purpose languages are sufficient to achieve the effect of any application language [11,10]. To the contrary, we hold that application languages motivated only by functional abstraction, control abstraction, and/or data abstraction (possibly buried in syntactic sugar) are on shaky ground for precisely the reason that the same effect could be achieved with a good general-purpose language and an application library.

Instead, an application language should be justified on the grounds that it exploits the distinction between compile time and run time to effect useful analyses. Most general-purpose programming languages use this opportunity to detect flawed programs and perform optimizations. The most useful application languages perform domain-specific analysis at compile time. For instance, yacc users are offered some compile-time guarantee that their language can be implemented efficiently, as well as information about ambiguities. Users of make are informed of circular dependencies. A language called PRL5, for specifying database constraints, was designed at AT&T Bell Laboratories so that constraints expressed in the language could always be transformed into efficient transaction guards [14,7].

Thus application-language design is primarily a problem of determining which compile-time analyses are useful in a particular domain, and then finding a way to express the necessary computation such that the analyses remain decidable. Since it is impossible to perform useful analyses on arbitrary programs in Turing-complete languages, the trick is often to balance restrictions against convenience. If a useful analysis can be performed on programs in the language, then the language has an advantage that cannot in general be matched by any approach involving libraries and general-purpose languages.

Application-Driven Design Goals

Static analysis

From a software engineering perspective, a compelling justification for using a special application language is the kind of static analysis and computation that can be performed on programs in the language. Two of the most common benefits of compile-time analyses are optimizations and safety checks of various kinds. MAWL was designed primarily with safety checks in mind: the intent was to prevent common Web programming errors.

Static HTML analysis

One common problem with Web services is that their HTML documents can be syntactically incorrect. Syntax errors sometimes survive because no HTML parser has been invoked, but many errors are outside the realm of such parsers in any case because they occur in dynamically generated documents. For these documents the task is not HTML parsing: it is ensuring that scripts written in arbitrary languages will generate legal HTML---an undecidable problem. Ambitious Web services invariably generate HTML on the fly to reflect run-time conditions, and as a consequence have become notorious suppliers of fractured syntax.

The first requirement on MAWL was therefore that errors in the HTML used by MAWL services should be discoverable at compile time, independent of execution, whenever possible. MAWL achieves this (in conjunction with certain declarations) by extending HTML with the new marks described in Figure 1. The purpose of these marks is to offer a way to place program variables in the document during execution. The complement, a way to collect values from a <FORM...>, is provided by marks such as INPUT..., TEXTAREA..., and SELECT.... This scheme offers two advantages.

      <MVAR NAME=variable-name>
      <MITER NAME=variable-name MCURSOR=variable-name>...</MITER>

  
Figure 1: New Marks Defined by MAWL-extended HTML (MHTML)

The MVAR mark is for variable replacement; the variable-name is declared separately in the service logic, and is a variable of any scalar printable type. The MITER mark is used to iterate over the list-typed variable specified by the NAME attribute; there is an iteration variable MCURSOR that is set to the value of each element, and the MHTML enclosed by the MITER marks is expanded once for each element. The MVAR mark is legal anywhere ordinary text is legal, and the MITER mark is legal only in places where zero or more repetitions of its enclosed MHTML are legal (this restriction guarantees that the resulting HTML document will conform to the standard HTML grammar).

In circumstances where, without MAWL, an entire document would have to be generated dynamically, users of MAWL-extended HTML are able to compose their documents statically, marking the portions that vary at run time with variables in the MVAR syntax. In this way, the MHTML can be parsed at compile time and analyzed for correctness independent of execution; the scope of dynamically generated components is limited, as is the effect of any errors these fragments may contain.
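To make the expansion of MVAR and MITER marks concrete, the following Python sketch expands a statically written document against a set of run-time values. It is an illustration under stated assumptions, not the MAWL preprocessor: it uses regular expressions instead of a real MHTML parser, it does not handle nested MITER blocks, and the escaping of substituted values is an assumption of this sketch rather than something the text above specifies.

```python
import html
import re
from typing import Any, Dict

# Assumed surface syntax, taken from Figure 1.
MITER = re.compile(
    r"<MITER\s+NAME=(\w+)\s+MCURSOR=(\w+)>(.*?)</MITER>",
    re.DOTALL | re.IGNORECASE,
)
MVAR = re.compile(r"<MVAR\s+NAME=(\w+)>", re.IGNORECASE)


def expand(mhtml: str, env: Dict[str, Any]) -> str:
    """Expand MITER blocks first, then substitute the remaining MVAR marks."""

    def expand_iter(match):
        list_name, cursor, body = match.groups()
        # The enclosed MHTML is expanded once per element, with the
        # cursor variable bound to that element.
        return "".join(
            expand(body, {**env, cursor: item}) for item in env[list_name]
        )

    def expand_var(match):
        # Escape the value so substituted text cannot introduce markup
        # (an assumption of this sketch).
        return html.escape(str(env[match.group(1)]))

    return MVAR.sub(expand_var, MITER.sub(expand_iter, mhtml))
```

Only the substituted values vary at run time; the surrounding markup is fixed text that can be checked before the service ever runs.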

Thus MHTML documents and forms used in Web services can be considered typed, particularly with the introduction of variable substitutions. Variables that are expanded in an MHTML document may be considered the input parameters, and the values set by certain FORM marks--- INPUT fields, TEXTAREA fields, and SELECT menus---can be considered the output parameters.

In order to eliminate certain HTML programming errors, all MHTML documents used in a MAWL program must be declared according to the syntax in Figure 2.

declaration:
        mhtml record-declaration : doc-name+ ;
      | mhtml record-declaration -> record-declaration : doc-name+ ;

record-declaration:
        { field-decl-comma-list }

field-decl-comma-list:
        field-decl
      | field-decl-comma-list , field-decl

field-decl:
        field-name : type
      | field-name

  
Figure 2: MHTML Declarations

The first part of an mhtml declaration declares the type of the form, giving the type (always a MAWL record) of the data passed from the program to the form, and optionally a type for the data coming back from the form. Following the type specification is a list of the form identifiers being declared to have the type. To display a particular form, it must be invoked with a record argument of the appropriate type according to the syntax of Figure 3; the return value must also correspond to the declaration. Note that the MAWL expression for serving a document has the flavor of a remote procedure call.

expr:
        mhtml.put [ doc-name, expr ]

  
Figure 3: MHTML Usage

With MHTML, MAWL users are able to statically specify much of what Web programmers are accustomed to generating with Tcl and Perl programs. In addition, when documents have input and output values, these values are declared and checked by the compiler. The net result is a dramatic reduction in certain common errors.
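The kind of check that a declaration makes possible can be sketched as follows. This Python fragment is a rough illustration, not the MAWL compiler's algorithm: FormDecl and check_document are hypothetical names, only field names (not types) are compared, and MITER cursor variables are ignored. The point is that the check inspects the document text and the declaration alone, without running the service.

```python
import re
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class FormDecl:
    inputs: FrozenSet[str]    # fields passed from the program to the document
    outputs: FrozenSet[str]   # fields returned by the form


USED = re.compile(r"<MVAR\s+NAME=(\w+)>", re.IGNORECASE)
COLLECTED = re.compile(
    r"<(?:INPUT|TEXTAREA|SELECT)\b[^>]*?\bNAME=(\w+)", re.IGNORECASE
)


def check_document(mhtml: str, decl: FormDecl) -> List[str]:
    """Compare the names a document uses and collects with its declaration."""
    used = set(USED.findall(mhtml))
    collected = set(COLLECTED.findall(mhtml))
    errors = []
    for name in sorted(used - decl.inputs):
        errors.append(f"undeclared variable: {name}")
    for name in sorted(decl.outputs - collected):
        errors.append(f"declared output never collected: {name}")
    for name in sorted(collected - decl.outputs):
        errors.append(f"undeclared form field: {name}")
    return errors
```

A misspelled variable or field name then shows up as a reported error before any user sees the page.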

Compile-time optimizations

One advantage of programming languages is that they offer an opportunity to perform compile-time optimizations; application-specific languages therefore have an opportunity to perform application-specific optimizations. For MAWL, two interesting optimizations involve selecting its execution model and tuning the degree to which MHTML expansion is interpreted.

The choice of execution model has perhaps the greatest impact on service performance; we consider two models. In the first, a MAWL service executes under the control of a traditional Web server and communicates via the common gateway interface. In the second, the MAWL service is itself an HTTP server listening to a TCP port. The ML instantiation of MAWL (i.e., the version of MAWL which uses ML as its host language) can generate code for either execution model. In the case where the MAWL service assumes the role of HTTP server, each instance of a session corresponds to a Concurrent ML thread [20]. Thus, a form submission simply leads to the awakening of a lightweight thread with a new output file descriptor, not the fork(), exec(), and interpreter start-up overhead of, for instance, a Tcl process.

Another factor in the execution efficiency of MAWL programs is the degree to which MHTML elaboration is interpreted. Many points along the spectrum from fully interpretive to fully compiled, inline code are possible. In the current instantiation of MAWL, MHTML documents are stored as text in the service's ML heap. An earlier prototype stored the documents as a vector of strings and variable substitution instructions. Another plausible option is to encode each form as a function in the host language, consisting mainly of output of literal strings. Further options include storing the MHTML in the server file system and storing the MHTML in a compressed form. Once again, the fact that MAWL is a language rather than a library allows more flexibility in delivering Web services.

Abstractions

In addition to some constructs that were introduced to support static analysis, there are other constructs which are justified largely on the grounds of convenience.

Control flow abstraction

Traditionally, the Web has been a medium for publishing hypermedia documents; for such publishing, a stateless protocol is sufficient. However, more advanced Web services frequently need to present the user with certain documents in a certain order. Moreover, it is often necessary for a service to remember information over the life of a session with a user. These needs pose a significant obstacle for Web service programmers, who need to manage state and flow control explicitly. Web service programs often look as though they were produced by a compiler that was translating from an imperative language into a pure functional language. By this we mean that programmers must specify explicitly in each HTML form the instruction that should be executed next (the continuation), and in what environment. Such a programming style is tremendously inconvenient to humans, so MAWL offers facilities not only for declaring and checking HTML, but also for managing flow control and state for those who prefer to code imperatively.

In HTML documents the ``next'' activity is hard-coded into each form by offering its URL as the ACTION parameter of a FORM mark or providing an HREF which continues the interaction. Figure 4 shows a user-registration form that asks for a user's name and email address. This information is then passed to a program called registerUser, which presumably stores the information and then generates a new HTML document, which in turn must specify its next ACTION. But because the flow control is explicitly defined in the ACTION parameter of FORM, the forms cannot be rearranged or used in other contexts without modification. In addition, since the next state is specified explicitly, there is a possibility that programmers will introduce errors by incorrectly specifying a continuation.

        <HEAD><TITLE>query form</TITLE></HEAD>
        <BODY><FORM METHOD=post ACTION=/cgi-bin/registerUser>
        What is your name? <INPUT NAME=name>
        What is your email address? <INPUT NAME=email>
        <INPUT TYPE=submit VALUE=execute>
        </FORM></BODY>

  
Figure 4: CGI Form Example

By way of solution, MAWL supplies the form's ACTION and METHOD fields automatically. (In practice this is done by the same preprocessor that performs the variable substitutions described earlier.) A complete MAWL service therefore consists of a set of MHTML documents and a MAWL program that sequences the documents: it is this flow-of-control specification that forms the backbone of the MAWL language.
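The automatic supply of ACTION and METHOD can be pictured as a small rewriting step. The Python below is a sketch only: bind_forms is a hypothetical name, the continuation URL is invented for illustration, and any other attributes on the FORM mark are dropped for simplicity. How the actual preprocessor represents continuations is not described here.

```python
import re

FORM_TAG = re.compile(r"<FORM\b[^>]*>", re.IGNORECASE)


def bind_forms(mhtml: str, continuation_url: str) -> str:
    """Replace every FORM opening mark with one whose METHOD and ACTION are
    supplied by the system rather than by the document author."""
    tag = f'<FORM METHOD=post ACTION="{continuation_url}">'
    # A function is used as the replacement so the URL is inserted literally.
    return FORM_TAG.sub(lambda _match: tag, mhtml)
```

Since the document author never writes the ACTION, the same form can be served from different points in a session, or from different services, without modification.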

session: session session-name [arg-name = default-string-text] compound-stmt

stmt:
        compound-stmt
      | if expr compound-stmt else compound-stmt
      | for expr, expr, expr compound-stmt
      | while expr compound-stmt
      | break
      | continue

compound-stmt: { stmt* }

  
Figure 5: Control-flow Specification Syntax

The purpose of a session is to describe the order in which MHTML forms should be served; the compound-stmt is the syntactic device for listing individual stmts; it is this order that will specify the session's control flow. Because sequencing alone is often insufficient for describing real services, constructs are provided for conditional execution and looping.

By introducing flow-of-control syntax, it is possible to supply the ACTION field---heretofore specified explicitly by hapless programmers---automatically, thus reducing the possibility of error. At the same time, forms become more abstract and less tied to particular services or points in the execution of a program.

Memory management abstraction

Although some Web services can be constructed from sequenced HTML alone, it is often necessary to preserve values across the presentation of the HTML. As in most imperative programming languages, it is convenient to have some notion of state---a set of variables. Like control flow, implementing state is something of a trick for Web programmers, since HTTP is a stateless protocol.

Figure 6 illustrates a typical Web programming technique for maintaining state across form calls. The pathname of the ACTION part of the FORM mark has been extended with the state information of interest. (Here some variable ``email'' has the value ``benedikt@research''.) This information will be stripped out by the HTTP server and passed to the program in file /cgi-bin/search, where it will be used. (There are a variety of mechanisms, for example hidden fields, that can be used to pass the context to the ACTION program, but all available methods suffer from the fact that they are nonetheless explicit.)

        <HEAD><TITLE>query form</TITLE></HEAD>
        <BODY><FORM METHOD=post ACTION=/cgi-bin/search/email=benedikt@research>
        What would you like to search for? <INPUT NAME=searchString>
        <INPUT TYPE=submit VALUE=execute>
        </FORM></BODY>

  
Figure 6: CGI Form Example

Explicit approaches to state management suffer from the fact that they are error-prone: non-portable constructs are often used, variable names are mistyped, and it is often necessary to effect encoding and decoding. When concurrency is introduced, additional complications arise because the atomicity of certain operations must be preserved. State management therefore becomes difficult to describe in a Web program, and this difficulty---a problem in its own right---is compounded because the resulting code can be hard to change as the service evolves.
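The encoding and decoding burden mentioned above can be seen in a small Python sketch of the path-extension technique of Figure 6. The function names are hypothetical; the percent-encoding uses the standard urllib.parse module. Every form and every script in a hand-written service would need to agree on code of this kind.

```python
from typing import Dict
from urllib.parse import quote, unquote


def encode_state(script: str, state: Dict[str, str]) -> str:
    """Append name=value pairs to the script path, percent-encoding both
    parts so that '/' and '=' inside values cannot corrupt the path."""
    parts = [
        f"{quote(key, safe='')}={quote(value, safe='')}"
        for key, value in sorted(state.items())
    ]
    return "/".join([script] + parts)


def decode_state(path: str, script: str) -> Dict[str, str]:
    """Recover the state that encode_state appended to the script path."""
    extra = path[len(script):].strip("/")
    state = {}
    for part in filter(None, extra.split("/")):
        key, _, value = part.partition("=")
        state[unquote(key)] = unquote(value)
    return state
```

Forgetting the encoding step, or mistyping a key on one side only, produces errors that surface only when a particular value is submitted at run time.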

declaration:
        auto datatype : varasgn-opt-list ;
      | constant datatype : varasgn-list ;
      | static datatype : varasgn-list ;

varasgn:
        var-name = expr

varasgn-opt:
        var-name
      | var-name = expr

varasgn-opt-list:
        varasgn-opt
      | varasgn-opt-list, varasgn-opt

varasgn-list:
        varasgn
      | varasgn-list, varasgn

datatype:
        integer
      | boolean
      | string
      | void
      | record-declaration
      | datatype list

expr:
        var-name
      | varasgn

stmt:
        declaration ;

  
Figure 7: Variable Declaration Syntax

MAWL addresses these problems by extending its statement sequencing with variable declarations, references, and assignments. Variables are either automatic or static. Each running session has its own private copy of the automatic variables. Static variables (which may be either local or global) are initialized when the service is started and maintained for the life of the service. Static variables offer a way for different interactive sessions to communicate, and possibly to interfere, with each other. MAWL serializes access to each static variable, ensuring that individual reads or writes are consistent. Longer periods of exclusive access, for example a read-modify-write sequence, can be obtained using the region construct described below.

MAWL users need not be concerned with the details of variable implementation. Because the typing language is more expressive than that of common scripting languages, increased safety can be enjoyed by MAWL users, who no longer need to encode all values as strings. Through declarations and type-checking, many common errors are discovered independently of execution.

Concurrency

Concurrency is an issue that most Web service programmers must consider: since HTTP servers handle requests that may execute in parallel, certain resources (often files) must be protected from conflicting simultaneous access. MAWL accordingly offers a construct that enables programmers to declare certain code segments to be critical regions; the system prevents multiple processes from executing code in the region. A process attempting to execute within an occupied region is blocked.

stmt:
        region region-name compound-stmt 

  
Figure 8: Critical Region Syntax

The region statement has two parts---the region name and a compound statement to be protected. There may be any number of regions throughout a service with the same name, because a given resource is often used in several places in a service. For instance, a file may be read by one session but written by another; in this case it is wise to surround both the reading and writing with regions of the same name.

General-purpose computing in MAWL

MAWL does not have any general-purpose computing constructs---not even primitive constant expressions such as strings and numbers. Instead, like yacc, MAWL defers to a host language that is capable of general computing. All MAWL programs, like yacc programs, begin with declarations in the host language, terminated by the delimiter %%, that may be referenced later in host-language fragments. Syntactically, these fragments are introduced with parentheses: nothing inside the parentheses is interpreted by the MAWL compiler; instead, everything inside is passed unchanged to the host-language compiler.

mawl-service-program:
        host-lang-frag %% session *

expr:
        ( host-lang-frag )

  
Figure 9: General-Purpose Programming Constructs

The rules of MAWL type inference place type obligations on all such fragments; mistyped fragments are detected at compile time, although typically by the host language's compiler rather than the MAWL compiler. MAWL variables can be referenced within these fragments, and the resulting value of the fragment is converted into MAWL terms so that the fragment can play its appropriate role in the service.
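The treatment of host-language text can be sketched as two simple steps: splitting the program at the %% delimiter, and collecting parenthesized fragments verbatim. The Python below is an illustration only, with hypothetical function names; a real front end would also have to skip parentheses that occur inside host-language string literals and comments, which this sketch does not do.

```python
from typing import List, Tuple


def split_program(source: str) -> Tuple[str, str]:
    """Split a program into host-language declarations and the sessions
    that follow the first %% delimiter."""
    prelude, delimiter, sessions = source.partition("%%")
    if not delimiter:
        raise ValueError("missing %% delimiter")
    return prelude, sessions


def host_fragments(mawl: str) -> List[str]:
    """Return the text of each top-level parenthesized fragment, unchanged."""
    fragments, depth, start = [], 0, 0
    for index, char in enumerate(mawl):
        if char == "(":
            if depth == 0:
                start = index + 1      # fragment text begins after '('
            depth += 1
        elif char == ")":
            if depth == 0:
                raise ValueError("unbalanced ')'")
            depth -= 1
            if depth == 0:
                fragments.append(mawl[start:index])
    if depth:
        raise ValueError("unbalanced '('")
    return fragments
```

Because the fragments are copied without interpretation, any type error inside them is left for the host-language compiler to report.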

For the moment, MAWL uses Standard ML of New Jersey as its host language. Although ML has numerous advantages over other languages, it is not familiar to most Web programmers. MAWL is designed to allow any language that supports structured data types to serve as the host language.

An instantiation of MAWL with C as the host language is currently under development in our department.

Platform specialization

Web programmers must be aware that Web browsers have different and constantly evolving capabilities. Browsers typically support (a subset of) the HTTP and HTML standards, and many popular browsers include nonstandard but useful extensions to the protocols. The urge to take advantage of the latest features must be balanced against the increased complexity of one's code, and against the possibility that some browsers will be incapable of handling the special feature. Since there are no instructions that specify how MAWL accomplishes its intrinsic functions---such as storing state, managing flow control, and blocking conflicting access to critical regions---these implementation details are invisible to the programmer. Because all these details are concentrated in the hands of the presumably up-to-date compiler owner, MAWL services are more likely to take advantage of the latest browser features without incurring any penalty on application code or programmers.

Error handling

Like platform specialization, dealing with incorrect form submissions is implicit in MAWL.

Without MAWL, service developers must be careful to check that required fields of a form have been filled in; MAWL automatically returns users to incomplete forms after explaining what has been forgotten. Both careless users and misbehaving browsers are detected in this fashion.
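The kind of check that a service developer would otherwise write by hand can be sketched in Python. This is only an illustration under stated assumptions: the function names missing_fields and handle_submission and the wording of the message are hypothetical, and the sketch says nothing about how the MAWL implementation itself performs the check.

```python
from urllib.parse import parse_qs


def missing_fields(encoded_body, required):
    """Return the required field names that are absent or blank in a submission."""
    # keep_blank_values=True so that fields submitted empty still appear as keys
    submitted = parse_qs(encoded_body, keep_blank_values=True)
    return [name for name in required
            if not any(value.strip() for value in submitted.get(name, []))]


def handle_submission(encoded_body, required):
    """Decide whether to proceed or to send the user back to the form."""
    missing = missing_fields(encoded_body, required)
    if missing:
        # Hypothetical message; a real service would re-serve the form itself
        return "Please fill in: " + ", ".join(missing)
    return "ok"
```

Every hand-written CGI program that accepts a form needs some version of this logic, which is why it is a natural candidate for being made implicit.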

Other Work

While several authors [2,18] have identified and addressed problems with current Web programming practice, the techniques needed for advanced Web programming have not been brought together in a single place. The state of the art still consists of monolithic, relatively inflexible daemons, ad-hoc CGI scripts, and interpretive languages for clients. The one notable exception to this rule is Mallery's Common Lisp HTTP server [15], which we will refer to as CL-HTTP.

CL-HTTP is a library for Common Lisp that allows Lisp applications to serve dynamic hypertext.

CL-HTTP and MAWL have differing orientations toward the programming problem. MAWL is geared toward specifying services, whereas CL-HTTP is geared toward permitting existing applications to use Web clients as user interfaces. Thus while MAWL can assume complete control over decisions such as how to store state between HTTP transactions and how users should be treated when a transaction is blocked, it seems as though CL-HTTP must force the user to make these decisions explicitly. Similarly, MAWL is able to offer mechanisms for controlling concurrency, whereas it seems that users of CL-HTTP perform such control at a low level.

CL-HTTP also differs from MAWL in many of the same ways that Common Lisp differs from Standard ML. While Lisp and CL-HTTP are highly dynamic, deferring many decisions and correctness checks until run time, ML and MAWL aim to verify correctness at compile time.

While part of this difference in error detection is due to the difference between ML and Lisp, some of this is due to the fundamental distinction between libraries and languages we raised in Section 3.2.2. Even if the combination of Lisp itself and the CL-HTTP library were to offer exactly the same functionality as MAWL, it would still not be possible in general to analyze the Lisp source itself to see if the library were used ``correctly'' (and in any case such analysis is not attempted). Therefore, whereas MAWL programmers do not need to test for certain errors (such as whether their HTML usage is in accordance with the needs of the service), users of the CL-HTTP server must work harder to achieve confidence in their programs.

Java vs. MAWL

Like MAWL, the Java programming language [6] can be used to build sequential Web services. However, whereas Java applets can be used to create ``active pages'' that can offer network efficiency, these applets do not currently offer any solution for state management at the server side. A service that needs both active pages and server-side state could be constructed with a combination of MAWL and Java; the two are in some sense complementary. In addition, Java suggests some interesting MAWL optimization strategies. Ideally, service programmers should enjoy distribution transparency just as the users of Web services do; service logic should not be cluttered with the details of where in the network it is executed. A clever MAWL compiler, using Java as its target language, might automatically find sequences that could be bundled into a single Java applet and executed at the client. Such an optimization scheme would offer both the simplicity of MAWL service programming and the efficiency of Java's client-side execution.

Conclusion

MAWL is an application-oriented language for World Wide Web services that encompasses server and client functions. MAWL simplifies service programming by allowing service providers to act as though clients interact with stateful services within sessions. Many details of Web programming---for instance how to retain state and how to serialize access to server resources---are invisible to the application programmer. MAWL greatly simplifies the creation and maintenance of dynamic, interactive services on the World Wide Web.

Acknowledgements

We are deeply indebted to Curt Tuckey, Michael Benedikt, David Atkins, and Ken Rehor for their suggestions and ideas.

Appendix: An Informal Introduction to MAWL

Mere words cannot replace the experience of programming---the only way to learn a new programming language is to program in it---but it is often helpful to look at examples. In this appendix we give several examples of simple MAWL services, but this is by no means intended as exhaustive documentation of the features and idioms of the language. (Further documentation, as well as details of installation and compilation, will be released with the software.)

``Hello, World'' using basic HTML

``Web programming'' is usually taken to mean specifying static document layout using HTML. We therefore begin our introduction to MAWL with an HTML program that displays the words ``Hello, World''; the code is in Figure 1.

        <HTML>
        <HEAD><TITLE>A Basic HTML Program</TITLE></HEAD>
        <BODY>Hello, World</BODY>
        </HTML>

  
Figure 1: Some Static HTML

Constructing this kind of document does not require MAWL. Doing it in MAWL is not more convenient, nor does it offer extra safety advantages, since any basic HTML parser would reveal syntactic errors.

Web programming in the MAWL sense is very different from mere document layout and presentation. MAWL services do not simply display static documents; they treat Web browsers as input devices that guide the execution of interactive concurrent programs.

``Time-of-Day'' via the Common Gateway Interface

We first describe the typical way in which Web daemons are programmed to serve dynamic documents (i.e., documents which cannot be specified completely at composition time). This task is representative of the simplest Web services for which MAWL was designed; it is fundamentally different from ordinary Web programming (i.e., static document layout and presentation) because it must use the ``common gateway interface'' (CGI). The basic task of a Web server is to retrieve a file when presented with a URL. However, the common gateway interface involves a convention which interprets a URL as a program to run rather than as a document to retrieve. The programs, known as CGI programs, are supposed to produce legal MIME documents as their standard output.

Figure 2 contains a shell script that could be used as the CGI program for a time-of-day service.

        #!/bin/sh
        echo 'Content-type:  text/html'
        echo ''
        echo '<HTML>'
        echo '<HEAD><TITLE>A Time-of-Day Page</TITLE></HEAD>'
        echo "<BODY>The current time is `/bin/date`</BODY>"
        echo '</HTML>'

  
Figure 2: A Program that Generates HTML

This simple example immediately brings to light an important class of problem: namely, that while one can imagine statically checking the shell program itself for correctness, there is no way (in general) to analyze the shell program to see whether it will always produce legal output. (In fact, since it can be a completely arbitrary program, there is no way to tell whether it will terminate, or interfere with the server itself, or any one of many other plausible disaster scenarios.) To obtain confidence that the CGI program is sensible, one must resort to testing---a notoriously expensive and inadequate method of finding bugs.

``Time-of-Day'' via MAWL

Extended HTML.

We now introduce the concept of variable substitution, so that most of an HTML document can be specified statically (and checked for correctness) while the variable part can be evaluated when the document is requested. Figure 3 shows the MHTML (MAWL-extended HTML) version of Time-of-Day, which we will imagine sits in a file called TOD.mhtml.

        <HTML>
        <HEAD><TITLE>A Time-of-Day Page</TITLE></HEAD>
        <BODY>The current time is <MVAR NAME=date> </BODY>
        </HTML>

  
Figure 3: MHTML Describing a Dynamic Document

Note that this is not a shell script, but rather something akin to ordinary HTML extended with a preprocessor-like variable substitution (where <MVAR NAME=date> means that the variable date should be substituted in the text). The advantage of MHTML is that it can be statically analyzed for correctness. It is important to note that if we were to use a general preprocessor for this task, our analysis goals would be foiled.
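To make the idea of analyzable substitution concrete, here is a minimal Python sketch of one check such an analysis might include: comparing the variables a template uses against the fields it is declared to receive. The regular expression, the function names, and the choice to escape substituted values are all assumptions made for illustration; the paper does not describe the MAWL compiler's analysis at this level of detail.

```python
import html
import re

# Matches marks such as <MVAR NAME=date>; quoting of the name is optional.
MVAR = re.compile(r'<MVAR\s+NAME="?([A-Za-z_][A-Za-z0-9_]*)"?\s*>', re.IGNORECASE)


def template_variables(mhtml):
    """Collect the variable names referenced by a template."""
    return set(MVAR.findall(mhtml))


def check_template(mhtml, declared):
    """Return (used but undeclared, declared but unused) variable names."""
    used = template_variables(mhtml)
    return sorted(used - set(declared)), sorted(set(declared) - used)


def render(mhtml, record):
    """Substitute each MVAR mark with the (escaped) value from the record."""
    return MVAR.sub(lambda m: html.escape(record[m.group(1)]), mhtml)
```

Because the only extension is a fixed mark with a name, both functions can inspect the template without executing anything, which is not possible when the document is produced by an arbitrary program or a general preprocessor.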

MAWL service logic.

The MHTML code for the Time-of-Day example is only part of the solution: the variable date must be defined somewhere in order for that substitution to make sense. This is accomplished by what really constitutes the core of MAWL: its service logic component. Figure 4 contains service logic for the Time-of-Day example.

        %%
        session timeOfDay {
                mhtml { date }:  TOD;
                mhtml.put [ TOD, ({ date=jcrlib.system "/bin/date" }) ];
        }

  
Figure 4: Service Logic for the Time-of-Day Example

The Time-of-Day service is approximately the simplest possible MAWL program: there is only one session, named timeOfDay, and the only thing that service does is provide the single document that lives (by default) in file TOD.mhtml.

MAWL programs consist of a prelude and body separated by the delimiter %%. The prelude is where one places declarations written in the ``host language'' if they are necessary to the remainder of the program. In the current version of MAWL, the host language is Standard ML of New Jersey but could equally well be C, Java, or any other language that meets certain requirements. (Some other application languages have a notion of host language---for instance, most implementations of yacc use C as their host language; most versions of ``make'' use a UNIX shell as their host language.) In the Time-of-Day example, no host language declarations are necessary, so the prelude is empty.

The body of the MAWL program in Figure 4 consists of a single session specification. Some MAWL services are organized around a common resource (usually persistent data) and in such cases the body might contain several session specifications. Sessions serve as entry points into a service. There is a convention by which these entry points are related to URLs so that the service can be accessed by a specific user input device (typically a Web browser).

A MAWL session.

The session of Figure 4 doesn't do very much: an MHTML document named TOD is declared, and the declaration asserts that TOD takes as its input parameter a record containing the field date (in this case there are no output parameters). Because date is not further qualified, its type defaults to string. The next line serves this document; mhtml.put is a primitive operation in MAWL, and its arguments are enclosed in square brackets. The first argument indicates which document to serve (it must have been declared previously), and the second argument must be an expression with the same type as the declared input parameter of the document---in this case, it must evaluate to a record with the single string field date.

Note that this second argument, which must be a record with the single field date, is a fragment of the host language. Host language fragments are easy to recognize because they are delimited with parentheses; the remainder of the MAWL language uses square brackets for grouping rather than parentheses. This particular host-language fragment computes a string containing the current date.

Some basic guarantees.

In the Time-of-Day example, there are several important things to note:

By using MAWL in combination with MHTML, certain common Web errors can be avoided. However, the Time-of-Day service is relatively uninteresting, because it still involves generating only a single HTML document and presenting it. Although the MAWL features presented so far can be of tremendous value in complex examples of this same flavor, MAWL also provides features that apply primarily to more ambitious services.

A MAWL Program that Collects User Input

User input.

More ambitious Web services often request input from a user. When that happens, the HTML ``forms extension'' is used in combination with the CGI interface. Suppose one wished to collect a user's name before proceeding with some other activity. In HTML, one would first create a form like the one in Figure 5.

        <HTML>
        <HEAD><TITLE>Login form</TITLE></HEAD>
        <BODY>
        <FORM METHOD=POST ACTION=http://somewhere.com/cgi-bin/time.sh>
        Please fill in the fields below with the requested info:<P>
        First name: <INPUT NAME=firstname><P>
        Last name: <INPUT NAME=lastname><P>
        <INPUT TYPE=SUBMIT NAME=Continue>
        </FORM>
        </BODY>
        </HTML>

  
Figure 5: An Interrogative HTML Form

That form contains some information about the logical structure of the document and specifies two input fields, contained within the FORM marks, for users to type their first and last names. It also describes a button labeled ``continue'' that the user is supposed to press when the two fields are filled in. The form's ACTION parameter identifies the recipient of the (encoded) input field information; it is a CGI program that must decode its input and produce a new HTML document as output. The CGI program that this HTML points to is necessarily much more complicated than the Time-of-Day example shown in Figure 2.

There are numerous opportunities for error in the new script:

Even if the resulting system doesn't exhibit any of these problems, the solution is unsatisfactory in that each form (all but the first of which will be dynamically generated) points explicitly to its continuation; therefore, it is difficult to look at the system and understand its flow-of-control (not to mention that it would be difficult to change). A shell script that handles even this simple form correctly is already too complicated to present here, so in Figure 6 we present instead the equivalent MAWL service logic.

        %%
        session fancyGreeting {
                mhtml {} -> { firstname, lastname }:  login;
                mhtml { firstname, lastname }:  greeting;
                auto { firstname, lastname }: names;

                names = mhtml.put [ login, ({}) ];
                mhtml.put [ greeting, names ];
        }

  
Figure 6: Service Logic for a Personalized Greeting Service

The lines beginning with the keyword mhtml declare certain properties about MHTML documents that are found in other files. Note that the login document is declared with an arrow between two record descriptions: it requires an empty record as its input parameter and produces a record with two fields firstname and lastname as its output parameter. MHTML code for the login form is found in Figure 7.

        <HTML>
        <HEAD><TITLE>Login form</TITLE></HEAD>
        <BODY>
        Please fill in the fields below with the requested info:
        <P>
        First name: <INPUT NAME=firstname><P>
        Last name: <INPUT NAME=lastname><P>
        </BODY>
        </HTML>

  
Figure 7: A File Needed by the Greeting Service

        <HTML>
        <HEAD><TITLE>Greeting form</TITLE></HEAD>
        <BODY>
        Rather than ``Hello, World'', we now say:<P>
        Hello, <MVAR NAME=firstname> <MVAR NAME=lastname>!
        </BODY>
        </HTML>

  
Figure 8: Another File Needed by the Greeting Service

Note that neither the FORM mark nor the SUBMIT button appears; MAWL inserts these automatically when they are needed (MAWL will not allow the user to specify the ACTION parameter of a FORM mark). MHTML never contains explicit flow-of-control information, because that information is derived from the service logic; therefore, MHTML documents can be easily re-ordered.

The fancyGreeting session in Figure 6 has another twist: a record variable (declared to be automatic as opposed to static) named names; this variable stores the results of the login form and is subsequently passed to the greeting form.

This example shows several advantages worth noting:

MAWL includes not only sequencing, but also branching and looping constructs. Such control abstractions support services which the Web is poorly equipped to handle because of its reliance on a stateless protocol. MAWL is able to do this because it maintains a program counter during the execution of a session; that program counter is used to direct the execution of a session.

But a program counter is not the only interesting state that ought to persist over the life of a session: persistent user variables might also be required. If a value needs to persist beyond the immediate document presentation then the programmer must explicitly save and restore that value in whatever ad-hoc manner seems most suitable. Such ad-hoc activity introduces even more error possibilities:

MAWL solves all of these problems (without loss of generality, since the ad-hoc solution could always be used if for some reason it were deemed important). The fancyGreeting session could therefore be easily extended so that the persistent information is used even later in the session, as in Figure 9. This trivial change to the MAWL service logic would wreak havoc on an ordinary CGI program, because it would trigger the need for persistent state management that might have been avoided if firstname and lastname were only needed to present the howdy document.

        %%
        session fancierGreeting {
                mhtml {} -> { firstname, lastname }:  login;
                mhtml { firstname, lastname }:  howdy, sayonara;
                auto { firstname, lastname }: names;

                names = mhtml.put [ login, ({}) ];
                mhtml.put [ howdy, names ];
                mhtml.put [ sayonara, names ];
        }

  
Figure 9: Service Logic for a Fancier Personalized Greeting Service
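The two kinds of session state discussed above, a program counter and variables that outlive a single document, can be imitated in Python with a generator, whose suspended frame holds both. This is a sketch under loud assumptions: the names fancier_greeting and SessionRunner are hypothetical, the session is kept in memory rather than across real HTTP transactions, and nothing here is claimed to reflect how MAWL actually stores state.

```python
def fancier_greeting():
    """A session written as straight-line code; each yield serves one document."""
    # Each yield hands out (document name, input record) and receives
    # the record produced by the user's form submission, if any.
    names = yield ("login", {})
    yield ("howdy", names)
    yield ("sayonara", names)


class SessionRunner:
    """Holds one suspended session between requests (in memory, for illustration)."""

    def __init__(self, session):
        self._gen = session()
        self.finished = False

    def start(self):
        # Run the session up to its first document
        return next(self._gen)

    def resume(self, form_output):
        # Continue from where the session stopped, passing in the form's output
        try:
            return self._gen.send(form_output)
        except StopIteration:
            self.finished = True
            return None
```

The point of the sketch is that the service author writes the three steps in order and never mentions where execution resumes or where names is kept; the runner plays the role that the paper assigns to the MAWL implementation.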

A Longer MAWL program

It is now possible to look at a more complex MAWL program such as the one in Figure 10. Such a service would be (relatively) unthinkable if written from scratch, but is easy to build with MAWL. The service is the old children's guessing game, where the system chooses a number between 1 and 100, and the player must figure out which number was chosen.

It is probably easy to figure out most of what this program is doing, but take special note of the static variables that keep track of how many people have played the game, and look at the interplay between the host language fragments (anything in parentheses) and the rest of the language. Note how there are two entry points, one for playing the game and one to look up some statistics and who has achieved the quickest victory. Also note that MAWL variables can be used within the host language fragments, and the return result of the host language fragments is automatically translated into the appropriate MAWL representation (any type errors are caught at compile time).

fun number()=
    let val s=Time.toSeconds(Time.now()) in 1 + (s mod 100) end
%%
static integer: numPlayed=(0), numWon=(0), minGuesses=(0);
static string:  bestPlayer=("");
session play {
    mhtml { guessno, suggestion } -> { guess } : askUser;
    mhtml { guessno, suggestion } -> { name, guess }: initQuestion;
    auto integer : mynum=(number()), guesses=(0), guess=(0);
    auto string : suggestion = ("");
    auto { name, guess } : initresult;
    auto { guess } : result;
    auto string:  name;

    numPlayed = (numPlayed + 1);
    initresult = mhtml.put [
        initQuestion,
        ({guessno=makestring guesses, suggestion=suggestion}) ];
    guess = (jcrlib.atoi (#guess initresult));
    while (mynum <> guess) {
        suggestion = (if guess<mynum then "higher" else "lower");
        result= mhtml.put [ askUser,
            ({ guessno=makestring guesses,
               suggestion=suggestion}) ];
        guess = (jcrlib.atoi (#guess result));
        guesses = (guesses + 1);
    }
    numWon = (numWon + 1);
    if (minGuesses < guesses andalso minGuesses <> 0) {
        mhtml { best, gamelength } : youWin;
        mhtml.put [ youWin,
            ({ best=bestPlayer,
               gamelength=makestring guesses }) ];
    } else {
        mhtml {} : youBest;
        bestPlayer = (#name initresult);
        minGuesses = guesses;
        mhtml.put [ youBest, ({}) ];
    }
}

session admin {
    mhtml { played, won, best }:  highScoresAndInfo;
    mhtml.put [ highScoresAndInfo, ({
        played=makestring numPlayed,
        won=makestring numWon,
        best=bestPlayer }) ];
}

  
Figure 10: Service Logic for a Guessing Game
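The bookkeeping that the play session performs on the static variables can be restated in Python to show what a programmer would have to protect by hand without MAWL. The class GameStats, the method record_win, and the use of an explicit lock are assumptions made for this sketch; the paper does not show how MAWL serializes access to static variables.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class GameStats:
    """Shared statistics, mirroring the static variables of Figure 10."""
    num_played: int = 0
    num_won: int = 0
    min_guesses: int = 0   # 0 means "no record yet", as in Figure 10
    best_player: str = ""
    _lock: threading.Lock = field(default_factory=threading.Lock, repr=False)

    def record_win(self, name, guesses):
        """Update the statistics; return True if this player now holds the record."""
        with self._lock:   # concurrent sessions must not interleave this update
            self.num_won += 1
            if self.min_guesses != 0 and self.min_guesses < guesses:
                return False
            self.best_player = name
            self.min_guesses = guesses
            return True
```

In this sketch the lock is visible in the application code; in the MAWL version the same updates are written as plain assignments.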

References

1
A. V. Aho, B. W. Kernighan, and P. J. Weinberger. The AWK Programming Language. Addison-Wesley, 1986.

2
Scot Anderson and Rick Garvin. Sessioneer: Flexible session level authentication with off the shelf servers and clients. In Third International WWW Conference, 1995.

3
T. Berners-Lee. Hypertext transfer protocol (HTTP). Working Draft of the Internet Engineering Task Force, 1993.

4
T. Berners-Lee and D. Connolly. Hypertext markup language (HTML). Working Draft of the Internet Engineering Task Force, 1993.

5
S. I. Feldman. Make: a program for maintaining computer programs. Technical report, Bell Telephone Laboratories, 1979.

6
James Gosling and Henry McGilton. The Java language environment: A white paper. Technical report, Sun Microsystems Laboratories, 1995. Available at URL: http://java.sun.com/whitePaper/javawhitepaper_1.html.

7
T. G. Griffin and H. Trickey. Integrity maintenance in a telecommunications switch. IEEE Data Engineering Bulletin, June 1994.

8
Adobe Systems Inc. PostScript Language Reference Manual. Addison-Wesley, 1985.

9
S. C. Johnson. Yacc: Yet another compiler compiler. Technical report, Bell Telephone Laboratories, 1975.

10
Andrew R. Koenig. Language design is library design. Journal of Object-Oriented Programming, July 1991.

11
Andrew R. Koenig. Library design is language design. Journal of Object-Oriented Programming, June 1991.

12
D.G. Korn. Ksh---a shell programming language. Technical report, AT&T Bell Laboratories, 1986.

13
D. A. Ladd and J. C. Ramming. A*: A language for implementing language processors. In IEEE International Conference on Computer Languages, 1994.

14
D. A. Ladd and J. C. Ramming. Two application languages in software production. In USENIX Symposium on Very High Level Languages, 1994.

15
John C. Mallery. A Common Lisp hypermedia server. In First International WWW Conference, 1994.

16
D. B. McQueen and A. Appel. Standard ML of New Jersey. In Proceedings of the 3rd International Symposium on Programming Language Implementation and Logic Programming, pages 1--2. Springer-Verlag, 1991.

17
Bertrand Meyer. Eiffel: the Language. Prentice Hall, 1992.

18
David Nicol, Calum Smeaton, and Alan Falconer Slater. Footsteps: Trail-blazing the Web. In Third International WWW Conference, 1995.

19
John K. Ousterhout. Tcl and the Tk Toolkit. Addison-Wesley, 1994.

20
John H. Reppy. Concurrent ML: Design, application and semantics. In Peter E. Lauer, editor, Functional Programming, Concurrency, Simulation and Automated Reasoning (LNCS 693), pages 165--198. Springer-Verlag, 1993.

21
B. Stroustrup. The C++ Programming Language. Addison-Wesley, 1986.

22
Larry Wall and Randal L. Schwartz. Programming PERL. O'Reilly & Associates, 1990.

Author Information

David Ladd received the BS and MS degrees in Computer Science from the University of Illinois at Urbana-Champaign in 1987 and 1989. He joined AT&T in 1989, where he is currently a Member of Technical Staff in the Software Production Research Department. His current research interests are network services and application-oriented languages and environments.

Chris Ramming received degrees in Computer Science from Yale College (BA '85) and the University of North Carolina at Chapel Hill (MS '89). He joined AT&T Bell Laboratories in 1987 and is a Member of Technical Staff in the Innovative Services Research Department. His current interests include application languages and their use in software production.