Setting the scope for light-weight Web-based applications

Unfinished version of an essay on “Web applications.”

Definition

The light-weight, Web-based applications (“webapps”) of this essay are small, platform-independent programs that are downloaded on demand and execute inside a client program, such as a browser. They are thus like Java applets, but more "script-like" than "program-like" and therefore easier to write in many cases (though harder in others). They have a clearly separated user interface, which allows webapps to be easily adapted to different devices.

[Figure: screendump of a webapp running inside an HTML page]

A webapp can be a component in a compound document. This (simulated) screendump shows a Celsius-Fahrenheit converter running inside an OBJECT element inside an HTML page.

A trivial example is a Celsius-Fahrenheit converter: it runs inside your browser window and contains some code that calculates a number for every number you give it. Its UI might be two sliders, or two text input fields, or a text field and two buttons. You stop it by pressing the browser's back-button or maybe the program itself has a stop button.

Webapps are to be distinguished from Web documents, which are static, except for hyperlinks. (Note that a hypertext may still include certain visual effects, depending on the style sheet that is applied, such as tooltips, pop-ups and collapsing/expanding text and even animations.) Webapps don't have a document-like interface, but a program-like interface (a GUI/WIMP in the visual case).

Webapps are also different from Web-based forms, whose UI is still primarily that of a document and whose logic is executed on a server (apart from superficial, UI-related logic, such as unchecking one radio button when another one is clicked). A webapp could be made to visually mimic a form and could communicate with a server, but its purpose is to do computations on the client side and it lacks the typographical primitives of style sheets such as CSS and XSL.

The difference between Java applets and webapps is the type of applications and type of programmers they target. Java is a general-purpose 3rd generation programming language. It has objects, modules and strong typing, which make it suitable for complex applications, but require a relatively experienced programmer. Webapps are for simple applications, for which you would use script-like languages (shell scripts, batch files, Perl, or maybe Basic) if the code had been for a single platform. The programming language, though not yet known at this point, will probably be interpreted and have implicit typing. There will probably be other differences in the underlying technology.

In summary, the characteristics of a webapp are the following:

The following may or may not be characteristics of webapps, depending on the technology chosen to implement them.

Accessibility has to be designed into webapps from the start. Exactly how is not clear yet. The ability to provide several UIs for the same webapp allows specialized UIs for different devices, but few webapps will have UIs for all devices, just as few Web documents currently have style sheets for multiple devices. Some automatic adaptation is needed. Depending on the platform, the UI objects from which the UI is built may already have built-in accessibility APIs.

Example applications

Some examples of applications that could be implemented as webapps.

Celsius-Fahrenheit converter

A very simple application, that needs no network and whose program logic probably takes no more than two lines of code. The user can input a number and have the webapp convert it from Celsius to Fahrenheit or vice-versa.

The program logic simply gets a command and a number on its input and returns a number and a unit on its output.
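As a rough sketch of that logic in Python (the essay leaves the actual webapp language undecided; the command names `c2f` and `f2c` and the function name `convert` are invented here for illustration):

```python
from typing import Tuple

def convert(command: str, value: float) -> Tuple[float, str]:
    """Take a command and a number; return a number and a unit."""
    if command == "c2f":      # hypothetical command name
        return value * 9 / 5 + 32, "F"
    if command == "f2c":      # hypothetical command name
        return (value - 32) * 5 / 9, "C"
    raise ValueError(f"unknown command: {command}")
```

Any of the UIs described below would only need to send one of the two commands with a number and display the pair that comes back.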

The UI code is probably a dozen lines or so. The UI could be a slider and two buttons (“C to F” and “F to C”), or a text field and two buttons, or just two sliders (if one is moved, a command is sent to the application, which returns a number, which causes the other slider to move to that number).

In fact, the webapp could be delivered as one program with three UIs, allowing the user to select his preferred UI, through a menu in the webapp UA. (This is similar to how a document with alternative style sheets allows a user to select his preferred one.)

Digital clock

This webapp also doesn't require network access, but it shows the need for access to the clock of the computer. The UI is simply a text label and the application behind it sends new text to that label at regular intervals.

How the API to the clock is defined is not yet known. It could be that the application enters a loop that queries the current time, sleeps for a while and then repeats the process. Or it could be that the clock is modeled as something that generates events, coming in on a stream/socket, just like input may arrive from the user in other programs.
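The two possibilities can be contrasted in a small Python sketch. Nothing here is a proposed API: the function names, the `show` callback standing in for the text label, and the injectable `now` and `sleep` parameters are all assumptions made so the example can run on its own.

```python
import time
from typing import Callable, Iterator

def poll_clock(show: Callable[[str], None], ticks: int,
               now: Callable[[], float] = time.time,
               sleep: Callable[[float], None] = time.sleep) -> None:
    """Variant 1: the application loops, queries the time, then sleeps."""
    for _ in range(ticks):
        show(time.strftime("%H:%M:%S", time.localtime(now())))
        sleep(1.0)

def clock_events(ticks: int,
                 now: Callable[[], float] = time.time,
                 sleep: Callable[[float], None] = time.sleep) -> Iterator[str]:
    """Variant 2: the clock is a stream of events that the application reads."""
    for _ in range(ticks):
        yield time.strftime("%H:%M:%S", time.localtime(now()))
        sleep(1.0)
```

In the first variant the application drives the loop itself; in the second it just consumes events (`for text in clock_events(60): ...`), the same way it would consume user input.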

Analog clock

This webapp is similar, but also shows the need for a graphics canvas. Whether this canvas can draw shapes (circles, lines, text, etc.) or only shows prerecorded images is yet to be defined.

Tax form simulator

This is a large application, that contains the rules for tax paying, either all rules or the rules as they apply in some domain, e.g., house owners. It is not a “real” tax form, in the sense that it can't be used to submit the filled-in form to the tax collectors, but it allows somebody to do “what-if” simulations and prepare all the calculations ahead of filling in a real form.

The UI will probably show several screens, one at a time, each of which is reminiscent of some fragment of a real tax form. Filling in a field or checking a box results in the application performing a recalculation of the amount to pay, which the UI then displays.

The webapp might have a function to print a form or a summary.

Tax form that also submits the form

This is a webapp that the ministry of finance might develop and offer to people who want to fill in their tax forms on-line. Like the previous example, it can be used for “what-if” simulations, but the goal of this webapp is that the user clicks “submit” on the last screen.

A requirement for such a webapp is that the user can somehow sign the form that he filled in. This might be achieved with an HTTP password that has been previously agreed upon or a GPG signature or some other means.

Design your kitchen

This is an application with a GUI that is a bit different from most other programs. The main part is a 3D rendering (possibly associated with a 2D map) of a kitchen, in which the user can place furniture with the mouse, change colors and texture of surfaces, add light points, etc.

This might be associated with a function to order the designed kitchen, or at least get a price quote on it.

Taking a snapshot (a virtual photograph) of the finalized kitchen and printing it (or storing it on disk) is also a possible function.

The 3D rendering is obviously not a predefined object of the webapp UA. It will have to be made out of a graphics canvas on which the UI (in response to calculations by the application) places images or geometric shapes.

IRC client

This webapp looks like a typical IRC client: a text screen showing the conversation up till now, an input field for entering a line of text and some buttons (set the title, private conversation, etc.).

The interesting aspect of this webapp is that it has an open socket to some server. I.e., it does not just issue GET and PUT on URLs, but manages a protocol by itself (at least at the application level).

Similar applications: Usenet news reader and telnet client.

An internet radio tuner

This webapp could be implemented in two ways: it could decode the audio stream itself and send data to the UA's sound API, or it could launch a player program (VLC, RealPlayer, or similar) and control it via interprocess communication.

The former raises two questions: is the programming language of the webapp fast enough (since it is probably an interpreted language, like Javascript, or one that runs on a virtual machine) and does the webapp have the right to reach outside of its “sand box” and contact servers?

The second also raises two questions: how does the webapp know what media players are available and does it have the right to start programs on the client's machine?

The internet radio tuner, like the Celsius-Fahrenheit converter, could come with several UIs. One could be a big interface, high on graphics, with drawn buttons. One could be small and minimal. One might even be multi-modal, accepting user input simultaneously from the mouse, the keyboard and the microphone.

Weblog authoring tool

This is a dedicated editor that sends HTML fragments to a weblog server. The editor part may be a standard text widget, but the webapp has special functions for adding links and predefined HTML fragments.

Calculator

A very basic webapp: it shows the face of a traditional pocket calculator. The user presses buttons for digits and operations and after each press the application computes the number to display in the single-line output field.

[Figure: screendump of a calculator]

A typical calculator. This one is kcalc from KDE, but a webapp running under KDE could look the same.

Solitaire games

These are also fairly simple webapps, without need for network communication. Think of patience, four-in-a-row, 16-puzzle, etc.

These might have a need for persistent storage, so that the user can see his previous scores or maybe even continue a saved game.

Apple's Sherlock and Watson

The applications running on the Sherlock or Watson application platforms on the Mac are typically applications with a little bit of program logic, that derive most of their functionality from communicating with a server. Apple calls Sherlock a “Web services client.” They could almost be done as an HTML page with a form, except that the little bit of additional logic and the presentation as a GUI instead of a document make the applications much easier to use.

The applications typically have one or more screens, among which the user can select (tabbed interface). When they are started, they get data from a server. Depending on the user's action, they may then get more, or more detailed, data from the server. E.g., one interface shows movie listings. Initially, it shows a list of locations to choose from and the other parts of the interface are empty. When the user selects a location, detailed information about the films playing at that location is downloaded. The locations remain displayed and can be selected from again at any moment, however, since this is a “direct manipulation” interface, not a series of forms.

The name

In this essay, I've used the name “webapp” as a convenient short name to refer to the technology. But other names are possible. Here are some suggestions:

What it's not

Here are some of the technologies (partly still unsolved problems), that were mentioned as related to webapps and possibly solved by it, but which on closer inspection are actually not related.

XML namespaces

Webapps technology won't solve the problem of defining the meaning of documents with multiple XML namespaces.

<foo xmlns:a="http://example.org/a"
     xmlns:b="http://example.org/b">
  <a:bar>This "bar" is different from...</a:bar>
  <b:bar>... this "bar."</b:bar>
</foo>

Example of a hypothetical XML-based format with two namespaces, abbreviated to “a” and “b.”

Such namespaces can be used in XML formats that are created by combining other XML formats. They make it clear which parts come from which format. Namespaces are a syntactic construct, they have no meaning in themselves. Thus, any format that uses namespaces has to define what they mean in the context of that format.
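To illustrate that namespaces are purely syntactic, here is a short Python sketch that parses the example above with the standard `xml.etree.ElementTree` module. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

DOC = """<foo xmlns:a="http://example.org/a"
     xmlns:b="http://example.org/b">
  <a:bar>This "bar" is different from...</a:bar>
  <b:bar>... this "bar."</b:bar>
</foo>"""

root = ET.fromstring(DOC)
tags = [child.tag for child in root]
# The parser expands each tag to "{namespace-URI}local-name", so the two
# "bar" elements are told apart, but nothing here says what either one means.
```

The parser can distinguish `{http://example.org/a}bar` from `{http://example.org/b}bar`; what to do with each is still up to the format that combines them.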

Webapps are programs, and therefore have a specific meaning, but webapps technology is not by itself meant to attach meaning to namespaces. It is conceivable that the meaning of some XML format is defined by a mapping of elements to webapps (e.g., with a technology similar to Mozilla's XBL), but this would be a separate technology, which uses webapps, but is not itself part of webapps.

Plug-in API

Plug-ins are a way to extend a browser with support for new formats, new protocols or other new functionality (bookmarks, search, etc.). They implement a certain API, by which the browser can invoke them, and the browser implements a complementary API, which a plug-in uses to communicate with the browser.

There are similarities between a plug-in and a webapp, but also differences: a webapp is transitory, a plug-in is installed permanently; a webapp is platform-independent, a plug-in is binary code for a specific platform; a webapp is untrusted code running in a “sand box”, a plug-in is an integral part of the browser. This means that the APIs often have similar functionality, but are not the same.

For example: both a plug-in and a webapp can ask the UA to retrieve a document, given a certain URL. But the webapp may be restricted to URLs on the server it itself was downloaded from. Both a plug-in and a webapp can open a dialog box, but the webapp calls a platform-independent function that is translated by the UA to the platform-native call, while the plug-in can call the native graphics library directly.
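The URL restriction could look roughly like the following check inside the UA. This is a sketch under assumptions: the essay does not define the policy, so comparing scheme, host and port is only one plausible reading of “the server it was downloaded from”, and the function name is invented.

```python
from urllib.parse import urlsplit

def same_server(webapp_url: str, requested_url: str) -> bool:
    """True if both URLs share scheme, host and port (assumed policy)."""
    a, b = urlsplit(webapp_url), urlsplit(requested_url)
    return (a.scheme, a.hostname, a.port) == (b.scheme, b.hostname, b.port)
```

Note that this simple version treats an explicit default port (`http://example.org:80/`) as different from an omitted one; a real UA would probably normalize that.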

Compound documents

A compound document is a document made up of semi-independent parts, that are displayed together, but can also be used independently or as part of another compound document. The parts are referenced, rather than included. In Ted Nelson's words: a compound document is a document that has other documents transcluded.

As said above in the definitions section, a webapp can be a component in a compound document. E.g., it is expected that browsers will be able to run a webapp inside an OBJECT element in an HTML document.

A webapp itself is not a compound document, primarily because it is not a document. As part of its execution, it may open other documents or cause documents and other webapps to be displayed in a browser, and thus it may at some point almost look like a compound document to a user. But since an application cannot be easily inspected and analysed by a program (such as search engines or link checkers), it cannot take the place of real compound document formats, such as HTML.

Web-based forms

A webapp can obviously be made to look and act superficially like a form (with selection boxes, text fields, etc.) that undergoes some minimal processing before being sent to a server, but if the form doesn't need client-side processing, it is better to use a declarative (typically HTML-based) form.

Webapps are less accessible, harder to maintain and harder to adapt to user preferences than HTML documents. It is harder for a user to do a “view source” or to find out how to submit the form without using a browser. Forms done as HTML documents probably also look better, since forms are text-based and CSS can provide the typography for them, while text in webapps is most likely in the form of label objects, with very primitive control over formatting.

There are various mark-up-based technologies for Web-based forms, that are easy to use, powerful, or even both. A general guideline should be to use one of those (without any Javascript), when possible, and only write a webapp when the mark-up-based solutions don't work:

Web services

A Web service is an advertised function provided by an application, that is accessible via a certain protocol. There is no UI associated with a Web service. In most cases, a Web service is used by some client program running on another machine, which program may or may not be a user agent (i.e., may or may not have a UI). The client doesn't execute any code that was sent by the server or vice-versa.

A webapp and a Web service can complement each other, however. If a user wants to interact with a Web service but doesn't have a suitable client program, the provider of the Web service may have a webapp available that the user can download and execute and which will then handle the protocol with the Web service.

For example, a bank may have a Web service that allows querying account balances and transferring money between accounts. The bank may then provide a webapp that “speaks” the protocol and offers a nice GUI to the user.

Dynamic HTML

DHTML is the term given by some browser makers to the combination of four technologies: HTML, CSS, Javascript and the DOM. By using Javascript to modify HTML and CSS on the fly, causing the browser to continually reformat the display, DHTML tries to make applications out of documents.

Webapps are meant to make DHTML obsolete, by offering easier ways to design a GUI, offering more advanced GUIs and other UIs, making webapps independent of browsers (although a browser may, of course, contain a webapp client), and giving a webapp its own MIME type (and download format), so that users can allow webapps to run, even if they have Javascript turned off for HTML pages.

“Dynamic documents” (CGI, PHP, JSP, ASP…)

The “dynamic” in dynamic documents refers to the server-side only. These documents are not stored on the server in their final form, but generated for each new request. However, to the client there is no difference and the client need not know any of the technologies used for creating such documents, including CGI, PHP, JSP, SSI and ASP.

Often, CGI, PHP, etc. are used to program the server-side of a Web-based form: a page is generated that contains the results of processing the form. But any request for a document could be implemented dynamically. E.g., a server might want to return different pages at different times of the day.

Any kind of Web resource could be dynamically generated in this way, including webapps. CGI, PHP, etc. are purely server-side and orthogonal to formats sent “over the wire” to a client.

Dangers

The reason to send an application (procedural knowledge) rather than a document (declarative knowledge) is normally because that solves the receiver's problem in the easiest or quickest way.

As a trivial example, assume somebody wants to know how much tax was included in the 100 € that he paid. One solution to that is just to tell him: "13.04 €." That solves his immediate problem, but next time he wants to know the tax for 150 € and he has to ask again. You can also give him a calculator that accepts a number and outputs the tax. That way, he will never have to ask again. But one day, he may want to know how much the price would be if the tax was 15 €. The calculator doesn't have that function. What he could do is try a number of inputs until he finds the one that returns 15. Or, you can give the person the formula; then he has full knowledge and can from now on calculate forwards, backwards or inside out, without any more help.

The first of these solutions, giving the answer in this concrete case, is the solution via a server-side application. This tells the user almost nothing beyond his particular question. The second, giving the application itself, is the webapps solution. It provides the user with procedural knowledge. He is now able to solve a whole class of similar problems by himself. The third solution is by means of declarative knowledge. The user now has the means to answer this and many other questions by himself. On the other hand, if this is the only time he needs the answer or this is the only kind of problem he ever solves, one of the two earlier solutions would have solved his problem quicker. It is only in the re-use that the declarative knowledge wins.
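The arithmetic of the tax example can be written out as follows. The rate is an assumption: the essay never states it, but 13.04 € of tax in a price of 100 € implies 15% on the net price. The function names are invented for the sketch.

```python
RATE = 0.15  # assumption: inferred from 13.04 EUR of tax in a 100 EUR price

def tax_in_price(price: float) -> float:
    """Forwards: the tax contained in a tax-inclusive price."""
    return price * RATE / (1 + RATE)

def price_for_tax(tax: float) -> float:
    """Backwards: the tax-inclusive price that contains the given tax."""
    return tax * (1 + RATE) / RATE
```

The “calculator” of the example corresponds to shipping only `tax_in_price`. Somebody who knows the formula, tax = price × rate / (1 + rate), can derive `price_for_tax` himself, which is the point of the third solution.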

Another reason to send procedural knowledge (i.e., a webapp) might be that the provider of the information himself doesn't understand the full solution. He knows a method that gives a solution and passes it on, but he doesn't know why it works. This is like somebody who programs a videorecorder: he has figured out that it works if he presses button A, counts to 5, presses button B and then turns knob C. What each of those buttons does, he doesn't know, but at least he solved his problem...

There is, unfortunately, another reason information providers might send a webapp instead of a document: because they don't know they are withholding information that way. Which is a pity, because making a webapp represents considerable work, while it is unlikely to be useful as widely or for as long a time as the author hoped. A webapp will never be as device-independent as a document.

And, there is the case that information providers may want to hide knowledge, to bind customers to their service or because they consider the information their property. There isn't much that can be done about that.

When sending a program, there is a difference between sending a “binary” and sending the source. The former hides the embedded knowledge much more and makes reuse of the procedures that are contained in the code hard. Sending source code (which is interpreted or compiled on the client side) is preferable: it allows the receiver to fix bugs, use (parts of) the code in his own application and thus also helps to teach and spread the webapps technology.

History

[ Active-X, Javascript, XBL/XUL]

Java (1995) is a 3rd generation programming language. It is object-oriented, strictly typed and borrows much of its syntax from C.

Java derives its portability from the existence of a virtual machine and a large set of libraries (file I/O, networking, GUIs, database access, etc.), some of which are bundled with browsers, so that the byte code of a Java program can run inside many browsers on many platforms. Java has an extensive security model, to allow control over what mobile code (Java applets) can do when they run on a user's machine.

Java is currently the most widely used example of mobile code. Java is suitable for large and complex programs. On the other hand, it does not separate program logic from UI and it is not very suitable for “quick and dirty” scripts.

Flash is best known as a format for animated graphics. The typical Flash “movie” on the Web shows some cartoon characters or geometric shapes moving and making sound, possibly reacting to mouse events. But Flash has a programming language (ActionScript, a dialect of ECMAscript, like Javascript), which is interpreted by the plug-in running in a browser, and it has several libraries, so it is possible to program simple applications in it, without any cartoon graphics.

Water (2001) is a language with a scope similar to that of PHP or JSP. The logic is all on the server, the client just sees an HTML page with some form elements. As in PHP and JSP, program logic and UI (i.e., HTML elements) are mixed, although programmers can use functions to better separate them.

Curl (1998) is a LISP-like language (but with curly braces, hence the name). Curl programs are compiled into native code on the client side. Curl builds interfaces with primitives like vertical and horizontal boxes (as in TeX), buttons, text input fields, multi-line text paragraphs, lines and curves. Curl programs can be distributed: partly on the client and partly on the server. Simple Curl applications look a bit like PHP or JSP, in the sense that the program logic and the UI layout are mixed. Larger programs use functions and objects. Strong typing is supported, but not required.

Konfabulator (2003) is a program that executes applets (called "widgets"), that are written in Javascript and which have a GUI defined in an XML-based format. Konfabulator provides a set of APIs, that allow the widgets to access system resources, such as the clock, the file system, the battery status, an HTTP handler, etc. Many people have contributed widgets and a few hundred are available for download. Note that widgets are downloaded and installed, like plug-ins, not downloaded on demand, like webapps.

Gist (1990) is a language for rapid prototyping of GUIs, that I developed and implemented for my PhD, between 1988 and 1993. It is designed to allow writing GUI front-ends for command-line programs, such as the Unix shell, or any program that accepts text on standard input and outputs other text on standard output.

Gist is interpreted, object-oriented (the objects are GUI widgets) and is a pattern-action style language. It has no loops, but supports recursion. The patterns can either be events (mouse clicks, key presses, etc.) or regular expressions that match outputs from the back-end program. The actions either set properties of the object or send messages (commands) to other objects or to the back-end application.

Gist doesn't have a predefined set of GUI widgets, but is linked to a platform-dependent set of widgets. (The implementation under X worked with XToolkit based widgets, such as the Motif set.) The idea to separate the program logic and the GUI logic and use a different language for each comes from Gist.

[Opera's initiative]

Syntax and implementation

I think the ideal technology would consist of two languages:

  1. one language that describes the UI logic, and
  2. one that describes the program logic and handles network communication.

A packaging format is also needed, since webapps, like most programs, will typically consist of several files.

Diagram: UI controlled by UI logic, which communicates via messages with the program logic (which may in turn communicate with the Web)

The UI is controlled by code written in a dedicated language. The UI exchanges commands and command outputs with the back-end program, which contains the UI-independent program logic. Occasionally, such a program may communicate with the environment and in particular the Web.

UI language

The first language I imagine as a fairly simple pattern-action language. It has objects that represent UI objects with attributes/properties for their color, size, position, etc., which react to user events and to messages sent to them. For the most part, user events will actually be handled implicitly, i.e., by the default event handlers provided by the graphics library. The UI logic beyond what is implicit in the graphics library is modeled as messages sent from one UI object to another. Think of it as an Awk-like language for UIs (or Snobol or Gist):

Window main        # Window object called "main"
  title "Main"     # Initial value of attribute
  width 600px      # Initial value of attribute
  /message (.*)/:  # Pattern of a received message
      status $1.   # Action: send message to status
  /quit/:
      self halt.
Label status
  background #CCCCCC
  /(.*)/:
      self.value $1.

Example of a syntax for the UI code of the webapp. This syntax is organized around UI widgets (window, label), each of which has attributes (title, background) and actions associated with messages it can receive.

One designated object receives the messages from the “back-end” application in the second language, that contains the UI-independent program logic. This object distributes those messages to the UI objects that need to react to them.
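The pattern-action model can be mimicked in a few lines of Python to show how messages would flow. This is only a simulation of the idea, not an implementation of the proposed language: the class `UIObject`, its methods and the choice to match a pattern against the whole message are all assumptions.

```python
import re

class UIObject:
    """A UI object with pattern-action rules (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.value = ""
        self.rules = []  # (compiled pattern, action) pairs, tried in order

    def on(self, pattern, action):
        self.rules.append((re.compile(pattern), action))

    def receive(self, message):
        # Run the action of the first rule that matches the whole message.
        for pattern, action in self.rules:
            match = pattern.fullmatch(message)
            if match:
                action(match)
                return

# "Label status": any message becomes the label's displayed value.
status = UIObject("status")
status.on(r"(.*)", lambda m: setattr(status, "value", m.group(1)))

# "Window main": the designated object that forwards back-end messages.
main = UIObject("main")
main.on(r"message (.*)", lambda m: status.receive(m.group(1)))
```

After `main.receive("message Saved")`, the label's `value` is `"Saved"`; a message that matches no rule is silently ignored in this sketch.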

The UI language should be an interpreted language. This allows users to look at the code and copy it for their own use.

In summary, the UI described in this first language is responsible for:

There should probably be only one UI language. That forces webapp programmers to abandon their favourite programming language, but it allows re-use of the UI code. The language should be simple, so learning it should not be a problem.

The hardest part is defining the standard set of UI objects. Window, button, checkbox, label, slider and text field are obviously needed, but the devil is in the details: what attributes should they have so that all platforms can support them in some way? Color, font, size, position, border... Beyond those obvious objects, there are various more complex objects (combobox, sound object, graphics canvas, tabbed window, etc.) that may not be available on all platforms and may have to be simulated by the UA.

Maybe in a first version, only the most obvious UI widgets should be provided, so that some time is available to think about, and prepare implementations for, the more advanced UI objects.

Back-end language(s)

Between the UI part and the program logic part, there is a 2-way channel for messages. This is what makes the application UI-independent: the UI can be replaced, as long as it reacts to the same messages from the application and sends the same commands back.

These messages can be binary packets, XML-encoded (SOAP) packets, or simply text strings. The latter is probably the best solution, since it is the easiest; you can use a well-known regular expression language to interpret them and they are easy to debug. Most applications don't need to send large amounts of information back and forth between the UI and the application, so the speed of string processing should be adequate.
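A minimal Python sketch of such a text-based channel is given below. Two queues stand in for the 2-way channel; the `double` command, the `result` and `error` replies and the function names are all invented for the example.

```python
import re
from collections import deque

commands = deque()   # UI -> application
replies = deque()    # application -> UI

def backend_step() -> None:
    """Application side: read one text command, write one text reply."""
    line = commands.popleft()
    m = re.fullmatch(r"double (-?\d+)", line)   # hypothetical command
    replies.append(f"result {int(m.group(1)) * 2}" if m else f"error {line}")

def ui_read() -> str:
    """UI side: turn one reply into the text to display."""
    line = replies.popleft()
    m = re.fullmatch(r"result (-?\d+)", line)
    return m.group(1) if m else "?"
```

Because both sides only see strings, `ui_read` could be swapped for a completely different UI without touching `backend_step`, and the traffic can be debugged by simply printing it.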

The language for the application itself is probably a traditional third-generation (imperative) programming language, of the type of Algol, Pascal, C, Java, etc. Such languages are familiar to most programmers.

But to make the threshold as low as possible for inexperienced programmers (or indeed for experienced programmers wishing to implement a “quick hack”), it should probably be an interpreted language with implicit typing, such as Basic, Python or Javascript. Javascript has the advantage that there are already free interpreters available for many platforms, including inside browsers.

Maybe in a second phase, the system should incorporate a virtual machine. That would make it possible to implement webapps in languages that are more suitable to large projects, such as Java or C. Indeed, the webapp programmer could use any language for which a byte-code compiler is available and could even mix languages. The webapp UA only sees the byte code.

Libraries provided by the (standardized) environment provide many common functions, including network functions. An application should be able to communicate with servers at a high level by doing GET and PUT on URLs, or directly at a lower level, using sockets.

Packaging

A webapp typically consists of several files: one or more UIs and one or more application modules. The URL that points to the webapp could point to one of these, e.g., the default UI, which in turn points to the other files, but it may be more efficient (to avoid multiple client-server roundtrips) to zip the files up and let the URL point to the zip file. This is similar to how Java applets are published, except that the file extension should probably not be “.jar”. (A preferred file extension and a MIME type will have to be defined for it.)

An optional “manifest” file in the package could contain some meta-data, such as which UI is the default, which module contains the main entry point, a digital signature for the package, a version number, author, copyright and feedback address.
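As an illustration only, the packaging idea might look like this in Python with the standard `zipfile` module. The manifest file name `manifest.json`, its JSON encoding and the key `default_ui` are assumptions; the essay does not fix a manifest format.

```python
import io
import json
import zipfile

def build_package(files: dict, manifest: dict) -> bytes:
    """Zip the webapp's files plus a manifest into one downloadable blob."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # Hypothetical manifest name and format.
        zf.writestr("manifest.json", json.dumps(manifest))
        for name, text in files.items():
            zf.writestr(name, text)
    return buf.getvalue()

def default_ui(package: bytes) -> str:
    """Read the manifest and return the source of the default UI."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        manifest = json.loads(zf.read("manifest.json"))
        return zf.read(manifest["default_ui"]).decode("utf-8")
```

A UA would fetch the package with a single request, read the manifest first, and then load the UI and modules it names.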

Bert Bos
Created: 26 Feb 2004