The Bees and the Ants: The Benefits of Persistent and Addressable Media (Like the Web)

Sandro Hawke

W3C MIT/LCS
200 Technology Square, Cambridge, MA 02139, USA

sandro@w3.org

DRAFT $Id: paper.html,v 1.16 2002/11/28 01:32:37 sandro Exp $

Abstract

This paper approaches issues in the web's evolving design by characterizing the web as a shared environment where information may be placed for others to perceive. This view allows analogies to other communication systems throughout history, with a special focus on systems of publication, and this helps illuminate issues in web architecture.

Keywords

Web Architecture, Web Services, Semantic Web, Distributed Computing, Shared-Memory, Message-Passing, URL, URI, Names, Identifiers

Status

In progress, unclear -- may need massive retargeting. New message: HTTP presents a glorified file-system interface; why is that good, and how should we use it?

1. Introduction

Despite the success of the web, or perhaps because of it, the set of design principles it follows and embodies (its "architecture") has never been thoroughly documented or agreed upon. This causes ongoing problems as people attempt to extend the web into new areas and to continue to implement what they perceive to be its original design. As in any complex system, construction and maintenance done without a clear overall picture carries a risk of increased friction and complexity, decreased usability, and even catastrophic failure.

Each web author adds a little to the system, while each creator of authoring tools or browsers adds quite a bit more. With larger contributions, the risks to the overall system (the web) are greater. Occasionally, great controversies arise as potentially conflicting architectural features are added. While the furor has died down, some people are still upset about cookies, JavaScript, server push, and many other old issues. In the past two years, the web services and semantic web activities have emerged, making loud claims and presenting significant risks if deployed in an environment where the web's existing architecture is misunderstood.

In the past year, the W3C's Technical Architecture Group (TAG) has begun to develop authoritative expertise and documentation to address these problems, but their work is still incomplete. This paper grew out of a private technical briefing (with ensuing discussion) on the relationship between web services and the semantic web; my hope is that it will contribute to the general understanding of that relationship and perhaps to the larger discussion of web architecture.

2. A Shared Environment

2.1. Two Ways to Communicate

As far as we can tell, information travels from one person's mind to another's by way of the physical world. One person affects the world, and another perceives it. Sometimes the effect is vibrations in the air, and another hears it; sometimes the effect is marks on a piece of paper, which another might see. More recently, with computer systems, the intermediate effects might be tiny electrical currents or magnetic domains, but the result is the same: people are generally able to convey their thoughts, with some degree of accuracy and at some cost, to other people.

Specific techniques for communication have varied over the centuries and with circumstances. Today, we have an enormous range of options, from the most ancient to the latest gadgets. Among all these, there remains a strong split between techniques which involve a persistent effect on the environment and those which are by nature transitory. Shouting out a warning is transitory; erecting a big red sign is persistent.

This division is not absolute: the erected sign might immediately collapse, and the shouted warning might echo for several seconds, but the qualitative difference remains. Persistent messages are issued with much less knowledge of, and control over, who will receive the information, and with less opportunity for feedback and continued interaction. Balancing these losses, persistent messages can gain a much larger audience across time and space, and can reach some audiences at a much lower cost (such as with a posted warning). For the receiver, getting information from persistent messages offers the freedom, control, and simplicity of wandering in a bookstore, whereas obtaining transitory information requires complex social interactions to locate informed people and get them to communicate.

The differences between communicating by making persistent changes to the environment versus just "sending a message" are everywhere. In designing distributed or multi-threaded computer applications, developers weigh these same two approaches for arranging how the computer processes communicate, calling them "message-passing" and "shared-memory". With message-passing, they imagine a process constructing a digital message and transmitting it to one or more receivers. People using computers behave similarly when they send e-mail. With shared-memory, by contrast, an area of storage is allocated where one or more processes can place information for others to later see (and perhaps modify). People using their computers communicate like this when they author and read web pages.
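The contrast between the two styles can be sketched in a few lines of Python. This is only an illustration under assumed names (`inbox`, `board`, and the message text are all invented here): in the first function a message is handed to one receiver and consumed once; in the second, information is left in a shared place where any number of readers can find it later.

```python
import queue
import threading


def message_passing_demo():
    """The sender constructs a message and transmits it to a receiver."""
    inbox = queue.Queue()      # the receiver's mailbox (hypothetical name)
    received = []

    def receiver():
        received.append(inbox.get())   # blocks until a message arrives

    t = threading.Thread(target=receiver)
    t.start()
    inbox.put("food at 40 m, bearing 120")   # consumed exactly once
    t.join()
    return received


def shared_memory_demo():
    """The writer leaves information in storage that others read later."""
    board = {}                 # the shared storage area (hypothetical name)
    lock = threading.Lock()
    seen = []

    with lock:
        board["food"] = "40 m, bearing 120"

    def reader(name):
        with lock:
            # Reading does not consume the information.
            seen.append((name, board["food"]))

    readers = [threading.Thread(target=reader, args=(n,)) for n in ("a", "b", "c")]
    for t in readers:
        t.start()
    for t in readers:
        t.join()
    return sorted(seen)
```

Note that the writer in `shared_memory_demo` never learns who the readers are, which mirrors the loss of control over the audience described earlier.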

In fact, this distinction goes back long before human society or even the human species. For tens of millions of years, bees and ants have each lived in communities with social structure involving essential division of labor and communication of information necessary to survival. In each of them, scouts are tasked with finding food sources and reporting back; they must inform others where to go and gather food. This architecture obviously allows much greater efficiency than having each worker do their own scouting.

The bees and the ants use different communication techniques, however. Honey bees use direct communication: the scout does a "dance" in which particular body movements indicate the direction and distance to the discovered food source. Ants, on the other hand, modify their environment and leave a persistent message: the scout, on the way back home after finding some food, activates a scent gland and drags it on the ground. This creates a coded trail for the gatherers to use to reach the discovery.

There are clearly scaling advantages for the ants: workers returning from other jobs learn about other food sources immediately, without needing someone to repeat the directions for them. The directions can also be much more complex, involving numerous twists and turns. Of course one cannot leave long-lasting scent trails in moving air, so the bees, in their different environment, do what they can.

2.2. Using Shared Memory

In modern society, shared memory has been most thoroughly implemented in the medium of print. The publication of books and journals has been essential to the progress of science and probably to the progress of humanity in general. The print media have used all the advantages above to great effect.

In general, we can think of using publications as the same as using shared memory. The word "publish" comes from "public" and still means "to make generally known", while shared memory does not need to be public, but the similarity is still strong. Some publications are in fact restricted to certain audiences, so perhaps the word is gaining this meaning and we can speak, in general, of parts of the shared environment which have been modified to carry information as "publications."

Restricted publications, then, are like fenced off portions of the shared environment. Some gate controls who can come in and see the information, and perhaps something records who does so. The basic techniques for contributing and obtaining information are the same; they are simply done in an environment with access control.

None of this is novel, of course. Every large organization uses persistent written records, and the larger organizations have detailed schemes for identifying and managing each document. The web was created in such an environment of written records, which were seen as often inadequate, in an overall setting of scientific research, where publication is clearly essential.

2.3. Information Servers

In an environment like that of the bees, where storing information in the environment is not available as a means of communication, one can sometimes create a virtual environment with similar characteristics. I don't know how late-arriving gatherer bees learn the way to the discovery, but I can imagine one possibility: some bee is tasked with repeating the directions over and over and over. Perhaps that bee stays in a special place; any gatherer without a job goes to that special place and looks for someone giving directions.

Telephones offer people a message-passing, direct communication architecture. One person calls another, initiating a conversation, and information passes back and forth between them. This works well for many things, but not so well for others. If I want to go see the new James Bond movie, but I do not know when or where it is playing, should I call my friends and ask them? Should I call random people? No, I call the theater. And do they answer? No, they have a machine which answers the phone and gives me the information I want. This machine is like the direction-giving bee in a special location; it is an information server.

Information servers, then, are things we talk to in a message-passing world which serve the same purpose as a location for stored information. When you can't post a sign or write a book, you need to leave behind a person (or automaton) to carry your message for you.

The Internet is essentially a message-passing system. TCP provides a virtual circuit service, but still the information is sent from one process to another across the net. Higher layers like FTP and HTTP use information servers to provide a system of persistence, analogous to physical locations allowing persistent communication.

The web, then, offers an environment where one can find various persistent messages, just like the physical world, but without the same costs. As long as the information servers stick to their simple jobs of providing virtual pieces of paper, the system behaves gracefully, like a library or ant trails. If they react to being looked at, or record the fact that you looked at them, or change their information based on who is looking, or change their information incoherently, then the virtual environment is revealed and becomes confusing, unpredictable, and perhaps undesirable.
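As a minimal sketch of this idea, the following Python class plays the part of an information server that sticks to its simple job. The class name, the `(asker, address)` request shape, and the showtimes are all assumptions made for illustration; this is not how any particular web server is implemented.

```python
class InformationServer:
    """Answers request messages as if it were a shelf of pieces of paper."""

    def __init__(self, pages):
        self._pages = dict(pages)   # address -> content

    def handle(self, request):
        # A request is a message: (who is asking, which address).
        _asker, address = request
        # The asker is deliberately ignored, and nothing is recorded or
        # changed, so looking at a page has no visible effect.
        return self._pages.get(address)


# Hypothetical content, in the spirit of the theater's answering machine.
server = InformationServer({"/showtimes": "James Bond: 7:00 pm, 9:30 pm"})
first = server.handle(("alice", "/showtimes"))
second = server.handle(("bob", "/showtimes"))
```

Here `first` and `second` are identical regardless of who asked or how often, so callers can treat `/showtimes` as a location rather than as a conversation partner. A server that varied its answer by asker, or logged each lookup, would break that illusion.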

2.4. Names and Addresses

Linguistic communication involves names for things, because of course when we're talking about the location of some food, we cannot use either the location itself or the food itself in our speech; we must use words or phrases which identify or name them. If I want to say "I like the portabella mushroom sandwiches at The Lyceum Bar and Grill in Salem, Massachusetts," I need to use several names for things. One of those names, the noun phrase "The Lyceum Bar and Grill in Salem, Massachusetts", identifies an organization I could call on the phone (or send e-mail, as it turns out). If I put that sentence about the mushrooms on a web page containing my opinions about various local restaurants, and I want to tell people about that web page, I need a name for it as well.

These two kinds of names are especially interesting: names of things or people we can talk to and names of places where information can be found. If a thing we might talk to is an information server, it is in effect both kinds of things. Curiously, the English noun "address" has both these meanings [Merriam-Webster]:

I suggest that an e-mail address is the first kind and a web address is the second kind. While the web address is technically a communications end-point, the server's masquerade as a virtual page service, necessary for good web design, requires it to act like a meaningful and consistent location for information.

3. Old Architectural Issues

4. Issues Surrounding Machine Use of the Web

There are two W3C activities, each with its community of dedicated participants, aimed at using the web to support machine interoperability. They see the web as fundamentally successful but only a small step toward what they want: machines communicating and working together. This hints at artificial intelligence, but as the analogy to bees and ants suggests, cooperative, coordinated activity requires only a well-constructed system, not intelligent participants.

The two activities are focused on different parts of the problem: the web services effort involves standardizing the communication patterns to make them easier to automate, while the semantic web effort involves standardizing techniques for exchanging knowledge between participants without prior arrangement.

While there is some tension between these groups as they compete for mindshare and development resources, they are mostly complementary, each addressing a different piece of a large puzzle. From time to time, though, they do wander into each other's territory because they need a partial solution before the other is ready; these incursions, if not done tactfully and with a real awareness of the boundaries, can lead to social difficulties.

So knowing the lay of the land is important; we need an overall approach to interoperability, so we can see where and how the pieces fit together. In general, the options for interoperability are so open as to render choosing an approach absurd, but both these approaches have chosen to leverage the web, so we have a hook: the publication model narrows the field, making systems interoperate primarily through storage areas in a shared environment.

4.1. Web Services

The Web Services approach to distributed systems construction attempts to leverage the technique of publication. Let's publish an interface! Let's take this addressing scheme and access protocol developed for the web and use it for our virtual storefronts and service centers (SOAP). We can of course publish the instructions for using these virtual stores (WSDL), and we can publish directories of all these access points, on-line and easily searchable (UDDI).

Is it still a publication? Yes, it's a place in the environment where information can be put, but is it meant to be persistent and shared? Perhaps sometimes. When it is -- when the purchase order you submit has, itself, a URI -- then it's a publication. You can have the purchaser and seller communicate by each erasing then writing on the same blackboard, or they can each write on a new area of the blackboard, keeping the old parts around for reference if needed.
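The two blackboard styles can be sketched as follows. Everything here is hypothetical (the `Blackboard` class, the `example.org` addresses, and the order text); the point is only that `post` gives each message its own URI, while `overwrite` reuses one address and loses what was there before.

```python
class Blackboard:
    """A shared storage area keyed by URI (illustrative only)."""

    def __init__(self, base):
        self.base = base
        self.entries = {}      # URI -> content
        self._next = 1

    def overwrite(self, uri, content):
        # Erase-then-write: the previous content at this URI is lost.
        self.entries[uri] = content
        return uri

    def post(self, content):
        # Write on a new area: every message gets its own URI.
        uri = f"{self.base}/{self._next}"
        self._next += 1
        self.entries[uri] = content
        return uri


board = Blackboard("http://example.org/orders")

# Style 1: purchaser and seller take turns on the same spot.
current = "http://example.org/orders/current"
board.overwrite(current, "purchase order: 10 widgets")
board.overwrite(current, "accepted")          # the order text is gone

# Style 2: each message is kept and can be referred to by its URI.
order_uri = board.post("purchase order: 10 widgets")
reply_uri = board.post(f"accepted {order_uri}")
```

In the second style the seller's reply can point at the order by name, and a third party arriving later can still read both.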

4.2. The Semantic Web

The Semantic Web approach to building distributed systems also attempts to leverage the technique of publication. Here the focus is on making the publications themselves "understandable" to machines. But what does it mean for a machine to "understand" something? Where is the line between data processing and intelligent thought? It does not matter here; we define "machine understanding" in terms of observable behavior. A machine understands a publication if it uses the information carried by the publication to guide its behavior. For the foreseeable future, it can do this only because it has been programmed to, not because of some kind of true intelligence.

In other words, a program may well understand the message "Turn on the sprinkler system just before dawn" because it has been programmed to recognize exactly that text string and run a sub-program which computes the time of the sunrise and activates the sprinklers accordingly. The same behavior could follow from "RUN MORNSPRK.BAT"; the machine's understanding of the two messages is the same. This is not rocket science; it is pretty simple computer science. Trying to build the Semantic Web is about trying to get straight how we connect messages and their meanings in a way which works not just on your own machine (which is pretty easy), but which scales across the globe.
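This behavioral notion of understanding amounts to a lookup from message text to a routine. A minimal sketch, with the sprinkler routine reduced to a stand-in that just returns a description (the function and table names are invented for illustration):

```python
def run_sprinklers_before_dawn():
    # Stand-in for computing the time of sunrise and activating the sprinklers.
    return "sprinklers scheduled just before dawn"


# Two different messages, one programmed behavior.
HANDLERS = {
    "Turn on the sprinkler system just before dawn": run_sprinklers_before_dawn,
    "RUN MORNSPRK.BAT": run_sprinklers_before_dawn,
}


def understand(message):
    """Return the result of acting on a message, or None if it is not recognized."""
    handler = HANDLERS.get(message)
    if handler is None:
        return None
    return handler()
```

A table like `HANDLERS` works on one machine because one programmer controls both the keys and the routines. The hard part is agreeing on the keys across machines that were never configured together.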

One short pre-Semantic Web analysis of this problem [AIMA] suggests that one of the fundamental difficulties is coming up with names for newly discovered things (such as food sources!) and communicating their definitions (in a language the computers already understand). A publishing system (such as the web) gives us handy solutions to both of these problems: we can make up names which are unique within a particular publication, and then name things globally by naming the publication (which we already know how to do) and then providing the publication-specific name. Definitions can also be provided in the same publication (or perhaps other ones), addressing the second half of the problem. This architecture of referring to definitions instead of transmitting them allows compact communication without a presumption of state.
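One way to picture this naming scheme is a publication's address joined to a publication-specific name with a `#`, as in URI fragment identifiers. The sketch below assumes that convention; the `example.org` address, the name `source17`, and the in-memory `DEFINITIONS` table standing in for published documents are all hypothetical.

```python
from urllib.parse import urldefrag


def global_name(publication_uri, local_name):
    """Name a thing globally: name the publication, then the local name."""
    return f"{publication_uri}#{local_name}"


def split_global_name(name):
    """Recover (publication URI, publication-specific name) from a global name."""
    publication, local = urldefrag(name)
    return publication, local


# Stand-in for fetching publications; a real system would retrieve them.
DEFINITIONS = {
    "http://example.org/scout-report": {
        "source17": "food source 40 m from the nest",
    },
}


def look_up(name):
    """Follow a global name to its definition instead of transmitting it."""
    publication, local = split_global_name(name)
    return DEFINITIONS[publication][local]


name = global_name("http://example.org/scout-report", "source17")
```

A message can now carry just `name`. Any receiver that has never heard of `source17` can call `look_up(name)` and obtain the definition from the publication, which is what allows compact messages without assuming shared state.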

5. Conclusions

@@ the web is about web pages, and we push that definition at our peril. web services and the semantic web fit together nicely in this model; it's a good way for machines to communicate, too.

Acknowledgments

The ideas in this paper grew out of conversations with Tim Berners-Lee and Dan Connolly. Without them (even if the web still somehow existed!) this work would not have been possible.

This work has been supported by the DARPA/DAML project under MIT/AFRL cooperative agreement number F30602-00-2-0593.

Despite the author's affiliation with the W3C, this work is obviously not on the W3C recommendation track. It is not the product of a W3C working group or interest group and should in no way be construed as reflecting the position of the W3C or its members.

References

[AIMA] Stuart Russell and Peter Norvig: Artificial Intelligence: A Modern Approach. Prentice-Hall, New Jersey, 1995.

[Merriam-Webster] Merriam-Webster's Collegiate Dictionary.

[Fielding] Roy Fielding: Architectural Styles and the Design of Network-based Software Architectures. Dissertation, University of California, Irvine, 2000. See chapter 5 @@ Not used right now.

@@@ Could use lots more! :-)