This was written for a possible book in around 1993/4. Written in Microsoft Word, copied to HTML and salvaged by hand. ..Tim BL

What is the World-Wide Web?

This chapter describes the Web from more than one point of view. If you already understand about computers, or about networks, or about information retrieval, you can skip sections which are already evident. In this chapter we'll see what the web looks like in practice, and what the essential elements are which make it possible, and make it different from anything which has happened before.

The World-Wide Web is an idea about how people can work together using computers. It is an idea about how we can share knowledge. That is, a few years ago it was just an idea, but then it became apparent that these ideas could be put into practice using the networks and computers which exist today, that it wouldn't be too difficult, and that it would be exciting.

The web is an open universe of distributed information with a hypertext interface. If that sounds a mouthful, let's look at some of those words. "Open" means that anyone can add to it without being forced to use a particular computer or a particular disk format or a particular anything: the web was designed not to be complete but to be added to. [box on openness]. "Universe" means that all the elements can be regarded as part of a single all-encompassing whole. "Distributed" means that the information is not stored, entered or administered in any central way. Although the information is organised, the organisation is also distributed. The web can grow independently of technical bottlenecks, as well as of political or bureaucratic control.

A World of Information

"Information" is a very general word, and quite deliberately so. Any information can be made part of the web, be it a novel, a train timetable, a movie, or tomorrow's weather, or a table of stock prices. A key feature of the web is that it knows no boundaries of information type or source.

There are many information systems all over the world. When you think that almost every printed or typed document is now prepared on a computer, almost all of the information which has been prepared in any form is or has been at some time stored on some computer disk. Add to this all the numeric data in relational databases, the recorded sound and video archives, and the vast amounts of scientific data from spacecraft and experiments of all sorts on earth, and you have a wealth of information which is growing at an explosive rate and which no one until recently has known how to organise.

Here is a small selection of the things which you can find in the web, to give some idea of its diversity.

The satellite picture showing the weather where you are;
A collection of mediaeval Gaelic manuscripts;
A customised map of anywhere on the earth;
Copies of original documents from the USSR giving evidence of violations of human rights;
Ordering information and details of available books, computer equipment, etc;
The phone numbers and addresses of people in various institutes;
The current state of the accelerators at the European particle physics laboratory;
An English dictionary;
Stephen King's latest novel.

The list goes on. The only way to get a true feel for its contents is to explore it yourself. When you do so, you'll be using hypertext.

Hypertext -- the dream

"Hypertext" is the linking of information in an unconstrained way. Strictly, the word refers to the linking of textual information, but we use it even when extended into other media. (Sometimes this is known as "hypermedia".) As a term Hypertext was dreamt up be Ted Nelson [ref. to Lit Mach] , one of the early visionaries of computerised information. The idea goes back further to Vannevar Bush, [scientific adviser to President Roosevelt -- check] who first imagined in 1945 [ref. to Atlantic Monthly article] a machine which could follow the relationships between information in a way mimicking the associative power of the brain. Random association is the ability to link otherwise unconnected thoughts, as in sequences like "press -- squash -- tennis -- elbow -- foot". People, unlike computers, are good at it to the extent that we imagine that associations are the basis of the way the brain works.

In hypertext, a computer shows a person associations. As I write this chapter, there are a hundred related things which come into my mind which I would like to tell you. Because I am writing a book, I order them in my mind in a way I hope will make sense to you, and I put them down in that order on the paper. As you read this book, you have to follow my way of looking at the subject. In places, I suggest that you skip sections, or take a detour. Sometimes I make a reference with a footnote to another place in this book or to another book. However, I don't do this very often, as I know that you are not going to go to the library and look up the reference without rather breaking the flow of the narrative.

If you were reading this book in hypertext using a computer, it would be a different thing. When you saw a mention of Vannevar Bush's article, you might see it was underlined (or a different colour, or something). If you were interested, you would be able to jump straight to the original by selecting that highlighted phrase with a mouse or with the cursor. If you imagine all the books and articles and notices you read being hypertext, you can imagine that instead of following one train of thought (mine), you would soon be off on a deep exploration of any topics which particularly interested you. This would help both of us. You would have a vastly rich resource in front of you. I would be able to concentrate on my personal message, without having to decide which pieces of background information I ought to include. You would perhaps buy this book not simply for the text itself, but perhaps mainly as a starting point for further reading.

In part, the World Wide Web is the first practical implementation of a global hypertext world imagined by the hypertext pioneers. However, that is only one aspect of the web, and in practice it is other facets of the web which have been more important in the real world.

Hypertext as a technique

It was during the early work on the web that it became apparent that, even where there wasn't an underlying hypertext system, hypertext was a very natural way of presenting information by a computer to a human being. If we look at the interaction between a computer and a person, much of it consists of the computer presenting the person with a certain amount of text and a number of options. Whether the options are things to do, files to select or items in a database, and in whatever format they are normally presented to the person, the result can be regarded as a form of hypertext.

Virtual hypertext

Let me give an illustrative example. You have been using the local library's computer to search for some ancient Greek text which mentions Athens. You don't remember much about the text you wanted, so you need some guidance from the librarian, or the computer. At a certain point the librarian, or the computer, might say:

"There are 2687 texts I know of which are in Greek and which mention Athens. I could give you the list if you like. Otherwise, we could break them down by author or by date. Incidentally, I also have a lot on Athens in the geography section. Or can you think of more keywords which would help us find it?"

You notice that each of the options the reply mentions is highlighted, and you know that if you select one you will be moving through the information space in the library, hopefully toward your goal. So this is hypertext. The big difference is that this piece of hypertext wasn't written by a person: it was written by the computer.

The text was made up by a computer rather than written by an author;
The links were made up by the computer rather than generated by hand;
There is an extremely large number of such texts to explore;
To the reader, it looks just like hand-written hypertext.

The first two points are important because they mean that a link-rich hypertext world can exist without people having to spend the time to make the links. The infinite possible variety is fascinating because it makes the design of such an information space a challenge for the system designer. The last point is important because it means that the user can browse around information of both forms using the same simple program without having to learn any special languages or tools.

This hypertext which is generated automatically from some other form of data we call virtual hypertext. Virtual hypertext allows anyone with some data in any form to publish it "on the web" without any extra trouble. For example, if you have some computer files containing plain textual information or pictures, W3 allows other people to read through the directories on your disk as though through the contents page of a book, with hypertext links to the files. There is more about putting databases onto the web in chapter 3. [ref.] The important conclusion from all this is:

You don't have to write hypertext to use the web.

This is a very important point, and corrects one common misunderstanding about W3. It is important because most of the information in the world is not originally hypertext, and the web is intended to be able to include everything.
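To make the idea concrete, here is a minimal sketch in Python of how a directory of ordinary files might be turned into virtual hypertext. It is only an illustration under my own assumptions: the function name directory_as_hypertext and the exact page layout are invented here, and a real W3 server may well do this differently.

```python
import html
import os
import tempfile
from urllib.parse import quote


def directory_as_hypertext(path):
    """Return an HTML page listing the entries of `path` as links."""
    lines = ["<title>Index of %s</title>" % html.escape(path), "<ul>"]
    for name in sorted(os.listdir(path)):
        # Directories get a trailing slash so a reader can tell them apart.
        is_directory = os.path.isdir(os.path.join(path, name))
        label = name + "/" if is_directory else name
        # quote() makes the name safe inside an address;
        # html.escape() makes it safe as displayed text.
        lines.append('<li><a href="%s">%s</a></li>'
                     % (quote(label), html.escape(label)))
    lines.append("</ul>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Build a throwaway directory so the example does not touch real files.
    with tempfile.TemporaryDirectory() as folder:
        open(os.path.join(folder, "timetable.txt"), "w").close()
        os.mkdir(os.path.join(folder, "pictures"))
        print(directory_as_hypertext(folder))
```

Nobody wrote the resulting page by hand: the files stay exactly as they were, and the links are made up by the program each time it is asked.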

So now we have seen what hypertext is, and how exciting it can be, and how for the World-Wide Web it isn't really anything difficult -- it is just a neat way of presenting any information simply.

The web as an information system

Information systems with browsing and search facilities have been around for a long time. The web can be seen as an extension of information systems, in which the menus become hypertext, and information from all over the globe is part of the same system. This is one way of looking at WWW. Regarding WWW as a mammoth information system suggests that it is rather constraining, that everyone will be forced to fit into the same way of working. As we shall see, this is not the case: in fact the web gives a flexibility which allows totally different forms of information to exist, and to be presented in a seamless way to the user.

The web as the fount of all knowledge

The vast size and interconnectedness of the web tempts some who come across it to regard it as, and then expect it to be, the source of all knowledge. It then comes as a shock that it doesn't regurgitate answers to any question as a faster version of the mythical oracle.

The flexibility of the web is at once a solution and a problem. Just as it allows any new fact or thought to be added, it seems to prevent anything from being found! The intention of the web was to have the flexibility to represent knowledge in an unconstrained way, so as not to force upon it a preconceived form which would in the end invalidate it. The assumption here is that we can't organize the sum of our knowledge "from the top down". Our knowledge is fragmented into millions of brains. If we wish to find a higher order, our best hope is to put what we have into the web, and then look at the result. This does not at all devalue the work of those who catalogue and arrange our knowledge at present: the librarians, journal editors, writers and teachers. On the contrary, it suggests that their work will never end, as the most useful structures for the presentation of the web will change in time.

So the web is no more an oracle than is paper. However, with human and possibly machine help, I believe it can be organized into a resource which will be vastly superior to paper, and which will provide to ordinary people answers to real questions in a painless and efficient way. As of March 1994, there is a lot to be done, but we can see the beginnings.

The Architecture of the Web

The various pieces of information in the web are stored in many different places. Different people, different organisations, take the initiative for making available different things. This is done with a simple architecture called a "client-server architecture". We call the computers which hold the information "servers"; they allow people to read the information from programs ("clients") running on the same or other computers.

Figure 1. The client-server architecture of W3.

The important point about this is that the human reader does not have to log on to the server computer to get at the data. The client program automatically contacts the server computer and gets the information on the reader's behalf. Then the client is responsible for displaying it prettily on the user's screen, printing it, or whatever. The communication between the client and server is very quick. The client sends over a request for the information it wants, and the server returns that information. Then the connection between the two is broken, and each is left to get on with other things. This way of communicating ("protocol") between client and server is more efficient than methods which involve a lot of chat between the two programs. It is also easier for the server than protocols which allow the client to keep open a connection for a long time, demanding things every now and again, because with those protocols the server has to remember where it was in its conversation with each client.

While you are reading a W3 document, then, there is no connection between you and the server. The document has, built into it, pointers to all the places you might want to go next.
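The shape of such an exchange can be sketched in a few lines of Python. This is a toy protocol of my own for illustration, not the real W3 protocol: the DOCUMENTS table, the function names and the one-line request format are all assumptions. What it does show is the pattern described above: one request, one reply, and then the connection is dropped, with the server remembering nothing.

```python
import socket
import threading

# Invented stand-in for the documents a server holds.
DOCUMENTS = {"/welcome": "Welcome. Next, try /weather on this or any other server."}


def serve_one_request(listener):
    """Answer a single request, then forget the client entirely."""
    connection, _ = listener.accept()
    with connection, connection.makefile("rb") as reader:
        request_line = reader.readline().decode("ascii").strip()
        method, _, name = request_line.partition(" ")
        if method == "GET":
            reply = DOCUMENTS.get(name, "No such document.")
        else:
            reply = "Bad request."
        connection.sendall(reply.encode("utf-8"))
    # Leaving the `with` block closes the connection; no state is kept.


def fetch(host, port, name):
    """Client side: send one request and read until the server hangs up."""
    with socket.create_connection((host, port)) as connection:
        connection.sendall(("GET %s\r\n" % name).encode("ascii"))
        chunks = []
        while True:
            chunk = connection.recv(4096)
            if not chunk:  # server closed the connection: reply is complete
                break
            chunks.append(chunk)
    return b"".join(chunks).decode("utf-8")


def demo():
    """Run server and client in one process on the local machine."""
    listener = socket.create_server(("127.0.0.1", 0))  # port 0: any free port
    port = listener.getsockname()[1]
    worker = threading.Thread(target=serve_one_request, args=(listener,),
                              daemon=True)
    worker.start()
    try:
        return fetch("127.0.0.1", port, "/welcome")
    finally:
        worker.join(timeout=5)
        listener.close()


if __name__ == "__main__":
    print(demo())
```

Notice that the client knows the reply is finished simply because the server has hung up. Neither side has to keep track of a conversation.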

Another good reason for not keeping a connection open is that the next place you want to go might well be information from a different server. The W3 hypertext allows one document (real or virtual) to make references to another document on any other server. This works because hidden behind the highlighted text which is the hypertext link is a complete address of the document. The format of this address is in fact a fulcrum of the design of the system, as its ability to represent the address of any present or future document gives W3 users the ability to make links to anything there is.

W3 has a way of making up an address for any accessible piece of information

The address is sometimes called a URI, for Universal Resource Identifier. (There is a certain amount of acronym soup concerning URIs, URLs and URNs which need not concern us at this stage). URIs are described in detail in Chapter 5. A URI starts with a prefix stating what sort of an address it is, and the format of the rest of it depends (within certain constraints) on that. This means that when a new protocol comes along with a fancy new addressing (or naming) scheme, we can define a new prefix and extend the addressing scheme. This is one important way that W3 can be added to in the future to grow with technology.

As well as taking into account future protocols, the URL syntax includes forms for addressing objects which are accessed using old protocols predating HTTP. These include the Internet File Transfer Protocol (FTP), the Network News Transfer Protocol (NNTP) and the "Gopher" campus-wide information system protocol. W3 documents can therefore point into existing FTP archives, news groups, and Gopher holes. Although the data within such servers is not hypertext, W3 clients have a built-in ability to speak those protocols, and they generate hypertext views of the directories and news groups for the reader.
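The way a prefix selects the protocol can be sketched as a simple lookup. In this Python illustration the HANDLERS table, the choose_handler function, the example addresses and the "newproto" prefix are all invented for the purpose; only urlsplit, from the standard library, is real. A client would of course call actual protocol code rather than return a description.

```python
from urllib.parse import urlsplit

# Hypothetical table: each prefix maps to a note on how a client fetches it.
HANDLERS = {
    "http": "fetch with HTTP",
    "ftp": "fetch with FTP",
    "gopher": "fetch with Gopher",
    "news": "fetch with NNTP",
}


def choose_handler(address, handlers=HANDLERS):
    """Pick a handler by looking only at the prefix of the address."""
    prefix = urlsplit(address).scheme
    if prefix not in handlers:
        raise ValueError("no handler for prefix %r" % prefix)
    return handlers[prefix]


if __name__ == "__main__":
    print(choose_handler("ftp://ftp.example.org/pub/readme.txt"))
    print(choose_handler("news:comp.sys.mac"))
    # Supporting a new protocol means adding one entry; nothing else changes.
    extended = dict(HANDLERS, newproto="fetch with a protocol not yet invented")
    print(choose_handler("newproto://host.example.org/thing", extended))
```

The rest of the address is left for the chosen handler to interpret, which is what lets each scheme keep its own format.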

News as hypertext

Internet news articles have been around for many years, and have been one of the wonders of the Internet. Although this has not always been used to advantage, news articles contain lots of useful pointer information: to their news groups, their authors, and the articles to which they specifically refer. This is all ripe for representation as hypertext.
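As a rough sketch of what that representation might involve, the Python below pulls the pointer information out of an article's header lines and turns each item into a link. The sample article is made up, the function names are my own, and I am assuming the usual "Name: value" header layout; a real W3 client may present things quite differently.

```python
import html
from email import message_from_string
from email.utils import parseaddr

# An invented article, used only to exercise the code below.
SAMPLE_ARTICLE = """\
From: reader@example.org
Newsgroups: comp.sys.mac,soc.culture.indian
Subject: Re: Hypertext browsers
Message-ID: <2@example.org>
References: <1@example.org>

I agree with the earlier article.
"""


def link(target, label):
    """Format one hypertext link, escaping both address and label."""
    return '<a href="%s">%s</a>' % (html.escape(target, quote=True),
                                    html.escape(label))


def article_links(text):
    """Return links to the article's groups, cited articles and author."""
    article = message_from_string(text)
    links = []
    for group in article.get("Newsgroups", "").split(","):
        group = group.strip()
        if group:
            links.append(link("news:" + group, group))
    for reference in article.get("References", "").split():
        # The address form drops the angle brackets around the identifier.
        links.append(link("news:" + reference.strip("<>"), reference))
    _, author = parseaddr(article.get("From", ""))
    if author:
        links.append(link("mailto:" + author, author))
    return links


if __name__ == "__main__":
    for item in article_links(SAMPLE_ARTICLE):
        print(item)
```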

Background: Internet news

Internet news articles are distributed by a "flooding" method. Every computer passes a new message on to all its neighbours who don't have it already. The articles can, like hot gossip, start anywhere, and will go everywhere. Each article is allocated, by its author, to one or more "news groups". A news group is one of a carefully maintained set of subject headings, a little like a library classification system. In this scheme, "comp.sys.mac" means "Computing: Specific computer systems: The Macintosh", and "soc.culture.indian" means "Society: Culture: Indian culture" (@@ real group?)
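The flooding rule is simple enough to simulate. The Python sketch below works on an invented five-machine network (the NETWORK table and the machine names are assumptions) and ignores timing, failures and everything else a real news system has to cope with. It only shows that "pass it to every neighbour who doesn't have it" is enough for an article to reach everyone exactly once.

```python
from collections import deque

# Invented network: each machine lists the neighbours it exchanges news with.
NETWORK = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}


def flood(neighbours, origin):
    """Return the order in which machines receive an article posted at origin."""
    have_it = {origin}
    order = [origin]
    waiting = deque([origin])  # machines which still have to pass it on
    while waiting:
        machine = waiting.popleft()
        for neighbour in neighbours[machine]:
            if neighbour not in have_it:  # only those who lack the article
                have_it.add(neighbour)
                order.append(neighbour)
                waiting.append(neighbour)
    return order


if __name__ == "__main__":
    print(flood(NETWORK, "a"))
    print(flood(NETWORK, "e"))
```

Whichever machine the article starts on, every machine ends up with one copy, and no central computer has had to direct the traffic.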

[pic]

Example: A news article as hypertext.

WWW Gateways

As I mentioned above, the idea of virtual hypertext is that a computer program can provide a hypertext view of any other world of data which may not have been originally designed with W3 in mind at all.

The program which makes this happen is called a gateway, and the way gateways work is discussed in Chapter 5. Essentially, a gateway combines a W3 server with a program which accesses the data in whatever form it is stored.

[pic]

Fig n. A Gateway combines a W3 server and a database access program.

The easy thing about making a gateway for existing data is that the W3 server has already been written, and probably, so has a suitable data access program. All that is needed is a little "glue" programming to allow the server to invoke the data access program, and for the results to be translated into hypertext form. This is done without changing the way the data is stored or managed, so the cost is very small. Often the most difficult thing to change would be the way the data is collected, prepared and maintained, and these software and social systems can be left quite undisturbed.
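To give a feel for how little glue is involved, here is a sketch in Python. Everything in it is an assumption made for illustration: the phone book table and its rows are invented, an in-memory SQLite database stands in for whatever existing database you have, and the address layout /people/... is hypothetical. The server half of the gateway is left out, since that part has already been written.

```python
import html
import sqlite3
from urllib.parse import quote


def open_phone_book():
    """Stand-in for an existing database; the table and rows are invented."""
    database = sqlite3.connect(":memory:")
    database.execute("CREATE TABLE people (name TEXT, phone TEXT)")
    database.executemany(
        "INSERT INTO people VALUES (?, ?)",
        [("Ada", "1234"), ("Alan", "5678"), ("Grace", "9012")],
    )
    return database


def lookup(database, prefix):
    """The data access program: an ordinary query, unaware of hypertext."""
    cursor = database.execute(
        "SELECT name, phone FROM people WHERE name LIKE ? ORDER BY name",
        (prefix + "%",),
    )
    return cursor.fetchall()


def as_hypertext(rows):
    """The glue: translate query results into a hypertext list."""
    items = [
        '<li><a href="/people/%s">%s</a>: %s</li>'
        % (quote(name), html.escape(name), html.escape(phone))
        for name, phone in rows
    ]
    return "<ul>\n%s\n</ul>" % "\n".join(items)


if __name__ == "__main__":
    print(as_hypertext(lookup(open_phone_book(), "A")))
```

The lookup function knows nothing about the web, and the database is untouched. Only as_hypertext is new, which is the sense in which the cost is small.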

[@@@ To be continued. Other examples of gatewayed hypertext with screen dumps].

Summary

We have looked at W3 as a hyperspace of information, as a collection of client and server software, and as a set of protocols which allow clients and servers to talk to each other. The "web" is an abstract world which relies on the physical "net" for its existence, but one can explore the web without having to know anything about the net. The web is a seamless world populated with hypertext, multimedia, and things which we can query. Some of the hypertext is created carefully by hand, some is generated on the fly by a computer, but whatever the origin of the data, it is presented in the same simple way.

[Missing from this ch: Information hiding in servers -- black box (television analogy): reasons for. Justifies virtual idea.]