Realising the Full Potential of the Web

Tim Berners-Lee, Director of the World-Wide Web Consortium

Based on a talk presented at the W3C meeting, London, 1997/12/3

 

Abstract

The first phase of the Web is human communication through shared knowledge. We have a lot of work to do before we have an intuitive space in which we can put down our thoughts and build our understanding of what we want to do and how and why we will do it. The second side to the Web, yet to emerge, is that of machine-understandable information. As this happens, the day-to-day mechanisms of trade and bureaucracy will be handled by agents, leaving humans to provide the inspiration and the intuition. This will come about through the implementation of a series of projects addressing data formats and languages for the Web, and digital signatures.

 

The original dream

The Web was designed to be a universal space of information, so when you make a bookmark or a hypertext link, you should be able to make that link to absolutely any piece of information that can be accessed using networks. The universality is essential to the Web: it loses its power if there are certain types of things to which you can’t link.

There are a lot of sides to that universality. You should be able to make links to a hastily jotted crazy idea and to link to a beautifully produced work of art. You should be able to link to a very personal page and to something available to the whole planet. There will be information on the Web which has a clearly defined meaning and can be analysed and traced by computer programs; there will be information, such as poetry and art, which requires the full human intellect for an understanding which will always be subjective.

And what was the purpose of all this? The first goal was to work together better. While the use of the Web across all scales is essential to the concept, the original driving force was collaboration at home and at work. The idea was that, by building a hypertext Web together, a group of whatever size would force itself to use a common vocabulary, to overcome its misunderstandings, and at any time to have a running model - in the Web - of their plans and reasons.

For me, the forerunner to the Web was a program called ‘Enquire’, which I made for my own purposes. I wrote it in 1980, when I was working at the European Particle Physics Lab (CERN), to keep track of the complex web of relationships between people, programs, machines and ideas. In 1989, when I proposed the Web, it was as an extension of that personal tool to a common information space.

When we make decisions in meetings, how often are the reasons for those decisions (which we so carefully elaborated in the meeting) then just typed up, filed as minutes and essentially lost? How often do we pay for this, in time spent passing on half-understandings verbally, duplicating effort through ignorance and reversing good decisions from misunderstanding? How much lack of co-operation can be traced to an inability to understand where another party is ‘coming from’? The Web was designed as an instrument to prevent misunderstandings.

For this to work, it had to be not only easy to ‘browse’, but also easy to express oneself. In a world of people and information, the people and information should be in some kind of equilibrium. Anything in the Web can be quickly learned by a person and any knowledge you see as being missing from the Web can be quickly added. The Web should be a medium for communication between people: communication through shared knowledge. For this to work, the computers, networks, operating systems and commands have to become invisible, and leave us with as direct and intuitive an interface to the information as possible.

Re-enter machines

There was a second goal for the Web, which is dependent on the first. The second part of the dream was that, if you can imagine a project (company, whatever) which uses the Web in its work, then there will be a map, in cyberspace, of all the dependencies and relationships which define how the project is going. This raises the exciting possibility of letting programs run over this material, and help us analyse and manage what we are doing. The computer re-enters the scene visibly as a software agent, doing anything it can to help us deal with the bulk of data, to take over the tedium of anything that can be reduced to a rational process, and to manage the scale of our human systems.

Where are we now?

The Web you see as a glorified television channel today is just one part of the plan. Although the Web was driven initially by the group work need, it is not surprising that the most rapid growth was in public information. Web publishing, where a few write and many read, profited most from the snowball effect of exponentially rising numbers of readers and writers. Now, with the invention of the term ‘intranet’, Web use is coming back into organisations. (In fact, it never left. There have always been, since 1991, many internal servers, but as they were generally invisible from outside the companies’ firewalls they didn't get much press!) However, the intuitive editing interfaces which make authoring a natural part of daily life are still maturing. I thought that in 12 months we would have generally available intuitive hypertext editors. (I have stuck to that and am still saying the same thing today!)

It is not just the lack of simple editors that has prevented use of the Web as a collaborative medium. For a group of people to use the Web in practice, they need reliable access control, so that they know their ideas will only be seen by those they trust. They also need access control and archival tools that, like browsing, don't require one to get into the details of computer operating systems.

There is also a limit to what we can do by ourselves with information, without the help of machines. A familiar complaint of the newcomer to the Web, who has not learned to follow links only from reliable sources, is about the mass of junk out there. Search engines flounder in the mass of undifferentiated documents that range vastly in terms of quality, timeliness and relevance. We need information about information, ‘metadata’, to help us organise it.

As it turns out, many of these long-term needs will, with luck, be met by technology which, for one reason or another, is being developed by the technical community and agreed upon by groups such as the World-Wide Web Consortium (W3C), in response to various medium-term demands.

 

The World Wide Web Consortium - W3C

The Consortium exists as a place for those companies for whom the Web is essential to meet and agree on the common underpinnings that will allow everyone to go forward. (There are currently over 230 member organisations.)

Whether developing software, hardware, networks, information for sale, or using the Web as a crucial part of their business life, these companies are driven by current emerging areas such as Web publishing, intranet use, electronic commerce, and Web-based education and training. From these fields medium-term needs arise and, where appropriate, the Consortium starts an Activity to help reach a consensus on computer protocols for that area. Protocols are the rules that allow computers to talk together about a given topic. When the industry agrees on protocols, then a new application can spread across the world, and new programs can all work together as they all speak the same language. This is key to the development of the Web.

 

Where is the Web Going Next?

Avoiding the World Wide Wait

You've heard about it, you may have experienced it, but can anything be done about it?

One reason for the slow response you may get from a dial-up Internet account simply follows from the ‘all you can eat’ pricing policy. The only thing which keeps the number of Internet users down is unacceptable response, so if we were to suddenly make it faster, there would almost immediately be more users until it was slow again. I've seen it: when we speeded up an overloaded server by a factor of five, it once again rose to 100% utilisation as the number of users increased by a factor of five.

Eventually, there will be different ways of paying for different levels of quality. But today there are some things we can do to make better use of the bandwidth we have, such as using compression and enabling many overlapping asynchronous requests. There is also the ability to guess ahead and push out what a user may want next, so that the user does not have to request and then wait. Taken to one extreme, this becomes subscription-based distribution, which works more like email or newsgroups.

One crazy thing is that the user has to decide whether to use mailing lists, newsgroups, or the Web to publish something. The best choice depends on the demand and the readership pattern, and a mistake can be costly. Today, it is not always easy for a person to anticipate the demand for a page. For example, the pictures of the Shoemaker-Levy 9 comet hitting Jupiter, taken on a mountain top and just put on the nearest Mac server, or the decision Judge Zobel put onto the Web - both generated so much demand that their servers were swamped, and in fact these items would have been better delivered as messages via newsgroups. It would be better if the ‘system’, the collaborating servers and clients together, could adapt to differing demands, and use pre-emptive or reactive retrieval as necessary.

 

Data about Data - Metadata

It is clear that there should be a common format for expressing information about information (called metadata), for the dozen or so fields that need it, including privacy information, endorsement labels, library catalogues, tools for structuring and organising Web data, distribution terms and annotation. The Consortium's Resource Description Framework (RDF) is designed to allow data from all these fields to be written in the same form, and therefore carried together and mixed.

That by itself will be quite exciting. Proxy caches, which make the Web more efficient, will be able to check that they are really acting in accordance with the publisher's wishes when it comes to redistributing material. A browser will be able to get an assurance, before imparting personal information in a Web form, on how that information will be used. People will be able, if the technology is matched by suitable tools, to endorse Web pages that they perceive to be of value. Search engines will be able to take such endorsements into account and give results that are perceived to be of much higher quality. So a common format for information about information will make the Web a whole lot better.
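
To make the idea concrete, here is a minimal sketch, not any official RDF syntax: the property names and addresses are invented for illustration. The point is simply that every piece of metadata, whatever field it comes from, can be reduced to statements of the form subject - property - value, and statements in that common form can be pooled and queried together.

    # Sketch only: each metadata statement is a (subject, property, value)
    # triple. The property names and addresses below are invented.
    endorsement_label = [
        ("http://example.org/report", "title", "Annual Report"),
        ("http://example.org/report", "rating", "suitable-for-all"),
    ]
    distribution_terms = [
        ("http://example.org/report", "may-be-cached", "yes"),
        ("http://example.org/report", "expires", "1998-01-01"),
    ]

    # Because the two sets share one form, a cache, a browser or a search
    # engine can pool them and ask questions across both at once.
    graph = endorsement_label + distribution_terms
    for subject, prop, value in graph:
        print(subject, prop, value)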

 

The Web of trust

In cases in which a high level of trust is needed for metadata, digitally signed metadata will allow the Web to include a ‘Web of trust’. The Web of trust will be a set of documents on the Web that are digitally signed with certain keys, and contain statements about those keys and about other documents. Like the Web itself, the Web of trust does not need to have a specific structure like a tree or a matrix. Statements of trust can be added exactly so as to reflect actual trust. People learn to trust through experience and through recommendation. We change our minds about who we trust for different purposes. The Web of trust must allow us to express this.

Hypertext was suitable for a global information system because it has this same flexibility: the power to represent any structure of the real world or an imagined one. Systems that force you to express information in trees or matrices are fine so long as they are used for describing trees or matrices. The moment you try to use one to hold information that does not fit the mould, you end up twisting the information to fit, and so misrepresenting the situation. Similarly, the W3C's role in creating the Web of trust will be to help the community have a common language for expressing trust. The Consortium will not seek a central or controlling role in the content of the Web.

 

‘Oh, yeah?’

So, signed metadata is the next step. When we have this, we will be able to ask the computer not just for information, but why we should believe it. Imagine an ‘Oh, yeah?’ button on your browser. There you are looking at a fantastic deal that can be yours just for the entry of a credit card number and the click of a button. "Oh, yeah?", you think. You press the ‘Oh, yeah?’ button. You are asking your browser why you should believe it. It, in turn, can challenge the server to provide some credentials: perhaps, a signature for the document or a list of documents that expresses what that key is good for. Those documents will be signed. Your browser rummages through these documents with the server, looking for a way to convince you that the page is trustworthy for a purchase. Maybe it will come up with an endorsement from a magazine, which in turn has been endorsed by a friend. Maybe it will come up with an endorsement by the seller's bank, which has in turn an endorsement from your bank. Maybe it won't find any reason for you to actually believe what you are reading at all.
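
As a very rough sketch of what might happen behind such a button: the names, endorsements and trust list below are all invented, and a real browser would be verifying digital signatures on each statement rather than consulting a simple list; only the chain-following logic is shown.

    # Invented endorsements of the form (endorser, endorsed).
    # In reality each statement would be a signed document.
    endorsements = [
        ("my-bank", "sellers-bank"),
        ("sellers-bank", "shop.example.com"),
        ("magazine-review", "shop.example.com"),
    ]
    trusted = {"my-bank", "my-friend"}   # parties the user already trusts

    def reason_to_believe(page, endorsements, trusted, seen=None):
        """Look for a chain of endorsements leading from a trusted party to the page."""
        seen = seen if seen is not None else set()
        for endorser, endorsed in endorsements:
            if endorsed == page and endorser not in seen:
                if endorser in trusted:
                    return True
                seen.add(endorser)
                if reason_to_believe(endorser, endorsements, trusted, seen):
                    return True
        return False

    print(reason_to_believe("shop.example.com", endorsements, trusted))  # True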

 

Data about things

All the information mentioned above is information about information. Perhaps the most important aspect of it is that it is machine-understandable data, and it may introduce a new phase of the Web in which much more data in general can be handled by computer programs in a meaningful way. All these ideas are just as relevant to information about the real world: about cars and people and stocks and shares and flights and food and rivers.

The Enquire program assumed that every page was about something. When you created a new page it made you say what sort of thing it was: a person, a piece of machinery, a group, a program, a concept, etc. Not only that, when you created a link between two nodes, it would prompt you to fill in the relationship between the two things or people. For example, the relationships were defined as ‘A is part of B’ or ‘A made B’. The idea was that if Enquire were to be used heavily, it could then automatically trace the dependencies within an organisation.
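
A minimal sketch of that idea, with node names and relationship types invented for illustration: because each link carries a relationship, a program can follow the links and trace dependencies without any human interpretation.

    # Enquire-style typed links: each entry records *how* A relates to B.
    links = [
        ("module-A", "is part of", "control-system"),
        ("Jane", "made", "module-A"),
        ("module-A", "uses", "library-X"),
    ]

    def trace(thing, links):
        """Follow typed links outward from `thing` and collect everything it depends on."""
        found, frontier = set(), [thing]
        while frontier:
            current = frontier.pop()
            for a, relation, b in links:
                if a == current and b not in found:
                    found.add(b)
                    frontier.append(b)
        return found

    print(trace("module-A", links))   # the things module-A depends on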

Unfortunately this was lost as the Web grew. Although the original specifications included link relationship types, the Web has not generally become a web of assertions about things or people. Can we still build a Web of well-defined information?

My initial attempts to suggest this fell on stony ground, and not surprisingly. HTML is a language for communicating a document for human consumption. SGML (and now XML) gives structure, but not semantics. Neither the application, nor the language, called for it.

With metadata we now have a need for a machine-understandable language, one with all the qualities we require. Technically, the same apparatus we are constructing in the Resource Description Framework for describing the properties of documents can be used equally well for describing anything else.

 

A crying need for RDF

Is there a real need for this metadata and is there a market in the medium term that will lead companies to develop in this direction? Well, in the medium term, we see the drivers already - web publishing, education and training, electronic commerce and intranets.

I have mentioned the virtuous circle that caused the Web to take off initially. The increasing amount of information on the Web was an incentive for people to get browsers, and the increasing number of browsers created more incentive for people to put up more Web sites. It had to start somewhere and it was bootstrapped by making ‘virtual hypertext’ servers. These servers typically had access to large databases - such as phone books, library catalogues and existing documentation management systems. They had simple programs which would generate Web pages ‘on the fly’ corresponding to various views and queries on the database. This has been a very powerful ‘bootstrap’ as there is now a healthy market for tools to allow one to map one's data from its existing database form onto the Web.

Now here is the curious thing. There is so much data available on Web pages that there is a market for tools that ‘reverse engineer’ that process. These are tools that read pages and, with a bit of human advice, recreate the database object. Even though it takes human effort to analyse the way different Web sites are offering their data, it is worth it. It is so powerful to have a common, well-defined interface to all the data so that you can program on top of it. So the need for a well-defined interface to Web data in the short term is undeniable.

What we propose is that, when a program goes out to a server looking for data, say a database record, the same data should be available in RDF, in such a way that the rows and columns are all labelled in a well-defined way. It may then be possible to look up the equivalence between field names at one Web site and at another, and so merge information intelligently from many sources. The clear need for this kind of metadata can be seen just from the trouble libraries have had with the number of very similar, but slightly different, ways of making up a catalogue card for a book.
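
As a rough sketch of that merging step, with field names, records and the equivalence table all invented for illustration: once each site labels its columns in a well-defined way, a small table of equivalences is enough to pool records from two sources.

    # Two sites expose the same kind of record under different field names.
    site_a_record = {"author": "J. Smith", "title": "Notes on Hypertext"}
    site_b_record = {"creator": "A. Jones", "name": "A Catalogue of Links"}

    # A (hypothetical) published equivalence mapping site B's fields to site A's.
    equivalences = {"creator": "author", "name": "title"}

    def normalise(record, equivalences):
        """Rename fields using the equivalence table so records can be merged."""
        return {equivalences.get(field, field): value
                for field, value in record.items()}

    catalogue = [site_a_record, normalise(site_b_record, equivalences)]
    for entry in catalogue:
        print(entry["author"], "-", entry["title"])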

 

Interactive Creativity

 

I want the Web to be much more creative than it is at the moment. I have even had to coin a new word - Intercreativity - which means building things together on the Web. I found that people thought that the Web already was ‘interactive’, because you get to click with a mouse and fill in forms! I have mentioned that better intuitive interfaces will be needed, but I don’t think they will be sufficient without better security.

It would be wrong to assume that digital signature will be mainly important for electronic commerce, as if security were only important where money is concerned. One of my key themes is the importance of the Web being used on all levels from the personal, through groups of all sizes, to the global population.

When you are working in a group, you do things you would not do outside the group. You share half-baked ideas and reveal sensitive information. You use a vernacular that will be understood; you can cut corners in language and formality. You do these things because you trust the people in the group, and trust that others won't suddenly have access to it. To date, on the Web, it has been difficult to manage such groups or to control access to information in an intuitive way.

 

Letting go

So, where will this get us? The Web fills with documents, each of which has pointers to help a computer understand it and relate it to terms it knows. Software agents acting on our behalf can reason about this data. They can ask for and validate proofs of the credibility of the data. They can negotiate as to who will have what access to what, and ensure that our personal wishes for privacy are met.

The world is a world of human beings, as it was before, but the power of our actions is again increased. The Web already increases the power of our writings, making them accessible to huge numbers of people and allowing us to draw on any part of the global information base by a simple hypertext link. Now we imagine the world of people with active machines forming part of the infrastructure. We only have to express a request for bids, or make a bid, and machines will turn a small profit matching the two. Search engines, instead of just looking for pages containing interesting words, will start to build indexes of assertions that might be useful for answering questions or finding justifications.

I think this will take a long time. I say this deliberately, because in the past I have underestimated how long something will take to become available (e.g. good editors in 12 months).

Now we will have to find how best to integrate our warm fuzzy right-brain selves into this clearly defined left-brain world. It is easy to know who we trust, but it might be difficult to explain that to a computer. After seeding the semantic Web with specific applications, we must be sure to generalise it, leaving it clean and simple so that the next generation can learn its logical concepts along with the alphabet.

If we can make something decentralised, out of control, and of great simplicity, we must be prepared to be astonished at whatever might grow out of that new medium.

It’s up to us

 

One thing is certain. The Web will have a profound effect on the markets and the cultures around the world: intelligent agents will either stabilise or destabilise markets; the demise of distance will either homogenise or polarise cultures; the ability to access the Web will be either a great divider or a great equaliser; the path will either lead to jealousy and hatred or peace and understanding.

The technology we are creating may influence some of these choices, but mostly it will leave them to us. It may expose the questions in a starker form than before and force us to state clearly where we stand.

We are forming cells within a global brain and we are excited that we might start to think collectively. What becomes of us still hangs crucially on how we think individually.