Transcript of Tim Berners-Lee's talk to the LCS 35th Anniversary celebrations, Cambridge, Massachusetts, 1999/April/14.

[edited for comprehensibility]

TIM BERNERS-LEE: It's a great pleasure to be addressing you on this 35th anniversary. Of course, it's a 35th anniversary of LCS, and it's also the 35th anniversary of the Web, if you count in Web years. [Laughter.]

I will say a little bit about where I'm coming from, what the original idea was, because I don't want to talk about the future as a prediction. I don't give predictions. That I leave to Bob. It's dangerous: you end up eating your articles, and so I will stick to talking about what I would like to see partly because when there are a bunch of people from LCS in the audience, the next thing you find is somebody has come around to your office, knocked on the door, and said that, by the way, they've done it.

When I'm talking about what I would like to see, you know, it hasn't changed very much in ten years. So if I talk about where I'm coming from, what I wanted to see then, that's two-thirds of my hopes for the future. I'll give a little bit of history of how the World Wide Web Consortium came to LCS, and then I'll talk a bit about the Web and about an interesting distinction between what we used to call documents, and what we used to call data.

The basic idea of the Web is that of an information space through which people can communicate, but communicate in a special way: communicate by sharing their knowledge in a pool. The idea was not just that it should be a big browsing medium. The idea was that everybody would be putting their ideas in, as well as taking them out. This is not supposed to be a glorified television channel. Also everybody should be excited about the power to actually create hypertext. Writing hypertext is good fun, and being with a group of people writing hypertext and trying to work something out by making links is a different way of working. I hoped that it would soon be a way of working at, for example, the European Particle Physics Laboratory at Geneva, Switzerland, where I was at the time. I'd hoped it would be a way for us to much more efficiently use people who came and went, use student work, use people working remotely. And leave a trail, not a paper trail, but a trail in hyperspace.

So I had hoped that the Web would be a tool for us, understanding each other and working together efficiently on larger scales. Getting over the problem which befalls the organization that was so fun when it was a start-up of six people (many of you will know about this phenomenon). When you get to 60 people it is still great fun, and you're still rollerblading in the parking lot. And then when you get to 61 people, you worry that you don't know that person's name, and the difficulties of scaling the organization set in.

There's a second half to the dream really, and I must admit that originally I was a little bit careful about expressing this. But the second half is the hope that when we've got all of our organization communicating together through this medium which is accessible to machines, to computer programs, that there will be some cool computer programs which we could write to analyze that stuff: to figure out how the organization really runs; and what is its real structure, never mind the structure we have given it; and all kinds of things like that. And to do that, of course, the information on the Web would have to be understandable to some extent by a machine and at the moment it's not.

Here is a very basic history overview. I originally wrote a proposal. That's the piece of paper which I dropped into the time capsule, for those of you who were at the party. I wrote the proposal in 1989 and tried to explain that I thought the global hypertext would be a great idea. Now, the world has been full of people writing these proposals ever since Vannevar Bush started in 1945. His was published in the Atlantic Monthly, and still nobody developed a global hypertext system. And then Doug Engelbart actually showed people how to do it two decades later, and still it didn't happen because he just didn't happen to be in the right place at the right time. But I was.

I was in the right place in that the European particle physics community was full of people with machines on their desks, now just about starting to be internetworked: connected to the Internet as opposed to all sorts of proprietary networks. And I was at a place where my boss Mike Sendall and his boss David Williams, who is sitting down here, were prepared to not say no. They let me go ahead and do it, "even though we can't actually justify it." That happened actually in 1990 when I bought one of those new NeXT machines, which was a great programming environment in lots of ways. I could actually put together a hypertext editor (browser/editor; it was the same thing, it was modeless) pretty quickly. And then in the summer of '91 we actually released the code, put it up on an FTP server and drew people's attention to the first Web site and the first Web client and started to try to push this. It was still very difficult, you know, to explain how exciting global hypertext is if you only have a couple of Web pages. That may seem silly now and obvious, but it's very difficult to show the excitement in one Web page.

The excitement of a hypertext link is that it can point to anything out there. When there's nothing out there then that is just difficult to demonstrate. So for several years it was a question of first trying to justify my existence (in fact I wasn't working on anything else) and that of the other people who had got onto the team one way or another. They sort of slipped through, working in different places, working across the world collaborating over the Internet. And persuading people to put out Web browsers was tricky. It involved doing all kinds of sneaky things, suggesting that they needed a Web browser for a very specific application so that they would get it, and then they would just increase the number of clients out there which would increase the incentive for somebody to put up a server and vice-versa. And eventually the thing started snowballing.

Now, in 1992 it was clear that it was taking off. It still wasn't clear that it would, for example, ever take over from the Internet Gopher, which was another system expanding exponentially on the Internet. But people were already starting to come into my office. Alan Kotok from Digital came with three colleagues, unannounced. Now, people don't generally drop in on Geneva unannounced, particularly Americans. We found a conference room quickly and he explained that they were starting to investigate what Digital should do, how Digital should address this "Internet" and the World Wide Web. "We're concerned about stability and we understand that it all hinges on some specifications which you have stored on a disk somewhere...". They wondered how stable they were and how we would get to ensure their continued stability and their evolution.

I started talking with them and other people about what sort of a body we needed to make sure that the Web would evolve into something we could use, now that it was becoming a serious thing. They were very adamant, like everyone else, that there should be some neutral forum where people could meet. I started shopping around. I looked at a number of different possibilities: setting it up as a company; joining a large company and setting up base there; setting it up at some other institution. I traveled around a bit and talked to a lot of people and there was one place which came up with checks in all the boxes. In fact it was on a bus going from a conference dinner in Newcastle in northern England on one rainy night to a small hotel that I sat next to David Gifford from LCS, who listened to the story politely and said I should mail this Michael D something (mld@hq.lcs.mit.edu) and he might be interested.

I did, and next thing Michael dropped in in Zurich, and from then on I discovered that not only could I sell him the idea of setting up a base in the U.S. but I could sell him on the idea of setting it up as an international thing. He was just as enthusiastic as me about that. So that's the story of how the Web Consortium came to LCS. And the rest is more or less history and acronyms, and I won't go into the acronyms in case you are frightened about them. But basically things have been happening.

The fundamental thing about the space, about this Web, as I said, is that anything can refer to anything. Otherwise it's no fun. You've got to be able to make the link to anything. It's no good asking people to put things on the Web, saying that anything of importance should have this "URL", if you then request anything else of them. To make such an audacious request you have to then release everything else. So that requires that the Web has completely minimalist design. We don't impose anything else. It has to be independent of anything. The great challenge, really the raison d'etre initially for getting the Web protocols out, was to be independent of hardware platform: to be able to see the stuff on the mainframe from your PC and to be able to see the stuff on the PC from the Mac. To get across those boundaries was at the time so huge and strange and unbelievable. And if we don't do things right it will be huge and strange and unbelievable again: we could go back down that route very easily.

It was important that it should be independent of software. The World Wide Web originally was a client program called "World Wide Web". I eventually renamed the program because I didn't want the World Wide Web to be one program. It's very important that any program that can talk the World Wide Web protocols (HTTP, HTML, ...) can provide equivalent access to the information.

It's very important to be independent of the way you actually happen to access this information. We're using a rather large screen here but it works just as well on this small screen. It should also work if you need to have these read to you, because maybe you're visually impaired or maybe you're driving along. 20 percent of the people who have access to the Web have some sort of impairment; maybe they can see the screen fine but they can't use a mouse. So it's very important that we separate the content from the way we're presenting it. This slide is just an HTML file, but it has a style sheet that says it needs to be big and it should be white on blue according to the guidelines.

It's important that the Web should be independent of language and culture, and I could now talk for two hours just about that. In the Consortium, just as we have a Web Accessibility Initiative addressing the question of accessibility, we have an activity which looks specifically at internationalization. But when you add culture, then you're talking about a whole lot more than just using Unicode and just making sure that you can make the letters go up and down the page instead of across the page.

It's important that the Web should be independent of quality of information. I don't want it to be somewhere where you would publish technical reports only after you had finished. If you can link to anything I want this to be part of the process. So the review of the technical report and the scribbling of the original note which led to the idea that became the project which resulted in the technical report should all be there and they should all be linked together. So it's very important that you should be able to instantly go in there and edit. (Now actually I'm very sorry that this is not my machine so I'm not using my editor. Otherwise I would be able to just go into this slide and put the cursor in the middle and edit the slide.) At the same time, when I use the word "quality," it's important to remember that the idea of quality is completely subjective. So the Web shouldn't have in it any particular built-in notion of what quality means at all.

There are one, two, three, four, five, six dimensions I have mentioned along which documents on the Web can vary. Throughout all the history and through the future evolution it's been very important to maintain this invariance with all the fancy new ideas that came in. Every now and again we get a new suggestion that flagrantly violates one of these areas, and we have to find ways to turn it around and express it in a way which does not.

The last dimension of independence is an interesting one. There's a difference between documents and data. This division that David Williams used to lead originally was called "Documents and Data." There was a feeling around the organization that it was a very funny old name, and it should be renamed as "Computing and Networking," and now it's probably being renamed as "Information Technology," or "Information Systems". But at one point it was Documents and Data. And perhaps that was the silliest name of all, but perhaps it was the most insightful. Because on the Web you find "documents", which are the sorts of things people read and write, and you find "data" out there, which is the sorts of things machines read and write. And that distinction is interesting. And it's important that the Web should allow everything on that spectrum as well; that we should have things which are very specifically aimed at people, calligraphy and poetry. At the same time we should have hard data which is processable very efficiently, and logic which can be analyzed by a machine. And things in between. A lot of the Web is sort of things in between. When you hit a Web page which has stock prices on it, there is data on there. You're looking for data. When you look for the weather you're looking for data but it comes in this sort of dressed up fashion with a nice pink flashing border and a few ads at the top in a way that's designed to appeal to you and entice you to buy things.

So you could think of it, if you like, as three layers: at the top, there is the presentation layer. For this slide it's defined by a style sheet. And in the middle there's content, a funny word which seems to be popular on the Web nowadays. This is the HTML code, which says that this thing (which in fact the style sheet has turned yellow) is a first-level heading, and this thing is an unordered list. And then underneath: there isn't a lot on this page I would say would be data. There's metadata at the top which gives the relationship between this slide and the other slides. But the data are the things like the stock prices and who actually wrote this and when it was created, and what we think the weather is going to be like tomorrow in Boston and things like that.

I'm going to contrast these two sides a little bit. Because when we're looking at the way forward and also when we're assessing how far we've got, those are the two benchmarks.

How well are we doing? Are we doing human communication through shared knowledge? Let's look through the document side. On this side the languages are natural languages. They're people talking to people. So the languages are ones you just can't analyze very well. And this is the big problem on the net for a lot of people; it is the problem for my mother and your mother and our kids. They go out to search engines and they ask a question and the search engine gives these stupid answers. It has read a large proportion of the pages on the entire Web (which is of course amazing) but it doesn't understand any of them, and it tries to answer the question on that basis. Obviously you get pretty unpredictable results. However, the wonderful thing is that when people communicate in this way, this kind of fuzzy way, people can solve problems intuitively. When people browse across the Web and see something expressed in natural language, they think, "Aha!" and suddenly solve a totally unrelated problem due to the incredible ability that the human brain has to spot a pattern totally out of context by a huge amount of parallel processing.

It's very important that we use this human intuitive ability because everything else we can automate, but we're not very good at automatically doing that. I wanted the Web to be what I call an interactive space where everybody can edit. And I started saying "interactive," and then I read in the media that the Web was great because it was "interactive," meaning you could click. This was not what I meant by interactivity, so I started calling it "intercreativity". (I don't generally believe in making up words to solve problems, so I'm sorry about this one.) What I mean is being creative with others. A few fundamental rules make this possible. As you can read, so you should be able (given the authority) to write. If you can see pictures on your screen, why can't you take pictures and very easily and intuitively put them up there? You feel that you know how to use the Web? Somebody yesterday asked me, "What's the problem? The Web is so intuitive. Hasn't it solved that problem?" I asked,
"Do you take digital photographs?"
"Yes"
"So how long does it take you to get them on a Web page so the rest of the family can see them?"
"Oh, I wouldn't know how to do that."

We're certainly not there. At the moment I certainly cannot put the cursor in the middle of this slide and correct a spelling mistake. So in fact there's a huge amount we have to do. One of the reasons this is difficult is that it's actually hard. The research community produced group editors which would allow you to edit documents and share a document while two people are working at the same time. We know how to do that; we the academic community. But I don't have it here now. I can't edit this so that somebody watching this on a broadcast can see the edit at the same time.

So one of the reasons is that it's actually hard to get the software working seriously, as a product. It also needs a whole lot of infrastructure. We need a lot more stability. We need people to learn to stop changing URLs, so links don't break. That's just a question often of hygiene and making an organizational commitment, when you put something on the Web, to keeping it there. But also, underneath, we need digital signature. We need digital signature so that when you share things with your colleagues you know that you're sharing it with your colleagues and you're not sharing it with just anybody, any hacker who happened to turn up on that strip of Ethernet. So if you ask me what is the most important thing for us to do over the next 35 years, I would hope that in the next five to ten years we can fix this. We can fix this so that you can use the Web intuitively as the way that you express an "aha!", a thought, the moment that you think of something. And I can fix this slide the moment I realize it's got garbage on the bottom.

Now a look on the other side. The other side is very different. Data has very well-defined meaning. So typically a huge number of Web pages are generated from databases. The people who produce the databases may, when they started it with a little spreadsheet, have had a vague idea of what the columns meant, but by now have a very good idea of what the columns mean. The database expresses well-defined relationships between things in the columns. When you hit a weather server to pick up the temperature in Massachusetts, in fact the person behind it knows that this is the temperature in degrees Centigrade measured at seven o'clock in the morning at Logan Airport using this little thermometer four feet above the ground by that little bench that you see on the television. So there is well-defined data and there are well-defined things you do with it. When you write a digital check a fairly well-defined thing has got to happen. And when you look at your bank statement after having written the check and the check having even been cashed, there's got to be a very simple logical relationship between those things. You don't generally send pieces of poetry, which should give the bank a feel for the amount of money to pay to the payee.

At the moment there's a very strange phenomenon going on. The data is being exported as Web pages. There are programs which want to process that data, which want to, for example, analyze the stock prices, which want to look at all the bookstores and find out where you can get that book cheapest and then present you with a comparative shopping list, and there are lots of such Web sites out there. If you're not using one, do: you could save yourself some money. What's happening is that they are often going out to a Web site which may or may not be cooperative: it may just be putting that information on the Web. Sometimes the Web sites that they are scraping for data would not cooperate if asked to. But the data is out there; it's available. And so you have one program which is turning it from data into documents, and another program which is taking the document and trying to figure out where in that mass of glowing flashing things is the price of the book. It picks it out from the third row of the second column of the third table in the page. And then when something changes suddenly you get the ISBN number instead of the price of a book and you have a problem. This process is called "screen scraping," and is clearly ridiculous, and the fact that everybody is doing it shows to me that there is a very very clear demand for actually shipping the data as data. So that if somebody wants to do an SQL query, if somebody wants to query an object out here, they don't have to go through this whole simulation of a very simple query in order to actually get at the data.
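
The fragility of screen scraping can be sketched in a few lines of Python. This is only an illustration: the page markup, the book title, the ISBN, the price, and the `CellCollector` and `scrape_price` names are all invented for the example and do not model any real bookstore site.

```python
import json
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collects the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td":
            self._in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data.strip()


def scrape_price(html):
    """Screen scraping: assume the price is the 2nd cell of the 1st row."""
    parser = CellCollector()
    parser.feed(html)
    return parser.rows[0][1]


# Hypothetical page layouts (made-up book, ISBN, and price).
old_page = "<table><tr><td>Example Book</td><td>19.95</td></tr></table>"
# The site adds an ISBN column; nothing tells the scraper the layout moved.
new_page = ("<table><tr><td>Example Book</td><td>0000000000</td>"
            "<td>19.95</td></tr></table>")

scraped_old = scrape_price(old_page)   # the price
scraped_new = scrape_price(new_page)   # silently becomes the ISBN

# Shipping the data as data: the field is named, so layout cannot move it.
record = json.loads(
    '{"title": "Example Book", "isbn": "0000000000", "price": "19.95"}'
)
price_from_data = record["price"]
```

The scraper depends on position, so a harmless layout change turns the "price" into an ISBN without any error being raised; the named field does not have that failure mode.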

The idea of "the semantic Web" is the side of the Web where data has meaning. What's meaning? I'm not suggesting that you should program your computer to understand the meaning of life right now. I am using meaning in the sense that either there is a program which knows somehow how to pay a check and therefore can just process a check, or somebody has to find a relationship between what the documents, the checks, call price and what this catalog calls price. So there has been a link made between the meaning of one column and the meaning of another. So meaning in general on the semantic Web is defined relatively. Just like in a dictionary.

Don't panic. I'm not becoming relativist about my morals. I'm just pointing out that all definitions that we use at the moment are relative to other definitions and so on just as in a dictionary. One of the things which we are doing now is we are moving to a state when all documents will be self-defining, self-describing. So at the top of a document which uses all kinds of tags like price and shoe size there will be a URL of the document that defines exactly what shoe size means in this context. This will remove the ambiguity which happened when we extended HTML and started putting cool things like tables into HTML. People who were around in those days will remember how the word spread that it would be really nice to have tables in HTML: you couldn't put a table in a Web page before that. But everybody started doing it at once and when anyone started a table they marked it up in the HTML code with "<TABLE>". So when you read "<TABLE>" you had no idea what sort of markup was coming in. And that lasted until we organized a global meeting of all the people involved to agree on it.

Now, we can't have a global meeting to decide about it every time somebody wants to think of a new idea, a new term, a new column in a database. We have to let people invent new terms all the time as they do anyway, but just make sure there's no ambiguity. Also we have to allow people to combine more than one vocabulary in the document. We don't just want to make something which works; we want to make something which can evolve. This is very important from the point of view of the World Wide Web Consortium cutting itself out of the loop as much as possible.
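
One existing mechanism that fits this description is XML namespaces, where each term is qualified by the URL of the vocabulary that defines it. The sketch below uses Python's standard `xml.etree.ElementTree`; the two vocabulary URLs and the sample document are hypothetical, invented for the example, and this is one possible illustration rather than the specific design the talk has in mind.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary URLs, invented for this example.
SHOES = "http://example.org/vocab/shoes#"
FILES = "http://example.org/vocab/files#"

# One document mixing two vocabularies that both define a term "size".
doc = (
    f'<order xmlns:shoe="{SHOES}" xmlns:file="{FILES}">'
    "<shoe:size>42</shoe:size>"
    "<file:size>1024</file:size>"
    "</order>"
)

root = ET.fromstring(doc)

# ElementTree expands each prefix into "{vocabulary-URL}localname",
# so the two "size" terms can never be confused with each other.
shoe_size = root.find(f"{{{SHOES}}}size").text
file_size = root.find(f"{{{FILES}}}size").text
tags = [child.tag for child in root]
```

Nobody had to hold a meeting to agree on what "size" means: each author picks a URL for their own vocabulary, and the two terms coexist in one document without ambiguity.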

We have 320 members, various types (companies, organizations, individuals) all coming together to discuss global status, and we can't do that when you want to invent languages for pharmaceuticals, languages for whatever your favorite new database application may be. What we need to be able to do is to be able to send documents around which use standard vocabulary, and add extensions in a well-defined way; which mix in the extensions, so that somebody who understands the standards but doesn't understand an extension can figure out whether this is a problem. And in the case that the data is in fact just informational data on the side, can process the rest. This in fact allows us to move from using one vocabulary to another vocabulary.

This partial understanding sounds like a failure. But in fact partial understanding is what allows us to actually function in the world. If you think of an invoice, if you send an invoice from one company to another, when it's paid, the person who allows that to be paid and sends the check off, checks various fields on that invoice and checks that it's been authorized by an appropriate person. They check the amount, but when they look at the parts they don't have to understand exactly what a "lower left-hand engine bearing cover bolt bracket" is, because that part of the document is in fact completely ignorable for purposes of paying the invoice. A huge amount of information, stuff we read, everything that runs our business, is like that. There are documents going around in which different people understand different parts. And that is how we can extend the language. And that is how we can evolve the whole of society that uses this language. If we're going to be moving to the semantic Web we have to be able to do that.
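
The invoice example can be sketched as a small Python function. Everything here is an assumption made for illustration: the field names, the `can_pay` function, and the idea of passing a list of `mandatory` extensions are invented, and simply show one way a reader of a document could decide whether an unknown part is ignorable or a problem.

```python
# Fields the payer understands; names are invented for this example.
KNOWN_FIELDS = {"payee", "amount", "authorized_by"}


def can_pay(invoice, mandatory=()):
    """Decide whether an invoice can be paid with partial understanding.

    Returns (ok, ignored). `ignored` lists the fields not understood.
    `ok` is False if a required field is missing, or if a field the
    sender marked as mandatory is one the payer does not understand.
    """
    unknown = set(invoice) - KNOWN_FIELDS
    if unknown & set(mandatory):
        return False, sorted(unknown)
    missing = KNOWN_FIELDS - set(invoice)
    return (not missing), sorted(unknown)


invoice = {
    "payee": "Example Engines Ltd",
    "amount": 120,
    "authorized_by": "A. Manager",
    "part": "lower left-hand engine bearing cover bolt bracket",
}

# The part description is informational: it is ignored and payment proceeds.
ok_plain, ignored_plain = can_pay(invoice)

# If the sender declares "part" must be understood, the payer has a problem.
ok_strict, _ = can_pay(invoice, mandatory=["part"])
```

The same document is processed successfully by someone who understands only part of it, unless the extension is flagged as one that must be understood.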

We've talked a lot during this fest about digital signature. And, of course, digital signature, if we were only allowed to do it, would be fundamental to this. And it will be fundamental to this. We have, in fact, directly following this on Thursday and Friday, at the Consortium, a workshop about signing XML, the basic language for data, with digital signatures. Digital signature on top of the semantic Web turns it into a Web of trust in which a computer can not only reason and make deductions, using not only the logic of it, but also the model of trust. I could also talk to you about this for six hours, but I won't.

Let's look at what happens as we scale these things up. Remember on the human side that the Web was difficult to sell not only because looking at two hypertext pages wasn't sufficient to make people very excited, but also because there was a certain fear that the Web would break structures. A lot of people I spoke to initially wanted the Web to be hierarchical because they wanted the hierarchical feeling of control. Or they decided the best documentation system for them was a matrix. In fact the Web broke out of the box and allowed you to express a hierarchy or a matrix equally well, but it allowed you to express other things, too, which was a little bit frightening. It's been a dramatic change for the individual. I am, of course, very interested in whether it can be a dramatic change for society. And I've got a feeling that I could talk for two hours about most of these points.

A really exciting thing would be if we could scale that ability to make intuitive leaps. I've always wanted to be able to do this with a group of very bright, very enthusiastic people really interested in specific overlapping areas, say LCS, or all the people who are trying to find a cure for AIDS, or whatever. A typical thing a researcher tries to do is to get as much into his or her head at once and then hope that the solution forms, the penny drops, that connection is made, and they can write it down before they go to sleep. How can you get a group of people to do the same thing? Maybe if we can use the Web as a very low bandwidth ineffective small set of neural connections which connect the people. Imagine that one person surfing the Web can leave a trail. In other words, if somebody, as they're surfing the Web and they notice an interesting association and connection, can represent that with a link, then another person surfing the Web on another topic may find that link and use it and as a result bring a new communal path a little bit further on. And so the group as a whole after a while will be able to make that "Aha!". That's something I would find very exciting.

On the other side, promoting the machine communication is running across all the same hopes and fears as promoting the human communication. There are the same problems: when suggesting this to somebody, it's very difficult to explain what happens if, instead of just putting a database on the Web, you put it on in a way that everything has a URL and it's part of a Web, so that all the databases are linked together and there are links between the meaning of this column and the meaning of that one. Well, that's not very exciting when I just describe it as, you know, the last name in this is the same as the last name in this. But imagine that all the last name columns in all the databases on the Web were all directly or indirectly linked together by links. Then effectively you'd be able to join together any databases that talk about the last name of a person. You'd be able to query the whole Web as if all the data on the Web were one huge database. Which would be very very powerful, and I'm glad we talked about privacy yesterday. So the same rules have to apply. Anything can refer to anything. Wherever there was an identifier in your data language suddenly you have to be able to use a URI, and there's a certain amount of resistance to that. Because people want to maintain the fact that the systems are predictable. They don't want the language to become too expressive, because computer science is all about (this is perhaps a little unfair) the art of designing languages which are sufficiently constraining so that you can only write solvable problems in them. If you look at a particular query or you look at the language of writing what you can ask an ATM to do it's very simple, because an ATM can only do a few things. But when you link together all the data you end up with a representation of the world, and the world is a very complex place, and you need an arbitrarily expressive language for expressing that.
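
The idea of joining databases through links between column meanings can be sketched as follows. All of the data, the two URIs, and the `join` and `local_name` helpers are hypothetical and invented for this example; the sketch handles only a direct link between two columns, not the indirect chains of links the talk also mentions.

```python
# Two independent "databases", each with its own column names (invented data).
staff = [
    {"surname": "Smith", "office": "Room 1"},
    {"surname": "Jones", "office": "Room 2"},
]
library = [
    {"family_name": "Smith", "book": "Hypertext Basics"},
    {"family_name": "Brown", "book": "Weather Data"},
]

# Each column is identified by a URI rather than by a bare local name.
STAFF_NAME = "http://example.org/staff#surname"
LIB_NAME = "http://example.org/library#family_name"

# A link asserting that the two columns mean the same thing.
same_meaning = {(STAFF_NAME, LIB_NAME)}


def local_name(uri):
    """Return the column name after the '#' in a column URI."""
    return uri.rsplit("#", 1)[1]


def join(left, left_col, right, right_col):
    """Join two tables, but only if a link says the columns share a meaning."""
    linked = ((left_col, right_col) in same_meaning
              or (right_col, left_col) in same_meaning)
    if not linked:
        raise ValueError("no link between these columns")
    lk, rk = local_name(left_col), local_name(right_col)
    return [{**l, **r} for l in left for r in right if l[lk] == r[rk]]


joined = join(staff, STAFF_NAME, library, LIB_NAME)
```

Neither database had to adopt the other's column name; the single link between the two URIs is what makes the join meaningful.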

We end up with this tension between that and systems which we will be producing which will be predictable, like checks. We will have to constrain the checks so that you can only put an integer in there. You cannot put an expression, say that this is "pay the bearer on demand the smallest number expressible in two distinct ways as the sum of two cubes", or something which Ron will cook up such that you can only calculate it in 35 years. People want that check to terminate. They want the payment to happen in a finite time. They're very worried when we suggest that the underlying structure for this will be very expressive. But in fact, when you put all these systems together, the result will be that all the independent machines (Michael's bulldozers) taken together will be a huge very very complex map of the world.
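
To make the point about expressions concrete, here is a small Python search for the amount on that hypothetical check, assuming "cubes" means cubes of positive whole numbers. The function name is invented for the example. This particular expression happens to be cheap to evaluate; the worry in the talk is about expressions for which no such bound is known in advance.

```python
from itertools import count


def smallest_two_way_cube_sum():
    """Search upward for the first n equal to a**3 + b**3 in two ways."""
    for n in count(2):
        ways = []
        a = 1
        while 2 * a ** 3 <= n:        # keep a <= b so each pair counts once
            rest = n - a ** 3
            b = round(rest ** (1 / 3))
            if b ** 3 == rest:        # exact integer check, no float error
                ways.append((a, b))
            a += 1
        if len(ways) >= 2:
            return n, ways


amount, ways = smallest_two_way_cube_sum()
# amount is 1729, with ways [(1, 12), (9, 10)]
```

A bank processing a plain integer does no search at all, which is why a constrained check format is predictable in a way an arbitrary expression is not.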

I used to say that the Web would mimic the world. In fact, it ends up being the world to a certain extent. So there will be heuristics; we will not have to use heuristics, don't panic, in order to pay checks. But it will be a very exciting place to explore algorithms which break what we call the closed world assumption of the people who try to export things in boxes without any breathing holes. Of course, the really exciting thing happens when we mix the two worlds. I don't know that we can solve any serious problems unless we do. I'm not asking for the machines to join the human world with artificial intelligence. I'm happy for other people to ask for that. But I'm just saying that we as humans have gone already to the trouble of putting data into databases, putting our schedules, our appointments into a schedule database; we've already in other cases done that; it's in a very well-defined form. Let's not lose that information. Let's not lose that semantics. Let's use it. Let's digitally sign it. Let's allow machines to start operating on it. And with this mixture of predictable mechanisms and heuristics I think it should be very exciting. For me the fundamental Web is the Web of people. It's not the Web of machines talking to each other; it's not the network of machines talking to each other. It's not the Web of documents. Remember when machines talked to each other over some protocol, two machines are talking on behalf of two people. The Consortium has a whole technical domain "Technology and Society" which recognizes that, at the end of the day, if we're not doing something for the Web of People, then we're really not doing something useful at all.

Originally it was social need that drove me to make the Web in the first place. In the future one of the exciting things is finding what I call social machines. We know about working groups and we know about social voting structures and we know about all sorts of social systems, and a lot of people are very excited about what sort of new social systems we can make on the Web, which maybe can be run by little machines; things that you can log onto and become part of and progress, just as we progress documents along standardization tracks, as we endorse things. We can invent new forms which maybe will allow us to exploit the fact that we don't have geographical boundaries anymore. I'm very interested in a more fractal, less hierarchical structure arising in society, allowing us to operate using the web of trust. Perhaps we can, now that we've got machines that can help us find out individually where we best fit, how we can weave ourselves into the Web to contribute best to society. Maybe we can continue another very small step along that path that we started when we stopped (some of us, most of the time) using violence to settle or to decide things, and moved on to using money, or in some cases stopped using money and started actually thinking about what other people were feeling and trying to do, and sharing their goals. Maybe we can find new systems based on peer respect, in which we work together and appreciate that we are all in fact trying to go in the same direction. To me that would be very exciting and make the whole thing worthwhile. Thank you very much for your attention.

[Applause.]