Tim Berners-Lee
First written 2001/10/25; last change: 2001/10/29 17:55:05
Status: W3C Semantic Web Activity background. Editing status: published, not perfect.

This is more of a white paper than a technical note. It addresses that oft-asked question, "So how is the semantic web going to affect me?"

Business Model for the Semantic Web

Enterprise Application Integration and other stories

Why, when XML gives us interoperability, do we need the Semantic Web?

Pre and Post Web

Consider the state of documentation systems in 1989. This was when the Internet was starting to become internationally established, but retrieving documents from remote systems was an expert's game. Suppose you needed to transfer information from one system to another. On a lucky day, you might have a PC which had network access to both systems. However, the remote systems used completely different protocols. You might telnet to one system and have to learn the library access program before you could search its database. Having done that, you would have to copy the information into your clipboard (or onto the back of an envelope) and connect somehow -- maybe by 3270 terminal emulation -- to another machine to access its documentation control system. If you were lucky, you could find your way to the right place in the new system, and then paste in (or retype) the information.

That was before the Web. Now, miraculously, we have the Web. Both systems now have Web servers, and although they are in fact still running on different machines and the same weird programs, the Web interfaces make them seem part of the same smooth, consistent world of information. For the documents in our lives, everything is simple and smooth.

The Personal Information disaster

But what about the data in our lives? There are lots of ways in which our machines can use data when they can understand it. When my calendar understands dates, it can warn me when an appointment is coming up. When my Global Positioning System device understands latitude and longitude, it can show me the way to where I should be. When my address book understands that something is a phone number or an email address, it can set up communication with a person with a click.

But consider the reality of how we use these things. Suppose you are browsing the web and you come across a web page about a meeting you want to go to. It has on it the time and place, details of other documents, and of other people involved in organizing and attending the meeting. You decide to attend the meeting. At this point, you would like your calendar to have an entry at the right date and time, with hypertext links to the details. You would like your in-car navigation system, at that date and time, to be programmed with the coordinates of the location. You would like your Rolodex to seem to contain, until the meeting is over, the contact info for the people involved. You'd like to do all this with one click.

What you in fact have to do is laboriously cut and paste details into your calendar, finding the date and time yourself. You have to copy the contact details by hand from the web page into your address book, manually sorting out the address lines and phone numbers. And if you use a GPS system, you may have to manually fiddle with the buttons on it to set up a way-point at the coordinates of the meeting. This is just the same as the documentation system before the web. For data, we are still pre-Web.

Enterprise Application Integration

If this is bad for personal data -- and it is evident once you think about it -- consider the impact for a company. The same situation exists when you look at trying to connect the various data-handling applications on which your company depends. There is a certain overlap between the stock control system and the accounting system which, if the connection were made, would save a lot of re-keying and associated errors. You hire a consultant to write the glue code to suck the data out of the stock control system, reformat it, and blow it into the accounting system. The same thing happens when you realize that your customer relationship management system could be set up with data from the order control system -- in fact it is crazy that it isn't. And so on. If you have N applications which run your company, there are of order N² ways in which you may want to connect them together. Which all adds up to a lot of custom programming by a lot of consultants.

Of course, the good news is that if all the applications use XML, the consultant only has to learn to handle XML data, not the full range of weird internal formats in which data used to be stored and transferred. This means that some of the application glue can be constructed using XML tools such as XSLT, the transformation language. The bad news is that the problem is still an N² problem. For every pair of applications, in fact for each way in which they need to be linked, someone has to create an XML-to-XML bridge.

If you take XML files from two different applications, you can't just merge them. If you want to take an (XML) query on an XML document and add in some constraints from another document, you can't just merge the two queries. It's not as though everything is in a relational database which can be joined together.

Relational data strikes again

All you need to do is move up a thin layer of interoperability. Just as database systems suddenly became compatible by adopting a consistent relational model, so your unstructured data can also adopt a relational model, and get all the benefits you need to solve these problems.

The relational language for data on the Net is called RDF. When information from two sources in RDF needs to be merged, you basically concatenate the files into one big file. When you want to extend a query on an RDF file to include constraints from another, you just write them in.
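A minimal sketch of that merge-and-extend idea, assuming each RDF statement is modeled as a plain (subject, property, value) tuple instead of being parsed from a real RDF serialization. The identifiers, property names, values, and the `match` helper are all hypothetical, made up for illustration; real RDF tools offer much richer query languages.

```python
# Each statement is a (subject, property, value) triple.
# All identifiers and values here are hypothetical examples.
stock = {
    ("urn:example:item42", "partName", "widget"),
    ("urn:example:item42", "unitsInStock", 130),
    ("urn:example:item7", "unitsInStock", 4),
}
accounts = {
    ("urn:example:item42", "partName", "widget"),
    ("urn:example:item42", "unitCost", 2.5),
}

# Merging two sources is a set union: the analogue of concatenating files.
merged = stock | accounts


def match(statements, patterns, binding=None):
    """Return every variable binding that satisfies all patterns.

    A pattern is a triple; strings starting with "?" are variables.
    """
    binding = binding or {}
    if not patterns:
        return [binding]
    first, rest = patterns[0], patterns[1:]
    results = []
    for statement in statements:
        candidate = dict(binding)
        for term, value in zip(first, statement):
            if isinstance(term, str) and term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.extend(match(statements, rest, candidate))
    return results


# A query written against the stock data alone...
stock_query = [("?item", "unitsInStock", "?units")]
# ...is extended with a constraint from the accounting data by appending it.
combined_query = stock_query + [("?item", "unitCost", "?cost")]

results = match(merged, combined_query)
print(results)
```

In this toy model the extended query is just a longer list of patterns, and it only returns items that both sources know something about.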

RDF information can, like XML, have more or less structure. Where XML is made up of elements and attributes -- which tells you only about how things are written into the file -- RDF data is made up of statements, where each statement expresses the value of one property of something -- the exact equivalent of one cell in a database table. All the relational database ideas work -- joins and views, for example, can be written easily using common tools.
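The cell-to-statement correspondence can be sketched as follows, again assuming statements are plain (subject, property, value) tuples. The table contents, column names, and the helper functions `rows_to_statements` and `view` are hypothetical, and the sketch ignores properties with more than one value.

```python
def rows_to_statements(table, key):
    """Turn each non-key cell of each row into one statement."""
    statements = set()
    for row in table:
        subject = row[key]
        for column, value in row.items():
            if column != key:
                statements.add((subject, column, value))
    return statements


def view(statements, columns):
    """Rebuild table-like rows for subjects that have every requested column."""
    by_subject = {}
    for subject, prop, value in statements:
        by_subject.setdefault(subject, {})[prop] = value
    return [
        {"id": subject, **{c: props[c] for c in columns}}
        for subject, props in sorted(by_subject.items())
        if all(c in props for c in columns)
    ]


# Two hypothetical tables from two different applications.
customers = [
    {"id": "cust1", "name": "Ada"},
    {"id": "cust2", "name": "Bob"},
]
orders = [
    {"id": "cust1", "lastOrder": "2001-10-25"},
]

statements = rows_to_statements(customers, "id") | rows_to_statements(orders, "id")

# Asking for columns from both tables acts as a join on the shared subject.
joined = view(statements, ["name", "lastOrder"])
print(joined)
```

Because both tables describe the same subject identifier, the join needs no bridge code specific to either application.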

What happens now to your enterprise application integration problem? The information from each application is output in, or converted into, RDF. Any query can run over any selection of this data. Filters can be written very simply, as can converters to extract and calculate the data you need. Then, this is input to the applications which need it. Basically, the problem is linear in the size of your system. Just as new web servers can be fitted into the web without disturbing the rest, so new RDF applications supply and use information without upsetting the rest of the system. The N² problem has gone.
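To make the scaling argument concrete, here is a small worked count. It assumes, purely for illustration, that in the pairwise case every application may need a one-way bridge to every other, and that in the RDF case each application needs one exporter and one importer; the function names are hypothetical.

```python
def pairwise_bridges(n):
    """One-way bridges if every application may feed every other: n * (n - 1)."""
    return n * (n - 1)


def rdf_converters(n):
    """Converters if each application only exports to and imports from RDF: 2 * n."""
    return 2 * n


for n in (3, 10, 30):
    print(n, pairwise_bridges(n), rdf_converters(n))
# 3 6 6
# 10 90 20
# 30 870 60
```

Under these assumptions, adding a thirty-first application means up to 60 new pairwise bridges, but only two new converters.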

Is this rocket science? Well, not really. The Semantic Web, like the World Wide Web, is just taking well-established ideas, and making them work interoperably over the Internet. This is done with standards, which is what the World Wide Web Consortium is all about. We are not inventing relational models for data, or query systems or rule-based systems. We are just webizing them. We are just allowing them to work together in a decentralized system -- without a human having to custom handcraft every connection.

We're working toward the time when you can click on the web page for the meeting, and your computer, knowing that it is indeed a form of appointment, will pick up all the right information, and understand it enough to send it to all the right applications.

Tim BL