Tim Berners-Lee on eGovernment
video for EU Ministerial eGovernment Conference 2007

Background

Tim Berners-Lee recorded a video keynote for the 4th EU Ministerial Conference that took place in Lisbon on 19-21 September 2007. The video length is 15 minutes.

The video is encoded in Ogg Theora format. If you have problems watching it, we have some help available. The conference organizers also made the video available on YouTube. There is also a set of accompanying slides that José M. Alonso used as a summary of the video content to complement the keynote at the conference.

Transcript

Ladies and gentlemen, let me add my welcome to the conference. I'm sorry I can't be with you in person. I have a few minutes now of thoughts for you at this eGovernment workshop, to talk about the things that are most important for the next phase of eGovernment and for the interactions between governments, and between governments and people, intermediated by the World Wide Web.

I'm gonna talk about making data available, about openness, and also a little bit about transparency, in the sense of governments making clear to people how they're using data appropriately. There are some slides that are on the Web, and they're maybe in your handouts. I'll go through them very rapidly; if it weren't for the slides, I could talk to you for 45 minutes, and we don't have enough time for that.

So let me talk first about openness: why openness, and what it means for a government. An important thing to remember when you're being open: firstly, I think, government departments have an obligation to make data available to the public at large, to NGOs, to other governments. Unless there are really good reasons for keeping that information confidential, it should be put out. Now we have standards for putting data on the Web and standards for putting documents on the Web, and I think both documents and data should be put out there using standards.

Why standards? Well, it's fair, for a range of reasons. If you put them in a proprietary format, then some people will be able to read them and some won't. You're also forcing people to buy particular software, which I don't think is the role of government. But also, very importantly, when you put things out using standards, when you put information out there in HTML, when you put data out there in RDF, then you can be more sure that the archives will be readable by posterity. People studying from other countries, and other times, will be able to understand the information.

So the idea of openness is to maximize reuse. I'd just like to reflect a little on what reuse of information means. If you think about the Web, the value that the Web adds to information is unexpected reuse. When something is put on the Web, maybe it's put there because one person asked for it, but it's often reused by other people in ways unimagined by the person who first asked for the information. Similarly, when you're looking for things on the Web, you find things which you never expected to find. That is the power. We do that at the moment with documents and we should do it also with data.

Who can reuse this? When you put information from your government department on the Web, it may be reused by the public, clearly; by your colleagues as well, since it may be that there are people within the same department who actually haven't found access to your data otherwise; but also, very importantly, by other agencies. We don't want government departments to be stovepipes of information; we want the information to be used by people in other agencies, both in your country and in other countries. Very often you can only get a real view of what is happening in the World by combining data across many different software application fields and many different countries.

It's also obviously important for companies you're doing business with and for people that you award grants to (you should be able to put out the data on what grants are available and so on), and for your partners up and down the supply chain in general. It's also important for research because as you, for example, put out the data about unmended roads in your county or whatever it is, somebody somewhere else can be correlating how that changes with time and with other factors, and learning from it. Research is always interested in taking a different look at this data.

It can be used by executive management at the very high level, the Ministerial level. When you want to make a decision, sometimes in a hurry, it's very important that a Minister can ask a question which needs a view of data across, perhaps, many agencies, and get it answered. If the data has all been provided in a standard format, one can rapidly perform that query, get the result back, produce a graph, whatever it is, and grasp the essence of the situation quickly. It's particularly interesting and important to think ahead about data availability for emergencies. When an emergency occurs you don't know, by definition, what information you're gonna need. When planes hit the twin towers, nobody had had that problem before; they needed access to lots of different information to combine together, and it would have been very much easier if that information had been available in a standard format. So, if you like, it's the art of planning for the unexpected.

It's very important when you do this to use the standards. The current standards, the Semantic Web standards, which are RDF, OWL and SPARQL (the query language), are different in a few fundamental technical ways, and I'm now trying to get across the essence of one of these, which impacts the way that government departments work together. In the past, before the Semantic Web standards, you had to choose. If someone was putting some information out there on the Web or serializing it in some form, they had to choose whether to use an ISO standard, whether to use a national standard, or whether to use a local standard that had been produced perhaps by a town or been invented for a particular project. That is a difficult decision to make, because ISO standards are very hard to make: they take a long time, because you have to get a lot of people to agree, and they typically exist only for a few concepts. Meanwhile, local standards, like terms that define, so to say, holes in the road that need to be fixed, may only be defined by a local town, so they're not so reusable.

The Semantic Web allows you to send data or put data on the Web using a mixture of terms. So when something involves the time or a date or latitudes and longitudes, then you can use terms which will be recognized by software in many different applications across the World. When you use terms like the category of pothole in a road, then it's maybe something that is local only to a particular area, but when information about a particular change goes out, it will have mixed data. Some of the details of the pothole data may be only understandable by the local town, but some other things, like the fact that the event happened at a particular time, date or place, would be understandable by anybody; anybody will be able to put it on a map or a timescale. In between there can be national standards. So in fact, it turns out that when you send data across the net, the data is sent in a mixture, and the Semantic Web technology allows it to go out in a mixture of languages, if you like, so every line in the form goes out written in a different language, some languages very well known, some less well known. That is the sort of magic of the Semantic Web technology: it allows you to get around that problem, if you like, without having to make one big choice of whether you have to use a given standard or not. It allows you to use a mixture, and it puts a constant pressure on the development of terms that are more widely shared. Understanding that, I think, is important, because you have to be able to push back on people who say that "it's too difficult" or "there's no standard" or "we don't want to use standards technology because we don't have terms, there aren't terms out there for everything we need", and you can push back and say "well, use the terms that are out there and exist, and when they don't exist, don't use them."
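
As a rough illustration of this mixture of terms, the Python sketch below represents one pothole report as subject-predicate-object triples drawn from several vocabularies. It is not real RDF tooling; every URI, property name and value in it is a hypothetical placeholder, not something taken from the talk. It only shows how a consumer that knows the widely shared terms can still use part of a record whose local terms it does not understand.

```python
# Hypothetical vocabulary prefixes; none of these URIs come from the talk.
SHARED_GEO = "http://example.org/shared/geo#"    # stands in for a widely shared vocabulary
SHARED_TIME = "http://example.org/shared/time#"  # stands in for a widely shared vocabulary
NATIONAL = "http://example.org/national/roads#"  # stands in for a national standard
LOCAL = "http://example.org/town/potholes#"      # stands in for one town's own terms

REPORT = "http://example.org/town/report/17"     # hypothetical identifier for one report

# One record as (subject, predicate, object) triples, mixing all four vocabularies.
triples = [
    (REPORT, SHARED_GEO + "lat", "38.7223"),
    (REPORT, SHARED_GEO + "long", "-9.1393"),
    (REPORT, SHARED_TIME + "date", "2007-09-19"),
    (REPORT, NATIONAL + "roadClass", "municipal"),
    (REPORT, LOCAL + "potholeCategory", "C3"),
]


def understood(statements, known_prefixes):
    """Keep only statements whose predicate belongs to a vocabulary the consumer knows."""
    return [t for t in statements if t[1].startswith(tuple(known_prefixes))]


# A mapping tool elsewhere in the world knows only the shared vocabularies.
mapping_tool_view = understood(triples, [SHARED_GEO, SHARED_TIME])

# The town's own software knows every vocabulary used in the record.
town_view = understood(triples, [SHARED_GEO, SHARED_TIME, NATIONAL, LOCAL])
```

Here the mapping tool keeps the position and date statements and ignores the rest, while the town's software sees the whole record.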

So the Semantic Web, in a sense, is a technology which allows the harmony of common languages and the diversity of too many languages to be in a better balance. When you put data on the Semantic Web, in fact you will be giving identifiers, URIs, things starting with HTTP, to all kinds of things: to portals, to people, to government departments, to people who have a public face, certainly to roles, to organizations, to projects. If a thing is useful for a government, you can give an identifier to it, a URI; within a given department, give it that URI. A very important concept is that of linked data: when data published by one department refers to things which are under the control of another department, it will use that department's URIs for them. So somebody who is picking up the information from department A will be able to automatically pull in the relevant backing information about an object which is defined by department B. We call that linked data, and linked data is starting to take off now on the Semantic Web.
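
The idea of following one department's URIs into another department's data can be sketched in Python as follows. This is a toy model under stated assumptions: the "Web" is simulated by a dictionary rather than real HTTP requests, and the departments, URIs and property names are all invented for illustration.

```python
# Simulated "Web": maps a URI to the statements published at that URI.
# All URIs, departments and property names below are hypothetical.
WEB = {
    "http://dept-a.example.org/project/42": {
        "label": "Bridge repair programme",
        "managedBy": "http://dept-b.example.org/office/7",  # URI owned by department B
    },
    "http://dept-b.example.org/office/7": {
        "label": "Regional Infrastructure Office",
        "locatedIn": "http://dept-b.example.org/region/north",
    },
    "http://dept-b.example.org/region/north": {
        "label": "Northern Region",
    },
}


def dereference(uri):
    """Stand-in for an HTTP GET on the URI; returns the published statements, if any."""
    return WEB.get(uri, {})


def follow_links(start_uri, max_depth=2):
    """Collect descriptions reachable from start_uri by following URI-valued properties."""
    seen = {}
    frontier = [(start_uri, 0)]
    while frontier:
        uri, depth = frontier.pop()
        if uri in seen:
            continue
        description = dereference(uri)
        seen[uri] = description
        if depth < max_depth:
            for value in description.values():
                # Treat any value that looks like an HTTP URI as a link to follow.
                if isinstance(value, str) and value.startswith("http://"):
                    frontier.append((value, depth + 1))
    return seen


# Starting from department A's project, pull in department B's backing information.
gathered = follow_links("http://dept-a.example.org/project/42")
```

Because department A reuses department B's identifier instead of inventing its own, a consumer starting at the project can reach the office and region descriptions without any prior arrangement between the two departments.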

So basically the rules about putting data on the Web are very similar to the rules about putting information on the Web in general. Use standards. Use URIs to identify things. Now we're going to put out information about objects and projects, things we want to be able to process, which people across the World may want to pull into their processing systems, pull into spreadsheets, put onto maps and so on. Use URIs: the same old rules, in fact, that we had for the Web.

So I've talked briefly about the importance of openness. Perhaps some of you reacted when I said "oh, use a URI for a person". Well, to some extent, if someone has a public role it's reasonable for them to have an identifier so people can find things that they've said, things that they've written, how to contact them. Obviously a few people muttered "oh, wait a moment, we're identifying people, what about privacy?". Privacy, of course, is very important, and is one of the areas where we have to be careful because of misuse of information. There are lots of types of misuse of information. It might be breaking copyright; it may be misusing information I picked up as a personal listener. If I put something on my iPod and I use it here to entertain you all on the public address system, in fact I'd be breaking the rules, because I got it for my private enjoyment and I'm not supposed to use it for entertaining a hall full of people. There are ways in which we get information for one particular use, and in fact we are constrained by society, by laws, by ethics, by regulations in the way we use it. There are perhaps many, many such things, not only what we naturally think of as privacy; as a category, think of it as appropriate use.

One way of trying to prevent people misusing data is to prevent them getting it but, of course, in the iPod example I need that data for my personal listening. I have to use my own discretion, understand that I have a constraint, and not then use it to power a concert for several thousand people. So access control wouldn't work for that, and it turns out that the same holds for many things in government as well. Government agencies have access to all kinds of data which they've got maybe for the purposes of counter-terrorism or crime prevention, and they have access to it for that purpose and not for other purposes, or maybe for pursuing some particularly bad sort of crime but not for pursuing people who forgot to renew their library books. So what's more important and much more practical than trying to limit access to the most secret and very sensitive information is, in general, for information around governments, I think, building systems which track where the data came from, tracking what we call the provenance of the data. The provenance of the data is its source but, more importantly, it is the things associated with the source and with how I got it: in effect, how I can use it. That may be licensing information, for example whether it's released under a Creative Commons license, as is common on the net; I can put something out there and state that it can be used for any non-commercial activity as long as you associate my name with it, for example. But provenance is also associated with the trustworthiness of the information, so it's very important for me, and for anybody I pass the data to, to be able to explain how I got it. Then, if an important decision is eventually going to be made based on analysis of that data, somebody can go back and check and make sure it's based on appropriately sound sources.

Of course, often the data is combined, so maybe I got some data from one source but then I combined it with something much more sensitive. So we have to be careful that our systems track whether the data has been polluted: I took it from public data, but I added to it and processed it using data that was available to me for a very specific purpose.
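
One possible way to picture provenance tracking is the Python sketch below. It is only an assumption about how such a system might be shaped, not a description of any existing one: the `Dataset` fields, the purpose names and the sample records are all hypothetical. Combined data inherits every source and keeps only the uses permitted by all of its inputs, and every use is recorded for later review rather than blocked.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """A value together with its provenance; the field names are illustrative assumptions."""
    value: dict
    sources: frozenset       # where the data came from
    allowed_uses: frozenset  # purposes the data may be used for


def combine(a, b, value):
    """Derived data inherits every source and only the uses allowed by both inputs."""
    return Dataset(
        value=value,
        sources=a.sources | b.sources,
        allowed_uses=a.allowed_uses & b.allowed_uses,
    )


audit_log = []


def record_use(dataset, purpose, who):
    """Log who used the data, for what, and whether that use was appropriate."""
    appropriate = purpose in dataset.allowed_uses
    audit_log.append((who, purpose, sorted(dataset.sources), appropriate))
    return appropriate


# Hypothetical public data, usable for several purposes.
public_roads = Dataset(
    value={"road": "N1", "potholes": 12},
    sources=frozenset({"public-roads-register"}),
    allowed_uses=frozenset({"research", "planning", "crime-prevention"}),
)

# Hypothetical sensitive data, obtained for one specific purpose only.
sensitive = Dataset(
    value={"incident": "under investigation"},
    sources=frozenset({"police-records"}),
    allowed_uses=frozenset({"crime-prevention"}),
)

# Combining the two "pollutes" the public data with the stricter constraint.
merged = combine(public_roads, sensitive, {**public_roads.value, **sensitive.value})

ok = record_use(merged, "crime-prevention", who="investigator")
misuse = record_use(merged, "research", who="analyst")
```

In this sketch the second call is not prevented; it is recorded as inappropriate, so that someone can later go back through `audit_log` and hold the user accountable.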

So I feel that we should build systems that are aware of the provenance of data and track the acceptable uses, and that is much more important than trying to do the typical security thing of blocking people's access to it. In general, people will have access to all kinds of data, and they have to be responsible and accountable for how they've used it, so we must build systems which allow them to show that it's been used in the right way.

So in conclusion, let me say that governments should be open and they should use standards. This does not mean changing how existing systems work; it means just attaching standards-compliant pieces to existing systems. And when we build these systems, which will be so powerful, we should be very careful and make sure they always use data in an appropriate fashion. Thank you very much for your attention.