status: draft for internal discussion
A number of governments are interested in publishing data on the Web. The public policy objectives vary, from enabling greater transparency and delivering more efficient public services to encouraging greater commercial exploitation of public sector information, but the practicalities are the same.
Some easy steps, but only starting points
The quickest and easiest way to make data available on the Web is to create a Web site and hang datasets off it, such as raw XML files or, more likely, CSV files. Most governments will naturally start here, and this "Web site as fileserver" approach is the obvious place to begin. However, it is akin to publishing documents on the Web without any hyperlinks in them. The Web is more than a big global fileserver.
You can also consider enriching your existing (X)HTML resources with semantics, metadata, and identifiers. This will at least allow third parties to extract your content, and its meaning, much more easily until you are ready to deploy better Semantic Web resources and services. If you want third parties to extract data from your Web content, you should state this explicitly, along with any terms under which it may happen.
The real power comes when you put your data on the Web
The ability to identify things by URI is what makes the Web the Web. If you want to do "open government data", check out Architecture of the World Wide Web, Volume One. It says, "global naming leads to global network effects". For data, this means that both datasets and useful fragments of data need to have URIs. This is really important. If you give URIs to your data, and to bits of your data, people will be able to use it more easily across the Web. Check out Cool URIs for the Semantic Web for some good advice about URIs.
@@add pointer to The Self-Describing Web (TAG Finding)?@@
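As a rough illustration of giving URIs to both datasets and bits of data, the sketch below mints one URI per dataset and one per record. The base URI `http://data.example.org/`, the `dataset/` path pattern, and the function names are all assumptions made for this example, not a recommended scheme; Cool URIs for the Semantic Web is the place to look for real guidance.

```python
from urllib.parse import quote

# Hypothetical base URI; a real publisher would use a domain it controls.
BASE = "http://data.example.org/"


def dataset_uri(dataset_id: str) -> str:
    """Return a URI naming a whole dataset."""
    # Percent-encode the identifier so it is safe as a single path segment.
    return f"{BASE}dataset/{quote(dataset_id, safe='')}"


def record_uri(dataset_id: str, record_id: str) -> str:
    """Return a URI naming one record (a useful fragment) within a dataset."""
    return f"{dataset_uri(dataset_id)}/{quote(record_id, safe='')}"
```

Calling `record_uri("school-budgets", "2009/school 42")` under these assumptions gives a single stable name for that record, which other people can then link to from their own data.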
There are four expectations for the linked data Web: that you use URIs as names for things; that you use HTTP URIs so that people can look up those names; that when someone looks up a URI, you provide useful information; and that you include links to other URIs, so that they can discover more things. Our best advice to governments is to try to do just this, and to do so using the open Web standards provided by W3C.
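To make "looking up a name" concrete, here is a minimal sketch of how a consumer might prepare an HTTP request for a data URI, asking for RDF in Turtle through the Accept header. The URI in the usage note is hypothetical, and whether a given server honours content negotiation in this way is an assumption; the code only builds the request and does not send it.

```python
from urllib.request import Request


def build_lookup(uri: str, media_type: str = "text/turtle") -> Request:
    """Prepare an HTTP GET that asks for a data representation of a URI."""
    # The Accept header tells the server which format the client prefers.
    return Request(uri, headers={"Accept": media_type}, method="GET")
```

Passing the result to `urllib.request.urlopen`, for example with `build_lookup("http://data.example.org/dataset/school-budgets")`, would perform the lookup. A publisher meeting the expectations above would answer with useful information that itself contains links to further URIs.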
@@Maybe add here XML+XLink+XQuery@@
If you really want to help people discover and explore the data you are publishing, there are some useful W3C standards which can help. SPARQL is the W3C query language for RDF data on the Web. Making your data available via a SPARQL endpoint is a very powerful approach: it lets people explore your data and write their own queries. With SPARQL you really will have joined the Web of data.
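As a sketch of what querying an endpoint can look like from the consumer's side, the example below builds a SPARQL Protocol GET request, where the query text travels in the `query` parameter. The endpoint address `http://data.example.org/sparql` is hypothetical, and the query itself (list ten things that have an `rdfs:label`) is only an illustration; the code builds the request without sending it.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint address, used only for illustration.
ENDPOINT = "http://data.example.org/sparql"

# List ten things and their labels; rdfs:label is a standard RDF Schema property.
QUERY = """\
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?thing ?label
WHERE { ?thing rdfs:label ?label }
LIMIT 10
"""


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL Protocol GET request for the given query."""
    # urlencode percent-encodes the query so it is safe inside a URL.
    url = f"{endpoint}?{urlencode({'query': query})}"
    # Ask for the standard JSON serialisation of SELECT results.
    return Request(url, headers={"Accept": "application/sparql-results+json"})
```

Sending `build_sparql_request(ENDPOINT, QUERY)` with `urllib.request.urlopen` would return the results from a live endpoint. Because the whole query is carried in the URL, a useful query can also be shared as a link.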
Choosing what to publish as data on the Web
@@as much structured information as possible!@@
@@criteria for prioritising: e.g. survey potential consumers, start with the most ready data@@
Choosing formats for putting data on the Web
There are lots of different formats for data on the Web. RDF and XML were designed for putting data on the Web, and are good formats to use. RDF in particular allows you to link data together, to really exploit the potential of the Web's linking architecture. There are lots of ways of publishing RDF data and of turning data from databases or other formats, like CSV or Excel, into RDF. XML is an excellent format for information interchange and also for some types of data. If you have an existing Web site, and would like to make your XHTML information (say a table of figures) available as reusable data, RDFa and GRDDL are two good approaches to use. While not a W3C standard, JSON is also a popular format amongst data mashers and you may wish to use that as well.
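To give a feel for turning CSV into RDF, here is a deliberately naive sketch that emits N-Triples: each row becomes a subject URI, each column a predicate URI, and each cell a plain literal. The two namespaces, the function names, and the one-triple-per-cell mapping are assumptions made for illustration. A real conversion would normally reuse existing vocabularies, give numbers and dates proper datatypes, and turn some cells into links to other URIs.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical namespaces for illustration only.
RECORD_BASE = "http://data.example.org/dataset/school-budgets/"
VOCAB = "http://data.example.org/def/"


def escape_literal(value: str) -> str:
    """Escape characters that may not appear raw inside an N-Triples literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def csv_to_ntriples(csv_text: str, id_column: str) -> list:
    """Turn each CSV cell into one triple: row URI, column URI, cell value."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = RECORD_BASE + quote(row[id_column], safe="")
        for column, value in row.items():
            if column == id_column or not value:
                continue  # the id is already in the subject URI; skip empty cells
            predicate = VOCAB + quote(column, safe="")
            triples.append(f'<{subject}> <{predicate}> "{escape_literal(value)}" .')
    return triples
```

For a file with the header `school,budget` and the row `42,1000`, `csv_to_ntriples(text, "school")` yields one triple saying that record 42 has a budget of "1000". Because the subject is a URI, other datasets can now point at that record.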
@@establish a preference for (open) standards (including complete toolset), interoperability, flexibility, expressiveness (e.g. ability to add metadata), addressability@@
@@also important to consider what suits the data@@
@@explore pros and cons and suitable cases for Atom, POX, GeoRSS, KML, RDFa/microformats, POSH, HTML data tables, CSV, JSON, RDF, more@@
@@if building a new application, consider using one of these standard formats or tools as the primary datasource or query engine respectively@@
We're all learning
There is lots of learning going on at the moment about how governments can best put data on the Web. The W3C technologies and approaches described here can be implemented quickly and often at relatively little cost. If this is all new to you, there are a number of communities that can help. If you are a government department looking for support with this, think about joining W3C and becoming involved in the e-Government activity. There you will find people from other governments trying to do similar things and sharing their experiences. Even if you don't join W3C, please let us know how you are getting on. We can help.