This wiki has been archived and is now read-only.


From W3C eGovernment Wiki

Data.gov.* Memo

status: draft for internal discussion

From: W3C eGovernment Interest Group
To: Any government wishing to set up data.gov.*

A number of governments are interested in publishing data on the Web. The public policy objectives for doing this vary, from enabling greater transparency and delivering more efficient public services to encouraging greater commercial exploitation of public sector information, but the practicalities are the same.

Some easy steps, but only starting points

The quickest and easiest way to make data available on the Web is to create a Web site and hang datasets off it, such as raw XML files or, more likely, CSV files. Most governments will naturally start with this "Web site as fileserver" approach. However, it is akin to publishing documents on the Web without any hyperlinks in them. The Web is more than a big global fileserver.

You can also consider enriching your existing (X)HTML resources with semantics, metadata, and identifiers. This will at least allow third parties to extract your content, and its meaning, much more easily until you are ready to deploy better Semantic Web resources and services. If you want third parties to extract from your Web content, you should state this explicitly, along with any terms under which it may happen.

The real power comes when you put your data on the Web


The ability to identify things by a URI is what makes the Web, the Web. If you want to do "open government data", check out the Architecture of the World Wide Web, Volume 1. It says, "global naming leads to global network effects". What this means for data is that both datasets and useful fragments of data need to have URIs. This is really important. If you give URIs to your data, and to bits of your data, people will be able to use it more easily across the Web. Check out Cool URIs for the Semantic Web for some good advice about URIs.
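As a minimal sketch of what giving URIs to datasets and to fragments of data could look like, the Python below mints one URI for a dataset and one for a record inside it. The base address `http://data.example.gov`, the `/dataset/` and `/record/` path segments, and the identifiers are all hypothetical; they illustrate one possible scheme, not a recommended one.

```python
from urllib.parse import quote

# Hypothetical base address; a real deployment would use its own stable domain.
BASE = "http://data.example.gov"


def dataset_uri(dataset_id: str) -> str:
    """Return a URI that names a whole dataset."""
    return f"{BASE}/dataset/{quote(dataset_id, safe='')}"


def record_uri(dataset_id: str, record_id: str) -> str:
    """Return a URI that names one record (a useful fragment) in a dataset."""
    return f"{dataset_uri(dataset_id)}/record/{quote(record_id, safe='')}"


print(dataset_uri("school-census-2009"))
print(record_uri("school-census-2009", "school 42"))
```

Percent-encoding the identifiers keeps the resulting URIs valid even when the source identifiers contain spaces or slashes.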

@@add pointer to the Self-Describing Web (TAG Finding)?@@


There are four expectations for the linked data Web: that you use URIs as names for things; that you use HTTP URIs so that people can look up those names; that when someone looks up a URI, you provide useful information; and that you include links to other URIs, so that they can discover more things. Our best advice to governments is to try to do just this, and to do so using the open Web standards provided by W3C.


@@Reference the points made in the last two slides of Michael Sperberg-McQueen's presentation?@@

Expose interfaces

@@Maybe add here XML+XLink+XQuery@@

If you really want to help people discover and explore the data you are publishing, there are some useful W3C standards which can help. SPARQL is the Web's query language for RDF data. Making your data available via a SPARQL endpoint is a very powerful approach: it lets people discover and explore your data by writing their own queries. With SPARQL you really will have joined the Web of data.
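To give a feel for what a third party might send to such an endpoint, here is a small Python sketch that builds (but does not send) a SPARQL query request. The endpoint address is hypothetical, and the query simply asks for ten things and their types; a real consumer would write queries against your actual vocabulary.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint address; substitute the one you actually deploy.
ENDPOINT = "http://data.example.gov/sparql"

# Ask for ten resources together with their types.
QUERY = """
SELECT ?thing ?type
WHERE { ?thing a ?type }
LIMIT 10
"""


def build_request(endpoint: str, query: str) -> Request:
    """Build an HTTP GET request carrying the query in the 'query' parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for results in the SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request(ENDPOINT, QUERY)
print(request.full_url)
# To run the query against a live endpoint: urllib.request.urlopen(request)
```

Because the query travels in an ordinary HTTP request, any tool that can fetch a URL can explore the data.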

@@also Google, Yahoo supporting RDFa, a step worth mentioning? --Oscar@@

Choosing what to publish as data on the Web

@@as much structured information as possible! --Hugh@@

@@criteria for prioritising: e.g. survey potential consumers, start with the most ready data?@@

@@just put it and many minds principle, so surveying initially not needed --Oscar, Rinke, Jose@@

@@Owen and Oscar suggested we need working examples (e.g. RDFa in the UK Civil Service)@@

A good place to start would be with your mission, vision, value, goal, and objective statements. Posting them on the Web in conformance with AIIM's emerging Strategy Markup Language (StratML) standard will enable stakeholders, including potential performance partners, to more easily discover and provide feedback on objectives of interest to them. Prototype forms are available enabling users to create StratML documents without any knowledge of StratML or XML, and several prototypes have also been developed demonstrating some of the ways in which the information contained in StratML documents can be referenced and reused.

To the degree that you may already be gathering or producing documents and data in XML format, it would be good to publish your XML schemas (XSDs) on your Web site, preferably with xsd:documentation elements containing the definitions of other elements in your schemas. Doing so will enable intermediaries to automatically generate data dictionaries and provide query services enabling discovery of documents and datasets of interest to users.
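As an illustration of how an intermediary might generate a data dictionary from a published schema, the Python sketch below reads the xsd:documentation text for each named element. The schema fragment and its element names (`Agency`, `Budget`) are made up for the example, and the function handles only simple cases; it is not a general XSD processor.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# A made-up schema fragment, for illustration only.
SCHEMA = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Agency" type="xsd:string">
    <xsd:annotation>
      <xsd:documentation>The government body responsible for the record.</xsd:documentation>
    </xsd:annotation>
  </xsd:element>
  <xsd:element name="Budget" type="xsd:decimal"/>
</xsd:schema>"""


def data_dictionary(xsd_text: str) -> dict:
    """Map each named element to its documentation text ('' if undocumented)."""
    root = ET.fromstring(xsd_text)
    entries = {}
    for element in root.iter(XS + "element"):
        name = element.get("name")
        if name is None:
            continue  # skip element references, which carry no name of their own
        doc = element.find(f"{XS}annotation/{XS}documentation")
        entries[name] = (doc.text or "").strip() if doc is not None else ""
    return entries


for name, definition in data_dictionary(SCHEMA).items():
    print(f"{name}: {definition or '(no definition published)'}")
```

Elements without documentation show up with an empty definition, which makes gaps in the published schema easy to spot.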

@@See also the following subsections of the eGov Act: 202(b) - Performance Integration, 207(e) - Public Access to Electronic Information, and 207(f) - Agency Websites.@@

@@Reference EU PSI Directive? See [1] & [2] & [3]@@

Choosing formats for putting data on the Web

There are lots of different formats for data on the Web. RDF and XML were designed for putting data on the Web, and are good formats to use. RDF in particular allows you to link data together, to really exploit the potential of the Web's linking architecture. There are lots of ways of publishing RDF data and of turning data from databases or other formats, like CSV or Excel, into RDF. XML is an excellent format for information interchange and also for some types of data. If you have an existing Web site, and would like to make your XHTML information (say a table of figures) available as reusable data, RDFa and GRDDL are two good approaches to use. While not a W3C standard, JSON is also a popular format amongst data mashers and you may wish to use that as well.
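One way of turning CSV into RDF can be sketched in a few lines of Python that emit N-Triples, a simple line-based RDF syntax. Everything specific here is an assumption made for illustration: the sample data, the base URIs, the choice of the `id` column as the row identifier, and the use of plain string literals for every value. Identifiers are assumed to be safe to place in a URI as they stand.

```python
import csv
import io

# Hypothetical URI prefixes for the things described and for the properties.
BASE = "http://data.example.gov/school/"
VOCAB = "http://data.example.gov/def/"

# Made-up sample data.
CSV_TEXT = 'id,name,pupils\n1,Hill Road Primary,210\n2,"St. Mary\'s",145\n'


def literal(value: str) -> str:
    """Quote a value as an N-Triples string literal, escaping special characters."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def csv_to_ntriples(text: str, id_column: str = "id") -> list:
    """Emit one triple per non-empty cell, using the id column to name each row."""
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{BASE}{row[id_column]}>"
        for column, value in row.items():
            if column == id_column or not value:
                continue  # the id names the row; empty cells produce no triple
            lines.append(f"{subject} <{VOCAB}{column}> {literal(value)} .")
    return lines


for triple in csv_to_ntriples(CSV_TEXT):
    print(triple)
```

Each row becomes a thing with its own URI, so other datasets can link to it rather than copying it.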

@@establish a preference for (open) standards (including complete toolset), interoperability, flexibility, expressiveness (e.g. ability to add metadata), addressability@@

@@also important to consider what suits the data@@

@@highlight RDFa, not JSON --Oscar, Rinke@@

@@explore pros and cons and suitable cases for Atom, POX, GeoRSS, KML, RDFa/microformats, POSH, HTML data tables, CSV, JSON, RDF, more@@

@@if building a new application, consider using one of these standard formats or tools as the primary datasource or query engine respectively@@

Social Issues

@@talk here about licensing, provenance, other similar issues? At least we should say the license should be crystal clear, so that reusers do not have to worry, and also machine processable -- Jose@@

We're all learning

There is lots of learning going on at the moment about how governments can best put data on the Web. The W3C technologies and approaches described here can be implemented quickly and often at relatively little cost. If this is all new to you, there are a number of communities that can help. If you are a government department looking for support with this, think about joining W3C and becoming involved in the e-Government activity. There you will find people from other governments trying to do similar things and sharing their experiences. Even if you don't join W3C, please let us know how you are getting on. We can help.