Providing data on the Web

It is important to make data (machine processable stuff) available on the web, just as it is useful to make human readable documents available. For documents, the lingua franca is HTML; for data it is RDF. If you want to learn about RDF, one way to start is with the Notation3 tutorial.

Here are some basic rules to follow when making data available.

Give anything of importance a URI

The simplest way to do this is to write an RDF file, say http://example.com/products.rdf, and use rdf:ID="hairdryer" within it to define something whose URI will then be http://example.com/products.rdf#hairdryer.

Don't get confused between URIs for things and URIs for web pages. When a semantic web engine is finding out information about http://example.com/products.rdf#hairdryer, a simple thing to do is to look up the RDF web page http://example.com/products.rdf.
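The relationship between the URI of a thing and the page that describes it can be sketched in a few lines of Python. This is only an illustration using the standard library; the function name document_for is made up for this example.

```python
from urllib.parse import urldefrag


def document_for(thing_uri):
    """Split a hash URI into (document URL, local identifier).

    The part before the '#' names the web page to fetch; the part
    after it names the thing described within that page.
    """
    document_url, local_id = urldefrag(thing_uri)
    return document_url, local_id


page, name = document_for("http://example.com/products.rdf#hairdryer")
print(page)  # the RDF page to look up
print(name)  # the identifier of the thing inside that page
```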

(If you use a URI without a hash to identify a product, for example, then you may not serve a simple RDF document at that URI: you have to set up an HTTP 303 redirect from that address to some document, so that the document and the product are not confused. This process is not recommended for typical systems, and can make the actual querying of your data take longer.)
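As a rough sketch of what that redirect involves, the function below decides how a server might answer a request. The paths and the DESCRIBED_BY table are hypothetical, and a real server would wire this decision into whatever framework it uses.

```python
# Hypothetical table: hash-less URI path of a thing -> path of the
# document that describes it.
DESCRIBED_BY = {
    "/products/hairdryer": "/products/hairdryer.rdf",
}


def respond(path):
    """Return (status, headers) for a GET request on the given path."""
    if path in DESCRIBED_BY:
        # The path names a thing, not a document: answer 303 See Other
        # and point at the document that describes it.
        return 303, {"Location": DESCRIBED_BY[path]}
    if path in DESCRIBED_BY.values():
        # The path names the describing document itself.
        return 200, {"Content-Type": "application/rdf+xml"}
    return 404, {}


print(respond("/products/hairdryer"))
```

Note that a client needs two round trips here, one for the redirect and one for the document, which is part of why querying can take longer.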

Looking up the URI should be useful

It is important that someone can identify something by just that URI. If there is data about a transaction involving the hair dryer, and you are its manufacturer, then a machine looking up http://example.com/products.rdf#hairdryer should get the public data about it, maybe things like its cost, description, availability and shipping weight.
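To make this concrete, here is a minimal sketch of what a client might do with the page it gets back. It is not a real RDF/XML parser: it only handles top-level elements with an rdf:ID and simple text properties. The ex: vocabulary and the sample values are invented for the example.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Made-up sample of what http://example.com/products.rdf might contain.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.com/terms#">
  <ex:Product rdf:ID="hairdryer">
    <ex:description>Compact travel hair dryer</ex:description>
    <ex:shippingWeightKg>0.6</ex:shippingWeightKg>
  </ex:Product>
</rdf:RDF>"""


def describe(document_text, document_url, thing_uri):
    """Return {property URI: text value} for the thing, or {} if absent."""
    root = ET.fromstring(document_text)
    for node in root:
        local_id = node.get("{%s}ID" % RDF_NS)
        if local_id is not None and document_url + "#" + local_id == thing_uri:
            # ElementTree names tags "{namespace}local"; the property
            # URI is the namespace followed by the local name.
            return {child.tag[1:].replace("}", "", 1): child.text
                    for child in node}
    return {}


facts = describe(SAMPLE,
                 "http://example.com/products.rdf",
                 "http://example.com/products.rdf#hairdryer")
for prop, value in facts.items():
    print(prop, "=", value)
```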

Make links to other things

Maybe the hair dryer is available from several retail outlets. You can express this in data, to allow someone to quickly find the nearest open one. When you do, you will of course use URIs to identify the stores.

Suppose you have a separate page of data about each of the stores you deal with. The fact that the product list uses a URI such as http://example.com/outlets/putney.rdf#store invites anyone or anything viewing the data to look up http://example.com/outlets/putney.rdf on the web. You have linked the data.

And don't forget yourself and your company. The fact that your company is the manufacturer of the product is important. So have a URI for the company and make sure that it returns basic contact information, and so on.

Link both ways

It is just as likely that someone will be looking at your company and wondering what products it has as that they are looking at the product and needing to know who makes it.

Within your site, each thing, person or concept and so on will have a URI which is in a page of RDF. When an RDF statement relates two things on different pages, it is good to put it on both pages. This allows a person -- or a query processor -- to find out about one page from the other.
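One way to picture this is as a step in whatever generates your pages: each statement is filed under the page of its subject and also under the page of its object. The sketch below assumes every subject and object is a URI (no literal values), and the ex:madeBy property and company URI are invented for the example.

```python
from collections import defaultdict
from urllib.parse import urldefrag


def pages_for(triples):
    """Map each document URL to the statements that belong on it.

    A statement goes on the page of its subject and on the page of
    its object, so it can be found from either end.
    """
    pages = defaultdict(list)
    for triple in triples:
        subject, _predicate, obj = triple
        # A set, so a statement within one page is only listed once.
        for document_url in {urldefrag(subject)[0], urldefrag(obj)[0]}:
            pages[document_url].append(triple)
    return dict(pages)


made_by = ("http://example.com/products.rdf#hairdryer",
           "http://example.com/terms#madeBy",
           "http://example.com/company.rdf#us")
pages = pages_for([made_by])
for url in sorted(pages):
    print(url, len(pages[url]))
```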

When a statement links something on your site to something on another site you do not control, then it is still good to have links at both sites, but this requires negotiation just as with any web site. You will have to persuade the other site to use your URIs for the things your site publishes information about. Make it worth their while by keeping the information rich and up to date.

State when two things are the same

You may have a database which gives a URI to each of your retail stores. Then you may find that actually some of them have RDF data available too, and they have their own URIs for the stores. That's OK. There is no central coordination on the semantic web which says everyone has to use the same URI for the same thing. Sure, it helps when it happens. But if you have been using a different one, you should continue to serve data at that address, but you can include a statement that it is the same thing as that identified by the external URI.

This is done using the owl:sameAs property to connect the two URIs. (In Notation3, it can be written '='.)
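A consumer of the data can then treat the two URIs as names for one thing. The sketch below groups URIs that are connected by owl:sameAs, treating the property as symmetric and transitive. It is only an illustration: the external store URI is invented, and a real reasoner does much more than this.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_groups(triples):
    """Return a list of sets of URIs that owl:sameAs says are one thing."""
    parent = {}

    def find(uri):
        # Follow parent links to the representative of uri's group.
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # shorten the path as we go
            uri = parent[uri]
        return uri

    for subject, predicate, obj in triples:
        if predicate == OWL_SAME_AS:
            parent[find(subject)] = find(obj)

    groups = {}
    for uri in list(parent):
        groups.setdefault(find(uri), set()).add(uri)
    return list(groups.values())


triples = [
    ("http://example.com/outlets/putney.rdf#store",
     OWL_SAME_AS,
     "http://putney-hair.example.org/about.rdf#shop"),  # hypothetical external URI
]
print(same_as_groups(triples))
```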

Make virtual data

I have described the process of putting data on the Web as though you are writing RDF files by hand. In fact, for all but the smallest applications, your site will be database backed, and driven by a relational database and/or XML content.

You can use PHP scripts, or whatever scripting technology you use to generate human readable HTML pages, to make a virtual web of RDF files, each of which is generated only on request over the net.
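The idea is the same in any scripting language. As a minimal sketch in Python, the function below turns rows of a database table into RDF at the moment it is called. The table layout and the ex:description property are assumptions made for the example, and it emits N-Triples only for brevity; a real site would choose the serialization that matches the media type it serves.

```python
import sqlite3

DESCRIPTION = "http://example.com/terms#description"  # hypothetical property


def product_ntriples(conn, base="http://example.com/products.rdf"):
    """Render every row of the products table as one N-Triples line."""
    lines = []
    rows = conn.execute(
        "SELECT slug, description FROM products ORDER BY slug")
    for slug, description in rows:
        # Escape backslashes first, then quotes and newlines.
        literal = (description.replace("\\", "\\\\")
                              .replace('"', '\\"')
                              .replace("\n", "\\n"))
        lines.append('<%s#%s> <%s> "%s" .'
                     % (base, slug, DESCRIPTION, literal))
    return "\n".join(lines) + "\n"


# Stand-in for the site's real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (slug TEXT, description TEXT)")
conn.execute("INSERT INTO products VALUES (?, ?)",
             ("hairdryer", "Compact travel hair dryer"))
print(product_ntriples(conn), end="")
```

Nothing is stored as a file: the RDF exists only as the response to a request, and always reflects the current state of the database.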

Publish legacy formats

Some applications have well-defined formats for their data which are not RDF, but are standards, or well-known proprietary formats. For many of these you may find existing converters to RDF. iCalendar files, photo metadata, and so on, can be converted to RDF using scripts written by others.

In some cases you might find a new format. If you can write a converter for it, you can contribute it yourself to the RDF-trading community.

Which ontology to use?

The data you publish will use terms from many vocabularies (ontologies) all mixed up. This is how the semantic web works.

Talk to others about what they are using, or browse the web.

Ideally, find an existing well-established ontology which others have developed. Reuse common ontologies for common terms.
Where you need terms which are not in any ontology you can find or want to use, then just make up your own. Write a little RDF schema (using RDFS and/or OWL) and publish it at the namespace URI.
If you find others want to use your ontology or one like it, you may get involved in discussions about developing and sharing it, and maybe making a standard for others to use easily and be interoperable in the future. In this case, you may want to ask w3.org to host the ontology so that it is somewhere where it can be left for posterity.

Try it out.

There are lots of ways of checking that the data on your site makes sense. One is to explore it by hand using a tool such as the Tabulator. Another, if you are using OWL, is to check it for OWL consistency using a tool such as Pellet.
