Linked Data Quick Start

(Note: this page is in progress.) Once you have decided to try out linked data, what next? How do you get started? What would be a good project to start on?

The first question is: is your first priority publishing data you have, or using someone else’s data? A common scenario is publishing some new data and linking it with data from other sources.


There are multiple resources for publishing existing data as linked data. The advantage of doing this is that it opens up the data to the web: data that may be in internal files, databases or repositories can be made available on the web as RDF. Once the data is in RDF, common tools and techniques can be used to query and integrate it. There are some basic questions that need to be asked:

What data do I want to publish? Do I want to publish all or part of it?

You probably have gigabytes of data, so what should be published? The “change in mindset” for open government is a transition from a “need to know” to a “need to restrict” mindset. If you are a public agency, the public paid for all the data you have, so why would you not publish it? On the other hand, data that would violate privacy, security or confidentiality policies can’t be published. To get started, find the least controversial and most useful data you can publish, and get something out! If your agency is already responsible for some public data, this should be easy: get it exposed as RDF. Of course, choosing the right data is a business decision; let’s assume you have selected a data set to publish. Note that you don’t have to publish the entire data set; you can select certain data elements to publish.
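As a minimal sketch of publishing only selected data elements, the Python below turns a few chosen columns of a CSV file into N-Triples, one of the standard RDF serializations. The `http://example.org/` namespace, the column names, the sample rows and the helper names are all hypothetical and chosen only for illustration; a real project would use its own identifiers and, where possible, an existing vocabulary.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical namespace and sample data, for illustration only.
BASE = "http://example.org/"

SAMPLE_CSV = """id,name,city,internal_notes
1,Main Library,Springfield,budget under review
2,East Branch,Shelbyville,staffing issue
"""

# Only these columns are published; everything else stays internal.
PUBLISHED_COLUMNS = ["name", "city"]


def escape_literal(value):
    """Escape a string for use inside an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def rows_to_ntriples(csv_text, id_column, columns):
    """Yield one N-Triples line per published, non-empty cell."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Percent-encode the identifier so it is safe inside an IRI.
        subject = f"<{BASE}facility/{quote(row[id_column], safe='')}>"
        for column in columns:
            value = row[column]
            if value == "":
                continue  # skip empty cells rather than publish blanks
            predicate = f"<{BASE}vocab/{column}>"
            yield f'{subject} {predicate} "{escape_literal(value)}" .'


if __name__ == "__main__":
    for line in rows_to_ntriples(SAMPLE_CSV, "id", PUBLISHED_COLUMNS):
        print(line)
```

Because the list of published columns is explicit, the `internal_notes` column never reaches the output. Listing what is allowed, rather than what is excluded, means a column added to the source later is not published by accident.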

Privacy & Security – can I expose this data?

Of course, care must be taken not to expose data that would violate privacy or security policies. Exposing linked data is no different than publishing data in any other format: if you could put it on a web page, you should be able to expose it as linked data, and the same policies should apply. In some cases personal identity information needs to be concealed. This is done either by summarizing personal information so that individuals are not identified at all, or by masking the true identity of a person. Note that identity masking has a known weakness: some software can combine data sets and “unmask” the data. If you need to mask data, you should probably consult an expert in the field. For your first project, we will assume that the data you are publishing is not private.
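To make the "summarizing" option concrete, here is a rough Python sketch that reduces person-level records to counts per group before anything is published. The records, the field names and the `MIN_GROUP_SIZE` threshold are hypothetical. Dropping small groups is an extra precaution assumed here, not something the policies above prescribe, and a sketch like this does not replace expert review.

```python
from collections import Counter

# Hypothetical person-level records; names are never included in the output.
RECORDS = [
    {"name": "Person A", "district": "North"},
    {"name": "Person B", "district": "North"},
    {"name": "Person C", "district": "North"},
    {"name": "Person D", "district": "South"},
]

# Assumed threshold: groups smaller than this are left out, because a
# count of 1 or 2 can still point to an individual.
MIN_GROUP_SIZE = 3


def summarize(records, key, min_group_size=MIN_GROUP_SIZE):
    """Return counts per value of `key`, omitting groups that are too small."""
    counts = Counter(record[key] for record in records)
    return {group: n for group, n in counts.items() if n >= min_group_size}


if __name__ == "__main__":
    print(summarize(RECORDS, "district"))
```

Only the aggregated counts would then be exposed as RDF; the person-level rows stay in the internal system.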

Is this data open to everyone or is some security required?

Like any web resource, linked data can be fully open or protected. The same techniques and technologies that are used for protecting web pages can be used to protect linked data. Likewise, the same policies for exposing data on web pages apply to linked data. Consider the security requirements for your linked data. Most of the early projects have provided fully open data, usually data that the public has paid for and has access to in some other way. It is, of course, simpler to make something fully open, and that is probably best for your first projects. The next simplest option is linked data resources behind a firewall or on a private network; this works exactly like open data, since it is the network that restricts access. For password protection and restricted access you will need an RDF repository that supports secure access. Needed: Link to more detailed article about linked data security.
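To illustrate that ordinary web techniques also apply to linked data, the sketch below is a small Python WSGI application that serves a Turtle document only to clients presenting valid HTTP Basic credentials. The user name, password, realm and sample triple are hypothetical placeholders. A real deployment would more likely rely on the access control built into the web server or RDF repository, and Basic authentication should only be used over HTTPS.

```python
import base64
import hmac

# Hypothetical credentials and data, for illustration only.
USERNAME = "reader"
PASSWORD = "change-me"
TURTLE = (
    b"<http://example.org/facility/1> "
    b'<http://example.org/vocab/name> "Main Library" .\n'
)


def is_authorized(header):
    """Check the value of an HTTP Basic 'Authorization' header."""
    if not header or not header.startswith("Basic "):
        return False
    try:
        decoded = base64.b64decode(header[6:], validate=True).decode("utf-8")
    except ValueError:  # covers malformed base64 and invalid UTF-8
        return False
    user, sep, password = decoded.partition(":")
    if not sep:
        return False
    # compare_digest avoids leaking information through timing differences.
    user_ok = hmac.compare_digest(user.encode(), USERNAME.encode())
    password_ok = hmac.compare_digest(password.encode(), PASSWORD.encode())
    return user_ok and password_ok


def app(environ, start_response):
    """WSGI application: serve Turtle only to authenticated clients."""
    if not is_authorized(environ.get("HTTP_AUTHORIZATION")):
        start_response(
            "401 Unauthorized",
            [("WWW-Authenticate", 'Basic realm="linked-data"')],
        )
        return [b"Authentication required\n"]
    start_response("200 OK", [("Content-Type", "text/turtle; charset=utf-8")])
    return [TURTLE]
```

For a local trial, the application can be run with the standard library's `wsgiref.simple_server.make_server("localhost", 8000, app).serve_forever()`. Dropping the authorization check gives the fully open case, which shows that protection is a layer around the same data rather than a different way of publishing it.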

What vocabularies should I use? Are there existing RDF vocabularies for this purpose, or will I be creating a new one?

