Warning:
    This wiki has been archived and is now read-only.
Tutorial/Linked Data
DRAFT, in development. See the the slidy (slides) version.
| Title | Mission Possible: Deploying Government Linked Data (Pt1) | 
|---|---|
| Author | Sandro Hawke, (sandro@w3.org), W3C/MIT, @sandhawke John L. Sheridan, @johnlsheridan | 
| Event | gov 2.0 expo, May 25-26, 2010, Washington DC | 
Contents
- 1 Mission Possible
- 2 Part 1: Fundamentals
- 3 About Us
- 4 About You (Just Curious)
- 5 Context
- 6 Download and Programmatic Access
- 7 About the "Semantic Web"
- 8 What Is Linked Data?
- 9 Benefits of Linked Data
- 10 Why does Linked Data make sense for government?
- 11 RDF Triples
- 12 Quick Demo
- 13 So How Do You Add Your Data?
- 14 URIs (A Little Web Architecture)
- 15 Information Resource
- 16 Resource
- 17 URLs identify Information Resources
- 18 URIs identify Resources
- 19 Hash and Slash
- 20 Hash vs Slash
- 21 Review
- 22 Publishing Data
- 23 RDF/XML Example
- 24 Turtle Prefixes
- 25 Turtle Example
- 26 RDFa Example
- 27 One Possible RDF-JSON Example
- 28 One Possible GRDDL Example
- 29 SPARQL
- 30 Content Negotiation
- 31 Challenges
Mission Possible
Part 1:
-  What, exactly, is Linked Data?
- The big picture and today's building blocks
 
Part 2:
-  Viewing Your Data as Triples (slides)
- Working with RDF vocabularies; communicating via triples
 
Part 3:
-  Publishing Triples on the Web (slides)
- The mechanics and politics of actually making the data available
 
please send comments and questions to sandro@w3.org subject 'tutorial'
Part 1: Fundamentals
- Context and Motivation
- What is a URI?
- What is an RDF Triple?
- Data dormats for RDF Triples
About Us
-  @sandhawke
- programmer (C, C++, Java, Perl, Prolog, now mostly Python)
- at W3C since 2000, doing RDF, OWL, RIF, SPARQL, Govt
-  W3C: consensus standards, founded by TimBL 1994
- about 60 Working Groups doing HTML(5), CSS, SVG, XML, Accessibility
 
 
-  @johnlsheridan
- Civil Servant since 2004
- lead on Linked Data for data.gov.uk
- co-chair of W3C e-Government IG
 
About You (Just Curious)
- Do you work for the government (Federal, State, City, County) or for a supplier?
- Can you read/write HTML?
-  Can you read/write some data format:
- JSON, XML, SQL?
 
-  Can you program in some language:
- C, C++, Java?
- Perl, Python, Ruby?
- Javascript?
- XSLT?
 
-  Do you know how HTTP works:
- Response Codes (eg 403 Forbidden)?
- Content Types and Content Negotiation?
- RESTful APIs?
 
-  Formal math, logic, logic programming?
- The difference between "domain" and "range"?
- The Unique Names Assumption? Negation-as-Failure?
 
Context
-  Growing demand for open government data
- US, UK, Australia, New Zealand, Netherlands, Denmark, Sweden, ...
- Astrurias and Catalan Regional Governments, London, Vancouver, ...
 
-  Many motivations
-  transparency and engagement
- holding government accountable and promoting choice by informing citizens
 
-  efficiency and enhanced public services
- enabling re-use of information within the public sector
 
-  innovation and economic growth
- encouraging and supporting data-based innovation
 
 
-  transparency and engagement
Download and Programmatic Access
-  Downloadable datasets
- Excel, CSV, XML
- global data
- one-off visualisations
- static data
 
-  Programmatic access
- JSON / XML APIs
- local data
- on-demand visualisations
- changing data
 
About the "Semantic Web"
 
- The Web should be more than just documents for people to read
- Allow machines to traverse, aggregate, analyze, answer
-  TimBL's vision, but it's a big tent
- "Semantics", "Ontologies" (Research Funding)
- Web Architecture, REST
 
See Kate Ray's Web 3.0 Video (esp. until 3:37 or 6:50)
Linked Data has a narrower goal; uses some of the same technologies.
What Is Linked Data?
Extending spreadsheets and databases to work over the Web.
- Give web identifiers (URIs) to things
- Publish information about them as Web Resources (good website architecture)
- Use Triples (subject, property, value)
 
| Subject | Property | Value | 
|---|---|---|
| http://dbpedia.org/resource/Massachusetts | http://dbpedia.org/resource/nickname | "Bay State" | 
Benefits of Linked Data
-  Enables web-scale data publishing
- distributed publication with web-based discovery mechanisms
 
-  Everything is a resource
- discover more about properties, classes, codes within a code list (we'll explain more)
 
-  Everything can be annotated
- make comments about observations, data series, points on a map (we'll explain more)
 
-  Easy to extend
- create new properties as required, no need to plan everything up-front (we'll explain more)
 
-  Easy to merge
- slot together RDF graphs, no need to worry about name clashes (we'll explain more)
 
Why does Linked Data make sense for government?
- Responsible Publishing of data (we'll explain why)
- Combine different data about the same things, although it is held by different parts and levels of government
- Can make it easier for people to consume your data (we'll explain how)
- Can help solve some snags other approaches miss
RDF Triples
 
Quick Demo
| Subject | Property | Value | 
|---|---|---|
| http://dbpedia.org/resource/Massachusetts | nickname | "Bay State" | 
-  sameAs.org says:
- http://airports.dataincubator.org/regions/US-MA
- http://data.nytimes.com/N77561588666860073361
- http://mpii.de/yago/resource/Massachusetts
- http://rdf.freebase.com/ns/guid.9202a8c04000641f8000000000589f0a
- http://sw.opencyc.org/2008/06/10/concept/en/Massachusetts_State
- http://umbel.org/umbel/sc/Massachusetts_State
 
- explore with a Linked Data Browser
- ... and try to imagine you're a machine.
So How Do You Add Your Data?
- Think in Subject-Property-Value Triples
- Use URIs
- Publish on the Web
URIs (A Little Web Architecture)
URIs are like URLs, with a few extra tricks.
Long history, "Web Architecture", lots of debate.
Here it is, put simply.
Information Resource
- Anything whose current state can be entirely represented in bytes.
-  This is what we see on the Web, including:
- documents (maintained, or frozen)
- video and audio recordings
- photographs, drawings
- databases (product catalog)
 
Resource
- Anything at all. Anything anyone can conceptualize.
- Includes Information Resource, of course
-  Also includes:
- specific people, cities, countries
- my dog, the set of all dogs, the set of all animals
- ... even Unicorns and Dragons
 
 
URLs identify Information Resources
- URLs are "Web Addresses"
- Such as http://www.w3.org/People/Sandro or http://whitehouse.gov
-  There are standard Internet protocols (RFC 2616):
- given a URL, get the bytes which convey the current state of that information resource
- every browser (web client) does this
 
URIs identify Resources
Any resource. Using filenames for things that aren't files.
- They often look just like URLs (many are URLs)
-  But they can behave differently
- With some, there is no protocol for GET
- If it's a URL, GET works
- For others, the URI has an associated URL
 
- Sometimes IRI when it can contain non-ascii characters, like http://www.example.org/rosé .
Hash and Slash
Two kinds of indirection:
-  Hash URIs contain a hash ("#") character: http://vocab.deri.ie/dcat#granularity
- chop off the hash and everything after it
- do your GET on what's left
- see what the result says about full, original URI
-  Sometimes it turns out to be a "fragment" URL
- Hopefully it's not both (but it happens sometimes)
 
 
-  Slash URIs don't contain a hash: http://purl.org/dc/elements/1.1/creator
- try to do a GET
- you might just get contents (so it was a URL)
- you might get a "303 See Other" redirecting you elsewhere
-  Hopefully the new place says something useful using the original URI.
- try to GET http://dbpedia.org/resource/Massachusetts
- redirect ("303 See Other") to http://dbpedia.org/page/Massachusetts
- which has data about http://dbpedia.org/resource/Massachusetts
 
 
Hash vs Slash
Hash URIs:
- easier to construct
- more efficient on a small scale
- often used for small, controlled situations
Slash URIs:
- more control over user experience
- better scaling
You'll see both.
When publishing, your software may choose for you.
Review
- Use URIs to identify things
- Think in Triples
- Publish on the web
Publishing Data
- There's lots of choice!
- Don't be overwhelmed, it means there's at least one method that will work well in your situation.
| Publication Method | Advantages | Disadvantages | 
|---|---|---|
| RDF/XML Document | Oldest, best supported | Confusingly like normal XML | 
| Turtle (N3) Document | Simplest | Not technically a standard yet | 
| HTML Document with RDFa | Fits inside HTML attributes | Can get very complicated | 
| JSON | Normal JSON, but also RDF | Promising, but still being developed | 
| GRDDL | Use the XML you have/want | Needs to download+run XSLT | 
| SPARQL | Query Protocol | Query Protocol | 
RDF/XML Example
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:db="http://dbpedia.org/resource/"> <rdf:Description rdf:about="http://dbpedia.org/resource/Massachusetts"> <db:Governor> <rdf:Description rdf:about="http://dbpedia.org/resource/Deval_Patrick" /> </db:Governor> <db:Nickname>Bay State</db:Nickname> <db:Capital> <rdf:Description rdf:about="http://dbpedia.org/resource/Boston"> <db:Nickname>Beantown</db:Nickname> </rdf:Description> </db:Capital> </rdf:Description> </rdf:RDF>
Turtle Prefixes
First triple:
<http://dbpedia.org/resource/Massachusetts> <http://dbpedia.org/resource/Governor> <http://dbpedia.org/resource/Deval_Patrick> .
Abbreviate it:
@prefix db: <http://dbpedia.org/resource/> db:Massachusetts db:Governor db:Deval_Patrick.
- Read the same by turtle parsers
Turtle Example
@prefix db: <http://dbpedia.org/resource/> db:Massachusetts db:Governor db:Deval_Patrick; db:Nickname "Bay State"; db:Capital db:Boston. db:Boston db:Nickname "Beantown".
RDFa Example
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
    "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:db="http://dbpedia.org/resource/"
      version="XHTML+RDFa 1.0">
  <head>
    <title>About Massachusetts</title>
  </head>
  <body>
    <div about="http://dbpedia.org/resource/Massachusetts">The
    Massachusetts governor is
      <span rel="db:Governor">
	<span about="http://dbpedia.org/resource/Deval_Patrick">Deval
	Patrick
	</span>,
      </span>
      the nickname is "<span property="db:Nickname">Bay State</span>",
      and the capital
      <span rel="db:Capital">
	<span about="http://dbpedia.org/resource/Boston">
	  has the nickname "<span property="db:Nickname">Beantown</span>".
	</span>
      </span>
    </div>
  </body>
</html>
One Possible RDF-JSON Example
{ "__iri": "db:Massachusetts",
  "db:Nickname": "Bay State",
  "db:Governor": { "__iri": "db:Deval_Patrick"  },
  "db:Capital": {  "__iri": "db:Boston",
                   "db:Nickname": "Beantown"
                 },
  "__prefixes": { "db:": "http://dbpedia.org/resource/" }
}
One Possible GRDDL Example
<MyDataSet xmlns="http://example.org/my-data-xml-namespace"> <State> <name>Massachusetts</name> <governor>Deval_Patrick</governor> <nickname>Bay State</nickname> <capital> <name>Boston</name> <nickname>Beantown</nickname> </capital> </State> </MyDataSet>
All the hard work is done by an XSLT program downloaded via the XML namespace URL. (Not implemented for this demo, sorry.)
SPARQL
- A query language, somewhat like SQL
prefix db: <http://dbpedia.org/resource/> prefix dbo: <http://dbpedia.org/ontology/> SELECT ?dnym WHERE { db:Massachusetts dbo:demonym ?dnym }
prefix db: <http://dbpedia.org/resource/> prefix dbo: <http://dbpedia.org/ontology/> SELECT ?cap WHERE { db:Massachusetts dbo:capital ?cap }
dbpedia sparql service and sparql tutorial
Content Negotiation
How do you manage all these options?
- Information Resources can have multiple Representations
- When you GET, you can say which type you want (HTML or XML say)
- HTTP Server returns an appropriate representation
Try:
curl -L --header "Accept: application/rdf+xml" http://vocab.deri.ie/dcat
curl -L --header "Accept: text/turtle" http://vocab.deri.ie/dcat
curl -L --header "Accept: text/html" http://vocab.deri.ie/dcat
Challenges
-  You will run into some issues:
- Long term critics of the Semantic Web
- Data consumers who don't want RDF
- Suppliers trying to sell a different technology
- Gov people who think you're trying to spoil years on work on their XML Schemas
 
- Listen and re-assure - Linked Data can help all these people too!