Tutorial/Linked Data
DRAFT, in development. See the slidy (slides) version.
Title | Mission Possible: Deploying Government Linked Data (Pt1) |
---|---|
Author | Sandro Hawke (sandro@w3.org), W3C/MIT, @sandhawke; John L. Sheridan, @johnlsheridan |
Event | Gov 2.0 Expo, May 25-26, 2010, Washington DC |
Contents
- 1 Mission Possible
- 2 Part 1: Fundamentals
- 3 About Us
- 4 About You (Just Curious)
- 5 Context
- 6 Download and Programmatic Access
- 7 About the "Semantic Web"
- 8 What Is Linked Data?
- 9 Benefits of Linked Data
- 10 Why does Linked Data make sense for government?
- 11 RDF Triples
- 12 Quick Demo
- 13 So How Do You Add Your Data?
- 14 URIs (A Little Web Architecture)
- 15 Information Resource
- 16 Resource
- 17 URLs identify Information Resources
- 18 URIs identify Resources
- 19 Hash and Slash
- 20 Hash vs Slash
- 21 Review
- 22 Publishing Data
- 23 RDF/XML Example
- 24 Turtle Prefixes
- 25 Turtle Example
- 26 RDFa Example
- 27 One Possible RDF-JSON Example
- 28 One Possible GRDDL Example
- 29 SPARQL
- 30 Content Negotiation
- 31 Challenges
Mission Possible
Part 1:
- What, exactly, is Linked Data?
- The big picture and today's building blocks
Part 2:
- Viewing Your Data as Triples (slides)
- Working with RDF vocabularies; communicating via triples
Part 3:
- Publishing Triples on the Web (slides)
- The mechanics and politics of actually making the data available
Please send comments and questions to sandro@w3.org with the subject 'tutorial'.
Part 1: Fundamentals
- Context and Motivation
- What is a URI?
- What is an RDF Triple?
- Data formats for RDF Triples
About Us
- @sandhawke
- programmer (C, C++, Java, Perl, Prolog, now mostly Python)
- at W3C since 2000, doing RDF, OWL, RIF, SPARQL, Govt
- W3C: consensus standards, founded by TimBL 1994
- about 60 Working Groups doing HTML(5), CSS, SVG, XML, Accessibility
- @johnlsheridan
- Civil Servant since 2004
- lead on Linked Data for data.gov.uk
- co-chair of W3C e-Government IG
About You (Just Curious)
- Do you work for the government (Federal, State, City, County) or for a supplier?
- Can you read/write HTML?
- Can you read/write some data format:
- JSON, XML, SQL?
- Can you program in some language:
- C, C++, Java?
- Perl, Python, Ruby?
- Javascript?
- XSLT?
- Do you know how HTTP works:
- Response Codes (eg 403 Forbidden)?
- Content Types and Content Negotiation?
- RESTful APIs?
- Formal math, logic, logic programming?
- The difference between "domain" and "range"?
- The Unique Names Assumption? Negation-as-Failure?
Context
- Growing demand for open government data
- US, UK, Australia, New Zealand, Netherlands, Denmark, Sweden, ...
- Asturias and Catalan Regional Governments, London, Vancouver, ...
- Many motivations
- transparency and engagement
- holding government accountable and promoting choice by informing citizens
- efficiency and enhanced public services
- enabling re-use of information within the public sector
- innovation and economic growth
- encouraging and supporting data-based innovation
Download and Programmatic Access
- Downloadable datasets
- Excel, CSV, XML
- global data
- one-off visualisations
- static data
- Programmatic access
- JSON / XML APIs
- local data
- on-demand visualisations
- changing data
About the "Semantic Web"
- The Web should be more than just documents for people to read
- Allow machines to traverse, aggregate, analyze, answer
- TimBL's vision, but it's a big tent
- "Semantics", "Ontologies" (Research Funding)
- Web Architecture, REST
See Kate Ray's Web 3.0 Video (esp. until 3:37 or 6:50)
Linked Data has a narrower goal; uses some of the same technologies.
What Is Linked Data?
Extending spreadsheets and databases to work over the Web.
- Give web identifiers (URIs) to things
- Publish information about them as Web Resources (good website architecture)
- Use Triples (subject, property, value)
Subject | Property | Value |
---|---|---|
http://dbpedia.org/resource/Massachusetts | http://dbpedia.org/resource/nickname | "Bay State" |
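As a rough sketch of the idea, a triple can be modelled in Python as a three-field record and a dataset as a set of such records. The `Triple` type and the `values_of` helper are illustrative names chosen here, not part of any RDF library.

```python
from typing import List, NamedTuple, Set


class Triple(NamedTuple):
    subject: str  # URI of the thing being described
    prop: str     # URI of the property
    value: str    # a literal, or the URI of another thing


# The single triple from the table above.
graph: Set[Triple] = {
    Triple(
        "http://dbpedia.org/resource/Massachusetts",
        "http://dbpedia.org/resource/nickname",
        "Bay State",
    ),
}


def values_of(graph: Set[Triple], subject: str, prop: str) -> List[str]:
    """Return every value recorded for (subject, prop), sorted."""
    return sorted(t.value for t in graph if t.subject == subject and t.prop == prop)


nicknames = values_of(
    graph,
    "http://dbpedia.org/resource/Massachusetts",
    "http://dbpedia.org/resource/nickname",
)
print(nicknames)
```

Because every row has the same three-part shape, new facts can be added without changing any schema.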
Benefits of Linked Data
- Enables web-scale data publishing
- distributed publication with web-based discovery mechanisms
- Everything is a resource
- discover more about properties, classes, codes within a code list (we'll explain more)
- Everything can be annotated
- make comments about observations, data series, points on a map (we'll explain more)
- Easy to extend
- create new properties as required, no need to plan everything up-front (we'll explain more)
- Easy to merge
- slot together RDF graphs, no need to worry about name clashes (we'll explain more)
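The "easy to merge" point can be illustrated with a minimal sketch in which each graph is a plain Python set of (subject, property, value) tuples. The two datasets and the `http://example.org/...` property URI are hypothetical, invented only for this illustration.

```python
MA = "http://dbpedia.org/resource/Massachusetts"

# Hypothetical dataset from one publisher.
state_data = {
    (MA, "http://dbpedia.org/resource/nickname", "Bay State"),
}

# Hypothetical dataset from another publisher. The property URI and its
# value are placeholders, not real data.
other_data = {
    (MA, "http://example.org/terms/note", "(placeholder value)"),
    (MA, "http://dbpedia.org/resource/nickname", "Bay State"),
}

# Merging is set union: identical triples collapse into one, and properties
# cannot clash because each one is named by a full URI.
merged = state_data | other_data
print(len(merged))
```

Real RDF merging also has to deal with blank nodes, which this sketch ignores.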
Why does Linked Data make sense for government?
- Responsible Publishing of data (we'll explain why)
- Combine data about the same things, even when it is held by different parts and levels of government
- Can make it easier for people to consume your data (we'll explain how)
- Can help solve some snags other approaches miss
RDF Triples
Quick Demo
Subject | Property | Value |
---|---|---|
http://dbpedia.org/resource/Massachusetts | nickname | "Bay State" |
- sameAs.org says:
- http://airports.dataincubator.org/regions/US-MA
- http://data.nytimes.com/N77561588666860073361
- http://mpii.de/yago/resource/Massachusetts
- http://rdf.freebase.com/ns/guid.9202a8c04000641f8000000000589f0a
- http://sw.opencyc.org/2008/06/10/concept/en/Massachusetts_State
- http://umbel.org/umbel/sc/Massachusetts_State
- explore with a Linked Data Browser
- ... and try to imagine you're a machine.
So How Do You Add Your Data?
- Think in Subject-Property-Value Triples
- Use URIs
- Publish on the Web
URIs (A Little Web Architecture)
URIs are like URLs, with a few extra tricks.
Long history, "Web Architecture", lots of debate.
Here it is, put simply.
Information Resource
- Anything whose current state can be entirely represented in bytes.
- This is what we see on the Web, including:
- documents (maintained, or frozen)
- video and audio recordings
- photographs, drawings
- databases (product catalog)
Resource
- Anything at all. Anything anyone can conceptualize.
- Includes Information Resource, of course
- Also includes:
- specific people, cities, countries
- my dog, the set of all dogs, the set of all animals
- ... even Unicorns and Dragons
URLs identify Information Resources
- URLs are "Web Addresses"
- Such as http://www.w3.org/People/Sandro or http://whitehouse.gov
- There are standard Internet protocols (RFC 2616):
- given a URL, get the bytes which convey the current state of that information resource
- every browser (web client) does this
URIs identify Resources
Any resource. Using filenames for things that aren't files.
- They often look just like URLs (many are URLs)
- But they can behave differently
- With some, there is no protocol for GET
- If it's a URL, GET works
- For others, the URI has an associated URL
- Sometimes called an IRI when it can contain non-ASCII characters, like http://www.example.org/rosé .
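For sending an IRI over HTTP, the non-ASCII characters are usually percent-encoded as UTF-8 bytes. The following is a simplified sketch using only the Python standard library; `iri_to_uri` is a name made up here, and the full mapping (for example, handling of internationalized host names) is more involved than this.

```python
from urllib.parse import quote


def iri_to_uri(iri: str) -> str:
    """Percent-encode non-ASCII characters, leaving URI punctuation alone."""
    # The safe list keeps the characters that give a URI its structure,
    # plus "%" so that existing escapes are not encoded a second time.
    return quote(iri, safe=":/?#[]@!$&'()*+,;=%")


print(iri_to_uri("http://www.example.org/rosé"))
```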
Hash and Slash
Two kinds of indirection:
- Hash URIs contain a hash ("#") character: http://vocab.deri.ie/dcat#granularity
- chop off the hash and everything after it
- do your GET on what's left
- see what the result says about the full, original URI
- Sometimes it turns out to be a "fragment" URL
- Hopefully it's not both (but it happens sometimes)
- Slash URIs don't contain a hash: http://purl.org/dc/elements/1.1/creator
- try to do a GET
- you might just get contents (so it was a URL)
- you might get a "303 See Other" redirecting you elsewhere
- Hopefully the new place says something useful using the original URI.
- try to GET http://dbpedia.org/resource/Massachusetts
- redirect ("303 See Other") to http://dbpedia.org/page/Massachusetts
- which has data about http://dbpedia.org/resource/Massachusetts
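The two lookup recipes above can be sketched in Python with the standard library. The function names `url_to_get` and `next_step` are assumptions made for this illustration, and no network request is actually sent here.

```python
from typing import Optional
from urllib.parse import urldefrag


def url_to_get(uri: str) -> str:
    """Return the URL a client would GET to learn about `uri`.

    For a hash URI, the hash and everything after it is chopped off.
    A slash URI is returned unchanged.
    """
    url, _fragment = urldefrag(uri)
    return url


def next_step(status: int, location: Optional[str] = None) -> str:
    """Describe what a client might do with the response to that GET."""
    if status == 303 and location:
        return "follow redirect to " + location
    if status == 200:
        return "read the returned content"
    return "no usable description found"


print(url_to_get("http://vocab.deri.ie/dcat#granularity"))
print(next_step(303, "http://dbpedia.org/page/Massachusetts"))
```

In both cases the client keeps the original URI around, since that is the name the returned data is expected to talk about.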
Hash vs Slash
Hash URIs:
- easier to construct
- more efficient on a small scale
- often used for small, controlled situations
Slash URIs:
- more control over user experience
- better scaling
You'll see both.
When publishing, your software may choose for you.
Review
- Use URIs to identify things
- Think in Triples
- Publish on the web
Publishing Data
- There's lots of choice!
- Don't be overwhelmed, it means there's at least one method that will work well in your situation.
Publication Method | Advantages | Disadvantages |
---|---|---|
RDF/XML Document | Oldest, best supported | Confusingly like normal XML |
Turtle (N3) Document | Simplest | Not technically a standard yet |
HTML Document with RDFa | Fits inside HTML attributes | Can get very complicated |
JSON | Normal JSON, but also RDF | Promising, but still being developed |
GRDDL | Use the XML you have/want | Needs to download+run XSLT |
SPARQL | Query Protocol | Query Protocol |
RDF/XML Example
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:db="http://dbpedia.org/resource/">
  <rdf:Description rdf:about="http://dbpedia.org/resource/Massachusetts">
    <db:Governor>
      <rdf:Description rdf:about="http://dbpedia.org/resource/Deval_Patrick" />
    </db:Governor>
    <db:Nickname>Bay State</db:Nickname>
    <db:Capital>
      <rdf:Description rdf:about="http://dbpedia.org/resource/Boston">
        <db:Nickname>Beantown</db:Nickname>
      </rdf:Description>
    </db:Capital>
  </rdf:Description>
</rdf:RDF>
Turtle Prefixes
First triple:
<http://dbpedia.org/resource/Massachusetts> <http://dbpedia.org/resource/Governor> <http://dbpedia.org/resource/Deval_Patrick> .
Abbreviate it:
@prefix db: <http://dbpedia.org/resource/> .
db:Massachusetts db:Governor db:Deval_Patrick .
- Turtle parsers read both forms as the same triple
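What a parser does with a prefix can be sketched in a few lines of Python. This is only an illustration of the expansion step, not a Turtle parser, and the `expand` helper is a hypothetical name.

```python
# Prefix table matching the @prefix declaration above.
PREFIXES = {"db": "http://dbpedia.org/resource/"}


def expand(name: str, prefixes: dict = PREFIXES) -> str:
    """Turn a prefixed name such as db:Massachusetts into a full URI."""
    prefix, local = name.split(":", 1)
    return prefixes[prefix] + local


short_form = ("db:Massachusetts", "db:Governor", "db:Deval_Patrick")
long_form = tuple(expand(term) for term in short_form)
print(long_form)
```

After expansion, the abbreviated triple and the fully written one are indistinguishable.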
Turtle Example
@prefix db: <http://dbpedia.org/resource/> .
db:Massachusetts db:Governor db:Deval_Patrick ;
                 db:Nickname "Bay State" ;
                 db:Capital db:Boston .
db:Boston db:Nickname "Beantown" .
RDFa Example
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
    "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:db="http://dbpedia.org/resource/"
      version="XHTML+RDFa 1.0">
  <head>
    <title>About Massachusetts</title>
  </head>
  <body>
    <div about="http://dbpedia.org/resource/Massachusetts">The Massachusetts governor is
      <span rel="db:Governor">
        <span about="http://dbpedia.org/resource/Deval_Patrick">Deval Patrick
        </span>,
      </span>
      the nickname is "<span property="db:Nickname">Bay State</span>",
      and the capital
      <span rel="db:Capital">
        <span about="http://dbpedia.org/resource/Boston">
          has the nickname "<span property="db:Nickname">Beantown</span>".
        </span>
      </span>
    </div>
  </body>
</html>
One Possible RDF-JSON Example
{
  "__iri": "db:Massachusetts",
  "db:Nickname": "Bay State",
  "db:Governor": { "__iri": "db:Deval_Patrick" },
  "db:Capital": {
    "__iri": "db:Boston",
    "db:Nickname": "Beantown"
  },
  "__prefixes": { "db:": "http://dbpedia.org/resource/" }
}
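To show that this JSON layout carries the same triples as the other formats, here is a minimal Python sketch that walks it. The `__iri` and `__prefixes` keys come from the example above; the `expand` and `to_triples` helpers are hypothetical, and the layout itself is only one possible convention.

```python
import json

DOC = json.loads("""
{
  "__iri": "db:Massachusetts",
  "db:Nickname": "Bay State",
  "db:Governor": { "__iri": "db:Deval_Patrick" },
  "db:Capital": { "__iri": "db:Boston", "db:Nickname": "Beantown" },
  "__prefixes": { "db:": "http://dbpedia.org/resource/" }
}
""")


def expand(name, prefixes):
    """Replace a known prefix (keys include the colon) with its full URI."""
    for prefix, base in prefixes.items():
        if name.startswith(prefix):
            return base + name[len(prefix):]
    return name


def to_triples(node, prefixes):
    """Collect (subject, property, value) tuples from a node and its children."""
    subject = expand(node["__iri"], prefixes)
    triples = []
    for key, value in node.items():
        if key.startswith("__"):
            continue  # bookkeeping keys, not properties
        prop = expand(key, prefixes)
        if isinstance(value, dict):
            # A nested object is another resource: link to it, then describe it.
            triples.append((subject, prop, expand(value["__iri"], prefixes)))
            triples.extend(to_triples(value, prefixes))
        else:
            triples.append((subject, prop, value))
    return triples


triples = to_triples(DOC, DOC["__prefixes"])
for triple in triples:
    print(triple)
```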
One Possible GRDDL Example
<MyDataSet xmlns="http://example.org/my-data-xml-namespace">
  <State>
    <name>Massachusetts</name>
    <governor>Deval_Patrick</governor>
    <nickname>Bay State</nickname>
    <capital>
      <name>Boston</name>
      <nickname>Beantown</nickname>
    </capital>
  </State>
</MyDataSet>
All the hard work is done by an XSLT program downloaded via the XML namespace URL. (Not implemented for this demo, sorry.)
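Since the transformation is not implemented for the demo, the following Python sketch stands in for it, purely to show the kind of mapping such a transformation would perform. It is not XSLT and not a GRDDL implementation; the `extract` function and the choice to map element names onto `http://dbpedia.org/resource/` URIs are assumptions made to match the other examples.

```python
import xml.etree.ElementTree as ET

NS = {"d": "http://example.org/my-data-xml-namespace"}
DB = "http://dbpedia.org/resource/"  # assumed target namespace

XML = """<MyDataSet xmlns="http://example.org/my-data-xml-namespace">
  <State>
    <name>Massachusetts</name>
    <governor>Deval_Patrick</governor>
    <nickname>Bay State</nickname>
    <capital>
      <name>Boston</name>
      <nickname>Beantown</nickname>
    </capital>
  </State>
</MyDataSet>"""


def extract(xml_text):
    """Map each <State> element onto (subject, property, value) triples."""
    root = ET.fromstring(xml_text)
    triples = []
    for state in root.findall("d:State", NS):
        subject = DB + state.findtext("d:name", namespaces=NS)
        governor = DB + state.findtext("d:governor", namespaces=NS)
        triples.append((subject, DB + "Governor", governor))
        triples.append((subject, DB + "Nickname", state.findtext("d:nickname", namespaces=NS)))
        capital = state.find("d:capital", NS)
        capital_uri = DB + capital.findtext("d:name", namespaces=NS)
        triples.append((subject, DB + "Capital", capital_uri))
        triples.append((capital_uri, DB + "Nickname", capital.findtext("d:nickname", namespaces=NS)))
    return triples


triples = extract(XML)
for triple in triples:
    print(triple)
```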
SPARQL
- A query language, somewhat like SQL
prefix db: <http://dbpedia.org/resource/>
prefix dbo: <http://dbpedia.org/ontology/>
SELECT ?dnym WHERE { db:Massachusetts dbo:demonym ?dnym }

prefix db: <http://dbpedia.org/resource/>
prefix dbo: <http://dbpedia.org/ontology/>
SELECT ?cap WHERE { db:Massachusetts dbo:capital ?cap }
See the DBpedia SPARQL service and the SPARQL tutorial.
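The core idea behind a query like the second one above is matching a triple pattern, where names starting with "?" are variables. A toy Python sketch of that single step follows. The `match` function and the small sample graph are illustrative assumptions, and a real SPARQL engine does far more (joins, filters, ordering, and so on).

```python
# Sample data using prefixed names for brevity; assumed for illustration.
GRAPH = [
    ("db:Massachusetts", "dbo:capital", "db:Boston"),
    ("db:Massachusetts", "db:Nickname", "Bay State"),
    ("db:Boston", "db:Nickname", "Beantown"),
]


def match(pattern, graph):
    """Yield one {variable: value} binding per triple that fits the pattern."""
    for triple in graph:
        binding = {}
        for want, have in zip(pattern, triple):
            if want.startswith("?"):
                # A variable matches anything, but must stay consistent
                # if it appears more than once in the pattern.
                if binding.setdefault(want, have) != have:
                    break
            elif want != have:
                break
        else:
            yield binding


results = list(match(("db:Massachusetts", "dbo:capital", "?cap"), GRAPH))
print(results)
```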
Content Negotiation
How do you manage all these options?
- Information Resources can have multiple Representations
- When you GET, you can say which type you want (HTML or XML, say)
- HTTP Server returns an appropriate representation
Try:
curl -L --header "Accept: application/rdf+xml" http://vocab.deri.ie/dcat
curl -L --header "Accept: text/turtle" http://vocab.deri.ie/dcat
curl -L --header "Accept: text/html" http://vocab.deri.ie/dcat
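The same requests can be made from a program. Below is a minimal sketch with Python's standard `urllib`, which follows redirects by default much as `curl -L` does. The helper names `build_request` and `fetch` are made up here, and whether the server still answers for each media type is not guaranteed.

```python
from urllib.request import Request, urlopen


def build_request(url: str, media_type: str) -> Request:
    """Prepare a GET request that asks for one particular representation."""
    return Request(url, headers={"Accept": media_type})


def fetch(url: str, media_type: str, timeout: float = 10.0):
    """Send the request; return the served content type and the body bytes."""
    with urlopen(build_request(url, media_type), timeout=timeout) as response:
        return response.headers.get_content_type(), response.read()


# Building the request needs no network access; calling fetch() does.
request = build_request("http://vocab.deri.ie/dcat", "text/turtle")
print(request.get_header("Accept"))
```

Calling `fetch("http://vocab.deri.ie/dcat", "application/rdf+xml")` would correspond to the first curl command.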
Challenges
- You will run into some issues:
- Long term critics of the Semantic Web
- Data consumers who don't want RDF
- Suppliers trying to sell a different technology
- Gov people who think you're trying to spoil years of work on their XML Schemas
- Listen and reassure: Linked Data can help all these people too!