DRAFT, in development. See the slidy (slides) version.
|Title||Mission Possible: Deploying Government Linked Data (Pt3)|
|Author||Sandro Hawke, (email@example.com), W3C/MIT, @sandhawke; John L. Sheridan, @johnlsheridan|
|Event||Gov 2.0 Expo, May 25-26, 2010, Washington DC|
Publishing Triples on the Web
- The Mechanics of Publication
- Various Platforms
- Data Changes
- The Politics of Publication
- Aligning Governance
- Continuity Policies
- Maintaining Provenance
Patterns for Publishing
- Harvestable RDF
- RDFa embedded in web pages
- XHTML or XML and GRDDL
- provide XSLT stylesheet that translates XML to RDF/XML
- RDF formats generated from an underlying database
- Queryable RDF
- store RDF in a triplestore
- provide SPARQL endpoint
- layer user-friendly APIs on top of endpoint
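GRDDL attaches an XSLT transform to an XML document so that harvesters can extract RDF from it. A minimal sketch of the same mapping idea, with Python's ElementTree swapped in for XSLT and N-Triples output instead of RDF/XML; the `<schools>` format, the `example.org` URIs and the property names are all hypothetical:

```python
import xml.etree.ElementTree as ET

# Hypothetical XML format published by an existing system.
SOURCE = """<schools>
  <school code="100001"><name>Example Primary</name><lowAge>4</lowAge></school>
  <school code="100002"><name>Sample Academy</name><lowAge>11</lowAge></school>
</schools>"""

BASE = "http://example.org/id/school/"  # hypothetical URI base
EX = "http://example.org/def/"          # hypothetical vocabulary
XSD_INT = "http://www.w3.org/2001/XMLSchema#integer"


def literal(text):
    """N-Triples string literal (escapes backslashes and double quotes only)."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def xml_to_ntriples(xml_text):
    """Translate each <school> element into N-Triples lines."""
    lines = []
    for school in ET.fromstring(xml_text).iter("school"):
        subject = f"<{BASE}{school.get('code')}>"
        lines.append(f"{subject} <{EX}name> {literal(school.findtext('name'))} .")
        age = school.findtext("lowAge")
        lines.append(f'{subject} <{EX}lowAge> "{age}"^^<{XSD_INT}> .')
    return lines


for line in xml_to_ntriples(SOURCE):
    print(line)
```

In a real GRDDL deployment the stylesheet is referenced from the XML document itself, so any GRDDL-aware agent can run it; the Python version only illustrates the element-to-triple mapping.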
Mechanics of Publication
What do you need to publish your triples?
- Static Documents
- Web Platforms
- Custom Code
Generate by hand, or output from existing systems.
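RDF output may already be built into, or available for, databases such as MySQL, Oracle, ...
Libraries: Jena (Java), rdflib (Python), Redland (C), SWI-Prolog's semweb package (swipl)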
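Output from an existing system usually comes down to mapping rows to triples. A minimal sketch using only Python's standard library: an in-memory SQLite table stands in for MySQL or Oracle, and the table, URIs and properties are hypothetical. A library such as rdflib would normally take care of serialization and escaping:

```python
import sqlite3

# Hypothetical table standing in for an existing government system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE school (code INTEGER PRIMARY KEY, name TEXT, constituency INTEGER)")
conn.executemany(
    "INSERT INTO school VALUES (?, ?, ?)",
    [(100001, "Example Primary", 142), (100002, "Sample Academy", 7)],
)

BASE = "http://example.org/id/"  # hypothetical URI scheme
EX = "http://example.org/def/"   # hypothetical vocabulary


def quote(text):
    """N-Triples string literal (escapes backslashes and double quotes only)."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def export_ntriples(connection):
    """Yield two N-Triples lines per row of the school table."""
    rows = connection.execute("SELECT code, name, constituency FROM school ORDER BY code")
    for code, name, constituency in rows:
        subject = f"<{BASE}school/{code}>"
        yield f"{subject} <{EX}name> {quote(name)} ."
        # Foreign keys become links to other resources, not bare numbers.
        yield f"{subject} <{EX}constituency> <{BASE}constituency/{constituency}> ."


triples = list(export_ntriples(conn))
print("\n".join(triples))
```

Turning the foreign key into a link is the step that moves the output toward the fifth star: consumers can follow it rather than guess what "142" means.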
Linked Data API
- Easy-to-use APIs built on linked data
- queryable through URI parameters
- return simple JSON or XML
- /doc/school => list of schools
- /doc/school?_page=2 => second page of schools
- /doc/school?constituency.code=142 => list of schools in Dulwich and West Norwood
- /doc/school/constituency/142 => list of schools in Dulwich and West Norwood
- /doc/school/constituency/142?min-highAge=7&max-lowAge=7 => list of schools in Dulwich and West Norwood that accept seven-year-olds
- Easy to implement (existing implementations in PHP, Java)
- API metadata shows the SPARQL query it generated
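To make the URI-parameter idea concrete, a simplified sketch of how such a request might be turned into SPARQL. It is not the Linked Data API specification's algorithm: it handles only the query-parameter form (not /doc/school/constituency/142), integer values only, and a hypothetical table of short names; the page size of 10 is also an assumption.

```python
from urllib.parse import parse_qsl, urlsplit

# Hypothetical mapping from short API names to RDF terms.
SHORT_NAMES = {
    "school": "http://example.org/def/School",
    "constituency": "http://example.org/def/constituency",
    "code": "http://example.org/def/code",
    "highAge": "http://example.org/def/highAge",
    "lowAge": "http://example.org/def/lowAge",
}
PAGE_SIZE = 10  # assumed default page size


def to_sparql(request_uri):
    """Translate a /doc/<class>?... request into a SPARQL SELECT query."""
    parts = urlsplit(request_uri)
    class_name = parts.path.rstrip("/").split("/")[-1]
    patterns = [f"?item a <{SHORT_NAMES[class_name]}> ."]
    filters = []
    page = 1
    for n, (key, value) in enumerate(parse_qsl(parts.query)):
        if key == "_page":
            page = int(value)
            continue
        op = "="
        if key.startswith("min-"):
            op, key = ">=", key[len("min-"):]
        elif key.startswith("max-"):
            op, key = "<=", key[len("max-"):]
        number = int(value)  # integers only in this sketch, which also blocks injection
        # A dotted name such as constituency.code becomes a chain of patterns.
        subject = "?item"
        for i, step in enumerate(key.split(".")):
            obj = f"?v{n}_{i}"
            patterns.append(f"{subject} <{SHORT_NAMES[step]}> {obj} .")
            subject = obj
        filters.append(f"FILTER({subject} {op} {number})")
    body = "\n  ".join(patterns + filters)
    offset = (page - 1) * PAGE_SIZE
    return f"SELECT ?item WHERE {{\n  {body}\n}} LIMIT {PAGE_SIZE} OFFSET {offset}"


print(to_sparql("/doc/school?constituency.code=142&min-highAge=7&max-lowAge=7"))
```

Exposing the generated query, as the API's metadata does, gives developers a path from the simple API to the full SPARQL endpoint.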
This is an API. Every change affects someone.
Design for change.
The World Changes
A set of triples is usually true only for some time range
Suggestion: use dcterms:temporal (Dublin Core Terms) to declare that time range.
One URL for archival copy:
Another URL for "latest":
- which will be the same as schools_2010_05 for a few more days
This is good practice for many kinds of web pages.
Link among the versions.
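A minimal sketch of the archive/latest pattern, assuming monthly snapshots and hypothetical `example.org` URLs (`schools_latest` is an invented name). It records the time range with `dcterms:temporal` using the DCMI Period encoding, links each archival copy to the "latest" resource with `dcterms:isVersionOf`, and works out which archival copy the "latest" URL should currently serve:

```python
from datetime import date

DCT = "http://purl.org/dc/terms/"
BASE = "http://example.org/data/"  # hypothetical publishing location


def snapshot_name(period_start):
    """Archival name for a monthly snapshot, e.g. schools_2010_05."""
    return f"schools_{period_start:%Y_%m}"


def snapshot_metadata(period_start, period_end):
    """N-Triples stating the time range the archival copy describes."""
    uri = f"<{BASE}{snapshot_name(period_start)}>"
    period = f"start={period_start.isoformat()}; end={period_end.isoformat()};"
    return [
        f'{uri} <{DCT}temporal> "{period}"^^<{DCT}Period> .',
        # Link each archival copy back to the "latest" resource.
        f"{uri} <{DCT}isVersionOf> <{BASE}schools_latest> .",
    ]


def resolve_latest(snapshots, today):
    """Archival URL that "latest" should serve: the newest snapshot started by today."""
    started = [start for start, _end in snapshots if start <= today]
    return f"{BASE}{snapshot_name(max(started))}" if started else None


snapshots = [(date(2010, 4, 1), date(2010, 4, 30)), (date(2010, 5, 1), date(2010, 5, 31))]
print(snapshot_metadata(*snapshots[-1]))
print(resolve_latest(snapshots, date(2010, 5, 25)))
```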
Corrections: a similar archive/latest mechanism, but for different reasons,
e.g. "restated financial statements" for some time period.
Metadata can indicate what changed and why.
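A sketch, with hypothetical dataset names and figures, that links a restated dataset to the one it corrects using `dcterms:replaces` and `dcterms:isReplacedBy`, and records which figures changed, and why, in a `dcterms:description`:

```python
DCT = "http://purl.org/dc/terms/"
BASE = "http://example.org/data/"  # hypothetical publishing location


def restatement(old_id, new_id, old_figures, new_figures, reason):
    """Link a restated dataset to the version it corrects; report what changed."""
    old, new = f"<{BASE}{old_id}>", f"<{BASE}{new_id}>"
    changed = sorted(
        key for key in old_figures.keys() | new_figures.keys()
        if old_figures.get(key) != new_figures.get(key)
    )
    # Escapes double quotes only; assumes the reason contains no backslashes.
    note = f"Restated ({', '.join(changed)}): {reason}".replace('"', '\\"')
    triples = [
        f"{new} <{DCT}replaces> {old} .",
        f"{old} <{DCT}isReplacedBy> {new} .",
        f'{new} <{DCT}description> "{note}" .',
    ]
    return triples, changed


triples, changed = restatement(
    "spend_2009_q4", "spend_2009_q4_restated",
    {"grants": 120, "salaries": 300}, {"grants": 125, "salaries": 300},
    "late invoices included",
)
print(changed)
print("\n".join(triples))
```

Linking in both directions means a consumer holding either version can discover the other.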
Push and Pull Feeds
- enable efficient local mirroring
- news of changes
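On the pull side, a consumer keeping a local mirror can poll a change feed and re-fetch only what changed. A minimal sketch assuming an Atom feed with one entry per dataset (the feed content and URLs are hypothetical); push delivery would send the same entries to subscribers instead of waiting to be polled:

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Dataset changes</title>
  <entry>
    <id>http://example.org/data/schools_latest</id>
    <updated>2010-05-24T09:00:00Z</updated>
    <link href="http://example.org/data/schools_2010_05"/>
  </entry>
  <entry>
    <id>http://example.org/data/hospitals_latest</id>
    <updated>2010-04-02T12:00:00Z</updated>
    <link href="http://example.org/data/hospitals_2010_04"/>
  </entry>
</feed>"""


def stale_entries(feed_xml, mirror):
    """Return links whose feed 'updated' time is newer than the local copy."""
    to_fetch = []
    for entry in ET.fromstring(feed_xml).iter(f"{ATOM}entry"):
        entry_id = entry.findtext(f"{ATOM}id")
        updated = entry.findtext(f"{ATOM}updated")
        # Timestamps in the same UTC 'Z' form compare correctly as strings.
        if mirror.get(entry_id, "") < updated:
            to_fetch.append(entry.find(f"{ATOM}link").get("href"))
    return to_fetch


local = {
    "http://example.org/data/schools_latest": "2010-04-20T09:00:00Z",
    "http://example.org/data/hospitals_latest": "2010-04-02T12:00:00Z",
}
print(stale_entries(FEED, local))
```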
dcat: Data Catalog Vocabulary
- publish metadata to catalogs
- harvest metadata from catalogs
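A minimal sketch of the publishing direction: a Turtle record describing one dataset and where to download it. It assumes the current W3C DCAT namespace and terms (`dcat:Catalog`, `dcat:Dataset`, `dcat:distribution`, `dcat:downloadURL`); earlier drafts of the vocabulary used a different namespace, so check what your catalog expects. All URIs are hypothetical:

```python
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"


def catalog_record(catalog_uri, dataset_uri, title, download_url):
    """Turtle describing a catalog that lists one downloadable dataset."""
    title = title.replace("\\", "\\\\").replace('"', '\\"')
    return f"""@prefix dcat: <{DCAT}> .
@prefix dcterms: <{DCT}> .

<{catalog_uri}> a dcat:Catalog ;
    dcat:dataset <{dataset_uri}> .

<{dataset_uri}> a dcat:Dataset ;
    dcterms:title "{title}" ;
    dcat:distribution [
        a dcat:Distribution ;
        dcat:downloadURL <{download_url}>
    ] .
"""


print(catalog_record(
    "http://example.org/catalog",
    "http://example.org/data/schools_latest",
    "Schools",
    "http://example.org/data/schools_2010_05.nt",
))
```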
Politics of Publication
Tim Berners-Lee's five stars:
- Publish the data on the Web in any format (eg .pdf)
- Publish in a machine-readable format (eg .xls)
- Publish in a non-proprietary format (eg .csv)
- Publish as RDF Linked Data (eg .rdf)
- Establish useful links between resources
Maybe you're already at 2 or 3.
Jumping in at 5 might be easiest.
- Government data usually already has an owner and a governance process
- Try to use existing governance structures for Linked Data publishing
- Operates at different levels
- Who can have a .gov domain?
- How to mint URIs?
- Who should mint URIs?
- Which URIs should I use?
- What URIs are promoted for wider use within government?
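One way to answer "how to mint URIs?" is a fixed pattern that separates the thing (`/id/`) from the document about it (`/doc/`), along the lines of the pattern used for UK public sector URI sets. A sketch under that assumption, with a hypothetical domain; the pattern names the concept rather than the agency, so URIs need not change when the organization does:

```python
import re


def slug(text):
    """Lower-case text, each run of non-alphanumerics replaced by one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def mint_id(domain, concept, reference):
    """URI for the thing itself, e.g. a school: http://{domain}/id/{concept}/{reference}."""
    return f"http://{domain}/id/{slug(concept)}/{slug(reference)}"


def doc_for(id_uri):
    """URI of the document describing the thing (/id/ becomes /doc/)."""
    return id_uri.replace("/id/", "/doc/", 1)


school = mint_id("education.example.org", "School", "100001")
print(school)           # http://education.example.org/id/school/100001
print(doc_for(school))  # http://education.example.org/doc/school/100001
```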
Who will serve the URI if the agency changes names?
Who will serve the URI if the agency is shut down?
Redirections vs Content
Role of archival organizations
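When an agency is renamed or closed, its URIs can live on as redirects to whoever now serves the data, whether a successor agency or an archive. A sketch of the decision a server might make, using a hypothetical table of moved prefixes; it also applies the common 303 See Other response from an `/id/` URI to its `/doc/` description, as described in the W3C "Cool URIs for the Semantic Web" note:

```python
# Hypothetical: data once published by a renamed or closed agency is now
# served by a successor (or an archive) under a different prefix.
MOVED_PREFIXES = {
    "http://old-agency.example.org/": "http://successor.example.org/",
}


def respond(uri):
    """Return (HTTP status, Location) for a request; (200, None) means serve content."""
    for old, new in MOVED_PREFIXES.items():
        if uri.startswith(old):
            # Permanent move: clients should update the links they store.
            return 301, new + uri[len(old):]
    if "/id/" in uri:
        # A thing, not a document: 303 See Other points at its description.
        return 303, uri.replace("/id/", "/doc/", 1)
    return 200, None


print(respond("http://old-agency.example.org/id/school/1"))
print(respond("http://successor.example.org/id/school/1"))
```

A redirect keeps old links working without copying content; serving content directly avoids the extra round trip but means someone must keep hosting the old domain.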
- Important for government data and a key part of responsible publishing
- Helps data consumers know what they are dealing with
- Operates at different levels
- Organisational level - who made this data, how and when?
- File level - what processing was done to make this file, when?
- Can be done simply (eg Dublin Core Terms) or with more sophistication (eg using OPMV specialisations)
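The simple end of the spectrum, sketched with Dublin Core Terms only: who created a file and when (organisational level), and which inputs it was derived from (file level). The URIs are hypothetical; describing the processing steps themselves is where a richer vocabulary such as OPMV comes in, and is not shown:

```python
from datetime import date

DCT = "http://purl.org/dc/terms/"
XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"


def provenance(file_uri, creator_uri, created, source_uris):
    """Dublin Core Terms provenance triples (N-Triples) for one published file."""
    s = f"<{file_uri}>"
    triples = [
        # Organisational level: who made this data, and when?
        f"{s} <{DCT}creator> <{creator_uri}> .",
        f'{s} <{DCT}created> "{created.isoformat()}"^^<{XSD_DATE}> .',
    ]
    # File level: which published inputs was this file derived from?
    triples += [f"{s} <{DCT}source> <{src}> ." for src in source_uris]
    return triples


for line in provenance(
    "http://example.org/data/schools_2010_05.nt",
    "http://example.org/id/organisation/education-department",
    date(2010, 5, 1),
    ["http://example.org/data/schools_2010_05.xml"],
):
    print(line)
```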
Local Semantic Web Meetups
Participate in W3C eGov Interest Group
Email firstname.lastname@example.org with subject "tutorial"