Tutorial/Publishing Triples

From W3C eGovernment Wiki

DRAFT, in development. See the Slidy (slides) version.

Title: Mission Possible: Deploying Government Linked Data (Pt3)
Authors: Sandro Hawke (sandro@w3.org), W3C/MIT, @sandhawke
         John L. Sheridan, @johnlsheridan
Event: Gov 2.0 Expo, May 25-26, 2010, Washington DC

Part 3

Publishing Triples on the Web

  1. The Mechanics of Publication
    • Various Platforms
    • Data Changes
    • Catalogs
  2. The Politics of Publication
    • Aligning Governance
    • Continuity Policies
    • Maintaining Provenance

Patterns for Publishing

  • Harvestable RDF
    • RDFa embedded in web pages
    • XHTML or XML and GRDDL
      • provide XSLT stylesheet that translates XML to RDF/XML
    • RDF formats generated from underlying database
  • Queryable RDF
    • store RDF within triplestore
    • provide SPARQL endpoint
    • layer user-friendly APIs on top of endpoint
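
For the queryable pattern, clients usually reach the endpoint over HTTP. The following is a minimal sketch, assuming a hypothetical endpoint at http://example.gov/sparql and a made-up School class URI. It only builds a GET request URL in the style of the SPARQL protocol (the query travels in the `query` parameter) and does not contact any server.

```python
from urllib.parse import urlencode

ENDPOINT = "http://example.gov/sparql"  # hypothetical endpoint, not a real service


def sparql_get_url(query, endpoint=ENDPOINT):
    """Return a GET URL that carries the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


# The class URI below is an assumption made for illustration.
query = "SELECT ?school WHERE { ?school a <http://example.gov/def/School> } LIMIT 10"
url = sparql_get_url(query)
```

A user-friendly API layered on top of the endpoint would typically generate URLs like this one on the client's behalf.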

Mechanics of Publication

What do you need to publish your triples?

  • Hardware
  • Software
  • Expertise

?

Platforms

  • Static Documents
  • Web Platforms
  • SQL-Based
  • Triplestores
  • Custom Code

Static Documents

http://www.w3.org/TR/swbp-vocab-pub/

Generate by hand, or output from existing systems.

Web Platforms

Drupal 6, Drupal 7

Semantic MediaWiki

Some RDFa is easy.

SQL-Based

D2R Server

Maybe built into MySQL, Oracle, ...

RDB2RDF Working Group
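
To make the SQL-based approach concrete, here is a rough sketch of turning table rows into N-Triples. It is not how D2R Server works internally (D2R uses its own mapping language); the table layout, the URI pattern and the choice of rdfs:label are all assumptions made for illustration.

```python
import sqlite3

BASE = "http://example.gov/id/school/"                 # hypothetical URI pattern
LABEL = "http://www.w3.org/2000/01/rdf-schema#label"   # rdfs:label


def rows_to_ntriples(conn):
    """Yield one N-Triples line per row of the (assumed) schools table."""
    for school_id, name in conn.execute("SELECT id, name FROM schools ORDER BY id"):
        # Only backslashes and double quotes are escaped in this sketch.
        literal = name.replace("\\", "\\\\").replace('"', '\\"')
        yield f'<{BASE}{school_id}> <{LABEL}> "{literal}" .'


# In-memory demo data standing in for an existing government database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE schools (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO schools VALUES (?, ?)",
    [(1, "Elm Road Primary"), (2, "Hill Street School")],
)
triples = list(rows_to_ntriples(conn))
```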

Triplestores

http://www.w3.org/2001/sw/wiki/Category:Triple_Store

Custom Servers

Jena, RDFLib, Redland, SWI-Prolog

http://www.w3.org/2001/sw/wiki/Category:Programming_Environment

Linked Data API

  • Easy-to-use APIs built on linked data
    • queryable through URI parameters
    • return simple JSON or XML

For example:

  • /doc/school => list of schools
  • /doc/school?_page=2 => second page of schools
  • /doc/school?constituency.code=142 => list of schools in Dulwich and West Norwood
  • /doc/school/constituency/142 => list of schools in Dulwich and West Norwood
  • /doc/school/constituency/142?min-highAge=7&max-lowAge=7 => list of schools in Dulwich and West Norwood that accept seven year olds

Note:

  • Easy to implement (existing implementations in PHP, Java)
  • API 'meta' tells you the SPARQL generated
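
The translation from URI parameters to SPARQL can be sketched roughly as follows. This is not the code of any existing Linked Data API implementation. The property URIs, the School class and the page size are hypothetical, and only the query-parameter form (not the path form) is handled. Dotted names are treated as property paths, `min-` and `max-` prefixes as range filters, and `_page` as 1-based paging, following the examples above.

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical vocabulary; a real deployment would configure these mappings.
SCHOOL_CLASS = "http://example.gov/def/School"
PROPERTIES = {
    "constituency": "http://example.gov/def/constituency",
    "code": "http://example.gov/def/code",
    "highAge": "http://example.gov/def/highAge",
    "lowAge": "http://example.gov/def/lowAge",
}
PAGE_SIZE = 10  # assumed page size


def literal(value):
    """Render ASCII integers bare and everything else as an escaped string."""
    if value.isascii() and value.isdigit():
        return value
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def uri_to_sparql(uri):
    """Translate a /doc/school request URI into a SPARQL SELECT query."""
    patterns = [f"?item a <{SCHOOL_CLASS}> ."]
    page, counter = 1, 0
    for name, value in parse_qsl(urlsplit(uri).query):
        if name == "_page":
            page = int(value)
            continue
        op = "="
        if name.startswith("min-"):
            op, name = ">=", name[4:]
        elif name.startswith("max-"):
            op, name = "<=", name[4:]
        node = "?item"
        for step in name.split("."):  # dotted names walk a property path
            counter += 1
            patterns.append(f"{node} <{PROPERTIES[step]}> ?v{counter} .")
            node = f"?v{counter}"
        patterns.append(f"FILTER ({node} {op} {literal(value)})")
    body = "\n  ".join(patterns)
    offset = (page - 1) * PAGE_SIZE
    return (
        f"SELECT ?item WHERE {{\n  {body}\n}} "
        f"ORDER BY ?item LIMIT {PAGE_SIZE} OFFSET {offset}"
    )
```

Calling `uri_to_sparql("/doc/school?constituency.code=142&_page=2")` returns the kind of query text that the API's 'meta' view would show to a curious developer.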

See project slides

Data Changes

This is an API. Every change affects someone.

Design for change.

The World Changes

Each set of triples should hold true for a stated time range.

Suggestion: use dc:temporal to declare that time range.

One URL for archival copy:

  • schools_2010_01
  • schools_2010_02
  • schools_2010_03
  • ...

Another URL for "latest":

  • schools_latest
    • which will be the same as schools_2010_05 for a few more days

This is good practice for many kinds of web pages.

Link among the versions.
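
The archive/latest naming and the links among versions might be generated along these lines. This is a sketch under assumptions: the base URL is hypothetical, the time range is written as a DCMI Period literal (one possible encoding for dc:temporal), and dcterms:replaces and dcterms:hasVersion are one reasonable choice of linking properties, not a prescribed one.

```python
import calendar

BASE = "http://example.gov/data/"  # hypothetical base URL


def snapshot_name(dataset, year, month):
    """Build an archival name such as schools_2010_01."""
    return f"{dataset}_{year:04d}_{month:02d}"


def describe_versions(dataset, months):
    """Return Turtle lines for monthly snapshots given as ordered (year, month) pairs."""
    lines = ["@prefix dcterms: <http://purl.org/dc/terms/> ."]
    latest = f"<{BASE}{dataset}_latest>"
    previous = None
    for year, month in months:
        current = f"<{BASE}{snapshot_name(dataset, year, month)}>"
        last_day = calendar.monthrange(year, month)[1]
        period = (
            f"start={year:04d}-{month:02d}-01; "
            f"end={year:04d}-{month:02d}-{last_day:02d};"
        )
        # Declare the time range for which this snapshot is meant to be true.
        lines.append(f'{current} dcterms:temporal "{period}"^^dcterms:Period .')
        # Link the "latest" URL to every archived snapshot.
        lines.append(f"{latest} dcterms:hasVersion {current} .")
        if previous:
            lines.append(f"{current} dcterms:replaces {previous} .")
        previous = current
    return lines
```

Which snapshot the "latest" URL actually serves is a server configuration matter and is left out of the sketch.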

Corrections

The same archive/latest mechanism applies, but for a different reason: the published data was wrong, not the world.

Example: "restated financial statements" for some time period.

Metadata can indicate what changed and why.

Push and Pull Feeds

Dataset Dynamics

  • enable efficient local mirroring
  • news of changes

Catalogs

DCAT: Data Catalog Vocabulary

    • metadata to catalog
    • metadata from catalogs
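
A catalog record might look roughly like the Turtle produced below. This sketch uses terms from the DCAT namespace as later standardised by W3C (dcat:Catalog, dcat:dataset, dcat:Dataset, dcat:distribution, dcat:accessURL); the draft vocabulary available at the time of this tutorial may differ, and all URIs passed in are hypothetical.

```python
PREFIXES = (
    "@prefix dcat: <http://www.w3.org/ns/dcat#> .\n"
    "@prefix dcterms: <http://purl.org/dc/terms/> .\n"
)


def turtle_string(text):
    """Quote a plain string for Turtle (escapes backslashes and double quotes only)."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def catalog_entry(catalog, dataset, title, access_url):
    """Return Turtle that lists one dataset, with one distribution, in a catalog."""
    return (
        PREFIXES
        + f"<{catalog}> a dcat:Catalog ;\n"
        + f"    dcat:dataset <{dataset}> .\n"
        + f"<{dataset}> a dcat:Dataset ;\n"
        + f"    dcterms:title {turtle_string(title)} ;\n"
        + f"    dcat:distribution [ a dcat:Distribution ; dcat:accessURL <{access_url}> ] .\n"
    )
```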

Politics of Publication

Tim Berners-Lee's five stars:

  1. Publish the data on the Web in any format (eg .pdf)
  2. Publish in a machine-readable format (eg .xls)
  3. Publish in a non-proprietary format (eg .csv)
  4. Publish as RDF Linked Data (eg .rdf)
  5. Establish useful links between resources

Maybe you're already at 2 or 3.

Jumping in at 5 might be easiest.

Aligning Governance

  • Government data is usually created and governed by someone
  • Try to use existing governance structures for Linked Data publishing
  • Operates at different levels
    • Who can have a .gov domain?
    • How to mint URIs?
    • Who should mint URIs?
    • Which URIs should I use?
    • What URIs are promoted for wider use within government?

Continuity Policies

Who will serve the URI if the agency changes names?

Who will serve the URI if the agency is shut down?

Redirections vs Content

Role of Archives Organizations
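
One way to think about redirections is as a table of successors that a server, or an archives organization, maintains for retired URI prefixes. The sketch below is purely illustrative: the agency host names are invented, and a real deployment would issue HTTP redirects and would not rewrite strings in application code.

```python
# Hypothetical successor table: retired URI prefix -> prefix now serving it.
SUCCESSORS = {
    "http://old-agency.example.gov/": "http://new-agency.example.gov/",
    "http://new-agency.example.gov/": "http://archive.example.gov/new-agency/",
}


def resolve(uri, successors=SUCCESSORS, max_hops=5):
    """Follow successor prefixes; return (final_uri, number_of_redirects)."""
    for hops in range(max_hops + 1):
        match = next((old for old in successors if uri.startswith(old)), None)
        if match is None:
            return uri, hops  # nobody has retired this URI; serve content here
        uri = successors[match] + uri[len(match):]
    raise RuntimeError("redirect chain longer than max_hops")
```

Here a URI minted by the renamed agency is followed through the rename and then into the archive, which is the continuity that the questions above ask someone to take responsibility for.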

Maintaining Provenance

  • Important for government data and a key part of responsible publishing
  • Helps data consumers know what they are dealing with
  • Operates at different levels
    • Organisational level - who made this data, how and when?
    • File level - what processing was done to make this file, when?
  • Can be done simply (eg Dublin Core Terms) or with more sophistication (eg using OPMV specialisations)
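
The simple end of that range could look like the sketch below, which emits a few Dublin Core Terms statements about a published file as N-Triples. The choice of dcterms:publisher, dcterms:source and dcterms:created is one plausible minimal set, and all URIs are hypothetical.

```python
import datetime

DCTERMS = "http://purl.org/dc/terms/"
XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"


def provenance_ntriples(file_uri, publisher_uri, source_uri, created):
    """Return N-Triples lines saying who published a file, from what, and when."""
    return [
        # Organisational level: who made this data?
        f"<{file_uri}> <{DCTERMS}publisher> <{publisher_uri}> .",
        # File level: what was it derived from, and when was it created?
        f"<{file_uri}> <{DCTERMS}source> <{source_uri}> .",
        f'<{file_uri}> <{DCTERMS}created> "{created.isoformat()}"^^<{XSD_DATE}> .',
    ]


statements = provenance_ntriples(
    "http://example.gov/data/schools_2010_05",
    "http://example.gov/id/department/education",
    "http://example.gov/raw/schools_2010_05.csv",
    datetime.date(2010, 5, 25),
)
```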

Next Steps

Local Semantic Web Meetups

Participate in W3C eGov Interest Group

Email sandro@w3.org subject "tutorial"