RE: Technical issues impacting government use of linked data from Cory Casanave on 2010-03-31 (public-egov-ig@w3.org from March 2010)

From: Cory Casanave <cory-c@modeldriven.com>
Date: Tue, 30 Mar 2010 23:16:36 -0400
To: "Thomas Bandholtz" <thomas.bandholtz@innoq.com>
Cc: <public-egov-ig@w3.org>
Message-ID: <4F65F8D37DEBFC459F5A7228E5052044A03B98@DATCENTRALSRV.datcentral.local>
Thomas,
Part of my point is that we have no accepted standard way to do this;
you are proposing a way, which is a good start. But until we all use the
same way, we don't have interop.
As I understand your proposal, you are suggesting that each graph have a
triple referencing metadata about that graph. So if I have the URI
http://stuff.modeldriven.org/rdf/people#cory, I would know that the graph
located at http://stuff.modeldriven.org/rdf/people contains a triple
whose subject is http://stuff.modeldriven.org/rdf/people, with a
specific and accepted predicate. I don't know that this is accepted
practice, so we don't have interoperability today.
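A minimal Python sketch of the lookup this convention would enable, assuming the graph document has already been fetched and parsed into (subject, predicate, object) string tuples. The predicate http://example.org/ns#graphMetadata and the metadata URI are hypothetical, since no agreed predicate exists; urldefrag strips the #cory fragment to get the graph's document URL.

```python
from urllib.parse import urldefrag

# Hypothetical predicate: there is no agreed-upon one today.
GRAPH_METADATA = "http://example.org/ns#graphMetadata"

def graph_url(resource_uri):
    """Strip the fragment from a hash URI to get the graph's document URL."""
    url, _fragment = urldefrag(resource_uri)
    return url

def find_graph_metadata(resource_uri, triples):
    """Return metadata URIs stated about the graph containing resource_uri.

    triples: iterable of (subject, predicate, object) strings, assumed to be
    the already-parsed contents of the graph document.
    """
    graph = graph_url(resource_uri)
    return [o for s, p, o in triples if s == graph and p == GRAPH_METADATA]

triples = [
    ("http://stuff.modeldriven.org/rdf/people",
     GRAPH_METADATA,
     "http://stuff.modeldriven.org/rdf/people-meta"),  # hypothetical metadata URI
    ("http://stuff.modeldriven.org/rdf/people#cory",
     "http://xmlns.com/foaf/0.1/name",
     "Cory Casanave"),
]

print(graph_url("http://stuff.modeldriven.org/rdf/people#cory"))
# → http://stuff.modeldriven.org/rdf/people
print(find_graph_metadata("http://stuff.modeldriven.org/rdf/people#cory", triples))
# → ['http://stuff.modeldriven.org/rdf/people-meta']
```

Note that the lookup still presupposes the whole graph document has been retrieved, which is the concern raised in issue 1.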

I can see 3 issues with this proposal:
1) If this metadata is to give me the location of the query point for
this graph so that I don't have to get the entire graph, how will I get
this special triple without knowing that query point or downloading the
entire graph?
2) Best practice seems to be to separate the data and the metadata. By
embedding the metadata link in the data, we may couple the data too
tightly to a single context and configuration. I would have to think
more about that.
3) This may work for a single graph, but we are interested in complex
configurations of graphs. We don't have a way to represent and query
such a configuration. It is not acceptable to expect that the query
writer will "know" all the graphs that need to be assembled for a given
purpose; the query should be against such a configuration. Such a
configuration may reference many physical graphs and may associate
logical URIs with physical URLs. Whatever mechanism we come up with
should allow for such configurations.
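One way to picture such a configuration is as a data structure: a named configuration maps logical graph URIs to one or more physical URLs and may include other configurations by reference, so a query tool asks the configuration for its sources instead of relying on the query writer. This is a minimal sketch under that assumption only; the class, its fields and all example.org URIs are hypothetical and not an existing vocabulary or API.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class GraphConfiguration:
    """Hypothetical: a named set of logical graphs and their physical locations."""
    name: str
    # logical graph URI -> physical URLs (mirrors, partitions, ...)
    locations: dict[str, list[str]] = field(default_factory=dict)
    # other configurations included by reference
    includes: list[GraphConfiguration] = field(default_factory=list)

    def _walk(self):
        """Yield this configuration and everything it includes, once each."""
        seen, stack = set(), [self]
        while stack:
            config = stack.pop()
            if id(config) in seen:
                continue  # guards against cyclic includes
            seen.add(id(config))
            yield config
            stack.extend(reversed(config.includes))

    def resolve(self, logical_uri):
        """Physical URLs for one logical graph URI, searching includes too."""
        urls = []
        for config in self._walk():
            for url in config.locations.get(logical_uri, []):
                if url not in urls:
                    urls.append(url)
        return urls

    def all_sources(self):
        """Every physical URL a query against this configuration should cover."""
        urls = []
        for config in self._walk():
            for group in config.locations.values():
                for url in group:
                    if url not in urls:
                        urls.append(url)
        return urls

people = GraphConfiguration(
    "people",
    {"http://example.org/graph/people": ["http://host-a.example.org/people.rdf"]},
)
budget = GraphConfiguration(
    "budget",
    {"http://example.org/graph/budget": ["http://host-b.example.org/budget.rdf",
                                         "http://host-c.example.org/budget.rdf"]},
    includes=[people],
)
print(budget.all_sources())
```

Keeping the logical-to-physical mapping in the configuration, rather than in the data, is also in line with the separation of data and metadata mentioned in issue 2.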
Sorry for the late reply!
Regards,
Cory Casanave

-----Original Message-----
From: Thomas Bandholtz [mailto:thomas.bandholtz@innoq.com] 
Sent: Sunday, March 14, 2010 10:50 AM
To: Cory Casanave
Cc: public-egov-ig@w3.org
Subject: Re: Technical issues impacting government use of linked data

For metadata about Linked Data we have the "Vocabulary of Interlinked
Datasets (voiD)", see http://rdfs.org/ns/void-guide. In voiD you can
describe the SPARQL endpoint (issue 2) of a dataset and give links to
interlinked (associated) datasets (which at least addresses issue 4),
and there are some hooks for linking provenance (issue 3) statements.

voiD covers "Discovery via links in the dataset's documents" (issue 1)
with back-links, such as dcterms:isPartOf, from a "document" (i.e. data
item) to the dataset:

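    @prefix dcterms: <http://purl.org/dc/terms/> .
    @prefix foaf:    <http://xmlns.com/foaf/0.1/> .
    @prefix void:    <http://rdfs.org/ns/void#> .
    @prefix :        <http://example.org/void#> .  # hypothetical base namespace

    <http://dbpedia.org/data/Berlin> dcterms:isPartOf :DBpedia .

    :DBpedia a void:Dataset ;
             dcterms:title "DBPedia" ;
             dcterms:description "RDF data extracted from Wikipedia" ;
             foaf:homepage <http://dbpedia.org/> ;
             void:exampleResource <http://dbpedia.org/resource/Berlin> .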
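A consumer could use such a back-link to get from a data item to its dataset and from there to the dataset's SPARQL endpoint (void:sparqlEndpoint). A minimal Python sketch, assuming the data item's document and the voiD description are already parsed into (subject, predicate, object) tuples; the base namespace used to expand :DBpedia is hypothetical, and the endpoint triple is added for illustration since the snippet above does not state one.

```python
DCTERMS_IS_PART_OF = "http://purl.org/dc/terms/isPartOf"
VOID_SPARQL_ENDPOINT = "http://rdfs.org/ns/void#sparqlEndpoint"

def datasets_of(document_uri, triples):
    """Follow dcterms:isPartOf back-links from a document to its dataset(s)."""
    return [o for s, p, o in triples
            if s == document_uri and p == DCTERMS_IS_PART_OF]

def sparql_endpoints(dataset_uri, triples):
    """Read the void:sparqlEndpoint values declared for a dataset."""
    return [o for s, p, o in triples
            if s == dataset_uri and p == VOID_SPARQL_ENDPOINT]

# ":DBpedia" expanded with a hypothetical base namespace.
DBPEDIA = "http://example.org/void#DBpedia"
triples = [
    ("http://dbpedia.org/data/Berlin", DCTERMS_IS_PART_OF, DBPEDIA),
    # Illustrative endpoint statement, not part of the snippet above.
    (DBPEDIA, VOID_SPARQL_ENDPOINT, "http://dbpedia.org/sparql"),
]

for dataset in datasets_of("http://dbpedia.org/data/Berlin", triples):
    print(dataset, sparql_endpoints(dataset, triples))
# → http://example.org/void#DBpedia ['http://dbpedia.org/sparql']
```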

I would prefer rdfs:isDefinedBy instead, which "is used to indicate a
resource defining the subject resource. This property may be used to
indicate an RDF vocabulary in which a resource is described."
http://www.w3.org/TR/rdf-schema/#ch_isdefinedby

In SKOS we have skos:inScheme with a similar meaning.
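Since three candidate back-link properties are mentioned here, a consumer that wants to work with data published either way might simply try each of them. A minimal Python sketch, assuming parsed (subject, predicate, object) tuples; the preference order (rdfs:isDefinedBy first, following the preference stated above) is an assumption, not an agreed convention, and the example.org URIs are hypothetical.

```python
RDFS_IS_DEFINED_BY = "http://www.w3.org/2000/01/rdf-schema#isDefinedBy"
DCTERMS_IS_PART_OF = "http://purl.org/dc/terms/isPartOf"
SKOS_IN_SCHEME = "http://www.w3.org/2004/02/skos/core#inScheme"

# Assumed order of preference; not a standard.
BACKLINK_PREDICATES = (RDFS_IS_DEFINED_BY, DCTERMS_IS_PART_OF, SKOS_IN_SCHEME)

def find_container(resource_uri, triples, predicates=BACKLINK_PREDICATES):
    """Return (predicate, container) for the most preferred back-link, else None."""
    found = {}
    for s, p, o in triples:
        if s == resource_uri and p in predicates:
            found.setdefault(p, o)  # keep the first object seen per predicate
    for p in predicates:
        if p in found:
            return p, found[p]
    return None

triples = [
    ("http://example.org/concept/water", SKOS_IN_SCHEME, "http://example.org/scheme/env"),
    ("http://example.org/data/item1", DCTERMS_IS_PART_OF, "http://example.org/dataset/d1"),
    ("http://example.org/data/item1", RDFS_IS_DEFINED_BY, "http://example.org/dataset/d1-def"),
]

print(find_container("http://example.org/data/item1", triples))
print(find_container("http://example.org/concept/water", triples))
```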

This gives patterns for issues 1-3, at least.

Issue 4 is the subject of ongoing discussion about "RDF federation" or
"SPARQL Federation", which requires a server-side solution. Sandro has
worked on this.

Anyway, starting with one dataset's voiD, you can look up the SPARQL
endpoints of the interlinked datasets in their respective voiD metadata
and query all of them. However, each dataset may have a different RDF
schema, so you might be restricted to searching the rdfs:label
assertions as the only common query ;-)
This is not so easy.
But how would you solve this with relational databases and Web Service
interfaces? Absolutely no chance!
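The voiD-based procedure described above (start from one dataset, find the interlinked datasets, collect their SPARQL endpoints, send each the same rdfs:label query) can be sketched as follows. This minimal Python version assumes all voiD descriptions have already been fetched into one list of (subject, predicate, object) tuples and that interlinking is expressed by void:Linkset resources with two void:target values. All example.org URIs are hypothetical, and the code only builds SPARQL protocol GET URLs; it sends no requests.

```python
from collections import deque
from urllib.parse import urlencode

VOID = "http://rdfs.org/ns/void#"

def linked_datasets(dataset, triples):
    """Datasets that share a void:Linkset (via void:target) with `dataset`."""
    linksets = {s for s, p, o in triples if p == VOID + "target" and o == dataset}
    return {o for s, p, o in triples
            if s in linksets and p == VOID + "target" and o != dataset}

def reachable_endpoints(start, triples):
    """Breadth-first walk over interlinked datasets, collecting SPARQL endpoints."""
    endpoints, seen, queue = {}, {start}, deque([start])
    while queue:
        ds = queue.popleft()
        endpoints[ds] = [o for s, p, o in triples
                         if s == ds and p == VOID + "sparqlEndpoint"]
        for nxt in sorted(linked_datasets(ds, triples)):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return endpoints

# rdfs:label is assumed to be the only vocabulary all datasets share.
LABEL_QUERY = """SELECT ?s WHERE {
  ?s <http://www.w3.org/2000/01/rdf-schema#label> ?label .
  FILTER regex(str(?label), "Berlin", "i")
} LIMIT 10"""

def label_query_url(endpoint):
    """SPARQL protocol GET URL carrying the query in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": LABEL_QUERY})

A, B, C = ("http://example.org/dataset/a", "http://example.org/dataset/b",
           "http://example.org/dataset/c")
triples = [
    (A, VOID + "sparqlEndpoint", "http://a.example.org/sparql"),
    (B, VOID + "sparqlEndpoint", "http://b.example.org/sparql"),
    (C, VOID + "sparqlEndpoint", "http://c.example.org/sparql"),
    ("http://example.org/linkset/ab", VOID + "target", A),
    ("http://example.org/linkset/ab", VOID + "target", B),
    ("http://example.org/linkset/bc", VOID + "target", B),
    ("http://example.org/linkset/bc", VOID + "target", C),
]

for dataset, eps in reachable_endpoints(A, triples).items():
    for ep in eps:
        print(label_query_url(ep))
```

The walk finds dataset C even though dataset A's description never mentions it, which is the point of following interlinks; the weak spot remains the lowest-common-denominator query.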

Olaf Hartig and Juan Sequeda are currently working on "SQUIN - Query the
Web of Linked Data"
"This service executes queries over the whole Web of Linked Data and,
hence, enables applications to access the whole Web as if it is a single
giant database."
http://squin.sourceforge.net/
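A toy illustration of the general idea behind link-traversal query execution, the approach SQUIN is described as taking: dereference URIs found in the data, add what comes back to a working set, and evaluate the query over that growing set. This is not SQUIN's code or API. A dict stands in for HTTP dereferencing, the "query" is a single predicate pattern, the test for a dereferenceable term is crude, and all example.org URIs are hypothetical; real systems choose which links to follow based on the query.

```python
# Hypothetical stand-in for the Web: URI -> triples returned on dereference.
WEB = {
    "http://a.example.org/alice": [
        ("http://a.example.org/alice", "http://xmlns.com/foaf/0.1/knows",
         "http://b.example.org/bob"),
    ],
    "http://b.example.org/bob": [
        ("http://b.example.org/bob", "http://xmlns.com/foaf/0.1/name", "Bob"),
    ],
}

def dereference(uri):
    """Simulated HTTP GET of a URI, returning the triples found there."""
    return WEB.get(uri, [])

def traverse_and_match(seed, predicate, max_lookups=50):
    """Follow URIs found in retrieved data, then return (s, o) pairs for predicate."""
    data, looked_up, frontier = set(), set(), [seed]
    while frontier and len(looked_up) < max_lookups:
        uri = frontier.pop()
        if uri in looked_up:
            continue
        looked_up.add(uri)
        for s, p, o in dereference(uri):
            data.add((s, p, o))
            for term in (s, o):
                # Crude check: treat http(s) strings as dereferenceable URIs.
                if term.startswith("http") and term not in looked_up:
                    frontier.append(term)
    return sorted((s, o) for s, p, o in data if p == predicate)

print(traverse_and_match("http://a.example.org/alice",
                         "http://xmlns.com/foaf/0.1/name"))
```

Bob's name is found only because Alice's data links to Bob, so no dataset list or endpoint has to be known in advance.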

Have a nice weekend!
Thomas

Cory Casanave wrote:
> On the demo call today we discussed a couple of technical issues that
> impact government but are not specific to it.  These are:
> 
> 1)       That given a data URI, there is no standard way to
> programmatically access the metadata about the resource.
> 
> 2)       That given a data URI, there is no standard programmatic way
> to access a SPARQL query point for that resource and/or for associated
> resources.
> 
> 3)       That the metadata accessed should have standard links for
> provenance - even very simple provenance that does not require
> research.
> 
> 4)       How do we contextualize a query such that all data resources
> of interest within a certain context are included in a query, without
> the user having to know all the details of the data sets involved?
>
> All of the above could be accomplished with URI conventions and
> supporting ontologies.   My question is: What are the existing or
> proposed conventions and ontologies to satisfy these requirements?
> Should the eGov group provide or reference such conventions for use
> by the government and/or within our government demos?
>
> Regards,
> 
> Cory Casanave
> 

-- 
Thomas Bandholtz, thomas.bandholtz@innoq.com, http://www.innoq.com
innoQ Deutschland GmbH, Halskestr. 17, D-40880 Ratingen, Germany
Phone: +49 228 9288490 Mobile: +49 178 4049387 Fax: +49 228 9288491
Received on Wednesday, 31 March 2010 03:17:09 UTC