DataIntegration

From SPARQL Working Group
Revision as of 12:21, 19 May 2009 by Apassant (Talk | contribs)



Data integration

Use-case 1: Storing Enterprise Data

EDF R&D deployed a set of Enterprise 2.0 tools (blogs, wikis, RSS feeds) that provide RDF exports in order to enable interoperability between them. To provide advanced query and browsing interfaces for this data, it was more convenient to store it in a central RDF store. To keep the store up to date, each service sends a ping to the store as soon as new data is created (e.g. creating a blog post automatically generates its SIOC export), and the store then immediately loads the corresponding files:

LOAD <http://weblog.mycompany.org/blog/223/sioc.rdf>
LOAD <http://wiki.mycompany.org/wiki/topic/SPARQLWorkingGroup/rdf>

When a file is deleted (e.g. when a blog post is removed), the RDF store also receives a ping, so that it can delete the corresponding data:

CLEAR <http://weblog.mycompany.org/blog/223/sioc.rdf>

Since this scenario only involves loading, updating and removing RDF graphs spread over a company's network, it does not require a WHERE clause and works simply with the LOAD and CLEAR clauses from the current SPARUL proposal.
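The ping handling described above can be sketched as a small function that turns a notification into the matching SPARUL statement. This is only an illustrative sketch: the event names `created` and `deleted` and the function name `statement_for_ping` are assumptions made for the example, not part of the EDF deployment.

```python
def statement_for_ping(event, graph_uri):
    """Return the SPARUL statement for a ping about graph_uri.

    The event names 'created' and 'deleted' are hypothetical; the text
    only says that services ping the store on creation and deletion.
    """
    # Reject characters that cannot appear inside an <IRI> reference.
    if any(c in graph_uri for c in '<>"{}|^`\\') or any(c.isspace() for c in graph_uri):
        raise ValueError("invalid graph URI: %r" % graph_uri)
    if event == "created":
        return "LOAD <%s>" % graph_uri
    if event == "deleted":
        return "CLEAR <%s>" % graph_uri
    raise ValueError("unknown event: %r" % event)


if __name__ == "__main__":
    uri = "http://weblog.mycompany.org/blog/223/sioc.rdf"
    print(statement_for_ping("created", uri))
    print(statement_for_ping("deleted", uri))
```

Because neither statement needs a WHERE clause, the mapping from ping to statement is a plain string substitution on the graph URI.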

For details about the general use case, see: http://www.w3.org/2001/sw/sweo/public/UseCases/EDF/

Use-case 2: doap:store

doap:store (http://doapstore.org - seems currently down, I'm investigating) provides a search engine and browsing interface for DOAP projects found on the Web. It relies on PingTheSemanticWeb to fetch, every hour, the new DOAP files that have been discovered, and then runs a simple Python script to update its triple store (powered by Virtuoso):

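#!/usr/bin/python3
import subprocess
import urllib.request
import xml.dom.minidom

# Fetch the export listing the RDF documents discovered recently.
with urllib.request.urlopen('http://pingthesemanticweb.com/export/') as usock:
    document = xml.dom.minidom.parse(usock)

for item in document.getElementsByTagName('rdfdocument'):
    graph = item.getAttribute('url')
    # Load each document into a named graph identified by its own URL.
    statement = 'sparql load <%s> into graph <%s>' % (graph, graph)
    # Pass arguments as a list so the URL is never interpreted by a shell.
    subprocess.call(['/opt/virtuoso/bin/isql', 'localhost', 'exec=' + statement])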

The resulting SPARUL queries are then simple:

LOAD <$uri> INTO GRAPH <$uri>

That might be reduced to:

LOAD <$uri> INTO GRAPH <$uri>
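The statement generation in the script above can be separated from the isql call, so that it can be checked without a running store. The following is a minimal sketch, assuming the export uses `rdfdocument` elements with a `url` attribute as in the script; the sample XML (including its root element and URLs) and the function name `load_statements` are hypothetical.

```python
import xml.dom.minidom

# Hypothetical sample export; only the 'rdfdocument' element and its
# 'url' attribute are taken from the script, the rest is made up.
SAMPLE_EXPORT = """<export>
  <rdfdocument url="http://example.org/project-a/doap.rdf"/>
  <rdfdocument url="http://example.org/project-b/doap.rdf"/>
</export>"""


def load_statements(export_xml):
    """Yield one LOAD ... INTO GRAPH statement per listed document."""
    document = xml.dom.minidom.parseString(export_xml)
    for item in document.getElementsByTagName("rdfdocument"):
        uri = item.getAttribute("url")
        # Each document goes into a named graph identified by its own URL.
        yield "LOAD <%s> INTO GRAPH <%s>" % (uri, uri)


for statement in load_statements(SAMPLE_EXPORT):
    print(statement)
```

Each generated statement uses the document URL twice, once as the source to load and once as the name of the target graph.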