RDF adapters expose the data of some source file or source system as RDF, to be consumed by some RDF application. Adapters can be pure "foo to RDF" converters or exporters, or they can provide a dynamic view on the data. This page is about architectures for such adapters. For implementations, go to RDFImportersAndAdapters.
Styles of adapter interfaces
- Adapters as exporters: An adapter exports all its data in one big chunk, e.g. by filling a Model instance given as an argument. Easy implementation, but impractical for large, live datasets. Examples: Kowari Content Handlers, kSpaces Auto-Tagging plugins
- Adapters as query engines: An adapter is able to answer queries of some sort, e.g. simple find(Subject,Predicate,Object) queries or full SPARQL queries. Powerful for finding specific things in large datasets, but hard to implement. Examples: Jena graphs, D2RQ Adapter, Gnowsis Graph Adapters.
- Adapters as resource description providers: The adapter is asked to provide a description of a specific resource, addressed by URI. Easy implementation, scalable, but doesn't allow complex queries on large datasets. Examples: Some gnowsis Adapters. Further distinction is possible based on what is contained in a resource description:
- The adapter might decide itself how much information it puts into each description (e.g. SPARQL DESCRIBE). Very easy implementation.
- A common protocol, most likely CBD (ConciseBoundedDescriptions), might dictate which triples have to be included in a description. This adds implementation costs (adapter developers must know the protocol), but in the case of CBDs it enables straightforward crawling of all resources to retrieve the whole graph (no problems with blank node identity).
- Adapters as crawlable data sources: The adapter provides an interface that is easily crawlable and allows incremental gathering of large data sources like a Lotus Notes database, an IMAP server or a Microsoft Outlook installation. The interface of these adapters includes everything mentioned in Adapters as resource description providers and adds methods for listing the contents and finding changes.
- a concept of containers and resources (rdf: sequences and resources)
- list the contents of a sequence. The contents of a sequence are either other sequences or resources. Each resource needs enough properties to determine whether the resource is altered or new (dcterms:modified date, dc:title, uri).
- get the details of a resource. If a resource is found to be new or altered, the client using the adapter calls a "get description" method to get all the details of the resource. This is usually triggered when, during crawling, a resource shows a changed modified date, size, title, ...
A byproduct of these crawlable data sources is that the crawling semantics can be reused in graphical user interfaces. If the container structure of a filesystem, WebDAV or IMAP server is represented in the sequence structure, users already know the structure and can navigate in it. Examples: none known, though someone has probably done it. Such an adapter is implemented in the most unstable gnowsis.
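The interface styles above can be compared side by side in a small sketch. Everything in it is an assumption made for illustration: the class `ToyFolderAdapter`, its method names, the `ex:` URIs and the `ex:body` property are hypothetical and are not taken from gnowsis, Jena or Kowari. Triples are plain tuples, and the "source system" is just an in-memory dictionary.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object), all plain strings


class ToyFolderAdapter:
    """Hypothetical adapter over an in-memory 'folder' of items."""

    def __init__(self, base_uri: str, items: Dict[str, Dict[str, str]]):
        self.base_uri = base_uri
        self.items = items  # item id -> {"title", "modified", "body"}

    def _uri(self, item_id: str) -> str:
        return self.base_uri + item_id

    # Style 1, exporter: dump everything in one big chunk.
    def export_all(self) -> List[Triple]:
        triples: List[Triple] = []
        for item_id in self.items:
            triples.extend(self.describe(self._uri(item_id)))
        return triples

    # Style 2, query engine: find(S, P, O) with None as a wildcard.
    # Filtering a full export is naive; a real adapter would push the
    # pattern down into the source system.
    def find(self, s: Optional[str] = None, p: Optional[str] = None,
             o: Optional[str] = None) -> List[Triple]:
        return [t for t in self.export_all()
                if (s is None or t[0] == s)
                and (p is None or t[1] == p)
                and (o is None or t[2] == o)]

    # Style 3, resource description provider: describe one resource by URI.
    def describe(self, uri: str) -> List[Triple]:
        item = self.items[uri[len(self.base_uri):]]
        return [(uri, "dc:title", item["title"]),
                (uri, "dcterms:modified", item["modified"]),
                (uri, "ex:body", item["body"])]

    # Style 4, crawlable source: list a container with just enough
    # information (here the modified date) to detect new or altered resources.
    def list_contents(self) -> List[Tuple[str, str]]:
        return [(self._uri(i), item["modified"])
                for i, item in self.items.items()]


def crawl(adapter: ToyFolderAdapter, seen: Dict[str, str]) -> List[str]:
    """Fetch descriptions only for new or modified resources; return their URIs."""
    changed = []
    for uri, modified in adapter.list_contents():
        if seen.get(uri) != modified:
            adapter.describe(uri)  # a real client would store these triples
            seen[uri] = modified
            changed.append(uri)
    return changed


adapter = ToyFolderAdapter("ex:mail/", {
    "1": {"title": "Hello", "modified": "2005-11-01", "body": "Hi there"},
    "2": {"title": "Agenda", "modified": "2005-11-01", "body": "Items"},
})
seen: Dict[str, str] = {}
first_pass = crawl(adapter, seen)      # everything is new on the first crawl
adapter.items["2"]["modified"] = "2005-11-02"
second_pass = crawl(adapter, seen)     # only the altered resource is fetched again
```

The second crawl touches only the resource whose modified date changed, which is the point of the crawlable style: the listing is cheap, and the expensive "get description" call is made only when needed.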
- Gnowsis Adapter Framework: Treating Structured Data Sources as Virtual RDF Graphs by Leo Sauermann and Sven Schwarz at the ISWC 2005
- HostingApplication: An application that hosts data. An adapter provides access to that data to an RDF app. (from gnowsis)
- ClientApplication: An application that consumes data provided by an adapter. (from gnowsis)
This is a twist on the adapter concept. Can we modify the data stored in the source file by modifying the RDF graph?
Of the examples above, only Kowari plugins are updateable.
There are a number of challenges:
- Generating new IDs/URIs: The identifier for new resources is often generated deep within the source system (think SEQUENCE/AUTO_INCREMENT in a database). At the time when the new data is added to the RDF graph, we might not be able to know the URI of the new resource. Solution: Pass in blank node, let adapter pick the identifier.
- BlankNodes: How to know if a blank node in the replacement data is the same as one in the old data? See DeltaView
- Structural limitations of the adapter's data model: Unlike a general-purpose RDF triple store, most adapters have a pretty fixed structure. An MP3 file can store the name of exactly one artist, and it cannot store the artist's email address. It's impossible to update the data if you get the cardinalities or the vocabulary wrong.
- Incomplete data: RDF in general copes well with incomplete data. There's nothing wrong with knowing only a person's first name, even if properties for last name and email address exist. In a database, we might be unable to insert a new row unless we know the first name, last name and email address. Solution: Updates must be done in complete chunks that satisfy all the adapter's constraints.
- Rename/replace difference: Say I have a filesystem adapter, which exposes the URI file:/Users/richard/stuff (a directory). How do I rename the directory? The simple "add/remove triple" approach doesn't work. Solution: Never rename resources. This is difficult if the name (URI) is more than a mere opaque ID, because that might force us to change the resource's identity when some of its aspects (say, its location) change.
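Two of the solutions listed above, passing in a blank node so the adapter picks the identifier, and updating only in complete chunks that satisfy the adapter's constraints, can be sketched together. This is a minimal illustration under stated assumptions: `ToyPersonTable`, the `_:new` placeholder, the `ex:` properties and the rule that all three properties are required are all hypothetical, standing in for a database table with NOT NULL columns and an AUTO_INCREMENT key.

```python
import itertools
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

BLANK = "_:new"  # hypothetical placeholder meaning "adapter, pick the URI"
REQUIRED = {"ex:firstName", "ex:lastName", "ex:email"}  # assumed mandatory columns


class ToyPersonTable:
    """Hypothetical updateable adapter with a fixed, table-like structure."""

    def __init__(self, base_uri: str = "ex:person/"):
        self.base_uri = base_uri
        self.rows: Dict[str, Dict[str, str]] = {}
        self._next_id = itertools.count(1)  # stands in for AUTO_INCREMENT

    def insert(self, description: List[Triple]) -> str:
        """Validate a complete chunk, then assign a URI and store it."""
        row: Dict[str, str] = {}
        for s, p, o in description:
            if s != BLANK:
                raise ValueError("new resources must use the blank placeholder")
            if p not in REQUIRED:
                raise ValueError(f"vocabulary not supported by this adapter: {p}")
            if p in row:
                raise ValueError(f"cardinality violated, two values for {p}")
            row[p] = o
        missing = REQUIRED - set(row)
        if missing:
            raise ValueError(f"incomplete chunk, missing: {sorted(missing)}")
        # The identifier is generated only after validation succeeded.
        uri = self.base_uri + str(next(self._next_id))
        self.rows[uri] = row
        return uri


table = ToyPersonTable()
new_uri = table.insert([
    (BLANK, "ex:firstName", "Ada"),
    (BLANK, "ex:lastName", "Lovelace"),
    (BLANK, "ex:email", "ada@example.org"),
])

try:
    # Knowing only the first name is fine in RDF, but not for this adapter.
    table.insert([(BLANK, "ex:firstName", "Grace")])
    rejected = False
except ValueError:
    rejected = True
```

The caller learns the real URI only from the return value of `insert`, which mirrors the situation where the identifier is generated deep within the source system.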
How could the changes be conveyed to the adapter?
- Updated resource description: If the adapter is based on resource descriptions, then update could be implemented as the replacement of the description by a new one. Implementation should be easy, and as long as CBDs are used, there should be no blank node problems. Checking the new description for consistency with the adapter's data model shouldn't be too much of a problem. But we have to be careful to return *exactly* the same area of data around the resource as in the original description, or we will accidentally delete statements.
- RdfDiff: There are some proposals on how to express differences between RDF graphs. This might be used to implement adapter updates. RDF diff is a tricky problem though, especially regarding blank node identity. Checking diffs for consistency with the required adapter data structure could be hard as well.
- Replacement graph: One could give the adapter a whole new graph. This would replace everything previously known to the adapter. This is conceptually simple, and consistency checking is easy, but it doesn't scale well, and it increases the risk of accidentally losing concurrent changes. Kowari does this.
- Add/remove triples: Like in Jena graphs, one could tell the adapter to remove any triples that are no longer true, and provide new triples instead. This is a well-tested strategy, but can be tedious and tricky because simple changes like "Change the name from foo to bar" must be implemented as "Remove the old name, foo. Add a new name, bar.", where the resource passes through an inconsistent state. There are also some issues with blank nodes.
- Why I love Patrick Stickler's URIQA approach (LeoSauermann): Adapters should be built around resource descriptions, not around the used classes and properties. This simplifies implementation and avoids problems with blank nodes. ConciseBoundedDescriptions are a good choice for a resource description format. A resource's URI should also function as a URL from which its description can be retrieved.
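The trade-offs between the "updated resource description" and "add/remove triples" strategies discussed above can be made concrete with a small sketch. It rests on simplifying assumptions: graphs are plain Python sets of tuples, there are no blank nodes (so a set difference is a valid diff), and a "description" is simply all triples with the given subject, which is not a full CBD. The function names and `ex:` URIs are hypothetical.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def description_of(graph: Set[Triple], uri: str) -> Set[Triple]:
    """Naive description: every triple with `uri` as subject (not a full CBD)."""
    return {t for t in graph if t[0] == uri}


def replace_description(graph: Set[Triple], uri: str,
                        new_description: Set[Triple]) -> Set[Triple]:
    """Swap the whole description of `uri` in a single step."""
    return (graph - description_of(graph, uri)) | set(new_description)


def diff(old: Set[Triple], new: Set[Triple]) -> Tuple[Set[Triple], Set[Triple]]:
    """Triples to remove and triples to add; only valid without blank nodes."""
    return old - new, new - old


graph = {
    ("ex:a", "ex:name", "foo"),
    ("ex:a", "ex:mail", "a@example.org"),
    ("ex:b", "ex:name", "baz"),
}

# "Change the name from foo to bar", expressed as a full new description.
old = description_of(graph, "ex:a")
new = {(s, p, "bar") if p == "ex:name" else (s, p, o) for s, p, o in old}
updated = replace_description(graph, "ex:a", new)

# The same change as add/remove triples: between the two steps,
# ex:a has no name at all.
to_remove, to_add = diff(old, new)
intermediate = graph - to_remove
final = intermediate | to_add

# Returning a smaller area of data than the original description
# silently deletes the statements that were left out.
lossy = replace_description(graph, "ex:a", {("ex:a", "ex:name", "bar")})
```

Both routes end in the same graph, but the add/remove route exposes an intermediate state that an adapter with strict constraints might reject, and the replacement route loses the `ex:mail` statement as soon as the new description covers less than the old one.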