Integration of legacy data

From SDshare Community Group

Use case

A power company wants an archive system that automatically enriches document metadata when documents are archived. For example, a document might be related to a work order when it is archived. The enrichment process should then add the customer the work is being done for and the project the work is part of. Which relations to traverse to include new objects should be configurable, not hard-wired into code. The necessary background data is spread across a number of pre-existing applications.
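The enrichment step described above can be sketched as a breadth-first traversal over triples, where the set of relations to follow is plain configuration. This is only an illustration under stated assumptions: the identifiers (`doc:1`, `wo:42`, `ex:customer`, and so on), the in-memory `TRIPLES` list, and the function names are hypothetical and do not come from the power company's systems.

```python
from collections import deque

# Hypothetical background data as (subject, predicate, object) triples.
TRIPLES = [
    ("doc:1", "ex:relatedTo", "wo:42"),
    ("wo:42", "ex:customer", "cust:7"),
    ("wo:42", "ex:partOfProject", "proj:3"),
    ("wo:42", "ex:createdBy", "user:9"),
    ("proj:3", "ex:manager", "user:5"),
]

# Configuration, not code: the relations the enrichment may traverse.
FOLLOW = {"ex:relatedTo", "ex:customer", "ex:partOfProject"}


def query(subject, predicates, triples):
    """Evaluate the open pattern '<subject> ?p ?o' with a condition on ?p."""
    return [(p, o) for s, p, o in triples if s == subject and p in predicates]


def enrich(start, predicates, triples):
    """Follow the configured relations from `start` until nothing new is found.

    Returns the collected triples and the number of queries issued.
    """
    seen = {start}
    queue = deque([start])
    found = []
    queries = 0
    while queue:
        node = queue.popleft()
        queries += 1  # one open-pattern query per visited object
        for p, o in query(node, predicates, triples):
            found.append((node, p, o))
            if o not in seen:
                seen.add(o)
                queue.append(o)
    return found, queries


found, queries = enrich("doc:1", FOLLOW, TRIPLES)
```

Changing which objects are pulled in only requires editing `FOLLOW`; for instance, adding the hypothetical `ex:manager` would also attach the project manager. Note that the sketch issues one query per visited object, including objects that turn out to have no further configured relations.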

The two obvious approaches are query federation and building something like a data warehouse. The queries that extract new metadata will need to be very general, typically using open patterns like "<...> ?p ?o", with some conditions on ?p. We estimate that on average 10 queries are needed per document to reach transitive closure, and there will be at least five source systems. At peak times the system needs to process 2 documents per second, which translates to 20 queries per second. This seems unrealistic in a query federation setting, so we choose the data warehouse alternative.
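The load estimate can be made explicit with a few lines of arithmetic. The first figure follows directly from the numbers in the text. The second rests on an assumption that is not stated in the text: that in a federated setting an open-pattern query cannot be routed to a single source and is therefore sent to every source system. The function names are illustrative only.

```python
DOCS_PER_SECOND = 2   # peak archiving rate from the use case
QUERIES_PER_DOC = 10  # estimated average to reach transitive closure
SOURCE_SYSTEMS = 5    # "at least five" source systems


def warehouse_queries_per_second(docs_per_second, queries_per_doc):
    """Queries per second against a single local store."""
    return docs_per_second * queries_per_doc


def federated_requests_per_second(docs_per_second, queries_per_doc, sources):
    """Worst-case requests per second, ASSUMING every query fans out
    to every source system (an assumption, not a figure from the text)."""
    return docs_per_second * queries_per_doc * sources


local_load = warehouse_queries_per_second(DOCS_PER_SECOND, QUERIES_PER_DOC)
federated_load = federated_requests_per_second(
    DOCS_PER_SECOND, QUERIES_PER_DOC, SOURCE_SYSTEMS
)
print(local_load, federated_load)
```

Under that fan-out assumption the federated setting would have to serve on the order of 100 remote requests per second at peak, spread over pre-existing applications that were not built for this kind of querying, whereas the warehouse answers the 20 queries per second locally.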

For data warehousing we will need to be able to import data from the data sources and then keep the imported data in sync with them. (We take the use of RDF as a given here and do not attempt to justify it.)

Solution

TBW