Use Case Crosswalk Maintenance

From XG Provenance Wiki

Name

Crosswalk Maintenance

Owner

Kai Eckert

Provenance Dimensions

  • Primary: Debugging
  • Secondary: Attribution

Background and Current Practice

This use case is taken from the DC-09 conference article. Please refer to that article for more details.

University libraries need to handle metadata from diverse sources, which is usually encoded in incompatible metadata formats and varies in quality. To provide a unified search interface over this heterogeneous metadata, the metadata formats need to be aligned. Typically, a format that forms a common denominator of all formats involved is chosen, and the metadata is converted into this target format using crosswalks.

These crosswalks are usually hand-crafted by metadata experts and then transferred into program logic or transformation stylesheets. When errors appear in the resulting metadata, the crosswalk has to be improved. Identifying the erroneous part of the crosswalk can be tedious, and after the crosswalk has been changed, the whole set of resulting metadata has to be recreated, because it cannot be determined which parts of it are affected by the change.

Goal

The goal is to support the maintenance of crosswalks. Additional provenance information is provided for each resulting metadata record that enables efficient debugging.

Use Case Scenario

The program logic that is derived from the mappings is extended so that it not only writes the resulting metadata elements, but also records the following information for every element:

  • the version of the crosswalk used
  • the number of the mapping rule used
  • the source fields used
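
A minimal sketch of such an extended conversion, written in Python. The rule table, the field names (`title`, `subtitle`, `author`), the target elements, the version label, and the `convert` helper are all hypothetical and only illustrate the three pieces of information listed above. The DC-09 article may organize this differently.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

CROSSWALK_VERSION = "1.0"  # hypothetical version label


@dataclass(frozen=True)
class Rule:
    number: int                     # number of the mapping rule
    source_fields: Tuple[str, ...]  # source fields the rule reads
    target: str                     # target element the rule writes
    transform: Callable[..., str]   # combines the source values


@dataclass(frozen=True)
class Element:
    name: str
    value: str
    # Provenance recorded alongside every resulting element:
    crosswalk_version: str
    rule_number: int
    source_fields: Tuple[str, ...]


# Hypothetical crosswalk with two mapping rules.
RULES = [
    Rule(1, ("title", "subtitle"), "dc:title", lambda t, s: f"{t}: {s}"),
    Rule(2, ("author",), "dc:creator", lambda a: a.strip()),
]


def convert(record: Dict[str, str]) -> List[Element]:
    """Apply every rule whose source fields are all present in the record."""
    elements = []
    for rule in RULES:
        if all(field in record for field in rule.source_fields):
            values = [record[field] for field in rule.source_fields]
            elements.append(
                Element(
                    name=rule.target,
                    value=rule.transform(*values),
                    crosswalk_version=CROSSWALK_VERSION,
                    rule_number=rule.number,
                    source_fields=rule.source_fields,
                )
            )
    return elements


result = convert(
    {"title": "Provenance", "subtitle": "A Primer", "author": " Doe, Jane "}
)
for element in result:
    print(element)
```

Each resulting element carries its own provenance, so nothing has to be reconstructed from the crosswalk afterwards.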

With this information, at least the following maintenance steps can be supported:

  • Crosswalk updates: After a change in the crosswalk, the records affected by the change can be identified and recreated, instead of the whole set.
  • Fixing mapping errors: If an error in the metadata is found, the responsible rule in the crosswalk can directly be identified.
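
The two maintenance steps above can be sketched as simple lookups over the stored provenance. The in-memory `STORE`, the record identifiers, and both helper functions are assumptions made for illustration. A real system would query its metadata store instead.

```python
from typing import Dict, List, NamedTuple, Set


class Provenance(NamedTuple):
    crosswalk_version: str
    rule_number: int


# Hypothetical sample data: record id -> target element -> provenance.
STORE: Dict[str, Dict[str, Provenance]] = {
    "rec-1": {
        "dc:title": Provenance("1.0", 1),
        "dc:creator": Provenance("1.0", 2),
    },
    "rec-2": {"dc:creator": Provenance("1.0", 2)},
    "rec-3": {"dc:title": Provenance("1.0", 1)},
}


def affected_records(changed_rules: Set[int]) -> List[str]:
    """Crosswalk update: records with an element produced by a changed rule."""
    return sorted(
        rec_id
        for rec_id, elements in STORE.items()
        if any(p.rule_number in changed_rules for p in elements.values())
    )


def responsible_rule(rec_id: str, element: str) -> Provenance:
    """Fixing mapping errors: find the rule that produced a faulty element."""
    return STORE[rec_id][element]


print(affected_records({1}))  # ['rec-1', 'rec-3']
print(responsible_rule("rec-2", "dc:creator"))
```

In this sketch, a change to rule 1 leaves `rec-2` untouched, so only two of the three records need to be recreated.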

Problems and Limitations

This use case requires provenance at the statement level, which has to be supported by the underlying infrastructure. RDF, however, offers two mechanisms that support this: reification and named graphs.
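
To illustrate the two mechanisms, the following sketch builds both representations as plain Python tuples, without an RDF library. `rdf:Statement`, `rdf:subject`, `rdf:predicate`, and `rdf:object` belong to the RDF reification vocabulary. The `ex:` identifiers, the provenance properties (`ex:crosswalkVersion`, `ex:ruleNumber`, `ex:sourceField`), and the choice of one named graph per statement are assumptions for this example.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
Quad = Tuple[str, str, str, str]  # subject, predicate, object, graph

statement: Triple = ("ex:rec-1", "dc:title", '"Provenance: A Primer"')

# Hypothetical provenance properties for this one statement.
provenance: Dict[str, str] = {
    "ex:crosswalkVersion": '"1.0"',
    "ex:ruleNumber": '"1"',
    "ex:sourceField": '"title"',
}


def reify(stmt: Triple, stmt_id: str, prov: Dict[str, str]) -> List[Triple]:
    """Reification: describe the statement as a resource and annotate it."""
    s, p, o = stmt
    triples: List[Triple] = [
        stmt,
        (stmt_id, "rdf:type", "rdf:Statement"),
        (stmt_id, "rdf:subject", s),
        (stmt_id, "rdf:predicate", p),
        (stmt_id, "rdf:object", o),
    ]
    triples += [(stmt_id, prop, value) for prop, value in prov.items()]
    return triples


def named_graph(stmt: Triple, graph_id: str, prov: Dict[str, str]) -> List[Quad]:
    """Named graphs: put the statement in its own graph and annotate the graph."""
    s, p, o = stmt
    quads: List[Quad] = [(s, p, o, graph_id)]
    # Provenance about the graph is kept in a separate, assumed graph.
    quads += [(graph_id, prop, value, "ex:provenance") for prop, value in prov.items()]
    return quads


for triple in reify(statement, "ex:stmt-1", provenance):
    print(triple)
for quad in named_graph(statement, "ex:graph-1", provenance):
    print(quad)
```

In both variants the provenance is attached to an identifier for the statement, either the reified statement resource or the graph name, rather than to the record as a whole.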

A drawback is the overhead produced by the additional information. As this information has to be stored for every statement, the required storage space might increase by some factor.
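
As a rough illustration of that factor, the following sketch counts triples per original statement when RDF reification is used. It assumes one triple per provenance item and ignores how a particular triple store encodes or compresses its data, so the numbers are only indicative.

```python
def reified_triple_count(provenance_items: int) -> int:
    """Triples stored per original statement when using RDF reification."""
    original = 1
    reification = 4  # rdf:type, rdf:subject, rdf:predicate, rdf:object
    return original + reification + provenance_items


def overhead_factor(provenance_items: int) -> float:
    """Growth relative to storing the original statement alone (1 triple)."""
    return float(reified_triple_count(provenance_items))


# Crosswalk version, rule number, and one source field: 3 provenance items.
print(overhead_factor(3))  # 8.0
```

Under these assumptions, a statement with three provenance items takes eight triples instead of one.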

Unanticipated Uses

Other use cases that require statement-level provenance, such as Use_Case_Metadata_Merging.

Existing Work

Working examples that use RDF reification can be found in the DC-09 conference article.