Difference between revisions of "Prov-XML Identifiers"

From Provenance WG Wiki

Revision as of 17:07, 24 January 2013

Purpose

This document discusses how to represent identifiers (prov:id) and how to reference identified elements (prov:ref) in PROV-XML.

Requirements

  1. Allow for 'scruffy' provenance
  2. Support referencing provenance records from external serializations in different formats (PROV-O, PROV-N)
  3. Play nice with XML tooling

Attribute Type - prov:id and prov:ref

In the schema prov:id and prov:ref are defined as XML attributes.

ID/IDREF

Use type xs:ID for prov:id and xs:IDREF for prov:ref

 <xs:attribute name="id" type="xs:ID"/>
 <xs:attribute name="ref" type="xs:IDREF"/>

Constraints

Validity constraint: ID
Values of type ID must match the Name production. A name must not appear more than once in an XML document as a value of this type; i.e., ID values must uniquely identify the elements which bear them.


Validity constraint: One ID per Element Type
No element type may have more than one ID attribute specified.


Validity constraint: ID Attribute Default
An ID attribute must have a declared default of #IMPLIED or #REQUIRED.


Validity constraint: IDREF
Values of type IDREF must match the Name production, and values of type IDREFS must match Names; each Name must match the value of an ID attribute on some element in the XML document; i.e. IDREF values must match the value of some ID attribute.

Advantages

  1. ID recognized by XML tools as an identifier type
    1. There is a uniqueness constraint on prov:id values (scope global to document)
  2. IDREF recognized by XML tools as a reference to an identified element
    1. A prov:ref must match the prov:id of some identified element in the document

Disadvantages

  1. The lexical space is that of an unqualified XML name (the xs:NCName datatype)
    1. ID and IDREF values cannot contain colons or whitespace, and cannot start with a digit
    2. URIs and qualified names are not valid IDs because both contain colons.
  2. Entity/relation records defined in different bundles in the same document cannot share a prov:id value
  3. ID must be declared #IMPLIED or #REQUIRED and cannot have a default value, whereas PROV-DM defines identifiers as optional
    1. note: #IMPLIED leaves the attribute optional, which may be why xmllint does not complain if prov:id is left optional
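
As a rough illustration of the trade-offs listed above, the following instance sketch assumes prov:id is typed xs:ID and prov:ref is typed xs:IDREF. Placing prov:ref on prov:generatedEntity and prov:usedEntity is an assumption made for illustration; the identifiers e1 and e2 are hypothetical.

```xml
<prov:document xmlns:prov="http://www.w3.org/ns/prov#">
  <!-- Valid: "e1" and "e2" are NCNames and are unique within the document -->
  <prov:entity prov:id="e1"/>
  <prov:entity prov:id="e2"/>
  <prov:wasDerivedFrom>
    <!-- Valid: each prov:ref matches a prov:id declared in this document -->
    <prov:generatedEntity prov:ref="e2"/>
    <prov:usedEntity prov:ref="e1"/>
  </prov:wasDerivedFrom>
  <!-- Values that xs:ID would reject (kept in a comment so the sketch stays valid):
       prov:id="ex:e1"                        contains a colon
       prov:id="1e"                           starts with a digit
       prov:id="http://example.com/ns/ex#e1"  contains colons and slashes
       a second prov:id="e1"                  duplicates an existing ID -->
</prov:document>
```

The rejected values are exactly the forms PROV-DM qualified names and URIs take, which is the core tension of this option.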

QName

Use type xs:QName for both prov:id and prov:ref

 <xs:attribute name="id" type="xs:QName"/>
 <xs:attribute name="ref" type="xs:QName"/>

Constraints

TODO

Advantages

  1. Closest type to PROV-DM QualifiedName

Disadvantages

  1. No uniqueness constraint on prov:id
  2. The value of prov:ref is not required to match any existing prov:id
  3. Full URIs (e.g. http://example.com/ns/ex#e1) are not valid values of xs:QName; a namespace prefix must be declared and used instead.
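
A minimal sketch of what the QName option might look like in an instance, assuming both attributes are typed xs:QName. The prefix ex, the identifier e9, and the placement of prov:ref on the child elements are hypothetical and chosen only for illustration.

```xml
<prov:document
  xmlns:prov="http://www.w3.org/ns/prov#"
  xmlns:ex="http://example.com/ns/ex#">
  <!-- "ex:e1" is a valid xs:QName because the prefix "ex" is declared above;
       it stands for the pair (http://example.com/ns/ex#, e1) -->
  <prov:entity prov:id="ex:e1"/>
  <prov:wasDerivedFrom>
    <!-- Schema-valid even though no element declares prov:id="ex:e9":
         xs:QName carries no referential-integrity check -->
    <prov:generatedEntity prov:ref="ex:e9"/>
    <prov:usedEntity prov:ref="ex:e1"/>
  </prov:wasDerivedFrom>
</prov:document>
```

Any dangling or duplicated identifier would have to be caught by application-level checks rather than by schema validation.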

anyURI

Use type xs:anyURI for both prov:id and prov:ref

 <xs:attribute name="id" type="xs:anyURI"/>
 <xs:attribute name="ref" type="xs:anyURI"/>

Constraints

  1. TODO

Advantages

  1. Alignment with PROV-O requirement (from RDF) that identifiers be URIs.
  2. URIs are valid values for prov:id

Disadvantages

  1. No uniqueness constraint on prov:id
  2. The value of prov:ref is not required to match any existing prov:id
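
For comparison, a sketch of the same kind of record under the anyURI option, assuming both attributes are typed xs:anyURI. The URIs and the placement of prov:ref on the child elements are illustrative assumptions.

```xml
<prov:document xmlns:prov="http://www.w3.org/ns/prov#">
  <!-- Full URIs are accepted directly; no prefix declaration is needed -->
  <prov:entity prov:id="http://example.com/ns/ex#e1"/>
  <!-- Schema-valid duplicate: xs:anyURI does not enforce uniqueness -->
  <prov:entity prov:id="http://example.com/ns/ex#e1"/>
  <prov:wasDerivedFrom>
    <!-- Schema-valid dangling reference: no element declares this identifier -->
    <prov:generatedEntity prov:ref="http://example.com/ns/ex#e9"/>
    <prov:usedEntity prov:ref="http://example.com/ns/ex#e1"/>
  </prov:wasDerivedFrom>
</prov:document>
```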

ID/IDREF, XLink, and XPointer

  • Use type xs:ID for prov:id and xs:IDREF for prov:ref

 <xs:attribute name="id" type="xs:ID"/>
 <xs:attribute name="ref" type="xs:IDREF"/>

  • Use XLink's xlink:type="simple" to keep references between two elements simple; a simple link specifies exactly one locator (the target).
  • Use XPointer fragment identifier for simpler referencing of local and remote IDs.
  • Note that an XLink can link to an entire document, while the use of an XPointer can link to a specific element (identified by an ID) of a document.

Example

  • Example use of ID/IDREF along with XPointers to reference entities across prov-xml documents.

http://www.example.com/trace1.provx:

 <prov:document
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xmlns:xsd="http://www.w3.org/2001/XMLSchema"
   xmlns:xlink="http://www.w3.org/1999/xlink"
   xmlns:prov="http://www.w3.org/ns/prov#">
   <prov:entity prov:id="e1"/>
 </prov:document>

http://www.example.com/trace2.provx:

 <prov:document
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xmlns:xsd="http://www.w3.org/2001/XMLSchema"
   xmlns:xlink="http://www.w3.org/1999/xlink"
   xmlns:prov="http://www.w3.org/ns/prov#">
   <prov:entity prov:id="e2"/>
 </prov:document>

http://www.example.com/trace3.provx:

 <prov:document
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xmlns:xsd="http://www.w3.org/2001/XMLSchema"
   xmlns:xlink="http://www.w3.org/1999/xlink"
   xmlns:prov="http://www.w3.org/ns/prov#">
   <prov:wasDerivedFrom xsi:type="prov:Derivation">
     <prov:generatedEntity xlink:type="simple" xlink:href="http://www.example.com/trace2.provx#e2"/>
     <prov:usedEntity xlink:type="simple" xlink:href="http://www.example.com/trace1.provx#e1"/>
   </prov:wasDerivedFrom>
   <prov:wasDerivedFrom>
     <prov:generatedEntity xlink:type="simple" xlink:href="http://www.example.com/trace2.provx#e2"/>
     <prov:usedEntity xlink:type="simple" xlink:href="http://www.example.com/trace1.provx#e1"/>
     <prov:type xsi:type="xsd:string">physical transform</prov:type>
   </prov:wasDerivedFrom>
 </prov:document>

Constraints

  1. Dependent on the ID/IDREF approach for identifiers.
  2. Use of XLink and XPointer assumes that each prov-xml document is URL-accessible.

Advantages

  1. Leverages existing W3C Recommendations on ID/IDREF, XLink, and XPointer.
  2. Use of xlink:type="simple" maintains simplicity.
  3. Leverages ID/IDREF for internal references in the same prov-xml document.
  4. Leverages XLink and XPointer when references need to span across multiple prov-xml documents.
  5. Use of an XPointer fragment identifier can allow for more complex references via XPath expressions if needed.
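
Advantages 3 and 4 can be pictured together in a single document. The sketch below assumes prov:ref is allowed on prov:generatedEntity for same-document references; the file name trace4.provx and the identifier e3 are hypothetical.

http://www.example.com/trace4.provx:

```xml
<prov:document
  xmlns:xlink="http://www.w3.org/1999/xlink"
  xmlns:prov="http://www.w3.org/ns/prov#">
  <prov:entity prov:id="e3"/>
  <prov:wasDerivedFrom>
    <!-- Internal reference: IDREF to an ID declared in this document -->
    <prov:generatedEntity prov:ref="e3"/>
    <!-- External reference: XLink with an XPointer fragment into trace1.provx -->
    <prov:usedEntity xlink:type="simple" xlink:href="http://www.example.com/trace1.provx#e1"/>
  </prov:wasDerivedFrom>
</prov:document>
```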

Disadvantages

  1. Needs more community implementations of XLinks.
  2. Use of XLink and XPointer assumes that each prov-xml document is URL-accessible.
  3. Shares the disadvantages of the ID/IDREF approach, e.g. colons are not allowed in IDs.

Analysis

  • Use of an XPointer xpath fragment allows for more complex references.

 trace.provx#xpointer(xpath)

  • But use of the simpler XPointer fragment identifier simplifies usage and adoption.

 trace.provx#xpointer(id("entity1"))

can be expressed simply as:

 trace.provx#entity1

allowing concurrent unique references to IDs within a prov-xml document via IDREF and across multiple prov-xml documents via XPointer fragment identifier.

  • The XLink Attribute Usage Patterns in XLink 1.1 indicate that the syntax could be simpler still: xlink:type="simple" is optional when xlink:href is present. The prov:generatedEntity link could therefore be reduced to:

 <prov:generatedEntity xlink:href="http://www.example.com/trace2.provx#e2"/>

  • The use of IDREF for local references could be replaced uniformly with XLinks to local IDs:

 <prov:generatedEntity xlink:href="#e2"/>
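
Putting the two simplifications above together, a document that uses XLinks uniformly for local and remote references might look like the sketch below. It assumes an XLink 1.1 processor (so xlink:type can be omitted); the entity e2 is declared locally here purely for illustration.

```xml
<prov:document
  xmlns:xlink="http://www.w3.org/1999/xlink"
  xmlns:prov="http://www.w3.org/ns/prov#">
  <prov:entity prov:id="e2"/>
  <prov:wasDerivedFrom>
    <!-- Same-document reference: bare fragment pointing at the local ID "e2" -->
    <prov:generatedEntity xlink:href="#e2"/>
    <!-- Cross-document reference: document URL plus fragment -->
    <prov:usedEntity xlink:href="http://www.example.com/trace1.provx#e1"/>
  </prov:wasDerivedFrom>
</prov:document>
```

One consequence of this uniform style is that a schema validator no longer checks local references, since xlink:href is not an IDREF.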

Analysis

TODO

References