Difference between revisions of "Prov-XML Identifiers"

From Provenance WG Wiki

Revision as of 18:21, 24 January 2013

Purpose

This document discusses how to represent identifiers (prov:id) and how to reference identified elements (prov:ref) in PROV-XML.

Requirements

  1. Allow for 'scruffy' provenance
  2. Support referencing provenance records from external serializations in different formats (PROV-O, PROV-N)
  3. Play nice with XML tooling

Other Considerations

from PROV-DM Qualified Name:

PROV-DM stipulates that a qualified name can be mapped into an IRI by concatenating the IRI associated with the prefix and the local part.
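As an illustration of that mapping, consider the fragment below. It is only a sketch: the prefix ex and its namespace IRI are borrowed from the examples later on this page, and the id value ex:e1 is assumed for illustration.

```xml
<prov:document
    xmlns:prov="http://www.w3.org/ns/prov#"
    xmlns:ex="http://example.com/ns/ex#">

    <!-- Qualified name ex:e1 = prefix "ex" + local part "e1".
         Concatenating the IRI bound to "ex" with the local part gives
         http://example.com/ns/ex#e1 -->
    <prov:entity prov:id="ex:e1"/>

</prov:document>
```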

Attribute Type - prov:id and prov:ref

In the schema prov:id and prov:ref are defined as XML attributes.

ID/IDREF

Use type xs:ID for prov:id and xs:IDREF for prov:ref

 <xs:attribute name="id" type="xs:ID"/>
 <xs:attribute name="ref" type="xs:IDREF"/>

Constraints

Validity constraint: ID
Values of type ID must match the Name production. A name must not appear more than once in an XML document as a value of this type; i.e., ID values must uniquely identify the elements which bear them.


Validity constraint: One ID per Element Type
No element type may have more than one ID attribute specified.


Validity constraint: ID Attribute Default
An ID attribute must have a declared default of #IMPLIED or #REQUIRED.


Validity constraint: IDREF
Values of type IDREF must match the Name production, and values of type IDREFS must match Names; each Name must match the value of an ID attribute on some element in the XML document; i.e. IDREF values must match the value of some ID attribute.

Advantages

  1. ID recognized by XML tools as an identifier type
    1. There is a uniqueness constraint on prov:id values (scope global to document)
  2. IDREF recognized by XML tools as a reference to an identified element
    1. A prov:ref must match the prov:id of some identified element in the document

Disadvantages

  1. The lexical space is the same as that of an unqualified XML name (the xs:NCName datatype)
    1. ID and IDREF values cannot contain colons or whitespace, and cannot start with a digit
    2. URIs and qualified names are not valid IDs because both contain colons.
  2. Entity/relation records defined in different bundles in the same document cannot have the same prov:id value
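  3. An ID attribute cannot carry a default or fixed value: the ID Attribute Default constraint only allows #IMPLIED (optional) or #REQUIRED. Because #IMPLIED means optional, this does not conflict with PROV-DM defining identifiers as optional
    1. note: consistent with this, xmllint does not complain if prov:id is left optional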
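The constraints above can be illustrated with a small instance document. This is a sketch, assuming prov:id is typed xs:ID and prov:ref is typed xs:IDREF as proposed in this section. The id values are made up, and the commented-out lines show values a validating parser would be expected to reject.

```xml
<prov:document
    xmlns:prov="http://www.w3.org/ns/prov#">

    <prov:entity prov:id="e1"/>
    <prov:entity prov:id="e2"/>

    <!-- Valid: both references match an xs:ID value in this document -->
    <prov:wasDerivedFrom>
        <prov:generatedEntity prov:ref="e2"/>
        <prov:usedEntity prov:ref="e1"/>
    </prov:wasDerivedFrom>

    <!-- Expected to be rejected if uncommented:
         <prov:entity prov:id="e1"/>         duplicate ID value
         <prov:entity prov:id="ex:e3"/>      colon is not allowed in an NCName
         <prov:entity prov:id="0001"/>       starts with a digit
         <prov:usedEntity prov:ref="e9"/>    no element carries the ID e9
    -->

</prov:document>
```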

QName

Use type xs:QName for both prov:id and prov:ref

 <xs:attribute name="id" type="xs:QName"/>
 <xs:attribute name="ref" type="xs:QName"/>

Constraints

TODO

Advantages

  1. Closest type to PROV-DM QualifiedName
  2. Schema validators will test for the existence of a namespace specified in a QName (e.g. "ex:foo" is invalid if the prefix "ex" is not bound to a namespace)

Disadvantages

  1. No uniqueness constraint on prov:id
  2. The value of prov:ref is not required to match any existing prov:id
  3. Full URIs (e.g. http://example.com/ns/ex#e1) are not valid values of xs:QName; a prefix bound to a namespace must be used instead.
  4. Local names that start with a digit (e.g. "ex:0001") are not valid, because the local part of a QName must be an NCName
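A sketch of how these points play out in an instance document, assuming both attributes are typed xs:QName. The ex prefix is the one used elsewhere on this page; all id values are illustrative.

```xml
<prov:document
    xmlns:prov="http://www.w3.org/ns/prov#"
    xmlns:ex="http://example.com/ns/ex#">

    <!-- Valid: prefix "ex" is declared and "e1" is an NCName -->
    <prov:entity prov:id="ex:e1"/>

    <!-- Also accepted, which illustrates disadvantages 1 and 2:
         the same id appears twice, and one ref matches no id -->
    <prov:entity prov:id="ex:e1"/>
    <prov:wasDerivedFrom>
        <prov:generatedEntity prov:ref="ex:e1"/>
        <prov:usedEntity prov:ref="ex:missing"/>
    </prov:wasDerivedFrom>

    <!-- Expected to be rejected if uncommented:
         <prov:entity prov:id="foo:e1"/>    prefix "foo" is not declared
         <prov:entity prov:id="ex:0001"/>   local part starts with a digit
         <prov:entity prov:id="http://example.com/ns/ex#e1"/>   not a QName
    -->

</prov:document>
```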

anyURI

Use type xs:anyURI for both prov:id and prov:ref

 <xs:attribute name="id" type="xs:anyURI"/>
 <xs:attribute name="ref" type="xs:anyURI"/>

Constraints

  1. TODO

Advantages

  1. Alignment with PROV-O requirement (from RDF) that identifiers be URIs.
  2. URIs are valid values for prov:id
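For illustration, a minimal sketch of an instance document under this option, assuming both attributes are typed xs:anyURI. The URIs reuse the example namespace from this page; the local names e1 and e2 are made up.

```xml
<prov:document
    xmlns:prov="http://www.w3.org/ns/prov#">

    <!-- A full URI is a valid xs:anyURI value, so no prefix is needed -->
    <prov:entity prov:id="http://example.com/ns/ex#e1"/>
    <prov:entity prov:id="http://example.com/ns/ex#e2"/>

    <prov:wasDerivedFrom>
        <prov:generatedEntity prov:ref="http://example.com/ns/ex#e2"/>
        <prov:usedEntity prov:ref="http://example.com/ns/ex#e1"/>
    </prov:wasDerivedFrom>

</prov:document>
```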

Disadvantages

  1. No uniqueness constraint on prov:id
  2. The value of prov:ref is not required to match any existing prov:id
  3. URI correctness does not appear to be validated by schema parsers (citation: http://books.xmlschemata.org/relaxng/ch19-77009.html)
    1. Schema validators do not recognize the namespace component of a qname-formed id (e.g. "ex:foo")
    2. Schema validators will not test the existence of a namespace mentioned in a qname-formed id (e.g. "ex" from "ex:foo" can be undefined)


When prov:id has type xs:anyURI, the following XML validates successfully even though the prefix foo is not bound to any namespace:

<prov:document
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:prov="http://www.w3.org/ns/prov#"
    xmlns:ex="http://example.com/ns/ex#">
	
    <prov:entity prov:id="foo:001"/>

</prov:document>

ID/IDREF, XLink, and XPointer

  • Use type xs:ID for prov:id and xs:IDREF for prov:ref

 <xs:attribute name="id" type="xs:ID"/>
 <xs:attribute name="ref" type="xs:IDREF"/>

  • Use XLink's xlink:type="simple" to simplify use of referencing between two elements. The simple type specifies only one locator (target).
  • Use XPointer fragment identifier for simpler referencing of local and remote IDs.
  • Note that an XLink can link to an entire document, while the use of an XPointer can link to a specific element (identified by an ID) of a document.

Example

  • Example use of ID/IDREF along with XPointers to reference entities across prov-xml documents.

http://www.example.com/trace1.provx:

 <prov:document
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xmlns:xsd="http://www.w3.org/2001/XMLSchema"
   xmlns:xlink="http://www.w3.org/1999/xlink"
   xmlns:prov="http://www.w3.org/ns/prov#">
   <prov:entity prov:id="e1"/>
 </prov:document>

http://www.example.com/trace2.provx:

 <prov:document
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xmlns:xsd="http://www.w3.org/2001/XMLSchema"
   xmlns:xlink="http://www.w3.org/1999/xlink"
   xmlns:prov="http://www.w3.org/ns/prov#">
   <prov:entity prov:id="e2"/>
 </prov:document>

http://www.example.com/trace3.provx:

 <prov:document
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xmlns:xsd="http://www.w3.org/2001/XMLSchema"
   xmlns:xlink="http://www.w3.org/1999/xlink"
   xmlns:prov="http://www.w3.org/ns/prov#">
   <prov:wasDerivedFrom xsi:type="prov:Derivation">
     <prov:generatedEntity xlink:type="simple" xlink:href="http://www.example.com/trace2.provx#e2"/>
     <prov:usedEntity xlink:type="simple" xlink:href="http://www.example.com/trace1.provx#e1"/>
   </prov:wasDerivedFrom>
   <prov:wasDerivedFrom>
     <prov:generatedEntity xlink:type="simple" xlink:href="http://www.example.com/trace2.provx#e2"/>
     <prov:usedEntity xlink:type="simple" xlink:href="http://www.example.com/trace1.provx#e1"/>
     <prov:type xsi:type="xsd:string">physical transform</prov:type>
   </prov:wasDerivedFrom>
 </prov:document>

Constraints

  1. Dependent on the ID/IDREF approach for identifiers.
  2. Use of XLink and XPointer assumes that each prov-xml document is URL-accessible.

Advantages

  1. Leverages existing W3C Recommendations on ID/IDREF, XLink, and XPointer.
  2. Use of xlink:type="simple" maintains simplicity.
  3. Leverages ID/IDREF for internal references in the same prov-xml document.
  4. Leverages XLink and XPointer when references need to span across multiple prov-xml documents.
  5. Use of XPointer fragment identifier can allow for more complex references via xpaths if needed.

Disadvantages

  1. Needs more community implementations of XLinks.
  2. Use of XLink and XPointer assumes that each prov-xml document is URL-accessible.
  3. Shares the disadvantages of the ID/IDREF approach, e.g. colons cannot be used in IDs.

Analysis

  • Use of an XPointer xpath fragment allows for more complex references.

 trace.provx#xpointer(xpath)

  • But use of the simpler XPointer fragment identifier simplifies usage and adoption.

 trace.provx#xpointer(id("entity1"))

can be expressed simply as:

 trace.provx#entity1

allowing concurrent unique references to IDs within a prov-xml document via IDREF and across multiple prov-xml documents via XPointer fragment identifier.
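A sketch of what such mixed usage might look like in one document, assuming the schema allows either prov:ref or xlink:href on the reference elements. The local id e3 is hypothetical; the remote URL is the trace1.provx example from above.

```xml
<prov:document
    xmlns:xlink="http://www.w3.org/1999/xlink"
    xmlns:prov="http://www.w3.org/ns/prov#">

    <!-- Local entity, identified by an xs:ID -->
    <prov:entity prov:id="e3"/>

    <prov:wasDerivedFrom>
        <!-- Local reference via xs:IDREF -->
        <prov:generatedEntity prov:ref="e3"/>
        <!-- Remote reference via XLink with a shorthand XPointer fragment -->
        <prov:usedEntity xlink:type="simple"
            xlink:href="http://www.example.com/trace1.provx#e1"/>
    </prov:wasDerivedFrom>

</prov:document>
```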

  • XLink 1.1's XLink Attribute Usage Patterns indicate that the XLink syntax can be even simpler: xlink:type="simple" is optional when xlink:href is present. So the prov:generatedEntity link could be reduced to:

 <prov:generatedEntity xlink:href="http://www.example.com/trace2.provx#e2"/>

  • The use of IDREF for local references could be replaced uniformly with XLinks to local IDs:

 <prov:generatedEntity xlink:href="#e2"/>

Analysis

TODO

References