Prov-XML Identifiers
From Provenance WG Wiki
Revision as of 17:07, 24 January 2013
Purpose
This document contains discussions on how to represent identifiers (prov:id) and reference identified elements (prov:ref) in PROV-XML
Requirements
- Allow for 'scruffy' provenance
- Support referencing provenance records from external serializations in different formats (PROV-O, PROV-N)
- Play nice with XML tooling
Attribute Type - prov:id and prov:ref
In the schema prov:id and prov:ref are defined as XML attributes.
ID/IDREF
Use type xs:ID for prov:id and xs:IDREF for prov:ref
<xs:attribute name="id" type="xs:ID"/>
<xs:attribute name="ref" type="xs:IDREF"/>
Constraints
- Validity constraint: ID
- Values of type ID must match the Name production. A name must not appear more than once in an XML document as a value of this type; i.e., ID values must uniquely identify the elements which bear them.
- Validity constraint: One ID per Element Type
- No element type may have more than one ID attribute specified.
- Validity constraint: ID Attribute Default
- An ID attribute must have a declared default of #IMPLIED or #REQUIRED.
- Validity constraint: IDREF
- Values of type IDREF must match the Name production, and values of type IDREFS must match Names; each Name must match the value of an ID attribute on some element in the XML document; i.e. IDREF values must match the value of some ID attribute.
Advantages
- ID recognized by XML tools as an identifier type
- There is a uniqueness constraint on prov:id values (scope global to document)
- IDREF recognized by XML tools as a reference to an identified element
- A prov:ref must match the prov:id of some identified element in the document
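The two guarantees above (unique prov:id values, and every prov:ref resolving to some prov:id) are normally enforced by a schema validator. As a rough illustration only, the sketch below emulates both checks with the Python standard library. The function name check_ids_and_refs and the sample document are hypothetical and not part of PROV-XML; a real validator does considerably more.

```python
import xml.etree.ElementTree as ET

PROV_NS = "http://www.w3.org/ns/prov#"
ID_ATTR = f"{{{PROV_NS}}}id"    # ElementTree spelling of prov:id
REF_ATTR = f"{{{PROV_NS}}}ref"  # ElementTree spelling of prov:ref


def check_ids_and_refs(xml_text):
    """Return (duplicate_ids, dangling_refs) for a PROV-XML string.

    Hypothetical helper: it mimics the document-wide uniqueness rule of
    xs:ID and the must-resolve rule of xs:IDREF, nothing else.
    """
    root = ET.fromstring(xml_text)

    seen, duplicates = set(), []
    for elem in root.iter():
        value = elem.get(ID_ATTR)
        if value is None:
            continue
        if value in seen:
            duplicates.append(value)  # same ID used twice in one document
        seen.add(value)

    dangling = []
    for elem in root.iter():
        value = elem.get(REF_ATTR)
        if value is not None and value not in seen:
            dangling.append(value)    # reference with no matching ID
    return duplicates, dangling


# Illustrative input: "e1" is declared twice and "e2" is never declared.
SAMPLE = """\
<prov:document xmlns:prov="http://www.w3.org/ns/prov#">
  <prov:entity prov:id="e1"/>
  <prov:entity prov:id="e1"/>
  <prov:wasDerivedFrom>
    <prov:generatedEntity prov:ref="e2"/>
    <prov:usedEntity prov:ref="e1"/>
  </prov:wasDerivedFrom>
</prov:document>
"""

duplicates, dangling = check_ids_and_refs(SAMPLE)
print("duplicate ids:", duplicates)
print("dangling refs:", dangling)
```

With xs:ID and xs:IDREF both conditions would be reported by the validator itself, so no application-level code of this kind is needed.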
Disadvantages
- The lexical space is the same as that of an unqualified XML name (the xs:NCName datatype)
- ID and IDREF values cannot contain colons or whitespace, and cannot start with a digit
- URIs and qualified names are not valid IDs because both contain colons.
- entity/relation records defined in different bundles in the same document cannot have the same prov:id value
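- An ID attribute must be declared #IMPLIED or #REQUIRED, whereas PROV-DM defines identifiers as optional. Since #IMPLIED declares an optional attribute with no default value, an optional prov:id appears to be compatible with this constraint.
- note: consistent with this, xmllint does not complain if prov:id is left optional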
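To make the lexical restrictions above concrete, here is a minimal Python sketch that tests candidate identifiers against an ASCII-only approximation of the NCName production. The real production also admits many non-ASCII characters, so this is a simplification, and the name is_ascii_ncname is a hypothetical helper introduced for illustration.

```python
import re

# ASCII-only approximation of NCName: a letter or underscore, followed by
# letters, digits, '.', '-' or '_'. Colons and whitespace are never allowed.
# The full XML production also accepts many non-ASCII characters.
_ASCII_NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_ascii_ncname(value):
    """Return True if value fits the simplified NCName pattern above."""
    return _ASCII_NCNAME.fullmatch(value) is not None


CANDIDATES = [
    "e1",                           # plain local name
    "ex:e1",                        # qualified name (contains a colon)
    "http://example.com/ns/ex#e1",  # full URI (contains colons and slashes)
    "1e",                           # starts with a digit
    "my entity",                    # contains whitespace
]

results = {candidate: is_ascii_ncname(candidate) for candidate in CANDIDATES}
for candidate, accepted in results.items():
    print(f"{candidate!r}: {'ok' if accepted else 'rejected'}")
```

Only the first candidate passes, which is why URIs and qualified names cannot be used directly as xs:ID values.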
QName
Use type xs:QName for both prov:id and prov:ref
<xs:attribute name="id" type="xs:QName"/>
<xs:attribute name="ref" type="xs:QName"/>
Constraints
TODO
Advantages
- Closest type to PROV-DM QualifiedName
Disadvantages
- No uniqueness constraint on prov:id
- The value of prov:ref is not required to match any existing prov:id
- Full URIs (e.g. http://example.com/ns/ex#e1) are not valid values of xs:QName; the namespace must be bound to a prefix and the identifier written as prefix:localName (e.g. ex:e1).
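The last point can be illustrated with a short Python sketch. It assumes the PROV-DM convention that a qualified name maps to a URI by concatenating the namespace bound to the prefix with the local part. The function expand_qname, the prefix ex and the NAMESPACES mapping are hypothetical names chosen for the example.

```python
def expand_qname(qname, namespaces):
    """Expand 'prefix:local' into a full URI using a prefix-to-namespace map.

    Hypothetical helper. An unprefixed name is looked up under the empty
    prefix "", which stands in for a default namespace declaration.
    """
    prefix, sep, local = qname.partition(":")
    if not sep:
        # No colon: treat the whole value as the local part.
        prefix, local = "", qname
    if prefix not in namespaces:
        raise ValueError(f"undeclared prefix: {prefix!r}")
    # Assumed mapping: namespace URI followed directly by the local part.
    return namespaces[prefix] + local


# Assumed declaration, equivalent to xmlns:ex="http://example.com/ns/ex#"
NAMESPACES = {"ex": "http://example.com/ns/ex#"}

expanded = expand_qname("ex:e1", NAMESPACES)
print(expanded)
```

The URI itself never appears as the attribute value; only the prefixed form does, and it is meaningful only together with the in-scope namespace declarations.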
anyURI
Use type xs:anyURI for both prov:id and prov:ref
<xs:attribute name="id" type="xs:anyURI"/>
<xs:attribute name="ref" type="xs:anyURI"/>
Constraints
- TODO
Advantages
- Alignment with PROV-O requirement (from RDF) that identifiers be URIs.
- URIs are valid values for prov:id
Disadvantages
- No uniqueness constraint on prov:id
- The value of prov:ref is not required to match any existing prov:id
ID/IDREF, XLink, and XPointer
- Use type xs:ID for prov:id and xs:IDREF for prov:ref
<xs:attribute name="id" type="xs:ID"/>
<xs:attribute name="ref" type="xs:IDREF"/>
- Use XLink's xlink:type="simple" for references between two elements. A simple link specifies only one locator (the target).
- Use XPointer fragment identifier for simpler referencing of local and remote IDs.
- Note that an XLink can link to an entire document, while the use of an XPointer can link to a specific element (identified by an ID) of a document.
Example
- Example use of ID/IDREF along with XPointers to reference entities across prov-xml documents.
http://www.example.com/trace1.provx:
<prov:document xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xmlns:xsd="http://www.w3.org/2001/XMLSchema"
               xmlns:xlink="http://www.w3.org/1999/xlink"
               xmlns:prov="http://www.w3.org/ns/prov#">
  <prov:entity prov:id="e1"/>
</prov:document>
http://www.example.com/trace2.provx:
<prov:document xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xmlns:xsd="http://www.w3.org/2001/XMLSchema"
               xmlns:xlink="http://www.w3.org/1999/xlink"
               xmlns:prov="http://www.w3.org/ns/prov#">
  <prov:entity prov:id="e2"/>
</prov:document>
http://www.example.com/trace3.provx:
<prov:document xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xmlns:xsd="http://www.w3.org/2001/XMLSchema"
               xmlns:xlink="http://www.w3.org/1999/xlink"
               xmlns:prov="http://www.w3.org/ns/prov#">
  <prov:wasDerivedFrom xsi:type="prov:Derivation">
    <prov:generatedEntity xlink:type="simple" xlink:href="http://www.example.com/trace2.provx#e2"/>
    <prov:usedEntity xlink:type="simple" xlink:href="http://www.example.com/trace1.provx#e1"/>
  </prov:wasDerivedFrom>
  <prov:wasDerivedFrom>
    <prov:generatedEntity xlink:type="simple" xlink:href="http://www.example.com/trace2.provx#e2"/>
    <prov:usedEntity xlink:type="simple" xlink:href="http://www.example.com/trace1.provx#e1"/>
    <prov:type xsi:type="xsd:string">physical transform</prov:type>
  </prov:wasDerivedFrom>
</prov:document>
Constraints
- Dependent on the ID/IDREF approach for identifiers.
- Use of XLink and XPointer assumes that each prov-xml document is URL-accessible.
Advantages
- Leverages existing W3C Recommendations on ID/IDREF, XLink, and XPointer.
- Use of xlink:type="simple" maintains simplicity.
- Leverages ID/IDREF for internal references in the same prov-xml document.
- Leverages XLink and XPointer when references need to span across multiple prov-xml documents.
- Use of XPointer fragment identifiers can allow for more complex references via XPath expressions if needed.
Disadvantages
- Needs more community implementations of XLinks.
- Use of XLink and XPointer assumes that each prov-xml document is URL-accessible.
- Shares the disadvantages of the ID/IDREF approach, e.g. IDs cannot contain colons.
Analysis
- Use of an XPointer xpath fragment allows for more complex references.
trace.provx#xpointer(xpath)
- But use of the simpler XPointer fragment identifier simplifies usage and adoption.
trace.provx#xpointer(id("entity1"))
can be expressed simply as:
trace.provx#entity1
This allows unique references to IDs within a prov-xml document via IDREF and, at the same time, across multiple prov-xml documents via XPointer fragment identifiers.
- XLink 1.1's XLink Attribute Usage Patterns indicate that the syntax can be simpler still: xlink:type="simple" is optional when xlink:href is present. So the prov:generatedEntity link could be reduced to:
<prov:generatedEntity xlink:href="http://www.example.com/trace2.provx#e2"/>
- The use of IDREF for local references could be replaced uniformly with XLinks to local IDs:
<prov:generatedEntity xlink:href="#e2"/>
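The referencing scheme discussed above, an xlink:href whose fragment is a bare ID, could be resolved by a consumer roughly as follows. This is only a sketch under stated assumptions: the in-memory DOCUMENTS dictionary stands in for fetching URL-accessible documents, prov:id is treated as the ID-typed attribute without consulting a schema, and resolve_href is a hypothetical function name.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag, urljoin

PROV_ID = "{http://www.w3.org/ns/prov#}id"  # ElementTree spelling of prov:id

# Stand-in for retrieval over HTTP: document URL -> serialized PROV-XML.
DOCUMENTS = {
    "http://www.example.com/trace1.provx": (
        '<prov:document xmlns:prov="http://www.w3.org/ns/prov#">'
        '<prov:entity prov:id="e1"/>'
        '</prov:document>'
    ),
    "http://www.example.com/trace2.provx": (
        '<prov:document xmlns:prov="http://www.w3.org/ns/prov#">'
        '<prov:entity prov:id="e2"/>'
        '</prov:document>'
    ),
}


def resolve_href(href, base_url):
    """Return the element an xlink:href points at, or None if not found.

    href may be absolute ("http://.../trace1.provx#e1") or local ("#e2").
    Without a fragment, the document's root element is returned.
    """
    absolute = urljoin(base_url, href)        # "#e2" resolves against the base
    doc_url, fragment = urldefrag(absolute)   # split the URL from its fragment
    root = ET.fromstring(DOCUMENTS[doc_url])
    if not fragment:
        return root                           # link to the entire document
    for elem in root.iter():
        if elem.get(PROV_ID) == fragment:     # bare-name fragment matches an ID
            return elem
    return None


remote = resolve_href("http://www.example.com/trace1.provx#e1",
                      base_url="http://www.example.com/trace3.provx")
local = resolve_href("#e2", base_url="http://www.example.com/trace2.provx")
print(remote.tag, remote.get(PROV_ID))
print(local.tag, local.get(PROV_ID))
```

Handling local and remote references through the same code path is what makes the uniform use of XLinks for both cases attractive.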
Analysis
TODO
References
- http://www.en8848.com.cn/reilly%20books/xml/schema/ch09_01.htm
- http://www.w3.org/TR/xmlschema-2/
- http://www.w3.org/TR/2000/WD-xml-2e-20000814
- http://www.w3.org/TR/prov-dm/
- Using Qualified Names (QNames) as Identifiers in Content
- Namespaces in XML 1.0
- Namespaces in XML Errata: clarifies the meaning of colons in other contexts.
- XML Linking Language (XLink) Version 1.1
- XML Pointer Language (XPointer)
