ProvXMLNamespaces

From Provenance WG Wiki
Revision as of 11:45, 3 December 2012 by Ssoiland


How to handle namespaces with multiple XML schemas

This page summarises different strategies for handling namespaces across multiple XML schemas, and which hooks are needed to make the core schema extensible.


Making the core.xsd support extensions

There are different ways to make the core PROV XSD support extensibility. Open questions are:

  • Should a document using a PROV extension still be valid against the core XSD?
  • Should a document using a third-party extension still be valid against the core XSD?


Using <xs:any /> for ##any NS

prov.xsd can include <xs:any /> in strategic places, such as at the end of <dependencies> and <records>, allowing extension elements from any namespace.

  <xs:element name="records" type="prov:Records"/>
  <xs:complexType name="Records">
    <xs:sequence>
      <xs:element ref="prov:account"  minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="prov:activity" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="prov:entity"   minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="prov:agent"    minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="prov:note"     minOccurs="0" maxOccurs="unbounded"/>
      <xs:element name="dependencies" type="prov:Dependencies" minOccurs="1"/>
      <xs:any minOccurs="0" maxOccurs="unbounded" namespace="##any" processContents="lax" />
    </xs:sequence>
    <xs:attribute ref="prov:id"/>
  </xs:complexType>

Note that this is different from the 'provide any attributes' kind of <xs:any> which elements like <prov:entity> have; those are explicitly in namespace "##other" and don't have any PROV semantics.


Advantages:

  • Documents using extensions are still valid according to prov.xsd
  • No schema required for extensions (unless we set processContents="strict")
  • Extensions appear at predictable locations in the document

Disadvantages:

  • Any (and often nonsensical) <prov:elements> can also be inserted at the wrong place, for instance <prov:dependencies> <prov:entity> </>
  • The element before the wildcard needs to be bounded and non-optional. Above, <prov:dependencies> is now required, because parsers would otherwise not know whether a <prov:dependencies> element belongs to the dependencies declaration or to the xs:any block (the Unique Particle Attribution constraint).

One workaround for both would be to wrap the <xs:any> in a <prov:extension> element; then even if someone chooses to put <prov:entity> within <prov:extension>, it is more clearly outside the core schema. This can also make it easier for non-schema-aware parsers to deal with unknown elements.
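
A minimal sketch of the wrapper workaround, assuming the Records type shown earlier. The Extension type name and the exact placement are illustrative assumptions, not part of any agreed schema. Because the wildcard now sits alone inside its own element, <dependencies> can go back to being optional.

  <!-- Hypothetical wrapper: the only place where arbitrary content is allowed -->
  <xs:element name="extension" type="prov:Extension"/>
  <xs:complexType name="Extension">
    <xs:sequence>
      <xs:any minOccurs="0" maxOccurs="unbounded" namespace="##any" processContents="lax"/>
    </xs:sequence>
  </xs:complexType>

  <xs:complexType name="Records">
    <xs:sequence>
      <xs:element ref="prov:account"  minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="prov:activity" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="prov:entity"   minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="prov:agent"    minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="prov:note"     minOccurs="0" maxOccurs="unbounded"/>
      <!-- No wildcard follows directly, so this no longer has to be required -->
      <xs:element name="dependencies" type="prov:Dependencies" minOccurs="0"/>
      <xs:element ref="prov:extension" minOccurs="0"/>
    </xs:sequence>
    <xs:attribute ref="prov:id"/>
  </xs:complexType>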


Using <xs:any /> for ##other NS

As above, but this time using namespace="##other". This means that <prov:blah> elements are not allowed.
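
As a sketch, only the wildcard line of the Records type from the previous section changes:

  <xs:element name="dependencies" type="prov:Dependencies" minOccurs="1"/>
  <xs:any minOccurs="0" maxOccurs="unbounded" namespace="##other" processContents="lax"/>

An instance fragment could then look like the following. The ex prefix, its namespace URI and the ex:annotation element are hypothetical, and http://www.w3.org/ns/prov# is assumed as the PROV namespace.

  <prov:records xmlns:prov="http://www.w3.org/ns/prov#"
                xmlns:ex="http://example.org/ext#">
    <prov:dependencies><!-- core dependencies --></prov:dependencies>
    <!-- Accepted by the wildcard: the element is in another namespace -->
    <ex:annotation>free-form extension content</ex:annotation>
    <!-- A <prov:blah/> element here would not match the wildcard -->
  </prov:records>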


Advantages:

  • Documents using extensions (in other namespaces) are still valid according to prov.xsd
  • No schema required for extensions (unless we set processContents="strict")
  • Extensions appear at predictable locations in the document


Disadvantages:

  • PROV extensions need separate namespaces or <xsd:redefine>ing (see below)
  • The element before the wildcard needs to be bounded and non-optional (or the any needs to be wrapped in <prov:extension>)


Using substitution groups and abstract elements

Rather than <xs:any>, strategic places contain an optional reference to an element that is declared abstract. Extensions can implement this by declaring their own elements with substitutionGroup="prov:extraRecord".

Note that unless we block this in the schema, this is allowed already for any top-level <xs:element>.

  <xs:element name="extraRecord" abstract="true" />
  <xs:element name="records" type="prov:Records"/>
  <xs:complexType name="Records">
    <xs:sequence>
      <xs:element ref="prov:account"  minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="prov:activity" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="prov:entity"   minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="prov:agent"    minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="prov:note"     minOccurs="0" maxOccurs="unbounded"/>
      <xs:element name="dependencies" type="prov:Dependencies" minOccurs="0"/>
      <xs:element ref="prov:extraRecord" minOccurs="0" maxOccurs="unbounded"/>      
    </xs:sequence>
    <xs:attribute ref="prov:id"/>
  </xs:complexType>
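
A minimal sketch of what an extension schema could look like under this approach. The ex namespace, the file names prov.xsd and ext.xsd, and the review element are hypothetical; http://www.w3.org/ns/prov# is assumed as the PROV namespace.

  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
             xmlns:prov="http://www.w3.org/ns/prov#"
             xmlns:ex="http://example.org/ext#"
             targetNamespace="http://example.org/ext#"
             elementFormDefault="qualified">
    <!-- The core schema must be loadable so the head element can be resolved -->
    <xs:import namespace="http://www.w3.org/ns/prov#" schemaLocation="prov.xsd"/>
    <!-- ex:review may appear wherever prov:extraRecord is referenced -->
    <xs:element name="review" substitutionGroup="prov:extraRecord">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="comment" type="xs:string"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>
  </xs:schema>

An instance document would then point a validator at both schemas, for example:

  <prov:records xmlns:prov="http://www.w3.org/ns/prov#"
                xmlns:ex="http://example.org/ext#"
                xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                xsi:schemaLocation="http://www.w3.org/ns/prov# prov.xsd
                                    http://example.org/ext# ext.xsd">
    <ex:review>
      <ex:comment>Checked by a second curator</ex:comment>
    </ex:review>
  </prov:records>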

Advantages:

  • Documents by default have to comply with the PROV XSD strictly
  • Extensions appear at predictable locations in the document
  • No need for wrapping element or non-optional element before

Disadvantages:

  • Extensions must be defined in a resolvable (or loaded) schema
  • Documents using extensions need to specify a resolvable xsi:schemaLocation to be valid.
  • Substitution groups are generally less understood than the other techniques


Extending complex types

The extension schema defines subtypes of any of our <xs:complexType>s. In the instance document, an element of an extension type is specified using xsi:type.

Note that unless we block this in the schema, this is allowed already for any top-level <xs:complexType>.
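
A sketch of such a subtype, assuming a core Records type without a trailing wildcard (extending a type that ends in an unbounded <xs:any> would likely run into the same ambiguity problem as described for wildcards). The ex namespace, the file name prov.xsd, the ReviewedRecords type and the review element are hypothetical; http://www.w3.org/ns/prov# is assumed as the PROV namespace.

  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
             xmlns:prov="http://www.w3.org/ns/prov#"
             xmlns:ex="http://example.org/ext#"
             targetNamespace="http://example.org/ext#"
             elementFormDefault="qualified">
    <xs:import namespace="http://www.w3.org/ns/prov#" schemaLocation="prov.xsd"/>
    <xs:complexType name="ReviewedRecords">
      <xs:complexContent>
        <xs:extension base="prov:Records">
          <!-- Extension content is appended after the base type's sequence -->
          <xs:sequence>
            <xs:element name="review" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
          </xs:sequence>
        </xs:extension>
      </xs:complexContent>
    </xs:complexType>
  </xs:schema>

In the instance document, the element keeps its core name and announces the subtype:

  <prov:records xmlns:prov="http://www.w3.org/ns/prov#"
                xmlns:ex="http://example.org/ext#"
                xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                xsi:type="ex:ReviewedRecords">
    <!-- core PROV records go here -->
    <ex:review>Checked by a second curator</ex:review>
  </prov:records>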


Advantages:

  • Anything can be extended, no specific hooks need to be placed in core schema (good or bad)
  • Extensions can only add elements at the end of sequences (check!)

Disadvantages:

  • Extensions can appear at any place in the document
  • Extensions must be defined in a resolvable (or loaded) schema
  • Documents using extensions need to specify a resolvable xsi:schemaLocation to be valid.
  • Documents using extensions must use xsi:type to specify the subtype
  • Multiple extensions might need to be arranged in a hierarchy (for instance if both want to subclass Records to add their custom elements).


Our extensions and the PROV namespace

One question is what you get when you resolve the namespace for content type application/xml (note: XSD does not have its own media type): do you get just the core schema, or a 'mega-schema' that includes/imports all extensions in our namespace? We should preferably try to do the second, with the 'core' schema resolvable separately.

The question then is how this is achieved, as this affects how our extensions are defined.
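
One possible arrangement, sketched under the assumption that the extensions share the PROV namespace and live in separate files, is a thin 'mega-schema' that only pulls the parts together. The file names are hypothetical, and http://www.w3.org/ns/prov# is assumed as the PROV namespace.

  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
             targetNamespace="http://www.w3.org/ns/prov#"
             elementFormDefault="qualified">
    <!-- Same target namespace, hence xs:include rather than xs:import -->
    <xs:include schemaLocation="prov-core.xsd"/>
    <xs:include schemaLocation="prov-extension-a.xsd"/>
    <xs:include schemaLocation="prov-extension-b.xsd"/>
  </xs:schema>

Extensions in a different namespace would need <xs:import> with their own namespace URI instead, and the core schema stays usable on its own as prov-core.xsd.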

References: