ProvXMLNamespaces

From Provenance WG Wiki
Revision as of 16:16, 6 December 2012 by Ssoiland

How to handle namespaces with multiple XML schemas

This page summarises different strategies for handling namespaces across multiple XML schemas, and which hooks are needed to make the core schema extensible.


Making the core.xsd support extensions

There are different ways to make the core PROV XSD support extensibility. Open questions are:

  • Should a document using a PROV extension still be valid against the core XSD?
  • Should a document using a third-party extension still be valid against the core XSD?


Using <xs:any /> for ##any NS

prov.xsd can include <xs:any />, allowing any extension with any namespace, in strategic places, such as at the end of <dependencies> and <records>.

  <xs:element name="records" type="prov:Records"/>
  <xs:complexType name="Records">
    <xs:sequence>
      <xs:element ref="prov:account"  minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="prov:activity" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="prov:entity"   minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="prov:agent"    minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="prov:note"     minOccurs="0" maxOccurs="unbounded"/>
      <xs:element name="dependencies" type="prov:Dependencies" minOccurs="1"/>
      <xs:any minOccurs="0" maxOccurs="unbounded" namespace="##any" processContents="lax" />
    </xs:sequence>
    <xs:attribute ref="prov:id"/>
  </xs:complexType>

Note that this is different from the 'allow any attributes' kind of <xs:any> which elements like <prov:entity> have - those are explicitly in namespace "##other" and don't have any PROV semantics.


Advantages:

  • Documents using extensions are still valid according to prov.xsd
  • No schema required for extensions (unless we set processContents="strict")
  • Extensions appear at predictable locations in the document

Disadvantages:

  • Any (and often nonsensical) <prov:elements> can also be inserted at the wrong place, for instance <prov:dependencies> <prov:entity> </>
  • The element before the wildcard needs to be bounded and non-optional. In the example above <prov:dependencies> is now required, because the parser would otherwise not know whether a <prov:dependencies> element matched the dependencies declaration or the xs:any block (the Unique Particle Attribution constraint).

One workaround for both would be to wrap the <xs:any> in a <prov:extension> element; thus even if someone chooses to put <prov:entity> within the <prov:extension>, that is more clearly outside the core schema. This can also make it easier for non-schema parsers to deal with unknown elements.
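
The wrapper workaround could look roughly like the fragment below. This is only a sketch meant to slot into prov.xsd: the names "extension" and "Extension" are hypothetical, and the other record elements are elided. Because the wildcard is the only particle inside the wrapper type, it cannot compete with a sibling element declaration, so <dependencies> can go back to being optional.

```xml
<!-- Hypothetical wrapper type: the wildcard is the only particle inside it. -->
<xs:complexType name="Extension">
  <xs:sequence>
    <xs:any minOccurs="0" maxOccurs="unbounded" namespace="##any" processContents="lax"/>
  </xs:sequence>
</xs:complexType>

<xs:complexType name="Records">
  <xs:sequence>
    <xs:element ref="prov:account" minOccurs="0" maxOccurs="unbounded"/>
    <!-- ... activity, entity, agent and note as in the listing above ... -->
    <xs:element name="dependencies" type="prov:Dependencies" minOccurs="0"/>
    <!-- Everything unknown lives inside this single, optional wrapper element. -->
    <xs:element name="extension" type="prov:Extension" minOccurs="0"/>
  </xs:sequence>
  <xs:attribute ref="prov:id"/>
</xs:complexType>
```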


Using <xs:any /> for ##other NS

As above, but this time using namespace="##other". This means that <prov:blah> elements are not allowed.
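
As a sketch, the only change to the wildcard from the previous listing is the namespace attribute. The instance fragment underneath it is illustrative only: the ex: namespace URI and the <ex:annotation> element are made up, and it assumes an empty <prov:dependencies> is acceptable to the Dependencies type.

```xml
<xs:any minOccurs="0" maxOccurs="unbounded" namespace="##other" processContents="lax"/>
```

```xml
<prov:records xmlns:prov="http://www.w3.org/ns/prov#"
              xmlns:ex="http://example.org/ext#">
  <prov:dependencies/>
  <!-- Accepted by the wildcard: ex: is not the schema's target namespace. -->
  <ex:annotation>free-form content</ex:annotation>
  <!-- Would be rejected here, because it is in the PROV namespace: -->
  <!-- <prov:blah/> -->
</prov:records>
```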


Advantages:

  • Documents using extensions (in other NSes) are still valid according to prov.xsd
  • No schema required for extensions (unless we set processContents="strict")
  • Extensions appear at predictable locations in the document


Disadvantages:

  • PROV extensions need separate namespaces or <xsd:redefine>ing (see below)
  • The element before the wildcard needs to be bounded and non-optional (or the xs:any needs to be wrapped in <prov:extension>)


Substitution groups for existing elements

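An extension schema can replace any top-level <xs:element> by declaring a new element with the substitutionGroup attribute set to the element it replaces (for example substitutionGroup="prov:entity"). The new element (of the same type or an extension type) can then appear wherever the original is allowed.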

Note that unless we block this in the schema, this is allowed already for any top-level <xs:element> (entity, activity, agent, note, label(!), role, type(!), account, container, records).
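
A minimal sketch of such an extension schema follows. The ex: namespace, the file name prov.xsd as import location and the element name "dataset" are all assumptions made for illustration. No type is given on the new element, so it takes the type of the element it substitutes for.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:prov="http://www.w3.org/ns/prov#"
           xmlns:ex="http://example.org/ext#"
           targetNamespace="http://example.org/ext#"
           elementFormDefault="qualified">

  <xs:import namespace="http://www.w3.org/ns/prov#" schemaLocation="prov.xsd"/>

  <!-- ex:dataset may now appear wherever prov:entity is referenced. -->
  <xs:element name="dataset" substitutionGroup="prov:entity"/>
</xs:schema>
```

An instance document would then contain <ex:dataset> where a reader expects <prov:entity>, which is exactly the readability concern raised in the disadvantages below.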

Advantages:

  • Documents by default have to comply with the PROV XSD strictly
  • No need for wrapping element or non-optional element before

Disadvantages:

  • Extensions can appear at any place in the document
  • Extensions must be defined in a resolvable (or loaded) schema
  • Documents using extensions need to specify a resolvable xsi:schemaLocation to be valid.
  • Extensions would generally also need to extend a type to add their new sub-elements.
  • Substitution groups are generally less understood than the other techniques
  • Documents using extensions would no longer be (easily) understandable as PROV-XML, as they would use different elements for say <prov:records> and <prov:container>


Substitution groups and abstract elements

Rather than <xs:any>, strategic places reference an optional element that is declared abstract. Extensions can implement it by declaring an element with substitutionGroup="prov:extraRecord".

Note that unless we block this in the schema, this is allowed already for any top-level <xs:element>.

  <xs:element name="extraRecord" abstract="true" />
  <xs:element name="records" type="prov:Records"/>
  <xs:complexType name="Records">
    <xs:sequence>
      <xs:element ref="prov:account"  minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="prov:activity" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="prov:entity"   minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="prov:agent"    minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="prov:note"     minOccurs="0" maxOccurs="unbounded"/>
      <xs:element name="dependencies" type="prov:Dependencies" minOccurs="0"/>
      <xs:element ref="prov:extraRecord" minOccurs="0" maxOccurs="unbounded"/>      
    </xs:sequence>
    <xs:attribute ref="prov:id"/>
  </xs:complexType>
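
A third-party extension could hook into the extraRecord element from the listing above roughly as follows. This is a sketch under assumptions: the ex: namespace, the import location prov.xsd, and the "bundle" element with its content model are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:prov="http://www.w3.org/ns/prov#"
           xmlns:ex="http://example.org/ext#"
           targetNamespace="http://example.org/ext#"
           elementFormDefault="qualified">

  <xs:import namespace="http://www.w3.org/ns/prov#" schemaLocation="prov.xsd"/>

  <!-- ex:bundle may appear wherever prov:extraRecord is referenced,
       that is, at the end of prov:records. -->
  <xs:element name="bundle" type="ex:Bundle" substitutionGroup="prov:extraRecord"/>

  <xs:complexType name="Bundle">
    <xs:sequence>
      <xs:element name="member" type="xs:anyURI" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute ref="prov:id"/>
  </xs:complexType>
</xs:schema>
```

Since the abstract element is declared without a type, any type is acceptable for the substituting element, and the core Records type does not have to be touched.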

Advantages:

  • Documents by default have to comply with the PROV XSD strictly
  • Extensions appear at predictable locations in the document
  • No need for wrapping element or non-optional element before

Disadvantages:

  • Extensions must be defined in a resolvable (or loaded) schema
  • Documents using extensions need to specify a resolvable xsi:schemaLocation to be valid.
  • Substitution groups are generally less understood than the other techniques


Extending existing complex types

The extension schema defines subtypes of any of our <xs:complexType> definitions. In the instance document, an element of an extension type is indicated using xsi:type.

Note that unless we block this in the schema, this is allowed already for any top-level <xs:complexType>.
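
The sketch below shows what this could look like. It assumes the base prov:Records type does not end in a wildcard, and the type name "ExtendedRecords", the "collection" element and the ex: namespace are hypothetical. The first fragment belongs in an extension schema with target namespace http://example.org/ext# and elementFormDefault="qualified"; the second is an instance document.

```xml
<xs:complexType name="ExtendedRecords">
  <xs:complexContent>
    <xs:extension base="prov:Records">
      <!-- New content is appended after everything the base type allows. -->
      <xs:sequence>
        <xs:element name="collection" type="xs:anyURI" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>
```

```xml
<prov:records xmlns:prov="http://www.w3.org/ns/prov#"
              xmlns:ex="http://example.org/ext#"
              xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
              xsi:schemaLocation="http://example.org/ext# ext.xsd"
              xsi:type="ex:ExtendedRecords">
  <ex:collection>http://example.org/collections/c1</ex:collection>
</prov:records>
```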


Advantages:

  • Anything can be extended, no specific hooks need to be placed in core schema (good or bad)
  • Extensions can only add elements at the end of sequences (check!)

Disadvantages:

  • Extensions can appear at any place in the document
  • Extensions must be defined in a resolvable (or loaded) schema
  • Documents using extensions need to specify a resolvable xsi:schemaLocation to be valid.
  • Documents using extensions must use xsi:type to specify the subtype
  • Multiple extensions might need to be arranged in a hierarchy (for instance if both want to subclass Records to add their custom elements).


Analysis

Stian has analysed the above, and tried out various methods by making test-schemas in Eclipse, which has a good validator and editor support for XML schemas.

<xs:any /> for ##any

This approach is tricky to deal with in extension schemas, because you can't get rid of the <xs:any>, and an extension of say <prov:records> would only be able to add new elements *after* the <xs:any>. Due to the Unique Particle Attribution constraint mentioned above, this can get difficult with optional elements, so one might be forced to insert a separator before the new specific element, say <prov:collection>.

So this approach is great for allowing any unvalidated extension, but becomes quite tricky to work around in a genuine XSD extension, like the ones we want to make for PROV Collections and PROV Linking.


<xs:any /> for ##other

As this only allows ##other, our prov: extensions would not suffer from the Unique Particle Attribution problem, with the downside that such documents are not valid without understanding the prov: extension.

Stian's view is that we should not allow people to add random prov: elements without being specific about the schema, so this is good.

However, the problem described for ##any still applies to any third-party developer.


Substitution groups for existing elements

This does not seem advisable, as documents could easily not be recognisable as PROV at all without deep understanding of the extending schema. It would break XPath expressions and the like, because documents with extensions would use <ex:different> rather than say <prov:entity>.

Stian would however not block this in the schema, as there could be use cases where people would want to pick and choose things from our schema for embedding within a different schema.


Substitution groups and abstract elements

This is in Stian's view the cleanest solution, as extension points are made obvious by clear abstract elements (of which PROV has no instances). One difficulty is that substitution groups are seldom used by XSD developers, and so this might be somewhat unknown territory. Clear examples (such as the simple extensions PROV itself would make) would help.

Without the <xs:any>, documents using such extensions would only be valid if they specify an xsi:schemaLocation (as that schema would, for instance, define <ex:whatever> as a member of the substitution group for <prov:extraRecord>). Stian thinks this is a good thing, because then the schema can (hopefully) be retrieved and understood by others.
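
For illustration, an instance document of this kind might look like the sketch below. The ex: namespace and both schema locations are hypothetical, and <ex:whatever> is assumed to be declared in ext.xsd with substitutionGroup="prov:extraRecord".

```xml
<prov:records xmlns:prov="http://www.w3.org/ns/prov#"
              xmlns:ex="http://example.org/ext#"
              xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
              xsi:schemaLocation="http://www.w3.org/ns/prov# prov.xsd
                                  http://example.org/ext# http://example.org/ext.xsd">
  <!-- Core PROV records would come first; the extension record comes last. -->
  <ex:whatever/>
</prov:records>
```

A validator that only loads the core schema would reject <ex:whatever>, since nothing tells it that the element substitutes for <prov:extraRecord>.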

We can insert these at predictable places (i.e. always at the end, after the prov: statements), so that applications that don't care about them can ignore them.

The prov: extensions only need to import and implement the substitution group, with no <xsd:redefine> needed, and so it is very easy to combine and pick and choose different extensions.



Extending existing complex types

The arguments in favour of this solution are largely the same as above. Documents using extensions would indicate the extension with xsi:type, and any extensions would just come below the current prov: statements. Those extensions might still run into problems with <xs:any> as above, as most of the inner types finish with an <xs:any>.

The argument against this is that it does not make it clear what could be extended (unless we use lots of xsd:final on non-extensible types); it would seem like 'well, anything!', unless we define a similar <extraRecord> etc. with an abstract complex type.

There are two different goals though - do we want extensions to be able to add different kinds of records etc., or also to add things in more specific types, like <prov:wasGeneratedBy>? For our PROV extensions it seems we only need the first, but third-party extensions might want to restrict or specify which attributes they require to be provided.

This can be difficult to use for the prov: extensions as several extensions would subclass or redefine the same type, and could not easily be combined.


Summary

So Stian's recommendation is to go for "Substitution groups and abstract elements", but allow extension by complex types for more specific cases. This makes our joint-namespace issue below much easier as opposed to with

Our extensions and the PROV namespace

The question is what you get when you resolve the namespace for content type application/xml (note: XSD does not have its own media type) - do you get just the core schema, or a 'mega-schema' that includes/imports all extensions in our namespace? We should preferably do the second, with the 'core' schema resolvable separately.

If we go for the "Substitution groups and abstract elements" approach, then this is quite straightforward.

The core schema (say core.xsd) defines the PROV elements and a couple of abstract, optional elements like <prov:extraRecord>. Each extension is defined in the same namespace and pulls in the core with <xsd:include schemaLocation="core.xsd"/>. Individual extensions can each provide such implementations, and as long as they don't need to redefine any of the types to inject anything, they can also easily be combined.
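
One such extension module might look like the following sketch. The file name collections.xsd, the "collection" element and its content model are assumptions for illustration, and core.xsd is assumed to declare the abstract extraRecord element and the prov:id attribute.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- collections.xsd (hypothetical): same target namespace as core.xsd -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:prov="http://www.w3.org/ns/prov#"
           targetNamespace="http://www.w3.org/ns/prov#"
           elementFormDefault="qualified">

  <!-- Same namespace, so include rather than import. -->
  <xs:include schemaLocation="core.xsd"/>

  <xs:element name="collection" type="prov:Collection"
              substitutionGroup="prov:extraRecord"/>

  <xs:complexType name="Collection">
    <xs:sequence>
      <xs:element name="member" type="xs:anyURI" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute ref="prov:id"/>
  </xs:complexType>
</xs:schema>
```

Nothing in core.xsd is redefined here, which is what lets several modules of this shape be loaded side by side.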

Thus if you resolve our namespace for "application/xml" you can be redirected to prov.xsd, which is in our namespace and simply contains <xsd:include schemaLocation="core.xsd"/> plus one include for each of our standard extensions in the same namespace. So no manual merging of files would be needed, and this document can serve as the xsi:schemaLocation if you want to allow any of our official extensions.
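
Such an umbrella prov.xsd could be as small as the sketch below. The extension file names collections.xsd and linking.xsd are hypothetical stand-ins for the PROV Collections and PROV Linking modules.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:prov="http://www.w3.org/ns/prov#"
           targetNamespace="http://www.w3.org/ns/prov#"
           elementFormDefault="qualified">

  <!-- The core schema stays resolvable on its own. -->
  <xs:include schemaLocation="core.xsd"/>

  <!-- One include per standard extension in the PROV namespace. -->
  <xs:include schemaLocation="collections.xsd"/>
  <xs:include schemaLocation="linking.xsd"/>
</xs:schema>
```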

Anyone who wants to make a different combination could make their own schema in our namespace, and import the sub-modules, and provide that as their schema location.


