Re: Multiple XML schema files for a common target namespace (PROV-ISSUE-608)

Here is a summary of the possible changes based on this discussion. I am in favor of all three.

1) rename prov:abstractElement to prov:internalElement (or similar) to make it clear we do not expect non-PROV extensions to use this element.

2) add processContents="lax" on all xs:any elements.

3) change the definition of prov:Bundle to the following (the bundleElements name is not final):

  <xs:complexType name="Bundle">
    <xs:complexContent>
      <xs:extension base="prov:Entity">
        <xs:sequence>
          <xs:element name="bundleElements" minOccurs="0">
            <xs:complexType>
              <xs:sequence maxOccurs="unbounded">
                <xs:group ref="prov:documentElements"/>
                <xs:any namespace="##other" processContents="lax" minOccurs="0" maxOccurs="unbounded"/>
              </xs:sequence>
            </xs:complexType>
          </xs:element>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
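
For changes 1) and 2), the edits might look roughly like the sketch below. The surrounding declarations are assumed for illustration rather than copied from the current schema, and internalElement is only the placeholder name from item 1.

```xml
<!-- Sketch only: assumed shape of the declarations, not the actual schema text. -->

<!-- 1) abstract head element, renamed to discourage use outside the PROV namespace
        (internalElement is a placeholder name) -->
<xs:element name="internalElement" abstract="true"/>

<!-- 2) each extension wildcard gets processContents="lax": matching elements are
        validated when a declaration for them is available and accepted as-is
        otherwise (e.g. FOAF, which has no XSD) -->
<xs:any namespace="##other" processContents="lax" minOccurs="0" maxOccurs="unbounded"/>
```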

With the updated Bundle complexType, the PROV-XML serialization for a bundle would look like this:

<prov:document
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xmlns:xsd="http://www.w3.org/2001/XMLSchema"
	xmlns:ex="http://example.com/ns/ex#"
	xmlns:prov="http://www.w3.org/ns/prov#">

	<prov:person prov:id="bob"/>

	<ex:label>outside-bundle-label</ex:label>

	<prov:activity prov:id="a1"/>

	<prov:bundle prov:id="bundle1">

		<prov:label>bundle1</prov:label>
		<ex:label>label-on-bundle-entity</ex:label>

		<prov:bundleElements>
			
			<ex:label>in-bundle-label</ex:label>
			
			<prov:entity prov:id="ex:report1">
				<prov:type xsi:type="xsd:QName">report</prov:type>
				<ex:version>1</ex:version>
			</prov:entity>

			<ex:version>1.0.0</ex:version>

			<prov:wasGeneratedBy>
				<prov:entity prov:ref="ex:report1"/>
				<prov:activity prov:ref="a1"/>
				<prov:time>2012-05-24T10:00:01</prov:time>
			</prov:wasGeneratedBy>
			
			<ex:content>foo</ex:content>
			
		</prov:bundleElements>

	</prov:bundle>

</prov:document>

I used elements from the "ex" namespace to show how non-PROV elements can appear both within the bundle content (inside prov:bundleElements) and as attributes on the bundle entity itself.
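
With processContents="lax", the "ex" elements would be validated only if a schema for that namespace is available to the validator; otherwise they are accepted as-is. A minimal sketch of such a schema follows. It is purely hypothetical, and the xs:string types are assumptions made up for this example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical schema for the "ex" namespace used in the example above.
     The xs:string types are assumptions; nothing in PROV-XML defines them. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.com/ns/ex#"
           elementFormDefault="qualified">
  <xs:element name="label" type="xs:string"/>
  <xs:element name="version" type="xs:string"/>
  <xs:element name="content" type="xs:string"/>
</xs:schema>
```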

--Stephan

On Feb 12, 2013, at 12:49 PM, Stephan Zednik <zednis@rpi.edu> wrote:

> Comments in-line, last two comments are the most important.
> 
> 
> On Feb 12, 2013, at 7:29 AM, Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk> wrote:
> 
>> On Tue, Feb 5, 2013 at 7:29 PM, Stephan Zednik <zednis@rpi.edu> wrote:
>>> This does not follow the pattern Stian suggested of updating Document so
>>> that bundles are required at the bottom of the document.
>>> 
>>> Stian, does this make sense?  Do you still prefer the other pattern you
>>> suggested in the earlier email?
>> 
>> Well to me it does not really matter if xs:any can appear anywhere in
>> <document> or just at the bottom of the <document> - but I think your
>> current solution means that you are allowed to put anything anywhere
>> in <document>, but in <bundle> you can only put the extensions after
>> <prov:value> but before the documentElements, which is a bit odd.
>> 
>> It might be 'cleaner' to only allow extension stuff at the bottom, but
>> that could make it tricky for the bundle as it (now) specializes the
>> prov:Entity type and therefore the additional elements of Bundle come
>> below the <xs:any> from entity.
>> 
> 
> Yes, originally this worked because we had multiple xs:any in the prov:Bundle (inherited from both prov:Entity and prov:documentElements), but we violated the "unique particle attribution" rule, which caused xjc to fail to generate Java classes from the schema.
> 
> We changed the schema to work well with xjc but in doing so introduced the odd restriction you have noted.  I am still playing around with it to try to come up with a solution.
> 
>> 
>> 
>> 
>>> Also, I think that we put the abstract element after the choice in document
>>> Elements because it caused problems with schema validation, but I can double
>>> check on that and see if it can be included in the choice.
>> 
>> I know, those things can get tricky.. it's another problem with XSD
>> and its particle separation.
>> 
>> 
>> I tried some example of making an extension:
>> 
>> <https://dvcs.w3.org/hg/prov/file/0bb02b43e80b/xml/examples>
>> 
>> Here in <custom.xsd> I was *NOT* able to use
>> substitutionGroup="prov:abstractElement", because I get:
>> 
>> 		Can't include the substitutionGroup as it causes:
>> 		"http://www.w3.org/ns/prov#":abstractElement
>> 		and WC[##other:"http://www.w3.org/ns/prov#"] (or elements from their
>> 		substitution group) violate "Unique Particle Attribution".
>> 
>> 
>> Basically this means that the only way to use the
>> substitutionGroup="prov:abstractElement" is to stay within the PROV
>> namespace.  This might not be obvious to someone looking at our
>> schema. So I'm having doubts now.
> 
> We can try to make this clearer in the Note.  The abstractElement is only intended to be used with substitutionGroups that are in the PROV namespace.
> 
>> 
>> 
>> However, the general extension mechanism through xsd:any does work well,
>> and can also validate my non-prov elements - <custom-example.xml>, even
>> when I inserted those elements inside <prov:document>.
>> 
>> 
>> In <with-extensions.xml> I tried reusing some schemas off the shelf,
>> XHTML, MathML and DC Terms.  This works fine thanks to xs:any as well.
>> I was even able to do nested inclusion reusing prov: elements, ie:
>> 
>> <prov:document>
>>  <mathml:annotation-xml>
>>    <prov:wasAttributedTo>
>>      <prov:entity prov:ref="formula"></prov:entity>
>>      <prov:agent prov:ref="fred"/>
>>      <dcterms:description>blalalla</dcterms:description>
>>      <!-- ... -->
>> 
>> (Those internal prov: elements should probably in most cases NOT be
>> considered part of the <prov:document> !)
>> 
>> Now you can argue whether this would make sense or not, but that is
>> the downside of xsd:any - anything (in non-prov namespaces, in this
>> case) is allowed, not just content that should make sense by
>> declaration of substitution groups. The more xsd:any - the less you
>> have a schema and more you just have lots of fragmented types.
>> 
> 
> I think we are very limited in what we can say about how non-PROV extensions integrate with PROV.
> 
>> 
>> 
>> However I was unable to reuse namespaces like FOAF, because it does
>> not have an XSD schema. So sadly this is not allowed:
>> 
>> <prov:person prov:id="johndoe">
>>       <foaf:name>John Doe</foaf:name>
>> </prov:person>
>> 
>> I think this is too strict, and I suggest changing the xsd:any of
>> <prov:entity> and friends to processContents="lax" - this would only
>> validate against a schema if it's known.
> 
> 
>> 
>> We could rename prov:abstractElement to prov:internal or something to
>> make it less 'tempting' for external use.
>> 
> 
> I am ok with this.
> 
>> 
>> 
>> 
>> We could in theory get rid of the whole documentElements and use only xs:any:
>> 
>> 
>> <xs:element name="document" type="prov:Document" />
>> <xs:complexType name="Document">
>> 		<xs:choice maxOccurs="unbounded">
>> 			<xs:any namespace="##targetNamespace" processContents="strict" />
>> 			<xs:any namespace="##other" processContents="lax" />
>> 		</xs:choice>
>> </xs:complexType>
>> 
>> And then no substitution groups are needed in our PROV extensions; any
>> declared <xs:element> would be allowed.
> 
> If I understand this correctly, this would allow PROV attribute elements to be used on the document.
> 
>> For consistency I've set
>> processContents=lax even for content of <prov:document> but we might
>> want to instead say that it should be strict, to encourage
>> PROV-extensions (rather than just providing attributes) to at least
>> declare a schema.
> 
> I agree that PROV extensions should declare a schema.
> 
>> 
>> 
>> This would mean you could also insert <prov:value> inside
>> <prov:document> and so we would have to ensure that only "proper"
>> elements are declared as named <xs:element>.  I tried changing them to
>> xs:group's and group refs which works fine.
>> 
>> 
>> 
>> The above is quite tricky to get to work inside a <prov:bundle>
>> because all its prov elements are optional, and we get a clash between
>> those and the optional xs:any in the prov namespace.
>> 
>> This is a bit odd anyway because <prov:bundle> plays a dual role with
>> both being a way to say an entity which is a bundle, but also just
>> lists its content flatly, and so we can't know if something listed is
>> part of the bundle or an attribute of the bundle - especially for
>> extensions.
>> 
>> Saying something is a bundle could also be done as:
>> 
>> <prov:entity>
>> <prov:type>prov:Bundle</prov:type>
>> </prov:entity>
>> 
>> (I am a bit confused now, as the PROV-XML document says this is how
>> it should be done)
> 
> We made a change to the types some time ago which is reflected in the editors' draft.
> 
> https://dvcs.w3.org/hg/prov/raw-file/default/xml/prov-xml.html
> 
> Since Bundles are specializations of Entity, prov:Bundle extends prov:Entity.
> 
>> 
>> 
>> .. but I know the XML schema has similar 'helpers' for types like
>> prov:Person and prov:Revision so let's assume we keep the
>> <prov:bundle> entity.
>> 
>> I then would propose changing the bundle to be:
>> 
>> <prov:bundle>
>> <prov:label>A bundle</prov:label>
>> <dcterms:description>Still not part of the bundle</dcterms:description>
>> <prov:provenanceDescriptions>
>>     <!-- the bundle content -->
>>     <prov:activity />
>>     <!-- .. -->
>> </prov:provenanceDescriptions>
>> </prov:bundle>
>> 
> 
> I like this.
> 
>> (We can argue about the name prov:provenanceDescriptions - I went for
>> something close to PROV-DM)
>> 
>> 
>> So this works fine:
>> 
>> <xs:complexType name="Bundle">
>> 	<xs:complexContent>
>> 		<xs:extension base="prov:Entity">
>> 			<xs:sequence>
>> 				<xs:element name="provenanceDescriptions" minOccurs="0">
>> 					<xs:complexType>
>> 						<xs:choice minOccurs="0" maxOccurs="unbounded">
>> 							<xs:any namespace="##targetNamespace" processContents="strict" />
>> 							<xs:any namespace="##other" processContents="lax" />
>> 						</xs:choice>
>> 					</xs:complexType>
>> 				</xs:element>
>> 			</xs:sequence>
>> 		</xs:extension>
>> 	</xs:complexContent>
>> </xs:complexType>
>> 
>> 
>> Now the xsd:any from prov:Entity does not cause any problems, except
>> that they have to be stated BEFORE <prov:provenanceDescriptions>. To
>> change this we would have to do a copy/paste from prov:Entity instead
>> and move the xsd:any down.
> 
> I am OK with this.
> 
> What does the group think?
> 
>> 
>> 
>> 
>> So it's possible, and not that unclean, to get rid of the substitution
>> groups, but it would allow non-PROV garbage (ie. schema elements which
>> were not intended as PROV extensions, like my MathML example above)
>> within <prov:document> and <prov:bundle>.
>> 
>> I don't know what the group's thoughts are on the extensions we should allow
>> for those, but at least it would be consistent with what PROV-N allows
>> - and then perhaps any PROV-N document could be translatable to
>> PROV-XML even without knowing the extensions.
>> 
> 
> I am ok with the substitution groups as they are.  
> 
> If you can present a desirable use case that is disallowed by the current modeling with substitution groups and supported by an alternate modeling, then I will consider it.  I don't want to make a late change without an example use case to consider.
> 
> --Stephan
> 
>> 
>> If you wish I can commit my version of the schemas which does the
>> above (but slightly tidied up), either to the tip or a new branch.
>> 
>> 
>> -- 
>> Stian Soiland-Reyes, myGrid team
>> School of Computer Science
>> The University of Manchester
>> 
> 
> 
> 

Received on Tuesday, 12 February 2013 20:57:42 UTC