Re: Multiple XML schema files for a common target namespace (PROV-ISSUE-608)

Comments inline; the last two comments are the most important.


On Feb 12, 2013, at 7:29 AM, Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk> wrote:

> On Tue, Feb 5, 2013 at 7:29 PM, Stephan Zednik <zednis@rpi.edu> wrote:
>> This does not follow the pattern Stian suggested of updating Document so
>> that bundles are required at the bottom of the document.
>> 
>> Stian, does this make sense?  Do you still prefer the other pattern you
>> suggested in the earlier email?
> 
> Well to me it does not really matter if xs:any can appear anywhere in
> <document> or just at the bottom of the <document> - but I think your
> current solution means that you are allowed to put anything anywhere
> in <document>, but in <bundle> you can only put the extensions after
> <prov:value> but before the documentElements, which is a bit odd.
> 
> It might be 'cleaner' to only allow extension stuff at the bottom, but
> that could make it tricky for the bundle as it (now) specializes the
> prov:Entity type and therefore the additional elements of Bundle come
> below the <xs:any> from entity.
> 

Yes, originally this worked because we had multiple xs:any wildcards in prov:Bundle (inherited from both prov:Entity and prov:documentElements), but that violated the "Unique Particle Attribution" rule, which caused xjc to fail to generate Java classes from the schema.

We changed the schema to work well with xjc but in doing so introduced the odd restriction you have noted.  I am still playing around with it to try to come up with a solution.
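
For anyone following along, here is a minimal sketch of the kind of content model that trips the rule. The type name and namespace are made up for illustration and are not taken from our schema; the point is only that two adjacent optional wildcards over the same namespaces leave a validator unable to decide which particle an element belongs to.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.org/upa-demo#"
           elementFormDefault="qualified">
  <!-- Hypothetical type: mimics a wildcard inherited from a base type
       followed by a second wildcard added by the derived type. -->
  <xs:complexType name="AmbiguousBundle">
    <xs:sequence>
      <xs:any namespace="##other" processContents="lax"
              minOccurs="0" maxOccurs="unbounded"/>
      <xs:any namespace="##other" processContents="lax"
              minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```

A schema processor should reject this when it compiles the schema, before any instance document is involved, because an element from a foreign namespace could be attributed to either wildcard.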

> 
> 
> 
>> Also, I think that we put the abstract element after the choice in document
>> Elements because it caused problems with schema validation, but I can double
>> check on that and see if it can be included in the choice.
> 
> I know, those things can get tricky.. it's another problem with XSD
> and its particle separation.
> 
> 
> I tried some example of making an extension:
> 
>  <https://dvcs.w3.org/hg/prov/file/0bb02b43e80b/xml/examples>
> 
> Here in <custom.xsd> I was *NOT* able to use
> substitutionGroup="prov:abstractElement", because I get:
> 
> 		Can't include the substitutionGroup as it causes:
> "http://www.w3.org/ns/prov#":abstractElement
> 		and WC[##other:"http://www.w3.org/ns/prov#"] (or elements from their
> substitution
> 		group) violate "Unique Particle Attribution".
> 
> 
> Basically this means that the only way to use the
> substitutionGroup="prov:abstractElement" is to stay within the PROV
> namespace.  This might not be obvious to someone looking at our
> schema. So I'm having doubts now.

We can try to make this clearer in the Note.  The abstractElement is only intended to be used with substitutionGroups that are in the PROV namespace.
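
As a rough sketch of what the Note could show: an additional schema file that shares the PROV target namespace and adds a member to the substitution group. The file name prov-core.xsd, the element name exampleRelation, and its type are placeholders I am assuming for illustration only. It also assumes prov:abstractElement is declared without a restricting type; otherwise the placeholder type would have to derive from that type.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:prov="http://www.w3.org/ns/prov#"
           targetNamespace="http://www.w3.org/ns/prov#"
           elementFormDefault="qualified">
  <!-- xs:include (not xs:import) because both files share one target
       namespace. "prov-core.xsd" is an assumed file name. -->
  <xs:include schemaLocation="prov-core.xsd"/>

  <!-- Hypothetical element: it lives in the PROV namespace, so it does
       not also match the ##other wildcard. -->
  <xs:element name="exampleRelation" type="prov:ExampleRelation"
              substitutionGroup="prov:abstractElement"/>

  <xs:complexType name="ExampleRelation">
    <xs:sequence>
      <xs:element name="note" type="xs:string" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```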

> 
> 
> However, the general extension mechanism through xsd:any does work well,
> and can also validate my non-prov elements - <custom-example.xml>, even
> when I inserted those elements inside <prov:document>.
> 
> 
> In <with-extensions.xml> I tried reusing some schemas off the shelf,
> XHTML, MathML and DC Terms.  This works fine thanks to xs:any as well.
> I was even able to do nested inclusion reusing prov: elements, ie:
> 
> <prov:document>
>   <mathml:annotation-xml>
>     <prov:wasAttributedTo>
>       <prov:entity prov:ref="formula"></prov:entity>
>       <prov:agent prov:ref="fred"/>
>       <dcterms:description>blalalla</dcterms:description>
> <!-- ... -->
> 
> (Those internal prov: elements should probably in most cases NOT be
> considered part of the <prov:document> !)
> 
> Now you can argue whether this would make sense or not, but that is
> the downside of xsd:any - anything (in non-prov namespaces, in this
> case) is allowed, not just content that should make sense by
> declaration of substitution groups. The more xsd:any you have, the less
> you have a schema and the more you just have lots of fragmented types.
> 

I think we are very limited in what we can say about how non-PROV extensions integrate with PROV.

> 
> 
> However I was unable to reuse namespaces like FOAF, because it does
> not have an XSD schema. So sadly this is not allowed:
> 
>  <prov:person prov:id="johndoe">
>        <foaf:name>John Doe</foaf:name>
>  </prov:person>
> 
> I think this is too strict, and I suggest changing the xsd:any of
> <prov:entity> and friends to processContents="lax" - this would only
> validate against a schema if it's known.


> 
> We could rename prov:abstractElement to prov:internal or something to
> make it less 'tempting' for external use.
> 

I am ok with this.

> 
> 
> 
> We could in theory get rid of the whole documentElements and use only xs:any:
> 
> 
>  <xs:element name="document" type="prov:Document" />
>  <xs:complexType name="Document">
> 		<xs:choice maxOccurs="unbounded">
> 			<xs:any namespace="##targetNamespace" processContents="strict" />
> 			<xs:any namespace="##other" processContents="lax" />
> 		</xs:choice>
>  </xs:complexType>
> 
> And then no substitution groups are needed in our PROV extensions; any
> declared <xs:element> would be allowed.

If I understand this correctly, this would allow PROV attribute elements to be used on the document.
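
To make sure we are talking about the same thing, here is a sketch of an instance I would expect the proposed Document type to accept, assuming prov:value stays a globally declared element:

```xml
<prov:document xmlns:prov="http://www.w3.org/ns/prov#">
  <!-- A PROV attribute element directly under the document. It matches
       the ##targetNamespace wildcard, although it describes nothing here. -->
  <prov:value>42</prov:value>
</prov:document>
```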

> For consistency I've set
> processContents="lax" even for content of <prov:document> but we might
> want to instead say that it should be strict, to encourage
> PROV-extensions (rather than just providing attributes) to at least
> declare a schema.

I agree that PROV extensions should declare a schema.
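
As an illustration of how little that asks of an extension author, a sketch of a minimal extension schema follows. The namespace and the element name reviewStatus are invented for this example.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.org/prov-ext#"
           elementFormDefault="qualified">
  <!-- Hypothetical extension element. With processContents="strict" on
       the ##other wildcard, a validator needs this declaration to accept
       the element; with "lax" it validates it only when the schema is
       available. -->
  <xs:element name="reviewStatus" type="xs:string"/>
</xs:schema>
```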

> 
> 
> This would mean you could also insert <prov:value> inside
> <prov:document> and so we would have to ensure that only "proper"
> elements are declared as named <xs:element>.  I tried changing them to
> xs:group declarations and group refs, which works fine.
> 
> 
> 
> The above is quite tricky to get to work inside a <prov:bundle>
> because all its prov elements are optional, and we get a clash between
> those and the optional xs:any in the prov namespace.
> 
> This is a bit odd anyway because <prov:bundle> plays a dual role: it
> both declares an entity which is a bundle and lists its content
> flatly, so we can't know if something listed is part of the bundle
> or an attribute of the bundle - especially for extensions.
> 
> Saying something is a bundle could also be done as:
> 
> <prov:entity>
>  <prov:type>prov:Bundle</prov:type>
> </prov:entity>
> 
> (I am a bit confused now, as the PROV-XML document says this is how
> it should be done)

We made a change to the types some time ago which is reflected in the editors' draft.

https://dvcs.w3.org/hg/prov/raw-file/default/xml/prov-xml.html

Since Bundles are specializations of Entity, prov:Bundle extends prov:Entity.

> 
> 
> .. but I know the XML schema has similar 'helpers' for types like
> prov:Person and prov:Revision so let's assume we keep the
> <prov:bundle> entity.
> 
> I then would propose changing the bundle to be:
> 
> <prov:bundle>
>  <prov:label>A bundle</prov:label>
>  <dcterms:description>Still not part of the bundle</dcterms:description>
>  <prov:provenanceDescriptions>
>      <!-- the bundle content -->
>      <prov:activity />
>      <!-- .. -->
>  </prov:provenanceDescriptions>
> </prov:bundle>
> 

I like this.

> (We can argue about the name prov:provenanceDescriptions - I went for
> something close to PROV-DM)
> 
> 
> So this works fine:
> 
>  <xs:complexType name="Bundle">
> 	<xs:complexContent>
> 		<xs:extension base="prov:Entity">
> 			<xs:sequence>
> 				<xs:element name="provenanceDescriptions" minOccurs="0">
> 					<xs:complexType>
> 						<xs:choice minOccurs="0" maxOccurs="unbounded">
> 							<xs:any namespace="##targetNamespace" processContents="strict" />
> 							<xs:any namespace="##other" processContents="lax" />
> 						</xs:choice>
> 					</xs:complexType>
> 				</xs:element>
> 			</xs:sequence>
> 		</xs:extension>
> 	</xs:complexContent>
> </xs:complexType>
> 
> 
> Now the xsd:any from prov:Entity does not cause any problems, except
> that extension elements have to be stated BEFORE <prov:provenanceDescriptions>. To
> change this we would have to do a copy/paste from prov:Entity instead
> and move the xsd:any down.

I am OK with this.
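
For discussion, a sketch of what the copy/paste variant might look like. I am assuming here that prov:Entity consists of label, location, type and value followed by the wildcard, that these are available as global elements, and that prov:id is a global attribute; all of this needs to be checked against the current schema.

```xml
<xs:complexType name="Bundle">
  <xs:sequence>
    <!-- Copied from prov:Entity (assumed content, see above) -->
    <xs:element ref="prov:label" minOccurs="0" maxOccurs="unbounded"/>
    <xs:element ref="prov:location" minOccurs="0" maxOccurs="unbounded"/>
    <xs:element ref="prov:type" minOccurs="0" maxOccurs="unbounded"/>
    <xs:element ref="prov:value" minOccurs="0"/>
    <!-- The bundle content, as in Stian's version -->
    <xs:element name="provenanceDescriptions" minOccurs="0">
      <xs:complexType>
        <xs:choice minOccurs="0" maxOccurs="unbounded">
          <xs:any namespace="##targetNamespace" processContents="strict"/>
          <xs:any namespace="##other" processContents="lax"/>
        </xs:choice>
      </xs:complexType>
    </xs:element>
    <!-- Extension elements now come last -->
    <xs:any namespace="##other" processContents="lax"
            minOccurs="0" maxOccurs="unbounded"/>
  </xs:sequence>
  <xs:attribute ref="prov:id"/>
</xs:complexType>
```

The trade-off is that Bundle would no longer be derived from prov:Entity in the schema, so the specialization would have to be stated in the Note rather than expressed by the type hierarchy.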

What does the group think?

> 
> 
> 
> So it's possible, and not that unclean, to get rid of the substitution
> groups, but it would allow non-PROV garbage (ie. schema elements which
> were not intended as PROV extensions, like my MathML example above)
> within <prov:document> and <prov:bundle>.
> 
> I don't know what the group's thoughts are on extensions we should allow
> for those, but at least it would be consistent with what PROV-N allows
> - and then perhaps any PROV-N document could be translatable to
> PROV-XML even without knowing the extensions.
> 

I am ok with the substitution groups as they are.  

If you can present a desirable use case that is disallowed by the current modeling with substitution groups and supported by an alternate modeling, then I will consider it.  I don't want to make a late change without an example use case to consider.

--Stephan

> 
> If you wish I can commit my version of the schemas which does the
> above (but slightly tidied up), either to the tip or a new branch.
> 
> 
> -- 
> Stian Soiland-Reyes, myGrid team
> School of Computer Science
> The University of Manchester
> 

Received on Tuesday, 12 February 2013 19:50:17 UTC