Re: Multiple XML schema files for a common target namespace (PROV-ISSUE-608)

Hi Stephan,


On 12/02/13 22:40, Stephan Zednik wrote:
>
> On Feb 12, 2013, at 3:14 PM, Luc Moreau <l.moreau@ecs.soton.ac.uk> wrote:
>
>> Hi Stephan,
>>
>> Thanks for the explanation on lax. Yes this seems reasonable.
>>
>> In your newly proposed schema, the bundleElements element corresponds
>> to the bundle construct in prov-n.  The difference is that
>> bundleElements is allowed inside an entity, whereas the prov-n bundle
>> construct is only allowed at the top level of a document.
>>
>> One strong requirement of part of the WG membership was to avoid 
>> nesting of bundles.
>> With this, you have introduced nesting of bundles:
>> an entity containing a bundleElements can occur inside another
>> bundleElements.
>
> Is this requirement in the DM?  Is this requirement defined outside of 
> the recommendation documents?  On the wiki perhaps?
>

First sentence in
http://www.w3.org/TR/2012/CR-prov-n-20121211/#component4
is

Bundles cannot be nested because a bundle is not an expression, and 
therefore cannot occur inside another bundle.


In prov-dm,  given the definition of entity:
http://www.w3.org/TR/2012/CR-prov-dm-20121211/#term-entity
I don't see where provenance descriptions contained in a bundle can occur.
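
To illustrate the nesting concern, here is a rough sketch of an instance
that the proposed schema would appear to accept, assuming prov:bundle
remains a member of documentElements. The ids ex:outer, ex:inner and
ex:report1 are made up for illustration.

```xml
<prov:document
    xmlns:prov="http://www.w3.org/ns/prov#"
    xmlns:ex="http://example.com/ns/ex#">

  <!-- Outer bundle entity with its bundleElements container -->
  <prov:bundle prov:id="ex:outer">
    <prov:bundleElements>

      <!-- A second bundle entity nested inside the first one;
           this is the nesting that prov-n does not allow -->
      <prov:bundle prov:id="ex:inner">
        <prov:bundleElements>
          <prov:entity prov:id="ex:report1"/>
        </prov:bundleElements>
      </prov:bundle>

    </prov:bundleElements>
  </prov:bundle>

</prov:document>
```

The inner prov:bundle sits inside the outer bundleElements, which is what
the prov-n sentence quoted above rules out.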


>>
>> I think it's a significant departure from the dm.
>>
>> Also, personally, I find it useful to be able to return bundles, as a 
>> response to a provenance query.
>
> Is there any guarantee that the bundle entity will be a part of the 
> returned bundle?
>

What do you mean?

> This is how just a bundle would look as PROV-XML:
>
> <prov:document
> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> xmlns:xsd="http://www.w3.org/2001/XMLSchema"
> xmlns:ex="http://example.com/ns/ex#"
> xmlns:prov="http://www.w3.org/ns/prov#">
>
> <prov:bundle prov:id="bundle1">
> <prov:label>bundle1</prov:label>
> <ex:label>label-on-bundle-entity</ex:label>
> <prov:bundleElements>
> <ex:label>in-bundle-label</ex:label>
> <prov:entity prov:id="ex:report1">
> <prov:type xsi:type="xsd:QName">report</prov:type>
> <ex:version>1</ex:version>
> </prov:entity>
> <ex:version>1.0.0</ex:version>
> <prov:wasGeneratedBy>
> <prov:entity prov:ref="ex:report1"/>
> <prov:activity prov:ref="a1"/>
> <prov:time>2012-05-24T10:00:01</prov:time>
> </prov:wasGeneratedBy>
> <ex:content>foo</ex:content>
> </prov:bundleElements>
> </prov:bundle>
> </prov:document>

Remember this is just one syntax.

It is still possible to write
<prov:entity prov:id="bundle1">
   <prov:type>prov:Bundle</prov:type>
   ...
</prov:entity>

And obviously, all the variants outside the prov namespace.

>
> I think it would be easy enough to construct a bundle as a response to 
> a provenance query.
>
>> With the proposed schema change, they would now be nested inside an 
>> entity. Why this extra level of
>> nesting?
>
> The schema previously had what the DM calls the bundleConstructor as 
> an implicit child of a bundle entity so this issue has been present 
> with PROV-XML bundle representation for some time.
>
I don't think so.
The xml schema was aligned to the prov-n grammar, with bundle allowed 
inside document only.
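
As a rough sketch only (not the actual schema text), a top-level-only
bundle could be expressed along the following lines. The names
bundleConstructor and BundleConstructor are hypothetical, taken from the
suggestion in this thread. The sketch assumes that prov:documentElements
no longer lists prov:bundle and that a global prov:id attribute is
declared elsewhere in the schema. Extension wildcards are left out for
brevity.

```xml
<!-- Fragment of a schema whose target namespace is
     http://www.w3.org/ns/prov#; not a complete schema on its own -->

<xs:complexType name="Document">
  <xs:choice minOccurs="0" maxOccurs="unbounded">
    <!-- Ordinary provenance descriptions -->
    <xs:group ref="prov:documentElements"/>
    <!-- Bundles are only reachable from here, i.e. at document level -->
    <xs:element name="bundleConstructor" type="prov:BundleConstructor"/>
  </xs:choice>
</xs:complexType>

<xs:complexType name="BundleConstructor">
  <!-- Same descriptions as a document, but no bundleConstructor,
       so a bundle cannot contain another bundle -->
  <xs:choice minOccurs="0" maxOccurs="unbounded">
    <xs:group ref="prov:documentElements"/>
  </xs:choice>
  <xs:attribute ref="prov:id"/>
</xs:complexType>
```

Because bundleConstructor is referenced only from Document, nesting is
excluded by construction rather than by a separate constraint.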

> This is the natural way to model bundles in XML, but it does introduce 
> the possibility of nesting bundles.  The nesting issue could be 
> corrected if we remove prov:bundle from documentElements and add it to 
> the sequence in prov:Document.  Then bundles would not be nestable, 
> but you would also not be able to define a bundle entity inside a bundle.
>
> The current modeling makes a bundle entity outside the scope of the 
> bundle container.  If this is wrong and we always want the bundle 
> entity to be defined within the scope of the bundle entity then we 
> should use the modeling you suggest of defining a 
> prov:bundleConstructor element which is a member of the prov:Document 
> sequence but not the documentElements sequence.
>

I have use cases where the bundle entity is outside the bundle, and 
others where it is inside.



> We should probably pick a scope for the bundle entity to provide 
> direction.  Is the bundle entity inside or outside the 
> bundleConstructor (it's probably too late to ask for a rename to 
> bundleContainer, correct?)

The schema should not make that decision and should let asserters decide 
where they want the bundle entity.
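
For illustration, here is a sketch of the two placements. It uses the
hypothetical prov:bundleConstructor element name from this thread, and
the ex: ids are made up.

```xml
<prov:document
    xmlns:prov="http://www.w3.org/ns/prov#"
    xmlns:ex="http://example.com/ns/ex#">

  <!-- Case 1: the bundle entity is described outside the bundle -->
  <prov:entity prov:id="ex:bundle1">
    <prov:type>prov:Bundle</prov:type>
  </prov:entity>
  <prov:bundleConstructor prov:id="ex:bundle1">
    <prov:entity prov:id="ex:report1"/>
  </prov:bundleConstructor>

  <!-- Case 2: the bundle entity is described inside the bundle itself -->
  <prov:bundleConstructor prov:id="ex:bundle2">
    <prov:entity prov:id="ex:bundle2">
      <prov:type>prov:Bundle</prov:type>
    </prov:entity>
    <prov:entity prov:id="ex:report2"/>
  </prov:bundleConstructor>

</prov:document>
```

In both cases the bundle entity is an ordinary prov:entity, so the
asserter chooses its placement and no bundle is nested in another.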

Luc

>
> --Stephan
>
>>
>> So given the above, I am not supportive of the change.
>>
>> Luc
>>
>> On 12/02/13 21:54, Stephan Zednik wrote:
>>>
>>> On Feb 12, 2013, at 2:09 PM, Luc Moreau <l.moreau@ecs.soton.ac.uk> wrote:
>>>
>>>> Hi Stephan,
>>>>
>>>> Response interleaved.
>>>>
>>>> On 12/02/13 20:57, Stephan Zednik wrote:
>>>>> A summary of the possible changes based on this discussion.  I am 
>>>>> in favor of all three listed changes.
>>>>>
>>>>> 1) rename prov:abstractElement to prov:internalElement (or 
>>>>> similar) to make it clear we do not expect non-PROV extensions to 
>>>>> use this element.
>>>>
>>>> It's good.
>>>>> 2) add processContents="lax" on all xs:any elements.
>>>> What was the problem with the current definition, what does this 
>>>> allow us to do?
>>>
>>> If a non-PROV namespace does not have a corresponding schema then 
>>> the document will fail to validate.
>>>
>>> processContents: Optional. Specifies how the XML processor should 
>>> handle validation against the elements specified by this any 
>>> element. Can be set to one of the following:
>>>
>>>   * strict - the XML processor must obtain the schema for the
>>>     required namespaces and validate the elements (this is default)
>>>   * lax - same as strict, but if the schema cannot be obtained, no
>>>     errors will occur
>>>   * skip - The XML processor does not attempt to validate any
>>>     elements from the specified namespaces
>>>
>>>
>>>
>>> This loosens our validation requirements for non-PROV elements.
>>>
>>> Stian's use case example was to use some FOAF elements but 
>>> validation failed because he had not specified a FOAF schema.
>>>
>>>>
>>>>> 3) change the definition of prov:Bundle to the following 
>>>>> (bundleElements name is not final)
>>>>>
>>>>>   <xs:complexType name="Bundle">
>>>>>     <xs:complexContent>
>>>>>       <xs:extension base="prov:Entity">
>>>>>         <xs:sequence>
>>>>>           <xs:element name="bundleElements" minOccurs="0">
>>>>>             <xs:complexType>
>>>>>               <xs:sequence maxOccurs="unbounded">
>>>>>                 <xs:group ref="prov:documentElements"/>
>>>>>                 <xs:any namespace="##other" processContents="lax" minOccurs="0" maxOccurs="unbounded"/>
>>>>>               </xs:sequence>
>>>>>             </xs:complexType>
>>>>>           </xs:element>
>>>>>         </xs:sequence>
>>>>>       </xs:extension>
>>>>>     </xs:complexContent>
>>>>>   </xs:complexType>
>>>>
>>>> To me, this does not correspond to prov-dm.
>>>> I regard the bundle construct as distinct from the entity construct.
>>>
>>> Well, a Bundle is an Entity so the Bundle complexType extending the 
>>> Entity complexType is good.
>>>
>>> How then to have what the PROV-DM calls the 'bundle constructor'?
>>>
>>> I think of the prov:bundleElements as the bundle constructor and I 
>>> believe that it corresponds to PROV-DM.
>>>
>>> An alternative option would be to make a new element 
>>> prov:bundleConstructor and put it in the documentElements sequence. 
>>>  This may be more like PROV-N, but is less like XML.
>>>
>>> The PROV-DM does not specify a serialization or syntax so an 
>>> XML-native approach should be ok.  I think having the bundle 
>>> constructor as an XML element of a Bundle makes sense in XML.
>>>
>>> --Stephan
>>>
>>>>
>>>>
>>>> Luc
>>>>
>>>>> With the updated Bundle complexType the PROV-XML serialization for 
>>>>> a bundle would look like this
>>>>>
>>>>> <prov:document
>>>>> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>>>> xmlns:xsd="http://www.w3.org/2001/XMLSchema"
>>>>> xmlns:ex="http://example.com/ns/ex#"
>>>>> xmlns:prov="http://www.w3.org/ns/prov#">
>>>>>
>>>>> <prov:person prov:id="bob"/>
>>>>>
>>>>> <ex:label>outside-bundle-label</ex:label>
>>>>>
>>>>> <prov:activity prov:id="a1"/>
>>>>>
>>>>> <prov:bundle prov:id="bundle1">
>>>>>
>>>>> <prov:label>bundle1</prov:label>
>>>>> <ex:label>label-on-bundle-entity</ex:label>
>>>>>
>>>>> <prov:bundleElements>
>>>>>
>>>>> <ex:label>in-bundle-label</ex:label>
>>>>>
>>>>> <prov:entity prov:id="ex:report1">
>>>>> <prov:type xsi:type="xsd:QName">report</prov:type>
>>>>> <ex:version>1</ex:version>
>>>>> </prov:entity>
>>>>>
>>>>> <ex:version>1.0.0</ex:version>
>>>>>
>>>>> <prov:wasGeneratedBy>
>>>>> <prov:entity prov:ref="ex:report1"/>
>>>>> <prov:activity prov:ref="a1"/>
>>>>> <prov:time>2012-05-24T10:00:01</prov:time>
>>>>> </prov:wasGeneratedBy>
>>>>>
>>>>> <ex:content>foo</ex:content>
>>>>>
>>>>> </prov:bundleElements>
>>>>>
>>>>> </prov:bundle>
>>>>>
>>>>> </prov:document>
>>>>>
>>>>> I used elements from the namespace "ex" to show how non-PROV 
>>>>> elements can be used within a bundle and as PROV attributes on the 
>>>>> bundle entity.
>>>>>
>>>>> --Stephan
>>>>>
>>>>> On Feb 12, 2013, at 12:49 PM, Stephan Zednik <zednis@rpi.edu> wrote:
>>>>>
>>>>>> Comments in-line, last two comments are the most important.
>>>>>>
>>>>>>
>>>>>> On Feb 12, 2013, at 7:29 AM, Stian Soiland-Reyes 
>>>>>> <soiland-reyes@cs.manchester.ac.uk> wrote:
>>>>>>
>>>>>>> On Tue, Feb 5, 2013 at 7:29 PM, Stephan Zednik <zednis@rpi.edu> wrote:
>>>>>>>> This does not follow the pattern Stian suggested of updating 
>>>>>>>> Document so
>>>>>>>> that bundles are required at the bottom of the document.
>>>>>>>>
>>>>>>>> Stian, does this make sense?  Do you still prefer the other 
>>>>>>>> pattern you
>>>>>>>> suggested in the earlier email?
>>>>>>> Well to me it does not really matter if xs:any can appear 
>>>>>>> anywhere in
>>>>>>> <document> or just at the bottom of the <document> - but I think 
>>>>>>> your
>>>>>>> current solution means that you are allowed to put anything anywhere
>>>>>>> in <document>, but in <bundle> you can only put the extensions after
>>>>>>> <prov:value> but before  the documentelements, which is a bit odd.
>>>>>>>
>>>>>>> It might be 'cleaner' to only allow extension stuff at the 
>>>>>>> bottom, but
>>>>>>> that could make it tricky for the bundle as it (now) specializes the
>>>>>>> prov:Entity type and therefore the additional elements of Bundle 
>>>>>>> come
>>>>>>> below the <xs:any> from entity.
>>>>>>>
>>>>>> Yes, originally this worked because we had multiple xs:any in the 
>>>>>> prov:Bundle (inherited from both prov:Entity and 
>>>>>> prov:documentElements) but we violated the "unique particle 
>>>>>> attribution" rule which caused xjc to fail to generate java 
>>>>>> classes from the schema.
>>>>>>
>>>>>> We changed the schema to work well with xjc but in doing so 
>>>>>> introduced the odd restriction you have noted.  I am still 
>>>>>> playing around with it to try to come up with a solution.
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Also, I think that we put the abstract element after the choice 
>>>>>>>> in document
>>>>>>>> Elements because it caused problems with schema validation, but 
>>>>>>>> I can double
>>>>>>>> check on that and see if it can be included in the choice.
>>>>>>> I know, those things can get tricky.. it's another problem with XSD
>>>>>>> and its particle separation.
>>>>>>>
>>>>>>>
>>>>>>> I tried some example of making an extension:
>>>>>>>
>>>>>>> <https://dvcs.w3.org/hg/prov/file/0bb02b43e80b/xml/examples>
>>>>>>>
>>>>>>> Here in <custom.xsd> I was *NOT* able to use
>>>>>>> substitutionGroup="prov:abstractElement", because I get:
>>>>>>>
>>>>>>> Can't include the substitutionGroup as it causes:
>>>>>>> "http://www.w3.org/ns/prov#":abstractElement
>>>>>>> and WC[##other:"http://www.w3.org/ns/prov#"] (or elements from their
>>>>>>> substitution
>>>>>>> group) violate "Unique Particle Attribution".
>>>>>>>
>>>>>>>
>>>>>>> Basically this means that the only way to use the
>>>>>>> substitutionGroup="prov:abstractElement" is to stay within the PROV
>>>>>>> namespace.  This might not be obvious to someone looking at our
>>>>>>> schema. So I'm having doubts now.
>>>>>> We can try to make this more clear in the Note.  The 
>>>>>> abstractElement is only intended to be used with 
>>>>>> substitutionGroups that are in the PROV Namespace.
>>>>>>
>>>>>>>
>>>>>>> However, the general extension mechanism through xsd:any does work
>>>>>>> well, and can also validate my non-prov elements
>>>>>>> (<custom-example.xml>), even when I inserted those elements inside
>>>>>>> <prov:document>.
>>>>>>>
>>>>>>>
>>>>>>> In <with-extensions.xml> I tried reusing some schemas off the shelf:
>>>>>>> XHTML, MathML and DC Terms.  This works fine thanks to xs:any as
>>>>>>> well.
>>>>>>> I was even able to do nested inclusion reusing prov: elements, i.e.:
>>>>>>>
>>>>>>> <prov:document>
>>>>>>>  <mathml:annotation-xml>
>>>>>>>    <prov:wasAttributedTo>
>>>>>>>      <prov:entity prov:ref="formula"></prov:entity>
>>>>>>>           <prov:agent prov:ref="fred"/>
>>>>>>>           <dcterms:description>blalalla</dcterms:description>
>>>>>>> <!-- ... -->
>>>>>>>
>>>>>>> (Those internal prov: elements should probably in most cases NOT be
>>>>>>> considered part of the <prov:document> !)
>>>>>>>
>>>>>>> Now you can argue whether this would make sense or not, but that is
>>>>>>> the downside of xsd:any - anything (in non-prov namespaces, in this
>>>>>>> case) is allowed, not just content that should make sense by
>>>>>>> declaration of substitution groups. The more xsd:any - the less you
>>>>>>> have a schema and more you just have lots of fragmented types.
>>>>>>>
>>>>>> I think we are very limited in what we can say about how non-PROV 
>>>>>> extensions integrate with PROV.
>>>>>>
>>>>>>>
>>>>>>> However I was unable to reuse namespaces like FOAF, because it does
>>>>>>> not have an XSD schema. So sadly this is not allowed:
>>>>>>>
>>>>>>> <prov:person prov:id="johndoe">
>>>>>>>       <foaf:name>John Doe</foaf:name>
>>>>>>> </prov:person>
>>>>>>>
>>>>>>> I think this is too strict, and I suggest changing the xsd:any of
>>>>>>> <prov:entity> and friends to processContents="lax" - this would only
>>>>>>> validate against a schema if it's known.
>>>>>>
>>>>>>> We could rename prov:abstractElement to prov:internal or 
>>>>>>> something to
>>>>>>> make it less 'tempting' for external use.
>>>>>>>
>>>>>> I am ok with this.
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> We could in theory get rid of the whole documentElements and use 
>>>>>>> only xs:any:
>>>>>>>
>>>>>>>
>>>>>>> <xs:element name="document" type="prov:Document" />
>>>>>>> <xs:complexType name="Document">
>>>>>>> <xs:choice maxOccurs="unbounded">
>>>>>>> <xs:any namespace="##targetNamespace" processContents="strict" />
>>>>>>> <xs:any namespace="##other" processContents="lax" />
>>>>>>> </xs:choice>
>>>>>>> </xs:complexType>
>>>>>>>
>>>>>>> And then no substitution groups are needed in our PROV extensions; any
>>>>>>> declared <xs:element> would be allowed.
>>>>>> If I understand this correctly, this would allow PROV attribute 
>>>>>> elements to be used on the document.
>>>>>>
>>>>>>> For consistency I've set
>>>>>>> processContents=lax even for content of <prov:document> but we might
>>>>>>> want to instead say that it should be strict, to encourage
>>>>>>> PROV-extensions (rather than just providing attributes) to at least
>>>>>>> declare a schema.
>>>>>> I agree that PROV extensions should declare a schema.
>>>>>>
>>>>>>>
>>>>>>> This would mean you could also insert <prov:value> inside
>>>>>>> <prov:document> and so we would have to ensure that only "proper"
>>>>>>> elements are declared as named <xs:element>.  I tried changing 
>>>>>>> them to
>>>>>>> xs:group's and group refs which works fine.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> The above is quite tricky to get to work inside a <prov:bundle>
>>>>>>> because all its prov elements are optional, and we get a clash 
>>>>>>> between
>>>>>>> those and the optional xs:any in the prov namespace.
>>>>>>>
>>>>>>> This is a bit odd anyway because <prov:bundle> plays a dual role 
>>>>>>> with
>>>>>>> both being a way to say an entity which is a bundle, but also just
>>>>>>> lists its content flatly, and so we can't know if something 
>>>>>>> listed is
>>>>>>> part of the bundle or an attribute of the bundle - especially for
>>>>>>> extensions.
>>>>>>>
>>>>>>> Saying something is a bundle could also be done as:
>>>>>>>
>>>>>>> <prov:entity>
>>>>>>> <prov:type>prov:Bundle</prov:type>
>>>>>>> </prov:entity>
>>>>>>>
>>>>>>> (I am a  bit confused now, as the PROV-XML document says this is how
>>>>>>> it should be done)
>>>>>> We made a change to the types some time ago which is reflected in 
>>>>>> the editors' draft.
>>>>>>
>>>>>> https://dvcs.w3.org/hg/prov/raw-file/default/xml/prov-xml.html
>>>>>>
>>>>>> Since Bundles are specializations of Entity prov:Bundle extends 
>>>>>> prov:Entity.
>>>>>>
>>>>>>>
>>>>>>> .. but I know the XML schema has similar 'helpers' for types like
>>>>>>> prov:Person and prov:Revision so let's assume we keep the
>>>>>>> <prov:bundle> entity.
>>>>>>>
>>>>>>> I then would propose changing the bundle to be:
>>>>>>>
>>>>>>> <prov:bundle>
>>>>>>> <prov:label>A bundle</prov:label>
>>>>>>> <dcterms:description>Still not part of the bundle</dcterms:description>
>>>>>>> <prov:provenanceDescriptions>
>>>>>>>     <!-- the bundle content -->
>>>>>>>     <prov:activity />
>>>>>>>     <!-- .. -->
>>>>>>> </prov:provenanceDescriptions>
>>>>>>> </prov:bundle>
>>>>>>>
>>>>>> I like this.
>>>>>>
>>>>>>> (We can argue about the name prov:provenanceDescriptions - I 
>>>>>>> went for
>>>>>>> something close to PROV-DM)
>>>>>>>
>>>>>>>
>>>>>>> So this works fine:
>>>>>>>
>>>>>>> <xs:complexType name="Bundle">
>>>>>>> <xs:complexContent>
>>>>>>> <xs:extension base="prov:Entity">
>>>>>>> <xs:sequence>
>>>>>>> <xs:element name="provenanceDescriptions" minOccurs="0">
>>>>>>> <xs:complexType>
>>>>>>> <xs:choice minOccurs="0" maxOccurs="unbounded">
>>>>>>> <xs:any namespace="##targetNamespace" processContents="strict" />
>>>>>>> <xs:any namespace="##other" processContents="lax" />
>>>>>>> </xs:choice>
>>>>>>> </xs:complexType>
>>>>>>> </xs:element>
>>>>>>> </xs:sequence>
>>>>>>> </xs:extension>
>>>>>>> </xs:complexContent>
>>>>>>> </xs:complexType>
>>>>>>>
>>>>>>>
>>>>>>> Now the xsd:any from prov:Entity does not cause any problems, except
>>>>>>> that they have to be stated BEFORE <prov:provenanceDescriptions>. To
>>>>>>> change this we would have to do a copy/paste from prov:Entity 
>>>>>>> instead
>>>>>>> and move the xsd:any down.
>>>>>> I am OK with this.
>>>>>>
>>>>>> What does the group think?
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> So it's possible, and not that unclean, to get rid of the 
>>>>>>> substitution
>>>>>>> groups, but it would allow non-PROV garbage (ie. schema elements 
>>>>>>> which
>>>>>>> were not intended as PROV extensions, like my MathML example above)
>>>>>>> within <prov:document> and <prov:bundle>.
>>>>>>>
>>>>>>> I don't know what the group's thoughts are on extensions we should 
>>>>>>> allow
>>>>>>> for those, but at least it would be consistent with what PROV-N 
>>>>>>> allows
>>>>>>> - and then perhaps any PROV-N document could be translatable to
>>>>>>> PROV-XML even without knowing the extensions.
>>>>>>>
>>>>>> I am ok with the substitution groups as they are.
>>>>>>
>>>>>> If you can present a desirable use case that is disallowed by the 
>>>>>> current modeling with substitution groups and supported by an 
>>>>>> alternate modeling then I will consider it.  I don't want to make 
>>>>>> a late change without an example use case to consider.
>>>>>>
>>>>>> --Stephan
>>>>>>
>>>>>>> If you wish I can commit my version of the schemas which does the
>>>>>>> above (but slightly tidied up), either to the tip or a new branch.
>>>>>>>
>>>>>>>
>>>>>>> -- 
>>>>>>> Stian Soiland-Reyes, myGrid team
>>>>>>> School of Computer Science
>>>>>>> The University of Manchester
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>> -- 
>>>> Professor Luc Moreau
>>>> Electronics and Computer Science   tel:   +44 23 8059 4487
>>>> University of Southampton          fax:   +44 23 8059 2865
>>>> Southampton SO17 1BJ               email: l.moreau@ecs.soton.ac.uk
>>>> United Kingdom                     http://www.ecs.soton.ac.uk/~lavm
>>>>
>>>>
>>>>
>>>
>>
>> -- 
>> Professor Luc Moreau
>> Electronics and Computer Science   tel:   +44 23 8059 4487
>> University of Southampton          fax:   +44 23 8059 2865
>> Southampton SO17 1BJ               email: l.moreau@ecs.soton.ac.uk
>> United Kingdom                     http://www.ecs.soton.ac.uk/~lavm
>>
>

-- 
Professor Luc Moreau
Electronics and Computer Science   tel:   +44 23 8059 4487
University of Southampton          fax:   +44 23 8059 2865
Southampton SO17 1BJ               email: l.moreau@ecs.soton.ac.uk
United Kingdom                     http://www.ecs.soton.ac.uk/~lavm

Received on Tuesday, 12 February 2013 23:03:24 UTC