This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 13935 - xsd 1.1 assertions testing comment nodes
Summary: xsd 1.1 assertions testing comment nodes
Status: CLOSED FIXED
Alias: None
Product: XML Schema
Classification: Unclassified
Component: Structures: XSD Part 1 (show other bugs)
Version: 1.1 only
Hardware: All All
: P2 normal
Target Milestone: PR
Assignee: David Ezell
QA Contact: XML Schema comments list
URL:
Whiteboard:
Keywords: resolved
Depends on:
Blocks:
 
Reported: 2011-08-28 16:44 UTC by Mukul Gandhi
Modified: 2012-01-07 01:07 UTC (History)
5 users (show)

See Also:


Attachments

Description Mukul Gandhi 2011-08-28 16:44:39 UTC
The Saxon test assert023.n1.xml (in the "assert" test set), specifies the following requirement,

XML instance document:

<temp x="204"><!--comments not allowed--></temp>

XSD 1.1 schema:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified" attributeFormDefault="unqualified">
  <xs:element name="temp">
    <xs:complexType>
       <xs:sequence/>     
       <xs:attribute name="x" use="required"/>
       <xs:attribute name="y" use="optional"/>      
       <xs:assert test="empty(.//comment())"/>
    </xs:complexType>
  </xs:element>
</xs:schema>

and says that the expected validity of this test is "invalid". It seems that expected validity of this test should be "valid". I don't think that, the XSD (1.0 as well as 1.1) language allows validating XML comments [1]. With this [1] in mind, the approach Xerces follows in this regard is, that it doesn't ever include XPath comment nodes in the XDM tree that it constructs. Therefore an assert like above would also be true for Xerces.

If you agree with this bug report, would you kindly change the status of this test to "valid". Or let us know, that how would you justify the correctness of this test?

Thanks.
Comment 1 Mukul Gandhi 2011-08-28 20:04:46 UTC
(In reply to comment #0)
> Therefore an assert like above would also be true for Xerces.

Sorry for the typo above. This should have been read as below,

"Therefore an assert like above would always evaluate to true for Xerces." [1]

This (ref, [1]) would happen for Xerces since, the XDM tree for <assert> components never having comment nodes cannot produce a 'true' result by evaluation of XPath expressions, that would attempt to check for existence of comment nodes in the <assert> XDM trees.

Would be great if SAXONICA or the XML Schema WG may comment on this issue. 

Thanks.
Comment 2 Sandy Gao 2011-10-07 13:26:40 UTC
I'm guessing the reference [1] should be

http://www.w3.org/TR/xmlschema11-1/#sec-wsnormalization
Comment 3 Michael Kay 2011-10-07 14:20:59 UTC
I think the spec needs clarification. The sentence you cite certainly indicates that comments and PIs should be ignored; however the rules in 3.13.4.1 for constructing an XDM instance to use for testing assertions has the consequence that they are not ignored for the purpose of assertion testing. I had assumed this was a feature.
Comment 4 David Ezell 2011-10-07 15:44:26 UTC
Change section 3.1.4 to delete or rephrase second paragraph, in such a way so as to allow XPath expressions to address PIs and comments.
Comment 5 Mukul Gandhi 2011-10-07 15:58:35 UTC
(In reply to comment #4)
> Change section 3.1.4 to delete or rephrase second paragraph, in such a way so
> as to allow XPath expressions to address PIs and comments.

I think, it would be nice that having PIs and comments in the XDM instance for assertions could be made implementation defined. i.e the spec may allow implementations not to have (and vice-versa, i.e to have) PIs and comment in the XDM instance for assertions.

And we could change this particular test case, to introduce a new version token like following:

<expected validity="invalid" version="with-comments-and-PI"/>
<expected validity="valid" version="without-comments-and-PI"/>

(similar to the version tokens introduced for CTA like,
<expected validity="valid" version="full-xpath-in-CTA"/>
<expected validity="invalid" version="restricted-xpath-in-CTA"/>
)

Thanks.
Comment 6 David Ezell 2011-10-21 15:47:38 UTC
The WG notes that making something implementation defined requires strong justification for so doing, and the WG would really rather not go there at this time.
Comment 7 Pete Cordell 2011-10-25 09:01:32 UTC
For our data binding tool comments and PIs are not relevant so we discard them ASAP when parsing.  What is the use-case that requires us to re-architect our system to pull comments and PIs through the layers so that people can assert that comments and PIs are not present?
Comment 8 Michael Kay 2011-10-25 14:13:02 UTC
The use case for allowing assertions to examine comments and PIs is that schemas are often there to protect applications from receiving data that they can't handle, which might include comments and PIs. For example, an application checking XML documents before they are put on a public web server might well want to check that the document does, or does not, contain an xml-stylesheet processing instruction. Equally, before sending a document to a phototypesetter you might want to check that it does or does not contain a processing instruction defining the page size.

Another use case: comments and PIs can be used as a covert channel, or more likely, as a lazy way of adding information to a data flow without changing the schema. You might want a validation process to prevent this happening.

So there are definitely cases for allowing it. 

I would say, if you want to strip comments and PIs before validating, that's fine. But in that case they should not be present either in the validated document (PSVI) or in the infoset presented for validation. Currently if the infoset contains comments and PIs, they will still be there in the PSVI, and if that's the case then I think they should be visible to assertions.
Comment 9 Pete Cordell 2011-10-25 19:32:50 UTC
> The use case for allowing assertions to examine comments and PIs is that
> schemas are often there to protect applications from receiving data that they
> can't handle, which might include comments and PIs. For example, an application
> checking XML documents before they are put on a public web server might well
> want to check that the document does, or does not, contain an xml-stylesheet
> processing instruction. Equally, before sending a document to a phototypesetter
> you might want to check that it does or does not contain a processing
> instruction defining the page size.

I have _some_ sympathy for this, although I don't feel schema validation is the only way to achieve this result if it's important to you(/one).

> Another use case: comments and PIs can be used as a covert channel, or more
> likely, as a lazy way of adding information to a data flow without changing the
> schema. You might want a validation process to prevent this happening.

I have very little sympathy for this.  People who attempt this sort of thing should suffer all they deserve!  Further, the flip side of acknowledging that you can prevent comments that contain covert data, is that you're implicitly opening up the can of worms that says you can validate that your comments have valid covert data!

> I would say, if you want to strip comments and PIs before validating, that's
> fine. But in that case they should not be present either in the validated
> document (PSVI) or in the infoset presented for validation. Currently if the
> infoset contains comments and PIs, they will still be there in the PSVI, and if
> that's the case then I think they should be visible to assertions.

Sounds reasonable to me.  But how you formulate that in suitable words for the spec, or test cases I don't know.
Comment 10 Mukul Gandhi 2011-11-01 04:06:27 UTC
(In reply to comment #6)
> The WG notes that making something implementation defined requires strong
> justification for so doing, and the WG would really rather not go there at this
> time.

I've few new thoughts about this below,

I'm in favor of both view points in this regard (infact I see pros and cons of both these options). i.e allowing comments and PIs to be visible to assertions, and also if the comments and PIs are not visible to assertions.

According to me, the advantage of allowing comments and PIs to be visible to assertions is providing the only facility in XSD 1.1 language to allows XSD schema authors to use comments and PIs for XML validation purposes.

The cons of allowing comments and PIs for validation purposes may have following disadvantages,
- in typical real world XML documents, comments are provided in XML documents in large quantity. If we allow comments to affect XML instance validity, then 
it could make the XSD validator inefficient (all comments will be pushed to the validator layer, and the validator has to consider them. consider some of huge comments like HTML fragments and so on -- the cost of processing these by the validator may be high).
- The current overall philosophy of XSD is to ignore comments and PIs for all validation purposes. If we now allow comments and PIs to be visible to assertions, then they may validate or invalidate an element or attribute, and this would conflict with the current high level approach taken by XSD language in this regard (i.e comments and PIs would not affect validity of XML documents). I believe, if we allow comments and PIs for assertions, then this must be consistent with the overall current XSD philosophy (perhaps a modified view of the WG, that comments and PIs may affect the validity of XML documents, and the WG in future versions of the spec may allow comments and PIs as first class constructs for validation purposes).

I can imagine another solution to be able to consider comments and PIs for validation purposes. i.e use comments and PIs in a non XSD layer for validation 
purposes. i.e one could write a custom validation pipeline that allows "non XSD & comments/PI" aware processing which then moves to the XSD layer (or perhaps in another direction). The combined result of evaluating this validation pipeline may be the effective validation outcome for the application.

Since Saxon allows using comments and PIs for assertion processing (I don't suggest to prohibit implementations to use comments and PIs for validation 
purposes), but some of other implementations currently don't, I see a possible case of allowing using this feature as implementation defined. As I wrote 
earlier above, the performance costs of using comments (majorly) and PIs should allow implementations to not provide this functionality, or provide it via an 
option.

But I would be happy with the decision that WG would take about this point.


Thanks.
Comment 11 Michael Kay 2011-11-01 08:35:18 UTC
Perhaps an acceptable way forward would be:

By default, comments and processing instructions are excluded from the XDM instance presented to the XPath assertion (and text nodes that thereby become adjacent are merged). Processors MAY provide a user option to retain comments and processing instruction so that assertions can examine their contents or test for their presence or absence.

Note: I'm not swayed by arguments that different implementors have interpreted the draft spec differently. We need to do what is right for users.
Comment 12 Pete Cordell 2011-11-01 10:40:37 UTC
(In reply to comment #11)
> Perhaps an acceptable way forward would be:
> 
> By default, comments and processing instructions are excluded from the XDM
> instance presented to the XPath assertion (and text nodes that thereby become
> adjacent are merged). Processors MAY provide a user option to retain comments
> and processing instruction so that assertions can examine their contents or
> test for their presence or absence.

Michael's suggestion works for me.
Comment 13 Mukul Gandhi 2011-11-01 12:08:51 UTC
(In reply to comment #12) 
> Michael's suggestion works for me.

I too agree.

In case when Michael's suggestion may be incorporated into the spec (of-course after WG as whole agrees to it), I would wish to have the relevant XSD 1.1 test suite test case (mentioned in my original comment in this thread) to be either removed from the test suite, or may be flagged with a custom version token (to be able to allow implementations who follows the MAY clause of this, to ignore this test case via some filter while producing internal & public test suite compliance reports).

Thanks.
Comment 14 Michael Kay 2011-11-29 22:01:18 UTC
The test assert023 has been changed to reflect the decision that comments are by default not visible to assertions. (In Saxon there is now a switch to control whether comments and PIs are visible, and by default it is set to false.)
Comment 15 C. M. Sperberg-McQueen 2011-12-27 02:24:13 UTC
A wording proposal along the lines suggested in comment 11 has been prepared; it's at

  https://www.w3.org/XML/Group/2004/06/xmlschema-1/structures.b13935.html
  (member-accessible material)

Accordingly, I'm marking this bug as needing review.
Comment 16 C. M. Sperberg-McQueen 2012-01-06 23:30:45 UTC
The proposal mentioned in comment 15 and based on the suggestion in comment 11 was adopted by the XML Schema WG at its call today.  The changes have been integrated into the member-accessible status quo document at https://www.w3.org/XML/Group/2004/06/xmlschema-1/structures.html -- accordingly, I'm marking this issue resolved.

Mukul, I am sorry but I don't remember whether you have member access or not.  if you have member access, please examine the changes and (as the originator of the report) close the bug if you are satisfied with them, or reopen it if you are not satisfied.  If you don't have member access, please let me know (here or via offline email) and I'll see that the text of the wording proposal is sent to you for review.  If we don't hear from you in the next week, we'll assume that you are satisfied with the change (an assumption strengthened by your statement in comment 13 that you're OK with the proposal in comment 11.

Pete, thank you for your comments on the issue; from comment 12 I gather that you are happy with the change made; if this is not true, please say so within the next week.
Comment 17 Mukul Gandhi 2012-01-07 01:07:42 UTC
I've examined the changes made within the status quo document, for resolving this issue and I find them correct and consistent with the agreement made as mentioned. As advised, I'm therefore marking this issue as "closed".

I've member access by virtue of being an employee of a member organization, so I was able to access the member version of the spec.

Thanks.