This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 5023 - Relationship between identity constraints and assertions
Summary: Relationship between identity constraints and assertions
Status: CLOSED FIXED
Alias: None
Product: XML Schema
Classification: Unclassified
Component: Structures: XSD Part 1
Version: 1.1 only
Hardware: PC Windows XP
Importance: P2 normal
Target Milestone: CR
Assignee: C. M. Sperberg-McQueen
QA Contact: XML Schema comments list
URL:
Whiteboard: XPath cluster
Keywords: resolved
Depends on:
Blocks:
 
Reported: 2007-09-05 17:09 UTC by Erik Wilde
Modified: 2009-04-20 09:43 UTC

Attachments
wording proposal (64.22 KB, text/html)
2009-04-13 00:14 UTC, C. M. Sperberg-McQueen

Description Erik Wilde 2007-09-05 17:09:54 UTC
With the introduction of assertions in XSDL 1.1 there now is an overlap in functionality between identity constraints and assertions. Both constructs have very similar use cases, but work a little bit differently. The most important difference is that identity constraints are defined in the context of elements, and assertions in the context of types. This makes them behave very differently in the presence of type derivation. While the specification is not a tutorial, I think that the relationship (and the differences) should be made more explicit.

I would also recommend that XSDL 1.1 authors avoid identity constraints completely and use only assertions, which are better integrated into the data model. But this kind of comment is probably more appropriate for a tutorial or a best practices document.
Comment 1 C. M. Sperberg-McQueen 2008-01-23 22:03:31 UTC
Thanks for the comment.

Speaking for myself, I think you are right that there is overlap in the
functionality of the two constructs, and that a note in the spec pointing
among other things to the differences you mention might be in order.

I suspect that the Working Group is unlikely to want to make the recommendation
you suggest in your second paragraph; because identity constraints are 
weaker than assertions, they can (at least in theory) be simpler for 
some schema authors to use.  And some implementations may be able to exploit
the relative weakness of identity constraints to optimize them in a way that
might be difficult to match for the corresponding assertions.
Comment 2 David Ezell 2008-01-23 22:13:44 UTC
Resolved: editors to construct an appropriate note, mentioning that content models and types are also constraints, that there is some overlap among all of them, and that the appeal to idiomatic use makes any of them easier to use in some contexts than in others. Consider referencing the "rule of least power" finding at http://www.w3.org/2001/tag/doc/leastPower.html
Comment 3 Erik Wilde 2008-01-28 18:59:40 UTC
one thing that many people do not understand about identity constraints is that they belong to elements and not to types, and thus are not passed down the type derivation hierarchy. so if you wanted an identity constraint for a type, you had to indicate that by using a comment.

assertions now provide a way to define identity constraints which are part of the type hierarchy, and i think this is one of the very important differences between these constructs (and of course the expressiveness of the constraints).

in my experience with xml schema, most users have trouble understanding the abstract type hierarchy, and how it relates to the concrete elements and attributes available for building instances. i made my initial comment because i thought it could be very helpful for xml schema users to get a better understanding of the dependencies between certain types of constraints and type derivation.
Comment 4 C. M. Sperberg-McQueen 2008-05-23 23:45:40 UTC
Since the issue raised here appears not likely to affect the conformance
of schema documents, schemas, or processors, I am marking it 'editorial'.
Comment 5 C. M. Sperberg-McQueen 2009-04-13 00:14:17 UTC
Created attachment 684
wording proposal


A wording proposal is now at 

  http://www.w3.org/XML/Group/2004/06/xmlschema-1/structures.b5023.html
  (member-only link)

for WG review.  In the interests of allowing the originator of the issue
to examine it, I am also attaching it to this issue (but please note that
the wording is subject to amendment by the Working Group).
Comment 6 Erik Wilde 2009-04-13 01:41:01 UTC
this looks really useful, i think this will help readers quite a bit. thanks!

minor editorial remarks:

"XSD has three forms of constraint ..." -> "XSD has three forms of constraints ..."

"In version 1.0 of this specification, identity constraints used [XPath 1.0] They now use [XPath 2.0]." -> "In version 1.0 of this specification, identity constraints used [XPath 1.0]; they now use [XPath 2.0]."

in theory, one could point to schematron and point out that originally, XSD was mostly about grammars, and that schematron was invented to be used as an alternative for cases where grammar rules do not work very well. XSD 1.1 probably allows some schematron applications to be covered by XSD now, but schematron's abstract patterns and phases still are language constructs which do not readily map to any language feature of XSD. i realize that the spec may not be the appropriate place for something like this, but it certainly would be useful for people searching for some guidance and comparison. (treated a bit more systematically, it also would be a perfect topic for a balisage paper... ;-)
Comment 7 Sandy Gao 2009-04-13 16:12:17 UTC
One minor comment on the proposal in comment #5.

The insertion at the beginning of 2.2.4 starts with

"XSD has three forms of constraint which allow convenient expression of certain rules which would be inconvenient, or impossible, to express otherwise."

But we have other rules that are "inconvenient, or impossible, to express otherwise". For example, how to specify open contents or fixed value constraints?

I think what these 3 have in common is that they all use XPaths and they are all about relationships between different parts of a document. Convenience and feasibility aren't unique to them.
Comment 8 C. M. Sperberg-McQueen 2009-04-14 22:50:29 UTC
Proposed amendment for the opening of 2.2.4.  For 

    XSD has three forms of constraint which allow convenient 
    expression of certain rules which would be inconvenient, 
    or impossible, to express otherwise.

read

    This section describes constructs which use [XPath 2.0]
    expressions to constrain the input document; using them
    certain rules can be expressed conveniently which would
    be inconvenient or impossible to express otherwise.

The point SG makes in comment 7 is quite true:  any of our
constructs makes it possible, or at least more convenient, 
to express rules that couldn't be expressed so easily or at
all otherwise.  I'd like to retain the idea here because 
(a) many people seem to believe that nothing one can 
do with the co-occurrence constraints could be done at all without
them -- I'd like to point out that sometimes it's just a question
of convenience, not expressive power -- and (b) it helps
set the stage for the discussion in the second insertion of
how to choose among constructs when you can express
the same thing several ways.

This formulation obviates EW's first editorial suggestion.
I like his second.  (I like his third, too, but am afraid of it
and don't propose we do anything about it in the spec.)

Comment 9 Michael Kay 2009-04-14 23:05:13 UTC
If we're advising users which of these mechanisms to use under which circumstances, then I think there are two observations I would want to make:

(a) it's a lot easier for an implementation to enforce identity constraints and CTA while processing the document in a streaming manner than it is to enforce assertions

(b) using a specialized mechanism such as identity constraints rather than a general mechanism like assertions may result in a more focused and intelligible error message when the condition is violated. For example, the assertion test="count(//empno)=count(distinct-values(//empno))" is unlikely, if violated, to result in an error message that tells the user how many duplicate employee numbers there are, which numbers are duplicated, or where to look for them in the instance document. I think the same is probably true of other reasonable ways of expressing this constraint, such as test="every $a in //empno, $b in //empno satisfies $a is $b or $a ne $b"
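
The contrast in (b) can be sketched outside XPath. The Python below is only an illustration under stated assumptions: employee numbers are modelled as a plain list of strings, and the function names `assertion_style_check` and `unique_style_check` are hypothetical, not part of any XSD processor. The first check, like the count()/distinct-values() assertion, can only answer true or false; the second, like a dedicated uniqueness mechanism, has the information needed to say which values are duplicated and where.

```python
from collections import defaultdict


def assertion_style_check(empnos):
    """Mirror count(//empno) = count(distinct-values(//empno)): a bare boolean."""
    return len(empnos) == len(set(empnos))


def unique_style_check(empnos):
    """Map each duplicated value to the 1-based positions where it occurs."""
    positions = defaultdict(list)
    for index, value in enumerate(empnos, start=1):
        positions[value].append(index)
    # Keep only the values that occur more than once.
    return {value: where for value, where in positions.items() if len(where) > 1}


# Hypothetical sample data: E1 occurs three times, E2 twice.
sample = ["E1", "E2", "E1", "E3", "E2", "E1"]
print(assertion_style_check(sample))
print(unique_style_check(sample))
```

For this sample the first call prints `False` and the second prints `{'E1': [1, 3, 6], 'E2': [2, 5]}`, which is the kind of detail an error message could be built from.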
Comment 10 Noah Mendelsohn 2009-04-15 15:00:27 UTC
Michael Kay writes:

> using a specialized mechanism such as identity
> constraints rather than a general mechanism like
> assertions may result in a more focused and
> intelligible error message when the condition is
> violated.


Yes, indeed.  Perhaps the TAG finding Rule of Least Power (http://www.w3.org/2001/tag/doc/leastPower.html) is pertinent to consideration of these tradeoffs.  

> it's a lot easier for an implementation to enforce
> identity constraints and CTA while processing the
> document in a streaming manner than it is to
> enforce assertions

Sigh.  I can't help remembering all the debates about whether insisting on a suitable subset of XPath would have made this less of a concern.  Too late to reopen that, but I do find it disappointing to find out that after all, you as a skilled implementer wind up warning users away from assertions exactly because of this foreseeable concern.  That doesn't make the tradeoff we chose wrong, but I remain somewhat uncomfortable with it.

Noah
Comment 11 Michael Kay 2009-04-15 15:19:10 UTC
>you as a skilled implementer wind up warning users away from assertions exactly because of this foreseeable concern.

No, I wouldn't for a moment warn users away from assertions where they are appropriate. But XPath is a powerful language and you need to consider what you are asking for. You need to be aware that if you write, as I wrote, 

test="every $a in //empno, $b in //empno satisfies $a is $b or $a ne $b"

then many implementations are going to use memory proportional to document size and time proportional to the square of document size, while other ways of writing the same constraint (like xs:unique) are likely to use constant memory and linear time.
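
A rough sketch of that cost difference, in Python rather than XPath and assuming the values are held in a simple list. The names `pairwise_unique` and `hashed_unique` are hypothetical, and neither function represents how any particular schema processor works. The first follows the shape of the quantified expression and compares every pair; the second makes a single pass with a set of values seen so far. Note that the set still grows with the number of distinct values, so this sketch shows the linear-time behaviour but not the memory behaviour.

```python
def pairwise_unique(values):
    """Compare every pair, as a naive evaluation of the 'every $a, $b' test would."""
    comparisons = 0
    unique = True
    for i, a in enumerate(values):
        for j, b in enumerate(values):
            comparisons += 1
            # "$a is $b or $a ne $b": the same item, or a different value.
            if not (i == j or a != b):
                unique = False
    return unique, comparisons


def hashed_unique(values):
    """Single pass, remembering the values seen so far in a set."""
    seen = set()
    steps = 0
    for value in values:
        steps += 1
        if value in seen:
            return False, steps
        seen.add(value)
    return True, steps


# Hypothetical input: 1,000 distinct employee numbers.
values = [f"E{n}" for n in range(1000)]
print(pairwise_unique(values))
print(hashed_unique(values))
```

For the 1,000 distinct values above, the pairwise version performs 1,000,000 comparisons and the hashed version 1,000 steps; both report that the values are unique.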

Of course some implementations may optimize some constructs, but anyone with SQL experience knows that you can't expect every processor to optimize every construct.

>insisting on a suitable subset of XPath would have made this less of a concern

There are many constraints that people want to express that require the full power of XPath. I don't regard it as a "concern" that people are able to express constraints that are expensive to evaluate - it's their decision.
Comment 12 Noah Mendelsohn 2009-04-15 15:32:27 UTC
Michael Kay writes:

> There are many constraints that people want to
> express that require the full power of XPath. I
> don't regard it as a "concern" that people are
> able to express constraints that are expensive to
> evaluate - it's their decision.

Fair enough, but that doesn't quite capture my concern. I worry some that, since XPath in general doesn't stream without a lot of hard work, many implementors might not do an efficient job even with the simple XPaths that do.  So, instead of the situation you describe, in which a smart user knows which paths will stream and which won't, even the simple ones will run much slower than they might.

That said:  it's certainly the case that nobody would be dumb enough to use space that's the square of the document size just to evaluate @a > @b, even though doing so might be tempting for:

test="every $a in //empno, $b in //empno satisfies $a is $b or $a ne $b"

So I think your point holds to a significant degree anyway.  It will be unfortunate if implementations don't take the trouble to notice and optimize the simple common cases.

Noah
Comment 13 Erik Wilde 2009-04-15 15:57:23 UTC
i think assumptions about optimizations built into processors are a bit optimistic. for example, a very obvious optimization in the space of XML implementations would be to use xsl:key for building an index, but recently i ran into an XSLT processor (in a highly successful commercial product, XML Spy), which does not seem to do so. i am not sure, but the performance really looked as if they treated every key() call as a search of the document tree.

http://dret.typepad.com/dretblog/2008/12/itunes-xml.html

more generally speaking, this is the spectrum of limited declarative vs. more expressive procedural languages, and XPath 2.0 has opened the door quite a bit to make XPath so expressive that it has become much harder to optimize. my guess is that most XSD implementers will not bother to carefully optimize XPath expressions (are there already test cases for XSD 1.1? do they contain sophisticated XPath assertions? do they maybe even contain functionally equivalent XSDs that use different constraint mechanisms to do the same thing? that might be a worthwhile set of additions to the test cases.), also because they might simply use existing XPath 2.0 libraries instead of implementing XPath 2.0 as a part of their XSD 1.1 implementation.
Comment 14 Michael Kay 2009-04-15 16:17:34 UTC
>in a highly successful commercial product, XML Spy

The fact that XML Spy is so successful despite its allegedly poor performance is a good reminder that as technology providers, we often over-estimate the importance of performance to our users.

(However, the post you cite provides no evidence that XML Spy doesn't use hashing or indexing to support xsl:key - it seems to be pure conjecture. Such conjectures by users are often way off the mark, in my experience.)

>My guess is that most XSD implementers will not bother to carefully optimize XPath expressions

Depends on the dynamics of the market. In theory, if the user community needs it, they will pay for the necessary investment.
Comment 15 Erik Wilde 2009-04-15 16:34:08 UTC
(In reply to comment #14)
> (However, the post you cite provides no evidence that XML Spy doesn't use
> hashing or indexing to support xsl:key - it seems to be pure conjecture. Such
> conjectures by users are often way off the mark, in my experience.)

yes, i was simply guessing and made no systematic tests. the performance differences, however, were dramatic, so at least it is safe to say that something significant was going on (or going wrong), and that the implementation had some significant performance issues with the combination of XML/XSLT i was using.

(i would still take a bet that i am right; but i probably shouldn't... :-)
Comment 16 Dave Peterson 2009-04-15 16:54:47 UTC
(In reply to comment #13)
> i think assumptions about optimizations built into processors are a bit
> optimistic. for example, a very obvious optimization in the space of XML
> implementations would be to use xsl:key for building an index, but recently i
> ran into an XSLT processor (in a highly successful commercial product, XML
> Spy), which does not seem to do so. i am not sure, but the performance really
> looked as if they treated every key() call as a search of the document tree.

(And to comment #14)

> The fact that XML Spy is so successful despite its allegedly poor performance
> is a good reminder that as technology providers, we often over-estimate the
> importance of performance to our users.

At one point I used XMLSpy to edit the Schema spec, but on a document that large, it was indeed unusably slow. The XMLSpy folk told me they specifically designed their product for use on lots of small documents rather than one or a few large ones.

> (However, the post you cite provides no evidence that XML Spy doesn't use
> hashing or indexing to support xsl:key - it seems to be pure conjecture. Such
> conjectures by users are often way off the mark, in my experience.)

I don't see that Erik Wilde made any claim about how XMLSpy actually supports xsl:key, only that it ran slowly when he used it to process xsl:key instances, "as if" they ran repeated searches of the document tree.
Comment 17 David Ezell 2009-04-17 15:58:12 UTC
   - Adopt second amendment (punctuation) in comment 6
   - Adopt amendment in comment 8
   - Adopt proposal as amended.
Comment 18 C. M. Sperberg-McQueen 2009-04-18 13:49:10 UTC
As described in comment 17, the XML Schema WG discussed this issue
at their regular telcon yesterday and adopted the wording proposal 
mentioned in comment 5, amended by (a) the second correction given 
in comment 6 and (b) the change proposed in comment 8.  (The
latter makes the first change of comment 6 inapplicable; the third
point in comment 6 we thought best saved for blog posts and conference
papers.)

With this change, now integrated into the status-quo documents at 

  http://www.w3.org/XML/Group/2004/06/xmlschema-1/structures.html
  (member-only link)

the XML Schema WG believes it has resolved this issue.  Accordingly I
am marking this bug RESOLVED / FIXED.  

Erik Wilde, as the originator of the issue, please let us know 
if you agree with this resolution of your issue, by 
adding a comment to the issue record and changing the
Status of the issue to Closed. Or, if you do not agree with this
resolution, please add a comment explaining why. If you wish to
appeal the WG's decision to the Director, then also change the
Status of the record to Reopened. If you wish to record your
dissent, but do not wish to appeal the decision to the Director,
then change the Status of the record to Closed. If we do not hear
from you in the next ten days or so, we will assume you agree
with the WG decision.
Comment 19 Erik Wilde 2009-04-20 09:43:40 UTC
thanks a lot for your responses and work on this issue. i think the issue has been resolved in a way which improves the specification. thanks!