5289 – "##defined" in wildcards

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Summary: "##defined" in wildcards

Status:	RESOLVED WONTFIX

Alias:	None

Product:	XML Schema
Classification:	Unclassified
Component:	Structures: XSD Part 1
Version:	1.1 only
Hardware:	PC Windows XP

Importance:	P2 normal
Target Milestone:	---
Assignee:	C. M. Sperberg-McQueen
QA Contact:	XML Schema comments list

URL:
Whiteboard:	important, work, nis cluster
Keywords:

Depends on:
Blocks:

Reported:	2007-11-27 14:32 UTC by Michael Kay
Modified:	2008-01-25 21:14 UTC
CC List:	0 users

See Also:

Description Michael Kay 2007-11-27 14:32:40 UTC

I've started looking at the "##defined" keyword in wildcards, and in particular its impact on type subsumption. (This comment relates to a priority feedback request).

My current feeling is that this feature is a bad idea because it causes unpredictable side-effects. It means that the meaning of a type is impacted by the set of element/attribute definitions that the schema processor has available and is capable of resolving against, which is intrinsically unpredictable (and may be rapidly changing) in many processing environments. Arguably, for simple validity checking (does a name match a wildcard?) this is unlikely to make much of a difference most of the time. But it can have other effects: for example adding a new element declaration to the schema can invalidate the type subsumption relationship between two existing types in the schema, and the processor therefore needs potentially to re-evaluate such relationships every time a new element or attribute is added.
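The side-effect described above can be illustrated with a deliberately simplified sketch. It is not the XSD 1.1 subsumption algorithm: it assumes a base type whose only content is a "##defined" wildcard (one that matches only names with no global declaration) and a derived type that restricts it to a set of local elements. All names (`defined_wildcard_matches`, `is_valid_restriction`, `note`, `invoice`) are hypothetical.

```python
def defined_wildcard_matches(name, global_decls):
    """Toy '##defined' wildcard: matches only names with no global declaration."""
    return name not in global_decls


def is_valid_restriction(derived_local_names, global_decls):
    """Toy subsumption check: every element the derived type allows locally
    must also be allowed by the base type's '##defined' wildcard."""
    return all(defined_wildcard_matches(n, global_decls)
               for n in derived_local_names)


derived_local_names = {"note"}      # hypothetical local element in the derived type
schema_v1 = {"invoice"}             # hypothetical set of global element declarations
schema_v2 = schema_v1 | {"note"}    # the same schema plus one new global declaration

before = is_valid_restriction(derived_local_names, schema_v1)
after = is_valid_restriction(derived_local_names, schema_v2)
print(before, after)  # True False
```

Neither type changed between the two checks; only the set of global declarations did, which is why a processor caching subsumption results would have to recompute them.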

Arguably the "effect at a distance" of "##defined" is not new, it already exists for processContents="strict". However, I believe that the type subsumption rules for processContents="strict" do not take into account the actual set of element declarations that a wildcard is capable of matching, because they do not affect local validity of an element against the wildcard.

I might be able to accept "##defined" if the type subsumption rules were based on an open-world model, that is an assumption that an element declaration exists for every possible element name.

Comment 1 Noah Mendelsohn 2007-12-07 01:03:31 UTC

> for example adding a new element declaration to the schema can
> invalidate the type subsumption relationship between two existing types in the
> schema, and the processor therefore needs potentially to re-evaluate such
> relationships every time a new element or attribute is added.

Well, let's be careful here.  I think this is an issue we've gone around on.  I do understand that we asked for priority feedback, so discussing this is in order.  Still, "Adding an element" is not an operation the schema Recommendation has ever talked about.  Neither, for that matter, does the XML Recommendation talk about "adding" Element Type Declarations to a DTD.  Indeed, I think it's fair to say that used in certain ways parameter entities can cause action at a distance in DTDs.  Anyway, returning to schemas...

Implicit in this bug report is that we have a requirement to facilitate the construction of certain kinds of systems.  These would, I infer, be systems that store schemas in a repository for reuse, and more specifically, that would try to do incremental recompilation of selected parts as new constructs are added to the collection of those available.  Specifically, in this case, the goal would be to not have to recompute a subsumption relationship merely because, when comparing the old schema and the new, some not obviously related element declaration is present in the new for which no corresponding one existed in the old (note, we don't "add" declarations, but we can talk informally about similar declarations that appear in two similar schemas.)

In short, I think there's scope creep implicit in this request.  Furthermore, while I do understand that this is a significant concern for some systems like Saxon, and we've asked for feedback, I don't think there's new information here.  I thought these concerns were weighed, and the decision was made to include ##defined, albeit not without reservations on the part of some WG members.

So I'm asking the questions:  where is the requirement to deal with what we might call the "open world view", and what has changed since we had a very long, detailed, and in some ways contentious debate that led us to include this feature?  I had thought that this aspect of the issue had been resolved, if not easily, and that the priority feedback was largely because of concerns that ##defined might not meet its intended use cases, or might make it difficult to maintain schemas in the case where one user imported or perhaps redefined a schema document that had been created by another.

As you say, we have closed world assumptions for other features, such as strict processing and I think also for substitution groups.  The fact that these are a bit easier to deal with when doing incremental compilation doesn't make easy incremental compilation a goal, I think (though it's no doubt a nice-to-have).  Whatever the other pros and cons of ##defined, I'd prefer not to reopen the question of its closed world characteristics.  I think we've been more than half pregnant on that, if you'll excuse the metaphor, for a long time.

Noah

Comment 2 Michael Kay 2007-12-07 09:14:36 UTC

Noah, your comment seems to be making some technical arguments, but it is mostly a "point of order" that suggests the issue should not be opened because it has already been adequately discussed.

I'm in the slightly ambiguous position here in that I'm a member of the working group, but I wasn't a member at the time of those discussions. But I don't think I would feel inhibited from making this comment even if I had initially been the leading exponent or opponent of the current feature.

My experience of standards work is that you always get a better standard if it is informed by implementation experience, and that implementing a specification always yields new insights that were not available when the feature was initially discussed "in a vacuum". My understanding of the last call process is that it is there to encourage people to start implementing the specification and to report their experiences, and I do not think there are any process grounds for rejecting such feedback.

As for the technical point, I'm not sure I understand your remark that we don't talk about "adding an element" to a schema. It's true that we don't have a very well articulated model of schema construction, but we do explicitly say that the process can be incremental, and that you can use some definitions in the schema before all definitions are available. All my comment is doing is to point out that the new ##defined facility adds further implementation complexity to the process of incremental schema construction.

I also think that we should give more thought to the XML database scenario. This isn't my scenario as an implementor, but I do come across it as a user and as a consultant. There are a number of different ways that an XML database can implement the concept of "a schema", but in most of them, schemas as well as instances are likely to be long-lived, and to change over time. It seems to me that action-at-a-distance facilities like ##defined make it more difficult to ensure that when changes are made to a schema, they are being made in a backward-compatible way, that is, to guarantee that all existing instances will remain valid.

Comment 3 C. M. Sperberg-McQueen 2007-12-07 19:24:45 UTC

Comment #2 seems to suffer from a certain confusion over levels and 
responsibilities.

One may or may not read the XSDL spec as talking implicitly about operations
like adding elements to schemas.  As a factual matter, the spec does talk
about operations on schemas, such as composition, which can be regarded as
entailing the addition of elements.  (It's true that the spec does not address
the topic directly. Some readers believe that the very special degree of clarity
and utility achieved by XSDL's discussion of schema composition may result in 
part from that silence; if they are right, then we should be very careful to do 
the right thing when considering changes.)

But it is one thing to claim that the spec talks or does not talk about some 
topic T.  It is quite a different one to claim that because the spec does
not talk about T, the Working Group should not think about, or discuss, T.
It not only does not follow in general, but the argument is irrelevant and
serves only to distract the WG from what ought to be our concern with the
usability of our language.

Comment #1 claims that the bug report constitutes scope creep.  Not so.
Whether "##defined" is always useful, sometimes useful, or never useful,
and its good and bad features, are necessarily in scope for discussion as 
soon as ##defined is proposed.  Comment #1 claims that the particular 
argument offered is out of scope; this claim seems to be contradicted by
the Working Group's classification of bug 2826 as in scope.  (Bug 2826
is RQ-135 Consistency and validity for a set of schema components.)

In my role as staff contact, I have to observe that as a process matter the
Working Group is obligated to make a good-faith effort to resolve Last Call
comments; as I understand "good faith", this obligation means the Working
Group cannot legitimately respond to any Last Call comment by declining to
consider it on the grounds that we have already closed the question it relates
to.  When the comment relates to a question on which the Working Group 
actively solicited feedback, the process requirements do not change, but 
it is even clearer that attempts to shut down arguments without engaging them
constitute bad faith.

So in my role as process cop for the Working Group I have to suggest to Noah
that his attempt to suppress the argument raised in this issue is itself out
of order.  If the Working Group has consensus on an analysis which weights
the argument lightly, then it will not take much time to reassert that
consensus, explain why the Working Group is not moved by the argument, and
offer the appropriate counter-arguments.  If the Working Group does not have
such consensus, then a consensus may form around a new understanding of
the problem.  Attempts to characterize last-call comments as "reopening"
closed issues are intrinsically suspect, because they appear to be motivated
by a desire to avoid putting the alleged consensus of the Working Group to 
any test.   What reason might one have for avoiding such a test, if not
the suspicion that the alleged consensus will not hold up in an open 
discussion?

Comment 4 Sandy Gao 2007-12-10 15:01:29 UTC

In the bug description:

> Arguably the "effect at a distance" of "##defined" is not new, it already
> exists for processContents="strict". However, I believe that the type
> subsumption rules for processContents="strict" do not take into account the
> actual set of element declarations that a wildcard is capable of matching,
> because they do not affect local validity of an element against the wildcard.

The subsumption rules do take "strict" into account, in a nontrivial way. If the base type has a strict wildcard and the derived type has a local element without a matching global declaration, then the restriction is invalid; when you add a matching global declaration, the restriction may become valid.

The restriction (without a global declaration) is invalid because the base type requires xsi:type and the derived type doesn't.
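A rough sketch of this point, under simplifying assumptions: a strict wildcard is modelled as accepting an element only when a global declaration exists for its name or the element carries xsi:type, while a local element declaration accepts its name with or without xsi:type. The real rules also compare the types involved, which is why the restriction only "may" become valid; that check is omitted here. The function names and the element name `note` are hypothetical.

```python
def strict_wildcard_accepts(name, has_xsi_type, global_decls):
    """Toy strict wildcard: needs a global declaration or an xsi:type."""
    return name in global_decls or has_xsi_type


def is_valid_restriction(local_name, global_decls):
    """Toy check: the derived local element accepts the element with or
    without xsi:type, so the base wildcard must accept both forms too."""
    return all(strict_wildcard_accepts(local_name, has_xsi_type, global_decls)
               for has_xsi_type in (False, True))


without_global = is_valid_restriction("note", set())
with_global = is_valid_restriction("note", {"note"})
print(without_global, with_global)  # False True
```

In this model the only thing that differs between the two calls is the presence of a global declaration for `note`, which is the dependency on the declaration set that the comment describes.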

Comment 5 David Ezell 2008-01-25 21:14:02 UTC

WG discussed this at the f2f.  It's clear that there is no consensus for the removal of ##defined from the status quo, at least without further input from users.

The chair has opted to close this issue pending new information.