5297 – Drop restriction of assertions to down-pointing XPaths?

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 5297 - Drop restriction of assertions to down-pointing XPaths?

Status:	RESOLVED WONTFIX

Alias:	None

Product:	XML Schema
Classification:	Unclassified
Component:	Structures: XSD Part 1
Version:	1.1 only
Hardware:	Macintosh All

Importance:	P2 normal
Target Milestone:	---
Assignee:	C. M. Sperberg-McQueen
QA Contact:	XML Schema comments list

URL:
Whiteboard:	important, hard, XPath cluster
Keywords:

Depends on:
Blocks:

Reported:	2007-11-30 04:26 UTC by C. M. Sperberg-McQueen
Modified:	2008-01-25 16:39 UTC
CC List:	0 users

See Also:

Description C. M. Sperberg-McQueen 2007-11-30 04:26:16 UTC

In our design of assertions and conditional type assignment, we have
for some time taken the position that the XPath expressions used
should not point up or left (or right) from the element on which the
assertions are checked (or the type of which is being calculated).
This was not a unanimous decision, if I recall correctly, but one to
which some WG members assented reluctantly.  

It seemed clear, when we reached agreement on this point, that the
prohibition was inconvenient, but as far as we could tell it did not
actually affect the expressive power of assertions.

One example used at the time (see
http://lists.w3.org/Archives/Member/w3c-xml-schema-ig/2006Mar/0041.html
for an appearance of this example in WG discussion) was that of the
HTML 'input' and 'form' elements: 'input' is legal only within a
'form' element.  A natural way to express the constraint is to
write an assertion of the form

  ancestor::html:form

or

  count(ancestor::html:form) > 0

in the declaration of the type assigned to the input element.  We came
to the conclusion that upward-pointing assertions were not essential
to this scenario, however, when we realized that one could place the
assertion on the HTML element, and formulate it as 

  count(.//input) = count(.//form//input)
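
To make the downward-pointing formulation concrete, here is a minimal Python sketch that emulates the comparison with the standard library's ElementTree rather than a real XPath 2.0 or XSD 1.1 processor. The function name and the sample documents are hypothetical, and namespaces are omitted for brevity.

```python
import xml.etree.ElementTree as ET

def inputs_all_inside_forms(root):
    """Emulate count(.//input) = count(.//form//input) with root as context node."""
    all_inputs = root.findall(".//input")
    # Collect inputs below any form in a set, so an input under nested forms
    # is counted once, as it would be in an XPath node sequence.
    in_forms = set()
    for form in root.findall(".//form"):
        for inp in form.iter("input"):
            in_forms.add(inp)
    return len(all_inputs) == len(in_forms)

# Hypothetical sample documents (not taken from the bug report).
good = ET.fromstring("<html><body><form><p><input/></p></form></body></html>")
bad = ET.fromstring("<html><body><form><input/></form><input/></body></html>")

print(inputs_all_inside_forms(good))
print(inputs_all_inside_forms(bad))
```

The check only looks downward from the element it is applied to, which is the property the restriction was meant to preserve.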

Recently I have become aware of a fallacy in our reasoning, which
leads me to conclude that we reasoned our way to a false conclusion
and that we need to reconsider the decisions based in part on that
conclusion.

HTML is defined as a modular vocabulary, and the forms module is
intended to be usable in arbitrary XML vocabularies, not only in
(X)HTML itself.  Placing the assertion on the HTML element does not
guarantee that it will be enforced wherever 'form' and 'input' are
used.  There is no element on which the assertion can be placed so
that it reliably enforces the constraint.  It appears, then, that we
were wrong to believe that the current form of assertions in XSDL 1.1
could be used to handle the HTML form/input use case.
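
The gap can be illustrated with a rough Python sketch. It assumes, as a simplification of how a validator attaches assertions to declarations, that the assertion is evaluated once for every 'html' element found in the document. The host vocabulary ('report', 'section') and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

def check_assertions_on_html(root):
    """Evaluate the html-anchored assertion on each 'html' element in the document."""
    results = []
    for html in root.iter("html"):
        inputs = html.findall(".//input")
        inside = {i for f in html.findall(".//form") for i in f.iter("input")}
        results.append(len(inputs) == len(inside))
    return results

# A hypothetical host vocabulary that reuses 'input' without any 'html' element.
host = ET.fromstring("<report><section><input/></section></report>")
results = check_assertions_on_html(host)

# No 'html' element exists, so the assertion is never evaluated and nothing fails,
# even though the document contains an 'input' outside any 'form'.
print(results, all(results))
```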

Under the circumstances, I no longer believe it plausible to forbid
assertions to point upward in the tree, and believe that we should
drop the restriction (and with it, our mechanisms for 'tree trimming'
prior to XPath evaluation).

Comment 1 Pete Cordell 2007-11-30 09:33:54 UTC

I'm against allowing upward pointing XPaths.

Let's try and remember the 80/20 principle.  I don't think we have to scour the world looking for every conceivable use-case and then accommodate it.

Comment 2 Michael Kay 2007-11-30 10:27:10 UTC

I think it's a good principle that an element is valid or invalid based only on its content, and not on its context in some larger document. XSLT and XQuery rely on this principle, allowing you to assemble a document bottom-up, knowing that if a component is valid in itself, then it will still be valid when inserted into a larger tree. We do have one exception to this, the document-level constraints on ID/IDREF, but that's manageable.

In the use case cited, if we argued that an input element can never exist without a containing form, then it would be impossible for XSLT to construct the form. There are definitely contexts where it's quite reasonable for an input element to exist without the containing form, for example in the results of a query performing introspection on the form. This truly is a constraint on valid HTML elements, not a constraint on input elements.

There is one usability issue with constraints like count(.//input) = count(.//form//input) - it's very hard to produce an error message that identifies the offending input element. If users can be persuaded to write such constraints in the form

every $i in .//input satisfies ($i intersect .//form//input)

then it might be possible to do better. At present our restricted XPath subset discourages this.
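
The usability difference can be sketched in Python: a per-element test, in the spirit of the quantified expression, yields the offending nodes themselves, whereas comparing two counts yields only true or false. This is an illustration with ElementTree, not an XPath 2.0 evaluation; the function name and the 'id' attributes are hypothetical.

```python
import xml.etree.ElementTree as ET

def inputs_outside_forms(root):
    """Return, in document order, each 'input' that has no 'form' ancestor below root."""
    inside = {i for f in root.findall(".//form") for i in f.iter("input")}
    return [i for i in root.findall(".//input") if i not in inside]

# Hypothetical sample document with one well-placed and one stray input.
doc = ET.fromstring(
    '<html><form><input id="a"/></form><p><input id="b"/></p></html>'
)
offenders = [i.get("id") for i in inputs_outside_forms(doc)]
print(offenders)
```

With the offending elements in hand, an error message can name the specific 'input' rather than reporting only that two totals differ.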

Comment 3 Noah Mendelsohn 2007-12-07 01:07:36 UTC

I agree with both Pete and Mike:  we don't have to handle every use case, and there are good architectural reasons for types (and thus elements) to have context-independent validation requirements.

Also:  I assume I'm correct that the original bug report took a shortcut in asking for XPaths that could point up, left or right.  Presumably what's wanted is that, along with relaxation of the tree trimming rules that would make such added XPath capability useful? I remain opposed, but it's certainly useful to have the proposal be clear in any case.  Did I correctly infer what was intended?  Thank you.

Thanks.

Noah

Comment 4 C. M. Sperberg-McQueen 2007-12-07 18:33:02 UTC

Comments #1, #2, and #3 all argue that we should not make any change
here, first because there is no need to support every conceivable use
case and second because there are good reasons for validation to be
context independent.

These arguments are not wholly satisfying.  

The first presents a generally accepted principle as if it were an
argument: we cannot support all conceivable use cases.  This is not in
dispute, in part because some of us are well aware that it is not
possible to do so even in theory -- but it does not address the
salient question, which is: should we seek to support *this* use case?

This particular example is not a discovery made after long search: the
context awareness of HTML form and input elements has been a topic of
conversation, in the context of schema languages and their expressive
power, for about as long as HTML has had DTDs.  The last I heard, HTML
was not an obscure language of no particular interest for the world at
large; the ability to specify features built into widely used markup
languages is, I think, something the designer of any mechanism for
specifying markup vocabularies should be thinking hard about, if the
mechanism being designed is intended to be of general use.

The case for taking this use case seriously is simple: HTML is the
flagship markup language of W3C, and the feature makes it possible to
define a well known part of HTML precisely and concisely.  Our earlier
decision not to support it was based on a false premise.  The case
against considering this use case seriously appears to be: the W3C's
schema language doesn't really have any need to support the features
needed to define W3C's markup languages, or any other obscure and
little-used languages.  These arguments are most charitably described
as laughable; perhaps there are others.

The second argument offered is that there are good reasons for
validation to be context-free.  If those reasons exist, deciding on
this issue will involve weighing them carefully against the reasons
for making it possible to write an XSDL schema that expresses an
important constraint in (X)HTML.  Such a careful weighing of one point
against another will be easier if the reasons for context-free
validation are identified.  Comment #3 does not identify any
arguments, architectural or otherwise; it only says they exist.  

Michael Kay does identify a specific reason: bottom-up construction.
I think he is correct that the locality of validity may make it easier
to guarantee the validity of larger constructs made from smaller ones,
and I expect that it will be useful to distinguish different kinds of
validity which depend on different parts of the document, just as we
do now in distinguishing local validity from 'deep' validity.

It's not entirely clear to me how to weigh against each other the
two situations 

  (A) Validity is simple to calculate but does not include some 
      simple mechanically checkable constraints imposed by the 
      definition of the vocabulary. So "validity" is less useful 
      as a concept than it might be.

  (B) "Validity" includes more of the constraints imposed by the
      definition of the vocabulary, so it's a more useful concept
      than in (A).  But it's more complex to calculate.

But it does seem clear to me that dismissing the requirements of HTML
out of hand, as if they were a corner case, is not the right way to
weigh them.  And assertions that there are good architectural reasons
for something do not carry, in a careful Working Group, the same
weight as the architectural reasons themselves.

Comment 5 C. M. Sperberg-McQueen 2007-12-07 18:42:18 UTC

Comment #3 asks for clarification of the issue, but I cannot provide the desired
clarification because I do not understand the question.

The Working Group has entertained several different formulations of the spec
intended to achieve the goal of ensuring that neither assertions nor conditional
type assignment depend on nodes outside the subtree rooted in the item to which
the assertions or conditional type assignment are attached by declaration.  In
some of those formulations, specific XPath axes have been made illegal in the
XPath expressions.  In others, the input tree has been truncated so that any
expression which attempts to refer to nodes outside the subtree will evaluate
to the empty set.  Other formulations are possible which involve neither 
syntactic restrictions on XPath expressions nor tree surgery.  The details of
the spec prose are not of interest here; what is at issue is the goal the
WG has been trying to achieve, which the description of the issue argues 
should be revisited.  If the goal is dropped, then I think the implications 
for the spec prose are obvious enough.
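
The effect of the tree-truncation formulation can be sketched in Python as follows. ElementTree elements carry no parent pointers, so the sketch builds a parent map over whatever tree is in scope; restricting that scope to the subtree of the element under test stands in for the 'tree trimming' step. This is an analogy only, not the spec's mechanism, and the function and variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

def ancestors(elem, scope_root):
    """Return tags of elem's ancestors visible when only scope_root's subtree exists."""
    # Map each child to its parent, looking only inside scope_root's subtree.
    parent_of = {child: parent for parent in scope_root.iter() for child in parent}
    found = []
    node = elem
    while node in parent_of:
        node = parent_of[node]
        found.append(node.tag)
    return found

doc = ET.fromstring("<html><body><form><input/></form></body></html>")
inp = doc.find(".//input")

full = ancestors(inp, doc)     # whole document in scope
trimmed = ancestors(inp, inp)  # tree truncated to the input's own subtree
print(full, trimmed)
```

With the whole document in scope the 'form' ancestor is visible; with the truncated tree the upward step finds nothing, which mirrors an expression such as ancestor::html:form evaluating to the empty set.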

Comment 6 Noah Mendelsohn 2007-12-13 00:18:41 UTC

> In some of those formulations, specific XPath axes
> have been made illegal in the XPath expressions.
> In others, the input tree has been truncated so
> that any expression which attempts to refer to
> nodes outside the subtree will evaluate to the
> empty set.  Other formulations are possible which
> involve neither syntactic restrictions on XPath
> expressions nor tree surgery.  The details of the
> spec prose are not of interest here; what is at
> issue is the goal the WG has been trying to
> achieve, which the description of the issue argues
> should be revisited.  If the goal is dropped, then
> I think the implications for the spec prose are
> obvious enough.

OK.  That's what I wanted to be sure I understood.  Thank you.

Noah

Comment 7 David Ezell 2008-01-25 16:39:15 UTC

HST calls the attention of the WG to his proposal for resolving the form/input issue, in comment 19 on issue 5003: http://www.w3.org/Bugs/Public/show_bug.cgi?id=5003#c19

WG asserts that it's too late for that.

Propose that we close this issue

Approved.
Dissent from W3C.
Dissent from NACS.