10207 – [XPath] Matching abstract schema-element tests

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Status:	RESOLVED FIXED

Alias:	None

Product:	XPath / XQuery / XSLT
Classification:	Unclassified
Component:	XPath 3.0
Version:	2nd Edition Recommendation
Hardware:	PC Windows NT

Importance:	P2 normal
Target Milestone:	---
Assignee:	Jonathan Robie
QA Contact:	Mailing list for public feedback on specs from XSL and XML Query WGs

Reported:	2010-07-20 12:16 UTC by Oliver Hallam
Modified:	2012-10-29 13:51 UTC
CC List:	2 users

Description Oliver Hallam 2010-07-20 12:16:01 UTC

The rules for matching schema element tests referring to abstract elements in XPath 2.0 are not well defined.

The resolution of bug 10065 changed the text of section 2.5.4.4, but this problem exists with both versions of the text.

The rule is stated as follows:

A SchemaElementTest matches a candidate element node if all three of the
following conditions are satisfied:

(old)
1. The name of the candidate node matches the specified ElementName or matches
the name of an element in a substitution group headed by an element named
ElementName.

2. derives-from(AT, ET) is true, where AT is the type annotation of the
candidate node and ET is the schema type declared for element ElementName in
the in-scope element declarations.

3. If the element declaration for ElementName in the in-scope element
declarations is not nillable, then the nilled property of the candidate node is
false.

(from bug 10065 - subject to editorial changes)
1. The name of the candidate node matches the specified ElementName or matches
the name of an element in a substitution group headed by an element named
ElementName. Call this element the substituted element.

2. derives-from(AT, ET) is true, where AT is the type annotation of the
candidate node and ET is the schema type declared for the substituted element
in the in-scope element declarations.

3. If the substituted element is not nillable, then the nilled property of the
candidate node is false.

In both cases if the name of an abstract element is matched then ET is not defined.

The rules were changed to ensure that any element that matches a schema-element test is valid according to the schema the node resides in. To keep to the spirit of the changed rules, we do not want to match nodes whose names refer to abstract elements.

I believe we should add another condition between conditions 1 and 2 in the new text (and replace the word "three" with "four" in the paragraph preceding the list):

The substituted element is not abstract.

This will have to be modified to match the terminology used by the text to resolve bug 10065.

If this change were adopted in 2.0, no change would need to be made to Formal Semantics, as it explicitly states that it does not define the case of abstract elements (Section D.1.1).
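The gap described above can be sketched in Python. This is only an illustration under stated assumptions: the names `ElementDecl`, `Node`, `derives_from`, `matches` and the `BASE_TYPE` map are hypothetical, substitution groups are left out (only the case where the node name equals ElementName is shown), and `derives_from` is reduced to walking a toy base-type map. None of it is taken from the specification text.

```python
from dataclasses import dataclass

# Toy type hierarchy: each type name maps to its base type (hypothetical data).
BASE_TYPE = {"clientType": "customerType", "customerType": "xs:anyType"}


@dataclass(frozen=True)
class ElementDecl:
    name: str
    type_name: str
    abstract: bool = False
    nillable: bool = False


@dataclass(frozen=True)
class Node:
    name: str
    type_annotation: str
    nilled: bool = False


def derives_from(actual, expected):
    """True if `actual` is `expected` or reaches it through the base-type chain."""
    current = actual
    while current is not None:
        if current == expected:
            return True
        current = BASE_TYPE.get(current)
    return False


def matches(node, decl, reject_abstract):
    """Name-only case of the test; the proposed extra condition is a switch."""
    if node.name != decl.name:                       # name condition (no substitution groups)
        return False
    if reject_abstract and decl.abstract:            # proposed additional condition
        return False
    if not derives_from(node.type_annotation, decl.type_name):  # type condition
        return False
    if not decl.nillable and node.nilled:            # nilled condition
        return False
    return True


abstract_customer = ElementDecl("customer", "customerType", abstract=True)
candidate = Node("customer", "clientType")

three_rule_result = matches(candidate, abstract_customer, reject_abstract=False)
four_rule_result = matches(candidate, abstract_customer, reject_abstract=True)
```

In this sketch the candidate passes the three existing conditions even though its name refers to an abstract declaration, and is rejected once the extra condition is switched on.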

Comment 1 Michael Kay 2010-07-20 13:12:10 UTC

I have some sympathy with this: if the declaration of E is abstract, then an element named E shouldn't be attributed to that element declaration, and if it meets all the stated conditions then it only does so by accident. However, I have some concern about compatibility: this is an accident that can and does happen, for example if someone writes <element name="x"/> in the schema when they intended <element ref="x"/>, thus creating confusion between local and global element declarations, and I'm not convinced it's enough of a bug to justify changing the rules.

If we follow this logic further we would start having to look at the "block" and "final" attributes to determine whether the element can validly appear as a member of the substitution group. That's surprisingly complex logic and I would really rather avoid it.

Note that technically I think we are talking about potential rather than actual membership of the substitution group: from XSD 1.0:

<quote>
Element declarations are potential members of the substitution group, if any, identified by {substitution group affiliation}. Potential membership is transitive but not symmetric; an element declaration is a potential member of any group of which its {substitution group affiliation} is a potential member. Actual membership may be blocked by the effects of {substitution group exclusions} or {disallowed substitutions}, see below.
</quote>

Two specific points about comment #0:

(a) "In both cases if the name of an abstract element is matched then ET is not
defined." I don't follow the logic. Just because an element declaration is abstract doesn't mean it has no declared type.

(b) "The substituted element is not abstract": it's not the element that is abstract, but the element declaration. The same criticism applies to the "existing proposed" text "If the substituted element is not nillable". I suggest changing rule 1 to

"The name of the candidate node matches the specified ElementName or matches
the name of an element declaration that is a potential member of the substitution group headed by an element named ElementName. Call this element declaration ED."

and then using ED accordingly, for example "ED is not nillable". (The term "substituted element" is in any case very confusing: if sugar substitutes for honey then properly, sugar is "substituting" and honey is "substituted", though many people might say "I substituted sugar").
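Point (a) can be illustrated with a small schema fragment read with Python's standard `xml.etree.ElementTree`. The schema itself (the names `customer`, `client`, `customerType`) is made up for illustration; the sketch only shows that an element declaration can carry both `abstract="true"` and a declared type.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema: an abstract head declaration that still declares a type.
SCHEMA = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="customerType"/>
  <xs:element name="customer" type="customerType" abstract="true"/>
  <xs:element name="client" type="customerType" substitutionGroup="customer"/>
</xs:schema>
"""

root = ET.fromstring(SCHEMA)

# Index the global element declarations by name.
decls = {e.get("name"): e for e in root.findall(f"{{{XSD_NS}}}element")}

customer = decls["customer"]
# xs:boolean also allows "1"; only the literal "true" is checked in this sketch.
is_abstract = customer.get("abstract") == "true"
declared_type = customer.get("type")
```

Here `is_abstract` is true and `declared_type` is still `customerType`, so an abstract declaration does not by itself leave ET undefined.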

Comment 2 Oliver Hallam 2010-07-20 15:13:58 UTC

I agree with all your comments.

My main discomfort with the current rules is that you can match an element against a schema type that would fail to validate according to the in-scope schemas.

Since the rules are in fact clear for 2.0 (and hence there is no pressing need to fix this), I have changed this bug to be against XPath 2.1.

My personal opinion is that the rules should be equivalent to those in schema and, if possible, the spec should reference the rules in schema instead of redefining them. In a processor that handles schemas statically it is just a matter of computing the set of element types that are validly substitutable for an element, but I am unsure of the ramifications of this change for an implementation that can accept new schemas at runtime.

Comment 3 Michael Kay 2010-07-20 15:50:57 UTC

The underlying reason for the clumsy specification of schema-element(X) is that the only information we can base the decision on, in terms of XDM node properties, is the element name, the type annotation, and the isNilled property.

>My personal opinion is that the rules should be equivalent to those in schema

I don't think schema has any direct analogue of schema-element(X). Which rules were you thinking of?

Comment 4 Oliver Hallam 2010-07-20 16:53:08 UTC

After looking closer at the definition of substitution group, I think we already require using "block" and "final" (and all the complexity it adds).  

The sentence said:

1. The name of the candidate node matches the specified ElementName or matches
the name of an element in a substitution group headed by an element named
ElementName.

If we look at the definition of substitution group:

[Definition: Substitution groups are defined in [XML Schema] Part 1, Section 2.2.2.2. Informally, the substitution group headed by a given element (called the head element) consists of the set of elements that can be substituted for the head element without affecting the outcome of schema validation.]

Section 2.2.2.2 specifies:

Note that element substitution groups are not represented as separate components. They are specified in the property values for element declarations (see Element Declarations (§3.3)).

And Element Declarations defines the substitution group as (3.3.6.4):

[Definition:]  One element declaration is substitutable for another if together they satisfy constraint Substitution Group OK (Transitive) (§3.3.6.3).

[Definition:]   Every element declaration (call this HEAD) in the {element declarations} of a schema defines a substitution group, a subset of those {element declarations}. An element declaration is in the substitution group of HEAD if and only if it is ·substitutable· for HEAD.

So finally we reach Section 3.3.6.3 to define what can substitute for a head element:

Schema Component Constraint: Substitution Group OK (Transitive)
For an element declaration (call it M, for member) to be substitutable for another element declaration (call it H, for head) at least one of the following must be true:
1 M and H are the same element declaration.
2 All of the following are true:
2.1 H. {disallowed substitutions} does not contain substitution.
2.2 There is a chain of {substitution group affiliations} properties from M to H, that is, either M.{substitution group affiliations} contains H, or M.{substitution group affiliations} contains a declaration whose {substitution group affiliations} contains H, or . . .
2.3 The set of all {derivation method}s involved in the ·derivation· of M.{type definition} from H.{type definition} does not intersect with the union of (1) H.{disallowed substitutions}, (2) H.{type definition}.{prohibited substitutions} (if H.{type definition} is complex, otherwise the empty set), and (3) the {prohibited substitutions} (respectively the empty set) of any intermediate declared {type definition}s in the ·derivation· of M.{type definition} from H.{type definition}.

This definition already involves H.{disallowed substitutions} and so I believe that the block attribute must already be taken into account.

I believe that the final attribute does not make a difference here; it is only used to enforce Schema Component Constraint: Element Declaration Properties Correct (3.3.6.1).

We already require these attributes to be handled, and hence all the messiness that that entails.  I believe that the change suggested is sufficient.



A minor simplification:

I believe

1. The name of the candidate node matches the specified ElementName or matches
the name of an element in a substitution group headed by an element named
ElementName.

is equivalent to

1. The name of the candidate node matches the name of an element in a substitution group headed by an element named ElementName.

as an element is always included in a substitution group headed by itself.
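The constraint Substitution Group OK (Transitive) quoted above can be sketched in Python. This is a rough model, not an implementation of XSD: the classes `TypeDef` and `ElementDecl`, the helper names, and the sample declarations are all hypothetical, the simple/complex type distinction is ignored, and a member whose type is not derived from the head's type is simply treated as not substitutable (an assumption of this sketch).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class TypeDef:
    name: str
    base: Optional["TypeDef"] = None
    method: Optional[str] = None          # "extension" or "restriction" from base
    prohibited: frozenset = frozenset()   # {prohibited substitutions}


@dataclass(eq=False)
class ElementDecl:
    name: str
    type_def: TypeDef
    affiliations: tuple = ()              # {substitution group affiliations}
    disallowed: frozenset = frozenset()   # {disallowed substitutions}


def has_chain(m, h):
    """Clause 2.2: a chain of affiliations leads from M to H."""
    return any(a is h or has_chain(a, h) for a in m.affiliations)


def derivation_steps(m_type, h_type):
    """Methods and intermediate types on the path from m_type up to h_type, or None."""
    methods, intermediates = set(), []
    current = m_type
    while current is not h_type:
        if current.base is None:
            return None
        methods.add(current.method)
        current = current.base
        if current is not h_type:
            intermediates.append(current)
    return methods, intermediates


def substitutable(m, h):
    if m is h:                                   # clause 1
        return True
    if "substitution" in h.disallowed:           # clause 2.1
        return False
    if not has_chain(m, h):                      # clause 2.2
        return False
    steps = derivation_steps(m.type_def, h.type_def)
    if steps is None:                            # assumption: no derivation, no substitution
        return False
    methods, intermediates = steps
    blocked = set(h.disallowed) | set(h.type_def.prohibited)
    for t in intermediates:
        blocked |= t.prohibited
    return not (methods & blocked)               # clause 2.3


customer_type = TypeDef("customerType")
client_type = TypeDef("clientType", base=customer_type, method="extension")

customer = ElementDecl("customer", customer_type)
client = ElementDecl("client", client_type, affiliations=(customer,))

# Same shape, but the head blocks substitution by extended types.
strict_customer = ElementDecl("customer", customer_type,
                              disallowed=frozenset({"extension"}))
strict_client = ElementDecl("client", client_type, affiliations=(strict_customer,))

head_in_own_group = substitutable(customer, customer)
plain_member = substitutable(client, customer)
blocked_member = substitutable(strict_client, strict_customer)
```

Clause 1 is what makes the simplification above work in this model: a head is always substitutable for itself, so the separate "matches the specified ElementName" case is subsumed. The `strict_customer` case shows the head's {disallowed substitutions} excluding a member.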

Comment 5 Michael Kay 2010-07-20 17:59:41 UTC

Comment #4 is citing the XSD 1.1 working draft, and I think the inference is correct if we take XSD 1.1 as the target. But XQuery/XPath currently cite XSD 1.0.

XSD 1.0 is far less clear-cut. Section 3.3.6 in "Schema Component Constraint: Substitution Group" defines two concepts, the potential substitution group of an element and the actual substitution group, and the only things that steer us towards using the actual substitution group in preference are (a) the choice of name "actual", and (b) the feeling that it makes more sense that way. It would probably be a good idea to clarify this.

In fact it seems the XSD 1.0 definition of "actual substitution group" is not the same as the XSD 1.1 definition of "substitution group". (A) the 1.0 definition excludes abstract element declarations, while 1.1 includes them; (B) the 1.1 definition excludes element declarations whose type is "blocked" from being derived from the head's type, while the 1.0 definition includes them (they are not validly substitutable, but they are part of the actual substitution group). 

Why did you lead us into this swamp?

Comment 6 Oliver Hallam 2010-09-22 16:44:30 UTC

A small note:

The XQuery and XPath specs state:

[Definition: Substitution groups are defined in [XML Schema] Part 1, Section 2.2.2.2. Informally, the substitution group headed by a given element (called the head element) consists of the set of elements that can be substituted for the head element without affecting the outcome of schema validation.]

However this informal definition does not match the current behaviour (this is the original issue I reported).

Comment 7 Jonathan Robie 2010-10-26 15:55:52 UTC

We have decided to adopt the 2ed rules:

* abstract elements do not appear in substitution groups
* block attributes must be taken into account when building the substitution groups.

And I will use the initial bug report as the basis for my text.

Comment 8 Michael Kay 2012-10-15 09:40:12 UTC

I am tasked with writing tests that relate to the changes made to the specification in relation to bugs 10065 and 10207, which concern the details of how membership of substitution groups is defined in relation to the schema-element() type test. So I've been re-reading these complex bug reports, and the current text of the specification.

As far as I can see, most of the problems addressed during the discussion of bug 10207 are not resolved in the current draft. In some ways, the wording is worse than it ever was. The first rule in 2.5.5.4 now reads:

1. The name of the candidate node matches the specified ElementName, or it matches the name of an element in a substitution group headed by an element named ElementName and the substituted element is not abstract. Call this element the substituted element.

I read this as saying that the substituted element is the substituted element. But even after resolving that, it's all wrong. The members of a substitution group are not elements, but element declarations. It is element declarations that are abstract, not elements. And despite the discussion on the bug report, there is no clarity about which of the two element [declarations] we call the "substituted" element [declaration] - given that A is substituting for B, I think we are using "substituted" to refer to A, when it would be more logical to call that the substituting element and B the substituted one.

There has been no clarification about what we mean by an element [declaration] being "in a substitution group" despite all the discussion about whether we meant actual membership or potential membership; there is no mention of the decision we made that membership is affected by block/final (in other words that we are talking about actual membership).

Given the way the rule is punctuated, most readers would parse it as (A or (B and C)); but Oliver originally proposed adding the rule about the element being abstract as a separate fourth rule, which implies that the correct parse is ((A or B) and C). In any case, the editorial change applied to the proposal as agreed has introduced an ambiguity.
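The difference between the two readings can be checked with a small truth table. In this sketch A, B and C are plain booleans standing for "name matches ElementName", "name matches a member of the substitution group" and "the declaration is not abstract"; the function names are made up for illustration.

```python
from itertools import product


def parse_as_punctuated(a, b, c):
    # Reading suggested by the punctuation: A or (B and C)
    return a or (b and c)


def parse_as_separate_rule(a, b, c):
    # Reading implied by a separate rule: (A or B) and C
    return (a or b) and c


# Collect every combination on which the two readings disagree.
differing = [
    (a, b, c)
    for a, b, c in product([False, True], repeat=3)
    if parse_as_punctuated(a, b, c) != parse_as_separate_rule(a, b, c)
]
```

The two readings disagree exactly when A is true and C is false, that is, when the candidate's name matches ElementName itself and that declaration is abstract.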

Compounding the problem, the example below the rules has not changed since 1.0. It talks about whether "customer" (the head of the substitution group) is nillable, whereas the intent of the change is that we are concerned with whether the relevant member of the substitution group is nillable.

In the case where substitution groups are not involved ("The name of the candidate node matches the specified ElementName"), we don't seem to define a meaning for the term "substituted element", and yet the term is used in the following two rules which apply equally to this case.

I think the following wording would better reflect the agreed resolution of the bug:

A SchemaElementTest matches a candidate element node if all of the following conditions are satisfied:

1. Either:

1a. The name N of the candidate node matches the specified ElementName, or

1b. The name N of the candidate node matches the name of an element declaration that is a member of the actual substitution group headed by the declaration of element ElementName.

NOTE: the term "actual substitution group" is defined in [XSD]. The actual substitution group of an element declaration H includes those element declarations P that are declared to have H as their direct or indirect substitution group head, provided that P is not declared as abstract, and that P is validly substitutable for H, which means that there must be no blocking constraints that prevent substitution.

2. The schema element declaration named N is not abstract.

3. derives-from( AT, ET ) is true, where AT is the type annotation of the candidate node and ET is the schema type declared in the schema element declaration named N.

4. If the schema element declaration named N is not nillable, then the nilled property of the candidate node is false.

Example: The SchemaElementTest schema-element(customer) matches a candidate element node in the following two situations:

(a) customer is a top-level element declaration in the in-scope element declarations; the name of the candidate node is customer; the element declaration of customer is not abstract; the type annotation of the candidate node is the same as or derived from the schema type declared in the customer element declaration; and either the candidate node is not nilled, or customer is declared to be nillable.

(b) customer is a top-level element declaration in the in-scope element declarations; the name of the candidate node is client; client is an actual (non-abstract and non-blocked) member of the substitution group of customer; the type annotation of the candidate node is the same as or derived from the schema type declared for the client element; and either the candidate node is not nilled, or client is declared to be nillable.
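The four rules and the customer/client example can be sketched in Python as follows. This is an illustration under assumptions, not the specification's algorithm: `ElementDecl`, `Node`, `actual_group` and `schema_element_matches` are hypothetical names, the single `blocked` flag is a stand-in for the full block constraints, `derives_from` walks a toy base-type map, and ElementName is assumed to have already been checked against the in-scope element declarations.

```python
from dataclasses import dataclass
from typing import Optional

BASE_TYPE = {"clientType": "customerType"}   # toy derivation map (hypothetical)


@dataclass(frozen=True)
class ElementDecl:
    name: str
    type_name: str
    head: Optional[str] = None   # substitution group affiliation, by name
    abstract: bool = False
    nillable: bool = False
    blocked: bool = False        # stand-in for the block constraints


@dataclass(frozen=True)
class Node:
    name: str
    type_annotation: str
    nilled: bool = False


def derives_from(actual, expected):
    while actual is not None:
        if actual == expected:
            return True
        actual = BASE_TYPE.get(actual)
    return False


def affiliated(decl, head_name, decls):
    """True if decl reaches head_name through its chain of group heads."""
    current = decl.head
    while current is not None:
        if current == head_name:
            return True
        current = decls[current].head
    return False


def actual_group(head_name, decls):
    """Non-abstract, non-blocked declarations affiliated with the head."""
    return {d.name for d in decls.values()
            if affiliated(d, head_name, decls) and not d.abstract and not d.blocked}


def schema_element_matches(node, element_name, decls):
    n = node.name
    if n != element_name and n not in actual_group(element_name, decls):  # rule 1
        return False
    decl = decls[n]
    if decl.abstract:                                                     # rule 2
        return False
    if not derives_from(node.type_annotation, decl.type_name):            # rule 3
        return False
    if not decl.nillable and node.nilled:                                 # rule 4
        return False
    return True


decls = {
    "customer": ElementDecl("customer", "customerType"),
    "client": ElementDecl("client", "clientType", head="customer", nillable=True),
    "prospect": ElementDecl("prospect", "customerType", head="customer", abstract=True),
}

case_a = schema_element_matches(Node("customer", "customerType"), "customer", decls)
case_b = schema_element_matches(Node("client", "clientType", nilled=True), "customer", decls)
```

In this sketch `case_a` corresponds to situation (a) and `case_b` to situation (b): the nilled `client` node is accepted because nillability is taken from the `client` declaration rather than from `customer`.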

Comment 9 Jonathan Robie 2012-10-29 13:51:02 UTC

The Working Group approves the wording in comment #8.