This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 7059 - Forking XPath
Summary: Forking XPath
Status: CLOSED FIXED
Alias: None
Product: HTML WG
Classification: Unclassified
Component: pre-LC1 HTML5 spec (editor: Ian Hickson)
Version: unspecified
Hardware: All All
Importance: P2 normal
Target Milestone: ---
Assignee: Ian 'Hixie' Hickson
QA Contact: HTML WG Bugzilla archive list
URL:
Whiteboard:
Keywords: NE
Depends on:
Blocks:
 
Reported: 2009-06-26 14:45 UTC by Jonathan Robie
Modified: 2010-10-04 14:45 UTC

Description Jonathan Robie 2009-06-26 14:45:44 UTC
http://www.whatwg.org/specs/web-apps/current-work/#interactions-with-xpath-and-xslt intentionally forks XPath 1.0. 

I strongly suggest that you work with XQuery and XSL Working Groups, which produce the XPath specification, to come up with a better solution. We started discussion on a Bugzilla bug against our spec:

http://www.w3.org/Bugs/Public/show_bug.cgi?id=6777

In this discussion, the reason given for forking XPath is that some people do not want to implement XPath 2.0, which solves your problems. But instead, your draft forks the XPath standard by creating a new version of XPath 1.0 where the name tests have special-purpose semantics.

XPath 2.0 was published 23 January 2007 (before your Working Group was even chartered), solves your problem, and is widely implemented. Now, in June, 2009, you propose to create an incompatible version of XPath 1.0 and say that web browsers must implement this instead.

And this incompatible version is not even a subset of XPath 2.0.
Comment 1 Henri Sivonen 2009-06-26 15:18:25 UTC
> I strongly suggest that you work with XQuery and XSL Working Groups, which
produce the XPath specification, to come up with a better solution.

Note that that's what I tried first.

> In this discussion, the reason given for forking XPath is that some people do
not want to implement XPath 2.0, which solves your problems.

Browsers have XPath 1.0 implementations and don't have XPath 2 implementations. It isn't feasible to make the namespacing of elements in HTML DOMs gated on browsers implementing XPath 2.0.

> XPath 2.0 was published 23 January 2007 (before your Working Group was even
chartered), solves your problem, and is widely implemented.

What's relevant is what the status quo of XPath support in browsers is.

> And this incompatible version is not even a subset of XPath 2.0.

I'd personally be OK with pursuing making prefixless expressions not match no-namespace nodes in HTML documents in Gecko if the WebKit developers are OK with making the same change to WebKit. Right now, the HTML5 spec, Gecko and WebKit agree, so we have interop and a spec that matches implementation, which is a pretty good situation to be in.
Comment 2 Henri Sivonen 2009-06-26 15:27:53 UTC
Also note that this doesn't affect XPath matching in XML documents in any way. As far as I can tell, applying XPath to HTML documents has always been outside the immediate interest and charter of the XSL WG.
Comment 3 Jonathan Robie 2009-06-26 17:30:03 UTC
(In reply to comment #2)

> Also note that this doesn't affect XPath matching in XML documents in any way.
> As far as I can tell, applying XPath to HTML documents has always been outside
> the immediate interest and charter of the XSL WG.

Both XQuery and XSLT are frequently applied to HTML documents. The document is first parsed to create an XDM instance (often using tools like Tagsoup to deal with cruft), then processed appropriately. 

Screen scraping and data integration are one important class of applications that do this.

XPath is used in both XQuery and XSLT. It's going to be extremely confusing if XPath expressions are interpreted differently when executed inside a browser environment, especially since the documents that define the XPath standard do not support this interpretation.

I suggest that you define a profile of XPath 2.0 that corresponds to the functionality of XPath 1.0 plus default namespaces, and also define the mapping of your XML documents to XDM (you have to do this regardless, because XPath is defined in terms of the XDM, not the DOM). 

You may also want to allow implementations to optionally support all of XPath 2.0.


Comment 4 Jonathan Robie 2009-06-26 17:37:44 UTC
(In reply to comment #1)
> > I strongly suggest that you work with XQuery and XSL Working Groups, which
> > produce the XPath specification, to come up with a better solution.
> 
> Note that that's what I tried first.

Let's keep working on it ... 

> I'd personally be OK with pursuing making prefixless expressions not match
> no-namespace nodes in HTML documents in Gecko if the WebKit developers are OK with
> making the same change to WebKit. Right now, the HTML5 spec, Gecko and WebKit
> agree, so we have interop and a spec that matches implementation, which is a
> pretty good situation to be in.

But HTML documents are widely read by many applications that are not web browsers, and some of these use XQuery or XSLT to extract data from web pages. 

XPath handling has to work in all of these environments. And the XPath language used in all of these environments should correspond to the XPath specification.
Comment 5 Ian 'Hixie' Hickson 2009-06-26 22:19:48 UTC
Here's how I understand the situation:

* HTML5 changes HTML's processing rules such that it would break any XPath that is currently being applied to text/html documents.
* To keep XPath 1.0 expressions working, therefore, XPath 1.0 implementations that implement HTML5 would need to implement some changes.
* They in fact already do. For example, WebKit does what HTML5 now describes.

Personally, I would rather not have to mention XPath in the HTML5 spec; the only reason it does mention it is to keep XPath 1.0 working.

If the XPath working group would rather this text be removed, then I would be happy to remove it. Is that the case?
Comment 6 Jonathan Robie 2009-06-26 22:52:29 UTC
(In reply to comment #5)
> Here's how I understand the situation:
> 
> * HTML5 changes HTML's processing rules such that it would break any XPath that
> is currently being applied to text/html documents.
> * To keep XPath 1.0 expressions working, therefore, XPath 1.0 implementations
> that implement HTML5 would need to implement some changes.
> * They in fact already do. For example, WebKit does what HTML5 now describes.
> 
> Personally, I would rather not have to mention XPath in the HTML5 spec; the
> only reason it does mention it is to keep XPath 1.0 working.
> 
> If the XPath working group would rather this text be removed, then I would be
> happy to remove it. Is that the case?


I'm speaking personally so far; the XQuery and XSLT Working Groups may or may not decide to endorse any comments I make here.

If you want a language that has different semantics from XPath, I think the clean thing to do would be to create a completely different syntax. As long as the language is almost the same as XPath, but semantically different, I think it is likely to be called XPath by users and to cause confusion. Some of these same users will also be processing XML and HTML using real XPath as part of XQuery or XSLT, but not in a web browser environment.

As far as I understand, the changes that are needed are already supported in XPath 2.0 if you use default element namespaces and define a mapping to the XDM, and specify that you use compatibility mode. One workable approach is to say that you support the subset of XPath 2.0 that corresponds to the XPath 1.0 grammar.

I suspect that one or more people from our Working Group would be happy to discuss alternatives, perhaps in a telcon or an IRC meeting.
Comment 7 Ian 'Hixie' Hickson 2009-06-26 22:59:06 UTC
I personally have no horse in this race; I'm most happy to have HTML5 require what the XPath working group would like it to require. Please let me know what the XPath working group's opinion is on this topic so that I can update the spec accordingly.

Again, the only reason we added this text in the first place was to make XPath 1.0 compatible with HTML5, which I had expected to be desired by the working group; if such compatibility isn't desired, that's fine. (We have similar text to make HTML5 compatible with Selectors.) I expect discussion of a new language specifically for HTML would be best handled by a separate group; I don't think it would belong in HTML5 proper.
Comment 8 Jonathan Robie 2009-06-27 03:24:42 UTC
Thanks - I will request that we discuss this on next Tuesday's telcon and offer an opinion as the XML Query and XSL Working Groups. Our next meeting is June 30th.

We *definitely* want XPath to be compatible with HTML 5 documents. 
Comment 9 Michael[tm] Smith 2009-06-27 03:38:31 UTC
(In reply to comment #5)
> Here's how I understand the situation:
> 
> * HTML5 changes HTML's processing rules such that it would break any XPath that
> is currently being applied to text/html documents.
> * To keep XPath 1.0 expressions working, therefore, XPath 1.0 implementations
> that implement HTML5 would need to implement some changes.
> * They in fact already do. For example, WebKit does what HTML5 now describes.

So, as a point of clarification: this seems like one of those cases where the draft is descriptively documenting existing processing behavior in at least one major browser, rather than unilaterally or arbitrarily changing the processing rules.

> Personally, I would rather not have to mention XPath in the HTML5 spec; the
> only reason it does mention it is to keep XPath 1.0 working.
> 
> If the XPath working group would rather this text be removed, then I would be
> happy to remove it. Is that the case?

I would be less happy for you to remove it.

I think the reason Henri requested that it be documented is a good one. I would like to see us try to come to some agreement about how to get resolution on this, and I think simply removing the text is not a satisfactory resolution. The value of having the text in there now is for it to get exactly the kind of wider review that will encourage discussion about the underlying issue that we need to resolve here.
Comment 10 Anne 2009-06-27 07:02:08 UTC
Note also that supporting XPath 2.0 or a new language does not solve the problem. The problem is legacy XPath 1.0 expressions applying to legacy HTML content. What changed is that legacy HTML content turns into a namespaced tree (to make HTML and XHTML more consistent and all sorts of other smallish benefits) and so we have to change how XPath 1.0 expressions are evaluated because otherwise the legacy content would break.

Potential solutions:

 1. Break with XPath 1.0 as proposed.
 2. Abandon the namespaced tree approach.
 3. ???

Since option 2 would give up an explicit design goal (the namespaced tree), option 1 seems like the best solution, but I'd be interested in hearing about a potential number 3, provided that a) we can keep the tree namespaced and b) legacy XPath expressions keep working.
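
The breakage described in this comment can be reproduced outside a browser. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module as a stand-in for a browser XPath 1.0 engine (it implements only a subset of XPath, and the sample markup is made up for illustration). It shows an unprefixed name test matching a no-namespace tree but not the same tree once its elements are in the http://www.w3.org/1999/xhtml namespace.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# The same markup twice: once as a legacy-style tree (elements in no
# namespace) and once as an HTML5-style tree (elements in the XHTML namespace).
legacy_tree = ET.fromstring("<html><body><p>hi</p></body></html>")
html5_tree = ET.fromstring(
    f'<html xmlns="{XHTML_NS}"><body><p>hi</p></body></html>'
)

# An unprefixed name test, as legacy content would write it.
legacy_expr = ".//p"

unprefixed_on_legacy = legacy_tree.findall(legacy_expr)
unprefixed_on_html5 = html5_tree.findall(legacy_expr)

# A prefixed name test bound to the XHTML namespace still works.
prefixed_on_html5 = html5_tree.findall(".//h:p", {"h": XHTML_NS})

print(len(unprefixed_on_legacy), len(unprefixed_on_html5), len(prefixed_on_html5))
```

Without some special handling, the second count is zero, which is the failure mode for existing expressions that the spec text is trying to avoid.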
Comment 11 Jonathan Robie 2009-06-27 23:49:30 UTC
(In reply to comment #10)
> Note also that supporting XPath 2.0 or a new language does not solve the
> problem. The problem is legacy XPath 1.0 expressions applying to legacy HTML
> content. What changed is that legacy HTML content turns into a namespaced tree
> (to make HTML and XHTML more consistent and all sorts of other smallish
> benefits) and so we have to change how XPath 1.0 expressions are evaluated
> because otherwise the legacy content would break.
> 
> Potential solutions:
> 
>  1. Break with XPath 1.0 as proposed.
>  2. Abandon the namespaced tree approach.
>  3. ???
> 
> Since option 2 would give up an explicit design goal (the namespaced tree),
> option 1 seems like the best solution, but I'd be interested in hearing about
> a potential number 3, provided that a) we can keep the tree namespaced and
> b) legacy XPath expressions keep working.

I think it would be helpful to get a small group together from your Working Group and from the XSL and XQuery Working Groups to make sure we understand the requirements on both sides and look for solutions.

If you are mapping both sets of HTML onto the same namespace, and you want unprefixed names to match names in that namespace, you can do that in XPath 2.0 by declaring the default element namespace to be that namespace. An implementation is allowed to set a default element namespace.
Comment 12 Ian 'Hixie' Hickson 2009-06-28 00:04:19 UTC
As Anne noted, the problem is with existing deployed content using XPath 1.0, not with future content (which as you say, is handled fine by XPath 2.0 features).
Comment 13 Henri Sivonen 2009-06-29 12:15:43 UTC
(In reply to comment #3)
> Both XQuery and XSLT are frequently applied to HTML documents. The document is
> first parsed to create an XDM instance (often using tools like Tagsoup to deal
> with cruft), then processed appropriately. 

But that's different. TagSoup assigns HTML elements into the http://www.w3.org/1999/xhtml namespace *like an HTML5 parser does* but legacy browser-based HTML parsers didn't.

If you have an app that currently uses TagSoup and any version of XPath, you don't need to change anything on the XPath layer if you move from TagSoup to an HTML5-compliant parser.

> Screen scraping and data integration are one important class of applications
> that do this.

Indeed, but they have nothing to do with this special case in the spec.

In the screen scraping scenario, the XPath expressions are supplied by the scraper developer--not by the remote Web content. The issue at hand has everything to do with the case where the XPath 1.0 expressions are supplied by existing content in JavaScript programs using the document.evaluate API.

Hixie, I think the spec should make it clearer that the willful violation of XPath 1.0 only applies to UAs that support scripting and let scripts in content evaluate XPath expressions against the DOM.

> XPath is used in both XQuery and XSLT. It's going to be extremely confusing if
> XPath expressions are interpreted differently when executed inside a browser
> environment, especially since the documents that define the XPath standard do
> not support this interpretation.

Frankly, I think most users of XPath will never even realize that this hack is in place and, therefore, won't be confused by it.

> I suggest that you define a profile of XPath 2.0 that corresponds to the
> functionality of XPath 1.0 plus default namespaces, and also define the mapping
> of your XML documents to XDM (you have to do this regardless, because XPath is
> defined in terms of the XDM, not the DOM). 

The point of having this in the spec is to provide advice to implementors who have XPath 1.0 engines but haven't upgraded to DOM5 yet. When I implemented this for Gecko, I first had to experience test case failures and then go find out what WebKit does. The only reason I'm pursuing this is that I want to do unto the next implementor what I wish the previous implementor had done unto me.

(In reply to comment #6)
> If you want a language that has different semantics from XPath, I think the
> clean thing to do would be to create a completely different syntax.

That's completely infeasible, since the whole point is to keep existing XPath 1.0 expressions, which are already part of existing script out there, working.

(In reply to comment #11)
> (In reply to comment #10)
> I think it would be helpful to get a small group together from your Working
> Group and from the XSL and XQuery Working Groups to make sure we understand the
> requirements on both sides and look for solutions.

Here are the requirements for the case where the UA accepts XPath 1.0 expressions from Web content through scripting:

 1) Prefixless name expressions in XPath 1.0 expressions passed to document.evaluate() must match against HTML element nodes in HTML documents (for existing expressions). This requirement is not negotiable. It's a non-starter to suggest that a browser vendor whose previous release exhibits this behavior make their next release not exhibit this behavior.

 2) Name expressions whose namespace is http://www.w3.org/1999/xhtml should match against HTML element nodes in HTML documents (for prospective expressions). This isn't a hard requirement, but not having this property would hinder expression portability between HTML and XHTML.

 3) The solution must not require browser vendors who currently ship XPath 1.0 engines to upgrade to an XPath 2.x engine. This is practically a hard requirement.

 4) HTML element nodes in the DOM should report http://www.w3.org/1999/xhtml as their namespace. (Note that giving up on this point would require special casing all over while putting the hack in the XPath matcher isolates the hack. Also note that this property removes the need of a hack from Selectors. As a consequence, it's safe to consider this as a pretty serious requirement at this point.)

 5) It's more important for different browsers to do the same thing than for some browsers to be more purely XPath 2.0-like.
 
 6) The XPath engine shouldn't have to modify its behavior depending on whether the expression came in via document.evaluate() or other means. This is a fairly hard requirement.

Here are the requirements for other cases (already satisfied by TagSoup + off-the-shelf XPath library):

 A) Name expressions whose namespace is http://www.w3.org/1999/xhtml should also match against HTML element nodes in HTML documents.

 B) HTML elements should be in the http://www.w3.org/1999/xhtml namespace.

- -

As you can see, the only degree of freedom here for UAs that support scripting and document.evaluate() is whether no-namespace expressions match against no-namespace element nodes *in addition to* matching against HTML nodes. And even in that case, uniformity between browsers is more important than being a purer subset of XPath 2.0.

There's no impact on applications that don't get their XPath expressions from Web content but whose XPath expressions are supplied by the application developer.
Comment 14 Jonathan Robie 2009-06-29 16:21:36 UTC
As I understand, you want a path expression with no prefixes to match elements from either documents whose HTML elements are in no namespace, or documents whose HTML elements are in the HTML namespace.

As I understand, this affects only document.evaluate(), which applies only to one document at a time.

Did I get that right? If so, I'll make sure we discuss this in the Working Groups tomorrow and make a recommendation. If I'm missing or misunderstanding a requirement, please let me know.
Comment 15 Henri Sivonen 2009-06-30 06:31:57 UTC
(In reply to comment #14)
> As I understand, you want a path expression with no prefixes to match elements
> from either documents whose HTML elements are in no namespace, or documents
> whose HTML elements are in the HTML namespace.

No. As of HTML5, HTML elements (elements that implement the HTMLElement DOM interface) are always in the http://www.w3.org/1999/xhtml namespace (like with TagSoup). Elements that aren't in a namespace aren't HTML elements (and, therefore, don't implement the HTMLElement interface). All parser-inserted element nodes in HTML documents are in one of these three namespaces:
http://www.w3.org/1999/xhtml
http://www.w3.org/2000/svg
http://www.w3.org/1998/Math/MathML

The HTMLness of a document is an implementation-internal boolean on the object that implements the Document interface. "HTML document" refers to a DOM tree with the HTMLness bit set--not to the byte stream. ("Document" refers to the parsed representation--not to the source text here.)

No-namespace nodes can be introduced to HTML documents only by a script calling createElementNS(null, ...) or by moving a no-namespace node from an XML document into an HTML document. 

(Note that XSLT cannot introduce no-namespace nodes into HTML documents, because the namespace is changed to http://www.w3.org/1999/xhtml when an XSLT program tries to do so: bug 6776. Also note that XSLT programs still can introduce no-namespace nodes into XML documents, i.e. documents whose HTMLness bit is false.)

For compatibility with existing XPath 1.0 expressions whose authors have assumed that HTML elements aren't in a namespace, no-namespace XPath expressions need to match against nodes whose namespace is http://www.w3.org/1999/xhtml when the owner document of those nodes has been marked as an HTML document. 

> As I understand, this affects only document.evaluate(), which applies only to
> one document at a time.

This affects any API that allows the evaluation of XPath expressions against DOM trees whose HTMLness bit is true. Thus, it also applies to the XSLTProcessor API:
https://developer.mozilla.org/en/Using_the_Mozilla_JavaScript_interface_to_XSL_Transformations

Note that when XSLT is applied to an XML document that has the xml-stylesheet PI and that is being loaded from the network, the input DOM doesn't have the HTMLness bit set, so this doesn't apply.
Comment 16 Jonathan Robie 2009-07-03 12:50:47 UTC
I think wording similar to this would meet your requirements, as I understand them:

In XPath 1.0, if a NameTest has no prefix, then the namespace URI is null. In the context of document.evaluate(), if a NameTest has no prefix, the namespace URI is http://www.w3.org/1999/xhtml if the principal node type of the NameTest is element; otherwise, it has no namespace URI.

Comment 17 Henri Sivonen 2009-07-07 10:37:06 UTC
Does that formulation affect testing expressions against attribute names?
Comment 18 Jonathan Robie 2009-07-07 11:23:00 UTC
(In reply to comment #17)
> Does that formulation affect testing expressions against attribute names?

No. That would have been clearer if I had said "is null" in both places, instead of "is null" in one place and "has no namespace URI" in the other. So this is better wording:

"In XPath 1.0, if a NameTest has no prefix, then the namespace URI is null. In
the context of document.evaluate(), HTML5 specifies a different default element namespace: if a NameTest has no prefix and the principal node type of the NameTest is element, the namespace URI is http://www.w3.org/1999/xhtml; if a NameTest has no prefix and the principal node type of the NameTest is not element, the namespace URI is null."
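
Read as an algorithm, the wording above amounts to a small change in how a NameTest is expanded. The following is a minimal sketch under stated assumptions: the function name expand_name_test, its parameters, and the in_document_evaluate flag are hypothetical names introduced here for illustration, and wildcards are not handled.

```python
from typing import Dict, Optional, Tuple

XHTML_NS = "http://www.w3.org/1999/xhtml"


def expand_name_test(
    qname: str,
    principal_node_type: str,
    prefixes: Dict[str, str],
    in_document_evaluate: bool,
) -> Tuple[Optional[str], str]:
    """Return the (namespace URI, local name) that a NameTest stands for.

    Hypothetical helper: None stands for a null namespace URI.
    """
    if ":" in qname:
        # Prefixed names are resolved through the in-scope prefix bindings,
        # exactly as in XPath 1.0.
        prefix, local = qname.split(":", 1)
        return prefixes[prefix], local
    if in_document_evaluate and principal_node_type == "element":
        # The one change: unprefixed element tests use the XHTML namespace.
        return XHTML_NS, qname
    # Plain XPath 1.0 behavior, which also covers attribute tests.
    return None, qname


element_test = expand_name_test("p", "element", {}, in_document_evaluate=True)
attribute_test = expand_name_test("href", "attribute", {}, in_document_evaluate=True)
plain_xpath10 = expand_name_test("p", "element", {}, in_document_evaluate=False)
prefixed_test = expand_name_test("h:p", "element", {"h": XHTML_NS}, True)
```

Under this reading, attribute name tests are unaffected, which is the point raised in comment 17.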

Comment 19 Jim Melton 2009-08-27 22:41:25 UTC
The XML Query WG plans to meet at TPAC 2009 (Technical Plenary and Advisory Council meeting).  We think that it would be beneficial for the XML Query and HTML WGs to get together for an hour or two during that week to discuss this issue.  To that end, I'm sending separate email to the chair(s) of the HTML WG containing this suggestion. 

Jim Melton, Chair of the XML Query WG
Comment 20 Henri Sivonen 2009-08-28 07:41:11 UTC
If I understand comment 18 right, the only change it makes to the current HTML 5 spec is flipping the detail that I called the only degree of freedom in comment 13 the other way round. That is, compared to the current HTML 5 spec, the difference is that a prefixless name expression wouldn't match a no-namespace element node in a tree flagged as HTML.

I'm willing to pursue a patch with this change in Gecko if WebKit devs are also willing to change WebKit likewise.

In any case, the spec text needs to be scoped to anything that causes XPath expressions to be evaluated against an HTML DOM in a browser--not just document.evaluate(). For example, this would also need to apply to invoking an XSLT transform from JS with an HTML input document.
Comment 21 Maciej Stachowiak 2009-08-28 07:53:36 UTC
(In reply to comment #20)
> If I understand comment 18 right, the only change it makes to the current HTML
> 5 spec is flipping the detail that I called the only degree of freedom in
> comment 13 the other way round. That is, compared to the current HTML 5 spec,
> the difference is that a prefixless name expression wouldn't match a
> no-namespace element node in a tree flagged as HTML.
> 
> I'm willing to pursue a patch with this change in Gecko if WebKit devs are also
> willing to change WebKit likewise.
> 

I believe we'd be willing to change this in WebKit.

Comment 22 Ian 'Hixie' Hickson 2009-09-07 10:30:20 UTC
I added the new condition. I hope that's what was intended. If it wasn't please reopen the bug with the exact text changes you would like.
Comment 23 Michael[tm] Smith 2009-09-09 03:46:42 UTC
(In reply to comment #22)
> I added the new condition. I hope that's what was intended. If it wasn't please
> reopen the bug with the exact text changes you would like.

For the record, the diff for the change is here:

http://html5.org/tools/web-apps-tracker?from=3764&to=3765

The newly added text reads:

[[
Irrespective of the requirements defined in XPath 1.0, a name expression must not evaluate to matching a node when the following conditions are all met:

  - The name expression has no namespace.
  - The expression is being tested against an element node.
  - The element is in no namespace.
  - The element's document is an HTML document.
]]
Comment 24 Jonathan Robie 2009-09-09 11:48:15 UTC
The latest change, and the associated diffs, add more special purpose semantics to the processing of HTML using XPath. That does not address the bug.

XPath defines what name expressions do. They should work the same way for HTML as for XML.
Comment 25 Henri Sivonen 2009-09-09 11:54:42 UTC
(In reply to comment #24)
> The latest change, and the associated diffs, add more special purpose semantics
> to the processing of HTML using XPath. That does not address the bug.
> 
> XPath defines what name expressions do. They should work the same way for HTML
> as for XML.

How does the effect of the current spec text differ from the effect of what you suggested in comment 18? If it differs in any way, my comment 20 is based on a misunderstanding.
Comment 26 Jonathan Robie 2009-09-09 12:18:35 UTC
First off, I believe that our Working Groups have agreed to meet to discuss this issue. I suggest that we not resolve the issue before that discussion.

The suggestion in comment #18 is that you use the XPath 1.0 semantics as defined in that specification, and be clear about the default namespace:

> In XPath 1.0, if a NameTest has no prefix, 
> then the namespace URI is null. In the context 
> of document.evaluate(), HTML5 specifies a different 
> default element namespace: if a NameTest has no prefix 
> and the principal node type of the NameTest is element, 
> the namespace URI is http://www.w3.org/1999/xhtml; if a
> NameTest has no prefix and the principal node type of the 
> NameTest is not element, the namespace URI is null.


What I see in the current draft gives match conditions in addition to those defined in XPath 1.0, plus match conditions defined in XPath 1.0 that do not apply, and says that it is willfully violating the XPath specification.

> In addition to the cases where a name expression would match a
> node per XPath 1.0, a name expression must evaluate to matching a
> node when all the following conditions are also met:
> 
>     * The name expression has no namespace.
>     * The name expression has local name that is a match for local.
>     * The expression is being tested against an element node.
>     * The element has local name local.
>     * The element is in the HTML namespace.
>     * The element's document is an HTML document.
> 
> Irrespective of the requirements defined in XPath 1.0, a name
> expression must not evaluate to matching a node when the
> following conditions are all met:
> 
>     * The name expression has no namespace.
>     * The expression is being tested against an element node.
>     * The element is in no namespace.
>     * The element's document is an HTML document.
> 
> These requirements are a willful violation of the XPath 1.0
> specification, motivated by desire to have implementations be
> compatible with legacy content while still supporting the changes
> that this specification introduces to HTML regarding which
> namespace is used for HTML elements. [XPATH10]
Comment 27 Jonathan Robie 2009-09-09 12:59:50 UTC
(In reply to comment #25)

> How does the effect of the current spec text differ from the effect of what you
> suggested in comment 18? If it differs in any way, my comment 20 is based on a
> misunderstanding.

I believe the effect is the same - but it is very difficult for the reader to determine whether this is true or not, because the text currently adds some cases to what XPath 1.0 specifies, then makes some restrictions that XPath 1.0 does not contain.

Perhaps the two WGs, when we meet, could usefully discuss whether there are any differences in the end result.

But I also think that the language of the HTML5 specification should read more like comment #18 than like what it currently contains, i.e. it should conform to what XPath 1.0 says except for the default element namespace, rather than make two changes to how name tests work, add a note saying that it willfully violates the XPath 1.0 specification, and hope that the reader can realize that, if you think about it hard enough, the only real difference is the default element namespace.
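
Whether the two formulations agree can be checked by brute force on a small model. The following is a rough sketch, not a proof: it covers only element nodes in HTML documents, ignores wildcards and any case handling of local names, and uses None for "no namespace". The function names are hypothetical and introduced only for this illustration.

```python
from itertools import product
from typing import Optional

XHTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"


def xpath10_match(expr_ns: Optional[str], expr_local: str,
                  node_ns: Optional[str], node_local: str) -> bool:
    """Plain XPath 1.0: expanded names must be equal."""
    return expr_ns == node_ns and expr_local == node_local


def draft_match(expr_ns: Optional[str], expr_local: str,
                node_ns: Optional[str], node_local: str) -> bool:
    """The delta wording quoted in comment 26 (element node, HTML document)."""
    if expr_ns is None and node_ns is None:
        return False  # the "must not evaluate to matching" clause
    if expr_ns is None and node_ns == XHTML_NS and expr_local == node_local:
        return True  # the additional match clause
    return xpath10_match(expr_ns, expr_local, node_ns, node_local)


def comment18_match(expr_ns: Optional[str], expr_local: str,
                    node_ns: Optional[str], node_local: str) -> bool:
    """The comment 18 wording: unprefixed element tests default to XHTML."""
    effective_ns = XHTML_NS if expr_ns is None else expr_ns
    return xpath10_match(effective_ns, expr_local, node_ns, node_local)


namespaces = [None, XHTML_NS, SVG_NS]
local_names = ["p", "a"]
cases = list(product(namespaces, local_names, namespaces, local_names))
disagreements = [c for c in cases if draft_match(*c) != comment18_match(*c)]

print(len(cases), len(disagreements))
```

Within these limits the two rules give the same answer for every combination. Note that the comment 18 wording is scoped to document.evaluate() rather than to HTML documents, so the comparison says nothing about XML documents evaluated through the same API.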

Comment 28 Ian 'Hixie' Hickson 2009-09-09 23:38:12 UTC
We're not going to delay the spec until such time as the working groups can meet. The HTML working group is literally hundreds of people, so all we'd ever be able to do is have a subset of the group meet, like we're doing at the TPAC. Plus, such a group from the HTMLWG would by charter be unable to make any decisions, anyway, since we have to make decisions in a way that allows asynchronous participation.

If you don't want the spec to say what it says now, could you be more precise about what you would like it to say, and how that differs from what it says?
Comment 29 Henri Sivonen 2009-09-10 07:50:23 UTC
(In reply to comment #28)
> We're not going to delay the spec until such time as the working groups can
> meet. The HTML working group is literally hundreds of people, so all we'd
> ever be able to do is have a subset of the group meet, like we're doing at the
> TPAC. Plus, such a group from the HTMLWG would by charter be unable to make any
> decisions, anyway, since we have to make decisions in a way that allows
> asynchronous participation.

I won't be at TPAC.

(In reply to comment #27)
> (In reply to comment #25)
> 
> > How does the effect of the current spec text differ from the effect of what you
> > suggested in comment 18? If it differs in any way, my comment 20 is based on a
> > misunderstanding.
> 
> I believe the effect is the same 

Great!

> but it is very difficult for the reader to
> determine whether this is true or not, because the text currently adds some
> cases to what XPath 1.0 specifies, then makes some restrictions that XPath 1.0
> does not contain.

Isn't that what any delta spec does?

> But I also think that the language of the HTML5 specification should read more
> like comment #18 than like what it currently contains, i.e. it should conform
> to what XPath 1.0 says except for the default element namespace, rather than
> make two changes to how name tests work, add a note saying that it willfully
> violates the XPath 1.0 specification, and hope that the reader can realize
> that if you think about it hard enough, the only real difference is the default
> element namespace.

I disagree. I think the way the spec is currently worded requires the least amount of thought on the part of the implementor who is hacking the delta to an existing XPath 1.0 implementation. After all, what's in the spec is the implementation delta that is needed (with or without the edit from rev 3765). If even you can't immediately see if your wording matches this delta, your wording isn't obvious enough.

Invoking the concept of "default namespace" doesn't make sense, because it is an XPath 2.0 concept and HTML5 is specifying a delta on top of XPath 1.0.
Comment 30 Jonathan Robie 2009-09-10 13:43:02 UTC
(In reply to comment #28)
> We're not going to delay the spec until such time as the working groups can
> meet — the HTML working group is literally hundreds of people, so all we'd
> ever be able to do is have a subset of the group meet, like we're doing at the
> TPAC. Plus, such a group from the HTMLWG would by charter be unable to make any
> decisions, anyway, since we have to make decisions in a way that allows
> asynchronous participation.

Asynchronous participation is fine with me - from my email, it looks like our chair proposed to have some members of your WG join our WGs' telcon on the 15th (Maciej Stachowiak also pointed out that this needs to be resolved before TPAC).

Currently, the chairs of our two working groups are communicating about this, and it's pretty clear that our WG does not consider this resolved yet. I suggest we leave it open until we reach agreement.

> If you don't want the spec to say what it says now, could you be more precise
> about what you would like it to say, and how that differs from what it says?

Comment #18 was my attempt to do this. It is not an official response from the XML Query WG, merely wording that I thought solved the problem.
Comment 31 Jonathan Robie 2009-09-10 14:54:48 UTC
(In reply to comment #29)

> Isn't that what any delta spec does?

HTML5 should not be a delta spec for XPath 1.0. If it were, that would be forking XPath.

> I disagree. I think the way the spec is currently worded requires the least
> amount of thought on the behalf of the implementor who is hacking the delta to
> an existing XPath 1.0 implementation. After all, what's in the spec is the
> implementation delta that is needed (with or without the edit from rev 3765).
> If even you can't immediately see if your wording matches this delta, your
> wording isn't obvious enough.
> 
> Invoking the concept of "default namespace" doesn't make sense, because it is
> an XPath 2.0 concept and HTML5 is specifying a delta on top of XPath 1.0.

The advantage of invoking the concept of a default element namespace is that XPath 2.0 - which is the successor of XPath 1.0, not a private fork - already uses this concept to provide the functionality you want to provide using a different concept.

I think there are some issues with your current wording, and it is easier to adopt wording like what I suggested in comment 18 than to fix this - let me try to explain.

* The name expression has no namespace.

There is no such thing as a "name expression"; you mean a NameTest. I think you mean to say "The NameTest has no prefix".

* The name expression has local name that is a match for local.

I think you mean to say that the local part of the NameTest matches the local name 'local'.

* The expression is being tested against an element node.

I would describe this in terms of what the NameTest specifies - the default element namespace is used for an unprefixed QName appearing in a position where an element name is expected.


If I make those changes to both bullet lists in your spec, then the first bullet list says that unprefixed NameTests for elements in an HTML document must match elements in the HTML namespace, and the second bullet list says that unprefixed NameTests for elements in an HTML document must not match elements that have no namespace.

And in that case, this is equivalent to my language in comment #18, except that I forgot to mention this is true only for HTML documents:

"In XPath 1.0, if a NameTest has no prefix, then the namespace URI is null. In the context of document.evaluate(), HTML5 specifies a default element namespace <add>for HTML documents</add>: if a NameTest has no prefix and the principal node type of the NameTest is element, the namespace URI is http://www.w3.org/1999/xhtml; if a NameTest has no prefix and the principal node type of the NameTest is not element, the namespace URI is null."

If you want to be clear that default element namespaces were added in XPath 2.0, you could state that in a NOTE.
Comment 32 Ian 'Hixie' Hickson 2009-09-11 00:19:36 UTC
"NameTest" appears to be nothing but a token in the XPath grammar, so I don't understand what it means to talk about it matching elements or anything like that. 
Also, it's hard to determine, out of context, if your proposed text is stating a fact that can be derived from other requirements, or if it is intended to be introducing new requirements.

How about this text:

A node test consisting of a QName with no prefix (i.e. that matches the UnprefixedName production) must be treated as if it instead had the namespace URI equal to HTML namespace when the node is an element whose document is an HTML document. [XPATH10] [XMLNS]
Comment 33 Jonathan Robie 2009-09-11 13:22:04 UTC
(In reply to comment #32)
> "NameTest" appears to be nothing but a token in the XPath grammar, so I don't
> understand what it means to talk about it matching elements or anything like
> that. 
> Also, it's hard to determine, out of context, if your proposed text is stating
> a fact that can be derived from other requirements, or if it is intended to be
> introducing new requirements.
> 
> How about this text:
> 
> A node test consisting of a QName with no prefix (i.e. that matches the
> UnprefixedName production) must be treated as if it instead had the namespace
> URI equal to HTML namespace when the node is an element whose document is an
> HTML document. [XPATH10] [XMLNS]

This is a *lot* closer. I would change "must be treated as if it instead had" to "has", and also make it clear that it is the path expression, and not the document, that determines whether the node test is testing an element name.

Here's my best shot at this:

"A node test consisting of a QName with no prefix (i.e. that matches the
UnprefixedName production) 
has the namespace URI http://www.w3.org/1999/xhtml (the HTML namespace)
when the node test occurs in a position where an element or type name is expected
and the path expression is applied to a document that is an HTML document. [XPATH10] [XMLNS]"

I would also change the note, since this is no longer such a willful violation:

Note: This is equivalent to adding the default element namespace feature of XPath 2.0 to XPath 1.0, and using the HTML namespace as the default element namespace for HTML documents. It is motivated by the desire to have implementations be compatible with legacy HTML content while still supporting the changes that this specification introduces to HTML regarding the namespace used for HTML elements, and by the desire to use XPath 1.0 rather than XPath 2.0.
Comment 34 Ian 'Hixie' Hickson 2009-09-11 22:51:43 UTC
> This is a *lot* closer. I would change "must be treated as if it instead had"
> to "has"

It has to say "must" otherwise there's no normative conformance criteria, and it becomes impossible to determine if it's a statement trying to modify XPath, or a statement trying to describe the results of other conformance criteria (and failing).


> and also make it clear that it is the path expression, and not the
> document, that determines whether the node test is testing an element name.

Surely it's the user agent that determines whether the node test is testing an element name?

I used "the node" because that is what XPath 1.0 section 2.3 Node Tests does — it says "A node test that is a QName is true if and only if the type of the node [...]" where "the node" has no referent.

Is there some other term I can use?


> Here's my best shot at this:
> 
> [...] namespace URI http://www.w3.org/1999/xhtml (the HTML namespace)

HTML5 style is to refer to the namespace as "the HTML namespace" with a hyperlink to its definition, not to repeat the namespace wherever it occurs.


> when the node test occurs in a position where an element or type name is
> expected

This doesn't appear to use XPath 1.0 terminology. Do you mean "when the node test's principal node type is element"? If so, isn't this redundant with saying that the condition only applies when "the node" is an element?


> and the path expression is applied to a document that is an HTML document.

As far as I can tell, this would miss nodes that are outside of the document but whose owner document is an HTML document. It would also fail in the case where a node is in a different document than its owner document (e.g. as in an XBL shadow tree), though it may be that XPath doesn't support that today anyway.


> I would also change the note, since this is no longer such a willful violation:

It's exactly as much of a willful violation as before. We haven't actually changed the implementation requirements at all relative to the text the spec had last week, we've just rephrased it in a different way. It's still requiring that implementations break XPath 1.0 requirements.

Please let me know if you still think the spec's current text (quoted above) is inadequate.
Comment 35 Jonathan Robie 2009-09-14 22:04:41 UTC
(In reply to comment #34)

> It has to say "must" otherwise there's no normative conformance criteria, and
> it becomes impossible to determine if it's a statement trying to modify XPath,
> or a statement trying to describe the results of other conformance criteria
> (and failing).

OK. I was trying to simplify the wording; you're right about keeping "must". The rest of this part is editorial.
 
> > and also make it clear that it is the path expression, and not the
> > document, that determines whether the node test is testing an element name.
> 
> Surely it's the user agent that determines whether the node test is testing an
> element name?
> 
> I used "the node" because that is what XPath 1.0 section 2.3 Node Tests does
> — it says "A node test that is a QName is true if and only if the type of the
> node [...]" where "the node" has no referent.
> 
> Is there some other term I can use?

You can look at a path expression and tell whether a node test will test elements or not; you don't need a document to do that. The user agent may apply a path expression to a document, but you don't need to do so to determine what the path expression means.

You suggested "when the node test's principal node type is element" - that works fine.

> > Here's my best shot at this:
> > 
> > [...] namespace URI http://www.w3.org/1999/xhtml (the HTML namespace)
> 
> HTML5 style is to refer to the namespace as "the HTML namespace" with a
> hyperlink to its definition, not to repeat the namespace wherever it occurs.

OK.

> > when the node test occurs in a position where an element or type name is
> > expected
> 
> This doesn't appear to use XPath 1.0 terminology. Do you mean "when the node
> test's principal node type is element"? If so, isn't this redundant with saying
> that the condition only applies when "the node" is an element?

"when the node test's principal node type is element" is fine. You don't have "the node" until the path expression is applied to a document - and even then, it might not be applied to a given node. I don't think it's a good idea to define this in terms of the document to which it is applied, and that's certainly not how XPath does it.

If you look at the XPath 1.0 spec, you'll see that principal node type is defined in terms of the expression (with respect to its axes), not the document that is tested.
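To make that concrete, here is a minimal sketch of how the principal node type follows from the axis alone. The function name and the use of plain strings are illustrative assumptions; only the axis-to-type mapping comes from XPath 1.0 (attribute axis gives attribute, namespace axis gives namespace, every other axis gives element).

```python
# Illustrative sketch only: the function name and string-based
# representation are assumptions, not part of any specification.
def principal_node_type(axis: str) -> str:
    """Return the XPath 1.0 principal node type for an axis name."""
    if axis == "attribute":
        return "attribute"
    if axis == "namespace":
        return "namespace"
    # All remaining axes (child, descendant, parent, self, ...) use element.
    return "element"
```

Note that no document or node is needed as input: the answer depends only on the step in the path expression.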

> > and the path expression is applied to a document that is an HTML document.
> 
> As far as I can tell, this would miss nodes that are outside of the document
> but whose owner document is an HTML document. It would also fail in the case
> where a node is in a different document than its owner document (e.g. as in an
> XBL shadow tree), though it may be that XPath doesn't support that today
> anyway.

OK - leaving that part out is fine with me, and I like the semantics you propose better.

> > I would also change the note, since this is no longer such a willful violation:
> 
> It's exactly as much of a willful violation as before. We haven't actually
> changed the implementation requirements at all relative to the text the spec
> had last week, we've just rephrased it in a different way. It's still requiring
> that implementations break XPath 1.0 requirements.
> 
> Please let me know if you still think the spec's current text (quoted above)
> is inadequate.

The NOTE is purely editorial, do what you wish with it.

I'll bring this up on the XML Query WG / XSL WG telcon tomorrow morning at 11:00 - 1:00 EST, and I'm pretty sure you would be welcome to join. I'll also be available on IRC on the #xquery channel during working hours (again EST) if it's helpful to chat.

Comment 36 Ian 'Hixie' Hickson 2009-09-22 07:17:39 UTC
> > This doesn't appear to use XPath 1.0 terminology. Do you mean "when the node
> > test's principal node type is element"? If so, isn't this redundant with saying
> > that the condition only applies when "the node" is an element?
> 
> "when the node test's principal node type is element" is fine. You don't have
> "the node" until the path expression is applied to a document - and even then,
> it might not be applied to a given node. I don't think it's a good idea to
> define this in terms of the document to which it is applied, and that's
> certainly not how XPath does it.

We don't have a choice here as far as I can see — surely we don't want these changes applying outside of HTML documents.

I'm marking this fixed since it appears the text is adequate now; please feel free to reopen it if there is still a problem.
Comment 37 Jonathan Robie 2009-09-23 16:06:07 UTC
The XML Query and XSL Working Groups discussed this in our telcon yesterday, and we suggest that you treat this as a modification of the following language in XPath 1.0:

<oldText>
A QName in the node test is expanded into an expanded-name using the namespace declarations from the expression context. This is the same way expansion is done for element type names in start and end-tags except that the default namespace declared with xmlns is not used: if the QName does not have a prefix, then the namespace URI is null (this is the same way attribute names are expanded). It is an error if the QName has a prefix for which there is no namespace declaration in the expression context.
</oldText>

Here is the text we suggest that you use. 

<newText>
A QName in the node test is expanded into an expanded-name using the namespace declarations from the expression context. If the QName has a prefix, then there must be a namespace declaration for this prefix in the expression context, and the corresponding namespace URI is the one that is associated with this prefix. It is an error if the QName has a prefix for which there is no namespace declaration in the expression context.

If the QName has no prefix and the principal node type of the axis is element, then the default element namespace is used. Otherwise if the QName has no prefix, the namespace URI is null. The default element namespace is a member of the context for the XPath expression. The value of the default element namespace when executing an XPath expression through the DOM3 XPath API is determined in the following way:

(1) If the context node is from an HTML DOM, the default element namespace is
"http://www.w3.org/1999/xhtml".

(2) Otherwise, the default element namespace URI is null.

Note: This is equivalent to adding the default element namespace feature of XPath 2.0 to XPath 1.0, and using the HTML namespace as the default element namespace for HTML documents. It is motivated by the desire to have implementations be compatible with legacy HTML content while still supporting the changes that this specification introduces to HTML regarding the namespace used for HTML elements, and by the desire to use XPath 1.0 rather than XPath 2.0.
</newText>
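As an illustration only (not part of the suggested text), the expansion rule above could be sketched roughly as follows. The function names, the boolean `context_node_is_html` flag, and the dictionary of prefix bindings are hypothetical stand-ins for the expression context; a real implementation would take these from the DOM and the namespace resolver.

```python
from typing import Dict, Optional, Tuple

HTML_NAMESPACE = "http://www.w3.org/1999/xhtml"


def default_element_namespace(context_node_is_html: bool) -> Optional[str]:
    """Rules (1) and (2): HTML namespace for an HTML DOM, otherwise null."""
    return HTML_NAMESPACE if context_node_is_html else None


def expand_qname(
    qname: str,
    axis: str,
    namespaces: Dict[str, str],
    context_node_is_html: bool,
) -> Tuple[Optional[str], str]:
    """Expand a node-test QName into (namespace URI, local name)."""
    prefix, sep, local = qname.partition(":")
    if sep:
        # Prefixed QName: the prefix must be declared in the expression context.
        if prefix not in namespaces:
            raise ValueError(f"no namespace declaration for prefix {prefix!r}")
        return (namespaces[prefix], local)
    # Unprefixed QName: the principal node type is element for every axis
    # except the attribute and namespace axes.
    if axis not in ("attribute", "namespace"):
        return (default_element_namespace(context_node_is_html), qname)
    return (None, qname)
```

Under these assumptions, an unprefixed `p` on the child axis expands to the HTML namespace when the context node is from an HTML DOM and to a null namespace otherwise, while an unprefixed attribute name such as `href` always has a null namespace.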

Jonathan
(on behalf of the XML Query and XSL Working Groups)

 
Comment 38 Ian 'Hixie' Hickson 2009-09-28 19:26:41 UTC
Ok, I've checked in the above. Thanks!
Comment 39 contributor 2009-09-28 19:26:49 UTC
Checked in as WHATWG revision r4007.
Check-in comment: Rewrite how we patch XPath 1.0 for HTML5.
http://html5.org/tools/web-apps-tracker?from=4006&to=4007
Comment 40 Michael[tm] Smith 2009-10-05 04:36:09 UTC
Jonathan,

Can you please review the current text in the spec and confirm whether you (and the XQuery and XSL WGs) are satisfied with it or if you want to request any further changes to it. The section is here:

http://dev.w3.org/html5/spec/embedded-content-0.html#interactions-with-xpath-and-xslt 
Comment 41 Maciej Stachowiak 2010-03-14 14:48:20 UTC
This bug predates the HTML Working Group Decision Policy.

If you are satisfied with the resolution of this bug, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
  http://dev.w3.org/html5/decision-policy/decision-policy.html

This bug is now being moved to VERIFIED. Please respond within two weeks. If this bug is not closed, reopened or escalated within two weeks, it may be marked as NoReply and will no longer be considered a pending comment.
Comment 42 Jonathan Robie 2010-04-19 13:16:37 UTC
I am happy with these changes.