This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 6777 - In HTML documents, no-namespace expression must match http://www.w3.org/1999/xhtml nodes
Status: RESOLVED INVALID
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: XPath 2.0
Version: Recommendation
Hardware: All All
Importance: P2 enhancement
Target Milestone: ---
Assignee: Jonathan Robie
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
 
Reported: 2009-04-06 08:31 UTC by Henri Sivonen
Modified: 2009-06-26 14:55 UTC

Description Henri Sivonen 2009-04-06 08:31:20 UTC
HTML 5 harmonizes the DOM representation of HTML and XHTML so that HTML element
nodes parsed from text/html streams are assigned into the
http://www.w3.org/1999/xhtml namespace. 

http://www.w3.org/TR/html5/tree-construction.html#creating-and-inserting-elements
http://www.w3.org/TR/html5/tree-construction.html#namespaces

(CSS has previously harmonized selector
namespace-sensitivity behavior so that HTML nodes are treated as if they were
in the http://www.w3.org/1999/xhtml namespace.)

Existing content expects to be able to use document.evaluate() to run XPath expressions on a DOM tree parsed from text/html in such a way that name expressions in no namespace match HTML elements that have the local name specified in the expression. (Note that existing browsers already store the local name in lower case internally, and HTML 5 exposes this through the DOM localName property as well, so the local name is a non-issue here.)

Thus, to support existing content, implementations must during XPath evaluation consider a name expression to match in the following case, in addition to the cases where it is already specified to match:
 * The name expression has no namespace.
AND
 * The name expression has local name l.
AND
 * The expression is being tested against an element node.
AND
 * The element node has local name l.
AND
 * The element node has namespace http://www.w3.org/1999/xhtml
AND
 * The owner document of the element node is an "HTML document", where "HTML document" is a term of art defined in HTML 5:
http://www.w3.org/TR/html5/dom.html#html-documents
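
A minimal Python sketch of the rule listed above. The ElementNode class, its in_html_document flag, and the function name are hypothetical stand-ins for illustration; they do not come from any browser code base or specification.

```python
from dataclasses import dataclass
from typing import Optional

XHTML_NS = "http://www.w3.org/1999/xhtml"


@dataclass(frozen=True)
class ElementNode:
    namespace: Optional[str]  # None stands for "no namespace"
    local_name: str
    in_html_document: bool    # hypothetical flag: owner document is an "HTML document"


def name_test_matches(test_ns: Optional[str], test_local: str, node: ElementNode) -> bool:
    """Return True if a name expression matches an element node."""
    # Existing XPath 1.0 behavior: the expanded names must be equal.
    if node.namespace == test_ns and node.local_name == test_local:
        return True
    # Proposed additional case: a no-namespace expression also matches an
    # XHTML-namespace element whose owner document is an HTML document.
    return (
        test_ns is None
        and node.local_name == test_local
        and node.namespace == XHTML_NS
        and node.in_html_document
    )


html_p = ElementNode(XHTML_NS, "p", in_html_document=True)
xhtml_p = ElementNode(XHTML_NS, "p", in_html_document=False)
print(name_test_matches(None, "p", html_p))   # True
print(name_test_matches(None, "p", xhtml_p))  # False
```

Because the extra case is added alongside the existing rule, a no-namespace element is still matched by a no-namespace expression, and a namespaced expression still matches XHTML elements in either kind of document.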

I think the XPath specification should specify this behavior.

The relevant WebKit changeset is
http://trac.webkit.org/changeset/19564

The relevant Gecko bug is
https://bugzilla.mozilla.org/show_bug.cgi?id=468708
Comment 1 Michael Kay 2009-04-06 08:56:39 UTC
Reclassified as an enhancement request.

Personal response:

(1) Could you please explain why the XPath 2.0 facility of defining a default element namespace for unprefixed QNames in an XPath expression is not adequate to meet the requirement?

(2) XPath works on data conforming to the XDM data model. If you want to define a mapping from the concept of "HTML document" defined in HTML5 to the concept of an XDM document, then you are perfectly free to do so, and implementors are free to implement it. I would think that such a mapping belongs architecturally in the HTML5 specification rather than in the XDM specification.

Michael Kay
Comment 2 Henri Sivonen 2009-04-06 09:40:20 UTC
(In reply to comment #1)
> (1) Could you please explain why the XPath 2.0 facility of defining a default
> element namespace for unprefixed QNames in an XPath expression is not adequate
> to meet the requirement?

The important thing is that specs and/or implementations need to change in such a way that existing content that calls document.evaluate() doesn't need to change.

Changing document.evaluate() to supply a different namespace mapping context to the XPath expression compiler would entail tampering with the values returned by an arbitrarily programmable XPathNSResolver object.
http://www.w3.org/TR/DOM-Level-3-XPath/xpath.html#XPathNSResolver

CCing Alexey Proskuryakov for comment about the implementation rationale in WebKit. I sought to match the WebKit behavior exactly in Gecko and didn't consider remapping the return values of XPathNSResolver instead.

> (2) XPath works on data conforming to the XDM data model. If you want to define
> a mapping from the concept of "HTML document" defined in HTML5 to the concept
> of an XDM document, then you are perfectly free to do so, and implementors are
> free to implement it. I would think that such a mapping belongs architecturally
> in the HTML5 specification rather than in the XDM specification.

HTML 5 defines its data model in terms of the DOM. DOM Level 3 defines a mapping to the Infoset. XDM defines construction from an Infoset. Thus, an HTML element in an HTML document is in the http://www.w3.org/1999/xhtml namespace for the purposes of XDM, and defining this differently would be architecturally unsound if the direct HTML 5 -> XDM mapping and the indirect HTML 5 -> DOM -> Infoset -> XDM mapping produced different XDM results.
Comment 3 Henri Sivonen 2009-04-06 09:45:19 UTC
(In reply to comment #2)
> I sought to match the WebKit behavior exactly in Gecko and didn't
> consider remapping the return values of XPathNSResolver instead.

Note that expressions that will be matched against attribute names would still need to treat no-namespace names as no-namespace names, since the attributes on HTML elements are in no namespace.
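
The element/attribute asymmetry can be seen with any namespace-aware XML parser. The sketch below uses Python's standard xml.etree.ElementTree on an XHTML fragment (the XML serialization, not a text/html parse); the sample markup is made up for illustration.

```python
import xml.etree.ElementTree as ET

# The default namespace declaration applies to the element name only;
# the unprefixed "class" attribute stays in no namespace.
root = ET.fromstring(
    '<p xmlns="http://www.w3.org/1999/xhtml" class="note">text</p>'
)

element_name = root.tag
attribute_names = list(root.attrib)

print(element_name)     # {http://www.w3.org/1999/xhtml}p
print(attribute_names)  # ['class']
```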
Comment 4 Michael Kay 2009-04-06 09:47:54 UTC
>The important thing is that specs and/or implementations need to change in such a way that existing content that call document.evaluate() doesn't need to change.

Could I get this clear? You are asking for W3C specifications to change in such a way that they match the behaviour of a Microsoft-defined API?
Comment 5 Henri Sivonen 2009-04-06 11:19:56 UTC
(In reply to comment #4)
> Could I get this clear? You are asking for W3C specifications to change in such
> a way that they match the behaviour of a Microsoft-defined API?

I don't know of any Microsoft origins of document.evaluate(). It seems that Netscape was the first to expose it to HTML documents, and Netscape also contributed the editor for the relevant DOM Level 3 spec.

The origin doesn't matter, though. Gecko, WebKit and Opera support the API and expose it to Web content, so breaking existing content that uses the API would not be good. It is easier to change specs than to change existing content.

I chatted with Alexey Proskuryakov about this on IRC. The following points came up:
 * It is desirable to put the change in the XPath evaluation phase, because this way the XPath expression compiler doesn't need to know ahead of evaluation time whether a given name expression will be tested against an element or against an attribute.

 * This way, it is still possible to match a no-namespace element node should an author want to actually match one. (This is mostly theoretical.)

 * DOM Level 3 XPath is defined in terms of XPath 1.0 and browsers implement XPath 1.0. XPath 2.0 is incompatible with XPath 1.0, so browsers can't implement XPath 2.0 for the document.evaluate() API. Therefore, XPath 2.1 probably isn't the right place to specify this for the purposes of the existing API unless XPath 2.1 restores compatibility with existing XPath 1.0 content.

I'll follow up with the Web Apps WG that now manages the maintenance of DOM specs.
Comment 6 Olli Pettay 2009-04-06 12:35:14 UTC
(In reply to comment #2) 
> The important thing is that specs and/or implementations need to change in such
> a way that existing content that call document.evaluate() doesn't need to
> change.

Note, there is no need to change XPath handling, if HTML5 is changed to
behave like some of the current browser engines do: HTML elements are in
null namespace. (I know you don't like this option.)
Comment 7 Henri Sivonen 2009-04-06 13:02:51 UTC
(In reply to comment #6)
> Note, there is no need to change XPath handling, if HTML5 is changed to
> behave like some of the current browser engines do: HTML elements are in
> null namespace. (I know you don't like this option.)

This is the only case where the harmonization makes implementation less elegant than it was before. In all other cases, the harmonization improves implementation elegance by removing dual code paths for HTML and XHTML.
Comment 8 Elliotte Harold 2009-04-06 14:09:08 UTC
It is unreasonable to expect the XPath specification to request special treatment for one class of documents. You are essentially proposing to fork the XPath specification so that different rules apply depending on the input document and the processor. The same result should entail whether an XPath expression is evaluated by a DOM inside a browser or by an external processor outside the browser not using the DOM at all. You are proposing that these two cases would produce different results because one would understand the case of an owner document and one would not.


This is an ugly idea that would significantly increase the complexity and learning curve for XPath, as well as break much existing software that is designed to  process HTML documents using the current well-defined XPath data model. 

The correct solution is simple: require namespace well-formedness for HTML 5 documents. Until the spec takes that simple step, you're going to find yourself asking for one special case after another. This is not the first and it will not be the last. 

Almost everything the W3C has done for the last 12 years has been predicated on the notion of namespace well-formedness. You may be right that this was the wrong decision, and that we need to throw out 12 years of deployed tools and technologies and start over. However, don't expect that we can retrofit your new model onto the existing stack. If HTML 5 won't accept namespace well-formedness, then it's going to have to build its own replacements for XPath, XQuery, etc. The existing ones just won't work. 
Comment 9 Michael Kay 2009-04-06 14:11:15 UTC
>breaking existing content that uses the API would not be good

OK, so we're talking about the DOM level 3 API to XPath 1.0. You're changing the DOM representation of the content so that the elements will be in the XHTML namespace instead of the null namespace, and you want to change the semantics of XPath 1.0 so that it behaves as if you had not made this fundamental change. Is that right?
Comment 10 Jonathan Robie 2009-04-06 14:44:21 UTC
> The important thing is that specs and/or implementations need to change in such
> a way that existing content that call document.evaluate() doesn't need to
> change.

I think you want your DOM3 implementation to use http://www.w3.org/1999/xhtml as the default element namespace.

See http://www.w3.org/TR/xpath20/#dt-def-elemtype-ns.

I don't think this requires a change to the XPath specification.

Jonathan
Comment 11 Henri Sivonen 2009-04-06 15:04:04 UTC
(In reply to comment #8)
> It is unreasonable to expect the XPath specification to request special
> treatment for one class of documents. 

text/html is a notable class of documents.

> You are essentially proposing to fork the XPath specification so 
> that different rules apply depending on the input document and 
> the processor.

I'm not proposing different rules by processor. I'm proposing a different rule to apply depending on the HTMLness flag of the owner document. Non-browser XPath processors don't need such a flag and would, therefore, be unaffected.

> The same result should entail whether an XPath
> expression is evaluated by a DOM inside a browser or by an external processor
> outside the browser not using the DOM at all. You are proposing that these two
> cases would produce different results because one would understand the case of
> an owner document and one would not.

If a processor accepts XPath expressions from JavaScript programs that are out there on the Web, a no-namespace expression needs to match nodes parsed from text/html. If a non-DOM processor doesn't accept XPath expressions from existing JavaScript programs, it is unaffected.

This wouldn't be the first time that APIs are subtly different in the browser and in server-side Java, BTW. For example, DOM getAttribute method must return null in browsers when the attribute is missing. Returning the empty string like the spec says would Break the Web.

> This is an ugly idea that would significantly increase the complexity and
> learning curve for XPath, as well as break much existing software that is
> designed to  process HTML documents using the current well-defined XPath data
> model.

The point of the change is to reduce the differences between DOM/Infoset representations of equivalent text/html and application/xhtml+xml documents. This, for example, removes the need to make Selectors behave as if HTML elements in HTML documents were in the http://www.w3.org/1999/xhtml namespace, because with this change they simply are in that namespace without "as if". This should be considered an architectural win.

In fact, while implementing this change, this XPath issue was the only case where elegance was reduced instead of being increased. It's unfortunate that this then logically must be the case that people working primarily with XML notice.

> The correct solution is simple: require namespace well-formedness for HTML 5
> documents. 

Into which namespace would you assign HTML element nodes?

> Until the spec takes that simple step, you're going to find yourself
> asking for one special case after another. This is not the first and it will
> not be the last. 

What step, precisely, would you like HTML5 to take?

> Almost everything the W3C has done for the last 12 years has been predicated on
> the notion of namespace well-formedness. You may be right that this was the
> wrong decision, and that we need to throw out 12 years of deployed tools and
> technologies and start over. However, don't expect that we can retrofit your
> new model onto the existing stack.

The way I see it, HTML is what is being retrofitted to the namespace-enabled stack here.

> If HTML 5 won't accept namespace
> well-formedness, then it's going to have to build its own replacements for
> XPath, XQuery, etc. The existing ones just won't work. 

The whole point of making text/html and application/xhtml+xml DOMs / Infosets consistent is to reduce the number of special cases and to enable the use of the same above-DOM/Infoset technologies for both. 

Unfortunately, in the case of DOM Level 3 XPath API, existing content has been deployed prior to this harmonization. (Obviously, we'd have no issue here if HTML and XHTML had been namespace-wise consistent ever since DOM Level 2.)

(In reply to comment #9)
> >breaking existing content that uses the API would not be good
> 
> OK, so we're talking about the DOM level 3 API to XPath 1.0. You're changing
> the DOM representation of the content so that the elements will be in the XHTML
> namespace instead of the null namespace, and you want to change the semantics
> of XPath 1.0 so that it behaves as if you had not made this fundamental change.
> Is that right?

Looking backward on existing content, that is right. I'm now pursuing this in the context of the spec defining document.evaluate() instead of pursuing this in the context of the spec defining XPath itself.

Looking forward, the same XPath name expressions that use the http://www.w3.org/1999/xhtml namespace will work on both trees originating from text/html and application/xhtml+xml.
Comment 12 Henri Sivonen 2009-04-06 15:09:21 UTC
(In reply to comment #10)
> > The important thing is that specs and/or implementations need to change in such
> > a way that existing content that call document.evaluate() doesn't need to
> > change.
> 
> I think you want your DOM3 implementation to use http://www.w3.org/1999/xhtml
> as the default element namespace.
> 
> See http://www.w3.org/TR/xpath20/#dt-def-elemtype-ns.
> 
> I don't think this requires a change to the XPath specification.

Indeed, this would make sense for XPath 2.0. document.evaluate() is constrained to being compatible with XPath 1.0, though. When I filed this bug, I thought I should just get this noted in the latest version of XPath without realizing that document.evaluate() wouldn't be migrating to XPath 2.0 for other reasons.

What would be the best way to approximate the behavior of an XPath 2.0 processor with the http://www.w3.org/1999/xhtml namespace as its default namespace and with the Compatibility Mode set to true in a processor that is otherwise only an XPath 1.0 processor?
Comment 13 Elliotte Harold 2009-04-06 15:31:53 UTC
Henri,

Just to answer a couple of your points and questions, and then let's take it to private e-mail or whatwg or some such since a bug tracking system isn't the right place for a debate.

The specific step I would like to see happen is for HTML 5 to mandate namespace well-formedness and require draconian error handling. I am simply not convinced by the numerous arguments made against that simple step. 

However, I realize that this is not the direction the WhatWG is going to go. Fine. Maybe you're even right. In that case, I want the WhatWG to stop pretending that HTML is really XML and that it can use XML specs like XPath. It can't. They don't work with non-namespace well-formed documents. Please stop trying to pollute useful, relatively sensible specs like XPath 1.0 with ugly kludges to support one special case. 
Comment 14 Michael Kay 2009-04-06 15:35:03 UTC
>What would be the best way to approximate the behavior of an XPath 2.0 processor with the http://www.w3.org/1999/xhtml namespace as its default namespace and with the Compatibility Mode set to true in a processor that is otherwise only an XPath 1.0 processor?

You can't. XPath 1.0 says that an unprefixed QName in the XPath expression matches only elements that are in no namespace. That is, //p guarantees to return a set of elements having local-name()='p', namespace-uri()=''. That's a fundamental invariant of XPath 1.0 and can't be changed in a conformant implementation. If you want to change that invariant, then you are inventing a language that is not XPath 1.0.
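
The invariant can be illustrated with Python's standard xml.etree.ElementTree. ElementTree supports only a limited subset of XPath and is not a conforming XPath 1.0 processor, but its handling of unprefixed name tests behaves the same way in this case; the sample document and the prefix "h" are made up for illustration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

doc = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<body><p>one</p><p>two</p></body></html>"
)

# An unprefixed name test only matches elements in no namespace,
# so it finds nothing in a document whose elements are in the XHTML namespace.
unprefixed = doc.findall(".//p")

# Binding a prefix to the XHTML namespace makes the name test match.
prefixed = doc.findall(".//h:p", {"h": XHTML_NS})

print(len(unprefixed))  # 0
print(len(prefixed))    # 2
```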

It's of course high time that browsers and the DOM moved forward to XPath 2.0, in which case they could take advantage of new facilities in XPath 2.0, like the ability to define a default namespace for unprefixed element names. That still wouldn't achieve compatibility of course - the only way to ensure that the expression //p[namespace-uri()=''] continues to return the same result that it does now is to map your HTML5 DOM to an XDM instance (or the XPath 1.0 equivalent) in which the p elements are in no namespace. Since you're so concerned about compatibility, I can't see why you are so reluctant to do that.
Comment 15 Henri Sivonen 2009-04-06 15:44:36 UTC
(In reply to comment #14)
> That
> still wouldn't achieve compatibility of course - the only way to ensure that
> the expression //p[namespace-uri()=''] continues to return the same result that
> it does now is to map your HTML5 DOM to an XDM instance (or the XPath 1.0
> equivalent) in which the p elements are in no namespace. Since you're so
> concerned about compatibility, I can't see why you are so reluctant to do that.

That would address the backward-compatibility issue without addressing the forward-looking point of changing how HTML elements in the DOM are assigned to a namespace. The point of assigning HTML elements to the http://www.w3.org/1999/xhtml namespace is to make the data model representation consistent between the two serializations of HTML (text/html and application/xhtml+xml). So that in the future, authors can write XPath expressions with names in the http://www.w3.org/1999/xhtml namespace and those expressions work on DOMs regardless of whether the DOM came from text/html or application/xhtml+xml.

Comment 16 Jonathan Robie 2009-04-06 15:52:23 UTC
(In reply to comment #15)
> (In reply to comment #14)
> > That
> > still wouldn't achieve compatibility of course - the only way to ensure that
> > the expression //p[namespace-uri()=''] continues to return the same result that
> > it does now is to map your HTML5 DOM to an XDM instance (or the XPath 1.0
> > equivalent) in which the p elements are in no namespace. Since you're so
> > concerned about compatibility, I can't see why you are so reluctant to do that.
> 
> That would address the backward-compatibility issue without addressing the
> forward-looking point of changing how HTML elements in the DOM are assigned to
> a namespace. The point of assigning HTML elements to the
> http://www.w3.org/1999/xhtml namespace is to make the data model representation
> consistent between the two serializations of HTML (text/html and
> application/xhtml+xml). So that in the future, authors can write XPath
> expressions with names in the http://www.w3.org/1999/xhtml namespace and those
> expressions work on DOMs regardless of whether the DOM came from text/html or
> application/xhtml+xml.


Name tests do something very simple: they test to see whether the name specified in the name test is the same as the name of a node. If they don't match, the test says they are not the same.

I think the tools you have to work with have already been identified - you have control over the mapping to the data model, and you have control over the default element namespace for queries. 

I don't think redefining XPath to account for the way you are mapping things makes sense. It's a little like redefining the + operator if your accounts don't balance. These are simple, well-defined operators that are universal, and should not be adapted to particular use cases.

Jonathan
Comment 17 Michael Kay 2009-04-06 15:55:18 UTC
Sorry, but if you want the elements to be in a namespace, then I don't think you can persuade XPath 1.0 (or indeed XPath 2.0) to behave as if they aren't.
Comment 18 Henri Sivonen 2009-04-06 16:04:59 UTC
(In reply to comment #16)
> I think the tools you have to work with have already been identified - you have
> control over the mapping to the data model, and you have control over the
> default element namespace for queries. 

The data model approach would mean that you could never even in the future write unified XPath expressions for both text/html and application/xhtml+xml DOMs.

As for the default namespace, if there's a way how I could hack the XPath 2.0 default element namespace behavior into an XPath 1.0 processor, I'd be interested in exploring that approach. Requiring the XPath 1.0 processor to be replaced with an XPath 2.0 processor entirely is not a feasible first step.

How far away from approximating XPath 2.0 in the Compatibility Mode with http://www.w3.org/1999/xhtml as the default namespace would the result be if the check was:
If the document is an HTML document and a name expression is being compared against an element, behave as if "" namespace on the expression side were the "http://www.w3.org/1999/xhtml" namespace.
?
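
One way to gauge the distance is to compare this check with the additive rule from the original description. The Python sketch below uses hypothetical function names, None for "no namespace", and a plain boolean standing in for whether the owner document is an HTML document; it reflects one reading of the two rules, not any implementation.

```python
from typing import Optional

XHTML_NS = "http://www.w3.org/1999/xhtml"


def matches_additive(test_ns: Optional[str], test_local: str,
                     node_ns: Optional[str], node_local: str,
                     html_doc: bool) -> bool:
    """Original proposal: keep the normal rule and add one extra matching case."""
    if node_ns == test_ns and node_local == test_local:
        return True
    return (test_ns is None and node_local == test_local
            and node_ns == XHTML_NS and html_doc)


def matches_substituting(test_ns: Optional[str], test_local: str,
                         node_ns: Optional[str], node_local: str,
                         html_doc: bool) -> bool:
    """Check asked about here: treat no namespace on the expression side as XHTML."""
    effective_ns = XHTML_NS if (html_doc and test_ns is None) else test_ns
    return node_ns == effective_ns and node_local == test_local


# Both rules agree for an XHTML-namespace element in an HTML document.
print(matches_additive(None, "p", XHTML_NS, "p", True))      # True
print(matches_substituting(None, "p", XHTML_NS, "p", True))  # True

# They differ for a no-namespace element in an HTML document.
print(matches_additive(None, "p", None, "p", True))          # True
print(matches_substituting(None, "p", None, "p", True))      # False
```

Under this reading, the two rules differ only for no-namespace elements inside an HTML document, which the substituting check can no longer reach with an unprefixed name.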
Comment 19 Jonathan Robie 2009-04-06 17:39:37 UTC
(In reply to comment #18)
> (In reply to comment #16)
> > I think the tools you have to work with have already been identified - you have
> > control over the mapping to the data model, and you have control over the
> > default element namespace for queries. 
> 
> The data model approach would mean that you could never even in the future
> write unified XPath expressions for both text/html and application/xhtml+xml
> DOMs.

I don't understand this argument. You want us to treat these as though they have the same name. You have control over the XDM that is used for processing. You can give them the same name in the XDM.

If you want to say they sort of have the same name, I don't think XPath can help you, because we have no concept of sort of having the same name. If your design requires a data model in which names that are different are considered the same, I think you might want to explore whether that design can be changed.

> As for the default namespace, if there's a way how I could hack the XPath 2.0
> default element namespace behavior into an XPath 1.0 processor, I'd be
> interested in exploring that approach. Requiring the XPath 1.0 processor to be
> replaced with an XPath 2.0 processor entirely is not a feasible first step.
 
That's a much less radical change to the DOM than the change you are asking for in XPath. Changing the fact that a nametest matches a node that has the same name is not a feasible first step either.

> How far away from approximating XPath 2.0 in the Compatibility Mode with
> http://www.w3.org/1999/xhtml as the default namespace would the result be if
> the check was:
> If the document is an HTML document and a name expression is being compared
> against an element, behave as if "" namespace on the expression side were the
> "http://www.w3.org/1999/xhtml" namespace.
> ?

How far from the current definition of integer addition would it be to say that 2 + 2 = 5 if the integers are drawn from a particular domain?

Our operators are defined purely in terms of our data model. They never look to see where the data model came from - in fact, there's nothing in our data model to tell us where its data originally came from. That's how it should be.

I think you should treat the semantics of XQuery 1.0, XPath 1.0, and XPath 1.1 as a given; these are existing recommendations with many users. 

Jonathan
Comment 20 Henri Sivonen 2009-04-06 18:40:14 UTC
(In reply to comment #19)
> (In reply to comment #18)
> > (In reply to comment #16)
> > > I think the tools you have to work with have already been identified - you have
> > > control over the mapping to the data model, and you have control over the
> > > default element namespace for queries. 
> > 
> > The data model approach would mean that you could never even in the future
> > write unified XPath expressions for both text/html and application/xhtml+xml
> > DOMs.
> 
> I don't understand this argument. You want us to treat these as though they
> have the same name. You have control over the XDM that is used for processing.
> You can give them the same name in the XDM.

The thing is that HTML nodes need to match against two kinds of expressions:
1) Expressions from the past that have no namespace.
2) Expressions from the future that have the http://www.w3.org/1999/xhtml namespace.

Basically, the issue is having the backwards-compatibility cake and eating the HTML/XHTML DOM consistency cake, too.

> They never look to see where the data model came from - in fact, 
> there's nothing in our data model to tell us where its data 
> originally came from. That's how it should be.

There isn't an HTMLness flag in the W3C DOM, either. Ideally, that's how it should be. Unfortunately, compatibility with existing content requires DOM documents in browsers to know where they came from. In DOM Level 2, even element nodes were different depending on where they come from. That's what's implemented in Gecko today.

HTML 5 is trying to contain this problem by reducing these differences so that element nodes no longer know where they came from but their owner document still knows and needs to be queried in a handful of cases. This is what's implemented in WebKit today and what's being implemented in Gecko. Unfortunately, the only place in the entire browser platform (discovered so far, I suppose) where this effort has a negative impact on elegance and complexity is XPath as used in document.evaluate().
Comment 21 Jonathan Robie 2009-04-06 19:57:24 UTC
(In reply to comment #20)
 
> The thing is that HTML nodes need to match against two kinds of expressions:
> 1) Expressions from the past the have no namespace.
> 2) Expressions from the future that have the http://www.w3.org/1999/xhtml
> namespace.
> 
> Basically, the issue is having the backwards-compatibility cake and eating the
> HTML/XHTML DOM consistency cake, too.

But you do this by trying to force a major inconsistency into XPath. Speaking for myself, I'd be very surprised if this proposal were accepted.

A nametest that does not contain a wildcard specifies one name. It matches nodes that have that name. I would strongly oppose any change to that.

Jonathan
Comment 22 David Carlisle 2009-04-06 20:22:43 UTC
This bug is clearly misclassified (against XPath 2.x) when the issue appears to be an XPath 1.0 one. It would seem that the default element namespace in XPath 2 would meet the requirements.

So what is being requested is actually an XPath 1.2 specification which is XPath 1.0 but with the addition of the default element namespace feature. I can't see there being much enthusiasm for a new version of xpath 1 this long after xpath 2 has been released.

One possible route forward would be for the HTML5 spec to do what Henri indicated is not feasible.

> Requiring the XPath 1.0 processor to be
> replaced with an XPath 2.0 processor entirely is not a feasible first step.


HTML5 could (like many specs before it) define a profile/subset of xpath2.

(In this case, basically default to BC mode, initialise the default element namespace to xhtml, don't allow "for" or  "," operators and restrict the function library to those functions that were in xpath 1). Any Xpath 1 engine that could not make the small number of changes to support such a profile of xpath 2 probably isn't going to change at all anyway so it doesn't matter what spec you make for them.
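
The "initialise the default element namespace to xhtml" part of such a profile can be approximated with Python's standard xml.etree.ElementTree, which (assuming Python 3.8 or later) accepts an empty-string key in the namespaces mapping as a default namespace for unprefixed element names. ElementTree is not an XPath 2.0 processor, so this only sketches the name-matching behavior; the sample document is made up for illustration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

doc = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<body><p class="a">one</p><p>two</p></body></html>'
)

# Assumption: Python 3.8+, where the "" key sets a default namespace
# that is applied to unprefixed element names in the path.
profile_namespaces = {"": XHTML_NS}

paragraphs = doc.findall(".//p", profile_namespaces)

# Attribute names in predicates are not given the default namespace,
# so @class still refers to the no-namespace attribute.
with_class = doc.findall(".//p[@class]", profile_namespaces)

print(len(paragraphs))  # 2
print(len(with_class))  # 1
```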

Defining things in terms of a subset of XPath2 rather than an extension of XPath 1 gives more or less equivalent functionality now, but with a much clearer path forward to future versions using full XPath 2.x.

I find it ironic that a breaking change in xpath (which would confuse all existing xpath users) is being proposed in the name of preserving existing content when for example changes such as outlawing the use of

<?xml-stylesheet href="../style/foo.xsl"

in Firefox 3.x broke a large fraction of existing pages (when used from
the local filesystem). If it's thought acceptable to force people to rearrange the directory structure of their site and all the links in all their pages, why is it unacceptable to  admit that you have changed the namespace used in the html to XDM mapping, and that people using xpath will need to update to the new reality?

David
(interested observer, I'm not on any relevant WG)
Comment 23 Henri Sivonen 2009-04-07 15:09:38 UTC
(In reply to comment #22)
> So what is being requested is actually an XPath 1.2 specification which is
> XPath 1.0 but with the addition of the default element namespace feature.

Right. I filed this Bugzilla item on the wrong spec. (I was unaware that XPath 2.0 was incompatible with XPath 1.0, so I thought the latest spec would be the most natural place.)

> One possible route forward would be for the HTML5 spec to do do what Henri
> indicated is not feasible.
> 
> > Requiring the XPath 1.0 processor to be
> > replaced with an XPath 2.0 processor entirely is not a feasible first step.
> 
> 
> HTML5 could (like many specs before it) define a profile/subset of xpath2.

My statement of feasibility wasn't about specs. It was about software. That is, it's necessary to address this issue in implementations independently of adding the entire set of XPath 2.0 features.

> (In this case, basically default to BC mode, initialise the default element
> namespace to xhtml, don't allow "for" or  "," operators and restrict the
> function library to those functions that were in xpath 1). Any Xpath 1 engine
> that could not make the small number of changes to support such a profile of
> xpath 2 probably isn't going to change at all anyway so it doesn't matter what
> spec you make for them.

In an implementation that doesn't allow a prefix to be bound to no namespace (Gecko currently allows this but WebKit and Presto don't), how would the outcome be black-box distinguishable from the behavior I asked about in comment #18?

| If the document is an HTML document and a name 
| expression is being compared against an element, 
| behave as if "" namespace on the expression side 
| were the "http://www.w3.org/1999/xhtml" namespace.
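A minimal sketch of the rule quoted above, not taken from any browser's code. The Node record and the name_test_matches helper are hypothetical names introduced only for illustration, and "" stands for "no namespace":

```python
from dataclasses import dataclass

XHTML_NS = "http://www.w3.org/1999/xhtml"


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a DOM node; not a real DOM API."""
    kind: str               # "element" or "attribute"
    namespace: str          # "" means no namespace
    local_name: str
    in_html_document: bool  # True if the owner document is an HTML document


def name_test_matches(expr_namespace: str, expr_local: str, node: Node) -> bool:
    """Compare an already-expanded name test against a node.

    In an HTML document, a no-namespace name test compared against an
    element is treated as if it were in the XHTML namespace.
    """
    ns = expr_namespace
    if node.in_html_document and node.kind == "element" and ns == "":
        ns = XHTML_NS
    return ns == node.namespace and expr_local == node.local_name


html_div = Node("element", XHTML_NS, "div", in_html_document=True)
xml_div = Node("element", XHTML_NS, "div", in_html_document=False)
html_href = Node("attribute", "", "href", in_html_document=True)

print(name_test_matches("", "div", html_div))    # unprefixed test, HTML document
print(name_test_matches("", "div", xml_div))     # unprefixed test, XML document
print(name_test_matches("", "href", html_href))  # attributes are left alone
```

Under this sketch the special case applies only to elements in HTML documents; attributes, and elements in documents that are not HTML documents, keep the plain XPath 1.0 comparison.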

> I find it ironic that a breaking change in xpath (which would confuse all
> existing xpath users) is being proposed in the name of preserving existing
> content when for example changes such as outlawing the use of
> 
> <?xml-stylesheet href="../style/foo.xsl"
> 
> in Firefox 3.x broke a large fraction of existing pages (when used from
> the local filesystem).

Breaking existing content on the Web for a non-security reason and breaking content on the local file system for a security reason are totally different things.

> If it's thought acceptable to force people to rearrange
> the directory structure of their site and all the links in all their pages, why
> is it unacceptable to admit that you have changed the namespace used in the
> html to XDM mapping, and that people using xpath will need to update to the new
> reality?

The sites still worked when accessed via HTTP, right?
Comment 24 David Carlisle 2009-04-07 15:46:03 UTC
> Right. I filed this Bugzilla item on the wrong spec. (I was unaware that XPath
> 2.0 was incompatible with XPath 1.0, so I thought the latest spec would be the
> most natural place.)

Apart from edge cases in test suites, the incompatibilities are pretty hard to
spot. Even if it were 100% compatible, reporting an error against xpath2, which already has the feature that you want (making unprefixed names refer to xhtml), would not really help. So what possible change could you ask for in the XPath 2 specification?

> My statement of feasibility wasn't about specs. It was about software. That is,
> it's necessary to address this issue in implementations independently of adding
> the entire set of XPath 2.0 features.

Yes exactly. That's why I suggested that you should define a profile of xpath2
that has the features that you think should be supported by a conforming application.


>  In an implementation that doesn't allow a prefix to be bound to no namespace
> (Gecko currently allows this but WebKit and Presto don't), how would the
> outcome be black-box distinguishable from the behavior I asked about in comment
> #18?


It avoids confusing the poor xpath user.  If you document something as being xpath 1 then the language implemented should be xpath 1, where it is to be expected that unprefixed names mean no namespace, and it has been that way for 10 years.

If you document something as being xpath2 then unprefixed names refer to the default element namespace, which the browser is free to default to xhtml,
or in the words of the xpath 2 spec: "A default initial value for each component may be specified by the host language."
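
As a rough illustration of the difference, here is a sketch of how an unprefixed element name in an expression would be expanded under each reading. The expand_element_name function and its parameters are hypothetical, made up for this example rather than taken from any XPath engine, and "" stands for "no namespace":

```python
from typing import Dict, Tuple

XHTML_NS = "http://www.w3.org/1999/xhtml"


def expand_element_name(
    qname: str,
    prefixes: Dict[str, str],
    default_element_ns: str = "",
) -> Tuple[str, str]:
    """Expand a QName from an element name test to (namespace, local name).

    With default_element_ns left as "" this models XPath 1.0, where an
    unprefixed name always means no namespace. Passing another value
    models an XPath 2.0 host that initialises the default element namespace.
    """
    if ":" in qname:
        prefix, local = qname.split(":", 1)
        return prefixes[prefix], local  # KeyError if the prefix is unbound
    return default_element_ns, qname


# XPath 1.0 reading: unprefixed means no namespace.
print(expand_element_name("div", {}))
# XPath 2.0 reading with a host that defaults the namespace to xhtml.
print(expand_element_name("div", {}, default_element_ns=XHTML_NS))
# Prefixed names are resolved the same way under both readings.
print(expand_element_name("h:div", {"h": XHTML_NS}))
```

The sketch covers element name tests only; unprefixed attribute names stay in no namespace under both readings.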

If you document a specific profile of xpath2  that avoids certain features not likely to be implemented in the next range of browsers, so be it, again it's clear to the user what to expect.

David




Comment 25 Henri Sivonen 2009-06-26 09:24:45 UTC
(In reply to comment #24)
> So what possible change could you ask for in the XPath 2
> specification?

Since browsers don't implement XPath 2, I no longer ask for any change in XPath 2. Marking this as INVALID.

> It avoids confusing the poor xpath user.  If you document something as being
> xpath 1 then the language implemented should be xpath 1, where it is to be
> expected that unprefixed names mean no namespace, and it has been that way for
> 10 years.

Can't do that, since it would break existing Web content. Breaking existing Web content isn't an option here.

This issue is now addressed by HTML 5:
http://www.whatwg.org/specs/web-apps/current-work/#interactions-with-xpath-and-xslt
(Implemented in Gecko and WebKit.)
Comment 26 Jonathan Robie 2009-06-26 14:27:31 UTC
> This issue is now addressed by HTML 5:
> http://www.whatwg.org/specs/web-apps/current-work/#interactions-with-xpath-and-xslt
> (Implemented in Gecko and WebKit.)

I'm speechless.

You say that because you do not want to implement XPath 2.0, which solves your problems, you want to fork the XPath standard by creating a new mode where name tests have special-purpose semantics.

XPath 2.0 was published 23 January 2007, and is widely implemented. In June, 2009 you propose to create an incompatible version of XPath 1.0 and say that web browsers must implement this instead.

I think the solution is to change your implementation to support the standards, not to fork the standards to suit your implementation.
Comment 27 Jonathan Robie 2009-06-26 14:55:22 UTC
I added this bug against HTML5, where the bug seems to have moved:

http://www.w3.org/Bugs/Public/show_bug.cgi?id=7059