This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 6131 - [XPath 2.1] Requirement: context-free paths
Summary: [XPath 2.1] Requirement: context-free paths
Status: CLOSED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: XPath 3.0
Version: Recommendation
Hardware: PC Windows NT
Importance: P2 enhancement
Target Milestone: ---
Assignee: Jonathan Robie
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
URL:
Whiteboard:
Keywords: needsDrafting
Depends on:
Blocks:
 
Reported: 2008-10-01 09:17 UTC by Michael Kay
Modified: 2013-06-19 07:33 UTC
CC List: 7 users

See Also:


Description Michael Kay 2008-10-01 09:17:32 UTC
There's a common requirement to generate paths whose evaluation is context free, that is, paths that don't require external binding of namespace prefixes. For an example, see

http://lists.w3.org/Archives/Member/w3c-xsl-wg/2008Sep/0099.html

(member-only link)

Other use cases include XPath expressions contained in XML documents for dynamic evaluation of business rules.

It's possible to use XQuery instead of XPath, in which case you can bind namespaces in the query prolog, but that's not particularly convenient for people wanting to generate paths, as it means all the namespaces must be known at once, and distinct prefixes need to be allocated. Also, XPath implementations are typically smaller than XQuery implementations.

One approach would be to define some kind of internal syntax for binding prefixes within an XPath expression. This could be based on the XPointer syntax or the XQuery syntax. There are two disadvantages with this approach: (a) neither of these existing syntaxes blends well into XPath (and it would be a shame to invent a third), and (b) it still leaves the software that generates the path with the unnecessary job of finding all the namespace URIs up-front and inventing unique prefixes for them. Using the URI "inline" is verbose and perhaps takes fractionally longer to parse, but for many applications this simply doesn't matter.

The proposal is therefore to extend the syntax so that 

(1) all the places in the grammar that currently refer to QName refer instead to EQName. In XPath, these constructs are NameTest, VarName, FunctionCall, AtomicType, AttributeName, ElementName, and TypeName.

(2) the syntax for EQName is

EQName := QName | ExpandedName

ExpandedName := DelimitedURI NCName /* ws:explicit */

DelimitedURI := '`' (EscapeTick | [^`])* '`'

EscapeTick := "``"

I have proposed the "backtick" (x60 grave accent) as the delimiter in preference to the more familiar curly braces (known as Clark notation) to avoid confusion, if not technical ambiguity, with other uses of curly braces in XSLT and XQuery. The backtick is not currently used in XSLT or in XQuery and it is also one of the few characters that cannot legally appear in an IRI; nevertheless I have taken the precaution of allowing it to be escaped by doubling. It's also a rather unattractive character because of its poor legibility, which doesn't matter for the target use case of machine-generated XPaths, but means that we aren't using a character that we would want to save for a greater role in the future. I originally considered proposing non-ASCII characters, perhaps chevrons, but all such characters are allowed in IRIs.

If an EQName contains neither a prefix nor a DelimitedURI then it denotes a name in the default namespace (for example the default namespace for elements and types). If it starts with an empty DelimitedURI, for example ``book, then it denotes a name in no namespace. (This particular notation might prove popular even for hand-written XPath). 
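The EQName grammar and these resolution rules can be sketched in Python. This is a minimal illustration, not part of the proposal: it assumes an ASCII-only approximation of NCName, represents "no namespace" as the empty string, and uses a hypothetical resolve_eqname helper that takes an explicit prefix map and default namespace.

```python
import re
from typing import Dict, Optional, Tuple

# ASCII-only approximation of the XML NCName production (assumption for brevity).
NCNAME = r"[A-Za-z_][A-Za-z0-9._\-]*"

# ExpandedName := DelimitedURI NCName, where
# DelimitedURI := '`' (EscapeTick | [^`])* '`' and EscapeTick := "``".
EXPANDED_NAME = re.compile(r"`((?:``|[^`])*)`(" + NCNAME + r")")
QNAME = re.compile(r"(?:(" + NCNAME + r"):)?(" + NCNAME + r")")


def resolve_eqname(text: str, namespaces: Dict[str, str],
                   default_ns: Optional[str] = None) -> Tuple[str, str]:
    """Return (namespace URI, local name); "" stands for no namespace."""
    m = EXPANDED_NAME.fullmatch(text)
    if m:
        # Context-free form: undouble escaped backticks; the prefix map is not consulted.
        return m.group(1).replace("``", "`"), m.group(2)
    m = QNAME.fullmatch(text)
    if m is None:
        raise ValueError(f"not an EQName: {text!r}")
    prefix, local = m.groups()
    if prefix is None:
        # Neither a prefix nor a DelimitedURI: use the applicable default namespace.
        return default_ns or "", local
    if prefix not in namespaces:
        # The context-dependent form fails without a binding (cf. err:XPST0081).
        raise KeyError(f"unbound prefix: {prefix}")
    return namespaces[prefix], local


XHTML = "http://www.w3.org/1999/xhtml"
print(resolve_eqname("``book", {}, XHTML))  # ('', 'book')
print(resolve_eqname("book", {}, XHTML))    # ('http://www.w3.org/1999/xhtml', 'book')
```

Note that ``book resolves to no namespace even when a default element namespace is in force, which is the behaviour the paragraph above describes.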

This proposal is specifically for XPath 2.1 but I would imagine XQuery 1.1 will want to remain compatible. I would recommend allowing EQName in place of QName in all places in XQuery except where XML syntax is being mimicked, that is, in direct element and attribute constructors.

(I would actually like to see this notation available in the lexical space of the xs:QName data type, and even in XML syntax for writing element and attribute names. But getting everyone to agree might not be trivial!)

For XSLT, similar issues of context-sensitivity apply in two situations: generated stylesheets, and QNames used at run-time to refer to objects such as keys, decimal formats, output formats, and system property names. If this proposal is accepted I will follow up with an XSLT proposal to allow EQNames in such contexts.
Comment 1 Michael Kay 2008-10-01 09:27:15 UTC
One minor addition to the proposal: allow

[37] Wildcard ::= "*"
                | (NCName ":" "*")
                | (DelimitedURI "*")
                | ("*" ":" NCName)    /* ws:explicit */
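A rough Python sketch of how the extended Wildcard production might be matched against a node name. The function name wildcard_matches, the use of "" for no namespace, and the simplified NCName pattern are assumptions for illustration only.

```python
import re
from typing import Dict

NCNAME = r"[A-Za-z_][A-Za-z0-9._\-]*"  # ASCII-only approximation (assumption)

# Wildcard ::= "*" | NCName ":" "*" | DelimitedURI "*" | "*" ":" NCName
WILDCARD = re.compile(
    r"\*"
    r"|(?P<prefix>" + NCNAME + r"):\*"
    r"|`(?P<uri>(?:``|[^`])*)`\*"
    r"|\*:(?P<local>" + NCNAME + r")"
)


def wildcard_matches(test: str, node_uri: str, node_local: str,
                     namespaces: Dict[str, str]) -> bool:
    """Does a wildcard NameTest match a node name? "" means no namespace."""
    m = WILDCARD.fullmatch(test)
    if m is None:
        raise ValueError(f"not a Wildcard: {test!r}")
    if m.group("prefix") is not None:
        return node_uri == namespaces[m.group("prefix")]
    if m.group("uri") is not None:
        # DelimitedURI "*": any local name in that (possibly empty) namespace.
        return node_uri == m.group("uri").replace("``", "`")
    if m.group("local") is not None:
        return node_local == m.group("local")
    return True  # plain "*" matches every name
```

With this rule, ``* selects only names in no namespace, whereas *:para matches para in any namespace.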
Comment 2 Michael Kay 2008-12-17 13:53:17 UTC
Another use case for this feature: it's difficult in XQuery to generate XHTML (where output elements have a namespace and no prefix) if the input is in no namespace. See http://www.xmlplease.com/xquery-xhtml, which recommends using wildcards in paths as a workaround for this problem:

<p xmlns="http://www.w3.org/1999/xhtml">
  {data(*:para)}
</p>

This is pretty nasty, because it creates the possibility that you will select elements in the wrong namespace, and it means much less static analysis is possible; it will also make it harder to take advantage of indexes and other optimization techniques.

Without introducing the complexity of separate default namespaces for input and output (which is the XSLT solution), the proposal in this bug would enable users to refer to names in no namespace as:

<p xmlns="http://www.w3.org/1999/xhtml">
  {data(``:para)}
</p>
Comment 3 Innovimax 2009-07-03 21:57:54 UTC
In comment #2 my understanding is that it should be

<p xmlns="http://www.w3.org/1999/xhtml">
  {data(``para)}
</p>

(without the colon)
Comment 4 Jonathan Robie 2009-07-07 17:28:49 UTC
John and I discussed this, and we rather like this syntax:

NamespaceDeclExpr ::= "namespace" NCName "=" URILiteral "{" Expr "}"

Example:

namespace a = http://example.com/a { /a:foo/a:bar }


Note that this is parallel to the existing NamespaceDecl production:

NamespaceDecl ::= "declare" "namespace" NCName "=" URILiteral

Example:

declare namespace a = http://example.com/a; 
/a:foo/a:bar
Comment 5 Michael Dyck 2009-07-07 19:46:59 UTC
Re your examples, note that a URILiteral includes delimiting quotes (single or double).
Comment 6 Michael Kay 2009-07-08 08:10:55 UTC
Comment #4 suggests syntax that looks reasonable enough if you approach the problem from an XQuery perspective. But in proposing this approach, we were looking at the problem primarily from the point of view of XPath use cases, and the syntax feels very clumsy at the XPath level. For a start, it introduces curly braces to XPath for the first time, which is a significant disadvantage for some of the environments where XPath is embedded (for example, XSLT) - though not necessarily insuperable.

A major part of the rationale for introducing an EQName syntax, however, was that it could be widely adopted anywhere that QNames are currently used - not only in XPath. For example, some XML vocabularies such as XSLT and XSD make wide use of QNames-in-content in contexts other than XPath, and these share the same requirement to be written in a context-free way. Obviously adoption of EQName syntax would be on a case-by-case basis, but including it in XPath as a first step, and in XSLT soon after, would set a precedent that I think other specs might well choose to follow.

An alternative that was considered was to use Clark names in the format {uri}local. These are already used in a number of interfaces, for example in Java APIs. If the only objection to the proposal is the use of backtick, then perhaps we should consider using Clark names instead. They have the advantage of familiarity and precedent. The reason I recommended using backtick-syntax instead was that the curly braces cause problems in certain syntactic contexts. In particular they cause problems when used in an XSLT attribute value template, especially if the left-curly is the first character.

We also considered the convention used for turning error code QNames into plain strings, namely uri#local. This convention fails (or at any rate, becomes very confusing) if the namespace name contains a "#"; unfortunately namespace names containing "#" are not only legal, they are common practice in the RDF community. Another disadvantage of this as an EQName syntax is that it's not readily recognized by an XPath tokenizer: the URI needs to be delimited in some way.

(Note: a namespace name should be a legal IRI, though our specs and implementations tend to be tolerant of namespace names that are not legal IRIs. The only printable ASCII characters that are illegal in an IRI are "<", ">", '"', space, "{", "}", "|", "\", "^", and "`". But it would be safer to have a syntax that allows any string to be used as a namespace name, which means that there is a need to escape whatever end-delimiter is used.)
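Escaping the end delimiter by doubling means any string can serve as a namespace name, and it keeps path generation trivial because no prefixes need to be allocated. A minimal Python sketch, assuming the backtick form from the original proposal; delimit and context_free_path are hypothetical helpers, and the generator only emits child-axis element steps.

```python
from typing import Iterable, Tuple


def delimit(uri: str, delim: str = "`") -> str:
    """Wrap a namespace URI in delim, doubling any delim that occurs inside it."""
    return delim + uri.replace(delim, delim * 2) + delim


def context_free_path(steps: Iterable[Tuple[str, str]]) -> str:
    """Build an absolute child-axis path from (namespace URI, local name) pairs.

    An empty URI yields ``local, i.e. a name in no namespace.
    """
    return "".join("/" + delimit(uri) + local for uri, local in steps)


print(context_free_path([("http://example.com/a", "foo"), ("", "bar")]))
# /`http://example.com/a`foo/``bar
```

The same doubling rule works for any delimiter, for example delimit('a"b', '"') gives "a""b".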
Comment 7 John Snelson 2009-07-08 11:42:50 UTC
(In reply to comment #6)
> Comment #4 suggests syntax that looks reasonable enough if you approach the
> problem from an XQuery perspective. But in proposing this approach, we were
> looking at the problem primarily from the point of view of XPath use cases, and
> the syntax feels very clumsy at the XPath level. For a start, it introduces
> curly braces to XPath for the first time, which is a significant disadvantage
> for some of the environments where XPath is embedded (for example, XSLT) -
> though not necessarily insuperable.

I'm sensitive to the problems the use of curly braces would have for attribute value templates in XSLT. However as the XSLT 2.0 spec notes, implementors already have to worry about curly braces in string literals and comments. Having to be concerned with them in another place is not such a big change, since implementations are already having to parse the XPath to correctly find the end of the embedded expression.

> A major part of the rationale for introducing an EQName syntax, however, was
> that it could be widely adopted anywhere that QNames are currently used - not
> only in XPath. For example, some XML vocabularies such as XSLT and XSD make
> wide use of QNames-in-content in contexts other than XPath, and these share the
> same requirement to be written in a context-free way. Obviously adoption of
> EQName syntax would be on a case-by-case basis, but including it in XPath as a
> first step, and in XSLT soon after, would set a precedent that I think other
> specs might well choose to follow.

That's an interesting idea, and certainly ambitious. I can see the use case for this in XQuery and especially in XSLT (not polluting the in-scope namespaces). Variable, function and template names could benefit in this regard, as well as NodeTests.

In the past I've considered a 'using function namespace "http://foo";' construct in XQuery to also help resolve function names.

> An alternative that was considered was to use Clark names in the format
> {uri}local. These are already used in a number of interfaces, for example in
> Java APIs. If the only objection to the proposal is the use of backtick, then
> perhaps we should consider using Clark names instead. They have the advantage
> of familiarity and precedent. The reason I recommended using backtick-syntax
> instead was that the curly braces cause problems in certain syntactic contexts.
> In particular they cause problems when used in an XSLT attribute value
> template, especially if the left-curly is the first character.

I would think that using Clark names has a better chance of success in the wider community. I would be ok with that change - I think it's the backticks that I really don't like.

A bigger issue with using curly braces in Clark names is that they are already used extensively in XQuery and XSLT, and might well find many more uses in the future. I imagine that we'd need to insist on some lexical conventions to restrict these tokens - maybe insisting on no whitespace inside the token around the braces would be reasonable.
Comment 8 John Snelson 2009-07-08 12:00:06 UTC
In concrete terms, I think using Clark names means this syntax:

EQName := QName | ClarkName

ClarkName := DelimitedURI NCName /* ws:explicit */

DelimitedURI ::= '{' [^}]* '}' /* ws:explicit */

Wildcard ::= 
  "*"
| (NCName ":" "*")
| (DelimitedURI "*")
| ("*" ":" NCName)    /* ws:explicit */
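A sketch of a tokenizer rule for this Clark-name grammar, in Python. The regexes are assumptions modelled on the productions above (with an ASCII-only NCName), and parse_clark is a hypothetical helper that returns "*" as the local part for the wildcard form. Because DelimitedURI excludes "}", a namespace name containing "}" cannot be written in this form.

```python
import re
from typing import Tuple

NCNAME = r"[A-Za-z_][A-Za-z0-9._\-]*"  # ASCII-only approximation (assumption)

# ClarkName := DelimitedURI NCName ; DelimitedURI ::= '{' [^}]* '}'  (ws:explicit)
CLARK_NAME = re.compile(r"\{([^}]*)\}(" + NCNAME + r")")
# Wildcard alternative: DelimitedURI "*"
CLARK_WILDCARD = re.compile(r"\{([^}]*)\}\*")


def parse_clark(token: str) -> Tuple[str, str]:
    """Return (namespace URI, local name or "*") for one Clark-name token."""
    m = CLARK_NAME.fullmatch(token)
    if m:
        return m.group(1), m.group(2)
    m = CLARK_WILDCARD.fullmatch(token)
    if m:
        return m.group(1), "*"
    # Whitespace between "}" and the local name is rejected (ws:explicit).
    raise ValueError(f"not a Clark name: {token!r}")
```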

It also probably means a note in XSLT's attribute value templates and XQuery's direct attribute constructors clarifying what this means:

<a b="hello {{http://example.com}name}"/>

My suggestion is that we should maintain backwards compatibility, and say that the "{{" in the attribute value is considered an escaped left curly brace, and that the "b" attribute's value is the literal string:

"hello {http://example.com}name}"
Comment 9 Bogdan Butnaru 2009-07-08 22:52:39 UTC
Is there any particular reason why allowing a URILiteral wherever an NCName prefix would be allowed wouldn't work?

This mostly means:
 * allow « URILiteral ":" NCName /* ws:explicit */ » for QName (the xs:QName generated will have an implementation-dependent prefix; we might want to restrict implementations NOT to use a prefix that is already bound to a different namespace (though that wouldn't necessarily be a problem), and maybe explicitly allow/require to reuse a prefix defined for that namespace, if it exists), and
 * « URILiteral ":" "*" /* ws:explicit */ » for wildcard tests.

An empty URILiteral means, of course, no namespace (this might be useful to keep things out of the default element namespace, without explicitly defining a prefix for the empty namespace).

I can't think of any context where a URILiteral followed by a colon currently means something, in either XQuery or XPath (not sure about XSLT).

Also, I believe this would also work for direct element constructors. (I haven't checked the production for NameStart, but I think it disallows quotes.)
Comment 10 Michael Kay 2009-07-08 23:18:33 UTC
>Is there any particular reason why allowing an URILiteral wherever a NCName prefix would be allowed wouldn't work?

It feels rather error-prone to me: even if it's not technically ambiguous there would be very similar constructs with very different meanings, resulting in poor error messages. For example mistyping the ";" as a ":" in 

$a:="fred";if (f($a)) then 1 else 0

would not hit an error until the keyword "then" is encountered.
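The misreading can be reproduced with a small Python sketch. The token pattern is an assumption modelled on the proposed URILiteral ":" NCName form (with an ASCII-only NCName). It shows that the mistyped text contains a well-formed EQName token, so a tokenizer accepts it and the failure surfaces only later in the parse.

```python
import re

NCNAME = r"[A-Za-z_][A-Za-z0-9._\-]*"  # ASCII-only approximation (assumption)
# Assumed token pattern for  URILiteral ":" NCName  with no whitespace (ws:explicit).
URI_EQNAME = re.compile(
    r'"(?:""|[^"])*":' + NCNAME + r"|'(?:''|[^'])*':" + NCNAME
)

intended = '$a:="fred";if (f($a)) then 1 else 0'
mistyped = '$a:="fred":if (f($a)) then 1 else 0'

print(URI_EQNAME.search(intended))           # None: no EQName token here
print(URI_EQNAME.search(mistyped).group(0))  # "fred":if
```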

Comment 11 Bogdan Butnaru 2009-07-09 00:32:08 UTC
(In reply to comment #10)
> It feels rather error-prone to me: even if it's not technically ambiguous there
> would be very similar constructs with very different meanings, resulting in
> poor error messages. For example mistyping the ";" as a ":" in 
> 
> $a:="fred";if (f($a)) then 1 else 0
> 
> would not hit an error until the keyword "then" is encountered.

I wouldn't be very concerned about that; it would be more dangerous if a similar case did _not_ trigger an error, however, and I'm not sure there aren't any such cases.

(Note that the ws:explicit rule makes your particular example hard to hit in practice: people are used to adding whitespace after a separator like “;”, and they'll usually do it even if they mistype it as “:”.)

However, if we decide it's problematic, we can use a different separator, like “#”, “@” or even “$”. Note that this character will be outside the quotes, so we can pick anything that works for our syntax, ignoring whatever format IRIs have.

We can also do it in reverse, e.g. if we pick “@” it might make more sense to write « localName@"http://example.com" » (this only works if the separator is not allowed in a localName, of course.) We can also allow constructs like « prefix:localName@"http://example.com" », to specify a default prefix for the QName.
Comment 12 John Snelson 2009-07-09 10:38:22 UTC
I think that using "uri":name is certainly possible, but I still prefer Clark names since that has become the de facto standard for writing expanded QNames.
Comment 13 Michael Kay 2009-07-09 11:00:44 UTC
So, would Clark names work?

They would certainly work in XPath, which does not currently use curlies for anything else.

In XSLT there would have to be a rule for their use in attribute value templates: probably simply that if the XPath expression starts with an open-curly, then it must be preceded by a space. For example, code="{ {uri}local }".

In XQuery there's a risk of clashing with existing uses of curly braces. EnclosedExpr would need a similar rule to the XSLT rule above. In most other places that currently use a curly, it's a follow-on to something like text{} or validate{}, and these cases may be OK. However, there's a major clash with scripting, which has

Block ::= "{" BlockDecls BlockBody "}"

which is clearly incompatible.
Comment 14 John Snelson 2009-07-09 11:13:27 UTC
(In reply to comment #13)
> However, there's a major clash with scripting, which has
> 
> Block ::= "{" BlockDecls BlockBody "}"
> 
> which is clearly incompatible.

The correct production to look at is BlockExpr:

BlockExpr ::= "block" Block

XQuery SX doesn't have an expression that starts with "{" any more.


Comment 15 Michael Kay 2009-07-09 11:27:29 UTC
>XQuery SX doesn't have an expression that starts with "{" any more.

That's good. In that case the only problem I can see "by eye" is

[134] CompElemConstructor  ::= "element" (QName | ("{" Expr "}")) "{" ContentExpr? "}"

and the equivalent for attributes, where one would have to insist that the QName is a QName in the current sense, and not an EQName. An EQName could be used in the form

element { {uri}local } { "value" }

or, if we choose to allow it

<{uri}local>value</{uri}local>

If we go for Clark names, the other question is what to do about namespace names containing "{" or "}". The simplest is just to say they aren't allowed when using this format. That's not a big restriction because the namespace spec strongly encourages that namespace names should be valid IRIs (though it doesn't make it an error if they aren't).

However, I do feel this syntax is a bit more fragile in that it overloads use of existing symbols. Using a new character such as back-tick would give less risk of future problems extending the grammar, and less danger of poor error messages for incorrect queries.

Comment 16 Jonathan Robie 2009-07-09 13:22:53 UTC
We already use pseudo-functions for NameTests, e.g.

element(customer)

Why not use a similar notation for expanded names?


ExpandedName  ::=  "expanded-name" "(" URILiteral "," NCName ")"

expanded-name(http://example.com/customer, customer)

We could extend ElementTest, AttributeTest, and SchemaElementTest to allow expanded names:

element(expanded-name(http://example.com/customer, customer))

A little verbose, but clear, and easily parsed in the various environments we are considering.



Comment 17 John Snelson 2009-07-09 13:25:32 UTC
(In reply to comment #15)
> >XQuery SX doesn't have an expression that starts with "{" any more.
> 
> That's good. In that case the only problem I can see "by eye" is
> 
> [134] CompElemConstructor  ::= "element" (QName | ("{" Expr "}")) "{"
> ContentExpr? "}"
> 
> and the equivalent for attributes, where one would have to insist that the
> QName is a QName in the current sense, and not an EQName.

Either that or require that a Clark name has no spaces in it. If it's matched as a single token then there's no ambiguity.

> If we go for Clark names, the other question is what do about namespace names
> containing "{" or "}". The simplest is just to say they aren't allowed when
> using this format. That's not a big restriction because the namespace spec
> strongly encourages that namespace names should be valid IRIs (though it
> doesn't make it an error if they aren't).

I think that's reasonable.

> However, I do feel this syntax is a bit more fragile in that it overloads use
> of existing symbols. Using a new character such as back-tick would give less
> risk of future problems extending the grammar, and less danger of poor error
> messages for incorrect queries.

I agree, but I think it's a risk worth taking to use a syntax that most people using XML already understand.
Comment 18 John Snelson 2009-07-09 13:27:59 UTC
(In reply to comment #16)
> We already use pseudo-functions for NameTests, e.g.
> 
> element(customer)
> 
> Why not use a similar notation for expanded names?

Because that doesn't work at all well for variable, function or template names. Java allows the use of package qualified class names, and I think XQuery and XSLT would benefit from adding a way to fully qualify identifiers with their namespace.
Comment 19 Vladimir Nesterovsky 2009-07-31 09:52:46 UTC
(In reply to comment #18)
> (In reply to comment #16)
> > We already use pseudo-functions for NameTests, e.g.
> > 
> > element(customer)
> > 
> > Why not use a similar notation for expanded names?
> Because that doesn't work at all well for variable, function or template names.

That depends on how you would integrate such a pseudo-function into the grammars.

I think 

expanded-name(http://example.com/customer, customer)

or

qname('http://example.com/customer', 'customer')

or even

QName('http://example.com/customer', 'x:customer')

are clear candidates for achieving the desired effect, but at the same time they are verbose, which would prevent wide adoption compared to the regular form.
Comment 20 Michael Kay 2009-08-09 09:56:55 UTC
Note that bug #7247 points out how expressions starting with an open curly brace can cause ambiguities in the XQuery grammar.
Comment 21 John Snelson 2009-08-09 15:35:47 UTC
(In reply to comment #20)
> Note that bug #7247 points out how expressions starting with an open curly
> brace can cause ambiguities in the XQuery grammar.

Important to understand - but it doesn't affect the parsing of Clark names according to the proposal I recently submitted. The explicit whitespace rules and other excluded characters see to that.


Comment 22 Nikolay Ognyanov 2009-08-22 19:08:15 UTC
There are multiple rules where ExprSingle can be followed by a "keyword" that is not really reserved and can, depending on the context, also represent an NCName. With computed constructors and some other expressions ending with a curly-bracket construct, there are a lot of opportunities to invalidate currently valid XQuery 1.0 expressions by treating everything that looks like a Clark name as a Clark name. Excluding whitespace does not really help, because in the situations mentioned above whitespace is allowed but not required. Here is an example:

for $a in element b {scheme:path-rootless}return c

This perfectly valid XQuery 1.0 expression would be invalidated if "{scheme:path-rootless}return" were treated as a Clark name. At the same time, the XQuery 1.1 requirements mandate 100% backward compatibility with XQuery 1.0 by saying "Every valid XQuery 1.0 expression MUST be valid in XQuery 1.1 and it MUST evaluate to the same result.". So unless the backward compatibility requirements are relaxed (which seems neither very likely nor a very good idea), Clark names cannot be introduced in XQuery 1.1 in the way suggested above, regardless of any other issues that could be raised. This is a pity, but curly braces are so widely (over)used in XQuery and its extensions that introducing new "naked" curly-bracket constructs always seems to turn out harmful one way or another, and even curly-bracket constructs guarded by lead-in or follow-up words can cause problems, especially when those words are reused in multiple rules.
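The clash can be reproduced with a tokenizer that recognizes the ClarkName pattern wherever it occurs. In this Python sketch the regex is an assumption modelled on the ClarkName grammar from comment 8; it swallows the enclosed expression and the following keyword as a single token, whereas XQuery 1.0 reads them as "{", the QName scheme:path-rootless, "}", and the keyword return.

```python
import re

NCNAME = r"[A-Za-z_][A-Za-z0-9._\-]*"  # ASCII-only approximation (assumption)
CLARK_NAME = re.compile(r"\{([^}]*)\}(" + NCNAME + r")")

query = "for $a in element b {scheme:path-rootless}return c"
m = CLARK_NAME.search(query)
print(m.group(0))  # {scheme:path-rootless}return
print(m.groups())  # ('scheme:path-rootless', 'return')
```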

The syntax that I would vote for is :

EQname: QName | UriLiteral ':' NCName   /* ws:explicit */

It is very similar to the initial suggestion but does not introduce a new delimiter, and UriLiteral is already in use in XQuery, so it is better to reuse it here.
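A Python sketch of parsing this EQName form. It is an illustration under assumptions: NCName is approximated by ASCII characters, parse_uri_eqname is a hypothetical helper, and only the doubled-quote escape of a string literal is undone (character and entity references are left untouched, as an XPath host would do).

```python
import re
from typing import Tuple

NCNAME = r"[A-Za-z_][A-Za-z0-9._\-]*"  # ASCII-only approximation (assumption)
# URILiteral quoted like a StringLiteral; a doubled quote stands for one quote.
URI_LITERAL = r'"(?P<dq>(?:""|[^"])*)"' + "|" + r"'(?P<sq>(?:''|[^'])*)'"
# EQName alternative: URILiteral ':' NCName, no whitespace allowed (ws:explicit).
URI_EQNAME = re.compile("(?:" + URI_LITERAL + "):(?P<local>" + NCNAME + ")")


def parse_uri_eqname(text: str) -> Tuple[str, str]:
    """Return (namespace URI, local name) for a "uri":local EQName."""
    m = URI_EQNAME.fullmatch(text)
    if m is None:
        raise ValueError(f"not a URILiteral EQName: {text!r}")
    if m.group("dq") is not None:
        uri = m.group("dq").replace('""', '"')
    else:
        uri = m.group("sq").replace("''", "'")
    return uri, m.group("local")


print(parse_uri_eqname('"http://example.com/customer":customer'))
# ('http://example.com/customer', 'customer')
```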

Comment 23 Jonathan Robie 2009-08-25 11:45:34 UTC
> The syntax that I would vote for is :
> 
> EQname: QName | UriLiteral ':' NCName   /* ws:explicit */
> 
> It is very similar to the initial suggestion but does not introduce new
> delimiter and UriLiteral is already in use in XQuery, so it should better be
> reused here.

I like this suggestion! 

Comment 24 Michael Kay 2009-08-26 09:23:06 UTC
>> EQname: QName | UriLiteral ':' NCName   /* ws:explicit */

In many ways this suggestion is attractive. However, there is one context where it wouldn't work well:

<xsl:value-of select="key('my:key', 1234)"/>

<xsl:value-of select="system-property('my:operating-system')"/>

<xsl:if test="function-available('my:function')"/>

Replacing the lexical QNames here by an EQName as proposed here would require use of escaped quotation marks (&quot;), which is unappealing.

So perhaps we should adopt this, but also introduce backtick as a third kind of quotation mark for use in all contexts where we currently allow paired double-quotes or single-quotes? Or perhaps just as a third option for a UriLiteral?

An alternative would be to use a different format (such as {uri}local) for run-time QNames, but that's not nice either.
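The quoting problem can be shown mechanically. This Python sketch uses the standard library's xml.sax.saxutils.quoteattr to place an XPath key() call with a "uri":local argument into a select attribute; the namespace URI is a made-up placeholder for the my prefix. Because the XPath already uses both kinds of quotation mark, one of them has to be written as &quot;.

```python
from xml.sax.saxutils import quoteattr

# Placeholder URI standing in for the "my" prefix (hypothetical).
xpath = "key('\"http://example.com/my\":key', 1234)"
attr = quoteattr(xpath)  # chooses a quote character and escapes the other as needed
print(f"<xsl:value-of select={attr}/>")
# <xsl:value-of select="key('&quot;http://example.com/my&quot;:key', 1234)"/>
```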
Comment 25 Jonathan Robie 2009-09-22 16:25:08 UTC
(In reply to comment #22)

> The syntax that I would vote for is :
> 
> EQname: QName | UriLiteral ':' NCName   /* ws:explicit */
> 

We adopted this proposal in today's Working Group call. We also decided that an EQName is not allowed in the element name or attribute name in a direct element constructor in XQuery.
Comment 26 Michael Kay 2009-09-25 10:36:07 UTC
Please note also, as well as being able to write "uri":local, we should be able to write "uri":*. So production 80 for Wildcard needs to allow the option

URILiteral ":" "*"
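A Python sketch of a NameTest that accepts both "uri":local and "uri":*. The helper name_test_matches, the ASCII-only NCName, and the use of "" for no namespace are assumptions; only the doubled-quote escape is undone.

```python
import re

NCNAME = r"[A-Za-z_][A-Za-z0-9._\-]*"  # ASCII-only approximation (assumption)
# URILiteral ":" (NCName | "*"), no whitespace allowed.
NAME_TEST = re.compile(
    "(?:" + r'"(?P<dq>(?:""|[^"])*)"' + "|" + r"'(?P<sq>(?:''|[^'])*)'" + ")"
    + ":(?P<local>" + NCNAME + r"|\*)"
)


def name_test_matches(test: str, node_uri: str, node_local: str) -> bool:
    """Match a "uri":local or "uri":* test against a node name ("" = no namespace)."""
    m = NAME_TEST.fullmatch(test)
    if m is None:
        raise ValueError(f"not a URILiteral name test: {test!r}")
    if m.group("dq") is not None:
        uri = m.group("dq").replace('""', '"')
    else:
        uri = m.group("sq").replace("''", "'")
    local = m.group("local")
    return node_uri == uri and (local == "*" or node_local == local)
```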

Comment 27 Michael Kay 2009-10-07 18:22:53 UTC
Another little refinement needed: The accepted proposal means that URILiteral now appears in the XPath syntax as well as the XQuery syntax. However, just as the rules for StringLiteral are different in the two cases, the rules for URILiteral also need to differ, in that XQuery must process any character and built-in entity references appearing in the literal, but XPath must not (if the XPath expression appears in an XML document, the XML parser will already have done this).
Comment 28 Jonathan Robie 2009-10-09 14:16:00 UTC
(In reply to comment #27)
> Another little refinement needed: The accepted proposal means that URILiteral
> now appears in the XPath syntax as well as the XQuery syntax. However, just as
> the rules for StringLiteral are different in the two cases, the rules for
> URILiteral also need to differ, in that XQuery must process any character and
> built-in entity references appearing in the literal, but XPath must not (if the
> XPath expression appears in an XML document, the XML parser will already have
> done this).
> 


I assume that means making the following text conditional, so it appears in XQuery but not in XPath:

<quote>
As in a string literal, any predefined entity reference (such as &amp;amp;), character reference (such as &amp;#x2022;), or EscapeQuot or EscapeApos (for example, "") is replaced by its appropriate expansion. Certain characters, notably the ampersand, can only be represented using a predefined entity reference or a character reference.
</quote>

XPath would have this instead:

<quote>
As in a string literal, if the literal is delimited by apostrophes, two adjacent apostrophes within the literal are interpreted as a single apostrophe. Similarly, if the literal is delimited by quotation marks, two adjacent quotation marks within the literal are interpreted as one quotation mark.
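The difference between the two rules can be sketched in Python. This is an illustration, not spec text: decode_uri_literal is a hypothetical helper, XQuery mode expands only the five predefined entity references and numeric character references, and malformed references (such as a stray "&") are not diagnosed. Doubled quotes and references are processed in a single pass, so &quot;&quot; yields two quotation marks rather than collapsing into one.

```python
import re

PREDEFINED = {"lt": "<", "gt": ">", "amp": "&", "quot": '"', "apos": "'"}
REF = r"&(?:#x(?P<hex>[0-9A-Fa-f]+)|#(?P<dec>[0-9]+)|(?P<name>lt|gt|amp|quot|apos));"


def decode_uri_literal(literal: str, language: str) -> str:
    """Decode a quoted URILiteral; language is "xquery" or "xpath"."""
    if len(literal) < 2 or literal[0] not in "\"'" or literal[-1] != literal[0]:
        raise ValueError(f"not a quoted literal: {literal!r}")
    delim, body = literal[0], literal[1:-1]
    doubled = re.escape(delim * 2)
    # XPath: the host XML parser has already expanded references; only undouble quotes.
    pattern = doubled + "|" + REF if language == "xquery" else doubled

    def expand(m):
        if m.group(0) == delim * 2:
            return delim  # EscapeQuot / EscapeApos
        if m.group("hex"):
            return chr(int(m.group("hex"), 16))
        if m.group("dec"):
            return chr(int(m.group("dec")))
        return PREDEFINED[m.group("name")]

    return re.sub(pattern, expand, body)
```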
</quote>