This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 29360 - Extending string literal syntax
Summary: Extending string literal syntax
Status: NEW
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: Requirements for Future Versions
Version: Working drafts
Hardware: PC Linux
Importance: P2 normal
Target Milestone: ---
Assignee: Jim Melton
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
 
Reported: 2016-01-06 19:48 UTC by Benito van der Zander
Modified: 2016-01-07 21:46 UTC


Description Benito van der Zander 2016-01-06 19:48:35 UTC
The current ways to concatenate strings and to escape characters in XPath can be surprisingly complicated to use. It would be useful if there were strings in which you can directly include variables or escaped entities. Such new string literals could be distinguished from the older strings by a prefix letter/marker.

Here are some examples, in descending order of usefulness and importance:

x"Bring {$foo} to {$bar} xxx {$zzz}" -> Like the direct element constructor, i.e. the example is the same as <temp>Bring {$foo} to {$bar} xxx {$zzz}</temp>/data(), but these strings would also be usable in XPath. Using concat can be cumbersome if you have larger texts and many things to insert. Just compare the example to concat("Bring ",$foo," to ",$bar," xxx ",$zzz) or ("Bring "||$foo||" to "||$bar||" xxx "||$zzz)


e"Foo &amp; bar"  --> An always entity-resolving string, in XPath and XQuery. I.e. the example is like XPath's "Foo & bar" and XQuery's "Foo &amp; bar". Currently it is quite hard to escape things in XPath when you pass an XPath expression to an interpreter that you call as a shell command. You can only use " strings in the command, since ' is used to separate the expression from the other command line arguments (or the other way around, but " in bash evaluates $foobar, which causes trouble with XPath's variables), and then people get confused about how to use ' in the "-strings. One solution is to close the outer quotes and open new ones, so without it '$somefunc(e"&apos;")' has to be written as '$somefunc("'"'"'")', which is extremely hard to read.
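
To make the quoting problem above concrete, here is a minimal shell sketch. It only builds the expression string and prints it; no XPath interpreter is invoked, and the variable names expr and expr2 are made up for this illustration. The second form, a here-document with a quoted delimiter, is one possible workaround that is not mentioned in the report.

```shell
# The XPath expression the interpreter should receive is:  $somefunc("'")

# Variant 1: close the single-quoted shell string, add a double-quoted ',
# then reopen the single-quoted string (the '"'"' idiom from the report).
expr='$somefunc("'"'"'")'

# Variant 2 (an assumption of this sketch, not part of the report):
# a here-document with a quoted delimiter passes the text through unchanged.
expr2=$(cat <<'EOF'
$somefunc("'")
EOF
)

# Both variables now hold the same expression text.
printf '%s\n' "$expr"
printf '%s\n' "$expr2"
```

Both variants yield the same string, but neither is readable inside a one-line command, which is the motivation for an e"&apos;" literal.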


u"Foo &amp; bar"  --> Always unescaped without entity resolving, in either language. I.e. XPath's "Foo &amp; bar" and XQuery's "Foo &amp;amp; bar". Useful when you have a URL u"http://www.example.com/?request=a&b=c&d=e&..." in XQuery.

q{foo bar string} or q/foo bar string/  or q〠foo bar string〠 -> Here you do not use " or ' as separator, but whatever character follows the q, similar to Perl's single-quoted strings. This is also helpful when passing XQuery expressions to shell programs, since you can always choose a separator that does not collide with whatever separators your shell is using. The variant q{something} is also similar to the namespace Q{something}, so the syntax is already familiar to XQuery users.
 
“string”, 「string」 , «string»  -> Unicode quoted strings. Unicode has a lot of different quotes. XQuery is designed around Unicode, so it is strange that it only uses ASCII quotes for strings. These also help with the shell, since shells usually ignore non-ASCII quotes.
   
let $i := 21 return f"the %1;o$i street" ->  Format strings, like the format-* functions. The example yields "the 21st street". % would mark the beginning of a format and the variable marks the type. So f"the %1;o$i street" becomes concat("the ", format-integer($i, "1;o"), " street"). (Having the picture before the variable makes it easier to decide where the picture and variable end.)

l"Hello" --> A localized string. First you load an i18n file (perhaps with a static directive like declare translation-file "foobar.ts"; ). Then l"Hello" is "Hello" on an English system, but "Hallo" on a German system, or "你好" on a Chinese system.  This could also be combined with the format strings.

i"http://www.example.org" --> An include string. Like unparsed-text("http://www.example.org"), but evaluated statically, before the evaluation of the query. In case you need to have some big block of text, like a heredoc.
 
j"Foo \n b\u0061r xyz\""    -> JSON-like strings, with the backslash escapes allowed in JSON. Very useful now that XQuery is getting more JSON interoperability features. You could copy and paste a string from a JSON file into the XQuery, after just adjusting it by adding a j.

c"Foo \n b\x61r \n xyx\""  -> C-like strings with the backslash escapes of C.

p"Foo "#10" b"#$61"r "#10" xyz"""  -> Pascal-like strings. Also uses ' or " as separators, but # to escape character codepoints.

b"++++++++++[>++++++++>+++>>>+++<<<<<-]>+++.---.-.-----.>++.<----[>>+++<<--]>>.>>[<++++>-]<-----.<<.>.>-----.<-----.+..-.<.>---.>+++++++++.----<++++.>.----.--.<." -> Brainfuck-like strings. Evaluates the string as a Brainfuck program and the literal becomes the output of this evaluation. Although it is probably a bad idea to use < and > here. Perhaps replace them with { and } ? But then it is no longer Brainfuck. This is getting too weird.

`echo foobar` --> Executes some program and returns the result. No prefix, but a different kind of quotes. Bash has these, so they have to be useful for something.
Comment 1 Josh Spiegel 2016-01-06 20:25:01 UTC
Not sure if you noticed, but the most recent draft of XQuery adds string constructors:
http://www.w3.org/TR/2015/CR-xquery-31-20151217/#id-string-constructors
Comment 2 Benito van der Zander 2016-01-06 20:28:04 UTC
oh, I missed that draft
Comment 3 Michael Kay 2016-01-06 22:57:52 UTC
String constructors have been added to XQuery 3.1 to meet this requirement.

http://www.w3.org/TR/xquery-31/#id-string-constructors
Comment 4 Benito van der Zander 2016-01-07 11:17:22 UTC
And there I was thinking, I need to document the x"" strings (that I had in my implementation for a while) here, before the WG goes off-track and invents something completely insane again, like in the JSONiq case. Seems I was too late for that.

Seriously, ``[, why? Where do you even get these ideas from? What was wrong with a single ` ?

And unescaped `` still gets silently swallowed by bash, so it does not help much for the shell. Only q/.../ would.
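
A minimal sketch of that shell behaviour, assuming a POSIX-style shell such as bash. The variable names a, b and c are made up for this illustration, and no XQuery processor is involved.

```shell
# Backticks inside double quotes are still command substitution.
a="x `echo foobar` y"     # the shell runs echo, so a holds: x foobar y

# Inside single quotes the backticks survive literally.
b='x `echo foobar` y'

# The two backticks of the ``[ ... ]`` delimiters form an empty command
# substitution inside double quotes, so they silently disappear.
c="``[Hello]``"

printf '%s\n' "$a" "$b" "$c"
```

So a string constructor can only be passed inside single quotes (or with each backtick escaped), whereas a q/.../ literal would contain nothing the shell treats specially in either quoting style.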
Comment 5 Michael Kay 2016-01-07 14:37:33 UTC
Benito, it's a pity you don't participate in the working group, then you would (a) have a better understanding of how these decisions are made, and (b) be able to add your personal insights as inputs.

Clearly choosing ASCII characters (or non-ASCII, which was also considered) is a bicycle-shed issue on which everyone has an opinion, and believe me, there were plenty of opinions put forward. Sometimes on these occasions, if one has to explain how the committee reached its decision, it can only be described as a process of exhaustion - after two full meetings discussing the subject, all reasonable suggestions had been found objectionable, and by that time everyone was so fed up with spending time on such an arbitrary decision that the next workable solution had a good chance of being accepted.

I'm reminded of the XSD 1.0 spec which says that the set of datatypes was "judiciously chosen". I've always assumed this was a euphemism for "we spent an awful long time talking about it, and in the end, this is what we ended up with."
Comment 6 Michael Kay 2016-01-07 14:40:45 UTC
And by the way "what's wrong with a single `?" - one of the first requirements the WG set itself was that the start and end delimiters should be different, to allow nesting.
Comment 7 Abel Braaksma 2016-01-07 14:47:37 UTC
(In reply to Benito van der Zander from comment #4)
> Seriously, ``[, why? Where do you even get these ideas from? 
> What was wrong with a single ` ?

The single backtick is used to write interpreted expressions inside the string constructor.

The choice of delimiter was the result of a long discussion in the WG and I doubt they'll look forward to reopening it. In case you are interested in how we came to the backticks syntax, a good starting point is where Michael Kay suggested chevrons[1]; read all following mail in that and related discussions (some proposals can be found in previous mails as well).

[1] https://lists.w3.org/Archives/Public/public-xsl-query/2015Sep/0042.html
Comment 8 Abel Braaksma 2016-01-07 14:48:17 UTC
(sorry, I accidentally ignored the collision, was already answered by Michael Kay)
Comment 9 Benito van der Zander 2016-01-07 15:58:23 UTC
>(a) have a better understanding of how these decisions are made, and

I was wondering if you have started to smoke some crazy stuff at the recent meetings 

> it can only be described as a process of exhaustion - after two full meetings discussing the subject,  all reasonable suggestions had been found objectionable, and by that time everyone was so fed up with spending time on such an arbitrary decision that the next workable solution had a good chance of being accepted.

That is not a good way to make decisions.

When you get too tired, you should postpone the decision. Much better to not have a good feature than to have a bad one stuck forever in the spec. And take more breaks (or start smoking some crazy stuff at the meetings) 

Are you voting accept vs. reject on each proposal individually? 
That is not a good voting system. Try to use Kemeny-Young or Range voting or something over all the proposals at once.



>And by the way "what's wrong with a single `?" - one of the first requirements the WG set itself was that the start and end delimiters should be different, to allow nesting.

But that does not even matter there.

The interpolation is nested, not the string itself. Just like attributes nest, despite only having " as separator: <node att="{ <node att="foo"/>/@att }"/>

And now they are not used for nesting, or ``[ ``[ ``] ``] would be valid.



It is even worse than I thought. 
Bash swallows ` even if it is surrounded by "-quotes. And my laptop keyboard is broken. The `-key fell off and I cannot type it there at all.



> and I doubt they'll look forward to reopen it.

If they do not do it before dropping the “Candidate”, they never will.

>a good starting point is where Michael Kay suggested chevrons

Perhaps the language would become better, if you stopped voting and just let Mike decide everything.
Comment 10 Liam R E Quin 2016-01-07 21:46:35 UTC
Benito, the W3C actually uses a consensus process, and voting is only used if agreement cannot be reached. Even then there's a procedure for objections.

My original proposal used @{....@} (it had to be at least two characters because of the way some implementations parse XPath) with ${ exprsingle } interpolated inside.

But there were objections to ${...} because people thought ${ $foo } looked odd and because of {$foo} being special in XSLT.

I'd especially wanted to be able to include multiline CSS, JSON and JavaScript fragments in these string literals and others accepted that use case, so we tried to use delimiters that were not often found there (that's why I'd used @{..} originally in fact).

The final syntax isn't ideal, although arguably it's better than the <<EOF and <<'EOF' used in the Unix shell (and hence Perl). My very first proposal was based on that, in fact, so you supplied the end delimiter each time, but some implementers had serious difficulties integrating that so I had to back off.

It's not that the committee was on crack (for what it's worth Mike Kay opposed the feature altogether more than once before being persuaded, so his comment about exhaustion may partly come from that initial reluctance). It's that basic syntax features can be hard to add several years after the main design of a language was frozen.

By the way, as I said to you at Balisage, if you want to participate we'd be delighted, although I fear there probably won't be an XQuery 3.2.