Bug 16311 - [Ser30] newlines are an integral part of text mode serialization too
Status: RESOLVED FIXED
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: Serialization 3.0
Version: Working drafts
Hardware: PC Linux
Importance: P2 normal
Target Milestone: ---
Assigned To: Henry Zongaro
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
URL: http://x-query.com/pipermail/talk/201...
Reported: 2012-03-11 03:19 UTC by jidanni
Modified: 2013-01-19 03:22 UTC
CC List: 3 users

Description jidanni 2012-03-11 03:19:39 UTC
In XML serialization, we can produce
<X>1</X>
<Y>2</Y>
output.

For TEXT mode serialization, it should be just as easy to get
1
2
output.

Mainly, what I'm trying to say is that in the world of TEXT mode, please don't just
offer an endless stream of items separated by SPACES. Instead, remember that
TEXT mode is two-dimensional: newlines and spaces!

Don't force the user to hardcode raw newlines in just to get them back out.
Comment 1 Christian Gruen 2012-03-11 12:06:20 UTC
I've looked up the XSLT and XQuery Serialization spec: step 3 of Section 2, "Sequence Normalization", contains the phrase "separated by a single space", which would probably have to be extended to take an additional serialization parameter into account. An XQuery 3.0 expression that uses such a parameter could, for example, look as follows:

  declare option output:separator "&#10;";
  (1 to 10)

Opinions are welcome.
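To make the current rule concrete, here is a small Python model (not the normative algorithm) of sequence normalization followed by the text output method. Atomic values are plain Python values, elements are `xml.etree.ElementTree` nodes, and the `separator` argument is a hypothetical stand-in for the proposed `output:separator` option. As in step 3, it applies only between adjacent atomic values.

```python
import xml.etree.ElementTree as ET


def string_value(item):
    """String value of an item: all descendant text of an element, or str() of an atomic value."""
    if isinstance(item, ET.Element):
        return "".join(item.itertext())
    return str(item)


def serialize_text(items, separator=" "):
    """Model of sequence normalization plus the text output method.

    Runs of adjacent atomic values are joined with `separator` (a single
    space by default, as in step 3); element nodes get nothing in between.
    """
    out = []
    prev_atomic = False
    for item in items:
        atomic = not isinstance(item, ET.Element)
        if atomic and prev_atomic:
            out.append(separator)
        out.append(string_value(item))
        prev_atomic = atomic
    return "".join(out)


print(serialize_text(range(1, 11)))        # current rule: single spaces
print(serialize_text(range(1, 11), "\n"))  # hypothetical separator "&#10;"
```

Note that two elements such as `<X>1</X>` and `<Y>2</Y>` come out glued together as `12` in this model, because the space rule never applies to nodes.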
Comment 2 jidanni 2012-03-12 01:39:45 UTC
All I know is that I see a raw character reference to ASCII character
number 10 staring me in the face above.

Please add some level of indirection so it looks more elegant.

I recall that the HTTP spec says what 'newlines' should be.
Use that if there is no XML, HTML, etc. spec for text newlines.
Isn't there some way to detect the operating system's convention
automatically, so that the user need only change it if it is detected wrongly?

OK, so make CR LF the default, and then I will use your character
reference above if I just want UNIX LFs. But at least provide a default.

Anyway, just make sure that
<X>1 A</X>
<X>2 B</X>
output can become
1 A
2 B
with nothing more than a change to an XQuery header declaration, and _no_ other
rewriting of the program.

My example above shows that Jidanni (that's me) doesn't want _every_
space to become a newline...
Comment 3 Michael Kay 2012-03-12 10:18:01 UTC
I have some sympathy with this suggestion. I would be inclined to handle it by overloading the interpretation of indent="yes" rather than adding a new serialization parameter: simply specify that if indent="yes" is specified then (regardless of which serialization method is used) the single space character used as a separator during sequence normalization MAY be replaced by some other whitespace sequence.
Comment 4 Christian Gruen 2012-03-12 22:17:07 UTC
Good point; if possible, I'd like to stick with the existing parameters as well. Would it be possible to extend the set of values accepted by the "indent" parameter (e.g. "space", "tab", "newline") without introducing legacy issues?

If not, how could a user select a different whitespace sequence without being restricted to a specific implementation?
Comment 5 jidanni 2012-03-13 00:57:01 UTC
Wait, I can just use CSV (Comma Separated Values) output mode, setting quotes and commas to nil first!
...Alas, CSV is just and input mode not an output mode in my favorite Xquery implementation :-(
Comment 6 jidanni 2012-03-13 00:58:21 UTC
and->an. Sorry.
Implementation: BaseX.
Comment 7 Henry Zongaro 2012-06-19 16:59:28 UTC
I like Michael Kay's suggestion in comment 3 of overloading the meaning of indent="yes" to satisfy this requirement.  However, I think it might be better to make it stronger than a "MAY": for the text output method, the formatting seems much more important than for XML.

One concern I have lies with the interaction with mixed content.  I think we would want to restrict the insertion of new line characters (or whatever whitespace was desired) in the same way that the serialization draft does for the XML output method.[1]

For example, the following

  <p>This is my <b>first</b> paragraph.</p><p>This is my <i>second</i> paragraph.</p>

should probably be serialized as

  This is my first paragraph.
  This is my second paragraph.

and not as

  This is my
  first
  paragraph.
  This is my
  second
  paragraph.

[1] http://www.w3.org/TR/xslt-xquery-serialization-30/#XHTML_INDENT
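The difference between the two outputs above can be modeled in a few lines of Python. This is only a sketch under stated assumptions: the `<p>` elements are `xml.etree.ElementTree` nodes, and both function names are hypothetical. `break_everywhere` inserts a newline at every text-node boundary. `break_between_blocks` keeps each paragraph's mixed content intact and breaks only between the `<p>` items.

```python
import xml.etree.ElementTree as ET

SOURCE = ("<r><p>This is my <b>first</b> paragraph.</p>"
          "<p>This is my <i>second</i> paragraph.</p></r>")
paragraphs = list(ET.fromstring(SOURCE))  # the two <p> elements


def break_everywhere(items):
    """Hypothetical naive rule: a newline at every text-node boundary."""
    lines = []
    for p in items:
        lines.extend(t.strip() for t in p.itertext() if t.strip())
    return "\n".join(lines)


def break_between_blocks(items):
    """Keep mixed content whole; break only between the <p> items."""
    return "\n".join("".join(p.itertext()) for p in items)


print(break_everywhere(paragraphs))      # six lines, split inside each paragraph
print(break_between_blocks(paragraphs))  # two lines, one per paragraph
```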
Comment 8 Henry Zongaro 2012-07-24 09:51:05 UTC
Jidanni, I still feel some confusion about the requirement you've described.  Are you interested only in new lines between items in the sequence that is to be serialized, or new lines in other places as well?  In your examples, you have a sequence of two elements.  What if you were serializing a document node that contained two elements, as in the result of this query?

document {
<X>1</X>
<Y>2</Y>
}

Would you still want to see the result serialized like this?

1
2

or this?

12
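The distinction in this question can be modeled in Python by assuming that newlines go only between top-level items of the sequence. Because ElementTree has no document node, `Document` is a hypothetical stand-in for one. A document node containing both elements counts as a single item, so nothing is inserted between `1` and `2`.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in for an XDM document node (ElementTree has none)."""
    children: list = field(default_factory=list)


def string_value(item):
    """Concatenated descendant text of a document or element node."""
    if isinstance(item, Document):
        return "".join(string_value(child) for child in item.children)
    return "".join(item.itertext())


def serialize_text(items, between_items="\n"):
    """Assumption: the newline is placed only between top-level items."""
    return between_items.join(string_value(item) for item in items)


x = ET.fromstring("<X>1</X>")
y = ET.fromstring("<Y>2</Y>")
print(serialize_text([x, y]))              # two items: "1" and "2" on separate lines
print(serialize_text([Document([x, y])]))  # one document item: "12"
```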
Comment 9 jidanni 2012-07-24 10:01:59 UTC
I'm getting a bit hazy, but why would I want my results glued together? :-)
Anyway, why don't you add an additional controlling parameter, so that one can pick:
1. Results glued together
2. Results separated by spaces
3. Results separated by newlines
Comment 10 jidanni 2012-07-24 10:02:56 UTC
And be sure to allow the user to switch back and forth anywhere in the program; don't lock him into only one mode for the whole file.
Comment 11 Henry Zongaro 2013-01-14 10:50:28 UTC
Jidanni, sorry for the late update on this issue.  At the joint teleconference of the XSLT and XQuery working groups of 25 September 2012,[1] the working groups decided to resolve your request through the addition of a new "item-separator" serialization parameter.

The item-separator is a string; if it is present, its value is inserted between adjacent items of the sequence being serialized.  The serialization parameter affects all output methods, not just the text output method.

So, if the sequence to be serialized was <X>1</X>,<Y>2</Y>, and the value of the item-separator was the LINE FEED character, the serialized result for the XML output method would be

<X>1</X>
<Y>2</Y>

and for the TEXT output method would be

1
2

Note that the value of the parameter is only inserted between items in the original sequence that is to be serialized.  So if the value of the item-separator was LINE FEED, and the sequence to be serialized under the TEXT output method was

<p>My <em>first</em> paragraph.</p><p>My <em>second</em> paragraph.</p>

then the serialized result would be

My first paragraph.
My second paragraph.
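The examples above can be reproduced with a Python sketch of the item-separator behavior. It rests on some simplifying assumptions. Every item is an `xml.etree.ElementTree` element, so atomic values and their space rule are not modeled. `serialize` is a hypothetical name, not a real API. The separator is inserted only between adjacent top-level items and never inside an item's mixed content.

```python
import xml.etree.ElementTree as ET


def serialize(items, method="xml", item_separator=None):
    """Sketch of item-separator for a sequence of element nodes."""
    if method == "xml":
        parts = [ET.tostring(item, encoding="unicode") for item in items]
    elif method == "text":
        parts = ["".join(item.itertext()) for item in items]
    else:
        raise ValueError(f"unsupported output method: {method}")
    # Without item-separator, adjacent element nodes are simply concatenated.
    sep = "" if item_separator is None else item_separator
    return sep.join(parts)


xy = [ET.fromstring("<X>1</X>"), ET.fromstring("<Y>2</Y>")]
paras = list(ET.fromstring(
    "<r><p>My <em>first</em> paragraph.</p><p>My <em>second</em> paragraph.</p></r>"))

print(serialize(xy, "xml", "\n"))      # <X>1</X> and <Y>2</Y> on separate lines
print(serialize(xy, "text", "\n"))     # 1 and 2 on separate lines
print(serialize(paras, "text", "\n"))  # one line per paragraph
```

Because the separator is applied per item, the `<em>` elements inside each paragraph never cause a line break. This addresses the mixed-content concern from comment 7 without relying on indent.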

You can see the detailed description in the new working draft of Serialization 3.0.[2]

[1] https://lists.w3.org/Archives/Member/w3c-xsl-query/2012Sep/0094.html (Member-only link)
[2] http://www.w3.org/TR/2013/WD-xslt-xquery-serialization-30-20130108/#serdm
Comment 12 jidanni 2013-01-19 03:22:34 UTC
Thanks. I hope that is what I wanted, but time slips by, so I am foggy now :-)