19003 – Bugs in toNativeLineEndings()

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 19003 - Bugs in toNativeLineEndings()

Summary: Bugs in toNativeLineEndings()

Status:	RESOLVED FIXED

Alias:	None

Product:	WebAppsWG
Classification:	Unclassified
Component:	File API (show other bugs)
Version:	unspecified
Hardware:	PC All

Importance:	P2 normal
Target Milestone:	---
Assignee:	Arun
QA Contact:	public-webapps-bugzilla

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2012-09-25 07:40 UTC by Simon Pieters
Modified:	2012-10-17 00:11 UTC (History)
CC List:	2 users (show)

See Also:

Attachments

Description Simon Pieters 2012-09-25 07:40:50 UTC

https://www.w3.org/Bugs/Public/show_bug.cgi?id=16733
http://dev.w3.org/2006/webapi/FileAPI/#convenienceAPI

The IDL should say:

Window implements LineEndings;
WorkerGlobalScope implements LineEndings;

And remove this:
[[
In environments where the global object is represented by a Window or WorkerGlobalScope object, the toNativeLineEndings method should be available.
]]

(assuming we want this in the global scope)

The algorithm seems bogus. It seems to assume that there will only be one "line ending" and that it is at the end of the string. It doesn't define what a "line ending" is. "Expected by the underlying platform" also seems unclear. It seems unexpected that this methods *adds* a line ending if there isn't one.

I would have expected that this method does this:

function toNativeLineEndings(string) {
  ...snip impl of WebIDL voodoo here...
  return string.replace(/(\r\n|\r|\n)/g, nativeLineEnding);
}

As for spec language, I would probably do this:

1. Let /input/ be /string/.
2. Let /position/ be a pointer into /input/, initially pointing at the start of the string.
3. Let /lines/ be an empty array.
4. While /position/ is not past the end of /input/:
    1. Collect a sequence of characters that are not U+000A or U+000D.
    2. Add the string collected in the previous step to /lines/.
    3. If the character at /position/ is U+000D, and the character at /position/+1 is U+000A, advance /position/ by two characters. Otherwise, advance /position/ to the next character.
5. Let /native line ending/ be a sequence of characters that represent a line ending according to the underlying platform's conventions, or U+000A if there is no particular convention. [issue: should we only allow \n and \r\n here?]
6. Let /output/ be the result of joining /lines/ with /native line ending/.
7. Return /output/.

where "collect a sequence of characters" is defined here: http://www.whatwg.org/specs/web-apps/current-work/multipage/common-microsyntaxes.html#collect-a-sequence-of-characters

or here: http://dom.spec.whatwg.org/#collect-a-sequence-of-characters

Comment 1 Glenn Maynard 2012-09-25 15:09:44 UTC

Yeah, that's what I meant in https://www.w3.org/Bugs/Public/show_bug.cgi?id=16733#c21: convert the line endings in a string.

> 3. Let /lines/ be an empty array.
> 4. While /position/ is not past the end of /input/:
>     1. Collect a sequence of characters that are not U+000A or U+000D.
>     2. Add the string collected in the previous step to /lines/.
>     3. If the character at /position/ is U+000D, and the character at
> /position/+1 is U+000A, advance /position/ by two characters. Otherwise,
> advance /position/ to the next character.
> 6. Let /output/ be the result of joining /lines/ with /native line ending/.

This doesn't seem quite right--an input of "\n" results in "" (instead of eg. "\r\n"), because /lines/ becomes [""].  Also, "joining string with a string" would need to be defined.

Maybe:

1. Let /position/ be a pointer into /string/, initially pointing at the start of
the string.
2. Let /result/ be the empty string.
3. Let /native line ending/ be the character U+000A, or the character U+000D followed by the character U+000A, as determined by the underlying platform's conventions.
4. /Line loop/: Collect a sequence of characters that are not U+000A or U+000D.
5. Add the string collected in the previous step to /result/.
6. If the character at /position/ is past the end of /string/, return /result/.
7. Add /native line ending/ to /result/.
8. If the character at /position/ is U+000D, and the character at
/position/+1 is U+000A, advance /position/ by two characters.  Otherwise,
advance /position/ to the next character.
9. Jump back to the step labeled /line loop/ in these steps.

Comment 2 Arun 2012-09-25 15:33:12 UTC

*Both* SimonP's formulation *and* Glenn's formulation in Comment 1 do not change a string without ANY line endings to a string with line endings (per the underlying platform).  Why is that?  Should one NOT assume that a developer calls this API on strings without line endings to add them in appropriately?

Comment 3 Glenn Maynard 2012-09-25 16:08:58 UTC

Huh, what?  Why would you add line endings to a string that didn't have any?

The whole point is to convert the line endings in the string to the line endings of the platform.

s = open("file with unix line endings.txt").read()
s = s.replace("\n", "\r\n")
open("file with DOS line endings.txt", "w").write(s)

Comment 4 Arun 2012-09-25 16:43:49 UTC

I've been thinking about this API incorrectly, without allowing for multiple pre-existing line endings.  I also assumed it would append a line ending to a single line parameter.  Both are wrong.  Your way is right, and your algorithm is better.

Comment 5 Simon Pieters 2012-09-26 06:56:29 UTC

zewt, oops.

You version still has a bug though; the "collect a sequence of characters" algorithm expects a variable in the calling algorithm named /input/ (hence my step 1).

Comment 6 Glenn Maynard 2012-09-26 14:06:13 UTC

A very weird algorithm.  That's like a function that modifies variables in its calling function, instead of using parameters.

Comment 7 Arun 2012-10-17 00:11:32 UTC

Done.