Bug 16733 - Drop 'endings'?
Status: RESOLVED FIXED
Product: WebAppsWG
Classification: Unclassified
Component: File API
Version: unspecified
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assigned To: Arun
QA Contact: public-webapps-bugzilla
Reported: 2012-04-13 14:50 UTC by Simon Pieters
Modified: 2012-12-08 11:53 UTC
Description Simon Pieters 2012-04-13 14:50:52 UTC
File API Writer says about BlobBuilder's endings:

[[
Can we do without endings? Any choice other than "native" can be implemented by the app author, and most file formats don't care about line endings. "Native" would be handy for sharing certain types of text files with apps outside the browser [e.g. Makefiles on a system where make is expecting \n will have issues if they're written with \r\n]. Is it worth it? Can this be worked around if we don't supply it?
]]

This bug is about the Blob constructor's 'endings'. What's the motivation for having it? Can we drop it? In BlobBuilder it only applies for DOMString, but Blob constructor doesn't say whether it applies for only DOMString or all of DOMString, ArrayBuffer and Blob.

Authors who want to normalize line endings in their DOMStrings on Windows can do so manually before passing the string to the Blob constructor, by checking navigator.platform.substr(0, 3) == 'Win' and then a simple replace().
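
A minimal sketch of that manual approach, for illustration only. The helper name toPlatformEndings and its platform parameter are hypothetical (they are not part of any spec); in a browser the caller would pass navigator.platform.

```javascript
// Hypothetical helper: convert to CRLF only when the platform string
// starts with "Win"; otherwise return the text unchanged.
function toPlatformEndings(text, platform) {
  var isWindows = String(platform).slice(0, 3) === 'Win';
  if (!isWindows) {
    return text;
  }
  // Collapse existing CRLF to LF first so it is not converted twice.
  return text.replace(/\r\n/g, '\n').replace(/\n/g, '\r\n');
}

// In a browser, the result would then be handed to the constructor:
//   new Blob([toPlatformEndings(text, navigator.platform)]);
```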
Comment 1 Glenn Maynard 2012-04-13 15:16:32 UTC
> by checking navigator.platform.substr(0, 3) == 'Win' and then a simple replace().

That takes some extra hoop jumping if the source is another Blob or an ArrayBuffer, though.  It's also potentially a lot harder if the source is a very large, disk-backed Blob, where you explicitly don't want to pull the whole thing into memory.

FYI, converting only on Windows wouldn't be a good thing to do: not only may Windows users load text files with Unix line endings (which wouldn't be harmed by that), but Linux users can load text files with Windows line endings.  It would result in strange asymmetries, where both types of files would work for Windows users, but not for anyone else.  When reading files that you don't plan to write back out, you're usually best off *always* performing CRLF conversion, so all text files always work regardless of platform.
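
As a rough illustration of the "always convert on read" point, here is a sketch that splits text into lines without looking at the platform at all. The function name readLines is made up for this example.

```javascript
// Hypothetical helper: split on CRLF or LF unconditionally, so files
// with either convention produce the same lines on every platform.
function readLines(text) {
  return String(text).split(/\r\n|\n/);
}

var unixFile = 'one\ntwo\nthree';
var windowsFile = 'one\r\ntwo\r\nthree';
// Both calls yield the same three-element array.
var a = readLines(unixFile);
var b = readLines(windowsFile);
```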
Comment 2 Glenn Maynard 2012-04-13 15:40:35 UTC
A couple corrections:

'endings' is only useful for converting output; it doesn't convert input.  (I find that unintuitive.)

'endings' in the Blob ctor *only* applies to DOMStrings.  That's necessary, since otherwise input Blobs would have to be read synchronously to determine the length, but it's also unintuitive.

These would both be much clearer if this were moved to its own method, e.g. DOMString convertToNativeLineEndings(DOMString).
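
To make the second correction concrete, here is a small sketch. It assumes a runtime with a global Blob whose constructor accepts the 'endings' option as described in this bug; the variable names are illustrative only.

```javascript
// Assumption: Blob is a global and supports { endings: 'native' }.
var text = 'a\nb';
var bytes = new Uint8Array([0x61, 0x0a, 0x62]); // the same content: "a", LF, "b"

// String part: may grow to 4 bytes on a platform whose native ending is CRLF.
var fromString = new Blob([text], { endings: 'native' });

// Binary part: not a string, so 'endings' leaves it alone and the size stays 3.
var fromBytes = new Blob([bytes], { endings: 'native' });
```

The two Blobs hold the same logical text, yet only the first is subject to conversion, which is the asymmetry being pointed out.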
Comment 3 Anne 2012-04-13 17:49:59 UTC
For which file formats is this still an issue though? Use cases would be nice to know.
Comment 4 Eric Uhrhane 2012-04-13 18:19:36 UTC
This is really only relevant for text files; no binary file is going to want you to mess with its contents.  If you want to output a text file and have it open correctly in Notepad, you'll need CRLF in there.  If you want it to work well on Linux, you'll want just LF [although really a lot of Linux programs can cope with CRLF as well].  Here's an example of a user having a problem on Windows: http://stackoverflow.com/questions/1278501/how-do-i-create-a-new-line-using-javascript-that-shows-correctly-in-notepad.

I don't see any need to apply this to Blob or ArrayBuffer data; it only really makes sense on strings.  Do the conversion as you create the binary data, with the invariant that it's already converted once it's in binary form.

I see that I wrote about the underlying filesystem in the spec; it's probably more appropriate to talk about the host operating system; Linux systems can certainly run on FAT32 filesystems, but they'll still expect LF rather than CRLF.

Glenn:  Endings converts only output, not input, because this is FileWriter, an output API.  If you wanted to convert input, that'd be in FileReader.readAsText.  However, it's probably not necessary to add that.

I like your suggestion that this be a separate method, actually.  You could use Simon's trick, but that's a bit of a pain, and adding a cleaner API would make it more likely that developers would do it right.  If you wanted to handle input as well, offer both convertToNativeLineEndings and convertFromNativeLineEndings.  In the latter case, you'd probably want to convert any CRLFs even if on a non-Windows platform, just in case.
Comment 5 Anne 2012-04-13 18:24:04 UTC
Is there any OS that does not deal with CRLF? Because otherwise we could just have "compatible" line endings (CRLF) and "default" (LF).
Comment 6 Arun 2012-04-13 18:47:57 UTC
I don't have a strong opinion on whether we should keep endings as an option, or migrate these to a separate API.  Would the methods mentioned in comment 2 exist on Blob?
Comment 7 Glenn Maynard 2012-04-13 18:52:23 UTC
If you're receiving a Blob or an ArrayBuffer from an API (like FileReader or XHR), then you aren't creating the data yourself, so you can't convert line endings as you create it.

I'm not concerned about supporting that (lacking use cases), so it's probably fine to shift it into a standalone function.

> Glenn:  Endings converts only output, not input, because this is FileWriter, an
> output API.  If you wanted to convert input, that'd be in
> FileReader.readAsText.  However, it's probably not necessary to add that.

We're talking about the Blob constructor, which isn't part of FileWriter.  (BlobBuilder wasn't, either...)

> Is there any OS that does not deal with CRLF? Because otherwise we could just
> have "compatible" line endings (CRLF) and "default" (LF).

Some applications do only support text files with native line endings.  Notepad in Windows, for example, can only load files with CRLF endings.

Also, if we're going to support writing files to the user's native system (FileWriter and non-sandboxed file access), we should really be able to write files that conform to the conventions of the user's system.

There's an inherent interop cost here, but it seems unavoidable...
Comment 8 Simon Pieters 2012-04-14 08:46:13 UTC
(In reply to comment #0)
> Blob constructor doesn't say whether it applies for only DOMString or all of
> DOMString, ArrayBuffer and Blob.

It probably shouldn't apply for Blob if we decide to keep it, since it's a sync operation and you need to have the Blob in memory to replace the line endings to know what to set .size to.
Comment 9 Simon Pieters 2012-04-16 05:18:56 UTC
OK, here's a strawman API, implemented in JavaScript:

function normalizeLineEndings(s, endings) {
  if (arguments.length == 1)
    endings = 'LF';
  endings = String(endings);
  s = String(s);
  if (endings != 'LF' && endings != 'CRLF' && endings != 'native') { // CR intentionally not supported
    throw "SyntaxError";
  }
  if (endings == 'native') {
    endings = navigator.platform.substr(0, 3) == 'Win' ? 'CRLF' : 'LF';
  }
  var newline = '\n';
  if (endings == 'CRLF') {
    newline = '\r\n';
  }
  return s.replace(/\n|\r|\r\n/g, newline);
}

When writing a file with FileWriter that should open in Notepad on Windows, and in whatever text editor only works with LF on other OSes, you'd use normalizeLineEndings(foo, 'native'). When reading a file with unknown endings and wanting LF, you'd use normalizeLineEndings(bar).

I'm not sure why we should provide this convenience API in the platform, though, rather than letting authors use the above JavaScript function.
Comment 10 Simon Pieters 2012-04-16 05:25:46 UTC
(In reply to comment #9)
>   return s.replace(/\n|\r|\r\n/g, newline);

  return s.replace(/\r\n|\r|\n/g, newline);
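
For anyone wondering why the order of the alternatives matters, a short sketch (variable names are illustrative). Regex alternation tries the branches left to right at each position, so the two-character "\r\n" has to come before the single characters.

```javascript
var input = 'a\r\nb';

// Original order: at the "\r", the lone "\r" branch matches before
// "\r\n" is ever tried; the following "\n" then matches on its own.
var doubled = input.replace(/\n|\r|\r\n/g, '\n');

// Corrected order: "\r\n" is tried first and consumed as one newline.
var single = input.replace(/\r\n|\r|\n/g, '\n');
```

With the original order a single CRLF turns into two newlines; with the corrected order it turns into one.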
Comment 11 Glenn Maynard 2012-04-16 14:28:47 UTC
Is the "navigator.platform.substr(0, 3) == 'Win'" test always correct?  It gives the UA no control over what newline style is considered "native".

For example, a phone may want to pretend it uses CRLF newlines as far as user-visible files are concerned (regardless of what it uses internally), to minimize problems when people mount their phone storage on USB and access files from their desktop.

This would turn the "if Windows" check into a list that would always be out of date: "Windows, iPhone, iPad, Android, ...".
Comment 12 Simon Pieters 2012-04-16 14:35:28 UTC
(In reply to comment #11)
> Is the "navigator.platform.substr(0, 3) == 'Win'" test always correct? 

I dunno, I thought it was a legacy limited to Windows only.
 
> For example, a phone may want to pretend it uses CRLF newlines as far as
> user-visible files are concerned (regardless of what it uses internally), to
> minimize problems when people mount their phone storage on USB and access files
> from their desktop.

Is that a reality today or a hypothetical?

> This would turn the "if Windows" check into a list that would always be out of
> date: "Windows, iPhone, iPad, Android, ...".

Right.
Comment 13 Glenn Maynard 2012-04-16 14:48:18 UTC
(In reply to comment #12)
> Is that a reality today or a hypothetical?

For native apps it's typically up to the application developer, rather than the platform, since the APIs don't push developers one way or the other, so I'd say neither: the platforms haven't had to make a decision so far, but they'll need to for web apps.

I'd definitely use CRLF if I was developing an Android app that outputs text files to USB/SD-accessible storage, since most users will access it in Windows, and Windows users are stumped by LF text files.  (Linux users--and I assume Mac users as well--have no problems with CRLF, so CRLF is safer.)
Comment 14 Anne 2012-04-16 14:51:21 UTC
Right, hence my suggestion in comment 5. To have "compatible" for people who want CRLF, and "default" for everyone else.
Comment 15 Simon Pieters 2012-04-16 14:58:06 UTC
(In reply to comment #13)
> I'd definitely use CRLF if I was developing an Android app that outputs text
> files to USB/SD-accessible storage, since most users will access it in Windows,
> and Windows users are stumped by LF text files.  (Linux users--and I assume Mac
> users as well--have no problems with CRLF, so CRLF is safer.)

That argues for having an API that normalizes to CRLF regardless of OS. (And maybe an API that normalizes to LF when going in the other direction.)

Should we provide a convenience API for that?
Comment 16 Glenn Maynard 2012-04-16 15:14:05 UTC
(In reply to comment #15)
> That argues for having an API that normalizes to CRLF regardless of OS.

No--if my application was running natively in Linux, I'd (usually) write LF's.

I don't have a strong feeling for whether that translates to web apps.  How much will people be bothered if web apps always write CRLFs, even when running in Linux or OSX?  I'm not sure...

> (And maybe an API that normalizes to LF when going in the other direction.)
> 
> Should we provide a convenience API for that?

If all you want is to convert CRLF<->LF, you don't really need a helper API, since it's just a one-line regex.
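
As a sketch of what those one-liners might look like (the function names toCRLF and toLF are made up for this example):

```javascript
// LF -> CRLF. The optional "\r" keeps existing CRLF pairs from
// being turned into "\r\r\n".
function toCRLF(s) {
  return s.replace(/\r?\n/g, '\r\n');
}

// CRLF -> LF.
function toLF(s) {
  return s.replace(/\r\n/g, '\n');
}
```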


(In reply to comment #14)
> Right, hence my suggestion in comment 5. To have "compatible" for people who
> want CRLF, and "default" for everyone else.

This is backwards--most people will want CRLF.  The only people who might want LF are *nix and OSX users.
Comment 17 Jonas Sicking 2012-04-25 07:13:05 UTC
The use cases in comment 4 seem important enough to try to solve.

We certainly could provide an orthogonal API like the one in comment 9 though. But if the number of APIs where we would need to provide "built in" conversion is small enough then providing built-in support might be nicer for authors.

Right now we're considering built-in support in 2 APIs, Blob-ctor and FileWriter (or the FileHandle API I've proposed, or whatever we end up going with).
Comment 18 Simon Pieters 2012-04-25 08:25:46 UTC
I'm not convinced it's nicer, since if you don't want a Blob but want to use the line-endings API, you have to round-trip to a Blob (and reading is an async operation) instead of using the API directly.
Comment 19 Glenn Maynard 2012-04-25 14:09:53 UTC
(In reply to comment #17)
> The use cases in comment 4 seem important enough to try to solve.
> 
> We certainly could provide an orthogonal API like the one in comment 9 though.
> But if the number of APIs where we would need to provide "built in" conversion
> is small enough then providing built-in support might be nicer for authors.
> 
> Right now we're considering built-in support in 2 APIs, Blob-ctor and
> FileWriter (or the FileHandle API I've proposed, or whatever we end up going
> with).

The problem is that it gives an inconsistent API, since the newline handling in the Blob ctor can only work for some argument types and not others.  That's fairly confusing, and having a separate function that clearly only supports strings is a lot clearer.
Comment 20 Arun 2012-06-26 19:14:56 UTC
I'm ok with providing an orthogonal API, but think that as long as default behavior is well defined in Blob constructors, we might be good to go here.  The legacy BlobBuilder API for line endings is carried forward into Blob constructors, with better defined defaults, including what happens under append.  

Does that pass muster, or should we still consider a line endings API that's orthogonal?  I actually don't mind the dictionary object in the Blob constructor.
Comment 21 Glenn Maynard 2012-06-26 21:07:00 UTC
(In reply to comment #20)
> I'm ok with providing an orthogonal API, but think that as long as default
> behavior is well defined in Blob constructors, we might be good to go here. 
> The legacy BlobBuilder API for line endings is carried forward into Blob
> constructors, with better defined defaults, including what happens under
> append.  
> 
> Does that pass muster, or should we still consider a line endings API that's
> orthogonal?  I actually don't mind the dictionary object in the Blob
> constructor.

(This isn't about removing the dictionary parameter--the "type" option is unaffected.)

My concern isn't changed: it's confusing to have a line-endings option on the Blob ctor that only works for some types of blobParts and silently does nothing for others.

I agree with Simon (comment #18) that if native line endings are going to be exposed at all, people are probably going to want to use it without having to go through Blob to access it.  A "toNativeLineEndings(DOMString)" method will be wanted anyway.

toNativeLineEndings() avoids the confusion, since its argument is explicitly DOMString.  If we have it, there's no reason to put it in Blob as well.  "new Blob([toNativeLineEndings(arg1)])" isn't materially worse for authors than "new Blob([arg1], {endings: "native"})". (I find it a little cleaner, actually.)

toNativeLineEndings also makes it a bit easier to pick and choose which arguments you want converted, instead of having to construct multiple Blobs:

> var b1 = new Blob([x1], {endings: "native"});
> return new Blob([b1, x2]);

becomes

> return new Blob([toNativeLineEndings(x1), x2]);

Another thing: if {endings: "native"} is kept and we don't make it affect ArrayBufferView now, we won't be able to do it later--it would be a breaking change.  For example, in "new Blob([view, string], {endings: "native"})", "view" might be data previously read from disk--already containing native line endings--and "string" data that the user wants to append to the end.  This code depends on "view" *not* being converted because it's an ArrayBufferView.  Changing endings to work on "view" would cause the existing data in "view" to be doubly-converted.

(toNativeLineEndings() doesn't have this problem; we can add a toNativeLineEndings(ArrayBufferView) overload later, when use cases surface, instead of having to decide now.)
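
A rough sketch of the double-conversion hazard described above. naiveToCRLF is a hypothetical stand-in for a converter on a platform whose native ending is CRLF; it is deliberately simplistic and is not the proposed toNativeLineEndings() itself.

```javascript
// Hypothetical, deliberately naive converter: every LF becomes CRLF,
// even when the LF is already part of a CRLF pair.
function naiveToCRLF(s) {
  return s.replace(/\n/g, '\r\n');
}

var onDisk = 'old\r\n';   // previously written, already native
var appended = 'new\n';   // fresh string the app wants to add

// Converting everything re-converts the data that was already native.
var convertedAll = naiveToCRLF(onDisk + appended);

// Converting only the new string leaves the existing data intact.
var convertedNewOnly = onDisk + naiveToCRLF(appended);
```

Here convertedAll ends up with a stray "\r" in front of the first CRLF, while convertedNewOnly is clean, which is why per-argument control is useful.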
Comment 22 Arun 2012-07-12 20:56:55 UTC
OK, I'm swayed by the number of comments alone :)

To be clear, what we want in order to satisfactorily close this bug is:

1. A "helper" function along the lines of Comment 21 (which suggests a toNativeLineEndings()), probably as a [Supplemental] on the Window interface.  I'll have to look into this, but we may as well do it in the File API specification.

2. Removal of the endings dictionary member from Blob constructor syntax.

And as pointed out in Comment 21, we can overload for ArrayBufferView later, if absolutely necessary.