This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 23988 - forms: An API for <input type=file multiple> that allows authors to walk the hierarchy on demand
Status: RESOLVED MOVED
Alias: None
Product: WHATWG
Classification: Unclassified
Component: HTML
Version: unspecified
Hardware: Other other
Importance: P3 enhancement
Target Milestone: Unsorted
Assignee: Ian 'Hixie' Hickson
QA Contact: contributor
URL:
Whiteboard: blocked on getting some sort of Direc...
Keywords:
Depends on:
Blocks:
 
Reported: 2013-12-03 18:46 UTC by Ian 'Hixie' Hickson
Modified: 2016-03-16 17:57 UTC

Description Ian 'Hixie' Hickson 2013-12-03 18:46:35 UTC
Right now, if you select a deep directory tree in <input type=file multiple>, the only way to represent it is for the UA to synchronously flatten the entire thing and expose it in the .files list. But that's a performance nightmare. So maybe we should allow people to walk the tree file by file so that they can show incremental progress UI while they do it.
Comment 1 Ian 'Hixie' Hickson 2013-12-03 18:52:50 UTC
(The feature would presumably be triggered by some new attribute that turns off the regular .files API, or some such.)
Comment 2 Ian 'Hixie' Hickson 2013-12-04 02:56:54 UTC
Sicking says Mozilla would be interested in doing something like this.
Comment 3 Ian 'Hixie' Hickson 2014-02-04 21:22:22 UTC
Proposal:

   <input type=file multiple iterated>

   input.files returns null when iterated="" is present

   input.getFileIterator() returns a FileIterator when iterated="" is present,
   throws otherwise.

   interface FileIterator {
     File? next(); // returns null when done
   }
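
For illustration only, here is a minimal sketch of how an author might drain the synchronous iterator exactly as proposed above. No browser implements this; makeFileIterator is a hypothetical stand-in for input.getFileIterator(), and the plain { name } objects stand in for File objects.

```javascript
// Hypothetical stand-in for the proposed FileIterator; not a real browser API.
function makeFileIterator(files) {
  let index = 0;
  return {
    // Mirrors "File? next()": returns the next entry, or null when done.
    next() {
      return index < files.length ? files[index++] : null;
    },
  };
}

// Author-side loop: keep calling next() until it returns null.
function collectNames(iterator) {
  const names = [];
  for (let file = iterator.next(); file !== null; file = iterator.next()) {
    names.push(file.name);
  }
  return names;
}

// Plain objects stand in for File objects here.
const demoIterator = makeFileIterator([{ name: "a.txt" }, { name: "b/c.txt" }]);
console.log(collectNames(demoIterator));
```

Note that each next() call blocks until the UA has the next file, so this shape only moves the flattening cost around rather than removing it.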
Comment 4 Ian 'Hixie' Hickson 2014-02-05 00:20:47 UTC
Oops, sicking points out I forgot to make this async at all.

There are various ways to design an async iterator.

A. Having the browser manage the iterator and just call you when needed:
   input.iterateFiles(processFile, doneProcessingFiles);

B. Having an explicit iterator that you invoke at each step:
   var iterator = input.getFileIterator();
   iterator.getNext(handleFile);
   function handleFile(file) {
     if (file) {
       processFile(file);
       iterator.getNext(handleFile);
     } else {
       doneProcessingFiles();
     }
   }

C. An explicit iterator where the value is on the iterator:
   var iterator = input.getFileIterator();
   iterator.getNext(handleFile);
   function handleFile() {
     if (iterator.currentFile) {
       processFile(iterator.currentFile);
       iterator.getNext(handleFile);
     } else {
       doneProcessingFiles();
     }
   }

D. An explicit iterator where the callback is passed all the relevant values:
   var iterator = input.getFileIterator();
   iterator.getNext(handleFile);
   function handleFile(file, iterator) {
     if (file) {
       processFile(file);
       iterator.getNext(handleFile);
     } else {
       doneProcessingFiles();
     }
   }

E. An explicit iterator where the return value decides if you continue:
   var iterator = input.getFileIterator();
   iterator.getAllFiles(handleFile);
   function handleFile(file) {
     if (file) {
       processFile(file);
       return true;
     } else {
       doneProcessingFiles();
       return false;
     }
   }

F. Events:
   var iterator = input.getFileIterator();
   iterator.onfileready = function (file) {
     processFile(file);
     iterator.next();
   };
   iterator.ondone = doneProcessingFiles;
   iterator.next();

G. Events with no explicit stepping:
   var iterator = input.getFileIterator();
   iterator.onfileready = processFile;
   iterator.ondone = doneProcessingFiles;
   iterator.walkAllFiles();

H. Having an explicit iterator that walks all the values:
   var iterator = input.getFileIterator();
   iterator.getNext(processFile, doneProcessingFiles);

I. Variations on G, H, and the like where the iterator object has .pause() and .resume() methods you can invoke.

J. Variations on the above with promises.

K. Streams.

Cons of various methods:
 - duplicating the call to the iterator
 - being forced to handle all the files rather than stopping early
 - having to check for null at each step
 - complicated APIs (promises, streams)
 - boolean return values are unintuitive

Pros of various methods:
 - a one-statement solution is terse
 - integrating with other APIs that do similar things (promises, streams)
 - being able to stop early
 - automatically having the "end" method called appropriately
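
As a rough sketch of what option J could look like when applied to option B, assume getNext() returns a promise that resolves with the next file, or null when done. That signature is an assumption, not part of any proposal above. makeAsyncFileIterator is a hypothetical stand-in for input.getFileIterator(), and plain { name } objects stand in for File objects.

```javascript
// Hypothetical promise-returning iterator; stands in for input.getFileIterator().
function makeAsyncFileIterator(files) {
  let index = 0;
  return {
    // Assumed shape: resolves with the next file, or null once the walk is done.
    getNext() {
      const file = index < files.length ? files[index++] : null;
      return Promise.resolve(file);
    },
  };
}

// Walks the iterator one file at a time and stops at the first match.
async function findFirst(iterator, predicate) {
  let file;
  while ((file = await iterator.getNext()) !== null) {
    if (predicate(file)) {
      return file; // stop early; the remaining files are never requested
    }
  }
  return null; // walked everything without a match
}

const demoFiles = [{ name: "a.txt" }, { name: "b.png" }, { name: "c.png" }];
findFirst(makeAsyncFileIterator(demoFiles), (file) => file.name.endsWith(".png"))
  .then((file) => console.log("first image:", file && file.name));
```

In this shape the call to the iterator appears only once and stopping early is an ordinary return, but the null check at each step remains and the author has to be comfortable with promises.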
Comment 5 Jonas Sicking (Not reading bugmail) 2014-02-05 00:21:31 UTC
This needs some asynchronousness.

Though I wonder if it wouldn't be better to expose a list of File and "Directory objects" (for some definition of "Directory objects"; right now we have two proposals). This allows for two additional things:

A) It allows the page to create a UI where the user can choose which subdirectories to explore. I.e. you could build a page where the user picks his/her "pictures" folder. The page then creates UI for choosing which folders and subfolders to actually upload to the server.

B) We can add API for adding and removing Directory/File objects.

In general, it seems good to expose the list of Directories and Files that the user has chosen. That way the page can expose UI that displays that list to the user without also exposing all files in all subdirectories.
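
To make use case A concrete, here is a minimal sketch of a page descending only into the directories the user picks. The entry shapes are purely hypothetical (a kind field and a promise-returning getEntries()) and are not taken from either Directory proposal; fileEntry, directoryEntry and collectFiles are made-up helper names.

```javascript
// Hypothetical entry shapes; neither Directory proposal is assumed here.
function fileEntry(name) {
  return { kind: "file", name };
}

function directoryEntry(name, children) {
  return {
    kind: "directory",
    name,
    // Contents are only handed out when the page asks for them.
    getEntries: () => Promise.resolve(children),
  };
}

// Collects file paths, descending only into directories that shouldExplore accepts.
async function collectFiles(entries, shouldExplore, prefix = "") {
  const paths = [];
  for (const entry of entries) {
    const path = prefix + entry.name;
    if (entry.kind === "file") {
      paths.push(path);
    } else if (shouldExplore(path)) {
      const children = await entry.getEntries();
      paths.push(...(await collectFiles(children, shouldExplore, path + "/")));
    }
  }
  return paths;
}

// Top-level list as the user picked it: one file and one directory.
const picked = [
  fileEntry("notes.txt"),
  directoryEntry("pictures", [
    fileEntry("cat.jpg"),
    directoryEntry("raw", [fileEntry("cat.dng")]),
  ]),
];

// The user deselected "pictures/raw", so its contents are never enumerated.
collectFiles(picked, (path) => path !== "pictures/raw")
  .then((paths) => console.log(paths));
```

The top-level list stays cheap to display because nothing below a directory is touched until the page calls getEntries() on it.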
Comment 6 Ian 'Hixie' Hickson 2014-02-06 19:35:09 UTC
Sounds reasonable. Do we have a Directory object in an existing API I can reuse, or would that be new? Where would we be getting these Directory objects from, to add them? Is there a preferred order in which we'd want to enumerate them (directories first, alphabetical, user selection order), or should we just return them in whatever order the OS returns them in? (That would be most efficient, I guess.)
Comment 7 Jonas Sicking (Not reading bugmail) 2014-02-06 21:44:35 UTC
There are two competing proposals for Directory objects. One from Google [1] that has been around for a while, but which hasn't gathered implementation interest from browsers other than Chrome (other vendors have explicitly said they are not interested).

One from Mozilla [2] which is still not a full proposal and which also hasn't gotten much interest. The strongest statement of support is the one from Apple at [3]. This one isn't so much a draft spec yet, but it's being actively worked on.

We should definitely reuse one of those two. Which one is the tricky question. Depends on which will end up getting used for the sandboxed filesystem feature. Ultimately I think we're blocked here until a decision about that is made.

Note that in either case we should only expose read-only access. But that should be possible with either proposal.

[1] http://www.w3.org/TR/file-system-api/#the-directoryentry-interface
[2] http://w3c.github.io/filesystem-api/Overview.html#the-directory-interface
[3] http://lists.w3.org/Archives/Public/public-webapps/2014JanMar/0167.html

As for order, do you mean the order in which the Directory object exposes its contained files? If so, I'd defer to the relevant specification above. Most likely that means whatever order the OS returns the files in.

If you mean the order to return the list of files-and-toplevel-directories-that-the-user-has-picked, I think we should use the same order as we currently use for .files. I.e. UA-defined.
Comment 8 Ian 'Hixie' Hickson 2014-02-07 17:59:46 UTC
Wow, those are _not_ compatible. There goes my hope for using a common subset. :-)