This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 26957 - Allow sending DOM objects to Workers and expose a DOM (or DOM-like) interface to workers
Status: RESOLVED WONTFIX
Alias: None
Product: WebAppsWG
Classification: Unclassified
Component: Web Workers (editor: Ian Hickson)
Version: unspecified
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: Ian 'Hixie' Hickson
QA Contact: public-webapps-bugzilla
Reported: 2014-10-02 19:25 UTC by brunoais
Modified: 2014-10-09 16:02 UTC

Description brunoais 2014-10-02 19:25:18 UTC
I'd like to revive this bug and ask for a minor alternative that might be possible and OK to go for.

The idea is to allow passing document fragments to worker threads (document fragments only).

The main idea is that, sometimes, a large number of changes or a lot of analysis must be performed on the DOM.

Allowing a fragment to be sent to a worker thread would avoid the overhead of translating the DOM being sent into an analyzable format.
You may even require
http://www.w3.org/TR/DOM-Level-2-Core/core.html#Core-Document-importNode
or some work adapting the content to the worker. As long as that is done in the other thread, there's no issue on my end.


The main use case here is:
Extensive work needs to be done on a subtree of a page's DOM.
Here's a generic example use case.

1. The main thread splits the DOM tree into parts and then it sends those parts to the worker threads.
2. The worker threads do the DOM changes.
3. The worker threads send the changed DOM to the original thread.
4. The original thread integrates the changed DOM nodes from the workers into its own DOM.
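
The four steps above can be approximated today only with plain data, since DOM nodes cannot be structured-cloned. The following is a minimal sketch under that assumption: the `{ tag, attrs, children }` shape and the names `splitIntoParts`, `processPart` and `integrate` are hypothetical, no Worker is actually spawned, and `structuredClone()` stands in for the copy that `postMessage()` would make.

```javascript
// Hypothetical plain-data stand-in for a DOM subtree: { tag, attrs, children }.
// Real DOM nodes cannot be structured-cloned, so a page would first have to
// serialize them into a shape like this.
const page = {
  tag: "body",
  attrs: {},
  children: [
    { tag: "section", attrs: { id: "a" }, children: [] },
    { tag: "section", attrs: { id: "b" }, children: [] },
  ],
};

// Step 1: split the tree into parts and copy each one the way postMessage()
// would (structuredClone uses the same structured clone algorithm).
function splitIntoParts(root) {
  return root.children.map((child) => structuredClone(child));
}

// Step 2: stand-in for the work a worker would do on its private copy.
function processPart(part) {
  part.attrs.processed = "true";
  return part;
}

// Steps 3 and 4: copy the results back and integrate them into a new tree.
function integrate(root, parts) {
  return { ...root, children: parts.map((part) => structuredClone(part)) };
}

const updated = integrate(page, splitIntoParts(page).map(processPart));
```

Because every part is copied on the way out and on the way back, the original `page` object is left untouched, which matches the "no memory sharing" requirement.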


The requirements for my use-case are:

 --- Worker having a nearly complete document DOM --- 
This document is not connected to any interface. In other words, it acts the same way as a documentFragment towards everything (actually, it can even be an actual documentFragment, if possible).
Any changes to it only changes that DOM tree inside the worker thread.

 --- You can pass DOM objects ---
A passed DOM object is read-only on the destination until
http://www.w3.org/TR/DOM-Level-2-Core/core.html#Core-Document-importNode
(or equivalent) is called. This one may be harder to do because there's no synchronization and implementing a copy-on-write may just be too much.
OR
A DOM object is passed by cloning it into the document that resides in the worker thread. The result of that clone is equivalent to passing it through the importNode() method. This one should be the easiest to implement. It may be slower, but then it should only affect performance for the first send (considering that there are multiple workers).

There can be optimizations here, such as for documentFragment. When transferring a documentFragment, it could behave just like a Transferable such as ArrayBuffer (once transferred, it cannot be accessed from the source thread). As far as I know, there's no way for a change in a documentFragment to be reflected in the interface's document and vice versa, but there can be internal code requirements that make this optimization impossible.
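
For reference, this is how the existing Transferable behaviour of ArrayBuffer looks. It is a small sketch, assuming an environment that provides the global `structuredClone()` (current browsers, Node.js 17 or later); the variable names are illustrative. Nothing comparable exists for documentFragment.

```javascript
const source = new ArrayBuffer(16);
new Uint8Array(source)[0] = 42;

// Transfer instead of copy: ownership moves to the returned buffer.
const moved = structuredClone(source, { transfer: [source] });

// The source buffer is detached after the transfer and reports length 0.
const sourceLengthAfter = source.byteLength;

// The data is now reachable only through the new owner.
const movedFirstByte = new Uint8Array(moved)[0];
```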


If this is enabled, then when the parsing work on the JS side is more complicated, it can be sent to the worker thread to deal with the specifics and still have tree-like operations with Nodes that look like the interface's DOM.


Just to keep it clear:
-> This is still not memory sharing, just memory transfer or memory ownership transfer.
-> All changes inside the worker do not reflect on any other thread, i.e. changes on a specific thread's DOM documents do not reflect on any other thread's DOM documents.
Comment 1 Boris Zbarsky 2014-10-02 19:37:03 UTC
The issue here is whether implementors are willing to do this, no?

At least in Gecko, making it possible to use DOM nodes from multiple threads (yes, distinct DOM nodes) would be a pretty major undertaking.  Starting with the fact that we intern things like localnames and namespaces and the interning is not thread-safe, for example.

Making this work is probably a 20+ person-year sort of project which will in the process degrade the performance of DOM on the main thread, I expect.  Avoiding the performance degradation would make it a 50+ person-year sort of project and introduce a lot more complexity and security concerns...
Comment 2 brunoais 2014-10-02 20:30:21 UTC
I don't get it. 
Why would DOM methods need to be thread-safe to implement what I'm suggesting? Am I missing something?

> At least in Gecko, making it possible to use DOM nodes from multiple threads (yes, distinct DOM nodes) would be a pretty major undertaking.  Starting with the fact that we intern things like localnames and namespaces and the interning is not thread-safe, for example.

What does that have to do with my suggestion? The only problem you are stating is that DOM methods applied to a DOM node are not thread-safe, if I understood that right. I really don't see why that is an impediment to my request as there's no memory sharing.
Comment 3 Boris Zbarsky 2014-10-02 21:01:43 UTC
I am saying that the implementations of DOM methods use shared state that is main-thread only (like string interning tables, for a start).

> I really don't see why that is an impediment to my request as
> there's no memory sharing.

My point is there is assumed memory sharing in the DOM implementation itself.  And that getting rid of that assumed memory sharing is not a small undertaking.
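
To illustrate what "assumed memory sharing" through an interning table can look like, here is a deliberately simplified sketch. It is not Gecko's code: `internTable`, `intern` and `createElementIn` are hypothetical names, and since JavaScript runs this on a single thread, the example only shows the shared state, not the race a second thread would cause.

```javascript
// Hypothetical process-wide interning table: one entry per distinct name.
const internTable = new Map();

// Return the single shared "atom" object for a given string.
function intern(name) {
  let atom = internTable.get(name);
  if (atom === undefined) {
    atom = Object.freeze({ name });
    internTable.set(name, atom);
  }
  return atom;
}

// Elements store the interned atom rather than their own copy of the string.
function createElementIn(doc, localName) {
  return { ownerDocument: doc, localName: intern(localName) };
}

// Two unrelated documents with two distinct element objects...
const firstDoc = { title: "first" };
const secondDoc = { title: "second" };
const firstDiv = createElementIn(firstDoc, "div");
const secondDiv = createElementIn(secondDoc, "div");

// ...still point at the same entry of the shared table.
const sharesAtom = firstDiv.localName === secondDiv.localName;
```

Even though the two elements share no document, creating either one reads and possibly writes `internTable`, which is the kind of hidden shared state described above.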
Comment 4 Ms2ger 2014-10-03 06:44:59 UTC
Closing for all the same reasons as before.
Comment 5 brunoais 2014-10-08 16:30:57 UTC
There are DOM implementations nearly everywhere.

Take this one, for example:
https://github.com/tmpvar/jsdom

It is a DOM implementation in JavaScript. The problem is that it tampers with C and C++ files so it cannot be used out-of-the-box.

Are you sure that there's no way of having two instances of things that offer the same DOM interface but share no data (from the interface standpoint)?
Comment 6 Domenic Denicola 2014-10-08 16:41:35 UTC
> The problem is that it tampers with C and C++ files so it cannot be used out-of-the-box.

Incorrect. jsdom can be used in browsers.
Comment 7 brunoais 2014-10-08 17:05:58 UTC
Really?
Thanks for the answer.
Then I may have a way to have it. I just wonder how much slower it is compared to the browsers' implementations...

I still wonder how inviable this is...
Comment 8 Boris Zbarsky 2014-10-08 19:33:11 UTC
Are you OK with the DOM in workers not exposing all the same APIs?  Or do you expect UAs to implement every DOM API twice?

I mean, this is all technically feasible; it's "just" a matter of resources.
Comment 9 brunoais 2014-10-08 22:06:22 UTC
Yes. It doesn't need to be the whole API. The more the better, but if there are technical issues related to shared memory, I cannot know how much of it can be done.

I wonder if it is the rendering engine that processes the DOM or if there's a layer that does that work and the rendering engine just takes the internal DOM representation and then translates it into a visual representation of that content.

If it is the latter, then the rendering engine is definitely not required for what I describe in this bug.

A large subset of the DOM API (beyond what is already specified to be implemented inside workers) falls within what I'm asking for, but a lot of it is not required.

E.g.:
- LocalStorage -> Not required.
- IndexedDB -> Not required (although if implemented it would be quite awesome).
- Mutation observers -> Not required (completely useless in a worker also)
- Events (and all related, including addEventListener) -> Not required (completely useless)
- Deprecated DOM interfaces (such as the document.all and related) -> Not required.
- Most window's methods and properties such as:
-- window.name -> Makes no sense
-- window.postMessage (use the main thread instead)
- Many of Document's methods and properties -> Not really useful. Such as:
-- document.getSelection() (and others related to selections) -> Makes no sense.
-- document.hasFocus (use the main thread instead)
-- document.open + document.close (use the main thread instead)

- Any way to communicate with or send information to other threads, beyond the already defined events system for communicating with the thread that started the worker's execution -> Not required.



DOM searching, however, would be a great addition, such as:
.querySelector() + .querySelectorAll()
.getElement(s)By*()
.matches()
.children
NodeList
document.createElement()
.classList
.style (only changes the HTML. Has no action in the interface)
.setAttribute() + .getAttribute()
etc...


Does it look more realistic that way? Is there a need to remove more DOM functionality, or am I already removing everything that (probably) uses shared memory?
Comment 10 Boris Zbarsky 2014-10-09 00:21:18 UTC
At least in Gecko, querySelector, nodelists, .style can definitely use shared memory right now.
Comment 11 Boris Zbarsky 2014-10-09 00:21:55 UTC
And at one point so did .nextSibling/.previousSibling, though I think we changed that at some point...
Comment 12 brunoais 2014-10-09 06:43:22 UTC
> At least in Gecko, querySelector, nodelists, .style can definitely use shared memory right now.

Would that be shared memory in:
querySelector -> Called on the same document.
NodeList -> Called on the same parent Node.
.style -> Called on the same Node.

Because, otherwise, I don't really see a reason to share memory for these methods/Objects.
Comment 13 Boris Zbarsky 2014-10-09 14:59:01 UTC
> querySelector -> Called on the same document.

Unclear from a quick glance.

> NodeList -> Called on the same parent Node.

No.

> .style -> Called on the same Node.

No.

> Because, otherwise, I don't really see a reason to share memory for these
> methods/Objects.

You're welcome to read the implementation.  The source is open.

But as a quick hint, getElementsByTagName caches values so we can keep returning the same object every time, and it does so in a global hashtable (which has better memory usage characteristics than using multiple per-document hashtables).
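
A rough sketch of the caching pattern described in that hint, for readers who want something concrete before opening the source. This is not Gecko's implementation: `globalListCache`, `idFor`, `collect` and `getElementsByTagNameCached` are hypothetical names, the tree is a plain `{ tag, children }` structure rather than real DOM, and the cached lists are static snapshots rather than live collections.

```javascript
// Hypothetical illustration only; this is not Gecko's code.
// One cache shared by every document, keyed by root identity plus tag name.
const globalListCache = new Map();
const rootIds = new WeakMap();
let nextRootId = 1;

// Give each root object a stable numeric id so it can be part of a string key.
function idFor(root) {
  if (!rootIds.has(root)) {
    rootIds.set(root, nextRootId++);
  }
  return rootIds.get(root);
}

// Collect matching descendants of a plain { tag, children } tree.
function collect(node, tagName, out = []) {
  for (const child of node.children) {
    if (child.tag === tagName) out.push(child);
    collect(child, tagName, out);
  }
  return out;
}

// Return the same list object on every call for the same (root, tagName).
function getElementsByTagNameCached(root, tagName) {
  const key = `${idFor(root)}:${tagName}`;
  let list = globalListCache.get(key);
  if (list === undefined) {
    list = collect(root, tagName);
    globalListCache.set(key, list);
  }
  return list;
}

const treeA = { tag: "html", children: [{ tag: "p", children: [] }] };
const treeB = {
  tag: "html",
  children: [
    { tag: "p", children: [] },
    { tag: "p", children: [] },
  ],
};
const firstCall = getElementsByTagNameCached(treeA, "p");
const secondCall = getElementsByTagNameCached(treeA, "p");
const otherTree = getElementsByTagNameCached(treeB, "p");
```

Both trees end up with entries in the single `globalListCache`, so a lookup on one tree touches a structure that lookups on every other tree also use.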
Comment 14 brunoais 2014-10-09 15:25:25 UTC
Thanks.

I'm downloading Firefox's source code to get a good idea of it.

Is there any index I can use so that I don't take too long to find each set of functions for each of those DOM Interface objects and methods? That would save me a lot of time finding those implementations.
Comment 15 Boris Zbarsky 2014-10-09 15:41:41 UTC
http://dxr.mozilla.org/mozilla-central/source/ or http://mxr.mozilla.org/mozilla-central/ (they differ a bit in terms of what they can find, since the latter is based on text indexing and the former based on what's actually compiled).
Comment 16 brunoais 2014-10-09 16:02:46 UTC
Thank you!