29039 – Clarifications and extensions to worker semantics

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Status:	RESOLVED MOVED

Alias:	None

Product:	WHATWG
Classification:	Unclassified
Component:	HTML
Version:	unspecified
Hardware:	PC All

Importance:	P2 normal
Target Milestone:	Unsorted
Assignee:	Domenic Denicola
QA Contact:	contributor

Reported:	2015-08-06 16:11 UTC by Lars T Hansen
Modified:	2016-06-23 13:01 UTC
CC List:	5 users

Description Lars T Hansen 2015-08-06 16:11:33 UTC

The background for this report is that the worker semantics are a little too hazy to allow workers to be used reliably.  I could split this up but I'm just posting it all here to at least make it visible.

1) Worker startup and forward progress

The spec /probably/ already requires a worker to start immediately (this is the "run a worker" algorithm) once it is created (step 1 seems to require forking off an actually concurrent thread) but it would be helpful if this were stated explicitly, as it is observable.

In Firefox, a worker is not started until the creating agent returns to its event loop, or, if the worker budget for the domain is exhausted, not started at all until some other worker in the same domain dies.  I would like those behaviors to be clear errors, as they seriously impact programmability.  But if they are going to be allowed then the spec needs to note this, and there need to be ways to discover what's going on.

I've had a report that Edge, too, does not start the worker until the script returns to its event loop but I've not verified that.

The fact that a worker did not start can be observed by the creating agent or the user or a remote server in various ways, depending on what APIs we assume are available to the worker.  If the main thread forks off a worker to do an xhr to a remote server then the server will observe that the request never arrives.  If the worker posts a message in the console then the user will observe that the message never appears (and if workers get access to WebGL this kind of thing will become obvious).  If (in the context of the ongoing shared-memory work) an agent forks off a worker and then waits on a location in shared memory with the expectation that the worker wakes it up, then the creating agent will never wake up, because the worker will not run.  (That can constitute "observation" if the wait has a timeout; otherwise it's a deadlock.)

Specifically, therefore,

- A worker must be a true concurrent agent with a forward progress guarantee (more on that below), and 'new Worker()' is the only action needed to create the worker; the implementation may not require any additional action (returning to the event loop, staying within budget).

- If a worker cannot be backed by a concurrent thread for resource exhaustion reasons, the result needs to be an exception, not a quiet failure that entails a possible deadlock.  Or, minimally, the possibility of failure must be noted in the spec and there must be some way of detecting it; see below on reading a worker's state.
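
For illustration only, here is a minimal sketch of the shared-memory scenario described above, where the creating agent waits on a shared location with a timeout and so observes that the worker never ran.  It uses Atomics.wait as later standardized in ECMAScript; this report predates that API, so the call names are an assumption relative to the proposal under discussion.  The helper name waitForWorkerStart and the one-word flag layout are hypothetical.  Browsers typically do not permit Atomics.wait on the main thread, so there the creating agent would itself have to be a worker.

```javascript
// Hypothetical helper: block until the worker signals startup or the timeout elapses.
// `flag` is an Int32Array over a SharedArrayBuffer; index 0 holds 0 until the
// worker stores 1 there and notifies.
function waitForWorkerStart(flag, timeoutMs) {
  // Atomics.wait returns "ok" (woken by a notify), "not-equal" (the value was
  // already changed before we started waiting), or "timed-out".
  const result = Atomics.wait(flag, 0, 0, timeoutMs);
  return result === "timed-out" ? "not-started" : "started";
}

// Creating agent: allocate the shared flag that would be posted to the worker.
const flag = new Int32Array(new SharedArrayBuffer(4));

// Worker side (not shown running here), assumed to execute as its first action:
//   Atomics.store(flag, 0, 1);
//   Atomics.notify(flag, 0);
```

If the implementation defers worker startup until the creator returns to its event loop, a call such as waitForWorkerStart(flag, 1000) made right after 'new Worker()' would time out; without the timeout it would deadlock.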

2) Curtail the browser's license to kill

The "kill a worker" algorithm is also interesting.  It is stated that it may be run at any time, and the rest of the spec seems to take that into account properly.  Thus a correct implementation of workers can pick a random worker in a random domain every second and just gun it down for no good reason at all.  Again, this seriously impacts the utility of workers and is probably wider license than the browser needs.

I imagine the need for /some/ such wording comes from the desire to kill runaway workers without a slow-script dialog and also from the need to close tabs and throw things out of the history; I expect the current wording was simply expedient.

In the shared-memory setting it is more likely than not that workers will communicate synchronously through shared memory, and that it will be very difficult to determine if a worker is 'runaway'.  I don't know what to do about that, exactly, but could we at least enumerate in the spec the situations under which the "kill a worker" algorithm can be run, so that we can discuss specific cases?

3) Expose the worker state

There does not seem to be any easy way of detecting that a worker is no longer operating (it may, for example, have been killed by the browser without any algorithmic action on the part of either the worker or the worker's creator).  It's possible I've missed something, but it seems to me that this should be discoverable.  One mechanism might be a property "state" on the Worker object that can be read to discover the current state of the worker; another mechanism might be an exception delivered to the sending agent if the receiver is not capable of receiving; perhaps, if a worker is killed, an error event should be delivered up the creation chain.

Note that this does not expose GC per se, since the Worker object is still in hand to be queried.  It might expose some runtime decisions about reaping service workers; I'm not sure.  Frankly I'm mostly interested in plain workers, since they will be used for computations.
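
To make the "state" idea concrete, here is a rough sketch of how far user code can approximate it today.  The class name TrackedWorker, the state names "running" and "terminated", and the sawError flag are all hypothetical; nothing here is a proposed API.  The wrapper only assumes that the wrapped object has addEventListener and terminate, as Worker does.

```javascript
// Hypothetical wrapper approximating a readable worker "state" in user code.
class TrackedWorker {
  constructor(worker) {
    this.worker = worker;
    // Assumed state names; the report does not define any.
    this.state = "running";
    // An "error" event reports an uncaught exception in the worker. It does not
    // by itself mean the worker has stopped, so it is tracked separately.
    this.sawError = false;
    worker.addEventListener("error", () => {
      this.sawError = true;
    });
  }

  // Termination requested by the creator is the only transition we can see.
  terminate() {
    this.worker.terminate();
    this.state = "terminated";
  }
}
```

Note the gap: if the browser runs "kill a worker" on its own, this wrapper still reports "running", because no event reaches the creator.  That is the case a spec-level mechanism would have to cover.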

More background:

Some of these observations were made as we were working on the shared memory spec (https://github.com/lars-t-hansen/ecmascript_sharedmem, see issues 2, 3, and 5), where workers are our unit of concurrency, and some have come up in the context of WebAssembly, where they need to figure out what their unit of concurrency is (notably https://github.com/WebAssembly/design/issues/104).

The "guarantee of progress" notion can be formalized in various ways.  There's a nice attempt for C++ (C++ Light-Weight Execution Agents WP / Progress guarantees) that is possibly more than the Web needs, but forward progress for truly concurrent agents is formalized there in terms of observable steps.

I suspect that in the context of the web and ECMAScript, forward progress for jobs might be formalized in ES2016 and workers could adopt ES jobs as the means of running code.

Comment 1 Lars T Hansen 2015-08-06 19:15:31 UTC

The link to the referenced C++ working paper dropped out during cutting and pasting.  This is it: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4439.pdf.

Comment 2 Lars T Hansen 2016-03-09 18:07:50 UTC

This is probably a related bug, especially the mail thread referenced in its second comment:  https://www.w3.org/Bugs/Public/show_bug.cgi?id=28813#c2

Comment 3 Anne 2016-06-23 13:01:09 UTC

This is now superseded by these issues:

* https://github.com/whatwg/html/issues/851
* https://github.com/whatwg/html/issues/1004