10834 – Garbage collection is the wrong level of abstraction

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Status:	RESOLVED WONTFIX

Alias:	None

Product:	HTML WG
Classification:	Unclassified
Component:	pre-LC1 HTML5 spec (editor: Ian Hickson)
Version:	unspecified
Hardware:	All All

Importance:	P2 normal
Target Milestone:	---
Assignee:	Ian 'Hixie' Hickson
QA Contact:	HTML WG Bugzilla archive list

Reported:	2010-09-29 16:39 UTC by Philip Taylor
Modified:	2010-10-04 14:32 UTC
CC List:	6 users

Description Philip Taylor 2010-09-29 16:39:24 UTC

The spec talks about garbage collection with terms like "strong reference" and "element to which no references exist" and "may the element be garbage collected" and "This object must never be garbage collected" etc.

My understanding of the theoretical meaning of 'garbage' is that something is garbage if the future operation of the program will not be observably affected by whether that thing is destroyed or not (ignoring the observable effect of running out of memory). That seems to match what http://en.wikipedia.org/wiki/Garbage_(computer_science) talks about in the introduction (then calls "semantic garbage"), and is equivalent to http://blogs.msdn.com/b/oldnewthing/archive/2010/08/09/10047586.aspx (garbage collection is simulating infinite memory - the mechanism is a separate concern).

The concept of references is only relevant with a specific class of algorithms that approximate garbage collection by counting or tracing references. The spec seems to assume some algorithm like this, without actually saying so (as far as I can see), which makes the mentions of GC confusing and weird.

In particular, the observable behaviour is unclear. If the behaviour of the browser's GC implementation has observable effects, then that's theoretically wrong (it must have cleaned up something that wasn't truly garbage) and likely very confusing for authors (since many GC implementations are non-deterministic) and an interoperability danger. If there's no observable effect, then the discussion doesn't belong in normative sections of the spec (it's just an implementation quality issue, like any other aspect of performance or resource limits).

If the GC behaviour should be observable, I think the spec ought to add notes pointing out how it can be observed, so that it's clear the behaviour is worth specifying normatively and so that authors know what to look out for. In that case the spec should also define what basic model of GC it's using when defining the behaviour, so that it makes sense.

If the GC behaviour should not be observable, I think the spec shouldn't say anything normative about it, but it could have non-normative notes to implementers to remind them that they mustn't collect the object yet because that collection could be observed (and mention how).

In both cases, the important thing is the observable behaviour that users can experience, not the mechanics of garbage collection.

Comment 1 Ian 'Hixie' Hickson 2010-09-29 18:59:49 UTC

This stuff is only specified because it is observable. For example, if you create a document using createDocument, then create an Element using createElement on that document, then drop all references to everything except the element, element.ownerDocument has to remain non-null, and the only reason for that as far as I can tell is because the spec says that there's a strong reference there.
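
A minimal sketch of the scenario described above, using the standard DOM calls createDocument, createElement and ownerDocument. The helper name ownerDocumentSurvives and the "root" and "item" names are assumptions made for illustration; they do not come from the spec or from Acid3.

```javascript
// Sketch only: ownerDocumentSurvives is a hypothetical helper name.
function ownerDocumentSurvives(domImplementation) {
  var doc = domImplementation.createDocument(null, "root", null);
  var el = doc.createElement("item");
  doc = null; // the script keeps only the element from here on

  // However much later this runs, and whatever the collector has done in the
  // meantime, the element must still report the same document.
  var owner = el.ownerDocument;
  return owner !== null && owner.documentElement.nodeName === "root";
}

// Only run directly where a DOM is available (for example in a browser).
if (typeof document !== "undefined") {
  console.log(ownerDocumentSurvives(document.implementation));
}
```

An implementation in which this check could ever return false would be exposing its collector to script.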

What kinds of garbage collection algorithms are based on anything other than whether anything references an object? I mean there's lots of ways of doing that, sure, but don't they all boil down to inferring state from the current set of references?

Comment 2 Philip Taylor 2010-09-29 20:35:55 UTC

Unless I'm significantly misunderstanding things:

The fact that you can refer to the document through ownerDocument means there's already necessarily a reference - otherwise you couldn't refer to it. The implementation can't magically grab the object out of thin air when you access ownerDocument, it has to already have a chain of references to reach it.

There's no reason for it to ever automatically turn into null, unless it was specifically implemented with a weak reference that intentionally turns into null at some possibly non-deterministic time when there's currently no strong references, and it shouldn't be implemented like that unless it's specified like that, and I don't think it's specified like that anywhere.
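
To make the distinction concrete, here is a hedged sketch contrasting the two kinds of link. It uses WeakRef, which was added to JavaScript long after this discussion, and plain objects as stand-ins for DOM nodes; StrongElement and WeakElement are hypothetical names, not anything a browser exposes.

```javascript
// Strong link: the element itself keeps its document reachable.
function StrongElement(doc) {
  this.ownerDocument = doc;
}

// Weak link (hypothetical, not what the spec describes): the element does not
// keep its document alive, so the getter may start returning null at some
// collector-dependent time after the last strong reference is dropped.
function WeakElement(doc) {
  var ref = new WeakRef(doc);
  Object.defineProperty(this, "ownerDocument", {
    get: function () {
      var target = ref.deref(); // undefined once the target has been collected
      return target === undefined ? null : target;
    }
  });
}

var docA = { name: "docA" };
var docB = { name: "docB" };
var strongEl = new StrongElement(docA);
var weakEl = new WeakElement(docB);

docA = null; // still reachable through strongEl.ownerDocument
console.log(strongEl.ownerDocument.name);

// While docB is still held by the script, the weak getter returns it too.
console.log(weakEl.ownerDocument === docB);
// After `docB = null`, weakEl.ownerDocument could become null at any time.
```

Only the second variant makes collection observable, and only because it was written to do so.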

What matters to the spec is simply that it is possible to access the document via ownerDocument, which means the document is not garbage (by the theoretical definition of garbage). It's a low-level implementation detail that the this-is-not-garbage status will be determined by the presence of references, and there's no reason for the spec to care about that specific implementation detail and not the billion other implementation details in a browser.

Having the spec mention half a dozen places where there are strong references is also pointless when e.g. the entire ES5 spec seemingly doesn't mention garbage collection even once, so nothing explicitly defines that if you write "var d = e.ownerDocument;" you've got a strong reference at that point. It would only be meaningful if the reference graph (and root set) of the entire web platform was specified, otherwise there's just a few specified references dangling in a giant ocean of undefinedness.

(Hypothetically you could have a null garbage collector that does nothing, or a future-predicting oracle (perhaps based on a log replay) that collects objects precisely after the last time you read them (even if there's still some references). Less hypothetically, you could have a fancy JITted browser that statically analyses the code to realise a script has an Element variable but only ever accesses tagName so it's free to collect all the other properties immediately, even though it looks like the script still has a reference to the Element.)
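
For contrast with the hypothetical collectors above, the conventional approximation (trace a reference graph from a root set) can be sketched as follows. The node names and the edges table are invented for illustration and do not describe any real browser's object graph.

```javascript
// Returns the set of nodes reachable from the roots by following edges.
// Anything outside this set is what a tracing collector treats as garbage.
function reachable(roots, edges) {
  var seen = new Set();
  var stack = roots.slice();
  while (stack.length > 0) {
    var node = stack.pop();
    if (seen.has(node)) {
      continue; // already visited, avoids looping on cycles
    }
    seen.add(node);
    var out = edges[node] || [];
    for (var i = 0; i < out.length; i++) {
      stack.push(out[i]);
    }
  }
  return seen;
}

// Hypothetical graph: a script variable holds an element, and the element
// holds its document through ownerDocument.
var edges = {
  script: ["el"],
  el: ["doc"],
  doc: [],
  orphan: ["doc"]
};
var live = reachable(["script"], edges);
console.log(live.has("doc"), live.has("orphan"));
```

The result depends entirely on which edges and roots are declared, which is why a handful of specified references means little unless the rest of the graph is specified too.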

Comment 3 Jonas Sicking (Not reading bugmail) 2010-09-29 21:39:51 UTC

I largely agree with Philip here.

GC should generally never have an observable effect. Consider the example of an element with a specific owner document (X) and then dropping all other references to that owner document. If GC caused the ownerDocument value to change from X to null, then GC would have observable side effects.

It should generally be ok to simply state that GC should have no observable side effects and that the ownerDocument property refers to the owner document.

For HTML5 I can only think of one exception to this rule. If there are cases when we are concerned that implementations will miss the fact that GCing certain objects will have side effects, then it makes sense to have an informative note pointing this out.

For example, consider the following code:

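var img = document.createElement('img');
img.src = "/imgs/mypic.jpg";
// If the element were collected mid-load, this handler would never run.
img.onload = function() {
  alert('hello world');
};
// Drop the script's only reference while the load is still in progress.
img = null;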

In this case it might not be obvious that the implementation must not GC the img element while it is still loading. GCing the element would have the side effect that the load handler never executes. Thus simply defining "GC must not have observable side effects" is technically enough to ensure that the img is not GCed. However, since this is easy for implementations to miss, leading to bugs, it could be worth pointing out in the spec.
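
One way an implementation could satisfy this is to hold its own reference to every element with a load in flight. The sketch below models that idea outside a browser; FakeImage, pendingLoads, startLoad and completeAllLoads are all hypothetical names, and real engines may do this quite differently.

```javascript
// Implementation-side list of elements with a load in progress. While an
// element is in this list it is reachable, whatever the script has dropped.
var pendingLoads = [];

function FakeImage() {
  this.src = "";
  this.onload = null;
}

function startLoad(img, url) {
  img.src = url;
  pendingLoads.push(img);
}

// Simulates the network finishing: release each element and fire its handler.
function completeAllLoads() {
  var done = pendingLoads.splice(0, pendingLoads.length);
  for (var i = 0; i < done.length; i++) {
    if (typeof done[i].onload === "function") {
      done[i].onload();
    }
  }
}

var fired = [];
var img = new FakeImage();
startLoad(img, "/imgs/mypic.jpg");
img.onload = function () {
  fired.push("hello world");
};
img = null; // the script's reference is gone, the pending list's is not

completeAllLoads();
console.log(fired);
```

Once the load completes and the list entry is released, nothing observable depends on the element any more, so collecting it is safe.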

Comment 4 Ian 'Hixie' Hickson 2010-09-30 07:36:25 UTC

Every single time anything to do with GC is mentioned in the spec, it's because of cases like that. Acid3 for example has a subtest that actually tests the ownerDocument case I mentioned (or maybe it was parentNode, something equivalent though), and there were browsers that failed it, and while I could make the argument in comment 2 as a way to justify the test, having something explicitly there makes life a heck of a lot less difficult when you're trying to convince an implementor to change their code.

EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Rejected
Change Description: no spec change
Rationale: If you have a concrete example of something that the text in the spec prevents you from implementing, which should be implementable, then let me know, and I'll rephrase the text -- but I disagree on the overall principle that we shouldn't be talking about GC because it theoretically can't have any effect. It can, and does, and is a source of bugs, and we should be as explicit about such things as we can when striving for interoperability.

Comment 5 Jonas Sicking (Not reading bugmail) 2010-09-30 08:09:02 UTC

Note that while Gecko is failing the Acid3 test in question, it has nothing to do with the problem mentioned in comment 3. It is purely due to technical limitations in how we used to do memory management, and the fact that it takes a while to migrate to a new memory management model (we used to use pure refcounting, now we use refcounting plus cycle collection).
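
As general background on why pure refcounting needs help, the sketch below shows two objects that reference each other and nothing else. It is a toy model with invented names (incomingCounts, traceFrom, parent, child) and is not a description of Gecko's actual cycle collector.

```javascript
// Counts how many references point at each node (roots count as references).
function incomingCounts(edges, roots) {
  var counts = {};
  Object.keys(edges).forEach(function (n) { counts[n] = 0; });
  roots.forEach(function (r) { counts[r] += 1; });
  Object.keys(edges).forEach(function (from) {
    edges[from].forEach(function (to) { counts[to] += 1; });
  });
  return counts;
}

// Collects every node reachable from the roots.
function traceFrom(roots, edges) {
  var seen = new Set();
  var stack = roots.slice();
  while (stack.length > 0) {
    var node = stack.pop();
    if (!seen.has(node)) {
      seen.add(node);
      edges[node].forEach(function (to) { stack.push(to); });
    }
  }
  return seen;
}

// A parent and child that point at each other, with no outside references.
var cycleEdges = { parent: ["child"], child: ["parent"] };
var cycleRoots = [];

var counts = incomingCounts(cycleEdges, cycleRoots);
var liveNodes = traceFrom(cycleRoots, cycleEdges);

// Nodes whose count never reaches zero even though nothing can reach them.
var leaked = Object.keys(cycleEdges).filter(function (n) {
  return counts[n] > 0 && !liveNodes.has(n);
});
console.log(leaked);
```

Pure refcounting would keep both objects forever because each count stays at one; finding such unreachable groups is the job a cycle collector adds.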

I'll have to check if it's correct that all mentions of GC are really informative and that the only normative definition is the statement that "GC must never have side effects". I was under the impression that that wasn't the case, but I'll double-check.