Re: Re-entrant invocation of the tree builder

On Jun 17, 2008, at 23:58, Ian Hickson wrote:

>> Aside: I find the concept of "insertion point" in a stream to be harder
>> to track than a concept of a stack of pending streams where each
>> document.write() pushes a new stream onto the pending stack.
>
> I don't know if a stack can be equivalent to the insertion point concept.

Right. A stack is insufficient.

> It depends whether you keep track of how much you have tokenised for each
> item in your stack, and whether you can append to an item on the stack.
>
> Consider:
>
>   <script>
>     document.write("a<script src=b><\/script>c");
>     document.write("d");
>   </script>...
>
> When the inline script is about to be done executing, the input stream
> looks like:
>
>                                   v  v
>   ...ript>a<script src=b></script> cd ...
>                                   ^  ^
>                                   T  I
>
> ...where T is the tokeniser's position ("c" is the "next input character")
> and I is the insertion point. However as soon as it is done executing the
> UA will pause for 'b', and if b does a document.write() it'll go where "T"
> is, not where "I" is.


OK. Would the following work?

There's a queue of UTF-16 buffers and keyed placeholders. That is,  
there's one queue that contains an interleaving of objects that are  
UTF-16 buffers or objects holding a magic key value.

The buffers have a start position that the tokenizer advances. A  
buffer can be partially consumed, have its start position advanced  
accordingly and be left in the queue for further consumption later.
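
To make the shape of the queue concrete, here is a rough Python sketch.
It is only an illustration under my own assumptions: Python strings
stand in for UTF-16 buffers, and the names Buffer, KeyHolder, start and
advance are made up for this example.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Buffer:
    """A chunk of text plus the position the tokenizer has reached in it."""
    data: str       # stand-in for a UTF-16 buffer (assumption)
    start: int = 0  # index of the next unconsumed character

    def remaining(self) -> str:
        return self.data[self.start:]

    def is_empty(self) -> bool:
        return self.start >= len(self.data)

    def advance(self, count: int) -> str:
        """Consume up to `count` characters; the rest stays in place."""
        end = min(self.start + count, len(self.data))
        consumed = self.data[self.start:end]
        self.start = end
        return consumed


@dataclass(frozen=True)
class KeyHolder:
    """A placeholder object that carries only a magic key value."""
    key: int


# One queue that interleaves buffers and key holders.
queue = deque([Buffer("a<script src=b>"), KeyHolder(key=1), Buffer("cd")])

# Partially consume the front buffer. It stays in the queue with its
# start position advanced, ready for further consumption later.
first = queue[0]
taken = first.advance(1)
```

Nothing is copied or shifted when a buffer is partially consumed; only
the start index moves.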

The normal tokenization process consumes data from the front of the  
queue. When a buffer is empty, it is dequeued and the next buffer is  
consumed. Objects holding magic key values count as empty buffers for  
the purpose of dequeuing.

Exception: There's always at least one buffer object in the queue and  
the last buffer is never dequeued. Instead, it is left in the queue  
when it is empty.
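
The dequeuing rule and its exception could look roughly like this in
Python. This is a sketch, not a real tokenizer: next_char is a
hypothetical helper name, Python strings stand in for UTF-16 buffers,
and returning None for "no input available yet" is my own convention.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Buffer:
    data: str       # stand-in for a UTF-16 buffer (assumption)
    start: int = 0  # index of the next unconsumed character

    def is_empty(self) -> bool:
        return self.start >= len(self.data)


@dataclass(frozen=True)
class KeyHolder:
    key: int


def next_char(queue):
    """Return the next input character, or None if none is available yet."""
    while True:
        front = queue[0]
        if isinstance(front, Buffer) and not front.is_empty():
            ch = front.data[front.start]
            front.start += 1
            return ch
        # The front is an empty buffer or a key holder (which counts as
        # an empty buffer for the purpose of dequeuing).
        if len(queue) == 1:
            # Exception: the last buffer is never dequeued, even when empty.
            return None
        queue.popleft()


queue = deque([Buffer("a"), KeyHolder(7), Buffer(""), Buffer("b")])
chars = [next_char(queue) for _ in range(3)]
```

After the three calls, only the final (now empty) buffer is left in the
queue, which is where the network stream would add more data.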

The network stream always adds data to the last buffer or appends a  
new buffer to the queue.

Each document.write call to the parser comes with a magic key value.
The magic key is guaranteed to be the same for all document.write
calls from a given script and different for different scripts within
a document.

On document.write, if there is a pending external script, the queue is
searched for a magic key holder with the same key value as the
document.write call. If there is such an object in the queue, the text
of the document.write call is inserted as a UTF-16 buffer into the
queue immediately before the key holder object. If there's no such
object in the queue, a key holder with the key for this document.write
call is inserted at the front of the queue and then the text is
inserted as a UTF-16 buffer in front of that key holder.
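
A rough Python sketch of that insertion rule, replayed against the
example quoted above. The names queue_pending_write and snapshot, the
integer keys 1 and 2, and the "REST" placeholder for network data are
all made up for illustration. The sketch assumes the tail "c" of the
first write is queued through the same rule.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Buffer:
    data: str       # stand-in for a UTF-16 buffer (assumption)
    start: int = 0


@dataclass(frozen=True)
class KeyHolder:
    key: int


def queue_pending_write(queue, key, text):
    """Queue document.write text while an external script is pending."""
    for index, item in enumerate(queue):
        if isinstance(item, KeyHolder) and item.key == key:
            break  # found the holder for this script's key
    else:
        # No holder for this key yet: put one at the front of the queue.
        queue.appendleft(KeyHolder(key))
        index = 0
    # Either way, the text goes immediately before the key holder.
    queue.insert(index, Buffer(text))


def snapshot(queue):
    """Readable view of the queue contents, for the demo only."""
    return [item.data if isinstance(item, Buffer) else f"<key {item.key}>"
            for item in queue]


queue = deque([Buffer("REST")])      # placeholder for network data
queue_pending_write(queue, 1, "c")   # untokenized tail of the first write
queue_pending_write(queue, 1, "d")   # second write from the inline script
queue_pending_write(queue, 2, "x")   # a write from external script b
result = snapshot(queue)
```

Here result comes out as x, key 2, c, d, key 1, REST: the "d" lands
after "c" (where I is), while the write from b lands ahead of "c"
(where T is).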

If there's no pending external script, tokenization of the text
argument is attempted immediately, with parser suspension for event
loop spins disabled. If the tree builder causes the parser to block and
there are untokenized characters in the text argument, the untokenized
tail of the argument is treated as described above for the
pending-script case.
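
Put together, the two document.write paths might be sketched as below.
Everything here is hypothetical scaffolding: stub_tokenize merely
pretends that the tree builder blocks right after a "</script>" end
tag and reports how many characters it consumed, and "REST" is a
placeholder for network data.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Buffer:
    data: str       # stand-in for a UTF-16 buffer (assumption)
    start: int = 0


@dataclass(frozen=True)
class KeyHolder:
    key: int


def queue_pending_write(queue, key, text):
    """Insert text just before the key holder, creating one at the front."""
    for index, item in enumerate(queue):
        if isinstance(item, KeyHolder) and item.key == key:
            break
    else:
        queue.appendleft(KeyHolder(key))
        index = 0
    queue.insert(index, Buffer(text))


def stub_tokenize(text):
    """Hypothetical tokenizer: returns how many characters it consumed.

    It pretends to block right after the first "</script>" end tag.
    """
    marker = "</script>"
    end = text.find(marker)
    return len(text) if end == -1 else end + len(marker)


def document_write(queue, key, text, script_pending, tokenize=stub_tokenize):
    if not script_pending:
        # Try to tokenize right away; keep whatever was left untokenized.
        consumed = tokenize(text)
        text = text[consumed:]
        if not text:
            return  # everything was tokenized, nothing to queue
    queue_pending_write(queue, key, text)


queue = deque([Buffer("REST")])
document_write(queue, 1, "a<script src=b></script>c", script_pending=False)
document_write(queue, 1, "d", script_pending=True)
```

The first call tokenizes up to the end of the script tag and queues
only the tail "c"; the second call is queued whole, right after it.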

Invariant: The last buffer of the queue is always a buffer that was  
put in the queue by the parser initializer or by the method that  
appends data from the network. The last object in the queue never  
holds a magic key value.
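
The invariant is easy to state as a debug check. In this Python sketch
the from_stream flag and the functions new_queue, append_from_network
and invariant_holds are my own inventions for illustration. Whether
the network path extends the last buffer or appends a new one is left
to the caller here.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Buffer:
    data: str                  # stand-in for a UTF-16 buffer (assumption)
    start: int = 0
    from_stream: bool = False  # True if created by initializer or network


@dataclass(frozen=True)
class KeyHolder:
    key: int


def new_queue():
    """Parser initializer: the queue starts with one empty buffer."""
    return deque([Buffer("", from_stream=True)])


def append_from_network(queue, text, new_buffer=False):
    """Add network data to the last buffer, or append a fresh buffer."""
    if new_buffer:
        queue.append(Buffer(text, from_stream=True))
    else:
        # Python rebuilds the string here; a real buffer would simply
        # have spare capacity at its end.
        queue[-1].data += text


def invariant_holds(queue):
    """The last object is a buffer that came from initializer or network."""
    if not queue:
        return False
    last = queue[-1]
    return isinstance(last, Buffer) and last.from_stream


queue = new_queue()
append_from_network(queue, "<p>")
queue.appendleft(KeyHolder(1))   # document.write material only ever
queue.appendleft(Buffer("c"))    # goes in ahead of the last buffer
append_from_network(queue, "more", new_buffer=True)
ok = invariant_holds(queue)

broken = deque([Buffer("x", from_stream=True), KeyHolder(1)])
bad = invariant_holds(broken)
```

Since document.write text is always inserted at the front or before an
existing key holder, it can never end up last, so the check should pass
after every operation.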

(The motivation for not using the same concepts as the spec is that
magic keys are the mechanism Gecko already provides for managing
the context of document.writes, and this queuing mechanism never
requires moving UTF-16 data once it has been written into a buffer.)

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/

Received on Monday, 17 November 2008 15:57:00 UTC