This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 24658 - [imports]: The fetch readiness shouldn't block fetching.
Summary: [imports]: The fetch readiness shouldn't block fetching.
Status: RESOLVED FIXED
Alias: None
Product: WebAppsWG
Classification: Unclassified
Component: HISTORICAL - Component Model (show other bugs)
Version: unspecified
Hardware: PC Linux
: P2 normal
Target Milestone: ---
Assignee: Dimitri Glazkov
QA Contact: public-webapps-bugzilla
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 20683
  Show dependency treegraph
 
Reported: 2014-02-13 21:46 UTC by Morrita Hajime
Modified: 2014-02-19 00:29 UTC (History)
4 users (show)

See Also:


Attachments

Description Morrita Hajime 2014-02-13 21:46:51 UTC
Spawned from Bug 24564.

Blocking fetch kills concurrency.
Instead, the readiness criteria should block only script execution.
Comment 1 Gabor Krizsanits 2014-02-14 00:47:26 UTC
I think we still need to chew on this for a while, maybe in the end the more conservative blocking fetch will be the way to go. For example if we parse an import that looks something like:

script1
subimport1
script3

in script blocking mode. And then we want to unblock the scripts in it, we should unblock it until it gets to subimport1, and then re-block it. Unblock subimport1 and wait until it's all done. Then run script 3... It's quite hard to think over what is going on... The more or I think the better I like the blocking on the fetching level or at least on the parsing level.

> > The other part is not entirely clear how all these algorithms in the spec
> > work together exactly. More specifically when should one check for fetch
> > readiness? Every time an import finishes we find the next candidate that can
> > be fetched?
> 
> Yes, this is kinda reactive thing. In Blink, the readiness of each import is 
> re-evaluated each time when any one import finished loading.
> 
> I hope there are better way to explain this nature but I have no idea so
> far. Suggestions are welcome!

I think when there is at least one link in the list that is not ready to be fetched, we have to run the 'find next to be fetched algorithm' after each import that is done loading. The algorithm will find the first link on the list whose flag can be changed from not ready to ready to be fetched and start fetching it (if there is any). Is that correct? 
And of course when a new link is added to the list that is immediately ready to be fetched we just fetch it.

So the special case is when we start fetching an import like:

...
   I
    I2
    I3

And as we add I2 to the list we immediately start fetching it and it gets done before the parser adds I3 to the list, then the 'find next to be fetched' algorithm will not find any new candidate, but that's not a problem since soon we add I3 to the list and start fetching it immediately.
Comment 2 Morrita Hajime 2014-02-14 01:38:25 UTC
Thanks for sharing your thought Gabor!

Before stepping into the detail, let me clarify the motivation behind
this "block only on <script>" idea.

Think all imports are composed in this way:

---
<html>
<head>
  <link rel=import ....>

  ... bunch of more imports ...

  <script>
  ... // main logic of the imports
  </script>
</head>
</html>
---

Assuming all imports are written in this style and we don't block on imports (only block on <script>), we can fetch subimports as soon as we receive imports bytestream and started parsing them. We could possibly request all requested imports and stylesheets without blocking any. This maximizes the concurrency and minimizes the delay.

If we block on fetching, we'll introduce delay. Even if we pre-fetch them,
waiting before parsing means we cannot see <link>s in imports to be parsed.

Although it can go as fast as non-blocking scenario with prescanning+prefetch,
it would be nice if we can make parallelism built into the spec,
instead of expecting many speculative work.

I agree that current fetch readiness criteria doesn't work well with this.
The first version of the spec was written this in mind, but then dedup made
things complicated and I once gave up the parallelism to introduce fetching readiness.

But there are strong demand to make this fast. After all, speed is one of the justification why we want HTML Imports.

So let's make it work somehow.
Comment 3 Morrita Hajime 2014-02-14 02:19:47 UTC
(In reply to Gabor Krizsanits from comment #1)
> I think we still need to chew on this for a while, maybe in the end the more
> conservative blocking fetch will be the way to go. For example if we parse
> an import that looks something like:
> 
> script1
> subimport1
> script3
> 
> in script blocking mode. And then we want to unblock the scripts in it, we
> should unblock it until it gets to subimport1, and then re-block it. Unblock
> subimport1 and wait until it's all done. Then run script 3... It's quite
> hard to think over what is going on... 

I think <link rel=stylesheet> behaves like this.
It blocks <script> and it doesn't block anything else.

> The more or I think the better I like
> the blocking on the fetching level or at least on the parsing level.

In my understanding, <script> behaves like that way because the script execution should be completely serialized and the order should be deterministic. <link rel=stylesheet>, on the other hand, behaves differently. There is no loader ordering constraint for stylesheet, so it is loaded as it is available.

HTML import is a mix of these two. As long as parser doesn't see <script>, there is no order constraint. Once it sees any script, it blocks until the newly introduced order constraint is resolved.

> I think when there is at least one link in the list that is not ready to be
> fetched, we have to run the 'find next to be fetched algorithm' after each
> import that is done loading. The algorithm will find the first link on the
> list whose flag can be changed from not ready to ready to be fetched and
> start fetching it (if there is any). Is that correct? 
> And of course when a new link is added to the list that is immediately ready
> to be fetched we just fetch it.
> 

A possible problem of "find next" approach is that it embodies order serialization. If we spec in this way, it will be hard to explain concurrency of the process.

At the same time, it might be clearer to make it explicit that when we recompute the readiness and unblock parsers. As you are suggesting, it would be when each import finished loading.
Comment 4 Gabor Krizsanits 2014-02-14 19:03:56 UTC
(In reply to Morrita Hajime from comment #2)
> Assuming all imports are written in this style and we don't block on imports
> (only block on <script>), we can fetch subimports as soon as we receive
> imports bytestream and started parsing them. We could possibly request all
> requested imports and stylesheets without blocking any. This maximizes the
> concurrency and minimizes the delay.

Just one thing... If imports look like this those scripts will be executed even if the sub imports are failed to be loaded... If you want to hook up an onload function on the import link element, the definition of that onload function should be in a script before the subimports in the DOM.

<head>
<script>function runMe(){..}</script>
<link rel=import ... onload="runme()">

But I might be overlooking something trivial...

> But there are strong demand to make this fast. After all, speed is one of
> the justification why we want HTML Imports.
> 
> So let's make it work somehow.

I very much agree with all this, that's why I brought up this idea, it just made my head exploded a bit yesterday to think over all the edge cases :) But, yes, let's try to make this work.

> In my understanding, <script> behaves like that way because the script
> execution should be completely serialized and the order should be
> deterministic. <link rel=stylesheet>, on the other hand, behaves
> differently. There is no loader ordering constraint for stylesheet, so it is
> loaded as it is available.
> 
> HTML import is a mix of these two. As long as parser doesn't see <script>,
> there is no order constraint. Once it sees any script, it blocks until the
> newly introduced order constraint is resolved.

Thanks for this explanation, I think I need to read up on this a bit more.

> At the same time, it might be clearer to make it explicit that when we
> recompute the readiness and unblock parsers. As you are suggesting, it would
> be when each import finished loading.

Yes, exactly this is my point. So if I understand correctly we want to guarantee a sane script execution order, and block only things that really needs to be blocked for it. So how about this:

Script execution can be blocked in an import two ways, either internally during parsing it when a script or an import tag is found, or externally, before even we begin parsing it, because there is another import in the tree that has to be finished first. This later what we really have to define and deal with more now, since the parser will take care of the other first kind of blocking.

Instead of that list we could just build up this tree of links whose imports might be blocking an external import (a link that is not in their sub import tree). So when an import is finished loading we remove it from this tree, and check if the left most leaf (we call first child in this tree until we find a leaf I mean) in this tree can/has to be unblocked. Kind of equivalent with the list version but I find it easier to wrap my head around, so I though it worth to be shared.
Comment 5 Morrita Hajime 2014-02-14 23:32:31 UTC
(In reply to Gabor Krizsanits from comment #4)
> Just one thing... If imports look like this those scripts will be executed
> even if the sub imports are failed to be loaded... If you want to hook up an
> onload function on the import link element, the definition of that onload
> function should be in a script before the subimports in the DOM.
> 
> <head>
> <script>function runMe(){..}</script>
> <link rel=import ... onload="runme()">
> 
> But I might be overlooking something trivial...

Yeah, onload won't be that useful in practice when you use it in the markup form. I see that onload is used to track asynchronous loading done by
adding <link> dynamically:

link = document.createElement("link");
link.setAttribute("rel", "import");
link.setAttribute("href", url);
link.onload = function() { ... }
document.head.appendChild(head);

> > At the same time, it might be clearer to make it explicit that when we
> > recompute the readiness and unblock parsers. As you are suggesting, it would
> > be when each import finished loading.
> 
> Yes, exactly this is my point. So if I understand correctly we want to
> guarantee a sane script execution order, and block only things that really
> needs to be blocked for it. So how about this:
> 
> Script execution can be blocked in an import two ways, either internally
> during parsing it when a script or an import tag is found, or externally,
> before even we begin parsing it, because there is another import in the tree
> that has to be finished first. This later what we really have to define and
> deal with more now, since the parser will take care of the other first kind
> of blocking.
> 

Ah, having the internal blocking and external blocking separately seems
good idea to explain it. Thanks!

> Instead of that list we could just build up this tree of links whose imports
> might be blocking an external import (a link that is not in their sub import
> tree). So when an import is finished loading we remove it from this tree,
> and check if the left most leaf (we call first child in this tree until we
> find a leaf I mean) in this tree can/has to be unblocked. Kind of equivalent
> with the list version but I find it easier to wrap my head around, so I
> though it worth to be shared.

It's almost tree, except we have import sharing (dedup). de-dup makes everything complicated. Once we allow import sharing (by having <link>s with same URL), it is no longer a tree. It becoems DAG, or even could have cycles. That's why I gave up the tree and use list intead.

But yes, having a list globally makes explanation cryptic
and it kills parallelism easily if misstated.

My current thinking here is that we could prevent cycles explicitly and accept the notion of DAG somehow.
Comment 6 Gabor Krizsanits 2014-02-17 01:12:11 UTC
(In reply to Morrita Hajime from comment #5)
> It's almost tree, except we have import sharing (dedup). de-dup makes
> everything complicated. Once we allow import sharing (by having <link>s with
> same URL), it is no longer a tree. It becoems DAG, or even could have
> cycles. That's why I gave up the tree and use list intead.
> 
> But yes, having a list globally makes explanation cryptic
> and it kills parallelism easily if misstated.
> 
> My current thinking here is that we could prevent cycles explicitly and
> accept the notion of DAG somehow.

Ah yeah, I keep saying tree when it's not... But it does not matter much really, the tree we are looking for is well defined and part of the directed graph of imports. Only extension to my algorithm that while we are looking for the left most child we build up a list form the parent chain (that we have to anyway because of the multiple parents) and check if the next first child is in this list already. If so, we found a circle, we just ignore that link and check for the next child or if there is no more child we're done, we found the leaf node we were looking for in the sub-tree. This way the alg. will traverse the graph as if it were a tree, ignoring exactly the edges that make circles. For de-dup we have to add one more thing to it: step to the next one when the referred import is already started/unblocked.
Comment 7 Morrita Hajime 2014-02-18 23:44:39 UTC
See https://github.com/w3c/webcomponents/commit/220b817d705f483b978b765f24244287d8ca28f9 and following changes.
I should've done this in branch to make the diff easy to read :-(

This is an attempt to reduce blocking.

The main changes are:

- Now the import dependencies are explained as a DAG.
  The dependency for each import is explicitly spec-ed as "import dependent" 
  using the DAG topology.

- The "import fetching" algorithm no longer spins until unblock.
  Only place to spin is before <script>, as it is originally intended.
  The notion of "fetch readiness" is gone.

Some superficial changes:

- "import parent" is renamed to "import owner". 
  Now the parent-child relationship over DOM is called "import owner" and
  one over the DAT is called "import parent".

- Many existing paragraphs are rephrased based on these new concepts.

I hope new revision make the whole concept clearer. 
As usual, feel free to reopen/file new bugs.
Any feedback will be appreciated!
Comment 8 Morrita Hajime 2014-02-19 00:29:43 UTC
(In reply to Gabor Krizsanits from comment #6)
> Ah yeah, I keep saying tree when it's not... But it does not matter much
> really, the tree we are looking for is well defined and part of the directed
> graph of imports. Only extension to my algorithm that while we are looking
> for the left most child we build up a list form the parent chain (that we
> have to anyway because of the multiple parents) and check if the next first
> child is in this list already. If so, we found a circle, we just ignore that
> link and check for the next child or if there is no more child we're done,
> we found the leaf node we were looking for in the sub-tree. This way the
> alg. will traverse the graph as if it were a tree, ignoring exactly the
> edges that make circles. For de-dup we have to add one more thing to it:
> step to the next one when the referred import is already started/unblocked.

So I added explicit cycle check in the new revision for the clarity. By excluding cycles, the final DAG is almost deterministic, although its interim shape could be different depending on the network state. De-dup will work as well because we split requesting and fetching. Basically, fetching is for detecting the dup.

I think we share the understanding in this part. What we might need to discuss more is how to define the "order" of fetching and/or
script execution.

The new revision only defines the criteria of blocking script. It means that the loading of imports can happen in parallel: It is OK to start loading as soon as the parser sees <link rel=import>. What we need to do for determinisity is:

- Make sure that de-dup works by checking import-map before fetching (as specified in the algorithm).
- Make sure that the dependent criteria blocks script execution and never over-unblock them.

Ensuring these, the script execution order is stay deterministic and locking delay can be minimized at the same time. We don't need to define "the next" or order of import in general: Everything can happen concurrently, blocking script takes are of ordering.

Well, I admit that stylesheet ordering across imports (cascading order) will need some more. My plan is to define it on top of the DAG instead of modifying it. See bug 24616 for this.