15698 – [Templates]: Enable on-demand parsing

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Status:	RESOLVED WONTFIX

Alias:	None

Product:	WebAppsWG
Classification:	Unclassified
Component:	HISTORICAL - Component Model
Version:	unspecified
Hardware:	PC All

Importance:	P2 normal
Target Milestone:	---
Assignee:	Dimitri Glazkov
QA Contact:	public-webapps-bugzilla

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:	15476

Reported:	2012-01-24 20:58 UTC by Dimitri Glazkov
Modified:	2012-02-08 03:00 UTC
CC List:	2 users

Description Dimitri Glazkov 2012-01-24 20:58:00 UTC

Templates should be specified in a way that enables parsing template contents on demand, rather than at the same time as the template element itself is parsed.

This opens opportunities for performance optimizations.

Comment 1 Dimitri Glazkov 2012-02-07 00:33:43 UTC

I've done a bit of investigation here.

This is possible for non-nestable <template> elements.

For nestable <template> elements, I made an attempt to write an HTML tokenizer, and determined that the context-sensitive nature of markup forces us to do almost as much work as just completely parsing the template contents. Special thanks to Adam Barth for expert guidance.

One possible solution would be to add extra demarcation to the outermost <template> element. For example:

<template><![CDATA[ sdfsfds </template>]]></template>

Of course, this solution is limited, since it only allows one level of nesting.

As it stands now, I don't see a clean way to implement on-demand parsing of nestable <template> elements.
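
One reading of the one-level limit above is that a CDATA section ends at the first "]]>", so an inner template using the same demarcation would close the outer section early. A minimal sketch of that reading, in Python; the helper name cdata_body is hypothetical, and plain string search stands in for a real tokenizer:

```python
def cdata_body(markup: str):
    """Return the text of the first CDATA section in markup, or None.

    Assumption: the section ends at the first "]]>" after its opener,
    which is how CDATA sections are delimited in XML.
    """
    opener = "<![CDATA["
    start = markup.find(opener)
    if start == -1:
        return None
    start += len(opener)
    end = markup.find("]]>", start)
    return None if end == -1 else markup[start:end]


# One level of nesting: the inner "</template>" is protected by the CDATA section.
one_level = "<template><![CDATA[ sdfsfds </template>]]></template>"
print(repr(cdata_body(one_level)))

# Two levels with the same demarcation: the inner "]]>" ends the outer section.
two_levels = "<template><![CDATA[<template><![CDATA[x]]></template>]]></template>"
print(repr(cdata_body(two_levels)))
```

In the second case the extracted body stops right after "x", so the outer template's contents are cut short at the inner section's terminator.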

Comment 2 Dimitri Glazkov 2012-02-07 19:03:56 UTC

For posterity, here's my attempt: http://dvcs.w3.org/hg/webcomponents/raw-file/a28e16cc4167/spec/templates/index.html#parsing

It fails for cases where the string "<template>" appears in a string within a <script> element. And probably many more.

Comment 3 Dimitri Glazkov 2012-02-07 21:04:34 UTC

I admit defeat.

Comment 4 Dominic Cooney 2012-02-08 01:58:15 UTC

(In reply to comment #3)
> I admit defeat.

Boo.

Perfect is the enemy of good. You can’t have a </script> in a string literal in a script either; web developers have venerated '</scr' + 'ipt>', so maybe '</tem' + 'plate>' is fine.

This shouldn't be hard; just tedious. FWIW many auto-escapers have a useful set of contexts that include in-script-tag and in-string-literal, it is not so many, and probably more than you need. You could crib from one of those as a first cut.

Comment 5 Rafael Weinstein 2012-02-08 02:06:27 UTC

Holy crow, Dominic is right. I had no idea that this:

<script>
var foo = "</script>";
alert('foo');
</script>

Turns into this:

<script>
var foo = "</script>
</head>
<body>";
alert('foo');
</body>

I'd like to understand how much more work it will be to get the *right* thing, but given this, it doesn't seem wise to let perfect be the enemy of good.
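
The behavior discussed in comments 4 and 5 can be approximated in a few lines. This is a simplified sketch, not the HTML tokenizer: it assumes script content ends at the first case-insensitive "</script" and ignores the escaped-comment states a real parser handles. The helper name script_body is hypothetical.

```python
def script_body(markup: str) -> str:
    """Return what a simplified scanner treats as the first script's content.

    Assumption: content runs from just after "<script>" to the first
    "</script" (case-insensitive), regardless of JavaScript string literals.
    """
    lower = markup.lower()
    start = lower.find("<script>")
    if start == -1:
        raise ValueError("no <script> start tag")
    start += len("<script>")
    end = lower.find("</script", start)
    return markup[start:] if end == -1 else markup[start:end]


# The end tag inside the string literal terminates the script early.
literal = '<script>\nvar foo = "</script>";\nalert(\'foo\');\n</script>'
print(repr(script_body(literal)))

# Splitting the literal keeps the character sequence out of the source text.
split = "<script>\nvar foo = '</scr' + 'ipt>';\nalert('foo');\n</script>"
print(repr(script_body(split)))
```

With the literal form the script is cut off inside the string; with the split form the whole script survives, which is why the '</scr' + 'ipt>' idiom works.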

Comment 6 Dimitri Glazkov 2012-02-08 02:52:22 UTC

(In reply to comment #4)
> (In reply to comment #3)
> > I admit defeat.
> 
> Boo.
> 
> Perfect is the enemy of good. You can’t have a </script> in a string literal in
> a script either; web developers have venerated '</scr' + 'ipt>', so maybe
> '</tem' + 'plate>' is fine.
> 
> This shouldn't be hard; just tedious. FWIW many auto-escapers have a useful set
> of contexts that include in-script-tag and in-string-literal, it is not so
> many, and probably more than you need. You could crib from one of those as a
> first cut.

This is not just the problem of the closing tag. This:

<template><script> alert('I love the <template> tag');</script></template>

will break stuff in a somewhat more subtle way. I am pretty sure we'll be writing a parser inside of a tokenizer very quickly. It just doesn't fit.
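
To make the failure concrete, here is a minimal sketch of a tag-counting scanner of the kind a pre-parse pass might use. It is an illustration under stated assumptions, not the algorithm from the linked draft; the name naive_template_end is hypothetical, and the scanner deliberately knows nothing about <script> content or string literals.

```python
import re

# Matches only bare <template> and </template> tags (no attributes),
# which is enough for the example from this comment.
TAG = re.compile(r"<(/?)template>", re.IGNORECASE)


def naive_template_end(markup: str) -> int:
    """Return the index just past the </template> that brings the nesting
    depth back to zero, or -1 if the depth never returns to zero."""
    depth = 0
    for match in TAG.finditer(markup):
        depth += -1 if match.group(1) else 1
        if depth == 0:
            return match.end()
    return -1


markup = "<template><script> alert('I love the <template> tag');</script></template>"
print(naive_template_end(markup))
```

The "<template>" inside the alert string is counted as a nested start tag, so the real end tag only brings the depth back to 1 and the scanner never finds the end of the outer template. Fixing this requires knowing that the text sits inside a script, which is the context tracking that a full parser already does.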

Comment 7 Dimitri Glazkov 2012-02-08 03:00:22 UTC

Also, I rewrote the parsing to just use insertion modes, and it's much simpler and more straightforward. We lose the ability to treat templates as strings, but get a more natural (from a developer's perspective at least) behavior in return.