25521 – Even with the "optionally intermixed with one or more script-supporting elements." clause, this is s [...]

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Summary: Even with the "optionally intermixed with one or more script-supporting eleme...

Status:	RESOLVED WONTFIX

Alias:	None

Product:	WHATWG
Classification:	Unclassified
Component:	HTML
Version:	unspecified
Hardware:	Other other

Importance:	P3 normal
Target Milestone:	Needs Impl Interest
Assignee:	Ian 'Hixie' Hickson
QA Contact:	contributor

URL:	http://www.whatwg.org/specs/web-apps/...
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2014-04-30 20:49 UTC by contributor
Modified:	2014-09-29 19:36 UTC
CC List:	5 users

See Also:

Attachments

Description contributor 2014-04-30 20:49:47 UTC

Specification: http://www.whatwg.org/specs/web-apps/current-work/multipage/tabular-data.html
Multipage: http://www.whatwg.org/C#the-table-element
Complete: http://www.whatwg.org/c#the-table-element
Referrer: https://www.google.ca/

Comment:
Even with the "optionally intermixed with one or more script-supporting
elements." clause, this is still problematic for both web components and
frameworks which preceded web components and replace DOM nodes with custom
content (such as AngularJS). This applies to every single "allowed content"
rule in the spec. With elements whose content changes at runtime, content
models are entirely meaningless, as a non-table element may represent a table
element eventually. This is broken when the web is used as an application
platform rather than a document viewer, which seems to be the way things are
going whether anyone likes it or not. For these reasons, all of these content
model rules should probably be discarded entirely, or fixed to address that
problem.

Comment 1 caitp 2014-04-30 20:53:32 UTC

For an example of this being broken, see https://github.com/angular/angular.js/issues/7295

A user wishes to have a custom element which replaces a node which is not in the table content model, with some table nodes.

The browser, in following what the spec suggests, decides to move these nodes. Therefore, the custom component is not capable of providing a shorthand for table content, except using the workaround mentioned by the user.

This workaround is not actually possible with Web Components / Custom Elements, currently, and it is still problematic when using document.registerElement in place of Angular components.

This is quite broken, and was ill-conceived to begin with.
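
The re-arranging described in this comment can be sketched with a toy model. It is not the real tree builder: it only looks at start tags appearing directly inside a table, ignores nesting, text, and the special cases for form and hidden inputs, and the function name `partitionTableChildren` is made up for illustration. The list of accepted tags is my reading of the "in table" insertion mode and should be checked against the spec.

```javascript
// Toy model of the "in table" insertion mode. NOT the real tree builder.
// Start tags the parser accepts directly inside <table>. A real parser
// also inserts an implied <tbody> or <colgroup>; that detail is ignored.
const TABLE_START_TAGS = new Set([
  'caption', 'colgroup', 'col', 'tbody', 'thead', 'tfoot', 'tr', 'td', 'th',
  'script', 'style', 'template',
]);

// Hypothetical helper: split the start tags seen inside a table into the
// ones that stay in the table and the ones moved in front of it.
function partitionTableChildren(tagNames) {
  const fosterParented = []; // ends up before the <table>
  const keptInTable = [];    // stays inside the <table>
  for (const name of tagNames) {
    const tag = name.toLowerCase();
    if (TABLE_START_TAGS.has(tag)) {
      keptInTable.push(tag);
    } else {
      fosterParented.push(tag);
    }
  }
  return { fosterParented, keptInTable };
}

console.log(partitionTableChildren(['some-tr-component', 'tr']));
```

Under this model the custom component is reported as moved out of the table while the plain tr stays put, which is the situation the Angular issue complains about.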

Comment 2 Ian 'Hixie' Hickson 2014-05-01 18:07:02 UTC

Is the problem here that the parser won't recognise custom elements in tables, or that validators will say that custom elements are invalid? or both?

Comment 3 caitp 2014-05-01 18:27:06 UTC

The problem is really both. Parsers will orphan the non-table content from the table, and validators (in particular, the w3.org validator in experimental HTML5 mode, which isn't terribly surprising) will report an error for the same reason: "Start tag some-tr-component seen in table." (for example).

However, for most people, the browser re-arranging their markup is a bigger problem than the validator issue.

Comment 4 Ian 'Hixie' Hickson 2014-05-01 19:18:30 UTC

Isn't the is="" attribute the solution for the parser? I mean, we can't fix the parser, that's a non-starter. But is="", I thought, was meant to be the solution there.

For content outside of parser-impacted areas, like flow content, Web components presumably need a mechanism to define what categories they fall into.

Comment 5 caitp 2014-05-01 19:27:33 UTC

> I mean, we can't fix the parser, that's a non-starter.

Sure we can. This is fixable in a backwards-compatible way by simply not orphaning elements whose names contain a hyphen (which is a suggested style for custom elements, and is not used by any "native" elements). But it's not likely that anyone's application actually depends on that weird, undesirable behaviour in the first place.
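
As a rough sketch of what such a hyphen rule might look like (purely hypothetical, no browser implements it, and `wouldSkipFosterParenting` is an invented name): a bare "contains a hyphen" test is probably not quite enough, because SVG and MathML already define a few hyphenated element names, which the Custom Elements spec reserves.

```javascript
// Hypothetical predicate for the proposed rule; not part of any parser.
// Hyphenated names already used by SVG and MathML, which the Custom
// Elements spec excludes from valid custom element names.
const RESERVED_HYPHENATED_NAMES = new Set([
  'annotation-xml', 'color-profile', 'font-face', 'font-face-src',
  'font-face-uri', 'font-face-format', 'font-face-name', 'missing-glyph',
]);

// Returns true if, under the proposal, the parser would leave an element
// with this tag name where the author wrote it.
function wouldSkipFosterParenting(tagName) {
  const name = tagName.toLowerCase();
  return /^[a-z]/.test(name)
    && name.includes('-')
    && !RESERVED_HYPHENATED_NAMES.has(name);
}

console.log(wouldSkipFosterParenting('some-tr-component'));
console.log(wouldSkipFosterParenting('div'));
```

The check is deliberately looser than the full custom element name grammar; it only captures the "has a hyphen and is not a reserved name" part of the idea.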

> But is="", I thought, was meant to be the solution there.

This I don't know about, as I have not seen this mentioned in any of the specs I keep up with. However, it's not really ideal for authors to have to make use of some attribute just to get around undesirable behaviour of the parser.

Do you have a link to somewhere that this is documented? I've been looking for the past few minutes and haven't seen this mentioned anywhere yet.

Comment 6 Ian 'Hixie' Hickson 2014-05-02 23:16:38 UTC

> But it's not likely that anyone's application actually depends on that weird,
> undesirable behaviour in the first place.

We have AMPLY proved that this line of reasoning doesn't work on the Web. People rely on all KINDS of crazy stuff.


> > But is="", I thought, was meant to be the solution there.
> 
> This I don't know about, as I have not seen this mentioned in any of the
> specs I keep up with. However, it's not really ideal for authors to have to
> make use of some attribute just to get around undesirable behaviour of the
> parser.
> 
> Do you have a link to somewhere that this is documented? I've been looking
> for the past few minutes and haven't seen this mentioned anywhere yet.

No idea. Dimitri?

Comment 7 Dimitri Glazkov 2014-05-02 23:18:39 UTC

(In reply to Ian 'Hixie' Hickson from comment #6)
> > But it's not likely that anyone's application actually depends on that weird,
> > undesirable behaviour in the first place.
> 
> We have AMPLY proved that this line of reasoning doesn't work on the Web.
> People rely on all KINDS of crazy stuff.
> 
> 
> > > But is="", I thought, was meant to be the solution there.
> > 
> > This I don't know about, as I have not seen this mentioned in any of the
> > specs I keep up with. However, it's not really ideal for authors to have to
> > make use of some attribute just to get around undesirable behaviour of the
> > parser.
> > 
> > Do you have a link to somewhere that this is documented? I've been looking
> > for the past few minutes and haven't seen this mentioned anywhere yet.
> 
> No idea. Dimitri?

http://w3c.github.io/webcomponents/spec/custom/#dfn-type-extension

Comment 8 Dimitri Glazkov 2014-05-02 23:25:46 UTC

Now with an example (requires a browser that supports custom elements): http://jsbin.com/raxev/2/edit

Comment 9 caitp 2014-05-03 01:11:14 UTC

(In reply to Ian 'Hixie' Hickson from comment #6)
> We have AMPLY proved that this line of reasoning doesn't work on the Web.
> People rely on all KINDS of crazy stuff.

Here's the thing.

Re-arranging markup during parsing is fundamentally broken. This could be fixed in a backwards compatible way, by refusing to re-arrange elements with hyphens in the tag name. However, this sucks, because it creates inconsistencies in the behaviour of the parser, and adds complexity. So it would take longer to implement, and would confuse authors. This isn't good.

But, because re-arranging markup is fundamentally broken, there is no compelling reason to continue to do it. Would changing it possibly break apps? It might, but if anyone actually cares about those apps/pages, they'll probably take the time to fix their markup. And for historical sites which aren't being maintained, they can always be viewed with a legacy browser. Old parsing could be enabled via a pref in chromium or firefox, as kind of a quirks mode.

There is no end to the number of ways this could be fixed more elegantly than requiring an unrelated tag name and an extra attribute to be used. Throwing out portions of parsers which re-arrange markup would actually mean simplifying a code base. Not behaving in this weird, incorrect way would also make markup simpler, and these are all positive things.

So, while it's good that we have a workaround for web-components, it's not really an appropriate solution. It is no help to people writing applications using frameworks like legacy Ember, Angular, Knockout, or any of the other pre-webcomponents toolsets which don't use document.registerElement(). An effort should be made to simplify the web, simplify implementations, and remove weird undesirable behaviours. Adding an extra mandatory attribute just to work around brokenness in the parser is not the right way to go.

Needless to say, these are all personal opinions, but really, authors don't want to be concerned with making sure to obey the content model specified in the spec, and they shouldn't need to. The content model for visible elements should have absolutely no significance. In practice, it doesn't have any significance. If I manually call table.appendChild(someDiv), a browser will happily append that div to the table. Here's an example which will work in every major browser: http://jsfiddle.net/ULwSB --- Layout just does not care about the content model, and the DOM API does not care about the content model. The only thing that does care about the content model is the parser, and it is incorrect to care. It surprises people that it cares.
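
The appendChild point can be sketched as follows. This is not the code behind the jsfiddle link, just an assumed equivalent; `appendDivToTable` is an invented helper that takes the document as a parameter so it can be pointed at any DOM implementation.

```javascript
// Sketch, assuming a DOM is available (a browser page, for example).
// The DOM API performs no content-model check on appendChild, so a
// <div> can become a direct child of a <table>.
function appendDivToTable(doc) {
  const table = doc.createElement('table');
  const div = doc.createElement('div');
  div.textContent = 'not table content';
  table.appendChild(div); // accepted, unlike the same markup in the parser
  return {
    parentTag: div.parentNode.tagName,
    childCount: table.childNodes.length,
  };
}

// Only runs where a global document exists.
if (typeof document !== 'undefined') {
  console.log(appendDivToTable(document));
}
```

In a browser the div should report the table as its parent, which is the contrast with the parser that this comment draws.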

So, since browser vendors and the W3C aren't actually responsible for being curators of a huge museum of vintage broken applications, there is no compelling reason to continue to promote a fundamentally broken behaviour, which is completely unnecessary as far as layout engines are concerned.

I hope my manner of speaking doesn't upset anyone, I've been told that I sound judgemental or even condescending at times when I write on these things, and that's not the intention. However, I would prefer to break old broken behaviour, rather than introduce new behaviour which benefits only specific use cases, and which clutters up markup.

Comment 10 Simon Pieters 2014-05-05 13:47:19 UTC

(In reply to caitp from comment #9)
> Re-arranging markup during parsing is fundamentally broken. This could be
> fixed in a backwards compatible way, by refusing to re-arrange elements with
> hyphens in the tag name.

I think you need to provide data that the number of Web pages that do rely on that is low enough to be able to make that change. (For instance, you could instrument the HTML parser in Blink to trigger a use counter when it happens, plus a use counter for all foster parentings, for comparison.)
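
A minimal sketch of the two counters suggested here, counting pages rather than individual events. Everything in it is hypothetical: the class name, the idea of feeding it a list of foster-parented tag names per page, and the hyphen test. Blink's actual use counter machinery lives inside the engine and looks nothing like this.

```javascript
// Hypothetical per-page counters; not Blink's real use counter code.
class FosterParentingStats {
  constructor() {
    this.pages = 0;
    this.pagesWithFosterParenting = 0;
    this.pagesWithHyphenatedFosterParenting = 0;
  }

  // Call once per parsed page with the tag names that were foster-parented.
  recordPage(fosterParentedTags) {
    this.pages += 1;
    if (fosterParentedTags.length === 0) return;
    this.pagesWithFosterParenting += 1;
    if (fosterParentedTags.some((tag) => tag.includes('-'))) {
      this.pagesWithHyphenatedFosterParenting += 1;
    }
  }
}

const stats = new FosterParentingStats();
stats.recordPage([]);
stats.recordPage(['div']);
stats.recordPage(['foo-bar', 'span']);
console.log(stats);
```

Comparing the hyphenated count with the overall foster-parenting count is what would show whether the narrower change affects few enough pages to be considered.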

> However, this sucks, because it creates
> inconsistencies in the behaviour of the parser, and adds complexity. So it
> would take longer to implement, and would confuse authors. This isn't good.

Right.

> But, because re-arranging markup is fundamentally broken, there is no
> compelling reason to continue to do it.

The reason is that the Web relies on it behaving that way.

> Would changing it possibly break
> apps? It might, but if anyone actually cares about those apps/pages, they'll
> probably take the time to fix their markup.

No, it doesn't work that way. What happens is that users switch to a competing browser where the app still works.

> And for historical sites which
> aren't being maintained, they can always be viewed with a legacy browser.

That doesn't work either. There are *lots* of sites that aren't being maintained (and new ones are created every day), but users don't have legacy browsers to view them. Nor should they since legacy browsers have security vulnerabilities that have been fixed in new browsers.

> Old parsing could be enabled via a pref in chromium or firefox, as kind of a
> quirks mode.

Please, no. :-(
 
> There is no end to the number of ways this could be fixed more elegantly
> than requiring an unrelated tag name and an extra attribute to be used.

Suggesting that users use legacy browsers or flip an "Unbreak the Web" pref are not elegant solutions.

> [...]

Comment 11 Simon Pieters 2014-05-05 13:55:35 UTC

What do you want to happen with:

<!doctype html><head><foo-bar>

<!doctype html><select><foo-bar>

<!doctype html><foo-bar><tr><td>

Comment 12 Simon Pieters 2014-05-05 14:02:41 UTC

Moreover, it is bad that the parsing result would be different in new browsers compared to old browsers.

Comment 13 caitp 2014-05-05 16:04:15 UTC

(In reply to Simon Pieters from comment #12)
> Moreover, it is bad that the parsing result would be different in new
> browsers compared to old browsers.

It's only bad if the old parsing behaviour makes sense at all, which is clearly not the case.

Comment 14 caitp 2014-05-05 16:06:40 UTC

(In reply to Simon Pieters from comment #11)
> What do you want to happen with:
> 
> <!doctype html><head><foo-bar>
> 
> <!doctype html><select><foo-bar>
> 
> <!doctype html><foo-bar><tr><td>

The parser should never re-arrange this markup, as has been stated.

If the parser finds this markup, it should be left alone. The layout engine should decide how to render it, the parser should never be concerned with this.

In short, the result of parsing

> <!doctype html><head><foo-bar>
> 
> <!doctype html><select><foo-bar>
> 
> <!doctype html><foo-bar><tr><td>

should be

> <!doctype html><head><foo-bar>
> 
> <!doctype html><select><foo-bar>
> 
> <!doctype html><foo-bar><tr><td>

Comment 15 caitp 2014-05-05 16:35:44 UTC

(In reply to Simon Pieters from comment #10)
> (In reply to caitp from comment #9)
> > Re-arranging markup during parsing is fundamentally broken. This could be
> > fixed in a backwards compatible way, by refusing to re-arrange elements with
> > hyphens in the tag name.
> 
> I think you need to provide data that the number of Web pages that do rely
> on that is low enough to be able to make that change. (For instance, you
> could instrument the HTML parser in Blink to trigger a use counter when it
> happens, plus a use counter for all foster parentings, for comparison.)
> 

There is no formal data regarding this, and none will ever be provided by any party.

However, even without formal data, we know that nobody actually needs the parser to re-arrange their crap markup. It's not a helpful thing for the parser to do, so you'll never write an app which depends on this crazy behaviour. Instead, you'll simply find ways to avoid this crazy behaviour.

There will never be formal data on this, however it is quite obvious that nobody depends on it, and it should be fine to change.

> > However, this sucks, because it creates
> > inconsistencies in the behaviour of the parser, and adds complexity. So it
> > would take longer to implement, and would confuse authors. This isn't good.
> 
> Right.
> 
> > But, because re-arranging markup is fundamentally broken, there is no
> > compelling reason to continue to do it.
> 
> The reason is that the Web relies on it behaving that way.

No it does not. Nobody actually needs it to behave this way. Nobody anywhere is writing a webpage which will break if it stops behaving this way. It is incredibly unlikely for anyone to be negatively impacted by changing this. Nobody relies on it. You are unable to show examples of anyone relying on it, I'm unable to, and the magic 8-ball is unable to. No app relies on this, and if someone somewhere does, they can stop relying on badness. This would be a breaking change, sure, but anyone broken by it is "doin it wrong", so the fact that it's a breaking change is of no consequence whatsoever.

> > Would changing it possibly break
> > apps? It might, but if anyone actually cares about those apps/pages, they'll
> > probably take the time to fix their markup.
> 
> No, it doesn't work that way. What happens is that users switch to a
> competing browser where the app still works.

This is completely irrelevant. If a user decides to do this, then that is their prerogative. It is not something which will be of any importance, at all.


> > And for historical sites which
> > aren't being maintained, they can always be viewed with a legacy browser.
> 
> That doesn't work either. There are *lots* of sites that aren't being
> maintained (and new ones are created every day), but users don't have legacy
> browsers to view them. Nor should they since legacy browsers have security
> vulnerabilities that have been fixed in new browsers.

If you author a website which depends on broken, legacy behaviour, I am within my rights to view it with a broken, legacy browser in order to get the full, broken experience.

If no legacy browser is available, I am free to invent one, or free to invent a service which renders an old site exactly the way an old browser would have.

This is frankly not the concern of the spec, and it's not the concern of the future of the web. It simply does not matter.

> > Old parsing could be enabled via a pref in chromium or firefox, as kind of a
> > quirks mode.
> 
> Please, no. :-(

Sure, we're in agreement here ;)

> > There is no end to the number of ways this could be fixed more elegantly
> > than requiring an unrelated tag name and an extra attribute to be used.
> 
> Suggesting that users use legacy browsers or flip an "Unbreak the Web" pref
> are not elegant solutions.
> 
> > [...]


It's not an elegant solution to introduce a new "quirks mode", or to require a preference be set to enable sane behaviour (such as digging in and finding a DNT preference, for instance).

You're not going to successfully argue that it is desirable to do these, however we can agree that for people who really do need or want these (for whatever reason), they will probably find a way to get their needs met.

---

However, all of these points are pretty much irrelevant, because of the first two paragraphs.

It is simply NOT SANE for a browser to re-arrange markup for me, and I would simply be NOT SANE to depend on this behaviour.

Therefore, the sane thing to do, would be to simplify the spec and parser, to not ask browsers to re-arrange markup. Period.

If this negatively impacts anyone (and I doubt it will), frankly, too bad.

It is not an appropriate thing to do, and the fact that it was broken before does not make it appropriate to be broken forever.

We don't need the new web platform to be crippled by old brokenness. It is not something anyone needs or wants. People who do need or want this are free to get their beaks wet some other way.

Comment 16 Ian 'Hixie' Hickson 2014-05-05 21:29:39 UTC

I think you should use XML.

In text/html, we long ago lost this battle (like, in the early '90s).

Comment 17 caitp 2014-05-05 21:38:55 UTC

I'm sure you've heard, "the web is the platform". HTML5 is a big part of the "web platform". It's a platform which is able to reach vast numbers of people, and will be reaching a few billion more in the next decade.

What this means is, this is the platform we have. Instead of making it worse than it already is, we should introduce breaking changes to drop brokenness from the platform.

There are already other places in the WHATWG where this is likely to happen, such as dropping synchronous mode from XHR, among other things.

The goal here, is to simplify the platform, to make things behave consistently and sanely, and to not be held back by garbage invented by dinosaurs in the early 90s.

The fact is, you will be hard-pressed to find an example which anyone cares about which breaks in any meaningful way due to making a parser behave consistently with the rest of the browser.

The correct thing to do, is to fix this. The parser has no business whatsoever re-arranging content. None whatsoever. And fixing it is not going to hurt anybody's feelings.

Now, if it can be changed in the spec, it's not going to change in browsers tomorrow. And it may not be changed in a year. But fixing it in the spec means that browsers will no longer have to behave so stupidly just to meet the requirements imposed by an ancient, poorly thought out spec.

I double dare you to find somebody who will be upset about the browser no longer re-arranging content when parsing. I am not convinced you will find anyone. And if you do find someone, I am not convinced they will be unable to cope with it.

If you care about making the web platform suck less than it already does, then you know fully well that removing brokenness is necessary.

Comment 18 Ian 'Hixie' Hickson 2014-05-06 18:32:18 UTC

The reason the Web is so great is because we basically never break anything.

If you think this wouldn't break the Web, then prove it: see Simon's first paragraph in comment 10.

Comment 19 caitp 2014-05-06 19:46:09 UTC

(In reply to Ian 'Hixie' Hickson from comment #18)
> The reason the Web is so great is because we basically never break anything.

The result of "never breaking anything", is that the whole thing is perpetually broken.

I've proposed a number of ways to do this in a backwards compatible fashion in browsers.

I don't think catering to a handful of (broken) websites that nobody cares about is really a concern, but clearly you guys are in disagreement about that.

The end result of this change is better for everybody, and means that people can get away with caring less about mistakes made in the 80s and 90s, and can get on with their lives creating content.

So how about this: what if we suggest changing the parser in a backwards-compatible way (such as using the new behaviour only if <!DOCTYPE html> is used, or only rearranging content which does not contain a hyphen in the element name, as no native HTML element does)?

Both of these should satisfactorily "not break the web more", and both of them will pave the way for fixing some of the brokenness of the web in a few years.

So what's the harm in that? Do you want the platform to suck forever, or do you want to make an effort to improve it for both consumers AND authors/content providers?

Comment 20 Simon Pieters 2014-05-06 23:24:07 UTC

(In reply to caitp from comment #15)
> There is no formal data regarding this, and none will ever be provided by
> any party.

https://gist.github.com/zcorpan/c330049466a705f714b7

Comment 21 Simon Pieters 2014-05-06 23:27:51 UTC

(In reply to caitp from comment #13)
> (In reply to Simon Pieters from comment #12)
> > Moreover, it is bad that the parsing result would be different in new
> > browsers compared to old browsers.
> 
> It's only bad if the old parsing behaviour makes sense at all, which is
> > clearly not the case.

No, it's still bad even if the old behavior is insane.

If you want to be able to do, say, <table><foo-bar></foo-bar></table>, in new browsers foo-bar would be a child of table, but in old browsers it would be a previous sibling of table. That's bad.

Comment 22 Simon Pieters 2014-05-07 06:53:46 UTC

(In reply to caitp from comment #14)
> (In reply to Simon Pieters from comment #11)
> > What do you want to happen with:
> > 
> > <!doctype html><head><foo-bar>
> > 
> > <!doctype html><select><foo-bar>
> > 
> > <!doctype html><foo-bar><tr><td>
> 
> > The parser should never re-arrange this markup, as has been stated.
> 
> If the parser finds this markup, it should be left alone. The layout engine
> should decide how to render it, the parser should never be concerned with
> this.

OK, so you're not satisfied with only "don't foster parent custom elements". You want much more, but I don't know where your slope ends and I don't know what you want exactly.

Would you be happy if browsers were to parse the whole document as XML when they find a custom element?

Comment 23 Simon Pieters 2014-05-07 06:59:27 UTC

(In reply to caitp from comment #15)
> > No, it doesn't work that way. What happens is that users switch to a
> > competing browser where the app still works.
> 
> This is completely irrelevant. If a user decides to do this, then that is
> their prerogative. It is not something which will be of any importance, at
> all.

It is relevant because it gives browser vendors huge incentive to not break sites/apps.

Comment 24 caitp 2014-05-07 11:56:18 UTC

(In reply to Simon Pieters from comment #23)
> (In reply to caitp from comment #15)
> > > No, it doesn't work that way. What happens is that users switch to a
> > > competing browser where the app still works.
> > 
> > This is completely irrelevant. If a user decides to do this, then that is
> > their prerogative. It is not something which will be of any importance, at
> > all.
> 
> It is relevant because it gives browser vendors huge incentive to not break
> sites/apps.

(In reply to Simon Pieters from comment #22)
> (In reply to caitp from comment #14)
> > (In reply to Simon Pieters from comment #11)
> > > What do you want to happen with:
> > > 
> > > <!doctype html><head><foo-bar>
> > > 
> > > <!doctype html><select><foo-bar>
> > > 
> > > <!doctype html><foo-bar><tr><td>
> > 
> > > The parser should never re-arrange this markup, as has been stated.
> > 
> > If the parser finds this markup, it should be left alone. The layout engine
> > should decide how to render it, the parser should never be concerned with
> > this.
> 
> OK, so you're not satisfied with only "don't foster parent custom elements".
> You want much more, but I don't know where your slope ends and I don't know
> what you want exactly.

In an ideal world, the parser would not be re-arranging content ever, because that is a stupid thing for a parser to do.

However, this bug is not really about that, it is specifically about re-arranging the content of tables.

> Would you be happy if browsers were to parse the whole document as XML when
> they find a custom element?

No, I would be happy if we could agree that A) nobody actually wants the browser to re-arrange their content, why would you ever want this, ever? and B) sites which would be unusable because of not re-arranging content are probably a myth, or very insignificant.

(In reply to Simon Pieters from comment #23)
> (In reply to caitp from comment #15)
> > > No, it doesn't work that way. What happens is that users switch to a
> > > competing browser where the app still works.
> > 
> > This is completely irrelevant. If a user decides to do this, then that is
> > their prerogative. It is not something which will be of any importance, at
> > all.
> 
> It is relevant because it gives browser vendors huge incentive to not break
> sites/apps.

Sure, I can understand not wanting to break sites.

But how broken would these sites be? Would they be broken at all if browsers implemented this in a backwards compatible fashion?

This isn't a hopeless scenario, it is possible to have a future web which isn't crippled by (all of the numerous) mistakes of old.

Comment 25 Simon Pieters 2014-05-07 12:20:49 UTC

(In reply to caitp from comment #24)
> In an ideal world, the parser would not be re-arranging content ever,
> because that is a stupid thing for a parser to do.
> 
> However, this bug is not really about that, it is specifically about
> re-arranging the content of tables.

It would be pointless to change one thing if it doesn't address the things you want to do. For instance, <foo-bar><tr> is a different case even when you don't foster parent the foo-bar.

> > Would you be happy if browsers were to parse the whole document as XML when
> > they find a custom element?
> 
> No,

Why not?

> I would be happy if we could agree that A) nobody actually wants the
> browser to re-arrange their content, why would you ever want this, ever? and

Sure.

> B) sites which would be unusable because of not re-arranging content are
> probably a myth, or very insignificant.

No. I've shown that this is false in comment 20.

> Sure, I can understand not wanting to break sites.

Good. Now we're getting somewhere.
 
> But how broken would these sites be?

Enough that browsers would refuse to make the change, I believe.

> Would they be broken at all if browsers
> implemented this in a backwards compatible fashion?

The number of sites using custom elements today is pretty low, so there is still some leeway. However, if we want people to start using custom elements now, it seems like a bad idea to shake the foundation and do radical changes to the HTML parser. People will be authoring pages with custom elements and only test in a legacy browser that has the old HTML parsing behavior for some period of time, which will then be "broken" in new browsers with the different parsing.

> This isn't a hopeless scenario, it is possible to have a future web which
> isn't crippled by (all of the numerous) mistakes of old.

You can't get rid of the old mistakes completely so long as the Web relies on them. If you start special-casing, you make the whole thing more complex, which is a source of bugs.

Comment 26 caitp 2014-05-07 12:23:11 UTC

(In reply to Simon Pieters from comment #25)
> (In reply to caitp from comment #24)
> > B) sites which would be unusable because of not re-arranging content are
> > probably a myth, or very insignificant.
> 
> No. I've shown that this is false in comment 20.

What you've demonstrated is that there are sites which are having their content re-arranged (whether or not these are significant sites is a different question).

This does NOT demonstrate that the sites would be unusable if their content were not re-arranged, and it certainly does NOT demonstrate that they would be unusable if initial implementations of a change to the parsing spec were backwards compatible (such as not re-arranging elements whose tag names do not contain hyphens).

Comment 27 caitp 2014-05-07 12:24:09 UTC

(In reply to caitp from comment #26)
> (such as not re-arranging elements whose tag names do
> not contain hyphens).

Sorry, I mean "not re-arranging only those elements whose tag names DO contain hyphens".

Comment 28 caitp 2014-05-07 12:30:54 UTC

(In reply to Simon Pieters from comment #25)
> (In reply to caitp from comment #24)
> > This isn't a hopeless scenario, it is possible to have a future web which
> > isn't crippled by (all of the numerous) mistakes of old.
> 
> You can't get rid of the old mistakes completely so long as the Web relies
> on them. If you start special-casing, you make the whole thing more complex,
> which is a source of bugs.

Yes, I agree that special casing sucks, and I am not saying Gecko and Blink and Webkit need to ship these tomorrow.

But somewhere along the way, the foot really needs to be put down. Eggs may be broken, but it doesn't really matter, and you get a tasty omelette as a result.

Just imagine, commercial websites might even have to hire some designers on contract to fix their crappy markup in the future --- committing to fixing brokenness is a way of being a job creator!

Maybe HTML should just be thrown out entirely and replaced with something else, maybe that something else is XHTML, but probably not. But somewhere down the road, the nonsense heuristics of the HTML parser need to be taken out of the equation, because it is a complete joke.

Comment 29 Simon Pieters 2014-05-07 12:47:51 UTC

(In reply to caitp from comment #26)
> What you've demonstrated is that there are sites which are having their
> content re-arranged

Yes.

> (whether or not these are significant sites is a
> different question).

The sites are from the top of the Alexa 1m sites list. It's only front pages, though.

> This does NOT demonstrate that the sites would be unusable if their content
> were not re-arranged,

That is true. My educated guess is that at most half would be unaffected, and at least 10% would become completely unusable, and the rest would have noticeable brokenness but would still be usable. But the burden of proof does not lie on me; I'm not the one asking for a change.

> and it certainly does NOT demonstrate that they would
> be unusable if initial implementations of a change to the parsing spec were
> backwards compatible (such as not re-arranging elements whose tag names do
> not contain hyphens).

Right.

Comment 30 caitp 2014-05-07 13:48:35 UTC

I don't have a burden of proof, because my position is that breaking these websites in some way is really not that important.

For a period where they are broken in browser A, people will view them in browser B, or will stop viewing them at all (depending on the site). I certainly am not going to cry over some shady porn website losing customers to a competing site with better markup, and the consumers aren't going to cry about it. And it's not even going to seriously affect browser marketshare (as if browser marketshare actually meant anything in the first place).

I am not convinced that it's worth worrying about these sloppy sites at all, and the sooner they get broken via breaking change, the better.

Do you understand the concern here? If we keep adding hack after hack after hack onto the existing brokenness, we have essentially a Jenga tower on a fault line. Geographically, this is shaky, and piling on more hacks to avoid breaking the already shaky foundation is just going to make things worse.

At some point, you have to either let the Jenga tower fall apart, or build a solid foundation and start fixing each floor all the way up, so that you have something sturdy and predictable which people can build on without going insane.

Yes, these points mostly benefit authors and developers. But if authors and developers can start to have a better experience, then they'll pass it on to the consumers of their work. (This is not Reaganomics; it's simply giving engineers better tools so that they can build bridges which don't collapse or become too costly to maintain.)

There are objectively good reasons to drop the broken parser heuristics, and the importance of "saving" a few broken websites (which can easily fix themselves) is just not a concern worth having. Like, it just isn't. At all. This isn't making the web "better", it's having a negative impact on specs being drafted today, and will have a negative impact on specs being drafted in the future.

It may not happen today, it may not happen in 5 years, but it's going to have to happen sooner or later, and the sooner, the better.

Comment 31 Ian 'Hixie' Hickson 2014-05-07 18:07:51 UTC

Browser vendors have made it clear that they will _ignore the spec_ if it causes them to make pages not work well (where "not work well" includes minor changes to white space, apparently, let alone moving content around).

I do not want to write a spec that will be ignored.

I believe, based on the data in comment 20 and past experience, that the proposed change would cause a significant number of pages to get different renderings.

I believe this would result in the spec being ignored by browser vendors.

Therefore, I am not willing to make that change to the spec.

To convince me to make the change to the spec, you have to convince me that this chain of argument is wrong. For example, convince me that browser vendors are now willing to break pages, or that the data is wrong, or that it's ok if the spec gets ignored.


(In reply to caitp from comment #30)
> I don't have a burden of proof, because my position is that breaking these
> websites in some way is really not that important.

If you want to convince me to change the spec, then what matters is my position, not your position. The burden of proof is on you if you want to convince me.


> For a period where they are broken in browser A, people will view them in
> browser B, or will stop viewing them at all (depending on the site).

In that period of time, browser B will announce that they'll never make the change, and browser A will revert the change so that they stop losing users.


> (as if browser marketshare actually meant anything in the first place).

It means a lot to the people writing the browsers. Those are the people, ultimately, that I have to have on board if the spec is to be meaningful at all.


> Do you understand the concern here? If we keep adding hack after hack after
> hack onto the existing brokenness, we have essentially a Jenga tower on a
> fault line. Geographically, this is shaky, and piling on more hacks to avoid
> breaking the already shaky foundation is just going to make things worse.

We already have a "Jenga tower on a fault line". But we have a _lot_ of duct tape, and we are very good at deploying it. The tower is ugly, but it's not falling over.


> At some point, you have to either let the Jenga tower fall apart, or build a
> solid foundation and start fixing each floor all the way up, so that you
> have something sturdy and predictable which people can build on without
> going insane.

Yes. Ultimately we'll have to replace the Web.

   https://plus.google.com/+IanHickson/posts/SiLdNL9MsFw


> There are objectively good reasons to drop the broken parser heuristics

Nobody is arguing that there aren't. If we could, we'd drop them immediately. All we're saying is that there are even stronger reasons not to, to wit, that if we did, the spec would just be ignored.

Comment 32 caitp 2014-05-07 18:45:46 UTC

Yes, I understand what you guys have been saying about browser vendors. I fully understand that.

I think what the data actually shows is that making the change would not hurt a lot of people, and the people it would hurt wouldn't be hurt very badly. I think vendors could be made to agree that the status quo for the platform is not very good, despite the duct tape.

I think that organizations like the WHATWG are useful for allowing voices from browser vendors to be heard, and to chime in on it. So here's something browser vendors can chime in on:

Even if breaking changes were introduced to the web tomorrow, even if it was only one browser, here's what would happen to their userbase:

Nothing.

Their userbase isn't going to disappear. People don't exclusively use a single browser. People don't exclusively view documents which would be rendered unusable by such a breaking change (and largely, such documents would be a very tiny minority). Jane Doe might decide to use a different browser for a few minutes. That's fine. Ad revenue still gets collected, license fees are still being paid. There is just a little bit more pressure for holders of poorly designed apps to put in a little time or coin to fix them.

There are zillions of features on the web, and little breaking changes would not significantly affect the number of things which still work unaffected, and would benefit future authors greatly (who will then, in turn, produce better, less buggy content). It's not going to hurt the bottom line in any measurable way, and the web will still be there the following day.

Now, this is my perspective. I see authors complain about this nonsense. I've been one of those authors confused by the crazy behaviour of the HTML parser's insertion modes.

So, to any executives and CTOs reading this posting: obviously you can assess the cost of such a breaking change to your bottom lines on your own, and you may even disagree with everything I'm saying. That's fine.

But if you disagree, then please propose something that we can do to fix this. I think short-term backwards-compatible changes for a few years would be acceptable until the old deprecated behaviour can be dropped completely. I think a strategy like having a list of sites known to depend on these broken features could also be okay (but not ideal, for obvious reasons).

I think we can come up with something which would both fix (one of) the problem(s), and satisfy vendors.

I accept that it is not even the most broken feature on the web right now, but it's definitely one of the easier ones to fix. I also accept that there are good reasons why vendors would not wish to fix it, but I think it can be done in a way that is virtually harmless to everyone.

There are solutions to this which make everyone happy.

Comment 33 Ian 'Hixie' Hickson 2014-05-09 17:40:30 UTC

If you want to convince the browser vendors, you'll have to approach them directly; I don't think they'll be reading this bug.

Comment 34 Ian 'Hixie' Hickson 2014-09-29 19:36:41 UTC

I'm closing this due to lack of browser vendor interest. Please don't hesitate to reopen it if you do get interest drummed up.