Why the HTML 5 Specification Matters

By Karl Dubost, published 6 July 2007

This is a simple story. The story of an HTML bug. Like every story, it could start with… Once upon a time, there was a bug.

The bug and its consequences

A known HTML page contains a piece of code similar to this:

<div style="display:none">
  <table>
    <div>
      <table>
      </table>
    </div>
</div>

The HTML code is invalid: the outer table is never closed, and a div sits directly inside a table. A browser which "reads" this page has to recover from the errors and rebuild something logical, in the best case simply to display the page to the user, in the worst case to apply JavaScript and CSS code to it. The browser implementer then faces two questions. How do I recreate the structure of the content? How do I catch the error and turn it into something usable?
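
To make the problem concrete, here is a minimal sketch of a tag-balance check run on the snippet above, written in Python with the standard library's html.parser. The class name NestingChecker and the function check are invented for this illustration. The check is deliberately naive: it only tracks open and close tags, knows nothing about HTML content models or void elements such as br, and is not a validator.

```python
from html.parser import HTMLParser

# The snippet from the article, copied as-is.
SNIPPET = """<div style="display:none">
  <table>
    <div>
      <table>
      </table>
    </div>
</div>"""


class NestingChecker(HTMLParser):
    """Naive open/close tag balance check (hypothetical helper, not a validator)."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of element names that are currently open
        self.problems = []   # human-readable messages, one per mismatch

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
            return
        # The end tag does not match the innermost open element.
        expected = self.open_tags[-1] if self.open_tags else "nothing"
        self.problems.append(f"</{tag}> found while <{expected}> is still open")


def check(markup):
    """Return (list of mismatch messages, elements left open at the end)."""
    checker = NestingChecker()
    checker.feed(markup)
    checker.close()
    return checker.problems, checker.open_tags


if __name__ == "__main__":
    problems, still_open = check(SNIPPET)
    for message in problems:
        print(message)
    print("left open:", still_open)
```

On the snippet, this reports the final </div> arriving while the outer table is still open. It does not flag the div placed directly inside the table, which a real validator would also report.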

Some browsers will find a recovery strategy, but not necessarily the same one. Some browsers will fail on the page. In the end, this means two things:

Users get unpredictable results, which leads to usability problems and an erosion of trust.
Browser implementers face interoperability issues, and with them a risk of losing market share ("It works with browser A and not with browser B").

How to repair?

The HTML 4.01 specification is not much help in this case. It doesn't define a precise error recovery mechanism for invalid documents, so each browser implementer has to create their own strategy, with the consequences we have just talked about.

The HTML 5 Editor's Draft defines a very precise mechanism for recovering from invalid markup. As we can see in the comment on the bug, Dave Hyatt says: "Easy, the html5 spec covers this."

The browser implementer had clear instructions for this type of error, was able to implement them, and so created an interoperable recovery system for this kind of mistake. Web users were finally able to access the Web site without trouble and in the same way as with other browsers. The HTML 5 specification matters because it creates more interoperability when recovering from errors.
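
The value of a written recovery rule can be sketched in a few lines. The Python below is a toy and not the HTML 5 parsing algorithm, which is far more detailed and treats tables specially. The class ToyRecovery, the function recover and the rule itself are assumptions made for this illustration. The only point is that once a rule is written down, any two implementations of it rebuild the same structure from the same broken input.

```python
from html.parser import HTMLParser


class ToyRecovery(HTMLParser):
    """Rebuild balanced markup using one fixed, documented rule.

    Hypothetical rule (NOT the HTML 5 algorithm):
    - an end tag closes every element opened after its nearest match;
    - an end tag with no matching open element is dropped;
    - anything still open at the end of input is closed.
    """

    def __init__(self):
        super().__init__()
        self.stack = []  # names of currently open elements
        self.out = []    # pieces of the rebuilt markup

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)
        self.out.append(f"<{tag}>")  # attributes are dropped for brevity

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray end tag: ignore it
        while True:
            top = self.stack.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.out.append(text)

    def finish(self):
        self.close()
        while self.stack:  # close whatever is still open
            self.out.append(f"</{self.stack.pop()}>")
        return "".join(self.out)


def recover(markup):
    """Return the markup rebuilt according to the toy rule above."""
    parser = ToyRecovery()
    parser.feed(markup)
    return parser.finish()


if __name__ == "__main__":
    broken = '<div style="display:none"><table><div><table></table></div></div>'
    print(recover(broken))
```

Under this toy rule the final </div> implicitly closes the outer table first, so every implementation of the rule ends up with the same nesting. What a real browser builds from the same input is decided by the specification's own algorithm, not by this sketch.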

Comments (44)

Vlad Alexander - 6 July 2007 at 13:00:31 UTC
Hi Karl,
Unfortunately, this is a very one-sided perspective. The consequences of silent error handling in this case are:
Invalid HTML markup is still online and not fixed.
This does nothing to make the Web page render correctly in current / legacy browsers.
The Web page author learned nothing. He/she remains ignorant of the mistake and will continue to make similar mistakes on other pages.
thacker - 6 July 2007 at 15:05:05 UTC

Dubost--
Thank you very much for the hard example that supports your position on the need for HTML 5.0 to efficiently render "hobby content" within browsers. That one example, for what it is worth, sold me on the need to further the development of the 5.0 spec.
I still, adamantly, support the basis for Holzschlag's call to let things catch up to full implementation of existing specs before attempts are made to implement any parts of the proposed 5.0 spec.
Whether I adopt HTML 5.0 -- that is too far down the road. XHTML 1.1 works for me, my clients and the clients' markets and customers. [Does it very well and without any "angry" e-Mails.]
Again, thanks for that hardcore example.
Karl Dubost - 9 July 2007 at 04:53:31 UTC
Hi Vlad,
I do not disagree with you. But let's be practical in a business sense.
Invalid markup is still online, and most of the time, it will stay online for a long time. Bear with me, I'm all for fixing the markup, but unfortunately I do not see any practical solutions to do that.
What would be your practical proposal for "no silent recovery"?
Legacy browsers. Indeed that is a very good point to keep in mind. Agreed with you.
The Web page author will not learn anything with a browser. Or at least in a common web browser. The issue here is that Web authors should use appropriate tools. Either
appropriate authoring tools
appropriate quality checking in the development process
appropriate checking tools of the code.
Browsers are not tools to check your work. They are the very final step, to see the rendering. Do not trust browsers. They are meant to be used by everyone.
Vlad Alexander - 9 July 2007 at 13:03:13 UTC

Hi Karl,

What would be your practical proposal for "no silent recovery"?
Let's back up and look at the big picture. The real discussion is about the future of the Web. Let's take W3C's vision of the future - the Semantic Web. If you can honestly tell me that a Semantic Web can be successfully built on top of invalid HTML, then I will take back my objections to your original post.

However, I suspect most future W3C technologies will not work well in an invalid HTML world. If this is the case, then HTML 5 is a diversion from building the future Web.
So how do you make valid markup? From our experience as an authoring tool vendor, "active error feedback" is the only way to ensure content is authored according to specification.
What is the practical way forward towards a Web with valid markup? You need a new spec that is not backwards compatible. Specs don't need to be backwards compatible. It is user-agents that need to be backwards compatible by supporting multiple specs.
thacker - 9 July 2007 at 22:29:29 UTC

Of course semantic Internet content can be built. What logical reasons are there that the entire Web needs to be homogeneous? It doesn't need to be, nor should it be.
The BMW example: if Bavarian Motor Works wishes to take advantage of the benefits of semantic content, avail itself of future technologies and communicate with the broadest spectrum of its customer base, BMW will have to place the same level of engineering quality into its Internet communication as it does into its product line. The marketplace will decide that, not you, me, nor the W3C.
The main thrust of HTML 5 is to manage non-standard compliant markup, as I understand it, albeit with some extra bells and whistles tossed in.
To believe that the Web is broken and by some magical formula and divine right it will get 'fixed' is a pretty big stretch and egotistical view for anyone who believes they or a collective group has that power or capability.
If HTML 5, ultimately, eases the way for interoperability of non-standard content so that user agents, technologies, whatever, can re-focus efforts on standards compliance, communication tools and technologies built around and upon that compliance and thus move the Web field forward, I am all for it.
thacker - 11 July 2007 at 19:29:35 UTC

No intention to hog up this particular post--
One significant concern was the possibility that a new HTML 5.0 spec would detract from XHTML, or become a de facto standard in its place, and thus stall development.
The recent interview between Berners-Lee and IDG Now put that concern to rest:
http://www.itworld.com/Tech/4535/070709future/pfindex.html
Dark Phoenix - 12 July 2007 at 20:41:12 UTC

Actually, when it comes to error checking, I am of the opinion that the browser should record errors SOMEWHERE. Maybe browsers ought to have a mode where every HTML error gets displayed to the user (preferably which can be turned off, so people don't start complaining about being hassled)?
Martin Hassman - 13 July 2007 at 05:04:39 UTC

html5lib logs parsing errors, see at the bottom of http://james.html5.org/cgi-bin/parsetree/parsetree.py?uri=http%3A%2F%2Fbugs.webkit.org%2Fattachment.cgi%3Fid%3D14511
If browsers start to log these errors like they do with JavaScript and CSS errors, it will be really great.
Michael Daines - 15 July 2007 at 22:36:06 UTC

Maybe this is naive, but isn't it possible that we can have the semantic web without worrying so much about whether documents are valid? For example, given a user agent that uses or is informed in some way by (let's say) the HTML5 spec, how badly would you have to mess up your hCalendar to make it too broken or ambiguous to interpret or use?
Stuart Jones - 23 July 2007 at 01:30:15 UTC

With regards to feedback of errors or recovery of errors...
Developers should have feedback on errors in their code, but that does not necessarily mean that you can't have browsers that are able to produce consistent content even if there are some errors in the underlying code.
All it means is that there needs to be a differentiation between your average user's browser and a developer's browser.
This differentiation is already starting to come about with some of the developer plugins that are available for some browsers.
Vijayakumar Subburaj - 24 July 2007 at 05:55:49 UTC

Hi Karl,
Recovering from errors?! First, why allow errors?!
Why not just say the document is invalid, like XML?!
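
For readers who have not seen the XML behaviour mentioned here, a minimal sketch using Python's standard xml.etree.ElementTree: a strict XML parser stops at the first well-formedness error instead of recovering. The function name strict_parse and the two sample strings are made up for this illustration, and note that "well-formed" is not the same as "valid HTML".

```python
import xml.etree.ElementTree as ET

# The article's snippet on one line: the outer <table> is never closed.
BROKEN = '<div style="display:none"><table><div><table></table></div></div>'

# Same snippet with the missing </table> added. This is well-formed XML,
# but it is still not valid HTML (a div directly inside a table).
WELL_FORMED = (
    '<div style="display:none"><table><div><table></table></div></table></div>'
)


def strict_parse(markup):
    """Return (root, None) on success or (None, error message) on failure."""
    try:
        return ET.fromstring(markup), None
    except ET.ParseError as exc:
        # No recovery is attempted: the parser reports the error and gives up.
        return None, str(exc)


if __name__ == "__main__":
    for label, markup in (("broken", BROKEN), ("well-formed", WELL_FORMED)):
        root, error = strict_parse(markup)
        if error:
            print(f"{label}: rejected ({error})")
        else:
            print(f"{label}: parsed, root element is <{root.tag}>")
```

The broken string is rejected outright and nothing is rendered, which is the behaviour the rest of this thread calls "active error feedback".
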
Karl Dubost - 24 July 2007 at 06:30:31 UTC
Hi,
You asked why we recover from errors instead of reporting them right away. There are two reasons: the browser market and the company market.
Company market:
An advertisement in a magazine gives a URI for your commercial web site. Your web site depends on a lot of third-party software and on employees creating content. In one part of this software, someone has to create small chunks of HTML to edit the Web site. The person is tired and publishes the content quickly before the weekend, but unfortunately it is invalid. The consumer tries to access the page without success; it shows a big error message instead. The consumer goes to your competitor's site.
Browser market:
The company Opeziri makes a very strict browser which doesn't accept any errors in the markup. A bogus page is not displayed; the browser shows a big red error message instead. Your grandfather uses this Opeziri browser, but he's getting tired of it. Something like 95% of the Web is not viewable because it is invalid, though he doesn't know that. He just sees the error message. Then there is a competitor, Moslacker, which accepts all pages. Moslacker starts to gain more market share than Opeziri, which sees its business going down.
Gustaf Liljegren - 24 July 2007 at 09:50:40 UTC

First, I think it's a great idea to standardize how browsers should behave when coming upon broken documents. Second, I think we all ought to strive for a well-formed web, because well-formed code is easier to read, for humans and machines alike.
It's obvious that browsers can't parse HTML as strictly as we do other XML documents. However, I don't agree that browsers are therefore no good for teaching users how to write well-formed and valid documents. Browsers could assist users to a great extent in this task. Here's how:
Whenever the browser encounters an error in the syntax or grammar, make a small icon appear on the status bar. Hover your mouse pointer over it, and it says "This page is not valid. Click to validate this page". Click and you find yourself at W3C's then improved validator, which tells you not only what is wrong, but why it's wrong and what ought to be done about it.
Whenever this icon appears on a public site, it puts the expert author to shame with his fellows. It lets the curious hobby user learn from others' mistakes. And most importantly: it lets your grandfather ignore parsing errors. It doesn't prevent the page from rendering, along the lines of HTML 5 error handling. If you are the author, you get valuable feedback from the end user environment.
It is my opinion that this kind of feedback ought to be a SHOULD requirement in the HTML 5 spec.
nomad - 25 July 2007 at 07:29:01 UTC

Hi Karl,
Re your first case (publish an invalid document, customer goes to a competitor), that's exactly what I'd like to see. It puts market pressure on businesses to publish only valid documents and to check their validity, and that is a good thing.
And if you publish an invalid document, you risk having your document inaccessible anyway, because not every error is recoverable.
By specifying a unified error recovery mechanism you give authors more leeway to ignore their errors, practically provoking them to use invalid HTML.
Sean Farrell - 25 July 2007 at 08:30:40 UTC

I keep reading, over and over, the phrase "that your grandmother / grandfather should be able to publish on the web" as an excuse to have invalid HTML hanging around. This is total nonsense!
You can differentiate two types of people: coders and non-coders. Coders either write the HTML or the tools (scripts) that output HTML, and non-coders only use tools that generate HTML. A non-coder will never write a line (tag) of HTML, nor understand it. He/she simply does not have to, since there are tools that can be used. (If the tool does not output 100% valid HTML, it is broken...)
The reason why 95% of the web is invalid (not broken) is that browsers allowed broken pages to be shown. The creators of tools simply did not care to check whether their tool wrote 100% valid HTML.
The other problem was that the early specs never put the emphasis on the actual way to display HTML, and so browsers differed slightly. Additionally, browser extensions did not make life easier.
My proposal to get people to comply in the future is to display a red bar at the top of the page if it is ill-formed, just like pop-up blockers do. The content is still viewable, but the authors take a little blame. There is a good chance that the manager who browses the corporate site goes to the web developer and asks, "Why is our site non-conformant?"
Vlad Alexander - 25 July 2007 at 13:57:12 UTC

Can we give names to different error handling options so that it's easier to discuss?
Gustaf, let's call your suggestion "passive error feedback". Let's call the XML approach "active error feedback". And the HTML approach "no error feedback".
Karl, nobody is suggesting that browsers should use "active error feedback" for existing specs like HTML 4.x, but only for new specs. Web site developers can use the new spec or the old spec. If all browser vendors agree to respect the rules of the new spec, then market share is not affected.
David - 25 July 2007 at 16:43:29 UTC

Gustaf is correct that a browser should notify the user in some manner that it is "fixing" the document in order to have it display "properly". I use quotes because it is impossible to correctly guess the intended results 100% of the time. I also know many web developers that use a browser as their primary means of testing their code. Their main concern is how the page looks in browsers people are actually using, not whether it validates. Also, checking that it validates is an additional step, one which might break the layout when the HTML is modified to comply.
This brings me to my ultimate point, which is a bit off-topic: Why is the W3C developing yet another specification when the available browsers fail to support the current ones? (source: http://www.webdevout.net/browser-support-summary ) Apparently I live in a fantasy world, because I would like the ability to create a web page that both validates and looks how I intended in browser X and know that everyone else will see the same thing I see as long as they are using a browser that fully supports the correct specifications. This may be an unreasonable request, but why doesn't the W3C spend their time and resources to create an opensource reference browser that web developers can use to test page layout and browser developers can use as a guide for fixing their current browsers?
Tony E - 26 July 2007 at 21:17:55 UTC

I'm personally a bit stuck on the fence on whether a browser should, or should not fix or auto-repair website code. I am leaning on the thought that browsers should not, however, simply for the fact that if browsers allow for broken code, why fix it? I recall old Netscape 2 vs. Internet Explorer debates when IE auto-repaired tables and Netscape Communicator did not, for example. People designing for Netscape & similar browsers knew it worked properly, "for the most part".
My train of thought on this issue is like that of purchasing a vehicle. If you buy a brand new car and the door doesn't open, do you send it back? Or do you just roll down the window with your neat remote window open/close mechanism and climb in? Allowing broken HTML to exist is the same to me as climbing in the window. The manufacturer has no clue they just sold a defective product, because you're driving away with a smile on your face.
I fail to see how that is acceptable. Let Microsoft do its own thing, they will anyway (in my humble opinion)... but browsers like Firefox/Mozilla, Netscape, Opera and Safari seem to be willing to work with standards, so why not have them choke on errors? Then people will build better code. Perhaps browser developers can opt to add an option "Enable Code/DOM debugging?" so that a window pops up describing any code that choked or is incorrect?
Just an idea and train of thought of one small time web developer. And I also do believe a good developer runs their code through Tidy or some other code checker, but obviously Joe Shmoe 14yo just got hired by Jim Bob's Used Car Sales, and isn't a pro.
thacker - 27 July 2007 at 16:10:53 UTC

I am not so sure that any additional burdens, while handy for developers for example, should be placed upon browsers and their development to notify a Web user of invalid code.
A browser's primary function is to serve the user of the Internet. There are enough tools designed and in-place for the developer to use in generating standards compliant code.
What will drive and continue to drive increased production of standard compliance is education at the developer level and the market and economic pressures that are exerted upon the business and how it impacts their performance and relationships with their customers.
Browser developers need to focus on implementation of standards, evolution of standards and browser based technologies and upon security for the purpose of delivering Web based communication to the end user. That in and of itself is more than enough to keep their plate full.
However, for the hobbyist and for businesses, such as BMW which was referenced by Dubost, it is the responsibility and function, I believe, of the CMS and development applications to serve the function of notifying the developer of invalid code.
David--
You came close to answering your own question of why another spec:

checking that it validates is an additional step, one which might break the layout when the HTML is modified to comply.

The primary function of the HTML 5 spec is to adapt to the ways that the vast majority of content is being coded and allow it to comply to a standard.
In theory, and hopefully practicality, this will ease the way for browsers to render, for example, tag soup, consistently.
There are a lot of hurdles, as history has pointed out, to achieving and implementing any standard.
You make a very valid point, along with Molly Holzschlag, that full implementation of all existing standards should and needs to happen, quickly.
eekee - 28 July 2007 at 14:22:37 UTC

A lot of wrangling over details in these comments, with the bulk of opinion seeming to go to something that I believe would be actively harmful to individuals trying to earn a living. Do you guys who want the web browser to flag bad pages think it's alright to publicly show a working person's mistakes to the world? Granted, sloppy work is no good thing, but everyone makes mistakes, and in correcting those mistakes one reaches a point of diminishing returns:
To the individual site creator, the effort put in matters, and I don't want to have my judgment of an individual colored by his or her html skill!
Likewise with the company, good html coders I'm sure cost money, and I do not want to have my judgment of a company influenced by how good a coder – how good a shop window fitter – they could afford!
I also, as my last and least point, don't want my browsing being bothered by a little icon in the corner saying "Oh dear, this web page author made a mistake!" I do, however, want to be able to enable just such an icon for checking my own pages. It would be a handy browser feature but it would be a browser feature, not something that belongs in the standard. (Of course, if one wishes to see the validity of every page, one could simply leave that feature enabled.)
I read the original post as something with a much, much nobler goal than pointing fingers. HTML 5 will require that all browsers respond to any given error in the same way. This, if implemented properly, removes an artificial and meaningless source of differences between IE, Gecko, Opera, khtml, and all other html renderers out there.
Gustaf Liljegren - 31 July 2007 at 17:59:39 UTC

Even though most of us want a more well-formed web, it appears we are divided on how to get there. One extreme is to let browsers silently fix all markup errors, like today. The other extreme is to make it a requirement for conforming browsers to refuse to show pages with syntax/grammar errors. The first doesn't lead to a more well-formed web, which is our ultimate goal. The second is not realistic, because it breaks the web.
Anything that breaks the web (i.e. makes much of today's web unreadable) will surely not be implemented, and for a good reason. We can't have a transitional period where most websites won't work in the latest browser. The first priority of HTML 5 must be to accommodate the masses, and that means making old tagsoup show up pretty in next-gen browsers.
However, browsers could still promote well-formedness in discreet ways. The icon on the status bar I suggested earlier would have this effect. People wouldn't see it unless they looked for it, but it would have an impact on webmasters. The specification doesn't need to be too specific on how to notify the user of validation errors, but it could state that a browser by default SHOULD notify the user, and give an option to validate the page.
Vlad Alexander - 1 August 2007 at 03:05:12 UTC

Gustaf, please do not misrepresent the side that supports "active error feedback". NOBODY is suggesting that Web browsers should stop rendering HTML 4.x/XHTML 1.x Web pages because they are invalid. Supporters of "active error feedback" suggest this approach only for NEW specs.
JP Fiset - 2 August 2007 at 13:41:53 UTC

I find myself leaning towards the comments that allow rendering of bad markup. Users want to see content that companies are offering. Creating tools that hinder this relation is not good.
Besides, if you have worked long enough in private companies, you can envisage the examples given above turning into situations where an expert author is coming back to the office on a long week end to fix something broken by a novice author. Please, display the page.
I also want to see a valid web. The discussion so far is focused on letting the user know that a page is broken. The user can not do anything about it and I suspect he/she will not take the time to send an e-mail.
How about letting the originating site know that the page is broken? Could there be any mechanism put in place to help with that?
Cecil Ward - 2 August 2007 at 16:02:06 UTC

As we all know, the reason that this situation ever came about is that early web browsers were unreasonably forgiving and these browsers were the only testing tools that non-professional web authors were exposed to. These browsers failed in their duty to web authors.
We should not keep becoming distracted by the issue of what happens when end users view invalid pages in their browsers. What is important to focus on when considering how things must change is sorting out the experience that web authors at home (say) have when they are writing HTML and using their browsers as test tools.
It is time to draw a line and bring the era of pervasive broken markup to an end and this is to be achieved by a combination of measures.
One strategy towards this is to pressure browser manufacturers and vendors of other tools to include accurate and updatable validation tools in their products. This is an idea worth pursuing, but this strategy on its own will not succeed because only the more knowledgeable web authors will know to obtain such validators or know to turn them on.
Rather, I believe that it is vital to proceed as follows
(i) W3C to version-mark HTML5 now, so that forthcoming browsers can recognise it as such
(ii) W3C to require browser manufacturers to fail with an error (just like XML) if they see an invalid page bearing that version marker.
(iii) W3C to act quickly to get browser manufacturers to sign up for this, and fast-track the release of a small spec covering this issue well ahead of the HTML5 spec timescale. Such a spec would be small, and a technique similar to DOCTYPE-switching would serve. Aim to get it into Firefox 3 and IE8 without fail.
There is no need to worry about any bad impact on end-users' experience in this respect, because in the new scenario the web author would never have been able to unknowingly release an HTML5-version-marked page that was invalid. Failing to signal errors to home web authors was never a way of helping them, it was actually a behaviour that was letting them down, and it’s time to acknowledge that, and put an end to the old era.
Cecil Ward.
Olivier Wehner - 5 August 2007 at 02:22:16 UTC

There is something spooky about these "should browsers render tag soup" discussions: Yes, this would make everyone's life easier. No, browser vendors will never stick to that rule. Why should they?
The people who write the spec do not write the browser code, the spec is not law and the browser market is too competitive to give purity, purism or beauty a chance.
mina86 - 5 August 2007 at 18:39:07 UTC

So what, now invalid markup will have to be rendered in given way? Why is it called "invalid" then? If specification says how to render it why should anyone care to produce valid markup?
This only imposes new restrictions on the user agents, which may only do harm, i.e. bloat them, make them bigger and slower, and introduce new bugs.
After pointing out invalid markup to a web developer, s/he could say: "According to HTML5 this has to be rendered the way X Web Browser renders it, so your Y Web Browser is wrong and I'm right."
I don't mean to offend anyone, but putting error recovery into the spec is absurd and the second most stupid thing that has happened in HTML after the FONT tag.
Francis - 7 August 2007 at 23:27:54 UTC

I recently joined a company, working with all MS .NET developers. I was shocked to discover that none of them knew that the HTML/CSS specifications existed, nor XML and XML Schema. OMG, they don't even know what W3C is.
Effort should be given to promote W3C Standards.
SuperKoko - 15 August 2007 at 10:02:22 UTC
Karl, nobody is suggesting that browsers should use "active error feedback" for existing specs like HTML 4.x, but only for new specs. Web site developers can use the new spec or the old spec. If all browser vendors agree to respect the rules of the new spec, then market share is not affected.

Unfortunately, you forget one thing: new specs will be recognized by new browsers, through the DOCTYPE declaration, and new browsers will be able to use "active error feedback", but existing browsers (e.g. IE6) ignore the unrecognized DOCTYPE, will read the new HTML as if it were HTML 4.x, and will actually render something viewable, because new specs usually are quite "backward-compatible".
Bad web designers who use IE6 to test their pages may use the new DOCTYPE and think that "it works", while new browsers won't display it.
Users will be frustrated by the behavior of new browsers, claim they're buggy, and argue that IE6 does a better job because "It displays more pages".
Users will go back to IE6.
Browser vendors learn the lesson, and their next browser provides no error checking or only passive error checking.
That happened with XHTML (though XHTML served as application/xhtml+xml gets active error checking with some browsers). That won't happen with the next HTML specs, because browser vendors won't even try to provide active error checking: they've already learned the lesson.

Active error checking can only be put in a spec that isn't backward compatible (i.e. that current browsers don't display). XHTML served as application/xhtml+xml is one example: This MIME type is not recognized by IE6 which doesn't render it, but just asks "Do you want to save it to disk?".
Passive error checking is a great idea, in my opinion. I wouldn't go as far as putting it as a SHOULD in the HTML spec. I find it more sensible to humbly request this feature to be added in specific browsers.

This may be an unreasonable request, but why doesn't the W3C spend their time and resources to create an opensource reference browser that web developers can use to test page layout and browser developers can use as a guide for fixing their current browsers?

Because their parsing & rendering bugs would become de facto standard.
I wish browsers would use a true SGML parser. Such parsers do exist. Unfortunately, they don't have enough error recovery mechanisms. An OSS SGML parser could be adapted, and that would make the browser conform to the HTML spec.

So what, now invalid markup will have to be rendered in given way? Why is it called "invalid" then? If specification says how to render it why should anyone care to produce valid markup?

I agree. HTML is a contract between HTML document writers and user agents.
HTML5 contract is:
If you do your job (web developer) you'll get your money (proper layout in user agent), but if you do half of the job (which you MUST not), you'll get the same amount of money.
Isn't that equivalent to:
If you do half of your job or more, you'll get your money?

This only imposes new restrictions on the user agents, which may only do harm, i.e. bloat them, make them bigger and slower, and introduce new bugs.

Exactly. The de facto standard "extended HTML tag soup" would be so complex, and have so many quirks, that it would not be accessible to small tools written in three days by a normal developer. Only big companies could produce parsers able to parse one of the most complex computer languages ever.
Ether - 15 August 2007 at 19:22:06 UTC

Well, I fail to see what you people are arguing about. The main ideas were mostly stated by others:
x] To create a specification of how browsers should render invalid HTML is a good idea, which should bring some uniformity to the browsers. But hey, admit it, browsers don't even render valid documents the same way, so where's the certainty that this time they will listen to the specifications?
x] I don't know why it should apply to the NEW "version" of HTML. An invalid HTML5 document shouldn't be rendered at all. That way, old invalid HTML4 pages will stay mostly the same and new HTML5 pages will be valid and no error recovery will be needed for them.
x] When I code an application in PHP, it doesn't even try to recover from syntax errors (which is the essence of tag soup), so why should the coders of HTML be given more? Yes, they are coders (and with all browsers displaying e.g. CSS the same way, it would be easier for them to obey the specifications).
x] When someone cannot or doesn't want to become a coder, there are still applications that will have to create valid HTML, because when a page exported from editor A isn't displayed and the same page exported from editor B is, no one will blame the strict HTML5 specification; they'll blame editor A.
Karl Dubost - 15 August 2007 at 20:21:34 UTC

In the comment of SuperKoko, he/she has used cite="firstname lastname" which has made the comment non HTML conformant. (I have fixed it.) The value of a cite attribute must be a URI.
What should a modern browser have done with non-conformant markup like this? Should it carry a message saying the whole page was non-conformant?
btw, the doctype will not carry a version in HTML 5. It will simply be <!DOCTYPE html>. Among browser vendors, Chris Wilson (Microsoft) and Dave Hyatt (Apple) have advocated for versioning in HTML 5. Ian Hickson (Google), Anne (Opera), Maciej (Apple) are against.
The issue is not at the browser level but at the authoring level: authoring tools (and authors) should be strict in what they produce. That was part of the point of my article in the craft of HTML.
I disagree, though, with your last statement: developers entering the HTML market have to recover non-conformant HTML markup, which means they have to find techniques to recover the content. What HTML 5 offers for the first time is a precise description of how to recover.
What I would now like on the WG is more developers of CMSs and authoring tools: all the people implementing the production of HTML code. The sanitization of the HTML code happens in the production tools.
Karl Dubost - 15 August 2007 at 20:24:41 UTC

@ether
if we really want to be strict :)
Your comment is non-conformant: you are using paragraphs instead of a real list (ul/li).
Luckily enough, this is something which is almost impossible to check by machine. So if we push a bit further, this article should not have been displayed because of your comment.
thacker - 16 August 2007 at 02:47:53 UTC

Dubost--
I have a bunch of questions, hope you don't mind.
Why the differences of opinion between version reference and generic HTML 5 DOCTYPE?
You sold me on the need for the HTML 5 spec based solely on error handling and recovery. A key component of this is the CMS and authoring tool developers. What are the reasons for their non-participation in the HTML 5 working groups, and how can this critical element be resolved?
Secondly, why are there two working groups on this spec?
The article by Anne van Kesteren [ http://www.w3.org/html/wg/html5/diff/ ] is very succinct, clear and ideal for non-technical people such as myself versus the WHAT spec draft. Is it possible that van Kesteren will keep this article current on a monthly basis or as needed?
The W3C projects a final recommendation date of the 4th quarter of 2010. Why can't this be compressed to the 4th quarter of 2008?
Wouldn't incremental releases of the spec be more practical with focus initially upon error/recovery handling and security with the initial release?
Why is there even consideration of deprecation of elements and/or attributes based upon frequency of use or confusion/misunderstanding of use? Isn't this more a matter of education than an assumption that use reflects the practicality or functionality of the element/attribute?
What impact will the HTML 5 spec have upon the existing XHTML specifications?
Finally, what are your thoughts about Holzschlag's suggestions that she presented within her most recent post on her blog?
Don't get discouraged, I have a gazillion more questions.
Thank you very much.
Bennett McElwee - 16 August 2007 at 22:43:28 UTC

Consistent recovery from invalid markup is indeed a useful and important thing to specify. But it should not be part of the HTML 5 specification.
The original article says, "HTML 5 Specification matters because it creates more interoperability when recovering from errors." Actually, it's only the spec's definition of "a very precise mechanism for recovering invalid markup" that does this. This is indeed a useful thing to specify; but it's not logically part of the specification of the HTML 5 language.
W3C could instead just create an "HTML 4 Invalid Markup Recovery" spec. This spec could act as a helpful guide to browser implementors, who currently all use their own algorithms for rendering invalid markup. It would apply to both HTML 4.x and XHTML 1.x served as text/html.
If there is still really a need for HTML 5, then it could be accompanied by a "HTML 5 Invalid Markup Recovery" spec. But perhaps an even better approach would be to forget HTML 5 and simply allow XHTML 2 to be served as text/xhtml (or for backward compatibility, text/html). Then publish an "XHTML 2 Invalid Markup Recovery" spec.
Ether - 22 August 2007 at 20:53:01 UTC

@karl dubost] I was talking about syntax errors, which make the document unparsable, not about the semantic ones. Anyway, I didn't write the 'p' tags; it was done automatically.
x] Shouldn't the browser display this page because of errors? Sure, but the script shouldn't have put it there in the first place. There are ways to detect that the posted code is invalid before putting it on the page. And when the recovery algorithm is available, the receiving script should be able to apply it easily and make the posted code valid.
x] Well, I don't know that much about DOCTYPEs, SGML vs. XML and such, but an HTML5 document could always be recognized using the DTD clause. I know that this won't solve the thing SuperKoko wrote about, but I'm no pro, right? What about creating strict (with active error feedback) and transitional or such (with passive error feedback) standards?
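As an illustration of the "detect it before putting it on the page" idea above, here is a minimal sketch in Python. It is not the HTML 5 recovery algorithm and not what this site runs; `NestingChecker`, `find_nesting_errors` and the short `VOID_ELEMENTS` list are hypothetical names introduced for the example. It only checks that tags are opened and closed in order, which is stricter than HTML itself (optional end tags, such as a missing `</p>`, are reported as errors).

```python
from html.parser import HTMLParser

# Elements that never take a closing tag (a partial, illustrative list).
VOID_ELEMENTS = {"br", "hr", "img", "input", "link", "meta"}


class NestingChecker(HTMLParser):
    """Hypothetical helper: records tags that are closed out of order."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open elements
        self.errors = []     # human-readable problems found so far

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/> opens and closes at once.
        pass

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        else:
            self.errors.append(f"unexpected </{tag}>")


def find_nesting_errors(fragment):
    """Return a list of nesting problems; an empty list means balanced."""
    checker = NestingChecker()
    checker.feed(fragment)
    checker.close()
    errors = list(checker.errors)
    errors.extend(f"unclosed <{tag}>" for tag in checker.open_tags)
    return errors
```

A comment form could call `find_nesting_errors(posted_text)` and refuse, or ask the author to fix, any submission for which the list is not empty.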
Karl Dubost - 26 August 2007 at 23:27:48 UTC

Hi,
Just a quick note: I sometimes see comments made anonymously. These comments will not be moderated positively.
Stuart Metcalfe - 28 August 2007 at 14:31:28 UTC

I second Bennett McElwee's comment.
That browser vendors want to be able to recover from minor errors is entirely understandable and I don't think many people are going to lose any sleep if they write a recovery mechanism 'on top of' the HTML specification to improve the user experience. That an agreed 'standard' for this is established is good in this case. I strongly believe, however, that this mechanism has no place in the main HTML specification which should be clean, clear and above all require correct implementation. Poor quality code should be optionally recoverable but never explicitly accommodated.
Felix Schins - 2 October 2007 at 11:09:58 UTC

Hi!
I think it would be very good if HTML5 used some of the good ideas of XHTML2, e.g. the <h>-<section> model, which is better than <h1> - <h6>, <separator> instead of <hr>, removing the <font> element and the <iframe> tag, and so on...
A list of more great things, that HTML5 could use from XHTML2 is found here:
http://www.xhtml.com/en/future/x-html-5-versus-xhtml-2/
Best wishes for the standards-development...
Felix
Erik Reppen - 2 March 2008 at 02:10:09 UTC

Why advocate any recovery from invalid markup for new technologies? It only leeches time and energy away from browser development and makes everybody's lives more difficult. How is a user who can't even master simple SGML-based syntax supposed to correct behavior if a given recovery-process has failed to properly assess their intentions?
People who don't want to learn anything new can stick to older technologies which will no doubt continue to be supported indefinitely as the resources expended on doing so gradually become even more negligible.
The carrot offered for taking the very minor step of learning to stick to lower-case, nest, and close properly will be the advantages that new technologies offer. Asking people to validate their markup for a to-the-line error check hardly seems like a major barrier to entry.
Sloppy syntax allowances in something as basic as an SGML-based language are a waste of resources that can only hurt accessibility, aggravate proper indexing in search engines and slow the implementation of all new technologies. Why bring the evolution of the web to a snail's pace for people who can't be bothered to do the amateur web designer's equivalent of a spell check?
- Karl Dubost - 2 March 2008 at 09:49:32 UTC
  
  Hi Erik,
  Nothing forbids an author from sticking to lowercase, strict guidelines, quoted attributes, etc. when writing HTML. I would even personally encourage this. It is good design and good practice when sharing work in a team.
  That said, browser developers also need to recover broken markup in their implementations. If browsers stop recovering from broken markup, we will not be able to access 95% of the Web.
  There are really two things to separate:
  1. Authoring HTML, which can be strict with a well-defined content model.
  2. Parsing HTML, which has to cope with errors.
  For the 1., I invited people to commit their time to write the HTML 5 Authoring guidelines. It means people actually writing prose and not only discussing the why and when and how. The only way to move forward on this is really to create the document for it.
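To make the "parsing has to cope with errors" side concrete, here is a small sketch using Python's standard `html.parser` module on the broken snippet from the article. `TagLogger`, `tag_events` and `BROKEN` are names made up for this example, and Python's parser is only a tolerant tokenizer, not an implementation of the HTML 5 recovery rules. The point is simply that the parser reports every tag it sees and never refuses the document; deciding which tree to build from those tags is the part a shared specification has to settle.

```python
from html.parser import HTMLParser

# The invalid markup quoted in the article (the first <table> is never closed).
BROKEN = """<div style="display:none">
  <table>
    <div>
      <table>
      </table>
    </div>
</div>"""


class TagLogger(HTMLParser):
    """Hypothetical helper: logs start and end tags in document order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def tag_events(markup):
    """Tokenize markup without raising on nesting errors."""
    logger = TagLogger()
    logger.feed(markup)
    logger.close()
    return logger.events
```

Calling `tag_events(BROKEN)` yields four start tags and only three end tags, so any consumer has to decide on its own where the outer table ends.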
Erik Reppen - 3 March 2008 at 02:16:42 UTC

"That said, browser developers also need to recover broken markup in their implementations."
This is the part I'm confused on. Why?
Ever since I first caught wind of the new spec, I've been trying to understand whether I've misunderstood the goal of standards all along or if there's been a change of plan. I thought the idea was to ultimately transition to strict syntax. Period. Not for the purpose of "enforcing" proper coding practice out of sheer priggishness but to improve the quality of the development and usage environment for everybody involved by making sure that if something is live, its code can be easily read by machine, code, and developer. We can continue to allow for old mistakes through the use of proper doctype recognition and proprietary opt-in browser targeting (in the case of IE exclusivists).
But if sloppy markup continues to be allowed to go live, accessibility, indexing, barrier to entry and standards as a whole are all impacted in a negative manner in my eyes. I just don't see who it benefits. Certainly not the new markup coder who is trying to figure this stuff out for the first time but can't because a browser is incorrectly guessing at what his sloppy syntax is supposed to mean rather than simply pointing out where it needs to be corrected before rendering anything at all.
If the browser devs don't think that's good enough for less experienced aspiring web developers, all that's really needed is the equivalent of an SGML spellchecker that suggests rather than automatically assumes it knows the proper code. Although I'd expect most new devs could make do with something similar to the validation process.
So help me out here. Am I under some sort of mistaken impression about how things work or what the W3C's goals are? I'd love to have a better understanding of everybody's priorities in these matters, especially those of the browser devs (MS mostly).
Thanks for your response thus far and feel free to direct me to a more appropriate place for this discussion if there is one. It just seems to me like strict syntax is win-win for everybody and I don't see the cons of it.
- Karl Dubost - 5 March 2008 at 02:43:18 UTC
  
  "I've been trying to understand whether I've misunderstood the goal of standards all along or if there's been a change of plan."
  
  The goal of a standard is to be implemented by a good share of the market so that people can benefit from smooth interoperability when they are working with documents. It is a practical exercise with social, economic, and technical constraints.
  
  "But if sloppy markup continues to be allowed to go live, accessibility, indexing, barrier to entry and standards as a whole are all impacted in a negative manner in my eyes. I just don't see who it benefits."
  
  I will try to use another metaphor, because there is a misunderstanding.
  In my native language, French, I make mistakes (typos, grammar, etc.). The rules for French are strict and defined. Someone who is listening to me or reading me is still able to understand me even when I make typos and grammar errors (unless my content becomes real garbage). The person has applied an automatic recovery process to make the discussion possible. In a teaching context, if the person is a professor, she/he will fix my mistakes (note that he/she has been able to understand my broken content in the first place). My responsibility as an author is to create correct content that follows the rules.
  There are billions of documents (95%) on the Web with incorrect syntax. Two solutions:
  Browsers stop processing any document written with incorrect syntax. It means that most of the sites on the Web will not be displayed anymore: your favorite travel agency, your favorite search engine, etc. With the previous metaphor, nobody understands you as soon as you make a mistake.
  We create a specification which explains to browsers and fixing libraries how to recover the content available on the Web in an interoperable way. With the previous metaphor, everyone has a formal process to recover what you said incorrectly. Useful for teachers (validators, checkers), useful for your buddies (browsers).
  That said, nothing stops you from exercising your author responsibility and creating strict markup. The content model of HTML 5 (the rules for writing in html and xhtml) is not yet finished. A specification which makes it obvious for authors is needed. A volunteer editor, who commits time, is what we need for now.
Tom Aman - 5 March 2008 at 18:13:36 UTC

First of all, most of the comments here refer to browsers. Instead of browsers, think user-agents. While it is reasonable to continue to have user-agents attempt to fix bad html, I think it would be great to at last insist that any new version of HTML MUST be valid. One reason that there is so much bad code out there is that user-agents (mainly browsers) have been so forgiving and have done their best to cope with errors by guessing at the repair and many page creators never validate their code (often are not aware that W3C offers free validation). The problem with allowing the errors is that it makes it difficult to write any user-agent to cope, greatly increasing the code needed to parse a document and, at the same time, slowing the rendering. In addition, allowing the errors will just perpetuate the present situation.
Essentially, we can't do much with existing documents except carry on as we have and display the pages as best we can but we can insist that any document that purports to be HTML 5 or higher will NOT display unless the markup is correct (and good browsers will, as a minimum, identify the line containing the error, preferably will also tell what is wrong).
- Karl Dubost - 9 March 2008 at 15:17:40 UTC
  
  The spec already mandates that the content which is produced must be valid.
  For the second part of your comment (do not display content written for HTML 5 which is invalid): How do you know if an invalid document has been written with HTML 4.01, HTML 5 or nothing specific in mind? It's almost impossible to know that except if you are the author yourself (or the tool which is producing the content.)
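A sketch of why the intended version is hard to recover from the document alone, again in Python with the standard `html.parser` module. `DoctypeSniffer` and `declared_doctype` are hypothetical names for this example. The doctype is the only version hint a parser can read, the HTML 5 doctype deliberately carries no version, and tag soup often has no doctype at all, so the result is a hint about the author's intent and nothing more.

```python
from html.parser import HTMLParser


class DoctypeSniffer(HTMLParser):
    """Hypothetical helper: remembers the first DOCTYPE declaration seen."""

    def __init__(self):
        super().__init__()
        self.doctype = None

    def handle_decl(self, decl):
        # decl is the text between "<!" and ">", e.g. "DOCTYPE html".
        if self.doctype is None and decl.lower().startswith("doctype"):
            self.doctype = decl


def declared_doctype(markup):
    """Return the raw doctype text, or None if the document has none."""
    sniffer = DoctypeSniffer()
    sniffer.feed(markup)
    sniffer.close()
    return sniffer.doctype
```

An HTML 4.01 document announces "4.01" in its public identifier, an HTML 5 document only says `DOCTYPE html`, and a page with no doctype returns `None`, which says nothing about what the author had in mind.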
Strick - 3 September 2008 at 13:22:45 UTC

"How do you know if an invalid document has been written with HTML 4.01, HTML 5 or nothing specific in mind? It's almost impossible to know that except if you are the author yourself (or the tool which is producing the content.)"
All they would have to do is add some sort of attribute to an existing tag to say which spec you are using. (Kinda like what MS is doing with IE 8: http://support.microsoft.com/kb/956197 )
I'm all for forcing the new standard. I'm tired of trying to maintain code that uses tags to design.