W3C

Fixing the Web… Together!

Molly Holzschlag recently posted an article about stopping development of HTML 5 and XHTML 2.0 until implementations are consistent for HTML 4.01 and the other existing specifications. It is surprising, because one of the main goals of HTML 5 is exactly this: a “CALL for consistent implementation of these most basic specifications in all current browsers and devices to this point”.

Let’s fix bugs

Then many developers chime in in the comments, saying “yeah, we should do that, stop everything.” Me too, I just want to fix bugs. I am not an implementer of a Web browser, but I have reported bugs to implementers. In my professional activity, I have also developed quite a lot of commercial and academic Web sites before working at W3C.

But how do we fix bugs?

In fact, if we look carefully at browsers, the number of bugs related to HTML is not that big. There is more trouble on the DOM HTML and CSS fronts. If you take a valid HTML 4.01 page with CSS 2.0, which is simple enough, it works most of the time. When you start to play with the DOM (Ajax, JavaScript), things become a little more rocky.

To fix bugs, an implementer looks at the specification (HTML 4.01, DOM Level 2 HTML, CSS 2.0, etc.) and has to interpret what is written in it. Sometimes specifications have ambiguities; sometimes an implementer goofed and made a mistake.

When the specification is not clear, what do we do? We publish errata, or, if the issue is too big, we write a clearer specification for the implementer: HTML 5.

Trickier is the case of a bug in an implementation which becomes widely used. The bug somehow became a feature (remember CSS rules for selecting browsers?). It means the vendor has to take the difficult decision of breaking Web sites by fixing the bug.

Here I have talked only of valid content.

The Valid Web?

What do we do with 95% of the Web? I am talking about the invalid content. First, HTML 4.01 does not define normatively what to do with invalid content, nor how to parse it. Common browsers have never been validating SGML parsers; they all implemented a parsing engine which had never been specified in a document, until… HTML 5. So to have interoperability, parsing HTML has to be defined; that is precisely a purpose of this specification.

Then what do we do when we have parsed an invalid document?

  • Do we display the document at all?
  • Do we send a big red warning sign for the user saying the Web site is crap?
  • Do we recover silently? But then how do we have interoperability if the recovered content is unpredictable?
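Every mainstream browser picked the third option, silent recovery. To see how tolerant such a parser is, here is a minimal sketch using Python's standard html.parser purely as an illustration (not any browser's actual engine): it swallows unclosed tags, misnesting, and unknown elements without raising a single error.

```python
from html.parser import HTMLParser

class TagCollector(HTMLParser):
    """Collects start tags, silently recovering from any invalid markup."""
    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

# Unclosed tags, bogus nesting, an unknown element, a stray table cell:
# the parser never complains, it just keeps going.
soup = "<p>Hello <b>world<blink></p><td>stray cell"
parser = TagCollector()
parser.feed(soup)
print(parser.tags)  # → ['p', 'b', 'blink', 'td']
```

Each parser recovers in its own undocumented way, which is exactly the interoperability problem: two browsers can build two different documents from the same invalid bytes.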

How many developers serve their XHTML with application/xhtml+xml? Not many Web developers do. What do we do about invalid CSS? If you are reading this and you are the Web developer of a big company’s Web site, please follow these steps:

  • Make your content XHTML 1.0 Strict or XHTML 1.1
  • Serve your content as application/xhtml+xml

Then tell me how long it took before you got a phone call or an email.
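The phone call comes quickly because serving application/xhtml+xml switches the browser from a forgiving tag-soup parser to a strict XML parser, and the XML specification makes any well-formedness violation a fatal error. A minimal sketch of that rule, using Python's standard xml.etree.ElementTree as a stand-in for a browser's XML processor:

```python
import xml.etree.ElementTree as ET

# The same forgiving markup a text/html parser would happily recover from...
tag_soup = "<p>Hello <b>world</p>"

try:
    ET.fromstring(tag_soup)
except ET.ParseError as err:
    # An XML processor must treat a well-formedness violation as fatal:
    # instead of the page, visitors see a parse-error screen.
    print("fatal:", err)

# Only well-formed content is displayed at all.
page = ET.fromstring("<p>Hello <b>world</b></p>")
print(page.tag)  # → p
```

One missing end tag anywhere in the page, including in a user comment or an ad, is enough to stop rendering entirely, which is why quality control has to happen before publication.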

Fixing bugs in browsers becomes a lot easier when the content is valid and correctly written. How do we fix all these authoring tools? These HTML-generating libraries in Python, Perl, .NET, etc.? How do we fix authors who are writing bad HTML code? How do we fix this 95% of the Web?

HTML 5 is the start of an answer. It is not the ultimate answer, but it helps a lot to achieve exactly what you are calling for: getting interoperability.

Maybe you wanted to say HTML 5 without new features? That is the topic of another post :)

25 thoughts on “Fixing the Web… Together!”

  1. If we want a valid web, there needs to be an incentive for developers to make valid code. Here’s a solution: Browsers should only allow new features to work on valid pages. For example, not valid, no CSS3. Developers will want to use CSS3 and will produce valid code in order to do so. If they make a mistake, the page will still work, just without the new styling. It’s a form of graceful degradation. This method has already worked to make the web more valid than it was a decade ago.

  2. Liam, this technique would only work on aware developers. HTML is also for people who know nothing about the technique, and of course nothing about CSS 2.1, HTML 4, XHTML 1.1 or any version…

    I think the first step is to tell Web Developers and everyone called a “professional” that they made crappy pages (and they still do).

    Unfortunately I have no magic response…

  3. If you read me and you are the Web developer of a big company’s Web site, please follow these steps:

    • Make your content XHTML 1.0 Strict or XHTML 1.1
    • Serve your content as application/xhtml+xml

    Then tell me how long it took before you got a phone call or an email.

    What are these phone calls/e-mails going to point out?

    Reference is constantly being made to a “broken Web” and how HTML 5
    is going to help fix that.

    What is actually broken? Does content that is not standards compliant
    break the Web?

    A significant argument can be presented that the only things that are
    “broken” are the tools that have been used to produce Web content and
    the level/amount of education for those individuals publishing
    content.

    Developing a new standard will not solve these issues. What it will
    do is increase the complexity of the education process.

    The two greatest offenders of generating “broken” content, Microsoft
    Word and FrontPage, are history.

    The market success of FrontPage produced a plethora of other sloppy
    development applications.

    Maybe the objective should be to get these development tools so the
    tools will easily generate semantic content, do it simple and stupid
    so that your grandmother can publish content if she so desires.

    The Web is and should be open … open to anyone to publish content.
    To think that everyone should be a top-tier developer is ludicrous.

    Holzschlag hit the nail on the head … let things catch up, and stop
    thinking that somehow a new standard will magically repair what has
    been done.

  4. As developers, what would be the best course of action for us to evangelize HTML 5 and interoperability to browser vendors?

  5. Copying the comment left on Molly’s site

    About the server: yes, there was a power outage on some of the machines. I said I would take care of it this morning, as soon as I arrived at the office. Done!

    About the phone calls ;)

    • Users of the Web site will get a dialog from Win IE saying “save this page on your computer”. It means that 75% of users will not be able to see the content.
    • Then let’s say that this big site is used by the 25% of users with browsers which can handle application/xhtml+xml; it means the site has to be perfect before displaying the page. Any kind of well-formedness error will display an error message in the browser. (XML spec rule)

    Bear with me, I’m not saying XHTML is bad; I use it on a personal Web site served with the right MIME type: application/xhtml+xml. It requires the maintainer of the Web site to introduce quality control early in the publishing process. If the pages are not well-formed, the publication can NOT happen. So it minimizes users’ troubles.

    Let’s say that we had draconian rules for HTML 4.01 (not defined in the spec), and that the browser should display nothing at all, or a big red flag, for all invalid content. Every webmaster and content developer would be harassed with phone calls and emails.

    It is the social part of the Web. There are user interactions.

    About history in terms of Web development: saying that Microsoft FrontPage is history is not enough. What is the plan to fix the 95% of Web content (which browsers will have to read in an interoperable way) that Web sites use now? If there is a solution, I’ll be happy.

    As for “let things catch up”: read HTML 4.01 from top to bottom, and you will see that it is difficult to implement: ambiguities, unspecified behaviors, etc. HTML 5 contains a lot of what Molly is calling for, fixing HTML 4.01 to make it implementable.

  6. Dubost– I responded to your appreciated comments on Holzschlag’s Blog without checking to see if your servers were back in order. My apologies.

    The copy from Holzschlag’s Blog:

    dubost–

    Thank you for your reply.

    The XHTML DTD served with a correct MIME type is not an issue in any modern user agent if, as you have stated, content is coded correctly.

    Maybe I just simply look at things with the mind of an eleven-year-old kid. To think a change in standards is going to fix broken Web content is similar to trying to fix broken automobiles by rebuilding roadways and changing the engineering of how those roads are built.

    Designers/developers have gotten so wrapped up in the “we” concept and what is good for “us” that focus has been lost on what the Internet is about and who it serves.

    To the average Web visitor, what is being referred to as “broken” content is not broken to them. “Elk don’t know how many feet a horse has.”

    HTML 5.0 appears to be stepping backwards, e.g. co-mingling design and content. Standards are not broken. What is broken is how people code content and the tools used to create content.

    I personally have a hard time grasping the concept that broken content is all that important when factoring in that anyone should be able to publish content. Browsers seem to handle “broken” content with greater ease than interpreting standards-compliant content.

    What the hell do I know .. not a damn thing. Dubost, Holzschlag, et al. .. you people are the experts. Just please don’t screw it up .. keep it simple and stupid. Above all else, “if it ain’t broke, don’t be trying to fix it.”

    Addendum: If interoperability is the issue for HTML 5, well, that has some understandable logic. If the intent is to introduce a standard to ease rendering of content created by all the FUBAR content development packages, that may have some merit even though I think it may be backwards and with more being read into the problem of content that has been poorly coded than what actually exists.

    Approach judiciously and with caution, please. It has taken over a decade to begin to reap the benefits of standards compliance as they currently stand.

  7. Excellent idea; about what you wrote:

    “Then what do we do when we have parsed an invalid document.

    * Do we display the document at all?
    * Do we send a big red warning sign for the user saying the Web site is crap?
    * Do we recover silently? but then how do we have interoperability if the content is unpredictable?"
    

    Well, we can arrive at the standard by forcing developers to write correct code. :) The second option is my choice!

    Sorry for my English!

  8. I think it’s not only individual developers we need to evangelize to. While more compliant than in the past, browsers are still not fully compliant. Individual developers still have to resort to workarounds and hacks to make their sites work on all browsers and OSes.

    Also, with the introduction of online CMS like WordPress, PHPNuke, Joomla, Mambo, DNN, etc. along came new players to the game, in the form of HTML editor plugins like Ektron, Telerik, FreeTextBox, FCKEdit, etc.

    End users will most likely be using those instead of FrontPage/DreamWeaver/UltraDev, and these need to be fully HTML compliant.

    There are a lot of people that need to be convinced to make all the tools fully compliant. Until then, developers are “between a rock and a hard place”, trying to fix things they haven’t broken to accommodate the standards (and browsers’ quirks).

  9. With a PHP script I serve my website as XHTML 1.1 with application/xhtml+xml if the user has a standards-compliant browser (e.g. Mozilla Firefox); if they do not (IE6 and others), it is simply output as text/html, but still XHTML 1.1. That way IE users can view the page perfectly too.

    So far it works fine, and it feels more like mature coding (like PHP), with the error reporting and the need to write strictly for it all to work. I would like it if things go in this direction.

    I heard you need to write JavaScript in different ways though, so there may be some compatibility issues until the Web goes all the way to application/xhtml+xml.
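The content-negotiation approach described in the comment above, serving application/xhtml+xml only to browsers that advertise support for it, can be sketched in a few lines. This is a hypothetical Python version keying on the HTTP Accept header, the same signal a PHP script would inspect; a production version should also honor the q-values the header can carry.

```python
def choose_mime_type(accept_header: str) -> str:
    """Pick the MIME type for an XHTML page based on the HTTP Accept header.

    A browser that advertises application/xhtml+xml (e.g. Firefox) gets the
    real XML MIME type; anything else (e.g. IE6) falls back to text/html,
    even though the bytes sent are the same XHTML 1.1 document.
    Naive substring check: a sketch, not full RFC-style negotiation.
    """
    if "application/xhtml+xml" in accept_header:
        return "application/xhtml+xml"
    return "text/html"

# A Firefox-style Accept header vs. IE6's, which never mentions xhtml+xml.
print(choose_mime_type("text/html,application/xhtml+xml,*/*;q=0.8"))  # → application/xhtml+xml
print(choose_mime_type("image/gif, image/jpeg, */*"))                 # → text/html
```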

  10. Responding to Neovov

    Let’s say that when CSS came out, all browsers would ignore all CSS declarations on invalid pages. An unaware developer applies color: blue; to a heading and nothing happens. He posts to a forum and gets the response, “Dude, you need to validate your page”. He fixes his page, and it works. From there on, he is careful to create valid pages. If he makes a mistake on some page, the page still appears, but without styling. Users are not locked out of the content because of a small oversight, but there is still an incentive to create valid pages. This worked to some extent. I recall early CSS tutorials which rightly said that invalid pages do not play nice with CSS. Many developers started paying attention to validity to make their CSS work better.

    Today, if the browser makers took this above approach with any new CSS features, developers who want to use them would create valid code while the rest of the web would continue to function. HTML authoring tools that want to allow their users to use these new features would start to generate only valid code. This method is all carrot, no stick.

  11. Fortunately, it looks like almost all browsers support XHTML 1.1, except IE, and that is another way IE is holding up the Web.

  12. Disclaimer: I have no idea what I’m talking about…
    If the search engines, err, I mean, if Google had a say in the new spec’s semantics, there would be a huge incentive for web developers to be aware of their markup’s validity. Imagine if their webmaster guidelines stated that valid XHTML “helps” Google index and categorize your content.

  13. Where we need to begin is in technical schools. At mine, I did not learn HTML. I learned Dreamweaver, which does the HTML for me. This is why a lot of sites are crap: We are taught to be lazy. I can write in real HTML, but I had to learn that on my own.

  14. I can’t agree with Lynx Kraaikamp more.

    Let me describe today at work:
    Arrived at work
    Opened open source, text/source code editor.
    Wrote lots of code
    Got the results I wanted
    Wrote beautiful code
    Finished the task
    Went Home

    My friend:
    Arrived at work
    Opened Dreamweaver
    Clicked a lot of things
    Taunted everything
    Clicked a lot of other things
    Asked for help
    Clicked a lot of stuff
    Told him to visit w3c.org
    Read key paragraphs from the css spec
    Still working in Dreamweaver
    Finished before me
    The tool wrote horrible code
    Went Home

    Boss is happy with both of us.

    Until now, the point of HTML (as I get it) is that you can write documents whether you’re a coder or a non-coder.

    I would not like an error message displayed in the browser content at all. A browser HTML console/built-in validator would be great.

    Imagine that one strong side of HTML that attracted so many people was the browser’s high tolerance for errors. As a novice coder it is very disappointing to see “Error in line 6”, which specifically means “this website’s creator does not know how to write HTML”. This is mood-killing. A console would be great.

    I think it’s a dumb idea. How is this comment constructive? Well… people are not developers. As an experiment, I printed the latest HTML 4.01 and CSS 2.1 specifications and showed them to anybody who was interested to know what those huge books were.
    When I told them “those are the HTML and CSS specs” people said – you’re crazy, right?
    Imagine that I don’t know a single person that has thoroughly read the specifications like me.
    All of them say… Hell… tools make your job easier, and they tell me that I’m mistaken if I say that manual coding is more efficient. How do you convince someone like that to stop using WYSIWYG tools?
    I didn’t.
    Also imagine that the spec should be made in such a way that your non-developer mom can understand it, and implementers can understand it too. Both the “mom” and “implementer” parties must understand the spec in the same way.

    I’m still wondering if my mom would make the most mistakes, or implementers would goof more often :)

  15. This is my first comment here so hopefully it renders with proper paragraph breaks.

    I really like the suggestion that Liam Morland posted on 2007-06-15 at 12:06 in regards to only processing new features (e.g. CSS3 instructions) if a page actually validates correctly to the relevant specifications. I also liked Lynx Kraaikamp’s observation about schools not teaching HTML, but rather teaching users tools like Dreamweaver. This has irritated me for years. If someone is going to be a “professional” web developer, they need to understand the fundamental theories of the underlying specifications and guidelines.

    I was really excited when I heard HTML5 was under development and that it was going to fix some of the bugs and make the rendering rules for browsers much more explicit. I was not pleased, however, to see proposals to bring back elements like “FONT” just to support broken WYSIWYG webpage editors. The best thing that ever happened in the evolution of web development was the separation of webpage layout and structure. As others have said, it has taken ten years to make as much progress as has been made. The last thing that should be done is to turn back the clock just to support broken web development tools.

    Even if HTML5 were finalized tomorrow, it will take another ten years to be the generally used development “standard”. The idea of delaying HTML5 to fix the bugs does not make sense to me. Fix the bugs as part of HTML5 so that it can become the adopted “standard” as quickly as possible. Oh, in regards to the questions about fixing browser bugs at the expense of breaking websites: I fully believe the bugs should be fixed regardless of what it does to websites, especially if the websites get broken because their code is not valid for the declared DTD. While I may feel for amateurs who simply want a personal website, I have absolutely no pity for “professionals” and companies whose websites get broken because the code is not valid. Someone who is truly a professional web developer should be creating valid code as a normal routine, and companies should be hiring people who will build their sites correctly from the get-go.

  16. The problem with browsers displaying an error message when a page is invalid is that it doesn’t make business sense from their point of view. Suppose “canonical example grandma” visits her favorite cooking website with IE6 just fine, but when her super-geek grandson installs the latest version of Firefox for her, she gets a big red warning every time she logs on. She’ll go back to IE6 because it “works”, even though the site is technically broken.

  17. I learned the same way as Lynx Kraaikamp did.
    First Dreamweaver, and now real HTML code.

    Now it’s really horrible for me to see colleagues make web pages with Dreamweaver. They stopped 3 ago.

    I’m ready to learn new things which improve the web. It is the evolution.

  18. I hear a lot about the 95% of the Web that is “broken”, i.e., coded using invalid HTML. It usually goes along with the idea that authors of that 95% portion will be the problem with adopting new standards. It’s as if these “educated” developers are to blame for the error-prone Internet we’ve got today, with educating them stated as the biggest challenge we have to overcome.

    Let’s be clear: the reason 95% of the Web contains markup errors and other bugs is because it was coded using an error-prone and buggy browser as the testing device, or created using bad WYSIWYG authoring tools. Sure, HTML is designed to be lenient, but full-fledged bugs are a different story.

    Fix the tools, and the Web will clean up on its own.

  19. As long as I can remember, in college I was never instructed in any specific programming tool. In programming languages, yes. I consider the teaching of specific IDEs or programming tools (wizards included) a waste of time. I learnt how to program in C, C++, Assembler, Pascal, etc., and above all very good programming practices: OOP, E-R, data structures, etc.
    The main point here is about having browsers and tools to help coders; life is not just HTML, we have a lot of stuff out there: XML, XSLT, JavaScript, SOA, C++, Java, C#, etc. Having better help files for checking specifications could be a great step toward good coding. As far as I know, Microsoft help files are the best, something I miss a lot with other tools and standards.

    Sorry to hear about lazy programmers and related problems.
    Strictly speaking about HTML: I use tools that give me a first preview, and then I open the page in different browsers, just to be sure. I do not use open source only, and I don’t see the point of blaming the tool rather than the programmer.

  20. > I have reported bugs to implementers

    The current number 1 problem with bug reporting is Microsoft, period. Over the last 6 years, all of the browser manufacturers have dealt with or fixed a very large number of their own [HTML or CSS or DOM] bugs. All of the browser manufacturers, except Microsoft. No one can seriously argue with this.

    > Sometimes specifications have ambiguities,

    Then, W3C people should publish clarifications and errata, the sooner the better. Simple as that.

    > sometimes an implementer goofed and made a mistake.

    Then W3C people should notify and warn the implementer about such mistakes when they know about them. Web standards advocacy groups and web standards gurus may too. No one wins anything on the web if implementations differ a lot, or when it involves an important aspect of the spec. E.g. overflow: visible; the CSS1 box model; margin and width auto; unitless values becoming pixels; etc.: for the most part, they now all depend on doctype switching (that’s true for IE6+), so faulty implementations can linger on and on, and the invalid web never gets fixed and never needs to be fixed according to the web author. A very miserable situation.

    > (…) bug in the implementation which becomes widely used. (…) It means the vendor has to take the difficult decision by fixing the bug to break Web sites.

    I am for this. You make an error: you fix your own error. There has to be responsibility, accountability somewhere. It was up to the implementer to verify, test, make sure, confirm, etc. to begin with.

    > What do we do with 95% of the Web?

    What I don’t understand here is how come the W3C did not ask its own members and structures that question 5 years ago. Why did it take so long for the W3C to realize that this was already a huge problem several years ago? In one W3C webpage edited in April 2002, it is said that 99% of the Web is invalid: “Most of the Web sites on the Web are not valid. We may assume that this is the case for 99% of the Web pages”, taken from
    http://www.w3.org/QA/2002/04/Web-Quality

    > what do we do when we have parsed an invalid document.

    Recovering silently is what caused the web to be 95% invalid to begin with. Bad, invalid WYSIWYG HTML editors are what caused the problem to begin with.

    My answer is: do not display the faulty invalid content (even if it has a benign error), and display a Webpage Quality indicator icon (like a smiley or green check for a valid page, a frown or red ‘X’ when invalid) in the statusbar of the browser (or somewhere else), which when clicked would report more info to the user about the markup errors and give him more options, one of which would be to validate the whole page with the World Wide Web Consortium. In that same error report, you could have a list of quality web authoring tools aimed at fixing validation errors.

    What the W3C has not done, and should have done a long time ago, was to create a free and offline markup validation tool which could be used to learn HTML 4 and to fix HTML markup errors at the same time. A tool similar to “A Real Validator” from WDG. A tool which would have more options and capabilities for advanced users (like a built-in HTML Tidy). A tool which would explain, with examples and in simple words, what the W3C validator does not explain in simple terms.

    The W3C should have been evaluating and assessing the quality of web authoring tools, as far as the quality of the markup code they generate out of the box, and then publishing its results on the W3C site.

    ATAG 1.0 is nice, but it is just guidelines: evaluate those authoring tools, score them for their intrinsic web quality, and publish the results. If the W3C had done that, the history of FrontPage (at least 7 versions) would not have lasted 7 years. FrontPage is today still being used by lots of “grandmothers”. Lots of today’s unmaintained and invalid websites were developed with FrontPage.

    With 95% of billions of webpages containing invalid, bad HTML code, the solution cannot please everyone, and it cannot be a nice, soft, gentle tech-evangelization, education, tutorial-oriented approach. None of this achieved huge or significant results in the last 5 years. You have to resort to drastic measures, some sort of a revolution here.

    Invalid HTML code should not, and must not, display or work anymore, regardless.

    Gérard Talbot

  22. Fixing the Web means more or less fixing “(Internet Explorer) browser bugs” ;o)

Comments are closed.