W3C Team blog: life without MIME type sniffing?

In a recent item on IE8 Security, Eric Lawrence, Security Program Manager for Internet Explorer, introduced a work-around to the security risks associated with content-type sniffing: an authoritative=true parameter on the Content-Type header in HTTP. This re-started discussion of the content-type sniffing rules and the Support Existing Content design principle of HTML 5. In response to a challenge asking for evidence that supporting existing content requires sniffing, Adam made a suggestion that I'd like to pass along:

I encourage you to build a copy of Firefox without content sniffing and try surfing the web. I tried this for a while, and I remember there being a lot of broken sites ...

That reminded me of an idea I heard in TAG discussions of MIME types and error recovery: a browser mode for "This is my content, show me problems rather than fixing them for me silently."

Though Adam offered a patch, building Firefox is not something I have mastered yet, so I'm interested to learn about run-time configuration options in IE (notes Julian) and Opera (notes Michael). Eric Lawrence's reply points out:

Please do keep in mind, however, that most folks (even the ultra-web engaged on these lists) see but a small fraction of the web, especially considering private address space/intranets, etc.

A report from one developer suggests there's light at the end of the tunnel, at least for sniffing associated with feeds:

I did, partly as an experiment, stop sniffing text/plain in the latest release of SimplePie (which, inevitably, isn't the nicest of things to do, seeming there are tens of thousands of users). Next to nothing broke. I know for a fact this couldn't have been done a year or two ago: things have certainly moved on in terms of the MIME types feeds are served with ...

If you get a chance to try life without MIME type sniffing, please let us know how it goes.

Steve Faulkner et al.: Circumventing Hegemony in the HTML WG

Raising Issues

To raise an issue or proposal regarding the HTML5 specification, you do not have to be a member of the W3C HTML Working Group (HTML WG). Anyone can simply enter a bug into the HTML Bugzilla. If the proposal or issue is rejected by those who control the specification and the issue is related to accessibility, you can refer the issue to the W3C Protocols and Formats Working Group (PF WG). Probably the best method is to subscribe to and send an email to the WAI-XTECH mailing list.

I would suggest that you take this course of action if you consider the issue raised to be substantive and believe it has not been given appropriate consideration due to a lack of accessibility expertise or a lack of understanding of the HTML WG's obligation to ensure that deliverables satisfy accessibility requirements.

The PF WG is responsible for ensuring accessibility considerations are taken into account in all specifications produced within the W3C. If the PF WG considers the matter substantive, they may formally request that the HTML WG, not just the editor, reconsider the matter. This may lead to the HTML WG as a whole having a vote on the matter; that is up to the HTML WG Chairs to decide.

By following this course of action you are at least guaranteed that the issue will be discussed by a group within the W3C that has expertise in relation to accessibility and the web. It will also be considered using the W3C consensus process, which is not a process currently used in practice within the HTML WG.

W3C Member Organisations

You can also bring matters to the attention of W3C member organisations, so when it comes time to review and vote on the HTML5 specification before publication, those organisations can make an informed decision about whether the specification takes the accessibility requirements of their constituents into account.

W3C members include organisations such as the Royal National Institute for the Blind and Vision Australia, which represent the interests of people with disabilities, so you can voice your concerns with them directly.

W3C Team blog: The How-To for html 5 parsing

You have read a lot about the html 5 specification. You have heard that there are hidden dragons and acid rain. But what about seeing for yourself, in practice, how html 5 parsing works? There are already some tools for playing with html 5.

DOM in actual browsers

DOM (Document Object Model) is the representation that browsers use in memory to manipulate Web content. Browsers have bugs, and much of the content on the Web is not conforming. The result is very different DOM representations across browsers. If you are interested in seeing what a document looks like in different browsers, you can use the Live DOM Viewer. Open this link in each browser you have and paste code into the window.

This helps you to see how the Web content is understood today by different tools.

DOM after html 5 parsing

Now you might be interested to see how a document will be represented by a tool implementing the html 5 parsing rules. An important note: html 5 is a specification in development. Things might change. The following tools might be incomplete and contain bugs as well, but they will give you an idea of the DOM. This is very practical when you are developing another language that is not html 5 but might be sent as text/html (by mistake or by practical choice).

There are at least two online services:

Henri Sivonen developed a standalone application that you can use on your desktop. Here are the instructions to get it running. It worked fine on my Macintosh.

  1. Check out the source: svn co http://svn.versiondude.net/whattf/htmlparser/trunk/ htmlparser
  2. Download and untar GWT 1.5 RC1: http://code.google.com/webtoolkit/versions.html
  3. On Linux, install libstdc++5 and a JDK (Ubuntu's OpenJDK-based package worked for me).
  4. Edit the paths in HtmlParser-shell (Mac) or HtmlParser-linux (Linux) to point to the location of GWT.
  5. Run HtmlParser-shell (Mac) or HtmlParser-linux (Linux)

Henri gave a list of limitations and bugs.

Using html 5 parsing in your own code

There are for now three implementations of the html 5 parsing algorithm.

There is an attempt at an implementation in C# for .NET 2.0, but no code has been released yet.

If you know of other tools implementing it, leave a comment.

W3C Team blog: Improving Interoperability by Short Release Cycle

When software ships, it has bugs. There are many reasons for these bugs: poor in-house development, careless testing, unclear specifications, and many other things. We have to live with these bugs in software.

A bug deployed in software for a long time becomes a feature.

This is especially true in a distributed environment where pieces are loosely joined: the Web. Software is released with its inherent bugs. Content and framework developers are hit by a bug. They modify their own software to accommodate the bug or take advantage of it. No new version of the buggy software is released for a long time. When it is finally time to release a new version, the buggy software has to keep the bug as a feature so as not to break anything on the Web. Eventually, one day, the bug makes its way into a specification like html 5.

It is difficult to change things because they are all intertwined, but in a very loose way, which is the Web's strength. You can try to fix the software, knowing that it will break things in many places. You then have to be ready to lose customers if someone else has implemented the bug. Users are not aware of the bug, and they don't really care about it. Fixing also means, in this case, educating people about the issue, and teaching content developers how to fix their content. Content developers will be the hardest to convince. If they fix their content, knowing that it will break things in other software, they will lose customers. So they are not likely to do it.

To keep bugs from becoming features, software has to be released on a short cycle, so that people can't take advantage of bugs. It also means that bugs don't survive many releases.

Can we improve the situation for bugs already deployed?

The solution could be a simultaneous release of software and a campaign to educate people. This is challenging. Very challenging. It means agreement between companies at release time and a united front toward unsatisfied customers. I just wonder if it would be possible as an experiment for one or two bugs. For example, across the HTML 5 specification, browsers and Web sites, would it be possible to fix content-type sniffing on text/plain?

Sam Ruby: authoritative=true

Eric Lawrence: we’ve provided web-applications with the ability to opt-out of MIME-sniffing. Sending the new authoritative=true attribute on the Content-Type HTTP response header prevents Internet Explorer from MIME-sniffing a response away from the declared content-type

While I’m not a fan of content-sniffing, one of my few pet peeves with HTML5 is that it endeavors to institutionalize the practice with no provisions for content providers to opt out.  As the lesser of the available evils, I hope Microsoft’s proposal is quickly adopted by other browsers.

WHATWG blog / Lachlan Hunt: Interview about HTML5 on Boagworld

Boagworld is a web design and development podcast based in the UK. In today's episode, they interview me about HTML5. We discuss the current state of HTML5, some of the new features that are already implemented or are being implemented, and what we can expect in the future.

IEBlog: IE8 Security Part V: Comprehensive Protection

Hi! I’m Eric Lawrence, Security Program Manager for Internet Explorer. Last Tuesday, Dean wrote about our principles for delivering a trustworthy browser; today, I’m excited to share with you details on the significant investments we’ve made in Security for Internet Explorer 8. As you might guess from the length of this post, we’ve done a lot of security work for this release. As an end-user, simply upgrade to IE8 to benefit from these security improvements. As a domain administrator, you can use Group Policy and the IEAK to set secure defaults for your network. As a web developer, you can build upon some of these new features to help protect your users and web applications.

As we were planning Internet Explorer 8, our security teams looked closely at the common attacks in the wild and the trends that suggest where attackers will be focusing their attention next. While we were building new Security features, we also worked hard to ensure that powerful new features (like Activities and Web Slices) minimize attack surface and don’t provide attackers with new targets. Out of our planning work, we classified threats into three major categories: Web Application Vulnerabilities, Browser & Add-on Vulnerabilities, and Social Engineering Threats. For each class of threat, we developed a set of layered mitigations to provide defense-in-depth protection against exploits.

Web Application Defense

Cross-Site-Scripting Defenses

Over the past few years, cross-site scripting (XSS) attacks have surpassed buffer overflows to become the most common class of software vulnerability. XSS attacks exploit vulnerabilities in web applications in order to steal cookies or other data, deface pages, steal credentials, or launch more exotic attacks.

IE8 helps to mitigate the threat of XSS attacks by blocking the most common form of XSS attack (called “reflection” attacks). The IE8 XSS Filter is a heuristic-based mitigation that sanitizes injected scripts, preventing execution. Learn more about this defense in David’s blog post: IE8 Security Part IV - The XSS Filter.

XSS Filter provides good protection against exploits, but because this feature is only available in IE8, it’s important that web developers provide additional defense-in-depth and work to eliminate XSS vulnerabilities in their sites. Preventing XSS on the server-side is much easier than catching it at the browser; simply never trust user input! Most web platform technologies offer one or more sanitization technologies-- developers using ASP.NET should consider using the Microsoft Anti-Cross Site Scripting Library. To further mitigate the threat of XSS cookie theft, sensitive cookies (especially those used for authentication) should be protected with the HttpOnly attribute.
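As a rough illustration of the HttpOnly advice, here is a minimal sketch of building a Set-Cookie header value on the server. The function name, the cookie name, and the option names are assumptions made for this example; they are not part of any particular framework.

```javascript
// Build a Set-Cookie header value for a sensitive cookie.
// `name`, `value`, and `options.secure` are illustrative, not from the post.
function buildSessionCookie(name, value, options) {
  var parts = [name + "=" + encodeURIComponent(value), "Path=/"];
  if (options && options.secure) {
    parts.push("Secure"); // only send the cookie over HTTPS
  }
  parts.push("HttpOnly"); // keep the cookie out of reach of document.cookie
  return parts.join("; ");
}

console.log(buildSessionCookie("session", "abc 123", { secure: true }));
// session=abc%20123; Path=/; Secure; HttpOnly
```

With HttpOnly set, an injected script can still make requests as the user, but it can no longer read the cookie and send it elsewhere.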

Safer Mashups

While the XSS Filter helps mitigate reflected scripting attacks when navigating between two servers, in the Web 2.0 world, web applications are increasingly built using client-side mashup techniques. Many mashups are built unsafely, relying on SCRIPT SRC techniques that simply merge scripting from a third-party directly into the mashup page, providing the third-party full access to the DOM and non-HttpOnly cookies.

To help developers build more secure mashups, for Internet Explorer 8, we’ve introduced support for the HTML5 cross-document messaging feature that enables IFRAMEs to communicate more securely while maintaining DOM isolation. We’ve also introduced the XDomainRequest object to permit secure network retrieval of “public” data across domains.

While Cross-Document-Messaging and XDomainRequest both help to secure mashups, a critical threat remains. Using either object, the string data retrieved from the third-party frame or server could contain script; if the caller blindly injects the string into its own DOM, a script injection attack will occur. For that reason, we’re happy to announce two new technologies that can be used in concert with these cross-domain communication mechanisms to mitigate script-injection attacks.

Safer Mashups: HTML Sanitization

IE8 exposes a new method on the window object named toStaticHTML. When a string of HTML is passed to this function, any potentially executable script constructs are removed before the string is returned. Internally, this function is based on the same technologies as the server-side Microsoft Anti-Cross Site Scripting Library mentioned previously.

So, for example, you can use toStaticHTML to help ensure that HTML received from a postMessage call cannot execute script, but can take advantage of basic formatting:

// Only accept messages from the expected sender, and sanitize before injecting.
document.attachEvent('onmessage',function(e) { 
  if (e.domain == 'weather.example.com') {
      spnWeather.innerHTML = window.toStaticHTML(e.data);
  }
});

Calling:

window.toStaticHTML("This is some <b>HTML</b> with embedded script following... <script>alert('bang!');</script>!");

will return:

This is some <b>HTML</b> with embedded script following... !
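Since toStaticHTML is only available in IE8, code that has to run elsewhere needs some fallback. One possible approach, sketched here under the assumption that losing formatting is acceptable, is to escape the string entirely when the function is missing. The names escapeHtml and toSafeHtml, and the `host` parameter that stands in for the window object, are hypothetical.

```javascript
// Escape the characters that are significant in HTML text and attributes.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// `host` stands in for the browser's window object (an assumption, so that
// the sketch can also be exercised outside a browser).
function toSafeHtml(html, host) {
  if (host && typeof host.toStaticHTML === "function") {
    return host.toStaticHTML(html); // IE8: keeps formatting, strips script
  }
  return escapeHtml(html); // elsewhere: formatting is lost, but nothing runs
}

console.log(toSafeHtml("<b>Hi</b>", {}));
// &lt;b&gt;Hi&lt;/b&gt;
```

In a page you would call toSafeHtml(e.data, window) before assigning to innerHTML.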

Safer Mashups: JSON Sanitization

JavaScript Object Notation (JSON) is a lightweight string-serialization of a JavaScript object that is often used to pass data between components of a mashup. Unfortunately, many mashups use JSON insecurely, relying on the JavaScript eval method to “revive” JSON strings back into JavaScript objects, potentially executing script functions in the process. Security-conscious developers instead use a JSON-parser to ensure that the JSON object does not contain executable script, but there’s a performance penalty for this.

Internet Explorer 8 implements the ECMAScript 3.1 proposal for native JSON-handling functions (which uses Douglas Crockford’s json2.js API). The JSON.stringify method accepts a script object and returns a JSON string, while the JSON.parse method accepts a string and safely revives it into a JavaScript object. The new native JSON methods are based on the same code used by the script engine itself, and thus have significantly improved performance over non-native implementations. If the resulting object contains strings bound for injection into the DOM, the previously described toStaticHTML function can be used to prevent script injection.
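To make the difference from eval concrete, here is a small sketch. The two sample strings and the safeParse helper are made up for illustration. JSON.parse rejects text that is not pure JSON instead of running it, and markup inside a string value comes back as plain string data.

```javascript
// A response body that is not pure JSON: it smuggles in a function call.
var hostile = '{"City": "Seattle", "Tonight": alert("bang!")}';
// A well-formed body in which the script is only string data.
var inert = '{"City": "Seattle", "Tonight": "<script>alert(1)</script>Dark"}';

// Parse JSON text, returning null instead of throwing on invalid input.
function safeParse(text) {
  try {
    return JSON.parse(text);
  } catch (e) {
    return null; // JSON.parse throws on anything that is not valid JSON
  }
}

console.log(safeParse(hostile));       // null: rejected, never executed
console.log(safeParse(inert).Tonight); // the markup survives as a string
```

The second case is why parsing alone is not enough: the string is harmless as data, but it still has to be sanitized before it is injected into the DOM.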

The following example uses both JSON and HTML sanitization to prevent script injection:

<html>
<head><title>XDR+JSON Test Page</title>
<script>
if (window.XDomainRequest){
      var xdr1 = new XDomainRequest();
      xdr1.onload = function(){
           var objWeather = JSON.parse(xdr1.responseText);
           var oSpan = window.document.getElementById("spnWeather");
           oSpan.innerHTML = window.toStaticHTML("Tonight it will be <b>"
                             + objWeather.Weather.Forecast.Tonight + "</b> in <u>" 
                             + objWeather.Weather.City+ "</u>.");
      };
      xdr1.open("POST", "http://evil.weather.example.com/getweather.aspx");
      xdr1.send("98052");
}
</script></head>
<body><span id="spnWeather"></span></body>
</html>

…even if the weather service returns a malicious response:

HTTP/1.1 200 OK
Content-Type: application/json
XDomainRequestAllowed: 1

{"Weather": {
  "City": "Seattle",
  "Zip": 98052,
  "Forecast": {
    "Today": "Sunny",
    "Tonight": "<script defer>alert('bang!')</script>Dark",
    "Tomorrow": "Sunny"
  }
}}

MIME-Handling Changes

Each type of file delivered from a web server has an associated MIME type (also called a “content-type”) that describes the nature of the content (e.g. image, text, application, etc). For compatibility reasons, Internet Explorer has a MIME-sniffing feature that will attempt to determine the content-type for each downloaded resource. In some cases, Internet Explorer reports a MIME type different than the type specified by the web server. For instance, if Internet Explorer finds HTML content in a file delivered with the HTTP response header Content-Type: text/plain, IE determines that the content should be rendered as HTML. Because of the number of legacy servers on the web (e.g. those that serve all files as text/plain) MIME-sniffing is an important compatibility feature.

Unfortunately, MIME-sniffing also can lead to security problems for servers hosting untrusted content. Consider, for instance, the case of a picture-sharing web service which hosts pictures uploaded by anonymous users. An attacker could upload a specially crafted JPEG file that contained script content, and then send a link to the file to unsuspecting victims. When the victims visited the server, the malicious file would be downloaded, the script would be detected, and it would run in the context of the picture-sharing site. This script could then steal the victim’s cookies, generate a phony page, etc.

To combat this problem, we’ve made a number of changes to Internet Explorer 8’s MIME-type determination code.

MIME-Handling: Restrict Upsniff

First, IE8 prevents “upsniff” of files served with image/* content types into HTML/Script. Even if a file contains script, if the server declares that it is an image, IE will not run the embedded script. This change mitigates the picture-sharing attack vector-- with no code changes on the part of the server. We were able to make this change by default with minimal compatibility impact because servers rarely knowingly send HTML or script with an image/* content type.
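The rule can be modelled in a few lines. This is only a toy model of the behaviour described above, not Internet Explorer's actual algorithm: the sniff function is a deliberately crude stand-in, and all names are hypothetical.

```javascript
// Toy sniffer: report "text/html" if the body looks like markup.
// Real browsers use far more elaborate rules; this is only a stand-in.
function sniff(body) {
  return /<(html|script|body)\b/i.test(body) ? "text/html" : null;
}

// Decide how to treat a response, given the declared type and the body.
// restrictUpsniff = true models the IE8 behaviour described above.
function effectiveType(declaredType, body, restrictUpsniff) {
  var sniffed = sniff(body);
  if (sniffed === null) {
    return declaredType; // nothing suspicious: trust the server
  }
  if (restrictUpsniff && /^image\//i.test(declaredType)) {
    return declaredType; // never promote image/* to HTML or script
  }
  return sniffed;
}

var upload = "\xFF\xD8\xFF<script>alert('bang!')</script>";
console.log(effectiveType("image/jpeg", upload, false)); // text/html
console.log(effectiveType("image/jpeg", upload, true));  // image/jpeg
```

Note that in this model a text/plain response containing markup is still sniffed to HTML, which is the case the opt-out in the next section addresses.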

MIME-Handling: Sniffing Opt-Out

Next, we’ve provided web-applications with the ability to opt-out of MIME-sniffing. Sending the new authoritative=true attribute on the Content-Type HTTP response header prevents Internet Explorer from MIME-sniffing a response away from the declared content-type.

For example, consider the following HTTP-response:

HTTP/1.1 200 OK
Content-Length: 108
Date: Thu, 26 Jun 2008 22:06:28 GMT
Content-Type: text/plain; authoritative=true;

<html>
<body bgcolor="#AA0000">
This page renders as HTML source code (text) in IE8.
</body>
</html>

In IE7, the text is interpreted as HTML:

[Screenshot: IE7 text interpreted as HTML]

In IE8, the page is rendered in plaintext:

[Screenshot: IE8 text rendered as plain text]

Sites hosting untrusted content can use the authoritative attribute to ensure that text/plain files are not sniffed to anything else.
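For anyone writing tooling around this parameter (a proxy, a test harness, a log checker), the check amounts to parsing the Content-Type value. Here is a simplified sketch; the function names are hypothetical, and quoted parameter values containing semicolons are not handled.

```javascript
// Split a Content-Type value into its media type and parameters.
// Simplified: quoted parameter values containing ";" are not handled.
function parseContentType(headerValue) {
  var pieces = headerValue.split(";");
  var result = { type: pieces[0].trim().toLowerCase(), params: {} };
  for (var i = 1; i < pieces.length; i++) {
    var eq = pieces[i].indexOf("=");
    if (eq === -1) {
      continue; // skip empty segments such as a trailing ";"
    }
    var key = pieces[i].slice(0, eq).trim().toLowerCase();
    result.params[key] = pieces[i].slice(eq + 1).trim();
  }
  return result;
}

// True when the server has asked for its declared type to be honoured.
function isAuthoritative(headerValue) {
  return parseContentType(headerValue).params.authoritative === "true";
}

console.log(isAuthoritative("text/plain; authoritative=true;")); // true
console.log(isAuthoritative("text/plain; charset=utf-8"));       // false
```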

MIME-Handling: Force Save

Lastly, for web applications that need to serve untrusted HTML files, we have introduced a mechanism to help prevent the untrusted content from compromising your site’s security. When the new X-Download-Options header is present with the value noopen, the user is prevented from opening a file download directly; instead, they must first save the file locally. When the locally saved file is later opened, it no longer executes in the security context of your site, helping to prevent script injection.

HTTP/1.1 200 OK
Content-Length: 238
Content-Type: text/html
X-Download-Options: noopen
Content-Disposition: attachment; filename=untrustedfile.html

[Screenshot: Save File Dialog]
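On the server side, producing these headers might look like the following sketch. The function name is hypothetical, and the file name is written in quoted form here, which is an equally valid way to write the Content-Disposition parameter.

```javascript
// Headers for serving an untrusted file as a save-only download.
function untrustedDownloadHeaders(fileName, contentType) {
  // Drop characters that could break out of the quoted filename value.
  var safeName = fileName.replace(/["\\\r\n]/g, "");
  return {
    "Content-Type": contentType,
    "Content-Disposition": 'attachment; filename="' + safeName + '"',
    "X-Download-Options": "noopen"
  };
}

console.log(untrustedDownloadHeaders("untrustedfile.html", "text/html"));
```

Each key/value pair in the returned object would be written out as one response header by whatever server framework is in use.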

Taken together, these new Web Application Defenses enable the construction of much more secure web applications.

Local Browser Defenses

While Web Application attacks are becoming more common, attackers are always interested in compromising ordinary users’ local computers. In order to allow the browser to effectively enforce security policy to protect web applications, personal information, and local resources, attacks against the browser must be prevented. Internet Explorer 7 made major investments in this space, including Protected Mode, ActiveX Opt-in, and Zone Lockdowns. In response to the hardening of the browser itself, attackers are increasingly focusing on compromising vulnerable browser add-ons.

For Internet Explorer 8, we’ve made a number of investments to improve add-on security, reduce attack surface, and improve developer and user experience.

Add-on Security

We kicked off this security blog series with discussion of DEP/NX Memory Protection, enabled by default for IE8 when running on Windows Server 2008, Windows Vista SP1 and Windows XP SP3. DEP/NX helps to foil attacks by preventing code from running in memory that is marked non-executable. DEP/NX, combined with other technologies like Address Space Layout Randomization (ASLR), make it harder for attackers to exploit certain types of memory-related vulnerabilities like buffer overruns. Best of all, the protection applies to both Internet Explorer and the add-ons it loads. You can read more about this defense in the original blog post: IE8 Security Part I: DEP/NX Memory Protection.

In a follow-up post, Matt Crowley described the ActiveX improvements in IE8 and summarized the existing ActiveX-related security features carried over from earlier browser versions. The key improvement we made for IE8 is “Per-Site ActiveX,” a defense mechanism to help prevent malicious repurposing of controls. IE8 also supports non-Administrator installation of ActiveX controls, enabling domain administrators to configure most users without administrative permissions. You can get the full details about these improvements by reading: IE8 Security Part II: ActiveX Improvements. If you develop ActiveX controls, you can help protect users by following the Best Practices for ActiveX controls.

Protected Mode

Introduced in IE7 on Windows Vista, Protected Mode helps reduce the severity of threats to both Internet Explorer and extensions running in Internet Explorer by helping to prevent silent installation of malicious code even in the face of software vulnerabilities. For Internet Explorer 8, we’ve made a number of API improvements to Protected Mode to make it easier for add-on developers to control and interact with Protected Mode browser instances. You can read about these improvements in the Improved Protected Mode API Whitepaper.

For improved performance and application compatibility, by default IE8 disables Protected Mode in the Intranet Zone. Protected Mode was originally enabled in the Intranet Zone for user-experience reasons: when entering or leaving Protected Mode, Internet Explorer 7 was forced to create a new process and hence a new window.

[Screenshot: IE7 new window prompt]

Internet Explorer 8’s Loosely Coupled architecture enables us to host both Protected Mode and non-Protected Mode tabs within the same browser window, eliminating this user-experience annoyance. Of course, IE8 users and domain administrators have the option to enable Protected Mode for Intranet Zone if desired.

Application Protocol Prompt

Application Protocol handlers enable third-party applications (such as streaming media players and internet telephony applications) to directly launch from within the browser or other programs in Windows. Unfortunately, while this functionality is quite powerful, it presents a significant amount of attack surface, because some applications registered as protocol handlers may contain vulnerabilities that could be triggered from untrusted content from the Internet.

To help ensure that the user remains in control of their browsing experience, Internet Explorer 8 will now prompt before launching application protocols.

[Screenshot: IE8 prompt prior to launching application protocols]

To provide defense-in-depth, Application Protocol developers should ensure that they follow the Best Practices described on MSDN.

File Upload Control

Historically, the HTML File Upload Control (<input type=file>) has been the source of a significant number of information disclosure vulnerabilities. To resolve these issues, two changes were made to the behavior of the control.

To block attacks that rely on “stealing” keystrokes to surreptitiously trick the user into typing a local file path into the control, the File Path edit box is now read-only. The user must explicitly select a file for upload using the File Browse dialog.

[Screenshot: IE8 read-only File Path box]

Additionally, the “Include local directory path when uploading files” URLAction has been set to "Disable" for the Internet Zone. This change prevents leakage of potentially sensitive local file-system information to the Internet. For instance, rather than submitting the full path C:\users\ericlaw\documents\secret\image.png, Internet Explorer 8 will now submit only the filename image.png.
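Server code that used to receive full paths may want to normalise its input so that it behaves the same whether or not the browser includes the directory. A minimal sketch, with a hypothetical function name:

```javascript
// Return only the final path component, accepting "\" and "/" separators.
function fileNameOnly(path) {
  var lastSeparator = Math.max(path.lastIndexOf("\\"), path.lastIndexOf("/"));
  return path.slice(lastSeparator + 1);
}

console.log(fileNameOnly("C:\\users\\ericlaw\\documents\\secret\\image.png"));
// image.png
console.log(fileNameOnly("image.png"));
// image.png
```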

Social Engineering Defenses

As browser defenses have been improved over the last few years, web criminals are increasingly relying on social engineering attacks to victimize users. Rather than attacking the ever-stronger castle walls, attackers increasingly visit the front gate and simply request that the user trust them.

For Internet Explorer 8, we’ve invested in features that help the user make safe trust decisions based on clearly-presented information gathered from the site and trustworthy authorities.

Address Bar Improvements

Domain Highlighting is a new feature introduced in IE8 Beta 1 to help users more easily interpret web addresses (URLs). Because the domain name is the most security-relevant identifier in a URL, it is shown in black text, while site-controlled URL text like the query string and path are shown in grey text.
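The idea can be sketched as splitting a URL into the part to emphasise and the parts to grey out. This is a simplification and not how IE8 does it: the function name is hypothetical, the whole host portion (including any port or user information) is treated as "the domain", and no attempt is made to pick out the registered domain within it.

```javascript
// Split a URL into the parts an address bar might emphasise or grey out.
// Simplification: everything between "://" and the next "/", "?" or "#"
// is treated as the domain.
function splitForHighlighting(url) {
  var match = /^([a-z][a-z0-9+.-]*:\/\/)([^\/?#]*)(.*)$/i.exec(url);
  if (!match) {
    return null; // not a scheme://host URL
  }
  return { before: match[1], domain: match[2], after: match[3] };
}

var parts = splitForHighlighting("https://www.example.com/login?next=%2Fhome");
console.log(parts.before + "[" + parts.domain + "]" + parts.after);
// https://[www.example.com]/login?next=%2Fhome
```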

When coupled with other technologies like Extended Validation SSL certificates, Internet Explorer 8’s improved address bar helps users more easily ensure that they provide personal information only to sites they trust.

[Screenshot: IE8 SSL Address Bar with Domain Highlighting]

[Screenshot: IE8 SmartScreen Filter Address Bar]

SmartScreen® Filter

Internet Explorer 7 introduced the Phishing Filter, a dynamic security feature designed to warn users when they attempt to visit known-phishing sites. For Internet Explorer 8, we’ve built upon the success of the Phishing Filter feature (which blocks millions of phishing attacks per week) and developed the SmartScreen® Filter. The SmartScreen Filter goes beyond anti-phishing to help block sites that are known to distribute malware, malicious software which attempts to attack your computer or steal your personal information. SmartScreen works in concert with other technologies like Windows Defender and Windows Live OneCare to provide comprehensive protection against malicious software.

You can read more about the new SmartScreen Filter in my earlier post: IE8 Security Part III - The SmartScreen Filter.

Summary

Security is a core characteristic of trustworthy browsing, and Internet Explorer 8 includes major improvements to address the evolving web security landscape. While the bad guys are unlikely to ever just “throw in the towel,” the IE team is working tirelessly to help protect users and provide new ways to enhance web application security.

Please stay tuned to the IEBlog for more information on the work we’re doing in Privacy, Reliability, and Business Practices to build a trustworthy browser.

Onward to Beta-2 in August!

Eric Lawrence
Program Manager
Internet Explorer Security

Anne van Kesteren: reboot10: Presentations

reboot10 has once again been a very interesting conference. I gave a 15x20 micropresentation on HTML5 which was just a variant of my XTech 2008 lightning talk (I converted it to Apple Keynote and removed some slides, basically). The other presentation was Keeping the Web Free (slides) which thanks to tough competition was not quite as well attended as I’d hoped, but it went quite well.

WHATWG blog: Experience the HTML5 parsing algorithm in the Live DOM Viewer

If you’ve investigated how browsers parse HTML, you’ve probably used Hixie’s Live DOM Viewer to see what happens. Wouldn’t it be cool, though, if you could experiment with the HTML5 parsing algorithm in the same UI? Well, now you can.

I was looking for a way to experiment with document.write() in the code base of the Validator.nu HTML Parser and I was looking for a way to let people see the parse tree output of the HTML5 parsing algorithm more easily. Instead of writing a test harness fully in Java, I thought it would be better to use the Live DOM Viewer and a browser engine as the test harness. The good news is that Google Web Toolkit makes it possible to put these pieces together, and the trunk of the Validator.nu HTML parser now comes with a document.write()-aware tokenizer driver and a tree builder subclass for GWT.

The bad news is that the Java-to-JavaScript compiler of GWT has a bug that blocks me from putting the result online as JavaScript. The Hosted Mode of GWT works, though.

Here’s how you can run the Validator.nu HTML Parser in the Live DOM Viewer locally in the Hosted Mode of GWT (on Mac or Linux):

  1. Check out the source: svn co http://svn.versiondude.net/whattf/htmlparser/trunk/ htmlparser
  2. Download and untar GWT 1.5 RC1
  3. On Linux, install libstdc++5 and a JDK (Ubuntu's OpenJDK-based package worked for me).
  4. Edit the paths in HtmlParser-shell (Mac) or HtmlParser-linux (Linux) to point to the location of GWT.
  5. Run HtmlParser-shell (Mac) or HtmlParser-linux (Linux)

Known problems:

  • The Linux version of GWT runs an outdated version of Gecko, and the rendered view doesn't work. The DOM view does.
  • The Mac version of GWT runs a Web Inspector-enabled version of WebKit, but SVG does not draw.
  • document.write() semantics are right only for inline scripts.
  • Copying and pasting using keyboard shortcuts doesn’t work. (Use the context menu.)
  • On Linux, GWT prints a lot of harmless warnings about not finding annotations. (I don’t know why that happens. The annotations should be among translatables.)
  • Gecko (used by GWT on Linux) doesn't allow the creation of xmlns attributes in no namespace, so things stop working if you try to put an attribute called xmlns on HTML elements.
  • The DOM view on Linux doesn't report names with colons in them per the HTML5 spec.

(Aside: This code could have applicability beyond testing the parser. If the compiler bug were fixed or worked around, a script could document.write() a math element and an svg element to sniff if they are parsed according to HTML5 and, if they aren't, move aside load event handlers, document.write() <plaintext style='display:none'>, wait until DOMContentLoaded, load the already created html, head and body elements onto the tree builder stack and head pointer of the HTML5 parser, reparse the content of the plaintext element as HTML5, and call the load event handlers. See Philip Taylor’s proof of concept with S-expressions.)

W3C Team blogThe War of the Worlds

Almost 70 years ago, on a Sunday, October 30, 1938, we could hear on a radio:

Ladies and gentlemen, we interrupt our program of dance music to bring you a special bulletin from the Intercontinental Radio News. At twenty minutes before eight, central time, Professor Farrell of the Mount Jennings Observatory, Chicago, Illinois, reports observing several explosions of incandescent gas, occurring at regular intervals on the planet Mars.

Recently, on Monday, June 23, 2008, we could read on a radio site:

hCalendar will be gone from /programmes by the next deploy (probably this Thursday).

In the meantime we'll be looking at the possible use of RDFa (a slightly bigger S semantic web technology similar to microformats but without some of the more unexpected side-effects).

What's common between the two? They created a big wave of reactions, comments and arguments: A war of the worlds.

microformats, RDFa and HTML 5

I would like to focus on two blog posts that I like in this flood of comments. There are many more interesting ones.

Ed Dumbill says in The BBC, microformats, RDFa and Resig:

One of the wonderful things Resig has done with JavaScript is take time to love it and figure out its corners. Take some of the "confusing" and "advanced" things away and you're not able to achieve the same things. What he's done in jQuery is add a layer of elegance, predictability and accessibility.

I for one would love to see what Resig would do with semantic markup. jQuery really encourages and enables good markup practices, so there's a lot of synergy with his current style.

Not only jQuery. I once met John Resig in Tokyo. He was giving a talk about new features of the future ECMAScript. It was complex, not necessarily easy to understand, but he presented it in a way that was enlightening. We could see he took pleasure in talking about it. That was refreshing. I decided to put him on the list of good speakers who are worth seeing again.

Then, not so long ago, John ported the Processing visualization language to JavaScript. I love graphics and information processing. It was yet another moment of pleasure, thinking "Some people have talents and creativity in their hands, they do beautiful things with complex objects."

The other blog post is in French and also comments on the affair. Damien Bonvillain gives his take on RDFa and its simplicity:

In fact, RDFa defines only 5 new attributes (about, property, resource, datatype, typeof)

RDFa became a Candidate Recommendation last week. You can read the Primer or go to the RDFa wiki to learn a bit more about the technology. Yes, indeed, for some people it will take a bit of work to understand the concepts. But it took me time to learn HTML, and I don't really master JavaScript, but people like John gave me the opportunity to simplify things by developing tools, libraries or authoring tools.

And HTML 5 in all that? Here again there is the story behind the story. The first version of RDFa used a lot of elements like meta and link in the body of a page. But browsers, because of invalid markup found on the Web, have to recover pages and put the link and the meta back in the head of the document. The RDFa community listened and learned. They modified their model to take a step toward HTML 5, to create an environment that will create fewer interoperability issues. They took a step in the right direction to be able to work together.

Next week, I will show why it is important and how that can work, even if not perfectly. But remember, it is because there are people like John Resig, who create, that complex things become easy. The war of the worlds was a fiction.

Michael(tm) SmithURI error-handling in HTML5, and documenting the (real) Web vs. reinventing it

Ian Hickson, the editor of the current HTML5 draft, posted an Error handling in URIs message to the uri@w3.org mailing list outlining some issues related to browser error handling behaviour for URIs, and to IRIs and character encodings other than UTF-8 — and asking, “Is there any chance that the URI and IRI specifications might get updated to handle these issues?”.

That posting and question spawned some spirited discussion, with messages from Julian Reschke, Anne van Kesteren, Tim Bray, John Cowan, Frank Ellermann, and Martin Duerst, and provoking some comments like the following one:

That’s kind of what I said already, and why I guess that HTML5 will never fly: It tries to reinvent the Web, if not the Internet.

…and from Ian to the above, the following response:

Actually we’re trying to not reinvent the Web, but to document it, so that browser vendors can write browsers that handle existing Web content in a fashion compatible with legacy UAs without reverse-engineering each other.

(It’s true that this is requiring defining things that are at odds with existing specifications, but that’s mostly because those specifications aren’t in fact in line with real usage…)

Sam RubyMinimalist Markup

While Ryan, James, and Mark have been pursuing a minimalist design from a presentation perspective, I’ve been quietly pursuing a minimalist design from a markup perspective.  I’m not sure when it changed, but Firefox 3.0, Safari 3.1.1, and Opera 9.5 now all support units of em in SVG dimensions.

This means that my front page (under development) can be valid HTML5 and yet have absolutely no div or span elements, no inline style or class attributes, and no table or img elements used purely for layout purposes.

I have more work to do on individual post pages and on the archives.  The archives will continue to employ a table for the calendar.

Shawn MederoHTML 5 W3C Bugzilla summary for 6/15 - 6/21

Starting in June, the W3C HTML WG began using Bugzilla for tracking of detailed specification issues. The following is a summary of changes for the week of June 15th, 2008.

The following bugs were CLOSED:

If you are looking to help out, the following bugs are marked as NEEDINFO:

Please email any corrections to this summary to Shawn Medero: soypunk@gmail.com.

W3C Team blogUpdate of the RDFa distiller

Now that RDFa has been published as a Candidate Recommendation, it was time to make a new version of the RDFa distiller (i.e., pyRdfa). The last update was done when RDFa went into Last Call; there have been some improvements since. Besides the (obvious) fact that the distiller follows the latest RDFa syntax, both the RDF/XML and the Turtle serializers went through serious changes: some of the earlier problems with the original serializers of RDFLib have been taken care of.

The most interesting new feature is, however, the distiller’s parser. By default, pyRdfa uses a standard Python XML parser. However, when invoked with the right option, it can also use an HTML5 parser which, after parsing HTML5 (or non-XML HTML in general), simply returns a DOM tree. The RDFa syntax is defined in terms of a simple DOM, so the adaptation to the HTML5 parser worked essentially without problems (I want to thank Elias Torres, who drew my attention to this parser and made the first steps towards its integration into pyRdfa). This also means that, using pyRdfa, RDFa attributes added to, e.g., non-XML HTML4.01 files would also yield the appropriate RDF graph.

pyRdfa is only one of many implementations of RDFa: the implementation report (which is not yet up to date!) already lists 9 independent implementations. Manu Sporny’s library is in C (and may become part of Dave Beckett’s Redland one day), Benjamin Nowack did one in PHP, Shane McCarron just finished one in Perl, Fabien Gandon did it in XSLT, Ben Adida in client-side JavaScript, and he also has one, I believe, in Ruby… We can be confident that all these implementations will eventually pass all the official tests; this is certainly the goal of all implementers. Actually, some of those implementations also implement the HTML5 parsing feature just like pyRdfa does. Not bad for a technology that has just entered Candidate Recommendation phase…

A number of pages under the W3C Semantic Web Activity are now in XHTML+RDFa, using the setup described elsewhere already. This is the case for the SW Activity Home page, various entries of the SW Use Cases and Case Studies’ collection (see, for example, one of the latest on Semantic Web and Social spaces), or my own talks pages like the talk I gave in Nancy last week. More will follow…

Anne van Kesterenreboot10: interview

My talk for reboot10 got accepted! Why is it so great that the Web is free? In what ways is it free? What are we doing to keep it that way? I gave a slightly confused interview on this: Anne van Kesteren - Reboot 10 interview. The interview focused quite a bit on what is happening with HTML5 and how that is vastly different from traditional HTML. I’m planning for my actual talk to be less focused on that and more on why the Web is free, where we’re moving towards with Web applications, the Web in general, and how not to lock ourselves into proprietary solutions.

W3C Team blogUpdate of the RDFa distiller

Now that RDFa has been published as a Candidate Recommendation, it was time to make a new official version of the RDFa distiller (i.e., pyRdfa). The last update was done when RDFa went into Last Call; there have been some improvements since. Besides the (obvious) fact that the distiller follows the latest RDFa syntax, both the RDF/XML and the Turtle serializers went through serious changes; some of the earlier problems with the original serializers of RDFLib have been taken care of. It is actually worth noting that this is only one of many implementations of RDFa: the implementation report (which is not yet up to date!) already lists 9 independent implementations, which is really great for a technology just entering Candidate Recommendation phase…

The most interesting new feature is, however, the distiller’s parser. By default, pyRdfa uses a standard Python XML parser. However, when invoked with the right option, it can also use an HTML5 parser which, after parsing HTML5 (or non-XML HTML in general), simply returns a DOM tree. The RDFa syntax is defined in terms of a simple DOM, so the adaptation to the HTML5 parser worked essentially without problems (I want to thank Elias Torres, who drew my attention to this parser and made the first steps towards its integration into pyRdfa). This also means that, using pyRdfa, RDFa attributes added to, e.g., non-XML HTML4.01 files would also yield the appropriate RDF graph.

A number of pages under the W3C Semantic Web Activity are now in XHTML+RDFa, using the setup described elsewhere already. This is the case for the SW Activity Home page, various entries of the SW Use Cases and Case Studies’ collection (see, for example, one of the latest on Semantic Web and Social spaces), or my own talks pages like the talk I gave in Nancy last week.

WHATWG blogHTML5 Presentation at @media 2008

Lachlan Hunt and I recently gave a presentation entitled Getting Your Hands Dirty with HTML5 at the @media 2008 conference in London. The audience was mainly front-end developers; the kind of people who are using HTML to make a living, so it was a great chance to get the message out about some of the new features that have been under development.

The talk covered the Design Principles under which HTML5 is being developed, how some of the features of HTML5 can be used to enhance common web sites, and how people can get involved with the development of HTML5.

The presentation seemed to go reasonably well, especially given that we had not met until the morning of the talk, although we did have fewer demos than I would have liked, due both to technical problems in the talk and to a lack of time to prepare. So, for those who were at the talk (as well as those who were not), here is a somewhat random collection of demos of the HTML5 features we mentioned:

If anyone who saw the presentation is reading this and would like to provide constructive criticism on the talk, I would really appreciate it; giving talks is fun so it would be nice to get better at it :)

Sam RubyIntertwingly on Rails

Views: index, post, comments, archives

This clearly is just modest beginnings.  A snapshot of existing data.  Read-only views at this point.  No caching.

Technology is Rails 2.0.2 on SQLite3 using Phusion Passenger on Dreamhost.

Installation would have been a simple scp except for two issues: despite what it says in this list, the sqlite3-ruby gem does not appear to be installed.  And the current date on the machine appears to be Feb 15, 3155.

For the model part, I can’t quite bear to break with the idea of flat files yet, so the model consists of two tables: posts and comments, and each contains dates and file name parts only.  The remainder of the model is populated using an after_find hook from the flat files.

With my current Intertwingly, I had three views that had diverged over time, as well as a “partial” which contained the navigation bar.  The front page (and comments page) are clean XHTML5, individual posts are XHTML1, and the archives are based on a layout that I used back when I was on Radio Userland.  In the Rails implementation, I have four views and a layout (index and comments becoming separate views).  Having a common layout encourages consistency, and you can see the difference in the archive view already.  More work needs to be done on the individual posts view.

The controller methods are positively pedestrian at this point.  They simply obtain the necessary information from the model, and then proceed to render the associated view.

This is but a modest beginning... allowing people to enter new comments, openid, implementing spam avoidance measures, automated extraction of excerpts, ... the list goes on and on.  But first, I plan to put this code under version control (probably git), and implement a test suite.

Lachlan Hunt@media 2008 Presentation Slides

I have published the slides from my @media 2008 presentation in London. Overall, I think the presentation went very well. James Graham and I managed to speak for almost an hour about HTML 5, even though we had only met in person for the first time about 2 hours before we presented, and only had 10 minutes to briefly rehearse.

All the presentations were recorded and the podcasts should be released some time in the near future, though I’m not sure exactly when.

Standards SuckEverything HTML5 but the kitchen sink

Pattern theorists have suggested Steve Faulkner will be hosting this show, but this is not the case. In fact, it’s Lachlan and I, Anne, again. With Marcos adding our awesome music.

HTML5 has recently been published again by the W3C and this podcast introduces the new features and some of the old. data-* attributes, ruby annotations (not programming), global tabindex attribute, et cetera.

The documents published by the W3C are HTML5, HTML 5 differences from HTML 4, and HTML 5 Publication Notes.

W3C Team blogHTML 5 Publications

Three documents have been published for HTML 5 by the HTML Working Group.

In addition to these three documents, the HTML Working Group also published a W3C Note on May 30, 2008 about Offline Web Applications. The abstract is quite clear:

HTML 5 contains several features that address the challenge of building Web applications that work while offline. This document highlights these features (SQL, offline application caching APIs as well as online/offline events, status, and the localStorage API) from HTML 5 and provides brief tutorials on how these features might be used to create Web applications that work offline.

If you have any particular questions about these documents, just leave a comment here. If you want to comment on the technologies, send a comment to the appropriate mailing list, public-html-comments@w3.org.

Shawn MederoW3C HTML WG Publishes Second HTML 5 Working Draft

Via the hard work of WHATWG and the W3C, a second Working Draft is now available for HTML 5. The W3C considers a Working Draft ready for review by the community, and feedback is always welcome via public-html-comments@w3.org.

There are a number of ways to view changes between the current and previous draft:

  1. Anne van Kesteren's HTML 5 differences from HTML 4, which includes this helpful section: HTML 5 Changelog

  2. Michael Smith's HTML 5 publication notes, which provide "supplemental information" mostly relating to the differences between the previous and current drafts.

  3. If you'd like to view every little change, you can also view the HTML marked-up (including color highlighted) diff version of the HTML 5 specification.

Shawn MederoW3C TAG Settles on ARIA Syntax for HTML 5

W3C Technical Architecture Group (TAG) passed down their recommendation and ended up supporting the original aria- solution present in HTML 5 and already implemented in several user-agents and JavaScript toolkits.

The TAG accepts that the most pragmatic short-term approach for WAI-PF is to go ahead and add attributes into the HTML5 spec, using names that begin "aria-" in liaison with the HTML-WG. This in no way endorses the use of the same attributes with other specs, or any XML specs, nor is this taken as being a solution for HTML versioning, HTML modularization, or HTML to XML conversions which are still open. Distributed extensibility remains an important goal for languages used on the Web, and for XML languages in particular. The TAG hopes to work with the community to strike the right balance between achieving that, and meeting the practical needs of the HTML community.

TAG had tried to push a more XML friendly namespace approach but met a lot of resistance from implementors concerned about DOM consistency.

WAI-ARIA is defined as:

Accessible Rich Internet Applications Suite (ARIA), defines a way to make Web content and Web applications more accessible to people with disabilities. It especially helps with dynamic content and advanced user interface controls developed with Ajax, HTML, JavaScript, and related technologies.

It has been implemented in several places, including popular JavaScript toolkits like Dojo and YUI. Additionally, the IE8 beta ships with partial ARIA support, and WebKit, one of the last major UAs to implement ARIA support, now has an initial implementation of the following ARIA roles: button, checkbox, heading, link, radio, textbox.

A List Apart has a crash-course for web developers interested in a higher level ARIA intro.

Steve Faulkner et alSucking on WCAG 2.0

While at @media I had the opportunity to meet up with Lachlan Hunt, who works at Opera and is a fellow W3C HTML5 working group member. He did a short interview with me for standardssuck.org, asking some questions about WCAG 2.0, the almost minted W3C specification, designed to provide guidance on how to build web sites and web applications that are accessible and usable by people with disabilities. Note: I had limited coherence at this point (at the end of the conference after a few drinks).

Lachlan made reference in his questions to Joe Clark’s criticisms of WCAG 2 in his 2006 article, To Hell with WCAG 2. Below are some comments that Joe made in an interview with Jeremy Keith at @media 2007, after his article was published.

“Now, to their credit, the Web Content Accessibility Guidelines Working Group read and responded to absolutely every objection to the first draft of WCAG 2. Every single one, they didn’t skip any, and they tried to do something to address all of them… I read the two changes documents, which show that there has been tremendous improvement…”

Interview Transcript

Lachlan:
 Hi I’m Lachlan Hunt here for standardssuck.org. (gestures with his fist clenched except for the index and little fingers which are outstretched. ) That’s our sign. I’m here with Steve Faulkner from The Paciello Group, is that right?
Steve:
That’s right.
Lachlan:
We’re here to talk about the WCAG 2 specification. So, tell us what is WCAG 2?
Steve:
WCAG 2 is the next generation of the web content accessibility guidelines.
Lachlan:
Okay, and what does that involve? Well, what is accessibility? Well, what is web accessibility? What is the purpose of these guidelines?
Steve:
To help people make web sites and web applications accessible to people with disabilities. That’s my take on it anyway.
Lachlan:
How has it changed from WCAG 1? Is it an improvement or…
Steve:
Hopefully, it is definitely an improvement, they’ve taken seven years to do it and they have had lots and lots of input from lots and lots of people so hopefully there’s an improvement. The main difference is that it attempts to be a lot more technology neutral and also it attempts to deal with a lot of the issues that are more prevalent today. I mean, the web moved on since - what - it was nine years ago or whatever since they first came out, 1999, and the web has moved on a lot. The way people use the web and interact with the web has moved on and it has to come to terms with those challenges. I think in the end, although it is not a perfect document, there are still issues with it as far as cognitive disabilities are concerned (for example). It’s a big improvement on WCAG 1 and yeah, it contains a lot of good information about how to make web sites and web applications accessible, which is what it tries to do.
Lachlan:
OK, Joe Clark wrote an article on A List Apart called ‘To Hell with WCAG 2’ a year or two ago and he complained a lot about things in WCAG 2, he said the spec was basically unreadable and didn’t cover all these issues. What was your take on that?
Steve:
Well, as I said previously, I think Joe had a lot of good points about the spec at that point, but it went through a couple of ‘last calls’ or whatever they’re called: processes within the W3C where people can, the public can make comment, and there were literally thousands of comments, and the comments that Joe made formed part of that and, one of the issues you talked about, the use of language, they really worked on that so it is a lot easier to read. It’s more human readable than it was. I think some of these issues were peripheral or issues that clouded the strengths of the document itself, and once they cleaned those up it shined up quite well.
Lachlan:
OK, Joe also started the WCAG Samurai shortly after he wrote that article and he has since released a lot of errata for the WCAG 1 specification, what did you think of those? Were they good and what did you think of the way they were developed, in secret?
Steve:
It was all a bit cloak and dagger, I mean, it was dramatic wasn’t it? A certain drama and melodrama about the whole thing. The resulting document was quite interesting, there are a lot of good points, but I think, in a way, it has not had a huge impact, because it was a document that was developed outside the W3C and for better or worse the W3C has some credibility as far as these things are concerned and governments and corporations around the world tend to take the WCAG guidelines as a benchmark, whereas something such as the WCAG Samurai, which may contain interesting or good information, is not going to be taken on as a benchmark. Whereas I think, again, WCAG 2 will.
Lachlan:
OK, thanks Steve, it has been a pleasure talking to you.
Steve:
Is that it?
Lachlan:
You’ve got more to say?
Steve:
Yes, I would like to say, it has been great to meet Lachlan in person, and it’s nice to be with a fellow Australian.
Lachlan:
Hey, I forgot to say, we are here in London at the @media conference, yeah I met up with him yesterday and we decided to do this, so hope you enjoyed it. This has been Lachlan Hunt and Steve Faulkner for StandardsSuck.org (makes the standardssuck gesture with hand).
Steve:
See Ya! (makes incorrect gesture with hand, three middle fingers extended) Oye, (then makes correct gesture).

 Further Reading:

Further Listening:

The eminent Patrick Lauke talks in depth about WCAG 2.

Steve Faulkner et alWAI-ARIA, it’s Easy - @media 2008

Last Friday I had the pleasure of presenting at @media 2008 on WAI-ARIA, the Web Accessibility Initiative Accessible Rich Internet Applications specification. The slides from the presentation WAI-ARIA It’s Easy are now available.

WHATWG blogOffline Web Applications

Since HTML5 is a large specification, Ian and I, encouraged by Dan Connolly from the W3C, wrote an introductory document to the offline Web application features in HTML5, Offline Web Applications, which the W3C published earlier today. In summarized form, it explains the SQL API, the offline application cache API, and some of the related APIs, such as online and offline events.

WHATWG blogHTML 5 published as W3C First Public Working Draft!

Moments ago the joint effort of the W3C HTML WG and WHATWG resulted in publication of two documents in the W3C Technical Report space: HTML 5 and HTML 5 differences from HTML 4. I think I can safely say that the WHATWG community is very happy with the W3C publishing HTML 5 as a First Public Working Draft. Many thanks to all involved!

Dimitri GlazkovWrap Your Head Around Gears Workers


Google I/O was a continuous, three-thousand-person mind meld, so I talked and listened to a lot of people last week. And more often than not I discovered that Gears is still mostly perceived as some wand that you can somehow wave to make things go offline. Nobody quite knows how, but everyone is quite sure it’s magic.

It takes a bit of time to accept that Gears is not a bottled offlinifier for sites. It takes even more time to accept that it’s not even an end-user tool, or something that just shims between the browser and the server and somehow saves you from needing to rethink how you approach the Web to take it offline. The primitives offered by Gears are an enabling technology that gives you the capability to make completely new things happen on the Web, and it is your task as a developer to apply them to solve problems specific to your Web application.

And it’s probably the hardest to accept that there is no one-size-fits-all solution to the problem of taking your application offline. Not just because the solution may vary depending on what your Web application does, but also because the actual definition of the problem may change from site to site. And pretty much any way you slice it, the offline problem is, um, hard.

It’s not surprising then that all this thinking often leaves behind a pretty cool capability of Gears: the workers. Honestly, workers and worker pools are like the middle child of Gears. Everybody kind of knows about them, but they’re prone to be left behind in an airport during a family vacation. Seems a bit unfair, doesn’t it?

I missed the chance to see Steven Saviano’s presentation on Google Docs, but during a hallway conversation, it appears that we share similar thoughts about Gears workers: it’s all about how you view them. The workers are not only for crunching heavy math in a separate thread, though that certainly is a good idea. The workers are also about boundaries and crossing them. With the cross-origin workers and the ability to make HTTP requests, it takes only a few mental steps to arrive at a much more useful pattern: the proxy worker.

Consider a simple scenario: your JavaScript application wants to consume content from another server (the vendor). The options are fairly limited at the moment — you either need a server-side proxy or use JSON(P). Neither solution is particularly neat, because the former puts undue burden on your server and the latter requires complete trust of another party.

Both approaches are frequently used today and mitigated by combinations of raw power or vendor’s karma. The upcoming cross-site XMLHttpRequest and its evil twin XDR will address this problem at the root, but neither is yet available in a released product. Even then, you are still responsible for parsing the content. Somewhere along the way, you are very likely to write some semblance of a bridge that translates HTTP requests and responses into methods and callbacks, digestible by your Web application.

This is where you, armed with the knowledge of the Gears API, should go: A-ha! Wouldn’t it be great if the vendor had a representative, who spoke JavaScript? We might just have a special sandbox for this fella, where it could respond to our requests, query the vendor, and pass messages back in. Yes, I am talking about a cross-origin worker that acts as a proxy between your Web application and the vendor.

As Steven points out in his talk (look for the sessions on YouTube soon; I saw cameras), another way to think of this relationship is the RPC model: the application and the vendor worker exchange messages that include procedure name, body, and perhaps even version and authentication information, if necessary.
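
To make the RPC idea a little more concrete, here is a minimal sketch of what such a message envelope might look like. Everything in it is an assumption for illustration: the field names (version, command, body, token), the helper names encodeRequest and decodeRequest, and the availability of a JSON implementation (native JSON or a library such as json2.js) inside the worker.

```javascript
// Hypothetical protocol version shared by the application and the vendor worker.
var PROTOCOL_VERSION = 1;

// Serializes a request into the string sent through workerPool.sendMessage.
function encodeRequest(command, body, token) {
  return JSON.stringify({
    version: PROTOCOL_VERSION,
    command: command,
    body: body === undefined ? null : body,
    token: token || null
  });
}

// Parses an incoming message string. Returns the request object, or null
// when the text is not JSON, has no command, or uses another version.
function decodeRequest(text) {
  var o;
  try {
    o = JSON.parse(text);
  } catch (e) {
    return null;
  }
  if (!o || typeof o.command !== 'string' || o.version !== PROTOCOL_VERSION) {
    return null;
  }
  return o;
}
```

Rejecting unknown versions up front is one way to keep the exchange protocol compatible while the vendor worker evolves.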

Let’s imagine how it’ll work. The application sets up a message listener, loads the vendor worker, and sends out the welcome message (pretty much along the lines of the WorkerPool API Example):

// application.js:
var workerPool = google.gears.factory.create('beta.workerpool');
var vendorWorkerId;
// true when vendor and client both acknowledged each other
var engaged;
// set up application listener
workerPool.onmessage = function(a, b, message) {
  if (message.sender != vendorWorkerId) {
    // not vendor, pass
    return;
  }
  if (!engaged) {
    if (message.text == 'READY') {
      engaged = true;
    }
    return;
  }
  processResponse(message);
}
vendorWorkerId = workerPool.createWorkerFromUrl(
                                'http://vendorsite.com/workers/vendor-api.js');
workerPool.sendMessage('WELCOME', vendorWorkerId);

As the vendor worker loads, it sets up its own listener, keeping an ear out for the WELCOME message, which is its way to hook up with the main worker:

// vendor-api.js:
var workerPool = google.gears.workerPool;
// allow being used across origin
workerPool.allowCrossOrigin();
var clientWorkerId;
// true when vendor and client acknowledged each other
var engaged;
// set up vendor listener
workerPool.onmessage = function(a, b, message) {
  if (!engaged) {
    if (message.text == 'WELCOME') {
      // handshake! now both parties know each other
      clientWorkerId = message.sender;
      workerPool.sendMessage('READY', clientWorkerId);
    }
    return;
  }
  // listen for requests
  processRequest(message);
}

As an aside, the vendor can also look at message.origin as an additional client validation measure, from simple are you on my subdomain checks to full-blown OAuth-style authorization schemes.
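
A simple version of the "are you on my subdomain" check might look like the sketch below. It assumes the origin arrives as a scheme://host[:port] string; the function name isTrustedOrigin and the list of allowed domains are hypothetical.

```javascript
// Returns true when `origin` uses http or https and its host equals, or is
// a subdomain of, one of the entries in `allowedDomains`.
function isTrustedOrigin(origin, allowedDomains) {
  var match = /^(https?):\/\/([^\/:]+)(?::\d+)?$/i.exec(String(origin));
  if (!match) {
    return false;
  }
  var host = match[2].toLowerCase();
  for (var i = 0; i < allowedDomains.length; i++) {
    var domain = allowedDomains[i].toLowerCase();
    // Compare against ".domain" so "evilvendorsite.com" does not pass.
    if (host === domain ||
        host.slice(-(domain.length + 1)) === '.' + domain) {
      return true;
    }
  }
  return false;
}
```

The vendor listener could then ignore any WELCOME message for which isTrustedOrigin(message.origin, ['vendorsite.com']) is false.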

Once both the application and the vendor worker acknowledge each other’s presence, the application can send request messages to the vendor worker and listen to responses. The vendor worker in turn listens to requests, communicates with the vendor server and sends the responses back to the application. Instead of being rooted in HTTP, the API now becomes a worker message exchange protocol. In that case, the respective processing functions, processRequest and processResponse, would be responsible for handling the interaction (caution, freehand pseudocoding here and elsewhere):

// vendor-api.js
function processRequest(message) {
  var o = toJson(message); // play safe here, ok?
  if (!o || !o.command) {
    // malformed message
    return;
  }
  switch(o.command) {
    case 'public': // fetch all public entries
      // make a request to server, which fires specified callback on completion
      askServer('/api/feed/public', function(xhr) {
        var responseMessage = createResponseMessage('public', xhr);
        // send response back to the application
        workerPool.sendMessage(responseMessage, clientWorkerId);
      });
      break;
    // TODO: add more commands
  }
}

// application.js
function processResponse(message) {
  var o = toJson(message);
  if (!o || !o.command) {
    // malformed message
    return;
  }
  switch(o.command) {
    case 'public': // public entries received
      renderEntries(o.entries);
      break;
    // TODO: add more commands
  }
}
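
The toJson and createResponseMessage helpers used above are left undefined. One way they might look, as a rough sketch: it assumes a JSON object is available in the worker (older environments would need a library such as json2.js), that messages carry a JSON string in message.text, and that the server replies with a JSON array of entries. All of these are assumptions made for illustration.

```javascript
// Hypothetical helper: parse the text of a worker message.
// Returns null for anything that is not a JSON object.
function toJson(message) {
  try {
    var o = JSON.parse(message.text);
    return (o && typeof o == 'object') ? o : null;
  } catch (e) {
    // missing message, missing text, or malformed JSON
    return null;
  }
}

// Hypothetical helper: wrap a server reply in the envelope that
// processResponse expects, i.e. { command: ..., entries: [...] }.
function createResponseMessage(command, xhr) {
  var entries = [];
  try {
    var parsed = JSON.parse(xhr.responseText);
    if (Array.isArray(parsed)) {
      entries = parsed;
    }
  } catch (e) {
    // malformed server reply: send an empty list rather than fail
  }
  return JSON.stringify({ command: command, entries: entries });
}
```

Because both sides go through the same pair of helpers, the envelope format stays in one place and can grow new fields without touching the switch statements.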

You could also wrap this into a more formal abstraction, such as the Pipe object that I developed for one of my Gears adventures.

Now the vendor goes, whoa! I have Gears. I don’t have to rely on dumb HTTP requests/responses. I can save a lot of bandwidth and speed things up by storing most current content in a local database, and only query for changes. And the fact that this worker continues to reside on my server allows me to continue improving it and offer new features, as long as the message exchange protocol remains compatible.

And so you and the vendor live happily ever after. But this is not the only happy ending to this story. In fact, you don’t even have to go to another server to employ the proxy model. The advantage of keeping your own server’s communication and synchronization plumbing in a worker is pretty evident once you realize that it doesn’t ever block the UI and naturally decouples what you’d consider the Model part of your application from the rest. You could have your application go offline and never realize it, because the proxy worker could handle both monitoring of the connection state and seamless switching between local storage and server data.
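
To make the switching idea a little more concrete, here is a freehand sketch of the fallback logic such a proxy might use. Everything in it is hypothetical: createProxy is an invented name, the cache is a plain object standing in for a local database, and askServer is assumed to take an extra error callback that the earlier pseudocode did not have.

```javascript
// Hypothetical proxy: try the server first, fall back to a local cache.
// askServer(url, onSuccess, onError) and cache are stand-ins supplied by
// the caller; neither is a Gears API.
function createProxy(askServer, cache) {
  var online = true;
  return {
    // last known connection state, updated on every request
    isOnline: function() {
      return online;
    },
    fetch: function(url, callback) {
      askServer(url, function(data) {
        online = true;
        cache[url] = data; // remember the latest server data
        callback(data);
      }, function() {
        online = false;
        // serve the cached copy if there is one, otherwise null
        var hit = Object.prototype.hasOwnProperty.call(cache, url);
        callback(hit ? cache[url] : null);
      });
    }
  };
}
```

The caller only ever sees callback(data), so the rest of the application does not need to know whether the data came from the server or from the cache.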

Well, this post is getting pretty long, and I am no Steve Yegge. Though there are still plenty of problems to solve (like gracefully degrading this proxy model to a non-Gears environment), I hope my rambling gave you some new ideas on how to employ the worker goodness in your applications and gave you enough excitement to at least give Gears a try.

Steve Faulkner et al: ARIA in HTML5 - video discussion

ARIA in HTML5 part 1 & ARIA in HTML5 part 2 - videos of Anne van Kesteren discussing ARIA and issues surrounding its implementation with Marcos Caceres.

Anne van Kesteren: HTML5: Ruby Annotations

About twenty-eight months ago I looked into ruby in HTML. Earlier today ruby support was added to HTML5. (The sole browser to support this remains Internet Explorer though, unfortunately.)

It would be really cool if people who use ruby already could take a look and provide feedback (public-html-comments@w3.org, whatwg@whatwg.org, or below). Thanks and all!

Anne van Kesteren: XTech 2008: Presentations

XTech was in Dublin this year and I had great fun hanging out with Marcos, colleagues, and people we met over there. I didn’t attend many talks, and one I did attend told me that the shortcuts I use in ECMAScript are bad and shouldn’t have been part of the language (semicolon insertion and optional braces, for instance). People say this about HTML as well. I guess I’m glad things evolve the way they do. Anyway, I also gave these two talks:

Both use some SVG in img that requires Opera 9.5 or some WebKit build, but then WebKit doesn’t quite work if you want to view it as a presentation, as it does not support the projection media type yet.

Standards Suck: ARIA in HTML5

Welcome to our two part discussion on ARIA (Accessible Rich Internet Applications) and HTML5.

During XTech2008, Anne and I had a chance to sit down and talk about the current state of the “ARIA in HTML5” debate. Anne gives an overview of ARIA and the controversy over naming of ARIA attributes and makes some suggestions as to how the community can move forward.

Part 1:

Part 2:

Next week, Anne sits down with Lachlan Hunt, editor of the W3C’s Selectors API specification, to chat about implementations, open issues, and where that spec is headed.

(We will try to make the original video files available in due course. Flash sucks too, but it’s our best bet currently.)

IEBlog: Enabling Mashups in Internet Explorer 8 with Cross Document Messaging

Hello, I’m Sunava Dutta and I’m the Program Manager focused on improving our AJAX scenarios in IE8. In this short post I’ll introduce you to a feature we’re implementing in the browser that enables safer mashups. The Same Origin Policy (SOP) requires that browsers prevent script from accessing the contents of another domain to prevent cross-site scripting attacks. Web sites today, like Facebook and Live among others, allow users to drag and drop third party ‘gadgets’ or applications to their page. As the BBC News reports, there are many challenges to doing so safely. These components are usually embedded third party scripts. Unfortunately these third party scripts run with the same privileges as the parent page and can potentially access personal data, cookies and other credentials. Attempts are currently underway to secure such script based applications. Other forms of embedding applications exist, such as inserting the gadget in an IFrame; however, while these are more secure, they can’t communicate with the page and aren’t as useful.

In order to allow rich mashup scenarios where components can exchange information and permissions with the parent page, the IE team and other members of the HTML 5.0 Working Group are developing a cross document messaging feature. Communication using strings is enabled by a postMessage method. Hosting pages or gadgets are advised to check the origin domain of the content before inserting it into their DOM. For more details, please refer to our MSDN Dev Center Article on cross document messaging.
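
As a rough illustration of the origin check advised above, here is a sketch of a receiving page written along the lines of the HTML5 draft. It assumes the message event exposes origin and data properties; the property names and event wiring in the IE8 beta may differ, so treat the MSDN article as the authority. The trusted origin and the gadget-output element id are made up for the example.

```javascript
// Origins this page is willing to accept messages from (illustrative).
var TRUSTED_ORIGINS = ['http://gadgets.example.com'];

// Returns the message string if it comes from a trusted origin,
// otherwise null.
function acceptMessage(event) {
  if (TRUSTED_ORIGINS.indexOf(event.origin) == -1) {
    return null;
  }
  return String(event.data);
}

// Browser-only wiring; skipped when there is no window object.
if (typeof window != 'undefined' && window.addEventListener) {
  window.addEventListener('message', function(event) {
    var text = acceptMessage(event);
    var target = document.getElementById('gadget-output'); // hypothetical id
    if (text !== null && target) {
      // insert as plain text, never as markup
      target.textContent = text;
    }
  }, false);
}
```

On the sending side, a gadget in an IFrame would call postMessage on the parent window with a string payload. Keeping the check in a small function of its own makes it easy to exercise with fake event objects outside the browser.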

Sunava Dutta
Program Manager

Edit: added "more" to last sentence in first paragraph

WHATWG blog: HTML5 conformance checking in Vim

Kai Hendry has written an HTML filetype plugin for Vim that allows you to use Henri Sivonen’s Validator.nu conformance checking (validation) service remotely to check the contents of any HTML document you edit in Vim and determine if the document is HTML5-conformant (valid).

Here’s a screenshot of it in use (note that it links to a full-size image that's easier to read).

screen showing quickfix mode in Vim being used to edit an HTML file

The filetype plugin is also demoed in a screencast tutorial on editing Web applications, which Kai has written about in his VIM IDE for Web applications blog posting (see the posting for a link to the video).

All that you need to do to install the Vim filetype plugin is to download the plugin source and save it into ~/.vim/ftplugin/html.vim. To use it to check a document, first do :make within Vim, then use :cope and :clist and :cnext and such to locate the errors (for more details, read the section of the Vim docs that relates to those commands.)

How and why it works

Vim has a set of “quickfix” commands that provide something that many development IDEs also have these days: A way to run a compiler or lint checker or other external tool on the contents of a file you are editing, and then to have any errors returned — along with the line and column numbers of the places in your file where the errors occur — as a list that you can then easily step through or jump through one-by-one and fix. It’s a very powerful feature.

Kai’s HTML filetype plugin provides a way to use Vim’s “quickfix” commands to do conformance checking of HTML5 files. The plugin is dead simple; it’s just two lines:

set makeprg=curl\ -s\ -F\ laxtype=yes\ -F\ parser=html5\
  \ -F\ level=error\ -F\ out=gnu\ -F\ doc=@%\
  http://validator.nu
set errorformat=\"%f\":%l.%c-%m

(Note that I've just wrapped the first line for the purpose of readability in this post.)

The makeprg option in the first line tells Vim what “make program” you want to use when checking HTML files. And the errorformat option in the second line tells Vim the expected format of error messages from that “make program” — so that it can parse the error messages to get the line and column numbers of the places in your file where the errors occur (the meanings of the various parts of the string used in that errorformat value are: %f, filename; %l, line number; %c, column number; %m, error message).
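
To see what that errorformat string is doing, here is an approximate JavaScript equivalent of the pattern "%f":%l.%c-%m. It is only a sketch: Vim's own matching rules for %f and %m are more involved, and the function name parseGnuError is invented for the example.

```javascript
// Rough equivalent of the errorformat "%f":%l.%c-%m
//   %f -> quoted filename, %l -> line, %c -> column, %m -> rest of the line
function parseGnuError(line) {
  var m = /^"([^"]+)":(\d+)\.(\d+)-(.*)$/.exec(line);
  if (!m) {
    // line does not look like an error report
    return null;
  }
  return {
    filename: m[1],
    line: parseInt(m[2], 10),
    column: parseInt(m[3], 10),
    message: m[4]
  };
}
```

Given a made-up line such as "index.html":12.5-12.30: error: Stray end tag "div". (an illustration of the shape, not captured validator output), this yields the filename index.html, line 12, column 5, and everything after the hyphen as the message. That is the same information Vim needs to jump to the right spot.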

Interaction with Validator.nu

What Kai’s HTML filetype plugin does is to use the curl command-line HTTP client as the “make program” and, in turn, to have curl send a POST request to Validator.nu. The contents of that POST request are set by the parameters and values specified by the -F options passed to curl. Essentially what this does is to emulate what would happen if you used the form-based interface at the Validator.nu website to manually set the values of the various form fields in that interface. (Note that wget could probably be used here, with different options, to do the same thing.)

What Validator.nu does in return is to send a response with the list of errors — in a format that allows the list of errors to be easily parsed by tools that have built-in support (like Vim’s “quickfix”) for reading error lists that are in a regular format and doing something with them.

GNU-formatted error output

In this case, since the out=gnu parameter and value were passed to Validator.nu, the particular format in which Validator.nu returns the error list is the standard GNU error format that’s used by many applications (including that other editor, Emacs). This use case (enabling remote validation and error-evaluation with editing applications) is actually one of the main cases for which Henri added the GNU-formatted error-reporting option to Validator.nu.

Validator.nu + Vim = easy HTML5 conformance checking

The end result is that you get the error information back into Vim in a way that lets you more easily locate and fix the errors.

So setting just two options is all it takes in an editing application like Vim to enable Validator.nu to be used remotely like this (that is, to do integrated HTML5 conformance-checking and error-reporting within the editor). This seems to me to be a pretty good testament (another in a long list) to the utility of the Validator.nu service and to the foresight that’s gone into its design.

I guess it also says a lot about the utility of Vim and the foresight that’s gone into its design — but we all already know how great Vim is, right? :)

Lachlan Hunt: @media 2008 Presentation

This year, I have the pleasure of presenting at @media in London. I will be presenting with my colleague James Graham, whom I’ve not yet met, but who I’ve known online through the WHATWG for a while now. Our talk, entitled Getting Your Hands Dirty with HTML5, will focus on how the HTMLWG and WHATWG are working to address the needs of authors and users, and demonstrate real use cases for the new features being introduced. We will also take a look at the remarkable community surrounding the effort and show just how easy it is for you to get involved.

Planet WebKit: Top Secret, Hush Hush!

So, after Simon snitched on me and leaked highly sensitive information about my top-secret project, I guess it’s finally time to spill the beans.

Yes, it is true. For the past few months I have been semi-secretly working on taking over the world implementing support for the HTML5 video and audio el