The War of the Worlds

Almost 70 years ago, on Sunday, October 30, 1938, we could hear on the radio:

Ladies and gentlemen, we interrupt our program of dance music to bring you a special bulletin from the Intercontinental Radio News. At twenty minutes before eight, central time, Professor Farrell of the Mount Jennings Observatory, Chicago, Illinois, reports observing several explosions of incandescent gas, occurring at regular intervals on the planet Mars.

Recently, on Monday, June 23, 2008, we could read on a radio site:

hCalendar will be gone from /programmes by the next deploy (probably this Thursday).

In the meantime we'll be looking at the possible use of RDFa (a slightly bigger S semantic web technology similar to microformats but without some of the more unexpected side-effects).

What do the two have in common? Both set off a big wave of reactions, comments and arguments: a war of the worlds.

microformats, RDFa and HTML 5

In this flood of comments, I would like to focus on two blog posts that I like. There are many more interesting ones.

Edd Dumbill says in The BBC, microformats, RDFa and Resig:

One of the wonderful things Resig has done with JavaScript is take time to love it and figure out its corners. Take some of the "confusing" and "advanced" things away and you're not able to achieve the same things. What he's done in jQuery is add a layer of elegance, predictability and accessibility.

I for one would love to see what Resig would do with semantic markup. jQuery really encourages and enables good markup practices, so there's a lot of synergy with his current style.

It is not only jQuery. I once met John Resig in Tokyo, where he was giving a talk about the new features of the future ECMAScript. The material was complex and not necessarily easy to understand, but he presented it in a way that was enlightening. We could see he took pleasure in talking about it. That was refreshing. I decided to put him on the list of good speakers who are worth going to see again.

Then, not so long ago, John ported the Processing visualization language to JavaScript. I love graphics and information processing. It was yet another moment of pleasure, thinking: "Some people have talent and creativity in their hands; they do beautiful things with complex objects."

The other blog post is in French and also comments on the affair. Damien Bonvillain gives his take on RDFa and its simplicity:

In fact, RDFa defines only 5 new attributes (about, property, resource, datatype, typeof)

RDFa became a Candidate Recommendation last week. You can read the Primer or go to the RDFa wiki to learn a bit more about the technology. Yes, indeed, some people will need to do a bit of work to understand the concepts. But it took me time to learn HTML, and I still have not really mastered JavaScript; people like John gave me the opportunity to simplify things by developing tools, libraries or authoring tools.
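To make the "only 5 new attributes" point concrete, here is a minimal sketch in Python that scans a fragment of markup and reports where those five attributes appear. The sample markup, its URLs, the `ex` prefix and the names `SAMPLE` and `collect_rdfa_attributes` are assumptions made up for illustration; this is not the BBC's markup, and it is not an RDFa processor, only an attribute spotter built on the standard library's `html.parser`.

```python
from html.parser import HTMLParser

# The five attributes quoted above as new in RDFa.
RDFA_NEW_ATTRIBUTES = {"about", "property", "resource", "datatype", "typeof"}

# Hypothetical markup: every URL, prefix and value here is illustrative only.
SAMPLE = """
<div xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
     xmlns:ex="http://example.org/vocab#"
     about="/programmes/example" typeof="ex:Episode">
  <span property="dc:title">An example programme</span>
  <span property="dc:date" datatype="xsd:dateTime"
        content="2008-06-23T20:00:00Z">Monday evening</span>
  <span rel="dc:creator" resource="/people/example">Someone</span>
</div>
"""


class RDFaAttributeCollector(HTMLParser):
    """Record every use of one of the five new RDFa attributes."""

    def __init__(self):
        super().__init__()
        self.found = []  # list of (tag, attribute, value) tuples

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in RDFA_NEW_ATTRIBUTES:
                self.found.append((tag, name, value))


def collect_rdfa_attributes(html):
    """Return (tag, attribute, value) for each new RDFa attribute in html."""
    parser = RDFaAttributeCollector()
    parser.feed(html)
    parser.close()
    return parser.found


if __name__ == "__main__":
    for tag, name, value in collect_rdfa_attributes(SAMPLE):
        print(f"<{tag}> {name}={value!r}")
```

Attributes that RDFa reuses from existing markup, such as rel and content in the sample, are deliberately left out of the result, since they are not among the five new ones.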

And HTML 5 in all that? Here again there is a story behind the story. The first version of RDFa used a lot of elements like meta and link in the body of a page. But because of the invalid markup found on the Web, browsers have to recover pages, and they move link and meta back into the head of the document. The RDFa community listened and learned. They modified their model to take a step toward HTML 5 and to create an environment that causes fewer interoperability issues. They took a step in the right direction to be able to work together.

Next week, I will show why it is important and how it can work, even if not perfectly. But remember: it is because there are people like John Resig, who create, that complex things become easy. The war of the worlds was a fiction.

Filed by Karl Dubost on June 27, 2008 7:27 AM in HTML, Opinions and Editorial, Semantic Web, W3C Life

Comments

Henri Sivonen # 2008-06-27

I think counting only 5 attributes as simplicity misses the main point of complexity: RDFa uses QNames in content (considered an anti-pattern by many—including me) and to resolve them, you need to know the namespace mapping context at each node.

It's not only an issue of HTML not having a concept of namespace mapping context traditionally or in HTML5 as drafted. While tracking the namespace mapping context on the application-level is feasible when the document tree doesn't change (e.g. when you compile an XSLT program), keeping track of the namespace mapping context becomes problematic in a browser environment where scripts can mutate the document tree over time.
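The dependency described here can be illustrated with a small sketch. The Python below is a toy model, not a browser DOM and not the lookup algorithm of DOM Level 3 Core Appendix B: the `Node` class, `lookup_namespace` and `resolve_qname` are hypothetical names, and the namespace URIs are examples. It assumes a prefixed name is expanded by concatenating the namespace URI with the local part, and it shows that the same string resolves differently once a script moves the node to another parent.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # identity comparison, so nodes in a cyclic tree stay cheap to compare
class Node:
    name: str
    ns_decls: dict[str, str] = field(default_factory=dict)  # prefix -> URI declared here
    parent: Optional[Node] = None
    children: list[Node] = field(default_factory=list)

    def append(self, child: Node) -> Node:
        """Attach child under this node, detaching it from any previous parent."""
        if child.parent is not None:
            child.parent.children.remove(child)
        child.parent = self
        self.children.append(child)
        return child


def lookup_namespace(node: Optional[Node], prefix: str) -> Optional[str]:
    """Walk from node up to the root and return the nearest declaration of prefix."""
    current = node
    while current is not None:
        if prefix in current.ns_decls:
            return current.ns_decls[prefix]
        current = current.parent
    return None


def resolve_qname(node: Node, qname: str) -> Optional[str]:
    """Expand 'prefix:local' to a full URI using the mapping context at node."""
    if ":" not in qname:
        return None
    prefix, _, local = qname.partition(":")
    uri = lookup_namespace(node, prefix)
    return None if uri is None else uri + local


if __name__ == "__main__":
    root = Node("body")
    first = root.append(Node("div", {"dc": "http://purl.org/dc/elements/1.1/"}))
    second = root.append(Node("div", {"dc": "http://example.org/other#"}))
    span = first.append(Node("span"))

    print(resolve_qname(span, "dc:title"))  # resolved through the first div
    second.append(span)                     # a script reparents the node
    print(resolve_qname(span, "dc:title"))  # same text, different mapping context
```

The lookup itself is a short ancestor walk. The design question is rather when a consumer must redo it: any cached expansion of "dc:title" becomes stale as soon as the node, or one of its ancestors, is moved or has its declarations changed.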

For the problem at hand, HTML5 proposes the 'time' element as the solution. Unfortunately, the 'time' element is not part of HTML 4.01 and is, therefore, against microformat principles. But then, RDFa attributes weren't in HTML 4.01, either.

Damien B # 2008-06-28

Henri, the count of 5 attributes was only a reaction to the statement made by John Resig that RDFa introduced "many new attributes", and citing 3 of them, giving the image that it was a small part of the overwhelming number of new attributes.

"keeping track of the namespace mapping context becomes problematic in a browser environment where scripts can mutate the document tree over time."

The mutating tree problem is disconnected from the namespace mapping context problem. Right now, if you want to take that into account, you can throw away maybe 95% of the existing microformat parsers. The temporal model for interpreting inner metadata (µformat, RDFa, whatever...) is currently undefined. For instance, the Tails Export extension on Firefox is not refreshed automatically on tree mutation, and it doesn't support the "include pattern" mandated by hReview. The Operator Firefox extension does not seem to support hReview or hResume at all, so it's difficult to know how it would handle the "include pattern" in the case of a tree modification (other modifications are reflected on-the-fly).

My point is: so far, when there is scripting manipulation of the DOM, there are already problems for the existing in-browser microformat interpreters. As such, we can read in "RDFa in XHTML: Syntax and Processing" §5.5: "In other words, XHTML processing rules must still be applied, even if document processing takes place in a non-HTML environment such as a search indexer.", which shows that this kind of metadata must be usable without client-side scripting support (which does not mean that we should not have that kind of metadata targeted to a browser environment).

Now, how is the handling of the namespace mapping context on a mutating tree hard? It is basically a cascading problem, and DOM3 appendix B has been frozen for more than four years. You say in the pamphlet "namespaces considered harmful": "I wonder how many hours in my life has been wasted looking up namespace URIs for copying and pasting". I wonder how many hours of my life has been wasted looking up from where my CSS styles were coming from and why the selectors didn't work as I expected. Meanwhile, I didn't contribute any line to a text named "C in CSS considered harmful". I don't see how people writing CSS handling code could fail to tackle the namespace mapping problem, it is just beyond me.

"For the problem at hand [...]. But then, RDFa attributes weren't in HTML 4.01, either."

So it's fine, because the pages at hand, BBC/programmes, are not HTML 4.01. They are not XHTML 1.1 either, but a switch from XHTML 1.0 strict to it is not a huge step. But then again, the problems are: can I represent my metadata? is it accessible? The microformat's way for the problem at hand is not accessible. Is the "time" element a solution, even as a hack? It could be, but by violating every known microformat parser implementation, it's kind of defeating the purpose. Furthermore, it would put the constraint on having a mandatory "datetime" attribute on the time element, since we can not expect microformat parsers to talk to the DOM (maybe it's so in HTML5?).

Henri Sivonen # 2008-06-30

The mutating tree problem is disconnected from the namespace mapping context problem.

They are interrelated in a browser context.

Right now, if you want to take that into account, you can throw away maybe 95% of the existing microformat parsers.

Obviously, the tree mutation case is not applicable to microformat parsers that don't run inside a browser and don't have another means of executing scripts.

That the problem is inapplicable to RDFa consumers outside the browser is not the point. The point is that microformats and a metaformat positioned as a microformat replacement should work robustly inside a browser as well.

The temporal model for interpreting inner metadata (µformat, RDFa, whatever...) is currently undefined.

That's bad. (As far as undefined things go, the main issue I take with microformats is that the microformats community doesn't provide a document conformance spec and a processing spec on the HTML5 level of detail.)

For instance, the Tails Export extension on Firefox is not refreshed automatically on tree mutation,

That seems inconvenient especially for microformats that are particularly suited for in-browser consumption and applicable to ajaxy use cases, such as hCard and hCalendar that one would want to be UI-sensitive for transferring into an address book or calendar app.

and it doesn't support the "include pattern" mandated by hReview. The Operator Firefox extension does not seem to support hReview or hResume at all, so it's difficult to know how it would handle the "include pattern" in the case of a tree modification (other modifications are reflected on-the-fly).

hReview and hResume don't make as much sense for in-browser support as hCard and hCalendar. hReview and hResume target content aggregators.

My point is: so far, when there is scripting manipulation of the DOM, there are already problems for the existing in-browser microformat interpreters.

If script manipulation is already a problem, does it make sense to make the problem worse?

Now, how is the handling of the namespace mapping context on a mutating tree hard? It is basically a cascading problem, and DOM3 appendix B has been frozen for more than four years.

That's more code than no code. And what benefit do you get from the layer of indirection that Namespaces is at the end of the day?

You say in the pamphlet "namespaces considered harmful": "I wonder how many hours in my life has been wasted looking up namespace URIs for copying and pasting".

I wasn't aware that I was being quoted on the microformats wiki. Thanks for letting me know.

I wonder how many hours of my life has been wasted looking up from where my CSS styles were coming from and why the selectors didn't work as I expected. Meanwhile, I didn't contribute any line to a text named "C in CSS considered harmful". I don't see how people writing CSS handling code could fail to tackle the namespace mapping problem, it is just beyond me.

Because the CSS cascade provides more value than the indirection Namespaces provide? Also, the people who implement the CSS cascade and the people who implement metadata scraping are not the same people.

Furthermore, citing another case where values propagate in the tree (CSS, xml:lang, base URI, etc.) doesn't make QNames in content less brittle in the face of DOM manipulation.

"For the problem at hand [...]. But then, RDFa attributes weren't in HTML 4.01, either."

So it's fine, because the pages at hand, BBC/programmes, are not HTML 4.01. They are not XHTML 1.1 either, but a switch from XHTML 1.0 strict to it is not a huge step.

HTML 4.01 vs. XHTML 1.0 vs. XHTML 1.1 is irrelevant as far as the validation point goes. Neither HTML5 'time' nor RDFa is valid in any of them.

It could be, but by violating every known microformat parser implementation, it's kind of defeating the purpose.

If you want to use something other than the abbr design pattern and you want the result to work with existing software that only works with the abbr design pattern, there's nowhere you can go. (RDFa doesn't work with every existing microformat parser, either.)

Furthermore, it would put the constraint on having a mandatory "datetime" attribute on the time element, since we can not expect microformat parsers to talk to the DOM (maybe it's so in HTML5?).

I don't follow.

Damien B # 2008-07-04

Sorry for the late answer...

For the sake of concision, I will call "in-browser" the use case where the metadata interpreter is executed inside a web browser and is supposed to be written in JavaScript talking to the DOM; "standalone" is the use case where the metadata interpreter does not use a web browser environment and, in particular, does its own parsing of the document and does not interpret the script elements.

The mutating tree problem is disconnected from the namespace mapping context problem. They are interrelated in a browser context.

Basically everything is interrelated in a browser context. But for "standalone", it has no meaning. And for "in-browser", namespace mapping does not raise specific problems in relation to a mutating tree: algorithms already exist.

The point is that microformats and a metaformat positioned as a microformat replacement should work robustly inside a browser as well.

From a strict robustness point of view, I don't see how an XML namespace-based solution is less robust than a "magical CSS class name" based solution. Now, current web browsers are indeed poor fits for anything labelled "robust" at this time regarding standards, and especially the XML-related ones.

hReview and hResume don't make as much sense for in-browser support as hCard and hCalendar. hReview and hResume target content aggregators.

A microformat should work robustly inside a browser as well. hCard and hCalendar make sense in "in-browser" today because there are standard standalone formats to represent them, and for displaying a normalized representation. For hReview, since it expressly ignores HR-XML, only the second use case remains for "in-browser", which is still very valid. Aside from that, the "include pattern" is the key to representing more complex graphs of data in µformat.

My point is: so far, when there is scripting manipulation of the DOM, there are already problems for the existing in-browser microformat interpreters. If script manipulation is already a problem, does it make sense to make the problem worse?

Worse compared to what? It sounds like XML namespaces is itself a fatality. I agree that the DOM Level 2 support in the current web browsers (and especially IE) is not up to the standard; but that does not explain why you dismiss the concept altogether. It's related to the next point.

Also, the people who implement the CSS cascade and the people who implement metadata scraping are not the same people.

For "in-browser", we could expect that the people who implement the CSS cascade and those working on the DOM interfaces work at least as a team. For "standalone", I think there is no problem as of today finding an HTML normalizer + DOM Level 2 (if we want a brute force approach working on a non-standardized HTML 4 + RDFa).

That's more code than no code. And what benefit do you get from the layer of indirection that Namespaces is at the end of the day?

Last time I checked, you used Java Namespaces in your code, willingly, and you seem to be alive from that "more code". And you use "CSS Namespaces" as well in your pages (aka descendant selector applied to an ID selector). So, what are the benefits in qualifying a name? To me, it lies in the robustness aspect.

Furthermore, citing another case where values propagate in the tree (CSS, xml:lang, base URI, etc.) doesn't make QNames in content less brittle in the face of DOM manipulation.

Once again, strictly speaking, I don't see anything brittle in DOM Level 3 Appendix B. But maybe I am missing some piece of information. And, again, you have the same propagation mechanism in µformat as well (after all, they form a hierarchy).

To conclude, it seems that your position is that XML Namespaces are not a robust mechanism in the face of tree manipulations, and there I disagree.

W3C Blog