W3C Technical Architecture Group (TAG) Face to Face Meeting - 7 March 2007 (Afternoon)

7 Mar 2007



Vincent_Quint, Tim_Berners-Lee, Stuart_Williams, T.V._Raman, Norm_Walsh, Rhys_Lewis, Noah_Mendelsohn, Henry_Thompson, Dan_Connolly
David_Orchard
Stuart Williams
Noah Mendelsohn


<Noah> scribenick: noah

<scribe> scribe: Noah Mendelsohn

date: 7 March 2007

Tag Soup (continued)

DC: I'm tempted to look at one particular issue: XHTML role and RDFa are both things people are trying to add without actually having to be in the loop with the main HTML work. Beneficiaries are people who want to create/use accessible AJAX applications, without having to get in the queue to get lots of features added to HTML. Are those good focal points for the discussion of extensibility?

<DanC_lap> ("which group?" is a non-trivial question)

TVR: The Web Accessibility group has taken an interesting tack. The role mechanism was initially proposed by me in the XHTML groups, and they are defining particular roles. There is now a question of how to do this in HTML, particularly given that the values of the attributes are QNames. What's being done could be viewed as a hack, or could be viewed as the way we should have done the transition. Make class="xxx;role", and there's a standard Javascript library that rewrites the DOM.

NM: So, the content is different if you're not Javascript enabled?

DC: Yes, but at least they've shown the need is serious.
... So, maybe we just say to the HTML WG: this is important, go standardize it.
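Illustration (not from the meeting): the class-based workaround Raman describes can be sketched as a tree rewrite. This is only a sketch under assumptions: it reads the minutes' `class="xxx;role"` as a class token of the form `value;role`, and it uses Python's `xml.etree.ElementTree` on a well-formed fragment as a stand-in for the browser DOM that the Javascript library would actually rewrite. The function name `promote_class_roles` is hypothetical.

```python
import xml.etree.ElementTree as ET

def promote_class_roles(root):
    """Move class tokens of the ASSUMED form 'value;role' into a role attribute."""
    for el in root.iter():
        if "class" not in el.attrib:
            continue
        kept = []
        for token in el.get("class").split():
            value, sep, name = token.partition(";")
            if sep and name == "role" and value:
                el.set("role", value)      # e.g. role="checkbox"
            else:
                kept.append(token)         # ordinary class names survive
        if kept:
            el.set("class", " ".join(kept))
        else:
            del el.attrib["class"]
    return root

doc = ET.fromstring('<div><span class="checkbox;role fancy">Accept</span></div>')
promote_class_roles(doc)
span = doc.find("span")
```

As Noah's question points out, a consumer that never runs the rewrite sees only the original class attribute, so the role information is present only after scripting.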

TVR: The fact that value is a QName will definitely cause concern.
... I can do anything I want using <div> and <span>

RL: I think I read in the vision document http://www.w3.org/2007/03/vision.html that extensibility was not considered appropriate for the HTML working group goals.

<Zakim> Rhys, you wanted to say that DIAL is the way to look at that for authoring and to

<DanC_lap> (re role use cases, this demo is interesting http://www.mozilla.org/access/dhtml/checkbox . see also ...)

<Zakim> DanC_lap, you wanted to suggest that the cost of the quotes is dominated by the norms in your community

DC: Cost of putting quotes is not about the space taken, typically, but social issues, what people in your community accept, etc.
... Interesting you read extensibility as a non-goal for the HTML WG. I don't think I was trying to say that.

<timbl_> Noah: What I read the vision document to say is that two working groups will share a model.

<scribe> scribenick: rhys

NM: I read the vision document to say that there were two working groups that share a model.

<timbl_> DC: I expect them to share XML.

DanC: I don't expect the groups to share a model.

HT: who is responsible for XHTML 1.1 maintenance?

DanC: It's a shared responsibility.

NM: (quotes the section of the vision document about the serialisations)

DanC: The two serialisations are in the HTML working group.

<scribe> scribenick: noah

NM: The vision document says:

"Instead, the charter calls for two equivalent serializations to be developed, corresponding to a single DOM (or infoset, though tag soup cannot be considered to have an infoset currently, while it can have a DOM). This ensures that decisions are not made which would not preclude an XML serialization. It allows the two serializations to be inter-converted automatically. Having new language features, there is an incentive for content authors to use it; and having client-side implementations means that there is the possibility to really use it."

NM: I read that as saying that for any given abstract document, at least in the 80% case, the DOM would be the same.

<Zakim> Noah, you wanted to ask about goals for HTML

DC: No, it didn't mean to say that.

HT: I'm unclear whether, when the vision document says HTML without qualification, it means all of the several variants or specifically one (scribe infers the tag soup version).

DC: We'll know in a few months.

<DanC_lap> (I hope)

(TimBL edits to say "Instead, the charter calls for two equivalent serializations to be developed by the HTML WG, ")

Scribe's note: the final text of the vision document was still being revised while the TAG meeting was in progress. Tim noted above that he made an edit to help eliminate the ambiguity that had just been discussed.

RL: Before lunch we were talking about mobile, especially device independence markup. DIAL is one approach to solving that problem, not having to do redirects, having one representation, etc.

DC: What do we have on DIAL?

RL: There's a WD.

DC: Does it tell a story?

RL: There's a primer.

See http://www.w3.org/TR/dial-primer/

RL: It's XHTML2 + XForms + Some other modules

<timbl_> XHTML2 + XFORMS + some other modules

<ht_mit> DIAL stands for Device Independent Authoring Language

RL: These stories are based on actual commercial usage, from vendors, network operators, content providers, etc.
... Dirk wishes to create a web site viewable on, say, any web device including mobile.

DC: Which kind of org. does he work for?

RL: Say, network operators, content partners (e.g. Disney/eBay), other sites. Maybe it helps you get ringtones for your device.
... Dirk writes some DIAL as his markup. It's constructed to avoid device dependency.

DC: He uses emacs?

RL: Probably DIAL-ware tools.

DC: Available?

RL: Yes, e.g. from Volantis (chuckle). [Volantis is Rhys' employer.]

DC: Direct manipulation?

RL: Typically mixed, xml editor with some help around the edges.
... If the markup is only the device independent stuff, then the device-specific stuff has to go somewhere, and has to be worth the incremental trouble.
... Example: companies may not trust transcoding of their logo images.
... So, there are ways of linking to the device dependent stuff. This is generic resources.
... The reference is device independent, but the infrastructure serves the right thing.

DC: Mostly deployed server-side?

RL: Yes, mostly.
... Opera mobile is an interesting example. It can do some level of rearrangement and transcoding on the device for a standard HTML page, but it can tend to be less successful insofar as the HTML they're starting with has already lost some information about the intent.
... Eventually, more will happen on the client, but there's a risk you send the device images, etc. it won't need.
... I see XHTML2 as being important for doing those things server-side.
... Forms is the main thing.

TVR: XHTML2 has some things like navigation lists, and the section stuff.

DC: How does section stuff work?

TVR: lets you open and close a tree.

RL: What's really crucial is the XML for extensibility, and BTW we'd like to do that using CDF.

<Zakim> timbl_, you wanted to ask about "Instead, the charter calls for two equivalent serializations to be developed for HTML"

<Zakim> ht_mit, you wanted to share some information about John Cowan's tagsoup project

HT: Reminding that I have 5-10 minutes of intro on tag soup and how it works.

<Zakim> Noah, you wanted to comment on server-side only XHTML2

<scribe> scribenick: rhys

NM: Rhys said that we care about this stuff on the server. The discussion changes when you move to the server. Insofar as we have these compositions only at the server, we've lost

<scribe> scribenick: noah

TVR: I disagree that this is similar to JSP or ASP pages, because those will never run on the client.
... Running it only on the server is a bootstrapping mechanism.
... I was several months ago against tag soup because it kills that story.
... The notion that it can move from server to client is what matters.

TBL: Lots of content is moved on the wire as part of the server-side business of assembling content.

NM: I agree. The risk is that, if tag soup is the only thing that can go beyond the servers, then you will only get composition and extensibility at the server, which indeed would be unfortunate.

RL: BTW, I've offered to talk in future on Ubiquitous Web Applications Work.

TVR: Before lunch, we talked about writing a document about transition issues.

Raman shows a list of proposed topics:

<Raman> * TagSoup Issues

<Raman> This document will explore the issues that arise at the

<Raman> intersection of the TAG Soup and XML Web.

<Raman> As TagSoup evolves to enable incremental transition to XML, we

<Raman> identify individual differences in traditional XML 1.0

<Raman> serialization and TagSoup, and for each such instance, enumerate

<Raman> the pros and cons (carrot vs stick)

<Raman> driving that issue, how it affects various issues of deployment,

<Raman> and who might benefit from us writing down such a document. In

<Raman> addition, it would be useful for the TAG to arrive at a pithy

<Raman> conclusion for each point analogous to the assertion

<Raman> - If you're interested in extensibility, use XML serialization.

<Raman> * Topic List

<Raman> 1. Quotes around attributes.

<Raman> 1. Example use cases.

<Raman> 2. Situations that justify deviation.

<Raman> 3. Possible drawbacks with use of this deviation.

<Raman> 4. Suggested best practice.

<Raman> 2. Some tags are special =img= doesn't need close tag.

<Raman> 3. XML or HTML serialization from /show source/

<Raman> 4. Cut and paste between HTML and XML

<Raman> 5. Points on the HTML TAGSoup <-> XML continuum.

<Raman> 6. Integration of SVG, MathML etc into Web pages

<Raman> 7. Integration of HTML into RSS, ATOM.

<Raman> 8. Connection and impact on one-web.

1. Quotes around attributes.

TBL: This is a bug

TVR: What I'd imagine is a matrix that says, e.g. if you don't put quotes around attributes, you won't be able to mix it with SVG, except that in this case you can clean things up. I'll refactor the list as you suggest.

HT: Missing end tags fall into 2-3 categories: elements known to be empty, end tags that were optional in the old SGML DTD, and end tags that were known not to be optional.

TBL: Unknown tags, possibly with namespaces.

<DanC_lap> the high-level things like "Integration of HTML into RSS, ATOM" are more appealing to me than "Quotes around attributes."

HT: Hierarchically: unknown start tag.
... Under that, unknown namespace qualified start tag.

TVR: And lest we forget, free floating end tags not corresponding to a start.
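Illustration (not from the meeting): the deviations listed so far (unquoted attribute values, an `img` with no close tag, a free-floating end tag) can be seen in one small input. The sketch below uses Python's standard `html.parser` purely as an example of a tolerant tokenizer; it is not a browser and not any of the tools discussed, and the sample markup is invented.

```python
from html.parser import HTMLParser

class EventLogger(HTMLParser):
    """Record start and end tags exactly as the tolerant tokenizer reports them."""
    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

logger = EventLogger()
# Unquoted attributes, an unclosed <p>, an <img> with no end tag,
# and a stray </span> that matches no start tag.
logger.feed('<p class=intro>Logo: <img src=logo.png alt="W3C"></span>')
logger.close()
events = logger.events
```

The tokenizer reports the unquoted values and the stray `</span>` without complaint, and emits no end event for `p` or `img`; deciding what tree those events imply is left to a later stage.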

HT: This is a good template, at least as a general model, but let's not fill it in in detail for now.

<DanC_lap> (I realize why I have angst around TAG discussion of missing quotes and end tags... all these great examples and nobody's capturing them for the test suite.)

TVR: For the first bullet I gave subcategories. Can you think of subcats. for others?

HT: Yes, I'd like to see something that says at least hypothetically: "best possible argument in favor -- why do people do this?"
... e.g., I'd guess that most missing ";" at end of entity references are just typos, but others are done with conviction.
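Illustration (not from the meeting): the missing ";" case is easy to reproduce. A minimal sketch, assuming Python's `html.unescape` (which follows the HTML 5 character reference rules) as the tolerant consumer and `xml.etree.ElementTree` as the strict XML consumer; the sample text is invented.

```python
import html
import xml.etree.ElementTree as ET

# Tolerant decoding: the references lack their trailing semicolons.
tolerant = html.unescape("Fish &amp chips &copy 2007")
strict_input = html.unescape("Fish &amp; chips &copy; 2007")

# An XML parser treats the same omission as a well-formedness error.
try:
    ET.fromstring("<p>Fish &amp chips</p>")
    xml_accepts = True
except ET.ParseError:
    xml_accepts = False
```

Here `tolerant` and `strict_input` decode to the same string, while the XML parse fails, which is one concrete cell in the kind of matrix Raman proposes.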

SW: Question, am I right that this tag soup thing was not an intentional design, except as a consequence of the "be liberal in what you accept" philosophy?

HT: Not quite, the SGML DTD said "you may omit the following end tags..."

SW: In these charters, there's a common DOM, an XML serialization, and a tag soup serialization.

<timbl_> You could also omit quotes, no?

TVR: It's all well and good if you can clean up soupy input, but why would you reserialize as soup?

SW: Are we doing some of what the WG will do?

TVR: We are learning on our feet. What I want us to focus on is: how will anything we do in the soup world affect the intersection? I want to see ample communication with the TAG.

DC: The groups will do similar things, but with different focus and logistics.

NM: Some workgroups have been very effective in taking more time than is sometimes convenient to be very crisp about articulating use cases, getting everyone to agree on what was important about those use cases, and making sure the mechanisms supported the use cases. That, ideally, would be a good way to get people to make conscious decisions about where extensibility is of value and where not.

TVR: The functions and operators stuff was very well done that way, even though XForms didn't use it in the end.

TBL: One of the very important questions is whether valid XML with namespaces is a subset of the tag soup serialization.

DC: With namespaces?

TBL: Hmm, maybe using the default namespace.

TVR: Does it mean that a browser that consumes soup can necessarily consume valid XHTML with MathML?

TBL: Yes, especially if HTML is the default namespace, but the math stuff may not render right.

TVR: There's debate about that.

TBL: Today what's happening is that they'll ignore the namespaces and the math markup, but the math content will render, perhaps messily.

DC: The question is not ignoring unknown tags, it's what can you get at from Javascript. Sure you can stick in namespace decls, but can you get at them from Javascript.

TVR: Yes, what's in the DOM.
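Illustration (not from the meeting): Dan's point about what a script can get at can be made concrete by feeding the same XHTML plus MathML text to a namespace-unaware tokenizer and to an XML parser. This is a sketch under assumptions: Python's `html.parser` stands in for a soup consumer and `xml.etree.ElementTree` for an XML consumer; neither is a browser DOM, and the sample document is invented.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

SOURCE = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body><p>Area: '
    '<m:math xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mi>x</m:mi></m:math>'
    '</p></body></html>'
)

class StartTags(HTMLParser):
    """Collect (tag, attributes) pairs as a namespace-unaware tokenizer sees them."""
    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append((tag, dict(attrs)))

soup = StartTags()
soup.feed(SOURCE)
soup.close()
soup_names = [tag for tag, _ in soup.seen]

# The XML parser resolves prefixes to namespace names and consumes the xmlns declarations.
xml_names = [el.tag for el in ET.fromstring(SOURCE).iter()]
```

The soup view keeps `m:math` as an opaque tag name and `xmlns:m` as an ordinary attribute, while the XML view exposes `{http://www.w3.org/1998/Math/MathML}math`; the two consumers accept the same bytes but hand different structures to application code.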

NM: I was confused. You have now explained that in addition to the work being done on XHTML2, the HTML WG will take responsibility for two serializations, one XML-based and one soupy?

Several: yes.

NM: Thank you, I was confused. That's very helpful. I thought we had one serialization from HTML, one from XHTML. The clarification is: two from HTML itself, one soupy and one XML.

TBL: I think you'd probably need to use the XML serialization for namespace-qualified stuff.

DC: I'm not convinced folks in the HTML WG are fully bought into supporting namespaces at all.

HT: I think the existing drafts suggest it's possible.

<ht_mit> http://www.whatwg.org/specs/web-apps/current-work/

<ht_mit> HTML5 current draft

<ht_mit> Web Applications 1.0

<ht_mit> Working Draft — 6 March 2007

Working draft of HTML 5 (Web Applications 1.0): http://www.whatwg.org/specs/web-apps/current-work/

<ht_mit> http://home.ccil.org/~cowan/XML/tagsoup/tagsoup.pdf

From that: "Implementations that support XHTML5 must support some version of XML, as well as its corresponding namespaces specification, because XHTML5 uses an XML serialisation with namespaces. [XML] [XMLNAMES]"

We are discussing John Cowan's "TagSoup: A SAX parser in Java for nasty, ugly HTML" (http://home.ccil.org/~cowan/XML/tagsoup/tagsoup.pdf )

TVR: Recovers from lots of "errors" in the markup.

(The following is from the documentation on John Cowan's TagSoup):

"The HTML Scanner

• DOCTYPE declarations are ignored completely

• Consequently, external DTDs are not read

• Comments and processing instructions (ending in >, not ?>) are passed through to the application

• Entity references are expanded or turned into text"

"Element Rectification

• Rectification takes the incoming stream of starttags, end-tags, and character data and makes it well-structured

• TagSoup is essentially an HTML scanner plus a schema-driven element rectifier

• TagSoup uses its own schema language compiled into Schema objects"

"Parent Element Types

• Parent element types represent the most conservative possible parent of an element

• The schema gives a parent element type for each element type:

–The parent of BODY is HTML

–The parent of LI is UL

–The parent of #PCDATA is BODY"

HT: I think there's a meta annotation explicitly in the schema to declare the most conservative possible parent, at least in some cases where it can't be inferred.
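Illustration (not from the meeting): the "parent element type" idea quoted above can be sketched as a table-driven rectifier. This is a deliberately simplified sketch, not TagSoup's actual algorithm or schema language: the `PARENT` table holds only the three entries quoted from the slides plus assumed entries for `html` and `ul`, unknown elements are assumed to belong under `body`, and the function name `rectify` is hypothetical.

```python
# Most conservative parent for each element type. BODY, LI and #PCDATA come from
# the quoted slides; the "html" and "ul" entries are assumptions for this sketch.
PARENT = {"html": None, "body": "html", "ul": "body", "li": "ul", "#pcdata": "body"}

def rectify(tokens):
    """Turn a loose stream of (kind, value) tokens into well-nested events."""
    stack, out = [], []

    def ensure_parent(name):
        parent = PARENT.get(name, "body")   # assumed default for unknown names
        if parent is not None and parent not in stack:
            open_element(parent)

    def open_element(name):
        ensure_parent(name)                 # supply missing ancestors first
        stack.append(name)
        out.append(("start", name))

    for kind, value in tokens:
        if kind == "start":
            open_element(value)
        elif kind == "text":
            ensure_parent("#pcdata")
            out.append(("text", value))
        elif kind == "end" and value in stack:
            while True:                     # close everything down to the match
                name = stack.pop()
                out.append(("end", name))
                if name == value:
                    break
        # An end tag with no matching open element is silently dropped.
    while stack:                            # close whatever is still open
        out.append(("end", stack.pop()))
    return out

events = rectify([("start", "li"), ("text", "one"), ("end", "p")])
```

Given a bare `<li>`, the sketch supplies `html`, `body` and `ul` from the table, drops the unmatched `</p>`, and closes everything at end of input. It does not attempt sibling handling (a second `<li>` would nest inside the first), which is the kind of decision the real schema would have to drive.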

DC: It's a bit like Lex and Yacc, in that there is a scanner table and a parser table you can fool with.

HT: But the fixups are built into the Java code, though keyed off the schema. It so happened that the very first document I looked at happened to be one that neither John's Tag Soup nor Dave Raggett's Tidy could successfully handle. It was <center><tr> ... </center>. Both tools made the "mistake" of closing the table.

Scribe's note: in earlier informal discussions, it was observed that browsers ignore the <center>

HT: Possible fix is look ahead.

TVR: Maybe just throw away the <center> tag.

HT: Yes, you could probably do that with John's model. I'm thinking of something like a shift/reduce parser, but instead you get shift/ship-as-sax-event
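Illustration (not from the meeting): Raman's suggestion of simply discarding the `<center>` tag can be sketched as a filter over the token stream, applied before any rectification. The sketch assumes tokens are `(kind, name)` pairs and that "inside a table" is tracked with a simple depth counter; the function name is hypothetical and this is not how either TagSoup or Tidy is implemented.

```python
def drop_center_inside_table(tokens):
    """Remove <center> start and end tags that occur while a table is open."""
    table_depth = 0
    out = []
    for kind, name in tokens:
        if name == "table":
            if kind == "start":
                table_depth += 1
            else:
                table_depth = max(0, table_depth - 1)
        if name == "center" and table_depth > 0:
            continue                       # discard; table structure is untouched
        out.append((kind, name))
    return out

soup_tokens = [
    ("start", "table"), ("start", "center"), ("start", "tr"), ("start", "td"),
    ("end", "td"), ("end", "tr"), ("end", "center"), ("end", "table"),
]
cleaned = drop_center_inside_table(soup_tokens)
```

Because the `center` tokens never reach the rectifier, it has no reason to close the table early; a `center` outside any table passes through unchanged.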

<DanC_lap> (I wonder if ht is going to connect this to extensibility or something else beyond straight HTML parser design.)

HT: I'm experimenting with a system that uses John's tokenizer, and my own upper level. Wondering whether it can reconstruct the HTML 5 English language spec. Since that is sometimes described as a way of capturing the error recovery of today's browsers.

TVR: Sounds appealing. Trouble is likely to be where HTML 5 does backtracking...almost does an "unshift".

HT: I asked on the Tag Soup list whether John has a regression test suite. Elliotte suggested John has things, but got them from the Web, and it's likely there would be copyright problems in sharing it.

DC: Was waiting for you to relate this to extensibility. Our job is not to do a better job on HTML 5 than the WG is going to do.

HT: The TAG has a least power finding.

TVR: It's been suggested we should write a validator.

DC: Do you acknowledge that this could be seen as being rude, in that it's not our business as a workgroup to do this?

HT: Well, they have gone far down the road.

<scribe> scribenick: rhys

NM: I think there is a line to be walked and that we need to acknowledge Dan's concern about ownership. It's reasonable for people to be hesitant about the role of the TAG in this particular case, and others actually. The TAG should be careful and either contribute as individuals or learn from what is happening in particular working groups. It is appropriate for us to discuss all of this because it helps us learn what the issues are.

<ht_mit> http://lists.w3.org/Archives/Public/www-tag/2006Oct/0062.html

<scribe> scribenick: Noah

HT: In the statement of tagSoupIntegration-54 it says "Treat it "as if" it had been processed by [some formalization of] 'tidy -asxhtml';". I feel I'm exploring that. I think the reasoning is closely related to least power, but trying to make the story as declarative as possible.

<Zakim> DanC_lap, you wanted to ask if ht's explorations suggest anything about the @role situation or other extensibility cases

DC: Any new insights into what to do about role attribute?

HT: Don't think so, the Tag Soup program predates that.

DC: Suppose I want to use something like role without waiting to go through HTML WG. Want to get at it from Javascript.

HT: Posit that it's not in the HTML spec. I don't know what he does with unknown attributes. Seems to me that you should be able to control that in the formalization of the mapping.

DC: Good thing to study. Also think about simpler stuff like SVG elements.

HT: I think those would be passed through. The philosophy of Tag Soup is to pass through when possible. I suspect he passes through.

NW: My experience is a bit different. I had trouble with a bunch of RDDL. It munged the namespace declarations.

HT: There's something about that, and a switch.

DC: Sam Ruby, Ian Hixie and others are building a parsing library and 200 tests.
... Something like 2% of web documents use <image> spelled that way.

<DanC_lap> http://code.google.com/p/html5lib/

<DanC_lap> or 0.2%

SW: We have 15 mins to go. We have a set of points Raman has set down, now need a strategy moving forward.

NM: What are the success criteria for the list Raman is working on?

<DanC_lap> esp http://html5lib.googlecode.com/svn/trunk/tests/

TVR: I would like it to be the place holder document for tag soup issue 54.

NM: And is it the list of answers to some question? Things to worry about?

DC: Potential table of contents for a document.

NM: Works for me.

TVR: And a framework to govern our work.

<DanC_lap> TVR asked for DanC to work on it with him. DanC agreed.

TVR: Happy to do an initial draft, as long as people view it as fodder for discussion, not something to shred.

<scribe> ACTION: T.V. Raman to draft initial discussion material on tag soup for discussion on 26 March, draft on the 19th or so.

TVR: Public or private?

NM: Public. Just make sure it's clear that we're trying to come up to speed, not tread on other people's toes.

Future meetings

SW: Next telcon will be on the 12th of March.

DC: regrets for the 12th.

<DanC_lap> I'm at risk for 12 March; travelling to SxSWi

SW: won't have time for agenda work until just after arriving.

DC: What about discussing XML chunk whatever.

NW: Was going to ask to just close it.
... xmlChunk-44 was an attempt to tackle deep equals for XML. I now think we can't do better than XML Functions and Operators.
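Illustration (not from the meeting): a naive "deep equals" over XML trees shows why the question is harder than it looks. This is a sketch only, not the `deep-equal` function defined in Functions and Operators: treating attribute order as insignificant and all whitespace as significant are assumptions of the sketch, and each of them is exactly the kind of policy choice a finding would have had to settle.

```python
import xml.etree.ElementTree as ET

def deep_equal(a, b):
    """Structural comparison of two elements: names, attributes, text, children."""
    if a.tag != b.tag or a.attrib != b.attrib:          # attribute order ignored
        return False
    if (a.text or "") != (b.text or "") or (a.tail or "") != (b.tail or ""):
        return False                                     # whitespace is significant
    if len(a) != len(b):
        return False
    return all(deep_equal(x, y) for x, y in zip(a, b))

same = deep_equal(
    ET.fromstring('<a x="1" y="2"><b>t</b></a>'),
    ET.fromstring("<a y='2' x='1'><b>t</b></a>"),
)
differs_by_whitespace = deep_equal(
    ET.fromstring("<a><b>t</b></a>"),
    ET.fromstring("<a> <b>t</b></a>"),
)
```

The first pair compares equal despite different quoting and attribute order; the second pair differs only by a space and compares unequal under these assumptions.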

TBL: No communication from us.

NW: We always write a note when closing the issue.

DC: Garbage collect or endorse the draft.

NW: collect it.

SW: Objections?


<scribe> ACTION: Norm to mark as abandoned the finding on deep equals and announce xmlChunk-44 is being closed without further action, with reason

RESOLUTION: close issue xmlChunk-44

<DanC_lap> sounds like 12 March call is cancelled

<DanC_lap> RESOLVED: to meet next 19 March

RESOLUTION: the next TAG teleconference will be on 19 March 2007.

SW: Adjourned

Summary of Action Items

[NEW] ACTION: Norm to mark as abandoned the finding on deep equals and announce xmlChunk-44 is being closed without further action, with reason
[NEW] ACTION: T.V. Raman to draft initial discussion material on tag soup for discussion on 26 March, draft on the 19th or so.
[End of minutes]

Minutes formatted by David Booth's scribe.perl version 1.128 (CVS log)
$Date: 2007/03/16 15:54:51 $