<Noah> scribenick: noah
<scribe> scribe: Noah Mendelsohn
date: 7 March 2007
DC: I'm tempted to look at one particular issue: XHTML role and RDFa are both things people are trying to add without actually having to be in the loop with the main HTML work. Beneficiaries are people who want to create/use accessible AJAX applications, without having to get in the queue to get lots of features added to HTML. Are those good focal points for the discussion of extensibility?
<DanC_lap> ("which group?" is a non-trivial question)
DC: Yes, but at least they've shown the need is serious.
... So, maybe we just say to the HTML WG: this is important, go standardize it.
TVR: The fact that value is a QName will definitely cause concern.
... I can do anything I want using <div> and <span>
RL: I think I read in the vision document http://www.w3.org/2007/03/vision.html that extensibility was not considered appropriate for the HTML working group goals.
<Zakim> Rhys, you wanted to say that DIAL is the way to look at that for authoring and to
<DanC_lap> (re role use cases, this demo is interesting http://www.mozilla.org/access/dhtml/checkbox . see also ...)
<Zakim> DanC_lap, you wanted to suggest that the cost of the quotes is dominated by the norms in your community
DC: Cost of putting quotes is not about the space taken, typically, but social issues, what people in your community accept, etc.
... Interesting you read extensibility as a non-goal for the HTML WG. I don't think I was trying to say that.
<timbl_> Noah: What I read the vision document to say is that two working groups will share a model.
<scribe> scribenick: rhys
NM: I read the vision document to say that there were two working groups that share a model.
<timbl_> DC: I expect them to share XML.
DanC: I don't expect the groups to share a model.
HT: who is responsible for XHTML 1.1 maintenance?
DanC: It's a shared responsibility.
NM: (quotes the section of the vision document about the serialisations)
DanC: The two serialisations are in the HTML working group.
<scribe> scribenick: noah
NM: The vision document says:
"Instead, the charter calls for two equivalent serializations to be developed, corresponding to a single DOM (or infoset, though tag soup cannot be considered to have an infoset currently, while it can have a DOM). This ensures that decisions are not made which would not preclude an XML serialization. It allows the two serializations to be inter-converted automatically. Having new language features, there is an incentive for content authors to use it; and having client-side implementations means that there is the possibility to really use it."
NM: I read that as saying that for any given abstract document, at least in the 80% case, the DOM would be the same.
<Zakim> Noah, you wanted to ask about goals of for HTML
DC: No, it didn't mean to say that.
HT: I'm unclear whether, when the vision document says HTML without qualification, it means all of the several variants or specifically one (scribe infers the tag soup version).
DC: We'll know in a few months.
<DanC_lap> (I hope)
(TimBL edits to say "Instead, the charter calls for two equivalent serializations to be developed by the HTML WG, ...")
Scribe's note: the final text of the vision document was still being revised while the TAG meeting was in progress. Tim noted above that he made an edit to help eliminate the ambiguity that had just been discussed.
RL: Before lunch we were talking about mobile, especially device independence markup. DIAL is one approach to solving that problem, not having to do redirects, having one representation, etc.
DC: What do we have on DIAL?
RL: There's a WD.
DC: Does it tell a story?
RL: There's a primer.
RL: It's XHTML2 + XForms + Some other modules
<timbl_> XHTML2 + XFORMS + some other modules
<ht_mit> DIAL stands for Device Independent Authoring Language
RL: These stories are based on actual commercial usage, from vendors, network operators, content
... Dirk wishes to create a web site viewable on, say, any web device including mobile.
DC: Which kind of org. does he work for?
RL: Say, network operators, content partners (e.g. Disney/eBay), other sites. Maybe it helps you get ringtones for your device.
... Dirk writes some DIAL as his markup. It's constructed to avoid device dependency?
DC: He uses emacs?
RL: Probably DIAL-ware tools.
RL: Yes, e.g. from Volantis (chuckle). [Volantis is Rhys' employer.]
DC: Direct manipulation?
RL: Typically mixed, XML editor with some help around the edges.
... If the markup is only the device independent stuff, then the device-specific stuff has to go somewhere, and has to be worth the incremental trouble.
... Example: companies may not trust transcoding of their logo images.
... So, there are ways of linking to the device dependent stuff. This is generic resources.
... The reference is device independent, but the infrastructure serves the right thing.
DC: Mostly deployed server-side?
RL: Yes, mostly.
... Opera mobile is an interesting example. It can do some level of rearrangement and transcoding on the device for a standard HTML page, but it can tend to be less successful insofar as the HTML they're starting with has already lost some information about the intent.
... Eventually, more will happen on the client, but there's a risk you send the device images, etc. it won't need.
... I see XHTML2 as being important for doing those things server-side.
... Forms is the main thing.
TVR: XHTML2 has some things like navigation lists, and the section stuff.
DC: How does section stuff work?
TVR: lets you open and close a tree.
RL: What's really crucial is the XML for extensibility, and BTW we'd like to do that using CDF.
<Zakim> timbl_, you wanted to ask about "Instead, the charter calls for two equivalent serializations to be developed for HTML"
<Zakim> ht_mit, you wanted to share some information about John Cowan's tagsoup project
HT: Reminding that I have 5-10 minutes of intro on tag soup and how it works.
<Zakim> Noah, you wanted to comment on server-side only XHTML2
<scribe> scribenick: rhys
NM: Rhys said that we care about this stuff on the server. The discussion changes when you move to the server. Insofar as we have these compositions only at the server, we've lost
<scribe> scribenick: noah
TVR: I disagree that this is similar to JSP or ASP pages, because those will never run on the
... Running it only on the server is a bootstrapping mechanism.
... I was several months ago against tag soup because it kills that story.
... The notion that it can move from server to client is what matters.
TBL: Lots of content is moved on the wire as part of the server-side business of assembling content.
NM: I agree. The risk is that, if tag soup is the only thing that can go beyond the servers, then you will only get composition and extensibility at the server, which indeed would be unfortunate.
RL: BTW, I've offered to talk in future on Ubiquitous Web Applications work.
TVR: Before lunch, we talked about writing a document about transition issues.
Raman shows a list of proposed topics:
<Raman> * TagSoup Issues
<Raman> This document will explore the issues that rise at the
<Raman> intersection of the TAG Soup and XML Web.
<Raman> As TagSoup evolves to enable incremental transition to XML, we
<Raman> identify individual differences in traditional XML 1.0
<Raman> serialization and TagSoup, and for each such instance, enumerate
<Raman> the pros and cons (carrot vs stick)
<Raman> driving that issue, how it affects various issues of deployment,
<Raman> and who might benefit from us writing down such a document. In
<Raman> addition, it would be useful for the TAG to arrive at a pithy
<Raman> conclusion for each point analogous to the assertion
<Raman> - If you're interested in extensibility, use XML serialization.
<Raman> * Topic List
<Raman> 1. Quotes around attributes.
<Raman> 1. Example use cases.
<Raman> 2. Situations that justify deviation.
<Raman> 3. Possible drawbacks with use of this deviation.
<Raman> 4. Suggested best practice.
<Raman> 2. Some tags are special =img= doesn't need close tag.
<Raman> 3. XML or HTML serialization from /show source/
<Raman> 4. Cut and paste between HTML and XML
<Raman> 5. Points on the HTML TAGSoup <-> XML continuum.
<Raman> 6. Integration of SVG, MathML etc into Web pages
<Raman> 7. Integration of HTML into RSS, ATOM.
<Raman> 8. Connection and impact on one-web.
1. Quotes around attributes.
TBL: This is a bug
TVR: What I'd imagine is a matrix that says, e.g. if you don't put quotes around attributes, you won't be able to mix it with SVG, except that in this case you can clean things up. I'll refactor the list as you suggest.
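Editorial sketch, not part of the discussion: one cell of the matrix TVR describes can be checked mechanically. The snippet below assumes Python's standard-library `html.parser` as a stand-in for a lenient soup consumer and `xml.etree.ElementTree` as the strict XML consumer; neither is a browser, and the helper names `html_attrs` and `xml_accepts` are made up for this illustration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

SOUP = "<p class=intro>hello</p>"       # unquoted attribute value
CLEAN = '<p class="intro">hello</p>'    # quoted attribute value


class AttrCollector(HTMLParser):
    """Record (tag, attrs) for every start tag the lenient parser reports."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append((tag, dict(attrs)))


def html_attrs(markup):
    """Parse markup leniently and return the start tags that were seen."""
    collector = AttrCollector()
    collector.feed(markup)
    collector.close()
    return collector.seen


def xml_accepts(markup):
    """Return True if a strict XML parser considers the markup well-formed."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    for label, markup in (("soup", SOUP), ("clean", CLEAN)):
        print(label, html_attrs(markup), xml_accepts(markup))
```

Under these assumptions the lenient parser reports the same attribute for both inputs, while only the quoted form can be handed to XML tooling without a cleanup step first.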
HT: Missing end tags fall into 2-3 categories: known to be empty, optional in the old SGML DTD, and known not to be optional.
TBL: Unknown tags, possibly with namespaces.
<DanC_lap> the high-level things like "Integration of HTML into RSS, ATOM" are more appealing to me than "Quotes around attributes."
HT: Hierarchically: unknown start
... Under that, unknown namespace qualified start tag.
TVR: And lest we forget, free floating end tags not corresponding to a start.
HT: This is a good template, at least as a general model, but let's not fill it in in detail for now.
<DanC_lap> (I realize why I have angst around TAG discussion of missing quotes and end tags... all these great examples and nobody's capturing them for the test suite.)
TVR: For the first bullet I gave subcategories. Can you think of subcats. for others?
HT: Yes, I'd like to see something that says at least hypothetically: "best possible argument in favor -- why do people do this?"
... e.g., I'd guess that most missing ";" at end of entity references are just typos, but others are done with conviction.
SW: Question, am I right that this tag soup thing was not an intentional design, except as a consequence of the "be liberal in what you accept" philosophy?
HT: Not quite, the SGML DTD said "you may omit the following end tags..."
SW: In these charters, there's a common DOM, an XML serialization, and a tag soup serialization.
<timbl_> You could also omit quotes, no?
TVR: It's all well and good if you can clean up soupy input, but why would you reserialize as soup?
SW: Are we doing some of what the WG will do?
TVR: We are learning on our feet. What I want us to focus on is: how will anything we do in the soup world affect the intersection? I want to see ample communication with the TAG.
DC: The groups will do similar things, but with different focus and logistics.
NM: Some workgroups have been very effective in taking more time than is sometimes convenient to be very crisp about articulating use cases, getting everyone to agree on what was important about those use cases, and make sure the mechanisms supported the use cases. That, ideally, would be a good way to get people to make conscious decisions about where extensibility is of value and where not.
TVR: The functions and operators stuff was very well done that way, even though XForms didn't use it in the end.
TBL: One of the very important questions is whether valid XML with namespaces is a subset of the tag soup serialization.
DC: With namespaces?
TBL: Hmm, maybe using the default namespace.
TVR: Does it mean that a browser that consumes soup can necessarily consume valid XHTML with MathML?
TBL: Yes, especially if HTML is default namespace, and the math stuff may not render right.
TVR: There's debate about that.
TBL: Today what's happening is that they'll ignore the namespaces and the math markup, but the math content will render, perhaps messily.
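Editorial sketch, not part of the discussion: the behaviour TBL describes (namespaces ignored, text content still surfacing) can be imitated with a lenient parser. This assumes Python's standard-library `html.parser` as a stand-in for a soup consumer, which is not how any particular browser is implemented; the sample document and the helper names `soup_view` and `xml_view` are invented for the illustration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# XHTML as the default namespace, with a MathML island inside it.
DOC = ('<p xmlns="http://www.w3.org/1999/xhtml">sum: '
       '<math xmlns="http://www.w3.org/1998/Math/MathML"><mi>x</mi></math></p>')


class Soup(HTMLParser):
    """Collect tag names and character data, ignoring namespaces entirely."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        # xmlns arrives here as an ordinary attribute and is simply dropped.
        self.tags.append(tag)

    def handle_data(self, data):
        self.text.append(data)


def soup_view(markup):
    """What a namespace-unaware consumer sees: bare names plus the text."""
    parser = Soup()
    parser.feed(markup)
    parser.close()
    return parser.tags, "".join(parser.text)


def xml_view(markup):
    """What a namespace-aware XML consumer sees: expanded element names."""
    return [element.tag for element in ET.fromstring(markup).iter()]


if __name__ == "__main__":
    print(soup_view(DOC))
    print(xml_view(DOC))
```

In the soup view `math` and `mi` are just unknown element names and the `x` still appears in the text; only the XML view can tell that they belong to MathML.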
TVR: Yes, what's in the DOM.
NM: I was confused. You have now explained that in addition to the work being done on XHTML2, the HTML WG will take responsibility for two serializations, one XML-based and one soupy?
NM: Thank you, I was confused. That's very helpful. I thought we had one serialization from HTML, one from XHTML. The clarification is: two from HTML itself, one soupy and one XML.
TBL: I think you'd probably need to use the XML serialization for namespace-qualified stuff.
DC: I'm not convinced folks in the HTML WG are fully bought into supporting namespaces at all.
HT: I think the existing drafts suggest it's possible.
<ht_mit> HTML5 current draft
<ht_mit> Web Applications 1.0
<ht_mit> Working Draft — 6 March 2007
Working draft of HTML 5 (Web Applications 1.0): http://www.whatwg.org/specs/web-apps/current-work/
From that: "Implementations that support XHTML5 must support some version of XML, as well as its corresponding namespaces specification, because XHTML5 uses an XML serialisation with namespaces. [XML] [XMLNAMES]"
We are discussing John Cowan's "TagSoup: A SAX parser in Java for nasty, ugly HTML" (http://home.ccil.org/~cowan/XML/tagsoup/tagsoup.pdf )
TVR: Recovers from lots of "errors" in the markup.
(The following is from the documentation on John Cowan's TagSoup):
"The HTML Scanner
• DOCTYPE declarations are ignored completely
• Consequently, external DTDs are not read
• Comments and processing instructions (ending in >, not ?>) are passed through to the application
• Entity references are expanded or turned into text"
• Rectification takes the incoming stream of start-tags, end-tags, and character data and makes it well-structured
• TagSoup is essentially an HTML scanner plus a schema-driven element rectifier
• TagSoup uses its own schema language compiled into Schema objects"
"Parent Element Types
• Parent element types represent the most conservative possible parent of an element
• The schema gives a parent element type for each element type:
–The parent of BODY is HTML
–The parent of LI is UL
–The parent of #PCDATA is BODY"
HT: I think there's a meta annotation explicitly in the schema to declare the most conservative possible parent, at least in some cases where it can't be inferred.
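Editorial sketch, not part of the discussion: the "most conservative parent" idea quoted above can be illustrated with a small lookup table. This is not TagSoup's code or API. The `PARENT` table holds the three pairs from the quoted slides plus one assumed entry (`ul` under `body`), and the function name `missing_ancestors` is hypothetical. It covers only the step of supplying ancestors that are not yet open; closing elements and checking what a parent may contain are left out.

```python
# Conservative-parent table. The body, li and #pcdata entries come from the
# quoted documentation; the "ul" entry is an assumption added so that the
# chain from li reaches the root.
PARENT = {
    "body": "html",
    "li": "ul",
    "#pcdata": "body",
    "ul": "body",   # assumed, not in the quoted text
}


def missing_ancestors(name, open_stack):
    """Return the elements to open (outermost first) before `name` may start.

    Walk up the conservative-parent chain until reaching an element that is
    already open or the root, which has no parent entry.
    """
    chain = []
    parent = PARENT.get(name)
    while parent is not None and parent not in open_stack and parent not in chain:
        chain.append(parent)
        parent = PARENT.get(parent)
    return list(reversed(chain))


if __name__ == "__main__":
    # A bare <li> at the very start of a document.
    print(missing_ancestors("li", []))
    # The same <li> when <html><body> are already open.
    print(missing_ancestors("li", ["html", "body"]))
```

A rectifier along these lines would emit a start-tag event for each returned name before passing the original token through.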
DC: It's a bit like Lex and Yacc, in that there is a scanner table and a parser table you can fool with.
HT: But the fixups are built into the Java code, though keyed off the schema. It so happened that the very first document I looked at was one that neither John's TagSoup nor Dave Raggett's Tidy could successfully handle. It was <center><tr> ... </center>. Both tools made the "mistake" of closing the table.
Scribe's note: in earlier informal discussions, it was observed that browsers ignore the <center>
HT: Possible fix is look ahead.
TVR: Maybe just throw away the <center> tag.
HT: Yes, you could probably do that with John's model. I'm thinking of something like a shift/reduce parser, but instead you get shift/ship-as-sax-event
<DanC_lap> (I wonder if ht is going to connect this to extensibility or something else beyond straight HTML parser design.)
HT: I'm experimenting with a system that uses John's tokenizer, and my own upper level. Wondering whether it can reconstruct the HTML 5 English language spec. Since that is sometimes described as a way of capturing the error recovery of today's browsers.
TVR: Sounds appealing. Trouble is likely to be where HTML 5 does backtracking...almost does an "unshift".
HT: I asked on the Tag Soup list whether John has a regression test suite. Elliotte suggested John has things, but got them from the Web, and it's likely there would be copyright problems in sharing it.
DC: Was waiting for you to relate this to extensibility. Our job is not to do a better job on HTML 5 than the WG is going to do.
HT: The TAG has a least power finding.
TVR: It's been suggested we should write a validator.
DC: Do you acknowledge that this could be seen as being rude, in that it's not our business as a workgroup to do this?
HT: Well, they have gone far down the road.
<scribe> scribenick: rhys
NM: I think there is a line to be walked and that we need to acknowledge Dan's concern about ownership. It's reasonable for people to be hesitant about the role of the TAG in this particular case, and others actually. The TAG should be careful and either contribute as individuals or learn from what is happening in particular working groups. It is appropriate for us to discuss all of this because it helps us learn what the issues are.
<scribe> scribenick: Noah
HT: In the statement of tagSoupIntegration-54 it says "Treat it "as if" it had been processed by [some formalization of] 'tidy -asxhtml';". I feel I'm exploring that. I think the reasoning is closely related to least power, but trying to make the story as declarative as possible.
<Zakim> DanC_lap, you wanted to ask if ht's explorations suggest anything about the @role situation or other extensibility cases
DC: Any new insights into what to do about role attribute?
HT: Don't think so, the Tag Soup program predates that.
HT: Posit that it's not in the HTML spec. I don't know what he does with unknown attributes. Seems to me that you should be able to control that in the formalization of the mapping.
DC: Good thing to study. Also think about simpler stuff like SVG elements.
HT: I think those would be passed through. The philosophy of Tag Soup is to pass through when possible. I suspect he passes through.
NW: My experience is a bit different. I had trouble with a bunch of RDDL. It munged the namespace declarations.
HT: There's something about that, and a switch.
DC: Sam Ruby, Ian Hixie and others are building a parsing library and 200 tests.
... Something like 2% of web documents use <image> spelled that way.
<DanC_lap> or 0.2%
SW: We have 15 mins to go. We have a set of points Raman has set down, now need a strategy moving forward.
NM: What are the success criteria for the list Raman is working on?
<DanC_lap> esp http://html5lib.googlecode.com/svn/trunk/tests/
TVR: I would like it to be the place holder document for tag soup issue 54.
NM: And is it the list of answers to some question? Things to worry about?
DC: Potential table of contents for a document.
NM: Works for me.
TVR: And a framework to govern our work.
<DanC_lap> TVR asked for DanC to work on it with him. DanC agreed.
TVR: Happy to do an initial draft, as long as people view it as fodder for discussion, not something to shred.
<scribe> ACTION: T.V. Raman to draft initial discussion material on tag soup for discussion on 26 March, draft on the 19th or so.
TVR: Public or private?
NM: Public. Just make sure it's clear that we're trying to come up to speed, not tread on other people's toes.
SW: Next telcon will be on the 12th of March.
DC: regrets for the 12th.
<DanC_lap> I'm at risk for 12 March; travelling to SxSWi
SW: won't have time for agenda work until just after arriving.
DC: What about discussing XML chunk whatever.
NW: Was going to ask to just close
... xmlChunk-44 was an attempt to tackle deep equals for XML. I now think we can't do better than XML Functions and Operators.
TBL: No communication from us.
NW: We always write a note when closing the issue.
DC: Garbage collect or endorse the draft.
NW: collect it.
<scribe> ACTION: Norm to mark as abandoned the finding on deep equals and announce xmlChunk-44 is being closed without further action, with reason
RESOLUTION: close issue xmlChunk-44
<DanC_lap> sounds like 12 March call is cancelled
<DanC_lap> RESOLVED: to meet next 19 March
RESOLUTION: the next TAG teleconference will be on 19 March 2007.