HTML/XML Task Force

Meeting 3, 11 Jan 2011


See also: IRC log


Present: Norm, John, Yves, Michael Champion, Michael Kay, Noah, Henri
Regrets: James, Anne
Chair: Norm Walsh
Scribe: Noah Mendelsohn, NM



NW: Next call will be in a week, on 18 January. Any regrets?


topics: Use cases

<hsivonen> Use case email was http://lists.w3.org/Archives/Public/public-html-xml/2010Dec/0064.html

NW: I've been somewhat out of touch, but have seen at least two interesting email threads: 1) xml in feeds and 2) how to detect html5


NW: Well, some thread subjects said XML

<hsivonen> we covered use cases 1 and 2. We didn't cover 3 and 4

Use case #3, islands of HTML5-marked prose

From the email description of the use case:

3. I have an XML document and I want to embed islands of human prose marked up with HTML5 in it because I want to be able to extract those sections for use in, for example, documentation.

JC: In that environment, we don't have an HTML5 DOM, I think, so we don't have to deal with inconsistent DOMs

NW: Yes, mainly XML tools for this case.
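A minimal sketch of what "XML tools for this case" could look like, assuming the prose islands are written as namespaced XHTML inside the host document. The host element names (`api`, `function`) and the helper `extract_prose_islands` are hypothetical and were not discussed on the call.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical host document: <api> and <function> are made-up element names.
SOURCE = """\
<api xmlns:h="http://www.w3.org/1999/xhtml">
  <function name="connect">
    <h:div><h:p>Opens a <h:em>new</h:em> connection.</h:p></h:div>
  </function>
  <function name="close">
    <h:div><h:p>Closes it.</h:p></h:div>
  </function>
</api>
"""


def extract_prose_islands(xml_text):
    """Return the outermost XHTML-namespaced elements found in the document."""
    root = ET.fromstring(xml_text)
    islands = []

    def walk(elem):
        if elem.tag.startswith("{" + XHTML_NS + "}"):
            islands.append(elem)  # an island: keep it whole, do not descend
            return
        for child in elem:
            walk(child)

    walk(root)
    return islands


islands = extract_prose_islands(SOURCE)
# Flatten each island to plain text, e.g. for a documentation index.
texts = ["".join(island.itertext()).strip() for island in islands]
print(texts)
```

Only an XML parser is involved here, so there is a single tree model and no HTML5 DOM to reconcile it with.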

JC: What limitations are there on HTML5? E.g., I know about noscript.

NW: (missed something about semantics) I was thinking about things like HTML5 rules that automatically add namespaces to SVG, and that won't happen in an XML toolchain.
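The namespace point can be illustrated with a small sketch, assuming Python's standard `xml.etree.ElementTree` as a stand-in for "an XML toolchain". An XML parser reports only the namespaces that are declared in the source, whereas an HTML5 parser places `svg` and its descendants in the SVG namespace without any declaration.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# No xmlns declaration: an XML parser leaves these elements in no namespace.
bare = ET.fromstring("<div><svg><circle r='1'/></svg></div>")

# Explicit declaration: only now does the XML parser report the SVG namespace.
declared = ET.fromstring(
    "<div><svg xmlns='http://www.w3.org/2000/svg'><circle r='1'/></svg></div>"
)

bare_tags = [e.tag for e in bare.iter()]
declared_tags = [e.tag for e in declared.iter()]
print(bare_tags)
print(declared_tags)
```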

JC: The XHTML5 elements mean the same as their like-named counterparts in HTML5, with the exception of NOSCRIPT

HS: Yes, and also ISINDEX

NW: Why?

HS: Those are both sort of parser-managed things on the HTML side. ISINDEX is a sort of parser macro: it is invalid in HTML5, and it expands into other elements like a macro. NOSCRIPT depends on the context.

JC: How it's parsed depends on whether you have scripting.

NW: Thanks, good to know. Sounds like it's safe to set aside ISINDEX. Less sure about NOSCRIPT, but likely at worst a minor problem.

Use case #4, HTML document with islands of XML

From the use case email

4. I have an HTML5 document and I want to embed islands of XML in it because I want to be able to write JavaScript and CSS to manipulate those elements, for example, in the browser.

NW: The HTML5 parser won't do the same thing as XML would if the element names are in the HTML5 language.
... I believe that the only workaround is to put the XML in a <SCRIPT> element; that gives you the XML in an escaped node.

MK: Or download the XML separately.

HS: The text node will have the text unescaped.

NW: Oh, OK, yes. If serialized then escaped, but in the node it's not.
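A rough sketch of the point about the text node, using Python's `html.parser` as a stand-in for a browser's parser (an assumption; it is not an HTML5 tree builder, but it likewise hands over `<script>` content as raw text). The page content and the `application/xml` type on the block are made up for illustration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class ScriptText(HTMLParser):
    """Collect the raw character data found inside <script> elements."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if self.in_script:
            self.chunks.append(data)


# Hypothetical page with one XML island inside a script element.
PAGE = """<!DOCTYPE html>
<html><body>
<script type="application/xml"><order><item>Nuts &amp; bolts</item></order></script>
</body></html>"""

parser = ScriptText()
parser.feed(PAGE)
parser.close()

# The script content arrives as literal source text: markup is not turned
# into elements and "&amp;" is not decoded by the HTML side.
raw = "".join(parser.chunks)

# Handing that text to an XML parser is what finally interprets it as XML.
order = ET.fromstring(raw)
item_text = order.find("item").text
print(raw)
print(item_text)
```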

NM: The XML need not be for manipulation only in Javascript/CSS, you may also or instead want to manipulate it in XML (or HTML) tools at the server, or conceivably elsewhere on a client.

HS: The script element trick works for all languages, so XML is being treated as a special case.

NM: Yes, and there are arguments pro and con as to whether that makes sense. HTML and XML have a long history together, and this task force is focused on exploring synergies.

JC: Just use XHTML?

NM: Yes, but we always get back to the huge install base that runs best with text/html

<darobin> "The script element allows authors to include dynamic script and data blocks in their documents. The element does not represent content for the user."

NW: I find the uniformity of treatment of all languages by SCRIPT to be appealing.

<Norm> I'm not sure I went so far as to say that I found it appealing, but ...

NM: So, I'm a little troubled by the fact that <SCRIPT> tags have mandated processing in the case there's a script there. What if the script is media type application/xml?

JC: Not troubled by that. You'll use something like application/xslt+xml if you want your XML interpreted as (in this example) an XSLT script.
... Historically, media type is what to do with it, not what it is.

NM: I strongly disagree with that.

JC: Oh, I mean in HTML

NM: Specifically on the SCRIPT tag

NM: I'd prefer to associate the processing rules with the spec for the SCRIPT tag

JC: What does the HTML5 spec say?

HS: I agree with Noah that in principle there's an architectural issue; in practice the set of languages supported in browsers is small and slowly growing. So far none in XML. If necessary, any such new XML scripting language could get a more specific type.

Speaking for myself: OK, maybe the HTML5 spec should say what Henri just said.


HS: They don't support it in <SCRIPT>, and it doesn't make much sense to do so.

JS: I understand this isn't likely to happen, but not sure why it wouldn't make sense.

HS: Script processing starts when end tag </script> is parsed, and you only have a partial DOM. Seems not to make sense to do XSLT then. Hmm, but a DEFER script could make sense I guess.

JC: Could run multiple successively.

MK: Some of my points have been partly covered. There are a lot of potential XSLT processing scenarios, many of which can't be captured by <script type="..xslt type..">
... E.g. when to run, what the input is, whether there's more than one script, etc., parms, etc.
... Relying on one attribute seems insufficiently extensible. Henri reinforces that when he says "won't happen in next year, therefore uninteresting". Seems the wrong way to architect. We should look further into the future, to when Javascript seems as old-fashioned as COBOL. The world is dynamic.

JC: Propose we add embedded XSLT as another use case.

NW: +1

NM: Too bad we're leaving this behind so quickly. The purpose of our group is to maximize HTML/XML synergies, and for >certain< purposes XSLT is a terrific language for HTML scripting

HS: There is some implementation in the runtimes for giving the HTML DOM as input to XSLT processing (scribe isn't sure he got this right)
... The XSLT program can be put in a script element, and you use bootstrapping Javascript that compiles the XSLT program and chooses an input tree to give to that program.
... You can put the output in the DOM.

<Zakim> noah, you wanted to talk about circularity

MK: Yes, we've seen the folks at ETH Zurich do just that, using two <SCRIPT> elements, one javascript and one XQuery. The former looks for and runs the latter.
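The bootstrapping pattern described here can be sketched as follows, modelled in Python rather than in browser Javascript. The `application/xquery` type string, the `bootstrap` helper, and the stand-in processor are all assumptions made for illustration; they are not taken from the ETH Zurich work.

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect (type, text) for every <script> element in a page."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._type = None
        self._chunks = None

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            # A missing type attribute is treated as Javascript.
            self._type = (dict(attrs).get("type") or "text/javascript").lower()
            self._chunks = []

    def handle_data(self, data):
        if self._chunks is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._chunks is not None:
            self.blocks.append((self._type, "".join(self._chunks)))
            self._chunks = None


def bootstrap(page, processors):
    """Find script blocks whose type has a registered processor and run it."""
    collector = ScriptCollector()
    collector.feed(page)
    collector.close()
    results = []
    for script_type, text in collector.blocks:
        handler = processors.get(script_type)
        if handler is not None:  # blocks of any other type are left alone
            results.append(handler(text))
    return results


# Hypothetical page: one Javascript block (the bootstrapper) and one query block.
PAGE = """<script type="text/javascript">/* bootstrap code would live here */</script>
<script type="application/xquery">count(//p)</script>"""

# Stand-in processor: a real one would compile and evaluate the query.
results = bootstrap(PAGE, {"application/xquery": lambda src: ("xquery", src.strip())})
print(results)
```

The dispatch is keyed on the `type` attribute alone, which is the single point of extensibility that the discussion above questions.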

HS: The set of programming languages supported natively by browsers has always been "1" across multiple browsers, that is Javascript. Internet Explorer has for years also supported VBScript. In IE, there are also good extensibility APIs that allow languages to be plugged in.
... Gecko allows some extensibility, but for various reasons only for local content.
... Anyway, the trend is toward focus on Javascript only, and viewing that as a compiler target for other languages. That said, there is precedent for having other languages.
... You cannot ever use type="text/vbscript" for data, because there exists a browser that would attempt to execute it.

NM: BINGO! That's why I don't much like using the <SCRIPT> tag for data.

NW: That is astonishingly unsatisfying. It would make much more sense to add a new <DATA> element, without the risk that IE would later decide that type="application/fribble" would launch missiles.

HS: The reason it's called SCRIPT and not DATA is that there are only a handful of elements that don't try to parse their content.
... If we introduce something called <DATA>, it would be incompatible with the install base of browsers.

NM: What about <script type="xxxx" mode="NORUN">?

HS: So, the pattern is formalized in HTML5. An alternative is using <STYLE>. Another is <XMP>, but that's not hidden by default.

NW: Yeah, I forgot the compatibility problem.

<Zakim> noah, you wanted to ask about NORUN attribute

<hsivonen> existing browsers wouldn't honor NORUN

<jcowan> Announcement: I'm working on a MicroXML parser/DOM called MicroLark (homage to Tim's Lark parser from the early days of XML)

NM: I think a new attribute would have fewer problems BUT: I admit that it would be at best eliminating future problems, and then only rarely. The advantage would be architectural robustness. It appeals to me intellectually, but I suspect that even if built it would be used only sometimes.


Summary of Action Items

[End of minutes]

Minutes formatted by David Booth's scribe.perl version 1.135 (CVS log)
$Date: 2011/01/17 18:13:18 $