Re: Disclaimer for Authoritative Metadata (ACTION-793) from Bjoern Hoehrmann on 2013-04-09 (www-tag@w3.org from April 2013)

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Tue, 09 Apr 2013 02:37:29 +0200
To: Larry Masinter <masinter@adobe.com>
Cc: "www-tag.w3.org" <www-tag@w3.org>
Message-ID: <5dk6m8pv0o5vqe0dlfeknr184ns7h6gbht@hive.bjoern.hoehrmann.de>

* Larry Masinter wrote:
>seriously, shouldn't XSLT coerce/convert its input to XML or an XML DOM 
>rather than force its input to be treated as XML no matter what it looks 
>like or how it is labeled?

If it does not look like XML on the inside, then there would be an error
and normal processing would be aborted; and the envelope does not say if
there is XML inside or not, except for some special cases. Sure, for the
`image/png` case you could say that's not XML, and for `application/xml`
you could say it is, even back in 1999, but `text/xsl`, `image/svg-xml`,
and many others make that a hard problem, especially because the `+xml`
convention did not exist back when XSLT 1.0 became a Recommendation.

XSLT 2.0 also has an `unparsed-text(...)` function that would retrieve a
resource and turn it into a Unicode string value. Media types also don't
tell whether a resource is "text" in that sense. As an example, JSON is
text in a compatible sense, but `application/json` does not indicate it
in any way, and `image/png` does not indicate that it isn't text in that
sense. So failing on resources that aren't text while succeeding on ones
that are would require a lot of effort.

XSLT could also have made a `retrieve-bytes(...)` function, and two more
functions `turn-bytes-into-plaintext(...)` and `parse-bytes-as-xml(...)`
which apart from character encoding and other encoding details (that are
not very relevant in practice) would compose the two problem functions,

  document(...) = parse-bytes-as-xml(retrieve-bytes(...))
  unparsed-text(...) = turn-bytes-into-plaintext(retrieve-bytes(...))

and then what? The functions on the right hand side are independently
useful, there isn't much of a reason why XSLT should not have them, but
then the blame for treating `image/png` as XML would fall on authors of
XSLT documents. But for them it is even more implausible to research and
implement all the things needed to make sure their code fails when it
ought to fail.
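
A minimal Python sketch of that decomposition, to make the point concrete.
Everything here is an assumption for illustration: the in-memory
`RESOURCES` table stands in for actual retrieval, the URIs are made up,
and the Python function names merely mirror the hypothetical XSLT
functions above. Note that nothing in the composition ever consults the
media type label.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the network: URI -> (media type label, body).
# The labels are deliberately "wrong" to show that they play no role.
RESOURCES = {
    "http://example.org/a": ("image/png", b"<doc><item/></doc>"),
    "http://example.org/b": ("application/xml", b"\x89PNG\r\n\x1a\n"),
}


def retrieve_bytes(uri):
    """Return the raw body; the media type label is ignored on purpose."""
    _media_type, body = RESOURCES[uri]
    return body


def parse_bytes_as_xml(data):
    """Parse bytes as XML; raises ET.ParseError if not well-formed."""
    return ET.fromstring(data)


def turn_bytes_into_plaintext(data, encoding="utf-8"):
    """Decode bytes to a string; raises UnicodeDecodeError on bad input."""
    return data.decode(encoding)


def document(uri):
    # document(...) = parse-bytes-as-xml(retrieve-bytes(...))
    return parse_bytes_as_xml(retrieve_bytes(uri))


def unparsed_text(uri):
    # unparsed-text(...) = turn-bytes-into-plaintext(retrieve-bytes(...))
    return turn_bytes_into_plaintext(retrieve_bytes(uri))
```

With these definitions the resource labeled `image/png` parses fine as
XML, and the one labeled `application/xml` fails only because its bytes
are not well-formed. Any check against the label would have to be added
by whoever calls these functions.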

My http://search.cpan.org/perldoc?HTML::Encoding sidesteps the issue by
allowing callers to pass in regular expressions for things like `is_xml`
with a vaguely reasonable default (is `example/xml` XML?). That could
have been done for XSLT as well, so you would have to explicitly ask for
a failure mode using the method designed with that in mind to have any
`image/png` content be handled as if it was XML (just like an 'ignore-
type' option would).
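
As a rough Python sketch of that idea (this is not the HTML::Encoding
interface; the function name `looks_like_xml`, the `is_xml` parameter
shape and the default pattern are all assumptions made for illustration),
the caller can replace the default test with a regular expression of
their own:

```python
import re

# Assumed default: text/xml, application/xml, and any "+xml" suffix type.
# Whether something like example/xml should match is left to the caller.
DEFAULT_IS_XML = re.compile(r"^(?:text|application)/xml$|\+xml$", re.IGNORECASE)


def looks_like_xml(media_type, is_xml=DEFAULT_IS_XML):
    """Decide from the label alone whether to treat a resource as XML."""
    # Drop parameters such as "; charset=utf-8" before matching.
    essence = media_type.split(";", 1)[0].strip()
    return is_xml.search(essence) is not None


# A caller who wants the explicit failure mode widens the pattern.
ANYTHING_GOES = re.compile(r"")
```

Here `looks_like_xml("image/png")` is false under the default, while
`looks_like_xml("image/png", is_xml=ANYTHING_GOES)` is true, so treating
PNG data as XML only happens when someone asked for it.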

>XSLT on text/html treated as if it were XML seems like a nightmare,
>opening for all sorts of failures and security problems.

If you make an XHTML 1.0 document and label it with `text/html` because
you want the content to render in applications that support `text/html`
but not `application/xhtml+xml` or whatever else you might be using, you
quite probably actually do want `document(...)` to treat it as XML.
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/

Received on Tuesday, 9 April 2013 00:37:56 UTC