Re: Request for Volunteers: Polyglot spec

Jonas Sicking wrote:
> On Fri, Mar 26, 2010 at 1:52 PM, Sam Ruby <rubys@intertwingly.net> wrote:
>> I took an action item from the TAG yesterday to convey the following
>> request:
>>
>>    The W3C TAG requests there should be in TR space a document
>>    which specifies how one can create a set of bits which can
>>    be served EITHER as text/html OR as application/xhtml+xml,
>>    which will work identically in a browser in both cases.
>>    (As Sam does on his web site.)
>>
>> This request requires a lot of explanation.  To start, it is recognized up
>> front that this will be a subset of the set of possible documents that can
>> be expressed as HTML5.  This is entirely OK.  For example, if it were to be
>> the case that such a subset were to entirely disallow scripts of any kind,
>> that would be acceptable as there exists a substantial class of documents
>> which do not require scripting of any kind.
> 
> Out of curiosity, what does "work identically" encompass? Do they have
> to have the same DOM? Or just render the same when the default UA
> stylesheet is applied? Or just be semantically equivalent?
> [...]
> If DOMs aren't important, only rendering is, I assume that this
> document won't qualify:
> 
> <html xmlns="http://www.w3.org/1999/xhtml">
>   <head>
>     <style> tbody { background: green } </style>
>     <title>example document</title>
>   </head>
>   <body>
>     Integer values for true/false.
>     <table>
>       <tr><td>true</td><td>1</td></tr>
>       <tr><td>false</td><td>0</td></tr>
>     </table>
>   </body>
> </html>

This one would also render differently:

<html xmlns="http://www.w3.org/1999/xhtml">
   <head><title>example document</title></head>
   <body>
     <pre>
Arbitrary example text</pre>
   </body>
</html>

and this one will also cause data corruption depending on the content-type:

<html xmlns="http://www.w3.org/1999/xhtml">
   <head><title>example document</title></head>
   <body>
     <form>
       Edit your comment:
       <textarea name="comment">
Your previous text</textarea>
     </form>
   </body>
</html>

(because the text/html parser drops a single newline that immediately 
follows the start tag of a pre, textarea or listing element, while an 
XML parser keeps it). These seem like more serious issues than the 
<tbody> one, since (unless I'm missing something) it's impossible to 
safely use these elements in polyglot documents, unless you do

   <pre><!---->
   text
   </pre>

which is a horrid hack and won't work for textarea anyway, since the 
text/html parser treats a textarea's content as plain text and the 
comment would show up literally in the field. So I think a true 
polyglot subset would have to exclude the textarea element, which 
limits its usefulness further. (Maybe the remaining subset is still 
large enough to be worth specifying in detail.)
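
To make the textarea difference concrete, here is a rough Python sketch. 
It parses the document above with the standard library's 
xml.etree.ElementTree, standing in for an application/xhtml+xml parser, 
and compares the result with a hand-written model of the text/html 
newline rule. The helper model_text_html is hypothetical and only 
imitates that one rule; it is not a real HTML parser, and the names DOC 
and xml_value are made up for this illustration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"

# The textarea example from above, as one string.
DOC = (
    '<html xmlns="http://www.w3.org/1999/xhtml">\n'
    '  <head><title>example document</title></head>\n'
    '  <body>\n'
    '    <form>\n'
    '      Edit your comment:\n'
    '      <textarea name="comment">\n'
    'Your previous text</textarea>\n'
    '    </form>\n'
    '  </body>\n'
    '</html>\n'
)


def xml_value(doc: str) -> str:
    """Return the textarea's text as an XML parser reports it."""
    root = ET.fromstring(doc)
    textarea = root.find(f".//{XHTML_NS}textarea")
    return textarea.text or ""


def model_text_html(raw: str) -> str:
    """Hypothetical model of the text/html rule (assumption, not a parser):
    drop one newline that directly follows the start tag."""
    return raw[1:] if raw.startswith("\n") else raw


as_xml = xml_value(DOC)            # XML keeps the leading newline
as_html = model_text_html(as_xml)  # text/html would drop it

print(repr(as_xml))   # '\nYour previous text'
print(repr(as_html))  # 'Your previous text'
```

Under these assumptions the same bytes yield two different initial 
values for the field, so what gets submitted back depends on the 
content-type the page was served with.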

-- 
Philip Taylor
pjt47@cam.ac.uk

Received on Thursday, 1 April 2010 18:42:02 UTC