Re: ISSUE-4 - versioning/DOCTYPEs

Boris Zbarsky, Thu, 13 May 2010 21:46:51 -0400:
> On 5/13/10 6:17 PM, Leif Halvard Silli wrote:
>>> If the file has an HTML MIME type, then "XHTML syntax" doesn't make
>>> sense.
>> 
>> What does it mean that a file "has an HTML MIME type"? Do you mean,
>> like, forever?
> 
> Well, typically when you open a file in an editor the editor has some 
> idea of what format the file is in.  This is typically indicated by 
> extensions, MIME types, magic numbers, etc.

That is not typical for XHTML vs HTML syntax - XHTML syntax typically 
uses .html as extension. There are some exceptions - most notably in 
Web browsers: Most of them insist on saving an online 
application/xhtml+xml file with the .xhtml suffix. Most Web browsers 
also save the page without touching the source code. Not so with all 
editors - they may change the syntax according to whatever standard 
they see as the correct one for this file.

Opera, btw, a) insists on saving online application/xhtml+xml files 
with the .xml suffix, b) renders files with the .xhtml suffix as 
impossible to open (in file opening dialogs) - impossible to overrule 
in OS X - whereas on Windows one may select "show all files".

More: http://lists.w3.org/Archives/Public/public-html/2010Apr/1204

> Having a file with a .html extension would tend to mean you want it 
> treated as an HTML file on most of the currently-popular desktop 
> operating systems.

For parsing, then yes. For editing, then less so.

>> The point with polyglot documents is that the MIME type can change.
>> KompoZer handles polyglot XHTML editing already - and it does it in
>> text/html mode. What's the problem with that?
> 
> The fact that last I checked the relevant code couldn't actually 
> usefully edit XML very well (though of course KompoZer could have 
> local hacks to make it better).

I definitely think that NVU/KompoZer is programmed to behave 
differently from how e.g. the Composer part of SeaMonkey behaves. 
NVU/KompoZer *can* handle XHTML1 editing, while the more "Gecko 
puristic" SeaMonkey Composer cannot. I still don't see the problem.

>> Most editors don't have a particular mode, however, since they are
>> just text editors.
> 
> Hold on.  We were just talking about wysiwyg HTML/XHTML editors, no? 
> Those are very much NOT text editors.

Subject of e-mail: "ISSUE-4 - versioning/DOCTYPEs". KompoZer is an 
example of an editor that relies on the doctype when it decides the 
syntax to follow. Other editors, including both text editors and 
WYSIWYG editors, also seem to rely on the doctype for choosing syntax. 
Some pure XML/XHTML editors of WYSIWYG style will also treat .html as 
XHTML, I think. E.g. editors such as Oxygen. See the message I pointed 
to above.

>> But text editors have features such as autocomplete etc,
>> and they need to know what kind of syntax to create.
> 
> Yep.  Then again, the text editor I use on a regular basis does make 
> a quite clear distinction between HTML and XML modes.

I will try to find out what editor you use. ;-)

But, based on the file suffix *only*? I admit that it doesn't make 
sense to use HTML4-like syntax in a .xhtml file. But the question is 
also about .html.

>> Editing in the XHTML MIME type doesn't guarantee polyglot syntax.
> 
> Neither does editing as HTML, right?  It sounds like editors need a 
> polyglot mode.

Yes. But I think that, to a degree, some DOCTYPEs already cause 
polyglot mode. E.g. KompoZer turns <img></img> into <img />.
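
To illustrate the kind of rewrite I mean, here is a minimal Python 
sketch. It is not KompoZer's code, which I have not inspected; the 
function name and the list of void elements are my own assumptions, 
and a regex like this ignores edge cases such as ">" inside attribute 
values.

```python
import re

# Assumed list of void elements; a real editor would take this from the spec.
VOID_ELEMENTS = ("area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "param", "source", "wbr")

# Matches an empty start/end pair such as <img src="a.png"></img>.
_EMPTY_PAIR = re.compile(
    r"<(?P<name>%s)\b(?P<attrs>[^<>]*?)\s*>\s*</(?P=name)\s*>"
    % "|".join(VOID_ELEMENTS),
    re.IGNORECASE,
)


def collapse_void_pairs(markup):
    """Rewrite <img ...></img> style pairs as <img ... /> (hypothetical helper)."""
    return _EMPTY_PAIR.sub(
        lambda m: "<%s%s />" % (m.group("name"), m.group("attrs")),
        markup,
    )
```

Non-void elements such as <p></p> are left alone, since collapsing 
those would not be polyglot-safe.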

>>>> One could say that XHTML5 specifications are allowed to create DOCTYPEs
>>>> for use in text/html
>>> 
>>> If it's text/html, then XHTML5 has nothing to do with it.
>> 
>> The polyglot spec is not your cup of tea then, I gather.
> 
> OK, let's back up.  If you're using a non-polyglot-aware HTML editor 
> on what it thinks of as an HTML document your chances of ending up 
> with a usable polyglot document are slim to none.

Not in KompoZer, AFAICT. Or, to put it another way: KompoZer *is* a 
polyglot-aware editor, whenever it operates on a file (in text/html mode) 
with the XHTML1 doctype. (This might be a side-effect of the Gecko 
text/html parser in KompoZer, though?)

If we say that HTML4 vs XHTML1 is like HTML5 vs XHTML5, then it is 
simple to discern between HTML4 and XHTML1, but impossible to discern 
HTML5 versus XHTML5 (versus quirks-mode HTML).

For HTML4 versus XHTML1, the discerning, in a controlled milieu (e.g. 
in a single editor), happens by looking at the DOCTYPE. Of 
course, editors also have the option of looking at the MIME type - the 
suffix - if the file has a suffix. But if it doesn't have a suffix, what 
then? At the very least Webkit and Gecko then look at the META 
content-type element ...

Whereas, when it comes to HTML5 (versus XHTML5 versus polyglot HTML5), 
then what? For suffixless files, then <meta http-equiv="Content-Type" 
content="application/xhtml+xml" /> isn't even permitted ...

The simplest thing would be, I think, if editors discerned between 
HTML5 and XHTML5 the same way they already discern between HTML4 and 
XHTML1. That is: they should look at the MIME type, but also at the 
DOCTYPE.
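
As a rough sketch of that idea (in Python, and purely hypothetical: 
the function name and the mode labels are mine, not taken from any 
editor), the decision could look like this. Note how the bare 
<!DOCTYPE html> case cannot be resolved, which is exactly the problem.

```python
import re

XHTML_MIME = "application/xhtml+xml"

# Captures whatever follows "html" in the doctype (public/system identifiers).
_DOCTYPE = re.compile(r"<!DOCTYPE\s+html\b([^>]*)>", re.IGNORECASE)


def guess_syntax(source, mime_type=None):
    """Pick an editing mode from the MIME type first, then the DOCTYPE."""
    if mime_type == XHTML_MIME:
        return "xhtml"
    match = _DOCTYPE.search(source)
    if match is None:
        return "html-quirks"          # no doctype at all
    identifiers = match.group(1)
    if "//DTD XHTML 1" in identifiers:
        return "polyglot-xhtml"       # XHTML1 doctype, not served as XML
    if "//DTD HTML 4" in identifiers:
        return "html"
    if identifiers.strip() == "":
        return "ambiguous"            # HTML5, XHTML5 and polyglot look alike
    return "html"
```

With an XHTML1 doctype and no XML MIME type, the sketch lands in the 
polyglot-style mode; with <!DOCTYPE html> it can only answer 
"ambiguous".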

>  Likewise for a 
> non-polyglot-aware X(HT)ML editor used on an XHTML document.

Given the error correction in text/html, this has a much higher chance 
to work, IMHO. Also, even if it is mostly harmless (except for <br 
></br> - though two line breaks instead of one is also often pretty 
harmless), XHTML editors tend to prefer <element /> over 
<element></element> - at least when creating XHTML1 documents.

> The only way to edit polyglot documents sanely is to have a polyglot 
> mode that you put your editor into, right?  One in which it enforces 
> the quite specific requirements polyglot documents have.
> 
> Are we on the same page that far?

That far - I don't know. ;-) But at least we are on the same page when 
it comes to 'polyglot mode' - such a mode is needed. And some editors 
might choose to offer only that mode, I think. The question is what to 
use to discern between those modes. 
-- 
leif halvard silli

Received on Friday, 14 May 2010 14:01:26 UTC