W3C Internationalization (I18n) Activity: Making the World Wide Web truly world wide!

Internationalization Comments on Polyglot Markup: HTML-Compatible XHTML Documents

Date of latest comments: July 2010

The WG column indicates whether these are comments on behalf of the Internationalization Core WG. The "Owner" column indicates who has taken the responsibility of tracking discussions on a given comment. Orange shading signifies that the comment is unresolved.

We recommend that responses to the comments in this table use a separate email for each point. This makes it far easier to track threads. Click on the icons in the right-most column to see email discussions.

ID | Location | Subject | Comment | Owner | WG | Ed./Subs. | Mail
ID 1 | Location: 3. Character Encoding | Subject: UTF-16 BOM is required

"When polyglot markup uses UTF-16, it should include the BOM indicating UTF-16LE or UTF-16BE."

The use of the RFC 2119 keyword 'should' is incorrect here: the XML specification requires a BOM for UTF-16 documents. See http://www.w3.org/TR/REC-xml/#charencoding

Owner: RI | WG: Y | Ed./Subs.: S | Mail: Link to mail thread
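
As an illustration of the point in comment 1, here is a minimal Python sketch that checks whether a byte string known to be UTF-16 starts with a BOM. The function name `utf16_bom` is hypothetical, and the sketch assumes the input is already known to be UTF-16 (the UTF-32LE BOM begins with the same two bytes as the UTF-16LE BOM, so this is not a general encoding sniffer).

```python
import codecs


def utf16_bom(data: bytes):
    """Return 'UTF-16LE', 'UTF-16BE', or None, based on the leading BOM.

    Hypothetical helper: assumes `data` is known to be UTF-16.
    """
    if data.startswith(codecs.BOM_UTF16_LE):
        return "UTF-16LE"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "UTF-16BE"
    return None


# Python's "utf-16" codec writes a BOM in the platform's byte order;
# the explicit "utf-16-le" codec writes none.
with_bom = "<html/>".encode("utf-16")
without_bom = "<html/>".encode("utf-16-le")

print(utf16_bom(with_bom))
print(utf16_bom(without_bom))
```

A document serialized like `without_bom` would not meet the XML requirement cited above.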
ID 2 | Location: 3. Character Encoding | Subject: In-document declarations always useful

"In addition, polyglot markup need not include the meta charset declaration, because the parser would have to read UTF-16 in order to parse it by definition."

The i18n WG guidelines nevertheless recommend that you always include a visible encoding declaration in your document, since it helps developers, testers, or translation production managers who want to check the encoding of a document visually. So it is true that the declaration is not strictly needed, but we would prefer that people include it. Please reflect that in your document.

Owner: RI | WG: Y | Ed./Subs.: S | Mail: Link to mail thread
ID 3 | Location: 3. Character Encoding | Subject: BOM with utf-8 and/or utf-16

"Use UTF-8 or UTF-16 with the appropriate BOM."

This could be read as "use UTF-8 with the appropriate BOM or UTF-16 with the appropriate BOM", but a UTF-8 BOM (or signature) is not strictly necessary. Some would argue that it may cause problems and that its use should be discouraged here.

Owner: RI | WG: Y | Ed./Subs.: E | Mail: Link to mail thread, Link to mail thread
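
To illustrate why the UTF-8 signature in comment 3 is optional, the following Python sketch shows that the same text can be serialized with or without it and still decodes identically. The helper name `strip_utf8_bom` is hypothetical; `utf-8-sig` is Python's standard codec that writes the signature on encoding and skips it on decoding.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Remove a leading UTF-8 signature (EF BB BF) if present (hypothetical helper)."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


text = "<p>\u00e9</p>"
plain = text.encode("utf-8")        # no signature
signed = text.encode("utf-8-sig")   # same bytes, prefixed with EF BB BF

# Both byte strings carry the same text once the signature is accounted for.
print(strip_utf8_bom(signed) == plain)
print(signed.decode("utf-8-sig") == plain.decode("utf-8"))
```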
ID 4 | Location: 3. Character Encoding | Subject: Omit the either/or list

"In short, for correct character encoding, polyglot markup must either:"

The MUST is too strong. There is no problem with using more than one declaration, and in an earlier comment we recommended having a readable declaration in the source in addition to using a UTF-8 or UTF-16 encoding.

I think it is better simply to omit the list and its lead-in paragraph "In short, for correct ...".

The information is contained in the following paragraph that starts with "If polyglot markup uses an encoding other than..."

Owner: RI | WG: Y | Ed./Subs.: E | Mail: Link to mail thread, Link to mail thread
ID 5 | Location: 7. Attributes | Subject: Mention lang and xml:lang

No mention is made of the lang and xml:lang attributes. The document should say that both should be used when language attributes are used.

It could also recommend using the language attributes on the html element to set the default language for the document, and mention that the meta Content-Language element is of no use at all in XML for setting the language of content.

Owner: RI | WG: Y | Ed./Subs.: E | Mail: Link to mail thread
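
The recommendation in comment 5 (use lang and xml:lang together) can be sketched as a simple check. This is only an illustration under stated assumptions: the function name `mismatched_lang` is hypothetical, the input is assumed to be well-formed XHTML, and values are compared as exact strings even though language tags are case-insensitive.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def mismatched_lang(xhtml: str):
    """Return the tags of elements where lang and xml:lang are not both
    present with the same value (hypothetical helper; exact string match)."""
    root = ET.fromstring(xhtml)
    problems = []
    for el in root.iter():
        if el.get("lang") != el.get(XML_LANG):
            problems.append(el.tag)
    return problems


good = (
    '<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">'
    '<body><p lang="fr" xml:lang="fr">Bonjour</p></body></html>'
)
bad = '<html xmlns="http://www.w3.org/1999/xhtml" lang="en"><body/></html>'

print(mismatched_lang(good))
print(mismatched_lang(bad))
```

Elements that carry neither attribute are not reported, since they inherit the language from an ancestor.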
ID 6 | Location: 6.2.3 Attribute values | Subject: Case requirements

"however, case requirements do not apply to non-ASCII letters such as Greek, Cyrillic, or non-ASCII Latin letters."

We are confused by this text. Scripts such as Greek, Cyrillic, and Armenian do have case distinctions, and those distinctions are significant in XML if you have attribute names or values in those scripts. But we are not clear when any characters from those scripts or non-ASCII Latin letters are used for attribute names or values in HTML.

Please clarify for us what the intent is.

(There is similar text in 6.2.2)

Owner: RI | WG: Y | Ed./Subs.: E | Mail: Link to mail thread
ID 7 | Location: 8. Named Entity References | Subject: Named entity references

"For example, polyglot markup uses &#160; instead of &nbsp;."

We would prefer your example to use the hexadecimal numeric character reference &#xA0; rather than the decimal one. See http://www.w3.org/TR/2005/REC-charmod-20050215/#C048

Owner: RI | WG: Y | Ed./Subs.: E | Mail: Link to mail thread, Link to mail thread
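
As a small illustration of the decimal and hexadecimal forms discussed in comment 7, the Python sketch below derives both numeric character references for a named entity such as nbsp. The function names are hypothetical; `html.entities.name2codepoint` and `html.unescape` are part of the Python standard library.

```python
import html
from html.entities import name2codepoint


def to_decimal_ncr(name: str) -> str:
    """Decimal numeric character reference for an HTML entity name (hypothetical helper)."""
    return f"&#{name2codepoint[name]};"


def to_hex_ncr(name: str) -> str:
    """Hexadecimal numeric character reference for an HTML entity name (hypothetical helper)."""
    return f"&#x{name2codepoint[name]:X};"


print(to_decimal_ncr("nbsp"))
print(to_hex_ncr("nbsp"))

# All three spellings resolve to the same character, U+00A0.
same = html.unescape("&nbsp;") == html.unescape(to_decimal_ncr("nbsp")) == html.unescape(to_hex_ncr("nbsp"))
print(same)
```

The hexadecimal form matches the way code points are written in the Unicode Standard (U+00A0), which makes it easier to look a character up.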
ID 8 | Location: 8. Named Entity References | Subject: ployglot

typo: ployglot -> polyglot

Owner: RI | WG: Y | Ed./Subs.: E | Mail: Link to mail thread

Page template by Richard Ishida (ishida@w3.org).