Richard Ishida's draft comments on XHTML2 WD

Version reviewed

http://www.w3.org/TR/2004/WD-xhtml2-20040722/

Notes

These are my own comments. I will show them to the I18N WG, who may formally endorse some of these comments or modify them.

I have not yet read in detail the following sections:

Comments

ID | Location | Type | Comment | Mail thread
1 | Abstract

It seems to me that XHTML 2 could perform a very useful role in establishing a base set of tags for people who want to develop their own more specialised vocabulary. It takes care of things like tables, linking strategies, bidi requirements, accessibility concerns, etc, etc. and allows the schema developer to concentrate on unique aspects of their vocabulary. It seems that this would be a strong selling point for XHTML2 and worth adding a clear mention to the abstract or at least the introduction.

2 | 1.2, "Sections and headings:..." | typo

"lets you explicit markup the"

3 | 1.2, "XHTML takes a completely different approach..." | edit

Some browsers display the title attribute as a tooltip. Suggest change to 'alt or title attributes'. (See also later comments about title attribute.)

4 | 1.2 "Edit:..." | subst

Note that, depending on your editing environment, ins and del can be much easier to apply and delete for inline text than an attribute. This is the case, for example, in the graphical tags-on view such as you have in XMetal, where the elements can be inserted by simply double-clicking or a simple key sequence. I will be sorry to see ins and del go. Is there a chance we could have them and attributes - author to choose which they prefer?

5 | 5.5 Attribute types

The term 'charset' does not really mean 'encoding'. It may be clearer to change this term to 'encoding'. On the other hand, there is now a long history of incorrect use of 'charset'. Please consider whether it should be changed.

6 | 5.5 Attribute types | @@

Note to self: need to look at suitability of XML Schema dateTime type.

7 | 5.5 Attribute types

LanguageCode: this should say "as per RFC3066 or its successor". We already had lots of trouble getting people away from RFC1766 with earlier versions of HTML (and I think it's still not in the errata). Let's not go down that road again. Note: there is a successor in preparation.

SHOULD POINT TO THE XML:LANG DEFINITION IN THE XML SPEC - NOT REDEFINE
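
As a rough illustration of what a LanguageCode check against RFC 3066 might involve, here is a minimal Python sketch. It tests only the tag grammar from RFC 3066 (a primary subtag of 1 to 8 letters, then optional subtags of 1 to 8 letters or digits); the names `RFC3066_TAG` and `is_rfc3066_syntax` are hypothetical, and a successor RFC may well define a different grammar.

```python
import re

# RFC 3066 grammar: a primary subtag of 1-8 ASCII letters, followed by
# zero or more hyphen-separated subtags of 1-8 ASCII letters or digits.
RFC3066_TAG = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*")


def is_rfc3066_syntax(tag: str) -> bool:
    """Return True if `tag` matches the RFC 3066 Language-Tag grammar.

    Syntax only: subtags are not checked against the ISO 639 or
    ISO 3166 code lists.
    """
    return RFC3066_TAG.fullmatch(tag) is not None
```

For example, `is_rfc3066_syntax("en-US")` is true, while `is_rfc3066_syntax("en_US")` is false because the underscore is not a valid separator.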

8 | 5.5 Attribute types

Number: Does "one or more digits" mean any Unicode digit, or just 0-9? Eg, there are arabic digits, farsi digits, bengali digits, thai digits, etc.

THIS IS NOT SEMANTICS - IT IS SYNTAX - SHOULD SAY 'INTEGERS' - DOES IT INCLUDE DECIMAL POINTS? - PERHAPS USE THE XML SCHEMA DEFINITION

NOT CLEAR WHETHER REDEFINING OR USING XHTML MODULARIZATION SPEC DEFINITIONS - WOULD HELP TO HAVE A POINTER
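
To make the "any Unicode digit, or just 0-9" question concrete, the following Python sketch contrasts the two readings. It is only an illustration under my own assumptions: the function names are hypothetical, and the draft does not say which reading is intended.

```python
import re

# Reading 1: "digit" means the ASCII digits 0-9 only.
ASCII_DIGITS = re.compile(r"[0-9]+")


def is_ascii_number(value: str) -> bool:
    """True if `value` is one or more ASCII digits."""
    return ASCII_DIGITS.fullmatch(value) is not None


def is_unicode_number(value: str) -> bool:
    """True if `value` is one or more characters of Unicode category Nd."""
    # str.isdecimal() accepts decimal digits from any script.
    return value.isdecimal()


ARABIC_INDIC_2004 = "\u0662\u0660\u0660\u0664"  # Arabic-Indic digits 2, 0, 0, 4
THAI_2004 = "\u0E52\u0E50\u0E50\u0E54"          # Thai digits 2, 0, 0, 4
```

Under the first reading `ARABIC_INDIC_2004` and `THAI_2004` are rejected; under the second they are accepted, and Python's `int()` converts both to the integer 2004.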

9 | 5.5 Attribute types

URI: Should this be called IRI? At least we should indicate clearly in a note that this can contain non-ASCII text.

SHOULD INDICATE INTENT TO POINT TO NEW DRAFT
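
The practical difference between a URI and an IRI can be sketched in Python as below. This is a simplified illustration, not the full conversion procedure from the IRI draft: the hypothetical helper `iri_to_uri` percent-encodes non-ASCII characters in the path and query as UTF-8 and leaves the host untouched (non-ASCII host names need separate handling).

```python
from urllib.parse import quote, urlsplit, urlunsplit


def iri_to_uri(iri: str) -> str:
    """Percent-encode non-ASCII path and query characters as UTF-8.

    Simplification: the scheme, host, and fragment are passed through
    unchanged, and existing percent-escapes are preserved.
    """
    parts = urlsplit(iri)
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))
```

With this sketch, an attribute value such as `http://example.org/résumé` (an illustrative address) would be sent over the wire as `http://example.org/r%C3%A9sum%C3%A9`.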

10 | 6

Why are I18N and Bi-directional modules separate from Core? If you are to develop properly internationalised documents you will always need these items. Having them in separate modules suggests that internationalization is a feature, rather than part of the base, and I think that is a bad thing.

AGREE THAT THIS SHOULD BE RAISED

11 | 7.1

Would it make sense to require a DOCTYPE?

12 | 7.1, 1st para | edit

"After the document type declaration" - there may also be an XML declaration or PI, such as stylesheet, not just DOCTYPE

14 | 7.1

I would like to see some text that mentions the importance of adding the xml:lang attribute to the <html> tag. It is important for accessibility and for i18n to declare the text processing language at this point (and not others such as the body element). We need to raise awareness of the value of this and how it should be done for content authors. We also have a chicken and egg situation wrt the usefulness of language information. We need people to use it as a matter of course to enable future developments that make use of it. (See also the discussion of the difference between use of the HTTP header and html tag for declaring language, in comment @@.)

I would also like to see xml:lang used in all examples that include the html tag.

FINE

15 | 7.3

Just a suggestion. It may be worthwhile to add a note to draw attention to the fact that the title can now contain phrasal markup - a boon for bidi and language markup. (And thanks for doing that!)

16 | 8.3

I was surprised that there wasn't more advice about the use of blockquote, and expected presentational behaviour, particularly wrt quote marks.

17 | 8.5

Why are two styles of headings allowed? Doesn't the continued presence of h1 etc allow for continued misuse of headings? It's not clear to me what the benefit is of having these guys around.

I also find the usage in the example in 8.5 somewhat dubious, and counter to the aim of clarity in structure. I think that each section should have at most one heading, and the boundaries of the text described by that heading should be clear. Otherwise, it becomes more difficult to do things like programmatically extract a section and its heading using scripting in XSLT.
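
To show the kind of programmatic extraction I have in mind, here is a minimal Python sketch (the same idea applies in XSLT). The markup in `DOC` is an illustrative, un-namespaced fragment of my own, not the example from the draft, and `headings_by_section` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment: each section has exactly one h child.
DOC = """<body>
  <section>
    <h>Events</h>
    <p>Intro text.</p>
    <section>
      <h>Concerts</h>
      <p>Details.</p>
    </section>
  </section>
</body>"""


def headings_by_section(root):
    """Yield (heading text, section element) for each section that has
    exactly one h element as a direct child."""
    for section in root.iter("section"):
        heads = section.findall("h")  # direct children only
        if len(heads) == 1:
            yield heads[0].text, section


root = ET.fromstring(DOC)
pairs = [(title, len(sec.findall("p"))) for title, sec in headings_by_section(root)]
```

When every section has one heading, the pairing is unambiguous. With several headings per section, or with h1-style headings outside any section, the code would need extra rules to decide which text belongs to which heading.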

18 | 8.6

It would be helpful for content authors to point to or include some information about why the content model for p has changed in this way, and what the implications are.

Sidenote: The localization community will need to recognize that a p element is no longer necessarily a good candidate for a translation unit, to support source matching. Context is all important.

19 | 9.9

It would be good to see a brief discussion on the relative merits of the separator element and the use of divs with border styling.

20 | all | edit

Note that editor's notes are difficult to spot, or at least the end is difficult to spot, if people print the specification without background colors. Some indentation or border styling might improve readability.

21 | 9.1

I am strongly against the use of the title attribute to provide the full or expanded form of an abbreviation, for the following reasons:

  1. title attributes do not allow for inline markup to express bidi or language needs - indeed, some inline styling or graphics may be appropriate for such full forms
  2. it may be appropriate to draw a distinction between the language used to pronounce the abbreviation and that used to pronounce the full form, eg. an acronym may be spelled out using English letter names, but the full form may be in French. The problem here being that you cannot label an attribute's language differently from that of the element using xml:lang.
  3. it precludes the use of title for OTHER purposes
  4. the term 'title' does not describe the content of the attribute, and does not identify the semantics of the content - this goes against the philosophy of good structural markup

Please change the content model of abbr to include an element that expresses the full form.

AGREED

22 | 9.1

"When necessary, authors should use style sheets to specify the pronunciation of an abbreviated form."

We have brought this up many times in the past. The use of style sheets only addresses a small part of the problem. It is totally inadequate for dealing with an abbreviation such as "CSAT", pronounced "see-sat", or "MA", pronounced "Massachusetts", etc. Please put in place a method to allow pronunciation to be dealt with properly. If this is not done, please at least modify the text in the specification to recognise that stylesheets are only going to be useful in certain circumstances - it is not a real solution to the problem. (I think it would also be helpful to show an example of how stylesheets should be used, if you suggest that.)

23 | 9.2 | typo

extra " just before Gandalf

24 | 9.2

It's bad enough that the US has recently appropriated such British classics as The Italian Job, Ladykillers, and Pooh Bear, but please don't try to claim that Gandalf speaks en-us ! ;-) In the interests of international harmony, please change the example to say "<quote xml:lang="en">".

25 | 9.8

I think it would greatly aid clarity and consistency of markup to offer some advice about when it is appropriate for document authors to add quotes directly in the text or via style sheets [note that this is spelled as a single word in the para starting "Visual user agents...", but spelt as two words in earlier parts of the text].

I would recommend that style sheets are used as the default method, and that therefore the XHTML processor support the necessary CSS. This is to facilitate localization. It is much faster and easier to adapt quotation marks in a style sheet than to change all instances in the markup. Manual insertion of quotation marks is appropriate when quoting a passage that will not be translated.

I think you should also recommend that the quotes appear outside the quote markup, since they are part of the surrounding text.

Note that the example given for quote does the wrong thing on both counts here, and I would say encourages bad practice.

26 | 9.8 | nit

There is no punctuation at the end of the cite example.

27 | 9.10 | edit

Please add "dir attribute" to "... in conjunction with style sheets, the xml:lang attribute, etc...."

28 | 9.11 | edit

"The strong element indicates higher importance for its contents." Higher than what?

29 | 9.11 | nit

I think the example would read better in English as "Please put the rubbish out on <strong>Monday</strong>, but <em>not</em> before nightfall!"

30 | 10

Does it not introduce inconsistent semantics to a document to allow the use of <a> in addition to other methods of linking? I'm not clear why it was retained.

I think that at least you could change "but has been retained to allow the expression of explicit links" to "but has been retained to allow an alternative expression of explicit links".

31 | 11.1 | nit

I'd like to have seen the definition of 'geek' rather than 'hacker', given that for 'dweeb'.

32 | 11.2 | edit

In the example, unneeded space after "Contents".

33 | 12.1 | edit

Suggestion: it may help good practice to add a note that class names should be chosen to reflect semantic distinctions, not presentational ones, eg. use 'emph' rather than 'italic'. This also assists in ensuring localizability of markup, since presentational values do not necessarily map from one script to another.

34 | 12.1, title

We appreciate your work in eliminating attributes containing user readable text from the XHTML2 format. This will significantly aid localizability, since it reduces the number of places where unique ids, or language or bidi markup are unavailable.

The title attribute still sticks out like a sore thumb, though, in this regard. Can we not convert it to a common inline element, that can optionally appear as the first item in most other elements? Otherwise, there is a significant usability impact for international users, and XHTML will look Western-centric.

35 | 13.1, hreflang

This is interestingly different from my understanding of how hreflang was used in HTML 4.01 (which I think was actually problematic).

You should indicate in the second para that language values should conform to RFC 3066 or its successors.

36 | 13.1, access | edit

The 'shortcuts' title isn't correctly presented.

37 | 14

The spec uses the term 'base language' without an apparent definition. It also uses the term 'primary language' in the example in this section. The i18n WG has put a lot of thought into matters of this kind lately and it would seem appropriate to step back and consider the usage of language in XHTML2.

The i18n WG has begun to use the terms 'text processing language' and 'primary language' ('document language metadata') to mean different things. [The actual terminology is a secondary consideration here.] The idea is that there are two ways in which one needs to declare the language of content: the first is to express the basic language of the document as a whole (this could be used for searching, serving, etc.), the second is to express the language of a specific run of text so that applications that manipulate the text, such as text-to-speech, can correctly understand the text they are currently dealing with. The former declaration (what we call 'primary language') could involve declaring more than one language, eg. for documents containing parallel texts in multiple languages, but doesn't necessarily mention every language that appears in the document. The latter type of declaration (what we call 'text processing language') must, of necessity, refer to only a single language at a time, though that declaration can be overridden for a labelled fragment of the text, eg. an embedded French word in English text.

The rules governing the use of language values in HTTP headers and language attributes reinforce our view that the HTTP header should be used to declare the primary language, and that language attributes should be used to declare the text processing language. It is acceptable in my mind to say that, in the absence of language attributes, the first value of the HTTP header Content-Language field could be used to declare the default text processing language for the document, but it would always be better to declare that explicitly in the html tag.

Based on the foregoing, I have the following recommendations for this section:

  • Suggested rewording: "This attribute specifies the base language of an element's attribute values and text content" -> "This attribute specifies the language of an element's attribute values and text content, and that of all elements it contains unless overwritten".
  • Add a paragraph to recommend that xml:lang be used always with the html tag to set the default language of the document for text processing.
  • Also say that the use of the HTTP language information for text processing purposes should only be considered a fallback solution, and define the expected behaviour if the HTTP Content-Language declaration contains more than one language.
  • Suggested rewording for the example: "In this example, the primary language of the document is..." -> "In this example, the default processing language of the document is...".
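
The behaviour recommended above can be sketched in Python as follows. This is only my reading of the proposal, not text from the draft: the nearest xml:lang on the element or an ancestor gives the text processing language, and the first value of the HTTP Content-Language field is used as a fallback only when no xml:lang is found. The helper name and the sample markup are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def text_processing_language(element, parents, content_language=None):
    """Return the language that applies to `element`.

    `parents` maps each element to its parent. `content_language` is the
    raw value of an HTTP Content-Language field, used only as a fallback.
    """
    node = element
    while node is not None:
        lang = node.get(XML_LANG)
        if lang is not None:
            return lang  # nearest declaration wins
        node = parents.get(node)
    if content_language:
        # Fallback: take the first listed language only.
        first = content_language.split(",")[0].strip()
        return first or None
    return None


doc = ET.fromstring(
    '<html xml:lang="en"><body><p>The word '
    '<span xml:lang="fr">chat</span></p></body></html>'
)
parents = {child: parent for parent in doc.iter() for child in parent}
```

Here the span resolves to "fr" and the enclosing p to "en". If the xml:lang attributes were removed and the server sent `Content-Language: fr, en`, every element would fall back to "fr", which is exactly the ambiguity the specification should address.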
38 | 14

Other brainstorming thoughts related to language declarations, based on the philosophy introduced in comment 37:

  • Would it be appropriate to provide a way of declaring the primary language of the document in the document itself, other than the current use of a meta element with Content-Type declaration? This would retain such information even if the document was not pulled from a server, eg. read from CD.
  • Would it make sense to have a common attribute called something like attr-lang, which allowed one to specify the language of the title attribute when that differed from the language of the element content?
39 | 15.1, 1st para

Suggested rewording: "This direction overrides the inherent directionality of characters as defined in Unicode Standard" --> "This direction affects the display of characters as defined in Unicode Standard". It doesn't change the inherent directionality of the characters themselves, just the behaviour of those characters in context.
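
The distinction can be checked with Python's `unicodedata` module, as in the sketch below. It shows that the bidirectional class is a fixed property of each character, and that an override is carried by separate control characters (here U+202D and U+202C, which I am assuming correspond to what the lro value does) rather than by changing that property. The sample text and helper name are my own.

```python
import unicodedata

LRO = "\u202D"  # LEFT-TO-RIGHT OVERRIDE
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING


def bidi_classes(text):
    """Return the Unicode bidirectional class of each character in `text`."""
    return [unicodedata.bidirectional(ch) for ch in text]


hebrew = "\u05E9\u05DC\u05D5\u05DD"  # four Hebrew letters (shalom)
overridden = LRO + hebrew + PDF
```

`bidi_classes(hebrew)` reports class R for every letter, and the letters inside `overridden` still report R. Only the display order is affected by the surrounding override.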

40 | 15.1, example

You would only need to use the dir attribute on the p element in a rtl context. It would be better to omit it here and to simply state that the default directionality for this text is ltr. We encounter many people who add dir's to almost all the block elements in a file, when a single declaration on the html tag would suffice. I don't want to encourage such behaviour, since it is unnecessary and detrimental.

[ Thank you for adding the note about setting the base direction for an entire doc in the html tag. Very helpful! ]

41 | 15.1, example

It is not at all obvious to the uninitiated reader what the effect of the lro attribute would be unless you show the resulting displayed text. (This is a tricky area to show examples ;)

42 | 16

The attribute name datetime is rather unspecific, given that it can appear on any element. I would prefer editdatetime.

43 | 20.5.2, example

Several problems with this example:

  1. There is inconsistency in the use of English vs translated (French) labels. This should not be the case. The question is which it should be. It depends to some extent on how the information will be displayed, and who the intended reader is. I think the labels should probably be in the foreign languages.
  2. For the Dutch manual, xml:lang="du" is wrong. Either translate the text to Dutch, or change xml:lang to hreflang. Or translate and add hreflang. Same goes for Portuguese and Arabic.
  3. The French should not use an entity to represent the ç
44 | 20.5.2

Again, the title is used for a role specific to this element, and contains user readable text that might need to be marked up for directionality, language, translation id, etc (especially given the context of usage). Does the link element have to be empty? Could it not include this text as content?

45 | 20.6.1, example

Please use the character è rather than the NCR in Grèce.

46 | 23.1

charset: Could this be called 'encoding'?

47 | 23.1

It says "Please consult the section on character encodings for more details.", but I wasn't sure which section that was referring to.

48 | 23.1.3, example

Please add xml:lang="en" to the html tag.

49 | 23.1.3

Where is the perl script?

50 | 26.4.2

Suggested rewording: "When set for the table element" --> "When set for or inherited by the table element".

51 | Appendix F

Entities are extremely useful for disambiguation of invisible or identical characters, such as &rlm; and &nbsp;. It is much easier to work with names in these cases than to use NCRs. Please be driven by user needs rather than technological limitations where possible.
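
To illustrate why names are easier to work with here, the Python sketch below prints the NCR and Unicode name behind an entity name, using the HTML entity table that ships with Python. The helper `describe` is hypothetical and is only meant to show what an author would otherwise have to remember.

```python
import unicodedata
from html import unescape
from html.entities import name2codepoint


def describe(entity_name: str) -> str:
    """Return the entity alongside its hexadecimal NCR and Unicode name."""
    cp = name2codepoint[entity_name]
    return f"&{entity_name}; = &#x{cp:X}; ({unicodedata.name(chr(cp))})"


# Both spellings resolve to the same invisible character.
same_char = unescape("&rlm;") == unescape("&#x200F;")
```

`describe("rlm")` gives `&rlm; = &#x200F; (RIGHT-TO-LEFT MARK)` and `describe("lrm")` gives `&lrm; = &#x200E; (LEFT-TO-RIGHT MARK)`. The two NCRs differ by a single hex digit, whereas the names are hard to confuse.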

Version: $Id: xhtml2-review.html,v 1.2 2005/01/20 17:31:04 rishida Exp $