These are my own comments. I will show them to the I18N WG, who may formally endorse some of these comments or modify them.
I have not yet read in detail the following sections:
ID | Location | Type | Comment | Mail thread |
---|---|---|---|---|
1 | Abstract |
It seems to me that XHTML 2 could perform a very useful role in establishing a base set of tags for people who want to develop their own more specialised vocabulary. It takes care of things like tables, linking strategies, bidi requirements, accessibility concerns, etc, etc. and allows the schema developer to concentrate on unique aspects of their vocabulary. It seems that this would be a strong selling point for XHTML2 and worth adding a clear mention to the abstract or at least the introduction. | ||
2 | 1.2, "Sections and headings:..." | typo |
"lets you explicit markup the" | |
3 | 1.2, "XHTML takes a completely different approach..." | edit |
Some browsers display the title attribute as a tooltip. Suggest change to 'alt or title attributes'. (See also later comments about title attribute.) | |
4 | 1.2 "Edit:..." | subst |
Note that, depending on your editing environment, ins and del can be much easier to apply and delete for inline text than an attribute. This is the case, for example, in a graphical tags-on view such as you have in XMetal, where the elements can be inserted by simply double-clicking or with a simple key sequence. I will be sorry to see ins and del go. Is there a chance we could have both the elements and the attributes, leaving authors to choose which they prefer? | |
5 | 5.5 Attribute types |
The term 'charset' does not really mean 'encoding'. It may be clearer to change this term to 'encoding'. On the other hand, there is now a long history of incorrect use of 'charset'. Please consider whether it should be changed. | ||
6 | 5.5 Attribute types | @@ |
Note to self: need to look at suitability of XML Schema dateTime type. | |
7 | 5.5 Attribute types |
LanguageCode: this should say "as per RFC3066 or its successor". We already had lots of trouble getting people away from RFC1766 with earlier versions of HTML (and I think it's still not in the errata). Let's not go down that road again. Note: there is a successor in preparation. SHOULD POINT TO THE XML:LANG DEFINITION IN THE XML SPEC - NOT REDEFINE | ||
8 | 5.5 Attribute types |
Number: Does "one or more digits" mean any Unicode digit, or just 0-9? Eg, there are arabic digits, farsi digits, bengali digits, thai digits, etc. THIS IS NOT SEMANTICS - IT IS SYNTAX - SHOULD SAY 'INTEGERS' - DOES IT INCLUDE DECIMAL POINTS? - PERHAPS USE THE XML SCHEMA DEFINITION - NOT CLEAR WHETHER REDEFINING OR USING XHTML MODULARIZATION SPEC DEFINITIONS - WOULD HELP TO HAVE A POINTER | ||
9 | 5.5 Attribute types |
URI: Should this be called IRI? At least we should indicate clearly in a note that this can contain non-ASCII text. SHOULD INDICATE INTENT TO POINT TO NEW DRAFT | ||
10 | 6 |
Why are I18N and Bi-directional modules separate from Core? If you are to develop properly internationalised documents you will always need these items. Having them in separate modules suggests that internationalization is a feature, rather than part of the base, and I think that is a bad thing. AGREE THAT THIS SHOULD BE RAISED | ||
11 | 7.1 |
Would it make sense to require a DOCTYPE? | ||
12 | 7.1,1st para | edit |
"After the document type declaration" - there may also be an XML declaration or PI, such as stylesheet, not just DOCTYPE | |
14 | 7.1 |
I would like to see some text that mentions the importance of adding the xml:lang attribute to the <html> tag. It is important for accessibility and for i18n to declare the text processing language at this point (and not others such as the body element). We need to raise awareness of the value of this and how it should be done for content authors. We also have a chicken and egg situation wrt the usefulness of language information. We need people to use it as a matter of course to enable future developments that make use of it. (See also the discussion of the difference between use of the HTTP header and html tag for declaring language, in comment @@.) I would also like to see xml:lang used in all examples that include the html tag. FINE | ||
15 | 7.3 |
Just a suggestion. It may be worthwhile to add a note to draw attention to the fact that the title can now contain phrasal markup - a boon for bidi and language markup. (And thanks for doing that!) | ||
16 | 8.3 |
I was surprised that there wasn't more advice about the use of blockquote, and expected presentational behaviour, particularly wrt quote marks. | ||
17 | 8.5 |
Why are there two styles of headings allowed? Doesn't the continued presence of h1 etc allow for continued misuse of headings? It's not clear to me what the benefit is of having these guys around. I also find the usage in the example in 8.5 somewhat dubious, and counter to the aim of clarity in structure. I think that each section should have at most one heading, and the boundaries of the text described by that heading should be clear. Otherwise, it becomes more difficult to do things like programmatically extract a section and its heading using scripting or XSLT. | ||
18 | 8.6 |
It would be helpful for content authors to point to or include some information about why the content model for p has changed in this way, and what the implications are. Sidenote: The localization community will need to recognize that a p element is no longer necessarily a good candidate for a translation unit, to support source matching. Context is all important. | ||
19 | 9.9 |
It would be good to see a brief discussion on the relative merits of the separator element and the use of divs with border styling. | ||
20 | all | edit |
Note that editor's notes are difficult to spot, or at least the end is difficult to spot, if people print the specification without background colors. Some indentation or border styling might improve readability. | |
21 | 9.1 |
I am strongly against the use of the title attribute to provide the full or expanded form of an abbreviation, for the following reasons:
Please change the content model of abbr to include an element that expresses the full form. AGREED | ||
22 | 9.1 |
"When necessary, authors should use style sheets to specify the pronunciation of an abbreviated form." We have brought this up many times in the past. The use of style sheets only addresses a small part of the problem. It is totally inadequate for dealing with an abbreviation such as "CSAT", pronounced "see-sat", or "MA", pronounced "Massachusetts", etc. Please put in place a method to allow pronunciation to be dealt with properly. If this is not done, please at least modify the text in the specification to recognise that stylesheets are only going to be useful in certain circumstances - they are not a real solution to the problem. (I think it would also be helpful to show an example of how stylesheets should be used, if you suggest that.) | ||
23 | 9.2 | typo |
extra " just before Gandalf | |
24 | 9.2 |
It's bad enough that the US has recently appropriated such British classics as The Italian Job, Ladykillers, and Pooh Bear, but please don't try to claim that Gandalf speaks en-us ! ;-) In the interests of international harmony, please change the example to say "<quote xml:lang="en">". | ||
25 | 9.8 |
I think it would greatly aid clarity and consistency of markup to offer some advice about when it is appropriate for document authors to add quotes directly in the text or via style sheets [note that this is spelled as a single word in the para starting "Visual user agents...", but spelt as two words in earlier parts of the text]. I would recommend that style sheets are used as the default method, and that therefore the XHTML processor support the necessary CSS. This is to facilitate localization. It is much faster and easier to adapt quotation marks in a style sheet than to change all instances in the markup. Manual insertion of quotation marks is appropriate when quoting a passage that will not be translated. I think you should also recommend that the quotes appear outside the quote markup, since they are part of the surrounding text. Note that the example given for quote does the wrong thing on both counts here, and I would say encourages bad practice. | ||
26 | 9.8 | nit |
There is no punctuation at the end of the cite example. | |
27 | 9.10 | edit |
Please add "dir attribute" to "... in conjunction with style sheets, the xml:lang attribute, etc...." | |
28 | 9.11 | edit |
"The strong element indicates higher importance for its contents." Higher than what? | |
29 | 9.11 | nit |
I think the example would read better in English as "Please put the rubbish out on <strong>Monday</strong>, but <em>not</em> before nightfall!" | |
30 | 10 |
Does it not introduce inconsistent semantics to a document to allow the use of <a> in addition to other methods of linking? I'm not clear why it was retained. I think that at least you could change "but has been retained to allow the expression of explicit links" to "but has been retained to allow an alternative expression of explicit links". | ||
31 | 11.1 | nit |
I'd like to have seen the definition of 'geek' rather than 'hacker', given that for 'dweeb'. | |
32 | 11.2 | edit |
In the example, unneeded space after "Contents". | |
33 | 12.1 | edit |
Suggestion: it may help good practice to add a note that class names should be chosen to reflect semantic distinctions, not presentational ones, eg. use 'emph' rather than 'italic'. This also assists in ensuring localizability of markup, since presentational values do not necessarily map from one script to another. | |
34 | 12.1, title |
We appreciate your work in eliminating attributes containing user readable text from the XHTML2 format. This will significantly aid localizability, since it reduces the number of places where unique ids, or language or bidi markup are unavailable. The title attribute still sticks out like a sore thumb, though, in this regard. Can we not convert it to a common inline element, that can optionally appear as the first item in most other elements? Otherwise, there is a significant usability impact for international users, and XHTML will look Western-centric. | ||
35 | 13.1, hreflang |
This is interestingly different from my understanding of how hreflang was used in HTML 4.01 (which I think was actually problematic). You should indicate in the second para that language values should conform to RFC 3066 or its successors. | ||
36 | 13.1, access | edit |
The 'shortcuts' title isn't correctly presented. | |
37 | 14 |
The spec uses the term 'base language' without an apparent definition. It also uses the term 'primary language' in the example in this section. The i18n WG has put a lot of thought into matters of this kind lately and it would seem appropriate to step back and consider the usage of language in XHTML2. The i18n WG has begun to use the terms 'text processing language' and 'primary language' ('document language metadata') to mean different things. [The actual terminology is a secondary consideration here.] The idea is that there are two ways in which one needs to declare the language of content: the first is to express the basic language of the document as a whole (this could be used for searching, serving, etc.), the second is to express the language of a specific run of text so that applications that manipulate the text, such as text-to-speech, can correctly understand the text they are currently dealing with. The former declaration (what we call 'primary language') could involve declaring more than one language, eg. for documents containing parallel texts in multiple languages, but doesn't necessarily mention every language that appears in the document. The latter type of declaration (what we call 'text processing language') must, of necessity, refer to only a single language at a time, though that declaration can be overridden for a labelled fragment of the text, eg. an embedded French word in English text. The rules governing the use of language values in HTTP headers and language attributes reinforce our view that the HTTP header should be used to declare the primary language, and that language attributes should be used to declare the text processing language. It is acceptable in my mind to say that, in the absence of language attributes, the first value of the HTTP header Content-Language field could be used to declare the default text processing language for the document, but it would always be better to declare that explicitly in the html tag.
Based on the foregoing, I have the following recommendations for this section:
| ||
38 | 14 |
Other brainstorming thoughts related to language declarations, based on the philosophy introduced in comment 37:
| ||
39 | 15.1, 1st para |
Suggested rewording: "This direction overrides the inherent directionality of characters as defined in Unicode Standard" --> "This direction affects the display of characters as defined in Unicode Standard". It doesn't change the inherent directionality of the characters themselves, just the behaviour of those characters in context. | ||
40 | 15.1, example |
You would only need to use the dir attribute on the p element in a rtl context. It would be better to omit it here and to simply state that the default directionality for this text is ltr. We encounter many people who add dir's to almost all the block elements in a file, when a single declaration on the html tag would suffice. I don't want to encourage such behaviour, since it is unnecessary and detrimental. [ Thank you for adding the note about setting the base direction for an entire doc in the html tag. Very helpful !] | ||
41 | 15.1, example |
It is not at all obvious to the uninitiated reader what the effect of the lro attribute would be unless you show the resulting displayed text. (This is a tricky area to show examples ;) | ||
42 | 16 |
The attribute name datetime is rather unspecific, given that it can appear on any element. I would prefer editdatetime. | ||
43 | 20.5.2, example |
Several problems with this example:
| ||
44 | 20.5.2 |
Again, the title is used for a role specific to this element, and contains user readable text that might need to be marked up for directionality, language, translation id, etc (especially given the context of usage). Does the link element have to be empty? Could it not include this text as content? | ||
45 | 20.6.1, example |
Please use the character è rather than the NCR in Grèce. | ||
46 | 23.1 |
charset: Could this be called 'encoding'? | ||
47 | 23.1 |
It says "Please consult the section on character encodings for more details.", but I wasn't sure which section that was referring to. | ||
48 | 23.1.3,example |
Please add xml:lang="en" to the html tag. | ||
49 | 23.1.3 |
Where is the perl script? | ||
50 | 26.4.2 |
Suggested rewording: "When set for the table element" --> "When set for or inherited by the table element". | ||
51 | Appendix F |
Entities are extremely useful for disambiguation of invisible or identical characters, such as ‏ and . It is much easier to work with names in these cases than to use NCRs. Please be driven by user needs rather than technological limitations where possible. |
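
To illustrate the ambiguity raised in comment 8, here is a minimal sketch in Python. It is not taken from the XHTML 2 draft; the function names and sample values are my own, purely for illustration. It shows that "one or more digits" gives different answers depending on whether the definition means ASCII 0-9 only or any Unicode decimal digit, and that neither reading admits a decimal point.

```python
import re

# Reading 1: "digits" means ASCII 0-9 only.
ASCII_DIGITS = re.compile(r"[0-9]+")
# Reading 2: "digits" means any Unicode decimal digit.
# For str patterns, Python's \d matches every character in category Nd.
UNICODE_DIGITS = re.compile(r"\d+")


def is_ascii_number(value: str) -> bool:
    """True if the whole value consists of one or more ASCII digits."""
    return ASCII_DIGITS.fullmatch(value) is not None


def is_unicode_number(value: str) -> bool:
    """True if the whole value consists of one or more Unicode decimal digits."""
    return UNICODE_DIGITS.fullmatch(value) is not None


# Hypothetical sample values, written with escapes to avoid encoding problems.
samples = {
    "ascii": "123",
    "arabic-indic": "\u0661\u0662\u0663",
    "thai": "\u0E51\u0E52\u0E53",
    "decimal point": "1.5",
}

for label, value in samples.items():
    print(label, is_ascii_number(value), is_unicode_number(value))
```

Under the first reading only "123" is accepted; under the second, the Arabic-Indic and Thai strings are accepted too. A specification that points to an existing definition (for example, an XML Schema type) would remove the need for implementers to guess between the two.
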
Version: $Id: xhtml2-review.html,v 1.2 2005/01/20 17:31:04 rishida Exp $