I18n comments on Internationalization Tag Set (ITS) Version 1.0

Version reviewed

http://www.w3.org/TR/2006/WD-its-20060518/

Main reviewers

Richard Ishida

Notes

These are comments on behalf of the I18N Core WG. The Owner column indicates who has been assigned the responsibility of tracking discussions on a given comment.

We recommend that responses to the comments in this table use a separate email for each point. This makes it far easier to track threads.

NOTE: This set of comments is unusual due to the fact that we have included some initial discussion in the comment fields. This is because Felix Sasaki is an editor of the ITS specification and also an i18n WG participant, and as such has responded to some of the comments by email before they were sent to the ITS WG. Since Felix was travelling when the i18n WG discussed these comments, and because we wanted to send our comments to the ITS WG before Felix would be able to engage in further discussion with the group, Felix's initial responses are documented here where they diverge from or add to the point of view of other i18n WG participants.

Initial responses from Felix look like this, and begin with 'FS:'.

Responses from the i18n WG follow.

Comments

ID Location Subject Comment Owner Ed. / Subs. Discussion threads
1 6.7.1 Language information dc overview

"The element langRule is used to express that a given piece of content (selected by the attribute langPointer) is used to express language information as defined by [RFC 3066bis]."

It's surely not the content that expresses the language information, it's the markup that expresses the language of the content. We feel that this paragraph does not explain its intent clearly, and mixes up several ideas.

We suggest:

"The element langRule is used to express the language of a given piece of content (selected by the attribute langPointer). The langPointer attribute indicates the markup which expresses the language of the content pointed to by langRule. This markup must do so using values that conform to BCP 47 or its successor."

RI E
2 6.7.1 RFC 3066bis

We recommend that you say BCP 47 instead of RFC 3066bis.

We also strongly recommend that you add the phrase "or its successor" after any reference to RFC 3066bis or BCP 47, since RFC 3066bis is expected to become obsolete soon after it is released (to make way for RFC 3066ter).

RI S
3 6.7.1 Example 33

"The following langRule element expresses that all p elements (including attributes and textual content of child elements) have a language value conformant to [RFC 3066bis]. The value is given by the mylangattribute attached to the p elements."

We suggest:

"The following langRule element expresses that the content of all p elements (including attribute values and textual content of child elements) is in the language indicated by the mylangattribute, which is attached to the p elements and expresses language using values conformant to BCP 47 or its successors."
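As a rough illustration of the situation this wording describes, a rule and two matching paragraphs might look like the following. This is only a sketch: the namespace URI, the wrapper elements and the sample text are assumptions made for the example, not text taken from the draft.

```xml
<!-- Sketch only: namespace URI, doc and its:rules wrappers are assumed -->
<doc xmlns:its="http://www.w3.org/2005/11/its">
  <its:rules>
    <!-- For every p, the language is read from its mylangattribute -->
    <its:langRule selector="//p" langPointer="@mylangattribute"/>
  </its:rules>
  <p mylangattribute="fr">Bonjour <em>tout le monde</em></p>
  <p mylangattribute="en-GB">Hello <em>world</em></p>
</doc>
```

Here the first p, including its em child, would be treated as French and the second as British English.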

RI E
4 6.7.1 Use of xml:lang

The explanation of the language information data category should clarify that this only provides for rules to be expressed at a global level because it is assumed that it would be unnecessary locally. Locally users would be able to use xml:lang (which is defined by XML) or an attribute specific to the format in question (as in Example 33).

The XML spec acts in a way like an ever present provider of certain global rules defining such things as xml:lang.

RI E
5 6.2 Translatability

'Translatability' is not a good term for this, since it is already used in the sense of internationalization to allow for easy translation. Perhaps "Translation information" would be better, and more consistent with other data category titles.

FS: The first ITS WD already talks about "translatability". So does the requirements document http://www.w3.org/TR/2005/WD-itsreq-20050805/#transinfo . Given this long history of the term which you must be aware of, I disagree with your request to change it. I also disagree with your argument of consistency with other data categories: Our envisaged users are likely to focus only on a subset of data categories, see also the conformance section which separates data categories. Hence, consistency of naming is not so important, but rather consistency between ITS working drafts, implementations, presentations, ... .

I18n: There is no need to be consistent in this regard with past working drafts. People should expect Working Drafts to change, as described in the status section. There is a much greater need to go forward with appropriate terminology.

We don't see that this is a difficult change to make.

"Our envisaged users are likely to focus only on a subset of data categories" We believe this is irrelevant to appropriate naming of a given data category, but in addition I don't think you are proposing that the 'translatability' category will always be used independently of other implementations, so I don't think this argument holds.

We may be prepared to accept that 'Translation Information' is too vague. Alternative suggestions for the title are 'Translate Information' or 'Translate Directive'.

RI E
6 6.2 Inheritance of translation information

The inheritance of translate information by child elements but not attributes is mentioned in 6.1, but should also be mentioned in 6.2 with reference to the global rules as well as the local ones.
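A small sketch of the behaviour we would like 6.2 to spell out for global rules follows. The element names legal and ref, the namespace URI and the wrapper elements are hypothetical and only serve the illustration.

```xml
<!-- Sketch only: element names and namespace URI are assumed -->
<doc xmlns:its="http://www.w3.org/2005/11/its">
  <its:rules>
    <its:translateRule selector="//legal" translate="no"/>
  </its:rules>
  <legal>
    This text is selected by the rule.
    <!-- The child element ref inherits translate="no" from legal -->
    <ref title="An attribute value, not reached by the rule">So is this.</ref>
  </legal>
</doc>
```

The rule does not extend to the title attribute. Its translatability would have to be set by a separate rule that selects the attribute itself, e.g. with a selector such as //ref/@title.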

RI E
7 6.2 Default translate value

This section must mention that the default value of the translate dc is 'yes'.

RI S
8 6.2.2 Invert translate examples

The examples show the value of translate being set to 'yes', but since this is the default anyway, this is unnecessary and confusing. Please substitute examples that indicate where translation is not required.

RI E
9 6.3.1 6.3 ed 1

"This data category has several purposes:"

Suggest: This data category can be used for several purposes, including, but not limited to:

Otherwise, it looks like an exclusive list.

RI E
10 6.3.1 Mention translation tools

Just a suggestion.

Add: Note that translation tools can be made to recognize the difference between these two types of localization information, and present the information to translators in different ways.

RI E
11 6.3.2 Use of elements

The locInfo element cannot contain directional markup for bidi languages, nor language markup, nor ruby markup, nor spans - all things which this document makes out to be important for well internationalized content.

Can we not come up with a model that allows for at least those things?

Please allow for span elements to at least include other span elements, so that language or directionality values can be applied to ranges of text within a span.

FS: the "model" you describe would be IMO just to allow for its:span within <locInfo>, and to allow nesting of <its:span> within <its:span>.

RI S
12 6.3.2 Loc Info or Loc Note

Is there any reason that locInfo is not called locNote - since that is much more true to the meaning.

FS: The naming relies on the name of the data category. I don't think that your proposal "locNote" is appropriate, since providing notes is *one* usage of this data category. See a different usage described at http://lists.w3.org/Archives/Member/member-i18n-its/2006AprJun/0107 ( issue 4, "With the localization information data type"), which (possibly automatically) uses "localization information" for adding linking information between different translation versions. Such links are certainly no "notes" and have a different status than e.g. locn-note in the XMLSPEC i18n DTD, see example 20 at http://www.w3.org/TR/xml-i18n-bp/#xmlspec .

I18n: We think there is a serious risk of diluting the meaning of locInfo here to the point that a translation tool, for example, doesn't know what to do with the information pointed to. Pointing to previous translations should be another data category in ITS v2, since the translation tool may well need to do different things with this type of information. That's exactly why I wanted to change the name of the data category - because localization information is too broad a category to describe exactly what this is about. We need to be more specific. Note that the requirements document [1] refers to this specifically as Localization Notes. (If you are going to argue for consistency with that document in comment 6, you should take it into account here, too. ;-)

So the key issue here is what exactly is the scope of the locInfo data category. We feel it should be kept specifically to providing notes to translators, and other similar mechanisms such as pointing to former translations, should be considered more carefully in version 2, when they may well be best implemented as a different data category altogether.

[1] http://www.w3.org/TR/2006/WD-itsreq-20060518/#locnotes

We are raising the status of this comment to S.

RI S
13 6.3.2 Examples 22 and 23

These examples are weak. Can we improve them to look more realistic/useful? I may be able to help with this.

RI E
14 6.3.2 Explanations for examples

It would be better to provide commentaries on each example, indicating what is being said, to avoid possible ambiguity/misunderstandings.

RI E
15 6.3.2 Example 24

The localization notes pointed to in the document are presumably non-translatable text, and the example should therefore include a translateRule element in order to correctly observe the proprieties.

RI E
16 6.3.2 Attribute used for locInfo

We will advise against using attributes for natural language text in the BP doc - we should at least explain why we are doing it here.

RI E/S
17 6.4.1 Marks terms and meanings

"The terminology data category is used to mark terms."

Surely, as a definition, this must say 'The terminology data category is used to mark terms and associate them with definitions'?

FS: You will see at http://www.w3.org/TR/its/#terminology-markup that you can use the terminology data category for just identifying terms. All attributes for adding further information (e.g., but *not only* about definitions) are optional.

I18n: Then let's reword the proposed text to: 'The terminology data category is used to mark terms and optionally associate them with information, such as definitions'.

(We have reduced this comment to editorial.)

RI E
18 6.4.1 Role of termInfo

We think we are running into danger if we broaden the scope of the data definition so widely that any kind of information can be associated with 'terms'. The whole point of ITS is to clarify what to expect from markup. Given your comment above, if tools don't know what to do with the data they have read in, what have we gained by using ITS?

We think that we should say that the information at the end of the termInfo link is a definition or relevant text to support the understanding of the term identified.

If we are too vague about what is pointed to, we will lose the advantage we were aiming for of tools understanding what the ITS markup is saying.

(For example, some enterprising developer may decide that the terminology mechanism here provides a nice way of associating designer's notes with inline text. That would be a bad idea because one point of creating the ITS spec is that translation tools should understand what type of information they are receiving, and handle it automatically. He should use the locInfo mechanism for that.)

RI S
19 6.4.2 Pointing to terms

"identifying terminology information at selected nodes is realized with a termRule element with a mandatory selector attribute. In addition, an optional termInfoRef attribute can be used to refer to external information about the term. "

We think this should say:

"terms are identified by a termRule element, which has a mandatory selector attribute. An optional termInfoRef attribute can be used to refer to external information about the term."

Note the clarification of 'terminology information' -> 'terms', since otherwise termRule and termInfoRef seem to be pointers to the same thing.

RI E/S
20 6.4.2 termInfoRefPointer referent's data

Please make it clearer that the attribute pointed to by termInfoRefPointer (and any similar attributes) MUST contain a URI.

RI E
21 6.4.2 termInfoRef should allow for id strings

Please provide an attribute that points to information about a term using a reference to an ID, recommending that this be defined as an IDREF where appropriate. Eg. the termref element in xmlspec contains a def attribute that points to the term definition using just an idref.
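For illustration, the xmlspec pattern referred to here looks roughly like this. The term, its definition text and the div wrapper are made up for the example.

```xml
<!-- Sketch only: the id value and the prose are invented -->
<div>
  <p><termdef id="dt-wellformed" term="well-formed">A document is
  <term>well-formed</term> if it matches the document production.</termdef></p>
  <!-- def is an IDREF pointing at the id of the termdef above -->
  <p>Every <termref def="dt-wellformed">well-formed</termref> document can be parsed.</p>
</div>
```

The def attribute carries a bare ID rather than a URI, which is the case the requested attribute would need to cover.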

RI S
22 6.4.2 Allowing for xpath expressions to point to term definitions

Please provide a mechanism to point to a term definition that doesn't have an associated id, e.g. in a DL list, where the successive DT and DD elements contain terms and definitions without ids. Surely something like termInfoPath is needed.

Example 19 shows this scenario quite clearly. Terms are identified, but not linked to the definitions given in the example.

FS: You could point to term definitions with a termInfoPointer attribute:

<dl>

<dt>...</dt>

<dd>...</dd>

</dl>

possibly in a different document (see my reply to your comment 50 below):

<its:termRule selector="//dt" termInfoPointer="following-sibling::dd[1]"/>

For consistency, I would propose to keep the naming scheme "xxxPointer", and not "termInfoPath". Also for consistency, there should be a global or local termInfo attribute (if the ITS group decides to accept this proposal).

RI S
23 6.8.1 Hard to know what this is about

The definition of flow of content as 'representing how the nodes of the element should be treated as a single unit for linguistic purposes' is not only hard to read, but doesn't tell me anything about its intended use. Could we be talking about identifying noun phrases?

I think we are talking about identifying translation unit segmentation breaks. If this is the case, please make that clear. I think people reading this could either be unclear about its intent, or misunderstand it in such a way that it is implemented for all sorts of non-interoperable structural definitions of the content.

RI E
24 6 Standardised wording

As I try to understand the section on data categories, I keep wishing there was more standardization of the text. For instance, in one place we have "Directionality can be expressed with global rules or locally on an individual element." as the second sentence under implementation; elsewhere "Ruby can be expressed locally in a document or with global rules." as the first sentence; elsewhere "This data category can be expressed only in a set of rules. It cannot be expressed as local markup on an individual element." And Language Information doesn't have an Implementation section at all.

It would be much easier to compare and contrast, and also to pick up information, if this were expressed in a standard form, e.g. the first sentence under "Implementation" is always "XXX can be expressed with global rules, or locally on an individual element.", or, in a case like the third above, "XXX can only be expressed with global rules."

Other similar standardisations could be applied to section 6.

RI E
25 6.7 No implementation section

There is no implementation section in 6.7. Please add one.

RI E
26 6.5.1 Repetition

'Its values are "ltr", "rtl", "lro" or "rlo". '

This sentence can be dropped, since that aspect is defined in the following implementation section.

RI E
27 6.5.2 Is dir mandatory?

"As for global rules, directionality is expressed in rules using a dirRule element with the dir attribute. In addition, a selector attribute is required." ->

Global rules: Directionality is expressed using a dirRule element with one dir attribute and one selector attribute, which are both mandatory.

RI E
28 6.5.2 Avoid xml:lang='he'

GEO WG regularly has to clarify for people that language declarations and directionality markup are very different things - including a long and drawn-out discussion with DITA folks. It seems dangerous to introduce an example here that seems to the uninitiated to say that the language markup is defining the application of the directionality markup.

In particular since the xpath expression probably doesn't need to go that far anyway for a reasonable example. It could just say /body/p[1]/quote or //quote[23] or some such.
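A rule along the following lines would make the same point without involving xml:lang. It is a sketch only: the namespace URI and the its:rules wrapper are assumed, and the path is simply one of the simpler selectors mentioned above.

```xml
<!-- Sketch only: namespace URI and wrapper element are assumed -->
<its:rules xmlns:its="http://www.w3.org/2005/11/its">
  <!-- Selects the quotation by position, with no reference to xml:lang -->
  <its:dirRule selector="//body/p[1]/quote" dir="rtl"/>
</its:rules>
```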

RI E
29 6.5.2 Some Hebrew quotation

I don't know whether the intent here was to say 'Some text that is a Hebrew quotation', but as it stands, it sounds rather dismissive. "A Hebrew quotation" would probably be better.

RI E
30 6.5.2 eg 30 Example 30

The <book>, <head> and <body> elements just appear to be unnecessary clutter in this example, making it take longer to understand what to focus on. Please remove.

Also, it would probably be useful to add some real Hebrew here. I suggest:

פעילות הבינאום, W3C
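A pared-down version of the example could then look something like this. The local its:dir attribute, the namespace URI and the surrounding sentence are assumptions for the sake of the sketch; the draft's own element names should be used in the real example.

```xml
<!-- Sketch only: attribute name and namespace URI are assumed -->
<p xmlns:its="http://www.w3.org/2005/11/its">In Hebrew, the title
  <quote its:dir="rtl">פעילות הבינאום, W3C</quote>
  means <quote>Internationalization Activity, W3C</quote>.</p>
```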

RI E
31 6.5.2 Refer to bidi article

Please refer to the article What you need to know about the bidi algorithm and inline markup or to the best practice document, soon to be renamed Internationalization Best Practices: Handling Right-to-left Scripts in XHTML and HTML Content, for more information about how to use this attribute.

RI E
32 6.5 Default is ltr

Please state that the default direction is LTR if nothing else is specified.

RI S
33 6.6.2 Example 31 lacks rp

I can't see any good reason why one should omit the <rp> markup in Example 31, but I can see very good reasons for including it. Please add.
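As an indication of what we have in mind, ruby markup with rp could be written as follows. This sketch assumes that the ITS ruby elements mirror the Ruby-TR element names in the ITS namespace. The text wrapper, the namespace URI and the Japanese sample are chosen only for illustration.

```xml
<!-- Sketch only: element names and namespace URI are assumed -->
<text xmlns:its="http://www.w3.org/2005/11/its">
  <its:ruby><its:rb>東京</its:rb><its:rp>(</its:rp><its:rt>とうきょう</its:rt><its:rp>)</its:rp></its:ruby>
</text>
```

A processor that does not understand ruby and simply displays the text content would then show 東京(とうきょう) rather than running the base text and the ruby text together.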

RI E
34 6.6.2 Use Japanese in Example 31

Although it may be useful to show that ruby content doesn't have to be Japanese, I think it would be more useful and improve the international feel of the spec, to use a Japanese example. We can provide one if needed.

RI E
35 6.6.2 Existing ruby markup

It's not abundantly clear from the text, but I think you mean to say, in the last para, that there is a set of global rules for associating markup that *conforms to Ruby-TR* with ruby concepts. Please make that clearer, and make it clearer that these global rules *do not apply* if the target markup is not identical to or a conformant subset of Ruby-TR.

I'm also wondering whether XHTML 1.1 would need such global rules. If it does, it would be a good example to cite.

RI E/S
36 6.6.3 Make the application of legacy ruby clearer

It is very poorly explained what the difference is between this and the situation described in the last para of 6.6.2.

How about:

"Where legacy formats do not contain ruby markup conformant to [Ruby-TR], it is still possible to associate ruby text with a specified range of document content using the rubyRule element."

Note also s/and there one wants/and where one wants/

RI E
37 6.6.3 rubyText is an attribute

Is there a reason an attribute has been used for rubyText, rather than an element?

RI S
38 6.6.3 in the case of no selection

I don't understand the following two pieces of text:

"(corresponding to the rt element in the case of no selections)"

", corresponding to the rb element in the case of no selection."

RI E
39 6.6.3 Example 32 head

The </head> tag appears to be in the wrong place. Surely it should appear after the its:rules element?

RI E
40 General Too many places to look

To get a clear picture of the data categories you need to flick between section 5.1 (which I initially ignored because I thought it was just a 'summary'), 6.1, and the data category section, plus various bits such as parts of 5.3. This is really hard work! Can't we at least merge 5.1 and 6.1 into the data category sections? (If we want to keep them as summaries we could make a copy in the Appendices.)

FS: I disagree with your proposal. Merging 5.1 and 6.1 into the data category sections would make spec reading *very hard* for people who are interested in the general concepts of ITS, and who possibly only implement a single data category.

I18n: We are happy to repeat the information currently in summary form in 6.1, but we would still like to see that information copied into the data categories. We think we need to find a way to make life easy for people interested in general concepts while not making it harder for people who need to implement the data categories. And we think that whether you want to implement one category or several, you'd be better off not having information scattered around the document.

I strongly contend that section 5.1 should be merged into the data categories - and that at the very least it should not be entitled 'Summary' since it is not summarizing at all. A summary draws information from elsewhere and condenses it. This section is setting out an initial set of information, some of which is only found here.

Also, the scope of the selection is specified for the local case in 6.3.2, but that for the global rules (only!) is specified in 6.1. This sort of inconsistency makes it worse.

RI E
41 5.1, 5.3.2 xml:lang missing

We feel it is not acceptable that the markup proposed by ITS makes no mention of ways to define language for the content of ITS elements. At the very least, a proposed approach should be described in text, with xml:lang being held up as the most standard way to do things. Such text could say something like:

"xml:lang should be available on the span element, unless the format into which ITS is being integrated has an attribute that fulfills the role of xml:lang, in which case that attribute should be available on the span element." And similar for other elements defined for ITS.

ITS can point out that xml spec does define xml:lang in a global rules kind of way.

RI S
42 5.3 Spell out the attributes

s/As for data category specific attributes like locInfoPointer which point to existing information in the document, a RelativeLocationPath as described in [XPath 1.0] MUST be used. The XPath expression is evaluated relative to the nodes which are selected via the selector attribute./Attributes that point to existing information in the document, ie. attributes whose name ends in ...Pointer, MUST use a RelativeLocationPath as described in [XPath 1.0]. The XPath expression is evaluated relative to the nodes selected by the selector attribute./
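To show what "relative to the nodes selected by the selector attribute" means in practice, here is a sketch. The rule element name locInfoRule is assumed by analogy with termRule and dirRule, and the document vocabulary (entry, note, msg) and namespace URI are hypothetical.

```xml
<!-- Sketch only: locInfoRule, the namespace URI and the document
     vocabulary are assumptions made for this illustration -->
<doc xmlns:its="http://www.w3.org/2005/11/its">
  <its:rules>
    <!-- selector picks each msg; locInfoPointer is then resolved
         relative to that msg, here reaching its sibling note -->
    <its:locInfoRule selector="//entry/msg" locInfoPointer="../note"/>
  </its:rules>
  <entry>
    <note>Shown on the login button; keep it short.</note>
    <msg>Sign in</msg>
  </entry>
  <entry>
    <note>Error text; %s is a file name.</note>
    <msg>Cannot open %s</msg>
  </entry>
</doc>
```

Because ../note is a RelativeLocationPath evaluated from each selected msg, every message is paired with the note in its own entry rather than with all notes in the document.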

RI E
43 5.3.2 Example 14 translate

The default for the translate flag is yes, so it doesn't need to be specified on <body>. A better example might be an extract from XHTML showing translate=no on the <head>, but overriding that with translate=yes on the <title>.
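Such an extract might look like the following. It is a sketch under the assumption that the local attribute is written its:translate and that the ITS namespace URI is as shown. Adding ITS attributes to XHTML in this way is for illustration only.

```xml
<!-- Sketch only: attribute name and ITS namespace URI are assumed -->
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:its="http://www.w3.org/2005/11/its">
  <head its:translate="no">
    <!-- The title overrides the value inherited from head -->
    <title its:translate="yes">Travel guide</title>
    <style type="text/css">p { color: black; }</style>
  </head>
  <body>
    <!-- No attribute needed here: translate defaults to yes -->
    <p>Welcome.</p>
  </body>
</html>
```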

RI E
44 5.3.2 Example 14 dir

Please don't encourage people to specify dir on the <body>. It should be specified on the <text> element (because the head may just as well contain rtl information).

In addition, ltr is specified, but this is the default value, so it should not appear here.

RI E
45 5.6 ITS markup must be integrated

"Some markup schemes provide markup which can be used to express ITS data categories. ITS data categories can be associated with such existing markup, using the global selection mechanism described in Section 5.3.1: Global, Rule-based Selection. In this way, there is no need to integrate ITS markup into documents."

The last sentence is completely incorrect as it stands. It should say that there is no need to integrate *local* ITS markup into those documents. The markup to support ITS global rules is very much needed!

FS: Please have a look at http://www.w3.org/TR/its/#selection-precedence , list point no. 3: [Global selections in an external file (using a rules element), linked via the XLink href attribute or a different mechanism] That is: you *can* use different mechanisms than an <its:rules> element with an XLink href attribute, e.g. an external rules file. It depends on your implementation how the rules are activated, e.g. via a command line option. In other words: "In this way, there is no need to integrate ITS markup into documents." My current implementation in XSLT allows for such a (command line) option, and I'm sure it is no problem to have it for Yves' and Sebastian's implementation or others as well. However, it does not make sense to standardize this option, since it is implementation dependent.

I18n: Wherever the global rules are they are integrated 'into documents', so we assume that what you meant to say was perhaps "If global rules are defined in an external rules file, there is no need to integrate ITS markup into the document."

If, however, you are referring specifically to rules activated via command line options, as mentioned above, and given that you say that it doesn't make sense to standardize this option, then I'm wondering why you would mention it in this way. At best a note with a caution would be appropriate. Please provide clarification.

RI E
46 5.6 Example 19 xml:lang

Should this be identified by global rules as language information? If not, we should explain why in the language information data category section.

FS: The current definition 6.7.1. says "The element langRule is used to express that a given piece of content (selected by the attribute langPointer) is used to express language information as defined by [RFC 3066bis]." I would add a note saying

"Applying this data category to xml:lang attributes does not make sense since xml:lang is already defined in terms of RFC 3066 or its successor". If that would address your concern, please add it to your comment.

I18n: We propose a slight alteration to the above:

Applying the langRule data category to xml:lang attributes using global rules is not necessary, since ...

RI S/E
47 5.1 Attributes missing?

"The following list summarizes elements and attributes to be used locally:"

The list only shows elements and the attributes that can be associated with them. What about attributes such as its:translate, which can appear locally?

RI E
48 General xml:lang = language info, please

We feel that there should be a note in the specification that says that it is implied/understood that xml:lang represents language information wherever it appears.

RI E

Version: $Id: Overview.html,v 1.10 2006/07/11 19:05:34 rishida Exp $