I18n comments on InkML

I18n comments on Ink Markup Language (InkML)

Main reviewers

Richard Ishida

Notes

These are comments on behalf of the I18n Core WG. The Owner column indicates who has been assigned the responsibility of tracking discussions on a given comment.

The I18n Core WG has not yet reviewed these comments.

We recommend that responses to the comments in this table use a separate email for each point. This makes it far easier to track threads.

Comments

ID	Location	Subject	Comment	Owner	Ed. / Subs.
1	6.3.1 Annotation element	xml:lang not used	Why is xml:lang not used to specify the language of the text in the document? It could be used on the <ink> element to set the default language for the document, then on <annotation> elements where the language of the element's content diverges from the default. A significant advantage to using xml:lang is that the scope is clearly defined and the value is inherited in the infoset - unlike an annotation with semantic tagging. In addition, it is a standard mechanism that is well understood. Note that we are only talking about identifying natural language of text in the document here. Specifying the language of traces should probably be done using a different mechanism. (See xml:lang in XML document schemas for a discussion of the difference.)	RI	S
2	6.3.1 Annotation element	meaning of language annotations	"<annotation type="language" encoding="ISO639">en</annotation>" "<annotation dc:language="en"/>" The example shows these two annotations related to language (in fact there are 3, if you count the Text/en before). It is not made clear why two are needed, nor specifically what they indicate. This needs to be clarified if people are to provide interoperable language information.	RI	S
3	6.3.1 Annotation element	encoding="ISO639"	The description of the encoding attribute and the example a little further down both refer to ISO 639 as a potential source of codes for language values. We strongly recommend that this be replaced with references to the IETF's BCP 47, which is the current standard for language tags, and should be used for such values. If people use ISO 639 to derive these tags they will be severely limiting their ability to express language, and will encounter issues about which of the ISO 639 codes should be used for a given language when multiple codes exist. BCP 47 was created to resolve these issues. For more information see Language tags in HTML and XML. The text lower down: "The language specification may be made using any of the language identifiers specified in ISO 639, using 2-letter codes, 3-letter codes, or country names." is incorrect. ISO 639 does not describe country codes (ISO 3166 does). Also, as mentioned above, it is a recipe for confusion to say that people can use either 2- or 3-letter codes where both are available for a given language, which is what is implied in the text. I'm beginning to suspect that you meant RFC 4646 all along, which is part of BCP 47, and which clarifies all this.	RI	S
4	6.3.1 Annotation element	Specifying script info	Note that script information comes for free these days with language tags, if you are using BCP 47. Would this be sufficient for the needs of the user here? If BCP 47 is not recommended to cater for script information, then you should recommend the use of ISO 15924 to identify scripts in an interoperable and well-reasoned fashion.		S
5	6.3.1 Annotation element	Text/<language>	"Some text may also require a script specification (such as Kanji, Katakana, or Hiragana) in addition to the language." It isn't clear whether the script designation applies to the markup content or to the traces. Again, BCP 47-based language tags now support script subtags as part of a language tag (see Language tags in HTML and XML). Why is language specified as part of the content of the contentCategory annotation, rather than via a language attribute on the annotation element, or even the trace and related elements? Again, we strongly feel that BCP 47 should be used after Text/ if language is described in this way, to allow for the notation to be recognized across applications.	RI	S
6	6.3.1 Annotation element		Is there a possibility that an InkML user would need to label, say, a traceGroup as being in more than one language? If so, how would they do that?	RI	S
7	6.3.1 Annotation element		For most of the examples of sub-category you give (esp. Date, Time, Currency), how will it be clear what format is used, e.g. is the date 01/02/03 a day in January or February?	RI	E
8	6.3.1	Support for bidi markup	Bidirectional text, as might be found in languages using Arabic or Hebrew scripts, may need more than the Unicode bidirectional algorithm to correctly display data (see What you need to know about the bidi algorithm and inline markup). Please consider allowing for such markup on natural language text such as found in annotations.	RI	S
9	6.3.2	xml:lang on XHTML example	Please include xml:lang="en" on the html element of the embedded HTML example, to indicate best practice.	RI	E
10	6.4	Case sensitivity of units	Are unit names case-sensitive? In particular, why do we see 'Kg' rather than 'kg'? A required upper-casing as shown could result in confusion for users.	RI	S/E
11	4.4.1	timeString	It is extremely difficult, if not impossible, to compare timestamps when some have and some do not have time zone offset information. We believe that in InkML there will be times when people will want to add time zone offsets, and therefore we believe you should require all timestamps to always have a time zone offset. (This could just be Z if offsets are not important for a particular instance.) For more information on this, please see Working with Time Zones.	RI	S
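
As an illustration of comment 11, the following is a minimal Python sketch of why mixing timestamps with and without offsets is a problem. It assumes ISO 8601-style strings, which may not match the exact timeString syntax in the InkML draft. The function names and sample values are hypothetical and chosen only for this example.

```python
from datetime import datetime


def parse_timestamp(text: str) -> datetime:
    """Parse an ISO 8601-style timestamp (assumed format, not taken from InkML).

    The result carries time zone information only if the string has an offset.
    """
    return datetime.fromisoformat(text)


def is_comparable(a: datetime, b: datetime) -> bool:
    """Return True if the two timestamps can be ordered without guessing a zone."""
    try:
        _ = a < b
        return True
    except TypeError:
        # Python refuses to order an offset-aware value against a naive one.
        return False


# Hypothetical sample values.
with_offset = parse_timestamp("2007-01-09T16:57:28+00:00")
without_offset = parse_timestamp("2007-01-09T16:57:28")
other_zone = parse_timestamp("2007-01-09T17:57:28+01:00")

print(is_comparable(with_offset, without_offset))  # False: one value has no offset
print(is_comparable(with_offset, other_zone))      # True: both carry an offset
print(with_offset == other_zone)                   # True: same instant, different zones
```

When every timestamp carries an offset, values recorded in different zones can be ordered and matched as instants. When one lacks an offset, the consumer has to guess, which is the situation the comment asks the specification to rule out.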



Version: $Id: Overview.html,v 1.3 2007/01/09 16:57:28 rishida Exp $
