This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 6245 - SML locid example request from ITS Interest Group
Summary: SML locid example request from ITS Interest Group
Status: RESOLVED FIXED
Alias: None
Product: SML
Classification: Unclassified
Component: Core
Version: LC
Hardware: PC Windows XP
Importance: P2 normal
Target Milestone: PR
Assignee: Virginia Smith
QA Contact: SML Working Group discussion list
URL:
Whiteboard:
Keywords: externalComments, resolved
Depends on:
Blocks:
 
Reported: 2008-11-20 14:31 UTC by John Arwe
Modified: 2009-01-28 15:26 UTC

Description John Arwe 2008-11-20 14:31:45 UTC
Original request at http://lists.w3.org/Archives/Public/public-sml/2008Oct/0032.html from Yves Savourel on behalf of the ITS Interest Group with respect to appendix F "Localization and Variable Substitution Sample (Non-Normative)".  Creating this bug in response to the decision of the SML working group telecon of 2008-11-06.

The request consists of several distinct parts:

[1] provide an example that uses XML to store the locale-specific entries
[2] correct a spelling error in the French message in the draft document
Comment 1 John Arwe 2008-11-20 14:45:16 UTC
Yves, in your comment you supply some example XML and then afterward note that the ITS TR "could also" be used ("Note: In this example, the XML documents containing the text could also use the Internationalization Tag Set (ITS: http://www.w3.org/TR/its/)").

Since we have no one in the SML working group intimately familiar with ITS or the state of the art with respect to XML globalization practice, some questions came up during the wg's initial discussion:

- Is the XML you supplied, especially the unqualified "messages" and "msg" elements, part of ITS, some other recognized and adopted standard, or any industry practice with demonstrable public domain adoption?

- Same question for the putative ITS version you allude to.

The existing examples were built based on code in one of the existing known SML implementations, based on the Java resource bundle concept (I'm not sure which category above this falls into, but at the minimum it is one with broad industry adoption amongst Java apps).

Please note also that the appendix in question is exemplary, not normative or limiting.  To the degree that either alternative you are suggesting has demonstrable public adoption or prescribes a format agreed to by a broad community I expect that will help make the case for adding it/them as additional examples.  I am less sanguine about the prospects for removing the existing example ("we suggest to replace this section"), since it is based on implementation experience.
Comment 2 Yves Savourel 2008-11-20 22:49:05 UTC
Hi John,

> - Is the XML you supplied, especially the unqualified "messages" and 
> "msg" elements, part of ITS, some other recognized and adopted 
> standard, or any industry practice with demonstrable public domain 
> adoption?

The example shows a generic, imaginary XML file. There is no ITS-specific markup in it. It does implement some of the XML internationalization best practices: use of xml:lang to identify the language and use of a unique ID for each message.

So, while this specific document instance is not in a standard vocabulary, it is in XML which is a recognized and widely adopted standard. And this is our main point: XML (any XML) is often the best choice to store XML content (rather than properties files).


> - Same question for the putative ITS version you allude to.

ITS is only a set of attributes and tags one can add on top of an existing XML document, not a format you could use directly to store strings. You can see an example of such markup in the recent SVG-Tiny PR document: http://www.w3.org/TR/SVGTiny12/i18n.html#SVGi18nl10nmarkup (its.svg).



> The existing examples were built based on code in one of the existing 
> known SML implementations, based on the Java resource bundle concept 
> (I'm not sure which category above this falls into, but at the minimum 
> it is one with broad industry adoption amongst Java apps).

And we certainly don't see much wrong with it, except that it could be done in a way that is more flexible for localization. We would see no problem in keeping the current example.

One side note on your existing example: the files seem to use a naming convention that is not quite the recommended one. The locale codes should be suffixes rather than prefixes: for instance, it should be lang_fr.txt rather than fr_lang.txt. The names of properties files are important because their pattern is hard-wired in Java classes such as java.util.ResourceBundle (see the getBundle() method for example).
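The suffix convention can be illustrated with a small sketch. It is a simplified Python model of the lookup order that java.util.ResourceBundle.getBundle() documents (base name, then language, then country, joined by underscores), not the Java implementation itself. The helper names candidate_bundle_names and follows_suffix_convention are hypothetical, and the sketch ignores variants, scripts, and the fallback to the default locale.

```python
def candidate_bundle_names(base_name, language, country="", extension=".properties"):
    """Return candidate file names from most to least specific.

    Simplified model: base name first, locale parts appended as suffixes.
    """
    names = []
    if language and country:
        names.append(f"{base_name}_{language}_{country}{extension}")
    if language:
        names.append(f"{base_name}_{language}{extension}")
    names.append(f"{base_name}{extension}")
    return names


def follows_suffix_convention(file_name, base_name):
    """Check that the locale code comes after the base name, not before it."""
    stem = file_name.rsplit(".", 1)[0]
    return stem == base_name or stem.startswith(base_name + "_")


if __name__ == "__main__":
    print(candidate_bundle_names("lang", "fr", "FR"))
    print(follows_suffix_convention("lang_fr.txt", "lang"))
    print(follows_suffix_convention("fr_lang.txt", "lang"))
```

With the base name "lang", a prefixed name such as fr_lang.txt never appears among the candidates, which is why a lookup based on this pattern would not find it.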



> Please note also that the appendix in question is exemplary, not 
> normative or limiting.

Yes, but I also think it is quite important to convey best practices in examples, as they are often the references many developers use to design and code their own implementations by default. In a sense, appendices like this are where the broad community takes its cues.



> To the degree that either alternative you are suggesting has 
> demonstrable public adoption or prescribes a format agreed to by a 
> broad community I expect that will help make the case for adding 
> it/them as additional examples. I am less sanguine about the prospects 
> for removing the existing example ("we suggest to replace this 
> section"), since it is based on implementation experience.

I understand these valid concerns.
At the same time, you may want to take into account the following:

-1) With regard to ITS: Some of the recommendations the W3C produces break new ground and, initially, are not adopted by a broad community. While there are various ways to promote such specifications, one important conduit is the other W3C specifications where users can see examples and get exposed to them. This is especially true for specifications like ITS, which are more 'add-ons' than full-blown XML applications addressing a specific domain. Think of ITS (for example the its:translate attribute) as something akin to xml:lang.

-2) With regard to XML vs Properties: For many reasons, from the localization viewpoint, translatable data (and most especially data containing XML tags) are, in general, better stored in XML than in other formats. For example:

a) Encoding is clearly addressed and easily handled in XML (no \uHHHH escaping, far fewer chances to lose or corrupt non-ASCII characters).

b) You can use many general-purpose XML tools to work with XML files: for example one could open an XML resource file in an XML editor and spell-check the translated text, or do grammar checking, or perform a word count, or use an XSLT template to display it in a user-friendly way for review, etc.

c) You can easily have the storage format evolve over time without changing its core or the tools that use it. For example you can add/remove attributes useful for the translation process workflow.

d) If the data contain XML tags (like your example), most XML-enabled tools will be able to "see" them as tags that are part of the content and protect them accordingly. If the same data is in a different storage format (like a properties file), most translation tools will treat the inline tags as text, exposing them to accidental modifications that can result in invalid data at runtime.

e) XML documents now have a set of internationalization tags (ITS) that can be used to provide many internationalization and localization-related features in a standard way, facilitating the localization workflow.
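Points a) and d) can be made concrete with a short sketch using Python's standard xml.etree.ElementTree module. The French message below is invented purely for illustration (it is not the text from the SML draft), and the helpers to_properties_escapes and segments are hypothetical. The escaping helper is a conservative model that turns every non-ASCII character in the Basic Multilingual Plane into \uHHHH form; it is not a reimplementation of any Java tool.

```python
import xml.etree.ElementTree as ET

SCH_NS = "http://purl.oclc.org/dsdl/schematron"

# Hypothetical French message, invented for this illustration only.
XML_MESSAGE = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    f'<msg xmlns:sch="{SCH_NS}" xml:lang="fr">'
    'Identifiant <sch:value-of select="string(u:ID)"/> refusé.'
    '</msg>'
)


def to_properties_escapes(text):
    """Conservative ASCII-only form: non-ASCII characters become \\uHHHH.

    Only valid for characters in the Basic Multilingual Plane.
    """
    return "".join(c if ord(c) < 128 else f"\\u{ord(c):04X}" for c in text)


def segments(element):
    """Split mixed content into ("text", ...) and ("tag", ...) pieces."""
    parts = []
    if element.text:
        parts.append(("text", element.text))
    for child in element:
        parts.append(("tag", child.tag))
        if child.tail:
            parts.append(("text", child.tail))
    return parts


if __name__ == "__main__":
    # Point a): the UTF-8 XML keeps the accented character as-is.
    msg = ET.fromstring(XML_MESSAGE.encode("utf-8"))
    print(segments(msg))
    # In an escaped properties-style string the same character is obscured.
    print(to_properties_escapes("refusé."))
```

A tool working from the parsed form can offer only the "text" pieces for translation and leave the "tag" pieces untouched, whereas a tool reading a flat string has no such boundary to rely on.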


All this is true, independently of SML and any implementations of SML. Obviously, you always have to weigh the pros and cons of any solution, and in this instance some applications may find the better choice to be simple properties files. But I think it would make sense to also show an example of what we think is a better practice.

Maybe an alternative to replacing the existing example could be to add one with an XML file. Something similar to the following, that would go just above the "Variable substitution support" title:

=====

Translatable messages, especially strings containing XML tags (like <sch:value-of select="string(u:ID)"/> in this example), may be best stored in XML containers. This allows more flexibility to manipulate and translate the data. For example, the XML document could utilize ITS to add localization-related information.

<?xml version="1.0" encoding="UTF-8"?>
<messages xml:lang="en"
 xmlns:sch="http://purl.oclc.org/dsdl/schematron"
 xmlns:its="http://www.w3.org/2005/11/its">
 <msg xml:id='StudentIDErrorMsg'
  its:locNote="This message should not be longer than 128 characters">The specified ID <sch:value-of select="string(u:ID)"/> does not begin with 99.</msg>
</messages>

=====
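As a rough sketch of how a consumer might read such a file, the following Python snippet parses the suggested document with the standard xml.etree.ElementTree module and collects, for each message, its xml:id, the inherited xml:lang, and the its:locNote. The XML declaration is written with a closing `?>` so that the document parses. The function load_messages and the dictionary layout it returns are assumptions made for this illustration and are not part of SML or ITS.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"
ITS_NS = "http://www.w3.org/2005/11/its"

DOCUMENT = b"""<?xml version="1.0" encoding="UTF-8"?>
<messages xml:lang="en"
 xmlns:sch="http://purl.oclc.org/dsdl/schematron"
 xmlns:its="http://www.w3.org/2005/11/its">
 <msg xml:id='StudentIDErrorMsg'
  its:locNote="This message should not be longer than 128 characters">The specified ID <sch:value-of select="string(u:ID)"/> does not begin with 99.</msg>
</messages>"""


def load_messages(data):
    """Map each message's xml:id to its language, localization note, and text."""
    root = ET.fromstring(data)
    lang = root.get(f"{{{XML_NS}}}lang")
    messages = {}
    for msg in root.findall("msg"):
        msg_id = msg.get(f"{{{XML_NS}}}id")
        messages[msg_id] = {
            "lang": lang,
            "note": msg.get(f"{{{ITS_NS}}}locNote"),
            # Static text only; the inline sch:value-of is filled in at runtime.
            "text": "".join(msg.itertext()),
        }
    return messages


if __name__ == "__main__":
    for msg_id, entry in load_messages(DOCUMENT).items():
        print(msg_id, entry["lang"], len(entry["text"]), entry["note"])
```

The localization note travels with the message in a standard attribute, so a translation tool that understands ITS can show it to the translator without any knowledge of the "messages" vocabulary.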

cheers, -ys
Comment 3 John Arwe 2008-12-04 19:52:46 UTC
The working group discussed this bug on its telecon of 2008-12-04 and decided to make the following changes:

1. Correct the existing French spelling mistake (see email).
2. Change the "java resource bundle" file names as suggested in comment 2.
3. Add the requested ITS-based example offered at the end of comment 2.

Please let us know if this is acceptable Yves via comment to this bug.  The draft will be available in January and should be integrated into the PR spec.
Comment 4 Yves Savourel 2008-12-04 20:28:44 UTC
> 1. Correct the existing French spelling mistake (see email).
> 2. Change the "java resource bundle" file names as suggested in comment 2.
> 3. Add the requested ITS-based example offered at the end of comment 2.
> Please let us know if this is acceptable Yves via comment to this bug.

Yes, it is acceptable.
Thank you.

Comment 6 John Arwe 2009-01-28 14:30:29 UTC
(In reply to comment #5)
For Yves primarily... note that the diff is truncated, and there is changed text in the truncated portion (this is a known tool issue, nothing our editors can fix).  To see the rest of the newly drafted text, please see http://dev.w3.org/cvsweb/~checkout~/2007/xml/sml/build/sml.html?content-type=text/html;%20charset=utf-8#Acknowledgements and page up.  Only the paragraphs after the final example in App F were truncated.
Comment 7 Yves Savourel 2009-01-28 15:26:13 UTC
Looks fine to me.
Thanks for taking the time to implement the change.
-yves