W3C Internationalization Workshop
Position Statement

Richard Ishida

Globalisation Consultant
International Document & User Interface Design
Xerox GKLS

1. Summary

The following is a brainstorm of ideas that I feel could be discussed during the workshop, given unlimited time. The discussion should determine which of these ideas are particularly relevant to the current work of the W3C and, of those, which have the highest priority in setting its near-term direction.

2. Standards, Guidelines, Education & Outreach

[1] Guidelines development

Guidelines should be developed that encourage people to design and develop information and applications in a way that meets the needs of the international user AND anticipates the needs of the localisation activity (ie. design that allows for easy localisation at the point of need). 'People' include developers of specifications, tools and applications involving web technology, as well as developers of DTDs, content and stylesheets.

Note that the *localisability* of data is an area which merits significant attention, in addition to meeting the needs of the international user by addressing such things as script support or character encoding. One of the key needs of industry is to reduce the cost and length of localisation in order to make entry into a wider range of international markets cost effective. There is also a need to design so as to not put barriers in the way of further market expansion when opportunity or need dictates.

The guidelines should complement and complete the guidelines of the Web Accessibility Initiative in a practical way - ensuring that the notion of 'accessibility' fully incorporates the idea of availability to people of differing cultures. In my mind, this is more than just a proximity of concepts, and should mean some convergence (or at least strong partnering) of the activities of WAI and internationalisation groups at the W3C. In fact, I would go as far as to say that there should be only one document containing guidelines - not one for WAI and one for i18n (although it makes sense to allow the user to screen out certain types of information according to their needs). Guidelines need to be as practical and easy to use as possible for the intended reader at the point of need (eg. checklists grouped by task, and supported by backup examples and explanations). Asking a person who wants to implement, say, a table in an accessible way to look up WAI and i18n guidelines in separate documents that use different approaches is an imposition on what is, even these days, usually just their goodwill.

The W3C will need to address the question of the range of topics to be covered by guidelines. The guidelines of the Character Model are heavily concerned with matters relating to characters, encoding and the like. An obvious area for additional guidelines is the use of constructs in existing markup languages (eg. (X)HTML, SVG, ...) to either enable interoperability in a globalised system or improve the localisability of data. But the range of i18n considerations applicable to document and UI design also includes such things as navigation, screen space and layout, implementing graphics, creating source text, designing interoperable systems, choosing and implementing fonts and complex script rendering, multimedia design, handling data format conventions, supplying data for translation/localisation, assessing the impact of cultural differences on market requirements, and much more. Some prioritisation will be needed.

As a high priority, I would like to see guidelines focussing on content development, DTD design and stylesheet development relating to implementation in XHTML, XML, XSL, XSLT, CSS, XForms, SVG, and other similar specifications. I would also like to see some focus on the separation of localisable data from stylesheets and templates.

[2] Guidelines for developing internationalised DTDs

As a suggested starting point for guidelines on the development of internationalised XML DTDs see ITS Requirements, Working Draft, Jun-06-01. http://groups.yahoo.com/group/lisa-its/files/ITS-Requirements/ITS-Requirements.html

This document is particularly concerned with the needs of the localisation effort. It needs to be amplified with other areas relating to such things as white space handling, use of markup vs. Unicode control characters, use of alternative content or entities for different markets, provision of metadata to describe document structure for localisation tools, provision of information about available space and other aspects of content affected by localisation, the ability to tag terminology and semantics within content, a way of expanding the language tag concept to adequately cover the locale and script oriented needs of the localisation community, incorporation of markup to support international script features (such as ruby and Arabic directionality), and so on.

The ideal, in my mind, would be to establish re-usable standards, namespaces, guidelines, practices and the like, so that it becomes second nature to design with the global user in mind, and so that the wheel does not need to be reinvented each time. To make this work the localisation community will need to be involved. See [7] below.

[3] Outreach and education

I believe we should follow, emulate and participate in the WAI outreach activities (adapted of course to the appropriate audience). They are doing a great job.

[4] W3C Housekeeping

We should establish a task force to ensure that outputs of the W3C follow the advice we put together for others - esp. HTML Tidy, XMLSpec, publication rules and checkers, and so on. We should also ensure that pages can easily be served in UTF-8 if desired.

[5] Browser testing and feature applicability charts

It would be very useful to me to be able to look up a chart that indicated which browsers and browser versions supported which i18n features (eg. ruby, bidi, UTF-8, lang attribute, :lang, white-space handling, writing-mode:lr-tb, etc.). This would help me implement pages that used the most up-to-date internationalisation features appropriate to my audience without the pain of trial and error (or, perhaps more likely, erring too far on the side of caution).

This would involve developing tests and publishing the results. I think it would be a great service to offer, and a good way to pull people regularly to the pages relating to the internationalisation activity.

Similar charts could also be used to show the use of any proposed standard tags (see [7]) for XML vocabularies or their alternatives (eg. in DocBook, etc.).
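As a rough illustration of how the browser chart described in this section might be used once published, the sketch below treats it as a simple lookup table. Everything in it is an assumption made for illustration: the browser names and version numbers are placeholders rather than test results, and `usable_features` is a hypothetical helper.

```python
# Placeholder chart: feature -> browser -> first version with support
# (None means "not supported"). These values are invented, not test results.
CHART = {
    "ruby": {"BrowserA": 5, "BrowserB": None},
    "bidi": {"BrowserA": 4, "BrowserB": 6},
    ":lang": {"BrowserA": 6, "BrowserB": 6},
}


def usable_features(chart, audience):
    """Return the features supported by every browser version in the audience.

    `audience` maps a browser name to the oldest version the page must support.
    """
    usable = []
    for feature, support in chart.items():
        if all(
            support.get(browser) is not None and support[browser] <= version
            for browser, version in audience.items()
        ):
            usable.append(feature)
    return sorted(usable)


if __name__ == "__main__":
    print(usable_features(CHART, {"BrowserA": 5, "BrowserB": 6}))
```

A page author would pass in the browsers their audience actually uses and get back the features that can be relied on without trial and error.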

3. Technical Development Work

[6] Internationalised data formats

Time and date formats are just two of many ways in which people represent the same or similar information differently. Other examples include numbers, currencies, temperatures, weights, dimensions, addresses, telephone numbers, personal names, paper sizes, etc.

It would be great if there were a way of capturing this information in a culture-neutral form, and of automatically rendering it in (and, more difficult, recognising it from) a culture-specific format. Such a mechanism could be used by people implementing web-based communication - be it web page forms or the exchange of information between machines.

The work involved in this is not trivial, but it is desperately needed. Whether the W3C should attempt to produce this or work with others to achieve it is for discussion, but either way I believe it would be very useful.
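To make the idea concrete, here is a minimal sketch of the approach for dates and numbers only. It assumes the culture-neutral form is an ISO 8601 date string or a plain decimal string, and it uses a small hand-written table of conventions (`CONVENTIONS`, hypothetical and far from complete) in place of the shared locale data a real solution would need.

```python
from datetime import date, datetime

# Hypothetical, hand-written conventions for three locales, for illustration.
CONVENTIONS = {
    "en-US": {"date": "%m/%d/%Y", "decimal": ".", "group": ","},
    "en-GB": {"date": "%d/%m/%Y", "decimal": ".", "group": ","},
    "de-DE": {"date": "%d.%m.%Y", "decimal": ",", "group": "."},
}


def render_date(iso_text, locale_id):
    """Render a culture-neutral ISO 8601 date (YYYY-MM-DD) for one locale."""
    value = date.fromisoformat(iso_text)
    return value.strftime(CONVENTIONS[locale_id]["date"])


def recognise_date(local_text, locale_id):
    """Recognise a locale-formatted date and return it in ISO 8601 form."""
    pattern = CONVENTIONS[locale_id]["date"]
    return datetime.strptime(local_text, pattern).date().isoformat()


def render_number(neutral_text, locale_id):
    """Render a plain decimal string such as '1234567.5' for one locale."""
    conv = CONVENTIONS[locale_id]
    sign = "-" if neutral_text.startswith("-") else ""
    integer, _, fraction = neutral_text.lstrip("-").partition(".")
    groups = []
    while len(integer) > 3:  # split the integer part into groups of three
        groups.insert(0, integer[-3:])
        integer = integer[:-3]
    groups.insert(0, integer)
    text = sign + conv["group"].join(groups)
    return text + (conv["decimal"] + fraction if fraction else "")


if __name__ == "__main__":
    for loc in CONVENTIONS:
        print(loc, render_date("2001-06-07", loc), render_number("1234567.5", loc))
```

Recognition is the harder direction because the same string can mean different things: `recognise_date("07/06/2001", "en-GB")` and `recognise_date("07/06/2001", "en-US")` yield different dates, so the locale has to be known from context.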

[7] Internationalisation tag set

It would be useful to develop a set of tags that others could use for creating DTDs to ensure attention to key internationalisation requirements, rather than expecting them to reinvent the wheel. This could be in the form of a namespace for inclusion in a schema, or simply a partial DTD and set of recommendations. (This ties in closely with the guidelines work described earlier, but goes further to suggest the names of tags to be implemented.)

It would also be useful to come up with or point to recommendations for localisation related meta-information as described by Yves Savourel.

The maximum benefit will only be realised if the localisation community is involved in standardising the approach to identifying non-translatable content, providing designer's notes, and so forth. This standardisation will allow computerised translation tools to automatically recognise the vocabulary of the XML data being localised, making the exchange and processing of data at the point of localisation easier and more efficient.

See ITS Requirements, Working Draft, Jun-06-01. http://groups.yahoo.com/group/lisa-its/files/ITS-Requirements/ITS-Requirements.html
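The sketch below suggests how a translation tool might benefit from a standard way of marking non-translatable content. The namespace `urn:example:i18n`, the `translate` attribute, and the sample document are all invented for illustration; they are not taken from the ITS Requirements draft or from any agreed tag set.

```python
import xml.etree.ElementTree as ET

I18N_NS = "urn:example:i18n"             # hypothetical namespace
TRANSLATE = f"{{{I18N_NS}}}translate"    # hypothetical attribute name

# Made-up sample document using the hypothetical attribute.
SAMPLE = """<doc xmlns:i18n="urn:example:i18n">
  <para>Press the <code i18n:translate="no">START</code> button.</para>
  <para i18n:translate="no">ACME-42</para>
</doc>"""


def translatable_text(element, inherited=True):
    """Yield the text runs that a translation tool should pick up.

    An element inherits its parent's setting unless it carries its own
    translate attribute.
    """
    flag = element.get(TRANSLATE)
    translate = inherited if flag is None else (flag != "no")
    if translate and element.text and element.text.strip():
        yield element.text.strip()
    for child in element:
        yield from translatable_text(child, translate)
        # Tail text sits inside the parent, so the parent's setting applies.
        if translate and child.tail and child.tail.strip():
            yield child.tail.strip()


if __name__ == "__main__":
    print(list(translatable_text(ET.fromstring(SAMPLE))))
```

Because the tool only needs to know one attribute name, it can process any vocabulary that adopts the convention without being told about that vocabulary's own element names.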

[8] 'Language tags'

Although RFC 3066 is currently used for 'language tagging' in XML and HTML, it doesn't completely address the practical needs of all people working in industry. For example, quite apart from the need to expand the number of language names (with its inherent difficulties in distinguishing between language and dialect, as well as historical variations), there are ambiguities in usage and a lack of a clearly defined way to separate out information such as script. There also needs to be a clearer idea of what locales are, and how they are represented, for people who deal with these.

For a few ideas on this subject (meant only to stimulate thought), see Appendix A.

Again, it may be inappropriate for the W3C to propose a solution to this on its own because of the size of the issue, and the need to take into account the views and activities of several other communities that have a vested interest in this problem, but I think it would be worthwhile for the i18n group to be involved in some way.
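The difficulty of separating out script can be seen in a minimal sketch of how a tool might take a tag apart. It follows my reading of RFC 3066 (an alphabetic primary subtag, then further alphanumeric subtags of up to eight characters, with a two-letter second subtag read as an ISO 3166 country code); the function name and the shape of the result are assumptions made for illustration.

```python
import re


def describe_tag(tag):
    """Split an RFC 3066 style tag into the parts the syntax gives a role to."""
    subtags = tag.split("-")
    primary, rest = subtags[0], subtags[1:]
    if not re.fullmatch(r"[A-Za-z]{1,8}", primary):
        raise ValueError(f"bad primary subtag in {tag!r}")
    for subtag in rest:
        if not re.fullmatch(r"[A-Za-z0-9]{1,8}", subtag):
            raise ValueError(f"bad subtag {subtag!r} in {tag!r}")

    result = {"primary": primary.lower(), "country": None, "other": []}
    # Only a two-letter second subtag has a defined meaning (country code).
    if rest and len(rest[0]) == 2 and rest[0].isalpha():
        result["country"] = rest[0].upper()
        rest = rest[1:]
    # Everything else, including any script name, is undifferentiated.
    result["other"] = [subtag.lower() for subtag in rest]
    return result


if __name__ == "__main__":
    for example in ("en-GB", "zh-Hant", "sr-Latn-YU"):
        print(example, describe_tag(example))
```

In the last two examples the script name ends up in the undifferentiated `other` list, and in `sr-Latn-YU` the country is no longer in the position where the syntax gives it a meaning, which is the kind of ambiguity described above.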

[9] Script support

I think the i18n working group should continue to contribute to the development of new constructs relating to extending script support in markup languages and stylesheets. Great progress has been made with the HTML bidi implementation, the XHTML ruby module and the proposal for text support in CSS3, but more remains to be done, and I would like to see this firmly within the mission of the working group.

[10] Standard icons

It would be useful to produce some proposals for standard icons or adaptable metaphors (allowing for regional variation where appropriate). For example, a standard icon that allowed one to point to a list of (or link to) country/language site selections would be extremely handy. It would avoid the need for the language-specific text-based pointers that are currently the norm (eg. 'Global sites...'). Text based approaches can be problematic in two ways:

  1. they may not be understood - that's often why you are going to the selection list (eg. how would the average American find the 'global sites' link on a page in Arabic or Japanese - not made up examples!)
  2. they may make the user feel like his/her needs are secondary.

The problem is that there is currently no immediately recognisable graphic that conveys this message, and so a proposed standard would be useful.

There may be other similar things to consider.

4. Liaison

[11] W3C internal

I think the i18n activity should certainly continue to review the outputs of the W3C as it has so far, and this activity should of course draw on any new guidelines that are developed. Resource is already a problem here, so a process for initial self-assessment would be extremely helpful.

Close collaboration and harmonisation with the WAI activities would also seem essential to my mind.

[12] Liaison with external bodies

I think there are opportunities for liaison with other bodies involved in i18n and localisation activities. Indeed, in my mind this will be imperative for proposals above such as language tagging and the development of an internationalisation tag set.

See the Appendix: Language Tagging