Endorsed by the IDPF


eBooks & i18n: Richer Internationalization for eBooks

Second W3C Workshop on Electronic Books and the Open Web Platform

4 June 2013, Tokyo, Japan



Host: Keio University

Workshop Sponsors: Intel

W3C organizational sponsor: Google

Workshop report

Following on from the first W3C Workshop on Electronic Books and the Open Web Platform in New York in February, W3C, with the endorsement and support of IDPF (International Digital Publishing Forum), held a Workshop on internationalization of electronic books and the Open Web Platform, under the title eBooks & i18n: Richer Internationalization for eBooks, on 4 June 2013 in Tokyo, Japan.

The Workshop's technical discussions focused on requirements for eBooks in global markets and investigated international functionality that needs to be added to the Open Web Platform. The Open Web Platform includes core W3C technologies such as HTML, CSS, SVG, XML, XSLT, XSL-FO, PNG, RDF and many more, which are already used extensively in eBooks and eBook production.


Executive Summary

Today's eBook market is dynamic, fast-changing and strong. eBooks compete with printed versions, and there is a wide choice of hardware and software available for eBook readers. Nevertheless, there is still work to do in order to reach users in their local language and script.

Thirty-six position papers were submitted, and there were 50 registered participants. There were 13 presentations spread over 4 sessions during the day.

The workshop was opened by Masao Isshiki, W3C Keio Site Manager. After introductory talks describing the interests of the IDPF and the work on internationalization going on at W3C, a number of speakers laid the foundations for the later discussions with presentations touching on paged media in CSS, the convergence of Web and publishing worlds, interoperability challenges between epub and CSS, the issues of fragmentation of the Japanese marketplace, and questions about how to handle highly graphic content such as manga.

Speakers and discussions in the afternoon sessions dwelt on topics such as how the CSS Paged Media specification can already go a long way towards supporting ebooks, various aspects of ruby annotation that are not yet addressed, how to handle ideographic characters that are not in the character encoding, how JavaScript can be relevant to ebooks (especially since internationalization features are currently being added to the core language), and the need to increase the availability and usability of Far Eastern fonts for ebooks.

A key aim of the workshop was to identify and prioritise requirements for extensions and additions to the Open Web Platform for eBook contents, applications and services that will improve the use of Web technologies for handling multiple languages. The final session of the day was dedicated to brainstorming a list of such topics. A prioritised list of issues raised during this session is included further down this page. It will be made available to the W3C Digital Publishing Interest Group for consideration when the group begins work in Autumn 2013.

Main workshop discussions

The day included: a welcome session with speakers; three sessions with speakers followed by discussion; and a final discussion session aimed at pulling together a list of requirements. [See the IRC minutes]

Welcome session

The workshop started with a welcome by Masao Isshiki, W3C Keio Site Manager, and continued with three introductory talks:

Markus Gylling (IDPF) talked about the W3C Digital Publishing Interest Group, which should start next Autumn (see the charter), and how this workshop and the two others held this year (New York, Paris) would feed into it. For the IDPF and the W3C Interest Group, accessibility and internationalization are non-negotiable requirements. The IDPF used its own epub-prefixed extensions to implement various internationalization features that were not yet ready in CSS, so that the industry could move towards using the Open Web Platform. This was not ideal, but it was necessary. Markus hoped that the workshop would help to identify a set of core issues to move forward, so that epub prefixes would not be needed again.

The second talk by Richard Ishida (W3C Internationalization Activity) gave an overview of the Internationalization Activity at the W3C and how it works. He then reviewed a large number of examples where internationalization requirements are important in HTML and CSS, as a way of helping non-experts grasp the issues in question, and a way of helping stimulate ideas for discussion. He concluded by showing some of the successful recent initiatives defining requirements for the Open Web Platform, and encouraged everyone to help move the work forward through participation, rather than just asking for changes.

The final talk, by Bert Bos (W3C Style Activity), looked at some of the things that CSS has addressed, or needs to address, to enable general support for paged media. (This talk was not internationalization-specific, but was intended to provide useful background information.) Previously, the complex features needed to support paged media were handled by XSL-FO, but times are changing, and CSS now needs to address this space. Bert showed many examples (see the IRC minutes).


Session 2

The three following sessions began with presentations (3 per session), followed by a discussion period.

The first talk was by Makoto Murata (Japanese Electronic Publishing Association, JEPA). He showed how the Web world and the traditional publishing world are converging. This is creating a need to re-evaluate traditional publishing technologies and approaches used for ebooks. There are, however, some differences in approach in the publishing world that are not yet fully addressed in the Web world. The issues underlying convergence need to be explored and addressed, since the younger generation no longer expects the distinction to remain.

Murata-san also talked about Advanced Hybrid Layouts and possible use of SVG for comics and magazines. He proposed an approach to dealing with the highly graphical nature of comics that involves different renditions of the content, maintained in parallel. The user or automatic algorithms would be able to switch between renditions as needed.

The second talk was by Koji Ishii (Rakuten), who explored various areas that present interoperability issues between epub and CSS technologies.

The final speaker in this session was Shinya Takami (Book Walker), who made the case for rules that allow particular items of content to be extracted for the book samples used by online bookstores. For example, Japanese 'light novels' and manga books have pages that are important for branding and customer interest, but that are not necessarily included in samples. There is also a need in Japan to reconcile the fragmentation of the market, which has around 20 stores and over 3000 publishers. He proposed a standard for library sharing to address this issue.

These presentations were followed by a discussion with the audience. See the IRC minutes for details. Topics included: metadata and the ebooks IG; ruby in running headers; test cases; alternative rendering approaches for graphic content; relationships between epub and paged media; SVG as an enabler for manga translation; whether vertical writing is still important.


Session 3

This session also included 3 speakers, followed by discussion.

The first talk was by Shinyu Murakami (Antenna House). He pointed out that many books are already printed today using CSS Paged Media and other specifications, even though CSS Paged Media is still only a draft; one example is the Japanese version of the document Requirements for Japanese Text Layout, produced at the W3C. He showed various examples of ways in which the CSS Paged Media specification solves ebook layout issues, and called for continuing attention to implementing the requirements described in the Japanese Layout Requirements document (JLReq) in technologies such as CSS.

That talk was followed by Bobby Tung (Wanderer Digital Publishing), who described the Taiwanese form of ruby text, called bopomofo or zhuyin fuhao. He showed various attempts to represent bopomofo on the Web, but all were workarounds, and none was truly successful. The Government of Taiwan and other organizations in Taiwan are interested in providing funds and assistance to developers to enable proper bopomofo support in implementations. CSS needs to support bopomofo-style ruby.

The final speaker in this session was Yasuki Ikeuchi (Access). He described two issues in Japanese typography for which there is currently no support in the Open Web Platform: nakiwakare and kanbun. Nakiwakare (or 'long ruby') is used in light novels, and refers to extremely long ruby text over short base text sequences. This is particularly problematic when it flows onto a new page. Kanbun is a method for annotating ideographic characters or sequences in texts such as poetry. Its particular requirements are not currently supported by the ruby model in HTML or CSS.

These presentations were followed by a discussion with the audience. See the IRC minutes for details. Topics included: kanbun; extended usage of ruby; bopomofo ruby; the status of ruby in the OWP, and how to move things forward; incremental functionality when creating ebooks; input to the eBooks IG.


Session 4

This session also included 3 speakers, followed by discussion.

The session started with a talk by Norbert Lindenberg (Lindenberg Software), who described work he has been doing with ECMA to introduce internationalization features, such as locale-specific data formats, into JavaScript. He listed several of these features. Full Unicode support will also be addressed in ECMAScript 6, including normalization and recognition of Unicode character properties. He then went on to make the case for the usefulness of JavaScript in ebooks, citing examples. He would like to know more about what kinds of requirements there are, and what is needed to support books.

The second talk was by Taichi Kawabata (NTT), who described a variant type of ruby called 'ateji'. One of the problems associated with ateji is that it may need to be used in metadata, but metadata doesn't support proper ruby or otherwise annotated content. Another issue is how to handle ideographic characters that are not covered by Unicode or the character encoding in use. A third issue relates to ways of identifying the language of content. For each of these issues, Kawabata-san provided some examples of current and possible approaches, and then asked the group whether there was a better way.

The final speaker was Kyoji Tahara (Toppan Printing) who talked about font-related issues. There are very few fonts available for online use compared to the number available for printed documents. Fonts are important for readability and aesthetic appeal. Better technical support is needed for embedded fonts, but that is not difficult. The difficulty is in releasing fonts, especially free fonts, for use online. The problem is solved in PDF, but not in epub, and we need to ask ourselves why.

These presentations were followed by a discussion with the audience. See the IRC minutes for details. Topics included: deployment status of JavaScript internationalization features; copyright and Japanese fonts; subsetting of embedded fonts; JavaScript-enabled just-in-time font downloads to support user input; examples where JavaScript can add features to ebooks; problems related to the inability to annotate metadata content.


Session 5

The final session of the workshop was used to draw up a list of internationalization-related topics that need further attention. This list will be passed to the W3C Digital Publishing Interest Group which will begin work in the Autumn.

A survey was sent out to workshop participants after the event, listing these items and asking the participants to rate their urgency. The list of topics and an aggregate summary of the ratings is given below.


Wrap up

After brainstorming the list of issues to put to the W3C Digital Publishing Interest Group, there was a very short wrap-up. Richard Ishida took an action to send out the brainstorm list to the participants using a WBS form, so that they could indicate the degree of urgency associated with each. Richard would then compile the results (see below).

The chairs extended thanks to Keio University for hosting the Workshop; sponsors Intel and Google; the program committee members and speakers; the scribes and the interpreters; and W3C staff for logistical and other arrangements.

Many thanks to the following organizations for making it possible to have interpreters available during the workshop: Antenna House, BOOK WALKER Co., Ltd. (KADOKAWA group), Dai Nippon Printing Co., Ltd., Nippon Telegraph and Telephone Corporation, Microsoft Japan Co., Ltd., Sony Corporation, Toppan Printing Co., Ltd.

List of issues

The table below shows the issues raised during the final session of the workshop.

Sixteen responses were received from the survey sent out to participants, asking them to rate the urgency of each of the items below. Those responses are shown in the table below, where the numbers represent the following:

  • 4   Particularly urgent!
  • 3   Fairly urgent
  • 2   Definitely needs attention at some point, but not urgent
  • 1   Not really important, but would be nice to have
  • 0   Not an issue

The items are ordered according to the urgency perceived by those who responded to the survey. An explanation of the table, and caveats related to its ranking, follow the table.


| Short name | Description | Expert ratings | Expert avg. | Non-expert ratings | Non-expert avg. | Weighted score |
| --- | --- | --- | --- | --- | --- | --- |
| Vertical text | Vertical text support in CSS needs to be finalised. | 4 4 4 4 3 4 4 4 | 3.9 | 4 4 4 4 3 4 3 | 3.7 | 11.5 |
| Ruby | Ruby markup and ruby styling (especially alignment) need to be finalised in HTML5 and CSS. | 4 4 4 4 4 3 4 4 | 3.9 | 3 4 4 3 3 3 4 | 3.4 | 11.2 |
| Hyphenation/line-breaking rules | Hyphenation and line-breaking rules for languages other than European ones need to be understood. | 3 4 3 3 3 1 4 4 | 3.1 | 4 2 4 3 3 2 | 3.0 | 9.3 |
| More requirements data | We need to replicate the work of the Japanese Layout Requirements initiative for other languages and scripts, and also address special format issues (such as Arabic mathematical layout). | 4 3 3 4 2 2 2 3 | 2.9 | 3 3 3 3 3 3 3 3 | 3.0 | 8.8 |
| Extra long ruby (nakiwakare) | A solution is needed for support of 'extra long ruby' (nakiwakare) when it runs across lines and pages. | 2 1 2 3 4 4 4 4 | 3.0 | 2 3 2 3 3 4 | 2.8 | 8.8 |
| Tate chu yoko | A mechanism is needed to support automatic creation of tate chu yoko (horizontal numbers and acronyms in vertical text). | 3 3 1 1 4 2 4 | 2.6 | 3 4 3 3 4 3 | 3.3 | 8.5 |
| Online samples | A way is needed to indicate, in a locale-specific way, which parts of a book should be extracted for samples in online book stores, eg. the illustrations and other information at the start of light novels in Japan. | 3 2 2 1 4 3 4 4 | 2.9 | 2 4 3 3 3 1 | 2.7 | 8.4 |
| Positioning items on page | CSS needs to provide more features for positioning floats in pages, especially vertical centring of items on a page. | 3 3 4 3 1 3 3 | 2.9 | 2 1 3 3 3 | 2.4 | 8.1 |
| Autospace | The autospace feature of CSS that puts visual spacing around embedded Latin text or numbers in lines of ideographic/kana text needs to be supported. | 4 2 3 4 1 1 2 4 | 2.6 | 2 2 3 3 2 | 2.4 | 7.7 |
| Customised line break rules | It should be possible for an author to customise rules for line breaking (eg. kinsoku or geumchik rules). | 3 2 2 3 1 2 4 | 2.4 | 2 4 1 3 3 3 | 2.7 | 7.5 |
| Bopomofo ruby | Tone mark placement for bopomofo ruby needs to be clarified, and whatever changes are needed to support bopomofo ruby in CSS need to be finalised. | 4 2 3 4 2 1 0 3 | 2.4 | 2 3 3 3 2 3 | 2.7 | 7.4 |
| Switching between vertical & horizontal | It needs to be possible to switch between horizontal and vertical layouts and automatically make all appropriate changes needed to number formats, punctuation styles, etc. | 2 4 3 2 2 3 1 2 | 2.4 | 3 2 1 3 4 3 | 2.7 | 7.4 |
| Language selection | Selection of language should be based on media queries, not just elements. | 3 2 3 2 1 4 | 2.5 | 3 1 3 3 2 | 2.4 | 7.4 |
| Ruby generalization | Ruby markup needs to be extended to allow for more general annotations (such as for glosses, or to support other languages). | 2 3 2 2 2 3 0 3 | 2.1 | 1 4 3 3 3 4 | 3.0 | 7.3 |
| Rich metadata | Metadata values need to be changed to support markup or annotations (such as for indicating text direction, ruby text, custom embedded fonts for non-Unicode characters, language, etc). | 3 2 2 2 2 3 0 3 | 2.1 | 2 3 2 3 3 3 | 2.7 | 6.9 |
| Font availability | There are very few fonts available for use with ebooks (unlike printed fonts), and a significant reason is that font owners are not giving permission for download. | 3 2 3 0 2 1 2 4 | 2.1 | 3 2 3 3 4 4 2 0 | 2.6 | 6.9 |
| Footnote placement | Requirements for footnote placement in other languages need to be understood, if any. | 3 4 3 3 2 0 2 | 2.4 | 2 1 3 3 0 2 | 1.8 | 6.7 |
| On-the-fly font glyphs | A mechanism is needed to support additional font glyphs on the fly, eg. to support user input. This may involve a JavaScript API. | 3 3 2 2 1 0 3 | 2.0 | 3 1 3 3 3 | 2.6 | 6.6 |
| Links to typographic rules | We should create a list pointing to official rules of typography, where they exist, for various countries (such as ISO 4051). | 2 2 2 2 2 2 2 4 | 2.3 | 2 2 3 3 3 1 1 | 2.1 | 6.6 |
| Kanbun | There needs to be support for kanbun in vertical text layout. | 2 1 2 3 2 0 2 | 1.7 | 2 3 2 3 2 | 2.4 | 5.8 |
| CSS generated content | CSS needs to provide a way to mark up or annotate generated CSS content to support ruby, text direction, language, etc. (such content may be picked up from the document, eg. text in an attribute value). | 2 3 1 2 3 1 0 1 | 1.6 | 1 2 2 2 3 | 2.0 | 5.3 |
| Manga speech bubbles | A solution needs to be found for fitting translations into manga caption balloons, esp. when they are designed with vertical text in mind. | 3 2 2 2 2 2 0 3 | 2.0 | 2 0 0 1 3 2 | 1.3 | 5.3 |
| Onomatopoeia in manga | There needs to be a way of designing the graphic sound effects in manga panels so that they can be translated. | 3 2 1 2 2 0 2 | 1.7 | 1 0 0 1 3 | 1.0 | 4.4 |

Respondents self-identified as experts or non-experts in the market requirements. Scores from each group are averaged separately in the table, and the weighted score is derived by doubling the average for the experts and adding the average for the non-experts (for vertical text, for example, 2 × 3.9 + 3.7 = 11.5).
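As a worked illustration of this weighting, the Python sketch below recomputes the weighted scores for the first three rows of the table. It is not the tool used to produce this report: the function names are hypothetical, and it assumes that the averages are kept unrounded until the final step and that exact halves are rounded up. Under those assumptions it reproduces the published figures for these rows; for hyphenation, for instance, 2 × 3.125 + 3.0 = 9.25, shown as 9.3, whereas doubling the rounded expert average (3.1) would give 9.2. "Don't know" answers, which the report says were ignored when calculating the averages, are represented here as None and skipped.

```python
from fractions import Fraction
import math
from typing import Optional, Sequence

# Survey scale: 4 = particularly urgent ... 0 = not an issue.
# None represents a "Don't know" answer (hypothetical encoding).
Rating = Optional[int]


def average(ratings: Sequence[Rating]) -> Fraction:
    """Exact mean of the ratings, skipping "Don't know" (None) answers."""
    known = [r for r in ratings if r is not None]
    if not known:
        raise ValueError("no usable ratings")
    return Fraction(sum(known), len(known))


def round_half_up(x: Fraction, places: int = 1) -> float:
    """Round to `places` decimals, rounding exact halves up (assumed convention)."""
    scale = 10 ** places
    return math.floor(x * scale + Fraction(1, 2)) / scale


def weighted_score(experts: Sequence[Rating], non_experts: Sequence[Rating]) -> float:
    """Double the expert average, add the non-expert average, round once at the end."""
    return round_half_up(2 * average(experts) + average(non_experts))


# Known ratings copied from the first three rows of the table above.
SURVEY = {
    "Vertical text": ([4, 4, 4, 4, 3, 4, 4, 4], [4, 4, 4, 4, 3, 4, 3]),
    "Ruby": ([4, 4, 4, 4, 4, 3, 4, 4], [3, 4, 4, 3, 3, 3, 4]),
    "Hyphenation/line-breaking rules": ([3, 4, 3, 3, 3, 1, 4, 4], [4, 2, 4, 3, 3, 2]),
}


def rank(survey):
    """Return (name, weighted score) pairs, most urgent first."""
    scored = [(name, weighted_score(e, n)) for name, (e, n) in survey.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, score in rank(SURVEY):
        print(f"{score:4.1f}  {name}")
```

Run as a script, it lists the three items in the same order as the table, with scores 11.5, 11.2 and 9.3. Exact fractions are used so that a true half such as 9.25 is not disturbed by binary floating-point error or by Python's built-in `round()`, which rounds halves to even.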

The respondents indicated which languages were a preoccupation. The experts cited Japanese (3), Traditional Chinese (1), Asian languages (2), English (1) and All languages (2). The non-experts cited Japanese (5), All (1) and None (2).

The ranking should be taken with a pinch of salt, for the following reasons:

  1. the respondents only constitute part of the attendance (roughly one third).
  2. many of the workshop attendees were Japanese, and so Japanese concerns were brought to the fore.
  3. there are many cases where respondents said "Don't know". This skews the results a little (although those responses were ignored when calculating the averages).
  4. it is not always clear whether the respondent is judging the urgency in terms of their personal interest, rather than the objective needs of the marketplace.
  5. the Digital Publishing IG members in consultation with the Internationalization Activity will ultimately make decisions about the work priorities. (If you would like to have a say in this process, please join one or both of these groups.)

Program

Links to slides will be added as they become available. For position papers, see below.

9.00 Welcome Session
Masao Isshiki, Brief welcome
Workshop logistics and introductions (T Kobayashi, R Ishida)
Markus Gylling, IDPF & W3C
Richard Ishida, i18n at W3C
Bert Bos, CSS and Paged Media [slides]
10.00 BREAK
10.30 Session 2
Speakers: Makoto Murata, Koji Ishii, Shinya Takami [slides]
Discussion
12.00 LUNCH
1.00 Session 3
Speakers: Shinyu Murakami, Bobby Tung, Yasuki Ikeuchi
Discussion
2.00 BREAK
2.30 Session 4
Speakers: Norbert Lindenberg, Taichi Kawabata, Kyoji Tahara
Discussion
3.30 Session 5
Consolidation discussion (formalising workshop outputs)
Short wrap-up
5.00 FINISH

If you have any questions, please contact team-ebooks-ws-chairs@w3.org.

Attendees

The following is a list of attendees. Links point to position papers.

* submitted a position paper but didn't attend