[eBooks WS Tokyo] position paper submission (KAWABATA, Taichi & AKUTSU, Akihito)


Dear sirs,

I would like to attend the eBooks WS Tokyo this coming June, and I
hereby submit a position paper explaining our interests and concerns
regarding eBooks.

* Participant's Interest

I work for NTT, but I would like to submit this position paper not as
an employee of that organization but as a person with an interest in
digital publishing.

We (AKUTSU Akihito and I) have several years of experience working on
the standardization of characters and on W3C specifications, and we
privately work on the development of eBooks and OpenType fonts.  Based
on this experience, I am interested in making digital publishing
platforms more interoperable and making publications easier to create.

----------------------

* Unencoded Characters and eBooks

Unencoded characters (in Japanese, GAIJI) may appear in the content,
title, or author name of a book, especially in East Asian markets.
(For example, Aozora Bunko contains one literary work whose title
contains an unencoded character.)  Normally, unencoded characters in
books are replaced by PUA (Private Use Area) characters or small image
files.  However, such solutions have the following problems.

1. Image files used as unencoded characters cannot follow the CSS
   foreground color specification or the text layout algorithms of
   eBook readers.

2. Text containing PUA characters may lose information when it is
   copied or transferred over the Internet.

3. Current ePub metadata does not support image files as metadata.
   When PUA characters are used in metadata, sorting or searching
   algorithms may fail.

4. Character properties, such as whether a character should be
   displayed vertically or horizontally in vertical writing-mode, or
   whether a line may break at it, cannot be specified in the normal
   way.
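
As a small illustration of problems 2 to 4, the following Python
sketch compares what the standard Unicode character database knows
about an encoded ideograph and about a PUA code point.  The helper
name `describe` and the choice of U+6F22 and U+E000 are only
assumptions made for this example; it is not meant as a proposal for
how eBook readers should work.

```python
import unicodedata


def describe(ch):
    """Return a few standard Unicode properties of one character."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        # General category: "Lo" = other letter, "Co" = private use.
        "category": unicodedata.category(ch),
        # Standard character name, or None when the standard defines none.
        "name": unicodedata.name(ch, None),
    }


encoded = describe("\u6F22")  # an encoded CJK ideograph
private = describe("\uE000")  # first code point of the Private Use Area

print(encoded)
print(private)
```

The PUA code point has the general category "Co" and no standard
name, so software that receives it without the original font or a
private agreement has nothing to sort, search, or lay out by.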

** Request and Suggestions

I hope that a discussion will be held on how to handle unencoded
characters and ideographs in the contents and metadata of eBooks.  It
should involve specifications on text encoding, character properties,
CSS writing-mode, CSS Fonts, accessibility, text collation, and
metadata.  Also, "Ruby" (parallel text annotation) probably has the
same problem, as there has been no way to specify ruby in non-HTML
content such as metadata.


---------------

* Language and Script Specification of eBooks

The language specification of eBook text involves the following.

1. Layout rules of the script that belongs to the specified language.
   (For example, JLREQ is often referred to for Japanese text layout
   rules.)
2. Speech synthesis of text (as specified in SSML).
3. Font glyph selection.  (The OpenType specification provides
   language-dependent glyph selection.)

However, how such a language specification should be reflected in
visual and audio rendering is ambiguous.  For example, in SSML, the
ways of specifying the natural language of speech, lexicons, metadata,
tokens for text analysis, and abbreviations are all different
(http://www.w3.org/TR/speech-synthesis11).
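
To make the question concrete, here is a minimal Python sketch that
walks a tiny SSML 1.1 fragment and reports the language in effect for
each run of text, using only the inheritance of the `xml:lang`
attribute.  The sample sentence and the helper name
`effective_languages` are assumptions made for illustration.  The
sketch shows only what the markup declares; how a reader should apply
that declaration to layout, speech, or glyph selection is exactly the
point that remains unclear.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample: a Japanese sentence containing one English word.
SSML = """<speak version="1.1"
       xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="ja">
  <s>これは<lang xml:lang="en">eBook</lang>です。</s>
</speak>"""


def effective_languages(element, inherited=None):
    """Return (language, text) pairs, inheriting xml:lang from ancestors."""
    lang = element.get(XML_LANG, inherited)
    runs = []
    text = (element.text or "").strip()
    if text:
        runs.append((lang, text))
    for child in element:
        runs.extend(effective_languages(child, lang))
        # Text after a child belongs to the parent, so it uses the
        # parent's language, not the child's.
        tail = (child.tail or "").strip()
        if tail:
            runs.append((lang, tail))
    return runs


runs = effective_languages(ET.fromstring(SSML))
for lang, text in runs:
    print(lang, text)
```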

** Request and Suggestions

I hope for a discussion on how languages are specified in eBooks and
on how such specification influences audio and visual presentation, in
order to clarify the problem.  That may greatly widen the use of
eBooks, for example as foreign-language textbooks and as eBooks for
people with dyslexia.

--