W3C

Ruby Annotation Under The Sunlight

(Updated on Friday 3 February 2006 to add a valuable source of information provided by Richard Ishida)

One of the key principles of microformats is to design for humans first, machines second. We have often faced the problem of writing a date that is easy for humans to read and also easy for computers to parse. Or perhaps you would like to give the price of a product in two currencies, or give the translation of a term in another language. How can you do that in XHTML? Is there a simple way to associate different types of information through a simple semantic relationship?

There is a language, published by W3C in 2001, that can serve these applications: Ruby Annotation.

About Ruby?

On the Web these days, Ruby is known as a programming language, as a programmer working for IBM and, I guess for most people everywhere in the world, as a red gemstone. In the context of W3C, Ruby is a simple markup language for creating inline annotations, as described in the Ruby specification:

Ruby is the term used for a run of text that is associated with another run of text, referred to as the base text. Ruby text is used to provide a short annotation of the associated base text. It is most often used to provide a reading (pronunciation guide). Ruby annotations are used frequently in Japan in many kinds of publications, including books and magazines. Ruby is also used in China, especially in schoolbooks.

Ruby text is usually presented alongside the base text, using a smaller typeface. The name “ruby” in fact originated from the name of the 5.5pt font size in British printing, which is about half the 10pt font size commonly used for normal text.

You will find a lot of answers to your questions in the FAQ about Ruby.

Use Cases for Ruby Annotation

We asked Felix Sasaki, who works in the W3C Internationalization Activity, to give us a set of use cases for Ruby Annotation. These examples can be viewed with Firefox and this XPI extension for Ruby.

Date

You might want to print a date on a package and explain each of its parts, to be sure that no one misreads it.

  • Base text: Thursday 2 2 2006
  • Ruby as explanation: this is an expiration date made of a day name, the day, the month and the year, in that order.

See example
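As a sketch of how this might look in markup, the fragment below uses complex ruby, modeled on the expiration-date example in the Ruby Annotation specification. The first ruby text container labels each part of the date. The second container holds a single label that spans all four base elements through the rbspan attribute. The label wording ("Day name", "Expiration date", etc.) is our own illustrative choice.

```html
<ruby xml:lang="en">
  <!-- base text: one rb per part of the date -->
  <rbc>
    <rb>Thursday</rb>
    <rb>2</rb>
    <rb>2</rb>
    <rb>2006</rb>
  </rbc>
  <!-- first ruby text container: one label per base element -->
  <rtc>
    <rt>Day name</rt>
    <rt>Day</rt>
    <rt>Month</rt>
    <rt>Year</rt>
  </rtc>
  <!-- second ruby text container: one label spanning all four base elements -->
  <rtc>
    <rt rbspan="4">Expiration date</rt>
  </rtc>
</ruby>
```

Note that the specification allows rp fallback parentheses only in simple ruby. A browser without ruby support will therefore most likely show these labels inline after the date.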

Japanese use

When you learn Japanese, you may need the pronunciation of words written out in a specific writing system, so that you can pronounce them.

  • Base text: “日本”
  • Ruby as pronunciation description, in Hiragana “にほん”, Katakana “ニホン” or Romaji “nihon”

How to write it?

<p><ruby>
  <rb>日本</rb>
  <rt>nihon</rt>
</ruby></p>

See example for the rendering.

Chinese use

The same approach works for Chinese.

  • Base text: ?
  • Ruby as pronunciation description in pinyin: “hao”

See example
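A minimal sketch of simple ruby for Chinese, using 好 as an illustrative character read “hao”. This sketch also adds rp elements around the ruby text. Browsers that support ruby hide the rp content. Browsers that do not support it display the parentheses, so the pronunciation still reads naturally as “好(hao)”.

```html
<ruby xml:lang="zh">
  <rb>好</rb>
  <!-- rp is hidden by ruby-aware browsers and shown by others as fallback -->
  <rp>(</rp><rt>hao</rt><rp>)</rp>
</ruby>
```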

Linguistic morphological glossing

You might want to show which words in a sentence are nouns and which are verbs, for example when teaching grammar to children at school.

  • Base text: “I like fish”
  • Ruby as morphological glossing: Pronoun (for “I”), Verb (for “like”), Noun (for “fish”)

See example
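One way to express this gloss is with complex ruby, which pairs each word with one grammatical label. The sketch below labels “I” as a pronoun, which is its part of speech. The exact label set is an illustrative assumption; a real glossing scheme would define its own.

```html
<p>
  <ruby xml:lang="en">
    <!-- each word of the sentence is a separate base element -->
    <rbc>
      <rb>I</rb>
      <rb>like</rb>
      <rb>fish</rb>
    </rbc>
    <!-- one grammatical label per word, in the same order -->
    <rtc>
      <rt>Pronoun</rt>
      <rt>Verb</rt>
      <rt>Noun</rt>
    </rtc>
  </ruby>
</p>
```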

Expression of “invisible” units in a text

The base text contains the Japanese sentence “Yesterday I went to Shibuya”. The pronoun “I” is usually omitted in the original, which makes the sentence hard to understand for, e.g., beginning learners of Japanese.

The ruby text above the base text shows what is omitted. The first ruby text line below the base text contains a romanized version, and the second contains morpho-lexical information. In the abbreviations, “TM” means “topic marker” and “DM” means “direction marker”.

See example

Adding information about gestures to conversation transcriptions

  • Base text: “And we bought a biiiig icecream.”
  • Ruby can be used to mark up “biiiig” and attach information about the gesture made while saying it. The annotation could be, for example, an image of the speaker making the gesture with wide-open eyes.
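A minimal sketch of this transcription, using simple ruby with a text description of the gesture instead of an image. The wording of the gesture description is hypothetical, since the transcript does not say which gesture was made.

```html
<p>And we bought a
  <ruby>
    <rb>biiiig</rb>
    <!-- hypothetical gesture description; rp gives a fallback in non-ruby browsers -->
    <rp>(</rp><rt>[gesture: hands held far apart, eyes wide open]</rt><rp>)</rp>
  </ruby>
  icecream.</p>
```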

Expressing non-segmentable word boundaries

This is useful for

  • a contraction (Old High German, English, many other languages)
  • a compound word (Sanskrit, Avestan, many other languages)
  • a group of words whose forms have been affected by “euphonic” sandhi changes (Sanskrit, Breton)
  • a group of words in which, for orthographic or other reasons, the word junctions are not indicated (Sanskrit, Japanese)

An example from Japanese:

  • Base text: yo-mu (the base form of the verb “to read”). In the Japanese syllabic script, the boundary between the morphemes “yom” and “u” cannot be expressed, because it falls inside the syllable “mu”.
  • Ruby for a segmentation of morphological boundaries: a romanized version of “yomu” with the correct boundary, “yom-u”.
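A sketch of this case with simple ruby. It assumes the base text is written in hiragana as よむ and puts the segmented romanization in the ruby text. The hyphen marks the morpheme boundary that the kana spelling cannot show.

```html
<ruby xml:lang="ja">
  <!-- assumed hiragana spelling of yo-mu -->
  <rb>よむ</rb>
  <!-- romanization with the morpheme boundary made explicit -->
  <rp>(</rp><rt>yom-u</rt><rp>)</rp>
</ruby>
```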

Implementations: Sunny side-up?

Not really, for now. In a recent post, Ruby in HTML, Anne van Kesteren (Opera) looked at the implementation of Ruby in Internet Explorer:

“Using the Live DOM Viewer I tried to figure out more or less how it works. Not everything is covered, but the basic parsing rules are here; simple research.”

We have not found an implementation report for Ruby covering browsers, other user agents and authoring tools (the too often forgotten ones). If someone could create an implementation report using the Ruby Annotation Test Cases developed by ???? and the Ruby Test Cases developed by the Internationalization Activity, it would give a good picture of the implementation landscape. Simple Ruby annotations are implemented in Amaya. There is also an XPI extension for Ruby, available for Firefox and Mozilla, which helps to display simple and complex ruby annotations. Unfortunately, it is not a native implementation in the Mozilla rendering engine, where support for Ruby markup is still an open issue.

On the styling side, the CSS Working Group is working on a module called CSS 3 Module for Ruby and ??? has published a way to style ruby with CSS in Mozilla.

So we are not there yet. Still, I would say that almost all the information developers need to implement Ruby in their products is available, and there are real benefits to using Ruby in a page. For example, we found this page, which gives a table of the birthplaces and native names of celebrities (actors, singers, sports figures, etc.) in different languages (view it with Firefox and the XPI extension).

Any comments or additional information are welcome, specifically from the internationalization specialists… ;)

More information / Reminder