Copyright © 1998, 1999 W3C® (MIT, INRIA, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply.
This is a W3C Working Draft for review by W3C members and other interested parties. It is a draft document and may be updated, replaced, or obsoleted by other documents at any time. The W3C will not allow early implementation to constrain its ability to make changes to this specification prior to final release. It is inappropriate to use W3C Working Drafts as reference material or to cite them as other than "work in progress". A list of current W3C Working Drafts can be found at http://www.w3.org/TR.
This W3C Working Draft is published by the Internationalization Working Group (members only). In a future version, this work is intended to be submitted to the HTML Working Group (members only) for inclusion as a module in XHTML 1.1 [XHTML11].
As is characteristic of a W3C Working Draft, all the proposed tag naming and structure in this document are subject to change.
Please send comments and questions regarding this document to i18n-editor@w3.org (archived for W3C members). Comments in languages other than English, in particular Japanese, are also welcome.
The HyperText Markup Language (HTML) is a simple markup language used to create hypertext documents that are portable from one platform to another. XHTML 1.0 [XHTML1] is a reformulation of HTML 4.0 [HTML4] as an XML 1.0 [XML] application, and the modularization of XHTML [XHTMLMOD] provides a means for subsetting and extending XHTML. This specification extends XHTML to support ruby text typically used in East Asian documents. Some familiarity with HTML 4.0, XHTML 1.0 and the XHTML Modularization framework is assumed.
This section is informative.
"Ruby" is the commonly used name for a run of text that appears in the immediate vicinity of another run of text, referred to as the "base". Ruby serve as a pronunciation guide or an annotation associated with the base text. Ruby are used frequently in Japan in most kinds of publications, such as books and magazines, but also in China, especially in schoolbooks. Figure 1.1.1 shows an example.
Figure 1.1.1: Ruby giving the pronunciation of the base characters.
East Asian typography has developed various elements that do not appear in western typography. Most of these can be addressed appropriately with facilities in stylesheet languages such as CSS or XSL. Ruby, however, require markup in order to define the association between base text and ruby text. This document defines such markup, designed to be usable with HTML, in order to make ruby available on the Web without using special workarounds or graphics. This section gives some background on ruby. Section 1.2 gives an overview of the markup for ruby. Section 2 contains the formal definition of ruby markup in the context of the XHTML Modularization framework [XHTMLMOD].
The font size of ruby text is normally about half the font size of the base text (see Figure 1.1.1). The name "ruby" in fact originated from the name of the 5.5pt font size in British printing, which is about half the 10pt font size commonly used for normal text.
There are several positions where the ruby text can appear relative to its base. For horizontal layout, it is most frequently placed above the base text (see Figure 1.1.1), i.e. before the line containing the base text. Sometimes, especially in educational texts, ruby may appear below, i.e. after, the base text.
Figure 1.1.2: Ruby (in Latin letters) below the base text (in Japanese)
In vertical layout, where lines are placed starting on the right, ruby appear on the right side of (i.e. again before) the vertical line if they appear above in horizontal layout. The layout flow of the ruby text is the same as that of its base, that is vertical if the base is vertical, and horizontal if the base is horizontal.
Figure 1.1.3: Ruby in vertical text (before/to the right)
Ruby text appears on the left side of the base in vertical layout if it appears below it in horizontal layout.
Figure 1.1.4: Ruby in vertical text (after/to the left).
Ruby before the base text are often used to indicate pronunciation; ruby after the base text are often used to indicate meaning. In this and other cases, it can happen that ruby appear on both sides of the base text.
Figure 1.1.5: Ruby applied above and below a line of Japanese text
In some cases, it is desirable to give more details about which parts of the ruby base and which parts of the ruby text are associated together. This can be used for fine-tuning of the display or for other operations. Fine-grained association, in particular showing the association with each single character of the base text, is mainly used in educational texts and other cases where the exact association is important or potentially unknown. Coarser-grained association is used when the details of the association at a lower level are assumed to be known or irrelevant; because longer spans of ruby text are then set with the same spacing, better readability and a more even layout may be achieved. Such a structure is called group ruby. For example, a person's name can be decomposed into family name and given name, or a kanji compound or phrase can be decomposed to show its semantic subparts, as in the following example:
Figure 1.1.6: Group ruby with text below spanning the group
In the example, the ruby text above is made of two sequences: the hiragana sequence 'でんわ' (denwa) for 'phone' and the hiragana sequence 'ばんごう' (bangou) for 'number'; the ruby text below is a single English sequence: 'Phone number'.
In the following example the ruby text below, 'University', relates to the second base sequence while the ruby texts above, 'けいおうぎじゅくだいがく' (keiou gijyuku daigaku; Keio University in hiragana) refer to the two base sequences.
Figure 1.1.7: Group ruby with text below only spanning the second part
Details of ruby formatting in a Japanese print context can be found in JIS-X-4051 [JIS].
Introducing ruby to the Web leads to some phenomena and problems that are not present in traditional typography, from which the term 'ruby' is taken. The term 'ruby' in Japanese is only used for text alongside the base text, as shown in the various figures above. However, once structural markup for ruby is defined, as done in this document, there is no guarantee that the associations defined by this markup will always be rendered alongside the base text. There is a very wide variety of current and future output devices for documents marked up with HTML. The following are possible scenarios and reasons for different rendering:
Figure 1.1.8: Inline ruby applied to horizontal Japanese as a fallback
Using parentheses for the fallback may lead to confusion between runs of text intended to be ruby text and others that happen to be enclosed within parentheses. The author should be aware of the potential for that confusion and is advised to choose an unambiguous delimiter for the fallback, if this is a concern.
This section gives an overview of the markup for ruby defined in this document. A formal definition can be found in Section 2. The markup is in XML [XML]. The core of the markup is the ruby element. The ruby element encloses all the text and markup necessary for defining the association between base text and ruby text.

The name of this enclosing element, 'ruby', should be interpreted to mean that what follows associates ruby with base text. It must not be misunderstood to mean that everything inside, including the base text, is ruby. The name of the enclosing element was chosen to compactly and clearly identify the function of the markup construct; the names for the other elements were chosen to keep the overall length short.
The ruby element serves as a container for one of the following:

- rb, rt and possibly rp elements (for simple cases)
- rbc and one or several rtc elements (for more complicated cases, such as group ruby).

In the following, these two cases are discussed in more detail.
For the simple case, the rb element contains the base text, the rt element contains the ruby text, and the rp elements contain the parenthesis characters used in the fallback case. rb stands for 'ruby base', rt for 'ruby text', and rp for 'ruby parenthesis'. This allows a simple association between one base text and one ruby text, and is sufficient for most ruby cases.
For example, the following simple ruby:
Figure 1.2.1: Ruby applied to English
can be represented as follows:
<ruby> <rb>WWW</rb> <rp>(</rp> <rt>World Wide Web</rt> <rp>)</rp> </ruby>
Figure 1.2.2: Example of simple ruby markup including rp elements for fallbacks
The rp elements and the parentheses inside them are provided for fallback only. Browsers that ignore unknown elements but display their contents will display WWW(World Wide Web).
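To make the two behaviours concrete, here is a minimal Python sketch using the standard-library xml.etree.ElementTree module. The function names fallback_text and ruby_aware_text are hypothetical, chosen for illustration only. The sketch assumes the simple ruby structure of Figure 1.2.2 and, as a simplification, discards whitespace-only text between tags (a real browser would collapse it to a single space instead).

```python
import xml.etree.ElementTree as ET


def fallback_text(markup: str) -> str:
    """Text shown by a browser that ignores unknown elements but displays their content."""
    root = ET.fromstring(markup)
    # itertext() yields all character data in document order, rp content included.
    # Simplification: whitespace-only runs between tags are dropped.
    return "".join(t for t in root.itertext() if t.strip())


def ruby_aware_text(markup: str) -> tuple[str, str]:
    """(base, ruby text) as paired by a ruby-aware browser; rp content is discarded."""
    root = ET.fromstring(markup)
    base = "".join(root.find("rb").itertext())
    ruby_text = "".join(root.find("rt").itertext())
    return base, ruby_text
```

Applied to the markup of Figure 1.2.2, fallback_text returns "WWW(World Wide Web)", while ruby_aware_text ignores the rp content and returns the pair ("WWW", "World Wide Web").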
The rp element identifies the parentheses (or whatever else may be used in their place) to browsers that know about the markup defined in this document, so that the parentheses can be removed. If the author is not concerned about fallbacks for browsers that neither know about ruby markup nor support CSS2 [CSS2] or XSL [XSL] style sheets, then the rp elements are not needed:
<ruby> <rb>WWW</rb> <rt>World Wide Web</rt> </ruby>
Figure 1.2.3: Example of simple ruby markup without rp elements for fallbacks
In CSS2, if necessary, the parentheses can be generated using the 'content' property of the :before and :after pseudo-elements as for example in the following style declaration:
rt:before { content: "(" } rt:after { content: ")" }
Figure 1.2.4: CSS2 style sheet to generate parentheses around the rt element
In the above example, parentheses are automatically generated around the rt element. It is assumed that this style information is used together with style information that positions the ruby text inline.
Generation of parentheses in XSL is straightforward.
For more complicated cases of associations between base text and ruby text, a combination of rbc and rtc elements is used. This includes associating more than one ruby text with the same base text (typically displayed on both sides of the base text) and fine-grained associations of base text and ruby text (group ruby). The ruby element contains one rbc element followed by one or more rtc elements. The rbc element contains rb elements, and the rtc element contains rt elements. Several rtc elements are used to associate more than one ruby text with the same base text. Several rb elements inside an rbc element, combined with several rt elements inside an rtc element, are used for group ruby.
The rt element may use the rbspan attribute to indicate that a single rt element spans (is associated with) multiple rb elements. This is similar to the colspan attribute of the th/td elements in tables. rbc stands for 'ruby base component', and rtc for 'ruby text component'.
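The association expressed by rbspan can be sketched in Python as follows. This is an illustrative sketch, not a normative algorithm: associate and parse_rbspan are hypothetical helpers, the rbspan constraints (an integer greater than zero, defaulting to one) are taken from Section 2, and the rule that an rt may not span past the last rb is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET


def parse_rbspan(rt: ET.Element) -> int:
    """rbspan must be an integer greater than zero; it defaults to 1."""
    value = int(rt.get("rbspan", "1"))  # int() raises ValueError for non-integers
    if value < 1:
        raise ValueError(f"rbspan must be greater than zero, got {value}")
    return value


def associate(ruby: ET.Element) -> list[list[tuple[str, str]]]:
    """Pair each rt of every rtc with the text of the rb elements it spans.

    Returns one list per rtc; each entry is (base text, ruby text).
    """
    bases = ["".join(rb.itertext()) for rb in ruby.find("rbc").findall("rb")]
    result = []
    for rtc in ruby.findall("rtc"):
        pairs, index = [], 0
        for rt in rtc.findall("rt"):
            span = parse_rbspan(rt)
            # Assumption of this sketch: an rt may not span past the last rb.
            if index + span > len(bases):
                raise ValueError("rt spans more rb elements than exist")
            base_text = "".join(bases[index:index + span])
            pairs.append((base_text, "".join(rt.itertext())))
            index += span
        result.append(pairs)
    return result
```

For the markup of Figure 1.2.6, the first rtc yields four one-to-one pairs such as ("斎", "さい"), and the second yields a single pair associating "斎藤信男" with "W3C Associate Chairman".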
An example of group ruby is shown in the following figure:
Figure 1.2.5: Group ruby with mixed above and below ruby texts
can be represented as follows:
<ruby xml:lang="ja" class="pronunciation annotation"> <rbc> <rb>斎</rb> <rb>藤</rb> <rb>信</rb> <rb>男</rb> </rbc> <rtc class="pronunciation"> <rt>さい</rt> <rt>とう</rt> <rt>のぶ</rt> <rt>お</rt> </rtc> <rtc class="annotation"> <rt rbspan="4" xml:lang="en">W3C Associate Chairman</rt> </rtc> </ruby>
Figure 1.2.6: Ruby markup to achieve both above and below ruby text on the same base.
In this case, the rtc element with the class "pronunciation" should be associated with the style information to place the ruby texts above the ruby bases, and the rtc element with the class "annotation" should be associated with the style information to place the ruby text below the ruby bases.
This document only defines ruby markup. Formatting properties for styling ruby are being developed for CSS and XSL rather than for HTML/XML. See "International Layout" [I18N-FORMAT] (work in progress) for more details.
The rp element is not available in this representation, for two reasons. First, the rp element is for backwards compatibility only, and this was considered much more important for the more frequent simple case. Second, for the more complex cases, it is often very difficult to come up with a reasonable fallback display, and constructing markup for such cases can be even more difficult, if not impossible.
Some readers may wonder why two new elements, rbc and rtc, were introduced for this case, instead of using a single new element inside rb and rt. This was done because in XML, it is impossible to express that an element should contain either only text (#PCDATA) or only a certain combination of elements. In XML, as soon as #PCDATA is allowed somewhere directly within an element, this element has mixed content, and #PCDATA is allowed everywhere directly within that element.
Note. For non-visual rendering, such as speech synthesis and braille output, rendering both the base text and the ruby text can be annoying. This is in particular the case if the ruby represent a pronunciation. In this case, a speech synthesizer may either be able to correctly pronounce the base text, in which case the same text is spoken twice, or it may not know the correct pronunciation of the text and make up a pronunciation, in which case the result may be quite confusing.
As an example, in the case of Figure 1.2.6, the ruby texts "さい" (sai), "とう" (tou), "のぶ" (nobu), "お" (o) are more useful than the ruby bases "斎", "藤", "信", "男" for aural or braille rendering. Because aural and braille rendering are phonetic and the ruby texts here represent pronunciation, it is straightforward to use them for non-visual rendering, and it does not make sense to render both the ruby bases and the ruby texts. In such cases, style information like the following may help.
@media aural { ruby[class~="pronunciation"] rb { speak: none } }
Figure 1.2.7: CSS2 style sheet to suppress aural rendering of ruby base
The above style sheet will suppress aural rendering of the ruby base when the rb element is a descendant of a ruby element with the class "pronunciation". See [CSS2] for more details.
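The effect of this style sheet can be emulated with a short Python sketch, assuming un-namespaced element names as in the examples above. has_class mimics the CSS2 [class~="pronunciation"] selector, which matches one word in a whitespace-separated list; aural_text is a hypothetical helper that lists the text runs a speech renderer would voice, ignoring any rules that might re-enable speech on descendants.

```python
import xml.etree.ElementTree as ET


def has_class(elem: ET.Element, name: str) -> bool:
    """Mimic the CSS2 [class~="name"] selector: match one whitespace-separated word."""
    return name in elem.get("class", "").split()


def aural_text(ruby: ET.Element) -> list[str]:
    """Text runs voiced when rb inside ruby[class~="pronunciation"] has speak: none.

    Every rt (and rp) is still spoken; only rb content is suppressed.
    """
    suppress_rb = has_class(ruby, "pronunciation")
    spoken: list[str] = []

    def walk(elem: ET.Element) -> None:
        if elem.tag == "rb" and suppress_rb:
            # 'speak' is inherited, so the rb's descendants are silenced too
            # (this sketch ignores rules that could override that).
            return
        if elem.text and elem.text.strip():
            spoken.append(elem.text.strip())
        for child in elem:
            walk(child)
            # A child's tail text belongs to elem, so it is still spoken.
            if child.tail and child.tail.strip():
                spoken.append(child.tail.strip())

    walk(ruby)
    return spoken
```

For Figure 1.2.6, whose class attribute is "pronunciation annotation", only the ruby texts are voiced: さい, とう, のぶ, お and then the annotation. A ruby element without that class is spoken in full, base text first.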
It is important to note that not all ruby are pronunciations. Authors should distinguish ruby used for different purposes by using the HTML class attribute, as done above by assuming class="pronunciation" for ruby used to indicate pronunciation. Also, it should be noted that somebody listening to aural rendering may be interested in accessing the skipped base text to check the characters used.
This section is normative.
This section contains the formal syntax definition and the specification of the functionality of the ruby elements.
The following is the abstract definition of ruby elements, which is consistent with the XHTML Modularization framework [XHTMLMOD]. Further definitions of XHTML abstract modules can be found in [XHTMLMOD].
Elements | Attributes | Minimal Content Model |
---|---|---|
ruby | Common | ((rb, rp?, rt, rp?) | (rbc, rtc+)) |
rbc | Common | rb+ |
rtc | Common | rt+ |
rb | Common | (PCDATA | Inline - ruby)* |
rt | Common, rbspan (CDATA) | (PCDATA | Inline - ruby)* |
rp | Common | PCDATA* |
Ed. Note. The definition of "Inline" in [XHTMLMOD] doesn't include the ruby element, but for convenience, the above abstract definition assumes that the ruby element is included as one of the inline elements.
An implementation of this abstract definition as the XHTML DTD modules can be found in Appendix A.
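The content models in the table above can be checked mechanically. The following Python sketch is an illustration only, not a substitute for validation against the DTD in Appendix A: check is a hypothetical function that writes each content model as a regular expression over the sequence of child element names. It tests only the structural constraints of the table (including the rule that ruby may not appear as a child of rb or rt), not the full Inline content of rb and rt.

```python
import re
import xml.etree.ElementTree as ET

# Content models from the table, as regular expressions over the
# space-terminated sequence of child element names, e.g. "rb rp rt rp ".
MODELS = {
    "ruby": re.compile(r"rb (rp )?rt (rp )?|rbc (rtc )+"),
    "rbc": re.compile(r"(rb )+"),
    "rtc": re.compile(r"(rt )+"),
    "rp": re.compile(r""),  # rp holds character data only, no child elements
}
ELEMENT_ONLY = {"ruby", "rbc", "rtc"}  # no character data directly inside


def check(elem: ET.Element) -> list[str]:
    """Return content-model violations found in elem and all its descendants."""
    errors = []
    children = list(elem)
    names = "".join(child.tag + " " for child in children)
    model = MODELS.get(elem.tag)
    if model is not None and not model.fullmatch(names):
        errors.append(f"{elem.tag}: unexpected children '{names.strip()}'")
    if elem.tag in ELEMENT_ONLY:
        # Whitespace between elements is allowed; other text is not.
        texts = [elem.text] + [child.tail for child in children]
        if any(t and t.strip() for t in texts):
            errors.append(f"{elem.tag}: character data not allowed here")
    if elem.tag in ("rb", "rt") and any(child.tag == "ruby" for child in children):
        errors.append(f"{elem.tag}: ruby may not appear as a child")
    for child in children:
        errors.extend(check(child))
    return errors
```

The markup of Figure 1.2.2 produces no errors, while placing an rp between rbc and rtc is reported, since rp is not available in the rbc/rtc representation.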
ruby element

The ruby element is an inline (or text-level) element that serves as the container for either the rb, rp and rt elements or the rbc and rtc elements. It provides the structural association between the ruby base elements and their ruby text elements.

The ruby element does not accept any attributes other than the common ones, such as id, class, xml:lang or style.
In this simplest example, ruby text "aaa" is associated with base "AA":
<ruby><rb>AA</rb><rt>aaa</rt></ruby>
Figure 2.2.1: Simple usage of the ruby element
rbc element

The rbc element is the container for rb elements. This element is not used for simple ruby, and is only used for group ruby. Only one rbc element may appear inside a ruby element.
rtc element

The rtc element is the container for rt elements. This element is not used for simple ruby, and is only used for group ruby. Several rtc elements may appear inside a ruby element to associate multiple ruby texts with a single ruby base, represented by an rbc element.
For example, the following markup, utilizing CSS from "International Layout" [I18N-FORMAT] (work in progress), may be used to associate two ruby texts with the same ruby base:
<ruby> <rbc> <rb>KANJI</rb> </rbc> <rtc style="ruby-position: above"> <rt>kana-above</rt> </rtc> <rtc style="ruby-position: below"> <rt>kana-below</rt> </rtc> </ruby>
Figure 2.4.1: Ruby markup to achieve both above and below ruby on the same base.
The markup above would be rendered as:
kana-above KANJI kana-below
Figure 2.4.2: The result of two ruby texts associated to a single ruby base.
rb element

The rb element is the container for the text of the ruby base. For simple ruby, only one rb element may appear. For group ruby, multiple rb elements may appear inside an rbc element.

The rb element may contain inline elements or character data as its content, but the ruby element may not appear as its child element.
rt element

The rt element is the container for the ruby text. For simple ruby, only one rt element may appear. For group ruby, multiple rt elements may appear inside an rtc element.

The rt element may contain inline elements or character data as its content, but the ruby element may not appear as its child element.

The rbspan attribute allows an rt element to span multiple rb elements. The value shall be an integer greater than zero ("0"). The default value of this attribute is one ("1"). An example of this is shown in Figures 1.2.5 and 1.2.6.
rp element

The rp element is intended to contain parenthesis characters. Parentheses are necessary for the ruby to be rendered correctly when it is displayed inline. The rp element is especially necessary for UAs that are unable to render ruby text above the ruby base. That way, any ruby will degrade to no worse than a properly formed inline ruby in non-supporting UAs. The rp element cannot be used in group ruby.
Consider the following markup, specifying an above (default) ruby:
<ruby> <rb>A</rb> <rp>(</rp> <rt>aaa</rt> <rp>)</rp> </ruby>
Figure 2.7.1: Ruby markup using rp elements
A user agent that supports above ruby would render it as:
aaa A
Figure 2.7.2: Above ruby rendered by a supporting UA (note the parentheses are not visible)
However, a UA that is unable to render above ruby or does not support ruby markup would still correctly show:
A(aaa)
Figure 2.7.3: Above ruby rendered by a non-supporting UA (note the parentheses are visible)
This appendix is normative.
The following are the Ruby DTD modules, a DTD driver and a catalog file that can be used with the XHTML 1.1 DTD modules [XHTML11]. These modules conform to the Module Conformance requirements as defined in the "Building XHTML Modules" specification [BUILDING].
Ed. Note: These modules are expected to be included in XHTML 1.1 and are not intended to define a new markup language as an XHTML-family document type, so their names are temporary and differ slightly from the Naming Rules in the XHTML Family Document Type Conformance.
This appendix is informative.
Ed. Note: Do we really need an SGML definition?
Although this specification defines ruby elements in XML, there might be some usage in SGML. This appendix provides sample ruby DTD fragments in SGML. The functionality of the ruby elements is defined in Section 2.
Note that in the SGML DTD, elements and attributes are intended to be case-insensitive, and in some cases, end tags can be omitted.

For example, Figure 1.2.2 can be described using the following markup in SGML, where it is possible to omit the end tags of rb, rt and rp:
<ruby> <rb>WWW <rp>( <rt>World Wide Web <rp>) </ruby>
Figure B.1: Ruby markup example in SGML
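The effect of end-tag omission on simple ruby can be illustrated with a Python sketch that inserts the end tags an SGML parser would infer. close_omitted_tags is hypothetical and deliberately narrow: it handles only simple ruby without attributes, folds tag names to lower case (reflecting case-insensitivity), and closes an open rb, rt or rp whenever a sibling start tag or the ruby end tag appears. It is not a general SGML parser.

```python
import re

# Start or end tag without attributes, e.g. "<rb>" or "</RUBY>".
TAG = re.compile(r"<(/?)([A-Za-z]+)>")
# Elements whose end tags the SGML DTD marks as omissible ("- O") in simple ruby.
SIBLINGS = {"rb", "rt", "rp"}


def close_omitted_tags(sgml: str) -> str:
    """Return the markup with the omitted rb/rt/rp end tags made explicit."""
    out, pos, open_tag = [], 0, None
    for m in TAG.finditer(sgml):
        out.append(sgml[pos:m.start()])  # character data before this tag
        is_end = m.group(1) == "/"
        name = m.group(2).lower()  # SGML element names are case-insensitive
        # A sibling start tag or </ruby> implicitly ends the open rb/rt/rp.
        if open_tag and ((not is_end and name in SIBLINGS) or (is_end and name == "ruby")):
            out.append(f"</{open_tag}>")
            open_tag = None
        if is_end and name == open_tag:
            open_tag = None  # the author wrote the end tag explicitly
        if not is_end and name in SIBLINGS:
            open_tag = name
        out.append(f"<{m.group(1)}{name}>")
        pos = m.end()
    out.append(sgml[pos:])  # trailing text after the last tag
    return "".join(out)
```

Applied to Figure B.1, this yields "<ruby> <rb>WWW </rb><rp>( </rp><rt>World Wide Web </rt><rp>) </rp></ruby>", which has the same structure as the XML markup of Figure 1.2.2.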
For convenience, parameter entities %inline; and %attrs; are used to represent generic inline elements and generic attributes. In the case of HTML [HTML4], these are something like this:
<!-- %inline; covers inline or "text-level" elements --> <!ENTITY % inline "#PCDATA | %fontstyle; | %phrase; | %special; | %formctrl; | ruby"> <!ENTITY % attrs "%coreattrs; %i18n; %events;">
Further definitions can be found in [HTML4].
<!ELEMENT ruby - - ((rb, rp?, rt, rp?) | (rbc, rtc+))> <!ATTLIST ruby %attrs; > <!ELEMENT rbc - O (rb)+ -- container for rb elements --> <!ATTLIST rbc %attrs; > <!ELEMENT rtc - O (rt)+ -- container for rt elements --> <!ATTLIST rtc %attrs; > <!ELEMENT rb - O (%inline;)* - (ruby) -- container for ruby base --> <!ATTLIST rb %attrs; > <!ELEMENT rt - O (%inline;)* - (ruby) -- container for ruby text --> <!ATTLIST rt %attrs; rbspan NUMBER 1 -- number of rbs spanned by rt --> <!ELEMENT rp - O (#PCDATA) -- container for parenthesis characters --> <!ATTLIST rp %attrs; >
This appendix is informative.
This appendix is informative.
The model presented in this specification is largely inspired by the work done by Martin Dürst [DUR97].
This specification would also not have been possible without the help from:
Mark Davis, Laurie Anna Edlund, Arye Gittelman, Hideki Hiura, Koji Ishii, Eric LeVine, Chris Lilley, Charles McCathieNevile, Chris Pratley, Nobuo Saito, Rahul Sonnad, Takao Suzuki, Chris Thrasher, Chris Wilson, Masafumi Yabe.
This appendix is informative.
This appendix is informative.
Section | Change |
---|---|
Status of This Document | |
Abstract | |
1. Introduction | |
2. Formal definition of ruby elements | |
Appendix A. Ruby modules in XHTML | |
Appendix B. Ruby usage in SGML | |
Appendix C. Glossary | |
Appendix D. Acknowledgements | |
Appendix E. References | |
Appendix F. Changes from previous public Working Draft | |