International text layout and typography index

Abstract

This document points browser implementers and specification developers to information about how to support typographic features of scripts or writing systems from around the world, and also points to relevant information in specifications, to tests, and to useful articles and papers. It is not exhaustive, and will be added to from time to time.

Status of This Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.

The information in this document helps to link users and developers so that browsers can better support typographic needs around the world. It is expected that this document will be constantly updated, as new material becomes available or comes to our attention.

Note

Sending comments on this document

If you wish to make comments regarding this document, please raise them as GitHub issues. Only send comments by email if you are unable to raise issues on GitHub (see links below). All comments are welcome.

To make it easier to track comments, please raise separate issues or emails for each comment, and point to the section you are commenting on using a URL for the dated version of the document.

This document was published by the Internationalization Working Group as a First Public Working Draft. Comments sent by email should be addressed to www-international@w3.org.

Publication as a First Public Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. The group does not expect this document to become a W3C Recommendation. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

This document is governed by the 1 September 2015 W3C Process Document.

2. Characters & phrases

2.1 Punctuation

Many scripts use native punctuation marks in addition to or instead of those used in Latin script text. In other cases, such as Greek, common Latin punctuation marks may mean something different from what they mean in English. It may be important to understand what needs to be supported, how these punctuation marks function, and how they interact with other operations applied to the text.

Requirements

Chinese Layout Requirements:
Ethiopic Layout Requirements: Punctuation
Hangul Layout Requirements:
Indic Layout Requirements:
- Guiding principles of Line breaking for Indian languages
Japanese Layout Requirements:
Tibetan Layout Requirements:
- Tibetan Punctuation
- Character List

See also 2.2 Quotations and 3.1 Line breaking.

2.2 Quotations

Quotation marks vary from language to language, not just from script to script. Also, you should expect variations in behavior when quotation marks are nested. Furthermore, the quotation marks used for vertical Japanese text are not the same as those typically used for the same text when horizontally laid out.
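
The nesting behavior described above can be sketched as a small lookup, given here only as an illustration. The `QUOTES` table, the language tags, and the `quote` function are assumptions made for this sketch and are not taken from any of the linked specifications; real data would normally come from a maintained source such as CLDR. When the nesting is deeper than the table, the last pair is reused, which is also how the CSS 'quotes' property behaves.

```python
# Illustrative quotation-mark table: (open, close) pairs per nesting level.
# This tiny table is an assumption for the sketch, not authoritative data.
QUOTES = {
    "en": [("\u201C", "\u201D"), ("\u2018", "\u2019")],  # double, then single
    "de": [("\u201E", "\u201C"), ("\u201A", "\u2018")],  # low-9 opening marks
    "ja": [("\u300C", "\u300D"), ("\u300E", "\u300F")],  # corner brackets
}


def quote(text: str, lang: str, depth: int = 0) -> str:
    """Wrap `text` in the quotation marks used by `lang` at nesting `depth`."""
    pairs = QUOTES[lang]
    # Reuse the last pair when nesting goes deeper than the table.
    open_mark, close_mark = pairs[min(depth, len(pairs) - 1)]
    return f"{open_mark}{text}{close_mark}"


inner = quote("ja", "de", depth=1)
print(quote(f"Sie sagte {inner}.", "de"))
```

Marks for vertical text would need a further dimension in the table, which this sketch does not attempt.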

Requirements

Chinese Layout Requirements: Indicator Punctuation Marks
Ethiopic Layout Requirements: Quotation
Japanese Layout Requirements:

Spec links

HTML5:
- The q element
- The blockquote element
CSS3 Generated and Replaced Content: Specifying quotes with the 'quotes' property

2.3 Identifying boundaries of graphemes, words and larger groupings

A browser or application needs to correctly apply functions to the basic units of text, be they characters, character sequences, syllables, or words. Some scripts, such as those used in South and South-East Asia, require clusters of characters to be treated as a single unit for most editing operations. Many other scripts use combining characters such as accents, vowel signs, length markers, etc. that must be kept with the base character they are associated with.

When a user double-clicks on some text, the appropriate units should be selected. In scripts such as Chinese and Thai, 'words' should be selected even though they are not separated by spaces. In scripts such as Tibetan and Ethiopic, the word separator may be a visible character, rather than a space. It is important to understand how these separators should be treated when a 'word' is highlighted, or when text wraps, etc.
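
As a rough illustration of keeping combining characters with their base character, the following sketch groups each character with the marks that follow it. It is a simplification under stated assumptions: the function name `simple_clusters` is hypothetical, and only the Unicode general categories Mn, Mc and Me are attached. Full grapheme cluster segmentation, as defined in Unicode Standard Annex #29, covers many more cases (Hangul jamo, joiner sequences, and so on), and word selection in Chinese or Thai needs dictionary-based techniques that are out of scope here.

```python
import unicodedata


def simple_clusters(text: str) -> list[str]:
    """Group each base character with the combining marks that follow it.

    Simplified: a character is attached to the previous cluster only when
    its general category starts with "M" (Mn, Mc or Me).
    """
    clusters: list[str] = []
    for ch in text:
        is_mark = unicodedata.category(ch).startswith("M")
        if is_mark and clusters:
            clusters[-1] += ch   # keep the mark with its base character
        else:
            clusters.append(ch)  # start a new cluster
    return clusters


# "e" + COMBINING ACUTE ACCENT stays together as one editing unit.
print(simple_clusters("e\u0301a"))
```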

Requirements

Chinese Layout Requirements: Characters and Principles for Setting them in Chinese Composition
Indic Layout Requirements:
- Indic orthographic syllable boundaries
- Text segmentation
Tibetan Layout Requirements:
- Tibetan Syllables
- Text Segmentation in Tibetan

2.4 Glyph controls

In some scripts, such as Arabic, it may be desirable to allow the content author to control the placement of glyphs such as diacritics, or to control ligation, etc.

Requirements

Arabic Layout Requirements: Positioning diacritics relative to base characters

2.5 Transforming characters

Conversion between lower, upper and title case applies only to a few scripts; most scripts are unicameral. Where it does apply, the rules can vary by language.

In other cases, a particular script may require a different type of transform. For example, in Japanese it is important to be able to convert between half-width and full-width presentation forms.
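
The two kinds of transform mentioned above can be sketched as follows. This is a minimal illustration, not a complete implementation: the function names are hypothetical, the language-sensitive rule shown is limited to the dotted and dotless i of Turkish and Azerbaijani, and the width conversion back to standard forms relies on NFKC normalization, which also rewrites other compatibility characters and so may be too aggressive for real content.

```python
import unicodedata


def to_fullwidth(text: str) -> str:
    """Map printable ASCII characters to their full-width forms."""
    out = []
    for ch in text:
        cp = ord(ch)
        if ch == " ":
            out.append("\u3000")          # IDEOGRAPHIC SPACE
        elif 0x21 <= cp <= 0x7E:
            out.append(chr(cp + 0xFEE0))  # U+FF01..U+FF5E
        else:
            out.append(ch)                # leave everything else alone
    return "".join(out)


def to_standard_width(text: str) -> str:
    """Fold full-width Latin to ASCII and half-width katakana to ordinary
    katakana. NFKC changes other compatibility characters as well."""
    return unicodedata.normalize("NFKC", text)


def upper_for_lang(text: str, lang: str) -> str:
    """Uppercase with one language-specific tailoring (assumed tags)."""
    if lang in ("tr", "az"):
        # In these languages, lowercase i uppercases to dotted capital I.
        text = text.replace("i", "\u0130")
    return text.upper()


print(to_fullwidth("A1"), upper_for_lang("istanbul", "tr"))
```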

Requirements

Notes on case conversion

Spec links

Tests

CSS3 Text: text transform

2.6 Inline spacing

Many scripts create emphasis or other effects by spacing out the letters or syllables in a word. There are questions about how this should work in Indic and SE Asian scripts, and in Arabic-based scripts which join up adjacent letters. Another aspect of inline spacing relates to separation of characters or items in text. For example, French uses spaces before certain punctuation marks, and the traditional Mongolian script requires special spacing between word stems and certain suffixes.

Requirements

Chinese Layout Requirements: Principles for Arranging Characters during Chinese Composition
Indic Layout Requirements: Letter spacing
Hangul Layout Requirements: Writing Process for Punctuation Marks, etc.

Issues

Spec links

CSS3 Text: Spacing

2.7 Ruby annotation

Ruby is used for phonetic and semantic annotations of East Asian text, including furigana, pinyin and zhuyin fuhao systems. In addition to positioning annotations along the correct side of the base text, there are many fine adjustments of the annotation and base text to support.

Requirements

Chinese Layout Requirements: Interlinear annotations
Chinese information: Bopomofo on the Web
Japanese Layout Requirements:
- Ruby and Emphasis Dots
- Positioning of Jukugo-ruby
Japanese information: Use Cases & Exploratory Approaches for Ruby Markup

Issues

Spec links

HTML5: The ruby element
CSS3 Ruby

Tests

HTML5, the ruby element and its children

2.8 Text decoration

Some aspects related to the drawing of lines alongside or through text involve local typographic considerations. For example, underlines need to be broken in special ways for some scripts, and the height of underlines, strike-through and overlines may vary depending on the script. For vertical text the placement needs to be to the right or left of the line of text, rather than under or over.

Requirements

Chinese Layout Requirements: Indicator Punctuation Marks > Fullwidth low line
Chinese information: Wikipedia: Underline
Ethiopic Layout Requirements:
Japanese Layout Requirements:
- Examples of Items Jutting Out of the Kihon-hanmen
- Line Gap Arrangement with Ruby and Other Objects

Issues

Spec links

CSS3 Text Decoration: Line Decoration: Underline, Overline, and Strike-Through

2.9 Emphasis

Bold and italic are not always appropriate for expressing emphasis, and some scripts have their own ways of expressing it that are not part of the Western tradition at all.

Requirements

Ethiopic Layout Requirements:
- Emphasis
Japanese Layout Requirements: Ruby and Emphasis Dots
Tibetan Layout Requirements: Text Emphasis and Highlighting

Spec links

CSS3 Text Decoration: Emphasis Marks

2.10 Initial letter styling

Does the browser or e-reader correctly handle special styling of the initial letter of a line or paragraph, such as for drop caps?

Requirements

Indic Layout Requirements: First Letter
Latin Layout Requirements: Initial Capitals

Spec links

CSS3 Selectors: The ::first-letter pseudo-element
CSS3 Inline Layout: Initial Letters

Tests

CSS3 Selectors: first-letter

2.11 Fonts

Some scripts require special handling with regard to how font properties are specified and how font resources are loaded dynamically.

Requirements

Chinese Layout Requirements: Typefaces for Chinese Characters
Hangul Layout Requirements:
- 'Letter Face Position in Character Frame' Standard
- Kerning for Hangul Fonts
Mongolian information:
- Mongolian script layout requirements

Spec links

CSS3 Fonts

Tests

3. Lines & paragraphs

3.1 Line breaking

There are some specific rules about how scripts such as Chinese, Japanese and Korean behave when a line is wrapped. For example, these scripts tend to break a line in the middle of a word (with no hyphenation) – even in Korean, which has spaces between words.

It is common for certain characters to be forbidden at the start or end of a line, but which characters these are, and which rules apply in which circumstances, depends on the script or language. In some cases, such as Japanese, there may be different rules according to the type of content or the user's preference.
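
The idea of characters that are forbidden at the start or end of a line can be sketched with a simple greedy breaker. Everything here is an assumption made for illustration: the two character sets are deliberately tiny and hypothetical, the function name `break_lines` is invented, and the only strategy shown is to move the break earlier. The layout requirements documents listed below describe the real character classes and other strategies, such as adjusting spacing or letting punctuation hang.

```python
# Hypothetical, deliberately small sets; real lists depend on the script,
# the language, and the strictness level chosen.
NO_LINE_START = set("、。，．）」』")
NO_LINE_END = set("（「『")


def break_lines(text: str, width: int) -> list[str]:
    """Break unspaced text into lines of at most `width` characters,
    moving a break earlier when it would violate a start/end rule."""
    lines = []
    start = 0
    while len(text) - start > width:
        end = start + width
        # Back up while the break is prohibited, keeping at least one
        # character on the line so that the loop always makes progress.
        while end > start + 1 and (
            text[end] in NO_LINE_START or text[end - 1] in NO_LINE_END
        ):
            end -= 1
        lines.append(text[start:end])
        start = end
    lines.append(text[start:])
    return lines


for line in break_lines("これは「テスト」です。", 4):
    print(line)
```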

Requirements

Chinese Layout Requirements:
- Line Composition Rules for Punctuation Marks
- Hanging Punctuation at Line End
Hangul Layout Requirements:
- Writing Process for Punctuation Marks, etc.
- Line Breaking Rules
Indic Layout Requirements: Guiding principles of Line breaking for Indian languages
Japanese Layout Requirements: Possibilities for Line-breaking between Characters
Mongolian information: Mongolian script layout requirements (.ppt)
Tibetan Layout Requirements:
- Line breaking

Issues

Spec links

CSS3 Text: Line Breaking and Word Boundaries

Tests

3.2 Hyphenation

Some scripts don't use hyphenation; those that do have particular rules about how it should be applied, and these are typically language-specific.

Requirements

Indic Layout Requirements: Hyphenation
Latin Layout Requirements:
- Hyphenation
- The Classical Rules of Hyphenation and Pagination
Mongolian information: Mongolian script layout requirements (.ppt)

Spec links

CSS3 Text: Breaking Within Words

Tests

Hyphenation

3.3 Justification & line-end alignment

Typographers have come up with various methods for effective full justification – causing the text to completely fill the line, in order to create visual alignment on both edges of a paragraph.

Typographic conventions for full text justification depend on the writing system, the content language, and the calligraphic style of the text. Results also tend to vary with the capabilities of the layout engine and with a given typographer’s preferences for weighing the various detrimental effects of justification on typographic color and readability.

Requirements

Arabic Layout Requirements:
- Justification
Arabic information:
Chinese Layout Requirements:
- Line Composition Rules for Punctuation Marks
- Composition of Chinese and Western Mixed Texts
Ethiopic Layout Requirements: Justification
Ethiopic information: Proposal to Reclassify Ethiopic Wordspace as a Space Separator (Zs) Symbol
Hangul Layout Requirements:
- Paragraph Adjustment
- Line Adjustment Process
Japanese Layout Requirements:
Latin Layout Requirements: Justification
Tibetan Layout Requirements: Justification
Tibetan information: Tibetan script requirements (.ppt)

Issues

Spec links

CSS3 Text: Alignment and Justification

Tests

CSS3 Text: text-align, text-align-last, text-justify

3.4 Counters, lists, etc

The CSS specification describes a set of simple and complex styles for counters to be used in list numbering, chapter heading numbering, etc. It also provides a generic mechanism for content authors to create their own counter styles. One has to consider not only the characters and algorithms to be used (numeric, alphabetic, additive, etc), but also what the separator or other associated marks look like.
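
The three kinds of algorithm named above (numeric, alphabetic, additive) can be sketched as follows. This is an illustrative reading of those algorithm families rather than a full counter style implementation: the function names and the symbol tables are assumptions for this sketch, and features such as ranges, fallbacks, negative numbers, prefixes and suffixes are left out.

```python
def numeric(n: int, digits: str) -> str:
    """Positional notation using the given digit symbols (n >= 0)."""
    if n == 0:
        return digits[0]
    base, out = len(digits), []
    while n > 0:
        n, remainder = divmod(n, base)
        out.append(digits[remainder])
    return "".join(reversed(out))


def alphabetic(n: int, symbols: str) -> str:
    """Bijective numeration: a, b, ..., z, aa, ab, ... (n >= 1)."""
    if n < 1:
        raise ValueError("alphabetic counters start at 1")
    base, out = len(symbols), []
    while n > 0:
        n -= 1
        out.append(symbols[n % base])
        n //= base
    return "".join(reversed(out))


def additive(n: int, weights: list[tuple[int, str]]) -> str:
    """Greedy sum of weighted symbols; weights must be sorted descending."""
    out = []
    for weight, symbol in weights:
        count, n = divmod(n, weight)
        out.append(symbol * count)
    if n != 0:
        raise ValueError("value cannot be represented with these weights")
    return "".join(out)


ARABIC_INDIC = "".join(chr(0x0660 + i) for i in range(10))
LOWER_LATIN = "abcdefghijklmnopqrstuvwxyz"
ROMAN = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"), (100, "C"),
         (90, "XC"), (50, "L"), (40, "XL"), (10, "X"), (9, "IX"),
         (5, "V"), (4, "IV"), (1, "I")]

print(numeric(2017, ARABIC_INDIC), alphabetic(27, LOWER_LATIN),
      additive(2017, ROMAN))
```

The separator that follows the counter (a full stop, an ideographic comma, a closing parenthesis, and so on) is a separate choice that a caller would append to these strings.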

Requirements

Ready-made counter styles

Issues

Spec links

CSS3 Counter Styles

Tests

3.5 Bidirectional text direction

Scripts whose characters are typically written right-to-left, like Arabic, Hebrew, Thaana, and so on, become bidirectional when they include numbers or text from other scripts (such as Latin acronyms). Browsers and applications need to support bidirectionality. This means supporting the Unicode Bidirectional Algorithm, but also different visual locations of line start and end, isolation of embedded strings, correct line alignment, and so forth.
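
One small piece of bidirectional support, estimating the base direction of a string from its first strongly directional character, can be sketched as below. This is a simplified illustration and not an implementation of the Unicode Bidirectional Algorithm: the function name is hypothetical, and the sketch ignores isolates, embeddings and all of the reordering rules.

```python
import unicodedata


def first_strong_direction(text: str, default: str = "ltr") -> str:
    """Return "ltr" or "rtl" based on the first strong character found."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):  # Hebrew-like and Arabic-like letters
            return "rtl"
    return default  # only digits, spaces, punctuation, or empty text


# Digits and spaces are not strong, so the Hebrew letter decides.
print(first_strong_direction("123 \u05D0bc"))
```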

Requirements

Additional Requirements for Bidi in HTML

Spec links

Tests

3.6 Baselines & inline alignment

Browsers and applications must accurately and comprehensively cover requirements for baseline alignment between mixed scripts. For example, Arabic script descenders go far below those of the Latin script, and Armenian characters mixed with Chinese ideographic characters need to be aligned appropriately with regard to their comparative heights and baselines. European, Far Eastern and South Asian scripts tend to use different baselines, which must be aligned correctly.

Requirements

Latin Layout Requirements: Baseline Grids
Mongolian information:
- Mongolian script layout requirements (.ppt)
Tibetan Layout Requirements:
- Baseline alignment

Spec links

CSS3 Writing Modes: Inline-level Alignment
CSS Line Grid

3.7 Other paragraph features

Some scripts have particular rules about indenting text at the start of a paragraph, or indeed whether that's normal. Some allow punctuation to hang outside the text box at the start or end of a line. There may be other aspects of how paragraphs are presented that vary from script to script, or need to be controlled by the content author.

Requirements

Chinese Layout Requirements:
- Hanging Punctuation at Line End
- Paragraph Adjustment Rules
Japanese Layout Requirements:
Hangul Layout Requirements: Paragraph Adjustment
Latin Layout Requirements: Paragraphs and indentation

Spec links

CSS3 Text: Edge effects

4. Layout & pages

4.1 Vertical text

There are special requirements for vertically oriented text. For example, it's common for content authors to want to mix short horizontal runs of text, such as 2-digit numbers, in a vertical column (tate chu yoko). It's also important to provide appropriate support for text in scripts that are normally only horizontal.
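
As an illustration of the horizontal-in-vertical case, the sketch below finds the runs of exactly two ASCII digits that an application might choose to set horizontally within a vertical line. The function name and the restriction to ASCII digits are assumptions for this sketch; longer numbers are left alone here, although how they should be set is a separate typographic decision.

```python
import re

# Exactly two ASCII digits, not adjacent to any other ASCII digit.
_TWO_DIGIT_RUN = re.compile(r"(?<![0-9])[0-9]{2}(?![0-9])")


def two_digit_runs(text: str) -> list[tuple[int, int]]:
    """Return (start, end) offsets of each isolated two-digit run."""
    return [match.span() for match in _TWO_DIGIT_RUN.finditer(text)]


sample = "12月25日"
for start, end in two_digit_runs(sample):
    print(sample[start:end])
```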

Requirements

Chinese Layout Requirements: Writing Mode
Indic Layout Requirements: Vertical arrangements of characters
Japanese Layout Requirements:
- Vertical Writing Mode and Horizontal Writing Mode
- Japanese and Western Mixed Text Composition (including Horizontal-in-Vertical Text Composition)
Hangul Layout Requirements:
Mongolian information:
- Mongolian script layout requirements (.ppt)

Spec links

CSS3 Writing Modes:
- Vertical Text Layout
- Glyph Composition

4.2 Notes, footnotes, etc

Support for notes, footnotes, endnotes or other necessary annotations of this kind may vary between cultures. In some cases, a script may use a very idiosyncratic approach to represent notes inline or to link to footnotes.

Requirements

Japanese Layout Requirements:
- Inline Cutting Note (Warichu)
- Processing of Notes
Hangul Layout Requirements:
- Notes (Footnote, Endnote)
- Footnotes

4.3 Page numbering, running headers, etc

These links point to conventions for managing the content that appears outside the main text block, for example page numbering, or the way that running headers and the like are handled.

Requirements

Japanese Layout Requirements: Running Heads and Page Numbers
Hangul Layout Requirements: Page Numbers
Latin Layout Requirements: Running headers and footers

4.4 More page layout and pagination

Some cultures define page areas and page progression direction very differently from those in the West. For example, the size of the Japanese kihon-hanmen, or main text block, is traditionally established by counting character cells, and margin space is then defined by the remaining space. In right-to-left scripts, pages also progress from right to left.
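
The arithmetic behind sizing a text block by counting character cells can be sketched as follows, for the horizontal writing mode. The class name, field names and the sample figures are all hypothetical; the only thing taken from the text above is the approach itself, in which the block is sized from character cells and the margins are whatever space remains on the page.

```python
from dataclasses import dataclass


@dataclass
class TextBlock:
    char_size_pt: float   # side of one square character cell
    chars_per_line: int
    lines_per_page: int
    line_gap_pt: float    # space between adjacent lines

    def line_length(self) -> float:
        """Extent along the line: one cell per character."""
        return self.char_size_pt * self.chars_per_line

    def block_extent(self) -> float:
        """Extent across the lines: n lines plus (n - 1) line gaps."""
        return (self.char_size_pt * self.lines_per_page
                + self.line_gap_pt * (self.lines_per_page - 1))


def remaining_margin(page_extent_pt: float, block_extent_pt: float) -> float:
    """Total margin space left over, to be split between the two sides."""
    return page_extent_pt - block_extent_pt


# Hypothetical figures: 9pt characters, 40 per line, 30 lines, 6pt gap.
block = TextBlock(char_size_pt=9.0, chars_per_line=40,
                  lines_per_page=30, line_gap_pt=6.0)
print(block.line_length(), block.block_extent())
print(remaining_margin(420.0, block.line_length()))
```

For vertical writing the same two calculations apply with the axes swapped, so the line length gives the block height and the extent across lines gives the block width.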

Requirements

Chinese Layout Requirements:
Japanese Layout Requirements:
Hangul Layout Requirements:
Latin Layout Requirements:
Tibetan information: Tibetan script requirements (.ppt)

W3C First Public Working Draft 09 February 2017

Abstract

Status of This Document

1. Introduction

2. Characters & phrases

2.1 Punctuation

2.2 Quotations

2.3 Identifying boundaries of graphemes, words and larger groupings

2.4 Glyph controls

2.5 Transforming characters

2.6 Inline spacing

2.7 Ruby annotation

2.8 Text decoration

2.9 Emphasis

2.10 Initial letter styling

2.11 Fonts

3. Lines & paragraphs

3.1 Line breaking

3.2 Hyphenation

3.3 Justification & line-end alignment

3.4 Counters, lists, etc

3.5 Bidirectional text direction

3.6 Baselines & inline alignment

3.7 Other paragraph features

4. Layout & pages

4.1 Vertical text

4.2 Notes, footnotes, etc

4.3 Page numbering, running headers, etc

4.4 More page layout and pagination

5. Changes Since the Last Published Version