Language enablement index

Abstract

This document points browser implementers and specification developers to information about how to support typographic features of scripts or writing systems from around the world, and also points to relevant information in specifications, to tests, and to useful articles and papers. It is not exhaustive, and will be added to from time to time.

Most languages are now supported by Unicode, but there are still occasional issues. In particular, there may be issues related to ordering of characters, or competing encodings (as in Myanmar), or standardisation of variation selectors or the encoding model (as in Mongolian).

GitHub resources

Gap analysis

Some scripts require special handling with regard to how font properties are specified and how font resources are loaded dynamically. In some scripts it is common to use different fonts for headings or emphasis, rather than bolding or italicisation. Fallback font families used by browsers (e.g. serif, sans-serif, cursive) may need to be mapped differently to fonts for different scripts. For example, Khmer has slanted, upright, and round font styles, and Arabic has naskh, nasta'liq, ruq'a, kano, etc., which may need special handling. Special OpenType features may need to be supported.

See also 3.2 Fonts.

Requirements

Arabic layout requirements: Writing Styles
Implementing Japanese Subtitles on Netflix: Slanted Text
An Introduction to Writing Systems & Unicode: Italics (Cyrillic)
Khmer layout requirements: Font styles
Orthography Notes: Font styles (esp. Tibetan, Thai, Arabic, Hebrew, N'Ko, Pular, Russian, Uighur)

GitHub resources

Spec links

CSS Fonts: Font style: the font-style property•Font weight: the font-weight property

Gap analysis

Arabic & Persian

In some scripts, such as Arabic, it may be desirable to allow the content author to control the placement of glyphs such as diacritics, or to control ligation, etc. Languages written with Arabic and Hebrew scripts have particular rules, of course, about when it is appropriate to show or hide diacritics for short vowel sounds. Many complex scripts have rules about how characters combine in syllabic structures, and scripts like Arabic may need controls to indicate where ligatures are wanted or not wanted. In addition, controls (based on Unicode characters or otherwise) may allow the user to control the shaping and positioning of glyphs, for example to compose or decompose conjuncts in Brahmi-derived scripts.

See also the separate section 3.5 Cursive text.

Requirements

Arabic Layout Requirements: Ligatures•Diacritics•Positioning diacritics relative to base characters
Orthography Notes: Glyph shaping & positioning

GitHub resources

Gap analysis

Arabic & Persian

In scripts such as Arabic, Mongolian and N'Ko, adjacent characters are joined together in normal printed text. It is important to ensure that those connections are maintained correctly when characters are forced apart, when transparency is applied to the text, and so on. There are also situations where cursive joining behaviour occurs even though there is no adjacent character, or where joining needs to be disabled between glyphs. Applying markup or styling to part of a word shouldn't break cursive links.

Requirements

Arabic Layout Requirements: Joining•Special requirements when dealing with cursive glyphs•Arabic Script and Typography
Orthography Notes: Cursive script (esp. Arabic, Syriac, Urdu, Rohingya, Mongolian, N'Ko, Pular, Assyrian, Turoyo, Kashmiri, Mandaic)

GitHub resources

Spec links

CSS Text: Shaping Across Element Boundaries

Tests

Gap analysis

Arabic & Persian

Conversion between lower, upper and title case only applies to a few scripts; most scripts are unicameral. Where it does apply, the rules can vary by language. In other cases, a particular script may require a different type of transform. For example, in Japanese it is important to be able to convert between half-width and full-width presentation forms.
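The following is a minimal Python sketch of the two kinds of transform mentioned above. The function names fold_width and has_case are hypothetical, introduced only for illustration. It assumes that NFKC normalization is an acceptable stand-in for width conversion; NFKC also folds other compatibility characters, so a real implementation would likely need a narrower mapping.

```python
import unicodedata


def fold_width(text: str) -> str:
    """Fold full-width and half-width presentation forms to their ordinary
    counterparts. Assumption: NFKC is used as a shortcut, and it also folds
    other compatibility characters, so it is broader than width alone."""
    return unicodedata.normalize("NFKC", text)


def has_case(text: str) -> bool:
    """Return True if upper- or lowercasing changes the text at all."""
    return text.upper() != text or text.lower() != text


if __name__ == "__main__":
    print(fold_width("ＡＢＣ１２３"))  # full-width Latin letters and digits
    print(fold_width("ｶﾀｶﾅ"))  # half-width katakana
    print("straße".upper())  # one German letter maps to two
    print(has_case("Русский"), has_case("ภาษาไทย"))
```

Note that Python's str.upper() and str.lower() apply only the default Unicode case mappings. Language-specific tailoring, such as the Turkish dotted and dotless i, is not handled by this sketch.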

Requirements

r12a blog: Notes on case conversion•Bicameral scripts
Orthography Notes: Case & other character transforms (esp. Armenian, Georgian, Greek, Osage, Pular, Russian, Cherokee, Hausa [latn])

GitHub resources

Spec links

CSS3 Text: Transforming text
CSS Custom Text transformations

Tests

CSS3 Text: text transform

Gap analysis

A browser or application needs to correctly apply functions to the basic units of text, be they characters, character sequences, syllables, or words. Some scripts, such as those used in South and South-East Asia, require clusters of characters to be treated as a single unit for most editing operations. Many other scripts use combining characters such as accents, vowel signs, length markers, etc. that must be kept with the base character they are associated with.
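As a rough illustration of keeping combining characters with their base, the Python sketch below groups each combining mark with the character before it. The function name simple_clusters is hypothetical, and the rule used (attach any character in general category Mn, Mc or Me to the preceding cluster) is a deliberate simplification. It is not the full Unicode grapheme cluster algorithm and does not handle conjuncts, Hangul jamo sequences, or emoji sequences.

```python
import unicodedata

# General categories for combining marks: nonspacing, spacing, enclosing.
MARK_CATEGORIES = ("Mn", "Mc", "Me")


def simple_clusters(text: str) -> list[str]:
    """Split text into base characters plus any combining marks that follow.

    Simplified sketch: a mark with no preceding base starts its own cluster.
    """
    clusters: list[str] = []
    for ch in text:
        if clusters and unicodedata.category(ch) in MARK_CATEGORIES:
            clusters[-1] += ch  # keep the mark with its base
        else:
            clusters.append(ch)
    return clusters


if __name__ == "__main__":
    # Latin e + combining acute, Devanagari ka + vowel sign i, Thai ko kai + sara i
    for sample in ("e\u0301a", "\u0915\u093F", "\u0E01\u0E34"):
        print([len(c) for c in simple_clusters(sample)])
```

An editing operation such as cursor movement or deletion could then step over one list item at a time rather than one code point at a time.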

When a user double-clicks on some text, the appropriate units should be selected. In scripts such as Chinese and Thai, 'words' should be selected even though they are not separated by spaces. In scripts such as Tibetan and Ethiopic, the word separator may be a visible character, rather than a space. It is important to understand how they should be treated when a 'word' is highlighted, or when text wraps, etc.

Requirements

Arabic Layout Requirements: Text Segmentation
Bengali Layout Requirements: Grapheme boundaries•Word boundaries
Chinese Layout Requirements: Characters and Principles for Setting them in Chinese Composition
Devanagari Layout Requirements: Grapheme boundaries•Word boundaries
Indic Layout Requirements: Indic orthographic syllable boundaries•Text segmentation
Khmer Layout Requirements: Word boundaries•Zero-width space (ZWSP) & Word-joiner (WJ)
Mongolian Layout Requirements: Selection rules•Cursor movement rules•Mouse pointer rules
Thai Layout Requirements: Word boundaries•Zero-width space (ZWSP) & Word-joiner (WJ)
Tibetan Layout Requirements: Tibetan Syllables•Text Segmentation in Tibetan
Orthography Notes: Grapheme boundaries•Word boundaries

GitHub resources

Tests

Gap analysis

Many scripts use native punctuation marks in addition to or instead of those used in Latin script text. In other cases, such as Greek, common Latin punctuation marks may mean something different from what they mean in English. It may be important to understand what needs to be supported, how these punctuation marks function, and how they interact with other operations applied to the text.

Another aspect of this relates to separation of characters or items in text. For example, French inserts a particular type of space before certain punctuation marks, and the traditional Mongolian script requires special spacing between word stems and certain suffixes.
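The French spacing behaviour mentioned above can be sketched in Python as follows. The function name french_spacing is hypothetical. The choice of marks (; : ! ?) and the use of U+202F NARROW NO-BREAK SPACE for all of them are assumptions made for illustration; house styles differ, and this naive pattern would also alter colons in URLs or times.

```python
import re

NNBSP = "\u202F"  # NARROW NO-BREAK SPACE (assumed choice of space)

# Optional existing space (ordinary, no-break or narrow no-break)
# followed by a run of the punctuation marks handled by this sketch.
_HIGH_PUNCT = re.compile(r"[ \u00A0\u202F]*([;:!?]+)")


def french_spacing(text: str) -> str:
    """Put a single narrow no-break space before runs of ; : ! ?"""
    return _HIGH_PUNCT.sub(lambda m: NNBSP + m.group(1), text)


if __name__ == "__main__":
    for sample in ("Quoi ?", "Vraiment!", "Ah ?!"):
        print(repr(french_spacing(sample)))
```

Because any existing space before the punctuation is replaced rather than added to, applying the function twice gives the same result as applying it once.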

Other special inline markers may appear when handling abbreviation, ellipsis, and iteration, bracketing information, or demarcating things such as proper nouns, etc.

See also 3.9 Text decoration, 3.10 Quotations, and 3.11 Inline notes & annotations, which are broken out into separate sections.

Requirements

Bengali Layout Requirements: Phrase boundaries: Danda & double danda
Chinese Layout Requirements: Categories and Usage of Punctuation Marks•Sizes and positions of Punctuation Marks•Atypical punctuation marks and their composition•Punctuation marks in Chinese•Line Composition Rules for Punctuation Marks•Prohibition Rules for Unbreakable Marks
Devanagari Layout Requirements: Phrase boundaries: Danda & double danda
Ethiopic Layout Requirements: Punctuation
Hangul Layout Requirements: Hangul Punctuation Mark Code Ranges based on Unicode•Hangul Punctuation Marks in Horizontal and Vertical Writing•Arrangement of 'Letter Face Position in Character Frame' for Full Width Parentheses•Arrangement of 'Letter Face Position in Character Frame' for Punctuation•Writing Process for Punctuation Marks, etc.
Indic Layout Requirements: Guiding principles of Line breaking for Indian languages
Japanese Layout Requirements: Characters Used for Japanese Composition•Line Composition Rules for Punctuation Marks•Examples of Items Jutting Out of the Kihon-hanmen•Character Positioning based on Kihon-hanmen•Design Grouping of Characters and Symbols depending on their Positioning
Mongolian Layout Requirements: Punctuation rules•Display rules for Mongolian space
Tibetan Layout Requirements: Tibetan Punctuation•Character List
Orthography Notes: Punctuation & inline features

GitHub resources

Some aspects related to the drawing of lines or markers alongside or through text involve local typographic considerations. For example, underlines need to be broken in special ways for some scripts, and the height of underlines, strike-through and overlines may vary depending on the script. For vertical text the placement needs to be to the right or left of the line of text, rather than under or over. Also, for many scripts bold and italic are not always appropriate for expressing emphasis or highlighting text, and some scripts have their own unique ways of doing it that involve adding special marks alongside letters or syllables, etc.

Requirements

Chinese Layout Requirements: Indicator Punctuation Marks > Fullwidth low line•Indicator Punctuation Marks > Emphasis Dots
Wikipedia: Underlines in non-Latin scripts (chinese)
Ethiopic Layout Requirements: Emphasis•Emphasis Within Prose•Emphasis With Ethiopic Wordspace•Section Headings
Japanese Layout Requirements: Examples of Items Jutting Out of the Kihon-hanmen•Line Gap Arrangement with Ruby and Other Objects•Composition of Emphasis Dots
Implementing Japanese Subtitles on Netflix: Boutens
Mongolian Layout Requirements: Text decoration
Tibetan Layout Requirements: Text Emphasis and Highlighting

GitHub resources

Spec links

CSS3 Text Decoration: Line Decoration: Underline, Overline, and Strike-Through•Emphasis Marks

Tests

Gap analysis

Quotation marks vary from language to language, not just from script to script. Also, you should expect variations in behavior when quotation marks are nested. Furthermore, the quotation marks used for vertical Japanese text are not the same as those typically used for the same text when horizontally laid out.
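A small Python sketch of language-dependent, nesting-aware quotation marks is shown below. The QUOTES table and the quote function are hypothetical, and the three entries reflect common conventions for English, German and horizontally set Japanese rather than data taken from this document. Deeper nesting than the table covers reuses the last pair, which is an assumption modelled on how the CSS quotes property behaves.

```python
# Illustrative table: (open, close) pairs per nesting level, outermost first.
QUOTES = {
    "en": [("\u201C", "\u201D"), ("\u2018", "\u2019")],  # “ ” then ‘ ’
    "de": [("\u201E", "\u201C"), ("\u201A", "\u2018")],  # „ “ then ‚ ‘
    "ja": [("\u300C", "\u300D"), ("\u300E", "\u300F")],  # 「 」 then 『 』
}


def quote(text: str, lang: str, depth: int = 0) -> str:
    """Wrap text in the quotation marks for lang at the given nesting depth."""
    pairs = QUOTES[lang]
    # Assumption: levels deeper than the table reuse the innermost pair.
    opening, closing = pairs[min(depth, len(pairs) - 1)]
    return f"{opening}{text}{closing}"


if __name__ == "__main__":
    inner = quote("b", "en", depth=1)
    print(quote("a " + inner, "en"))
    print(quote("Hallo", "de"))
    print(quote("こんにちは", "ja"))
```

A fuller table would also need an entry per writing mode, since vertical Japanese text uses different marks from horizontal text.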

Requirements

Layout Requirements, Quotations: Bengali•Devanagari•Ethiopic•Javanese•Khmer•Lao•Tamil•Thai
Chinese Layout Requirements: Indicator Punctuation Marks
Japanese Layout Requirements: Line Head Indent and Line End Indent•Processing of Spaces between Paragraphs•Adjustment of Processing of Realm in Block Direction•Differences in Vertical and Horizontal Composition in Use of Punctuation Marks
Wikipedia: Quotations
Orthography Notes: Quotations

GitHub resources

Spec links

HTML5: The q element•The blockquote element
CSS3 Generated and Replaced Content: Specifying quotes with the 'quotes' property

Tests

Gap analysis

Ruby is used for phonetic and semantic annotations of East Asian text, including furigana, pinyin and zhuyin fuhao systems. In addition to positioning annotations along the correct side of the base text, there are many fine adjustments of the annotation and base text to support. Warichu is a kind of inline annotation where the note text is set as two approximately equal lines of half-sized text, one above the other, both within the normal line height.

Requirements

Chinese Layout Requirements: Interlinear annotations
Bopomofo on the Web
The Manual of the Phonetic Symbols of the Mandarin Script: english•chinese
Japanese Layout Requirements: Ruby and Emphasis Dots•Positioning of Jukugo-ruby•Inline Cutting Note (Warichu)•Superscripts and Subscripts•Furiwake Processing
r12a blog: What’s so difficult about jukugo ruby?
Use Cases & Exploratory Approaches for Ruby Markup
Implementing Japanese Subtitles on Netflix: Rubies
Orthography Notes: Inline notes & annotations

GitHub resources

Spec links

HTML5: The ruby element•The sub and sup elements•The blockquote element
CSS3 Ruby

Tests

Gap analysis

See also 4.6 Lists, counters, etc.

There are often specific rules about how scripts behave when a line is wrapped. For example, Chinese, Japanese and Korean tend to break a line in the middle of a word (with no hyphenation) – even in Korean, which has spaces between words. Others break lines at syllable boundaries. (See below for hyphenation.)

It is common for certain characters to be forbidden at the start or end of a line, but which characters these are, and which rules apply when, depend on the script or language. In some cases, such as Japanese, there may be different rules according to the type of content or the user's preference.
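The idea of characters forbidden at line start or line end can be sketched in Python as a greedy wrapper that backs up from an illegal break point. The function name wrap and the two character sets are hypothetical and deliberately tiny; real rule sets are much larger and vary by language and strictness level. The sketch counts one code point as one column and only pushes characters onto the next line; it does not attempt hanging punctuation or space compression.

```python
# Illustrative sets only (assumptions, not a complete rule set).
NO_LINE_START = set("、。，．」』）")  # closing marks and stops
NO_LINE_END = set("「『（")  # opening brackets


def wrap(text: str, width: int) -> list[str]:
    """Break text into lines of at most width characters, avoiding
    forbidden characters at line start and line end where possible."""
    lines = []
    i = 0
    while i < len(text):
        end = min(i + width, len(text))
        # Move the break earlier while it is illegal, but always keep at
        # least one character on the line so the loop makes progress.
        while (
            end < len(text)
            and end - i > 1
            and (text[end] in NO_LINE_START or text[end - 1] in NO_LINE_END)
        ):
            end -= 1
        lines.append(text[i:end])
        i = end
    return lines


if __name__ == "__main__":
    for line in wrap("今日は、「晴れ」です。", 3):
        print(line)
```

With a width of one character no legal break may exist, in which case the sketch emits the line anyway rather than looping forever.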

See also 4.2 Hyphenation, which is broken out into a separate section.

Background reading

Approaches to line breaking

Requirements

Arabic Layout Requirements: Line breaking
Bengali Layout Requirements: Line breaking
Chinese Layout Requirements: Prohibition Rules for Line Start and Line End•Prohibition Rules for Unbreakable Marks•Hanging Punctuation at Line End
Devanagari Layout Requirements: Line breaking
Indic Layout Requirements: Guiding principles of Line breaking for Indian languages
Japanese Layout Requirements: Possibilities for Line-breaking between Characters
Hangul Layout Requirements: Line Breaking Rules•Writing Process for Punctuation Marks, etc
Javanese Layout Requirements: Line breaking
Lao script layout requirements: Line breaking
Khmer script layout requirements: Line breaking
Mongolian script layout requirements: Line breaking
Thoughts on Word and Sentence Segmentation in Thai
Tamil Layout Requirements: Line breaking
Thai Layout Requirements: Line breaking
Tibetan Layout Requirements: Line breaking
Orthography Notes: Line breaking & hyphenation

GitHub resources

Spec links

CSS3 Text: Line Breaking and Word Boundaries

Tests

Gap analysis

Hyphenation in this sense means identifying broken words after text is wrapped at line end (and not only those involving a hyphen character). See 3.8 Punctuation & other inline features for information about the use of regular hyphens in text. Some writing systems don't use hyphenation; those that do have particular rules about how it should be applied, and these are typically language-specific.

Requirements

Indic Layout Requirements: Hyphenation
Latin Layout Requirements: Hyphenation•The Classical Rules of Hyphenation and Pagination
Orthography Notes: Line breaking & hyphenation

GitHub resources

Spec links

CSS3 Text: Hyphenation: the hyphens property

Tests

Gap analysis

Typographers have come up with various methods for effective full justification – causing the text to completely fill the line, in order to create visual alignment on both edges of a paragraph.

Typographic conventions for full text justification depend on the writing system, the content language, and the calligraphic style of the text. Results also tend to vary based on the capabilities of the layout engine and a given typographer’s preferences for weighing its various detrimental effects on typographic color and readability.
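One of the simplest justification methods, widening the gaps between words, can be sketched in Python as below. The function name justify is hypothetical, and the sketch assumes monospaced output where one character is one column. It illustrates only inter-word expansion; other mechanisms named in the requirements, such as elongation or inter-character adjustment, are not modelled.

```python
def justify(words: list[str], width: int) -> str:
    """Pad the gaps between words so the line is exactly width columns.

    Extra columns are shared evenly; any remainder goes to the leftmost
    gaps (an arbitrary choice made for this sketch).
    """
    if len(words) == 1:
        return words[0]  # a single word cannot be justified by word gaps
    gaps = len(words) - 1
    spare = width - sum(len(w) for w in words)
    if spare < gaps:
        raise ValueError("width too small for one space per gap")
    base, remainder = divmod(spare, gaps)
    parts = []
    for index, word in enumerate(words[:-1]):
        parts.append(word)
        parts.append(" " * (base + (1 if index < remainder else 0)))
    parts.append(words[-1])
    return "".join(parts)


if __name__ == "__main__":
    print(repr(justify(["a", "bb", "c"], 9)))
```

Scripts that do not separate words with spaces need a different strategy, which is why the conventions depend on the writing system.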

Background reading

Approaches to full justification

Requirements

Arabic Layout Requirements: Justification•Kashida•Tatweel•Combination of the Mechanisms
Arabic text justification
Justify Just or Just Justify (Arabic)
Typography questions for HTML & CSS: Arabic justification
Rule-based expert system for Urdu nastaleeq justification
Chinese Layout Requirements: Composition of Chinese and Western Mixed Texts•Paragraph Adjustment Rules•Punctuation Width Adjustment•Line Adjustment
Ethiopic Layout Requirements: Justification
Proposal to Reclassify Ethiopic Wordspace as a Space Separator (Zs) Symbol
Hangul Layout Requirements: Paragraph Adjustment•Line Adjustment Process
Japanese Layout Requirements: Line Adjustment•Opportunities for Inter-character Space Reduction during Line Adjustment•Opportunities for Inter-character Space Expansion during Line Adjustment•Paragraph Adjustment Rules
Latin Layout Requirements: Justification•Paragraphs and indentation
Mongolian script layout requirements: Text alignment
Tibetan Layout Requirements: Justification
Orthography Notes: Text alignment & justification

GitHub resources

Spec links

CSS3 Text: Alignment and Justification•Edge effects

Tests

Gap analysis

This section is concerned with spacing that is adjusted around and between characters on a line for reasons other than the full-line justification described in the previous section, although it also affects line layout. Examples follow. Many scripts create emphasis or other effects by moving apart the letters or syllables in a word. (This may even apply in Indic and SE Asian scripts, and in Arabic-based scripts which join up adjacent letters.) Other times, increasing or decreasing the typical space between characters aids readability. Scripts used for Japanese or Chinese may also seek to reduce space between adjacent punctuation, to avoid large gaps. On the other hand, it may be necessary to add a gap around embedded numbers or Latin text in scripts that don't normally use spaces around words. Some scripts prefer to indent the first line of a paragraph, rather than leave vertical gaps between paragraphs. And in some scripts space needs to be carefully controlled before and after certain punctuation marks, such as in French or Thai.

Requirements

Arabic Layout Requirements: Letter-spacing
Chinese Layout Requirements: Principles for Arranging Characters during Chinese Composition
Indic Layout Requirements: Letter spacing
Khmer Layout Requirements: Letter spacing
Hangul Layout Requirements: Writing Process for Punctuation Marks, etc.
Orthography Notes: Letter spacing

GitHub resources

Spec links

CSS3 Text: Spacing

Tests

Exploratory tests: Justification & letter-spacing

Gap analysis

Browsers and applications must accurately and comprehensively cover requirements for baseline alignment between mixed scripts. For example, Arabic script descenders go far below those of the Latin script, and Armenian characters need to be aligned with ideographic characters in Chinese appropriately with regard to comparative heights and baselines. European, Far Eastern and South Asian scripts tend to use different baselines, which must be aligned correctly. The complexity of characters in a script may affect line height settings. However, some scripts also expect larger inter-line gaps than others, in addition to the line height. This section covers these and other factors related to vertical spacing of lines.

Requirements

Arabic layout requirements: Multi-level baselines
Latin Layout Requirements: Baseline Grids
Mongolian script layout requirements: Baselines
Tibetan Layout Requirements: Baseline alignment

GitHub resources

Spec links

CSS3 Writing Modes: Inline-level Alignment
CSS Line Grid

Gap analysis

List numbering in vertical text runs across the page, but may need to be rotated to run horizontally. In a list where items are alternately right-to-left and left-to-right, where does the counter go, and how is the list aligned? The CSS specification describes a set of simple and complex styles for counters to be used in list numbering, chapter heading numbering, etc. It also provides a generic mechanism for content authors to create their own counter styles. One has to consider not only the characters and algorithms to be used (numeric, alphabetic, additive, etc.), but also what the separator or other associated marks look like.
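Two of the counter algorithms named above, numeric and additive, can be sketched in Python as follows. The function names numeric, additive and marker are hypothetical, as is the choice of Devanagari digits and Roman numerals as sample data. The sketch assumes non-negative values for numeric and positive values for additive, and it ignores the range limits and fallback behaviour that a full counter style definition would specify.

```python
DEVANAGARI_DIGITS = "०१२३४५६७८९"
ROMAN = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def numeric(value: int, digits: str) -> str:
    """Positional system: write value in base len(digits) with the given digits."""
    if value == 0:
        return digits[0]
    out = []
    while value > 0:
        value, remainder = divmod(value, len(digits))
        out.append(digits[remainder])
    return "".join(reversed(out))


def additive(value: int, weighted_symbols: list[tuple[int, str]]) -> str:
    """Additive system: repeat each symbol while its weight still fits.

    weighted_symbols must be sorted by descending weight.
    """
    out = []
    for weight, symbol in weighted_symbols:
        count, value = divmod(value, weight)
        out.append(symbol * count)
    return "".join(out)


def marker(representation: str, suffix: str = ". ") -> str:
    """Attach the separator that follows the counter in a list marker."""
    return representation + suffix


if __name__ == "__main__":
    print(marker(numeric(123, DEVANAGARI_DIGITS)))
    print(marker(additive(1994, ROMAN)))
```

Keeping the suffix separate from the algorithm reflects the point that the separator is a distinct choice from the numbering system itself.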

Requirements

Ready-made counter styles
Bengali Layout Requirements: Counters
Devanagari Layout Requirements: Counters
Lao script layout requirements: Counters
Khmer Layout Requirements: List counters
Mongolian script layout requirements: Counters, lists, etc
Tamil Layout Requirements: Counters
Thai Layout Requirements: List counters
Orthography Notes: Counters, lists, etc.

GitHub resources

Spec links

CSS3 Counter Styles: Defining Custom Counter Styles: the @counter-style rule•Simple Predefined Counter Styles•Complex Predefined Counter Styles

Tests

Gap analysis

Does the browser or ereader correctly handle special styling of the initial letter of a line or paragraph, such as for drop caps?

Requirements

Devanagari Layout Requirements: Styling initials
Indic Layout Requirements: Initial letter styling
Latin Layout Requirements: Initial Capitals
Orthography Notes: Styling initials

GitHub resources

Spec links

CSS3 Selectors: The ::first-letter pseudo-element
CSS3 Inline Layout: Initial Letters

Tests

Gap analysis

In paged media for right-to-left scripts or vertically set documents, pages progress from right to left, and the front and back cover are in the opposite locations to, say, an English book. Unlike the general Western approach, the size of the main text block in Japanese pages (called the hanmen) is traditionally established by counting character cells, and margin space is then defined by the remaining space. Columns run across a page in vertically-set pages. The standard page layout for Mongolian is landscape, and horizontal scrolling within a page is much more important than in the West, so default scrollbar positions may need special support.

Other topics that belong here include any local requirements for things such as printer marks, tables of contents and indexes.

Requirements

Chinese Layout Requirements: The Type Area (or Printing Area)•Handling of Widows and Orphans•Headings & Page Breaks
Japanese Layout Requirements: Vertical Writing Mode and Horizontal Writing Mode•Page Formats for Japanese Documents•Specifying the Kihon-hanmen•Block Direction Setting Process of Lines, Paragraphs etc.
Hangul Layout Requirements: Page layout
Latin Layout Requirements: Paginating Single-Column Text•Heads•The Classical Rules of Hyphenation and Pagination
Mongolian script layout requirements: Bookbinding and the Direction of Page Turning•Paper direction•Paper scrolling direction•The scrolling direction of scroll bars•Columns
Orthography Notes: Page & book layout

GitHub resources

Language enablement index

Abstract

Status of This Document

1. Introduction

2. Text direction

2.1 Vertical text

2.2 Bidirectional text

3. Characters & phrases

3.1 Characters & encodings

3.2 Fonts

3.3 Font styles, weight, etc.

3.4 Glyph shaping & positioning

3.5 Cursive text

3.6 Transforming characters

3.7 Grapheme/word segmentation & selection

3.8 Punctuation & other inline features

3.9 Text decoration

3.10 Quotations

3.11 Inline notes & annotations

3.12 Data formats & numbers

4. Lines & paragraphs

4.1 Line breaking

4.2 Hyphenation

4.3 Text alignment & justification

4.4 Text spacing

4.5 Baselines, line-height, etc.

4.6 Lists, counters, etc

4.7 Styling initials

5. Layout & pages

5.1 General page layout and progression

5.2 Grids & tables

5.3 Footnotes, endnotes, etc

5.4 Page headers, footers, etc

5.5 Forms & user interaction

6. Changes Since the Last Published Version