This document describes requirements for the layout and presentation of text in languages that use the Lao script when they are used by Web standards and technologies, such as HTML, CSS, Mobile Web, Digital Publications, and Unicode.

This document describes the basic requirements for Lao script layout and text support on the Web and in eBooks. These requirements provide information for Web technologies such as CSS, HTML and digital publications about how to support users of the Lao script. Currently the document focuses on the script as used for the Lao language. The information here is developed in conjunction with a document that summarises gaps in support on the Web for Lao.

The editor's draft of this document is being developed by the Southeast Asian Layout Task Force, part of the W3C Internationalization Interest Group. It is published by the Internationalization Working Group. The end target for this document is a Working Group Note.

Introduction

About this document

Some text goes here.

Gap analysis

This document is pointed to by a separate document, Lao Gap Analysis, which describes gaps in support for Lao on the Web, and prioritises and describes the impact of those gaps on the user.

Wherever an unsupported feature is identified through the gap analysis process, the requirements for that feature need to be documented. This document is where those requirements are described.

This document should contain no reference to a particular technology. For example, it should not say "CSS does/doesn't do such and such", and it should not describe how a technology, such as CSS, should implement the requirements. It is technology agnostic, so that it will be evergreen, and it simply describes how the script works. The gap analysis document is the appropriate place for all kinds of technology-specific information.

Other related resources

The document International text layout and typography index (known informally as the text layout index) points to this document and others, and provides a central location for developers and implementers to find information related to various scripts.

The W3C also maintains a tracking system that has links to GitHub issues in W3C repositories. There are separate links for (a) requests from developers to the user community for information about how scripts/languages work, (b) issues raised against a spec, and (c) browser bugs. For example, you can find out what information developers are currently seeking, and the resulting list can also be filtered by script.

Lao Script Overview

Lao is an alphabet. This means that it is phonetic in nature, where each letter represents a basic sound.

The script was originally an abugida, but since the script reforms leading up to 1960 it has been alphabetic. The syllable is the unit for various aspects of the behaviour of the script. Lao is a tonal language, and the script is designed to reflect tonal information.

The alphabet is split into vowels and consonants. The consonants are grouped into classes that affect the default tonal behaviour of a syllable. There are no independent vowels. Where there is no consonant to support a vowel sign, the character ອ [U+0EAD LAO LETTER O] is used as a support. Vowel signs are typically used in combinations to form the vowel sounds of a syllable.

Words are not separated by spaces. Text runs horizontally, left to right.

The Lao script summary provides a high-level overview of the characters used for the script and some of its basic features. Text from the latter part of that page was used for the initial version of this document.

Text direction

Lao is written horizontally, left to right.

Structural boundaries & markers

Quotations

The default quote marks for Lao should be [U+201C LEFT DOUBLE QUOTATION MARK] at the start, and [U+201D RIGHT DOUBLE QUOTATION MARK] at the end.

When an additional quote is embedded within the first, the quote marks should be [U+2018 LEFT SINGLE QUOTATION MARK] and [U+2019 RIGHT SINGLE QUOTATION MARK]. This follows CLDR, but still needs to be checked.
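As an illustration only, the nesting described above can be sketched as follows. The function name `quote` and the `depth` parameter are hypothetical, and the behaviour for more than two levels of nesting (alternating between the two pairs) is an assumption of this sketch rather than a documented requirement.

```python
# Quote marks by nesting depth, as described in the text above.
QUOTE_MARKS = [
    ("\u201C", "\u201D"),  # outer quote: double quotation marks
    ("\u2018", "\u2019"),  # embedded quote: single quotation marks
]


def quote(text: str, depth: int = 0) -> str:
    """Wrap text in the quote marks for the given nesting depth.

    Depths beyond the second level alternate between the two pairs.
    That alternation is an assumption of this sketch.
    """
    opening, closing = QUOTE_MARKS[depth % len(QUOTE_MARKS)]
    return f"{opening}{text}{closing}"


inner = quote("ປະເທດ", depth=1)   # embedded quote
outer = quote(inner, depth=0)      # outer quote containing the embedded one
print(outer)
```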

Text boundaries & selection

TBD

Inter-character spacing

TBD

Line & paragraph layout

Line breaking

Although Lao doesn't use spaces or dividers between words, the expectation is that line-breaks occur at word boundaries.

Unlike Thai or Khmer, it is fairly straightforward to parse individual syllables in Lao, because its alphabetic nature makes it possible to identify syllable-final consonants. (Note that syllable wrapping must include any syllable-initial clusters involving h or l.)

While nearly all syllables can be argued to be words in their own right, there is still a preference for keeping multi-syllabic words (e.g. ປະເທດ pa thēt, "country") together when wrapping text to the next line. For this, an application typically needs to use a dictionary to parse Lao text.

However, widely used software automatically inserts U+200B ZERO WIDTH SPACE (ZWSP) in Lao text at word or syllable boundaries, and many web pages use such inserted ZWSP characters to get browsers to wrap correctly.
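The following is a minimal sketch of how inserted ZWSP characters can serve as break opportunities. The function name `wrap_at_zwsp` is hypothetical, and line width is measured in code points purely for simplicity; a real layout engine would measure rendered glyph widths, since Lao vowel signs and tone marks do not all advance the line.

```python
ZWSP = "\u200B"  # ZERO WIDTH SPACE


def wrap_at_zwsp(text: str, max_chars: int) -> list[str]:
    """Greedily wrap text, breaking only where a ZWSP was inserted.

    Width is counted in code points, which is a simplification.
    A segment longer than max_chars is kept whole on its own line.
    """
    segments = [s for s in text.split(ZWSP) if s]
    lines: list[str] = []
    current = ""
    for segment in segments:
        if current and len(current) + len(segment) > max_chars:
            lines.append(current)  # the line is full, so break at the ZWSP
            current = segment
        else:
            current += segment     # the ZWSP itself is dropped from the output
    if current:
        lines.append(current)
    return lines


# ZWSP inserted at the syllable boundary of the word for "country".
text = "ປະ" + ZWSP + "ເທດ"
print(wrap_at_zwsp(text, 3))
print(wrap_at_zwsp(text, 5))
```

With a width of 3 the two syllables land on separate lines, while a width of 5 keeps them together on one line.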

If a dictionary fails to keep two or more syllables together as needed, it should be possible to use the Unicode character U+2060 WORD JOINER between the two syllables. This is an invisible character, equivalent to a zero-width no-break space, and used to prevent line-breaks.
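As a hedged illustration, the sketch below joins the syllables of a word with U+2060 WORD JOINER so that no break is permitted between them. The helper names `keep_together` and `visible_text` are hypothetical; the syllable split of ປະເທດ follows the transcription pa thēt given earlier.

```python
WORD_JOINER = "\u2060"  # invisible character that prevents a line break


def keep_together(syllables: list[str]) -> str:
    """Join syllables with WORD JOINER so they are not split across lines."""
    return WORD_JOINER.join(syllables)


def visible_text(text: str) -> str:
    """Remove WORD JOINER characters, e.g. before comparing or searching text."""
    return text.replace(WORD_JOINER, "")


word = keep_together(["ປະ", "ເທດ"])
print(len(word))                      # two extra code points would mean two joiners
print(visible_text(word) == "ປະເທດ")
```

Because the joiner is invisible, text that contains it looks unchanged to the reader, but it no longer compares equal to the unjoined string unless the joiner is stripped first.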

Other ideas to consider:

"Some writers are beginning to use optional hyphens for syllable boundaries within words, which helps readability at line breaks." (@jmdurdin)

"One other difference from Thai is that European style punctuation (period, comma) is much more widely used in Lao than in Thai, with the consequence that traditional spaced phrase punctuation is now often incorrectly used, with spaces sometimes inserted within words." (@jmdurdin)

Counters

Counters are used to number lists, chapter headings, etc.

Lao uses a numeric counter style, based on the decimal model, and using the standard Lao digits, '໐' '໑' '໒' '໓' '໔' '໕' '໖' '໗' '໘' '໙' in a decimal pattern.

1 ⇨ ໑ 2 ⇨ ໒ 3 ⇨ ໓ 4 ⇨ ໔ 11 ⇨ ໑໑ 22 ⇨ ໒໒ 33 ⇨ ໓໓ 44 ⇨ ໔໔ 111 ⇨ ໑໑໑ 2222 ⇨ ໒໒໒໒

Examples of counter values using the Lao numeric counter style.
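The decimal pattern can be sketched as a simple digit-for-digit substitution. The function name `lao_counter` is hypothetical, and the sketch assumes non-negative integer counter values.

```python
# Lao digits ໐ to ໙ occupy U+0ED0 to U+0ED9, in the same order as 0 to 9.
LAO_DIGITS = "".join(chr(0x0ED0 + i) for i in range(10))
ASCII_TO_LAO = str.maketrans("0123456789", LAO_DIGITS)


def lao_counter(value: int) -> str:
    """Format a non-negative integer using Lao digits in a decimal pattern."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    return str(value).translate(ASCII_TO_LAO)


for n in (1, 11, 111, 2222):
    print(n, "⇨", lao_counter(n))
```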

Acknowledgements

Special thanks to the following people who contributed to this document (contributors' names are listed in alphabetical order).

Anousak Anthony Souphavanh, Arthit Suriyawongkul, Ben Mitchell, James Clarke, John Durdin, Martin Hosken, Norbert Lindenberg.

For the latest information about contributors, see the GitHub contributors list.