W3C International Internationalization / Localization

This page is no longer maintained and may be inaccurate. For more up-to-date information, see the Internationalization Activity home page.


In principle, whether to hyphenate or not is a style question and CSS should develop properties to control hyphenation. In practice, however, for most languages there is no algorithm or dictionary that gives all (and only) correct word breaks, so some help from the author may occasionally be needed.

The author could indicate possible break points either in the style sheet or in the HTML itself. In the style sheet, this could take the form of a table of hyphenation exceptions, but such a table would have difficulty distinguishing homonyms that are hyphenated differently. For HTML, Peter Svanberg and Olle Jarnefors have formulated a proposal that uses a new element, HYPH. It was announced on the html-wg mailing list. Example for Dutch (cafeetje -> ca-fé-tje, zoëven -> zo-e-ven):

ca<HYPH></HYPH>f<HYPH BEF="é">ee</HYPH>tje
zo<HYPH AFT="e">ë</HYPH><HYPH></HYPH>ven
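
To make the two examples above concrete, here is a minimal Python sketch of how a formatter might read such markup. It assumes semantics inferred from the examples rather than taken from the proposal text: when no break occurs the element's content is shown as is, and when a break occurs the content is replaced by the BEF value, a hyphen, the line break, and the AFT value (both attributes defaulting to empty). The function names `unbroken` and `break_options` are hypothetical.

```python
import re

# One HYPH element with optional attributes. The BEF/AFT semantics used
# below are an assumption inferred from the two Dutch examples.
HYPH_RE = re.compile(r'<HYPH\b(?P<attrs>[^>]*)>(?P<content>.*?)</HYPH>',
                     re.IGNORECASE)
ATTR_RE = re.compile(r'(\w+)\s*=\s*"([^"]*)"')


def unbroken(markup):
    """Return the word as displayed when no break is taken."""
    return HYPH_RE.sub(lambda m: m.group('content'), markup)


def break_options(markup):
    """Return one (head, tail) pair per HYPH element in the markup.

    head ends the first line (hyphen included); tail starts the next line.
    """
    options = []
    for m in HYPH_RE.finditer(markup):
        attrs = {k.upper(): v for k, v in ATTR_RE.findall(m.group('attrs'))}
        # Text before the break point, then the BEF replacement and a hyphen.
        head = unbroken(markup[:m.start()]) + attrs.get('BEF', '') + '-'
        # The AFT replacement, then the rest of the word left unbroken.
        tail = attrs.get('AFT', '') + unbroken(markup[m.end():])
        options.append((head, tail))
    return options


cafe = 'ca<HYPH></HYPH>f<HYPH BEF="é">ee</HYPH>tje'
zoeven = 'zo<HYPH AFT="e">ë</HYPH><HYPH></HYPH>ven'
```

Under these assumptions, `break_options(cafe)` yields the breaks ca-/feetje and café-/tje, and `break_options(zoeven)` yields zo-/even and zoë-/ven, while `unbroken` returns the original spellings cafeetje and zoëven.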

(Somebody suggested the same idea to me in February '96. Unfortunately, I can't remember who it was...)

Mirsad Todorovac suggested

<HYPH points="chocola-tje,cho-co-laatje">chocolaatje</HYPH>
<HYPH points="zo-even,zoë-ven">zoëven</HYPH>

instead. This might also make it possible to rank break points, so that the more desirable ones are preferred. For example, compound words should, where possible, be broken between their constituents.
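
The ranking idea can be sketched in Python as follows. This is only an illustration under stated assumptions: the source does not define how `points` is ordered, so the sketch treats earlier entries in the comma-separated list as more desirable, and it assumes the word itself contains no literal hyphens or commas. The names `parse_points` and `best_break` are hypothetical.

```python
def parse_points(points):
    """Expand a points value into (rank, head, tail) triples.

    Each comma-separated variant may contain several hyphens; every hyphen
    is one possible break. rank is the variant's position in the list
    (assumption: earlier variants are preferred).
    """
    options = []
    for rank, variant in enumerate(points.split(',')):
        parts = variant.strip().split('-')
        for i in range(1, len(parts)):
            head = ''.join(parts[:i]) + '-'   # end of the first line
            tail = ''.join(parts[i:])         # start of the next line
            options.append((rank, head, tail))
    return options


def best_break(points, width):
    """Pick the break whose head fits in `width` characters.

    Lower rank wins; among equal ranks the longest head wins.
    Returns None when no head fits.
    """
    fitting = [o for o in parse_points(points) if len(o[1]) <= width]
    if not fitting:
        return None
    return min(fitting, key=lambda o: (o[0], -len(o[1])))


chocolaatje = "chocola-tje,cho-co-laatje"
```

With 8 characters left on the line, `best_break(chocolaatje, 8)` selects chocola-/tje; with only 6 it falls back to choco-/laatje from the second variant.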

Of course, the simple cases could also be handled with a `soft hyphen' (&shy;), if browsers would only sup­port it. (<-- see the &shy; here?). Unicode also has the `zero width space' (&#8203;) which allows breaking without inserting a hyphen.
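
As a rough Python illustration of the difference between the two characters: `&shy;` is U+00AD and `&#8203;` is U+200B; both are invisible when no break occurs, and only the soft hyphen produces a visible hyphen at a break. The helper names below are hypothetical, and the rendering is a simplification of what a browser would do.

```python
import html

SOFT_HYPHEN = html.unescape('&shy;')         # U+00AD
ZERO_WIDTH_SPACE = html.unescape('&#8203;')  # U+200B


def mark_breaks(syllables, marker=SOFT_HYPHEN):
    """Join word parts with an invisible break opportunity."""
    return marker.join(syllables)


def visible_text(text):
    """Text as displayed when no break is taken at any marker."""
    return text.replace(SOFT_HYPHEN, '').replace(ZERO_WIDTH_SPACE, '')


def render_break(marked, position):
    """Break at the marker located at string index `position`.

    Returns (head, tail); a hyphen is appended to head only when the
    marker is a soft hyphen.
    """
    marker = marked[position]
    if marker not in (SOFT_HYPHEN, ZERO_WIDTH_SPACE):
        raise ValueError('no break opportunity at this position')
    head = visible_text(marked[:position])
    tail = visible_text(marked[position + 1:])
    if marker == SOFT_HYPHEN:
        head += '-'
    return head, tail


support = mark_breaks(['sup', 'port'])
list_name = mark_breaks(['html-', 'wg'], ZERO_WIDTH_SPACE)
```

Here `render_break(support, 3)` gives sup-/port, whereas `render_break(list_name, 5)` gives html-/wg with no extra hyphen inserted.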

W3C Bert Bos, i18n coordinator
Last updated 2008/05/07 17:14:14