A handful of typographic challenges

Cascading Style Sheets

Bert Bos (W3C) <bert@w3.org>

dotCSS
Paris, France
14 November 2014

Publishing has, over the ages, established practices of good typography. They are not the same for every script, and not even for every language that uses the same script. But they all have several rules that are a challenge for HTML and CSS, certainly if one wants to mark up the HTML in a logical, semantically rich way that supports such things as translation and re-styling.

It seems that HTML and CSS aren't yet rich enough for their purported aims. This talk doesn't try to propose concrete changes to the Web platform, but only poses the question. It does so by showing three examples of well-established typographical recommendations from languages in the Latin script.

On the other hand, it is also not forbidden to think that typography can change. Maybe, in the future, typographical traditions will produce texts that are as readable for future generations as current texts are for us, while being less ambiguous.

Typography: why follow tradition?

I'll show you rules from authorities such as…

Typography isn't enshrined in laws. When you place a comma incorrectly, the police won't come after you (usually). But publishers may have sets of rules that they enforce more or less strongly if you want them to publish your text. Many of these rules are very similar from one publisher to the next. They differ a bit more between different languages, and even more between different writing systems.

Books such as The Chicago Manual of Style and the rule books published by the French national printers and the Dutch national publisher capture rules that are pretty much the canon for texts in American English, French and Dutch, respectively. You can find similar books for most languages or regions. If individual publishers apply different rules, they rarely differ from these in more than one or two details.

Which means that these books and others like them are pretty authoritative sources of typographical rules.

Typography: why follow tradition?

But why should we follow those rules?

(Typography should help w/o being noticed)

But even if you aren't required to follow these rule books by your publisher (e.g., because you are your own publisher), it is good to adhere to them anyway. And that is basically for two reasons.

The rules have become like this over the centuries in a kind of typographical evolution. There is no claim that these rules are the “best” for ease of reading, but they have shown themselves to work well.

In addition, what people find easiest to read also depends on what they are used to. And they are very much used to these rules.

Example 1

Quotation styles

French — Mais, lui dis-je, tu le connais déjà.

Quotation styles

— Mais, lui dis-je, tu le connais déjà.

French « Mais, lui dis-je, tu le connais déjà. »

The guillemets are more commonly used as secondary quote marks nowadays, for quotes-within-quotes. In older French books, you often find both guillemets and em dashes, with different functions: the guillemets mark the beginning and end of a dialog consisting of a series of quotations. But that usage has all but disappeared.

French tradition has apparently chosen not to disrupt the reading with too much punctuation, but at the cost of some ambiguity: sometimes it is not obvious where the second part of the quote starts, when there are more than two commas. It is the author's responsibility to avoid phrases that could be spoken both by the author and by the person quoted, or to reorder the sentence.

Quotation styles

— Mais, lui dis-je, tu le connais déjà.

« Mais, lui dis-je, tu le connais déjà. »

US English ‘But,’ I said to her, ‘you know it already.’

The American style avoids confusion by closing and re-opening the quotes. However, for aesthetic reasons, the first comma is then put inside the quotes.

It says “US English” and not “English,” because the British tradition (like the Dutch, shown next) doesn't put the comma inside the quote marks. Punctuation is only inside the quote marks if it logically belongs to the quoted text.

Quotation styles

— Mais, lui dis-je, tu le connais déjà.

« Mais, lui dis-je, tu le connais déjà. »

‘But,’ I said to her, ‘you know it already.’

US English “But,” I said to her, “you know it already.”

Double quotes are an alternative to single quotes. They are currently used less, and are more often reserved for quotes within quotes. They follow the same typographical rules as single quotes.

Quotation styles

— Mais, lui dis-je, tu le connais déjà.

« Mais, lui dis-je, tu le connais déjà. »

‘But,’ I said to her, ‘you know it already.’

“But,” I said to her, “you know it already.”

Dutch ‘Maar’, zei ik tegen haar, ‘je kent het al.’

These are the most common Dutch quote marks nowadays. They look like the American ones because that's where they come from. Over time, Dutch typographers have come to consider the traditional double quotes (next slide) to use more space than necessary for their function.

However, they didn't adopt the American way of moving punctuation inside, but kept the traditional, logical way of keeping punctuation with the phrase it belongs to.

There is still one concession to aesthetics, though: if the sentence would logically end with a full stop, but the quoted phrase already ends with a full stop, exclamation mark or question mark, then the sentence's own full stop is omitted. I.e., don't end a sentence with .’. or ?’. or !’. but write .’ or ?’ or !’ instead.
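The full-stop rule can be written down as a small function. This is only a sketch: the name `close_sentence` and the Dutch sample phrases in the usage note are made up for illustration, not taken from any rule book.

```python
# Hypothetical helper: finish a sentence that ends in a quotation,
# following the Dutch rule about the sentence's own full stop.
TERMINAL = ".!?"  # marks that already close the quoted phrase


def close_sentence(lead_in: str, quoted: str,
                   open_q: str = "\u2018", close_q: str = "\u2019") -> str:
    """Return lead_in + quoted phrase in quote marks, adding the sentence's
    own full stop only if the quoted phrase has no terminal mark itself."""
    text = f"{lead_in}{open_q}{quoted}{close_q}"
    if quoted and quoted[-1] in TERMINAL:
        return text        # quote ends with . ! or ?, so no extra full stop
    return text + "."      # the sentence supplies its own full stop
```

For instance, `close_sentence("Hij riep: ", "Kom hier!")` leaves the sentence ending in !’ while `close_sentence("Hij noemde het ", "onzin")` ends in ’. because the quoted word carries no punctuation of its own.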

Quotation styles

— Mais, lui dis-je, tu le connais déjà.

« Mais, lui dis-je, tu le connais déjà. »

‘But,’ I said to her, ‘you know it already.’

“But,” I said to her, “you know it already.”

‘Maar’, zei ik tegen haar, ‘je kent het al.’

Dutch „Maar”, zei ik tegen haar, „je kent het al.”

Quotation styles

— Mais, lui dis-je, tu le connais déjà.

« Mais, lui dis-je, tu le connais déjà. »

‘But,’ I said to her, ‘you know it already.’

“But,” I said to her, “you know it already.”

‘Maar’, zei ik tegen haar, ‘je kent het al.’

„Maar”, zei ik tegen haar, „je kent het al.”

<p><q>Mais</q>, lui dis-je, <q>tu le connais déjà.</q>

Based on the most logical of the punctuation styles (Dutch and British in this case), one would expect a mark-up like this, with two Q elements and the comma outside the Q. But that would make the style rules for French pretty hard and would put the comma in the wrong place in the American text.
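To make the problem concrete, here is a minimal sketch of what a formatter would have to do if the quotation were stored logically, as two quoted parts plus the interruption. The `SplitQuote` class, the `render` function and the style names are all hypothetical; nothing like this exists in HTML or CSS today.

```python
from dataclasses import dataclass


@dataclass
class SplitQuote:
    first: str        # first quoted part, without trailing punctuation
    attribution: str  # the narrator's interruption
    second: str       # second quoted part, with its own final punctuation


# Hypothetical style names mapped to (opening mark, closing mark).
MARKS = {
    "us-single": ("\u2018", "\u2019"),
    "us-double": ("\u201c", "\u201d"),
    "nl": ("\u2018", "\u2019"),
    "nl-traditional": ("\u201e", "\u201d"),
}


def render(q: SplitQuote, style: str) -> str:
    """Render one interrupted quotation in the given punctuation style."""
    if style == "fr-dash":
        # A single dash opens the speech; the parts are not re-quoted.
        return f"\u2014 {q.first}, {q.attribution}, {q.second}"
    if style == "fr-guillemets":
        # One pair of guillemets around the whole, interruption included.
        return f"\u00ab {q.first}, {q.attribution}, {q.second} \u00bb"
    o, c = MARKS[style]
    if style.startswith("us-"):
        # American style: the first comma moves inside the quote marks.
        return f"{o}{q.first},{c} {q.attribution}, {o}{q.second}{c}"
    # Dutch (and British) style: the comma stays outside, with the sentence.
    return f"{o}{q.first}{c}, {q.attribution}, {o}{q.second}{c}"
```

With `SplitQuote("But", "I said to her", "you know it already.")` and the style `"us-double"`, this reproduces the American line from the slide. Note how the French styles need to know that the two parts belong together, which two independent Q elements do not express.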

Quotation styles

— Mais, lui dis-je, tu le connais déjà.

« Mais, lui dis-je, tu le connais déjà. »

‘But,’ I said to her, ‘you know it already.’

“But,” I said to her, “you know it already.”

‘Maar’, zei ik tegen haar, ‘je kent het al.’

„Maar”, zei ik tegen haar, „je kent het al.”

<p><q>Mais</q>, lui dis-je, <q>tu le connais déjà.</q>
<p>&mdash;&thinsp;Mais, lui dis-je, tu le connais déjà.

You can, of course, just type the quote marks by hand and not mark anything up, but that makes the job a bit harder for (automatic) translators and for publishers who would like to change the quote marks.

(A more sophisticated mark-up language than HTML might also connect the two parts of the quote. TEI does that, but HTML is difficult enough for most people; let's not demand that they use TEI. One may also discuss how much of the other punctuation, apart from the quote marks, should be generated by the style sheet.)

Example 2

Punctuation and style

English Not the cat, the dog!

Most languages that use the Latin script no longer put a thin space before the punctuation. And with the punctuation against the preceding word, it looks best if the punctuation has the same style as that word.

Punctuation and style

Not the cat, the dog!

French Pas le chat, le chien !

French typography still prefers half a space before tall punctuation marks, and thus there is no need to put the punctuation in italics unless it's logically part of the italic phrase.

Punctuation and style

Not the cat, the dog!

Pas le chat, le chien !

Also French Pas le chat, le chien !

However, even in French, small punctuation marks (period and comma) are put close to the preceding word and thus follow the style of the word. (Of course, for periods and commas, the difference is often hardly noticeable.)

Punctuation and style

Not the cat, the dog!

Pas le chat, le chien !

<p>Not the cat, the <em>dog</em>!

or?

<p>Not the cat, the <em>dog!</em>

But the question is how to mark it up. Logically, the punctuation belongs to the sentence, not the emphasized word. The French is thus more logical here.

Punctuation and style

Mark it up logically?

Lacking intelligence in CSS to style the punctuation correctly based on the style of the preceding text, we currently have no choice but to mark up the document according to the intended style. And any translator will just have to be smart enough to change the mark-up. (For inline mark-up, that is not uncommon anyway.)
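As a stand-in for that missing intelligence, here is a rough sketch of the rewriting a translator or preprocessor has to do today, starting from logical mark-up with the punctuation outside the EM. The function name `localize_emphasis` is hypothetical, the choice of U+202F (narrow no-break space) for the French half space is an assumption, and a regular expression over HTML is of course fragile; it only looks at punctuation directly after a closing `</em>`.

```python
import re

NNBSP = "\u202f"  # narrow no-break space, assumed here for the French half space


def localize_emphasis(html: str, lang: str) -> str:
    """Adjust punctuation that directly follows </em> to the given language."""
    if lang == "en":
        # English: any punctuation mark takes the style of the preceding
        # word, so it is pulled inside the emphasized element.
        return re.sub(r"</em>([!?;:.,])", r"\1</em>", html)
    if lang == "fr":
        # French: small marks join the emphasized word...
        html = re.sub(r"</em>([.,])", r"\1</em>", html)
        # ...while tall marks stay outside, after a half space.
        return re.sub(r"</em>([!?;:])", "</em>" + NNBSP + r"\1", html)
    return html  # other languages: leave the logical mark-up untouched
```

Calling `localize_emphasis("<p>Not the cat, the <em>dog</em>!", "en")` moves the exclamation mark inside the EM, which is exactly the change in mark-up that one would rather leave to the style sheet.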

Punctuation and style

And then what's next?

Logical mark-up is attractive. It makes it easier to let the computer do interesting things with the text, including automatic translation, summarizing, and counting certain text phenomena.

If we favor logical mark-up and work on a smarter style language, we may even get to a point where ordinary people can automatically get typography that currently still requires manual work by professional typographers, such as the correct types of spaces in French or a little extra space (the italic correction) after italic words that end with tall letters.

The examples so far can be handled manually, by putting in the right characters by hand before the text is sent to a device to be formatted and displayed. But there are typographic rules that, on the Web platform, cannot be applied by a human editor. The next topic is an example of that.

Last example

Styling indexes

formaat, papier 142–143 (not 142, 143, 143)
formules 65, 104, 172, 173
–, breken van 173, 173, 174
–, coderen van 214
–, Griekse letters in 172

Note that XSL can do this. Should CSS?
Use JavaScript? But…

Indexes can be quite complex. They encode a lot of information in a small space. But I'd like to concentrate on the range of page numbers in the first line.

Page numbers can obviously only be filled in after layout. There are CSS drafts that propose how to do that. Thus you can reprint a book with a different style or change the font size in an e-book and still have the correct page numbers in the table of contents or any cross-references. Some programs indeed already implement them and by now lots of books are printed with CSS thanks to these proposed extensions.

But this index goes further: After the page numbers have been determined, series of adjoining page numbers in the same style are collapsed into ranges.
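The collapsing step itself is simple once the page numbers are known. Here is a minimal sketch, assuming each reference arrives as a (page, style) pair sorted by page; the function names and the style labels such as "plain" and "bold" are placeholders, since the slide does not say which styles the index actually uses.

```python
def collapse(refs: list[tuple[int, str]]) -> list[tuple[int, int, str]]:
    """Merge duplicate and adjoining page numbers that share a style.

    refs must be sorted by page. Returns (first, last, style) runs.
    """
    runs: list[tuple[int, int, str]] = []
    for page, style in refs:
        if runs:
            first, last, run_style = runs[-1]
            # Same style and same or next page: extend the current run.
            if style == run_style and page <= last + 1:
                runs[-1] = (first, page, run_style)
                continue
        runs.append((page, page, style))
    return runs


def format_runs(runs: list[tuple[int, int, str]]) -> str:
    """Print runs as single numbers or en-dash ranges (style is ignored here)."""
    return ", ".join(str(a) if a == b else f"{a}\u2013{b}" for a, b, _ in runs)
```

Three plain references on pages 142, 143 and 143 come out as the single range 142–143, while two references on pages 172 and 173 in different styles stay separate. The hard part is not this loop but the fact that it can only run after layout, when the style sheet is the only thing still in control.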

XSL has special features for indexes to let the designer choose various common ways of collapsing duplicates and creating ranges. CSS does not. Should it? One suggestion has been to let designers code their indexes in JavaScript, but that has obvious disadvantages: not declarative, too difficult for most designers, JavaScript not available in all formatters…

XSL is still ahead of CSS in many ways, at least for paginated rendering of documents. But there are currently no plans to extend XSL further, while CSS is still being developed, and as a result (and also because CSS is often easier to use for the things that it already does) many publishers are changing to CSS and asking for the missing functionality to be added. Thus, it is possible that CSS in the future will support more sophisticated handling of (page) numbers.

So what next?

Tell me!

These slides just pose some questions…

Thanks!

I think these traditional typographic rules are worth keeping and applying. I think it should, eventually, also be possible for authors & typographers to apply them more easily while using semantic mark-up.

Maybe the conclusion for now is (1) to apply style rules like this already today, where possible; and (2) to think about how to improve the Open Web Platform (i.e., don't assume there is only CSS) to make it easier to use semantic mark-up and to switch between styles.

These slides can be found at: http://www.w3.org/Talks/2014/1114-CSS-Paris