[Workflow submission] Bert Bos - The limits of "single-source publishing" with XML and CSS

Position paper:

     The limits of "single-source publishing" with XML and CSS

                         By: Bert Bos (W3C)

The promise: multiple style sheets via Media Queries
----------------------------------------------------

The model for single-source publishing with XML (or HTML) and CSS is 
that the only thing needed to add another kind of output is one more 
style sheet. Each style sheet is labeled with a "Media Query" that 
describes the kind of media the style is meant for. The process that 
produces a certain output only has to select the style sheet(s) with the 
appropriate label and ignore the others. E.g., 'speech' is a Media Query 
for a style sheet for a speech synthesizer, 'print and (color)' for a 
color printer, and 'screen and (max-width: 40ch)' for a screen that is 
40 letters wide or less.
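
As a minimal sketch, the three labels above could be attached to
separate style sheets like this. The file names are hypothetical and
only serve as illustration; the Media Queries are the ones from the
text.

```css
/* File names are assumptions, not part of any specification. */

/* Used only by a speech synthesizer */
@import url("speech.css") speech;

/* Used only by a color printer */
@import url("print-color.css") print and (color);

/* Used only by a screen that is 40 letters wide or less */
@import url("narrow.css") screen and (max-width: 40ch);
```

A process producing one kind of output loads the imports whose query
matches and skips the rest.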

In addition, the model says that style sheets are often reusable, i.e., 
there are classes of documents that are similar enough that the same 
style sheets can be used for all of them. E.g., all books in a certain 
series, or all articles for a certain conference can share style sheets, 
with maybe minor overrides for specific cases.
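
A sketch of such reuse, assuming a hypothetical shared style sheet
'series.css' for all books in a series, with one override for a
specific book:

```css
/* Hypothetical file: the style shared by every book in the series */
@import url("series.css");

/* Minor override for this one book only:
   start each chapter on a right-hand page */
h1 { page-break-before: right }
```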

The problems
------------

In practice, the model works less well than intended. The reasons fall 
into several categories:

1) The mark-up of the document isn't always sufficiently rich. To be 
able to distinguish and style different parts of a document, the style 
sheet relies on element types, attributes and context (i.e., element 
types and attributes of nearby elements). If the author hasn't marked up
different kinds of contents differently, the style sheet cannot style 
them differently. This may happen, e.g., when the author hasn't realized 
that a certain kind of content needs different treatment on a particular 
kind of media. E.g., table mark-up is often not rich enough if the 
author hasn't thought about speech output, or an image caption isn't 
distinguished from a normal paragraph and thus cannot be restyled when 
the image is moved or scaled on a small screen.
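
The caption case might look as follows. This is only a sketch: the
mark-up in the comment is an assumed example, not taken from any
particular document.

```css
/* Assumed mark-up:

     <figure>
       <img src="map.png" alt="...">
       <figcaption>Figure 1. ...</figcaption>
     </figure>

   Because the caption has its own element type, a style sheet for
   small screens can restyle it together with the scaled image. */
@media screen and (max-width: 40ch) {
  figure img { max-width: 100% }
  figcaption { font-size: smaller; text-align: left }
}

/* Had the caption been a <p> like the paragraphs around it, no
   selector could single it out without also matching those
   paragraphs. */
```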

2) The Media Queries do not have sufficiently many media types and media
features, or those that exist aren't clearly defined. There are only
nine media types, and three of them (screen, handheld and print) cover
almost the whole range of visual media: paper, e-readers, smartphones,
tablets, laptops and desktop screens. Only video projection is separate.
The official specification offers no help in deciding whether an
e-reader is a 'screen', a 'handheld' or a 'print'. And the media
features, of which there are thirteen and which are meant to distinguish
subclasses of each medium, do not help much either. They do distinguish
screen sizes, but not, e.g., whether the display is paginated or
scrolling.

3) CSS isn't sufficiently rich to re-order content. The document is, 
ideally, marked up in a logical order that makes the document most
understandable if it is rendered with no style sheets at all (apart from 
the default style of the renderer, such as the default HTML style in the 
case the document is HTML or XHTML). But on a large screen, the layout 
is two-dimensional, and on a small screen the layout is, although 
linear, not necessarily in the original order. Proposals for layout 
"templates," which would allow reordering, were made in 1996 and updated 
in 2005, but have so far not passed the stage of prototypes. Combining
CSS with XSLT, while theoretically a solution, is rarely done in 
practice, for various reasons. E.g., EPUB3 does not include XSLT.

4) Implementations do not follow the specifications. Sometimes 
deviations from the specifications are simple bugs and are temporary
problems. But sometimes they cannot be fixed so easily. This is the
case, e.g., with screen readers. The model of single-source publishing
mentioned above is that a screen shows the result of applying one style 
sheet and the speech synthesizer speaks the result of applying another. 
In practice, screen readers try to interpret the screen instead, because 
the speech style is too often of very low quality and the screen style 
often adds important information that is not found in the document 
itself. But as long as screen readers work like that, designers cannot 
use the screen and speech styles as intended. Chicken and egg... EPUB3 
includes support for speech style sheets and some e-readers offer speech 
output. However, initial reports suggest that they aren't as accessible 
as they should be.
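
To illustrate the intended model, here is a sketch of one document
carrying both a screen style and a speech style. The class name
'pagenumber' is hypothetical; the speech properties are assumed to be
those of the CSS Speech module.

```css
/* What the screen shows */
@media screen {
  h1 { font-size: 2em; font-weight: bold }
  .pagenumber { float: right }
}

/* What the speech synthesizer is meant to speak */
@media speech {
  h1 { voice-stress: strong; pause-before: strong; pause-after: medium }
  .pagenumber { speak: none }  /* not spoken at all */
}
```

A screen reader that interprets the rendered screen never applies the
second block, which is the chicken-and-egg problem described above.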

5) Transclusion is not well handled by CSS. This includes the common 
case that a document consists of multiple parts: not just a text and 
separate images, but several text parts, e.g., one file per chapter. 
EPUB describes a packaging format for such multi-file documents, but CSS 
has no support for it. Giving the main file (the one that links to all 
the chapters) to CSS can at most result in a rendering where each 
chapter is a fixed-size, scrolling box. The SEAMLESS attribute of 
IFRAMEs in HTML5 is an example of the same problem, for which CSS has no 
solution yet.

Ideas towards solutions
-----------------------

1) Mark-up formats such as XHTML (which is used in EPUB, e.g.) are 
probably rich enough to express all the different roles a piece of text 
can have, at least sufficiently to allow styling. Solutions for 
improving mark-up may involve educating authors or giving them automated 
tools (validators and lint-like tools).

There are also cases where the mark-up is rich enough in principle, but 
CSS selectors aren't yet powerful enough to make use of it. A typical 
case is that the distinguishing factor between two elements is something 
that is *inside* the elements: E.g., a section should be styled 
differently when it starts with a heading than when it contains no 
heading. There are proposals already being discussed in the CSS WG that 
should solve this particular case and several others.
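
As an illustration only: one mechanism of this kind is the ':has()'
pseudo-class of Selectors Level 4. Whether it corresponds to the
proposals meant here is an assumption.

```css
/* A section whose first child element is a heading.
   ':first-child' only looks at elements, so text placed before
   the heading is not taken into account. */
section:has(> h2:first-child) { margin-top: 3em }

/* A section that contains no heading at all */
section:not(:has(h1, h2, h3, h4, h5, h6)) { margin-top: 1em }
```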

2) The list of Media Types in the Media Queries dates from 1997 and 1998 
and is based on predictions about devices that didn't exist at the time. 
The Media Features, which distinguish subclasses of those media, were 
defined in 2001 (although they didn't become a standard until 2012). It 
is probably time to make a list of current and expected devices and 
define how they map to each media type and media feature, and, where 
needed, define new types or features.

E.g., e-readers could be a new media type. Or they could be classified 
as 'print', but distinguished from paper by a new feature 
'(interactive)'. Or they could be 'handheld' with a feature 
'(paginated)'. Another example: An electronic billboard (the current 
terminology is "digital signage") is like a screen, but it is not 
interactive. However, it is dynamic and can support animations, unlike 
paper.
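
Written out, the two e-reader alternatives might look like the sketch
below. Neither '(interactive)' nor '(paginated)' exists as a media
feature; they are only the hypothetical names used in the text.

```css
/* Alternative 1: an e-reader is 'print' plus a new feature */
@media print and (interactive) {
  a { text-decoration: underline }  /* links can be followed */
}

/* Alternative 2: an e-reader is 'handheld' plus a new feature */
@media handheld and (paginated) {
  h1 { page-break-before: always }
}
```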

3) CSS does not support document transformations, such as those provided 
by XSLT. That is to keep CSS easy to understand and use, and to better 
support WYSIWYG editing of documents. But even so, it could in theory do 
much more than it does now. (It currently has the concept of float,
which allows elements to move left or right in a limited way; it has a
'caption-side' property, which sometimes allows an element to move above
or below a sibling; and it has the "flexbox" properties. But all of
these can at most reorder siblings, and they have other limitations and
side-effects, because they were meant to solve very specific problems.)
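
A short sketch of the three existing mechanisms. The class names are
hypothetical.

```css
/* Float: the element moves to the right edge, text flows around it */
.sidebar { float: right; width: 30% }

/* 'caption-side': the table caption moves below the table */
caption { caption-side: bottom }

/* Flexbox: a child is displayed before its siblings,
   but it cannot leave its parent */
.article { display: flex; flex-direction: column }
.article > .summary { order: -1 }
```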

The idea of a "layout template" is a very old one and one that appears 
easy to understand and easy to use in CSS. A template defines one or 
more "regions" that are laid out on a grid or other suitable layout
framework. The regions can optionally be combined (a.k.a. "chained," as 
a way to make disjoint regions). The elements of the document are then 
each assigned to a region or chain of regions.
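
The idea can be sketched with 'grid-template-areas' from CSS Grid
Layout. This is a substitute chosen for illustration, not the syntax
of the template proposals themselves, and the IDs are hypothetical.

```css
/* The template: four named regions on a two-column grid */
body {
  display: grid;
  grid-template-columns: 12em 1fr;
  grid-template-areas:
    "head head"
    "nav  main"
    "foot foot";
}

/* Each element is assigned to a region, whatever its source order */
#title    { grid-area: head }
#toc      { grid-area: nav }
#content  { grid-area: main }
#colophon { grid-area: foot }
```

Note that this substitute only places children of the grid element
and has no notion of chained regions.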

CSS in fact already has a kind of layout template, in the form of a 
predefined page template. This is a simple template designed to handle 
the most common kinds of running headers and footers in paginated 
output.

(It is probably possible to use layout templates also as page templates, 
when the predefined page template is not enough. The behavior of running 
headers is, in fact, not a function of the template, but is defined by 
the 'content' property of CSS.)
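
A sketch of the predefined page template with a running header,
assuming the margin boxes of CSS Paged Media and the 'string-set'
property and 'string()' function from the Generated Content for Paged
Media drafts. The name 'chapter' is arbitrary.

```css
/* Remember the text of the most recent chapter heading */
h1 { string-set: chapter content() }

@page {
  margin: 2cm;

  /* Running header: filled in through the 'content' property */
  @top-center { content: string(chapter) }

  /* Running footer: the page number */
  @bottom-center { content: counter(page) }
}
```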

Nevertheless, it is probably not possible to rely on templates for all 
layout. Sometimes document transformations are still necessary. E.g., 
there are no template-based proposals for re-ordering speech yet. And to 
style documents such as RDF, which has no defined order, some 
transformation to a known order is needed. In the case of RDF, that 
might be done with SPARQL.

4) The problem that some bugs cannot be fixed, because the resulting
user experience would be worse as often as it would be better, is
probably not easily solved with technical measures. Maybe if speech
synthesis improves enough that people will *want* to listen to their
book instead of reading it, or if browsers and reading devices offer an
easy switch so that the user can try which of the document's different
style sheets gives the best result, then authors and designers will make
more of an effort to use the features that make each medium interesting.

There is a lack of knowledge, at least a lack of widely shared 
knowledge, and probably a lack of research, on how to make e-readers 
accessible. E.g., is it interesting for a listener to know that the 
visual representation is paginated, and if so, what parts of the 
generated text (running headers, page numbers, references to page 
numbers, etc.) should be spoken and which should be omitted?

5) The notion of "intrinsic size" in CSS is so far only defined for 
external resources with a fixed aspect ratio. CSS can handle 
transclusions where, when the style sheet sets the width, the height is 
always the width times a certain constant. But the intrinsic size of a 
text document works differently: when you increase the width, the height 
typically *decreases,* and not in a linear way. Maybe all that is needed 
for an author is a simple keyword to indicate that the height depends
on the content: 'height: min-content'. But there may be security/privacy
implications
that make the definition tricky. If the device supports JavaScript, a 
script may derive information from the resulting size of a transcluded 
resource (e.g., that it failed to load) and send that information 
somewhere, even if it cannot see what is inside the transclusion.
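
The difference can be sketched as follows. The class names are
hypothetical, and the effect described for the second rule is the
suggestion made above, not behavior that implementations are known to
provide.

```css
/* Current situation: the chapter is a fixed-size box and its
   content scrolls inside it */
iframe.fixed { width: 100%; height: 30em }

/* Suggested: the height follows from the transcluded text at the
   given width, so the chapter reads as part of the document */
iframe.flowing { width: 100%; height: min-content }
```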

Giving a transclusion its proper height is enough to make it visually 
part of the document. It doesn't become part of the DOM and it keeps its 
own style sheet, which also means that its margins do not collapse with 
the element it is transcluded in. But that is probably not more than a 
minor inconvenience.

The situation is different for speech output. The transcluded document 
should be read out loud, reading should continue in the outer document 
when the transcluded one ends, and navigation should be possible almost 
as if the transclusion were an actual part of the outer document: E.g.,
skipping to the next heading should skip into the transclusion if the 
next heading is there and out of it if there is a next heading in the 
outer document.

Some research is needed. Maybe in this case a new keyword on the 'speak' 
property is useful. Or maybe not: the 'content' property can already 
express that an external resource is transcluded. Maybe all that is 
needed is to define how speech works for transcluded content that 
contains text.

Conclusion
----------

The idea that a single document overlaid with a single mark-up is very 
often enough for publication on widely different media has proved to be 
correct to some extent. But style sheet technology hasn't kept up with 
the appearance of new media (or new variations on old media). CSS also
lacks functionality that was left to transformation languages such as
XSLT, which are not used as much as expected.

Research is needed, and in part already underway, to draw up new 
requirements, in particular to enhance Media Queries and to improve, 
replace or complement CSS.

Bert
-- 
  Bert Bos                                ( W 3 C ) http://www.w3.org/
  http://www.w3.org/people/bos                               W3C/ERCIM
  bert@w3.org                             2004 Rt des Lucioles / BP 93
  +33 (0)4 92 38 76 92            06902 Sophia Antipolis Cedex, France