Pagination Requirements

From Digital Publishing Interest Group

Requirements for a System to Render and Paginate Dynamic EBook Content


The display of ebook content has requirements beyond those of a traditional web browser. While browsers are designed to display dynamic web page content and most reflowable ebook content is built upon the same foundations as the web (for example, HTML and CSS), the expectations of users can be quite different. Ebook content is typically paginated, and users of ebooks expect a different level of interaction with their books than they do with web pages. For instance, they may want to make highlights that persist across content loads and even across multiple devices. This document is intended to address the functionality a User Agent would need to provide to an ebook reading application built on web infrastructure, either as a web app or using a hybrid design (a native device application accessing a web component).

Since this document is targeted at reading applications built on top of web technologies, it will assume the underlying format for the content is HTML and CSS. While other formats exist for both reflowable and fixed layout content, addressing their needs is outside the scope of these requirements.

Common Implementation Patterns

Ebook reading applications are not as rigorously defined as a browser User Agent. The most common web-based format for such ebooks is EPUB, whose specification relies heavily on the HTML and CSS specifications to define rendering and layout, and adds some additional levels of conformance for Reading Systems (defined in the EPUB specification, but essentially the thing an end user will use to read an EPUB-based ebook). However, it does not define exactly how such a Reading System should be written, nor does it cover the interaction between the Reading System and the User Agent. This has left the implementation and interaction details up to developers, which has led to a number of solutions. This section will briefly address some of the more common methods.

Web-based reader

In this scenario, the ebook application is implemented in JavaScript as a web app. While this may be targeted at a single browser, ideally it should run on most current browser engines. Since this is entirely JavaScript based, all application controls (page turns, bookmarks, font zooming, etc.) must be implemented using only the existing, well-defined interfaces for DOM and style manipulation.

Hybrid reader

A hybrid reader is one where the actual reading application is written around an existing browser engine. There are several ways to accomplish this. For instance, this can be done by employing the system's native embedded browser component (colloquially, a “webview”) or by separately porting and bundling an open source engine (for example, WebKit or Blink). In these cases, the browser engine may only be displaying the actual ebook content, with UI elements, annotations, downloads, user authentication, etc. all happening in the native application. These applications may make use of extra APIs (for instance, for setting fonts) or even employ private/custom APIs to manipulate content.

Custom User Agents

It is also possible to implement an ebook reader from scratch, essentially implementing an entire User Agent targeted to ebook reading. This provides the ultimate level of integration, as ebook-specific features can be implemented at the core of the layout engine. This allows for better line breaking, actual page support, font replacement, etc., without being limited by the APIs provided. It is also the most work: a particularly daunting task that is becoming untenable due to the increasing complexity of browser engines.

These three approaches cover a broad range of actual implementations, and each comes with its own set of pros and cons. Some Reading System implementors have used more than one of these methods (for instance, having hybrid mobile readers and a browser desktop reader). For the purposes of this document, we will only be considering the first two implementations, as they are most in need of additional JavaScript (or other) public APIs. By providing these high-level APIs, we can help avoid the use of private APIs and hard-to-maintain custom ports while encouraging code reuse across implementations.

Implementation Requirements

This section will detail some specific requirements for the presentation and navigation of ebook content. While some of these requirements will also be useful to general web pages, they are called out here as being of particular interest to ebooks. No attempt is made to specify what these APIs should be, though in some cases existing interfaces may be mentioned.

Ability to turn pagination on and off

Ebook applications tend to support pagination, and it is a feature most ebook users expect from their reader. While some implementations may offer users a choice of paginated vs non-paginated modes (and some content may express a preference for one mode over the other), most reading systems will at least offer users the choice to paginate. Current mechanisms to paginate content include CSS overflow, CSS multi-column layout, behind-the-scenes content scrolling, DOM alteration, and CSS Regions on browsers that support them.
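
As an illustration of the multi-column mechanism mentioned above, the following is a minimal sketch rather than a definitive implementation. The helper name paginationStyles and its parameters are hypothetical; it only computes the style values a reader might set on its content container, and applying them to an element is left to the reading system.

```typescript
// Hypothetical helper (not part of any specification): returns the CSS
// property values a reader might apply to the content container.
type StyleMap = Record<string, string>;

function paginationStyles(
  paginated: boolean,
  pageWidth: number,  // CSS pixels
  pageHeight: number, // CSS pixels
  gap: number         // CSS pixels between adjacent pages (columns)
): StyleMap {
  if (!paginated) {
    // Scrolled mode: a single column that grows as tall as the content.
    return {
      width: "auto",
      height: "auto",
      "column-width": "auto",
      "column-gap": "normal",
      overflow: "visible",
    };
  }
  // Paginated mode: a fixed-height box whose columns are one page wide.
  // Content that does not fit overflows sideways into further columns,
  // and each column is treated as a page.
  return {
    width: `${pageWidth}px`,
    height: `${pageHeight}px`,
    "column-width": `${pageWidth}px`,
    "column-gap": `${gap}px`,
    "column-fill": "auto",
    overflow: "hidden",
  };
}
```

In a browser, the returned values could be applied with style.setProperty on the container. The sketch also shows why this approach is a workaround: the engine has no notion of a page, only of columns.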

Ability to select page(s) for display

As a user reads a section of content, they will find the need to change pages or risk terminal boredom. This can range from the simple (“show the next/previous page”) to the more complex (“show heading 3 of chapter 6”).
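
For the simple case, a reader that lays pages out side by side typically turns a page by shifting the content horizontally. The sketch below assumes that model; PageGeometry, clampPage, and offsetForPage are hypothetical names, and the page count is assumed to be known already.

```typescript
// Assumed geometry of a horizontally paginated chapter (illustrative only).
interface PageGeometry {
  pageWidth: number; // CSS pixels
  gap: number;       // CSS pixels between pages
  pageCount: number; // total pages currently laid out
}

// Keep a requested page index inside the valid range.
function clampPage(index: number, g: PageGeometry): number {
  const last = Math.max(g.pageCount - 1, 0);
  return Math.min(Math.max(index, 0), last);
}

// Horizontal offset (in CSS pixels) that brings the given page into view.
function offsetForPage(index: number, g: PageGeometry): number {
  return clampPage(index, g) * (g.pageWidth + g.gap);
}

// "Next page" and "previous page" reduce to index arithmetic.
function nextPage(current: number, g: PageGeometry): number {
  return clampPage(current + 1, g);
}
function previousPage(current: number, g: PageGeometry): number {
  return clampPage(current - 1, g);
}
```

With 600 pixel pages and a 40 pixel gap, page index 2 sits at offset 1280. The more complex case, such as navigating to a heading, first requires knowing which page an element falls on.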

Ability to display multiple pages (2-up)

Web page content is typically shown in a single, long scroll. Book content, however, is typically shown as two facing pages (in modern physical books). It is not uncommon for ebook reading systems to mimic the 2-up form of print, especially when the display is much wider than it is tall (for instance, a tablet in landscape or a typical computer screen). Without a 2-up mode, text runs either become exceptionally long, or margins become excessively large. In addition to 2-up mode, where contiguous pages may be displayed side-by-side, reading systems may also want to show more than 2 pages in a row or column (for navigation purposes in a visual table of contents, or some type of thumbnail view), and they may want to display noncontiguous pages from the same chapter without reloading all the chapter content. To fulfill this requirement, some mechanism must be available to choose which pages are currently being displayed and where they should appear.
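
The bookkeeping for a 2-up view can be sketched as follows. This is an illustration under stated assumptions: pages are numbered from 0, spreads begin at page 0, and the function names are hypothetical. A real reader might treat a cover page or right-to-left content differently.

```typescript
// Choose how many pages to show side by side: two when the display is
// wider than it is tall, otherwise one. (A simplistic rule for illustration.)
function pagesPerSpreadFor(displayWidth: number, displayHeight: number): number {
  return displayWidth > displayHeight ? 2 : 1;
}

// Return the indices of the pages shown together with the given page.
function pagesInSpread(page: number, pagesPerSpread: number, pageCount: number): number[] {
  const first = Math.floor(page / pagesPerSpread) * pagesPerSpread;
  const pages: number[] = [];
  for (let p = first; p < first + pagesPerSpread && p < pageCount; p++) {
    pages.push(p);
  }
  return pages;
}
```

For a five-page chapter shown 2-up, page 3 is displayed alongside page 2, while page 4 is displayed alone. The same function covers thumbnail rows by passing a larger pagesPerSpread, but it does not address noncontiguous pages, which need an explicit list of page indices.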

Determine the page(s) an element is on

Given an arbitrary DOM node, reading systems must be able to determine what page or pages the node appears on. For instance, given an arbitrary highlight of text content, a reading system must be able to turn to the page that the highlight starts on. Since there is no guarantee that an element with an id exists at or near the location of the highlight, there must be some mechanism to request that a particular location be displayed when the reading system has, at best, access to a DOM node. Additionally, being able to determine the dynamically generated page number for such a selection may be required by a reading system's user interface (for instance, to display a page number next to every annotation or bookmark).
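
In a reader that paginates with side-by-side columns, one workaround is to derive the page range from a node's horizontal extent. The sketch below assumes that layout; Box and pagesForBox are hypothetical names, and the coordinates are assumed to be measured from the start of the paginated content (in a browser they might be obtained from getBoundingClientRect, adjusted for the current scroll offset).

```typescript
// Horizontal extent of a node, measured from the start of the paginated
// content rather than from the viewport.
interface Box {
  left: number;
  right: number;
}

// Return the first and last page index (0-based) that the box touches.
function pagesForBox(box: Box, pageWidth: number, gap: number): { first: number; last: number } {
  const stride = pageWidth + gap; // distance between the starts of two pages
  const first = Math.floor(box.left / stride);
  // The right edge is treated as exclusive, so a box ending exactly on a
  // page boundary does not count as touching the following page.
  const last = Math.max(first, Math.ceil(box.right / stride) - 1);
  return { first, last };
}
```

For example, with 600 pixel pages and a 40 pixel gap, a box extending from 700 to 1400 touches pages 1 through 2. The result is only as good as the geometry fed into it, which is part of why a direct mechanism is requested here.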

Determine exactly where in text a page break happens

This information may be used to determine how far in a book the user has read or to display pages based on the content of text nodes. Typically, JavaScript has very little insight into the actual display of content. While DOM nodes are easily accessed, determining how they are being displayed is tricky at best. For instance, there is no direct way to determine how many lines of text a paragraph holds, or what the first word of the third line is. Although workarounds exist for this (for example, determining the number of client rects generated by a text node), they are imprecise and difficult to use. Specifically, take the case of a single text node that spans 3 pages. To display a page that contains a specific range of that text, the reading system would need to know not just which pages the text node falls on, but exactly where the breaks occur within the text node.
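
One workaround of the kind described above is to probe individual character positions and search for the point where the page changes. The following sketch shows only the search; the pageOfOffset callback is an assumption standing in for whatever measurement the reader has available (for instance, the client rects of a one-character Range), and it is assumed never to decrease as the offset grows.

```typescript
// Find the smallest character offset in a text node that falls on
// targetPage or later. Returns textLength if no character does.
// pageOfOffset is a caller-supplied probe (hypothetical), assumed to be
// non-decreasing in the offset.
function firstOffsetOnPage(
  textLength: number,
  targetPage: number,
  pageOfOffset: (offset: number) => number
): number {
  let lo = 0;
  let hi = textLength;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (pageOfOffset(mid) < targetPage) {
      lo = mid + 1; // the break is after mid
    } else {
      hi = mid;     // mid is already on the target page (or later)
    }
  }
  return lo;
}
```

Binary search keeps the number of probes logarithmic in the length of the text, which matters because each probe may force layout. Even so, the approach locates a break only indirectly, and it fails if the probe is not monotonic (for example, with floated or absolutely positioned content).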

Map line-box content to source

A mechanism for finding the source content in the DOM that is on a specific line may be used for setting highlights, breaking pages, handling widows and orphans, etc. It is possible this would not be required if all other pagination use cases are fully addressed, but it is a common need for various implementations.
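
Lacking such a mechanism, implementations tend to reconstruct lines from measured rectangles. The sketch below groups fragments into lines by vertical position; Fragment, Line, and groupIntoLines are hypothetical names, the fragments are assumed to arrive in document order with their source character offsets attached, and the content is assumed to sit in a single column.

```typescript
// A measured piece of text: its vertical extent plus the half-open range
// [start, end) of source characters it covers.
interface Fragment {
  top: number;
  bottom: number;
  start: number;
  end: number;
}

// A reconstructed line: the union of the fragments that share it.
interface Line {
  top: number;
  bottom: number;
  start: number;
  end: number;
}

function groupIntoLines(fragments: Fragment[]): Line[] {
  const lines: Line[] = [];
  for (const f of fragments) {
    const current = lines.length > 0 ? lines[lines.length - 1] : undefined;
    const middle = (f.top + f.bottom) / 2;
    // A fragment joins the current line when its vertical midpoint lies
    // inside that line; this tolerates mixed font sizes on one line.
    if (current !== undefined && middle >= current.top && middle < current.bottom) {
      current.top = Math.min(current.top, f.top);
      current.bottom = Math.max(current.bottom, f.bottom);
      current.end = f.end;
    } else {
      lines.push({ top: f.top, bottom: f.bottom, start: f.start, end: f.end });
    }
  }
  return lines;
}
```

Once lines are known, questions such as "what is the first character of the third line" become lookups of lines[2].start. The heuristic remains fragile, which is the motivation for asking the User Agent to expose line-box information directly.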

Notification of font loading

Font load events are already being worked on and implementations are appearing. They are included here because it is important to know when font metrics will be accurate for pagination and display. Often the font resources are local, so deferring rendering and layout until they are loaded can reduce flashing and frequent changes to pagination results. Flashing is particularly problematic on devices that do not support rapid refreshes (for example, eInk displays).

Stable mechanism to convert between DOM and source

In the world of dynamic web applications, being able to select a section of text and save that selection information (not just the text contents of the selection) for later use is rare. However, this is extremely common in ebooks. Often, these locations are synced across multiple devices and browser engines, and it can be difficult to record their precise locations in a form that can be restored after dynamic pagination or on another device.
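
One way to make a saved location independent of pagination is to record it as a path of child indices from the root of the content, plus a character offset. The sketch below illustrates the idea on a minimal tree type rather than the real DOM; TreeNode, SavedLocation, pathTo, and resolvePath are hypothetical names.

```typescript
// Minimal stand-in for a DOM-like tree (illustrative only).
interface TreeNode {
  name: string;
  children: TreeNode[];
}

// A location that can be stored and synced: which node, and how far into it.
interface SavedLocation {
  path: number[]; // child index at each level, starting from the root
  offset: number; // character offset within the target node
}

// Compute the child-index path from root to target, or null if absent.
function pathTo(root: TreeNode, target: TreeNode): number[] | null {
  if (root === target) return [];
  for (let i = 0; i < root.children.length; i++) {
    const sub = pathTo(root.children[i], target);
    if (sub !== null) return [i, ...sub];
  }
  return null;
}

// Follow a stored path back to a node, or null if the tree has changed.
function resolvePath(root: TreeNode, path: number[]): TreeNode | null {
  let node: TreeNode = root;
  for (const index of path) {
    const child = node.children[index];
    if (child === undefined) return null;
    node = child;
  }
  return node;
}
```

Such a path is stable only if every device builds the same tree from the same source. A reader that injects wrapper elements for highlights or pagination would need to skip them when counting children, which is one reason a mechanism provided by the User Agent would be preferable.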

Ability to segment content

Due to memory limitations on some devices and a potential requirement to show pages from multiple chapters at the same time, the ability to segment content is important. That is, the ability to load only a portion of the DOM for a given chapter, perhaps from certain previously calculated ranges.
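
Assuming the previously calculated ranges are stored as sorted, non-overlapping character ranges, choosing which segment to load for a given location is a simple lookup. The sketch below shows that lookup only; Segment and segmentIndexFor are hypothetical names, and how a segment's DOM is actually built from the source is outside its scope.

```typescript
// A precomputed slice of a chapter, as a half-open character range
// [start, end). Segments are assumed sorted and non-overlapping.
interface Segment {
  start: number;
  end: number;
}

// Return the index of the segment containing the offset, or -1 if none does.
function segmentIndexFor(segments: Segment[], offset: number): number {
  let lo = 0;
  let hi = segments.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >>> 1;
    const s = segments[mid];
    if (offset < s.start) {
      hi = mid - 1;
    } else if (offset >= s.end) {
      lo = mid + 1;
    } else {
      return mid;
    }
  }
  return -1;
}
```

A reader could load the matching segment along with its neighbors so that page turns across a segment boundary do not stall.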