Abstract

This document introduces Portable Web Publications, a vision for the future of digital publishing that is based on a fully native representation of documents within the Open Web Platform. Portable Web Publications achieve full convergence between online and offline/portable document publishing: publishers and users won't need to choose one or the other, but can switch between them dynamically, at will.

Status of This Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This document outlines a general vision and should not be considered a technical specification. Instead, its goal is to outline that vision and the possible technical directions to achieve it; more detailed technical work should be documented in separate documents.

This document was published by the Digital Publishing Interest Group as a Working Draft. If you wish to make comments regarding this document, please send them to public-digipub-ig@w3.org (subscribe, archives). All comments are welcome.

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. The group does not expect this document to become a W3C Recommendation. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

This document is governed by the 1 September 2015 W3C Process Document.

1. Our Vision

Our vision for Portable Web Publications is to define a class of documents on the Web that would be part of the Digital Publishing ecosystem but would also be fully native citizens of the Open Web Platform. In this vision, the current format- and workflow-level separation between offline/portable and online (Web) document publishing is diminished to zero. These are merely two dynamic manifestations of the same publication: content authored with online use as the primary mode can easily be saved by the user for offline reading in portable document form. Content authored primarily for use as a portable document can be put online, without any need for refactoring the content. Publishers can choose to utilize either or both of these publishing modes, and users can choose either or both of these consumption modes. Essential features flow seamlessly between online and offline modes; examples include cross-references, user annotations, access to online databases, as well as licensing and rights management.

The precise definition of Portable Web Publications is provided in section 4.1 Portable Web Publications. For the sake of this introduction, it suffices to say that a Portable Web Publication is a collection of content items (e.g., pages, chapters, modules, articles) whose content is compatible with Web usage and which is structured as a single, self-contained logical unit. Individual items can consist of text, images, graphics, possibly interactive mathematical or chemical formulae, as well as audio and video. These documents, by definition, have a default, linear “reading order”; however, the user may choose to skip around in the content just as with a book on paper. Alternatively, interactive aspects of the content may alter the reading order on behalf of the user.

2. Why work on this now?

Digital Publishing can be considered to be at a tipping point. Digital publishing formats like EPUB have been broadly adopted globally for trade ebooks, and are starting to gain adoption among textbook publishers as well as corporate marketing departments. However, these formats have largely been seen as “offline” formats up until now. Various browser extensions supporting such complex publications exist, and other solutions are available for delivering these publications in browsers. Browser- and cloud-based solutions require relatively complex server and/or client software. In many cases browser- and cloud-based solutions depend on a proprietary transformation of the packaged files into formats more suitable to network delivery. A focused effort to make digital publications first-class Open Web Platform citizens will result in a significant reduction in the complexity of deploying publishing content into browsers for both online and offline consumption. Further, this focused effort will increase the momentum of digital publication and associated Web adoption across communities who are looking for an open, non-proprietary, next-generation portable document format.

The broader Web Platform can also be considered to be at a tipping point. Web site use on mobile platforms is diminishing in favor of native applications. Hybrid applications that use web content alongside native application technology, and web-technology-based system applications, are growing. The specific means of delivering hybrid and web-technology-based system applications are currently proprietary to specific application frameworks and/or browser platforms. The point of Portable Web Publications is to increase problem-solving momentum in packaging, metadata, and offline support applicable to both portable publications and installed applications. Open and native solutions to replace proprietary packaging, metadata, and offline support are intended to ensure the broadest possible general adoption of the Open Web Platform.

In many respects, the convergence is already happening. In a number of areas (e.g., educational publications, travel books, etc.) publishers already exploit the advanced possibilities of packaged publishing formats to produce highly interactive documents whose features are very close to what one is used to on the Web (see the separate section 6. Portable Web Publications and EPUB 3 for some examples). And the converse is also true: tutorial and introductory articles have appeared on the Web that have the quality of traditional publications that one was used to seeing in a scientific magazine, but combined with the interactive possibilities of the Web (Mike Bostock’s article on visualizing algorithms or Bret Victor’s article on visualization are just two of several possible examples). However, the convergence still has a long way to go, and this is the topic that this document, and the concept of Portable Web Publications, aim to explore.

Fig. 1 The same content can be turned into an archived file and back without any inherent changes to the core content or associated digital assets. (Image: a laptop accessing multiple unpackaged files for online use, with the same content packaged for use on a mobile phone.)

3. What are the areas of interest?

The convergence of Digital Publishing and the Open Web Platform provides a common set of solutions and opportunities to various stakeholders:

3.1 Publishers

Book publishers are investing in the development of technical expertise in web technologies. While gaining an understanding of technical topics is important to new and future publishing workflows, the lack of communication between the trade publishing and web application developer communities is resulting in unnecessary duplication of effort and investment.

Collaboration between the Web content development and publishing communities will result in major benefits to publishers. Adopting a universal and interoperable format means publishers can concentrate on engaging content authors in the production of high quality content. The web content development community can be relied on to deal with sophisticated technical issues (e.g., CSS, SVG). Potential future web content formats (e.g., 3D rendering) and various interactive web programs (e.g., visualization tools like D3) will naturally flow into the publishing realm through Portable Web Publications, hence increasing publishers' opportunities to sell new content products across the board.

Realizing new opportunities is a reality for publishers traditionally considered to be on the leading edge of technological advances in working with content. These publishers include STM and educational publishing houses, as well as scholarly and journal publishing organizations (see the section 3.2 Scholarly Journal and STM Publishers).

A converged platform will support more tools and services and a much larger population of trained practitioners compared to the current state of working in parallel universes.

3.2 Scholarly Journal and STM Publishers

Scholarly journal publishers also provide articles for download these days. The most popular distribution format for journal articles continues to be [PDF] as a direct reflection of the scholarly community which highly prioritizes linear text and preservation of print typography. Indeed, the original goal for scholarly publishers to make files available online was to enable readers to download and print content directly, instead of borrowing a paper copy of a journal issue and photocopying relevant articles.

But things are changing. First of all, Web-only publications are becoming part of the mainstream (e.g., the multidisciplinary PLOS ONE or the new PeerJ CompSci journals), with the main content being published using traditional Web technologies like HTML and CSS. And there is much more. Scholarly communication increasingly uses additional media such as video, audio, animated graphics, or very large images, and the trend is to consider these as integral parts of the scientific output. (Mike Bostock’s recent article on visualizing algorithms or the “live” presentation of data in a paper published by F1000 Research are good examples of the new possibilities.) Furthermore, publishing the scientific data sources, like the results of a sociological survey or the measurement output of biochemical experiments in XML or CSV formats, alongside the “main” publication is also coming to the fore, with some journals and institutions actually requiring public access to those. Gaining access to all these various media and contents both online and offline is important for scholars, whether the goal is to read the publication on the Web or to download the papers for various reasons: reading the article offline, including the paper in bibliographic management systems like Mendeley or Zotero, or peer-reviewing submissions. Any offline format for scholarly purposes should be adapted to these needs.

Having essentially identical online and offline versions of the same content, including the usage of various media, leads to similar reading experiences whether online or offline (see, e.g., [Sigarchian]). User annotations, formal reviews, etc., performed by the scholar on a small, mobile device while offline can be automatically synchronized with the online version as soon as there is Internet access. Being based on a general archival format, Portable Web Publications provide an easy way to consistently include video, audio, interactive scripts, and any kind of data, and can naturally contain active links to the scientific data published elsewhere on the Web in case the data is too large to be distributed offline. These and other possibilities provided by Portable Web Publications may contribute to fundamentally changing the way scholarly publishing works.

3.3 In-house Publishers

A special form of document production is related to technical and/or user documentation of complex products as well as complex administrative documents. Such documents are often akin to STM or scholarly publications edited by traditional trade or scholarly publishers but, often, the sheer quantity and complexity of production, as well as confidentiality requirements, mean that the production is done in-house. In many respects major corporations such as IBM, Intel, Renault, or Boeing, or institutions like the European Commission, the FAO, or UNESCO have become specialized publishers themselves.

The quantity of documentation makes it infeasible to produce these documents in print (or print-only); instead, publishing them on the public Web, an Intranet, and/or providing them through specialized mobile devices is the viable alternative. The production of these documents has similar challenges to scholarly publications like accessibility issues, portability of annotations, or the possible inclusion of complex media.

Just as for scientific publications, Portable Web Publications will provide new possibilities for these types of documents. Documentation in Portable Web Publications can be used offline in, for example, a cockpit, while being easily updated through the Web when possible. Inclusion of interactive animation, explanations, etc., becomes easy thanks to the possibilities provided by the Open Web Platform, whether online or offline.

3.4 Reading System developers

Reading system developers will also benefit. It is already true today that, due to the large-scale use of Open Web Platform technologies in publishing formats, reading systems often rely on existing Web browser “cores”. This means that the development of these reading systems already benefits from a level of synergy insofar as they can rely on software development done elsewhere. Making Portable Web Publications “native” to browsers will mean that an even larger percentage of the necessary software will be available as part of the “core”, and developers can concentrate on book-specific issues such as specialized user interfaces or connection to online bookstores.

But the main advantage of Portable Web Publications for reading systems is a vastly larger user base. Whilst, today, reading systems are mainly used to read traditional novels, the introduction of Portable Web Publications will open up new possibilities for, e.g., scholarly and educational use, journals and magazines, governmental usage, etc.

3.5 Web page designers

The synergy between the traditional publishing community and Web site designers may help greatly improve the quality of overall Web page design. Indeed, the publishing community has significant experience with issues like ergonomics, complex layout design, paged layout, or user interface problems when consuming, for example, long, elaborate, and mostly linear content. Publishers also have experience in proper editorial and curatorial workflows for producing content, which can be easily transposed from traditional publishing to Web site production.

Another aspect of Web page design is its easy adaptation to various environments. Creating documents on the Web that can be displayed without compromise both on traditional screens and on mobile devices with varying screen sizes is already a growing trend today; with Portable Web Publications, users will be able to create digitally native documents easily, whether the document is viewed online or offline.

3.6 Web browsers

Generation of an offline version of a Web page (mainly in terms of very long and complex content) is an area where browsers will benefit from Portable Web Publications. Such a facility is important: when roaming charges are high, or when internet access is of low quality or not available at all, users need the possibility to create, in an ad hoc and easy manner, an offline version of the Web page they are reading. Several browsers offer such facilities already, albeit in mutually incompatible formats. Making Portable Web Publications native to a browser means standardizing an archive format that can be used through a suitable user interface by anyone using a browser. Some of the facilities required by reading systems are also extremely useful for “traditional” Web content; annotation facilities are an obvious example. A joint development will therefore provide a welcome addition to the core browser facilities.

It must be emphasized, however, that Portable Web Publications are not meant to create an offline version of any Web page; the emphasis is on Web publications, not on duplicating the Web, so to speak. For example, it is not the goal of Portable Web Publications to store the page of a Web-based email client. The exact boundaries and limitations are specified in the separate section 4. Terminology.

Note that, technically, the inclusion of Portable Web Publication capabilities in browsers is a matter of enhancement rather than addition of completely new capabilities: because Portable Web Publications are based on core Web technologies, the “extras” needed to make them a native feature of the Web are limited to a small set of tasks like handling packages, dealing with features like reading order and dynamic pagination, and displaying tables of contents. An important goal of Portable Web Publications will be to further streamline these tasks, all while making sure it is feasible to include Portable Web Publications content handling even in mobile environments, where the computing and memory limits are more demanding.

3.7 Libraries and archival services

The archiving of digital assets is coming to the fore as a significant issue for dedicated institutions like national libraries. With the arrival of highly dynamic and possibly interactive Web Publications as primary content, the traditional means of archiving (i.e., storing an XML or HTML page on some backup device for long-term preservation) is no longer adequate. Web Publications depend on a multitude of auxiliary files, like CSS style sheets, images, videos, JavaScript programs, etc. The completeness of a Portable Web Publication has a significant role to play in this respect: combined with archiving, it provides a means to store the content offline, making it appropriate for archival purposes.

3.8 Users

Users will benefit, arguably the most, from a convergence of efforts between Portable Web Publications and other uses of Web technologies. Users will have the choice among different reading systems for the same content, ranging from specialized devices to traditional Web browsers. Beyond the overall qualities of the reading environment, the choice can also be made based on the content and usage: whereas a specialized device would work well for reading a novel on the beach, a Web browser or a high-end tablet may be preferred to consume highly interactive educational content in a classroom. Publishers do not have to make this decision: users can do that. The same content can also smoothly migrate from one device or system to another, possibly carrying notes and annotations, but also the possibility to fill in interactive Web forms offline (and “pushing” the results to their destination when online again). Features for people with disabilities will also be provided consistently, whether the content is a portable document or a Web page.

4. Terminology

This document is based on the following definitions.

4.1 Portable Web Publications

Note

See the separate section 5.4 Addressing and Identification for further details on addressing Web Publications.

Web Resources in a (Portable) Web Publication are based on the core Open Web Platform technologies like [html5], [svg], [css21] and CSS3 modules, or [ECMAScript] APIs. A Web Publication may also include images, audio and video, metadata files, executable code in, for example, iPython or Maple scripts, or data files in [CSV]: Web Resources that are needed to render the essential content of the publication.

The differences between the distinguishing characteristics of Web Publications and Portable Web Publications can be viewed as situational and gradual rather than as representative of bright-line distinctions. Consider the following example:

Note

The concepts of content, essential content, and functionality have been taken over from the W3C Web Content Accessibility Guidelines [WCAG20], though slightly modified for this context.

The reference to the File URI Scheme [file-uri-scheme] may be removed in future.

4.2 States of a Portable Web Publication

The states of a Portable Web Publication can be described along two different axes. These different states require different behavior from the user agent, while some of the characteristics of the publication may be invariant across states. The different states are as follows:

  1. States related to the organization of the Web Resources: A Portable Web Publication is in
    • Packed State: when all constituent Web Resources are combined into one unit for storage in a file system, network transfer, etc.
    • Unpacked State: when all constituent Web Resources can be directly accessed individually through standard network protocols like HTTP, FTP, etc., or through file system access
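  2. States related to the access of Web Resources: A Portable Web Publication is in
    • Protocol State: when the Web Resources are accessed through a network protocol such as HTTP
    • File State: when the Web Resources are accessed through a file system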

The table below shows the same publication (PWP) in the most typical states:

          | Protocol                                        | File
Packed    | PWP as one archive file on a Web Server         | PWP as one archive file on a local disc
Unpacked  | PWP spread over several files on a Web Server   | PWP spread over several files on a local disc

Note

The difference between protocol and file states is not identical, although close, to the difference between the commonly used notions of “online” vs. “offline”. A PWP can be on a local disc but accessed through a Web Server running on that machine (i.e., through a http://localhost URL): the PWP is in a protocol state, though clearly “offline”. Similarly, a remote file system can be mounted as a local disc, in which case a PWP can be accessed as a file, i.e., is in a file state, though possibly “online”.
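
The distinction made in this note can be illustrated with a small Python sketch. It assumes, purely for illustration, that the access state can be read off the scheme of the locator; the names Access and access_state are hypothetical and not part of any specification.

```python
from enum import Enum
from urllib.parse import urlsplit


class Access(Enum):
    PROTOCOL = "protocol"
    FILE = "file"


def access_state(locator: str) -> Access:
    """Classify a locator by how its resources are reached, not by where they live."""
    scheme = urlsplit(locator).scheme.lower()
    # Assumption: a file: URL or a bare path means file state; every other
    # scheme (http, https, ftp, ...) is treated as protocol state.
    return Access.FILE if scheme in ("", "file") else Access.PROTOCOL


# A publication on the local disc, served by a local Web server: protocol state, yet "offline".
local_server = access_state("http://localhost:8000/book/index.html")

# A remote file system mounted locally: file state, yet possibly "online".
mounted_share = access_state("file:///mnt/remote/book/index.html")
```

In this sketch the classification depends only on the access mechanism, which is why it does not coincide with the everyday notions of “online” and “offline”.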

5. Achieving convergence: work areas

This section lists some of the work areas that activities around Portable Web Publications should engage in. The list is not exhaustive and there are only hints at the technical solutions; one of the main goals of the work ahead will be to clarify the requirements and technical details. It must be emphasized that the solutions to these problems may not come exclusively from W3C, but possibly from other, external organizations (document identification is a typical example).

5.1 General Architecture for Portable Web Publications

The latest evolution of browser technologies around Web Workers [web-workers] and Service Workers [service-workers] may fundamentally change the way browsers operate in terms of offline/online. Service Workers will provide a flexible and programmable way to efficiently implement local caching of Web Resources. Caching is implemented as a programmable network proxy, meaning that the browser’s rendering engine becomes oblivious to whether a resource originates from the local cache or directly from the network.

This evolution makes it possible to bring Portable Web Publications under the same abstraction as a Web Resource identified by a URL. A specialized service worker can cache resources, i.e., it can bridge the differences between the offline and online access of a document. Furthermore, with additional functionality, the service worker can also deal with possible (un)packaging as well as bridge the differences between protocol and file states. In other words, the core rendering engine of a user agent can operate as if the Portable Web Publication were in unpacked and protocol states, with all resources available; the differences are handled by the service worker acting as a proxy.
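
The proxy role described here can be modelled in a few lines of Python. This is a conceptual sketch only: it does not use the Service Worker API (which is a browser JavaScript API), and the class name CachingProxy and its methods are hypothetical. It shows a cache-first strategy in which the caller cannot tell whether a response came from the cache or from the network.

```python
from typing import Callable, Dict


class CachingProxy:
    """Toy stand-in for the proxy role of a service worker (not the real API)."""

    def __init__(self, network_fetch: Callable[[str], bytes]):
        self._network_fetch = network_fetch
        self._cache: Dict[str, bytes] = {}

    def fetch(self, url: str) -> bytes:
        # Cache-first: answer from the local cache when the resource is there.
        if url in self._cache:
            return self._cache[url]
        # Otherwise go to the network and remember the result for later use.
        body = self._network_fetch(url)
        self._cache[url] = body
        return body


network_calls = []


def fake_network(url: str) -> bytes:
    # Stands in for a real HTTP request; records each call for inspection.
    network_calls.append(url)
    return b"<p>Chapter 1</p>"


proxy = CachingProxy(fake_network)
first = proxy.fetch("http://www.ex.org/doc.pwp/chapter1.html")
second = proxy.fetch("http://www.ex.org/doc.pwp/chapter1.html")
```

Both calls return the same bytes, but only the first one reaches the (simulated) network. A proxy handling packed publications could, under the same interface, fill its cache by unpacking an archive instead.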

Fig. 2 Accessing a Portable Web Publication in an unpacked state directly from the Web. (Image: a laptop whose web browser accesses the unpackaged publication on a server through HTTP requests.)

Fig. 3 Accessing a Portable Web Publication in packed state from the Web via a Service Worker. (Image: a laptop whose web browser accesses the packaged publication on a server via a Service Worker.)

Note

This architecture needs some special care when handling a Portable Web Publication on a local file system. This is because Service Workers operate using the HTTPS (or HTTP for localhost) protocol only, i.e., the file:/// protocol is not understood. A possible approach, already implemented in some experimental code, is to start up a local HTTP server serving the resources within the Web Publication. See also Issue 11.
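
The local-server approach mentioned in this note can be sketched with the Python standard library. This is a minimal illustration, not the experimental code referred to above; the function name serve_publication is hypothetical, and the sketch assumes the publication is already unpacked into a directory.

```python
import functools
import threading
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def serve_publication(root_dir: str, port: int = 0) -> ThreadingHTTPServer:
    """Serve an unpacked publication from root_dir on localhost.

    Passing port 0 lets the operating system pick a free port; the chosen
    port is available afterwards as server.server_address[1].
    """
    handler = functools.partial(SimpleHTTPRequestHandler, directory=root_dir)
    server = ThreadingHTTPServer(("127.0.0.1", port), handler)
    # Run the server in a background thread so the caller keeps control.
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

A caller would open http://127.0.0.1:<port>/ in the browser and, when done, call server.shutdown() followed by server.server_close(). The publication is then in a protocol state even though it never leaves the local machine.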

5.2 Archive formats

Regardless of the details of the practical architecture realizing Portable Web Publications, an archive format is necessary for the storage of a publication as one file (e.g., for distribution or possibly archiving) defined as the packed state of a Portable Web Publication. A variety of formats for offline/archival storage of collections of digital resources exist today [OCF], [ODF], [OOXML], but none of them is universally recognized and supported across all ecosystems. Depending on the general architecture, Portable Web Publications may use one of the deployed formats (e.g., the current EPUB packaging format based on [OCF]), or an archive format that is generic and native to the Open Web Platform.
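
The round trip between unpacked and packed states can be illustrated with a plain ZIP archive, the container on which [OCF] is based. The sketch below is an assumption-laden illustration, not a proposal for the Portable Web Publication archive format: it omits everything OCF adds on top of ZIP, and the function names pack and unpack are hypothetical.

```python
import io
import zipfile
from typing import Dict


def pack(resources: Dict[str, bytes]) -> bytes:
    """Combine all resources (path -> content) into one archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(resources):
            archive.writestr(path, resources[path])
    return buffer.getvalue()


def unpack(packed: bytes) -> Dict[str, bytes]:
    """Recover the individual resources from the archive."""
    with zipfile.ZipFile(io.BytesIO(packed)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}


publication = {
    "chapter1.html": b"<h1>Chapter 1</h1>",
    "css/style.css": b"h1 { font-weight: bold; }",
}
restored = unpack(pack(publication))
```

The restored dictionary is byte-for-byte equal to the original, which is the property the packed state needs: packing changes the organization of the resources, not their content.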

W3C’s Web Platform Working Group has published a Working Draft for a Streamable Package Format for the Web [web-packaging] to encompass the needs of various applications (like installing Web Applications or downloading data for local processing). It is not clear at this moment whether browsers will adopt this format, though.

However, the importance of streaming is not paramount for Portable Web Publications. Indeed, the same publication may be accessed by the same user from different clients; if some user-dependent management also keeps track of the latest reading position in the publication, switching from one client to the other may mean that a client would have to “jump” into the content, thereby bypassing streaming. Nevertheless, if browsers eventually do converge towards a browser- and streaming-friendly packaging format, adopting it for Portable Web Publications may become a real alternative. The community will have to balance native browser availability against the wide availability of tooling and industry distribution based on [OCF].

Note

The IETF has published an informational draft on a top-level media type for archives. Although that draft does not specify a specific archive format, and the work is currently on hold, it shows the overall interest in packaging on the Web in line with the concerns of Portable Web Publications.

5.3 Publication Manifests

The manifest of a Portable Web Publication is a Web Resource that includes information pertaining to the overall publication structure, such as the default logical reading order(s) of the set of resources that comprise the publication (the “spine”), as well as predictable user-facing meta-structures, such as one or several tables of contents, glossaries, etc. As described in the separate section 5.4 Addressing and Identification, that overall structural information may be what a client would return on a request for a Portable Web Publication instance, as an abstraction for a collection of Web Resources. For Portable Web Publications it may be imperative to optimize these data structures in a way that is native to the Open Web Platform and more easily supported in authoring tools, browsers, and Reading Systems.

Note

The Manifest for web application format [web-manifest], currently developed at W3C, is one example of a technology that could potentially be used, or adapted, to achieve this goal. Note, however, that the [web-manifest] specification is (currently) geared towards Web Applications, which may not make it suitable for direct adoption for Portable Web Publications; hence the possible need to adapt it rather than use it directly.

Note

Whilst these information objects are important for larger and/or more complex publications, they are unnecessary for other use cases of Portable Web Publications. A typical example is the archival of a single Web page with all the necessary style sheets, images, and similar resources. The definition for Portable Web Publications should therefore include the definition of a set of “defaults”, i.e., it should not require the presence of, say, a spine if the publication contains one single HTML file.
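
The idea of “defaults” can be made concrete with a short Python sketch. The manifest is modelled as a plain dictionary; the field names "spine" and "toc", and the function apply_manifest_defaults, are hypothetical and chosen only for illustration, since no manifest format has been defined.

```python
from typing import Any, Dict, List


def apply_manifest_defaults(manifest: Dict[str, Any], resources: List[str]) -> Dict[str, Any]:
    """Return a copy of the manifest with missing structures filled in.

    Assumed rule: if no spine is given and the publication contains exactly
    one HTML file, that file is the whole reading order.
    """
    completed = dict(manifest)
    html_files = [r for r in resources if r.lower().endswith((".html", ".xhtml"))]
    if "spine" not in completed:
        if len(html_files) == 1:
            completed["spine"] = html_files
        else:
            raise ValueError("a spine is needed unless there is exactly one HTML file")
    # An absent table of contents is treated as empty rather than as an error.
    completed.setdefault("toc", [])
    return completed


# Archiving a single Web page: no manifest content has to be authored at all.
single_page = apply_manifest_defaults({}, ["index.html", "style.css", "logo.png"])
```

Here single_page["spine"] is ["index.html"]. A multi-chapter publication would still have to state its reading order explicitly, because it cannot be inferred.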

5.4 Addressing and Identification

HTTP(S) URLs serve as the fundamental method for addressing a resource, or a fragment thereof, on the Web. Such URLs can also be used to uniquely identify a resource; however, conceptually, the roles of addressing and identification are different. Both of these functionalities should be available for Portable Web Publications: a publication should be uniquely identified for, e.g., library catalogues or archival, and a resource locator should be available so that the user can access the content. In other words, a Web Publication (whether portable or not) SHOULD have both one, or possibly several, identifier(s) and one locator. These may be identical but may also be different: e.g., an identifier may refer to a specific publication by a publisher (e.g., using an ISBN), whereas the locator may refer to a personal copy of that publication that the owner can freely annotate for personal use.

Note

A typical, and extremely important, use case for the presence of an identifier beyond the need for a locator is in academic and scholarly publishing. There are currently several methods for citing online works, but there is no equivalent standard method for citations to ebooks. Even if a reflowable ebook is used by a scholar, the author must refer to a PDF, paper copy, or HTML version to cite it in her bibliography. Identifiers attached to Portable Web Publications should enable stable citations.

Note

A general [URI] (which includes the notion of [URL]) can serve as an identifier using, e.g., the [ISBN-URN] or [UUID] schemes. In other words, an identifier does not necessarily resolve to a location on the Web, although it is good practice to have a dereferenceable identifier.

There is no ubiquitously accepted method for identifying a publication among the various document formats (whether electronic or printed). Within the scholarly publishing industry, for example, initiatives such as DOI and CROSSREF have addressed this problem, whereas traditional “trade” publishing relies more on ISBN-related services. Some of these identifier schemes provide resolver services or a “standard” representation in terms of [URL]. The definition of Portable Web Publication should be oblivious to the exact identification used; this issue is left to specialized services and industry organizations.

Do we need a more general form of identification to represent scopes?

For the purpose of the general architecture of Portable Web Publications this document concentrates on locators; as far as identifiers are concerned, the only important point is that such identifiers SHOULD also exist and be made available, and that they should be stable across, for example, copying the publication or changing its location on the Web. Also, based on the discussion in 5.1 General Architecture for Portable Web Publications, only the unpacked and protocol states of Portable Web Publications are discussed.
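
The separation of identifiers and locator, and the stability requirement on identifiers, can be sketched as follows. The PublicationRecord type and the relocate function are hypothetical; the ISBN shown is a placeholder, not a real ISBN, and the second locator is an invented example URL.

```python
import uuid
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class PublicationRecord:
    identifiers: Tuple[str, ...]  # one or several URIs naming the publication
    locator: str                  # the one URL where this copy can be accessed


def relocate(record: PublicationRecord, new_locator: str) -> PublicationRecord:
    """Move or copy a publication: the locator changes, the identifiers do not."""
    return replace(record, locator=new_locator)


original = PublicationRecord(
    identifiers=("urn:isbn:0-000-00000-0", uuid.uuid4().urn),  # placeholder ISBN
    locator="http://www.ex.org/doc.pwp",
)
personal_copy = relocate(original, "http://reader.example.org/my-copy.pwp")
```

After relocation, personal_copy.identifiers is identical to original.identifiers, so a citation based on an identifier keeps working even though the two copies live at different URLs.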

There are a number of different aspects of addressing that should be specified for a Portable Web Publication. Some of these are as follows:

5.4.1 What should be the response of an HTTP(S) GET request?

Conceptually, it should be the manifest of the publication which, in fact, “represents” the Web Publication as a whole. In practice, this means providing access to the corresponding manifest file through either:

  • returning the manifest itself; or
  • returning other content (e.g., the first chapter in the document) but providing a link to the manifest through a Link header in the HTTP response; or
  • returning HTML content (e.g., the first chapter in the document) but providing a link to the manifest through a <link> element.

These possibilities are not mutually exclusive.
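
The three possibilities above can be sketched from the client's point of view. The following Python function is an illustration only: the link relation name "manifest", the media type "application/manifest+json", and the function name find_manifest_url are assumptions borrowed from or inspired by [web-manifest], not values defined for Portable Web Publications, and the Link header parsing is deliberately simplified.

```python
import re
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin

# Assumptions for illustration only; this document does not define either value.
MANIFEST_REL = "manifest"
MANIFEST_MEDIA_TYPE = "application/manifest+json"


class _LinkFinder(HTMLParser):
    """Remembers the href of the first <link> element whose rel includes MANIFEST_REL."""

    def __init__(self) -> None:
        super().__init__()
        self.href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.href is not None:
            return
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").lower().split()
        if MANIFEST_REL in rels and attributes.get("href"):
            self.href = attributes["href"]


def find_manifest_url(url: str, headers: dict, body: str) -> Optional[str]:
    """Return the manifest URL advertised by a GET response, or None."""
    content_type = headers.get("Content-Type", "").split(";")[0].strip().lower()

    # Case 1: the response is the manifest itself.
    if content_type == MANIFEST_MEDIA_TYPE:
        return url

    # Case 2: a Link header such as: <manifest.json>; rel="manifest"
    for target, params in re.findall(r"<([^>]*)>([^,]*)", headers.get("Link", "")):
        match = re.search(r'rel\s*=\s*"?([^";]*)"?', params, re.IGNORECASE)
        if match and MANIFEST_REL in match.group(1).lower().split():
            return urljoin(url, target)

    # Case 3: a <link> element inside an HTML body.
    if content_type == "text/html":
        finder = _LinkFinder()
        finder.feed(body)
        if finder.href:
            return urljoin(url, finder.href)

    return None
```

Because the possibilities are not mutually exclusive, the order of the checks is a design choice of this sketch; a real client might prefer a different precedence.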

5.4.2 What is the URL for one of the resources within a Portable Web Publication?

I.e.: if the http://www.ex.org/doc.pwp locator refers to the Portable Web Publication as a whole, what is the URL for one of its constituent resources, e.g., chapter1.html?

The simplest approach is to consider the URL of the Web Publication as the base for all relative URLs, meaning that the locations of the constituents are of the form http://www.ex.org/doc.pwp/chapter1.html, where http://www.ex.org/doc.pwp is the URL of the Web Publication itself. In other words, the locator of the Web Publication sets the “context”.

This simple approach works well if the Web Publication is created from bottom up; however, it does rely on a particular organization of the resources. The scheme does not work if the Web Publication (which is defined to be a set of Web Resources) includes resources at many different places on the Web. This issue may be mitigated by adding mappings (“virtual redirections”) of URLs in the manifest of the Web Publication.
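
A minimal sketch of this resolution scheme, assuming a hypothetical manifest that exposes its “virtual redirections” as a simple mapping from relative paths to absolute URLs (the function names and the shape of the mapping are illustrative, not part of any specification):

```python
from urllib.parse import urljoin


def constituent_url(publication_url: str, relative_path: str) -> str:
    """Resolve a constituent resource against the publication's locator."""
    # Standard relative resolution replaces the last path segment of the base
    # unless the base ends in "/", so the locator is treated as a directory.
    base = publication_url if publication_url.endswith("/") else publication_url + "/"
    return urljoin(base, relative_path)


def resolve(publication_url: str, relative_path: str, redirections: dict) -> str:
    """Use a manifest-provided mapping if present, else the publication base."""
    if relative_path in redirections:
        return redirections[relative_path]
    return constituent_url(publication_url, relative_path)


# Hypothetical mapping for a resource hosted elsewhere on the Web.
redirections = {"fonts/body.woff": "http://fonts.example.net/body.woff"}
print(resolve("http://www.ex.org/doc.pwp", "chapter1.html", redirections))
print(resolve("http://www.ex.org/doc.pwp", "fonts/body.woff", redirections))
```

Note that ordinary relative URL resolution against http://www.ex.org/doc.pwp (without a trailing slash) would yield http://www.ex.org/chapter1.html, so any eventual definition would need to state explicitly how the publication's locator acts as a base.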

Note

When using virtual redirections, a particular constituent Web Resource may end up having two URLs: one is its intrinsic URL as a resource on the Web, the other is derived from the URL of the Web Publication.

Note

Another alternative that is used in practice is to use an explicit separator between the URL for the publication and the rest, yielding, e.g., http://www.ex.org/doc.pwp!chapter1.html. It is not clear what advantages this approach would have.

5.4.3 How does one locate the internal content of a particular constituent resource?

The definition of Web Publications should not include any new mechanism at this point; they should rather entirely rely on fragment identifiers as defined and widely used on the Web.

Note

As an example of the power of fragment identifiers, the W3C Annotation Working Group has a joint deliverable with the W3C Web Platform Working Group called [FindText]. Although the specification aims at a programmable API, it may also provide a general framework for identifying specific textual content within, e.g., an HTML file through the definition of a fragment identifier. Similarly, the W3C Media Fragments specification[media-frags] may prove useful to address some of the use cases.
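
To illustrate how existing fragment identifiers could be consumed without any new mechanism, the following Python sketch separates the fragment from a constituent resource's URL and interprets a temporal fragment in the style of [media-frags]. The function names and the example URLs are hypothetical, and only plain second values (e.g., t=10,20) are handled; the other time formats allowed by [media-frags] are out of scope for this sketch.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs, urldefrag


def split_locator(url: str) -> Tuple[str, str]:
    """Split a URL into the resource URL and its fragment identifier."""
    resource, fragment = urldefrag(url)
    return resource, fragment


def temporal_range(fragment: str) -> Optional[Tuple[float, Optional[float]]]:
    """Parse a temporal fragment such as 't=10,20' into (start, end) seconds.

    Returns None when the fragment carries no 't' dimension. A missing start
    defaults to 0.0 and a missing end is reported as None.
    """
    values = parse_qs(fragment).get("t")
    if not values:
        return None
    start_text, _, end_text = values[-1].partition(",")
    start = float(start_text) if start_text else 0.0
    end = float(end_text) if end_text else None
    return start, end


resource, fragment = split_locator("http://www.ex.org/doc.pwp/intro.mp4#t=10,20")
print(resource, temporal_range(fragment))
```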

5.5 Metadata: discovery

Throughout the digital publishing industry, highly specialized metadata vocabularies, and serialization forms thereof, are being used. Within book trade publishing as an example, ONIX[ONIX] has attained a dominant status as a metadata package that typically exists (in XML form) independently of the publication, and contains not only bibliographic metadata, but also trade information such as pricing. Scholarly publishing, on the other hand, often uses various derivatives of the ubiquitous BibTeX vocabulary.

While not contradicting the obvious use cases for out-of-line metadata records as used by publishers, retailers and libraries, Portable Web Publications must define a syntax for basic in-line metadata records that is agnostic to the publication’s states. This means that the syntax must seamlessly support discovery and harvesting by both generic Web search engines and dedicated bibliographic/archival/retailer systems. While it is expected that Portable Web Publications will define a minimal set of required metadata (cf. the section 5.4 Addressing and Identification), development and adoption of other vocabularies in Portable Web Publications will most likely be deemed out of scope. In other words, domain-specific metadata requirements are up to the domains themselves to define via a profiling mechanism, or similar yet-to-be-defined means.

Note

The adoption of HTML as the vehicle for expressing publication-level metadata (i.e., using RDFa[html-rdfa] and/or Microdata[microdata] for metadata, like authors or title) would have the added benefit of better I18N support than XML or JSON formats.
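
As a rough sketch of what harvesting such in-line metadata could look like, the following Python code collects the values of RDFa property attributes from an HTML fragment. The use of the schema.org vocabulary, the sample content, and the class and function names are assumptions made for illustration; the parsing is a simplification that ignores nesting, prefixes, and the other processing rules of [html-rdfa].

```python
from html.parser import HTMLParser

# Hypothetical publication front matter using schema.org terms (an assumption).
SAMPLE = """
<body vocab="http://schema.org/" typeof="Book">
  <h1 property="name">An Example Publication</h1>
  <span property="author">A. N. Author</span>
  <meta property="inLanguage" content="en">
</body>
"""


class PropertyHarvester(HTMLParser):
    """Collects property values from content attributes or element text."""

    def __init__(self) -> None:
        super().__init__()
        self.metadata = {}
        self._open = None  # property whose text content is being collected

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        name = attributes.get("property")
        if not name:
            return
        if attributes.get("content") is not None:
            # An explicit content attribute takes the place of the element text.
            self.metadata.setdefault(name, []).append(attributes["content"])
        else:
            self._open = name
            self.metadata.setdefault(name, []).append("")

    def handle_data(self, data):
        if self._open is not None:
            self.metadata[self._open][-1] += data

    def handle_endtag(self, tag):
        # Simplification: any end tag stops text collection (no nesting support).
        self._open = None


def harvest(html: str) -> dict:
    """Return a mapping from property name to the list of values found."""
    harvester = PropertyHarvester()
    harvester.feed(html)
    return harvester.metadata


print(harvest(SAMPLE))
```

The same markup remains readable by generic Web search engines, which is the point of keeping basic metadata in-line rather than in a separate record.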

5.6 Styling and Layout, Pagination

As outlined in [dpub-latinreq] or [dpub-css-priorities], the Open Web Platform in general, and CSS in particular, is still lacking solutions for meeting all of the publishers’ expectations on satisfactory typography and layout for digital publications. While improved presentation fidelity will be of paramount importance to the overall success and adoption rate of Portable Web Publications, it is clear that many of these issues are going to be addressed on a case-by-case basis by the CSS Working Group over a longer period of time. STM publishing, for example, where the faithful representation and rendering of, say, mathematical or chemical formulae is of paramount importance, has particularly severe requirements that must be fulfilled by the Open Web Platform technologies. Similarly, dynamic pagination of reflowable content is not natively supported by browsers today, and as a result Reading System developers, for example, are forced to implement pagination using various ad-hoc approaches, all coming with a significant penalty in terms of development costs, performance and stability.

It is anticipated that native support for pagination (in CSS and/or in the DOM) is going to be put forward by stakeholders as a critical component of Portable Web Publications; thus the finalization of Portable Web Publications may be contingent on the availability of a native pagination model for Web content.

Note

Note that the “Houdini” Task Force, recently started jointly by the W3C CSS WG and the W3C TAG, may open new avenues to handle pagination.

5.7 Security Models

The security model of the Web, based primarily on the same-origin policy and the concept of “site”, does not apply to portable documents, as the notion of “origin” is based on HTTP properties that are invalidated or non-existent when a document transitions from its online state to the portable state. Portable Web Publications must incorporate a state-agnostic security model that defines rules for both the online and portable states.

5.8 Presentation Control and Personalization

When reading long-form (and sometimes mission-critical) publications, personalization (i.e., the ability for users to adapt the presentation to suit their needs) is of great importance. While technologies such as CSS Media Queries have come a long way in terms of adapting content to devices, this is not the same thing as adapting to a user. Presentation control features are often available in e-book readers of different kinds, for example the possibility to dynamically change font size or background/foreground color schemes, but implementations are brittle and limited due to the lack of an underlying framework that explicitly supports user adaptation.

Portable Web Publications need to incorporate an explicit framework for achieving advanced and predictable user-triggered presentation control. (Note that from this perspective, accessibility can be seen as just a radical case of personalization.)

5.9 Models for embracing domain-specific restrictions and extensions

Different domains of digital publishing have vastly different expectations and/or requirements on the nature of the content and its presentation. In the digital comics domain, for example, the default presentation form is, traditionally at least, pre-paginated, fixed-form, and image-based, possibly with a set of omnipresent (i.e., cross-publisher) user interaction patterns that are expected to be enabled. On the other hand, for trade publishing the default form is fully reflowable content, where user interaction patterns are defined entirely by the user agent. In educational publishing, the ability to control structure and to include rich domain-specific structural semantics and extensive specialized metadata is the basis for enhanced reading system behaviors, as well as predictable content discovery and repurposing.

To allow for the predictability of content within those domains that need it, Portable Web Publications may need to incorporate a notion of “profiles” that content can be authored and validated against, and that user agent implementations can use to trigger enhanced behaviors, if any. To allow for agile feature-set extensions and innovation, Portable Web Publications profiles also need to embrace the notion of “feature addons” that can be included by a publisher without risking the integrity and functionality of the basic publication.

6. Portable Web Publications and EPUB 3

The development of the definitions of Portable Web Publications should not be made in isolation, given the diverse publishing ecosystems that already exist. Rather, the development should rely as much as possible on existing and deployed formats, which include both the various Open Web Platform technologies and digital publishing formats.

Several document formats exist in the digital publishing domain; however, the only vendor-independent and HTML-based format is EPUB 3[EPUB3], which emphasizes a dynamic determination of content presentation and a closer alignment with the Open Web Platform. EPUB 3 is built on Web Standards, and the individual items that make up an EPUB publication are identical to types of content on a Web site: [html5], [svg], [css21], [ECMAScript], [JPEG] and [png] images, etc. Various browser extensions supporting EPUB exist (e.g., Readium in Chrome, EPUBReader in Firefox). Other solutions exist for delivering these files in browsers (Readium-Cloud, EPUB.js, Safari Books Online, etc.). Publishers are actively exploring the new and possibly interactive possibilities offered by EPUB 3; a good example is Cay Horstmann’s “Big Java, Late Objects”[BigJava], which combines the feel of a traditional book with interactive learning materials that make it reminiscent of similar, Web-based tutorials (see the video on a companion page for the book).

Whilst the concept of Portable Web Publication is close to, and has been inspired by, EPUB 3, it goes beyond it, insofar as it emphasizes the need for a convergence between the offline and online usages. It would be highly desirable to deliver on the requirements on Portable Web Publications in an evolutionary manner that would build on, and would be backwards compatible with, existing EPUB 3, since the latter is already widely deployed. However, this may not be possible. At this stage, this document emphasizes all requirements envisioned for Portable Web Publications without addressing the natural tension between goals of preserving compatibility and fully achieving all these requirements in the most elegant manner. But neither this end-goal emphasis, nor the use of the new term “Portable Web Publication”, should be taken as implying a recommendation to definitely create a completely new format that would replace EPUB. Further investigation is required, and the ongoing evolutionary trajectory of EPUB 3 must also be taken into account.

Note

The current evolution of EPUB 3 towards EPUB 3.1 will address several compatibility and convergence issues with the Open Web Platform; this will make the evolution path towards Portable Web Publications easier.

It must also be emphasized that the central part of any EPUB 3 publication, namely the content, will remain unchanged or will require only minimal changes when transitioning towards Portable Web Publications. The bulk of the changes are expected to occur around the accompanying constructs like publication-level metadata records, the spine, or the packaging format of the content. As described in the section 4.1 Portable Web Publications, the content of a Portable Web Publication is based on core Open Web Platform technologies including [html5], [svg], or [css21], and other types of files like images, audio and video, metadata files, or executable code. Most of these are valid contents in EPUB 3 already; a transition towards Portable Web Publications will leave these resources and their usage intact. The envisaged changes will be mostly restricted to the implementation details of reading systems and production workflows. The evolution of the past few years of online tooling for the production of EPUB content based on the Open Web Platform (e.g., the platforms developed and used by companies like O’Reilly, Hachette, Metrodigi, or Inkling) will also greatly facilitate any transition to Portable Web Publications; adapting these tools is expected to be quite straightforward.

7. Conclusions

This document outlines a vision for the convergence between the Open Web Platform and portable documents while also significantly advancing and expanding the existing digital publishing ecosystem. The realization of this vision would require a strong cooperation between the traditional publishing and Web communities, based on a close collaboration between the W3C and other relevant organizations, like IDPF, EDItEUR, BISG, or others. While it is envisaged that most of the work could be done in one or more dedicated Working Groups (within W3C or elsewhere, depending on the exact charter), it must be emphasized that many of the features will affect and will be affected by work done elsewhere, within or outside these organizations. The starting point will be to explore and plan for the detailed technical challenges to gain a better insight into the work ahead; this exploration should be done together with the various interested communities.

A. References

A.1 Informative references

[BigJava]
Cay Horstmann. Big Java Late Objects, John Wiley & Sons, Inc., 2013
[CSV]
Yakov Shafranovich. Common Format and MIME Type for Comma-Separated Values (CSV) Files. Informational RFC. URL: http://tools.ietf.org/html/rfc4180
[ECMAScript]
ECMAScript Language Specification. URL: https://tc39.github.io/ecma262/
[EPUB3]
Garth Conboy; Matt Garrish; Markus Gylling; William McCoy; Murata Makoto; Daniel Weck. EPUB 3 Overview. Recommended Specification. URL: http://www.idpf.org/epub/301/spec/epub-overview-20140626.html
[FindText]
Doug Schepers. FindText API. 15 October 2015. W3C Working Draft. URL: http://www.w3.org/TR/findtext/
[ISBN-URN]
J. Hakala; H. Walravens. Using International Standard Book Numbers as Uniform Resource Names. Informational RFC. URL: http://tools.ietf.org/html/rfc3187
[JPEG]
Eric Hamilton. JPEG File Interchange Format. September 1992. URL: http://www.w3.org/Graphics/JPEG/jfif3.pdf
[OCF]
James Pritchett; Markus Gylling. EPUB Open Container Format (OCF) 3.0. Recommended Specification. URL: http://www.idpf.org/epub/301/spec/epub-ocf-20140626.html
[ODF]
Michael Brauer; Patrick Durusau; Gary Edwards; David Faure; Tom Magliery; Daniel Vogelheim. Open Document Format for Office Applications v1.0. Oasis Standard. URL: https://www.oasis-open.org/committees/download.php/12572/OpenDocument-v1.0-os.pdf
[ONIX]
EDItEUR. ONIX for Books 3.0.2 Specification.
[OOXML]
ECMA International. Office Open XML File Formats, ECMA-376. Standard ECMA 376. URL: http://www.ecma-international.org/publications/standards/Ecma-376.htm
[PDF]
Document management — Portable document format — Part 1: PDF. ISO.
[Sigarchian]
Hajar Ghaem Sigarchian et al. EPUB3 for Integrated and Customizable Representation of a Scientific Publication and its Associated Resources. In: Proceedings of the 4th Workshop on Linked Science 2014. URL: http://linkedscience.org/wp-content/uploads/2014/10/lisc2014_submission_3.pdf
[URI]
T. Berners-Lee; R. Fielding; L. Masinter. Uniform Resource Identifier (URI): Generic Syntax. January 2005. Internet Standard. URL: https://tools.ietf.org/html/rfc3986
[URL]
Anne van Kesteren; Sam Ruby. URL. 9 December 2014. W3C Working Draft. URL: http://www.w3.org/TR/url-1/
[UUID]
P. Leach; M. Mealling; R. Salz. A Universally Unique IDentifier (UUID) URN Namespace. Informational RFC. URL: http://tools.ietf.org/html/rfc4122
[WCAG20]
Ben Caldwell; Michael Cooper; Loretta Guarino Reid; Gregg Vanderheiden et al. Web Content Accessibility Guidelines (WCAG) 2.0. 11 December 2008. W3C Recommendation. URL: http://www.w3.org/TR/WCAG20/
[css21]
Bert Bos; Tantek Çelik; Ian Hickson; Håkon Wium Lie et al. Cascading Style Sheets Level 2 Revision 1 (CSS 2.1) Specification. 7 June 2011. W3C Recommendation. URL: http://www.w3.org/TR/CSS2
[dpub-css-priorities]
Dave Cramer. Priorities for CSS from the Digital Publishing Interest Group. 20 August 2015. W3C Working Draft. URL: http://www.w3.org/TR/dpub-css-priorities/
[dpub-latinreq]
Dave Cramer. Requirements for Latin Text Layout and Pagination. 30 September 2014. W3C Working Draft. URL: http://www.w3.org/TR/dpub-latinreq/
[file-uri-scheme]
M. Kervin. The file URI Scheme. Internet Draft. URL: https://tools.ietf.org/html/draft-ietf-appsawg-file-scheme-03
[html-rdfa]
Manu Sporny. HTML+RDFa 1.1 - Second Edition. 17 March 2015. W3C Recommendation. URL: http://www.w3.org/TR/html-rdfa/
[html5]
Ian Hickson; Robin Berjon; Steve Faulkner; Travis Leithead; Erika Doyle Navara; Edward O'Connor; Silvia Pfeiffer. HTML5. 28 October 2014. W3C Recommendation. URL: http://www.w3.org/TR/html5/
[media-frags]
Raphaël Troncy; Erik Mannens; Silvia Pfeiffer; Davy Van Deursen. Media Fragments URI 1.0 (basic). 25 September 2012. W3C Recommendation. URL: http://www.w3.org/TR/media-frags/
[microdata]
Ian Hickson. HTML Microdata. 29 October 2013. W3C Note. URL: http://www.w3.org/TR/microdata/
[png]
Tom Lane. Portable Network Graphics (PNG) Specification (Second Edition). 10 November 2003. W3C Recommendation. URL: http://www.w3.org/TR/PNG
[service-workers]
Alex Russell; Jungkee Song; Jake Archibald. Service Workers. 25 June 2015. W3C Working Draft. URL: http://www.w3.org/TR/service-workers/
[svg]
Jon Ferraiolo. Scalable Vector Graphics (SVG) 1.0 Specification. 4 September 2001. W3C Recommendation. URL: http://www.w3.org/TR/SVG/
[web-manifest]
Manifest for web application. W3C Editor's Draft. URL: https://w3c.github.io/manifest/
[web-packaging]
Jeni Tennison. Packaging on the Web. W3C Working Draft. URL: http://www.w3.org/TR/web-packaging/
[web-workers]
Ian Hickson. Web Workers. W3C Editor's Draft. URL: https://w3c.github.io/workers/