An opinionated guide to digital publishing specifications (guest blog)

(This is a reproduction, with permission, of a blog post by Liza Daly, originally published on Safari’s blog on the 22nd of January.)

The World Wide Web Consortium (W3C) is a standards organization serving the “open web” — the set of freely available specifications that underpin most of the visible internet. In the years since the W3C was founded, all modern businesses have become “web” businesses, with their own industry-specific processes, jargon, and priorities. To that end, the W3C has formed interest groups for those industries which are adjacent to the web, with a goal to promote web technologies and ensure that the web is meeting common commercial needs.

I was co-chair for the Digital Publishing Interest Group for a time, and I have first-hand exposure to their work in interviewing publishers, documenting best practices, and writing recommendations for future specifications.

[Image: screenshot of the first table of the DPUB specification review]

One of those deliverables is an intimidating table of W3C specifications and standards that were considered relevant to digital publishing. There’s a lot to digest there, and it’s unlikely that any single human is deeply familiar with all of it. What follows is my opinionated gloss of the most relevant or active standards. Feel free to comment if I’ve disparaged or ignored your favorite specification.


The audience

I’m assuming that the reader is one of the following:

  • A developer who is working in digital publishing
  • A curious non-developer who isn’t afraid of the word “normative” and acronyms that begin with ‘X’
  • A standards wonk who wants to be more familiar with publishing activity

These are the “bread and butter” of digital publishing — whether it’s commercial ebooks, academic publishing, or journals:


HTML5 is a monster of a spec, but at least it’s reflective of current browser support. You should be familiar with the basics of markup, as well as the sections on browsers and common APIs.


There’s the workhorse CSS 2.1 specification which has been around for a decade. Unfortunately for the curious but lazy, all the cool new stuff is in CSS3, and that spec is broken out into many modules. Here’s a drive-by of the most interesting or publishing-relevant ones:

  • Start with Dave Cramer’s highly readable Requirements for Latin Text Layout and Pagination (“Latin” here means Western languages, not veni, vidi, vici). Note that this is a requirements document, not a spec, which means much of what Dave recommends won’t actually work anywhere yet. Welcome to standards!
  • CSS Text Module Level 3 is the “real world” equivalent to the above. Though it’s technically a spec in progress, almost everything in here is available in modern browsers and reading systems.
  • CSS Regions Module Level 1 is a good read when you want to be angry about something. Regions can do some amazing things for advanced layout, but there’s a long and sordid history behind their implementation and deployment. There’s a lot of momentum behind getting Regions or an equivalent standard moving again, so there’s hope.

Extra credit assignments: CSS Media Queries and CSS Fonts Module Level 3. And while it’s unlikely that you’d need to actually read the SVG and MathML specs, it’s important to be familiar with those formats at a high level.


The simplest way to approach accessible web or ebook content is to study the semantics that are built in to HTML5. High-quality semantic markup will not only help a range of human users, it’ll aid in discovery and ranking by search engines.
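To make "built-in semantics" a little more concrete, here is a minimal sketch in Python that lists the structural elements a piece of markup actually uses. The sample chapter markup and the helper name `semantic_tags` are invented for illustration, and the set of tags is a small hand-picked subset of HTML5's sectioning and grouping elements, not an exhaustive list.

```python
from html.parser import HTMLParser

# A hand-picked subset of HTML5 elements that carry structural meaning.
SEMANTIC_TAGS = {
    "header", "nav", "main", "article", "section",
    "aside", "footer", "figure", "figcaption",
}


class SemanticOutline(HTMLParser):
    """Records semantic elements in the order their start tags appear."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag in SEMANTIC_TAGS:
            self.found.append(tag)


def semantic_tags(html):
    """Return the semantic element names used in `html`, in document order."""
    parser = SemanticOutline()
    parser.feed(html)
    parser.close()
    return parser.found


# Hypothetical chapter markup, invented for this example.
SAMPLE = """
<article>
  <header><h1>Chapter 1</h1></header>
  <section><p>Body text.</p></section>
  <aside><p>A sidebar note.</p></aside>
  <footer><p>Footnotes.</p></footer>
</article>
"""

print(semantic_tags(SAMPLE))
```

A document built only from `div` and `span` would produce an empty list here, which is a rough signal that assistive technology and search engines have little structure to work with.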

Follow that up with the non-technical best practices in Web Content Accessibility Guidelines, and this overview of creating accessible interactive content.


It’s not dead yet! There’s a lot of cruft in the list, but ebooks are still required to be well-formed XML documents, and academic publishing remains dominated by XML (and, sigh, PDF).
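As a rough illustration of what "well-formed" means in practice, the sketch below uses Python's standard-library XML parser to accept or reject a fragment. The two sample strings are invented, and this checks well-formedness only; it says nothing about whether a document is valid against any particular ebook schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if `markup` parses as a well-formed XML document."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Hypothetical samples. The first closes every element, XML style.
GOOD = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    "<p>Line one<br/>line two</p>"
    "</body></html>"
)
# The second uses an unclosed <br>, which HTML tolerates but XML does not.
BAD = "<html><body><p>Line one<br>line two</p></body></html>"

print(is_well_formed(GOOD))
print(is_well_formed(BAD))
```

The unclosed `<br>` is the classic trap: browsers parsing HTML accept it, while an XML parser reports a mismatched tag.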

Bleeding edge

If everything above is old hat, check out the emerging specs on the Shadow DOM, CSS Flexible Box Layout Module Level 1 (flexbox), and Packaging on the Web.

New W3C Recommendation: Indexed Database API

The W3C Web Applications Working Group has published a W3C Recommendation of Indexed Database API. This document defines APIs for a database of records holding simple values and hierarchical objects. Each record consists of a key and some value. Moreover, the database maintains indexes over records it stores. An application developer directly uses an API to locate records either by their key or by using an index. A query language can be layered on this API. An indexed database can be implemented using a persistent B-tree data structure.
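The record-and-index model described above can be sketched conceptually in Python. This is not the browser API: the class name `RecordStore`, its methods, and the sample records are all hypothetical, and a sorted list searched with `bisect` stands in for the persistent B-tree that a real implementation might use.

```python
import bisect


class RecordStore:
    """Toy in-memory store: key/value records plus sorted secondary indexes."""

    def __init__(self):
        self._values = {}    # primary key -> record (a dict)
        self._indexes = {}   # index name -> (field, sorted [(field value, key)])

    def create_index(self, name, field):
        """Build an index over `field` for all records that have it."""
        entries = sorted(
            (value[field], key)
            for key, value in self._values.items()
            if field in value
        )
        self._indexes[name] = (field, entries)

    def put(self, key, value):
        """Insert or overwrite a record and keep every index in sync."""
        if key in self._values:
            self._unindex(key, self._values[key])
        self._values[key] = value
        for field, entries in self._indexes.values():
            if field in value:
                bisect.insort(entries, (value[field], key))

    def _unindex(self, key, value):
        for field, entries in self._indexes.values():
            if field in value:
                entries.remove((value[field], key))

    def get(self, key):
        """Locate a record directly by its key."""
        return self._values.get(key)

    def get_by_index(self, name, wanted):
        """Locate all records whose indexed field equals `wanted`."""
        _, entries = self._indexes[name]
        # (wanted,) sorts before every (wanted, key) pair, so this finds
        # the first entry for that field value.
        start = bisect.bisect_left(entries, (wanted,))
        matches = []
        for field_value, key in entries[start:]:
            if field_value != wanted:
                break
            matches.append(self._values[key])
        return matches


# Hypothetical records, invented for this example.
store = RecordStore()
store.create_index("by_author", "author")
store.put("doc-1", {"title": "First title", "author": "Author A"})
store.put("doc-2", {"title": "Second title", "author": "Author B"})
store.put("doc-3", {"title": "Third title", "author": "Author A"})

print(store.get("doc-2"))
print(store.get_by_index("by_author", "Author A"))
```

The two lookup paths mirror the description: `get` goes straight to a record by key, while `get_by_index` goes through an ordered index that the store maintains on every write.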

DPUB IG Metadata Task Force Report Published

The Digital Publishing Interest Group has published a Group Note of DPUB IG Metadata Task Force Report. The Metadata Task Force of the DPUB IG found, through extensive interviews with representatives of various sectors and roles within the publishing ecosystem, that there are numerous pain points for publishers with regard to metadata but that these pain points are largely not due to deficiencies in the Open Web Platform (OWP). Instead, there is a widespread lack of understanding or implementation of the technologies that the OWP already makes available for addressing most of the issues raised. However, some of the very technologies that are little used or understood in most sectors of publishing are widely used and understood in certain other sectors (e.g., scientific publishing, libraries).

Priorities that have emerged are the need for better understanding of the importance of expressing identifiers as URIs; the need for much more widespread use of RDF and its various serializations throughout the publishing ecosystem; and the need to develop a truly interoperable, cross-sector specification for the conveyance of rights metadata (while remaining agnostic as to the sector-specific vocabularies for the expression of rights).

This Note documents in detail the issues that were raised; provides examples of available RDF educational resources at various levels, from the very technical to non-technical and introductory; and lists important identifiers used in the publishing ecosystem, documenting which of them are expressed as URIs, and in what sectors and contexts. It recommends that while little new technology is called for, the W3C is in a unique position to bridge today’s siloed metadata practices to help facilitate truly cross-sector exchange of interoperable metadata. This Note is thus intended to provide background and a context in which concrete work, whether by this Task Force or elsewhere within the W3C, may be undertaken.
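As an illustration of the first two priorities, the following Python sketch expresses a book identifier as a URI and emits one RDF statement about it in the N-Triples serialization. It is not taken from the Note: the helper names are hypothetical, the ISBN and title are made-up placeholders, and the choice of the Dublin Core `title` property is an assumption made for the example. The literal escaping handles only backslashes and double quotes.

```python
# Dublin Core "title" property, used here as an example predicate.
DCTERMS_TITLE = "http://purl.org/dc/terms/title"


def isbn_to_urn(isbn):
    """Express an ISBN as a URI in the registered `urn:isbn` namespace."""
    digits = isbn.replace("-", "").replace(" ", "")
    return "urn:isbn:" + digits


def ntriple(subject, predicate, literal):
    """Serialize one subject/predicate/literal statement as an N-Triples line."""
    escaped = literal.replace("\\", "\\\\").replace('"', '\\"')
    return '<{}> <{}> "{}" .'.format(subject, predicate, escaped)


# Placeholder ISBN and title, invented for this example.
book_uri = isbn_to_urn("978-0-00-000000-2")
statement = ntriple(book_uri, DCTERMS_TITLE, 'An "Example" Title')

print(book_uri)
print(statement)
```

Once the identifier is a URI, any system that understands RDF can make statements about the same book without a sector-specific lookup table, which is the kind of cross-sector exchange the report argues for.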

Program Announced for EDUPUB Summit and Workshop (Feb 26-27)

The next gathering of the EDUPUB community will take place in Phoenix, Arizona (USA) on February 26 and 27, 2015. A preliminary program is now available, and registration is open. The summit will launch the implementation phase of EDUPUB, a cross-organizational initiative to develop a comprehensive open platform for next-generation learning content based on EPUB 3, IMS standards for learning environment integration, and the overall Open Web Platform.

First Public Working Draft: Web Annotation Data Model

The Web Annotation Working Group has published a First Public Working Draft of Web Annotation Data Model. Annotations are typically used to convey information about a resource or associations between resources. Simple examples include a comment or tag on a single web page or image, or a blog post about a news article. The Web Annotation Data Model specification describes a structured model and format to enable annotations to be shared and reused across different hardware and software platforms.
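To give a feel for what such a structured format looks like, here is a small Python sketch that builds a comment on a web page as a JSON document. One loud caveat: the property names (`body`, `target`, `TextualBody`) and the context URL follow the shape of the later Web Annotation Data Model Recommendation, so they may well differ from the First Public Working Draft announced here. The helper `make_annotation` and the example.org URLs are hypothetical.

```python
import json


def make_annotation(anno_id, comment, target):
    """Build a plain-text comment annotation about `target` as a dict."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": anno_id,
        "type": "Annotation",
        # The body is what is being said; here, a short textual comment.
        "body": {
            "type": "TextualBody",
            "value": comment,
            "format": "text/plain",
        },
        # The target is the resource the comment is about.
        "target": target,
    }


# Placeholder identifiers, invented for this example.
annotation = make_annotation(
    "http://example.org/anno1",
    "A useful overview of the publishing specs.",
    "http://example.org/page1",
)

print(json.dumps(annotation, indent=2))
```

Because the result is ordinary JSON, it can be stored or exchanged between reading systems without either side knowing how the other renders annotations.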

Digital Publishing Annotation Use Cases

The Digital Publishing Interest Group has published a Group Note of Digital Publishing Annotation Use Cases. This document describes the set of use cases generated for Annotation and Social Reading within the W3C Digital Publishing Interest Group, in coordination with the Open Annotation Community Group. This Note will also serve as an input for the W3C Web Annotation Working Group.

HTML5 is a recommendation

The HTML Working Group published HTML5 as a W3C Recommendation. This specification defines the fifth major revision of the Hypertext Markup Language (HTML), the format used to build Web pages and applications, and the cornerstone of the Open Web Platform.

“Today we think nothing of watching video and audio natively in the browser, and nothing of running a browser on a phone,” said Tim Berners-Lee, W3C Director. “We expect to be able to share photos, shop, read the news, and look up information anywhere, on any device. Though they remain invisible to most users, HTML5 and the Open Web Platform are driving these growing user expectations.”

HTML5 brings to the Web video and audio tracks without needing plugins; programmatic access to a resolution-dependent bitmap canvas, which is useful for rendering graphs, game graphics, or other visual images on the fly; native support for scalable vector graphics (SVG) and math (MathML); annotations important for East Asian typography (Ruby); features to enable accessibility of rich applications; and much more.

The HTML5 test suite, which includes over 100,000 tests and continues to grow, is strengthening browser interoperability. Learn more about the Test the Web Forward community effort.

With today’s publication of the Recommendation, software implementers benefit from Royalty-Free licensing commitments from over sixty companies under W3C’s Patent Policy. Enabling implementers to use Web technology without payment of royalties is critical to making the Web a platform for innovation.

Read the Press Release, testimonials from W3C Members, and acknowledgments. For news on what’s next after HTML5, see W3C CEO Jeff Jaffe’s blog post: Application Foundations for the Open Web Platform. We also invite you to check out our video Web standards for the future.

New Draft for the “Requirements for Latin Text Layout and Pagination” Published

The Digital Publishing Interest Group has published a new Working Draft of “Requirements for Latin Text Layout and Pagination”. This document describes requirements for pagination and layout of books in Latin languages, based on the tradition of print book design and composition. It is hoped that these principles can inform the pagination of digital content as well, and serve as a reference for the CSS Working Group and other interested parties.

Updated Understanding Web Content Accessibility Guidelines (WCAG) 2.0 and Techniques for WCAG 2.0

The Web Content Accessibility Guidelines Working Group today published updates of two Notes that accompany WCAG 2.0: Understanding WCAG 2.0 and Techniques for WCAG 2.0. (This is not an update to WCAG 2.0, which is a stable document.) For information on these updates, please see the Understanding WCAG and WCAG Techniques Updated September 2014 e-mail.

HTML5 Proposed Recommendation Published

The HTML Working Group has published a Proposed Recommendation of HTML5. This specification defines the 5th major revision of the core language of the World Wide Web: the Hypertext Markup Language (HTML). In this version, new features are introduced to help Web application authors, new elements are introduced based on research into prevailing authoring practices, and special attention has been given to defining clear conformance criteria for user agents in an effort to improve interoperability. Comments are welcome through 14 October.