Digital Publishing IG organizing itself: task forces

As a means to organize its work, the Digital Publishing Interest Group has defined a set of Task Forces. Each of these Task Forces represents a specific technical area of work, will produce separate documents, and will have parallel discussions. The task forces are as follows.

Latinreq, i.e., Requirement for Latin Text Layout and Pagination
The task force will produce two documents. One will be a general requirement document for Latin text layout, pagination, and typesetting. This document will be patterned after, for example, the “Requirements for Japanese Text Layout” document that W3C published a while ago, but concentrating on Latin-based languages. The other document will describe how these general requirements map onto specific requirements for CSS 3 or CSS 4.
Page DOM
This task force will concentrate on the issue of representing the concept of a page in an XML or HTML DOM: are the current facilities enough, or is there a need for an extension for the purpose of page-based media publishing?
Metadata
This task force will look at the metadata vocabulary and identification landscape as used by the publishing industry, and will identify some of the missing features, possible mappings, etc., that the industry needs. The task force will also have to answer the question of whether additional work in this area is necessary and, if so, whether W3C is the right organization to pursue it.
Behavioral Adaption
Digital publications need to identify the role of specific HTML structural elements in a publication beyond what the core HTML tags provide. For example, certain elements should be marked up as being candidates for an index to be generated for the book. This task force will consider the various challenges of doing this in HTML, and whether extensions to HTML are necessary or not (e.g., by introducing a new set of attributes).
Annotations
One of the main challenges in, for example, reading systems for educational publications is the ability to annotate the document in a portable manner. Although some of these issues have been dealt with in the W3C Open Annotation Community Group, applying the general approach to the Open Web Platform as used in Digital Publishing may raise further technical challenges and reveal missing features, which this task force will identify.
STEM (i.e., Scientific, Technical, Engineering, and Mathematical Publishing)
This category of digital publishing raises a number of particular issues, e.g., in terms of technical illustrations, interactivity, or usage of mathematical formulas. This Task Force will consider these issues in light of what the Open Web Platform provides, and identify possibly missing features.
Security, Privacy
The latest generation of Digital Publishing standards, like EPUB3, introduces the possibility of (albeit limited) scripting, thereby providing interactivity, connection to outside services, etc. However, these new facilities may lead to specific security and privacy challenges (e.g., what happens to the information gleaned from the user’s reading habits); these new issues may also lead to new requirements in terms of using the Open Web Platform.
Accessibility
Accessibility has always been at the core of Digital Publishing, e.g., with the ability to produce books in Braille, or in the form of Audio Books. However, these possibilities lead to new challenges that may not be properly reflected in the various OWP technologies, and/or the Accessibility guidelines published by the W3C.
Bridging Offline and Online
This task force is a little bit different from the others: instead of looking at the requirements of current Digital Publishing technologies, it rather looks at some longer term issues on how Digital Publishing and the Open Web Platform would align further in future.

Work has already begun in the various task forces; the goal from now on is for each weekly teleconference to concentrate on the work of one or two specific task forces, so that they can synchronize with the group as a whole. The minutes of those meetings are public. The separate page on the Task Forces on the Group’s Wiki also links to the various use cases in each category that have been collected in the past few weeks.

TPAC Update on Accessibility in Digital Publishing

In October of this year, Benetech joined the World Wide Web Consortium to more deeply participate in the evolution of standards that enable educational content to be born accessible. Our first opportunity for deep engagement came earlier this month with the W3C Technical Plenary and Advisory Committee (TPAC) meeting in Shenzhen, China. During the weeklong TPAC meeting, many of the W3C Working Groups that develop W3C Recommendations met to advance their projects and provide an opportunity for others to observe and inform. TPAC was also a great opportunity for face-to-face collaboration with others in the field.

One of the people I had an opportunity to collaborate with in person was Charles McCathie Nevile, aka Chaals, who has long been involved in W3C activities and accessibility. Chaals works for Yandex, one of the mainstream search engines that leverage schema.org, a collaboration between Bing, Google, Yahoo and Yandex to create and support a standard set of schemas for structured data markup on web pages. These standardized schemas enable webmasters to improve the discoverability of their content. You can experience the benefits of this standardized markup by searching Google for ‘Potato Salad’, clicking ‘Search Tools’ and then filtering the search results by the properties defined at schema.org, such as ingredients, cook time or caloric content.

Figure: Demo of a potato salad recipe search on Google.

For the past year, thanks to funding from the Gates Foundation, Benetech has led a working group to propose a set of accessibility properties that can be used with existing schemas to enable the discovery of accessible educational resources. The need for a standard set of properties surfaced during our participation as a launch partner for the Learning Registry in 2011. While the Learning Registry can leverage properties defined by the Learning Resource Metadata Initiative (LRMI), such as educational alignment, there was no standard set of properties that would enable an educator to find closed-captioned videos for hearing impaired students or algebra textbooks that use MathML, an accessible format for mathematical expressions. Schema.org was the ideal place to define these properties, because they would benefit not only tools such as the Learning Registry, but also broader Open Web Platform technologies and mainstream search engines, such as Google and Yandex.

During TPAC, Chaals, Markus Gylling (CTO of IDPF and co-chair of the W3C Digital Publishing Interest Group) and I were able to resolve the remaining concerns that schema.org representatives had with the proposed properties. The following week, Dan Brickley, the editor of schema.org, publicly announced that schema.org would be adopting the accessibility properties proposed by the Accessibility Metadata project and IMS Global Access for All.

Soon after Dan’s announcement, the IDPF updated their EPUB 3 Accessibility Guidelines to recommend to the publishing industry the use of those properties with digital textbooks and ebooks. These Guidelines have received broad support from the Association of American Publishers (AAP) and the National Federation of the Blind (NFB). I hope the Guidelines will be instrumental to the ‘Technology, Education and Accessibility In College and Higher Education Act’ (TEACH), newly introduced by U.S. Congressman Tom Petri.

At TPAC I also participated in three W3C group face-to-face meetings. The first was with the Digital Publishing Interest Group (DPUB IG), which Benetech is a member of along with Adobe, Google, Hachette Livre, IBM, Pearson and many others. The mission of the group is to provide a forum for experts in the digital publishing ecosystem for technical discussions, gathering use cases and requirements to align the existing formats and technologies needed for digital publishing with those used by the Open Web Platform. The goal is to ensure that the requirements of digital publishing can be answered, when in scope, by the Recommendations published by W3C.

Per the charter of the group, Suzanne Taylor from Pearson and I put together a set of use cases related to accessibility for Digital Publishing. I had the opportunity to discuss the use cases related to image and diagram accessibility with the SVG Working Group, which is preparing the SVG 2.0 specification for final call at the end of this year. The Scalable Vector Graphics format (SVG) has been deemed the third most important feature of EPUB 3 that publishers need to adopt by the AAP EPUB 3 Implementation Project. SVG currently contains a number of mechanisms that enhance the accessibility of digital publications, particularly those in the STEM field. As a result, SVG is an excellent standard to build upon to further address the needs of students with disabilities.

The first use case I discussed was SVG as a fallback and bridging solution for the accessibility of mathematical expressions. Currently, MathML is recommended as the format that publishers should use for accessibility. However, MathML adoption among traditional reading systems has been abysmal and there is no sign that it will soon improve. Google’s Chrome browser recently dropped support for MathML, and MathPlayer, a popular Microsoft Internet Explorer (IE) plug-in used by students with disabilities, is no longer supported in IE 11. Furthermore, MathML does not work with many mainstream reading systems, such as the Kindle, and even when there is visual rendering support, there is little to no accessibility support. As a result, many publishers, such as O’Reilly and Inkling, have resorted to converting mathematical expressions from MathML to SVG or PNG graphics, which either results in a loss of the information needed by blind or vision impaired (BVI) students or increases the cost to publishers of complying with accessibility requirements.

My proposal to the SVG Working Group was for the SVG 2.0 specification to support embedding MathML within SVG along with granular verbal descriptions of the expression as a lowest common denominator for assistive technology. This approach would broaden compatibility and not take away information that could be leveraged by future assistive technologies. I also recommended that SVG express explicit support for the recently drafted ARIA 1.1 describedAt property that enables MathML and corresponding verbal descriptions to also be referenced by an external URI (URL). This URL would provide access to the source MathML and alternative formats, such as Nemeth Braille, and would enable educators and disability services professionals to correct and improve MathML markup, which may render correctly visually, but poorly aurally.
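To make the proposal more concrete, here is a minimal sketch in Python of what a "described SVG" for a mathematical expression might look like. It is an illustration under stated assumptions, not the markup the SVG Working Group settled on: embedding the MathML through the SVG `foreignObject` element is an assumption, the attribute name `aria-describedat` simply follows the draft mentioned above, and the function name and the example.org URL are hypothetical.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Serialize SVG as the default namespace and MathML with an "m:" prefix.
ET.register_namespace("", SVG_NS)
ET.register_namespace("m", MATHML_NS)


def described_math_svg(mathml: str, spoken: str, described_at: str) -> str:
    """Wrap source MathML and a verbal description in one SVG document.

    Assumption: the MathML is carried in <foreignObject>; the post does not
    say which embedding mechanism was proposed.
    """
    svg = ET.Element(
        f"{{{SVG_NS}}}svg",
        # Attribute name taken from the ARIA 1.1 draft discussed in the post.
        {"role": "img", "aria-describedat": described_at},
    )
    # The verbal description is the lowest common denominator for
    # assistive technology that cannot process MathML.
    desc = ET.SubElement(svg, f"{{{SVG_NS}}}desc")
    desc.text = spoken
    # The source MathML travels with the graphic, so no information is lost.
    holder = ET.SubElement(
        svg, f"{{{SVG_NS}}}foreignObject", {"width": "100%", "height": "100%"}
    )
    holder.append(ET.fromstring(mathml))
    return ET.tostring(svg, encoding="unicode")


if __name__ == "__main__":
    source = (
        '<math xmlns="http://www.w3.org/1998/Math/MathML">'
        "<mi>x</mi><mo>=</mo><mn>2</mn></math>"
    )
    print(described_math_svg(source, "x equals 2", "https://example.org/math/1"))
```

A real converter would also have to emit the visual rendering of the expression as SVG paths or text; that part is omitted here.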

The SVG group was very open and responsive to my proposal, and Richard Schwerdtfeger from IBM took the first action by adding support for aria-describedat to the SVG 2.0 specification draft. With these standards in place, our plans to take to market a prototyped tool for publishers and other content creators to convert MathML to described SVG become even more compelling.

Next I discussed with the SVG Working Group research and development that Benetech had undertaken to use Open Web Platform technologies to implement the sonification features of MathTrax. MathTrax is a graphing tool for blind and low vision middle and high school students to access visual math data and graph or experiment with equations and datasets.

Doug Schepers, a staff member of the W3C, demonstrated a project we had collaborated on to sonify graphs of mathematical expressions, such as a parabola, using the Web Audio API and the nascent Web Speech API supported in Safari 6.1 and the upcoming Chrome 33. We discussed that, in order to generalize this approach to work with SVG graphs generated by other tools, such as D3.js, we needed standard semantics to identify which SVG elements represent the data and the x and y axes. Richard Schwerdtfeger suggested that new ARIA roles be enumerated for this purpose. I look forward to moving this forward with Rich.
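The core idea of sonification can be sketched independently of the browser APIs: each data value of the graph is mapped to a pitch, so that a listener hears the curve rise and fall. The Python sketch below is an illustration only and is not the code of the demonstrated project; the function name, the frequency range and the geometric mapping are all assumptions, and actual playback (which the demo did with the Web Audio API) is left out.

```python
from typing import List, Sequence


def values_to_frequencies(
    ys: Sequence[float], low_hz: float = 220.0, high_hz: float = 880.0
) -> List[float]:
    """Map data values to tone frequencies.

    The smallest value maps to low_hz and the largest to high_hz. The
    mapping is geometric, so equal steps in the data correspond to equal
    musical intervals. Both the range and the mapping are assumptions.
    """
    lowest, highest = min(ys), max(ys)
    span = highest - lowest
    if span == 0:
        # A flat line: play every point at the middle of the range.
        return [(low_hz * high_hz) ** 0.5 for _ in ys]
    ratio = high_hz / low_hz
    return [low_hz * ratio ** ((y - lowest) / span) for y in ys]


if __name__ == "__main__":
    xs = [-2, -1, 0, 1, 2]
    parabola = [x * x for x in xs]  # y = x^2
    for x, hz in zip(xs, values_to_frequencies(parabola)):
        print(f"x={x:+d}  tone={hz:.1f} Hz")
```

For the parabola, the pitch falls towards the vertex and rises again symmetrically, which is the audible counterpart of the visual shape.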

One of the advantages of the SVG format for accessible images and graphs is that textual descriptions of the whole image or individual elements can be included within SVG. This is superior to the use of the alt attribute with HTML image elements, because descriptions can be lengthier than those typically used with the alt attribute and they are portable with the image content. Unfortunately, SVG descriptions are limited to text and can’t use rich HTML markup to incorporate tables, lists or links. I recommend reading some of the recommendations by the DIAGRAM Center on the use of structured elements for image descriptions, particularly in the STEM field.

I was in luck with this use case as the SVG Working Group was already considering supporting HTML within SVG 2.0. My use case was further confirmation that SVG could benefit from this support and an action was created specific to this use case.

Because we need a way to deal with back titles that don’t leverage these standards for rich image descriptions or have insufficient descriptions, I discussed the need for mechanisms to support the crowdsourcing or post-production addition of image descriptions to titles that have already been published. As discussed above, ARIA 1.1’s describedAt property is one enabling standard that tools, such as DIAGRAM Center’s Poet, can leverage. Markus Gylling and I also believe that the emergent Open Annotations specification provides another approach. The good news is that Doug Schepers from the SVG Working Group believes this will be possible, based on his talks with developers who are building one of the first tools to leverage the Open Annotations specification.

Besides aural and textual modalities, the content of SVG graphs can also be conveyed via tactile modalities. Educators of blind and visually impaired students have commonly used tactile graphics to make graphical content accessible. SVG is the recommended digital format for creating tactile graphics that can be printed with specialized printers called embossers. These specialized printers are expensive, so the DIAGRAM Center and others have been funding and conducting research on the use of 3D printers for making 2D SVG graphics accessible. Given MakerBot’s recently announced mission to put a 3D printer in every classroom, this is very exciting.

Haptics are another promising technology for making SVG graphics accessible via a tactile modality. The DIAGRAM Center recently partnered with ETS to research tablet-based haptic display of graphical information and explore the inclusion of this technology in EPUB 3 textbooks.

Ideally, the same SVG graphic could be printed with ink, an embosser or a 3D printer. Based on discussions with the SVG Working Group, we determined that a solution could be to leverage CSS media queries, which today are used to format web pages for print. An action item was taken for Doug Schepers and Tab Atkins, who also sits on the CSS Working Group, to work on tactile, 3D and haptic media queries for SVG.

Finally, I presented to the SVG Working Group the need for SVG images to be easily reusable within a document and across documents without the need for the HTML image element, which limits many of the advanced capabilities of SVG. The SVG Working Group recommended that iframes be used to inline the same SVG graphic across multiple locations in a document. This seems like a reasonable approach and I will be further discussing it with our DIAGRAM Center partners, the DPUB IG and developers of assistive technologies.

I was quite pleased with how open and curious the SVG Working Group was to issues around accessibility. This was exactly the type of collaboration that the Digital Publishing Interest Group was meant to create.

My last day at TPAC was spent with the W3C Independent User Interface (Indie UI) Working Group. Their mission is to facilitate interaction in Web applications that are input method independent, and hence accessible to people with disabilities. For example, Web application authors who currently wish to intercept a user’s intent to ‘zoom in’ on a map view need to ‘listen’ for a wide range of events that vary by operating system or device, such as CTRL-Plus, Command-Plus, pinch/zoom touch gestures, etc. Assistive technologies further expand and complicate the possible interactions that web developers must account for.
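The problem described above can be illustrated with a small sketch: many device-specific inputs are translated into one abstract intent, so that application code only has to handle the intent. This is a Python illustration of the idea only; the table, the gesture labels and the intent name are hypothetical and are not taken from the Indie UI drafts.

```python
from typing import Optional

# Hypothetical table of (input source, raw gesture) pairs that all express
# the same user intent. Assistive technologies could add further entries
# without the application having to change.
ZOOM_IN_TRIGGERS = {
    ("keyboard", "ctrl+plus"),
    ("keyboard", "command+plus"),
    ("touch", "pinch-out"),
}


def to_intent(source: str, gesture: str) -> Optional[str]:
    """Translate a device-specific input into an abstract intent.

    Returns None when the input does not correspond to a known intent.
    """
    if (source, gesture) in ZOOM_IN_TRIGGERS:
        return "zoom-in"
    return None


if __name__ == "__main__":
    for raw in [("keyboard", "ctrl+plus"), ("touch", "pinch-out"), ("keyboard", "tab")]:
        print(raw, "->", to_intent(*raw))
```

With such a layer in the platform rather than in each application, the map view from the example would listen for a single "zoom in" request instead of every keyboard shortcut and gesture.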

Ideally, content and applications automatically adapt to the user’s preferences. To make this possible, the Indie UI team is also drafting a user context specification. The Indie UI team was very interested in learning about our work with schema.org to specify accessibility properties. One use case enabled by leveraging both of these specifications is applications that automatically narrow search results for users based on their preferences. For example, if the user context indicated that the user had a preference for videos with captions, then a search query could automatically narrow results to videos with captions.
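The captions example can be sketched as a simple filter. The Python below is a minimal illustration under assumptions: the `Resource` class and its field names are hypothetical, the feature value "captions" is used as an example of an accessibility property value, and the user's preferences are passed in as a plain set rather than read from any real user context API.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Resource:
    """A search result with its declared accessibility features (hypothetical)."""
    title: str
    accessibility_features: Set[str] = field(default_factory=set)


def narrow(results: List[Resource], preferred: Set[str]) -> List[Resource]:
    """Keep only the results that declare every feature the user prefers."""
    return [r for r in results if preferred <= r.accessibility_features]


if __name__ == "__main__":
    results = [
        Resource("Algebra lecture", {"captions"}),
        Resource("Geometry lecture"),
        Resource("Calculus lecture", {"captions", "transcript"}),
    ]
    # Assumed to come from the user context: a preference for captions.
    for r in narrow(results, {"captions"}):
        print(r.title)
```

An empty preference set leaves the results untouched, so users without stated preferences see the same search results as before.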

I’m very excited to see all these standards coming together to enable a born accessible future and reaping its benefits. I’d like to thank the W3C for putting on a great event, which crystallized for me the importance of the W3C’s mission.

First face-to-face meeting of the Digital Publishing Interest Group

Fairy Lake Botanical Garden, Shenzhen, China

The Digital Publishing Interest Group had its first face-to-face meeting during the W3C TPAC week in Shenzhen, China. The meeting’s main goal was to give more direction to the various task forces that the Interest Group had started to define in earlier weeks, by specifying their scope and main focus. Not all task forces were covered; indeed, the two days also included meetings with other experts from other W3C groups, so the final scoping of the task forces will have to be done in subsequent teleconferences.

The issues around pagination (based on the very first draft document) were, obviously, the most complex. Producing a document covering all aspects for all kinds of publishing would be a huge task, going beyond what the IG could reasonably do. After much discussion, the scope, for the first version of the document, was restricted along two axes: publishing genres and writing systems. For the former, some of the possible areas were, for the time being, set aside; these are journals and magazines, poetry, children’s picture books, comics and manga. Indeed, these genres require very special considerations (e.g., possibly fixed layout), and additional efforts and resources will be needed to cover those. As far as writing systems are concerned, the group had to take into account that W3C already had published documents on Japanese Layout, and similar documents may become available in future for Chinese, Korean, or Indic writing systems. As a consequence, the current document will concentrate on Latin-based pagination, including the various local variations for different languages or cultures.

Beyond the current pagination concerns (i.e., headers, footers, page breaks, etc.), it was also recognized that typography issues, again concentrating on Latin languages, should be considered to be very much in scope along the same lines as pagination. Whether this will be a separate document or part of the pagination document is still to be decided.

Although the pagination work primarily results in issues around CSS (possible missing features, setting priorities, etc.), which was also the subject of a joint meeting of this group with the CSS Working Group, it was also recognized that pagination raises a number of problems in terms of the content model, in the DOM, as well as available events (e.g., an event should be raised when the user turns a page). These notions are necessary for reading systems, and it is not clear at this moment whether all the necessary features are covered by the current set of events defined for HTML and/or whether DOM extensions would be necessary. A separate document will have to be published to look into this, which may result in some further joint work with the HTML Working Group in the future.

A very different problem area the group looked at is what is currently known as “Behavioral Adaption”, exemplified by some use cases on the IG Wiki. The solution of those problems requires some sort of additional markup identifying, e.g., the publisher’s semantics for specific elements (chapter title, index, etc.). There are different approaches: one is to use more powerful metadata syntaxes like microdata or RDFa Lite to annotate the content; the other is to use e-book specific attributes as extensions to the core HTML5 set. After discussions on the pros and cons of these two alternatives, the IG decided in favor of the attribute approach. This will be considered in more detail in the months to come. The current EPUB specification already introduces an EPUB namespace, yielding epub:type attributes; however, that approach may lead to issues in the future in view of the evolution of HTML5. The direction that will be explored further is attributes of the epub-XXXX format, i.e., without the usage of the XML namespace syntax. It was recognized that a document specifying these attributes, as well as possible values, should be produced (probably by IDPF) to get this accepted as a bona fide HTML5 extension.
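The difference between the two attribute styles can be shown with a short Python sketch that rewrites namespaced epub:type attributes into a namespace-free form. This is an illustration only: the name `epub-type` is one assumed instance of the epub-XXXX pattern, which had not been specified at the time of the meeting, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# The EPUB namespace used for epub:type in EPUB 3 content documents.
EPUB_NS = "http://www.idpf.org/2007/ops"


def namespaced_to_prefixed(xhtml: str) -> str:
    """Rewrite epub:type attributes as plain epub-type attributes.

    Assumption: "epub-type" stands in for the epub-XXXX format discussed
    above; the final attribute names were still to be defined.
    """
    root = ET.fromstring(xhtml)
    old_key = f"{{{EPUB_NS}}}type"
    for element in root.iter():
        if old_key in element.attrib:
            # Same value, but no XML namespace is needed any more.
            element.set("epub-type", element.attrib.pop(old_key))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    source = (
        '<section xmlns:epub="http://www.idpf.org/2007/ops" epub:type="chapter">'
        '<h1 epub:type="title">Chapter One</h1></section>'
    )
    print(namespaced_to_prefixed(source))
```

The attribute values carry the publisher's semantics in both forms; only the second form avoids the XML namespace syntax, which has no meaning in the HTML serialization of HTML5.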

The issue of security was also addressed. After quite some discussion it was decided that this large area of concern should be made more specific to decide what is, and what is not, in scope for the Interest Group. The issue of DRM on books naturally came up; indeed, it would be, in theory, possible for the IG to collect use cases for various forms of DRM. However, the feeling was that the IG would never get to a consensus on such use cases, due to the different appreciations of the underlying business models. As a consequence, the IG decided that DRM is out of scope for this IG. There are, however, other security as well as privacy issues that are relevant for digital publishing: e.g., what happens if a malicious URI is added to the spine of an electronic book, what happens to the private data a reading system may collect on the user’s behaviour, etc. These issues are very much in scope, and the decision of the IG is to explore those areas further.

There were other discussion areas, sometimes with guests coming from different groups within W3C, e.g., on accessibility or testing. The more detailed minutes, both for the first day and the second, are available online.

It was a good meeting, which also gave the possibility for many to meet personally for the first time! Additionally, members of the Digital Publishing Interest Group attended other working group meetings throughout the week which, hopefully, was useful for everyone involved.

Last Call: CSS Syntax Module Level 3

The Cascading Style Sheets (CSS) Working Group has published a Last Call Working Draft of CSS Syntax Module Level 3. This module describes, in general terms, the basic structure and syntax of CSS stylesheets. It defines, in detail, the syntax and parsing of CSS – how to turn a stream of bytes into a meaningful stylesheet. CSS is a language for describing the rendering of structured documents (such as HTML and XML) on screen, on paper, in speech, etc. Comments are welcome through 17 December.

HTML Working Group updated HTML 5.1, HTML Canvas 2D Context, Level 2, and HTML Microdata

The HTML Working Group has updated two Working Drafts and a Working Group Note today:

  • A Working Draft of HTML 5.1, which defines the 5th major version, first minor revision of the core language of the World Wide Web: the Hypertext Markup Language (HTML). In this version, new features continue to be introduced to help Web application authors, new elements continue to be introduced based on research into prevailing authoring practices, and special attention continues to be given to defining clear conformance criteria for user agents in an effort to improve interoperability.
  • A Working Draft of HTML Canvas 2D Context, Level 2. This specification defines the 2D Context for the HTML canvas element. The 2D Context provides objects, methods, and properties to draw and manipulate graphics on a canvas drawing surface.
  • A Group Note of HTML Microdata, which defines the HTML microdata mechanism. This mechanism allows machine-readable data to be embedded in HTML documents in an easy-to-write manner, with an unambiguous parsing model. It is compatible with numerous other data formats including RDF and JSON.

Internationalization Tag Set (ITS) Version 2.0 is a W3C Recommendation

The W3C MultilingualWeb-LT Working Group has published a W3C Recommendation of Internationalization Tag Set (ITS) Version 2.0. ITS 2.0 provides a foundation for integrating automated processing of human language into core Web technologies. ITS 2.0 bears many commonalities with its predecessor, ITS 1.0, but provides additional concepts that are designed to foster the automated creation and processing of multilingual Web content. Work on application scenarios for ITS 2.0 and gathering of usage and implementation experience will now take place in the ITS Interest Group.

New draft for CSS writing modes

The Cascading Style Sheets (CSS) Working Group has published a Working Draft of CSS Writing Modes Level 3. CSS Writing Modes Level 3 defines CSS support for various international writing modes, such as left-to-right (e.g. Latin or Indic), right-to-left (e.g. Hebrew or Arabic), bidirectional (e.g. mixed Latin and Arabic) and vertical (e.g. Asian scripts). Inherently bottom-to-top scripts are not handled in this version.

Event Report: Multimedia Archives and Metadata for Digital Publishing

The W3C Germany and Austria office has published a report on the Multimedia Archives and Metadata for Digital Publishing September 2013 event, which was jointly held with Xinnovations. The metadata topic is covered in detail in the report and shows high relevance for a wide range of technologies – from Semantic Web to Digital Publishing and Web technology in general – and application areas: from general or scientific publishers and libraries to Wikipedia related communities. More information in German is provided by a dedicated press release.

At the CONTEC Conference

I had the pleasure of participating in the CONTEC Conference last week, taking place in conjunction with the Frankfurt Book Fair. It was really good to be there and I would like to thank Kat Meyer for the invitation to participate. I had lots of conversations, informal or semi-informal meetings with various people; I do not want to list names because I would incur the danger of forgetting, and thereby offending, someone… Suffice it to say that it was really good for networking!

I spent most of my afternoon at two open sessions, both around EPUB 3, namely a session on IDPF and one on Readium. Although, through W3C, we of course have contact with IDPF, this session was extremely useful to gain a bit more insight into what is happening there these days. I knew about some of the work going on (e.g., the fact that EPUB 3.0.1 goes to ISO), but other items were new to me. For example, I did not know until that day that work is planned to adapt the Open Annotation Model (developed in the corresponding W3C Community Group) to EPUB. This work makes a lot of sense: portable annotation is a hugely important area for electronic books, and I am quite excited to see this work happening; I will try to keep up-to-date on this. The other extensions to EPUB (e.g., on indexes, usage of dictionaries) also look interesting and important. Finally, it was also interesting to see that IDPF is continuing its efforts in outreach (e.g., that it will take over the Support Grid of BISG and develop it further); I think outreach is yet another area where a future cooperation between IDPF and W3C may happen.

While I of course knew many things about IDPF, the presentation on Readium was different: I only had very vague ideas, previously, about what was going on there. The goal is to develop an open source implementation to be at the core of EPUB3 readers. This “Readium SDK” will sit on top of Open Web Platform based rendering engines like Gecko or WebKit, and should take care of all the core EPUB3 specific features (e.g., table of contents, management of indexes, packaging, etc.). The code is expected to be available at the end of the year, and we can expect the first full-blown readers in mid-2014. This can become hugely important: it means EPUB3 compliant readers can really come to the fore and, due to the architecture, those readers can evolve in parallel with browser developments.

There was also a separate presentation on the thorny issue of content protection through the separate sub-project called LCP (Lightweight Content Protection). The way I understood it, as a kind of an elevator pitch: consider what is currently available for PDF in terms of password protection and rights expressions, and adapt it to EPUB3. It is not a really strong content protection, as far as I know, but it seems that at least the participants (who include a number of publishers) consider it good enough. I do not know whether this is a solution to the current DRM issues and discussions on books, and I guess it is still controversial, but it was interesting to see that at least new ideas are being sought and are being implemented as alternative solutions. (To avoid any misunderstandings: the Readium SDK is not dependent on LCP; it is up to the final users of the code whether they want to include that module or not.)

Last but not least:-), Markus Gylling and I also had a session on the relationships between IDPF and W3C, entitled “Digital publishing and the open web: The W3C’S digital publishing interest group”. (The slides of the session are also available on-line). We explained the reasons for setting up the W3C Interest Group; that the publishing industry should play a more active role in the development of the Open Web Platform; what has already been achieved; and also how the cooperation between IDPF and W3C is essential in this respect. Although it was not a huge room, it was certainly full with around 50-60 people (out of around 250 attendees overall at CONTEC). It was great to see that many of the participants, who may not have heard of us before, became really interested by the issues around the Open Web Platform; hopefully, this will be the basis for more contact and cooperation in the future!

It was a good day!

Update from the Co-Chairs

In the first two months since the launch of the Digital Publishing Interest Group, we have already identified approximately 36 Use Cases. They include narratives for pagination, annotation, the representation of mathematical and scientific content in reflowable MathML, and accessibility scenarios for learning materials personalized for specific conditions such as dyslexia. Robert Sanderson provided a suite of Use Cases for the basic model for commenting, annotating, tagging with persistent layout, and with that we have a full spectrum of social reading examples. New use cases are added weekly, so please check in regularly with our Directory on the DigiPUb wiki.

Having real-world examples from users is critical in identifying the technical requirements and the Working Groups that will provide the specification for a seamless, portable, and enjoyable reading or learning experience. User experience will no doubt provide more information as our last weekly meeting explored internationalization, second screen / multi-screen, and the convergence of journals, books, and testing. Use Cases for these are hotly anticipated.

Meanwhile, two Task Force developments are underway. Dave Cramer (Hachette Livre) kindly agreed to lead the Pagination team and Suzanne Taylor (Pearson Education) will lead Accessibility. Both of these bring attention to the evolving expectations of the digital narrative as we discover different “rules” for the various kinds of publishing, e.g. STEM, Professional, Education – Testing.

Thanks to the participants of the group for their generosity. If we deliver these open specifications, we will have the potential to significantly change the way we deliver and consume information. With the recent announcement from Digital Book World magazine of a $13 e-reader called Beagle, the idea of an eventual free e-reader can’t be far off. Smart phones are also beginning to use better e-ink to display text and, with 87% of the population owning one, their reach should not be underestimated.

On behalf of my Co-Chair Markus Gylling and myself, thank you; we look forward to keeping you updated on our progress.