Some W3C Documents in EPUB3

I have been having fun the past few months, when I had some time, with a tool to convert official W3C publications (primarily Recommendations) into EPUB3. Apart from the fact that this helped me to dive into some details of the EPUB3 Specification, I think the result might actually be useful. Indeed, it often happens that a W3C Recommendation consists, in fact, of several different publications. This means that just archiving one single file is not enough if, for example, you want to have those documents offline. On the other hand, EPUB3 is perfect for this; one creates an eBook containing all constituent publications as “chapters”. Yep, EPUB3 as a complex archiving tool :-)
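To make the "archive of chapters" idea concrete, here is a minimal sketch of how several documents could be packed into one EPUB-style container. It is not the tool's actual code: the function name `pack_chapters`, the `package.opf` file name, and the chapter file names are assumptions for illustration, and the package document passed in is expected to be written separately.

```python
import io
import zipfile

# container.xml points the reading system at the package document.
# The "package.opf" location is a choice made for this sketch.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="package.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def pack_chapters(chapters, package_opf):
    """Pack (file name, XHTML text) pairs into an in-memory EPUB-style zip."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as book:
        # The "mimetype" entry goes first and is stored uncompressed.
        book.writestr("mimetype", "application/epub+zip",
                      compress_type=zipfile.ZIP_STORED)
        book.writestr("META-INF/container.xml", CONTAINER_XML,
                      compress_type=zipfile.ZIP_DEFLATED)
        book.writestr("package.opf", package_opf,
                      compress_type=zipfile.ZIP_DEFLATED)
        # Each constituent publication becomes one "chapter" file.
        for name, xhtml in chapters:
            book.writestr(name, xhtml, compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()
```

A real book also needs a complete package document (metadata, manifest, spine) and a navigation document; the sketch only shows the container layout.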

The Python tool (which is available on GitHub) has now reached a fairly stable state, and it works well for documents that have been produced by Robin Berjon’s great ReSpec tool. I have generated, and put up on the Web, two books for now:

  1. RDFa 1.1, a Recommendation that was published last August (in fact, there was an earlier version of an RDFa 1.1 EPUB book, but that was done largely manually; this one is much better).
  2. JSON-LD, a Recommendation published this week (i.e., 16th of January).

(Needless to say, these books have no formal standing; the authoritative versions are the official documents published as a W3C Technical Report.)

There is also a draft version for a much larger book on RDF 1.1, consisting of all the RDF 1.1 specifications to come, including all the various serializations (including RDFa and JSON-LD). I say “draft”, because those documents are not yet final (i.e., not yet Recommendations); a final version (with, for example, all the cross-links properly set) will be at that URI when RDF 1.1 becomes a Recommendation (probably in February).

(Republished from my personal blog.)

Chinese Digital Publishing Community Group at W3C

A new Community Group has been set up at W3C, called “Chinese Digital Publishing Community Group”. The group aims to provide a platform for the Chinese digital publishing industry to share perspectives on Chinese text layout, copyrights and other occupational standards. The group will conduct its discussion in Chinese; its work will complement, and will cooperate with, the Digital Publishing Interest Group at W3C. No specifications will be published by the group.

Digital Publishing IG organizing itself: task forces

As a means to organize its work, the Digital Publishing Interest Group has defined a set of Task Forces. Each of these Task Forces represents a specific technical area of work, will produce separate documents, and will have parallel discussions. The task forces are as follows.

Latinreq, i.e., Requirement for Latin Text Layout and Pagination
The task force will produce two documents. One will be a general requirement document for Latin text layout, pagination, and typesetting. This document will be patterned after, for example, the “Requirements for Japanese Text Layout” document that W3C published a while ago, but concentrating on Latin-based languages. Another document will describe how these general requirements map onto specific requirements for CSS 3 or CSS 4.
Page DOM
This task force will concentrate on the issue of representing the concept of a page in an XML or HTML DOM: are the current facilities enough, or is there a need for an extension for the purpose of page-based media publishing?
Metadata
This task force will look at the metadata vocabulary and identification landscape as used by the publishing industry, and will identify some of the missing features, possible mappings, etc., that the industry needs. The task force will also have to answer the question whether additional work in this area is necessary and, if yes, whether W3C is the right organization to pursue the work or not.
Behavioral Adaption
Digital publications need to identify the role of specific HTML structural elements in a publication beyond what the core HTML tags provide. For example, certain elements should be marked up as being candidates for an index to be generated for the book. This task force will consider the various challenges on how to do this in HTML, and whether extensions to HTML are necessary or not (e.g., by introducing a new set of attributes).
Annotation
One of the main challenges in, for example, reading systems for educational publications is the ability to annotate the document in a portable manner. Although some of these issues have been dealt with in the W3C Open Annotation Community Group, how to use the general approach in terms of the Open Web Platform as used in Digital Publishing may raise further technical challenges and missing features that this task force will identify.
STEM (i.e., Scientific, Technical, Engineering, and Mathematical Publishing)
This category of digital publishing raises a number of particular issues, e.g., in terms of technical illustrations, interactivity, or usage of mathematical formulas. This Task Force will consider these issues in light of what the Open Web Platform provides, and identify possibly missing features.
Security, Privacy
The latest generation of Digital Publishing standards, like EPUB3, introduces the possibility of (albeit limited) scripting, thereby providing interactivity, connection to outside services, etc. However, these new facilities may lead to specific security and privacy challenges (e.g., what happens to the information gleaned from the user’s reading habits); these new issues may also lead to new requirements in terms of using the Open Web Platform.
Accessibility
Accessibility has always been at the core of Digital Publishing, e.g., with the ability to produce books in Braille, or in forms of Audio Books. However, these possibilities lead to new challenges that may not be properly reflected in the various OWP technologies, and/or the Accessibility guidelines published by the W3C.
Bridging Offline and Online
This task force is a little bit different from the others: instead of looking at the requirements of current Digital Publishing technologies, it rather looks at some longer term issues on how Digital Publishing and the Open Web Platform would align further in future.

The work has already begun in the various task forces; the goal from now on is that the weekly teleconferences would concentrate on the work of one or two specific task forces to synchronize with the group as a whole. The minutes of those meetings are public. The separate page on the Task Forces on the Group’s Wiki also links to the various use cases in each category that have been collected in the past few weeks.

TPAC Update on Accessibility in Digital Publishing

In October of this year, Benetech joined the World Wide Web Consortium to more deeply participate in the evolution of standards that enable educational content to be born accessible. Our first opportunity for deep engagement came earlier this month with the W3C Technical Plenary and Advisory Committee (TPAC) meeting in Shenzhen, China. During the weeklong TPAC meeting, many of the W3C Working Groups that develop W3C Recommendations met to advance their projects and provide an opportunity for others to observe and inform. TPAC was also a great opportunity for face-to-face collaboration with others in the field.

One of the people I had an opportunity to collaborate with in person was Charles McCathie Nevile, aka Chaals, who has long been involved in W3C activities and accessibility. Chaals works for Yandex, one of the mainstream search engines, which leverages Schema.org. Schema.org is a collaboration between Bing, Google, Yahoo and Yandex to create and support a standard set of schemas for structured data markup on web pages. These standardized schemas enable webmasters to improve the discoverability of their content. You can experience the benefits of this standardized markup by searching Google for ‘Potato Salad’, clicking ‘Search Tools’ and then filtering search results by the properties defined at http://schema.org/Recipe, such as ingredients, cook time or caloric content.

[Figure: Demo of potato salad recipe search on Google]

To experience the same search, go to: https://www.google.com/#q=potato+salad&tbm=rcp&tbs=rcp_tt:15,rcp_cal:100

For the past year, thanks to funding from the Gates Foundation, Benetech has led a working group to propose a set of Schema.org accessibility properties that can be used with existing schemas to enable the discovery of accessible educational resources. The need for a standard set of properties surfaced during our participation as a launch partner for the Learning Registry in 2011. While the Learning Registry can leverage Schema.org properties defined by the Learning Resources Metadata Initiative (LRMI), such as educational alignment, there was no standard set of properties that would enable an educator to find closed-captioned videos for hearing impaired students or algebra textbooks that used MathML – an accessible format for mathematical expressions. Schema.org was the ideal place to define these properties, because they would not only benefit tools, such as the Learning Registry, but would benefit broader Open Web Platform technologies and mainstream search engines, such as Google and Yandex.

During TPAC Chaals, Markus Gylling (CTO of IDPF and co-chair of the W3C Digital Publishing Interest Group) and I were able to resolve the remaining concerns that Schema.org representatives had with the proposed properties. The following week, Dan Brickley, the editor of Schema.org, publicly announced that Schema.org would be adopting the accessibility properties proposed by the Accessibility Metadata project and IMS Global Access for All.

Soon after Dan’s announcement, the IDPF updated their EPUB 3 Accessibility Guidelines to recommend to the publishing industry the use of those properties with digital textbooks and ebooks. These Guidelines have received broad support from the Association of American Publishers (AAP) and the National Federation of the Blind (NFB). I hope the Guidelines will be instrumental to the ‘Technology, Education and Accessibility In College and Higher Education Act’ (TEACH), newly introduced by U.S. Congressman Tom Petri.

At TPAC I also participated in three W3C group face-to-face meetings. The first was with the Digital Publishing Interest Group (DPUB IG), which Benetech is a member of along with Adobe, Google, Hachette Livre, IBM, Pearson and many others. The mission of the group is to provide a forum for experts in the digital publishing ecosystem for technical discussions, gathering use cases and requirements to align the existing formats and technologies needed for digital publishing with those used by the Open Web Platform. The goal is to ensure that the requirements of digital publishing can be answered, when in scope, by the Recommendations published by W3C.

Per the charter of the group, Suzanne Taylor from Pearson and I put together a set of use cases related to accessibility for Digital Publishing. I had the opportunity to discuss the use cases related to image and diagram accessibility with the SVG Working Group, which is preparing the SVG 2.0 specification for final call at the end of this year. The Scalable Vector Graphics (SVG) format has been deemed by the AAP EPUB 3 Implementation Project to be the third most important feature of EPUB 3 that publishers need to adopt. SVG currently contains a number of mechanisms that enhance the accessibility of digital publications, particularly those in the STEM field. As a result, SVG is an excellent standard to build upon to further address the needs of students with disabilities.

The first use case I discussed was SVG as a fallback and bridging solution for the accessibility of mathematical expressions. Currently, MathML is recommended as the format that publishers should use for accessibility. However, MathML adoption among traditional reading systems has been abysmal and there is no sign that it will soon improve. Google’s Chrome browser recently dropped support for MathML, and MathPlayer, a popular Microsoft Internet Explorer (IE) plug-in used by students with disabilities, is no longer supported in IE 11. Furthermore, MathML does not work with many mainstream reading systems, such as the Kindle, and even when there is visual rendering support, there is little to no accessibility support. As a result, many publishers, such as O’Reilly and Inkling, have resorted to converting mathematical expressions from MathML to SVG or PNG graphics, which either results in a loss of the information needed by blind or vision impaired (BVI) students or increases the cost to publishers of complying with accessibility requirements.

My proposal to the SVG Working Group was for the SVG 2.0 specification to support embedding MathML within SVG along with granular verbal descriptions of the expression as a lowest common denominator for assistive technology. This approach would broaden compatibility and not take away information that could be leveraged by future assistive technologies. I also recommended that SVG express explicit support for the recently drafted ARIA 1.1 describedAt property that enables MathML and corresponding verbal descriptions to also be referenced by an external URI (URL). This URL would provide access to the source MathML and alternative formats, such as Nemeth Braille, and would enable educators and disability services professionals to correct and improve MathML markup, which may render correctly visually, but poorly aurally.

The SVG group was very open and responsive to my proposal and Richard Schwerdtfeger from IBM took the first action by adding to the SVG 2.0 specification draft support for aria-describedat. With these standards in place our plans to take to market a prototyped tool for publishers and other content creators to convert MathML to described SVG become even more compelling.

Next I discussed with the SVG Working Group research and development that Benetech had undertaken to use Open Web Platform technologies to implement the sonification features of MathTrax. MathTrax is a graphing tool for blind and low vision middle and high school students to access visual math data and graph or experiment with equations and datasets.

Doug Schepers, a staff member of the W3C, demonstrated a project we had collaborated on to sonify graphs of mathematical expressions, such as a parabola, using the Web Audio API and the nascent Web Speech API supported in Safari 6.1 and the upcoming Chrome 33. We discussed that, in order to generalize this approach to work with SVG graphs generated by other tools, such as D3.js, we needed standard semantics to identify which SVG elements represented data and the x and y axes. Richard Schwerdtfeger suggested that new ARIA roles be enumerated for this purpose. I look forward to moving this forward with Rich.
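The core of sonification is a mapping from data values to sound. The demonstrated project did this in the browser with the Web Audio API; the sketch below only illustrates the mapping step in Python and is not the project's code. The function name `sonify`, the linear mapping, and the 220 to 880 Hz pitch range are assumptions chosen for illustration.

```python
def sonify(ys, low_hz=220.0, high_hz=880.0):
    """Linearly map data values to pitches between low_hz and high_hz."""
    lo, hi = min(ys), max(ys)
    span = hi - lo
    if span == 0:
        # A flat line maps to a constant tone.
        return [low_hz for _ in ys]
    return [low_hz + (y - lo) / span * (high_hz - low_hz) for y in ys]


# Sample a parabola y = x^2 on x = -2.0, -1.5, ..., 2.0.
xs = [x / 2 for x in range(-4, 5)]
parabola = [x * x for x in xs]
pitches = sonify(parabola)
```

Played in sequence, the pitches fall towards the vertex and rise again, which is how a listener can perceive the shape of the curve. Generalizing this to arbitrary SVG graphs is exactly where the standard semantics mentioned above would be needed, since the code has to know which elements carry the data.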

One of the advantages of the SVG format for accessible images and graphs is that textual descriptions of the whole image or individual elements can be included within SVG. This is superior to the use of the alt property with HTML image elements, because descriptions can be lengthier than those typically used with the alt property and they are portable with the image content. Unfortunately SVG descriptions are limited to text and can’t use rich HTML markup to incorporate tables, lists or links. I recommend reading some of the recommendations by the DIAGRAM Center on the use of structured elements for image descriptions, particularly in the STEM field.

I was in luck with this use case as the SVG Working Group was already considering supporting HTML within SVG 2.0. My use case was further confirmation that SVG could benefit from this support and an action was created specific to this use case.

Because we need a way to deal with back titles that don’t leverage these standards for rich image descriptions or have insufficient descriptions, I discussed the need for mechanisms to support the crowdsourcing or post-production addition of image descriptions to titles that have already been published. As discussed earlier, ARIA 1.1’s describedAt property is one enabling standard that tools, such as DIAGRAM Center’s Poet, can leverage. Markus Gylling and I also believe that the emergent Open Annotations specification provides another approach. The good news is that Doug Schepers from the SVG Working Group believes this will be possible based on his talks with the developers at Hypothes.is, who are building one of the first tools to leverage the Open Annotations specification.

Besides aural and textual modalities, the content of SVG graphs can also be conveyed via tactile modalities. Educators of blind and visually impaired students have commonly used tactile graphics to make graphical content accessible. SVG is the recommended digital format for creating tactile graphics that can be printed with specialized printers called embossers. These specialized printers are expensive, thus the DIAGRAM Center and others have been funding and conducting research on the use of 3D printers for making 2D SVG graphics accessible. Given MakerBot’s recently announced mission to put a 3D printer in every classroom, this is very exciting.

Haptics are another promising technology for making SVG graphics accessible via a tactile modality. The DIAGRAM Center recently partnered with ETS to research tablet-based haptic display of graphical information and explore the inclusion of this technology in EPUB 3 textbooks.

Ideally, the same SVG graphic could be printed with ink, an embosser or a 3D printer. Based on discussions with the SVG Working Group, we determined that a solution could be to leverage CSS media queries, which today are used to format web pages for print. An action item was taken for Doug Schepers and Tab Atkins, who also sits on the CSS Working Group, to work on tactile, 3D and haptic media queries for SVG.

Finally, I presented to the SVG Working Group the need for SVG images to be easily reusable within a document and across documents without the need for the HTML image element, which limits many of the advanced capabilities of SVG. The SVG Working Group recommended that iframes be used to inline the same SVG graphic across multiple locations in a document. This seems like a reasonable approach and I will be further discussing it with our DIAGRAM Center partners, the DPUB IG and developers of assistive technologies.

I was quite pleased with how open and curious the SVG Working Group was to issues around accessibility. This was exactly the type of collaboration that the Digital Publishing Interest Group was meant to create.

My last day at TPAC was spent with the W3C Independent User Interface (Indie UI) Working Group. Their mission is to facilitate interaction in Web applications that are input method independent, and hence accessible to people with disabilities. For example currently Web application authors, wishing to intercept a user’s intent to ‘zoom in’ on a map view, need to ‘listen’ for a wide range of events that vary by operating system or device, such as CTRL-Plus, Command-Plus, pinch/zoom touch gestures, etc. Assistive technologies further expand and complicate the possible interactions that web developers must account for.
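The idea of input-method independence can be sketched as a lookup from device-specific events to one abstract intent, so that the application listens for the intent only. Everything in the sketch is hypothetical: the tuple shapes, the `pinch-open` gesture name and the `zoom-in` intent name are invented for illustration and are not taken from the Indie UI drafts.

```python
# Hypothetical table: (event kind, modifier or gesture, key) -> abstract intent.
INTENT_BY_EVENT = {
    ("key", "Ctrl", "+"): "zoom-in",
    ("key", "Command", "+"): "zoom-in",
    ("gesture", "pinch-open", None): "zoom-in",
}


def to_intent(kind, modifier_or_gesture, key=None):
    """Return the abstract intent for a device-specific event, or None."""
    return INTENT_BY_EVENT.get((kind, modifier_or_gesture, key))


def handle(kind, modifier_or_gesture, key=None):
    """Application code reacts to the intent, not to the raw event."""
    if to_intent(kind, modifier_or_gesture, key) == "zoom-in":
        return "map zoomed in"
    return "ignored"
```

With such a layer in place, supporting a new input device or assistive technology means extending the table rather than touching every application.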

Ideally, content and applications automatically adapt to the user’s preferences. To make this possible the Indie UI team is also drafting a user context specification. The Indie UI team was very interested in learning about our work with Schema.org to specify accessibility properties. One use case enabled by leveraging both of these specifications is applications that automatically narrow search results for users based on their preferences. For example, if the user context indicated that the user had a preference for videos with captions, then a search query could automatically narrow results to videos with captions.
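The captions example can be sketched as a simple filter. The sketch assumes search results are available as dictionaries carrying the Schema.org `accessibilityFeature` property; the sample records, the `narrow` function and the way the user's preference is passed in are all hypothetical, since the user context specification was still being drafted.

```python
# Hypothetical search results described with Schema.org accessibility metadata.
resources = [
    {"name": "Intro to Algebra (video)", "accessibilityFeature": ["captions"]},
    {"name": "Geometry Basics (video)", "accessibilityFeature": []},
    {"name": "Algebra Textbook", "accessibilityFeature": ["MathML", "alternativeText"]},
]


def narrow(results, preferred_features):
    """Keep only results that offer every feature the user prefers."""
    wanted = set(preferred_features)
    return [r for r in results
            if wanted <= set(r.get("accessibilityFeature", []))]


# A user context that (hypothetically) reports a preference for captions.
captioned = narrow(resources, ["captions"])
```

An empty preference list leaves the results unchanged, so users without stated preferences see the full set.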

I’m very excited to see all these standards coming together to enable a born accessible future and reaping its benefits. I’d like to thank the W3C for putting on a great event, which crystallized for me the importance of the W3C’s mission.

First face-to-face meeting of the Digital Publishing Interest Group

Fairy Lake Botanical Garden, Shenzhen, China

The Digital Publishing Interest Group had its first face-to-face meeting during the W3C TPAC week in Shenzhen, China. The meeting’s main goal was to give more direction to the various task forces that the Interest Group had started to define in earlier weeks, by specifying their scope and main focus. Not all task forces were covered; indeed, the two days also included meetings with other experts from other W3C groups, so the final scoping of the task forces will have to be done in subsequent teleconferences.

The issues around pagination (based on the very first draft document) were, obviously, the most complex. Producing a document covering all aspects for all kinds of publishing would be a huge task, going beyond what the IG could reasonably do. After much discussion, the scope, for the first version of the document, was restricted along two axes: publishing genres and writing systems. For the former, some of the possible areas were, for the time being, set aside; these are journals and magazines, poetry, children’s picture books, comics and manga. Indeed, these genres require very special considerations (e.g., possibly fixed layout), and additional efforts and resources will be needed to cover those. As far as writing systems are concerned, the group had to take into account that W3C already had published documents on Japanese Layout, and similar documents may become available in future for Chinese, Korean, or Indic writing systems. As a consequence, the current document will concentrate on Latin-based pagination, including the various local variations for different languages or cultures.

Beyond the current pagination concerns (i.e., headers, footers, page breaks, etc.), it was also recognized that typography issues, again concentrating on Latin languages, should be considered to be very much in scope along the same lines as pagination. Whether this will be a separate document or part of the pagination document is still to be decided.

Although the pagination work primarily results in issues around CSS (possible missing features, setting priorities, etc.), which was also the subject of a joint meeting of this group with the CSS Working Group, it was also recognized that pagination raises a number of problems in terms of the content model, in the DOM, as well as available events (e.g., an event should be raised when the user turns a page). These notions are necessary for reading systems, and it is not clear at this moment whether all the necessary features are covered by the current set of events defined for HTML and/or whether DOM extensions would be necessary. A separate document will have to be published to look into this, which may result in some further joint work with the HTML Working Group in the future.

A very different problem area the group looked at is what is currently known as “Behavioral Adaption”, exemplified by some use cases on the IG Wiki. The solution of those problems requires some sort of additional markup identifying, e.g., the publisher’s semantics for specific elements (chapter title, index, etc.). There are different approaches: one is to use more powerful metadata syntaxes like microdata or RDFa Lite to annotate the content; the other is to use e-book specific attributes as extensions to the core HTML5 set. After discussions on the pros and cons of these two alternatives, the IG decided in favor of the attribute approach. This will be considered in more detail in the months to come. The current EPUB specification already introduces an EPUB namespace, yielding epub:type attributes; however, that approach may lead to issues in the future in view of the evolution of HTML5. The direction that will be explored further is attributes of the epub-XXXX format, i.e., without the usage of the XML namespace syntax. It was recognized that a document specifying these attributes, as well as possible values, should be produced (probably by IDPF) to get this accepted as a bona fide HTML5 extension.
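The difference between the two attribute styles is purely syntactic, which a small rewriting sketch can show. The attribute set was not yet specified at the time of writing, so the `epub-type` result below is only an assumed instance of the epub-XXXX pattern, and the regex-based rewrite is a naive illustration that assumes the attribute appears only inside tags.

```python
import re

# Matches a namespaced attribute such as ` epub:type=` inside a tag.
_EPUB_ATTR = re.compile(r'(\s)epub:([A-Za-z][\w-]*)(\s*=)')


def to_hyphenated(markup):
    """Rewrite epub:NAME attributes to the namespace-free epub-NAME form."""
    return _EPUB_ATTR.sub(r'\1epub-\2\3', markup)


before = '<section epub:type="chapter"><h1>Chapter 1</h1></section>'
after = to_hyphenated(before)
```

Here `after` holds `<section epub-type="chapter"><h1>Chapter 1</h1></section>`. The hyphenated form needs no namespace declaration, which is what makes it a candidate for a plain HTML5 extension.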

The issue of security was also addressed. After quite some discussion it was decided that this large area of concern should be made more specific to decide what is, and what is not in scope for the Interest Group. The issue of DRM on books naturally came up; indeed, it would be, in theory, possible for the IG to collect use cases for various forms of DRM. However, the feeling was that the IG would never get to a consensus on such use cases, due to the different appreciations of the underlying business models. As a consequence, the IG decided that DRM is out of scope for this IG. There are, however, other security as well as privacy issues that are relevant for digital publishing: e.g., what happens if a malicious URI is added to the spine of an electronic book, what happens to the private data a reading system may collect on the user’s behaviour, etc. These issues are very much in scope, and the decision of the IG is to explore those areas further.
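As an illustration of the spine concern, the sketch below scans an EPUB package document and reports spine entries whose manifest item points outside the book. The source does not define what makes a URI malicious, so treating any reference with a URI scheme as worth flagging is an assumption of this sketch, as are the function name `remote_spine_items` and the sample package.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

OPF = "{http://www.idpf.org/2007/opf}"  # namespace of the package document

# Hypothetical package document with one local and one remote spine item.
SAMPLE = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <manifest>
    <item id="c1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
    <item id="c2" href="http://example.org/evil.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="c1"/>
    <itemref idref="c2"/>
  </spine>
</package>"""


def remote_spine_items(opf_text):
    """Return hrefs of spine items that are not relative references."""
    root = ET.fromstring(opf_text)
    hrefs = {item.get("id"): item.get("href", "")
             for item in root.iter(OPF + "item")}
    flagged = []
    for ref in root.iter(OPF + "itemref"):
        href = hrefs.get(ref.get("idref"), "")
        # A URI scheme (http, https, ...) means the content lives outside the book.
        if urlparse(href).scheme:
            flagged.append(href)
    return flagged
```

A reading system or a validation step could use a check of this kind to warn the user or refuse to load the item; which policy is appropriate is one of the questions left for the IG to explore.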

There were other discussion areas, sometimes with guests coming from different groups within W3C, e.g., on accessibility or testing. The more detailed minutes, both for the first day and the second, are available online.

It was a good meeting, which also gave the possibility for many to meet personally for the first time! Additionally, members of the Digital Publishing Interest Group attended other working group meetings throughout the week which, hopefully, was useful for everyone involved.

Last Call: CSS Syntax Module Level 3

The Cascading Style Sheets (CSS) Working Group has published a Last Call Working Draft of CSS Syntax Module Level 3. This module describes, in general terms, the basic structure and syntax of CSS stylesheets. It defines, in detail, the syntax and parsing of CSS – how to turn a stream of bytes into a meaningful stylesheet. CSS is a language for describing the rendering of structured documents (such as HTML and XML) on screen, on paper, in speech, etc. Comments are welcome through 17 December.

HTML Working Group updated HTML 5.1, HTML Canvas 2D Context, Level 2, and HTML Microdata

The HTML Working Group has updated two Working Drafts and a Working Group Note today:

  • A Working Draft of HTML 5.1, which defines the 5th major version, first minor revision of the core language of the World Wide Web: the Hypertext Markup Language (HTML). In this version, new features continue to be introduced to help Web application authors, new elements continue to be introduced based on research into prevailing authoring practices, and special attention continues to be given to defining clear conformance criteria for user agents in an effort to improve interoperability.
  • A Working Draft of HTML Canvas 2D Context, Level 2. This specification defines the 2D Context for the HTML canvas element. The 2D Context provides objects, methods, and properties to draw and manipulate graphics on a canvas drawing surface.
  • A Group Note of HTML Microdata, which defines the HTML microdata mechanism. This mechanism allows machine-readable data to be embedded in HTML documents in an easy-to-write manner, with an unambiguous parsing model. It is compatible with numerous other data formats including RDF and JSON.

Internationalization Tag Set (ITS) Version 2.0 is a W3C Recommendation

The W3C MultilingualWeb-LT Working Group has published a W3C Recommendation of Internationalization Tag Set (ITS) Version 2.0. ITS 2.0 provides a foundation for integrating automated processing of human language into core Web technologies. ITS 2.0 bears many commonalities with its predecessor, ITS 1.0, but provides additional concepts that are designed to foster the automated creation and processing of multilingual Web content. Work on application scenarios for ITS 2.0 and gathering of usage and implementation experience will now take place in the ITS Interest Group.

New draft for CSS writing modes

The Cascading Style Sheets (CSS) Working Group has published a Working Draft of CSS Writing Modes Level 3. CSS Writing Modes Level 3 defines CSS support for various international writing modes, such as left-to-right (e.g. Latin or Indic), right-to-left (e.g. Hebrew or Arabic), bidirectional (e.g. mixed Latin and Arabic) and vertical (e.g. Asian scripts). Inherently bottom-to-top scripts are not handled in this version.

Event Report: Multimedia Archives and Metadata for Digital Publishing

The W3C Germany and Austria office has published a report on the Multimedia Archives and Metadata for Digital Publishing September 2013 event, which was jointly held with Xinnovations. The metadata topic is covered in detail in the report and shows high relevance for a wide range of technologies – from Semantic Web to Digital Publishing and Web technology in general – and application areas: from general or scientific publishers and libraries to Wikipedia related communities. More information in German is provided by a dedicated press release.