DPUB IG Telco, 2015-04-20: Consensus call for DPUB ARIA Module, F2F Agenda

See the minutes online for a more detailed record of the discussions.

Consensus call for DPUB ARIA Module

After the DPUB-ARIA task force met last week, the DPUB IG put out a call for consensus to take the Digital Publishing Module of ARIA to FPWD. The task force did not resolve all outstanding issues but will attempt to resolve them in a call with PF next week and move forward with FPWD. All present voted for the publication. An email will be sent formally requesting consensus.

Finalize Agenda for May F2F

The agenda for the May F2F will be closed at the end of the day today.

DPUB IG Telco, 2015-04-13: Aria Module, Packaging, Identifiers

See the minutes online for a more detailed record of the discussions.

Discussion on the ARIA module

The DPUB ARIA module should get a publication approval (to publish the document as a first public draft) from both the DPUB IG and the Protocols and Formats (PF) Working Group. However, it seems that, during their last teleconference, the PF Working Group raised some issues that may have to be solved before this first publication. The issues they have are:

  • some participants would prefer to have prefixes for the terms for modules, such as the publishing one; essentially they end up in different domains
  • there were also some concerns about specific terms that may clash with similar terms elsewhere in ARIA

The IG discussed these issues; it seems that the Digital Publishing community at large would be very much against the usage of extra prefixes for the role attribute terms; some publishers may decide to ignore the terms altogether if that were the case.

The issue was discussed and it was agreed that an email discussion should follow to flesh out the issues before a telco planned with the PF Working Group in about two weeks.

Packaging examples

A new Wiki page has been created to list the functionality of the current packaging used in EPUB: what additional information, files, etc., are defined and used. In the longer term, the use cases on packaging should be used to identify possible differences between the current packaging format and the Web Packaging format as worked on elsewhere at W3C. This is ongoing work.

Identifiers

There was some discussion on the mailing list and this led to a refresh of the corresponding wiki page of the task force. An interesting approach is provided by the so-called “selectors” of the Open Annotation Model (which is currently a Working Draft): this provides a general structure to describe ranges, exact positions, etc., in a very flexible manner.

The problem with that approach, however, is that the selectors are not expressed in the form of a URI. Indeed, the example in the document:

"selector": {
    "@id": "http://example.org/selector1",
    "@type": "oa:DataPositionSelector",
    "start": 4096,
    "end": 4104
}

is a structure describing an anchor point in a document, but it is not a fragment identifier that can be part of a URI. It may, however, be possible to translate it into a fragment, i.e., something like:

#selector(type=DataPositionSelector,start=4096,end=4104)

Ideally, there would be some sort of standard for making this mapping, if that is possible. It was agreed that the question should be put to the editors of the annotation document to find out whether there is, or has been, work on this, or whether there are fundamental issues that make this type of mapping impossible or undesirable.
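To make the idea more concrete, here is a minimal Python sketch of such a translation. It is only an illustration: the "#selector(...)" syntax is the informal example from the discussion, not a standardized fragment scheme, and the function name selector_to_fragment is hypothetical.

```python
from urllib.parse import quote


def selector_to_fragment(selector: dict) -> str:
    """Serialize a selector structure into a hypothetical '#selector(...)' fragment."""
    # Drop the namespace prefix ("oa:") from the type, as in the informal example.
    sel_type = selector["@type"].split(":", 1)[-1]
    parts = [f"type={quote(sel_type)}"]
    # Copy every plain key (those not starting with "@") in its original order.
    for key, value in selector.items():
        if not key.startswith("@"):
            parts.append(f"{quote(key)}={quote(str(value))}")
    return "#selector(" + ",".join(parts) + ")"


selector = {
    "@id": "http://example.org/selector1",
    "@type": "oa:DataPositionSelector",
    "start": 4096,
    "end": 4104,
}
fragment = selector_to_fragment(selector)
print(fragment)
```

A real mapping would also have to deal with nested selectors and with escaping rules, which is exactly the kind of question to raise with the editors of the annotation document.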

DPUB IG Telco, 2015-03-30: Structural Semantics, Packaging Use Cases

See the minutes online for a more detailed record of the discussions.

DPUB-ARIA a.k.a. Structural Semantics update

For background of this work (quoting from the abstract of the document):

Accessibility of web content requires semantic information about widgets, structures, and behaviors, in order to allow assistive technologies to convey appropriate information to persons with disabilities. This specification defines a WAI-ARIA module encompassing an ontology of roles, states, and properties specific to the digital publishing industry. These semantics are designed to allow an author to convey digital book user interface behaviors and structural information to assistive technologies and to enable semantic navigation, styling, and interactive features used by digital book readers. It is expected this will complement HTML5.

The ARIA DPUB Module has undergone significant changes, and the first public Working Draft (published by the W3C PF WG) is planned to be published mid-April. Comments and issues are very welcome, e.g., by email or as GitHub issues.

The big change, compared to previous versions, is that the definitions of terms have been tightened up significantly. They used to be very book-centric, but they are now more general, meaning that the same terms can be used more broadly on the Web. Lots of details had to be handled (and there is still work to do) to align with terms used elsewhere in ARIA. The superclass roles, related aria attributes, examples for all terms, etc., have all been added.

Subsequent discussions concentrated mostly on how to make this document more understandable to Digital Publishing experts who are not familiar with ARIA. It has been agreed that more examples should be added using the aria attributes, too, and that a few words should be added on how this work relates to the current practice of epub:type, the work happening within the EDUPUB initiative, etc.

Packaging use cases

A first batch of packaging use cases has been published on the group’s Wiki pages.

These cases don’t address the basics, but they are packaging requirements that a publishing workflow needs. Some examples:

Some other issues and possible directions were also discussed at the call (necessity—or not—to add DRM related features, finding metadata, etc.) The goal is to develop these and other use cases and provide these as input for the ongoing Web Packaging Work at W3C (which will have an influence on EPUB-WEB).

Administrivia

Due to Easter Monday, which is a holiday in most of Europe, there will be no meeting on the 6th of April.

DPUB IG Telco, 2015-03-23: CSS fragment draft review, Identifiers, latinreq update

See the minutes online for a more detailed record of the discussions.

Review on CSS Fragments

The Interest Group was asked to review the CSS Fragmentation Module Level 3 Draft, published in January. The overall view of the specification is that, when finalized, it will take care of a lot of problems on how to handle page-break, column-break, fragment-break, etc., combined with various situations like floats.

There were also a number of areas that were identified as practical problems and that may not have been (yet) addressed (or addressed adequately) by the draft. One notable area is with placed elements like images, video, or movable blocks: how do you handle those (e.g., by possibly reducing their size a bit) to still keep the page breaks acceptable, etc. In general, dynamic reflow when handling pagination is not (yet) addressed.

The reviewers in the DPUB IG have collected their detailed comments in one or several mails that have been posted to the discussion mailing list of the CSS WG; the detailed discussions will be continued there.

Identifiers

The IG has set up a new task force on Identifiers, whose goal is to consider the technical challenges, in relation to EPUB-WEB, of defining identifiers. Some introductory material has been prepared and added to the Wiki page, with some background material and a rough proposed strategy.

The discussion concentrated on what the detailed goals of the Task Force will be. The feeling was that the group should, primarily, formulate what kinds of requirements the Publishing Community in general, and EPUB-WEB in particular, would have vis-à-vis identifiers.

When considering packages, there are two different aspects: how to get to a specific content file within a package, and then how to get to a specific target within that content file. The latter should reuse, whenever possible, existing media fragment definitions, e.g., as registered by an xpointer scheme and/or by IETF; the former requires further work (and is exemplified today by EPUB’s CFI or the Fragment specification of the Web Packaging Draft). However, it should be emphasized that, in the long term, a URI should look the same no matter what form the publication takes (archive or online); this is important to remember moving forward.

The discussion will continue on the mailing list…

Latinreq

The status of Latinreq (a document describing how page layout should be done in Western languages): it is considered to be a living document, with new issues and aspects added to it. Contributions on additional features to be added are very welcome.

During the meeting additional feature requests were mentioned: e.g., how to fit monolithic content into a fixed-size page, placing captions relative to images, handling tables (e.g., diagonal headers for tables).

Miscellaneous

The group also handled administrative issues like open action items and plans for upcoming face-to-face meetings.

DPUB IG Telco, 2015-03-09: Some Task Force Updates, discussion on Web Packaging

See the minutes online for a more detailed record of the discussions.

Task Force Updates

STEM Task force

The Questionnaire has been sent out to a number of people in several rounds. With the last round (last week) the number of people who have been contacted is around 90, with a deadline for responses set at the end of the month. At the moment, there are 15 respondents plus some others who, though not replying themselves, have forwarded the questionnaire to colleagues.

Accessibility Task Force

The task force has surveyed the W3C Accessibility Guidelines to see which techniques are relevant for Digital Publishing. Some of them (a dozen or so) are not really required, and some others are not clear (e.g., PDF-related techniques). However, most of what is in the current guidelines is very much relevant for the Digital Publishing industry.

The more complicated question, which has not been addressed yet, is whether there are issues in Digital Publishing that are not addressed by the guidelines. Such issues may be related to page numbers, drop caps, etc., although some of these things could be addressed via other specs (like CSS).

The issue is that the task force is a little bit low on resources at the moment…

Content & Markup Task Force (Update to the role module)

The Task Force has been working with the W3C PF Group on a draft for a role module. This is now an early editors’ draft; some terms have been cut out that could be addressed elsewhere. There is also a need to remove some ambiguity from the term definitions and make sure things have context outside of Digital Publishing as well.

The challenge is to determine the scope of publishing and the definition of the terms. There are a large number of potential terms (almost a thousand) and a good balance must be found for a module that is neither too large nor too small.

Discussion on Web Packaging

The discussion included Yves Lafon, the W3C staff contact in the Web Applications Working Group. Yves gave an update:

The Web packaging format started as a way of identifying—with the URL—a way to identify packaging. It then derailed to some use-case as to why there was a need for a package format or document. One of the main drivers was the need for Javascript libraries. There is also a strong relationship to using service workers; it is kind of a portable cache format. Without the need of a configuration. We wanted to actually know if the work we’ve done will be actually useful to our people. We started to gather input from other people, and we got some security input, signatures—part of the document from inside the package—and of course it would be good for us as we know this IG would be interested in this type of packaging, if our approach was good for you, what would we need to make better. The current point is trying to figure out who would be the perfect customer of the specification.

Subsequent discussions concentrated on technical as well as organizational issues. In general, it was agreed that better Digital Publishing use cases, and resulting requirements, should be collected and forwarded to the Web Applications Working Group to represent Digital Publishing and what it requires from a packaging format. E.g., publishers are looking at ZIP alternatives that work well on mobile, that could also efficiently include large data sets, etc.

One of the main technical issues for the Digital Publishing community is the pros and cons of abandoning ZIP in favor of a new packaging format. One of the arguments against ZIP is that it is not properly streamable. However, it may be possible to add some restrictions to ZIP so that the result is actually streamable. If so, there is a legitimate question whether abandoning ZIP, which is widely deployed through EPUB3 publications, is an acceptable alternative. On the other hand, it is in the interest of the Publishing Community to use a packaging format that can be, and is, natively implemented by browsers.
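As an illustration of the streamability argument, the following Python sketch builds a tiny ZIP archive in memory with the standard zipfile module and locates the record signatures defined by the ZIP format. The file names and contents are made up for the example. It shows that the central directory, i.e., the authoritative index of the archive, is written after all entries, so a reader normally needs the end of the file before it can reliably list the contents.

```python
import io
import zipfile

# Build a small archive in memory (entry names and contents are illustrative only).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("mimetype", "application/epub+zip")
    archive.writestr("chapter1.xhtml", "<html><body><p>Call me Ishmael.</p></body></html>")
data = buffer.getvalue()

# Record signatures from the ZIP format.
LOCAL_FILE_HEADER = b"PK\x03\x04"
CENTRAL_DIRECTORY_HEADER = b"PK\x01\x02"
END_OF_CENTRAL_DIRECTORY = b"PK\x05\x06"

first_entry = data.find(LOCAL_FILE_HEADER)
central_directory = data.find(CENTRAL_DIRECTORY_HEADER)
end_record = data.rfind(END_OF_CENTRAL_DIRECTORY)

print("first local file header at byte", first_entry)
print("central directory starts at byte", central_directory)
print("end-of-central-directory record at byte", end_record, "of", len(data))
```

Each entry is also preceded by its own local header, which is what a restricted, "streamable" profile of ZIP could presumably rely on; whether such restrictions are sufficient in practice is the open question noted above.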

It was noted that IETF also plans to look at packaging issues (see the IETF WG charter) and is currently considering the W3C Web Packaging Work, too.

The plans for this Interest Group are to (1) find a definite answer on whether ZIP files can be made streamable and (2) collect use cases to be submitted to the Web Packaging work.

DPUB IG Telco, 2015-03-02: Houdini project, EPUB 3.1 workplans

See the minutes online for a more detailed record of the discussions.

CSS WG’s Houdini project

The Houdini Project of the CSS Working Group had its first meeting a few weeks ago in Sydney, and some participants gave a short overview for the IG. The goal of the Houdini Project is to extend CSS. At present, CSS is a big black box where stuff goes in and formatted display comes out; if the magic isn’t what you want, it is difficult to make changes. The goal of the Houdini project is to “open up” that so that scripts might get additional information and control over the layout process and possibly modify how browsers lay things out.

The sense of the Houdini meeting is that there was great enthusiasm, but also some level of skepticism about how all this can be done. But, behind the big picture, there are a number of little things, plumbing, etc., that will be done and that are useful. E.g., for the Digital Publishing community the possible control over pagination may be the biggest win: putting pagination on top of existing browsers using scripting requires some low-level elements in order to make a good reading experience in the browser. Such work will be accelerated if the lower-level work gets going.

It was agreed that it is important that the use cases and possible implementation experiences of e-readers, i.e., of the Digital Publishing Community, should be communicated to the Houdini project.

There is a good introduction and report by Simon Sapin, as well as a summary of Vivliostyle, and a report of the project by Peter Linss to the TAG.

IDPF’s workplans on EPUB 3.1

EPUB 3.0.1 was released several months ago; it included bug fixes as well as ISO wording and backwards compatibility. IDPF is now thinking of the next version, and this was presented by Markus Gylling at a recent EDUPUB Symposium in Phoenix. Various features to be added were mentioned, like a 3D format, migration of epub:type to the role attribute, or to HTML5. Some features may also be deprecated, like switch. However, at present, all those are just discussion items, with no formal decisions or timeline yet; instead, a discussion among IDPF members should take place.

HTML Image Description Extension (longdesc) is a W3C Recommendation

The HTML5 Image Description Extension (longdesc) was published today as a Recommendation by the HTML Working Group, with the approval of the Protocols and Formats Working Group. This extension for HTML5 adds a longdesc attribute that is used to provide links to detailed descriptions of images, and is part of W3C’s work to ensure that the Open Web Platform is accessible to people with disabilities.
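As a small illustration of how the attribute appears in markup, the Python sketch below uses the standard html.parser module to collect the longdesc links from a snippet of HTML. The markup, the file names, and the class name LongdescCollector are made up for the example.

```python
from html.parser import HTMLParser


class LongdescCollector(HTMLParser):
    """Collect (src, longdesc) pairs from img elements."""

    def __init__(self):
        super().__init__()
        self.descriptions = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attributes = dict(attrs)
            if "longdesc" in attributes:
                self.descriptions.append((attributes.get("src"), attributes["longdesc"]))


# Illustrative markup: the longdesc value is a link to a detailed description.
markup = '<p><img src="whale.png" alt="Diagram of a whale" longdesc="whale-description.html"></p>'
collector = LongdescCollector()
collector.feed(markup)
print(collector.descriptions)
```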

W3C Pointer Events is a Recommendation

The W3C Pointer Events Working Group has published a W3C Recommendation of Pointer Events. The Pointer Events specification defines a unified set of events and interfaces for device-neutral pointer input, such as a mouse, touchscreen, and pen-tablet, including capabilities for handling pointer pressure, contact geometry, and tilt; it also defines a mapping to traditional mouse events. This specification provides additional functionality not available in the related Touch Events specification; for more information on the relationship between these two specifications, see the Touch Events Community Group.

DPUB IG Telco, 2015-02-23: Identifiers, packaging, & manifests

(Meta comment: the W3C Digital Publishing IG has weekly teleconferences. The minutes of the meetings, as well as short summaries, are available online. However, to give them greater visibility, from now on these summaries will be published on this blog rather than just on the wiki.)

The meeting mostly concentrated on some technical issues around the EPUB-WEB vision. See the minutes online for a more detailed record of the discussions.

Metadata Task force and identifiers

Some of the crucial issues related to EPUB-WEB are around identifiers, fragments, etc. It was suggested that the former Metadata Task Force would concentrate on these, identifying use cases and requirements primarily in the area of fragment identifiers. While the problem area around fragments is relatively clear, the issues on identifiers, and how that would affect EPUB-WEB are more complex. Indeed, many identifiers used out there are based on registries and are only loosely coupled with HTTP URI-s; also, many discussions in that space are happening outside this group. The way forward is probably to “reset” the Metadata Task Force, essentially by creating a new task force to make the intentions clear.

(There are some very initial thoughts on identifiers and EPUB-WEB on the epubweb wiki.)

Overview of the Web Packaging draft

The W3C Web Packaging draft was discussed to see how it would fit in the EPUB-WEB vision (as a possible alternative to ZIP). Ivan Herman has prepared some notes on the document on a wiki page.

Three main areas of attention in the draft are:

  1. Packaging itself, based on (essentially) a multipart MIME approach. The important point is that, conceptually, a package is a concatenation of HTTP responses, including HTTP Headers, for specific resources into one package resource; the package itself may also have its own HTTP Header. This approach brings the package very close to current Web technologies, and provides a rich possibility of metadata on each resource as defined in the HTTP standard. (E.g., an EPUB “spine” can be implemented through these headers.)
  2. Fragment identifier, as defined in the document, is based on the idea of:
    1. define a set of “candidate” parts within the package (listing a set of possible URL-s, for example)
    2. choose among the candidates using some filters (essentially content negotiations based on type or lang).
    3. use a fragment as defined for that specific media type; i.e., EPUB-WEB can rely on existing and evolving fragment identifications for different media without having to reinvent its own.
  3. “Link relations”, either in form of an HTTP Link header or an HTML <link> element. These provide a suitable entry point to an EPUB-WEB document: e.g., a landing page refers to the package (i.e., the possibly offline document).
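The first point can be sketched with Python’s standard email package, which implements generic multipart MIME. This is only an approximation for illustration: the "package" subtype, the resource paths, and the helper name build_package are assumptions, and the exact syntax defined in the Web Packaging draft may differ.

```python
from email import message_from_string
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_package(resources: dict) -> str:
    """Concatenate resources into one multipart body, each part with its own headers."""
    package = MIMEMultipart("package")  # hypothetical subtype, for illustration only
    for location, (subtype, body) in resources.items():
        part = MIMEText(body, subtype, "utf-8")
        # Per-part metadata, comparable to the headers of an HTTP response.
        part["Content-Location"] = location
        package.attach(part)
    return package.as_string()


resources = {
    "/index.html": ("html", "<html><body><h1>Moby Dick</h1></body></html>"),
    "/style.css": ("css", "h1 { font-variant: small-caps; }"),
}
package_text = build_package(resources)

# Reading the package back: every part still carries its own headers.
parsed = message_from_string(package_text)
for part in parsed.get_payload():
    print(part["Content-Location"], part.get_content_type())
```

Because each part keeps its own headers, additional per-resource metadata can travel with the content, which is the property the discussion singled out as the main advantage over ZIP.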

Subsequent discussions looked at the question where such a packaging would be advantageous compared to ZIP. The document mentions facilities of streaming, tooling support, and richer per-part metadata; the feeling on the call was that the last argument is the strongest in favor of Web Packaging (although the availability of HTTP related tooling when handling the content of a package was also deemed to be important).

It is worth mentioning that Dave Cramer made a test of what the (ubiquitous) Moby Dick could look like in a package. The package can be downloaded from the Web (note that the fact that it is a “ZIP” file is just a means to make the file smaller in an email; the package itself can be looked at in a text editor).

It was emphasized that the Digital Publishing community is in a unique position to strongly influence the evolution of Web Packaging, because the work is at its starting phase; there is a window of opportunity right now for joining the relevant Working Group, possibly acting as editor.

Overview of the Manifest draft

The W3C Manifest draft was also discussed to see its relations to EPUB-WEB. Tzviya Siegman has prepared some notes on the document on a wiki page.

The question, from the EPUB-WEB point of view, is whether that manifest format can be used as a manifest for EPUB-WEB documents.

The manifest is a JSON file that can be associated to a resource via a specific <link> element. It has a number of metadata terms that are currently aimed at web applications (icons with their sizes, display formats, etc.). Three specific issues were brought forward:

  1. The manifest has a notion of “scope”: a URL that represents the scope of URLs that can be navigated within context (note that web packaging also has the notion of a “scope”). It is not clear whether that functionality is enough for EPUB-WEB to help in identification
  2. Display mode: this is one of the terms defined by the manifest and may be very important for personalization
  3. Openness (or closedness) of the manifest terms: is it possible to add/define additional terms that are more important to the publishing community? It was felt that some sort of an extension structure, whereby various communities could add their own terms, would be a way forward, rather than casting a specific set of terms in concrete.
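To give a feeling for the first two issues, here is a small Python sketch with a made-up manifest for a publication and a simplified scope check. The manifest values, the example URLs, and the function within_scope are assumptions for illustration; the check is a plain same-origin and path-prefix test and not necessarily the exact algorithm of the Manifest draft.

```python
import json
from urllib.parse import urljoin, urlsplit

# A made-up manifest for a publication; all values are illustrative.
manifest_text = """
{
  "name": "Moby Dick",
  "start_url": "/books/moby-dick/index.html",
  "scope": "/books/moby-dick/",
  "display": "standalone",
  "icons": [{"src": "cover.png", "sizes": "192x192"}]
}
"""
manifest = json.loads(manifest_text)
manifest_url = "http://example.org/books/moby-dick/manifest.json"


def within_scope(url: str, manifest: dict, manifest_url: str) -> bool:
    """Simplified check: same origin as the scope and a path under the scope path."""
    scope = urlsplit(urljoin(manifest_url, manifest["scope"]))
    target = urlsplit(urljoin(manifest_url, url))
    same_origin = (scope.scheme, scope.netloc) == (target.scheme, target.netloc)
    return same_origin and target.path.startswith(scope.path)


print(within_scope("chapter1.html", manifest, manifest_url))
print(within_scope("/other/page.html", manifest, manifest_url))
print(manifest["display"])
```

Whether a single scope URL like this is expressive enough to identify everything that belongs to an EPUB-WEB publication is the open question raised in the first point.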

An opinionated guide to digital publishing specifications (guest blog)

(This is a reproduction, with permission, of a blog post by Liza Daly, published on Safari’s blog on the 22nd of January.)

The World Wide Web Consortium (W3C) is a standards organization serving the “open web” — the set of freely available specifications that underpin most of the visible internet. In the years since the W3C was founded, all modern businesses have become “web” businesses, with their own industry-specific processes, jargon, and priorities. To that end, the W3C has formed interest groups for those industries which are adjacent to the web, with a goal to promote web technologies and ensure that the web is meeting common commercial needs.

I was co-chair for the Digital Publishing Interest Group for a time, and I have first-hand exposure to their work in interviewing publishers, documenting best practices, and writing recommendations for future specifications.

Screen shot of the first table of the DPUB specification review

One of those deliverables is an intimidating table of W3C specifications and standards that were considered relevant to digital publishing. There’s a lot to digest there, and it’s unlikely that any single human is deeply familiar with all of it. I’ve provided an opinionated gloss of the most relevant or active standards, and feel free to comment if I’ve disparaged or ignored your favorite specification.

 

The audience

I’m assuming that the reader is one of the following:

  • A developer who is working in digital publishing
  • A curious non-developer who isn’t afraid of the word “normative” and acronyms that begin with ‘X’
  • A standards wonk who wants to be more familiar with publishing activity

These are the “bread and butter” of digital publishing — whether it’s commercial ebooks, academic publishing, or journals:

HTML

HTML5 is a monster of a spec, but at least it’s reflective of current browser support. You should be familiar with the basics of markup, as well as the sections on browsers and common APIs.

CSS

There’s the workhorse CSS 2.1 specification which has been around for a decade. Unfortunately for the curious but lazy, all the cool new stuff is in CSS3, and that spec is broken out into many modules. Here’s a drive-by of the most interesting or publishing-relevant ones:

  • Start with Dave Cramer’s highly readable Requirements for Latin Text Layout and Pagination (“Latin” here means Western languages, not veni, vidi, vici). Note that this is a requirements document, not a spec, which means much of what Dave recommends won’t actually work anywhere yet. Welcome to standards!
  • CSS Text Module Level 3 is the “real world” equivalent to the above. Though it’s technically a spec in progress, most everything in here is available in modern browsers and reading systems.
  • CSS Regions Module Level 1 is a good read when you want to be angry about something. Regions can do some amazing things for advanced layout, but there’s a long and sordid history behind their implementation and deployment. There’s a lot of momentum behind getting Regions or an equivalent standard moving again, so there’s hope.

Extra credit assignments: CSS Media Queries and CSS Fonts Module Level 3. And while it’s unlikely that you’d need to actually read the SVG and MathML specs, it’s important to be familiar with those formats at a high level.

Accessibility

The simplest way to approach accessible web or ebook content is to study the semantics that are built in to HTML5. High-quality semantic markup will not only help a range of human users, it’ll aid in discovery and ranking by search engines.

Follow that up with the non-technical best practices in Web Content Accessibility Guidelines, and this overview of creating accessible interactive content.

XML

It’s not dead yet! There’s a lot of cruft in the list, but ebooks are still required to be well-formed XML documents, and academic publishing remains dominated by XML (and, sigh, PDF).

Bleeding edge

If everything above is old hat, check out the emerging specs on the Shadow DOM, CSS Flexible Box Layout Module Level 1 (flexbox), and Packaging on the Web.