DPUB IG Telco, 2015-06-08: Pagination requirements, scholarly publishing

See minutes online for a more detailed record of the discussions.

Requirements for Pagination

Dave Cramer has put together a first set of items as requirements for pagination. It is only a start. There are some broad categories (margin controls, orientation control, page display, etc.). However, there are lots of steps between this set of requirements and the final development of specs. The point of the document is to get the group’s ideas down so that we can have a discussion with the CSS Working Group, also in view of the Houdini project.

Some discussions followed with new requirements coming to the fore. Issues arising included whether we should talk about non-rectangular regions as pages (it was agreed that, at this point, we should stick to rectangular areas); pops, overlays, naming pages (e.g., a title page may look different than the rest), using templates. These should be used to expand the use cases.

A broader issue was also brought up on the relationship between a basically “declarative” control over pages, much like the current CSS specification, and a more JavaScript API oriented view, which is closer to what the Houdini project is supposed to deliver. It has been recognized that, at present and in the near future, some level of programming will remain necessary, but the general goal is to try to reduce that as much as possible. The requirements should include a clear statement to that effect.

Scholarly publishing

The general question is whether scholarly publishing is properly addressed in the group’s use cases as well as, in general, its work. Tzviya Siegman gave an overview of some of the particularities of scholarly publishing, which was then completed by Bill Kasdorf and Ivan Herman. Some of the particularities cited were

  • Scholarly publications focus on articles, bound to journal issues or proceedings, but where each individual article is a publication by itself
  • The scholarly community has been on-line for a long time; these days the printed versions tend to disappear. The on-line versions are dominated by PDF usage; reasons include tradition, the (false?) requirement of having pixel-level control, and faithful reproduction of printed journals. With the disappearance of printed journals, more and more journals produce articles in HTML, though the downloaded versions are still in PDF. A particular issue is the predominance of two-column PDF, which is very bad on, say, tablets.
  • Different production workflow, with an emphasis on peer review. The internals are based on various XML specifications (e.g., JATS)
  • Publication of scientific data is becoming an integral part of publications, i.e., the data should be part of the packaged content; this influences fragment identifier specifications, the inclusion of JavaScript for visualization, etc.
  • Publications routinely include lots of metadata, as part of the content itself, searched and crawled by many different services

Today EPUB is not really part of the picture for the scholarly community. It is not clear why; there is an issue of tradition (the on-line presence of that community is older than the very existence of EPUB), the predominance of PDF, etc. There is also a perception issue: the perception of EPUB is that it is really for books, whereas this is not, technically, true.

It has been agreed that the use cases have to be looked at from the scholarly publishing point of view, and that outreach efforts should be made to include EPUB in their world view…

DPUB IG Telco, 2015-06-01: F2F Recap, Charter Recap, Packaging Requirements

See minutes online for a more detailed record of the discussions.

F2F Recap

Tzviya gave a summary of the F2F (held in NYC, on the 26th of May); see the separate summary of the F2F blog for further details. We also thanked the IDPF Board and Diane Kennedy (IDEAliance) for attending and contributing to the meeting.

Charter recap

Ivan Herman also gave an overview of the new charter discussion, as well as a separate discussion held with the IG chairs later in the week. The discussions of the week have been summarized in a series of github issues. Some of the issues worth mentioning are:

  • make it clear that the new IG is a continuation of the old one, i.e., any work that has already been started is taken over
  • the formulation around EPUB should avoid any possible misunderstandings on the mutual role and position of EPUB and the Web (as such, a tiny change in naming has been adopted, namely to refer to EPUB+WEB instead of EPUB-WEB)
  • the list of issues in the charter should be more focussed
  • a number of stylistic issues

Ivan Herman plans to make a new version of the charter this week. The IG is encouraged to contribute in terms of comments and issues.

Packaging functional requirements

Tzviya Siegman has prepared a preliminary list that was discussed and updated. Some of the issues that were discussed were:

Size limits on packages?

This is an issue that came up during the F2F; the current EPUB spec says that ZIP64 should be used, but it is unclear what the limits should be in the future. It was agreed that it is not possible to set an explicit limit; instead, the guidance should be something like “be sensible” :-)

This issue is also related to the question of video or audio sizes and whether there should be requirements on those. This is related to the requirement of streamability of the content.
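As a rough illustration of what a ZIP-based container with ZIP64 extensions looks like in practice, here is a minimal sketch using Python's standard zipfile module. The file name and content are made up for the example, and the result is not a complete EPUB (it lacks, e.g., the container metadata); the only conventions borrowed from OCF are that the mimetype entry comes first and is stored uncompressed.

```python
import io
import zipfile


def build_package(files: dict[str, bytes]) -> bytes:
    """Build an in-memory ZIP container with ZIP64 extensions allowed."""
    buffer = io.BytesIO()
    # allowZip64=True lets the archive exceed the classic ZIP size limits.
    with zipfile.ZipFile(buffer, "w", allowZip64=True) as package:
        # OCF convention: the mimetype entry is first and stored uncompressed.
        package.writestr(
            "mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED
        )
        for name, data in files.items():
            package.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


# Hypothetical content, for illustration only.
package_bytes = build_package(
    {"content/chapter1.xhtml": b"<html><body><p>Hello</p></body></html>"}
)
with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
    entry_names = package.namelist()
    first_entry = package.infolist()[0]
```

Nothing in the sketch enforces a size limit, which matches the outcome of the discussion: the format allows very large packages, and sensible sizing is left to the producer.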

Role of packaging and identification

Prior to the meeting Ivan Herman started a discussion thread on identification and packages, i.e., on what a ‘canonical’ URI should be for a part within a package. This raised the issue whether packaging is needed in the first place, and what it means to separate the notion of a document on the Web from the Web as a whole. These should be clarified as part of the requirements.

The rough agreement is that some sort of packaging format is necessary to transfer a publication among people, but it may also be necessary to have the notion of some sort of ‘virtual’ packaging on the Web to hold a publication (e.g., to have a clear URI for it). This conceptual unity is at the heart of EPUB+WEB. But the issue of what the URI is for a part remains, and it is not yet clear how to solve it.

Dave Cramer also noted that there is a discussion on the Web Performance mailing list on packaging, something to consider. (See Dave’s subsequent email on this.)

DPUB IG Face-to-face, 2015-05-26

The meeting took place at the offices of the Hachette Group, in New York, USA.

See minutes online for a more detailed record of the discussions.


Packaging

(See the relevant part of the minutes for further details.)

Tzviya Siegman presented some slides to start the discussion on packaging, motivated by the future vision on EPUB-WEB. The goal is to gather a detailed set of requirements on what type of packaging is needed for the purpose of Digital Publishing and EPUB-WEB. At the moment, EPUB is based on OCF, derived from ZIP; other proposals coming to the fore are based, e.g., on multipart MIME.

A significant part of the discussion was on whether, for EPUB-WEB, there is a real need for packaging in the first place. Some efforts in the area of Web Applications that had hitherto considered packaging are now looking in other directions, like the usage of Service Workers, which may make a specific packaging format unnecessary. There was a consensus that, after all, some sort of packaging is necessary. However, the coming months should concentrate on defining functional requirements for packaging, as a first step in moving forward. A first draft of such functional requirements is available and will have to be worked on.


Identifiers

(See the relevant part of the minutes for further details.)

Bill Kasdorf gave an overview of the (fragment) identifiers in view of EPUB-WEB. That document already has some entries on what the functional requirements for such identifiers should be in general; this is again work to be pursued in the months to come. Some issues/requirements that were discussed were:

  • any scheme should reuse existing fragment identification mechanisms that are defined for various media types (and registered by IETF) although it is probably necessary to list the ones that browsers really implement
  • it may be necessary to include some ways to express timing/versions; this is very important in, e.g., references in scholarly publishing
  • the question arises whether fragments should be able to express non-contiguous data (e.g., collection of pages)
  • the exact structure on how fragments are used both in offline and online setting has to be developed (and that may touch on web architectural issues)


Pagination

(See the relevant part of the minutes for further details.)

Dave Cramer gave an overview of what is currently happening in the CSS Working Group that may affect the way pagination could be done in future reading systems (currently this is a huge problem and relies on hacks). Two approaches are being considered:

  1. The Houdini project (a joint task force between the TAG and the CSS WG) aims at exposing the internals of the CSS rendering, the box model, etc. If a standard API is defined, then a correct pagination could be built on top of it in a standard way
  2. There is work going on with fragmentation and overflow; the combination of these can be used as a basis to build a pagination system

In effect what this means is that there are two very different ways of approaching the problem. The Houdini approach is more likely to come up with useful ways of pagination, but it is not absolutely clear at this moment when and how that project will be completed.

The group should clearly express its needs in terms of a requirement document; a first version thereof is on the IG Wiki. Note that this document should also emphasize the importance of pagination itself, not only for traditional books but also, e.g., for the reading experience advantages it may offer on small screens. (References to such studies were mentioned at the F2F.)


Accessibility

(See the relevant part of the minutes for further details.)

The accessibility discussion, led by Deborah Kaplan and Charles LaPierre, had two main areas.

On the one hand, the current Accessibility TF is looking at WCAG to see how that document relates to digital publishing: what is relevant (or not) in WCAG for the publishing community, but also what the missing issues and features are. Most of the work has been done, and the group will now focus on the publication of an IG note (before the end of the year).

The other part of the discussion was conducted together with the representatives of BISG. BISG has a group on accessibility, led by Robin Seaman. The BISG group is on outreach, not on developing new things; the goal is to collect all relevant documents so that the book industry uses the same approaches wherever possible.

There was a discussion on the exact role of the BISG group vis-à-vis W3C’s WAI Education and Outreach Working Group, whose goals are very similar; it was agreed that the BISG group should refer to and use the WAI EO WG output whenever possible. The best way forward is for the DPUB IG’s Accessibility task force to serve as some sort of liaison between the two entities.

Education & Outreach

(See the relevant part of the minutes for further details.)

Nick Ruffilo and Karen Myers gave a list of ideas and possible approaches on how to make the results and work of the group more visible. The goal would be to use existing possibilities (e.g., blogs both on the DPUB IG and the W3C levels) but also use the possibilities of press releases, executive summaries, etc. The main goal is to develop relationships with media outlets (e.g., PW, DBW) to regularly inform the larger community of the results of the group, give one or several webinars. It has been agreed that Nick would develop a DPUB PR plan further and try to get some text into one of the major media outlets.

Rechartering of the Interest Group

(See the relevant part of the minutes for further details.)

The current charter of the group expires in September; work has begun on a draft charter. The main shift of emphasis in the new charter is to explicitly refer to the EPUB-WEB “vision” as a guiding principle for future work (this change has already occurred in the current IG, and the discussions of the F2F on packaging and identifiers are clearly related to that). Another important change is that the group would also plan to prepare ‘prototype’ specifications and explore technical avenues more deeply; these can then be handed over to other Working Groups or can serve as a basis for the creation of new, dedicated W3C or IDPF groups. If new groups are created, then the (new) IG would actively contribute to the chartering of those groups.

Details of the charter were discussed and some amendments and/or clarifications were proposed (clarify the exact relationship to EPUB, also in view of considering other, non-book documents; state that the current task forces on, e.g., accessibility would be continued in the new group). There were also discussions on whether the term ‘EPUB-WEB’ is the right one or whether it is misleading and should be changed (though no real alternative was found). Finally, there were some clarifications on how the work will be cooperatively pursued by W3C and IDPF (the way the EDUPUB alliance works on its own goals may be a good pattern here).

The goal is to have a final version of the charter by the end of June, to be then submitted to W3C members for vote.

DPUB IG Telco, 2015-05-18, Review of EPUB 3.1 Charter

See minutes online for a more detailed record of the discussions.

EPUB 3.1 Charter

The EPUB 3.1 Charter (https://goo.gl/TNvYEX) was shared with the IG. Behind EPUB-WEB is a collaboration between W3C and IDPF. IDPF is working on chartering EPUB 3.1, which will continue to be backward compatible, but includes many notions with repercussions on the IG's work. Some examples of items under consideration for EPUB 3.1 are support of the HTML serialization and exploration of a server-side manifestation of the package. As the EPUB WG considers these possibilities, it may look to the IG for collaboration and consultation.

The DPUB IG Spring F2F meeting will be on 26 May. There will be no DPUB IG Telco on 25 May. There was discussion of the detailed agenda for the F2F.

DPUB IG Telco, 2015-05-11: updates on Accessibility, DPUB ARIA, Packaging

See minutes online for a more detailed record of the discussions.

Accessibility TF update

The A11y TF put together a detailed spreadsheet of what aspects of WCAG are relevant or not relevant to DPUB. This also helped us understand what aspects of publishing are not addressed by existing standards. Our plan is to pull the spreadsheet into a note, to be published, eventually, by the IG. The skeleton of the note is already available, and should be completed shortly.

There are discussions with the Accessibility Working Group at BISG; the plan is that, at the upcoming DPUB IG F2F, we would sit down with them to see how we can cooperate. We would want to assist in the communication between BISG and W3C on that matter. If we see gaps in the publishers’ knowledge, we would let BISG know; if BISG sees holes in the standards, we would let the relevant W3C people know.


DPUB ARIA

We did a consensus call on the DPUB ARIA Role draft, and the DPUB IG agreed to publish it. However, some issues surfaced in the PF WG (the guardians of ARIA); indeed, this is the first time an extension to ARIA is being defined, and the limits and approaches do not seem to be defined yet. There is tension within the ARIA group on whether the @role attribute is bound to Assistive Technology usage, or whether it can be used more generally for structural information. If the former, that would drastically reduce the number of @role attribute values in the DPUB ARIA module and would make it unusable for the purpose of structural semantics.

This issue must be sorted out by the PF Working group and, until then, the DPUB ARIA publication is put on hold. One alternative may be that we would have to move away from ARIA towards a separate, targeted extension of HTML5, possibly identifying a number of values that would also have their counterpart in the new version of the core ARIA spec (ARIA 1.1). Hopefully this will be sorted out soon.

Packaging Update

There has been some discussion with the chair of the TAG (co-author of the packaging specification), and it seems there are discussions now within the community on whether packaging is something browser vendors really want in the first place. For some of the use cases (web applications access) a combination of service workers with manifests would seem to work, too. One of the goals of the discussion at the F2F is to clarify the requirements and needs of the publishing community in this respect and forward it to the TAG and the Web Application Working Group asap.

DPUB IG Telco, 2015-04-27: STEM Survey, Fragment ID-s, footnotes

See minutes online for a more detailed record of the discussions.

STEM Survey

The STEM Task Force has conducted a survey among experts on their experience in publishing STEM content. The Survey is now closed; there was a first glance at the results during the call (all this is preliminary, a more systematic evaluation is still to be done).

There were 34 responses (out of 93 asked). Overall, the results are fine, although (at first glance) nothing overly exciting. Most of the responders were “end users”, i.e., researchers who publish in the area. The answers also highlighted some issues with the survey itself, e.g., the questions may not have been as clear as necessary. The bias of the responders was clearly towards CS and Mathematics.

There was a clear tendency towards making the content reusable and using the Web as a primary platform. Beyond MathML, no single additional STEM format came to the fore as a major trend (CML was mentioned several times). As for delivery format, HTML was ahead of PDF as a primary format, but publishing in PDF is almost always present as a secondary format (without enthusiasm, just out of necessity).

The next goal is to have a more systematic evaluation of the results, with the aim of summarizing them in a note. The raw results of the survey will also be made public, although they have to be strictly anonymized first.

Fragment ID-s

There was already a discussion on identifiers a few weeks ago, which referred to the selectors of the open annotation model as a possible approach for defining fragment identifiers in EPUB-WEB. That meeting was followed by an email discussion with the Web Annotation Working Group (which works on the model), to see if the selectors could be transformed into bona fide fragment identifiers.

The problem that arose during the discussion is the way fragment identifiers are defined (in general). Indeed, fragment identifiers are never defined in isolation; they are defined for a specific media type and registered as such by IANA. In this sense, serializing the selector model in general is not a real option. However, it is possible to do so for specific media types; in the case of EPUB-WEB, HTML is an obvious target.
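To make the media-type dependence concrete, here is a small sketch that interprets the same URI fragment differently depending on the media type of the resource. The function name and the example URIs are made up; the HTML branch reflects the usual element-id interpretation, the plain-text branch follows the char= and line= schemes of RFC 5147, and everything else is deliberately left uninterpreted.

```python
from urllib.parse import urldefrag


def describe_fragment(uri: str, media_type: str) -> str:
    """Explain how the fragment of `uri` would be read for a given media type."""
    _, fragment = urldefrag(uri)
    if not fragment:
        return "no fragment"
    if media_type == "text/html":
        # In HTML the fragment names an element, typically via its id attribute.
        return f"element with id '{fragment}'"
    if media_type == "text/plain":
        # RFC 5147 defines char= and line= schemes for plain text.
        scheme, _, value = fragment.partition("=")
        if scheme in ("char", "line") and value:
            return f"{scheme} position {value}"
    return f"uninterpreted fragment '{fragment}' for {media_type}"


html_reading = describe_fragment("http://example.org/paper.html#results", "text/html")
text_reading = describe_fragment("http://example.org/notes.txt#line=10,20", "text/plain")
odd_reading = describe_fragment("http://example.org/notes.txt#results", "text/plain")
```

The last call shows the point of the discussion: a fragment that is meaningful for HTML has no defined meaning for another media type, so a selector serialization has to be specified per media type.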

It has been emphasized that if such a serialization is done, it should be done together with the Web Annotation Working Group to avoid discrepancies. That Working Group has already touched upon this issue (in the context of rangefinder) in their recent F2F meeting.

In the context of EPUB-WEB, CFI has to be evaluated first, though; after all, CFI essentially already defines a fragment ID for EPUB3. Finding out whether CFI works for EPUB-WEB (if yes, how; if not, why) is important before engaging in anything else. This is clearly a topic for the upcoming F2F meeting of the Interest Group.

HTML5 and footnotes

There was a recent email discussion on the possibility of defining a footnote element in HTML5. This was followed by some separate discussion with the experts of the HTML WG. As of now, the situation is that HTML will not have a formal proposal for such an element, so the DPUB IG should pursue the ARIA role approach for defining footnotes. The element may be taken up in the future, though.

DPUB IG Telco, 2015-04-20: Consensus call for DPUB ARIA Module, F2F Agenda

See the minutes online for a more detailed record of the discussions.

Consensus call for DPUB ARIA Module

After the DPUB-ARIA task force met last week, the DPUB IG put out a call for consensus to put the Digital Publishing Module of ARIA to FPWD. The task force did not resolve all outstanding issues but will attempt to resolve them in a call with PF next week and move forward with FPWD. All present voted for the publication. An email will be sent formally requesting consensus.

Finalize Agenda for May F2F

The agenda for the May F2F will be closed at the end of the day today.

DPUB IG Telco, 2015-04-13: Aria Module, Packaging, Identifiers

See the minutes online for a more detailed record of the discussions.

Discussion on the ARIA module

The DPUB ARIA module should get a publication approval (to publish the document as a first public draft) from both the DPUB IG and the Protocols and Formats (PF) Working Group. However, it seems that, during their last teleconference, the PF Working Group raised some issues that may have to be resolved before this first publication. The issues they have are:

  • some participants would prefer to have prefixes for the terms for modules, such as the publishing one; essentially they end up in different domains
  • there were also some concerns about specific terms that may clash with similar terms elsewhere in ARIA

The IG discussed these issues; it seems that the Digital Publishing community at large would be very much against the usage of extra prefixes to the role attribute terms; some publishers may decide to ignore the terms altogether if that were the case.

It was agreed that an email discussion should follow to flesh out the issues before a telco planned with the PF Working Group in about two weeks.

Packaging examples

A new Wiki page has been created to list the functionality of the current packaging used in EPUB: what additional information, files, etc., are defined and used. In the longer term, the use cases on packaging should be used to identify possible differences between the current packaging format and the Web Packaging format as worked on elsewhere at W3C. This is ongoing work.


Identifiers

There was some discussion on the mailing list, and this led to a refresh of the corresponding wiki page of the task force. An interesting approach is provided by the so-called “selectors” of the Open Annotation Model (which is currently a Working Draft): this provides a general structure to describe ranges, exact positions, etc., in a very flexible manner.

The problem with that approach, however, is that the selectors are not expressed in form of a URI. Indeed, the example in the document:

 "selector": {
     "@id": "http://example.org/selector1",
     "@type": "oa:DataPositionSelector",
     "start": 4096,
     "end": 4104
 }

is a structure describing an anchor point in a document, but it is not a fragment identifier that can be part of a URI. Although it may be possible to translate that into a fragment, i.e., something like:


The ideal would be some sort of standard for making this mapping, if possible. It was agreed that the question should be put to the editors of the annotation document to find out whether there is, or has been, work on this, or whether there are fundamental issues that make this type of mapping impossible or undesirable.
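The kind of mapping in question can be sketched in a few lines. The fragment syntax used here, selector(type=...,start=...,end=...), is purely hypothetical: it is not the example from the discussion, not a registered fragment scheme, and the function names are invented for the illustration. The sketch only shows that a position selector can be serialized into a string that fits in the fragment part of a URI and parsed back.

```python
import re


def selector_to_fragment(selector: dict) -> str:
    """Serialize a position selector into a made-up fragment syntax."""
    selector_type = selector["@type"].split(":", 1)[-1]  # drop the "oa:" prefix
    return (
        f"selector(type={selector_type},"
        f"start={selector['start']},end={selector['end']})"
    )


def fragment_to_selector(fragment: str) -> dict:
    """Parse the made-up fragment syntax back into a selector structure."""
    match = re.fullmatch(r"selector\(type=(\w+),start=(\d+),end=(\d+)\)", fragment)
    if match is None:
        raise ValueError(f"not a selector fragment: {fragment!r}")
    return {
        "@type": f"oa:{match.group(1)}",
        "start": int(match.group(2)),
        "end": int(match.group(3)),
    }


selector = {
    "@id": "http://example.org/selector1",
    "@type": "oa:DataPositionSelector",
    "start": 4096,
    "end": 4104,
}
fragment = selector_to_fragment(selector)
uri = "http://example.org/document#" + fragment
```

Note that the round trip is lossy in this sketch: the selector's own "@id" has no place in the fragment, which is one of the questions a real mapping would have to settle.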

DPUB IG Telco, 2015-03-30: Structural Semantics, Packaging Use Cases

See the minutes online for a more detailed record of the discussions.

DPUB-ARIA a.k.a. Structural Semantics update

For background of this work (quoting from the abstract of the document):

Accessibility of web content requires semantic information about widgets, structures, and behaviors, in order to allow assistive technologies to convey appropriate information to persons with disabilities. This specification defines a WAI-ARIA module encompassing an ontology of roles, states, and properties specific to the digital publishing industry. These semantics are designed to allow an author to convey digital book user interface behaviors and structural information to assistive technologies and to enable semantic navigation, styling, and interactive features used by digital book readers. It is expected this will complement HTML5.

The ARIA DPUB Module has undergone significant changes, and the first public Working Draft (published by the W3C PF WG) is planned for mid-April. Comments and issues are very welcome, e.g., through emails or github issues.

The big change, compared to previous versions, is that the definitions of terms have been tightened up significantly. They used to be very book-centric, but they are now more general, meaning that the same terms can be used more broadly on the Web. Lots of details had to be handled (and there is still work to do) to align with terms used elsewhere in ARIA. The superclass roles, related aria attributes, examples for all terms, etc., have all been added.

Subsequent discussions concentrated mostly on how to make this document more understandable to Digital Publishing experts who are not familiar with ARIA. It has been agreed that more examples should be added using the aria attributes, too, that a few words should be added on how this work relates to the current practice of epub:type, the work happening within the EDUPUB initiative, etc.

Packaging use cases

A first batch of packaging use cases has been published on the group’s Wiki pages.

These cases don’t address the basics, but they are packaging requirements needed for a publishing workflow. Some examples:

Some other issues and possible directions were also discussed at the call (necessity—or not—to add DRM related features, finding metadata, etc.) The goal is to develop these and other use cases and provide these as input for the ongoing Web Packaging Work at W3C (which will have an influence on EPUB-WEB).


Due to Easter Monday, which is a holiday in most of Europe, there will be no meeting on the 6th of April.

DPUB IG Telco, 2015-03-23: CSS fragment draft review, Identifiers, latinreq update

See the minutes online for a more detailed record of the discussions.

Review on CSS Fragments

The Interest Group was asked to review the CSS Fragmentation Module Level 3 Draft, published in January. The overall view of the specification is that, when finalized, it will take care of a lot of problems on how to handle page-break, column-break, fragment-break, etc., combined with various situations like float.

There were also a number of areas that were identified as practical problems and that may not have been (yet) addressed (or addressed adequately) by the draft. One notable area is with placed elements like images, video, or movable blocks: how do you handle those (e.g., by possibly reducing their size a bit) to still keep the page breaks acceptable, etc. In general, dynamic reflow when handling pagination is not (yet) addressed.

The reviewers in the DPUB IG have collected their detailed comments in one or several mails that have been posted to the discussion mailing list of the CSS WG; the detailed discussions will be continued there.


Identifiers

The IG has set up a new task force on Identifiers, whose goal is to consider the technical challenges, in relation to EPUB-WEB, of defining identifiers. Some introductory material has been prepared and added to the Wiki page, with some background and a rough proposed strategy.

The discussion concentrated on what the detailed goals of the Task Force will be. The feeling was that the group should, primarily, formulate what kinds of requirements the Publishing Community in general, and EPUB-WEB in particular, would have vis-à-vis identifiers.

When considering packages, there are two different aspects: how to get to a specific content file within a package, and then how to get to a specific target within that content. The latter should reuse, whenever possible, existing media fragment definitions, e.g., as registered by an xpointer scheme and/or by IETF; the former requires further work (and is exemplified today by EPUB’s CFI or the Fragment specification of the Web Packaging Draft). However, it should be emphasized that, in the long term, if one creates a URI, it should look the same no matter what form the publication takes (archive or online), which is an important thing to remember moving forward.
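The two aspects can be illustrated with a small sketch that first locates a content file inside a package and then keeps the remaining fragment for addressing within that file. The "!/" separator (borrowed from jar:-style URLs), the example URI, and the function names are all assumptions made for the illustration; they are neither CFI nor the Web Packaging Draft syntax.

```python
import io
import zipfile
from urllib.parse import urldefrag

SEPARATOR = "!/"  # hypothetical package/part separator, not a standard


def split_part_reference(reference: str) -> tuple[str, str, str]:
    """Split a reference into (package URI, path inside the package, fragment)."""
    without_fragment, fragment = urldefrag(reference)
    package_uri, _, inner_path = without_fragment.partition(SEPARATOR)
    return package_uri, inner_path, fragment


def read_part(package_bytes: bytes, inner_path: str) -> bytes:
    """Step one: get to a specific content file within a ZIP-based package."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        return package.read(inner_path)


# Build a tiny in-memory package with made-up content.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as package:
    package.writestr("chapters/ch1.xhtml", "<p id='sec2'>Section 2</p>")

parts = split_part_reference("http://example.org/book.epub!/chapters/ch1.xhtml#sec2")
content = read_part(buffer.getvalue(), parts[1])
```

Step two, resolving the fragment ("sec2" here) inside the retrieved content, is left to whatever fragment mechanism the content's media type already defines, which is the reuse the discussion argued for.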

The discussion will continue on the mailing list…


Latinreq update

The status of Latinreq (a document describing how page layout should be done in Western languages): it is considered to be a living document, with new issues and aspects added to it. Contributions on additional features to be added are very welcome.

During the meeting additional feature requests were mentioned: e.g., how to fit monolithic content into a fixed-size page, placing captions relative to images, and handling tables (e.g., diagonal headers for tables).


The group also handled administrative issues like open action items and plans for upcoming face-to-face meetings.
