DPUB IG Telco, 2015-08-03: Pagination, portable documents

See minutes online for a more detailed record of the discussions. (The headers below link into the relevant sections of the minutes.)

Pagination and Prioritization

A first draft of the “Priorities for CSS from the Digital Publishing Interest Group” is now publicly available. Some other issues are still to be edited into the document. The CSS Houdini project holds a face-to-face meeting in Paris at the end of the month, and the goal is to have a more stable version of the document available by then. The group agreed to also publish this as an Interest Group Draft sometime in mid-August.

Portable Package Requirements

Tzviya Siegman has edited an initial document on the wiki. The document makes a distinction between three ‘forms’ of documents:

  1. Online
  2. Offline (cached)
  3. Portable (network independent)

The document focuses mainly on issues related to the third form, because that is where packaging may come in. The group agreed, though, that this terminology may be misleading, mainly for “offline” (which, for most people, seems to indicate network independence). Instead, the following terminology should be used:

  1. Online
  2. Cached
  3. Portable (offline)

The main question arising during the meeting was what the main goals of this document are, so that the various issues can be filtered through those goals. One major goal is to serve as a basis for discussion with other groups at W3C on whether work on a Web packaging format (currently a Working Draft) should be pursued in the first place, i.e., whether publishing has specific requirements that should be taken into account.

(It is worth noting that, although the IETF started some work on a top-level media type for archives, that initiative has been abandoned due to a lack of manpower.)


The group also spent time discussing administrative issues (steps to be taken for the charter renewal, preparation for the face-to-face meeting in October).

DPUB IG Telco, 2015-07-27: Portable Documents, STEM Update, Math Role

See minutes online for a more detailed record of the discussions.

Portable Documents

The IG has been talking about the abstract concept of a package. Tzviya Siegman presented a draft document outlining requirements for a portable document. The group discussed the distinction between the package as an offline state, functioning as an extension of the browser cache, and a truly portable publication that exists without a network and is persistent. The group will clarify the document to distinguish these issues and add comments about maintaining identifiers.

STEM Survey

Peter Krautzberger reports that the STEM task force is cleaning up the data from its survey and slicing it in interesting ways. They have created a spreadsheet that will contribute to the TF’s Note.

Math and the role attribute

Peter Krautzberger, MathJax manager, discussed concerns about the ARIA role “math” that came up in his conversations with AT vendors. The role is primarily useful for content that is MathML (i.e., uses the <math> tag). However, most browsers do not support MathML. Using role="math" is more valuable for polyfills and converters, but the role conveys very little information. It would be helpful if ARIA exposed some of the underlying structure of MathML to AT. The IG will pass Peter’s findings on to PF.

DPUB IG Charter Renewal

If you have not already voted to renew the DPUB Charter, please do!

DPUB IG Telco, 2015-07-13: dpub-aria, fragment identifiers

See minutes online for a more detailed record of the discussions.

Digital Publishing WAI-ARIA Module

The first public working draft of the Digital Publishing WAI-ARIA Module is out, and the IG is now focusing on receiving input from the community about its viability and future. A separate blog post identifies a set of questions that need discussion.

Meanwhile, the IG taskforce will continue working on the next public draft of the module. In particular, we will focus on

  • investigating additional terms to be added to the document
  • discussing whether to move certain terms into the ARIA 1.1 core
  • coming to terms with the handling of link types (the role and rel attributes)
  • starting work on the separate AT API mappings companion document

Fragment Identifiers Status Update

With the notion of using Service Workers as the vehicle for handling offline use and caching in EPUB+WEB, the discussion of specialized fragment identifiers for digital publications becomes moot. The identifiers task force will instead be able to focus on working with the relevant media type authorities to make sure that the fragment identification needs of digital publishing are met by the generic fragment identifier schemes for these media types (HTML, SVG, etc.). One example of a currently unmet need is the ability to specify a range of text in an HTML document using a fragment identifier in a URL.

From the work on the Range Finder API within the Web Annotations WG, it has become increasingly clear that the ephemeral nature of web content sometimes clashes with the need, in certain publishing domains, for completely persistent and reliable identifiers. For example, the range of text returned by the Range Finder may change over time as the document changes; this would not be workable in, for example, scholarly and legal publishing.
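To make the persistence problem concrete, here is a minimal Python sketch, not the Range Finder API itself, contrasting a range stored as raw character offsets with one stored as the quoted text plus a little surrounding context (loosely in the spirit of the text-quote selectors of the annotation data model). The function names and the sample sentence are hypothetical.

```python
from typing import Optional, Tuple


def anchor_by_offsets(text: str, start: int, end: int) -> str:
    """Resolve a range stored as raw character offsets."""
    return text[start:end]


def anchor_by_quote(text: str, exact: str, prefix: str = "",
                    suffix: str = "") -> Optional[Tuple[int, int]]:
    """Re-locate a range from its quoted text plus surrounding context.

    Returns (start, end) of the first occurrence of `exact` whose
    neighboring characters match `prefix` and `suffix`, or None if the
    quote can no longer be found in `text`.
    """
    pos = text.find(exact)
    while pos != -1:
        before = text[max(0, pos - len(prefix)):pos]
        after = text[pos + len(exact):pos + len(exact) + len(suffix)]
        if before == prefix and after == suffix:
            return pos, pos + len(exact)
        pos = text.find(exact, pos + 1)
    return None


original = "The court held that the contract was void."
start = original.find("contract")
end = start + len("contract")

# The document is later edited: text is inserted before the anchored phrase.
edited = "In 2015, the court held that the contract was void."

print(anchor_by_offsets(edited, start, end))                # that the  (offsets drifted)
print(anchor_by_quote(edited, "contract", "the ", " was"))  # (33, 41)  (re-anchored)
```

The quote-based anchor survives this edit, but it is no guarantee either: if the quoted words themselves change, it returns None. That is exactly the kind of instability that scholarly and legal publishing cannot accept.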

There is not yet a URL syntax for Range Finder, but we have it on good authority that the Web Annotations WG is working on this.


DPUB IG Telco, 2015-07-06: CSS, Houdini, & Pagination

See minutes online for a more detailed record of the discussions. (The header below links into the relevant section of the minutes.)

CSS, Houdini, and Pagination

After an introduction by Dave Cramer to the DPUB IG’s current work on these issues, Chris Lilley, technical director of the Interaction Domain at W3C, gave an overview of Houdini.

A feature of the web is the use of polyfills, so that people don’t have to wait for features to be added (the term is also used for non-CSS features, but at the moment we concentrate on CSS only). This sort of works, but tends to break down if you use a bunch of them together. It also means lots of re-implementation, which is pointless because the browser already knows how to do it. And some things are really hard to extend because they happen under the hood. The project is called Houdini, after the magician, because it is trying to remove some of the hand-waving. In contrast to the more declarative nature of CSS, this is API-based work (done together with the TAG).

To make it less abstract: the plan is to expose the box tree through APIs. Pages appear there as well, since they belong to the box tree rather than to the DOM tree.
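As a rough illustration of why pages live in the box tree rather than the DOM tree, the Python sketch below flows a list of block boxes (each with a height) into fixed-height page boxes. It is a toy model under heavy simplifying assumptions (no margins, widows/orphans, or floats), and the class and function names are hypothetical, not part of any Houdini API.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Fragment:
    element_id: str  # the DOM element this box fragment was generated from
    height: int


@dataclass
class PageBox:
    fragments: List[Fragment] = field(default_factory=list)

    def used(self) -> int:
        return sum(f.height for f in self.fragments)


def paginate(blocks: List[Tuple[str, int]], page_height: int) -> List[PageBox]:
    """Flow (element_id, height) block boxes into fixed-height page boxes.

    A block that does not fit in the remaining space is split, so a single
    DOM element can produce fragments on several pages.
    """
    if page_height <= 0:
        raise ValueError("page_height must be positive")
    pages = [PageBox()]
    for element_id, height in blocks:
        remaining = height
        while remaining > 0:
            free = page_height - pages[-1].used()
            if free <= 0:
                pages.append(PageBox())
                free = page_height
            piece = min(remaining, free)
            pages[-1].fragments.append(Fragment(element_id, piece))
            remaining -= piece
    return pages


pages = paginate([("h1", 40), ("p1", 300), ("p2", 200)], page_height=400)
for number, page in enumerate(pages, start=1):
    print(number, [(f.element_id, f.height) for f in page.fragments])
# 1 [('h1', 40), ('p1', 300), ('p2', 60)]
# 2 [('p2', 140)]
```

The element p2 ends up as two fragments on two pages. The DOM has no node for either page or for the two halves of p2, which is why such structures naturally belong to the box tree.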

The discussion that followed concentrated on how this project influences the work of the publishing community, and how the requirements of that community could be better formulated for the CSS WG. The situation is that

  • There should be a “traditional” declarative layer in CSS to describe what is needed for pagination; this is used by authors who would not and should not “see” Houdini at all
  • Reading system implementors, as well as those experimenting with new CSS features, would work through the Houdini API; reading systems should implement polyfills

This interest group should then concentrate on the first: do we have all the features in CSS that publishing authors need in terms of pagination? The answer is (obviously:-) ‘no’, and a document is in the making that concentrates on this issue (as an outcome of the work that has been happening for the last few weeks). What would be needed is

  • an analysis of the gap between what is specified and what browsers actually do, since CSS has suffered from multiple inconsistent implementations
  • an analysis of the pagination features in CSS, to decide which ones are actually useful and usable and which should be left behind because they are not really good design.

An important question is how pagination as a whole is seen by many CSS implementers. There is a misconception that pagination is only important for print; as a consequence, browser vendors usually push it aside as not really important. We have to make it clear that pagination is a much more general and important concept: it includes slide shows, flashcards, cards, and tiles, and it may also be an important UI feature when very long texts are read. All the documents should make this much more visible to raise interest in pagination overall.

DPUB IG Telco, 2015-06-29: Annotations, CSS, minor white paper changes

See minutes online for a more detailed record of the discussions. (The headers below link into the relevant sections of the minutes.)

Update on the Annotation WG

Ivan Herman gave an update on the progress in the Annotation WG.

The group started with an input document provided by the Open Annotation Community Group, which had a specification for a data model, i.e., how to store annotations, what their structure is, etc. That specification has essentially been re-published as a Working Draft, and we are continuing work on it. The biggest difference is that the CG document had all its examples in Turtle, whereas the WG’s document includes both Turtle and JSON-LD. This is because, while the CG was strongly Linked Data oriented (so Turtle was fine), the WG targets Web developers, who feel more comfortable with JSON.
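For readers unfamiliar with the JSON-LD serialization, here is a minimal annotation built and serialized in Python. It uses the vocabulary of the Web Annotation Data Model as eventually published (the http://www.w3.org/ns/anno.jsonld context, `body`, `target`, `TextQuoteSelector`); the 2015 Working Draft discussed here may have used somewhat different terms, and all example.org URLs are hypothetical.

```python
import json

annotation = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "id": "http://example.org/anno/1",      # hypothetical annotation URI
    "type": "Annotation",
    "body": {
        "type": "TextualBody",
        "value": "This definition conflicts with chapter 2.",
        "format": "text/plain",
    },
    "target": {
        "source": "http://example.org/book/chapter3.html",  # hypothetical document
        # Anchor by quoted text plus context rather than by an @id.
        "selector": {
            "type": "TextQuoteSelector",
            "exact": "portable document",
            "prefix": "a truly ",
            "suffix": " that exists",
        },
    },
}

serialized = json.dumps(annotation, indent=2)
print(serialized)
```

For a Web developer this is plain JSON; a Linked Data processor can read the same document as RDF through the `@context`.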

The next step is transferring annotations over the network; for that, the Web Annotation Protocol has been developed. This document is in fairly good shape and will be published as a WD soon. It is a specialization of an existing W3C Recommendation, the Linked Data Platform (LDP), which is good because, for example, existing implementations can be reused.
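In the LDP style, a client creates an annotation by POSTing it to a container. The sketch below only builds, and does not send, such a request with Python's standard library. The container URL is hypothetical, and the media type with the annotation profile is the one used by the protocol as finally published, which may differ from the draft discussed here.

```python
import json
import urllib.request

CONTAINER = "http://example.org/annotations/"  # hypothetical LDP container


def build_create_request(annotation: dict) -> urllib.request.Request:
    """Prepare an LDP-style POST that asks the container to create an annotation."""
    body = json.dumps(annotation).encode("utf-8")
    return urllib.request.Request(
        CONTAINER,
        data=body,
        method="POST",
        headers={
            "Content-Type": 'application/ld+json; profile="http://www.w3.org/ns/anno.jsonld"',
            "Slug": "first-annotation",  # optional hint for the new resource's URI
        },
    )


req = build_create_request({
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "type": "Annotation",
    "body": {"type": "TextualBody", "value": "Nice passage."},
    "target": "http://example.org/book/chapter1.html",
})
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would send it; an LDP server that creates the
# resource replies 201 Created with a Location header for the new annotation.
```

Reusing LDP means generic LDP servers and clients already handle the container, creation, and paging mechanics.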

The final piece, soon to be an official Draft, is the RangeFinder API. This is a (JavaScript) API specification for finding ranges of text or DOM nodes in a document, i.e., for “anchoring” to a piece of text that may not have its own @id attribute, whose context may change, etc. This document is an API, i.e., aimed at developers; however, as part of its goals, the group will also discuss a ‘serialization’ of it, i.e., a way to define a URI (more exactly, a fragment identifier) using the RangeFinder concepts.

There are also some other issues that may be discussed in the group, though work on them has not really started yet. These include a possible client-side API (i.e., an API whereby JavaScript developers could handle annotations at a higher level, hiding the details of the data model), or an HTML-based serialization. The latter could be used in a client to add annotations as elements in the DOM tree; since they are part of the DOM, they can easily be styled, which is probably very useful. Whether that would use existing HTML elements or require an extension to HTML is still to be discussed.

There are some overlaps between the DPUB IG’s and the Annotation WG’s membership, which is a good thing.

It was noted that the Annotation WG’s further use cases could also be very useful for this group, and that more regular contact would be good.

CSS Prioritization

Dave Cramer has begun a spreadsheet listing some of the CSS features that are important for the publishing community and that are not fully covered by the CSS work. (There is also a textual version, which may eventually merge with the latinreq document.) Eventually, this document should be communicated to the CSS Working Group to synchronize needs and priorities. The group (and everybody else) is encouraged to provide comments, add wish lists, etc., to this document.

A question arose around footnotes and why they are not a higher priority; the problem is that there is no real consensus within the digital publishing community on the optimal approach to handling them, i.e., on how that should be reflected in CSS. There was also discussion on how to include MathML-related features in the document; there are clearly missing features (like vertically aligning equations on a specific character).

There was a longer discussion on the discrepancy between browsers and reading systems in the level of control they provide to end users in terms of styling (fonts, character size, etc.). CSS had the notion of a user stylesheet, which has pretty much disappeared from browsers, and it is not certain that it is the right level of control anyway; further discussion is needed on how this would translate into CSS.

It was also agreed that the document should strictly separate features that exist in a CSS spec but are poorly implemented from features that do not exist in any specification (and should). The table will be extended accordingly (e.g., by looking at what XSL-FO has, or at what systems like Antenna House implement for publishers).

Finally, the issue of efficiency was also addressed (see, e.g., the blog post on the subject). It is indeed a problem, although it is difficult to see what this Interest Group, or indeed the CSS WG, could do about it.

Small changes on the white paper

Ivan Herman also reported that the draft of the EPUB+WEB White Paper has been updated to reflect a recent discussion on caching and the resulting architectural principles; it would be good to get the relevant section reviewed as soon as possible, because the changes may affect the new charter, too.

Planning the future of the Digital Publishing Interest Group

(Reproduced from the “central” W3C blog.)

Time flies… it has been almost two years since the Digital Publishing Interest Group started its work. A lot has happened in those two years; the group

  • has published a report on the Annotation Use Cases (which contributed to the establishment of a separate Web Annotation Working Group);
  • has conducted a series of interviews (and published a report) with some of the main movers and shakers of metadata in the Publishing Industry;
  • is working with the WAI Protocols and Formats Working Group to create a separate vocabulary describing document structures using the ARIA 1.1 technology (and thereby taking an extra step towards better accessibility of Digital Publishing);
  • maintains a document on Requirements for Latin Text Layout and Pagination, which is also used in discussions with other W3C groups on setting priorities for specific technologies;
  • made an assessment of the various Web Accessibility Guidelines (especially the Web Content Accessibility Guidelines) from the point of view of the Publishing Industry, and plans to document which guidelines are relevant (or not) for that community and which use cases are not yet adequately covered;
  • established a reference wiki page listing the important W3C specifications for the Publishing Industry (by the way, that list is not only public, but can also be edited by anybody with a valid W3C account);
  • has conducted a series of interviews with representatives of STEM Publishing and is currently busy analyzing the results;
  • commented on a number of W3C drafts and ongoing works (in CSS, Internationalization, etc.) to get the voice of the Publishing Industry adequately heard.

However, the most important result of these two years is that the Interest Group contributed to setting up, at last, stable and long-term contacts between the Web and the Publishing Industries. Collaborations now exist with IDPF (e.g., on the development of EPUB 3.1 or in the EDUPUB Initiative) and with BISG (e.g., on accessibility issues), and contacts with other organizations (e.g., Readium, IDEAlliance, or EDItEUR) have also been established.

The group has also contributed significantly to a vision of the future of Digital Publishing, formalized by experts in IDPF and W3C and currently called “EPUB+WEB”. The vision has been described in a White Paper; in short, it can be summarized as:

[…]portable documents become fully native citizens of the Open Web Platform. In this vision, the current format- and workflow-level separation between offline/portable (EPUB) and online (Web) document publishing is diminished to zero. These are merely two dynamic manifestations of the same publication: content authored with online use as the primary mode can easily be saved by the user for offline reading in portable document form. Content authored primarily for use as a portable document can be put online, without any need for refactoring the content. […] Essential features flow seamlessly between online and offline modes; examples include cross-references, user annotations, access to online databases, as well as licensing and rights management.

But, as I said, time flies: this also means that the Interest Group has to be re-chartered. This is always a time when the group can reflect on what has gone well and what should be changed. The group has therefore also contributed to its new draft charter. Of course, according to this draft, most of the current activities (e.g., on document structures or accessibility) will continue. However, the work will also be greatly influenced by the vision expressed in the EPUB+WEB White Paper, which should serve as a framework for the group’s activities. In particular, the specific technical challenges in realizing this vision are to be identified, and the relevant use cases worked out. Although the Interest Group is not chartered to define W3C Recommendations, it also plans to draft technical solutions, proof-of-concept code, etc., to test the feasibility of particular approaches. If the discussions conclude that a W3C Recommendation should be established on a particular subject, the Interest Group will help formalize the relevant charter and contribute to the process of creating the group.

The charter is, at this point, a public draft, not yet submitted to W3C Management or the Advisory Committee for approval. Any comment on the charter (and, actually, on the White Paper, too!) is very welcome: the goal is to submit a final charter for approval that reflects the largest possible constituency. Issues, comments, and feedback can be submitted through the issue list of the charter repository (and, respectively, the issue list of the White Paper repository) or, alternatively, sent to me by email.

Two years have passed; looking forward to another two years (or more)!


DPUB IG Telco, 2015-06-22: ARIA, STEM survey, CSS, Web Publications

See minutes online for a more detailed record of the discussions. (The headers below link into the relevant sections of the minutes.)

ARIA described-at attribute

There are some discussions around the aria-describedat attribute defined for ARIA 1.1, and the group was asked to formulate an opinion on whether this attribute would be used by the publishing community. The discussion led to the conclusion that

  • the attribute is important for the publishing community and would be good for digital publishing
  • the publishing industry moves slowly, so it cannot be expected to be implemented right away; i.e., its acceptance for ARIA 1.1 should not depend on that
  • the fact (and objection) that aria-describedat may lead to an “outside” document (e.g., it may require following an external link when reading an offline document) should not be considered a major issue, given the general trend of trying to make the differences between offline and online fade away

It has been agreed that Deborah Kaplan will create a more formal answer to the Protocols and Formats Working Group (the guardians of ARIA).

STEM Survey

Peter Krautzberger gave a status overview of the STEM survey evaluation. The data extracted from the survey has been put into an SQL database, and the task force is busy formulating “questions” by cross-referencing the various tables. The results will be compiled into a W3C Note. The deficiencies of the survey were also discussed: many questions were about workflow rather than technical issues, and there were probably too many of them.

One possible goal would be to see whether there are formats (akin to MathML) that the STEM publishing community needs and that could or should be standardized at or around W3C. 3D and chemical markup formats came up but, on a different level, standardization of the IPython (now Jupyter) format may also come to the fore (although this is still at a very early stage and it is not clear whether it is appropriate for W3C).


Tzviya Siegman reported on progress on the DPUB-ARIA document; the latest draft is now ready to be published as a formal First Public Working Draft. The major changes are the adoption of the dpub-* prefix for all terms (e.g., dpub-abstract), to avoid clashes with other vocabularies, and an explicit callout to the role of IDPF in the creation and commenting of the spec.

CSS Priority

Shinyu Murakami has added some CJK-specific items to the CSS priority list. During the discussion, the issue of explicitly mentioning Bopomofo came up, and it was agreed that this would be added.

On a more general level, the need to add comments and priorities to each CSS entry came up, and Dave Cramer agreed to start working on this.

Web Publication (packaging, etc)

Markus Gylling and Ivan Herman reported on a discussion they had as a follow-up on packaging and related subjects. The important point that came up is that we may need an abstract concept of a “Web Publication”, which refers to a group of resources that together can be considered a publication. Such a Web Publication should have a unique ID, regardless of whether the publication is offline or online. When online, an HTTP GET may return a Web manifest (which would then list the constituents that clients may cache and store); when offline, the ID may refer to a real physical package that can be downloaded and unpacked (and may also contain a manifest). The core issue is that the primary identifier should be transparent to online/offline status. The online version may be the “canonical” one; when it “goes” offline, it needs to carry the original identifier with it to handle incoming references.
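The idea can be sketched as a small data structure: one stable identifier, a manifest of constituent resources, and a resolver that matches incoming references by that identifier whether the copy at hand is online or offline. This is a hypothetical Python model for discussion only; the class, field names, the `urn:example:` identifier, and the URLs are assumptions, not part of any specification.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional
from urllib.parse import urljoin


@dataclass
class WebPublication:
    identifier: str          # canonical ID, the same online and offline
    manifest: List[str]      # relative paths of the constituent resources
    online_base: str         # base URL of the canonical online copy
    local_files: Dict[str, bytes] = field(default_factory=dict)  # offline copy, if any

    @property
    def is_offline(self) -> bool:
        return bool(self.local_files)

    def go_offline(self, fetch: Callable[[str], bytes]) -> "WebPublication":
        """Fetch every manifest entry; the offline copy keeps the original identifier."""
        files = {path: fetch(path) for path in self.manifest}
        return WebPublication(self.identifier, list(self.manifest), self.online_base, files)

    def source_of(self, path: str) -> str:
        """Say where a constituent would be read from in the current state."""
        if path not in self.manifest:
            raise KeyError(f"{path!r} is not part of {self.identifier}")
        if path in self.local_files:
            return f"package:{path}"
        return urljoin(self.online_base, path)


def resolve(ref: str, copies: List[WebPublication]) -> Optional[WebPublication]:
    """Incoming references use the identifier, so an offline copy matches too."""
    # Prefer a local copy when one exists (offline copies sort first).
    for pub in sorted(copies, key=lambda p: not p.is_offline):
        if pub.identifier == ref:
            return pub
    return None


online = WebPublication("urn:example:pub-42", ["index.html", "chapter1.html"],
                        "https://example.org/pub-42/")
offline = online.go_offline(lambda path: b"<html></html>")  # stand-in for real fetches

print(offline.identifier == online.identifier)   # True
print(online.source_of("chapter1.html"))          # https://example.org/pub-42/chapter1.html
print(resolve("urn:example:pub-42", [online, offline]).source_of("chapter1.html"))
```

The last line prints `package:chapter1.html`: a reference made to the online publication still lands on the right content after it has gone offline, because only the storage changed, not the identifier.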

What should be done next is to work out some scenarios using the HTTP protocol and some elements of a client’s functionality.

DPUB IG Telco, 2015-06-15: Charter renewal update, CSS Needs

See minutes online for a more detailed record of the discussions.

Charter renewal update

The proposed draft charter of the Interest Group has been sent to the Advisory Committee members for an informal discussion before a formal vote. The W3C Team has already started to reach out explicitly to people in various communities for comments; any comments are welcome through, e.g., the relevant GitHub issue tracker.

CSS Needs

The past few weeks concentrated on pagination issues and their relation to CSS. However, it is worth collecting other issues that the publishing community has vis-à-vis CSS. Dave Cramer set up a first list of such issues, which was discussed.

At the moment, the page contains a categorized list of requirements. It is not clear whether those requirements are fulfilled at all by current or planned CSS documents, whether the issue is a lack of implementations or problems (or not) in the specifications, etc. It was agreed at the meeting that an overall table will be created, on the basis of this list, showing those aspects as well.

Some additional issues that were discussed are

  • cross references and their control (e.g., references to generated content like bullet point numbers) are vital in, e.g., scholarly publishing; better control (and implementations) of these features is very important. Note that this raises serious accessibility issues, too.
  • additional control over the typographic details of fonts should be added, e.g., to make use of the various features that existing fonts have (different weights, optical properties, etc.)
  • to be checked whether the requirements of, e.g., CJK languages are properly handled (although some of these issues are taken care of by the Internationalization Activity at W3C)
  • to be checked whether the current control over colors (rgba, hsla) is adequate for publishing (although the limitations are probably on the monitor, i.e., hardware, side, which CSS cannot really control)

Finally, the group also has a publication called “Requirements for Latin Text Layout and Pagination” (latinreq). However, that document is (intentionally) very detailed; this dependency table can be seen as some sort of an executive summary.

Work to be continued…

DPUB IG Telco, 2015-06-08: Pagination requirements, scholarly publishing

See minutes online for a more detailed record of the discussions.

Requirements for Pagination

Dave Cramer has put together a first set of requirements for pagination. It is only a start. There are some broad categories (margin control, orientation control, page display, etc.). However, there are many steps between this set of requirements and the final development of specs. The point of the document is to get the group’s ideas down so that we can have a discussion with the CSS Working Group, also in view of the Houdini project.

Some discussion followed, with new requirements coming to the fore. Issues included whether we should talk about non-rectangular regions as pages (it was agreed that, at this point, we should stick to rectangular areas), pop-ups, overlays, naming pages (e.g., a title page may look different from the rest), and using templates. These should be used to expand the use cases.

A broader issue was also brought up: the relationship between a basically “declarative” control over pages, much like the current CSS specification, and a more JavaScript-API-oriented view, which is closer to what the Houdini project is supposed to deliver. It was recognized that, at present and in the near future, some level of programming will remain necessary, but the general goal is to reduce that as much as possible. The requirements should include a clear statement to that effect.

Scholarly publishing

The general question is whether scholarly publishing is properly addressed in the group’s use cases as well as, in general, in its work. Tzviya Siegman gave an overview of some of the particularities of scholarly publishing, which was then complemented by Bill Kasdorf and Ivan Herman. Some of the particularities cited were

  • Scholarly publications are focusing on articles, bound to journal issues or proceedings, but where each individual article is a publication by itself
  • The scholarly community has been online for a long time; these days, the printed versions are disappearing. The online versions are dominated by PDF; reasons include tradition, the (false?) requirement of pixel-level control, and faithful reproduction of printed journals. With the disappearance of printed journals, more and more journals produce articles in HTML, though the downloaded versions are still in PDF. A particular issue is the predominance of two-column PDF, which is very bad on, say, tablets.
  • A different production workflow, with an emphasis on peer review. The internals are based on various XML specifications (e.g., JATS)
  • Publication of scientific data has become an integral part of publications, i.e., the data should be part of the packaged content; this influences fragment identifier specifications, the inclusion of JavaScript for visualization, etc.
  • Publications routinely include lots of metadata, as part of the content itself, searched and crawled by many different services

Today EPUB is not really part of the picture for the scholarly community. It is not clear why; there is an issue of tradition (the online presence of that community is older than EPUB itself), the predominance of PDF, etc. There is also a perception issue: EPUB is perceived as being really for books, whereas technically this is not true.

It has been agreed that the use cases have to be looked at from the scholarly publishing point of view, and that outreach efforts should be made to include EPUB in their world view…

DPUB IG Telco, 2015-06-01: F2F Recap, Charter Recap, Packaging Requirements

See minutes online for a more detailed record of the discussions.

F2F Recap

Tzviya gave a summary of the F2F (held in NYC on the 26th of May); see the separate blog post summarizing the F2F for further details. We also thanked the IDPF Board and Diane Kennedy (IDEAlliance) for attending and contributing to the meeting.

Charter recap

Ivan Herman also gave an overview of the new charter discussion, as well as of a separate discussion held with the IG chairs later in the week. The discussions of the week have been summarized in a series of GitHub issues. Some of the issues worth mentioning are

  • make it clear that the new IG is a continuation of the old one, i.e., any work that has already been started is taken over
  • the formulation around EPUB should avoid any possible misunderstanding of the mutual role and position of EPUB and the Web (as a result, a tiny change in naming has been adopted, namely to refer to EPUB+WEB instead of EPUB-WEB)
  • the list of issues in the charter should be more focussed
  • a number of stylistic issues

Ivan Herman plans to make a new version of the charter this week. The IG is encouraged to contribute comments and issues.

Packaging functional requirements

Tzviya Siegman has prepared a preliminary list that was discussed and updated. Some of the issues that were discussed were:

Size limits on packages?

This is an issue that came up during the F2F; the current EPUB specification says that ZIP64 should be used, but it is unclear what the limits should be in the future. It was agreed that it is not possible to set an explicit limit; instead, something like “be sensible” applies :-)
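For reference, here is a minimal Python sketch of writing an EPUB-style ZIP container. Following the EPUB OCF rules, the `mimetype` file is the first entry and is stored uncompressed, and Python's `zipfile` moves to ZIP64 extensions on its own when sizes or entry counts require it (`allowZip64=True`, which is also the default). The file names and contents are placeholders; this does not produce a complete, valid EPUB.

```python
import io
import zipfile


def build_package(files: dict) -> bytes:
    """Write an EPUB-style ZIP container into memory and return its bytes."""
    buf = io.BytesIO()
    # allowZip64=True lets zipfile add ZIP64 extensions automatically once
    # entries or the archive exceed the classic ZIP size or count limits.
    with zipfile.ZipFile(buf, "w", allowZip64=True) as zf:
        # OCF: 'mimetype' must be the first entry and must not be compressed.
        zf.writestr(zipfile.ZipInfo("mimetype"), "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        for name, data in files.items():
            zf.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


package = build_package({
    "META-INF/container.xml": "<container/>",  # placeholder content
    "EPUB/chapter1.xhtml": "<html/>",           # placeholder content
})

with zipfile.ZipFile(io.BytesIO(package)) as zf:
    names = zf.namelist()
    first = zf.infolist()[0]
print(names[0], first.compress_type == zipfile.ZIP_STORED)  # mimetype True
```

ZIP64 removes the format's own ceiling, so any limit would have to be a policy choice, which is consistent with the "be sensible" conclusion above.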

This issue is also related to the question of video or audio sizes and whether there should be requirements on those, which in turn relates to the requirement of streamability of the content.

Role of packaging and identification

Prior to the meeting Ivan Herman started a discussion thread on identification and packages, i.e., on what a ‘canonical’ URI should be for a part within a package. This raised the issue whether packaging is needed in the first place, and what it means to separate the notion of a document on the Web from the Web as a whole. These should be clarified as part of the requirements.

The rough consensus is that some sort of packaging format is necessary to transfer a publication among people, but it may also be necessary to have the notion of a ‘virtual’ package on the Web to hold a publication (e.g., to give it a clear URI). This conceptual unity is at the heart of EPUB+WEB. But the issue of what the URI of a part should be remains, and it is not yet clear how to solve it.
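One way to picture the open question: if the publication has a URI, a part inside its package could be named by resolving the part's package-relative path against that URI, so the same name works whether the part is fetched online or read from an unpacked package. The Python sketch below illustrates that assumption only; the functions and URIs are hypothetical, and, as noted above, the issue is not yet solved.

```python
from typing import Optional
from urllib.parse import urljoin


def _as_base(publication_uri: str) -> str:
    # Treat the publication URI as a "directory" so relative paths nest under it.
    return publication_uri if publication_uri.endswith("/") else publication_uri + "/"


def part_uri(publication_uri: str, member_path: str) -> str:
    """Name a package member by resolving its path against the publication URI."""
    return urljoin(_as_base(publication_uri), member_path.lstrip("/"))


def package_path(publication_uri: str, uri: str) -> Optional[str]:
    """Inverse mapping: the in-package path, or None if uri lies outside the publication."""
    base = _as_base(publication_uri)
    return uri[len(base):] if uri.startswith(base) else None


pub = "https://example.org/pubs/moby-dick"
uri = part_uri(pub, "OEBPS/chapter1.xhtml#sec2")
print(uri)                     # https://example.org/pubs/moby-dick/OEBPS/chapter1.xhtml#sec2
print(package_path(pub, uri))  # OEBPS/chapter1.xhtml#sec2

# One reason the question is still open: a relative path can escape the publication.
print(package_path(pub, part_uri(pub, "../other-book/ch1.xhtml")))  # None
```

Even this simple scheme raises questions the group would have to answer, such as what happens to such URIs when the publication moves, and how references that climb out of the package should be treated.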

Dave Cramer also noted that there is a discussion on the Web Performance mailing list on packaging, something to consider. (See Dave’s subsequent email on this.)