Planning the future of the Digital Publishing Interest Group

(Reproduced from the “central” W3C blog.)

Time flies… it has been almost two years since the Digital Publishing Interest Group started its work. A lot has happened in those two years; the group

  • has published a report on the Annotation Use Cases (which contributed to the establishment of a separate Web Annotation Working Group);
  • has conducted a series of interviews (and published a report) with some of the main movers and shakers of metadata in the Publishing Industry;
  • is working with the WAI Protocols and Formats Working Group to create a separate vocabulary describing document structures using the ARIA 1.1 technology (thereby taking an extra step towards better accessibility of Digital Publishing);
  • maintains a document on Requirements for Latin Text Layout and Pagination, which is also used in discussions with other W3C groups on setting the priorities for specific technologies;
  • made an assessment of the various Web Accessibility Guidelines (especially the Web Content Accessibility Guidelines) from the point of view of the Publishing Industry, and plans to document which guidelines are relevant (or not) for that community and which use cases are not yet adequately covered;
  • established a reference wiki page listing the important W3C specifications for the Publishing Industry (by the way, that list is not only public, but can also be edited by anybody with a valid W3C account);
  • has conducted a series of interviews with representatives of STEM Publishing and is currently busy analyzing the results;
  • commented on a number of W3C drafts and ongoing work (in CSS, Internationalization, etc.) to get the voice of the Publishing Industry adequately heard.

However, the most important result of these two years is that the Interest Group contributed to setting up, at last, stable and long-term contacts between the Web and the Publishing Industries. Collaborations now exist with IDPF (e.g., on the development of EPUB 3.1 or in the EDUPUB Initiative) and with BISG (e.g., on accessibility issues), and contacts with other organizations (e.g., Readium, IDEAlliance, or EDItEUR) have also been established.

The group has also contributed significantly to a vision of the future of Digital Publishing, formalized by experts in IDPF and W3C and currently called “EPUB+WEB”. The vision has been described in a White Paper; its essence can be summarized as:

[…]portable documents become fully native citizens of the Open Web Platform. In this vision, the current format- and workflow-level separation between offline/portable (EPUB) and online (Web) document publishing is diminished to zero. These are merely two dynamic manifestations of the same publication: content authored with online use as the primary mode can easily be saved by the user for offline reading in portable document form. Content authored primarily for use as a portable document can be put online, without any need for refactoring the content. […] Essential features flow seamlessly between online and offline modes; examples include cross-references, user annotations, access to online databases, as well as licensing and rights management.

But, as I said, time flies: this also means that the Interest Group has to be re-chartered. This is always a time when the group can reflect on what has gone well and what should be changed. The group has therefore also contributed to its new, draft charter. Of course, according to this draft, most of the current activities (e.g., on document structures or accessibility) will continue. However, the work will also be greatly influenced by the vision expressed in the EPUB+WEB White Paper, which should serve as a framework for the group’s activities. In particular, the specific technical challenges in realizing this vision are to be identified, and the relevant use cases worked out. Although the Interest Group is not chartered to define W3C Recommendations, it also plans to draft technical solutions, proof-of-concept code, etc., to test the feasibility of particular approaches. If the discussions conclude that a W3C Recommendation should be developed on a particular subject, the Interest Group will contribute to formalizing the relevant charter and to the process leading to the creation of the corresponding group.

The charter is, at this point, a public draft, not yet submitted to the W3C Management or the Advisory Committee for approval. Any comment on the charter (and, actually, on the White Paper, too!) is very welcome: the goal is to submit a final charter for approval that reflects the largest possible constituency. Issues, comments, and feedback can be submitted through the issue list of the charter repository (and, respectively, through the issue list of the White Paper repository) or, alternatively, sent to me by email.

Two years have passed; looking forward to another two years (or more)!

Posted in Activity News

DPUB IG Telco, 2015-06-22: ARIA, STEM survey, CSS, Web Publications

See minutes online for a more detailed record of the discussions. (The headers below link into the relevant sections of the minutes.)

ARIA described-at attribute

There has been some discussion around the aria-describedat attribute defined for ARIA 1.1, and the group was asked to form an opinion on whether this attribute would be used by the publishing community. The discussion led to the conclusion that

  • the attribute is important for the publishing community and would be good for digital publishing
  • the publishing industry moves slowly, so the attribute cannot be expected to be used right away; i.e., its acceptance for ARIA 1.1 should not depend on that
  • the fact (and objection) that aria-describedat may lead to an “outside” document (e.g., it can require an external link when reading an offline document) should not be considered a major issue, given the general trend of trying to make the differences between offline and online fade away

It has been agreed that Deborah Kaplan will create a more formal answer to the Protocols and Formats Working Group (the guardians of ARIA).

STEM Survey

Peter Krautzberger gave a status overview of the STEM survey evaluation. The data extracted from the survey has been put into an SQL database, and the task force is busy formulating “questions” by cross-referencing the various tables. The results will be compiled into a W3C Note. The deficiencies of the survey were also discussed: many questions were about workflow rather than technical issues, and there were probably too many questions.

One possible goal would be to see whether there are formats (akin to MathML) that the STEM Publishing community needs and that could or should be standardized at or around W3C. 3D and chemical markup formats came up but, on a different level, standardization of the iPython (now Jupyter) notebook format may also come to the fore (although this is still at a very early stage, and it is not clear whether it is appropriate for W3C).

DPUB-ARIA

Tzviya Siegman reported on the progress of the DPUB-ARIA document; the latest draft is now ready to be published as a First Public Working Draft. The major changes are the adoption of the dpub-* prefix for all attribute values, to avoid clashes with other vocabularies (e.g., dpub-abstract), and an explicit callout to the role of IDPF in the creation of, and commenting on, the spec.

CSS Priority

Shinyu Murakami has added some CJK-specific items to the CSS priority list. During the discussion, the question of explicitly mentioning Bopomofo came up, and it was agreed that it would be added.

On a more general level, the need to add comments and priorities to each CSS entry came up, and Dave Cramer agreed to start working on this.

Web Publication (packaging, etc)

Markus Gylling and Ivan Herman reported on a discussion they had as a follow-up on packaging and related subjects. The important point that came up is that we may need an abstract concept of a “Web Publication”, referring to a group of resources that, together, can be considered a publication. Such a Web Publication should have a unique ID, regardless of whether the publication is offline or online. When online, an HTTP GET may return a Web manifest (which would then list the constituents that clients may cache and store); when offline, the ID may refer to a real physical package that can be downloaded and unpacked (and which may also contain a manifest). The core issue is that the primary identifier should be transparent to online/offline status. The online version may be the “canonical” one; when it “goes” offline, it needs to carry that original identifier with it to handle incoming references.

What should be done next is to work out some scenarios using the HTTP protocol and some elements of a client’s functionality.
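As an illustration of the “transparent identifier” idea, here is a minimal Python sketch. Everything in it is hypothetical: the WebPublication class, its take_offline and resolve methods, and the package: scheme used for locally stored resources are illustrative names, not part of any W3C or IDPF specification. It only shows how references that always use the canonical (online) URL, fragment included, could still be resolved once the publication has been taken offline.

```python
from dataclasses import dataclass, field
from typing import Callable
from urllib.parse import urldefrag, urljoin


@dataclass
class WebPublication:
    """Hypothetical model of a 'Web Publication': one canonical identifier
    for a group of resources, whether they are read online or offline."""
    canonical_id: str            # assumed: URL of the online ("canonical") version, ending in "/"
    resources: list[str]         # relative paths, as a Web manifest might list them
    offline_store: dict[str, bytes] = field(default_factory=dict)

    def take_offline(self, fetch: Callable[[str], bytes]) -> None:
        """Cache every constituent locally; canonical_id travels with the copy."""
        for path in self.resources:
            self.offline_store[path] = fetch(urljoin(self.canonical_id, path))

    def resolve(self, reference: str) -> str:
        """Map an incoming reference, which always uses the canonical URL,
        to wherever the content currently lives."""
        url, fragment = urldefrag(reference)
        if not url.startswith(self.canonical_id):
            raise ValueError(f"{reference!r} is not part of this publication")
        path = url[len(self.canonical_id):]
        # "package:" is a made-up scheme standing for "inside the local copy"
        location = f"package:{path}" if path in self.offline_store else url
        return f"{location}#{fragment}" if fragment else location
```

For example, with canonical_id set to https://example.org/pub/, resolving https://example.org/pub/chapter1.html#sec2 returns that URL unchanged while the publication is online, and package:chapter1.html#sec2 once take_offline has cached the resources. The point of the design is that incoming references never change; only the resolution step knows about the online/offline status.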

DPUB IG Telco, 2015-06-15: Charter renewal update, CSS Needs

See minutes online for a more detailed record of the discussions.

Charter renewal update

The proposed draft charter of the Interest Group has been sent to the Advisory Committee members for an informal discussion before a formal vote. The W3C Team has already started to reach out explicitly to people in various communities for comments; any comments are welcome, e.g., through the relevant GitHub issue tracker.

CSS Needs

The past few weeks concentrated on pagination issues and their relation to CSS. However, it is worth collecting other issues that the publishing community has vis-à-vis CSS. Dave Cramer has set up a first list of such issues, which was discussed during the call.

At the moment, the page contains a categorized list of requirements. It is not clear whether those requirements are fulfilled at all by current or planned CSS documents, or whether the issue is a lack of implementations or problems in the specifications, etc. It was agreed at the meeting that an overall table will be created, on the basis of this list, showing those aspects as well.

Some additional issues that were discussed are

  • cross references and their control (e.g., references to generated content like bullet point numbers) are vital in, e.g., scholarly publishing; better control (and implementations) of these features is very important. Note that this raises serious accessibility issues, too.
  • additional control over the typographic details of fonts should be added, e.g., to make use of the various features that existing fonts have (different weights, optical properties, etc.)
  • to be checked whether the requirements of, e.g., CJK languages are properly handled (although some of these issues are taken care of by the Internationalization Activity at W3C)
  • to be checked whether the current control over colors (rgba, hsla) is adequate for publishing (although the limitations are probably on the monitor, i.e., the hardware side, which CSS cannot really control)

Finally, the group also has a publication called “Requirements for Latin Text Layout and Pagination” (latinreq). However, that document is (intentionally) very detailed; the planned dependency table can be seen as a sort of executive summary.

Work to be continued…

DPUB IG Telco, 2015-06-08: Pagination requirements, scholarly publishing

See minutes online for a more detailed record of the discussions.

Requirements for Pagination

Dave Cramer has put together a first set of requirements for pagination. It is only a start. There are some broad categories (margin controls, orientation control, page display, etc.). However, there are many steps between this set of requirements and the final development of specs. The point of the document is to get the group’s ideas down so that we can have a discussion with the CSS Working Group, also in view of the Houdini project.

Some discussion followed, with new requirements coming to the fore. Issues arising included whether we should talk about non-rectangular regions as pages (it was agreed that, at this point, we should stick to rectangular areas), pop-ups, overlays, naming pages (e.g., a title page may look different from the rest), and using templates. These should be used to expand the use cases.

A broader issue was also brought up: the relationship between a basically “declarative” control over pages, much like the current CSS specification, and a more JavaScript API-oriented view, which is closer to what the Houdini project is supposed to deliver. It was recognized that, at present and in the near future, some level of programming will remain necessary, but the general goal is to reduce it as much as possible. The requirements should include a clear statement to that effect.

Scholarly publishing

The general question is whether scholarly publishing is properly addressed in the group’s use cases and, in general, in its work. Tzviya Siegman gave an overview of some of the particularities of scholarly publishing, which was then complemented by Bill Kasdorf and Ivan Herman. Some of the particularities cited were:

  • Scholarly publications focus on articles, bound to journal issues or proceedings, where each individual article is a publication by itself
  • The scholarly community has been online for a long time; these days the printed versions tend to disappear. The online versions are dominated by PDF usage; reasons include tradition, the (false?) requirement of pixel-level control, and faithful reproduction of printed journals. With the disappearance of printed journals, more and more journals produce articles in HTML, though the downloaded versions are still in PDF. A particular issue is the predominance of two-column PDF, which is very bad on, say, tablets.
  • A different production workflow, with an emphasis on peer review. The internals are based on various XML specifications (e.g., JATS)
  • Scientific data are becoming an integral part of publications, i.e., they should be part of the packaged content; this influences fragment identifier specifications, the inclusion of JavaScript for visualization, etc.
  • Publications routinely include lots of metadata, as part of the content itself, searched and crawled by many different services

Today EPUB is not really part of the picture for the scholarly community. It is not clear why; there is an issue of tradition (the online presence of that community is older than the very existence of EPUB), the predominance of PDF, etc. There is also a perception issue: EPUB is perceived as being really for books, whereas this is not, technically, true.

It has been agreed that the use cases have to be looked at from the scholarly publishing point of view, and that outreach efforts should be made to include EPUB in their world view…

DPUB IG Telco, 2015-06-01: F2F Recap, Charter Recap, Packaging Requirements

See minutes online for a more detailed record of the discussions.

F2F Recap

Tzviya gave a summary of the F2F (held in NYC on the 26th of May); see the separate blog summary of the F2F for further details. We also thanked the IDPF Board and Diane Kennedy (IDEAlliance) for attending and contributing to the meeting.

Charter recap

Ivan Herman also gave an overview of the new charter discussion, as well as of a separate discussion held with the IG chairs later in the week. The discussions of the week have been summarized in a series of GitHub issues. Some of the issues worth mentioning are:

  • make it clear that the new IG is a continuation of the old one, i.e., any work that has already been started is taken over
  • the formulation around EPUB should avoid any possible misunderstanding of the mutual roles and positions of EPUB and the Web (as such, a tiny change in naming has been adopted, namely to refer to EPUB+WEB instead of EPUB-WEB)
  • the list of issues in the charter should be more focussed
  • a number of stylistic issues

Ivan Herman plans to make a new version of the charter this week. The IG is encouraged to contribute in terms of comments and issues.

Packaging functional requirements

Tzviya Siegman has prepared a preliminary list that was discussed and updated. Some of the issues that were discussed were:

Size limits on packages?

This is an issue that came up during the F2F; the current EPUB spec says that ZIP64 should be used, and it is unclear what the limits should be in the future. It was agreed that it is not possible to put an explicit limit; instead, something like “be sensible” :-)
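For context on where a size limit would actually bite, the Python sketch below checks whether a package would exceed the limits of the classic, non-ZIP64 format. The thresholds follow the PKWARE .ZIP application note: sizes and offsets are 32-bit fields and the entry count is a 16-bit field, with the all-ones values acting as markers that point to ZIP64 records. The function name and the simplifications (ignoring header overhead and compression) are assumptions of this illustration, not anything defined by OCF or EPUB.

```python
# Classic (non-ZIP64) limits, per the PKWARE .ZIP APPNOTE: sizes and offsets
# are 32-bit fields and the entry count is a 16-bit field. The all-ones values
# are reserved as markers meaning "the real value is in the ZIP64 record".
ZIP32_SIZE_MARKER = 0xFFFFFFFF   # 4 GiB - 1 byte
ZIP32_COUNT_MARKER = 0xFFFF      # 65,535 entries


def needs_zip64(entry_sizes: list[int]) -> bool:
    """Rough estimate of whether a package with entries of these (stored)
    byte sizes would need ZIP64. Local headers and compression are ignored,
    so the total-size test only approximates the real archive offsets."""
    if len(entry_sizes) >= ZIP32_COUNT_MARKER:
        return True
    if any(size >= ZIP32_SIZE_MARKER for size in entry_sizes):
        return True
    return sum(entry_sizes) >= ZIP32_SIZE_MARKER
```

A text-only book stays far below these limits; it is essentially large embedded media (e.g., video) that pushes a package past them, which is why the question ties in with audio and video sizes.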

This issue is also related to the question of video or audio sizes and whether there should be requirements on those, which in turn relates to the requirement of streamability of the content.

Role of packaging and identification

Prior to the meeting Ivan Herman started a discussion thread on identification and packages, i.e., on what a ‘canonical’ URI should be for a part within a package. This raised the issue whether packaging is needed in the first place, and what it means to separate the notion of a document on the Web from the Web as a whole. These should be clarified as part of the requirements.

The rough agreement is that some sort of packaging format is necessary to transfer a publication among people, but it may also be necessary to have the notion of some sort of ‘virtual’ packaging on the Web to hold a publication (e.g., to have a clear URI for it). This conceptual unity is at the heart of EPUB+WEB. But the issue of what the URI is for a part remains, and it is not yet clear how to solve it.

Dave Cramer also noted that there is a discussion on the Web Performance mailing list on packaging, something to consider. (See Dave’s subsequent email on this.)

DPUB IG Face-to-face, 2015-05-26

The meeting took place at the offices of the Hachette Group, in New York, USA.

See minutes online for a more detailed record of the discussions.

Packaging

(See the relevant part of the minutes for further details.)

Tzviya Siegman presented some slides to start the discussion on packaging, motivated by the future vision of EPUB-WEB. The goal is to gather a detailed set of requirements on what type of packaging is needed for the purpose of Digital Publishing and EPUB-WEB. At the moment, EPUB is based on OCF, derived from ZIP; other proposals coming to the fore are based, e.g., on multipart MIME.

A significant part of the discussion was on whether, for EPUB-WEB, there is a real need for packaging in the first place. Some developments in the area of Web Applications, which had hitherto considered packaging, are now looking in other directions, like the usage of Service Workers, that may make a specific packaging format unnecessary. There was a consensus that, after all, some sort of packaging is necessary. However, the coming months should concentrate on defining functional requirements for packaging, as a first step forward. A first draft of such functional requirements is available and will have to be worked on.

Identifiers

(See the relevant part of the minutes for further details.)

Bill Kasdorf gave an overview of the document on (fragment) identifiers in view of EPUB-WEB. That document already has some entries on what the functional requirements for such identifiers should be in general; this, again, is work to be pursued in the months to come. Some issues and requirements that were discussed were:

  • any scheme should reuse existing fragment identification mechanisms that are defined for various media types (and registered by IETF) although it is probably necessary to list the ones that browsers really implement
  • it may be necessary to include some ways to express timing/versions; this is very important in, e.g., references in scholarly publishing
  • the question arises whether fragments should be able to express non-contiguous data (e.g., collection of pages)
  • the exact structure on how fragments are used both in offline and online setting has to be developed (and that may touch on web architectural issues)

Pagination

(See the relevant part of the minutes for further details.)

Dave Cramer gave an overview of what is currently happening in the CSS Working Group that may affect the way pagination could be done in future reading systems (currently this is a huge problem, solved with hacks). Two approaches are being considered:

  1. The Houdini project (a joint task force between the TAG and the CSS WG) aims at exposing the internals of CSS rendering, the box model, etc. If a standard API is defined, then correct pagination could be built on top of it in a standard way
  2. There is work going on with fragmentation and overflow; the combination of these can be used as a basis to build a pagination system

In effect what this means is that there are two very different ways of approaching the problem. The Houdini approach is more likely to come up with useful ways of pagination, but it is not absolutely clear at this moment when and how that project will be completed.

The group should clearly express its needs in a requirements document; a first version thereof is on the IG Wiki. Note that this document should also emphasize the importance of pagination itself, not only for traditional books but also, e.g., for the advantages it may offer to the reading experience on small screens. (References to such studies were mentioned at the F2F.)

Accessibility

(See the relevant part of the minutes for further details.)

The accessibility discussion, led by Deborah Kaplan and Charles LaPierre, had two main areas.

On the one hand, the current Accessibility TF is looking at WCAG to see how that document relates to digital publishing: what is relevant (or not) in WCAG for the publishing community, but also what the missing issues and features are. Most of the work has been done, and the group will now focus on the publication of an IG note (before the end of the year).

The other part of the discussion was conducted together with the representatives of BISG. BISG has a group on accessibility, led by Robin Seaman. The BISG group is on outreach, not on developing new things; the goal is to collect all relevant documents so that the book industry uses the same approaches wherever possible.

There was a discussion on the exact role of the BISG group vis-à-vis W3C’s WAI Education and Outreach Working Group, whose goals are very similar; it was agreed that the BISG group should refer to and use the WAI EO WG output whenever possible. The best approach would be for the DPUB IG’s Accessibility task force to serve as a sort of liaison between the two entities.

Education & Outreach

(See the relevant part of the minutes for further details.)

Nick Ruffilo and Karen Myers gave a list of ideas and possible approaches on how to make the results and work of the group more visible. The goal would be to use existing channels (e.g., blogs at both the DPUB IG and the W3C levels) but also press releases, executive summaries, etc. The main goal is to develop relationships with media outlets (e.g., PW, DBW) to regularly inform the larger community of the results of the group, and to give one or several webinars. It was agreed that Nick would develop the DPUB PR plan further and try to get some text into one of the major media outlets.

Rechartering of the Interest Group

(See the relevant part of the minutes for further details.)

The current charter of the group expires in September; work has begun on a draft charter. The main shift of emphasis in the new charter is to refer explicitly to the EPUB-WEB “vision” as a guiding principle for future work (this change has already occurred in the current IG, and the discussions of the F2F on packaging and identifiers are clearly related to it). Another important change is that the group would also plan to prepare ‘prototype’ specifications and explore technical avenues more deeply; these can then be handed over to other Working Groups or serve as a basis for the creation of new, dedicated W3C or IDPF groups. If new groups are created, the (new) IG would actively contribute to the chartering of those groups.

Details of the charter were discussed and some amendments and/or clarifications were proposed (clarify the exact relationship to EPUB, also in view of considering other, non-book documents; make clear that the current task forces on, e.g., accessibility would be continued in the new group). There were also discussions on whether the term ‘EPUB-WEB’ is the right one or whether it is misleading and should be changed (though no real alternative was found). Finally, there were some clarifications on how the work will be cooperatively pursued by W3C and IDPF (the way the EDUPUB alliance works on its own goals may be a good pattern here).

The goal is to have a final version of the charter by the end of June, to be then submitted to W3C members for vote.

DPUB IG Telco, 2015-05-18: Review of EPUB 3.1 Charter

See minutes online for a more detailed record of the discussions.

EPUB 3.1 Charter

The EPUB 3.1 Charter (https://goo.gl/TNvYEX) was shared with the IG. EPUB-WEB is built on collaboration between W3C and IDPF. IDPF is working on chartering EPUB 3.1, which will remain backward compatible but includes many notions with repercussions on the IG’s work. Some examples of items under consideration for EPUB 3.1 are support for the HTML serialization and exploration of a server-side manifestation of the package. As the EPUB WG considers these possibilities, it may look to the IG for collaboration and consultation.

The DPUB IG Spring F2F meeting will be on 26 May. There will be no DPUB IG Telco on 25 May. The detailed agenda for the F2F was also discussed.

DPUB IG Telco, 2015-05-11: updates on Accessibility, DPUB ARIA, Packaging

See minutes online for a more detailed record of the discussions.

Accessibility TF update

The A11y TF put together a detailed spreadsheet of which aspects of WCAG are relevant or not relevant to DPUB. This also helped us understand which aspects of publishing are not addressed by existing standards. Our plan is to turn the spreadsheet into a note, to be published, eventually, by the IG. The skeleton of the note is already available and should be completed shortly.

There are discussions with the Accessibility Working Group at BISG; the plan is that, at the upcoming DPUB IG F2F, we sit down with them to see how we can cooperate. We would like to facilitate communication between BISG and W3C on this matter: if we see gaps in the publishers’ knowledge, we would let BISG know; if BISG sees holes in the standards, we would let the relevant W3C people know.

DPUB ARIA Update

We did a consensus call on the DPUB ARIA Role draft, and the DPUB IG agreed to publish it. However, some issues surfaced in the PF WG (the guardians of ARIA); indeed, this is the first time an extension to ARIA is being defined, and its limits and approaches do not yet seem to be defined. There is tension within the ARIA group on whether the @role attribute is bound to Assistive Technology usage, or whether it can be used more generally for structural information. If the former, that would drastically reduce the number of @role values in the DPUB ARIA module and would make it unusable for the purpose of structural semantics.

This issue must be sorted out by the PF Working group and, until then, the DPUB ARIA publication is put on hold. One alternative may be that we would have to move away from ARIA towards a separate, targeted extension of HTML5, possibly identifying a number of values that would also have their counterpart in the new version of the core ARIA spec (ARIA 1.1). Hopefully this will be sorted out soon.

Packaging Update

There has been some discussion with the chair of the TAG (co-author of the packaging specification), and it seems there are now discussions within the community on whether packaging is something browser vendors really want in the first place. For some of the use cases (web application access), a combination of service workers with manifests would seem to work, too. One of the goals of the discussion at the F2F is to clarify the requirements and needs of the publishing community in this respect and forward them to the TAG and the Web Applications Working Group as soon as possible.

DPUB IG Telco, 2015-04-27: STEM Survey, Fragment ID-s, footnotes

See minutes online for a more detailed record of the discussions.

STEM Survey

The STEM Task Force has conducted a survey among experts on their experience in publishing STEM content. The Survey is now closed; there was a first glance at the results during the call (all this is preliminary, a more systematic evaluation is still to be done).

There were 34 responses (out of 93 people asked). Overall, the results are fine, although (at first glance) nothing overly exciting. Most of the respondents were “end users”, i.e., researchers who publish in the area. The answers also highlighted some issues with the survey itself, e.g., the questions may not have been as clear as necessary. The respondents were clearly biased towards CS and Mathematics.

There was a clear tendency towards making the content reusable and using the Web as a primary platform. Beyond MathML, no single additional STEM format came to the fore as a major trend (CML was mentioned several times). As for delivery format, HTML was ahead of PDF as a primary format, but PDF is almost always present as a secondary format (without enthusiasm, just out of necessity).

The next goal is a more systematic evaluation of the results, to be summarized in a note. The raw results of the survey will also be made public, although they have to be strictly anonymized first.

Fragment ID-s

There was already a discussion on identifiers a few weeks ago, which referred to the selectors of the open annotation model as a possible approach for defining fragment identifiers in EPUB-WEB. That meeting was followed by an email discussion with the Web Annotation Working Group (which works on the model), to see whether the selectors could be transformed into bona fide fragment identifiers.

The problem that arose during the discussion is the way fragment identifiers are defined (in general). Indeed, fragment identifiers are never defined in isolation; they are defined for a specific media type and registered as such by IANA. In this sense, serializing the selector model in general is not a real option. However, it is possible to do so for specific media types; in the case of EPUB-WEB, HTML is an obvious target.
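A minimal Python sketch of why the selector model cannot be serialized “in general”: the meaning of a fragment string depends on the media type of the resource, so an interpreter has to dispatch on it. Only a simplified subset is covered: element ids for HTML, and the char= and line= schemes of RFC 5147 for text/plain. The function names and the dispatch table are hypothetical, for illustration only.

```python
from urllib.parse import urldefrag


def html_fragment(fragment: str) -> dict:
    # HTML (simplified): the fragment names an element by its id
    return {"element_id": fragment}


def text_plain_fragment(fragment: str) -> dict:
    # text/plain (RFC 5147, simplified): "char=..." or "line=..." followed by
    # a position or a range; an optional integrity check after ";" is dropped,
    # not verified.
    scheme, _, positions = fragment.partition("=")
    if scheme not in ("char", "line"):
        raise ValueError(f"unsupported text/plain fragment: {fragment!r}")
    positions = positions.split(";", 1)[0]
    start, _, end = positions.partition(",")
    return {"scheme": scheme,
            "start": int(start) if start else None,
            "end": int(end) if end else None}


# Fragment syntax is looked up by media type, never applied generically.
FRAGMENT_SYNTAX = {
    "text/html": html_fragment,
    "text/plain": text_plain_fragment,
}


def interpret_fragment(url: str, media_type: str) -> dict:
    _, fragment = urldefrag(url)
    parser = FRAGMENT_SYNTAX.get(media_type)
    if parser is None:
        raise LookupError(f"no fragment syntax registered for {media_type}")
    return parser(fragment)
```

Note that the string line=10 denotes a line position for text/plain, but would merely be an (unusual) element id for text/html; a serialization of the selectors therefore has to be defined per media type, with HTML as the obvious first target.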

It has been emphasized that if such a serialization is done, it should be done together with the Web Annotation Working Group to avoid discrepancies. That Working Group has already touched upon this issue (in the context of rangefinder) in their recent F2F meeting.

In the context of EPUB-WEB, CFI has to be evaluated first, though; after all, CFI already defines, essentially, a fragment ID for EPUB3. Finding out whether CFI works for EPUB-WEB (and, if so, how, or, if not, why not) is important before engaging in anything else. This is clearly a topic for the upcoming F2F meeting of the Interest Group.

HTML5 and footnotes

There was a recent email discussion on the possibility of defining a footnote element in HTML5. This was followed by some separate discussion with the experts of the HTML WG. As of now, HTML will not have a formal proposal for such an element, so the DPUB IG should pursue the ARIA role approach for defining footnotes. Maybe such an element will be taken up in the future, though.

DPUB IG Telco, 2015-04-20: Consensus call for DPUB ARIA Module, F2F Agenda

See the minutes online for a more detailed record of the discussions.

Consensus call for DPUB ARIA Module

After the DPUB-ARIA task force met last week, the DPUB IG put out a call for consensus to publish the Digital Publishing Module of ARIA as an FPWD. The task force has not resolved all outstanding issues but will attempt to resolve them in a call with PF next week and move forward with the FPWD. All present voted for the publication. An email will be sent formally requesting consensus.

Finalize Agenda for May F2F

The agenda for the May F2F will be closed at the end of the day today.