The Accessible Rich Internet Applications Working Group has published a First Public Working Draft of Digital Publishing Accessibility API Mappings 1.0 (DPub-AAM). This defines how user agents map the Digital Publishing WAI-ARIA Module markup to platform accessibility APIs.
See minutes online for a more detailed record of the discussions. (The headers below link into the relevant sections of the minutes.)
Dave Cramer gave an overview, with screen sharing, of an experimental proof-of-concept (originally created by Jake Archibald) of what a simple PWP reader based on Service Workers (SW) could do. As an example, the first chapter of Moby Dick (https://dauwhe.github.io/epub-zero/acme-publishing/MobyDick/html/c001.html) is loaded into the browser; the publication as a whole can then be taken offline (via a dedicated button), i.e., the same publication can also be read while offline. (This works in Chrome, for the moment.) The “magic” is done by a service worker script, which is responsible for the local caching. The same module can also be used to produce and/or read a “package” (at the moment it is zip).
The site contains some examples, including a scholarly article (running MathJax).
The exact goals of the experiment will have to be described (a discussion on email should follow), and this would also focus on some specific questions that should be solved in future. A major goal here is to figure out how to make a file/folder format that works well, that can lead to a more comprehensive PWP solution.
Peter Krautzberger gave an overview of a white paper that MathJax plans to publish soon. The background is that MathJax itself is pretty old: version 1.0 was released in 2010, and design started a year before that. The team needs to revamp the internals, because the internal plumbing hasn’t changed much since then.
Originally, in 2009 and 2010, the goal was to help move math on the Web forward. Browsers weren’t supporting it because no one was using it, and users were not using it because browsers were not supporting it. However, about 5 years later it hasn’t really moved. The choice is either (a) to go for a full polyfill approach or (b) to really make use of the advances browsers have made and map everything on top of HTML, CSS, and SVG. Because (a) would not really solve the efficiency issue today, the direction planned is to go for (b). Doing things with grid layout is a good place to start, and it’s the direction they’re working towards.
The discussion on the call concentrated on accessibility and how this approach would affect it. There may be some proposals coming up on how to use ARIA and CSS, and on how these standards (and possibly a next version of MathML) should be adapted to handle accessibility. It has been agreed that this group is in a good position to initiate such an activity, if a clear set of requirements and proposals is on the table.
Charles LaPierre reported on the work of the TF. They made an overview of WCAG regarding its relevance to Digital Publishing. All of the concerns of WCAG are relevant, but there are also a number of issues in Digital Publishing that should be addressed more. These are (as listed in the current draft): page numbering, drop caps, position/location of text, indication of text, nouns, layouts, influences, deeply nested headings, semantic list-heads, skipability, escapability, diagram models, appendix, and also needed but being addressed elsewhere: notes & footnotes (aria), and annotations.
The TF plans to publish a draft and, eventually, an IG Note, in cooperation with the relevant WAI group.
As one of the results of the busy TPAC F2F meeting of the DPUB Interest Group (see the separate reports on TPAC for the first and second F2F days), the group has just published a new version of the Portable Web Publications for the Open Web Platform (PWP) draft. This draft incorporates the discussions at the F2F meeting.
As a reminder: the PWP document describes a future vision on the relationships of Digital Publishing and the Open Web Platform. The vision can be summarized as:
Our vision for Portable Web Publications is to define a class of documents on the Web that would be part of the Digital Publishing ecosystem but would also be fully native citizens of the Open Web Platform. In this vision, the current format- and workflow-level separation between offline/portable and online (Web) document publishing is diminished to zero. These are merely two dynamic manifestations of the same publication: content authored with online use as the primary mode can easily be saved by the user for offline reading in portable document form. Content authored primarily for use as a portable document can be put online, without any need for refactoring the content. Publishers can choose to utilize either or both of these publishing modes, and users can choose either or both of these consumption modes. Essential features flow seamlessly between online and offline modes; examples include cross-references, user annotations, access to online databases, as well as licensing and rights management.
The group already had lots of discussions on this vision, and published a first version of the PWP draft before the TPAC F2F meeting. That version already included a series of terms establishing the notion of Portable Web Documents and also outlined a draft architecture for PWP readers based on Service Workers. The major changes of the new draft (beyond editorial changes) include a better description of that architecture, a reinforced view and role for manifests and, mainly, a completely re-written section on addressing and identification.
The updated section distinguishes between the role of identifiers (e.g., ISBN, DOI, etc.) and locators (or addresses) on the Web, typically an HTTP(S) URL. While the former is a stable identification of the publication, the latter may change when, e.g., the publication is copied, made private, etc. Defining identifiers is beyond the scope of the Interest Group (and indeed of W3C in general); the goal is to further specify the usage patterns around locators, i.e., URL-s. The section looks at the issue of what an HTTP GET would return for such a URL, and what the URL structure of the constituent resources is (remember that a Web Publication is defined as a set of Web Resources with its own identity). All these notions will need further refinements (and the IG has recently set up a task force to look into the details) but the new draft gives a better direction to explore.
See minutes online for a more detailed record of the discussions. (The headers below link into the relevant sections of the minutes.)
Following the discussions on PWP identifiers last week, a task force has been set up, led by Bill Kasdorf. There were some discussions on the call about the goals of the task force (this has to be cleaned up), but the general ideas are:
- The task force should concentrate on locators (as opposed to identifiers) both for the PWP level as well as on the individual resources’ level
  - I.e., dealing with identifiers (ISBN-s of different sorts, ISTC work, DOI-s, etc.) is out of scope, as are the issues around fragment identifiers; hence also the name of the task force
- The task force should dig into the addressing/identifier work described in the PWP document, should flesh out the details, possibly have some mock-up implementation, and identify if and what of this work would require a targeted Recommendation/Standardization work (either at W3C, or at IDPF, or in a joint group)
- The task force should also provide input to the IDPF EPUB3.1 work, which is looking at a “browser friendly manifestation” of EPUB. The goal of EPUB3.1 work, in this respect, would be to be forward compatible with an eventual PWP work
There was also some technical discussion, emphasizing the fact that a PWP can be a collection of very different resources from all over the place, where the order of the resource access (reading) can be different from one PWP to the other even if they share resources. The locator structure should make this possible (e.g., via a manifest).
There is a need for more general planning on where the PWP work ought to be going. The terminology-state-identifier-locator discussion has resulted in a more stable basis, and the task force on locators will dig into the details. What else? Ideas that came up:
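As a rough illustration of that point, the sketch below models two publications that share resources but list them in a different reading order. The `Manifest` class, its fields, and all URLs are hypothetical; nothing here is defined by the PWP draft.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Manifest:
    """Hypothetical manifest: one locator plus an ordered list of resource URLs."""
    locator: str
    reading_order: List[str] = field(default_factory=list)

    def next_after(self, url: str) -> Optional[str]:
        """Return the resource that follows `url` in this publication, if any."""
        position = self.reading_order.index(url)  # ValueError if url is not a member
        following = position + 1
        if following < len(self.reading_order):
            return self.reading_order[following]
        return None


# Two publications drawing on the same (made-up) resources in different orders.
INTRO = "https://a.example.org/intro.html"
DATA = "https://b.example.net/data.html"
ESSAY = "https://c.example.com/essay.html"

anthology = Manifest("https://example.com/anthology", [INTRO, ESSAY, DATA])
report = Manifest("https://example.com/report", [DATA, INTRO])

print(anthology.next_after(INTRO))
print(report.next_after(INTRO))
```

Because the order lives in each manifest rather than in the resources themselves, the same resource can be followed by different content depending on which publication the reader is in.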
- Looking at the library and archiving community. A focussed work will be pursued to see what specific needs that community may have and whether what is in the PWP document is adequate or not, whether it has to be extended, etc.
- The presentation control issue needs further work
- Other issues listed in the PWP draft should also be checked.
- Some sort of a proof-of-concept implementation is necessary to identify the necessary missing bits
For the last issue: Dave Cramer has recently created a simple mock-up based on the earlier discussion with, and work of Jake Archibald. (The repo of Dave is also available for cloning.) This is a tremendous start, and it has been agreed that Dave would give a more detailed overview on what is happening there on one of the next calls.
The Interest Group has agreed to publish the next version of the PWP document as a formal Interest Group Draft. It should be out on Thursday the 26th.
The group has been reminded of the need for better CSS examples, and some further ideas did come up.
See minutes online for a more detailed record of the discussions. (The headers below link into the relevant sections of the minutes.)
Note that we experienced telco problems which cut some of the discussions a bit short and slightly chaotic…
As agreed on the last call, the IG is supposed to collect CSS examples of typesetting issues the community has. This is an ongoing effort; participants were reminded of this. Some new volunteers came forward on the call.
The ARIA technology has two parts:
- The definition of the ARIA terms proper, for which, in the digital publishing domain, there is now a (soon to be updated) working draft
- Mapping of the ARIA terms onto the various Assistive Technology Interfaces available today; this makes it possible to use the ARIA terms with those technologies.
Richard Schwerdtfeger has edited a draft for the mapping of the DPUB ARIA terms. It should complement the DPUB ARIA term specifications themselves. The DPUB IG was asked to approve the publication of that draft (formally done by the ARIA Working Group). The approval was voted on at the meeting.
Ivan Herman gave an overview of some of the proposed changes on the PWP draft. The new, proposed draft introduces changes based on the various discussions at the Sapporo F2F meeting.
Some of the proposed changes are minor: reinforcing the importance of manifests, or raising issues on how files on the local file systems should be handled by service workers. The major changes relate to the role and usage of identifiers, based on the specific session at the meeting (introduced by a slide set for the discussion). There are several aspects listed below; it has been agreed to provide more comments and issues on the draft and try to publish a new, official draft soon.
What type of identifiers do we have
The previous discussions included references to the fact that identifiers may have several usages (the work, a particular copy, a particular edition, etc.) and each would have to have several identifiers. However, it was also emphasized that the DPUB IG, or a future formal PWP specification, cannot decide on these issues. On the other hand, a clear locator, to uniquely ‘find’ a PWP on the Web, is essential. The proposal is therefore to include in the document both an identifier and a locator; the identifier is stable and can be any kind of URN (i.e., can be a DOI, an ISBN, etc.), whereas a locator should be unique, and should be an HTTP(S) reference on the Web. Subsequent discussions made it clear that (a) the two URI-s may coincide and (b) it may be possible to have several identifiers. The PWP level metadata may include some extra relationships (e.g., on provenance) between those two URI-s, but, at this moment, those are not specified.
If one dereferences the canonical URL, what is returned?
Essentially, a manifest: either directly, or via a <link> element or a Link: header in the HTTP response. The role of the manifest, beyond containing additional metadata, is to “represent” the PWP as a whole.
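A minimal sketch of how a client might look for the manifest in either place follows. It assumes the manifest is advertised with the relation value `manifest`, which the notes above do not specify; the function names are hypothetical and the Link header parsing is deliberately simplified (it does not handle commas inside quoted parameter values).

```python
import re
from html.parser import HTMLParser
from typing import Optional

# One "<target>; param; param" entry of a Link header (simplified).
_LINK_ENTRY = re.compile(r'<([^>]*)>([^,]*)')
_REL_PARAM = re.compile(r'rel\s*=\s*"?([^";]+)"?')


def manifest_from_link_header(value: str, rel: str = "manifest") -> Optional[str]:
    """Return the target of the first Link header entry carrying the given rel."""
    for target, params in _LINK_ENTRY.findall(value):
        match = _REL_PARAM.search(params)
        if match and rel in match.group(1).split():
            return target
    return None


class _LinkFinder(HTMLParser):
    """Remembers the href of the first <link> element with the wanted rel."""

    def __init__(self, rel: str) -> None:
        super().__init__()
        self.rel = rel
        self.href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag == "link" and self.href is None:
            attributes = dict(attrs)
            if self.rel in (attributes.get("rel") or "").split():
                self.href = attributes.get("href")


def manifest_from_html(html: str, rel: str = "manifest") -> Optional[str]:
    """Return the href of the first matching <link> element in an HTML document."""
    finder = _LinkFinder(rel)
    finder.feed(html)
    return finder.href


header = '</style.css>; rel=stylesheet, <https://example.com/2/manifest.json>; rel="manifest"'
page = '<html><head><link rel="manifest" href="manifest.json"></head></html>'
print(manifest_from_link_header(header))
print(manifest_from_html(page))
```

The value found in the HTML is relative and would still have to be resolved against the URL of the page it came from.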
What is the URL of the constituent Resources within a PWP
The URL of the PWP as a whole establishes some sort of a “context” for URLs. I.e., if the URL of the PWP is http://example.com/2, then the constituents may be http://example.com/2/index.html. I.e., everything is interpreted with the URL of the PWP as the base.
This is a simple approach, though the Resources may be spread over the Web, so this may not be enough. An idea is to have some sort of a mapping within the manifest to map this view onto “real” URI-s in that case.
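The two ideas can be sketched together as follows. The `resolve` function and the shape of the mapping (a plain dictionary from the “virtual” URL to the real one) are assumptions made for illustration; the notes do not define how such a mapping would be expressed in a manifest.

```python
from typing import Dict, Optional
from urllib.parse import urljoin


def resolve(pwp_url: str, relative: str,
            mapping: Optional[Dict[str, str]] = None) -> str:
    """Resolve a constituent resource against the PWP URL used as a base.

    `mapping` is a hypothetical manifest table from the URL inside the
    PWP's own namespace to the place where the resource really lives.
    """
    # Standard URL resolution drops the last path segment of the base, so the
    # PWP URL has to be treated as a "directory" by ensuring a trailing slash.
    base = pwp_url if pwp_url.endswith("/") else pwp_url + "/"
    virtual = urljoin(base, relative)
    return (mapping or {}).get(virtual, virtual)


remote = {"http://example.com/2/img/whale.png": "https://cdn.example.org/whale.png"}

print(resolve("http://example.com/2", "index.html"))
print(resolve("http://example.com/2", "img/whale.png", remote))
```

The trailing-slash step matters: resolving `index.html` directly against `http://example.com/2` with ordinary relative-URL rules yields `http://example.com/index.html`, not the URL given in the example above.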
What about fragments?
Fragments should not be defined by and for PWP. With this approach, the fragment identifiers are “simply” those that are defined by the community at large for the specific media type.
Cooperation with the IDPF EPUB 3.1 effort on identifiers
The EPUB 3.1 effort also looks at the issue of identifiers in a possible approach of “forward compatibility” with an eye on PWP. Details of this should be discussed; this is to be picked up at future meetings.
The latest version of the DPUB ARIA Module Working Draft was published in July; a new version should come soon. Tzviya Siegman summarized some of the new terms that have been added since: colophon, credits, epigraph, errata. There was also discussion of noteref, glossref, etc.
There were also discussions on the ARIA mailing list on how to handle roles on links. EPUB has been doing that (‘noteref’) and it is very useful for Assistive Technologies. It is not clear, at this moment, whether @role should be used for that, or whether, for example, @rel is more appropriate. This is a decision the ARIA WG should make.
Another issue is the long term planning of the evolution of the vocabulary vs. the vocabulary currently available in EPUB (the latter is much larger). At the moment the plan is to back port the ARIA terms into EPUB and, at the same time, shrink the EPUB terms. A golden middle will have to be found. (There are some tensions with the ARIA group about how many terms we should use there.)
Overall, the IG is in favor of publishing a new WD (although the final decision is the ARIA WG’s). There is also a call for testimony from organizations that use, or plan to use, this vocabulary.
The discussion with the CSS WG at TPAC (see the minutes of the relevant TPAC session) revealed that a more systematic set of use cases should be provided, including screen dumps, etc, to show what should be rendered and how this should be achieved through CSS. Florian (who is also part of the CSS WG) gave additional rationale for this.
There was some discussion on how to do that in practice, and what the priority for those should be. At the moment, two use cases came to the fore for a first round: table alignment (e.g., aligning table cells on, say, the fraction sign of numbers) and inline grid management for CJK languages. A wiki page will be set up to collect these and there is a general call for members of the IG (and anybody else…) to provide as many cases as possible.
The Publication Object Model Community Group has been set up by Daniel Glazman, following a discussion at TPAC. That community group needs examples of various publication formats beyond EPUB and PDF to have enough input to be able to define the POM API in a general way. This may include Manga and Comic formats, and also KF8. It is important to provide such information to the POM CG. (Information on Kindle’s KF8 has already been provided after the meeting.)
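To make the table alignment use case concrete, here is a small sketch of the expected result in monospaced text: every cell is padded so that the chosen character lands in the same column. This only models the desired rendering; it is not CSS and says nothing about how a CSS feature would be specified. The function name is hypothetical.

```python
from typing import List


def align_on_char(cells: List[str], char: str = ".") -> List[str]:
    """Pad each cell so that the first occurrence of `char` lines up vertically."""
    parts = []
    for cell in cells:
        head, separator, tail = cell.partition(char)
        parts.append((head, separator + tail))
    head_width = max(len(head) for head, _ in parts)
    tail_width = max(len(tail) for _, tail in parts)
    # Right-align what precedes the character, left-align the rest.
    return [head.rjust(head_width) + tail.ljust(tail_width) for head, tail in parts]


for row in align_on_char(["3.14", "42", "0.5", "1234.5678"]):
    print("|" + row + "|")
```

Cells without the alignment character (such as `42`) are placed as if the character followed their last digit.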
Full minutes for Day 2 are available.
After an exciting night of hunting for green Kit Kats and Pokemon, we regrouped for another day of meetings.
Education and Outreach
Karen Myers is looking for more authors from the DPUB IG to write ~500-word pieces, frequent updates to tell everyone what we are doing. Topics might include working with IDPF or TF updates. Every time we publish anything, we should blog about it. Our blog (the one you are reading right now) needs to have more than short minutes.
Conversation then shifted to trade press and conferences. We had a great brainstorming session about organizations outside of the US that Karen and others can contact. We are also considering running webinars. This is a call for action to the whole IG. Do you have a reflection on what happened this week? Write a short blog post! Tweet about us. If you’re speaking at a conference, please let Karen know. Please let Karen know where you go for publishing industry news.
Slides are available here.
Ivan Herman prepared a quick overview of PWP and the need for identifiers. PWP is basically a URL for a collection of web resources with the advantage of portability. We need an identifier to get to the package as well as a method to get to its components and sub-components.
Ivan mentioned that the publishing industry has a variety of identifiers, and this group is not setting out to resolve the issue of creating one identifier to solve them all. It is important to keep in mind that PWP is a collection of resources, so we need to be able to access the collection as well as the insides.
Overall consensus is that DPUB must decide what is in the package before deciding how to point to it. Further, it is important to understand that a URL is a locator, not an identifier. It points to a page. The page may have all sorts of stuff on it, but that URL is not the thing. It is a location. We still have a lot of questions, but we have some direction about how to begin answering them.
Joint Meeting with ARIA WG
We met with several members of the ARIA WG to go over several loose ends.
Extended descriptions: DPUB provided feedback about the ARIA WG’s extended descriptions grids. The ARIA WG plans to rule out some of the proposed options. ARIA will compile feedback and bring it back to the stakeholders with info about how specific options might meet the use cases.
Mark Hakkinen provided an overview of his work on web components with ETS and IMS Global. Mark has been transforming the DIAGRAMMAR model into web components. There was some discussion about whether it is possible to implement this today and browser support for web components.
After some discussion about roles, attributes, and code samples, we agreed that DPUB-ARIA Module will go to Second Public WD in mid-November. If code samples are not updated at that point, we will release a third public working draft later.
And, that closed our formal sessions. We had a few breakouts with great attendance. I have compiled about 20 action items from the F2F. We have a lot of work to do! Thanks everyone for a great week!
Full minutes are available
What a turnout! We had about 10 of the regular IG participants in Sapporo. At all points, we had at least 20 people present, sometimes closer to 30. This shows the growth and impact of the DPUB IG. Ivan commented that in Shenzhen (just 2 years ago), few had heard of us. Dave pointed out that of all things happening at TPAC (and there are so many things happening at TPAC at once), several people considered DPUB to be the most interesting. Maybe it was the cream puffs and Pocky! Thanks to all who contributed, scribed, memed, called in, and provided Japanese sugar.
After three days of meetings with others and two days of DPUB meetings, my biggest takeaway for DPUB is that we don’t yet have a clear idea of what a PWP manifest must/may/should include. Without understanding this, it is difficult to move forward with several of the topics discussed. So, although we have accomplished a lot already, we have a lot of work ahead of us. Here’s a summary of a great few days.
Rob Sanderson provided an overview of the Annotations WG model. We discussed the TextFinder API (formerly Rangefinder API), which accomplishes both search and locate in the URL. Doug Schepers explained that this stores hashes, not strings. Character offsets are possible. The group is also exploring other selectors, including XPath and CSS Selectors. DPUB and Anno should remain in contact, especially if we know of real world implementations.
Summary of current work:
Take a look at the minutes to see how much we have already accomplished and how much is in progress. Here is a quick list:
- PWP: we outlined a vision. Now we must work toward functionality
- CSS: published modules, priorities list, keep the highly-informed input coming
- DPUB-ARIA: module exists. People are eager to use it in EPUB world as well as in scholarly publishing
- A11y TF: working with ARIA WG to get extended descriptions right and then point out other issues specific to publishing or where publishing can contribute
- STEM TF: a lot of exploratory work, next steps will probably be around the future of math on the web. Major outcome is that there is a need for those with understanding of Math/MathJax/polyfills to talk to Houdini
- Metadata: published interviews, learned that publishers use metadata heavily, need some rights expression/management, and maybe make metadata more aligned w OWP
This became a fascinating discussion about the intersection of Math and CSS and the need for communication between those who will implement Houdini and those working on Houdini. End result: MathJax has done a great deal of research regarding MathML and polyfills. Houdini wants to know and wants to talk to you.
- Reconsider schedule (note: Ivan pointed out that IDPF and W3C have different modes of working, and this was not really up for discussion)
- Bring in libraries, especially wrt metadata (Heather and Lars offer to help)
- Do not deprecate elements. Kill them. Deprecation will cause problems
- Assess what is the interoperable core of EPUB 3.0.1 to determine the best way to move forward with EPUB 3.1.
- CSS Profile: snapshot may not be best option because it includes CSS specs that are rec level or almost rec level. It would be unwise to require all UAs to support all of snapshot. Good starting point though.
Meeting with CSS WG
Dave Cramer led a discussion of CSS priorities. He chose to skip the topic of pagination, because it’s too big. The group covered several topics, and the CSS WG wants more detailed examples from DPUB for all of these items. It is important for DPUB to file bugs as well. Need help with samples or filing bugs? We have several members of CSS WG among us, and they are friendly. Don’t hesitate to ask.
- Table alignment: CSS WG asks DPUB what is missing? Send your sample tables to Dave and Florian. (note that David Baron filed issues on https://drafts.csswg.org/css-text-4/#character-alignment while we spoke)
- a11y of generated content: There is concern that generated content is not accessible. CSS WG concludes this is an implementation bug, and DPUB should file implementation bugs.
- Hyphenation control: There was much discussion about parameters that control hyphenation, line breaking, line balance, and how this affects performance. Discussion pointed to this being an issue with line breaking, not hyphenation, which means that it would not affect performance and is an issue for Houdini.
- Keeping image and caption together in paged view: This pains the publishing industry. Fantasai wrote some CSS using flexbox. Dave is testing it out.
Actions: DPUB should not hesitate to file bugs. If you need help, ask members of CSS WG. If they don’t know about issues, they can’t fix them. Provide specific examples, not just complaints. Explain reasoning, not just requests. Communicate often. (These are friendly people who also want beautiful typography.)
Next steps: Assess different publication formats, what is common to all of them, how the components are connected. Daniel created POM CG.
Guest: Jake Archibald, picking up where we left off after IG call on 19 Oct.
The group asked Jake many questions about Service Workers, including whether our thought experiment in PWP makes sense. One hot topic is that SW requires HTTPS and does not work with local files: the file: protocol cannot be used, although plain HTTP is accepted on localhost. There is a same-origin policy for the Worker script, and the requests intercepted by the SW are limited to a scope defined by the script location (although as Jake pointed out, this can be configured via the “Service-Worker-Allowed” HTTP header).
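The constraints Jake described can be modeled roughly as below. This is a simplified, language-neutral approximation written for illustration, not the algorithm text of the Service Workers specification; the function names are hypothetical, and the `Service-Worker-Allowed` value is assumed to be an absolute path.

```python
from typing import Optional
from urllib.parse import urlsplit


def origin_can_register(url: str) -> bool:
    """HTTPS is required, except that plain HTTP is accepted on localhost."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        return True
    return parts.scheme == "http" and parts.hostname in ("localhost", "127.0.0.1")


def default_max_scope(script_url: str) -> str:
    """By default the widest scope is the directory containing the worker script."""
    path = urlsplit(script_url).path
    return path[: path.rfind("/") + 1]


def scope_is_allowed(script_url: str, scope_url: str,
                     service_worker_allowed: Optional[str] = None) -> bool:
    """Check a requested scope against the script location.

    `service_worker_allowed` stands for the path sent in the
    Service-Worker-Allowed response header, if the server sends one.
    """
    script, scope = urlsplit(script_url), urlsplit(scope_url)
    if (script.scheme, script.netloc) != (scope.scheme, scope.netloc):
        return False  # same-origin restriction
    max_scope = service_worker_allowed or default_max_scope(script_url)
    return scope.path.startswith(max_scope)


worker = "https://example.com/books/sw.js"
print(origin_can_register("file:///home/me/book/sw.js"))
print(scope_is_allowed(worker, "https://example.com/books/moby-dick/"))
print(scope_is_allowed(worker, "https://example.com/"))
print(scope_is_allowed(worker, "https://example.com/", service_worker_allowed="/"))
```

For a publication, this suggests that where the worker script is served from determines which resource URLs it can take over, unless the server widens the scope with the header.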
Readium’s experiment with SW aims at transparently extracting + serving packaged content (i.e. from a zip archive), based on a “deep” URL syntax that references bundled resources. By contrast, the proposed use of SW in PWP focuses on enabling a seamless online-offline reading experience for regular content URLs. The PWP use-case requires some sort of offline local storage. There is a cache API accessible within SW, so this can be used to manage offline copies of remote resources.
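The offline behaviour described for the PWP use case amounts to a cache-first lookup. The sketch below models that logic in plain Python with a dictionary standing in for the cache and an injected `fetch` callable standing in for the network; it is not the Service Worker Cache API, and the class and method names are hypothetical.

```python
from typing import Callable, Dict, Iterable


class OfflineStore:
    """Toy model of a cache-first strategy for publication resources."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch              # stand-in for a network request
        self._cache: Dict[str, bytes] = {}

    def take_offline(self, urls: Iterable[str]) -> None:
        """Store a local copy of every resource of the publication."""
        for url in urls:
            self._cache[url] = self._fetch(url)

    def get(self, url: str, online: bool = True) -> bytes:
        """Serve from the local copy first; go to the network only if needed."""
        if url in self._cache:
            return self._cache[url]
        if not online:
            raise LookupError("not cached and offline: " + url)
        body = self._fetch(url)
        self._cache[url] = body
        return body


requests_made = []


def fake_fetch(url: str) -> bytes:
    """Pretend network: records the request and returns a dummy body."""
    requests_made.append(url)
    return ("content of " + url).encode("utf-8")


store = OfflineStore(fake_fetch)
store.take_offline(["https://example.com/book/c001.html"])
print(store.get("https://example.com/book/c001.html", online=False))
```

The reader sees the same URLs online and offline; only the source of the bytes changes.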
There was also much discussion of benefits of SW for publications wrt security considerations and the ultimate goals of the industry. Jake requests use cases, specifically what is hard on web vs native + web view? As hybrid native apps do not normally need a built-in HTTP server to feed content to the webview, and as native URL protocol handlers can be implemented to manage access to bundled or filesystem resources, what is the role of SW in this context?
After the session, Jake created an offline-enabled publication available at https://github.com/jakearchibald/ebook-demo. Test it and send your feedback.
Jake Archibald, one of the editors of the Service Workers Draft, was a guest of the meeting, and gave a short overview of Service Workers.
(The reason of this discussion is because Service Workers have been identified as one of the possible means to create a PWP Architecture.)
Service workers can also be used to unpack content on the fly, whether the packed content is in some ZIP format or other packaging format (the issue around the streaming ability of ZIP and other formats came up during the discussion).
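As an illustration of unpacking on request, the following sketch builds a small ZIP package in memory and then serves a single resource out of it by path. It uses Python's standard `zipfile` module rather than a service worker, and the function names and file paths are made up for the example.

```python
import io
import zipfile
from typing import Dict, Optional


def build_package(resources: Dict[str, str]) -> bytes:
    """Pack a set of resources (path -> text content) into a ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for path, content in resources.items():
            archive.writestr(path, content)
    return buffer.getvalue()


def serve_from_package(package: bytes, path: str) -> Optional[bytes]:
    """Return one resource from the packed publication, or None if absent."""
    with zipfile.ZipFile(io.BytesIO(package)) as archive:
        try:
            return archive.read(path)
        except KeyError:  # raised by zipfile when the member does not exist
            return None


package = build_package({
    "index.html": "<h1>Moby Dick</h1>",
    "html/c001.html": "<p>Call me Ishmael.</p>",
})
print(serve_from_package(package, "html/c001.html"))
print(serve_from_package(package, "html/c999.html"))
```

On the streaming question: a ZIP archive keeps its central directory at the end of the file, so a reader like this one needs the whole archive before it can look up a member, which is one reason other packaging formats came up in the discussion.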
There was some discussion about the availability of SW in browsers (it is available in Chrome and Firefox, it is under development in Microsoft Edge, and nothing is known about Safari). There are also plans to create more tutorials and introductory texts for the specification.
Subsequent discussions at the meeting concentrated on the experience of using Service Workers. Dave Cramer, from Hachette, has already played with this, with a good first impression. The Readium consortium has also tried using it, and Daniel Weck, from the consortium, shared his experiences and questions. The Readium proof-of-concept implementation is able to handle EPUB content that is exploded on the web server, or is able to do some ZIP unpacking (the current implementation does not do caching, only fetch intercepts). There were issues about same-origin constraints and the usage of HTTPS as opposed to HTTP.
The PF Working Group has created a document analyzing the various proposed content description techniques for accessibility purposes (i.e., usage of <detail>, etc.). That document aimed at describing the various needs of the publishing industry; it has now been reviewed, primarily by Deborah and Mia, with comments from others. The document is now ‘back’ with the PF Working Group.
- New Working Draft of Web Annotation Data Model. Annotations are typically used to convey information about a resource or associations between resources. Simple examples include a comment or tag on a single web page or image, or a blog post about a news article.
- First Public Working Draft of FindText API. The FindText API specification describes an API for finding ranges of text in a document or part of a document, using a variety of selection criteria.
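To give a feel for what “finding a range of text using selection criteria” can mean, here is a minimal sketch that locates a quoted string and uses optional surrounding context to disambiguate repeated matches, returning character offsets. It does not reproduce the FindText API; the function name and parameters are hypothetical, loosely inspired by the exact/prefix/suffix style of text quote selection.

```python
from typing import Optional, Tuple


def find_quote(text: str, exact: str, prefix: str = "",
               suffix: str = "") -> Optional[Tuple[int, int]]:
    """Return (start, end) offsets of the first match of `exact` whose
    immediate context matches `prefix` and `suffix`, or None."""
    position = text.find(exact)
    while position != -1:
        end = position + len(exact)
        # Check the characters right before and right after the candidate.
        if text.endswith(prefix, 0, position) and text.startswith(suffix, end):
            return position, end
        position = text.find(exact, position + 1)
    return None


passage = "Call me Ishmael. Some years ago, never mind how long."
print(find_quote(passage, "me"))
print(find_quote(passage, "me", prefix="So"))
```

Offsets alone break as soon as the text before the target changes, while quoted context can still be found again; this is one reason several kinds of selectors are being explored.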