EPUB 3 Working Group vF2F, 1st Day — Minutes

Date: 2021-05-27

See also the Agenda and the IRC Log

Attendees

Present: Dave Cramer, Wendy Reid, Masakazu Kitahara, Ben Schroeter, Matthew Chan, Zheng Xu (徐征), Brady Duga, Shinya Takami (高見真也), Dan Lazin, Marisa DeMeglio, Ryo Kuroda

Regrets:

Guests:

Chair: Dave Cramer, Wendy Reid

Scribe(s): Brady Duga, Matthew Chan

Content:


1. satellite specs

Dave Cramer: https://w3c.github.io/publ-cg/

Wendy Reid: We have a long tail of specs
… Some we have discussed, eg multiple renditions, a11y, cfi
… Some we have not discussed
… We should figure out the fate of these docs
… Currently they are in the archives, and will always be there regardless of what we decide
… Some are now moved to notes because we are discussing them
… What do we do with the ones that we have yet to pull in?
… Could be leave them alone, could be to pull them all in
… personally, think we should only pull in the ones we plan to change/discuss, otherwise leave them alone
… If we leave them, they continue to exist but are not w3c docs

Dave Cramer: We have an obligation to inform people which of these are viable
… Most of them are not
… If you make an epub against one these specs, you will waste time since they are not implemented

Dan Lazin: Question about what we have pulled in. Looks like about 8
… Not clear what we are trying to do with them from a w3c specs perspective

Wendy Reid: All docs that are currently notes are not on the rec track
… They never have to become recs. The ones on the rec track are going through the rec process
… Notes can be elevated, eg CFI could move to rec track

Brady Duga: is it easy to make something a note?
… so it doesn’t hurt us to leave something where it is, right?

Wendy Reid: yes

Brady Duga: does leaving them in the archive achieve the end of signifying that these are dead?

Dave Cramer: i was thinking of something more than that
… especially with stuff like EDUPUB

Dan Lazin: I do want consistency and clarity
… It’s unclear if something is in an archive to muddle through the history and figure out if it is supported
… Fragmentation is bad, we should consider the user and pull in everything we think might be alive and add a note to the clearly dead ones

Wendy Reid: +1

Dan Lazin: Aim for being as tidy and helpful to new users as possible

Dave Cramer: Strongly agree
… It is easier to change things in w3c web space, as opposed to IDPF web space
… Feel more comfortable if we made the w3c stewardship more obvious
… Might be good to point to w3c from idpf page, then pull in all specs as notes, with a specific comment about status

Wendy Reid: A single change that says “all docs are moved to www.xxx.yyy” is easier than lots of changes

Dave Cramer: Even the IDPF home page isn’t that clear that it is over
… Would like to get a feel from the WG what the general principles are
… then on chairs calls we can decide on specifics, since there is a lot of w3c admin stuff

Wendy Reid: Question for the group, what do we think the best direction is?
… Migrate all to w3c space with updates with disclaimer at IDPF, OR do we leave them in IDPF space?
… Currently epub search goes to w3c
… Of the current list, migrated are 3.3, 3.2, a11y, multiple renditions, and CFI
… all are note or rec track
… not moved are legacy (3.1, etc) and a whole bunch of docs we all forgot about

Dave Cramer: Scriptable components never really implemented, Adaptive stuff was too CSSy

Brady Duga: Softpress(?) implemented a bunch of these satellite specs

Wendy Reid: Criteria is, will we use it?
… we could pull them all in and add a deprecation note

Dave Cramer: Pull over if we need it makes sense and it is less work overall
… but batch change might make it clearer in the idpf site

Brady Duga: i’m leaning towards pulling everything in, and adding deprecation notice
… because right now its not clear who owns these specs now
… on the other hand, this could clutter up W3C notes

Dave Cramer: I agree

Wendy Reid: No matter what, there will be a note saying it is a dead spec
… Real question is do we want to do it at w3c or idpf
… logistical question is where it is easiest/best to do it?
… takeaway is go to Ivan and see what is easiest

Dave Cramer: Makes sense, we have an end in mind, need to talk to Ivan and Matt about the means

2. What does it mean to “support” a foreign resource? (issue epub-specs#1464)

See github issue #1464.

Dave Cramer: When writing some spec tests, one of the foundational aspects is a core media type
… ie something that does not need a fallback
… so manifest fallbacks are also foundational
… Wrote some fanciful tests with docbook, binary files, etc
No reading system implemented manifest fallbacks
… Some of the behavior was not ideal for the end user
… for instance a .dmg file was downloaded to the local system [Scribe note: !!!]
… Looks like everyone just throws it at a webview and let the webview handle it
… Even Readium doesn’t handle them
… What are the implications of a core RS feature not working in the real world?

Shinya Takami (高見真也): In Japan, in some cases manifest fallback is implemented, but may be domain specific
… Would like to discuss with Voyager people in Japanese
… [Japanese]

Masakazu Kitahara: [via shiestyle] Voyagers RS does not support this in RSes, but in some places we do use the feature in Japan

Brady Duga: we have two pipelines at Google, 1 for publishers, and 1 for people sideloading
… the one for publishers is better for support for this type of thing

Wendy Reid: Don’t think it is supported by Kobo

Brady Duga: dauwhe can you make your sample epubs available?

Dave Cramer: yes, i’ll let you know where

Dan Lazin: I have some tests that are not checked in, because I don’t know what the proper behavior is supposed to be
… Do we need a graveyard for these sorts of things? Since I can’t set the “does it pass” field

Dave Cramer: The tests are great, as it points to where we should be investigating
… Seems like a case I would like more semi official information on what RSes support/claim to support with regard to manifest fallbacks
… If no one implements it, we need to have hard conversations
… If there aren’t implementations we need to remove it
… But need to keep concept of core media type

Wendy Reid: Since we now have several tests without clear passes
… Does it make sense to make a list and send the tests out to the community pre-CR
… Typically this is done during CR phase, but would be good to know where we are in trouble

Dan Lazin: To pursue, in my test sheet, I have some blank cells so we can add notes there
… Can use that as a way to corral a list

Brady Duga: this sounds like a problem, but if nobody has implemented manifest fallbacks, then maybe we just say that you can only use core media types in the spine, period

Dave Cramer: Part of this is how epub has changed over time
… used to have the idea of general epub container, but that is not what really happened
… Have several next steps to gather info and look into hard to test tests
… Have enough useful take aways

Ben Schroeter: Where did we leave the first conversation?

Dave Cramer: We have the goal of communicating the status, but where to do that is still an open question
… Will check with Ivan and Matt to figure out the best path forward
… Break now?

Wendy Reid: Yes!

Dave Cramer: Brady can eat dinner

Wendy Reid: Will reconvene at the upcoming whole hour

3. HTML Serialization

See github issue #636.

Dave Cramer: i’ve made epubs that have used HTML that was not well-formed XHTML
… sometimes it works
… but we hear that RS is sometimes built using XML toolchains, and would have to be reworked if they can’t expect well-formed XHTML
… there are arguments in favor, but I have not felt large support from authors, publishers, or RS

Brady Duga: sounds like a lot of work, so maybe we should wait until there is need for it

Dave Cramer: in my mind epub 4 doesn’t have to worry about backwards compatibility with epub 3, chief among which is support for XHTML

Shinya Takami (高見真也): i have no objection, but we have to consider compatibility with existing epubs
… so we must differentiate between HTML5 epub and XHTML5
… now might be a good time to start talking about how spec would have to change

Dave Cramer: we would never mandate a change to HTML5
… compatibility issues would arise where people begin to try to open new HTML5 epubs in older RSes
… has Kobo looked into this?

Wendy Reid: when I looked into it I specifically talked to ingestion side (where most of the problems would pop up)
… they said it probably wouldn’t be too bad
… we’d have to add in new libraries for parsing HTML, and maybe do some additional validation
… not impossible, but work would be involved
… agree that we might have to identify HTML5 epubs separately
… speaking for Kobo, we have a long tail for device support

Dave Cramer: comment from Daniel Weck was that Readium would experience several issues if we did this
… also issue with epub:type, given that it is namespaced, it won’t carry over to HTML serialization
… another issue was about CFI, which uses a very Xpath like syntax to point to places in XML files
… concern that that might break in HTML serialization
… but it might break in XHTML serialization too (e.g. parser inserting tbody element into the DOM)

Brady Duga: CFI was intended to work with the text, not with the DOM
… so yes, that issue could arise
… this is a hard topic because of how much work would be involved, and the lack of a clear reason to do it, especially vs all the other features we could be working on

Dave Cramer: the other argument that has been put forward for HTML is that HTML tools could be used, but i’ve never seen an example of one that works on HTML and breaks on XHTML

Proposed resolution: Defer HTML serialization to EPUB 4, close issue 636 (Wendy Reid)

Brady Duga: if we’re deferring, should it be closed?

Brady Duga: +1

Shinya Takami (高見真也): +1

Wendy Reid: +1

Masakazu Kitahara: +1

Ben Schroeter: +1

Wendy Reid: i think the idea for epub 4 is that we start with clean slate

Dave Cramer:

Matthew Chan: +1

Marisa DeMeglio: +1

Wendy Reid: not resolved yet. Will return to this with tomorrow’s group.

4. iframes and external content

See github issue #1061.

Dave Cramer: example use case would be embedding youtube video in iframe
… right now epub doesn’t like this
… there are security issues here too
… this issue was raised almost 3 years ago, but I haven’t felt strong push for this from publishers, and I’m not sure why

Ben Schroeter: I think publishers may be doing it already

Dave Cramer: I think there is a fairly high amount of epubs with video content in higher learning books

Brady Duga: but are those videos embedded in the epub, or a link to a youtube video that is supposed to open in a player?
… from a security perspective, we disable networking in our webviews (at least on Android)
… so this would be a fairly major change

Shinya Takami (高見真也): the epub spec should allow this external content, and RS should take care of making associated alerts to users
… if we allow this sort of foreign resource, then RS should check this for users
… the ability to have this sort of external content could be valuable to users, but for security reasons, RS should take care of the related risks

Dave Cramer: that puts a lot of burden on RS to evaluate every URL of this nature and try to figure out if it is safe or not

Shinya Takami (高見真也): the alert could just tell users that they are about to access a foreign resource, and that it may be dangerous

Brady Duga: users might be scared away by such alerts
… and enabling network access means more than dealing with the linked URL
… it could also present privacy issues in addition to the security issues
… more so if we also enable scripting
… authors could be trying to report all sorts of user behaviour

Wendy Reid: agreed
… we’d then have to address all of these issues in the security and privacy review
… in the issue Ken presented use cases, e.g. RS sending quiz results to a server somewhere
… a valid use case, but also one which suggests an untold number of nefarious uses
… within something like the VitalSource system, the user knows they can trust foreign resources also in the system
… but the more generalized use cases are risky

Shinya Takami (高見真也): how about adding features to RS to allow user to toggle whether to permit or deny things like foreign resources, scripting?

Wendy Reid: i think that’s outside of scope. We can’t tell RS how to handle privacy.
… and sure, users could then choose to provide informed consent, but it presents an uncomfortable situation for both the user, and probably the publisher

Brady Duga: it also assumes that the user is legally able to give consent to whatever might happen

Dave Cramer: in some circles i’ve heard concern over how web handles this sort of issue today (e.g. ubiquitous cookie consent pop-ups)
… but we will continue this discussion tomorrow. It doesn’t feel like we are close to consensus right now. But we’ve raised good desires and concerns

5. Task Forces Update

Wendy Reid: https://w3c.github.io/epub-specs/epub33/fxl-a11y/

Wendy Reid: https://w3c.github.io/epub-specs/epub33/locators/

Wendy Reid: the TFs are both developing their own WG notes
… the FXL a11y group is writing documentation to provide concrete guidance on how to produce more accessible FXL content
… the link above will give you idea of structure and topics
… the other TF is debating re-using CFI as a locating method
… after a lot of discussion we’ve decided to pause, and put together a list of use cases to understand the problem space of locations in epub
… and we’re working towards potentially coming up with some sort of algorithm that creates segments within an epub that resemble pages but are not pages
… i.e., a fallback location scheme should a pagelist not be provided (because epub is digital first and has no physical version, or they just left out pagelist to physical correspondence)
… document will also have solutions for those use cases
… might result in us recommending that we add something to spec, but we’re not there yet

Dave Cramer: about that algorithm, would it be a script that use you on an epub? Or is it something that RS implements?

Wendy Reid: it would be script run by author, or maybe at distribution level
… we want the locations to be as platform agnostic as possible
… and we’d want to locations to be roughly identical to each other if the same book is distributed through various different platforms

Dave Cramer: is the FXL TF still working on the idea of alternate style sheets?

Wendy Reid: yes, and we have an explainer doc for that
… but it is in very early stage of draft

Wendy Reid: See Visual to Textual Explainer

Wendy Reid: there are lots of a11y solutions for people who use AT to parse DOM, but not many solutions for low vision users
… this proposal creates a method by which content creators can provide secondary style sheet for their FXL, that would allow compatible RS to switch reading mode from FXL to reflow
… then user could change font size, font face, etc.
… while preserving the reading order
… but there is still a lot to hash out

Dave Cramer:

@media (-epub-prefers-reflow: reflow) {
  /* styles for reflow */
  @viewport { size: auto; }
  body {
    width: auto;
    height: auto;
  }
  p, div, h1, h2 {
    position: relative;
  }
}

Wendy Reid: one of the challenges we have is EUAA, which comes into force in 2025, and specifically states that content should be alterable by user
… under those requirements FXL would not pass as it is today
… lends additional importance to something like this

Dave Cramer: interesting, might be able to do link rel=alternate stylesheet
… also media queries for user preferences (prefers dark scheme, prefers low motion)
… so a lot of this might be able to be achieved using existing web technologies

Brady Duga: this is not legal advice from me nor my employer, but I think there is carveout in EUAA for content that cannot be reasonably modified to comply
… e.g. a 1967 Spiderman comic book

Wendy Reid: right, like some kind of undue burden clause
… one of the challenges is just the breadth of content that becomes FXL
… e.g. novels that are FXLs
… use cases like that will probably have a hard time bringing themselves within that carveout
… this method will not work for all types of content, but for things like cookbook or textbook, where there is a lot of text alongside pictures, it could work

Brady Duga: it could work really well for some books
… e.g. where publisher feels that layout is critical to the book, but reflow just makes it easier to read to end user

Dave Cramer: is there an issue where a spread extends across both pages?

Wendy Reid: i wonder if we’re going to have to provide some sort of stitching hint

Dave Cramer: i see implementation issues with this

Wendy Reid: the other question was what to do with the images
… in a particularly image heavy book, there might be a background image, with some images overlaid, and then a block of text too
… so what happens if the background image is semantically relevant?
… or what happens if a single image runs across the gutter in a spread?
… might lose some of the meaning

Dave Cramer: we’ll continue tomorrow