EPUB 3 WG TPAC Face to Face Meeting Day 1 — Minutes

Date: 2021-10-28

See also the Agenda and the IRC Log


Present: Dave Cramer, Wendy Reid, Toshiaki Koike, Matt Garrish, Shinya Takami (高見真也), Masakazu Kitahara, Deborah Kaplan, Matthew Chan, Charles LaPierre, Brady Duga, Ben Schroeter, John Roque, Theresa O’Connor, Hadrien Gardeur, Victoria Lee, Dan Lazin, Victor Lopes


Guests: Myles Maxfield

Chair: Wendy Reid, Dave Cramer

Scribe(s): Matt Garrish, Brady Duga, Ben Schroeter, Wendy Reid


1. Welcome and goals.

Wendy Reid: welcome to the virtual face-to-face - there are a couple of items on the agenda for tonight.
… we are working to CR - have done horizontal reviews - approved by i18n and a11y - working on ping at tag.
… we’ll be looking at issues and also task forces over next two days.
… we are also going to look at testing - how to get more tests done and get to finish line.

2. Virtual Locators.

Wendy Reid: See Current draft.

Ben Schroeter: we formed this group to banty about the perception that digital publications we slow on uptake and epub needed a way to locate things across digital environments.
… we’re sort of solving old problems again - we got a group of indexers and reading system people and looked at CFI-s for pointing to particular spot.
… after figuring out use cases and whittling them down we wanted to be able to do index references and citations that work across the ecosystem.
… we’re finding there isn’t always a print equivalent for epubs so publishers make up page lists.
… would be useful to align so that reading systems can define the same pagination - what recommendations can we give.
… we’re currently developing best practices for how to cite or locate a spot in an ebook.

Wendy Reid: the order of operation is that publishers provide the page list - that is the best possible scenario - in the absence of that this algorithm can provide a good experience platform to platform.
… a lot of affordances currently cannot be done because there isn’t a page list.

Charles LaPierre: this has been brought up with us recently that publishers always have page breaks - we’re going to require GCA partners put page numbers in.
… it would be good if publishers use these recommendations to put pagination in, not just for reading systems to use.

Wendy Reid: we’re not going to state that this is only for reading systems - could be used for tools that produce digital only works or the print version no longer exists.
… important part is that it is always done the same way.

Brady Duga: also works as a threat that if you don’t do the pagination we’ll do it for you in ways that may not be ideal in all cases.

Ben Schroeter: having others beyond me say this is important is really helpful in getting this in publications - good to have these consequences to show people.

Dan Lazin: the ideal scenario is that the algorithm is never used - we would rather page numbers than an algorithm - meant as a fallback.

Ben Schroeter: the guidance is going to be really helpful to publishers to understand where to put their page numbers.

Hadrien Gardeur: I think page lists are good in theory more than practice - the fact you can use a string and they are not exhaustive doesn’t help reading systems.

Charles LaPierre: for accessibility and academic space having page breaks and being able to jump to the spots is important - there may be issues we need to address but is helpful.

Hadrien Gardeur: not arguing the use cases but the technology is not great right now - it is easy to write a page number that is not useful.

Shinya Takami (高見真也): when thinking about webtoons in Japan they consist of very long images and we have to separate the images and be able to index inside the image - a new use case for virtual locators.

Wendy Reid: we have talked a little bit about scrolled content because we know all content is not paged but we should be able to get a list of locations in any type of scrolled document.

Dave Cramer: what suggestions do people have to avoid the problems of using strings - print equivalents come in lots of formats with lots of different numbering systems - sounds like a user interface problem.

Dan Lazin: to my mind strings as page numbers is not critical as you can still arrive at the page you want but typing it - if you stick to conventions it’s not too bad.

Brady Duga: not sure it’s a terrible impossible UI problem - having users choose from a small list will allow them to determine if they’re entering wrong info - we haven’t had a big problem with this - more problematic is when publishers do not provide a full list of pages.

Charles LaPierre: you could scrub the data - use a slider to move around instead of typing in - knowing how far you are in is another important use case for pages.

Hadrien Gardeur: listing the type of affordances built on top of pagination is important - displaying the page number while reading is one and is easy enough as the value doesn’t matter - a progression is another but here page numbers get confusing if they are not numeric - another is jumping somewhere which is sometimes a dedicated affordance or sometimes in search and here strings are problematic too.
… augmenting existing content via annotations is a last one and string or no string is not a big issue - navigating content is the big issue when you don’t have controlled values.
… it helps to understand all these and what we need.

Wendy Reid: another case where the reality is we cannot enforce a scheme on everyone - we can make strong recommendations like using a consistent and understandable scheme to it - for example, use only numbers - very common to have roman numerals for front matter and numbers starting from 1 in body - so long as it is consistent it helps the user.
… having good UI can ameliorate the problem.

Charles LaPierre: ordering doesn’t matter so much as being able to get to a specific page - missing pages and blank pages are another issue we need to address.

Ben Schroeter: whatever the epub implementation is it has to accommodate being faithful to the print equivalent - it has to accommodate what is print otherwise we have to change what print does.

Brady Duga: I think ordering is important because the pagination is powerful and important in determining how big ranges are - can we get people to stop using roman numerals as they’re really annoying.

Hadrien Gardeur: given that we allow people to do messy things reading systems will always encounter epubs with missing info - by allowing what we allow will only make them useful for context - reading systems won’t be able to adopt this because the method isn’t consistent and there is too much existing content without.

Wendy Reid: the unknown quantity is there is a desire from users that pushes reading systems to provide this - if we can demonstrate there is a tangible value to providing consistent locations by showing the affordances it helps make the case for publishers to make this change.
… the task force meets every two weeks for those who are interested - meet at different times to accommodate different regions - I will send out a refresh about meetings after TPAC.
… we can continue to work on this after epub goes to CR.

3. Flow Control.

See github issue epub-specs#1378.

Dave Cramer: this issue originally came from Ivan that we have rendering properties for authors to express if content is paginated or scrolled.
… we have two types of scrolled - one continuous one where all documents scroll and scrolled-doc where each document scrolls but you transition between them.
… you can set these globally for all spine items and also override them per spine item.
… I believe this was clarified in the issue discussion but there was some appetite to have better images and descriptions of this in the spec.
… the larger point is how do these things interact with synthetic spreads and that gets complicated.

Matt Garrish: That sounds generally correct.
… The interactions between these fixed layout properties are confusing.
… We (probably) originally envisioned them as global.
… We aren’t even sure if this even work (eg scrolled).

Dave Cramer: do all the possible combinations of these properties make sense.

Hadrien Gardeur: it’s worth pointing out that scrolled continuous is not the most implemented feature - it is difficult to implement - it is a useful feature for certain types of publications but it’s questionable whether the author or the user should control - webtoons are an example of where it gets tricky on how to control as the author intent is arguably more important.

Shinya Takami (高見真也): in Japan we need scrolled continuous for webtoons and Apple reading system handles it - I don’t know any use cases for scrolled-doc.

Dave Cramer: I wonder if this is a case for writing a grid of test possibilities and see what happens - maybe this will reveal what makes sense and what doesn’t and what has actually been implemented.

Hadrien Gardeur: aside from webtoons is manga let user decide if they want to use spreads or paginated view or scrolled continuous - not something the content needs except for webtoons - also used as an easy way of reading manga - need to distinguish author and user preference and that’s where our issue is - if it is important it shouldn’t be overridden.

Wendy Reid: this falls into the same category of potential spec overreach as the fxl orientation properties - we had good intentions in providing the option for authors to specify landscape orientation but user choice comes above all else - whatever the user wants to do should is paramount.
… do we need all these options?.

Dave Cramer: I have seen interesting uses of odd combinations of the properties - some educational material uses synthetic spreads to put a graphical page on one side and a scrolling doc in the other side so they can read about the graphic while still being able to view it - not sure I want to lose possibilities.

Charles LaPierre: with orientation you can’t force displays as it affects the accessibility of the document - user needs to be able to override.

Dan Lazin: Dave wrote a nice summary of what epub is for the TAG - that reading systems have tended to give preference to users should also be in it - there is still a lot of disagreement about what epub is and should be so we should try to agree on what the fundamental difference is between epub, html, and pdf.

Theresa O’Connor: you said users come first in epub but that’s also the priority of constituents on the web.
… you’re saying that web page rendered in browsers implement their own UI while epubs rely on reading system chrome - browser UI provides similar control but not akin to applications.

Dave Cramer: when I author an epub I do not provide a way to change fonts or move page to page - completely dependent on reading system UI - when I make a web page I decide how it works.

Hadrien Gardeur: even when the user doesn’t override any setting the reading system injects a lot of script and styling - quite different from web in that regard.
… always in constant negotiation between what the publisher wants, what the reading system has to do, and what the user wants.

Dave Cramer: it doesn’t feel like we’re reaching a resolution on the issue - I think testing is key here to figuring out what we support.

Theresa O’Connor: another example of browsers doing invasive things by default on behalf of users is text auto-sizing on mobile browsers - very complicated and invasive and has lots of effects on page that the author doesn’t intend but the user wants.

Shinya Takami (高見真也): I agree user should be able to select rendering - for webtoons we have to put in this property otherwise the images are not connected smoothly - maybe we could use new metadata for webtoons or discuss another solution.

Charles LaPierre: for a scrolled doc couldn’t we mark it at-risk if we don’t have implementations?.

Dave Cramer: yes, that’s what we want to find out but we can’t make them at-risk in advance of knowing - also have to make sure we don’t break existing epubs.

Hadrien Gardeur: one last note to say that we’re in the middle of implementing this in readium - presentation API to combine these things together and figure out what gets precedence - it is pretty complex in fxl but user settings in general also are - adding any feature requires figuring out all the interaction and we haven’t always done that in the past.

Dave Cramer: if there is anything in the spec that doesn’t make sense please open issues.

4. Horizontal Review - PING.

Dave Cramer: we met with the privacy group about horizontal review.
… worked through a list of items they found in our spec.
… we broke that list up into discrete issues.

4.1. EPUB and the Browseable Web (issue epub-specs#1871)

See github issue epub-specs#1871.

Dave Cramer: if an ebook links out to the web, is there danger of passing user’s personal information?.
… we make assumptions as to what user agent does when a link is activated.

Brady Duga: presumably the website you go to would know where you are coming from and what book you are reading.
… it’s an inherent danger of using the web.

Dave Cramer: no different than a link from anywhere - not peculiar to EPUB.

Matt Garrish: these dangers are not just for ebooks.
… what can we really say about this other than making it clear? it’s beyond our control. how do we enforce it? don’t know where this fits in the spec.

Charles LaPierre: we don’t have control over what the user agent does.
… I think they want some sort of disclaimer that there is nothing in the spec that prevents passing personal info when you follow a link.

Brady Duga: scripting is harder for the user to understand, but link following is a user action and user’s should accept the inherent danger. not sure what we can do in the spec for either but they are separate issues.

Matt Garrish: browsers can spy on you too, so this is no different than any other privacy concerns for the web.
… don’t know how we can solve this in any meaningful way.

Wendy Reid: we should probably try to write a privacy statement for EPUB. It will likely not be normative. It would be a useful practice for us to outline these things. Reading systems do have privacy frameworks to adhere to..
… we can advise reading systems on privacy best practices, but I’m not sure we can or should make normative statements.

Theresa O’Connor: these are good things to include in a privacy consideration section - there is a cascade of different actors that can do hijinks if you do certain actions that may not be intended by author or user.

Charles LaPierre: audience for our documentation is mostly publishers; this might be useful information to them.

4.2. Information Exposure and Fingerprintability (issue epub-specs#1872)

See github issue epub-specs#1872.

Brady Duga: this section would be important for RS developers too, like what security models to follow under different circumstances.

Dave Cramer: other things they brought up - user agent strings, trying to figure out what RS was in use.
… are people giving accurate user agent strings?.

Wendy Reid: See IEEE EPUB Security review.

Wendy Reid: link to security review of EPUBs, behind a IEEE paywall.

Dave Cramer: we need to look into user agent strings and how much information is in them for PING.
… that will be homework..

4.3. Obfuscation (issue epub-specs#1873)

See github issue epub-specs#1873.

Dave Cramer: another thing PING brought up: obfuscation.

Matt Garrish: it would be nice if we didn’t have this, but now that we’re 10 years in, not sure what we can do about it.
… we’re kind of stuck with it.
… there is a SHOULD that should be a MUST.
… we can point out that it is trivial.

Brady Duga: we can’t remove it; we could recommend using WOFF.

Brady Duga: WOFF.

Theresa O’Connor: clarification on obfuscation?.

Dave Cramer: how to keep fonts from escaping the EPUB to protect font vendors.

Matt Garrish: See the obfuscation section in the spec.

Dan Lazin: keeping in mind that Adobe is a main exporter of custom ebooks, they have had a strong interest in both DRM and protecting their fonts.
… if you take an InDesign ebook and change the publication ID, the font will no longer work because it is ties to that hash.

Dan Lazin: See extra test commits on obfuscation.

4.4. Privacy and DRM (issue epub-specs#1874)

See github issue epub-specs#1874.

Dave Cramer: what are the privacy concerns with DRM and how can they be mitigated:.

Wendy Reid: a concern they raised was that the user can’t view source of an EPUB with DRM in the reading system like they can elsewhere on the web.
… we don’t have a good way to address this because it manifests because of business reasons.

Dave Cramer: also not sure why privacy mitigation is to view source of complex computer files.

Brady Duga: we can take the DRM stuff out of the spec but it will have no effect on the world. Many reading systems don’t use conventional DRM schemes.
… it’s more a political than a practical issue. We could move it from the spec or a note..

Dave Cramer: but obfuscation relies on encryption.xml.

Dan Lazin: echoing that obfuscation uses encryption.xml.

Deborah Kaplan: if we punt on the political issue, then we can’t talk about accessibility in DRM, which is a big problem.

Theresa O’Connor: primary purpose for spec is interoperability and primary audience is implementors. Not sure we are doing them any favors by removing this from the spec. Would rather acknowledge this.

Matt Garrish: encryption.xml is a W3C thing, so it would be weird to throw it out. we should let it slide..

Dave Cramer: there is an open source spec for EPUB designed for interoperability with DRM - DRM is important for library lending of eBooks too. I think I’m supportive of not backing away from our modest provisions to acknowledge how this technology is used in the real world..

Dan Lazin: is encryption.xml used in libraries?.

Brady Duga: yes.

Dan Lazin: so EPUB provides one way to provide DRM, but reading systems use their own anyway?.
… there are other ways to talk about encryption. Should we explore alternate ways to talk about obfuscation and DRM?.
… other purpose of the spec is to write down how stuff works..

Matt Garrish: do authors ever author in encryption.xml?.

Dave Cramer: it was always intended that reading systems would implement the DRM.

Theresa O’Connor: the document has authoring requirements that the tools that generate it need to follow.

Wendy Reid: we need to do some more investigation about this. Per our charter, DRM is supposed to be out of scope, but we also need to respond to horizontal review..

Dan Lazin: what section of the spec are we talking about? DRM is out of scope because it is not cross-compatible..

Dave Cramer: See relevant section of the spec.

Dave Cramer: intention was that you could buy an EPUB from one retailer and use it in another - but that never materialized.

Brady Duga: you can from GooglePlay Books even though it is DRM protected using Adobe.

Wendy Reid: Adobe DRM - you can download and sideload and share limited times but it authenticates everything (and breaks often).

5. Outcomes of the EPUB.inext meeting.

Wendy Reid: We had the epub next meeting on Wednesday. Discussed what happens next (ie after epub rec).
… There is some interesting discussion on the email thread for the minutes of that meeting.

Ben Schroeter: This is riffing of a Twitter discussion: if we could figured out the backwards compatibility problem it might open the door for imaginative improvements.
… So we would look at what it would mean to support the current ecosystem while also looking at a new format.

Wendy Reid: This is based on a personal statement, but if we don’t rethink backwards compatibility (very strict - any published book is still valid epub 3.3) it will be hard to move forward.
… Does that mean a file created today must work on every legacy system? Or does it mean we work in the same tech space (webby), but advance beyond where we are.

Dave Cramer: We have been unable to make some changes even though they are backward compatibility.
… For instance the html serialization wouldn’t break existing content.
… but we can’t do for other ecosystem reasons.
… We are really constrained because we can’t even make epubcheck emit warnings.

Deborah Kaplan: We don’t expect anyone else to follow a similar backwards compatibility constraint (eg no one else in the W3C does this).
… The web just doesn’t do this.

Shinya Takami (高見真也): The most important thing the Japanese community needs is sustainability and interoperability of epub.
… A lot of the epubs made there are hand generated and were very costly to produce.

Theresa O’Connor: When we talk about compatibility in the web, we are talking about the corpus of the web.
… but interop and compatibility are different.
… The constraint of any new book being openable in older readers may tie your hands.
… It make sense to align with the way the web works (old content can still be opened, but new content needs new readers).

Deborah Kaplan: hober++, existing pubs should continue to work in new readers, new books shouldn’t necessary work in old readers..

Theresa O’Connor: But also understand that books are a little different than the web (I bought a book, but not a web page).

Wendy Reid: +100 hober.

Dan Lazin: We don’t have 100% fidelity today, so we can’t guarantee it for the past.
… Is there a way to have a graceful degradation mode?.
… We don’t really need 100% fidelity, we could get by with a poorer display (eg text is still available).
… Q: How do living specs work? How does the web survive with ever changing specs and we can’t?.

Theresa O’Connor: Web pages aren’t modal. So we don’t say “this is xhtml 1.0, and you must render it that way or not at all”.
… everything just goes through the same path (roughly).
… The web does it with strong guards to not make breaking changes in the living specs.
… HTMLparser was a reverse engineering effort to standardize html parsing across browsers.
… For years isindex was in that, even though it was terrible.
… But we were finally able to remove it because it was barely used on the internet.
… Basically we manage living starts with a strong social contract not to break things.
… But that means there are ways we can’t change the internet in certain ways.
… But modes are bad. Whenever there is a mode switch, we introduced lots of problems and had regrets.

Charles LaPierre: Can’t we just start over?.
… Take what we have learned and make an epub-like system that is in line with the web.
… It would be completely new and different.

Matt Garrish: Maybe one thing we got right with pub manifest was the migration path.
… You could go between the two.
… We probably need to be able to improve in what we have. So long as there is a migration path we are good.
… At least it provides a continuation.

Wendy Reid: The publishing WG came at a bad time, but the path to something new has been laid.
… The book I bought 5 years ok should still work forever.
… So maybe part of the book fails (eg package) but the content still works because it is just html.
… Do we want epub to be an interchange format or a display format?.
… epub 2 and 3 got caught in both.
… Now we are time-boxed with the intention of being archivable.

myles: Tess described a situation where an old feature was removed.
… This wasn’t just mechanical, it was a big deal.
… The usage being low isn’t sufficient to remove (though probably necessary).
… The web is very conservative about removal.
… second, the example was about old content in new browsers.
… But the inverse is also required (new content in old browsers).
… Ruby is a good example. Ruby text is still visible in an old browser that doesn’t support it.
… We even added the ability for an old browser to style specially (eg with parens), that wouldn’t happen in the new browser.
… So we care about both directions.

Brady Duga: A lot has been said. First off, the idea of a new break and starting over.
… feels a lot like modes.
… but like Tess said, bad idea.
… I’m worried about trying to do that.
… if we do need a break, we need to create something where we don’t have this problem again.
… the features we have used, we are stuck with.
… books live a long time.
… maybe we want to move away from archiving, and just be for display, maybe?.
… I’m reluctant to move away from the things we used..
… Does removing a feature mean that we no longer sell a book?.
… are we going to have this same problem with a new spec.
… 10 years into that.
… I’m not saying it’s a bad idea, but are we opening a horrible modes can of worms.

Matt Garrish: If we make a break and start over, we get away from having to do it all ourselves.
… If we work with what works on the web as much as we can, then we don’t get stuck with bad old features.
… When we look at what comes next, the biggest issue is the xml serialization of the package document.
… What is the long term impact of staying in the html world?.

Dave Cramer: https://twitter.com/dauwhe/status/1149681148957745152.

Dave Cramer: Nothing technical is stopping us from putting books on the web.
… There is plenty of room for experimenting directly on the web.

Wendy Reid: We are over time! But thanks all, wish we had more time to discuss it.
… See you all in eleven hours!.