EPUB3 + PING Joint Meeting — Minutes

Date: 2021-10-26

See also the Agenda and the IRC Log


Present: Ivan Herman, Deborah Kaplan, Yu-Wei Chang (Yanni), Matthew Chan, Shinya Takami (高見真也), Ben Schroeter, Rick Johnson, Avneesh Singh, Brady Duga, George Kerscher, Wendy Reid, Dave Cramer, Tzviya Siegman, Nick Doty, Matt Garrish, John Roque, Theresa O’Connor, Victoria Lee, Bill Kasdorf, Brian May, Julien Rossi, Lubna Dajani, Eric Mwobobia, Joshua Ssengonzi, Charles LaPierre, Katie Haritos-Shea

Regrets: George Kerscher

Guests: Aram Zucher-Scharf, Matt Wilson, Victor Lopes


Scribe(s): Matthew Chan, Brady Duga


George Kerscher: I cannot join zoom today, sorry, conflict.
I wanted to mention that many teachers and parents want to track the progress of their student/child as they read: what is the reading progress? Just want to make sure this use case is in the mix.

1. PING horizontal review.

Wendy Reid: welcome everyone to epub wg / PING joint meeting.
… this meeting is all about the horizontal review, performed by Nick.
… would you kindly give us the highlights now?

Nick Doty: I appreciate the active discussion we’ve been having.
… the challenge is that epub is a large and complicated technology that takes advantage of other web technology, but not always in an obvious way.
… that’s why the review is long, and the discussion is valuable.
… the categories are self-contained packages, interactivity, and DRM.
… the self-contained packages category is about the way that epub could have even stronger privacy than is typical on the web.
… like reading a book, where you don’t expect surveillance.
… interactivity is about whether users know who they are interacting with.
… i.e. guarantees of authenticity, distinction between private highlights in your book vs when this goes back to the publisher.
… third, DRM. The DRM situation seems a bit extreme here.
… i’ve tried to purchase and analyze an ebook myself, but it seems like users cannot inspect the ebooks they interact with.
… DRM provider may be surveilling users.
… other thing is an obfuscation section, not sure if this provides much business value, but does prevent users from inspecting their content.
… re. fingerprintability, our concern is that this may reveal to publisher or retailer identity via configuration data.
… also concern whether user details could get leaked out into the web at large, e.g. via a link in the ebook.
… these are the 5 categories that are detailed in my report.

Wendy Reid: i think it might be worthwhile to go thru each of these one by one.
… some might require more discussion than others.
… i’m going to start with the last one, epub and browsable web.

1.1. epub and browsable web

See github issue #1871, #1872.

Wendy Reid: right now there is not really a relationship between epub and browseable web.
… you can’t link from web into a specific part of an epub.
… we are working on CFI but it has never gone into practice.
… you usually cannot get to an epub from the open web without an intermediary.

Dave Cramer: I thought npdoty comment was more about what happens when you go from epub into the open web via link.

Brady Duga: i think it goes beyond that.
… also the ability to put scripts into epub, e.g. that tracks progress of reader thru the book that reports this back to the publisher.
… that would bypass the privacy policy of the RS.
… and it would not be clear to the user that this is happening.
… we try to get around that by making interactive content a special separate thing, but there is worry that this can be worked around via script.
… other case is external resources, that when loaded could flag for publisher that progress has been made.
… at Google we do things to make this safer, but what we do isn’t part of the spec.
… also not sure how we would spec this.

Nick Doty: there are advantages to privacy and interop if you say that “content is proxied, scripts are disabled” etc.

Samuel Weiler: npdoty pointed to interop reasons why you would put that into spec, but there are other advantages to documenting those things as well.

Wendy Reid: this is not something we’ve done before, but telling authors and RS what could possibly be done could give ideas for what to do (or NOT to do).

Nick Doty: yeah, I think the starting point is to document the threat model and how it applies to ebooks.
Currently only privacy of the content author is considered, but useful to consider the privacy of the reader, and privacy from whom.

Charles LaPierre: From George Kerscher: Make sure the youth case of a teacher or parent wanting to track their student’s progress through a title as they read is treated as an important use case here. (Both George and I are in another meeting, sorry we couldn’t attend this call.)

Ivan Herman: as one of the co-editors of some of these spec docs, it would be helpful to separate what needs to go into the privacy section from those issues that require a change in the normative spec.
… keeping in mind that anything that goes into the normative spec must be testable.

Samuel Weiler: agree, and most of those mitigations should be part of the normative text.
… in general i expect those things to be part of the normative spec.

Nick Doty: right, there is a separate reading system conformance specification, yes?

Ivan Herman: not sure how that would be done. There are very few normative statements about what RS should do, and how.

Wendy Reid: let’s continue on with the recommendations, and we’ll sort out which is normative vs informative later.
… so information exposure and fingerprintability is next.
… this is the section where the use-case of a teacher or parent wanting to monitor progress of student comes up.
… epub is used by a number of parts of publishing, from general trade to education and academics.
… here the trade use case will be different from education sphere.
… for educational purposes, use cases may be more invasive than what we’d do at Kobo for example.

Rick Johnson: you’ve got domains of understanding that need to be accomplished. Publisher may have good reason to know what is happening with book, without what is happening to individual users.
… institution may want to know what is happening for a book in a course, but not outside of that.

Nick Doty: i was a student and students have privacy interests as well.
… i didn’t expect my teachers to have proof of what reading I had done.
… there might be a reason that learners want to share info about their reading habits, but they need transparency and control.
… which would be consistent with both privacy and those use cases.

Rick Johnson: right, we need to understand the use cases. There are some competency-based cases where the student gets their credential based on how they interact with the content, and that needs to be measured.

Tzviya Siegman: i agree with both positions, and there is a difference between grad school and first grade. Part of the way a kindergartener’s ability to read is assessed is reading pace.
… a tool might be able to do this better than a person, in a less biased way.
… trading off against the need for privacy.

Aram Zucher-Scharf: it seems likely that there will be bias in such tools, but more to the point, we’ve seen student-led movements against this sort of monitoring.

Nick Doty: I’d be happy to help make connections to colleagues working on student privacy. I don’t think purchase/ownership is the most important distinction.

Aram Zucher-Scharf: they’re interested in not being tracked, whether the reading materials are being provided by the institution or not.
… and recognizing that there may be edge cases here.
… but students have actively resisted, especially in the college setting.

Deborah Kaplan: i think it’s a mistake to get too bogged down in particular use cases.
… there are obviously clear use cases (not collegiate) where tracking is useful, and a user could conceivably consent.
… so #1, should that be part of epub?
… because a RS can always do whatever it wants outside of the framework of the spec.

Aram Zucher-Scharf: See A good summary of resistance to educational surveillance.

Deborah Kaplan: can we require that if you do tracking it is disclosed, and can we make it so that it must be blockable?

Wendy Reid: I think it’s worth looking to make sure that we’re not encouraging this, but a lot of this falls onto the RS.

Nick Doty: I’m not aware of having consented to surveillance of my reading habits when I’ve purchased an ebook, for what it’s worth :).

1.2. DRM and obfuscation

See github issue #1873, #1874.

Wendy Reid: next, DRM and obfuscation.
… is obfuscation only for fonts? No. It can also be applied to images, I believe. But most common use-case is for fonts.
… the obfuscation is done primarily by tools.

Dave Cramer: not aware of real world use of obfuscation aside from fonts.
… adobe was heavily involved in original epub. They were concerned about fonts being easily accessed from the epub package.

Nick Doty: the obfuscation can be undone easily though.

Dave Cramer: some font vendors have told me that even these very ineffective means are good because then they can say that if you work around them then you violated DMCA etc..
… the legal case is clearer.
… this is anecdotal only. As a publisher we’ve not obfuscated fonts.
… but i’m open to discussion of whether we need this or not.

Rick Johnson: not a single title in our 2 million title inventory has font obfuscation included.

Wendy Reid: obfuscation tends to break things in RS.

Nick Doty: I’m not sure the goal of making it easier to sue the user is important.

Wendy Reid: DRM is tricky because the spec does not specify the DRM to be applied.
… there are a number out there, with the most popular being Adobe DRM.
… though implementation is also platform specific.
… in our charter we’ve said that we are not going to talk about DRM, which also complicates things.

Brady Duga: i think you can only obfuscate fonts.
… boils down to whether or not the publisher’s license of the font allows it to be used in ebook or not.
… it appears to be okay to font vendors as long as font can be embedded.
… removing this may limit fonts that publishers can use.
… but this was also written before WOFF was a thing.
… so i’m open to recommending WOFF, but leaving it open to RS to do this (esp. for backwards compat).

Nick Doty: deprecating the feature would be helpful in terms of the risk of expanding user-agent-implemented obfuscation to other w3c work.

Rick Johnson: my opinion is that we shouldn’t address this in spec.

Matt Garrish: we’ve never wanted to go into DRM implementation in spec.

Samuel Weiler: but you have the hooks for it in the spec, even if you aren’t fully specifying the DRM.
… so we care about this.

Nick Doty: and the hooks make it so that the full contents of files are encrypted, rather than some smaller subset.

Tzviya Siegman: epub exists in an ecosystem that has been around for a long time. If we took out those hooks we would be ceding our standard to a world that would not accept the lack of them.

Wendy Reid: we wouldn’t have the support of most of the publishing industry, RS would be happy, and retailers would also be impacted.

Nick Doty: I think our goal in doing a privacy review is to note the current privacy situation and the potential harms.
But we could also consider harm reduction or mitigation, as W3C has done in the past about DRM.

Dave Cramer: I might disagree. If we took this out of the spec, would this change anything in practice?
… as far as i know every epub created is done without DRM, because DRM is the responsibility of retailer rather than publisher.
… so removing from spec would not change this.
… and the DRM that exists and is being used may or may not rely on the hooks in the spec.

Matt Garrish: i agree with dauwhe. It doesn’t break anything if we take this out.
… it might just make things more difficult for implementors.
… we’d need more information certainly, but not sure that anything breaks by taking this out.

Wendy Reid: okay, good points everyone. This is something we need to assess as part of our privacy threat model.

1.3. interactivity

See github issue #1875.

Wendy Reid: next is interactivity.
… you had a question about RS and full screen reading.
… full screen reading tends to be an option.
… user-generated text tends to be in annotations, or specific text fields put in by the author.
… but epub doesn’t really do local storage.
… and then there is other interactivity that we’ve touched upon, e.g. some RS lock scripting.

Rick Johnson: EPUB does not do local storage, however, reading systems do (we do).

Ivan Herman: re. the annotation system, that is 100% outside purview of the spec.
… implementation of that is down to the RS.
… it would be nice to have a standard annotation system, but today that doesn’t exist.
… we can and should put something into RS spec about annotation related privacy concerns.
… but the normative standards should probably not change, since we don’t talk about this.

Matt Garrish: there is a spec for annotations, but not sure that it’s widely implemented.
… but even that was only about interchange, not about what happens within each RS.

Nick Doty: i think the point of interactivity is that it might not always be obvious who you’re interacting with.
… i’m assuming that when i highlight something in my book, that i’m interacting with the RS, but could I also be interacting with the publisher?
… for forms inside books, does that get shared or not?
… the web has historically not been very transparent about this, but there’s an opportunity here to be better.

Dave Cramer: on the web we have the idea of origin and first party. When you choose to go to a website, you’ve given the first party de facto permission to collect some info.
… with epub there is a separation between the content creator and the UA.
… if I buy from Kobo, I interact with Kobo but the content is from publisher.
… as far as I know all the data goes back to the retailer, but the publisher knows nothing.
… so in terms of user expectations, a user might have a baseline expectation of greater privacy in ebook than in open web, but not sure that is really the case now.

Nick Doty: but there is an expectation of interoperability, that a book from one retailer can be readable in multiple different UAs.

Tzviya Siegman: +1 to goal of interop.

Nick Doty: there should be some privacy properties that user can rely upon, no matter where they take the book.

Rick Johnson: we used to always need to explain that epub is 2 things, a file format, but also a thing in the supply chain used to ferry around data.

Wendy Reid: remote resources is something we’ve talked a lot about in the wg, but perhaps we won’t get into that now for time.

2. next steps.

Dave Cramer: thank you npdoty for your thorough and comprehensive review. You’ve given us a lot to digest here, and I’m looking forward to doing that.

Wendy Reid: yes thank you, this meeting was very productive.

Nick Doty: what do we want to do next? and I think we need to explicitly consider where this diverges from ethical web principles or design principles and what we might do to converge in the future.

Wendy Reid: Most important next step is for the epub WG to go through all the comments.
… A lot of the topics are things we can comment on directly, but others are more ecosystem. Perhaps we can give guidance on those.
… Might want to build a threat model and get feedback on that from PING.

Tzviya Siegman: I have never written a privacy threat model - do you have examples?

Nick Doty: This is kind of a unique case.
… we can give some examples, but not sure how helpful they will be.

Nick Doty: See the target privacy threat model for the web; it is in progress.

Wendy Reid: Looking forward to seeing the web ones, I think they will be useful.
… Next steps are to continue in the epub group, then share when we have something for review.
… Thanks everyone!