EPUB 3 Working Group Telco — Minutes

Date: 2022-01-28

See also the Agenda and the IRC Log

Attendees

Present: Masakazu Kitahara, Dave Cramer, Tzviya Siegman, Toshiaki Koike, Matthew Chan, Ken Jones, Charles LaPierre, Brady Duga, Ben Schroeter, Zheng Xu (徐征), Ivan Herman, Bill Kasdorf, Murata Makoto, Dan Lazin, Aimee Ubbink, Jen Goulden

Regrets: Wendy Reid, Matt Garrish

Guests: Nick Doty

Chair: Dave Cramer

Scribe(s): Matthew Chan

Content:


1. Security and Privacy PR review.

Dave Cramer: the plan today is to go over the new security and privacy sections currently in a PR. Original text written by wendy, put into a PR by mgarrish.
… opening the floor to comments.

1.1. Content document PR.

See github pull request epub-specs#1972.

Dave Cramer: See Preview of the content document of the PR.

Dave Cramer: link to preview here.
… starts with an overview of what is unique about epub format, single file container, but using core web content.
… we inherent security and privacy issues from these core web content.

Nick Doty: (apologies, I got confused by that distinction and only read over the Reading System spec section last night).

Dave Cramer: based on feedback from PING and TAG we wrote up a threat model, list primary ways that users could be at risk.
… e.g. remote resources, scripting, DRM, etc..
… npd is this closer to what you were expecting to see?.

Nick Doty: in short yes.
… list of threats is important.
… epub has a slightly different threat model then browser, so this is a good start.
… can we say what we want the threat model to be? What assurances do we want to provide to epub users? What is the privacy experience supposed to be like? Is it just like the web, or something different?.

Dave Cramer: right, what are user expectations here?.
… might be a difference between what we think is happening and what non-insider users are thinking is happening.

Nick Doty: lessons from web model may or may not be transferrable here, different parties involved.

Tzviya Siegman: in educational environments sometimes reading pace is tracked.
… users may not think about this.
… can we bring this to light?.

Dave Cramer: there are times when RS keep close track of users for legitimate reasons - education being obvious case here, books with quizzes, etc..

Dan Lazin: in the recommendations section, maybe we can say “users’ default expectation is that ebooks behave like print”. We recognize legit reasons to deviate from this, but RS should declare to users when their behaviour intrudes on privacy more than tradition library book experience..
… this wouldn’t be normative, just recommendation.
… frame it like ‘we don’t have authority over this, but we are recognizing that ebooks come from a physical format with very specific user expectations attached’.

Nick Doty: re. education use case, i would caution against the assumption that information collection for education is always beneficial.
… cases where students are being coerced about type of book or technology they are made to use.
… students are a vulnerable population.
… so useful to describe those uses cases, but not to make it seem like privacy is not important there.

Brady Duga: i think education is a bit of red herring. We track everyone (so do libraries, who know who has their books)..
… Google knows which books you own, which you read recently, etc. This is so that we can provide a reasonable UI. For ordering books in bookshelf, opening to last read, bookmarks, etc..
… everyone is being tracked, and it is necessary. Most people rely on these features..

Dan Lazin: i think the difference there is that we have an ability to decide between local tracking and network tracking.
… there is a theoretical implementation where local tracking and network tracking user experience is less distinguishable.

Dave Cramer: i know a common use case is users accessing their books on multiple devices, not sure how local tracking only would work with that use case.

Nick Doty: this is part of why it’s useful to describe the privacy issues of the different actors. my browser knows what websites I’ve visited, but epubs have this additional step of, not just the reading software, but maybe also the bookstore.

Dan Lazin: yes. But there is a point at which we are making a decision between what we think user wants, and what they are willing to give away for it.
… we treat use data very carefully, and have decided to implement RS in this way. But we can’t assume that other RS implementors take user data as seriously.

Brady Duga: today users buy their books from their RS provider. So we need to track at least user purchases, which has to go off device. No good alternative.
… perhaps we can clarify when certain things happen off device. e.g. sideloading.
… we let users open epubs that we didn’t buy from us, and when we do, we make it clear that you are uploading your epub to us from your device.

Nick Doty: “sideloading” is such an interesting term!.

Nick Doty: I think maybe it’s a synonym for what we used to call “opening” or “loading” or “reading”.

Dave Cramer: most of these interactions using web language are first party (e.g. it is the Google Play way of doing things).
… in the general web model, this means that I’m trusting Google with certain things.
… but we’ve come up with a couple good ideas for adding to the section.

Nick Doty: to add to that, i’ve heard from this group that there is interest in making epubs the sort of thing where you could buy epubs from one source, and open in an unaffiliated RS.
… is there any movement towards this? Do we work towards this in this group, or would we do it somewhere else?.

Dave Cramer: many of us had hoped that this market would evolve differently than it did.

Nick Doty: maybe that’s too general a question for spec work, but it’s going to be pretty relevant for these privacy and interoperability questions.

Dave Cramer: also we live in a universe where common RSes are made by Google, Apple, Amazon, so there are economic and political challenges beyond the reach of a W3C wg.

Ivan Herman: let’s put DRM aside for a moment.
… we are in a very reasonable state in this respect.
… non-DRMed epubs can be read in many different RS, without any technical tricks.
… and with some tricks I can even sideload into, e.g., Kobo or others.
… epub gives you this interop. Even if it’s not ideal in terms of interoperability of reading systems,.
… this wg is trying to reduce interops issues, that’s why we’re doing testing..
… DRM makes life difficult in this respect. Most DRM is platform specific. But there are non-platform specific DRMs..
… but DRM is outside of our purview.

Bill Kasdorf: VST and Redshelf receive content from all higher ed to be made available to students.
… Bookshare does the same thing, but not in retail setting.
… this interop is enabled by epub.

Brady Duga: until you can open your epub in your browser, epub will likely be used mainly for books. This has limited non-book use cases for epub.
… we allow users to download the epubs that they buy from us, but they would only do this if their epub isn’t DRMed.

Ivan Herman: i have quite a lot of books on my Mac that are not DRMed.
… re. the browser thing, there are browser extensions that are trying to enable this functionality.
… Zheng is working on one such browser extension/application.
… epub interop is very high in our priorities.

Nick Doty: appreciate that, thank you.
… the implication of that is that the privacy analysis should consider both the current state (i.e. vendor driven ecosystem), and the goal of user-side interop (notwithstanding DRM).

Dave Cramer: we have some recommendations in the PR.
… and some practical steps (e.g. avoiding 3rd party resources, using https, ensuring the use of stable links).
… do we need to go further in discouraging remote resources unless they are strictly necessary?.

Tzviya Siegman: strictly necessary can mean different things to different people, so I hesitate to try to define it.
… e.g. in education space, what is necessary could differ based on whether epub is for 3rd grade or university student.
… no proposed language from me just now, I can think about it.

Nick Doty: are the recommendations normative?.

Dave Cramer: no, the whole section is non-normative.

Nick Doty: so are you telling authors not to do something? Or are you saying they probably will, but you would rather they didn’t?.

Dave Cramer: it’s hard for us to say ‘don’t use features that we’ve had in the standard for 20 years or your epub will be invalid’.
… especially given that users have provided consent for some of this.

Nick Doty: we’ve had, I would say, mixed results on putting considerations or requirements for content authors in w3c specifications.

Ivan Herman: most of the specs that i’ve worked on in the past few years have had non-normative privacy and security sections.
… drawing user attention to things they may not have thought about.
… spec itself may have features that are normative (i.e. origin, etc.), but this section is non-normative.
… use of term ‘recommendation’ may have been a mistake since it carries specific meaning in w3c process.
… leave it to the group whether to change use of this term.
… but yes, absolutely valid question.

1.2. next steps….

Ivan Herman: i am co-editor here, but i didn’t write this. I’m anxious about having this discussion leading to specific action in editing this document. Anyone here willing to try to take this discussion and make it into changes to the PR?.

Dave Cramer: maybe we can discuss this in the chairs call?.
… we need volunteer(s) to turn this discussion into changes into the text.

Ivan Herman: there are already a number of comments in the PR which I made a couple weeks ago.
… it would be good if those were discussed and adjusted/rejected, and then merged.
… the PR is getting big.
… too messy to track everything otherwise.
… and I don’t want to edit my own suggestions, but either mgarrish or wendy should look at them before we go to the next round.

1.3. Reading System document PR.

Dave Cramer: See Preview of the reading system document of the PR.

See github pull request epub-specs#1972.

Dave Cramer: these are proposed changes to RS spec.
… again starting with an overview, a threat model (scripting, remote resources, user generated content, etc.).
… and some related recommendations.
… any thoughts on the RS side?.

Zheng Xu (徐征): what was different between epub and android app? Is there a difference in perspective?.
… in terms of the content itself.
… not the RS. In terms of the security and privacy considerations of the content itself.
… epub is a package of XML, but android app can access web too, etc..

Brady Duga: significant difference is that epub involves 3rd party content.
… android app dev writing a game controls privacy aspects of the game.

Nick Doty: it’s app stores within app stores.

Brady Duga: but with epub, Google has an app and a store, but the content comes from someone else.
… so what privacy and security issues exist around me displaying this epub in my app.
… given that I don’t know exactly what the 3rd party content will do.

Ivan Herman: i think, beyond what Brady said, the big difference is the potential for author putting scripts into the content.
… if it was only HTML and CSS, then the security problems from RS point of view will be smaller.
… in the current PR, I asked whether we could put some guideline to the author to ‘please use scripts only where strictly necessary’.
… not sure how we would express this, but essentially that would be the message - keep it simple.
… if you make a mistake in your script, it will stay in the file (and out in the world) for years.

Brady Duga: if you think scripting is your only remote code exploit, you will have a bad time. Many other things that can happen. But not using scripts is a start..

Nick Doty: if this section isn’t normative, why isn’t it?.
… e.g. ‘RS that allow local storage should allow users to view data’ could be normative.

Ivan Herman: the problem is that the current RS spec is exclusively related to how to render content of epub file.
… we say nothing about what RS user interface should behave.
… like how HTML spec says nothing about how browser UIs work.
… if we open that floodgate, there are other things that we would need to specify.
… e.g. we have in the spec a separate notion of navigation file (a table of contents).
… we say the format of the toc, and where RS can find it, but we say nothing about how RS should do with that data, or how to display it to user.

Dave Cramer: just because we make a few normative requirements on important issues, we don’t have to decide everything.
… e.g. for the nav file, we say that if RS displays nav file outside spine, then it can’t show numbers by default on OL element.

Tzviya Siegman: some of the recommendations are easier to make normative than others.
… e.g. ‘provide user indication that network activity is occuring’ is more easily testable, could be normative.
… others are more vague.

Dave Cramer: it’s worth seeing what could be done.
… seeing what RS would be likely be willing to implement.

Brady Duga: in terms of easily testable, if you do something that would access the network but nothing comes up, does that mean that you failed the test? Or that your connection is bad?.
… so it is a little tricky to test some of these things.
… but i’m not against making some of these normative.
… but do we make them normative here, or normative somewhere else in spec and just reference them here.

Ivan Herman: editorially it makes more sense to move the normative ones somewhere else (into a normative section).

Dave Cramer: we’ve uncovered a lot of things to think about, and a lot to do.
… thanks npd for joining, we appreciate your contributions.

Nick Doty: i had an issue with the user agent string?.
… why was there an EPUBReadingSystem object? History of privacy issues related to this.

Brady Duga: the idea was that you could figure out which RS you were on, and behave accordingly.

Dan Lazin: but why do we have this if navigator.useragent already exists.
… and it might be because RS are built on top of other tech, like Chrome.
… so navigator.useragent returns Chrome, but EPUBReadingSystem returns the RS.

Nick Doty: not sure it’s necessary to have both of those, but we can take that offline.
… thanks for your explanations.

Ivan Herman: there are already tests in these sections, so you can find out what the RS returns if the RS implements it.
… okay thanks everyone, this meeting is closed!.