EPUB 3 Working Group Telco — Minutes

Date: 2022-01-14

See also the Agenda and the IRC Log

Attendees

Present: Ivan Herman, Dave Cramer, Avneesh Singh, Masakazu Kitahara, George Kerscher, Jen Goulden, Tzviya Siegman, Rick Johnson, laura, Wendy Reid, Matthew Chan, Aimee Ubbink, GeorgeK, Laura Brady, Charles LaPierre, Brady Duga, Gregorio Pellegrino, Ben Schroeter, Bill Kasdorf, Victoria Lee, Dan Lazin, Hadrien Gardeur, Zheng Xu (徐征)

Regrets: Toshiaki Koike

Guests:

Chair: Dave Cramer

Scribe(s): Matthew Chan

Content:


Dave Cramer: today we have 3 issues which were all filed by TAG, but all on the same theme of privacy and security.

1. TAG’s Privacy and Security comments.

See github issue epub-specs#1957.

See github issue epub-specs#1959.

Dave Cramer: that was the main concern of PING of TAG, that we have not said much about the threat model involved in epub, about handling PII.
… our goal is to address these.
… this first issue is about PII, could we do something to discourage collection of PII, can we recommend RS are clear with customers about what is being collected?.
… wendyreid wrote a summary of the issues for us, and some of our possible responses.

Dave Cramer: See First draft by Wendy.

Dave Cramer: the industry has seemed to settle on policy of user agreements, but the general public is probably not aware of how much information in being collected in this process.

Ivan Herman: just to make it clear from the w3c point of view is that we need to provide a section that documents problems and guidelines for what authors and RS should do to address the problems.
… there is no requirement to change the spec.

Avneesh Singh: +1 Ivan.

Wendy Reid: i can give an overview.
… the way I have written this is that because we are talking about privacy and security, there are two parts to each of content authors and rs.
… for both security and privacy, i wanted to lay out our objectives.
… preserve confidentiality, content integrity, transparency.
… threat modelling for content: falsification of creator information, remote resources, etc..
… recommendations (mostly around privacy aspect): protect users from threats, avoid collection of data, “content processors” should be careful.
… for RS threats and recommendations are also set out, with threats like rs spoofing.

Dan Lazin: from disclosure point of view, Apple now requires that apps have a user data collection “nutrition label” in the app store.
… not all rs are apps (i.e. some are stand alone devices), but this means that there are already some higher level requirements in many cases.
… but also, the most common RSes come pre-installed on device, not via app store.

Wendy Reid: other considerations like CCPA also come into play here, but these recommendations apply to.

Dan Lazin: we could model our response based on apple’s list of things they do with user data.

Rick Johnson: the more specific we get with the current state of things, the more revs of this we will have to do.
… so we should err on the side of common sense, and maybe reference other places that deal with this sort of thing, but not get too specific.

Brady Duga: we should focus on epub, even though rs do lots of things that don’t specifically have to do with epub format.
… odd for an epub format spec to try to tell rs what to do with other formats, or as a UA generally.

Wendy Reid: See PING Target Privacy Threat Model.

Brady Duga: also, the rs privacy policy doesn’t apply to the publisher - e.g. if a publisher includes a tracking pixel, rs can’t control that.

Dave Cramer: the industry has gotten in trouble before - e.g. ADE sending unencrypted user information back to Adobe.
… i looked up the policies of a few major epub retailers.
… e.g. Apple says they anonymize everything, but Kobo doesn’t.
… so I’m a little less concerned with how other specs handle privacy, because there are specific user expectations about privacy when it comes to books.

Dave Cramer: See Privacy section of the HTML webstorage section.

Ivan Herman: we have 2 specs, content and rs. And wendyreid separated the threat model into these 2 parts..
… in the rs part, we already say things about origin and other security related policies, so we aren’t complete silent.
… indeed there are areas where spec is silent, but the general expectations that a user should have over privacy are probably in scope.

Wendy Reid: one thing that separates us from general web browser is that there are book related affordances that RS are expected to do for user, but some of these rely on collection of user data.
… but users don’t think this way, they just expect that these features are there.
… e.g. collection of data for annotations, could syncing.
… so we would be doing our due diligence by providing some guidance to implementers.
… the reality is that these recommendations aren’t normative anyway, but we are being good global citizens by doing so.

Rick Johnson: as a reminder, the epub marketplace is also the dominant format for education, an the customer is not the user. It’s the institution..
… so we need to be careful when we say things that affect that use case.

GeorgeK: there are also rs that track the individual student and how much time they spend reading, progress, etc. and report back.
… and many times teachers and parents can see that info.
… i wonder if one of our suggestions would be to have the privacy policy available in a rs, e.g. in the help section, or if there is anything in the content that is phoning home, then it would be the publisher who informs people about that.

Tzviya Siegman: we need to keep in mind that we tend think of epubs as separate from the web, but there are a lot of websites that do similar things. We’re not that different.
… but the “nutrition label” might solve the problem, by clarifying the user’s position without scaring them.
… better than a user agreement, where user knows that they just have to click to agree or else the app won’t work.

Ivan Herman: UX people have come up with a vocab that describes the a11y issues that might be present in a given book, can we do something similar?.
… but most of the privacy issues are on the rs side rather than the content side, so it may not be that helpful.
… since wendyreid has already started, I think this text should become part of the spec.
… so next step should be to open a PR to incorporate it.
… re. applications disclosing privacy features in general, maybe we should incorporate the labels that we want RS to provide.

Wendy Reid: I considered including specific examples of data collection behaviours, or how to communicate to users when these things happen.
… but I thought that being specific might lead people to think that the examples constitute a closed list, when the recommendations are more like principles.

Dan Lazin: do we know what the TAG wants? or can we ask?.
… most specs don’t touch this. Really we just want to satisfy them..
… can they give us an example of what other specs have done in response to similar concerns. Rather than producing something on our own and risking it not being what they want.

Ivan Herman: Examples for privacy section in the DID spec: https://www.w3.org/TR/did-core/#privacy-considerations and for security section: https://www.w3.org/TR/did-core/#security-considerations.

Wendy Reid: https://www.w3.org/TR/audiobooks/#security-privacy <– quite minimal.

Wendy Reid: https://www.w3.org/TR/pub-manifest/#security-privacy <– more informative.

Tzviya Siegman: Web Authentication has several sections on privacy https://www.w3.org/TR/webauthn-2/#sctn-privacy-considerations-authenticator, https://www.w3.org/TR/webauthn-2/#sctn-privacy-considerations-client, https://www.w3.org/TR/webauthn-2/#sctn-privacy-considerations-rp.

Tzviya Siegman: privacy is increasingly important now. We’re not just doing this to check off the privacy review box.

Ivan Herman: the DID spec recently addressed similar privacy concerns.
… i.e. what implementors should be aware of when they try to implement, what privacy pitfalls are they likely to encounter, etc..
… we also went through a similar process with audiobooks and pub manifest.
… i don’t quite agree that we’re just doing this to satisfy TAG.

Dave Cramer: I can reply in the issues with a link to the document that we already have?.

Ivan Herman: we’ll let them know once we have a PR.

Hadrien Gardeur: we have very very different rs, and for some of them, the fact that rs need to be distributed already applies some requirements (e.g. apps distributed via app store).
… similar thing will happen with Play store.
… but no analogy process on the web - maybe just a privacy section of the page.
… we could have best practices section about what to disclose to users, but not sure we can go much further than that.

Ivan Herman: i wonder whether there are things specific to security that we need to call out.
… we know most of rs have been quite averse to using scripts, some don’t allow it at all.
… mostly due to security concerns.
… so having a fairly good idea of why rs shouldn’t allow scripts might be helpful.
… maybe say that content authors should really consider whether they need to include scripts in their content.

Wendy Reid: i think the best we can do is identify some common threats that arise because of the way the spec is written, and the way content is likely to be written.
… in terms of recommendations, we could recommend virus checking as part of ingest, checking origin or links.
… security is tricky because we can make recommendations, but it will ultimately come down to the authors.

Dave Cramer: one of the big problems with security is that Hachette might write the script, but then Google executes it, knowing nothing about it.

Ivan Herman: you could say that a content creator on the web writes scripts, and then the browser has to execute it.
… but for ebooks, once you put something in content, those books won’t be automatically updated.
… so a malicious script could stay other there for a very long time, whereas on a website, the content can be updated.
… the fact that the book becomes its own entity is a difference between book and website - might be worth pointing out, as a reason not to include scripts.
… e.g. old versions of incorporated js libraries incorporated in ebooks.

GeorgeK: what a user is reading is certainly private. Governments knowing what people are reading is something we should point out.
… whether or not people are using assistive tech is also sensitive, we should call that out.

Rick Johnson: perhaps we don’t bifurcate by content and rs, but rather content creation and distribution channel.

Charles LaPierre: the other thing is that the content creator could have done everything right, and then someone in the middle injects malicious code into the epub and repacks it.
… right now we don’t have anything to do with signing ebooks, etc..
… so it really falls onto whoever is ingesting this at the end to make sure the content is safe.

Hadrien Gardeur: the top reason why rs don’t like js in content is that it can mess with what rs does.
… rs will most likely always inject js to get desired result, which js in content can mess up.

Brady Duga: we don’t do js because of security, and because it’s a pain. When a rs implements something in the webview they are limited in the resources they have available. But a browser operates at a higher level.
… in terms of serving content, when Hachette writes a book with a script, it’s different whether that script is run via rs or via browser. When run via rs, the origin is Google..
… very different security and privacy model.

Ivan Herman: about signatures on epubs, there could be. Would it be a good thing to say in spec that we suggest that publishers make sure of signatures?.
… Of course, all the intermediates that modify content during ingestion will be against that, but there are all sorts of technology out there for providing signatures.

Zheng Xu (徐征): i was talking in mini-app CG yesterday, and their zips have a signature.
… maybe we can incorporate something similar.

Hadrien Gardeur: we have signatures.xml in OPF, but how to use it was not defined.

Ivan Herman: agree that some infrastructure is already there, but probably too early to try to standardize anything.
… but we can recommend that industry start to look into using it, which we might try to standardize in a future revision of epub.

Charles LaPierre: +1 to looking into signing of the EPUB maybe a Community Group task to investigate?.

Hadrien Gardeur: i think its more likely to happen if a major player in the industry starts requiring signatures - it’s a chicken and egg situation.

Zheng Xu (徐征): for epub nobody really understands how to use signature.
… once use cases are clearer, then we can start to standardize.

Bill Kasdorf: should we be incubating signatures in the CG?.

Zheng Xu (徐征): Bill_Kasdorf_: yes I feel like so.

Dave Cramer: sounds like we have stuff to do.
… polish wendyreid’s text and get it in PR.
… notify TAG re. same.
… research re. use cases of signing epubs.
… (alternatives to what we already have with XML signatures?).
… and hopefully we can help wendyreid with PR.

Wendy Reid: i think mgarrish is working on the PR piece, but we can help if he needs us to.

Dave Cramer: okay, thanks everyone, we’ll see you next week!.