EPUB 3 Working Group Telco — Minutes

Date: 2022-06-03

See also the Agenda and the IRC Log

Attendees

Present: Toshiaki Koike, Dave Cramer, Masakazu Kitahara, Shinya Takami (高見真也), Avneesh Singh, Wendy Reid, Dan Lazin, Charles LaPierre, Brady Duga, Matthew Chan, Ivan Herman, Murata Makoto, Ben Schroeter, Tzviya Siegman, Jen Goulden, Zheng Xu (徐征)

Regrets:

Guests: Gertjan Franken

Chair: Dave Cramer

Scribe(s): Matthew Chan

Content:


1. Gertjan Franken presents “Reading Between the Lines: An Extensive Evaluation of the Security and Privacy Implications of EPUB Reading Systems”.

Ivan Herman: See The published paper.

Ivan Herman: See presentation slides.

Dave Cramer: this is a special meeting of the epub3 wg, we have a guest.
… in the last meeting we talked about the paper written by our guest about security and privacy implications for epub 3 rs, and how that relates to review we’ve had from TAG and security and privacy wg.
… would you like to introduce yourself?.

Gertjan Franken: I am a security researcher, we researched security properties and vulnerabilities of epub rs.
… I’m a big reader, and when I discovered that web tech is underlying, that is how this interest started.
… happy to advise on this.

Dave Cramer: i know from reading your paper that epub 3.2 was the standard, this group is working on the next revision of the standard.
… this is the first time that epub is going through the full format w3c rec track.
… the earlier revision was subject to a much less involved process.
… so the big change is that we have to have tests for conformance.
… the other thing is the concept of horizontal review by other w3c groups, e.g. for a11y, for i18n, and of particular interest, security and privacy wg.
… also TAG, which also has an interest in security and privacy issues.
… based on that we’ve made some major changes, especially to that security and privacy section.
… we’ve written a threat model, our recommendations on security and privacy are now closer to normative.

Ivan Herman: we’ve also tried to be more precise, we have explicit reference to the origin model – e.g. each epub must have its own origin.
… we also clarified what the root URL of an epub instance is, something which influences the way relative URLs are interpreted.
… any relative URL that is used in a content document can be checked by epubchecker to see whether it would go “out” of the content.
… so where you researched the possibility of epub going into the fs, this is what we are trying to address.

Dave Cramer: See definition about the root URL.

Ivan Herman: of course, there are many things that a epubchecker cannot catch.
… we also started a discussion about file: URLs, which currently should not be used.
… we are discussion whether we should strengthen this to must not be used.
… but per our charter, we are under a strong requirement to be backwards compatibility.
… i.e. any valid epub 3.2 must be a valid 3.3.
… this would be an exception to that, but for many of us security is a stronger priority than backwards compatibility.
… that is why we need to discuss this now.
… as far as we know, there are no real epubs in the wild that use file: URL.

Gertjan Franken: can files be local to the user fs?.

Ivan Herman: it is local in the sense that files can be in the package, but we try to stop epubs from going outside the package.

Dave Cramer: you can use seven layers of ../ to try to climb out of the epub into the fs, but our new definition prevents that from happening.

Gertjan Franken: what about symbolic links? Does it address that?.

Ivan Herman: do you mean putting a symbolic link in the package that points outside the package?.
… could that work provided that file: URLs are disallowed?.

Gertjan Franken: you might circumvent it by making a symbolic file that you put into the zip archive, where the symbolic file refers to ../ for example.
… but we didn’t find a lot of rs where this could be abused.

Ivan Herman: from our point of view, I agree that we should document that as part of the possible threats.
… i don’t think specification wise, in the definition of the epub content, we could do that. That’s a concept that is not part of epub..
… but I understand, and yes, we should document that point.

Wendy Reid: should we start the presentation part of the meeting?.

Ivan Herman: can you link us to your slides too?.

Gertjan Franken: okay, no problem, I will put it online afterwards.

Ivan Herman: great, you can just send me the link and I will include with the minutes.

Gertjan Franken: this is the presentation that I gave at a security conference, but the expertise in epub there was lower, so I will skip those parts.
… i also added some of our concerns at the end of the presentation, to highlight what was most important in our findings.
… the epub spec describes how epub should be formatted, but also how to render this format.
… it’s a zip archive, with web technologies inside.
… since developers of rs don’t want to reinvent the wheel, they rely on existing browser engine.
… how could this be exploited? One of the most interesting areas to us was Remote Resources.
… so epub can refer to local files, but also files online.
… it is recommended that rs notifies the user when trying to do so, but also to limit activity to read only.
… what were our research questions?.
… 1. what is the state of freely available epub RS? (capabilities, security considerations).
… 2. are these capabilities being abused in the wild? (malicious epubs, platforms tracking users?).
… we evaluated 97 epub rs over a variety of OS, including physical devices.
… we used a semi-automated black box evaluation method.
… test epubs are loaded, and results are shown on the screen and logged.
… this is available on github.
… for example, re. js support, are inline scripts executed? Are external scripts executed?.
… we limited our scripts to ES5.
… for remote communication, we refer to repo of HTTP that are known to execute requests. If communication was received, then we say that remote communication was allowed. We also checked whether user was notified.
… for local fs access, first we just tried render the file.
… if that didn’t work, then we tried timing attack, to use the time it would take an event to fire to detect whether a file exists on the fs or not.
… File System in Userspace used to check whether file is available.
… for URI schemes, e.g. mailto: links. But in some apps URI schemes execute actions when clicked, e.g. tel: links and Skype for business on MacOS. Clicking these links could also be automated via js.
… re. Web engine evaluation, we developed an engine fingerprinting script, and then compared the fingerprints of known engines and compared to fingerprints of unknown engines embedded in rs.
… Results. 48% of rs tested support js. Re. remote comm. only 1 rs required user consent..
… In 16 cases we were able to infer existence of local files, and in 8 cases we could even read local files.
… this was a concern mainly on desktop and smartphone.
… 24 rs supported URI handles, but only 3 were relying on an insecure web engine. This latter issue was only a concern for desktop.
… this is what hackers do. They analyze vulnerabilities and write exploit. As a case study, we looked specifically at Apple Books, EPUBReader in browser, and Amazon Kindle.
… Apple Books we could read user info, EPUBReader extension allowed CSP circumvention leading to universal XSS, Amazon Kindle had an input validation issue due to old version of webkit which resulted in leaking of user’s library.
… so are these being abused in the wild?.
… to analyze if any malicious epubs are being shared, we looked at about 9000 epubs shared on Pirate Bay.
… we did not find any malicious epubs, fewer than 1% contained js (which was benign).
… we also checked legal channels and found same results.
… but what is the feasibility of distributing a malicious ebook?.
… you would upload it to a file sharing platform with no security.
… you could also try distributing via the self-publishing platform.
… are self-published epubs sufficiently sanitized?.
… so we tested self-publishing standards of 6 vendors, and unfortunately we found it inadequate for some.
… so what are our main takeaways?.
… almost none of the js-supporting rs adhere to security recommendations. Most did not isolate local fs, allowed reading local file contents..

Dave Cramer: See Collection of the test EPUB publications.

Gertjan Franken: we contacted devs and urged them to fix the problems.
… we found no abuse in the wild yet, but we found it very possible, even via legal channels.
… our evaluation method is open-source, to try to help devs and make users confident in their security.
… our concerns: js execution. Not a lot of use in real epubs, but it greatly increases attack surface. We would argue that maybe you should prohibit js execution..
… of course this would limit use-cases greatly.
… so what about compromise? Have js execution disabled by default, subject to user consent?.
… Remote resources. Also increases attack surface, opens risk of reading user files. We are in favor of limiting ability to do this..
… Online remote resources. Also a huge security impact. But we’re not sure of all the use-cases for online remote resources..
… e.g. spec says it could be used if some resources are too big to put inside the container.
… URI schemes. Can be exploited to perform malicious actions. We would disallow use of certain schemes, not sure there is a legit use-case for this..
… the possibility of malicious action probably outweighs any usability impact.
… Embedded web engine configuration. ebooks should not be able to geolocate, or access webcam, but all of these are inherited as part of relying on web engines..
… some of the rs developers overwrite security defaults with flags (e.g. –allow-file-access-from-files). This is not recommended by devs of web engines, it’s for testing only..
… this could also be mentioned explicitly in spec.
… we did not find many rs that employed outdated web engine, but you could flag for rs devs that updating embedded web engines is very important.
… hard/strict requirements instead of recommendations. At the time of our evaluation, only a few rs adhered to the recommendations.
… creating awareness among users and developers. What is even more useful than a compliance checker might be having practical developer guidelines about these security issues.
… we didn’t have full overview of what use-cases were intended, so we erred on the side of cautions.
… thank you. Are there any questions?.

Dave Cramer: thank you, that was extremely informative and helpful.
… obviously js in general has been a thorny issue from the start, and not just because of security issues.
… we’ve had issues about who gets to intercept clicks when both rs and epub want to use js.
… but ability to use js is especially important in educational publishing.
… it seems that a lot of your research was done using trade books, as opposed to the specialized rs used for high ed, which rely really heavily on rs.
… unfortunately we can’t go back in time to when we first wrote epub 3.

Gertjan Franken: if I understand correctly these edu epubs have interactive examples, is that right?.

Dave Cramer: sometimes it may be tied to online learning, i.e. recording student progress on a server.
… we want to shut down local remote resources, but for online remote resources, we do have use-cases related to file size, and web fonts.

Gertjan Franken: what I saw with Apple Books on iOS, there if you load a chapter that needs a remote resources, it prompts users with a modal asking permission to load from a particular domain (and remembers the decision).
… of course it comes with a certain amount of time in re-designing the UI, but worth looking into.

Brady Duga: about js, the epubs are supposed to say whether they support js or not. So we could have recommendation that rs disable js when epub does not say it uses js. And when epub says it does use js, then we can prompt user..
… we may already say something like that in the spec.

Ivan Herman: right, the mechanism is there, because we have the “scripted” property.
… we should put duga’s suggestion in the security section..
… When I read your paper it made me really surprised to realize that we do allow accessing local fs.
… in paper it seemed that it was intention that this was possible, but in reality it was more just because file: was part of the web.
… we simply did not explicitly disallow it.
… and that’s why I think disallowing the file scheme should happen.
… so you have these epub files that you used for testing. We are developing a pretty large test suite for epub testing in general, I was wondering whether your test books can either be incorporated into ours, or referenced.
… to re-use those.
… which would force rs vendors who want to self-test to use those.
… how many is that?.

Gertjan Franken: at least one per area of experimentation, but might be more.
… agree that it would be useful to incorporate into your testing too (but I haven’t look at yours yet).

Ivan Herman: it’s really a collection of small test epub books, 1 epub for every feature that needs to be tested.
… i propose we have a separate call on how we can re-use your tests.

Dave Cramer: See Collection of the test EPUB publications.

Ivan Herman: similarly, compared to 3.2, we have added a significant amount of text on security and privacy issues, so it would be immensely helpful if you could review those text in light of your experience.
… i would be really grateful.

Gertjan Franken: I have skimmed some of the added security text, I would be happy to spend some time to read it and see if we have any suggestions.

Ivan Herman: re. your suggestion that we use strict requirements instead of recommendations, under the w3c process when we have to test for conformance.
… currently we have a PR that changes those recommendations into SHOULD statements, which is at least a step in the direction you propose.

Gertjan Franken: so if you say MUST you have to enforce it in the testing tool?.

Ivan Herman: exactly.

Zheng Xu (徐征): in my platform we are trying to encourage creators to use js to be more interactive with end-use, like a normal website.
… from your perspective, what do you think should be different between an epub and a normal website?.
… to me, a rs is similar to a web browser.

Gertjan Franken: it depends on the use-case, of course.
… i’m fine with using epub as a complete website, but browsers are updated frequently, they are more secure.
… so if you want to use rs as browser, they should be treated the same way re. updates and security.
… it’s possible to do it in a secure way.
… the rs that were not secure did not adhere to the same standards as web browsers.

Charles LaPierre: this was amazing, congrats and well done.
… on slide 23 you uploaded malicious content to a number of different platforms, and a couple of them failed, right?.
… i would have thought that all of them would use epubcheck as part of ingestion pipeline, so I’m wondering if we could add some of these warnings to epub check.
… it would prevent malicious epubs from getting out via legit channels.

Gertjan Franken: its difficult to enforce a requirement that online remote resources should be on benign domains, for example.
… but maybe something could be done re. local fs.
… we found some platforms are doing well. Play Books is good. Barns & Noble rs crashed when you tried to load malicious epub, so that was good..

Dave Cramer: evaluating arbitrary js for security implications is probably impossible, yeah.

Tzviya Siegman: what epubcheck does formally is check against the spec, so if we wanted to add this to epubcheck then we’d need to update the spec.

Ivan Herman: but if the file URL is disallowed then epubcheck will flag it.

Tzviya Siegman: yes, gotta go, thank you!.

Dave Cramer: this was a great presentation.

Ivan Herman: how do we move on from here?.
… maybe I will go through minutes and distill some points info github issues.
… and then, Gertjan, if you and your team can review those and let us know?.

Gertjan Franken: okay, we can schedule a next meeting to discuss the specifics.
… and thank you for inviting me here.

Wendy Reid: we’ll keep in touch!.

2. AOB.

Zheng Xu (徐征): we have a joint meeting with the epub cg next week. Everyone is invited..