13:35:08 RRSAgent has joined #epub
13:35:08 logging to https://www.w3.org/2022/06/03-epub-irc
13:35:11 RRSAgent, make logs Public
13:35:12 please title this meeting ("meeting: ..."), ivan
13:35:26 ivan has changed the topic to: Meeting Agenda 2022-06-03: https://lists.w3.org/Archives/Public/public-epub-wg/2022Jun/0000.html
13:35:27 Chair: dauwhe
13:35:27 Date: 2022-06-03
13:35:27 Agenda: https://lists.w3.org/Archives/Public/public-epub-wg/2022Jun/0000.html
13:35:27 Meeting: EPUB 3 Working Group Telco
13:38:31 scribejs, set gertjan Gertjan Franken
13:38:39 guests+ gertjan
13:39:28 -> “Reading Between the Lines: An Extensive Evaluation of the Security and Privacy Implications of EPUB Reading Systems” https://lirias.kuleuven.be/retrieve/616428
13:45:22 dauwhe has joined #epub
13:48:54 dauwhe has joined #epub
13:54:44 shiestyle has joined #epub
13:58:04 dauwhe_ has joined #epub
13:58:54 toshiakikoike has joined #epub
13:59:08 MattChan has joined #epub
13:59:11 AvneeshSingh has joined #epub
13:59:15 present+
13:59:23 present+
13:59:35 MasakazuKitahara has joined #epub
13:59:46 present+
14:00:07 wendyreid has joined #epub
14:00:14 present+
14:00:19 present+
14:00:34 present+
14:00:35 CharlesL has joined #epub
14:00:43 scribe+
14:01:01 dlazin has joined #epub
14:01:04 present+
14:01:08 duga has joined #epub
14:01:12 present+
14:01:18 present+
14:01:32 present+
14:01:53 present+
14:02:29 dauwhe_: this is a special meeting of the epub3 wg, we have a guest
14:03:07 ... in the last meeting we talked about the paper written by our guest about security and privacy implications for epub 3 rs, and how that relates to the review we've had from TAG and security and privacy wg
14:03:11 present+ shiestyle
14:03:14 ... would you like to introduce yourself?
14:03:50 Franken: I am a security researcher, we researched security properties and vulnerabilities of epub rs
14:04:14 ...
I'm a big reader, and when I discovered that web tech is underlying, that is how this interest started
14:04:20 s/Franken/Gertjan/
14:04:20 ... happy to advise on this
14:04:38 scribejs, set Franken Gertjan Franken
14:04:42 dauwhe_: i know from reading your paper that epub 3.2 was the standard, this group is working on the next revision of the standard
14:04:54 ... this is the first time that epub is going through the full formal w3c rec track
14:05:04 ... the earlier revision was subject to a much less involved process
14:05:18 ... so the big change is that we have to have tests for conformance
14:05:46 ... the other thing is the concept of horizontal review by other w3c groups, e.g. for a11y, for i18n, and of particular interest, security and privacy wg
14:05:58 ... also TAG, which also has an interest in security and privacy issues
14:06:11 present+ makoto
14:06:16 ... based on that we've made some major changes, especially to that security and privacy section
14:06:42 ... we've written a threat model, our recommendations on security and privacy are now closer to normative
14:07:12 ivan: we've also tried to be more precise, we have explicit reference to the origin model -- e.g. each epub must have its own origin
14:07:35 ... we also clarified what the root URL of an epub instance is, something which influences the way relative URLs are interpreted
14:08:05 ... any relative URL that is used in a content document can be checked by epubchecker to see whether it would go "out" of the content
14:08:32 ... so where you researched the possibility of epub going into the fs, this is what we are trying to address
14:08:44 ... of course, there are many things that an epubchecker cannot catch
14:09:03 ... we also started a discussion about file: URLs, which currently should not be used
14:09:17 ... we are discussing whether we should strengthen this to must not be used
14:09:31 ... but per our charter, we are under a strong requirement to be backwards compat
14:09:41 ...
i.e. any valid epub 3.2 must be a valid 3.3
14:10:01 ... this would be an exception to that, but for many of us security is a stronger priority than backwards compat
14:10:10 ... that is why we need to discuss this now
14:10:25 ... as far as we know, there are no real epubs in the wild that use file: URL
14:10:39 Gertjan: can files be local to the user fs?
14:10:55 https://w3c.github.io/epub-specs/epub33/core/#sec-container-iri
14:11:01 ivan: it is local in the sense that files can be in the package, but we try to stop epubs from going outside the package
14:11:23 dauwhe_: you can use seven layers of ../ to try to climb out of the epub into the fs, but our new definition prevents that from happening
14:11:35 Gertjan: what about symbolic links? Does it address that?
14:11:58 ivan: do you mean putting a symbolic link in the package that points outside the package?
14:12:14 ... could that work provided that file: URLs are disallowed?
14:12:47 Gertjan: you might circumvent it by making a symbolic file that you put into the zip archive, where the symbolic file refers to ../ for example
14:12:58 ... but we didn't find a lot of rs where this could be abused
14:13:09 q?
14:13:27 ivan: from our point of view, I agree that we should document that as part of the possible threats
14:13:53 ... i don't think specification wise, in the definition of the epub content, we could do that. That's a concept that is not part of epub.
14:14:01 ... but I understand, and yes, we should document that point
14:14:20 wendyreid: should we start the presentation part of the meeting?
14:14:35 ivan: can you link us to your slides too?
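The two container-escape ideas raised above (stacked ../ segments and a symbolic link stored inside the zip archive) can be illustrated with a minimal sketch. It is not how epubcheck or any reading system implements its checks; the function name find_escaping_entries is hypothetical, and the sketch assumes the archive was written with Unix mode bits, which is where symlink entries are normally marked.

```python
import io
import posixpath
import stat
import zipfile


def find_escaping_entries(epub_bytes: bytes) -> list:
    """Return names of ZIP entries that are symlinks or resolve above the container root.

    Hypothetical helper for illustration only; not part of epubcheck.
    """
    suspicious = []
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as archive:
        for info in archive.infolist():
            # When present, Unix mode bits live in the high 16 bits of external_attr.
            mode = info.external_attr >> 16
            is_symlink = stat.S_ISLNK(mode)
            # Collapse "a/../b" style segments, then see whether the path climbs out.
            normalized = posixpath.normpath(info.filename)
            escapes = (
                normalized == ".."
                or normalized.startswith("../")
                or normalized.startswith("/")
            )
            if is_symlink or escapes:
                suspicious.append(info.filename)
    return suspicious
```

A reading system would still have to apply the same care when resolving URLs at render time; a static scan like this only covers what is visible in the archive's directory.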
14:14:37 tzviya has joined #epub
14:14:38 present+ BenSchroeter
14:14:50 Gertjan: okay, no problem, I will put it online afterwards
14:15:05 present+ tzviya
14:15:06 ivan: great, you can just send me the link and I will include with the minutes
14:15:24 present+ JenG
14:15:52 Gertjan: this is the presentation that I gave at a security conference, but the expertise in epub there was lower, so I will skip those parts
14:16:08 ... i also added some of our concerns at the end of the presentation, to highlight what was most important in our findings
14:16:12 q?
14:16:36 ... the epub spec describes how epub should be formatted, but also how to render this format
14:16:49 BenSchroeter has joined #epub
14:16:49 ... it's a zip archive, with web technologies inside
14:17:02 present+
14:17:05 ... since developers of rs don't want to reinvent the wheel, they rely on existing browser engines
14:17:28 ... how could this be exploited? One of the most interesting areas to us was Remote Resources
14:17:45 ... so epub can refer to local files, but also files online
14:18:11 ... it is recommended that rs notifies the user when trying to do so, but also to limit activity to read only
14:18:24 ... what were our research questions?
14:18:51 ... 1. what is the state of freely available epub RS? (capabilities, security considerations)
14:19:18 ... 2. are these capabilities being abused in the wild? (malicious epubs, platforms tracking users?)
14:19:47 ... we evaluated 97 epub rs over a variety of OS, including physical devices
14:20:02 ... we used a semi-automated black box evaluation method
14:20:19 ... test epubs are loaded, and results are shown on the screen and logged
14:20:27 ... this is available on github
14:20:52 ... for example, re. js support, are inline scripts executed? Are external scripts executed?
14:20:58 ... we limited our scripts to ES5
14:21:43 q?
14:21:59 ... for remote communication, we refer to repo of HTTP that are known to execute requests.
If communication was received, then we say that remote communication was allowed. We also checked whether user was notified
14:22:31 ... for local fs access, first we just tried to render the file
14:23:00 ... if that didn't work, then we tried timing attack, to use the time it would take an event to fire to detect whether a file exists on the fs or not
14:23:23 ... File System in Userspace used to check whether file is available
14:25:06 ... for URI schemes, e.g. mailto: links. But in some apps URI schemes execute actions when clicked, e.g. tel: links and Skype for business on MacOS. Clicking these links could also be automated via js
14:25:58 ... re. Web engine evaluation, we developed an engine fingerprinting script, and then compared the fingerprints of known engines to fingerprints of unknown engines embedded in rs
14:26:35 ... Results. 48% of rs tested support js. Re. remote comm. only 1 rs required user consent.
14:27:12 ... In 16 cases we were able to infer existence of local files, and in 8 cases we could even read local files
14:27:24 q?
14:27:29 ... this was a concern mainly on desktop and smartphone
14:28:30 ... 24 rs supported URI handles, but only 3 were relying on an insecure web engine. This latter issue was only a concern for desktop
14:29:17 present+ zheng
14:29:25 zheng_xu has joined #epub
14:29:30 ... this is what hackers do. They analyze vulnerabilities and write exploits. As a case study, we looked specifically at Apple Books, EPUBReader in browser, and Amazon Kindle
14:29:46 present+
14:30:43 ... Apple Books we could read user info, EPUBReader extension allowed CSP circumvention leading to universal XSS, Amazon Kindle had an input validation issue due to old version of webkit which resulted in leaking of user's library
14:30:53 ... so are these being abused in the wild?
14:31:32 ... to analyze if any malicious epubs are being shared, we looked at about 9000 epubs shared on Pirate Bay
14:31:54 ...
we did not find any malicious epubs, fewer than 1% contained js (which was benign)
14:32:06 ... we also checked legal channels and found same results
14:32:26 ... but what is the feasibility of distributing a malicious ebook?
14:32:45 ... you would upload it to a file sharing platform with no security
14:33:01 ... you could also try distributing via the self-publishing platform
14:33:10 ... are self-published epubs sufficiently sanitized?
14:33:34 Jen_G has joined #epub
14:33:36 ... so we tested self-publishing standards of 6 vendors, and unfortunately we found it inadequate
14:33:40 Present+
14:33:46 ... so what are our main takeaways?
14:34:13 ... almost none of the js-supporting rs adhere to security recommendations. Most did not isolate local fs, allowed reading local file contents.
14:34:13 https://github.com/DistriNet/evil-epubs
14:34:22 ... we contacted devs and urged them to fix the problems
14:34:37 ... we found no abuse in the wild yet, but we found it very possible, even via legal channels
14:34:58 ... our evaluation method is open-source, to try to help devs and make users confident in their security
14:35:51 ... our concerns: js execution. Not a lot of use in real epubs, but it greatly increases attack surface. We would argue that maybe you should prohibit js execution.
14:35:53 q+
14:36:02 ... of course this would limit use-cases greatly
14:36:20 ... so what about compromise? Have js execution disabled by default, subject to user consent?
14:36:56 ... Remote resources. Also increases attack surface, opens risk of reading user files. We are in favor of limiting ability to do this.
14:37:11 q+
14:37:24 ... Online remote resources. Also a huge security impact. But we're not sure of all the use-cases for online remote resources.
14:37:44 ... e.g. spec says it could be used if some resources are too big to put inside the container
14:38:21 ... URI schemes. Can be exploited to perform malicious actions.
We would disallow use of certain schemes, not sure there is a legit use-case for this.
14:38:38 ... the possibility of malicious action probably outweighs any usability impact
14:39:19 ... Embedded web engine configuration. ebooks should not be able to geolocate, or access webcam, but all of these are inherited as part of relying on web engines.
14:39:40 q?
14:39:53 ... some of the rs developers overwrite security defaults with flags (e.g. --allow-file-access-from-files). This is not recommended by devs of web engines, it's for testing only.
14:40:00 ... this could also be mentioned explicitly in spec
14:40:27 ... we did not find many rs that employed an outdated web engine, but you could flag for rs devs that updating embedded web engines is very important
14:40:54 ... hard/strict requirements instead of recommendations. At the time of our evaluation, only a few rs adhered to the recommendations
14:41:37 ... creating awareness among users and developers. What is even more useful than a compliance checker might be having practical developer guidelines about these security issues
14:41:53 ... we didn't have full overview of what use-cases were intended, so we erred on the side of caution
14:42:12 ... thank you. Are there any questions?
14:42:24 dauwhe_: thank you, that was extremely informative and helpful
14:42:48 ... obviously js in general has been a thorny issue from the start, and not just because of security issues
14:43:12 ... we've had issues about who gets to intercept clicks when both rs and epub want to use js
14:43:30 ... but ability to use js is especially important in educational publishing
14:43:44 q+
14:43:51 ... it seems that a lot of your research was done using trade books, as opposed to the specialized rs used for high ed, which rely really heavily on js
14:44:10 ... unfortunately we can't go back in time to when we first wrote epub 3
14:44:29 Gertjan: if I understand correctly these edu epubs have interactive examples, is that right?
14:44:44 dauwhe_: sometimes it may be tied to online learning, i.e. recording student progress on a server
14:45:15 ... we want to shut down local remote resources, but for online remote resources, we do have use-cases related to file size, and web fonts
14:45:22 ack dau
14:46:03 Gertjan: what I saw with Apple Books on iOS, there if you load a chapter that needs a remote resource, it prompts users with a modal asking permission to load from a particular domain (and remembers the decision)
14:46:04 q later
14:46:07 q+ later
14:46:18 ... of course it comes with a certain amount of time in re-designing the UI, but worth looking into
14:46:24 ack du
14:47:00 q+
14:47:18 duga: about js, the epubs are supposed to say whether they support js or not. So we could have recommendation that rs disable js when epub does not say it uses js. And when epub says it does use js, then we can prompt user.
14:47:28 ... we may already say something like that in the spec
14:47:38 ack ivan
14:47:45 ivan: right, the mechanism is there, because we have the "scripted" property
14:47:58 ... we should put duga's suggestion in the security section
14:47:58 ack ivan
14:48:26 ... when I read your paper it made me really surprised to realize that we do allow accessing local fs
14:48:57 ... in paper it seemed that it was intention that this was possible, but in reality it was more just because file: was part of the web
14:49:03 ... we simply did not explicitly disallow it
14:49:21 ... and that's why I think disallowing the file scheme should happen
14:50:02 ... so you have these epub files that you used for testing. We are developing a pretty large test suite for epub testing in general, I was wondering whether your test books can either be incorporated into ours, or referenced
14:50:06 q+
14:50:08 ... to re-use those
14:50:22 ... which would force rs vendors who want to self-test to use those
14:50:25 ... how many is that?
14:50:43 Gertjan: at least one per area of experimentation, but might be more
14:51:05 ... agree that it would be useful to incorporate into your testing too (but I haven't looked at yours yet)
14:51:25 ivan: it's really a collection of small test epub books, 1 epub for every feature that needs to be tested
14:51:40 ... i propose we have a separate call on how we can re-use your tests
14:52:24 ... similarly, compared to 3.2, we have added a significant amount of text on security and privacy issues, so it would be immensely helpful if you could review that text in light of your experience
14:52:37 ... i would be really grateful
14:53:03 Gertjan: I have skimmed some of the added security text, I would be happy to spend some time to read it and see if we have any suggestions
14:53:56 ivan: re. your suggestion that we use strict requirements instead of recommendations, under the w3c process when we have to test for conformance
14:54:24 ... currently we have a PR that changes those recommendations into SHOULD statements, which is at least a step in the direction you propose
14:54:42 Gertjan: so if you say MUST you have to enforce it in the testing tool?
14:54:45 ivan: exactly
14:54:49 ack zheng_xu
14:55:14 zheng_xu: in my platform we are trying to encourage creators to use js to be more interactive with end-users, like a normal website
14:55:31 ... from your perspective, what do you think should be different between an epub and a normal website?
14:55:53 ... to me, a rs is similar to a web browser
14:56:08 Gertjan: it depends on the use-case, of course
14:56:29 ... i'm fine with using epub as a complete website, but browsers are updated frequently, they are more security
14:56:47 ... so if you want to use rs as browser, they should be treated the same way re. updates and security
14:56:54 s/more security/more secure
14:57:16 Gertjan: it's possible to do it in a secure way
14:57:29 ack char
14:57:32 ...
the rs that were not secure did not adhere to the same standards as web browsers
14:57:38 CharlesL: this was amazing, congrats and well done
14:57:58 ... on slide 23 you uploaded malicious content to a number of different platforms, and a couple of them failed, right?
14:58:31 ... i would have thought that all of them would use epubcheck as part of ingestion pipeline, so I'm wondering if we could add some of these warnings to epubcheck
14:58:41 ... it would prevent malicious epubs from getting out via legit channels
14:59:13 Gertjan: it's difficult to enforce a requirement that online remote resources should be on benign domains, for example
14:59:24 ... but maybe something could be done re. local fs
15:00:11 q+
15:00:17 ... we found some platforms are doing well. Play Books is good. Barnes & Noble rs crashed when you tried to load malicious epub, so that was good.
15:00:23 ack tz
15:00:30 dauwhe_: evaluating arbitrary js for security implications is probably impossible, yeah
15:00:58 tzviya: what epubcheck does formally is check against the spec, so if we wanted to add this to epubcheck then we'd need to update the spec
15:01:12 ivan: but if the file URL is disallowed then epubcheck will flag it
15:01:19 tzviya: yes, gotta go, thank you!
15:01:32 dauwhe_: this was a great presentation
15:01:38 q+
15:01:42 ivan: how do we move on from here?
15:01:43 ack zh
15:02:15 zheng_xu: we have a joint meeting with the epub cg next week. Everyone is invited.
15:02:48 ivan: maybe I will go through minutes and distill some points into github issues
15:03:05 ... and then, Gertjan, if you and your team can review those and let us know?
15:03:16 q+
15:03:18 Gertjan: okay, we can schedule a next meeting to discuss the specifics
15:03:24 ... and thank you for inviting me here
15:03:32 wendyreid: we'll keep in touch!
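As a rough illustration of the kind of check discussed above (flagging file: URLs once the spec disallows them), here is a minimal sketch that scans one content document's markup. It is not epubcheck's implementation; the names URL_ATTRIBUTES, FileUrlFinder and find_file_urls are hypothetical, and the attribute list is an assumption that leaves out cases such as xlink:href, srcset and CSS url() references.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Assumed, non-exhaustive set of attributes that may carry a URL.
URL_ATTRIBUTES = {"href", "src", "poster", "data"}


class FileUrlFinder(HTMLParser):
    """Collect (tag, attribute, value) triples whose URL uses the file: scheme."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags via the default handle_startendtag.
        for name, value in attrs:
            if name in URL_ATTRIBUTES and value:
                if urlsplit(value.strip()).scheme.lower() == "file":
                    self.found.append((tag, name, value))


def find_file_urls(markup: str) -> list:
    """Return every file: URL reference found in the given markup."""
    finder = FileUrlFinder()
    finder.feed(markup)
    finder.close()
    return finder.found
```

A checker built along these lines can only report what is written statically in the markup; URLs assembled by script at run time would not be seen, which matches the point made above about arbitrary js.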
15:03:50 rrsagent, draft minutes
15:03:50 I have made the request to generate https://www.w3.org/2022/06/03-epub-minutes.html ivan
15:04:28 it's alright, I was going to ask Gert if we could have a similar presentation to the PCG as well later
15:05:07 zakim, end meeting
15:05:07 As of this point the attendees have been toshiakikoike, dauwhe_, MasakazuKitahara, shiestyle, AvneeshSingh, wendyreid, dlazin, CharlesL, duga, MattChan, ivan, makoto, BenSchroeter,
15:05:10 ... tzviya, JenG, zheng, zheng_xu, Jen_G
15:05:10 RRSAgent, please draft minutes
15:05:10 I have made the request to generate https://www.w3.org/2022/06/03-epub-minutes.html Zakim
15:05:12 I am happy to have been of service, ivan; please remember to excuse RRSAgent. Goodbye
15:05:16 Zakim has left #epub
15:05:24 rrsagent, bye
15:05:24 I see no action items