EPUB 3 Working Group Telco — Minutes

Date: 2021-02-18

See also the Agenda and the IRC Log

Attendees

Present: Wendy Reid, Matthew Chan, Dave Cramer, Leonard Rosenthol, Masakazu Kitahara, Shinya Takami (高見真也), Ben Schroeter, Marisa DeMeglio, Toshiaki Koike, Brady Duga, Matt Garrish

Regrets:

Guests:

Chair: Wendy Reid

Scribe(s): Matthew Chan

Content:


1. Republish Core and RS

Wendy Reid: we just voted to publish FPWD of a11y and techniques note
… but we also want to republish the other two pieces

Proposed resolution: Publish new drafts of the EPUB 3.3 Core and Reading Systems documents (Wendy Reid)

Matthew Chan: +1

Matt Garrish: +1

Ben Schroeter: +1

Brady Duga: +1

Marisa DeMeglio: +1

Leonard Rosenthol: 0

Toshiaki Koike: +1

Masakazu Kitahara: +1

Resolution #1: Publish new drafts of the EPUB 3.3 Core and Reading Systems documents

2. Remaining i18n issues

See github issue #1508, #1509.

Wendy Reid: 2 i18n issues remain after the review

Dave Cramer: the two issues are pretty intertwined
… i18n should require valid lang tags
… there is a formal grammar which describes the formal structure of language tags
… so, well-formedness

Leonard Rosenthol: I believe that lang is ISO 3166

Matt Garrish: we enforce well formed, but nothing about validity

Dave Cramer: a valid tag is one which matches the actual languages
… so should we require lang tags to be valid?
… and what should RS do when faced with invalid lang tag?
… not want to make requirement more stringent
… hard to check validity of lang tags
… its a list of strings that changes over time
… burden on epubcheck
… also, there could be existing epub with well formed but invalid lang tags - a change could cause those epubs to fail epubcheck
… should having a valid lang tag just be a best practice?
… so epubcheck would just flag it as an informative warning

Wendy Reid: leonardr:

Leonard Rosenthol: [here’s the info](https://developer.mozilla.org/en-US/docs/Web/HTML/Global_attributes/lang
… BCP-47 is the technical spec, you can validate against that if you want
… but i think we should do what HTML does, no more, no ness

Matt Garrish: some of this came up in epubcheck itself, when someone complained that there was no check for lang validity
… also, the epubcheck folks seemed to say that it would be a hard thing to implement
… and unless there is some critical function, a RS is going to ignore this
… and there are currently no critical functions that rely on this very general metadata
… nothing bad comes of this lang tag not being specified properly
… in pub manifest we said that well-formedness is good enough

Wendy Reid: also, we’d run into issues with testing this if we wanted to make this stricter

Dave Cramer: matt has more or less convinced me that this isn’t broken

Brady Duga: +1 to not broken

Dave Cramer: … the costs of fixing it are higher than the purely theoretical benefits of conforming to broadly worded i18n guidelines

Dave Cramer: i propose that we close this without fixing

Matt Garrish: something like WCAG could have stricter rules about lang tags, but for us its not a critical piece of metadata

Ben Schroeter: i like the idea of doing some sort of warning in epubcheck
… also, i don’t want to tell RS what to do in general
… RS want to be as lax as possible when it comes to what they will ingest
… they don’t want to keep content authors off their RS

Proposed resolution: Close issue 1509 with no action (Wendy Reid)

Marisa DeMeglio: +1

Ben Schroeter: +1

Matthew Chan: +1

Wendy Reid: +1

Brady Duga: +1

Matt Garrish: +1

Toshiaki Koike: +1

Masakazu Kitahara: +1

Shinya Takami (高見真也): +1

Resolution #2: Close issue 1509 with no action

Dave Cramer: for 1508 (i.e. question of what RS should do with poorly-formed lang tag)

Dave Cramer: there was a suggestion that the RS treat the lang as “und” (i.e. undefined)
… that to me is a satisfactory solution to this somewhat theoretical problem

Leonard Rosenthol: +1

Proposed resolution: Close issue 1508, add text to RS specification instructing reading systems to treat a poorly-formed language tag as “und” (undefined) (Wendy Reid)

Brady Duga: is this yet another untestable assertion?
… should we tell RS what to do with this at all?

Matt Garrish: i suggested the “und” thing because i thought we’d done this in pub manifest as well
… but i think we actually went back and decided to remain silent on it
… “we’re not going to define what it means for the RS”

Dave Cramer: testing it would require reading the minds of the RS
… what we’re actually using the lang tag for is trying to guess at page progression direction?
… how would we know if the RS is actually doing this?
… so maybe another “close, won’t fix”?

Ben Schroeter: if we feel the RS wants some sort of guidance we could change the proposal to say “suggest”
… but i’m also happy to drop it

Matt Garrish: i think there’s something worrisome about RS determining lang for the author
… i’d rather RS do nothing

Dave Cramer: would also add that RS don’t seem to be looking for guidance

Proposed resolution: close issue 1508, won’t fix (Wendy Reid)

Ben Schroeter: +1

Brady Duga: +1

Wendy Reid: +1

Matthew Chan: +1

Matt Garrish: +1

Toshiaki Koike: +1

Masakazu Kitahara: +1

Resolution #3: close issue 1508, won’t fix

3. Origin, cont’d

See github issue #873, #1156.

Wendy Reid: this is continuing from last week’s meeting

Dave Cramer: i think most of the discussion is in issue 1153
… we’ve struggled with how to specify scripting in epub
… we’ve gotten lots of questions from outside the group about how our security model ties in with the security model of the rest of the web
… we have non-normative text in the spec
… leonard has mentioned the concept of security boundaries, with origin being main boundary in Web world
… my opinion is that the text we currently have is wrong
… boundary should be around the epub and not the content document
… e.g. where content documents within an epub want to share a resource
… also, origin is more the concept we’re going for, not domain (which is what we currently reference)
… could we say that each epub should be an opaque/unique origin
… even if not particularly testable, at least it is a stake in the ground
… re. how we are trying to fit into the web security model

Leonard Rosenthol: the thing that is most problematic is the difference between actually doing this in a browser with a content hosted on a real domain vs doing this on a device (mobile, desktop, etc.)
… in the device scenario the RS can completely control the origin
… the RS sets up the origin
… so your statements about every epub being its own RS can be done on device, but you can’t do that on the web
… so controlling scripts within the context of an origin makes sense in the device scenario, but not in the web scenario
… that’s the main issue

Dave Cramer: i hear you
… in the RS i’m aware of that are web-based, there is a pretty big disconnect between what you see in the URL bar and what is actually happening inside the RS
… is it reasonable to ask the RS to follow a stricter set of rules than would be required by the generic web security model
… say Hachette decided to put all their books on a domain, I think its reasonable to say that if an RS were to do that they need to architect it so that all books aren’t on the same origin as each other
… i see this as adding requirements to implementation if the implementation happens to be web-based

Leonard Rosenthol: the problem is that you can’t do that
… we tried to do that with sub-origins, but that hasn’t been touched since 2017
… never implemented seriously
… never made it through the webapp sec WG
… in your example, all your epubs are originated off the same thing, they would all share the same local storage etc.
… if those are all your books, that’s fine, but once that content goes outside, there’s no guarantee that books from different publishers won’t be able to see each other

Dave Cramer: could you solve that problem with different subdomains for each title?

Leonard Rosenthol: yes, but only in a world where all the epubs come from the same publisher
… e.g. an epub from patreaon uploaded to dropbox or onedrive
… that book would have access to all of dropbox

Dave Cramer: you’re kind of creating a non-conforming RS in this example

Leonard Rosenthol: that would make all web-based RS non-conforming

Wendy Reid: I think dropbox actually does have an ebook reader….

Leonard Rosenthol: they’re probably taking advantage of no scripting then

Wendy Reid: i think the solution that most RS have come to is just to avoid scripting entirely
… easiest way out of the origin problem

Leonard Rosenthol: that doesn’t solve other things, e.g. referencing
… trying to reference a font or other resource inside that domain as a relative link
… nothing prevents referencing outside the epub at that point (e.g. ../../)
… and assuming this is served via HTTPS, that gives it a lot more privileges than an non-secure URL

Brady Duga: this really seems like a scripting issue
… you have to make sure you don’t access things you don’t own
… but that isn’t an origin issue, that a rights access issue
… the real problem is storing cookies, and then someone else’s book accessing it

Dave Cramer: Jiminy has real world examples of this sort of stuff
… e.g. an epub in ibooks that goes and finds info about other books
… is there anything in the spec right now that says that’s bad?

Brady Duga: maybe? It depends on the RS and the content
… e.g. RS for a school, where every student shares every book, that would be okay
… one book might want to check how far a student got in another book
… i.e. not a bad idea in ALL cases

Leonard Rosenthol: if, say, you’re building your own software and documents, and you control the entire system there’s no reason why you wouldn’t want to do it that way

Dave Cramer: one thing to do is go back to our current language
… do we still want to say that every content document in the same epub should belong to a different domain?

Leonard Rosenthol: can probably change that so that each epub is its own origin, like you said earlier

Matt Garrish: the original wording came at a time when we were just starting to open epub to scripting
… we were designing it to be as restrictive as possible
… we’ve tried to dodge this in the past by limiting where scripting is allowed

Dave Cramer: to me i feels like a little bit of progress if we relax the current language to say “per epub” instead of “per content document”
… this leaves us vulnerable to intra-epub security issues
… but that really seems like more of an authoring problem than a problem with the spec

Brady Duga: right now the spec is more restrictive, but we’re already finding examples IRL where RS are not honoring it
… from testing perspective, its not clear how this would be implemented

Matt Garrish: depends where we are going with this
… right now that section is only informative, so that’s fine
… if we change the section to be normative, then yes, that might be an issue

Dave Cramer: given all that, should we take the baby step of updating the non-normative guidance that the boundary should be “per epub”?
… consensus on this?

Leonard Rosenthol: +1

Matt Garrish: +1

Brady Duga: does that include changing from “domain” to “origin”?

Dave Cramer: yes, i think so
… stuff about port randomization scares me a little bit as someone who wants to do something useful with scripting

Proposed resolution: Update the informative statement in the core specification about origin from “content document” to “EPUB”, and “domain” to “origin” (Wendy Reid)

Matt Garrish: +1

Matthew Chan: +1

Leonard Rosenthol: +1

Wendy Reid: +1

Brady Duga: +1

Toshiaki Koike: +1

Ben Schroeter: +1

Resolution #4: Update the informative statement in the core specification about origin from “content document” to “EPUB”, and “domain” to “origin”

Wendy Reid: that’s everything that was on the agenda tonight

Dave Cramer: i think i do have an action item to talk to TAG about the general ideas around epub security

Wendy Reid: there is most likely going to be a special session at the business group next week about WCAG3
… silver is going to be presenting to business group about WCAG3
… extending the invitation here
… i will send out meeting details on the mailing list
… WCAG3 calls out epub as a standard several times
… probably worth providing our feedback
… meeting date is Tues 233d, noon Boston time
… AOB?
… no? Thank you everyone, and thank you leonardr!


4. Resolutions