EPUB 3 Working Group Telco — Minutes

Date: 2022-10-07

See also the Agenda and the IRC Log

Attendees

Present: Rick Johnson, Ivan Herman, Masakazu Kitahara, Dave Cramer, David Hall, Wendy Reid, Romain Deltour, bduga, Charles LaPierre, Brady Duga, Bill Kasdorf, Gregorio Pellegrino, Murata Makoto

Regrets: Matthew Chan, Tzviya Siegman, toshiakitoike

Guests:

Chair: Dave Cramer, Wendy Reid

Scribe(s): Dave Cramer, Wendy Reid

Content:


Wendy Reid: first agenda item is viewport metadata.

1. Some comments on the viewport meta tag (issue epub-specs#2442)

See github issue epub-specs#2442.

Romain Deltour: I had a late review of new viewport spec.
… wondering if it was too strict.
… is it enough for reading systems?.
… talking about viewports meta tag for FXL.
… for reflowable we are covered (RS must ignore viewport).
… in FXL this is important because it sets ICB.
… there’s an EBNF grammar to defined.
… it’s more constrained than what’s in the CSS draft it comes from.
… it’s more constrained than what browsers/reading systems can extract info from.
… so should we relax the grammar a bit.
… or further specify how reading systems should extract useful values from it.

Ivan Herman: to pick up on what romain said.
… the spec does say somewhere that if the viewport spec is wrong,, it should use device height/width.
… it’s not only strict, it.

Ivan Herman: https://github.com/w3c/epub-specs/issues/2442#issuecomment-1271300202.

Ivan Herman: for example Apple can make sense of an invalid viewport tag.
… what I propose is to relax the requirement to use device sizes when there is a problem.
… and instead say the RS should warn the user/author.
… and then it can do a best attempt to do something sensible.
… it can try to extract a window size.
… the only place where there’s a different problem.
… if you have a number of content files and forget to set the viewport for one of them.
… today there is nothing there so you use device sizes.
… the other possibility is to [1] warn user and [2] go back to previous spine element and use that viewport size.

Romain Deltour: I agree that the spec already set that device height/width should be used when faced with an invalid value.
… so maybe relaxing RS spec is the way to go.
… I’m not convinced RSs should warn users.

Wendy Reid: +1.

Romain Deltour: if the reading system can do its best to render something, or make up a value that’s good enough.
… as transparently as possible.
… there’s a caase with valid viewport but specifies the width several times.
… this is valid per grammar but don’t say if it should pick first/last/any.
… do we need to say something in RS spec? If there’s dupes then it becomes invalid?.

David Hall: I like your rec there.
… I would lean toward should report errors to user.
… some could be unrecoverable, like xml parsing errors.
… or there could be dev mode and remotely inspect.
… but most readers won’t be interested.
… so “should” warn the users.
… on multiple width values, add something to spec to prefer first value found.

Romain Deltour: or even MAY warn the users.

David Hall: separately, what to do when there isn’t anything specced.
… typically in FXL books you have the same page size throughout.
… different page size present challenges to the RS.
… I like the idea of falling back to the most recent valid viewport.
… if nothing is defined, do you fall back to device dimensions or something else?.
… because device dimensions change with orientation.
… and what happens when dimensions change?.

Wendy Reid: as much as i like the idea of warning the user.
… in the real world of RSs, telling the user that something is wrong with their viewport isn’t helpful. There’s nothing they can do..
… this puts an undue burden on the user.
… I think this should be an epubcheck thing.
… I like idea of preferring first value.
… RSes do lots of things to optimize for speed when they render FXL.
… we use first viewport value and don’t reparse.
… so we don’t support changing viewport sizes.

Brady Duga: agree with David and Wendy.
… I would not say should warn the user, it’s a bad idea.
… and these are often kid’s books :).
… we have end users and publishers.
… might be helpful to talk about these two groups.
… I want my publishers to know but not my users.
… you can reject something with a bad viewport.
… I’m agnostic on which to pick of multiple values.
… I think we do support multiple page sizes.

Ivan Herman: we have multiple viewports in the spec.

Rick Johnson: +1 to Brady (et. al.) and differentiating messaging between readers and publishers.

Ivan Herman: (different page sizes for different files).
… we can’t change that.
… if they don’t implement that’s ok.
… for reporting, I understand that I don’t want reports going to kid.
… I like the safari analogy of David.
… I wonder if we should say something in the spec.
… from standards point of view it’s ok to say reading systems should issue a warning.
… matt and I can come up with a larger PR.

David Hall: when I think of users, do you think of author and publisher as the same user.

Ivan Herman: we don’t specify that.

David Hall: I can see a publisher being interested in results from epubcheck.
… I can see that authors that write their own epubs would benefit from warnings from RS.
… do we use term “May”?.

Ivan Herman: yes.

Dave Cramer: We’re talking a lot about the situation where the author messed up the viewport.
… what choices the RS has in that case.
… this sounds like EPUBCheck’s job.
… is the viewport parse-able.
… compatibility issues, maybe it’s a warning.
… seems wrong to put this on the end user.

Wendy Reid: if we reach any sort of consensus we don’t want to put anything on the end user/reader.
… we do want to warn the content creator (publisher/author).
… maybe we want to increase robustness of the section.

Bill Kasdorf: I would recommend “EPUB creator” vs. “content creator” because the person writing the book created the content but not the EPUB.

Wendy Reid: but we want requirements around validity of viewport.
… RSes rely on epubcheck.

Brady Duga: +1 to using may instead of should.
… this section isn’t a real world problem except there was no spec to reference.
… we’re not actually seeing real problems in FXL books.

Charles LaPierre: Q about checking viewport on mobile devices.

Dave Cramer: viewport is property of EPUB not device, spec says how to adapt.

Resolution #1: Ivan to adjust language to reflect content creator/EPUBCheck responsibility for viewport.

2. XML security risk: expansion of internal parsed entities (issue epub-specs#2447)

See github issue epub-specs#2447.

See github issue epub-specs#2433.

Murata Makoto: old issue; more than 20 years ago.
… related to external parsed entities issue 2433.
… this issue is about INTERNAL parsed entities.
… doesn’t use external identifiers or URLs.
… these are defined in internal DTD subsets.

Dave Cramer: See description of the attack.

Murata Makoto: if an internal reference references a different internal reference, things can get out of hand quickly.
… some EPUB reading systems ignore rec of XML and ignore internal parsed entities. This is non-conformant but avoids security issue.
… what should we do? Prohibit internal DTD subsets?.
… which gets rid of internal parsed entities..
… can exissting RSes be destryeed by malicious content?.

See github pull request epub-specs#2451.

Ivan Herman: thanks Makoto for raising the issue.
… the problem is that removing the possibility to use internal entities is something we can’t do because of existing epub docs that might use this.
… we cannot invalidate existing content per our charter.
… as part of a pull request that makoto and I did together is a separate section.
… in the security section of RS spec.

Ivan Herman: See proposed text in the PR’s security section.

Ivan Herman: which say that RS should be aware of this problem and should deal with it..
… so we should draw attention to the issue.

Dave Cramer: I don’t think we can or should forbid internal entities.
… the problem has been around 20 years and EPUB is still functional.
… it’s a security vulnerability, we may need to warn people in our security section.
… ordering RSs to follow a specific algorithm might be overkill.
… let’s warn people.

Murata Makoto: Basically, not our problem.

Brady Duga: this is a known problem in XML. Are there known solutions?.
… do they have mitigations? Like in libxml?.

Dave Cramer: “Defenses against this kind of attack include capping the memory allocated in an individual parser if loss of the document is acceptable, or treating entities symbolically and expanding them lazily only when (and to the extent) their content is to be used.”.

Ivan Herman: makoto may know about libraries.
… makoto made some sample epubs, and we tested them.
… I tested on Thorium and Apple Books.
… both reacted by rejecting the EPUB.
… without identifying the problem.
… we suspect they don’t accept internal entities.

Murata Makoto: I’m not aware of general solutions.
… XML WG was aware of this issue, but had no good solutions.
… thorium doesn’t reject publication, it ignores the entity but not the entire publication.
… this behavior is non-conformant.

Dave Cramer: I don’t think it’s entirely fair to say that this implementation is non-conformant because it’s not deploying my vulnerability.
… not fair to RS.

Wendy Reid: it’s like the viewport discussion.
… we want to avoid putting burdens on readers.
… does the fail quietly solution work?.
… is that a bad thing?.
… I think because we are running out of time.
… people should look at the PR.
… there are other things in the PR.
… then we move on.

Dave Cramer: I think we add something to the security section.
… it hasn’t been a problem in the past, internal entities aren’t usually a problem unless malicious.
… what are the consequences of a malicious use.
… the consequences seem small.
… good enough to say “XML has a known vulnerability:.

Brady Duga: I agree putting this in the security section, maybe mentioning other xml vulnerabilities.
… any attack like this can jeopardize personal information.
… although no one has bothered to do this.

Dave Cramer: Some of this is also outside the scope of the WG, we can’t ask for people to patch the OS they are running dev machines on.

Wendy Reid: consensus: let’s look people look at Ivan’s PR.
… comment on it.
… ten minutes remaining.
… two more topics.

3. satellite specs.

Wendy Reid: some of these are in our w3c repo.
… multiple renditions, CFI.
… some aren’t.
… which might be a good thing.
… we should pull in region-based navigation, which has implementations.
… are there any other that need attention?.
… they will remain in idpf space if we don’t move them.

Ivan Herman: we have to be precise.
… should any of these be published as w3c note?.
… CFI is in our space, I pulled in the existing idpf doc, turned into a format for w3c.
… but it hasn’t been published.
… and we did have some discussion with brady etc.
… publishing as a note would require a new section on the processing model.
… it’s more than editorial work.

Brady Duga: we should pull over CFI, unless I have to do more work.
… it is the one that always comes up.

Bill Kasdorf: Has the Locators group dealt with CFI?.

Rick Johnson: ignoring extra work, we’re five years into w3c owning epub.
… I still see articles linking to IDPF.
… is there an arguement for moving them to the w3c domain ?.

David Hall: to comment on CFI, what’s the remaining work?.
… we could maybe get that work done.

Dave Cramer: Just to Rick’s question, I still think there’s no intention of shutting down IDPF web links.
… bad web practice.
… we have lots of pointers to W3C-land.
… I certainly understand the problem.

Ivan Herman: the epub cfi doc today is a spec of syntax.
… what’s missing is a processing model, of what the machine should do.
… I cannot write that because I don’t know enough.
… if you want to write it I can bring it into the w3c world.
… then it would be worth publishing.

Brady Duga: that’s what needed.
… I started to write that once and was told not to waste my time.

David Hall: if there are examples of a parallel algo description that would help me.

Ivan Herman: do you have that, brady?.

Murata Makoto: Who uses EPUBCFI now?.

Brady Duga: I modeled it on something in html.

Ivan Herman: we did processing for publication manifest: https://www.w3.org/TR/pub-manifest/#manifest-processing.


4. Resolutions