EPUB 3 Working Group Telco – 07 October 2022

Meeting minutes

<ivan> Date: 2022-10-07

wendyreid: first agenda item is viewport metadata

https://github.com/w3c/epub-specs/issues/2442

romain: I had a late review of new viewport spec
… wondering if it was too strict
… is it enough for reading systems?
… talking about viewports meta tag for FXL
… for reflowable we are covered (RS must ignore viewport)
… in FXL this is important because it sets ICB
… there's an EBNF grammar defined
… it's more constrained than what's in the CSS draft it comes from
… it's more constrained than what browsers/reading systems can extract info from
… so should we relax the grammar a bit
… or further specify how reading systems should extract useful values from it

ivan_: to pick up on what romain said
… the spec does say somewhere that if the viewport spec is wrong, the RS should use device height/width
… it's not only strict, it

<wendyreid> https://github.com/w3c/epub-specs/issues/2442

<ivan_> https://github.com/w3c/epub-specs/issues/2442#issuecomment-1271300202

ivan_: for example Apple can make sense of an invalid viewport tag
… what I propose is to relax the requirement to use device sizes when there is a problem
… and instead say the RS should warn the user/author
… and then it can do a best attempt to do something sensible
… it can try to extract a window size
… the only place where there's a different problem
… if you have a number of content files and forget to set the viewport for one of them
… today there is nothing there so you use device sizes
… the other possibility is to [1] warn user and [2] go back to previous spine element and use that viewport size

romain: I agree that the spec already says that device height/width should be used when faced with an invalid value
… so maybe relaxing RS spec is the way to go
… I'm not convinced RSs should warn users

<wendyreid> +1

romain: if the reading system can do its best to render something, or make up a value that's good enough
… as transparently as possible
… there's a case with a valid viewport that specifies the width several times
… this is valid per the grammar, but the spec doesn't say whether the RS should pick first/last/any
… do we need to say something in RS spec? If there are dupes, does it become invalid?

dhall_: I like your rec there
… I would lean toward should report errors to user
… some could be unrecoverable, like xml parsing errors
… or there could be dev mode and remotely inspect
… but most readers won't be interested
… so "should" warn the users
… on multiple width values, add something to spec to prefer first value found
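
[Editorial sketch] The "prefer first value found" idea could look roughly like the following in Python. This is only an illustration under stated assumptions: the function name `parse_viewport` is hypothetical, only positive integer `width` and `height` values are accepted, and the lenient splitting on commas or semicolons is an assumption, not the grammar in the spec.

```python
import re


def parse_viewport(content):
    """Leniently extract width/height from a viewport meta content string.

    Returns a (width, height) tuple of ints, or None when either is missing.
    The first usable occurrence of each property wins; later ones are ignored.
    (Hypothetical helper for illustration; not taken from any spec or RS.)
    """
    found = {}
    # Assumption: tolerate both "," and ";" as separators, plus stray spaces.
    for part in re.split(r"[,;]", content):
        name, sep, value = part.partition("=")
        name = name.strip().lower()
        value = value.strip()
        if not sep or name not in ("width", "height"):
            continue  # not a property this sketch cares about
        if name in found:
            continue  # a usable value was already seen: prefer the first
        # Accept only positive integers; keywords such as device-width are skipped.
        if re.fullmatch(r"[0-9]+", value) and int(value) > 0:
            found[name] = int(value)
    if "width" in found and "height" in found:
        return (found["width"], found["height"])
    return None
```

With this sketch, `parse_viewport("width=600,height=800,width=1200")` yields `(600, 800)`, and a string lacking a usable height yields `None`, leaving the caller to decide on a fallback.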

<romain> or even MAY warn the users

dhall_: separately, what to do when there isn't anything specced
… typically in FXL books you have the same page size throughout
… different page sizes present challenges to the RS
… I like the idea of falling back to the most recent valid viewport
… if nothing is defined, do you fall back to device dimensions or something else?
… because device dimensions change with orientation
… and what happens when dimensions change?
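
[Editorial sketch] Falling back to the most recent valid viewport could be expressed as below. This is a minimal Python sketch, not spec text: `resolve_viewports` and its `default` parameter are hypothetical names, and what the default should be (device dimensions or something else) is exactly the open question raised above.

```python
def resolve_viewports(declared, default=None):
    """Pick a viewport size for each spine item, in reading order.

    `declared` holds one entry per spine item: a (width, height) tuple, or
    None when the viewport is missing or could not be parsed.
    An item without a usable viewport inherits the most recent valid one;
    items before the first valid viewport get `default`.
    (Hypothetical helper for illustration only.)
    """
    resolved = []
    last_valid = default
    for size in declared:
        if size is not None:
            last_valid = size  # remember the newest valid viewport
        resolved.append(last_valid)
    return resolved
```

For example, `resolve_viewports([(600, 800), None, (1200, 800), None])` gives `[(600, 800), (600, 800), (1200, 800), (1200, 800)]`.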

wendyreid: as much as i like the idea of warning the user
… in the real world of RSs, telling the user that something is wrong with their viewport isn't helpful. There's nothing they can do.
… this puts an undue burden on the user
… I think this should be an epubcheck thing
… I like idea of preferring first value
… RSes do lots of things to optimize for speed when they render FXL
… we use first viewport value and don't reparse
… so we don't support changing viewport sizes

duga: agree with David and Wendy
… I would not say should warn the user, it's a bad idea
… and these are often kids' books :)
… we have end users and publishers
… might be helpful to talk about these two groups
… I want my publishers to know but not my users
… you can reject something with a bad viewport
… I'm agnostic on which to pick of multiple values
… I think we do support multiple page sizes

ivan_: we have multiple viewports in the spec

<rickj> +1 to Brady (et al.) and differentiating messaging between readers and publishers

ivan_: (different page sizes for different files)
… we can't change that
… if they don't implement that's ok
… for reporting, I understand that I don't want reports going to kids
… I like the safari analogy of David
… I wonder if we should say something in the spec
… from standards point of view it's ok to say reading systems should issue a warning
… matt and I can come up with a larger PR

dhall_: when I think of users, do you think of author and publisher as the same user?

ivan_: we don't specify that

dhall_: I can see a publisher being interested in results from epubcheck
… I can see that authors that write their own epubs would benefit from warnings from RS
… do we use term "May"?

ivan_: yes

dauwhe: We're talking a lot about the situation where the author messed up the viewport
… what choices the RS has in that case
… this sounds like EPUBCheck's job
… is the viewport parse-able
… compatibility issues, maybe it's a warning
… seems wrong to put this on the end user

wendyreid: if we reach any sort of consensus we don't want to put anything on the end user/reader
… we do want to warn the content creator (publisher/author)
… maybe we want to increase robustness of the section

<Bill_Kasdorf> I would recommend "EPUB creator" vs. "content creator" because the person writing the book created the content but not the EPUB

wendyreid: but we want requirements around validity of viewport
… RSes rely on epubcheck

duga: +1 to using may instead of should
… this section isn't a real world problem except there was no spec to reference
… we're not actually seeing real problems in FXL books

CharlesL: Q about checking viewport on mobile devices

dauwhe: viewport is property of EPUB not device, spec says how to adapt

RESOLUTION: Ivan to adjust language to reflect content creator/EPUBCheck responsibility for viewport

https://github.com/w3c/epub-specs/issues/2447

XML Security and internal parsed entities

MURATA: old issue; more than 20 years ago

<ivan_> https://github.com/w3c/epub-specs/issues/2433

MURATA: related to external parsed entities issue 2433
… this issue is about INTERNAL parsed entities
… doesn't use external identifiers or URLs
… these are defined in internal DTD subsets

https://en.wikipedia.org/wiki/Billion_laughs_attack
… if an internal reference references a different internal reference, things can get out of hand quickly
… some EPUB reading systems disregard the XML Recommendation and ignore internal parsed entities. This is non-conformant but avoids the security issue
… what should we do? Prohibit internal DTD subsets?
… which gets rid of internal parsed entities.
… can existing RSes be destroyed by malicious content?
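
[Editorial sketch] The growth described above is plain arithmetic: if each entity references the previous one `fanout` times, the expanded text multiplies by `fanout` at every level. A small Python illustration follows; the function name and parameters are hypothetical, and the numbers in the usage note follow the commonly cited shape of the attack (a 3-character base string, 9 levels, 10 references per level).

```python
def expanded_length(base_len, fanout, levels):
    """Length of the fully expanded text for a chain of nested entities.

    base_len: length of the innermost entity's replacement text
    fanout:   how many times each entity references the one below it
    levels:   number of entities stacked on top of the innermost one
    (Hypothetical helper; it computes a size and expands nothing.)
    """
    return base_len * fanout ** levels
```

`expanded_length(3, 10, 9)` is 3,000,000,000 characters, produced from declarations that fit in well under a kilobyte of source.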

ivan_: thanks Makoto for raising the issue
… the problem is that removing the possibility to use internal entities is something we can't do because of existing epub docs that might use this
… we cannot invalidate existing content per our charter

<ivan_> https://github.com/w3c/epub-specs/pull/2451

ivan_: part of a pull request that Makoto and I did together is a separate section
… in the security section of RS spec

<ivan_> https://raw.githack.com/w3c/epub-specs/makoto-xml-conformance-change/epub33/rs/index.html#security-privacy-recommendations

ivan_: which says that RSs should be aware of this problem and should deal with it.
… so we should draw attention to the issue

dauwhe: I don't think we can or should forbid internal entities
… the problem has been around 20 years and EPUB is still functional
… it's a security vulnerability, we may need to warn people in our security section
… ordering RSs to follow a specific algorithm might be overkill
… let's warn people

MURATA: Basically, not our problem

MURATA: this is not our problem :)

<Zakim> duga, you wanted to ask about known solutions

duga: this is a known problem in XML. Are there known solutions?
… do they have mitigations? Like in libxml?

"Defenses against this kind of attack include capping the memory allocated in an individual parser if loss of the document is acceptable, or treating entities symbolically and expanding them lazily only when (and to the extent) their content is to be used."
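
[Editorial sketch] The first defense in the quote above, capping what an expansion may produce, can be illustrated with a toy expander in Python. This is not how any particular XML parser or reading system works: `expand_capped`, `ExpansionLimitExceeded`, the simplified reference pattern, and the character budget are all assumptions made for the example.

```python
import re


class ExpansionLimitExceeded(Exception):
    """Raised when expanded text would exceed the configured budget."""


# Simplified pattern for &name; references (an assumption, not the XML Name rule).
_REF = re.compile(r"&([A-Za-z_][\w.-]*);")


def expand_capped(text, entities, max_chars):
    """Expand &name; references in `text`, aborting once output exceeds max_chars.

    `entities` maps entity names to replacement text, which may itself
    contain references. Unknown references are left untouched.
    """
    out = []
    total = 0

    def emit(chunk):
        nonlocal total
        total += len(chunk)
        if total > max_chars:
            raise ExpansionLimitExceeded(
                "expanded text exceeds %d characters" % max_chars
            )
        out.append(chunk)

    def walk(s, active):
        pos = 0
        for m in _REF.finditer(s):
            emit(s[pos:m.start()])  # literal text before the reference
            name = m.group(1)
            if name not in entities:
                emit(m.group(0))  # unknown reference: keep as written
            elif name in active:
                raise ValueError("recursive entity reference: " + name)
            else:
                walk(entities[name], active | {name})
            pos = m.end()
        emit(s[pos:])  # literal text after the last reference

    walk(text, frozenset())
    return "".join(out)
```

The budget is checked as each chunk is emitted, so the expander stops shortly after the limit is crossed instead of first materializing the whole result.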

ivan_: makoto may know
… makoto made some sample epubs, and we tested them
… I tested on Thorium and iBooks
… both reacted by rejecting the EPUB
… without identifying the problem
… we suspect they don't accept internal entities

MURATA: I'm not aware of general solutions
… XML WG was aware of this issue, but had no good solutions
… thorium doesn't reject publication, it ignores the entity but not the entire publication
… this behavior is non-conformant

dauwhe: I don't think it's entirely fair to say that this implementation is non-conformant because it's not deploying my vulnerability
… not fair to RS

wendyreid: it's like the viewport convo
… we want to avoid putting burdens on readers
… does the fail quietly solution work?
… is that a bad thing?
… I think because we are running out of time
… people should look at the PR
… there are other things in the PR
… then we move on

https://github.com/w3c/epub-specs/pull/2451

dauwhe: I think we add something to the security section
… it hasn't been a problem in the past, internal entities aren't usually a problem unless malicious
… what are the consequences of a malicious use
… the consequences seem small
… good enough to say "XML has a known vulnerability"

duga: I agree putting this in the security section, maybe mentioning other xml vulnerabilities

duga: any attack like this can jeopardize personal information
… although no one has bothered to do this

dauwhe: Some of this is also outside the scope of the WG, we can't ask for people to patch the OS they are running dev machines on

wendyreid: consensus: let's have people look at Ivan's PR
… comment on it
… ten minutes remaining
… two more topics

satellite specs

wendyreid: some of these are in our w3c repo
… multiple renditions, CFI
… some aren't
… which might be a good thing
… we should pull in region-based navigation, which has implementations
… are there any other that need attention?
… they will remain in idpf space if we don't move them

ivan_: we have to be precise
… should any of these be published as w3c note?
… CFI is in our space, I pulled in the existing idpf doc, turned into a format for w3c
… but it hasn't been published
… and we did have some discussion with brady etc
… publishing as a note would require a new section on the processing model
… it's more than editorial work

<Zakim> duga, you wanted to ask about CFI

duga: we should pull over CFI, unless I have to do more work
… it is the one that always comes up

<Bill_Kasdorf> Has the Locators group dealt with CFI?

rickj: ignoring extra work, we're five years into w3c owning epub
… I still see articles linking to IDPF
… is there an argument for moving them to the w3c domain?

dhall_: to comment on CFI, what's the remaining work?
… we could maybe get that work done

dauwhe: Just to Rick's question, I still think there's no intention of shutting down IDPF web links
… bad web practice
… we have lots of pointers to W3C-land
… I certainly understand the problem

ivan_: the epub cfi doc today is a spec of syntax
… what's missing is a processing model, of what the machine should do
… I cannot write that because I don't know enough
… if you want to write it I can bring it into the w3c world
… then it would be worth publishing

duga: that's what's needed
… I started to write that once and was told not to waste my time

dhall_: if there are examples of a parallel algo description that would help me

ivan_: do you have that, brady?

<MURATA> Who uses EPUBCFI now?

duga: I modelled it on something in html

ivan_: we did processing for publication manifest

<ivan_> https://www.w3.org/TR/pub-manifest/#manifest-processing
