EPUB 3 Working Group vF2F, 2nd Day — Minutes

Date: 2021-05-28

See also the Agenda and the IRC Log

Attendees

Present: Brady Duga, Ivan Herman, Avneesh Singh, Wendy Reid, Toshiaki Koike, Deborah Kaplan, Gregorio Pellegrino, Ben Schroeter, Masakazu Kitahara, Tzviya Siegman, Ken Jones, Dave Cramer, Charles LaPierre, George Kerscher, Matthew Chan, Dan Lazin, Matt Garrish

Regrets:

Guests:

Chair: Dave Cramer, Wendy Reid

Scribe(s): Matt Garrish, Ben Schroeter, Charles LaPierre

Content:


1. HTML serialization

See github issue #636.

Continuation of the discussion on the first vF2F meeeting

Dave Cramer: time to talk about serialization of html5 - I used to be more of an advocate for this but reality is that this is going to require a fair amount of work for reading systems that depend on xml tools
… I don’t see strong demand for a change like this from the authoring side - authors are not asking for it

Brady Duga: last night we discussed this and my position is that yes we can do this but it’ll be a bit of work - we’re not hearing for demand for this from publisher so we’d rather spend the time on more important things
… thinking about self-publishers, while big publishers use xml, these publishers do more of their own work and using their own tools that generate html - not sure if it’s true as they don’t come to these meetings but should see if this is important to them

Tzviya Siegman: I agree with Dave. I think we need to defer this to another major version of epub. There are some things we’ve been thinking about doing in epubcheck that would make it easier to maintain. Some of that will be deferred until will have HTML epub.
… we talk a lot of things like identifiers and we don’t have that in epub but would be nice to have

Wendy Reid: self-publishing is a constituency we need to think about - we have a big community of them at kobo. I haven’t heard these publishers expressing difficulty with xhtml because they have tools to create epubs or getting outsourcers to do the work for them
… it is worth asking as the authors are innovative and see if they’re missing out on anything

George Kerscher: the word-to-epub add-in does an excellent job of making epubs. Google docs produces them but would be nice to have a facelift. Scholarly publishing content that is distributed in pdf is awful for disability. Wonder if the process of converting from pdf is easier or harder with xhtml.

Gregorio Pellegrino: what are the benefits of switching from xhtml to html - is it only non-closing tags?

Dave Cramer: some arguments are that it makes it easier to repurpose html content - most scripting libraries aren’t tested with xhtml so may work better with html
… it’s more annoying to have to change html pages to xhtml to make them valid for epub

Dan Lazin: this seems like the kind of thing we should defer to later - xhtml is losing popularity - we may be fine with it for a while but in the future we may need to change - put on a note on the issue to up-vote and explain why they need it so there is more info
… I came to epub as a self-publisher and I would say that it was not at all difficult to do xhtml - InDesign just does it for you but I cleaned up some things by hand - self-publishers may not be fully tech savvy but are willing to do things that we don’t expect

Ivan Herman: example: I made a script to translate W3C TR documents into EPUB. It was very easy to find HTML parsers but when I need to generate XHTML from it I get syntactically incorrect markup. I had to spend much more of time to find a library to convert the HTML to proper XHTML. The evolution of tools long term works against XHTML long term
… the doctype, the xml declaration, etc. - the small things from our point of view to make it proper xhtml are complex. Today we may not want to do this but we need to keep the issue open so the community is aware we have to make this change eventually

Brady Duga: most reading systems display their content as html, not xhtml. Moving to html removes an unnecessary intermediary step for us. I’m in favour of punting but we do this every single time

Tzviya Siegman: +1 to duga - we can’t keep kicking the can

Avneesh Singh: from a management perspective with only 6 months remaining we need support in reading systems and epubcheck. we should have compelling reason to move on this now given this timeline.
… it looks safer not to do HTML at this time. we don’t need to drop or delay it but we can move it to the community group to incubate it and get momentum. will give us a longer timeline to research before the next revision

Charles LaPierre: +1 to Avneesh suggestion to punt to CG.

Ken Jones: the self-publishers I’m aware of are not writing any code - they use tools so they are not pushing for this. they will wait for their tools to update and use whatever is available

Dan Lazin: agree with Brady that most reading systems are using html internally - we could relax validation standards so the document is supposed to be xhtml but we allow syntactic invalidities - authors just want to dump html into the body, they don’t care about the head
… most reading systems will take whatever you throw at them so not complicated to display what is in the body

Ivan Herman: I would be scared to describe formally this kind of looseness. On the other hand the we could say that reading systems spec could allow them to use html5 but formally the content spec says you must use xhtml5

Brady Duga: I don’t think this would solve the problem - reading systems still process the xhtml as xml prior to display so having html in the body would cause us to reject

Dan Lazin: what I’m thinking is to use an attribute on the body tag so you could fork the pipeline or skip the step that requires xml conformance

Dave Cramer: I’ve gotten the sense from all the discussions that we have kicked this can for a long time and is not ideal but I think we’ve had good reasons for doing that. I don’t see eagerness to change without a more compelling reason that we supplied so far
… as Dan said, xhtml becomes less viable with time. I think at some point we have to do a bunch of changes at once - EPUB 4 - we’re constrained on both sides by EPUB 3 and the inability to change requirements
… we’re improving the spec but we’re not changing how epub works
… we could also remove all the deprecated and unwanted parts that exist today

Tzviya Siegman: we tried that in 3.1 and it didn’t go well
… we’d have to solve the namespace issue, etc. - something we’d all like to do but now is not the time
… maybe this is a community group topic - not html serialization but issue like how to get rid of epub:type, etc.

Dave Cramer: we have the goal of making EPUB 3.3 a W3C REC but do we want to persist this incremental backwards compatible mode for 3.4, 3.5, etc. - we need to make a break at some point and not just have working groups to make editorial changes

Avneesh Singh: looking at W3C culture, a lot of work begins in the CG before getting a WG to formalize

Wendy Reid: I want to have some of these ideas incubated as we’re running into the limits of EPUB 3 compatibility. we need to go big at some point.
… I think we need a clear goal for the CG not just incubate ideas

Dave Cramer: are we prepared to defer this issue and hand it off to the CG to look at the long-term future of epub?

George Kerscher: is there a reason to keep the ncx?
… I know why it got in there so that US publishers could provide it

Ivan Herman: we can’t do anything about people including it - not part of epub 3

Gregorio Pellegrino: reading systems use it for compliance to allow old readers to open epub 3

Brady Duga: forbidding it doesn’t help me as I have legacy epubs with it
… means more bookkeeping in the ingestion pipeline

Dave Cramer: would be different in an epub 4

Ivan Herman: you said defer this - what do you mean? are we closing in github for now or are we keeping it open for next version?

Dave Cramer: github issue are easy to find with labels. I would push-to-CG-closed label on it

Proposed resolution: Close issue 636, move discussion to the community group (Wendy Reid)

Deborah Kaplan: +1

Dan Lazin: +1

Ivan Herman: +1

Wendy Reid: +1

George Kerscher: +1

Gregorio Pellegrino: +1

Ken Jones: +1

Tzviya Siegman: +1

Matt Garrish: +1

Brady Duga: +1

Avneesh Singh: +1

Ben Schroeter: +1

Dave Cramer:

Charles LaPierre: +1

Ivan Herman: in the earlier discussion last night there was a similar agreement and partial vote yielding the same result.

Resolution #1: Close issue 636, move discussion to the community group

2. Satellite Specs and EPUBCheck

Continuation of the discussion on the first vF2F meeeting

Tzviya Siegman: romain had some comments about epubcheck
… had some comments about the satellite specs - epubcheck supports edupub, indexes and dictionaries and partial support for scriptable components, multiple renditions, and previews

Dave Cramer: concerns are about reading system support and author use - some of these specs do nothing more than extend markup patterns and epub:type

Dan Lazin: do we consider epubcheck to be a reading systems for the purpose of candidate recommendation

Dave Cramer: no. it is purely a validator
… you can use the features but nothing supports them and user don’t benefit from them

Ivan Herman: can someone tell me how many documents we are talking about?

Dave Cramer: something like 10 or 12

Ivan Herman: if we republish all of them as w3c notes it is a non-trivial and boring amount of work - they all have to be transformed to respec
… they will also have to be formally approved by the group and by the w3c director
… it’s doable but we need to make sure it’s worth it

Avneesh Singh: an alternative way might be to have a member submission?

Ivan Herman: the editorial work is the same

Brady Duga: decision last night was to let ivan and matt decide what to do - whatever makes the most sense from a bookkeeping perspective

Dave Cramer: and communicating to users that implementing this stuff is pointless

Ivan Herman: process-wise it might be easier to publish them as CG notes - still have to convert but less publishing work

Dan Lazin: one of the intents is to make sure users find the right document - with specs in both idpf and w3c it’s hard to know which to review

George Kerscher: I agree with Dan that as long as these are in a place where they can be repurposed in the future and without confusing people about whether they are in 3.3 is a good thing

Ivan Herman: if the issue is to find them easily then the working group note is the best solution as it will go in the TR library of documents
… let me and the chairs talk to Ralph to see if there is a way to publish these without going the full formal route.

Dan Lazin: as these are not important spec maybe we can farm these out to other people to work on - not important to have these done on the same timeline as 3.3

Ivan Herman: the idpf documents may or may not be updatable - need some way to point people to the new docs

3. Issues with epubReadingSystem object

See github issue #716.

dawhe: not sure what problem we need to solve right now

Ivan Herman: we have a resolution saying scripting is at risk. What more should we do at this point?

Dave Cramer: we have learned that real -world support is spotty, so hard to address. Should we move on?

4. testing

See github pull request epub-tests#20, #1685.

Dan Lazin: See Spreadsheet on statements to test

Dan Lazin: state of testing: some are easy, many are tricky. I put together a spreadsheet that breaks down the rs spec at the time into testable normative statement.
… roughly 200 tests that need to be written; we are through about 40 - many of the MUSTS
… what we need is a ‘testing guide’ that organizes the tests so that people can jump in and contribute
… how to name, organize, track the tests
… we should use data tests attribute for test repo
… here are pull requests
… we could discuss whether to approve and merge the second PR here
… then I can write up a document explaining how to participate in testing
… having trouble in some cases finding reading systems viable to support tests
… I feel behind on testing; we need to chip away at these

Ivan Herman: need to think about how to report results in a consistent format - some sort of data file that includes metadata about the tests
… and we need advanced structure on how to report the testing result
… for audiobooks we had metadata that we could use to produce readable reports

Ivan Herman: See Audiobook test results

Dan Lazin: we are going to have a lot more failures than audiobooks testing

Ivan Herman: CR has to prove that spec is implementable through >2 cited implementations
… not worried about lots of failures, conceptually
… sometimes we get requests to retest after the publication as implementations catch up

dawhe: one example is manifest fallbacks, which is not widely supported across reading systems

Dan Lazin: maybe we need a regular testing meeting to help answer questions as they come up
… for example, SVG requirement is hard to test thoroughly

dawhe: I would attend such a meeting

Tzviya Siegman: we need help putting EPUB 3.3 into epubcheck

Matt Garrish: I’ve been doing this as we go along

Ivan Herman: what is the role of EPUBcheck in our testing strategy? In some sense it’s an implementation
… epubcheck checks content, not RS
… what does the CR mean for the content document?
… with a declarative spec, the approach to testing is to see whether those terms and features are used in the community
… are fallbacks implemented or not? do publishers use those features effectively?
… we should have evidence for each feature in the package document to justify it to be included in the spec

Matt Garrish: we have a lot of features that are rarely used, or only used by a single feature. Does that mean we should deprecade stuff that isn’t in wide use?

Ivan Herman: maybe we have to have a list of those features where we have doubts that they are widely used
… and ask the community in advance before we deprecate

Matt Garrish: we did a survey of publishers for 3.1

Ivan Herman: we are obliged to honor backwards compatibility, but don’t have to keep features that nobody uses

George Kerscher: pronunciation task force is putting forth a new proposal; we should harmonize our requirements with theirs

Wendy Reid: we don’t know if manifest fallbacks are used by anyone - should we do a survey across publishers and reading systems for this type of questionable feature?

Charles LaPierre: if we can’t find implementations, we could deprecate those because we can’t find 2 implementations
… for Dan’s tests, epubtest.org might already have tests or you could use those as a blueprint

Tzviya Siegman: we did an investigation for epub cfi to see if they were in use in Google books
… so it is possible to this sort of survey

Matt Garrish: it gets complicated when we drop specs from authoring but not reading systems or vice-versa so it would have to have zero support anywhere to have no impact

Tzviya Siegman: +1 mgarrish

dawhe: it’s a disservice to authors if they have stuff in their content that we know doesn’t work
… would rather push a little to far and get blowback than not do anything

Charles LaPierre: new developers have to support legacy code; we shouldn’t keep supporting if not necessary

Matt Garrish: if we drop fallbacks, are images in the spine valid? there is a ripple effect to getting rid of stuff

George Kerscher: mathml continues to be poorly implemented, but that’s a core component of something we really need and support will slowly improve over time

Ivan Herman: there is no risk for mathml even though implementation needs to be improved

Dan Lazin: if we have a weekly meeting we can go through these topics. epubtest repo had a nested hierarchy - does this organization make sense or should it just be flat?

Dave Cramer: fine with me

Wendy Reid: sounds like we have agreed to a regular testing call. We can discuss organization of tests and how to formulate results.

Ivan Herman: Dan, you own the repository and can organize it the way you want.
… re: the SVG case, this applies to testing html and CSS as well. we need to clearly position ourselves that the reading systems must be compliant with html and css and we are not in a position to test that.
… we can only rely on the fact that reading systems render html

Brady Duga: there is a large provider that uses a bunch of different ways to render

Matt Garrish: do you want us to review the PR?

Dan Lazin: I would like people to review the individual tests to validate my interpretation of the spec
… for the PR, is this the right direction? are we polluting the spec?

Matt Garrish: so the specs will be coupled with the tests

Dan Lazin: yes
… will set of Doodle poll for testing meeting times
… need to be an owner of the spec to approve pr 1685
… would like to do a branch in the test repo

5. What does it mean to “support” a foreign resource?

See github issue #1464.

Continuation of the discussion on the first vF2F meeeting

Dave Cramer: came about from testing.
… json with HTML fallback, all normal EPUB stuff
… when looking at it in an EPUB Reading system, but they just displayed the raw text.
… weird behavior. Spec says if a RS does not know about or support something it should use the fallback, but I didn’t see this.
… JSON as raw text, or as a tree view, etc..
… This shocked me that the core media types if not supported should default to the fallback but that doesn’t seem to be happening.

Ivan Herman: So what? 😀
… I can imagine that I write a paper which relies on a bunch of data files, and xml or csv data etc. It may not be nice but we just ack that and leave it.

Dave Cramer: write something in doc book but may not render in all systems so you put in the HTML as a fall back. but that didn’t seem to happen.

Ivan Herman: is it a problem if a RS displays the data directly? having a fallback would be a nice idea but maybe have to accept it. Is it ok to put a docbook in the spine, if I link to it or put it in the spine are two separate issues.

Dave Cramer: RS offering to download a some mimetype seems really bad.

Matt Garrish: Why do we care what renders in spine items? Offering guidance.
… seems its not working as intended, but we don’t want to reproduce PDF.
… EPUB was trying to do better. we can look at these issues separately. Why are we concerned about Core media types. what do we say about the spine. we are an accessible text format.

Wendy Reid: DMG file is a massive security issue, that an issue that RS would handle on their side. If a RS detects an executable within an EPUB it should do something. But we don’t want to spec that executables be included in the spine that may open up a can of worms.

Ivan Herman: It must be part of the security section in the RS document if you are a RS be careful about downloading a binary file. EPUB check could react to this. But the manifest fallback is ignored anyways… so it may not make sense there. EPUBCheck may not complain.

Brady Duga: Yes core media types are important and maybe the RS are already supporting them and not worry about fallbacks.
… here is this fundamental thing in the spec and core principle and Dan tested it and it seems it isn’t.

Matt Garrish: I have no issue getting rid of fallbacks. Spine XHTML / SVG, but we could make it a formal allowed spine item. But there could be neat stuff and our fallbacks don’t work. We are just calling it a failed feature of EPUB. We don’t really want to have them all in the spine just the two of them.

Wendy Reid: agreed.

Brady Duga: like ChemML and you may want to use those or have a fallback with a static page, but everyone just uses HTML & CSS so I think we weren’t originally sure but we didn’t want to limit people.

Matt Garrish: Manifest fallbacks with replacements for HTML, there is some really bad manifest fallbacks and HTML has improved to solve that, we have the picture element etc. So sounds like its all irrelevant.

Ivan Herman: spine restricted to XHTML & SVG, and epubCheck would shout at me so thats ok. Not having a fallback there epubCheck will shout at me. So for files which refer to from html, maybe something musicML and have this ML and they may have their own special renderer, but they should be able to use that without, … so what would be the fallback for MusicML so I link to HTML and for valid reasons and I render it via some extra scripts; I cannot rely on fallbacks since I cant rely on something else doing this for me.
… HTML gives me this, but EPUB version have a fallback why would I do that?

Brady Duga: I was going to do some more research on playbook side.

Wendy Reid: ingested versions vs. side loaded.

Proposed resolution: Close issue 1464, change language in spec around manifest fallbacks to restrict to xhtml and svg in the spine and rely on HTML conventions (Wendy Reid)

Proposed resolution: Close issue 1464, change language in spec around manifest fallbacks to restrict to xhtml and svg in the spine and rely on HTML/SVG conventions (Wendy Reid)

Matt Garrish: wonders, restricting to XHTML/SVG are we deprecating? Can’t remove for manifest fallbacks.

Proposed resolution: Close issue 1464, change language in spec to deprecate manifest fallbacks and recommend xhtml and svg in the spine and rely on HTML/SVG conventions (Wendy Reid)

Avneesh Singh: +1 Brady, it is a big change

Brady Duga: feels like a really big change, I am hesitant to do it. To understand the nuances.

Dan Lazin: we talked about things to talk about at risk, add this?

Avneesh Singh: +1 Tzviya, there is a long tail

Tzviya Siegman: trying to think about 1000’s of backlist titles, we don’t update those. If I need to do an update it may be a more significant update, and what a retailer but this may be a breaking change.

Wendy Reid: this doesn’t break, just deprecate it so moving fwd. Legacy content using this is not really working anyways

Ivan Herman: If we say deprecated what does EPUBCheck do today?

Matt Garrish: Issue a warning

Ivan Herman: that could scare off a publisher right?

Matt Garrish: Yes

Ivan Herman: what can we use?

Matt Garrish: “Strongly Encourage”

Ivan Herman: I understand what Brady says, in the meantime, Matt & I can come up with a PR what that means in the spec.
… not merge the PR and Brady can do his research, and not trigger adverse reaction from EPUBCheck.

Wendy Reid: warning is not a bad thing… if Publisher is scare of that, what are publishers doing I will look at the warnings or errors it may be inconvent but they are looking at them. If they are seeing warning, its not a big deal, at least for me.

Tzviya Siegman: warnings and errors for some reading systems won’t accept EPUBs even f they have warnings. unfortunately they are at the same level.
… some organizations regard warnings as errors
… I can’t publish with warnings.

Avneesh Singh: +1 Matt and Ivan, start working on the language

Charles LaPierre: we had this discussion before, we really need to go to those RS and shame them or inform them to allow warnings, because otherwise we shouldn’t use warnings since its the same thing as an error.

Matt Garrish: we shouldn’t resolve anything now, I will work with Ivan to come up with new language on this.

Brady Duga: Dave has some sample books.

Wendy Reid: we can do a little more testing.


6. Resolutions