EPUB 3 Working Group vF2F — 1st Day — Minutes

Date: 2020-10-22

See also the Agenda and the IRC Log


Present: Toshiaki Koike, Wendy Reid, Shinya Takami, Masakazu Kitahara, Zheng Xu (徐征), Ben Schroeter, Laura Brady, Marisa DeMeglio, Brady Duga, Matthew Chan, Garth Conboy, George Kerscher, Juliette McShane, Yu-Wei Chang (Yanni), Yutaka Suzuki


Guests: Dave Cramer

Chair: Wendy Reid, Shinya Takami (高見真也)

Scribe(s): Garth Conboy, Wendy Reid, Matthew Chan, Laura Brady


Dave Cramer: Welcome
… Some technical issues first.
… Agenda:

  1. Welcome/Overview of Topics (10 min)
  2. Navigation Order Issue #1283 [1] (30 min)
  3. External Entities Issue #1338 [2] (30 min)
  4. Break (15 min)
  5. Testing EPUB (60 min)
  6. FXL Accessibility (25 min)
  7. Wrap Up/AOB (5 min)

Wendy Reid: Like to finish with accomplishments; closing initial issues (Nav order, and external entities); progress on FXL A11Y & Testing.

Dave Cramer: And volunteers to work on same.

1. Must navigation order match spine order?

Dave Cramer: https://github.com/w3c/epub-specs/issues/1283

Dave Cramer: Must navigation order match spine order?
… Required by EPUB 3, only recently enforced by epubcheck.
… Causing existing content to start failing.
… EPUB 3 CG: make it a warning rather than error; out of alignment with check.
… MUST in the spec is an error; SHOULD is a warning (in epubcheck).
… Dave reads relevant portion of EPUB 3.2 spec.
… Why is this a problem? Existing content doesn’t conform. Perhaps valid editorial reasons.
… This, though, may be an A11Y concern. At least one RS stumbles on difference between spine and nav order.
… Should we keep or relax the restriction?
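
To make the rule under discussion concrete, here is a minimal Python sketch of a nav-versus-spine order check. It is not how epubcheck implements the check; the function name `nav_order_violations` and the file names are hypothetical, and it compares document order only, ignoring the order of fragments within a single document.

```python
def nav_order_violations(spine, nav_targets):
    """Return nav targets that point backwards relative to spine order.

    spine: list of content document hrefs in spine order.
    nav_targets: list of hrefs (optionally with #fragment) in nav order.
    """
    position = {href: index for index, href in enumerate(spine)}
    violations = []
    furthest = -1  # highest spine position reached so far
    for target in nav_targets:
        document = target.split("#", 1)[0]  # drop any fragment
        index = position.get(document)
        if index is None:
            continue  # target is not a spine item; outside this sketch
        if index < furthest:
            violations.append(target)
        else:
            furthest = index
    return violations


# Hypothetical example: the nav lists chapter 3 before chapter 2.
spine = ["ch1.xhtml", "ch2.xhtml", "ch3.xhtml"]
nav = ["ch1.xhtml", "ch3.xhtml#recipes", "ch2.xhtml"]
print(nav_order_violations(spine, nav))
```

Under the MUST wording a non-empty result would be reported as an error; under a SHOULD it would be a warning.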

Shinya Takami (高見真也): Presentation to explain issue.
… EPUB 3.0 required spine/nav to match.
… Request de-regulation of current restrictions.
… Compatibility with existing content and even paper books.
… Priority or grouping maybe more important than reading order
… Yes, A11Y could be an issue; order of story should be consistent.
… Examples
… Manga magazine example
… Cookbook example
… (multiple navs for single book)
… Magazine of novels example
… These all “violate” the spec in their paper form; want to do the same in valid EPUB 3.2 files.

Brady Duga: Is this the right way to do multiple-TOCs?
… Is this the right time to make such a change?
… Columns could be done via nesting.
… Sympathetic to existing content using the non-matching feature.
… Maybe relax restriction and make it something caught by an A11Y checker.
… Not too many folks will write reverse Moby Dick.
… maybe a SHOULD or remove.

Dave Cramer: Don’t get why nav is so important; isn’t spine order used?

Brady Duga: nav to next chapter is done by Nav.
… Don’t need it for next page (that’s spine order).
… N pages left if chapter comes from Nav — out of order, could be an issue.

Dave Cramer: Other RS’s?

George Kerscher: Small portion of text, reading order is very important.
… Like a “this is poison” warning next to paragraph.
… History comes from audiobooks, always linear.
… Nav in docs – some don’t make sense from beginning to end.
… We are happy to remove restriction.
… Folks need to understand how they use the TOC. Cookbook is a fine example. Multiple TOCs make sense there.

Dave Cramer: Nav docs are interesting as there may be multiple Navs, one is primary.

Wendy Reid: Chairs hat: we do have a proposed solution.
… Chairs hat off: RS’s have to fall back to spine due to incomplete/confusing Nav docs (like ncx and Nav).
… Spec needs more verbiage around Nav.xhtml and RS behavior.
… Some clarification is required regardless of approach.

Zheng Xu (徐征): Nav is displayed in RS TOC.
… Random pointers into Spine.
… Examples (like multi-TOC) could be pages in the book.
… RS’s need the high level info from the Nav file.
… Linear order or otherwise?
… Maybe spine be the linear order, let Nav be otherwise.

Garth Conboy: I suspect the proposed solution is to make the MUST a SHOULD; I am ok with this
… Maybe just make the MUST a SHOULD.

Shinya Takami (高見真也): +1
… Linear order should be ordered by spine.

Brady Duga: Last time we talked about this, we invented tours; let’s not do this again.
… Multi-TOCs could be in content.
… Okay with MUST->SHOULD

Dave Cramer: Propose MUST->SHOULD. Can be more MUSTy in ACE.

Shinya Takami (高見真也): +1

Dave Cramer: This approach would leave epubcheck unchanged, making many folks happy.

Dave Cramer: Does this work?

Proposed resolution: The nav order SHOULD match the spine order and the content order (Dave Cramer)

Zheng Xu (徐征): +1

Toshiaki Koike: +1

George Kerscher: +1

Yu-Wei Chang (Yanni): +1

Laura Brady: +1

Masakazu Kitahara: +1

Wendy Reid: +1

Garth Conboy: +1

Ben Schroeter: +1

Brady Duga: +1

Matthew Chan: +1

Marisa DeMeglio: +1

Dave Cramer: All votes in favor, let’s resolve!

Resolution #1: The nav order SHOULD match the spine order and the content order

2. Problem with XML content requirements and external entities: make MathML and SVG1.1 invalid

Dave Cramer: https://github.com/w3c/epub-specs/issues/1338

Dave Cramer: https://github.com/w3c/epubcheck/issues/1114

Dave Cramer: External entities.

Dave Cramer: https://www.w3.org/publishing/epub32/epub-spec.html#confreq-xml-extmarkupdecl

Dave Cramer: Lots of XML statements in specs:

External identifiers MUST NOT appear in the document type declaration [XML].

Dave Cramer: This is a problem.
… Breaks normal things — like inline SVG with DOCTYPE.
… I think that’s bad.
… Forcing folks to edit machine created content seems like a waste of effort.
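
As an illustration of what the quoted requirement forbids, here is a minimal Python sketch that flags a document type declaration carrying an external identifier. It is a rough regular-expression check under stated assumptions, not a real XML parser and not epubcheck's implementation; the names `EXTERNAL_ID` and `has_external_identifier` are hypothetical.

```python
import re

# Matches a document type declaration that carries an external identifier,
# i.e. the PUBLIC or SYSTEM keyword right after the root element name.
EXTERNAL_ID = re.compile(r"<!DOCTYPE\s+[^\s\[>]+\s+(PUBLIC|SYSTEM)\s")


def has_external_identifier(xml_text: str) -> bool:
    """Return True if the text contains a DOCTYPE with PUBLIC or SYSTEM."""
    return EXTERNAL_ID.search(xml_text) is not None


# The standard SVG 1.1 DOCTYPE, as commonly emitted by authoring tools.
svg_doc = (
    '<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" '
    '"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">\n'
    '<svg xmlns="http://www.w3.org/2000/svg"/>'
)
# A DOCTYPE with no external identifier.
html_doc = '<!DOCTYPE html>\n<html xmlns="http://www.w3.org/1999/xhtml"/>'

print(has_external_identifier(svg_doc), has_external_identifier(html_doc))
```

A pattern like this can misfire on DOCTYPE-like text inside comments or CDATA sections, so it only illustrates the shape of the rule.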

Brady Duga: Agree with the summary and history

Dave Cramer: EPUB RS’s are not validating SVG’s with a DOCTYPE against that DTD.
… Should not expect an RS to run a validator.
… Hope RS’s don’t need to be validating XML parsers.
… Maybe just say “ignore DOCTYPE’s”: don’t use content that depends on processing DTD features.
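
A small Python sketch of the "ignore the DOCTYPE" behaviour: the standard library's `xml.etree.ElementTree` is a non-validating parser that does not fetch the external DTD, so a document with an SVG 1.1 DOCTYPE parses normally as long as it does not rely on DTD features such as entities declared there. The function name `root_tag_ignoring_dtd` is hypothetical, and this says nothing about how any particular reading system behaves.

```python
import xml.etree.ElementTree as ET

SVG_WITH_DOCTYPE = (
    '<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" '
    '"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">\n'
    '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 10 10">'
    '<title>Test image</title>'
    '</svg>'
)


def root_tag_ignoring_dtd(xml_text: str) -> str:
    """Parse without validation and return the root element's tag."""
    # ElementTree checks well-formedness only; the DTD named in the
    # DOCTYPE is neither retrieved nor used for validation.
    root = ET.fromstring(xml_text)
    return root.tag


print(root_tag_ignoring_dtd(SVG_WITH_DOCTYPE))
```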

Brady Duga: Stunned if anybody other than epubcheck is doing said validation.

Wendy Reid: +1 we do this too

Dave Cramer: Assuming WV’s are doing no such validation.

Brady Duga: What is this XML of which you speak?
… RS’s likely display HTML, not XML, content.
… Natively not XHTML in RS’s.

Wendy Reid: +1

Dave Cramer: Principle in HTML that we should try to match reality — maybe we should do the same in here?

Zheng Xu (徐征): +1 to not restrict

Dave Cramer: Resolve to stop restricting DTDs in XML documents that are part of an EPUB.

Garth Conboy: I think we should try it!

Wendy Reid: In line with what Dave has said: try it for now. It will come down to testing too.
… Can cross that bridge when we come to it.

Proposed resolution: remove bullet “External identifiers MUST NOT appear in the document type declaration [XML].” Add reading system conformance saying that reading systems SHOULD NOT validate against DTDs. (Dave Cramer)

Garth Conboy: +1

Wendy Reid: +1

Dave Cramer: RS’s have no validation requirement against DTDs that may be found.

Shinya Takami (高見真也): +1

Brady Duga: +1

Yu-Wei Chang (Yanni): +1

Ben Schroeter: +1

Wendy Reid: If you use inline SVG’s out of ID (and other commercial tools) you won’t get unexpected errors.

Matthew Chan: 0

Wendy Reid: Found W3C listing DTDs that should be used; seems strange to prohibit.

Laura Brady: +1

Wendy Reid: No objection; relax restriction on external identifiers in DTDs.

Resolution #2: remove bullet “External identifiers MUST NOT appear in the document type declaration [XML].” Add reading system conformance saying that reading systems SHOULD NOT validate against DTDs.

3. Testing

Dave Cramer: moving on to testing
… what we have to do is go through all the specs
… mark where it says rs “must” and then make a test
… then try that test in all rs and document what the result is, conformed or not
… i’ve started making little test epubs
… in package doc, rs must trim leading and trailing white space in dc-title, dc-creator etc.
… made an example epub to test this, opened it in ibooks
… ibooks didn’t seem to trim white space from package file
… just as an example
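
The white space example can be sketched in Python as follows. This is an assumed illustration, not the actual test epub described above: the package document fragment, its identifier, and the function name `expected_display_values` are all hypothetical. It computes the values a conforming reading system would be expected to show after trimming.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
XML_WHITESPACE = " \t\r\n"  # the white space characters defined by XML

# Hypothetical package document fragment with padded metadata values.
PACKAGE_DOC = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="pub-id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="pub-id">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
    <dc:title>
        Whitespace Trimming Test
    </dc:title>
    <dc:creator>   Test Author   </dc:creator>
    <dc:language>en</dc:language>
  </metadata>
</package>"""


def expected_display_values(package_xml: str) -> dict:
    """Return title and creator with leading/trailing white space removed."""
    root = ET.fromstring(package_xml)
    values = {}
    for name in ("title", "creator"):
        element = root.find(f".//{{{DC_NS}}}{name}")
        if element is not None and element.text is not None:
            values[name] = element.text.strip(XML_WHITESPACE)
    return values


print(expected_display_values(PACKAGE_DOC))
```

A tester would compare these expected values against what the reading system actually displays in its library or title bar.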

Brady Duga: how is this different from previous testing that was done? Before it turned out it was difficult to make use of this data

Dave Cramer: epubtest.org, described as caniuse, but for epub
… goal was to answer “can i use” different aspects of epub in each rs
… bit of a focus on higher level features, media type support, types of css, etc.
… not as much testing of foundations

Marisa DeMeglio: What we tried to test: https://daisy.github.io/old-epub3-support-grid/features/

Dave Cramer: in css there is infrastructure to automate testing, they have 1000s of tests
… but how could we automate testing for rs? Most rs support little/no scripting
… but epubtest.org was trying to be comprehensive in testing a lot of rs, but we only require 2 independent implementations
… we could get away with not testing everything
… although we’d want all the features to work for all the major rs

Wendy Reid: i’ve done this once before with a different spec
… one difference vs. epubtest
… when i did epubtest we retested very frequently, like with every os update
… but with what we are doing now, we only need to test once, at a particular point in time
… not so worried about rs conformance after
… also, the spirit of epubtest was to make it so that content authors could see which features they could use for a given rs
… some of the stuff they tested was very very specific
… but for our tests, we’re only testing certain assertions, fewer tests
… only concerned with the assertions in the spec, required to meet CR requirements

Garth Conboy: is the idea that we’d end this process with a collection of test epubs?

Dave Cramer: we should put some thought into how we build tests
… instead of having a bunch of independent epubs, maybe have a more upstream solution that could be scripted
… maybe make a core file that contains all the assertions, and then build a structure around it?
… multiple epubs means more maintenance for keeping all the test epubs up to date

Wendy Reid: https://w3c.github.io/publ-tests/test_reports/manifest_processing/index.html

Wendy Reid: our final report might look something like a grid of all the assertions, tested against all the major rs
… we’d be looking for something like at least two passes for each assertion
… we’re looking to see that
… 1. the spec is actually possible, and can be implemented
… 2. and that the spec is understandable, and actually makes sense
… i.e. whether the assertions both actually make sense, and are actually testable
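
The "at least two passes per assertion" idea can be sketched in Python. The assertion ids, reading system names, and results below are invented for illustration, and `assertions_lacking_passes` is a hypothetical helper, not part of any existing report tooling.

```python
def assertions_lacking_passes(results, required=2):
    """Return assertion ids with fewer than `required` passing reading systems.

    results maps assertion id -> {reading system name: passed (bool)}.
    """
    lacking = []
    for assertion, by_reading_system in results.items():
        passes = sum(1 for passed in by_reading_system.values() if passed)
        if passes < required:
            lacking.append(assertion)
    return lacking


# Invented sample grid: rows are assertions, columns are reading systems.
sample_results = {
    "trim-metadata-whitespace": {"rs-a": False, "rs-b": True, "rs-c": True},
    "nav-order": {"rs-a": True, "rs-b": False, "rs-c": False},
}
print(assertions_lacking_passes(sample_results))
```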

Marisa DeMeglio: epubtest has 2 separate lives
… the old version was really really hard to maintain, to have the test results updated regularly
… it was volunteer run
… new version of epubtest dedicated to accessibility
… now we have staff who are dedicated to moderation of testing, they are experts in what is being tested
… accessibility testing is very specialized, so manual testing is effective
… testers can leave notes for things that can’t just be quantified as a checkmark in a box
… testing via multiple books allowed the manual tests to be distributed to different people
… maintaining the test books isn’t fun, but it’s possible (maybe we could use scripts for this)

Zheng Xu (徐征): maybe we don’t need to rush into automation testing
… even 10 years ago, css compatibility on browsers wasn’t automated
… (testing of css compatibility)
… maybe we can start with simple tests, like javascript support
… another example of something easy to check is core media types
… but with other things, like trimming white space, maybe we can take the opportunity to revisit whether this needs to be an assertion
… if these things are difficult to test
… so let’s start with the assertions that are easy to confirm, and build a framework

Wendy Reid: before we decide anything, first, we need to understand our scope
… next few days, we just need to decide a direction, and some assignment of tasks
… first, what is the scope?
… we have to test all the must statements, this is non-negotiable
… but we have other statements, e.g. support for javascript
… but how much javascript support do we want to test? How thorough does the support need to be?
… do we mean vanilla, react, angular?
… Another thing, what is our testing methodology?
… one potential way, test as a series of statements, with particular statements that can be tested via automation

Dave Cramer: https://docs.google.com/spreadsheets/d/1wOe-6MqFk296dCoRUmrd3ghmVXH3yM-i97hFOuGlX4k/edit#gid=1730700660

Wendy Reid: for tasks, we need to figure out how to divvy up the responsibility

Dave Cramer: this google sheet is a first attempt at identifying testable assertions
… there’s a lot in here, testing is going to take significant amounts of time
… ideally there’d be a test lead who would direct and organize
… any volunteers?

Garth Conboy: awesome sheet dave!

Dave Cramer: it isn’t complete

Garth Conboy: our total number of tests would still probably be less than what we had on epubtest
… i’m representing an rs that has chosen not to support js
… i don’t think an rs should fail automatically if it fails the js test

Dave Cramer: js support is optional, so we couldn’t have all tests dependent on js support

Zheng Xu (徐征): i could help create a test epub
… if we had a standard file that could be loaded into each rs, and the results could be collated, that’d be a good start

Dave Cramer: based on my experience in other wg, in css for example, a lot of the tests were donated by the browsers themselves
… maybe the rs could donate some of the tests that they use internally?

Zheng Xu (徐征): well, some of the tests developed for an rs will only make sense for that rs
… but i could contribute to a communal test suite

Wendy Reid: the tests i’ve written to test kobo rs have been posted in public, they’d work
… we have to test all the normative statements (musts, should-s), but we also make informative statements in spec
… maybe we should test these informative statements too
… best practice type stuff
… should these statements form part of our testing scope?
… although we should focus on the tests which get us through the CR process

George Kerscher: we could ask rs to donate their tests and tell us how they perform, and have other people validate these reported results
… rs have a business case for contributing, but we’d also have a safeguard against exploitation

Wendy Reid: we have a tool that someone wrote to pull must and should statements out of the spec for audiobooks/pub manifest
… we should start by asking if we could repurpose this tool for our own spec
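
For a sense of what such an extraction involves, here is a minimal Python sketch that pulls sentences containing MUST or SHOULD keywords out of plain text. It is not the tool mentioned above, whose design is not described in these minutes; `extract_assertions` is a hypothetical name, and the naive sentence splitting is an assumption that would need refinement for real spec markup.

```python
import re

# Longer keywords come first so "MUST NOT" is not reported as "MUST".
KEYWORDS = re.compile(r"\b(MUST NOT|SHOULD NOT|MUST|SHOULD)\b")


def extract_assertions(text: str):
    """Return (keyword, sentence) pairs for sentences with a normative keyword."""
    # Naive split: a sentence ends at ., ! or ? followed by white space.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    found = []
    for sentence in sentences:
        match = KEYWORDS.search(sentence)
        if match:
            found.append((match.group(1), sentence))
    return found


sample = (
    "External identifiers MUST NOT appear in the document type declaration. "
    "This sentence is informative. "
    "The nav order SHOULD match the spine order."
)
for keyword, sentence in extract_assertions(sample):
    print(keyword, "|", sentence)
```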

Dave Cramer: we need to do a lot of planning before we start writing tests
… there’s a lot of work to do to define scope
… do we do all this in our existing repo? or start a new testing-specific repo?
… some of these things require input from W3 staff contact

Brady Duga: as rs implementor, we don’t really test against spec, we test against epubs
… would it be more useful to go through all the weird edge cases we deal with, and come up with tests based on that?
… not sure what else we could offer, in terms of donating tests
… in terms of js testing, not sure how this would work in practice
… if we get an epub with js, we can just not display the js, and that would also be valid…

Zheng Xu (徐征): we can create subfolders to categorize tests
… populate the subfolders with test epubs

Dave Cramer: this will be a learning process, maybe we could pick a section of a spec and try different testing ideas on this section

Shinya Takami (高見真也): in japan, browser based rs just convert epub to images, and display those images in a browser window
… these rs would be inoperable with js based testing

Wendy Reid: rs will be running the tests, but the test results will be relied upon by authors
… so what would an author want to see out of these tests? What would an author find useful?

Dave Cramer: we have a number of people who have expressed interest in helping with testing
… i would help too
… we should consult with Ivan tomorrow
… and then collect a group of people who will be responsible, who could then meet outside regular wg meetings

Zheng Xu (徐征): +1 to baby step

Dave Cramer: and report back to the entire wg

4. FXL Accessibility

Wendy Reid: FXL a11y. One of the biggest problems is that we don’t have a lot of advice here.
… FXL is growing, it’s not going away so we have to deal with it.
… did some outreach via Twitter and got some expressions of interest

Wendy Reid: https://comica11y.humaan.com/

Wendy Reid: Some suggestions. Project, a link to which Wendy will post.
… This project was developed by Humaan (a studio). Dev met with Wendy to explain their process and challenges.
… That dev said it’s really difficult to develop this content. They used SVGs + captions. Widgets to change page layout, font size — things that were common to EPUB.
… But they use dynamic SVGs that regenerate on the fly. Translates into 30 languages and adjusts based on line length.
… Not sustainable. Too much work.
… Most useful feature was captioning — caption that shows across the bottom of the page. They are considering developing it further.

George Kerscher: Comments about how cool this project was.

Wendy Reid: Loved Comicly, but not sustainable.
… also talked to Pablo Defendini who suggested SVGs. There is tooling for SVG (i.e. Adobe). Some work but worth considering.
… also spoke to Ken Jones of Circular Flo who has developed tools that can do the work for content creators. He has a series of tools that follow the basic principles: reading order, captioning, image alt which are then applied to books and can work.
… The work we have to do for FXL is that we have to write guidelines. This will mean more work for content creators. We need to think through what is the most practical for content creators. That makes it tool-able.
… Need to find a solution to bridge that gap.

Brady Duga: How useful is the captioning? How does it help? It looks cool. It takes text out of bubbles and puts it beneath. How does that help a11y?

Wendy Reid: that it also includes the name of the character speaking and a signifier.
… Works well for Manga where we may never get to a place where there is live text. Captions would be useful here.
… Descriptions of scenes is buried in the alt text but could be dragged into Comicly’s captions.
… Does captioning work well for everything? No, but there are likely some good uses.

Zheng Xu (徐征): Can publishers attach reflowable information to fixed image+text? Text could be attached to each page in some way. Without speech bubbles per character it gets complicated so read-along reflowable type information could help make things accessible.

George Kerscher: Was the caption or alt text being pulled through to Comicly’s captions? Manga may be an easier solution than we think. The population we serve would be pretty happy with that. However when you get into other types of FXL we will run into major problems. Low vision needs character enlargement and no panning.
… Dyslexic readers need spacing/layout changes. Color changes. This won’t work for them.

Wendy Reid: these varieties of content types will inform the scope of what we are doing here. There may be different solutions for different types of FXL content.

Shinya Takami (高見真也): Manga content — very difficult to put it into any kind of reflowable content. It is image + script.
… Japanese has kanji characters, and the script on a manga page uses different fonts on the same page. Shifts between sans-serif and serif type make this difficult.

Shinya Takami (高見真也): add description in the file. XML or SVG can be used to describe the text. Adding descriptions is another approach.

George Kerscher: SVG has a description element already.

Shinya Takami (高見真也): Is it for each page or for whole content?

George Kerscher: Five SVGs need one description?

Wendy Reid: Descriptions for the entirety of, say, a 100-page book. Descriptions mapped with IDs.
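
A minimal Python sketch of "descriptions mapped with IDs", using the SVG `desc` element mentioned above. The sample page, its ids, and the function name `descriptions_by_id` are hypothetical; this is one possible reading of the idea, not an agreed approach.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Hypothetical comic page with one described group per panel.
SAMPLE_PAGE = """<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">
  <g id="panel-1">
    <desc>  A character opens a door.  </desc>
    <rect width="50" height="50"/>
  </g>
  <g id="panel-2">
    <desc>The room is empty.</desc>
    <rect x="50" width="50" height="50"/>
  </g>
</svg>"""


def descriptions_by_id(svg_text: str) -> dict:
    """Map the id of each element with a direct <desc> child to its text."""
    root = ET.fromstring(svg_text)
    descriptions = {}
    for element in root.iter():
        desc = element.find(f"{{{SVG_NS}}}desc")  # direct children only
        element_id = element.get("id")
        if desc is not None and element_id:
            descriptions[element_id] = (desc.text or "").strip()
    return descriptions


print(descriptions_by_id(SAMPLE_PAGE))
```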

Marisa DeMeglio: This has come up before, maybe the EPUB 3 CG. Historically it’s been problematic to create a separate but equal format for a11y reasons.
… SVG would be a problem for FXL or reflowable so what problem are we solving here? Are we creating solutions because there is a lot of content in this format? A circular problem.

Wendy Reid: this needs to be part of the discussion too. What are the barriers to accessible children’s books, for example? It is possible. So maybe we need to provide advice and guidance for authors and RSs.

Dave Cramer: The fundamental problem of FXL is really shitty HTML. A soup of absolutely positioned spans containing one letter, ignoring the semantic potential of HTML. Shitty markup made by tools.
… There is a lot of FXL that could be made with simple HTML but that’s not easy to do today.

Zheng Xu (徐征): Worried about reading order in comics. He’s seen interesting content but the order isn’t clear so it’s hard to use. How do we make this kind of book accessible?
… Layouts are so complex that they default to PDF.
… The tooling alone is part of the problem.

Brady Duga: FXL content is mainly bad because so much is comics that is just images. There is no accessibility. Play books has done some work to pull out the text layer to make it bigger. But it’s constrained by the size of the bubble so lots of limitations.
… Some of the tools may already be there. Better descriptions and making those workable is worth working on. Read-aloud is nice for kids books. But not really accessible b/c it only helps with the words, not the images.

George Kerscher: Is there a way to search Manga content? Might the content become more valuable with this kind of captioning content available?

Wendy Reid: Clarifies. Would the ability to search help with accessibility in manga?

George Kerscher: If there is text there, there would be something to search for.

Shinya Takami (高見真也): In Japan, searchability is an ongoing issue that is difficult to realize. It is a problem. Pubs don’t have text for manga content.

Wendy Reid: So that gets down to what would be a recommendation. Putting text in some form back in manga.

Shinya Takami (高見真也): Descriptions in metadata for ebook stores.

George Kerscher: That’s not enough. There is no way to follow the flow of the action.

Brady Duga: Q for George. Searching descriptions of scenes, not just the dialogue?

George Kerscher: Don’t go into too much detail. Don’t kill me with too many words. The Comicly demo was really good.

Brady Duga: Should searching work for description and the text?

George Kerscher: Search what I read.

Wendy Reid: You would want the description and the dialogue to be searchable.

George Kerscher: Comicly had the name of the character when they were speaking so that was helpful.
… All we have to do is provide the guidelines in the specification. We need to provide the functionality and guidelines to do it. Best practices will evolve.

Zheng Xu (徐征): Has seen some real cases where invisible-to-HTML metadata was searchable. Or reflowable chapter that corresponds with image chapter. Not standardized.
… If RSs want to support this, it needs to be standardized.

Wendy Reid: Spine item is an image. Wondering if we are getting into multiple renditions. Transcripts attached to images.
… Alternate HTML or XML file travelling with an image needs to be accessible to the RS.
… The more files you have to produce for content, the more expensive and challenging. Raising the complexity means people might not use it.

Brady Duga: Not certain I understand. You can’t just put an image in a file. Needs HTML or SVG. How much easier can we make it for creators? The formats you can use are already accessible. Creating a new format is a non-starter.

Zheng Xu (徐征): It’s really hard to satisfy all purposes. Accessible and otherwise.
… We need a standard for reading systems.

Shinya Takami (高見真也): manga content in japan can be up to 200 pages, even 400 or 500 pages. Asking pubs to prepare all those descriptions is a lot of cost.

Zheng Xu (徐征): For pubs who want to provide this info, we don’t have standards. This can be resolved by standardization.

Wendy Reid: All we can do is provide a method for publishers to use. They can take it or leave it.
… Any revisions we do for a11y documentation, we want to present ourselves as a resource for pubs who want to meet European requirements.
… Having a spec for pubs who publish FXL content and the tools for publishers who want to move toward that.

Brady Duga: I remain confused. The question of giving best practices makes sense. Both HTML and SVG are already accessible. Don’t understand what we need on top of that.
… Those have well known a11y guidelines for how you read them back. Or vague guidelines.

George Kerscher: Adobe has launched liquid for PDF on iOS and Android. PDF is sent to server and returns reflowable version. It’s imperfect. But recognize the need to go from fixed to reflowable for small screens.
… If we could make sure that view of FXL content when it was put into a page that didn’t rely on FXL stuff and reflowed, that would be really good. Is that what we are looking for?

Zheng Xu (徐征): Agree with duga to create best practices. Already 1000s of magazines created. Will they recreate them as SVG? Would be easier to make comments accessible. The way we can add something on top of what exists is to let publishers attach a11y support.
… Make it easier for RS to support.

George Kerscher: We may be looking at something for legacy content to make it reflow. We could develop solutions that are not bolt on but integrated.

Wendy Reid: Doesn’t think this is a specification conversation but the best practices side of the conversation.
… We have the tools. The tech exists. We don’t have to bring in a new language. It’s there. We just have to give advice on how to use them. As zhengxu said it’s important to provide advice so we don’t have 600 ways of doing footnotes.
… We have learned that providing guidance goes well. We know this from the past 20 years. When we don’t provide advice, we have interop problems.

George Kerscher: I could envision building a tool that would take a FXL page and present it as it flows and people would see what a screen reader sees. Could be gobbledygook. Providing a DOM view of the FXL page for people to proof would horrify people. They could very easily see the issues.

Ben Schroeter: It seems to me that the comic/manga use case needs to have the reading order defined. It can vary widely in those kinds of publications. Imagining an accessible usable experience would be less like reflowable and more like what we do with audio description for video.
… A description of what’s happening visually simultaneous with the speech bubbles.

Wendy Reid: This came up in conversation with Ken Jones. InDesign naturally allows reading order but might export with the wrong reading order.
… Circular Flo did something about this. Why can’t InDesign point this out?
… nearing the end of our day. We will keep talking about this tomorrow and come up with some direction to point ourselves in. Time to call it.

George Kerscher: Good night

Wendy Reid: Thank you for attending first F2F. Hopefully we will be in the same city next year.
… 10am EST is the start of the meeting. Social hour at 9am EST.
… Thank you all for your contributions.

5. Resolutions