Publishing Working Group Telco — Minutes

Date: 2019-02-11

See also the Agenda and the IRC Log


Present: Luc Audrain, Wendy Reid, Simon Collinson, Deborah Kaplan, Joshua Pyle, George Kerscher, Ric Wright, Gregorio Pellegrino, Rachel Comerford, Franco Alvarado, Teenya Franklin, Ivan Herman, Ben Schroeter, Caroline Hayes, Matt Garrish, Tzviya Siegman, Laurent Le Meur, David Stroup, Bill Kasdorf, Brady Duga, Garth Conboy, Mateus Teixeira, Marisa DeMeglio, Benjamin Young, Tim Cole, Geoff Jukes

Regrets: Dave Cramer, Avneesh Singh


Chair: Wendy Reid

Scribe(s): Simon Collinson



Simon Collinson: Wendy: Minutes accepted

Resolution #1: last week’s minutes accepted

1. PEP in a package


Wendy Reid: First topic today is final topic from last week: issue 33 from the Packaged Web Publication repo, about primary entry page becoming optional…
… a quick recap, and clearing up some questions. The main proposal right now is Ivan’s…
… PWPs may give you the option to make index.html or a JSON manifest the primary entry page…
… an alternate proposal was brought up on GitHub, the so-called ‘minimal’ PWP. The index.html only exists to point to the manifest
… this is specific to PWPs – just for the package context, not for web publications as a whole. Does anyone have comments?

Ivan Herman: Matt came with a proposal which I personally consider complementary to the previous proposal, but some people might disagree
… but I would prefer Matt to make the proposal

Matt Garrish: In the discussion around primary entry pages and whether it’s required, we may be overlapping too much with audiobooks and web publications, expecting audiobooks to always be web publications…
… what I proposed was separating the manifest so the primary entry page doesn’t always have to be present with a packaged audiobook/epub… but it can be
… if the publisher wants it to be a conformant PWP, the publisher can include the entry page
… Ivan posted a better clarification of this this morning, which is that the packaged formats are somewhat separate from a WP… the package format doesn’t always have to be a WP
… everything becomes compatible. In its packaged form, the audiobook can be valid. It’s essentially making everything more abstract…
… getting out of the mess of how something remains logically consistent, even if these other formats don’t want all these other WP requirements to be present


Ivan Herman: I want to re-emphasize: everything that Matt said is obviously true, but one important thing is missing: a reading system MUST be able to understand the packaged version of full web publications…
… it must be able to understand the primary entry page and find the manifest from it, so that a WP in the traditional sense stays valid when I just zip it, even if I call the manifest file something else (although the index file must be kept)

Laurent Le Meur: I totally agree with Matt’s point. This is conceptual. Nevertheless, when we describe the spec, it will be very complex to express that conceptual model and at the same time to explain what Ivan said: a reading system must be able to understand everything with a primary entry page
… I propose we follow what Ivan suggested before: stating the primary page OR the manifest: one or the other…

Ivan Herman: +1 to laurent_

Laurent Le Meur: which for processing is easy to understand. It doesn’t mean we don’t have to explain this model, but I’d be careful not to make the concept too complex…

Wendy Reid: We’re back to the original proposal. The resolution would be: a PWP may include either the primary entry page or manifest but must contain one of those two

Ivan Herman: I think the proposal has two equally important parts.

Wendy Reid: So there are two resolutions

Proposed resolution: Restructure the document to reflect the publication structure as primary, with web publications and packaged web publications as modules (Wendy Reid)

Ivan Herman: -> Matt’s description for the proposal:

dkaplan31: +1

Tzviya Siegman: +1

Ivan Herman: +1

Geoff Jukes: +1

Joshua Pyle: +1

Rachel Comerford: +1

Ric Wright: +1

Franco Alvarado: +1

Mateus Teixeira: +1

Simon Collinson: +1

Ben Schroeter: +1

Luc Audrain: +1

Bill Kasdorf: +1

Gregorio Pellegrino: +1

Wendy Reid: Resolution accepted

Resolution #2: Restructure the document to reflect the publication structure as primary, with web publications and packaged web publications as modules

Proposed resolution: WP keeps PEP as a requirement, PWP will give the option of using the PEP or the Manifest, but one must be present in the package (Wendy Reid)

Ivan Herman: -> Ivan’s consensus proposal:

dkaplan31: -1

Laurent Le Meur: +1

Ivan Herman: +1

Garth Conboy: How does this fit in with what Laurent just said about or/both?

Wendy Reid: OR/BOTH would work here, but at least one has to be present

Deborah Kaplan: I’m -1 unless it becomes EXACTLY one must be present
… one, but only one, must be present…
… from my experience of working with small producers, people will be confused, which means publications will be wrong, which means reading systems will behave inconsistently…
… creators will have to come up with workarounds due to inconsistent implementation…
… the option of having two will end up with badness.

Ivan Herman: The resolution which I proposed said that the processor MUST look for a primary entry page and if it finds it, MUST process according to the rules. If it doesn’t find it, it looks for a manifest

Deborah Kaplan: As long as it’s 100% clear to creators and reading systems what will happen, that’s fine

Laurent Le Meur: I was very clear about that

dkaplan31: in that case I am changing my vote to a +0 from a -1

Wendy Reid: Would it be better to rephrase the resolution as either or but at least one must be present?

George Kerscher: As someone producing a publication, I’m going to start with my manifest. For an audiobook, I zip that up and distribute it to various places and they process it…
… if I want to add a primary entry page then I could serve it up on the web and all is well. To my mind, I’m progressively enhancing the publication

Ivan Herman: That workflow is correct

Matt Garrish: My question is one of consequences. When we require specific names, it’s going to mean that if you unpackage it on the web, you can only do this with one WP in a directory due to collisions
… are we putting a limitation that we got away from earlier back into play – that you can’t have multiple articles in one directory?

Deborah Kaplan: +0 and not +1 because I still dislike giving people choices: small creators are confused by choices, while large publishers can create a PEP trivially as part of their production workflow whether they need one or not. But not -1 as long as a clear flow is documented.

Ivan Herman: Matt is right. If we don’t have a name restriction, we have to do something to the package itself to find where to start. This is zip, we aren’t having web packaging, so I’m not sure what else we can do

Matt Garrish: It’s a circular problem: if we don’t have specific names, how do you find what you’re looking for - if you have something else finding the names, how do you prevent those from colliding?…
… trying to prevent the index.html problem from recurring, but I’m not sure how much of an issue it is…

Tzviya Siegman: This makes me uncomfortable too – it’s something we’ve always tried to avoid doing. In the world of scholarly publishing, if I have a journal of 30 articles, each published on their own, each will have this problem…
… I feel like this is going to come back to bite us…

Benjamin Young: My question was similar: if we have specified names for these things and a tree of inheritance where down a certain road you have index.html and down another road you don’t…
… is it possible to make a web audiobook in that world, or do they no longer intermingle?…

Charles LaPierre: Thinking about a journal made up of multiple articles, wouldn’t each article be its own subdirectory, hence no collisions?

Ivan Herman: This whole filing thing reflects that what we’re using is a packaging format that isn’t Web friendly. And we know that, which is why we consider the current format as a lightweight temporary solution…
… I’d be happy if we had today a format which allowed me to refer to a URL for every file, and maybe we’ll have one before I retire. But we’ve agreed to define a lightweight packaging format now, and we have to live with it… we don’t really have a choice
… we could require a specific way of zipping which puts the file first in the zip file, which makes the publication more complicated, because I can’t just take a directory and zip it… this isn’t the solution…

Wendy Reid: We have to find a compromise

Ivan Herman: We have to accept the deficiencies of the system right now

Garth Conboy: I agree with Ivan. We dislike the alternatives more – anything that makes it harder is a no-no…
… there’s a manifest.json and index.html which are both magic names…
… the actual manifest can be standalone or included in the PEP…
… what are the changes that we propose to ensure there is no possible duplication?

Ivan Herman: What I’ve proposed: the first step the reading system does is locate the PEP. If it finds that, then it follows the processing steps that are described in the WPUB document…
… at first look at your own file, otherwise look for a Manifest file and that’s your manifest…

Matt Garrish: We’re making our WP format dependent on the packaging… I can live with this, but what if a better packaging format comes along in future?…
… would we drop these restrictions?

Ivan Herman: If we find a packaging format that allowed that, then yes

Laurent Le Meur: In future, this packaging will be used by publishers as a booster for leaving earth. When the publication is exposed on the Web, pure web packaging becomes important then

Benjamin Young: A general question: are we open to analyzing other formats for web archiving and distribution, the primary component being that they keep URLs around, or continue with zip?

Wendy Reid: If you recall a few weeks ago, we did open up the request for analysis of the different potential formats. They were analyzed based on the pros and cons of that table. If we missed anything in that table, it’s good to know about…
… we made the decision based on the ≈7 formats we looked at in that table

Ivan Herman: Let’s not reopen closed issues. For now we’ve decided to go with what we have, knowing that eventually the committee will produce a webby packaging format
… we explicitly said that if and when that happens, then this working group or its successor will look at it and consider it…
… but we need something today if we want to produce anything before the end of the life of the working group, less than 18 months from now

Proposed resolution: WP keeps PEP as a requirement, PWP will give the option of using the PEP or the Manifest (with rules agreed to resolve any possible duplication [start with looking for PEP, and process that first; if not present look for standalone manifest]), but one must be present in the package. (Garth Conboy)

Ivan Herman: we decided on something and shouldn’t reopen today

Bill Kasdorf: Quick question: if we’re seeing that not all packaged audiobooks are web publications, then we shouldn’t call them packaged web publications, right?

Laurent Le Meur: in fact we don’t.

Bill Kasdorf: Then they aren’t really PWPs, then

Proposed resolution: (Less typos version) WP keeps PEP as a requirement, Lightweight Packaging will give the option of using the PEP or the Manifest (with rules agreed to resolve any possible duplication [start with looking for PEP, and process that first; if not present, look for standalone manifest]), but one must be present in the package. (Garth Conboy)

Ivan Herman: +1

Garth Conboy: +1

Charles LaPierre: +1

Tzviya Siegman: +1

Matt Garrish: +1

Laurent Le Meur: +1

Rachel Comerford: +1

Ben Schroeter: +1

Joshua Pyle: +1

Bill Kasdorf: +1

Tim Cole: +1

Geoff Jukes: +1

Luc Audrain: +1

George Kerscher: +1

Mateus Teixeira: +1

Gregorio Pellegrino: +1

Resolution #3: WP keeps PEP as a requirement, Lightweight Packaging will give the option of using the PEP or the Manifest (with rules agreed to resolve any possible duplication [start with looking for PEP, and process that first; if not present, look for standalone manifest]), but one must be present in the package.
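
The lookup order in Resolution #3 can be illustrated with a minimal Python sketch. It is not part of the resolution text. It assumes the two "magic names" mentioned in the discussion (index.html for the PEP, manifest.json for the standalone manifest) and assumes both sit at the root of a zip package; the function names are hypothetical.

```python
import io
import zipfile

# Assumed well-known names, taken from the discussion; the final
# specification may choose different names or locations.
PEP_NAME = "index.html"
MANIFEST_NAME = "manifest.json"


def locate_entry(names):
    """Apply the lookup order: the PEP first, then a standalone manifest.

    `names` is the list of file names in the package. Returns a
    (kind, name) pair, or raises ValueError if neither file is present.
    """
    names = set(names)
    if PEP_NAME in names:
        # The PEP wins when both are present; the manifest is then
        # found through the PEP, following the WP processing rules.
        return ("pep", PEP_NAME)
    if MANIFEST_NAME in names:
        return ("manifest", MANIFEST_NAME)
    raise ValueError("package has neither a primary entry page nor a manifest")


def locate_entry_in_zip(data):
    """Run the lookup on the bytes of a zip package."""
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        return locate_entry(package.namelist())


def build_package(files):
    """Build an in-memory zip from a {name: text} mapping (demo helper)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        for name, text in files.items():
            package.writestr(name, text)
    return buffer.getvalue()
```

A package holding only manifest.json and audio files resolves to the manifest; adding index.html later changes the result to the PEP without any other change, which matches the "progressive enhancement" workflow George described.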

Wendy Reid: We’ve made a decision, with 20 minutes to spare… moving on to our next issue…

Ivan Herman: I propose that we merge the two requests from Laurent whenever he feels comfortable…
… reading that document via the pull request is a pain, and it’s better if we merge it in

Laurent Le Meur: I prepared the merge last week. We can work from that

Wendy Reid: If no opposition, we’ll merge as soon as Laurent is ready

Resolution #4: Laurent will merge the pull request as soon as he can

2. Table of contents issue


Wendy Reid: Next issue is related to tables of contents
… posting link to GitHub issue, regarding the location of the TOC…
… should we be directly encoding the TOC into the manifest, or should it be a separate file referenced in the manifest?
… the main issue is formatting. Keeping it HTML allows for rich formatting, but a separate file would be easier to process…
… a lot of user agents aren’t currently using styling, but it doesn’t have to remain that way…
… some providers may prefer a JSON file

Ivan Herman: I have a question to various reading system implementers: at the moment, the TOC is defined to be in HTML and we’ve spent an inordinate amount of time defining the format in HTML…
… my favorite option is to say that that’s where we stop, realizing that this means reading systems must be able to parse an HTML file, extract the TOC out of it, even if it doesn’t use any styling…
… what I have difficulty judging is whether this is really such a huge deal for reading systems, given that these days taking a public domain HTML parsing library and running it to extract the TOC is not hard…
… it would make things much clearer if we had one and only one format for the TOC, and we didn’t have yet another option we have to define…
… is it really such a big deal?
… (i.e., to process the HTML file)
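
To give a sense of the effort involved, here is a minimal sketch of such an extraction using only Python's standard-library HTML parser. It assumes the TOC is a nav element marked with role="doc-toc" that contains nested lists of links; the class and function names are hypothetical, and the sketch ignores styling and any content other than links.

```python
from html.parser import HTMLParser


class TocExtractor(HTMLParser):
    """Collect (depth, href, title) entries from a <nav role="doc-toc">."""

    def __init__(self):
        super().__init__()
        self.entries = []
        self._nav_depth = 0    # > 0 while inside the TOC nav element
        self._list_depth = 0   # nesting level of ol/ul inside the nav
        self._href = None      # href of the link currently being read
        self._text = []        # text fragments of the current link

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "nav":
            # Assumption: the TOC is identified by role="doc-toc".
            if self._nav_depth or attrs.get("role") == "doc-toc":
                self._nav_depth += 1
        elif self._nav_depth:
            if tag in ("ol", "ul"):
                self._list_depth += 1
            elif tag == "a":
                self._href = attrs.get("href")
                self._text = []

    def handle_endtag(self, tag):
        if not self._nav_depth:
            return
        if tag == "nav":
            self._nav_depth -= 1
        elif tag in ("ol", "ul"):
            self._list_depth -= 1
        elif tag == "a" and self._href is not None:
            title = "".join(self._text).strip()
            self.entries.append((self._list_depth, self._href, title))
            self._href = None

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)


def extract_toc(html_text):
    """Return the TOC entries found in an HTML document."""
    parser = TocExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.entries
```

The depth value preserves the nesting of the lists, so a reading system can still tell a subsection from a chapter-level entry after discarding all styling.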

Garth Conboy: I’m fine with HTML and I’m also fine with only HTML, just because my take is that any reading system taking this packaged audiobook is likely to also be taking EPUB

Laurent Le Meur: I still agree that if we can make something simple in HTML, something easy to create by the publisher/studio, then I agree with only HTML. But we have to realize that the difficulty is with the authors who have to use this format, and not the implementers.

Geoff Jukes: I would be against (?) HTML due to the assets we receive - hundreds and hundreds a week…
… putting it in the manifest is eminently usable for us

Garth Conboy: It’s a serialization issue: is it that much easier to encode in JSON than HTML? I don’t understand why that’s hard

Geoff Jukes: It’s not hard, but for me it’s redundant in a lot of our use cases. I can’t think of a use case where creating an HTML file is of any use
… but a structured manifest, when we’re processing third party books, extracting structured metadata is easy. In the B2B world, if the specification requires HTML, some people will just ignore it…
… turning HTML, as something freeform, into something structured is harder, whereas a manifest is already structured

Garth Conboy: It sounds like you don’t want a TOC - you just want the manifest

Geoff Jukes: From an audiobook perspective, what publishers produce and what we exchange is just a bunch of files. They have filenames that are supposed to aid in sorting, but it’s super loose…
… we sometimes add what we call a chapter list, but not every book conforms to that…
… sometimes it’s literally just track 1, track 2, track 3…
… publishers don’t necessarily cut at chapter/part boundaries, it could be random or every X minutes…
… the list of files doesn’t correlate 1:1 to a neat book structure…
… every publisher has their own approach to displaying a chapter which might be broken into two or more parts…


Tzviya Siegman: I’m growing confused about what Geoff is describing. I can think of numerous scenarios for audiobooks where the TOC isn’t necessarily designed…
… a lot of reading systems strip out CSS, but publishers and users want more information than just the filename. E.g., if chapter 3 is read by a special narrator, that should be visible…
… we want to align with the work in synchronized media that Marisa is working on…

Mateus Teixeira: +1 to tzviya

Tzviya Siegman: if it’s just going to be a list in HTML, with CSS from the publisher, you can discard the CSS

Brady Duga: One of the uses for HTML: you could have ruby for, e.g., Japanese chapter titles. This would be hard in JSON…
… you can recreate the properties of HTML in JSON if you want to, but it’s hard…
… what we want to do with the TOC is something that doesn’t exist in audiobooks: create a rich table of contents around the audio that has nothing to do with the structure of the audio files…
… I don’t care how the files are actually structured, I want to say ‘here are the chapter breaks throughout the audiobook’ - can reference one or many files…
… I want something analogous to an ebook TOC. It’s not something that exists today in the audiobook market

Garth Conboy: What Geoff is saying is that your classic audiobook doesn’t have this TOC in it. If you want to create a new audiobook with no TOC or a barebones one, so be it. Doing a good job of Brady’s enriched TOC clearly doesn’t have to be part of the spec, but we want a way to do it

Wendy Reid: We’re talking about this in the context of the core spec. Maybe a better way to consider this is how each module can deal with TOCs. This problem could look very different in, e.g., manga or academic publications

George Kerscher: I understand with the audiobook that you might get a flat list of filenames - I’ve seen different reading systems throw away the styling and add a machine-readable version of the TOC…
… I totally get lost, I don’t know what the subsections are when it’s just a flat list. The accessibility features of HTML provide me with useful information…
… I can understand that this is a subsection rather than a higher level chapter heading. Not in CSS, just natively in HTML

Geoff Jukes: +1 if optional

Ivan Herman: Maybe it’s not clear: the TOC is optional - this is already the case
… the modular approach (referred to by Wendy) could be ok, provided we make it clear that an audiobook reader MUST understand the HTML version
… every reading system must understand how the TOC is supplied

Wendy Reid: We will continue this discussion next time. No meeting next week

3. Resolutions