Publishing Working Group Telco — Minutes

Date: 2019-02-04

See also the Agenda and the IRC Log


Present: Tzviya Siegman, Luc Audrain, Deborah Kaplan, Rachel Comerford, Ivan Herman, Wendy Reid, Dave Cramer, Vladimir Levantovsky, Romain Deltour, Ric Wright, Nick Ruffilo, Simon Collinson, Laurent Le Meur, Mateus Teixeira, Teenya Franklin, Avneesh Singh, Jun Gamou, Joshua Pyle, Gregorio Pellegrino, Franco Alvarado, Ben Schroeter, George Kerscher, Jasmine Mulliken, Charles LaPierre, David Stroup, Marisa DeMeglio, Matt Garrish, Brady Duga, Bill Kasdorf, Benjamin Young, Zheng Xu (Jeff)

Regrets: Garth Conboy, Caroline Hayes, Yu-Wei Chang (Yanni)


Chair: Wendy Reid

Scribe(s): Nick Ruffilo


Simon Collinson: +present

Wendy Reid:

Wendy Reid: Last week’s minutes - any comments or questions/suggestions? Otherwise, approved

Resolution #1: last week’s minutes approved

1. Issue cleanup

Wendy Reid: Next topics - list of issues that were sent around Thursday. I’ve reviewed issues in the project because we have 68 open issues - and that’s quite a bit. Many have not been touched in a while or there was a consensus that never made it to call…
… so I want to close a bunch of those if we can - so if there are any strong feelings or pet issues, now is the time to speak up.
… If you don’t want to speak up, you can re-open with more details on something.

Ivan Herman: List of issues sent around for closure ->

Ivan Herman: Avneesh comments on one issue about a full accessibility reading order, he isn’t on the call but I wanted to point that out.

Wendy Reid: OK _ I’ll leave that one open. Any other opposition?

Avneesh Singh: Thanks for highlighting, Ivan

Wendy Reid: expect to see a much smaller github issues list.

2. affordances document

Wendy Reid: Next agenda item: Josh and Franco have been doing work on the affordances - especially now that we’ve changed the format of the main document.

Franco Alvarado: I can speak to this. We’d like to go through the current UCR requirements and document what the user agent behavior should be. We’d like to be more explicit about the UA behavior. We’d like a rep for the different user agents to help us to determine how the UAs should behave.
… that is what we spoke about and would like to bring to all of you

Tzviya Siegman: In case you missed that - that was a call to action to the user agents here. We talked about having an implementors meeting, for those who think we shouldn’t specific user agent here, we’re trying to set expectations about what is needed.
… this is based off feedback from browsers where they wanted to know what it is they needed. This doesn’t have to be a huge commitment - maybe just one meeting, but we would love to hear from kobo, blackstone, google, microsoft books, Readium, etc.

Laurent Le Meur: rebus

Tzviya Siegman: but if you could be there… We can set up a time that works for everyone.

Dave Cramer: speaking of everyone, but something of extreme interest to everyone in the group - I guess I’m uncomfortable of a ‘implementer only’ call.

Ric Wright: +1 Dave

Wendy Reid: We could just make it the agenda for a meeting in a couple weeks.

Tzviya Siegman: that works for me - as long as the implementors are welcome to speak up in a regular call

Ric Wright: I will participate in a larger call.

Wendy Reid: NickRuffilo: I like the idea of using a regular meeting for this. Make sure that time slot works for the implementors as well. I know it is difficult to get everyone together.

George Kerscher: Would it - Microsoft was… I don’t know if Apple has or other implementors - should we reach out to them and let them know that this is happening?

Wendy Reid: Why don’t we set up a meeting for a few weeks from now. And then sound out any invitations to other implementors who might want to be there. Is that agreeable?

Nick Ruffilo: +1

Laurent Le Meur: +1

Avneesh Singh: +1

Ric Wright: +1

Ivan Herman: +1

Joshua Pyle: +1

Jasmine: Is this something a publisher without a specific platform (other than web in general) should or would be part of?

Wendy Reid: Non implementors could raise questions or comments.

George Kerscher: Would we want the browser developers to be there?

Tzviya Siegman: The goal here is still writing the use cases document. We’re not talking about real-world implementations, we’re worried about getting this documented so that implementors know what to do. If Apple doesn’t show up, it doesn’t mean they won’t implement…
… they just aren’t putting their input in. We need to get the use cases written, so I don’t want to push this out too long. We’re talking about calling it User Agent Behavior to clarify what it is we are talking about. We need to get to real-world implementations, but we need to tell implementors what we want first
… but if we invite people who might not have been following along, there might not be much to say.

Laurent Le Meur: Which UCR will be working on? The new drafts?

Laurent Le Meur: old one

Laurent Le Meur: We have the old pwp-ucr, but is that totally obsolete so far?

Joshua Pyle: I hope not, because we’ve been working on it.

Laurent Le Meur: are we discussing from the pull request?

Benjamin Young: this one is the latest and greatest (afaik)

Wendy Reid: we are just modifying the original

Benjamin Young: the TR stays the same. Editor’s Draft has the updates

Joshua Pyle: OK - the new one is different, so the DPUB-PWP-UCR is the right one, not the old one.

Wendy Reid: OK we will make this an agenda item, and we can invite implementors, but if they don’t show up it is not the end of the world.

Dave Cramer: Are we at a place where we should republish the UCR?

Ivan Herman: So far, what we said, is that we will republish the document on the same PR - so taking over from the DPUB Interest group. It’s a note, so we can do it without further ado. but that’s what we discussed in the past.

Dave Cramer: Can we publish now?

Ivan Herman: doesn’t depend on me

Wendy Reid: Can we resolve to publish the current draft of the UCR document in advanced of discussing it?

Ivan Herman: it depends on the editors, but I ask for a 1 week delay because i’ll be away starting tomorrow.

Joshua Pyle: I would vote to publish what we have because what we currently have out there is very old. As the editor, I was confused about how old the TR - my vote - I want to know what Franco and Tzviya have to say…

Ivan Herman: As I said Josh, it cannot be done automatically, it requires my intervention, and I cannot do it this week, but next week i’m happy.

Joshua Pyle: I can wait a week. It’s two, what’s one more week?

Proposed resolution: We will publish the latest draft of the UCR document to replace the out-of-date draft (Wendy Reid)

Wendy Reid: As a resolution we’ll publish the draft of the UCR document next week.

Joshua Pyle: I just wanted to ask 1 question about the title. The title is PWP UCR - the whole PWP is deprecated - shoudl we retitle the document?

Ivan Herman: Yes

Joshua Pyle: So it’s the WP UCR? Or the DPUB UCR?

Wendy Reid: WP
… Everyone in agreement?

Tzviya Siegman: +1 to publish

Nick Ruffilo: +1 to publish

Bill Kasdorf: +1

Franco Alvarado: +1 to publish

Mateus Teixeira: +1 - publish

Ivan Herman: +12

Joshua Pyle: +1

Luc Audrain: +1 to publish

Ivan Herman: +1

Jasmine Mulliken: +1

Rachel Comerford: +1

Benjamin Young: +1

Dave Cramer: +1

Deborah Kaplan: +1

Resolution #2: We will publish the latest draft of the WP UCR document next week

3. primary entry page in a package

Wendy Reid:

Wendy Reid: Next item - the primary entry page. If anyone is not prepped up on this, it’s because the github is in the Packaged web publication github - so I don’t blame you for missing this one.
… picking up on the github comment, we might have come to a mild consensus. Ivan had a great comment: The question is about the primary entry page - and whether it should exist or not.

Wendy Reid: The current consensus proposal: Either the HTML entry page or the separate JSON-LD Manifest MUST be in the package. It is not required to have both, though. The user agent MUST first look for the entry page with the name index.html if found, it MUST follow the rules as described in the WP spec to extract the Web Publication Manifest; otherwise, the user agent looks for the Web Publication Manifest with the name manifest.json to extract the Web Publication Manifest.

Wendy Reid: Ivan do you want to elaborate?

Ivan Herman: Are there any questions?

Benjamin Young: If we disallow the index.html or make it optional - we’re going to make some sorta-works-on-the-web and sorta-not - so I’m finding it confusing. The more we diverge from ‘unzip this and put it on the web’ VS the naming conventions of a zip file…
… so some of this is related to our vision of packaging - if this is a web package, or a package for moving around pieces of the web, that are themselves webby and have a 1:1 mapping to something on the web. Or if something has webby pieces but isn’t meant for the web, like epub…
… so defining which one we are after defines if the landing page is browser ready is required or not.

Deborah Kaplan: To speak to the same concerns as Benjamin - there are - if we say that index.html is optional - then we are … by doing so - we are saying that either it won’t natively work - or be openable in a browser - or we have strong expectations that the native browsers will become WP aware…
… because this is a standard… OR - that we are saying - they are webby as long as you have a polyfill or some sort of extension in your browser. Those are the things we are saying if index.html are optional, but only if we’re trying to say one of those things.

George Kerscher: I think I have the same question as Deborah - but stated my own way - if there is no index.html, and the JSON-LD is referenced, the expectation is that the index.html will be generated from the information in the manifest. Is that correct?

Laurent Le Meur: Yes

George Kerscher: would that be a script/polyfill that would ship with the manifest? Or would we expect the browser to fill that.

Wendy Reid: It could be either-or

Laurent Le Meur: As you said, it could be either or. We’re speaking first about the package. So the package would not be consumed by the browser. It would need to be translated first. The conversion will involve the generation of a landing page.
… it can be done statically or dynamically. By having the entry page option, we care keep it mandatory and generate automatically.

George Kerscher: If the script is transferred in the package, that would auto-generate the index. Would a browser who is aware - would they know not to do that and run their own?

Bill Kasdorf: this is sounding messy

Deborah Kaplan: Bill_Kasdorf agreed

Laurent Le Meur: it’s open - and it’s outside the packaging mechanism itself. If we set the scope is ‘what do we put in the package’ then it’s out of scope - if it’s ‘what will the user agent do’ then it is a WP aware agent? Then it could even process something that has no index.html. I don’t think it’s the best, but it could. If not WP aware, it must have an embedded reader, and this reader could be inside…
… the package - but it depends what we do when we explode the package. It comes down to what we do within the package.

Ivan Herman: Lets look at it from the history - It’s easy to get lost in the details. As an aside, we are not talking of making the primary entry page optional - so something that is on the web MUST have a primary entry page. Lets put that issue aside.

Deborah Kaplan: thank you for clarifying that Ivan; that was unclear to me!

Ivan Herman: The issue is - if you look back - the whole discussion that has been going on - is if you have a publisher who publishes an audiobook, and the only thing they want to do is publish it as a package thing and carry it from one place to another, and they don’t care about the web - for those, having the primary entry page is completely unnecessary burden…

Geoff Jukes: +1 Ivan

Ivan Herman: we can say we don’t care about that use case, and we care when everything is on the web, but I don’t think we should. We should have a situation where we say: “if a publisher cares about publishing their content on the web, they should zip it up and it can be a perfectly valid audiobook…

George Kerscher: Based on Ivan’s explanation, I am a happy guy.

Ivan Herman: and the UA should be able to render it. But if you have a publisher who doesn’t care about the web, then they should be able to create a WP without the index.html. If you take a package without it, then the primary entry page can be done pretty easily. And that whole thing is orthagonal if the browser works with a polyfill or not.

Bill Kasdorf: so we are saying you can distribute a publication on the web that is not a web publication. I buy that; e.g. a PDF can be distributed on the web.

Laurent Le Meur: great explanation Ivan

Ivan Herman: that is not what we’re talking about. We want to give an option to publishers who do not want to publish to the web. That’s the sort of consensus solution we came up with - no more no less

Tzviya Siegman: Thank you for clarifying Ivan. I think that when we put this in the spec, we need to make it much more explicit what the UA does when it encounters these things. What happens if you encounter both. I expect this is where the concern is coming from - right now our primary instance is audiobooks…
… in the world we’re living in, it’s just something that lives in the package. We don’t have the use case of the audiobook written on the web or living on the web. I think that we’re hearing - I’m concerned that if we have other types of publications - this duality is going to create problems.

Mateus Teixeira: +1

Tzviya Siegman: so if I want scholarly packaged material, having these options will create problems down the line. So the concern is about tunnel vision when it comes to audiobooks - so we’re doing things in a way that is too closely tied to audiobooks.

Bill Kasdorf: +1 to Tzviya

Bill Kasdorf: if we do this we just need to be very clear that that audiobook is not a Web Publication

Dave Cramer: I think that variety can help us. We have an industry need for packaged audiobook format that seems largely to industry publications. There seems no need for an index.html. I fear that trying to use the full machinery for web publications on this use case will prevent adoption or scare publishers away. We don’t have to use every piece everywhere. We’ll have…
… more flexibility if everything has to use everything - such as having an HTML page where we don’t need it out of theoretical purity. We try to solve our audio use-case as best as we can. It’s best if we can find simple pieces that work in large spots of our use cases, but we won’t find one solution for everything…

Ivan Herman: +1 to dave

Dave Cramer: i don’t want cross pollination making things worse.

Benjamin Young: A bit off what Dave said - and Laurent’s point of polyfills, if it’s for audiobooks not on the web, eschew the index.html - but since it’s for audiobooks, and possibly on the web, we’ll need the HTML page to polyfill…
… to assume something somewhere will generate an HTML file - it won’t compute. The only way to polyfill is via a javascript, and you do that through an HTML file, otherwise it’ll just show you the JS code - so html is the conduit…
… including things that read JSON. Sure - browsers display HTML files for images, but images have been here forever. So to think that browsers will display HTML files for the custom JSON we’re creating… that’s a bit far…
… so when it comes to audiobooks, if we just want something that reads audio files, in a package, then lets create something new with a different name, that handles those things properly… But if we want to focus
… on browsers and make things work on the web, then lets focus on that use case.

Nick Ruffilo: I like bigbluehat’s comment, we’re focusing on web publications, we not only have to think of the now and the past, but also the future
… nothing is stopping a publisher from using things against spec, if someone is only using it to package audio, and may not have a index.html
… anyone who is new, and wants to develop an app that reads audiobooks in the browsers, they would have an index.html to reference to and it allows for availability
… maybe it is too much for implementors, outputting an HTML file does not seem like a stretch for software, it is not rocket science, I think as much as we can keep the web in mind, there is value in that

Deborah Kaplan: - I will not -1 on Ivan’s language - it’s not a hill I’m going to die on. I think it’s a bad idea to reopen the issues at writ large. We have resolved some things and I don’t think they are broken…

Ivan Herman: +1 to dkaplan3

Tzviya Siegman: +1000 to dkaplan3

Deborah Kaplan: what I do think is true - in my experience - working with usually small publishers. One thing that is most confusing is anything that is optional. In practice, reading systems will not deal the same way with things that are optional…

Mateus Teixeira: +1 dkaplan3, bigbluehat, NickRuffilo

Ric Wright: +1 tzviya

Benjamin Young: +1 there is no SHOULD or MAY there is only MUST

Deborah Kaplan: perhaps if we say something like: “this is optional ONLY if there are no HTML pages in the document…” because anytime anything is optional, publishers will get it wrong and reading systems will not get it consistently - take for example covers in epub…

Matt Garrish: A lot of what I was going to say has been said. The problem that we’re trying to solve is the distribution of audiobooks. The web is a nice to have. We’re trying to enforce that everything must be a web publication. We should be looking at harmony of the two.
… if the primary is distributing audiobooks, great, but if the publisher wants to publish to the web, then we should make an easy path for that - but I would rather find a harmony in terms of allowing the author decide what it is they want to do.

Ivan Herman: +10000 to matt

Luc Audrain: +1 to Matt

Matt Garrish: let it work for audio publishing, but not disallow the index, but don’t push everything into one box because then there is too much compromise for requiring the index.html

Avneesh Singh: the entry page has another important function - it gives access to the non-WP aware browser. It was not the original plan, but it became the gateway for a non-WP aware agent to actually display the content to the user
… if we say it’s optional, in a way, it looks OK, but it means we’re saying that the entry page can be neglected forever. So the question is - how much effort are we asking them to make…

Benjamin Young: +1 to Avneesh

Avneesh Singh: it can be easily automated by a production tool. Slowly we can encourage them to make a TOC and have it more valuable to a non-WP aware browser, but it’s hard to get people to come back to that later.

Mateus Teixeira: +1

Avneesh Singh: I would like to discuss again - and all of these problems are focused that we’re focused on audiobooks. We don’t want things skewed because of this.

Deborah Kaplan: +1 that it can be easily automated in production

Wendy Reid: +1 Avneesh

Ivan Herman: A number of things - We are influenced by the Audiobook profile, but I don’t think that it’s true to say that we’re working exclusively with the audiobook profile. For example manga may want to be distributed only via packages, which hit only the same problems as audiobooks.
… Nick said some things I don’t agree with - the text that I propose made it very clear - and it must be part of the document - a user agent MUST understand a primary entry page and handle it accordingly. A compliant user agent can understand a WP. The choice is up to the publisher and the publisher only…
… it’s up to the publisher if they want to address a specific area or not. A scholarly publisher will have a primary page and it will handle it. But the publisher does not want to go that way because the only constituency only have special vendors, then they won’t have to do it… This is the only goal…
… you might say that adding that extra primary entry page is no big deal - but on the other hand, if there seems to be no consensus in having that, why forcing it?

Geoff Jukes: Speaking as an audiobook publishers, aggregator, and distributor. From our perspective - the index would be an indicator if the publisher has desires on how the audiobook is presented.
… I could well imagine a publisher wanting to theme all their books with the same font. Certainly, home publishers will never have the warewithall for making index.html for all their books - they will rely on aggregators to “package” their books in the way they see fit.
… From my perspective - i came from the data driven point of view - the manifest is the most important part of the publication. From a data perspective, how to deal with it. What you call reading order…
… any additional items like bonus materials - it’s described in the manifest. The index.html would be this desire for themeing of how to present. As a distributor, we have our web sales. We read in a JSON file now - then render the content themed for our site…
… we use HTML5 to pull in the audio as needed - it’s a thin client. In summary - the manifest is the thing that describes everything. The index, which seems optional to me, would be a way for publishers to define their theme

Bill: I hope a simple question - What we’re calling the audiobook format - the simple format without the index.html, is there anything about it that would prevent us from distributing any proprietary not webby content.

Wendy Reid: If you’re referring to the packaging format - that’s not audio specific

Bill: so it could be any non-web content. This is why I’m wondering how it can be part of our web-publication work. Unless we specifically say ‘we’re addressing non-web publications in our work’

Nick Ruffilo: +1

Bill: Two common formats that I want to get on the table. PDF - a perfectly fine way to distribute PDFs. Something more amenable to this group - is it a way to distribute a 3.N that is not a web-publication. It’s still an XML, etc. You see where I’m going

Laurent Le Meur: ok that’s what I wanted to say also

Ivan Herman: Reacting to what bill said - if I take a web publication today - if there’s a resource that’s accessible to the web - like PDF, that’s already a web publication, it doesn’t matter what resources you refer to.
… so we can have a web publication today that refers to a PDF or an epub. I don’t think that’s such an issue in this case. Let me go back to a question to Geoff. If I take the position that we MUST have an index.html, the minimal index.html is, as I said, 5 lines with no body and a link to the manifest file…
… We could, and that was part of the discuss, require the minimal index.html be part of the package and then close it - the feedback I always got was that the audiobook publishers will not want to do that, they’ll run away screaming…

Geoff Jukes: I cannot speak for the publishers, but for blackstone, the idea of injecting an index.html is trivial. We have audio delivery systems that injest 10s of thousands of books, and we repackage them for our distribution partners.
… if the index.html becomes required - because it doesn’t describe the book directly, it will be difficult. What we end up doing is that we end up generating the file itself…

Ivan Herman: The manifest, as we define it, there’s just an index.html that refers to the manifest and allows the browser to read and find that page.

Geoff Jukes: I agree with that and from my perspective, I would turn it on it’s head - if there’s an index… For me, I would treat the manifest as - if there was an index or multiple index, they would be linked within the manifest…
… so the manifest would be mandatory, not the index.html and all browsers and such look for the manifest, and then figure out the landing page from there. Are you following?

Wendy Reid: We are out of time, we’ll continue - if there are additional comments, comment on the issue.

4. Resolutions