Publishing Working Group Telco — Minutes

Date: 2019-01-07

See also the Agenda and the IRC Log


Present: Dave Cramer, Tzviya Siegman, Nick Ruffilo, Yu-Wei Chang (Yanni), Gregorio Pellegrino, Wendy Reid, Laurent Le Meur, Deborah Kaplan, Wolfgang Schindler, Jeff Buehler, Romain Deltour, Ivan Herman, Luc Audrain, Benjamin Young, Teenya Franklin, Joshua Pyle, George Kerscher, Bill Kasdorf, Zheng Xu (Jeff), Garth Conboy, Marisa DeMeglio, Ben Schroeter, Caitlin Gebhard, Franco Alvarado, Geoff Jukes, Brady Duga, Mateus Teixeira, Avneesh Singh, Ric Wright, Charles LaPierre, Tim Cole, Derek Jackson

Regrets: Rachel Comerford


Chair: Wendy Reid

Scribe(s): Nick Ruffilo, Wendy Reid


Wendy Reid:

Wendy Reid: Happy new year, welcome to 2019. Lets approve last year’s minutes… Comments?
… Approved.

Resolution #1: last meeting’s meeting approved

Wendy Reid: We have a new member today - Geoff…

Geoff Jukes: I work at Blackstone audio. We do audiobooks and have for nearly 30 years. We’ve packaged/repackaged audio files for several years, and worked with many publishers since then. This kind of format is new to me and I’m not really sure how the w3c works, but I thought it would be good for me to join in.

1. Audio TF discussion

Wendy Reid: The audio task force got together to have a follow up from the last meeting to get ducks in a row about packaging. To summarize what we discussed: The main focus is OCF-lite. Nothing decided, but OCF-Lite is the forerunner.
… Whatever we did with OCF-lite, we would keep it separate from the EPUB one, we would use EPUB 3.2 as the basis, but we want it separate to reduce confusion. it’ll be only for audio (could be for WP, more decisions to be made) but it’s being used as a basis, but not related to epub in any way.

Wendy Reid:

Wendy Reid: As Ivan mentioned, as it’s not in our charter to create a packaging spec, so we have to be cautious. As a group, we did expand the packaging for audiobooks wiki page a little bit. The one thing we did - we added requirements & success criteria.
… The group could help provide a few more criteria for success and requirements we need to meet to see where the potential options line up with our needs. We need to define success and requirements. One thing that came up - do we proceed in audio only or make a WP-wide solution…
… Our requirements for audio maybe incompatible with other WP packaging, but that needs to be discussed. Laurent…?

Tzviya Siegman: OCF-lite draft ->

Laurent Le Meur: The document I drafted is a simplification of OCF 3.2… I took the spec as-is and removed everything that was only tied to epub. I supressed meta-inf directory, all XML files from the meta-inf (container, etc). I also removed the mimetype file.
… The mimetype signature file comes from ODF and it could be ready by some systems but it’s not, and most reading systems don’t use it, and it’s more difficult to put inside, so we removed it. I also removed font obsfucation.
… I did add a file entry.html and the JSON LD file. I located both at the root of the package. I just also did a small specification on the way relative URIs work from the manifest on the entry page. It’s the equivalent of what’s in the OCF spec. It’s compatible with OCF. Tied to web publications and not epub files.

Ivan Herman: I very small tech question - in OCF we had stuff that there was 1 file that shouldn’t be compressed… (the Mimetype file) That’s gone, correct?

Laurent Le Meur: Yes - that was the mimetype file, it’s gone from this spec. The processor, which wants to open the zip can know more than it’s a zip file, but in the real world, no one was using this, so we felt it wasn’t necessary.

Dave Cramer: Quickly on the last point. I have found reading systems that won’t open an EPUB if you change the capitalization of mimetype. OCF without mimetype and container then just becomes a zip file with strict filenames and only certain compression types. So it limits things you can use within zip. The current OCF-lite draft also forbids embedded manifests, since the manifest must be a separate JSON LD file.
… Copying from OCF right now is complicated. I’ve been reviewing it in detail from the EPUB3 perspective - and I think there are lots of ambiguities.

Tzviya Siegman: Building on what Dave just said - I’m wondering what the benefit of writing OCF-lite vs ZIP.

Laurent Le Meur: Zip is an application note - not a standard. It has many options that no one wants to use or no one uses much. Like encryption, splitting, and there are strictly no constraints on file naming, so what has been done in OCF and what has been replicated, is putting constraints on ZIP to make it easier for authors and reading systems to read/write.

George Kerscher: Is there any requirement on the order of files within the zip in the container?

Laurent Le Meur: Not any more.

Brady Duga: I did want to beat a dead horse and point out that the mimetype magic number is used, today, in production code that a large portion of the world uses. We don’t have to include it, but we shouldn’t decide not to use it because it isn’t used. it is, just not by reading systems.

Garth Conboy: I was going to say something similar. I don’t find the mimetype burdensome — I think what you’ve done Laurent is great — removing meta-inf, certainly for audiobooks, but I can see keeping the mimetype and changing content for audiobooks may make sense. The last comment is maybe targeted best at Ivan — there might be more testability from 3.2 So maybe not draft something that looks like a packaging spec, and just point to 3.2 and note the ch[CUT]

Ivan Herman: For other reasons we looked at the charter again, and the charter does not forbid us to do a spec in this area, it’s not the charter that forbids us, but the comments we got at chartering that were very much against it. It is still to be checked, I would leave this issue open for the time. What makes it clearer for the user and the reader to handle.

Dave Cramer: Doing something in markdown and explaining the concepts might get us better feedback from various communities

Tzviya Siegman: +1 dauwhe

Wendy Reid: OCF-lite still remains to be one of the options we do have. We are still exploring other options as listed in the wiki. Another big one we’re hoping for an explainer in is HEIF. We’re hoping to get that in a few weeks as a counterpoint.

Ivan Herman: With OCF-lite and the others, My view of this is that what Laurent did is not only for audiobooks. We’ve had these discussions that yes, we’ll focus on audiobooks, but whatever we do would be, should be, maybe reusable for a manga format within a year or so…
… OCF-lite has this generality which is good while we are waiting for something better, but going down anything audio-specific puts us in a corner that I would not like.

Garth Conboy: +1 Ivan!

Laurent Le Meur: I would add that going through the OCF-lite is also compatible with OCF, so a standard book would be compatible with EPUB3 and the WP serialization. If you have both EPUB3 files AND the manifest in the same container…
… and you have one set of XHTML files, then you’ve got something that is compliant with EPUB3, web publications, and possibly even epub2. So with one container you have all.

Tzviya Siegman: My hesitation with OCF/OCF-lite, while it is compatible, what happens when we look at the stuff going forward. What happens on a website? How does this unpackage on a website? We are supposed to be working on web publications…
… the format we’re hoping for is web packaging (which doesn’t exist yet) but we have to have this as a consideration before we decide on OCF-lite. Even if it has a layer on top of it (javascript overlay…)

Laurent Le Meur: I put some comments in the document — they are not used for web publications on the web — it’s used for packaging. Our charter notes that we have to deal with packaged web publications. It’s not about packaging something on the Web, it’s about packaging something that will be on the web soon. So it’s about logically moving a package that will be on the web.
… Then you need middleware. We have streamers… Yes, you need software set aside that can dynamically create a document from the package.

Garth Conboy: I view this as interesting in the package world, but I view tangential functionality in EPUB. This effort could get some traction in the short run. Reading systems now can inject epubs - so loading an audiobook package this way is not a huge step…
… In the future if we get back to WP proper, maybe enough time will have passed that web packaging is real…

Luc Audrain: +1 to Garth

Ivan Herman: I must say I share the problems of Tzviya. I understand them. The way I see is slightly different from Laurent — we are forced into a corner. From the Web point of view, it would be much nicer if we had a web-packaging.
… E.g., we would expect a bunch of security issues are taken care of, but that’s not there. We need a solution right now for which OCF-lite cuts it perfectly. What will happen in 2 years from now, because I don’t expect it earlier…
… in 2 years, when Web Packaging becomes a Rec, we can just say it’s an alternative packaging format for audio or anything else. Maybe we don’t have to, but the user community will vote with their feet. They will have 2 possibilities…
… we will see which one we will alter, but we cannot wait 2 years. I must admit that I go into this with big reservations as well. On the other hand, that’s where we are.

Dave Cramer: OCF does not solve the problems we need to solve for web publications. EPUB is running into difficulties because we don’t have a clear solution for origin, for example. We also have lots of non-web use-cases and OCF-lite and some other packaging mechanism that doesn’t involved browsers seems useful.
… Audiobooks which, is just a bunch of MP3, or comics that are just JPEGs, doesn’t require complex tools. We may need different mechanisms for different use cases. I suspect a single mechanism won’t solve all situations. What google is doing is an overkill for some situations…
… good luck telling an audio book publisher/author that they have to create HTTP response/requests - that’s not going to work for everyone. We have different use cases based on the types of content.

Garth Conboy: +1 too

Laurent Le Meur: +1 to dauwhe

Ivan Herman: that is actually a very good point, dauwhe

Geoff Jukes: +1 to dauwhe

Laurent Le Meur: Ivan, you said that web packaging we have to wait 2 years. More than that. Once the spec is there, it’ll take time to get into browsers, then in authoring software, etc. With HTTP security, it could take up to 4 years…

Dave Cramer: It’s already partially implemented in Chrome.

Tzviya Siegman: Wendy mentioned in her email, or at the beginning, that we might consider doing something different. The more I learn about audiobooks, I’m thinking it’s smart. Especially with Geoff’s feedback, I’m wondering if we need to consider something slightly different for audiobooks…
… or make the specification so stripped down that audiobooks and the details are really audiobook specific…

Garth Conboy: +1 Tzviya

Tzviya Siegman: instead of arguing to what is best for web publications, we clean up the web publication, and make it primarily about the data in the web publication and the audiobooks. It’s possible that is the focus…

Wendy Reid: +1 tzviya

Bill Kasdorf: +1 to Tzviya. ODRL took the same strategy: very stripped down spec designed to be augmented for specific uses (e.g., news)

Ivan Herman: Just reacting on the last thing — I would prefer not to go to this extreme. I would prefer one packaging format. The reason why I was on the queue is that the points Dave made are good and important. Dave you may want to talk to the editor of the explainer…

Dave Cramer: Tzviya said something about web publication spec may be focusing on the manifest… I might go a little further there we’ve all experienced some difficulties in the group over the last years reaching consensus on our vision and what needs to be done…
… it feels like there is a core of stuff that is required by everything we’re talking about. Every sort of publication from audiobooks to web pubs to digital sequential art depends on a bit of info - having an ordered list (reading order)…
… that reading order can be expressed in many different serializations - epub has the spine, we have a JSON serialization in readium. Blackstone’s audio format has YAML data structure… Maybe we can have this model for structural metadata that can be serialized in different ways and packaged in different ways…
… which might get us out of the trap of User Interface issues. If we have some really simple data model, we can go experiment and build things - because we don’t know what’s coming after that…

Laurent Le Meur: +1 to dauwhe, building on a conceptual model is key

Nick Ruffilo: I have recently decided simple is always better, having worked with EPUB on all sides, spec is there and people will do what they want
… what is important is that the simple and basic stuff works, and things did and didn’t work in more complicated areas, if we keep it simple, accessibility is handled and a bulk of things are handled, and handle the exceptions from there
… decide on our own that exceptions are handled as needed by the industry

George Kerscher: On the accessibility side - i’m highly in favor of simplicity as well. The JSON file and the HTML file should not preclude the ability to provide a text transcript. As soon as we get into textbooks and such, it’s necessary. As long as we don’t preclude it. The same goes for comics/manga…
… as long as we can have descriptions of the images, that’s good

Tzviya Siegman: RABIT HOLE. I would like to help bring back to the agenda. We’re still looking at packaging in general. It sounds like lots of people like OCF-LITE (Zip with restrictions?) but there’s some hesitation for Web Publications, but people want to ideally use the same package for all WPs…
… and the big concern is that - it comes back to the question of if there will be middleware…

Tzviya Siegman: Maybe we want to start making the success criteria for packaging.

Laurent Le Meur: About accessibility - we will be ready to address the accessibility by having multiple renditions. Yes it will be good for audiobooks and comics as well - to have a full transcript of the content. It’s to be studied…
… Tzviya you were saying there is an issue with having a packing format - having the same for all different types. the epub file format won’t be replaced by this new container because epub works - it’s only for new kinds of publications - audio and comics. Because epub doesn’t work there - which is why we need something else…

Luc Audrain: +1 for new kind of publication

Laurent Le Meur: If we consider that, we create a new container format. We don’t say it will or won’t be used for other - we know it won’t be used for that. Then there won’t be any more friction.

Ivan Herman: The reason why I would be a little bit hesitant about having an audiobook format is the answer to George’s question - with web publication and the OCF-Lite, it’s a clear yes. It’s not problem to add text information to the package, because it’s a resource just like an audio…
… OCF-Lite handles that the way Laurent has handled that. As soon as we begin to lose the generality, and say ‘lets have an audio-specific format’ it might bite us later with the same question George asked.

Nick Ruffilo: If OCF-lite is a simpler format, and would work with the content of an EPUB, and I am a future publisher creating my content, why would i not want to package my content in OCF and have it work anywhere.
… What does EPUB have that is better? There might be cases where the full OCF is needed, but if we are going to get traction and implementors do OCF-lite, if it’s good enough for 90% and accessible, does this become the default and OCF becomes edge cases

Garth Conboy: It raises an interesting point. We can’t replace OCF with OCF-lite today, because EPUB is XML based, and we have to find the container - and all these things Laurent proposed removed. If there’s an EPUB 3.5 effort, and we get nice traction with audiobooks, back in IDPF days ‘browser friendly format’…
… if we do something like that - a half-step for EPUB, I can see OCF-lite - we have reasonable agreement that it’s OK for Audio. We should go there, and learn, and if that comes back to an EPUB 3.5 - more power to us but not sure it’s a bridge we can burn yet.

Wendy Reid:

Wendy Reid: One thing I’ve done is that we’re learning towards OCF-Lite, but I’ve created an issue that is packaging for web publications and success criteria. Lets check our boxes and define what we need for Audio, WP, comics, etc.
… Lets define success, lets define what requirements need to be met. Security, Accessibility, and understand fully the decision we’re making. It could be OCF-Lite, but lets do the due diligence.

Ivan Herman: Agreed. 1) Somebody should come up with an initial list. The worst thing is to have a blank page. 2) Let us give a time limit on this. I would say 2 weeks. In those 2 weeks, we get what we get and then in 2 weeks, we process it.

Luc Audrain: +1 to time limit !

Wendy Reid: I will add my initial list to the first comment. Then I will work from there. I will also add the time limit to the issue.

Garth Conboy: I would like to constrain us on Audiobooks and possibly comics as well. We’re not looking at a packaging for WP - we don’t have a constituency for WP and PWP. If, longer term, web packaging might be something, lets not get off in the weeds if we look there..

Wendy Reid: Next week - the rabbit hole will be discussed. (Reminder: 2 weeks is MLK day in US, no meeting) Also a reminder - the F2F in May is May 6-7 (Monday/Tuesday) which is graduation weekend for Universities - so book hotels EARLY if you can…

Garth Conboy: I didn’t get a list of locations/Hotels. The meeting is scheduled to be at Google in Cambridge. I poked around a little and didn’t see anything super great. I know Ivan got something nice in the Boston side…
… There are still good deals in - mass transit works pretty well, there is a T-stop right at Kendall square where we’re meeting…

2. Resolutions