Publishing Working Group Telco — Minutes

Date: 2018-04-30

See also the Agenda and the IRC Log

Attendees

Present: Wolfgang Schindler, Nick Ruffilo, Deborah Kaplan, Jasmine Mulliken, Derek Jackson, Avneesh Singh, Ric Wright, Matt Garrish, Jeff Buehler, Ivan Herman, Dave Cramer, Laurent Le Meur, Adam Sisco, Jun Gamou, Franco Alvarado, Zheng Xu, Tzviya Siegman, Bill Kasdorf, Peter Krautzberger, Lillian Sullam, Ben Schroeter, George Kerscher, Marisa DeMeglio, Benjamin Young, Joshua Pyle, Heather Flanagan, Chris Maden, Hadrien Gardeur, Brady Duga, Garth Conboy

Regrets: Mustapha Lazrek, Ben Walters, Luc Audrain, Tim Cole, Murata Makoto

Guests:

Chair: Tzviya Siegman

Scribe(s): Nick Ruffilo

Content:


Tzviya Siegman: Lets get started - First the meeting notes - Any comments?

Tzviya Siegman: minutes: https://www.w3.org/publishing/groups/publ-wg/Meetings/Minutes/2018/2018-04-23-pwg.html

Resolution #1: last week’s minutes accepted

1. affordances TF meeting

Tzviya Siegman: The affordances task for met on Friday. Jasmine, Mateos, anything to report?

Jasmine Mulliken: https://www.w3.org/2018/04/26-pwgatf-minutes.html

Jasmine Mulliken: https://github.com/w3c/wpub/issues/143#issuecomment-374888961

Jasmine Mulliken: We have the minutes for that meeting. We went through the affordances list. We decided that we would approve the template that Zheng Xu made. We’re using this template so that we have something to talk about in the face to face. We have a couple already drafted out. Time-based media and text, or offline reading capabilities. Feel free to comment or add to anything here

Tzviya Siegman: There were a few items we were possibly going to close… Do we want to discuss now?

George Kerscher: Do you need the template posted somewhere? I sent it to the list.

Jasmine Mulliken: The template is sitting inside issue #143 - which should be in the spec for affordances.

Tzviya Siegman: Would it help for others to have it outside github? Where it is just text?
… Ivan suggests it gets put on the wiki.
… Hopefully we’ll be adding to these affordances - like Jasmine suggests, we’ll have a robust set of affordances to discuss at the F2F. It should be on the agenda for next week. The next item is the gaps from ‘classic’ epub and our existing documentation. Hadrien opened an issue…

2. Gaps from “classic” EPUB

Tzviya Siegman: https://www.w3.org/publishing/groups/publ-wg/Meetings/Minutes/2018/2018-04-23-pwg.html

Tzviya Siegman: What we need to do is break this down to a number of issues and identify where they go. We’re not duplicating terminology, we’re duplicating functionality. We need to figure out where things go - infoset, navigation, etc. Once we figure that out, we can figure out which repository things belong and where it belongs in the document. We decided that lots of things that could be debated, so we’ll start with a discussion today about how to split things up.

Tzviya Siegman: https://github.com/w3c/wpub/issues/176#issuecomment-382634697

Tzviya Siegman: There is the issue - with the GAP analysis. If anyone else finds gaps, add them. So we need to go through and determine if they go to WP, PWP, or epub4 the EPUB infoset contains a link element that’s currently missing from the WP infoset (this is tied to #159, #162 and #163 as well)

Tzviya Siegman: The first item - Hadrien THANK YOU - the epub has a LINK element - which is what enables externalized metadata in epub. It can do lots of things in HTML. I’m not sure this is missing, and we don’t forbid this.

Hadrien Gardeur: I’ve created a bunch of issues related to linking before the infoset. It isn’t just external metadata - it’s also privacy policy, search, etc. To comment on what you just said - I think it’s conceptually where do items belong. There was a comment ‘ when you add to an HTML resource, you’re adding for that resource, and not for the whole publication.’ For example, if you link to a privacy policy, it relates to the specific resource. But if you have something in the manifest, it relates to the entire document. It could be handled in HTTP headers, but we don’t fully have that today…
… The way I wrote - not the infoset - but the web IDL part of it - the web IDL already has that, mostly because I needed that for other items. Web IDL has it, but infoset does not - or it is very vague about it.

Hadrien Gardeur: WebIDL is different than serializing though, it’s half-way

Ivan Herman: I feel a bit uneasy because the link element is obviously something that might come up - when we get to the manifest but my understanding is that this issue would look at the infoset level. I would prefer to concentrate first - just as we always did - is what are the infoset elements/items that are missing compared to what epub does - and then how would we serialize it. Yes, there is the item of using the link element, but that can come later - lets not mix the two things.

Tzviya Siegman: I was going to say - if we focus on the technology of the link element - link is the way in which you accomplish X, Y, and Z - we need to then understand what LINK can accomplish in epub. I don’t care that link is the spec’d element, what I care about is what is the functional role.

Hadrien Gardeur: And I think thats in the other issues I’ve raised - but it also enables much more than those. Linking and the ability to specify media types and rel values.

Tzviya Siegman: Right, like Ivan was saying - what is it that our infoset needs to accomplish.

Ivan Herman: The 3rd item is examples for features that are in epub and we have to be very careful that these would be in the infoset as well - some way or another. For example, metadata. Those are the issues we need to focus on. The first goal is to have our infoset finalized.

Hadrien Gardeur: For link - I think what we’re missing from our infoset, we’re missing a way to reference a resource that is not in the reading order. We cannot reference something on the web that is on neither of the lists. Metadata records, a search service, a privacy policy… In general we lack the ability to reference something that is not IN the publication, but referenced by the publication.

Benjamin Young: We have the link tag in HTML for referencing stuff for the browsers understanding of anything included, but not presentable - hypermedia semantics, can be cached, etc. It is afforded for - and available at that layer. It does have a correlation in the web, and we can take advantage of it.

Hadrien Gardeur: I think you are going in a circle by saying that. Earlier today I already mentioned a comment about that - if you reference something from a source, the rel value is the relation between the source and the resource - but from an infoset, we don’t have a way to note it on the document level.

Garth Conboy: http://www.idpf.org/epub/301/spec/epub-publications.html#sec-link-elem

Tzviya Siegman: We need to be clear about what is needed and where - so this isn’t about missing a link, it’s about what functionality it represents and where it should go

Ivan Herman: at the moment - we need to separate the infoset and the specific serialization. I want a high level concept as to what it is we need - what is the functionality that epub requires… The question is - do we have to say in the infoset, that the infoset should refer to a DC element … OK - these are the questions we have to decide. This moment I don’t care about the LINK element - if it’s in JSON, or HTML, etc…

Hadrien Gardeur: I was just concerned with Benjamin saying that it was already covered - that link is in HTML therefore it’s covered.
… I was listing what was in epub…

Garth Conboy: Tzviya - should the issue be relevant to the link element? Or should it be related to the functions?
… The question is how to accomplish the link functionality in our infoset…

Benjamin Young: Just one quick thing - a question about data modeling - we need to be clear about what we’re expressing. Epub’s datamodel is one thing, and we should not focus on how it’s expressed, but the functionality of it. Whether the whole publication brain needs to know the relationships is a serialization thing. Lets focus on model not serialization.

Garth Conboy: So this is a WP scope issue?

Tzviya Siegman: I would agree…

Hadrien Gardeur: the list was built by looking at the spec - and there is no infoset in epub 3.2 I did list things like DC and the serialization. But we can look at it and break it out into what goes into an infoset. As long as we stay away from how things are serialized, I don’t think there should be any controversy.

Garth Conboy: Anything in the OPF metadata section would be the infoset.

Hadrien Gardeur: Mostly yes, but for epub, it’s spread a bit everywhere. There are things in the vocabulary list, spine, manifest, etc, that are infoset related.

Tzviya Siegman: The next item - no explicit support for a list of REL (related items using the rel tag)

Peter Krautzberger: sorry, I have to drop off early.

Hadrien Gardeur: to clarify, in epub, the approach is to have a controlled vocabulary. Since we’re on the web, we shouldn’t be doing that, but using a link registry. For now we’re not talking about linking. I think it’s irrelevant to how we serialized, but what’s the common vocab we’re using for linking. We use it for lots of our items.

Ivan Herman: and we should avoid inventing our own.

Tzviya Siegman: it’s not a good idea to create our own, and we should use standard vocabs

Hadrien Gardeur: for the linking part - there are things where there are not equivalents on the web - such as cover. There might not be a rel value - like an icon - where it’s similar but not the same. As part of this work, we might also need to identify what is missing.

Benjamin Young: I wanted to note that we’ve been assuming linked data and using vocabs that exist - they can be done in HTML as well as JSON, so serialization is not a factor here. That’s all quite doable. I don’t think we’ve brought up any issues that aren’t actually craftable.

Tzviya Siegman: Other than cover - what invented rel values did epub create?

Matt Garrish: We had record… We deprecated most in 3.1. We left record and alternate. I think we’re mostly aligned with web-standard rel values.

Hadrien Gardeur: I’m not sure there’s an exact equivalent for “record”

Tzviya Siegman: Especially based on the link, we should be able to find reasonable rel values. But we do need to create/find web equivalent solutions and open an issue for WP (which will affect PWP and epub)

Benjamin Young: Rel values can be URLs so they can be extended. There are also default contexts that put schema.org, so you can do context mapping and make it extensible.
… We should not forget that this is well trodden ground.

Benjamin Young: example of a URL-based rel

Hadrien Gardeur: Just a quick note - referencing a URI is a little better, but not the same as using a value that exists in a registry. If we use something that exists in schema.org, we will need to have a list or registry for what is being used. You could use dozens to represent cover, but we need to be careful. Going in that direction is nearly equivalent to what we’re doing in epub.

Ivan Herman: I’m looking at the link definition of the REL value in the LINK element of HTML5 - and it doesn’t say we can have a URL. I am not sure that statement is true.

Benjamin Young: RDFa did…

Ivan Herman: that becomes a very slippery slope. That may get extremely strong pushback. We have to be very careful of that dependency as we may shoot our own foot.

Hadrien Gardeur: This is actually serialization specific. Depending on the serialization, you’re allowed to use this as an extension method. Part of serialization - there are some that let you have one, and some that let you have many. We cannot say that all link elements have multiple rels.

Tzviya Siegman: We’re getting into too many details. We’ll create the issue and go from there.

Hadrien Gardeur: I think this is getting serialization specific again

Benjamin Young: Garth said something in /me about cover being a manifest and not a rel issue. The question is about data modeling - are we putting the vector of the rel model a thing about a thing or are we doing it directly. Is the cover - by definition - a rel value pulled by some registry, or is it direct. There are formats that different in how much relationships/relating they can handle, but we need to figure out what can be expressed - and what is core to the meaning of the object. And if we want to lean on rel being the key.

Garth Conboy: There are a couple of other - unused rel values - in our vocabulary. And there are external metadata ones. The cover issue is a unique one. Is this linked into the first issue?

Tzviya Siegman: We already have a requirement for cover image - so it should be already covered.

Bill Kasdorf: It strikes me that there’s a difference between the epub and the WP - in WP the cover might be a rel value, but in epub, it might be a property

Hadrien Gardeur: sorry Bill_Kasdorf, but there’s no reason why cover image should be handled differently in WP vs EPUB4

Bill Kasdorf: does a WP actually have to have a cover image? I think only an EPUB has to.

Ivan Herman: What this whole rel thing says - is that the rel thing should define the type of the external reference. There should be a mechanism in the infoset. I think that’s all we have to agree on.

Tzviya Siegman: We’re up to the Dublin-Core terms…

Ivan Herman: The question for each of these terms are that the infoset should have an item which identifies the publisher of the publication. That may be already there. We look at these functionality - what DC gives us. If these are required, then there needs to be something to define these. Whether we use DC or not, it’s not on the infoset level.

Hadrien Gardeur: Once again, I’m not going to talk about serialization. Metadata items are often used by software and not necessarily by the publishers. We need to look about what is actually used, and if they are used often, we can reintroduce them at the infoset level. Some might only be in epub.

Tzviya Siegman: We haven’t talked about the extensibility model - but we haven’t talked about the required fields. No one is going to exclude you from including things like publisher. You can add as much data as you want, just not inside the publication.

Garth Conboy: What did you mean about the last item?

Tzviya Siegman: if I have information someplace on the web - and I point to it, the information is still there, it’s just not in the manifest.

Garth Conboy: all of the DC are things we can include in the epub manifest, but should they be external in the WP world?

Bill Kasdorf: thats what i was getting at?

Hadrien Gardeur: extensibility is another item - but we should be careful about that. While exploring 3.2 I saw that for title, it’s possible in epub to say how it should be done using the opf file. Because this never existed before - the community using Calibre used their own elements. So we ended up in a situation where people handled the same things differently. We risk fragmentation by not addressing them.

Dave Cramer: A quick comment - the OPF attributes will not survive in 3.2

Hadrien Gardeur: yes/no. For people writing software, people might be expressing it differently. It might be easier on the spec side, but it makes it more difficult on the browser/reading system side.

Tzviya Siegman: Are you proposing in WP we are explicit about how to create every metadata element?

Hadrien Gardeur: I am not saying we should do that for every element, but we should look at what is widely used. If something is widely used, but we dont’ address it, i would consider it a bug. Early on in epub, we didn’t define how cover was used, so people were defining it in different ways…
… We can reach out to communities to get data.

Tzviya Siegman: If Calibre for example - if they aren’t going to support or participate in WP, if their trajectory is epub, I’m not going to factor in their use of epub…
… We need data to know what’s going on. We need to know where WP is going to be implemented, or else it will be a guessing game.

Hadrien Gardeur: they might not do WP but they might for epub4… During the 3.1 cycle, one of the reasons we went back and accepted more metadata was that we got a pushback from other communities. So there is some data from the 3.1 tracker.

Tzviya Siegman: What are we proposing to do for this specific issue. Anyone who creates a reading or authoring system who has information about what DC elements should be required or expressed, any feedback would be helpful.

Tzviya Siegman: https://github.com/w3c/wpub/issues/176#issuecomment-382634697

Zheng Xu: I agree to expose title, publisher, more explicitly into the manifest. One reason is that one thing we are facing right now is that we receive metadata from publishers that sometimes come in different formats. When we want to display the cover image, or publisher information… there is some information that we need to display. We need to correspond with CMS data as well.

Deborah Kaplan: I don’t have an opinion if it should be in the infoset or not - but I have a question. We talk about publishers and reading systems - one question is ‘are we worried about librarians and archivists not being part of this.’ If the only 2 players are reading systems and content creators, then there are one set of needs, but if we’re worried about how libraries work, that’s another issue. Dublin Core is a simple set of metadata. It’s small to make it sustainable. I don’t have opinions myself - but the question is, are we only worried about traditional publishing, or are libraries and archives also part of this.

Tzviya Siegman: Yes, we are concerned. But, this is the solution we have to come up with.

3. Goals for the F2F

Tzviya Siegman: our F2F is in a month. I’m not sure we’re meeting every Monday - as the TPAC and the epub summit is in 2 weeks, then there is a jewish holiday. A quick update, how are people doing on meeting goals for F2F. Affordances is on task. WAM taskforce?

Tzviya Siegman: https://docs.google.com/document/d/1Qe8Q8wMC1LKy_-JO-UCy8bFw4D4VN0si1Q5EPW9c-rY/edit?usp=sharing

Tzviya Siegman: If you are working on something for the F2F but not going to be ready, please let me or garth know ASAP.

Hadrien Gardeur: I’m willing to take the list I started and turning that into something more infoset related. I will point out where we have gaps in serialization. I can then add my personal take if it should be WP, PWP, or epub4.

Garth Conboy: Tzviya - you said 3 issues a while ago. External metadata, one on DC? Then what?

Tzviya Siegman: I was saying we got through 3 bullets.
… Any other business? We hoped to talk about offlining - a few people had homework…

Benjamin Young: I wanted to clarify with WAM - we haven’t had any calls about the WAM taskforce, as we had some foundational items we wanted to iron out first. Until that’s resolved, we can’t really look at it more. We don’t want to soldier off until there are direct asks to make. we have routed it to github issues.

Ivan Herman: Hadrien - just one request. I think it would be better if you put the issue and your personal recommendation as a separate comment on the issue.

Ivan Herman: for the time being lets not address serialization


4. Resolutions