Publishing Working Group Telco — Minutes

Date: 2018-09-10

See also the Agenda and the IRC Log

Attendees

Present: Matt Garrish, Avneesh Singh, Dave Cramer, Jeff Buehler, George Kerscher, Nick Ruffilo, Ivan Herman, Chris Maden, Zheng Xu, Gregorio Pellegrino, Ric Wright, Charles LaPierre, Joshua Pyle, Franco Alvarado, Jun Gamou, Benjamin Young, Garth Conboy, Juan Corona, Marisa DeMeglio, Ben Schroeter, Laurent Le Meur, Caitlin Gebhard, Ben Walters, Mustapha Lazrek, Derek Jackson, Teenya Franklin

Regrets: Rachel Comerford, Tzviya Siegman, Murata Makoto, Vladimir Levantovsky, Hadrien Gardeur, Deborah Kaplan, Brady Duga, Luc Audrain, Tim Cole, Romain Deltour, David Stroup

Guests:

Chair: Garth Conboy

Scribe(s): Nick Ruffilo

Content:


Garth Conboy: https://www.w3.org/publishing/groups/publ-wg/Meetings/Minutes/2018/2018-08-27-pwg.html

Garth Conboy: Meeting minutes approved…

Resolution #1: Last meeting’s minutes approved

1. issue 291, TOC

Garth Conboy: https://github.com/w3c/wpub/issues/291

Garth Conboy: Last week, Monday, there was no meeting due to US Labor Day. There was discussion on the call on issue #291 - we got to a semi-consensus that we’d have two - one for referencing and one for machine processing. Those, not including me, hung on the call and that rough consensus evaporated. There has been more discussion…
… the most recent was that this was a proposal that Ivan put together on having a single point of context that could be used to process the renderable one into something that is more machine readable. And getting hierarchy around the lists. That’s sort of where we are…

Garth Conboy: https://github.com/w3c/wpub/issues/291#issuecomment-417554685

Ivan Herman: In some sense, I think we’re in a deadlock. One approach is to have a clear algorithm in the spec that says “this is how - from this HTML over there - the hierarchical menu is created” - which is fine. For the time being, we don’t have that algorithm. My personal belief is that just having an algorithm without some sort of definition of what the TOC looks like would be incredibly difficult. If we define some sort of structure then the algorithm should be put in the spec. Then my proposal is that, from the user’s point of view, the output of the algorithm would be the TOC; if the algorithm fails, the user agent should simply display the HTML content that is there.

Dave Cramer: I’m an advocate for ‘define an algorithm’ but I’m not a JS expert, although I’m sure we can create an algorithm that creates the hierarchy. I would really like to see what such an algorithm would do to some real-life table of contents before we go much further…

Ivan Herman: Having a JavaScript algorithm that could take any HTML would be terribly complicated and would be about the organization of the DOM… I looked at some examples, which are more anecdotal, of what tables of contents are out there; I looked at ones that were generated and not human-created. I think there was only one exception - most are essentially a NAV or a UL or OL with the LI as the children, which is the natural thing to do with a table of contents. The other possibility is to use H1, H2, H3 to specify hierarchy.
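
As an illustration of the structure Ivan describes (nested OL/UL lists with LI children inside a NAV), here is a minimal sketch of how a hierarchy could be extracted. It is not the algorithm under discussion, which had not been written at the time of the call. The class and function names, the sample markup, and the rule that a link outside any list counts as a failure are all assumptions made for the example.

```python
from html.parser import HTMLParser


class TocParser(HTMLParser):
    """Collects links inside <nav role="doc-toc"> into nested entries."""

    def __init__(self):
        super().__init__()
        self.in_toc = False   # True while inside the doc-toc nav
        self.failed = False   # set when the markup does not fit the pattern
        self.roots = []       # top-level TOC entries
        self.levels = []      # one list of entries per open <ol>/<ul>
        self.current = None   # entry whose link text is being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "nav" and attrs.get("role") == "doc-toc":
            self.in_toc = True
        elif not self.in_toc:
            return
        elif tag in ("ol", "ul"):
            if not self.levels:
                self.levels.append(self.roots)
            elif self.levels[-1]:
                # A nested list belongs to the most recent entry.
                self.levels.append(self.levels[-1][-1]["children"])
            else:
                # Nested list with no parent link: outside the assumed pattern.
                self.failed = True
                self.levels.append([])  # keep the stack balanced
        elif tag == "a" and "href" in attrs:
            if not self.levels:
                self.failed = True  # link outside any list
                return
            self.current = {"title": "", "href": attrs["href"], "children": []}
            self.levels[-1].append(self.current)

    def handle_endtag(self, tag):
        if tag == "nav" and self.in_toc:
            self.in_toc = False
        elif self.in_toc and tag in ("ol", "ul") and self.levels:
            self.levels.pop()
        elif tag == "a" and self.current is not None:
            self.current["title"] = self.current["title"].strip()
            self.current = None

    def handle_data(self, data):
        if self.current is not None:
            self.current["title"] += data


def extract_toc(html):
    """Return nested TOC entries, or None so the caller can show the HTML as-is."""
    parser = TocParser()
    parser.feed(html)
    parser.close()
    if parser.failed or not parser.roots:
        return None
    return parser.roots


# Hypothetical sample markup, not taken from any real publication.
SAMPLE = """
<nav role="doc-toc">
  <ol>
    <li><a href="ch1.html">Chapter 1</a>
      <ol>
        <li><a href="ch1.html#s1">Section 1.1</a></li>
      </ol>
    </li>
    <li><a href="ch2.html">Chapter 2</a></li>
  </ol>
</nav>
"""

toc = extract_toc(SAMPLE)
```

Returning `None` rather than raising mirrors the fallback proposed on the call: when extraction fails, the user agent simply displays the author's HTML.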

Jeff Buehler: I wrote a number of scripts (node JS) that parse out TOCs and create TOCs, generally from spreadsheets and CSVs. There’s some relationship there that might be helpful. It generates TOCs from CSVs.
… The hierarchy is pretty brute force - I’m not using DOM parsing, I’m using regular expressions. There are some tags required for specific elements, but it might be helpful for anyone working on this. It’s been tested over a few years, so it definitely works.

Garth Conboy: I tend to agree with Ivan. I am very much in the camp that one of the successful things we did in epub was the NAV file, but it doesn’t sound like that view of the world is going to get traction here; we might revisit it in epub4 land. I don’t see any other way forward other than accepting Ivan’s suggestion, in which we’ll have a placeholder for the algorithm, since there doesn’t seem to be consensus support to have 2 TOCs.
… It seems to me that is where we are. But maybe we do a brief poll for consensus on accepting Ivan’s language, with an algorithm TBD, to be defined and experimented with.

Ivan Herman: I think we need to modify a bit - we should leave it open and give ourselves a certain amount of time to see if such an algorithm can be created. If it cannot be created, then we need a combination of what’s in epub3, making a reference to this issue - that for me seems to be the best option.

Garth Conboy: The best thing you said is that we come back in a month - which happens to be TPAC - so that could be an interesting approach.

Matt Garrish: I was going to agree with Ivan - I am not of the belief that completely random HTML will become structured TOC. The middle ground we might find here is that there is a recommended way of doing a TOC - so YMMV if you don’t go with the recommended structure for TOC…

Joshua Pyle: +1 to recommended structure for TOCS

Matt Garrish: I want to sit on the fence that it’s required of epub but not necessarily epub, and just note there are better ways to do it.

George Kerscher: Would it be helpful for us to gather a collection of recommendations on how to do it? For example, if you’ve got a collection of content documents - how to traverse those files and extract the things you want to put in the TOC?

Garth Conboy: That sort of strikes me as - building a TOC from a whole cloth - this discussion is about how free can we be with an actual doc-TOC but without looking at it from the reading order

George Kerscher: I don’t know how this will be created. I’ve seen TOCs where the names in the doc-toc do not align with the content in your document.

Garth Conboy: We do have a requirement in the current spec that there must be a TOC identified, and it must point to an element identified as doc-toc. If it’s wrong and points to things that don’t exist, that’s the document’s fault.

Dave Cramer: The key here is that it’s the author’s responsibility to create a TOC. They know how to best facilitate access to their content. We’re spending time assuming the User Agent needs to do anything beyond displaying the document’s table of contents. I’m not sure how this fits into the web publication world, if we take the word web seriously. It’s not standard for a UA to search for nav elements.

Garth Conboy: I think it’s very arguable on either side on the WP side. When we get to epub4, we’ll want the reading system to be able to do things with the TOC, but let’s not argue that thing now.

Ivan Herman: I want to remind everyone that the TOC is not a required element. If the author doesn’t want the User Agent to do something, they won’t create a DOC-TOC element. We are not contradicting. There is no problem with a creator doing what they want.

Garth Conboy: https://github.com/w3c/wpub/issues/291#issuecomment-417554685

Garth Conboy: I would like to propose consensus on Ivan’s comment, which proposes an algorithm and/or a set of constraints we might place on the table of contents if the algorithm is successful. We take the few weeks until TPAC and see if we’ve gotten somewhere…

Garth Conboy: +1

Garth Conboy: We would have some constraints, and if the algorithm is not successful then just display the HTML. If that is acceptable to folks, I will go ahead and be the token +1

Jeff Buehler: +1

Ivan Herman: +1

Joshua Pyle: +1

Ric Wright: +1

Zheng Xu: +1

Juan Corona: +1

Marisa DeMeglio: +1

Nick Ruffilo: +1

Dave Cramer: +1

2. issue 276: external resource in ToC

Garth Conboy: https://github.com/w3c/wpub/issues/276

Garth Conboy: The question is whether the DOC-TOC element can point to resources outside of the default reading order. I will throw out a prejudice - and Dave is going to come in saying ‘it’s not webby’ - my initial 2 cents is that the TOC should point to items in the resources or reading order.

Garth Conboy: The title of the issue is: “outside the reading order” and additional resources. If it’s a publication, then the TOC should point into the publication as it’s defined by the boundary.

Jeff Buehler: I found that over the years the TOC isn’t always reliable, such as resources not available within. The solution is to create another XML file…

Garth Conboy: I don’t think we want the TOC to have more power, but the pointers in the DOC-TOC, whether the links in there can point only to the items within the publication.

Garth Conboy: it wouldn’t pull all of that section’s links into the TOC (the external links)

Ivan Herman: Any scholarly publication’s structure, yes there’s a section of references, but the TOC includes a reference to the section, but not the external links.

Dave Cramer: This is more a discussion of scope and boundaries than of what happens if a link in a TOC goes somewhere beyond ‘inside the publication’. That is a larger question we haven’t really addressed. What if I have an external link elsewhere? What happens, and how do we show the user that they have left the web publication?

Garth Conboy: Certainly in web publication land, Nick’s suggestion of additional reading - we do agree that the default reading order + additional resources is the bounds of the publication, so, for the TOC, should it only point within the bounds of the publication? We can’t prevent content creators from putting in a link to whatever - but we need a rule for whether it is ignored or not.

Joshua Pyle: I tend to agree, and prefer for the TOC to be within the bounds of the publication. What we’re doing with web publications is creating the ability to bind different things together. We could mix-and-match pages, but the point of the WP is to bind them together.
… The TOC should direct you within the bounds of what you’ve just created. Things like references, they’re all external links. They should be separate from the TOC. It would be frustrating for a user to click a link and leave the document.

Juan Corona: nice explanation josh +1

Garth Conboy: We’re defining a bounded collection of resources, so the TOC should point within those bounds.

George Kerscher: I’m envisioning reading order in the additional documents to have link relationships to other documents within the publication, whereas when you get outside the publication there is no relationship. I’m in agreement that the TOC defined as such helps solidify this.

Ric Wright: I totally agree with what Josh said - well put. The other comment, more of a question, more relevant here. Isn’t this related to the conversation in Chicago about would we allow resources to be external to the document? As a RS developer, that makes me nervous as it’s a security concern.

Garth Conboy: I agree, moving within the TOC you’re moving within the publication.

Evan Owens: within journal publishing, you have links to where the article was published, so you would be linking out to a different document/page

Garth Conboy: You’re envisioning a page that is a pointer to other publications, but it would be a TOC?

Joshua Pyle: I come from the journal publishing world, and yes, an article would be a web publication. The issue for that article would be a web publication of web publications. In which case, the Table of Contents is - you’re still within the bounds of the issue, we’re still binding that. It could be a compilation - all of that stuff… it doesn’t matter where it lives, as long as the manifest defines it as part of the publication.
… We have to drop the notion of domain - it’s still a single web publication and the TOC is within it.

Ivan Herman: Slightly different question from George if someone wants to respond.

Garth Conboy: I thought it was well summarized as nested web publications, so the parent can point to its children, which are part of the boundaries.

Joshua Pyle: An out-of-bounds error :-)

Ivan Herman: Let’s say we decided that a reference in a TOC should be within the bounds. The question is, what happens if someone makes a mistake. What is the reaction? It’s a bit like the requirements we already have for a person: if a structure doesn’t have a name, the structure is invalid and is taken out of the list.
… Let’s say we have a process that creates a machine readable TOC, and the algorithm realizes that one of the references is out of the doc, what should it do?

Garth Conboy: We clearly have to deal with that, and it comes down to whether it becomes a should or a must, and what happens if you violate it.

Matt Garrish: Probably a must-not is a bit strong, we won’t stop people. It should be a recommendation for the reasons already laid out, it will be confusing to the user and leave it at that. We’re not going to force people, and reading systems are going to have to handle this. Throwing it out is going to be odd, but when it’s extracted from the reading system, it’s not a place we should go, we should just recommend not doing it.

Joshua Pyle: +1 to SHOULD

Garth Conboy: Maybe this does inherently answer Ivan’s question, that we can resolve this issue, at least for the moment: the entries in the DOC-TOC table of contents should point into the bounds of the publication.

Proposed resolution: the entries in the DOC-TOC table of contents SHOULD point into the bounds of the publication (Ivan Herman)

Avneesh Singh: +1

Garth Conboy: +1

Derek Jackson: +1

Zheng Xu: +1

Charles LaPierre: +1

Juan Corona: +1

Joshua Pyle: +1

Jeff Buehler: +1 also should, 0 to must

Caitlin Gebhard: +1

Ivan Herman: +1

Franco Alvarado: +1

Garth Conboy: If that is consensus, that is probably suitable for us to get in there and resolve the issue.

Dave Cramer: +1 to should, -1 to must

Ric Wright: +1

Ben Schroeter: +1

Marisa DeMeglio: +1 should

George Kerscher: +1

Joshua Pyle: As we are specification writers, the best we can do is should, must, and must not, but it’s up to implementors to determine what should happen when they come across a should/must/must-not, etc.

Garth Conboy: It’s more incumbent on us to determine what should happen with must and must not….

Ivan Herman: At some point in time we will have to look at this for the document. If there is a should or should-not, then there is a checker that should raise a warning (the difference between an error or warning) so a future epub checker could raise a warning.

Dave Cramer: I think we need to make a distinction between HTML and the manifest. We control the manifest, but not HTML. We should not be making any restrictions on HTML.

Resolution #2: the entries in the DOC-TOC table of contents SHOULD point into the bounds of the publication
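
To make the resolution concrete, here is a minimal sketch of a checker that reports TOC links leaving the publication as warnings rather than errors, in line with the SHOULD wording and with Ivan's remark about a future checker. The function name, the message text, and the example URLs are hypothetical, and the publication bounds are assumed to be the default reading order plus additional resources, as stated on the call.

```python
from urllib.parse import urldefrag, urljoin


def check_toc_bounds(toc_hrefs, publication_urls, base_url):
    """Return one warning per TOC link that points outside the publication."""
    # Resolve every in-bounds URL against the base and drop fragments,
    # so "ch1.html#s1" is compared as "ch1.html".
    bounds = {urldefrag(urljoin(base_url, url)).url for url in publication_urls}
    warnings = []
    for href in toc_hrefs:
        target = urldefrag(urljoin(base_url, href)).url
        if target not in bounds:
            # SHOULD, not MUST: the entry is reported but not removed.
            warnings.append(
                f"warning: TOC entry {href!r} points outside the publication"
            )
    return warnings


# Hypothetical publication: reading order plus one additional resource.
BASE = "https://example.org/book/"
RESOURCES = ["ch1.html", "ch2.html", "notes.html"]
TOC_LINKS = ["ch1.html#s1", "ch2.html", "https://elsewhere.example/paper.html"]

messages = check_toc_bounds(TOC_LINKS, RESOURCES, BASE)
```

The sketch only reports; whether a user agent then keeps, marks, or drops such an entry is the open question raised in the discussion.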

3. naming things, issue 312

Garth Conboy: https://github.com/w3c/wpub/issues/312#issuecomment-415016624

Garth Conboy: The last issue, Rachel summarized in the comment provided, this was an issue Dave raised. Is anyone in a good position to summarize?

Dave Cramer: We’re inheriting tons of terminology from schema.org, and they have set names for items that don’t always match up with publishing: what we call a title, schema.org calls a name. So a book title would have to be a “name”, and there is a bit of resistance because the common terms differ from what we have to call things in the manifest. Not sure what we can do about it, but it makes our spec harder to understand.

Garth Conboy: Is this an area we need to be more explicit about?

Dave Cramer: Everything we can do in our spec prose to help our readers understand - to express language we use IN language… We should be really aware of as we write text here.

Ivan Herman: To be clear, technically speaking, if we use JSON-LD, it is possible to rename those terms to our own liking by extending the wpub context file. I must admit that if we are getting nervous about defining HTML, I get the same about redefining schema.org. Yes, we should give full power to Matt to make things more readable, but I would be very reluctant to change schema.org terms - with ONE exception…
… In JSON-LD we have @type, @id, @value and @language, and it’s a very common practice in JSON-LD that @type is redefined to use type, and @id is redefined as id. It’s so common that schema.org does it. This will make things easier to port into a JavaScript structure by simplifying the keywords.
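
For readers unfamiliar with the practice Ivan mentions, here is a minimal sketch of keyword aliasing in a JSON-LD context. The context fragment and the manifest are hypothetical and are not the actual wpub context file; `expand_aliases` is a toy helper for the example, not a JSON-LD processor.

```python
import json

# Hypothetical context fragment: aliases two JSON-LD keywords to plain names.
CONTEXT = {"type": "@type", "id": "@id"}

# Hypothetical manifest using the aliases; "name" is the schema.org term
# for what publishers would call the title.
manifest = {
    "@context": ["https://schema.org", CONTEXT],
    "type": "Book",
    "id": "https://example.org/book/",
    "name": "Moby Dick",
}


def expand_aliases(node, context):
    """Rewrite aliased top-level keys back to their JSON-LD keywords."""
    return {context.get(key, key): value for key, value in node.items()}


expanded = expand_aliases(manifest, CONTEXT)
serialized = json.dumps(manifest, indent=2)
```

With the aliases in place, a script can read `manifest["type"]` and `manifest["id"]` without special handling for the `@` prefix, while the schema.org term `name` stays untouched.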

Matt Garrish: We touched on this: it is something we can improve in our spec language. This is the push and pull of using property tech names and natural language. Our section right now is structured around concepts rather than specific naming. We can flip it around, put property names with the language, and there are ways of tweaking, but I’m comfortable migrating us away from schema.org. There are ‘read-by’ for narrator, and we should strike up a convo with[CUT]
… that’s probably the most appropriate way to go…

Dave Cramer: I wanted to agree with Ivan - yes, I think in the JSON LD context, the cure is worse than the disease.

Garth Conboy: Much like Ivan said, we need to use the schema.org terms, but we want to be careful in our drafting that we don’t lose the drafting, like the first entry, we live with what we have but we try to be as clear in our language as possible to avoid confusion.

Ivan Herman: We need a resolution on what to change, so I move that we agree on the removal of ‘@’.

Proposed resolution: remove the ‘@’ from keywords; leave other terms as they are (Ivan Herman)

Ivan Herman: +1

Joshua Pyle: +1

Nick Ruffilo: +1

Matt Garrish: +1

Jeff Buehler: +1

Garth Conboy: +1

Ben Walters: +1

Dave Cramer: +1

Resolution #3: remove the ‘@’ from keywords; leave other terms as they are

Ric Wright: +1

Marisa DeMeglio: +1

4. TPAC Agenda

Garth Conboy: https://docs.google.com/document/d/1Mt9PTcOdmrCwIsgfxbGMGjwHlUsySU01I0D4oBkSbcA/edit

Garth Conboy: Pasted above is the draft of the TPAC agenda. There are agenda item requests. A fair number are crossed off, which means they were moved into Monday/Tuesday…
… we will spend more time moving topics down, and some will move around as we co-ordinate with external groups and visitors. We are getting really close to TPAC now, so if there are particular issues we should address in public, please add a comment in here. Please comment and add.

5. incubation

Garth Conboy: https://discourse.wicg.io/search?q=dpub

Garth Conboy: Lastly there was an Agenda item that Tzviya wanted mentioned, we do have a process in the W3C for incubation of ideas. We have had things that are outside of the immediate scope of this group - one is the publishing object model - there is an opportunity to use the process for potential incubation. We certainly encourage people to participate in that process.

Garth Conboy: From Agenda: “We have people in the group who are working on things that would be potentially interesting for the future state of publishing, but the current W3C assumes a formal incubation period. We need to do that in our group as well. We already have a dedicated label in WICG, so we recommend incubation of ideas like the Publishing Object Model and transclusion via iframes there [9.6]. We hope to see projects and demos at TPAC.”

Ivan Herman: There has been lots of discussion in the group about how to discuss ideas that go beyond the group, such as new ways of handling CSS or HTML - possibly extending those with new features. Even if those require much longer work and co-operation with other groups, it would be a pity if those were lost. We were looking for some sort of way for these incubation ideas to be recorded and documented. That’s what we were wondering about.

Garth Conboy: We’ll discuss more next week. We are a tad over time, but we’ll have an agenda for next week, and please comment on TPAC agenda. Thank you and bye


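6. Resolutions

Resolution #1: Last meeting’s minutes approved
Resolution #2: the entries in the DOC-TOC table of contents SHOULD point into the bounds of the publication
Resolution #3: remove the ‘@’ from keywords; leave other terms as they are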