Publishing Working Group Telco — Minutes

Date: 2018-11-12

See also the Agenda and the IRC Log

Attendees

Present: Juan Corona, Wendy Reid, Tzviya Siegman, Gregorio Pellegrino, Wolfgang Schindler, Deborah Kaplan, Ivan Herman, Caroline Hayes, EvanOwens, Dave Cramer, Zheng Xu (Jeff), Jeff Buehler, Benjamin Young, Romain Deltour, Jun Gamou, Joshua Pyle, Avneesh Singh, Karen Myers, George Kerscher, Tim Cole, Bill Kasdorf, Garth Conboy, Mustapha Lazrek, Yu-Wei Chang (Yanni), Chris Maden, Charles LaPierre, Brady Duga, Daniel Weck, Leonard Rosenthol, Luc Audrain

Regrets: Murata Makoto, Franco Alvarado, Rachel Comerford, Ric Wright, Vladimir Levantovsky, Teenya Franklin, Nick Ruffilo, Marisa DeMeglio

Guests:

Chair: Wendy Reid

Scribe(s): Dave Cramer, Tzviya Siegman

Content:


Wendy Reid: welcome everyone

1. Admin

Wendy Reid: https://www.w3.org/publishing/groups/publ-wg/Meetings/Minutes/2018/2018-11-05-pwg.html

Wendy Reid: minutes are approved
… new members this week!

Resolution #1: last week’s minutes approved

Yu-Wei Chang (Yanni): I’m Yanni from Taiwan

2. origins and web publications

Tzviya Siegman: a lot of people in the group aren’t familiar with the issue
… an overview would be helpful

Ivan Herman: https://github.com/w3c/wpub/issues/321 is the issue leading to this

Tzviya Siegman: https://github.com/w3c/dpub-pwp-ucr/issues?page=1&q=is%3Aissue+is%3Aclosed

Daniel Weck: jiminy wrote eloquently described the history of the origin problem on the web
… and the legacy problems we live with
… and how new things on web are built with origin in mind
… I can summarize how it impacts us
… there are JS apis in web browsers that will rely on a set of security rules
… and there is functionality that will be disabled based on those rules, regardless of whether script is used
… in CSS fonts are affected by origins
… it’s about the https, address, port number
… why is that relevant in web publications?
… we have to obey the rules of the web
… and it goes back to EPUB
… where it was underspecified
… so EPUB reading systems used weird tricks to bring stuff into webviews
… but that’s underspecified
… so most reading systems implement a content server
… with an IP address and port number
… so that when we serve content we don’t enable one publication script to access a different publication
… think of a publication with a built-in user preference
… that preference would be saved in local storage or something
… that should be sandboxed
… and origin is a mechanism for content authors to control what is available to who
… on the web we have to be mindful that origin is important
… and we need to consider the restrictions we put on content
… there are issues around service workers, which depends on scope and is sensitive to origins

Leonard Rosenthol: let me add a bit
… I gave a presentation on this at the first F2F in NY
… when web content is instantiated in a browser or reading system or any UA
… that content is instantiated in a context
… the origin is what defines the security level and infrastructure for the context
… if you’re offline, for example, there’s a null origin
… then it is an insecure context
… when contexts are insecure many JS APIs can’t run. they are not allowed to.
… this includes some interactivity, like access to geolocation
… or microphones or cameras
… but if you have an origin, and the origin is secure (https) then you can access that stuff
… then you run into cross-site stuff. CORS can allow fonts from a different origin.
… many publishers are hosting books for various authors
… you don’t want the script from one pub to access another publication
… but if they’re both on kobo.com, then they are on the same domain and so there’s nothing to prevent from a publication talking to another
… there is talk in the web security task force about suborigins, to address this problem

Daniel Weck: Related issue (discussion about HTTP CORS headers): https://github.com/w3c/wpub/issues/352#issuecomment-435913228

Leonard Rosenthol: https://w3c.github.io/webappsec-suborigins - suborigins work

Dave Cramer: It feels like we haven’t made fundamental decisions. Does the manifest have to be same origin as entry page?
… I’ve had concerns that if WP is just a collection of stuff that are glued together with a manifest, then it leaves open the possibility of someone creating a publication without my consent.
… We should consider whether some pieces MUST be same origin.
… A lot of us have had an EPUB-ish model in mind. If I create a publication at Hachette, it will end up on a Kobo origin. The Web has looked at this a lot, and we need to consider what we’re doing.

Deborah Kaplan: add to what dave says
… not all of everyone’s requirements are compatible, they can’t co-exist
… early on, some requirements are about anthologies, which is a valid use case
… and security and rights means there are limits
… it’s not just a technological limitation; there are competing priorities, and one of them is going to win

Tzviya Siegman: TAG Findings on distributed and syndicated content https://www.w3.org/2001/tag/doc/distributed-content/

Ivan Herman: the question for me is
… what of this is something that should be part of the specification
… and what should be part of an accompanying document on limitations
… the Q that dave was asking, should manifest be same domain as entry page
… if they are on different domains than getting to manifest might be problematic
… but if I publish my manifest with CORS flags set it’s a deployment question

Brady Duga: +1 to Ivan’s comments

Ivan Herman: so this might be a deployment question
… but it’s not something that WPUB defines
… just as HTML does not define these things

Daniel Weck: +1 Ivan

Tzviya Siegman: one issue that we keep kicking down the road is offlining
… which is associated with packaging but is not the same
… and this is related to the boundaries
… which we’ve defined philosophically
… I’m not familiar enough with SWs to know how they operate,a nd that CORS is an issue
… we’re handwaving about offlining, but we’ll have to face origins in reality
… and cross-origin might be causing problems
… so we may have to prioritize our requirements

Deborah Kaplan: Basically, Tzviya is saying specifics of my general point: we have competing and incompatible requirements, and ultimately one of them will win.

Deborah Kaplan: (possibly incompatible.)

Benjamin Young: issue 104 around browsing contexts…
… beyond the origin idea, and we still haven’t answered the question about what environment this spec is operating in
… just like HTML is for browsers, but other user agents might use it for PDF generation or something
… in which case the UAs might not follow all the rules
… HTML defines origins and browsing contexts
… are web publication intended to be run primarily in browsers, and thus any user agent should expect to operate in those confines
… and we’d lean on web app security
… or are we going to redefine those things, and craft our own user agent definition
… that uses OWP content, but might not care about origins etc

Dave Cramer: let’s be careful about the effects of our technology

Luc Audrain: +1 to Dave

Tzviya Siegman: +1 dauwhe

Romain Deltour: +1

Hadrien Gardeur: I’m uncomfortable with the way we describe those issues
… a lot of it comes down to how you implement the affordances
… we say offlining or packaging
… very diff issues depending how you implement
… the audio book example… I used a previously hosted audiobook
… the UA is using an audio element in html, so I don’t have issues with origins
… so it’s hard to talk about vague things, as it depends on how we implement

Brady Duga: responding to some things that Tzviya and Dave
… I”m worrying about limiting what we can do based on how things work today
… like if we restrict what we can do based on the security model of service workers, and then that changes later
… I might want to have stuff in another doc on the realities of deployment today
… to what dave said about white supremacists
… that’s an abuse of origins, and is more about rights management
… and it seems dangerous to use a security model to enforce rights

Deborah Kaplan: there are tech limits and functional requirements
… they’re different and they get conflated
… we need to keep them separate
… the set of requirements: what must we allow, what must we forbid
… I want to move away from Dave’s specific example
… and say that there might be a functional requirement that you can’t take your stuff without permission
… and there are technical limitations
… so we must make sure our tech enables our functional requirements
… my Inner Ivan says that the technology today is all the technology we have. We have to write around today’s technology
… my interpretations of Dave’s statement, wasn’t about the copyright, but we can’t just be building tech and pretending what anyone does with it is not our fault
… we don’t want to be Oppenheimer
… let’s be responsible about our specs
… and let’s be clear about functional requirements, and distinguish them from the technology that enables them

Ivan Herman: what is, if any, the next step re: origins?

Garth Conboy: is the inner or outer Ivan proposing that we don’t put origin stuff in our spec?

Ivan Herman: fundamentally, yes
… there are specific questions, because origins is part of our text on retrieving the manifest
… the other question which we must have, are there security issue that we must mention and describe
… I know this is a horizontal review requirement
… and neither the inner nor outer ivan has any idea

Tzviya Siegman: I want to ask the people in the group who understand this to contribute. I like Deborah’s idea of functional requirements
… we don’t need a task force, but there is some work to be done
… just a list of stuff

Daniel Weck: Sure :)

Dave Cramer: sure

Daniel Weck: I’m sure Jiminy will be delighted :)

Zheng Xu (Jeff): I understand origin is a big issue
… I wonder if we can get some requirements we can put this origin stuff into a spec
… will it be helpful to create task force?
… if we put each book ID into subdomain we remove the issue

Tim Cole: related to this
… part of the WG’s job is to talk to other WGs
… Leonard mentioned suborigins. We should probably provide them with use cases
… we should probably talk about how a more granular CORS would help
… I want cross-origin with these domains, but not those domains

Leonard Rosenthol: +1 on de-origining

Tim Cole: a task force to look at this, but to also inform other w3c work. I would be happy to participate.

3. Tables of Contents

Wendy Reid: OK
… let’s move on to TOCs. Some people should get together to talk about this

Tzviya Siegman: let’s let JuanCorona show his stuff

Juan Corona: https://github.com/jccr/toc-outliner

Juan Corona: I do live demos too much :)
… (shows his screen)
… I have an algorithm
… I got inspired by the h5zero project which implements outline
… before I had a treewalker
… walking the DOM is the first step
… there are some basic similarity with the h5z algo
… you go through children, sibling, then callbacks for entering and exiting nodes
… the first thing it does is keep track of the nesting level
… it finds out if we are entering a list element
… I read the github issue about this, and we agreed that TOC out there in the web are typically using ul/ol and li
… there was an idea to use headings
… this is purely about list items, but it’s flexible, you can have lots of divs and spans
… identify list tags and save nesting level
… it identifies implicit paragraphs and nesting levels and links
… I have some test markup
… there are lots of edge cases
… and I don’t know what data structure people expect
… but first I’ll show the cool stuff
… I collected some samples from the web
… here’s the wikipedia ebook article
… here’s an epub3 nav
… here’s one of our meeting minutes
… it’s not that hard to get a structure out of this
… we talked about SpaceJam
… this is where the algo starts falling short
… I can print out the nodes in raw form
… with a list of anchors you can extract alt text, but it doesn’t do that now
… let’s look at list item descendants, and a span with anchor inline with the span
… (more details of algo)
… I only spent a few hours on this, it could be made more robust, and try it with more examples
… I’ll gather my thoughts on where I found issues

Deborah Kaplan: I have a Q
… this is cool
… what is the takeaway for the WG writ large?

Juan Corona: the takeaway is that we got some results. This is all on my github. some of the samples…. we have a lot of flexibility, more than in EPUB and we can still get stuff that’s workable and is still machine-readable
… that’s the takeaway. You can do this.
… it can be refined.
… and we can loosen the rules in the EPUB3 spec, and still get a machine-readable TOC

Wendy Reid: we are one minute from the end

Luc Audrain: q about the algo
… in epub we use h1-h6 for titles of sections

Ivan Herman: https://github.com/w3c/wpub/issues/291#issuecomment-437571943

Ivan Herman: we need more on the structure of a toc, and the extraction algo must be specified. this is great because we need this.
… but the issue refers to other things. We should look there. I tried to find consensus.
… this is a very important issue.

4. next F2F meeting

Garth Conboy: May 6-7, 2019, we have space at Google in Cambridge, MA in Kendall Square for a F2F. Hotels are expensive, even though it is after the marathon.

Garth Conboy: 6-7 Monday, Tuesday.


5. Resolutions