Publishing Working Group Telco — Minutes

Date: 2017-09-11

Attendees

Present: Dave Cramer, Ivan Herman, Wolfgang Schindler, Tzviya Siegman, Nick Ruffilo, Avneesh Singh, Chris Maden, Jun Gamou, Lillian Sullam, Ben Schroeter, Matt Garrish, Romain Deltour, Deborah Kaplan, Tim Cole, Baldur Bjarnason, Wendy Reid, Vladimir Levantovsky, Rachel Comerford, Benjamin Young, Charles LaPierre, Laurent Le Meur, Bill Kasdorf, George Kerscher, Marisa DeMeglio, Jeff Printy, Garth Conboy, Brady Duga, Hadrien Gardeur, Harriett Green, Reinaldo Ferraz, David Stroup

Regrets: Ric Wright, Peter Krautzberger, Cristina Mussinelli, Murata Makoto

Guests: Ralph Swick, Wendy Seltzer, Jeffrey Yasskin

Chair: Garth Conboy

Scribe(s): Nick Ruffilo

Ivan Herman: We have at least one new member on the call - I think - Wendy.

Tzviya Siegman: Welcome wendy, please take a minute to introduce yourself.

Wendy Reid: My name is Wendy Reid, I’m a QA analyst and Kobo. I also have a side-interest in content rendering, so I have an interest in that as well.

Ivan Herman: “We have Ralph as a guest.

1. minutes approval

Tzviya Siegman: https://www.w3.org/publishing/groups/publ-wg/Meetings/Minutes/2017/2017-08-28-minutes

Tzviya Siegman: Lets move on to approving minutes. Any comments on the minutes from last week?

Tzivya: Minutes approved.

Resolution #1: minutes approved

Garth Conboy: Hopefully you can all hear me…

Garth Conboy: We have a great… I see people on the IRC. I take it Jeffrey more timely - did he arrive on time?

Jeffrey Yasskin: I am here…

2. Web Packaging introduction

Dave Cramer: Web packaging explainer https://github.com/WICG/webpackage/blob/master/explainer.md

Garth Conboy: Tzviya is opt to point out, we’ve spent much time on the first working draft on Web Publications, if you looked at the schedule, we should have the first working draft of the PORTABLE web publication in TPAC time. I’ve asked a colleague of mine from Google who has been doing work on the web packaging arena - which is a candidate technology - as we look at what packages look like. I’m introducing Jeffrey Yasskin - who’s been working on search front end, …, he’s also chaired web-front end group, which lead to working on the chrome platform team… He’s just done alot of stuff, so with that, I’d like to have Jeff do an introduction and talk about history and where web packaging is now.
… where the constituent pieces are being developed, open up to Q&A then move to next topics.

Dave Cramer: https://en.wikipedia.org/wiki/MHTML

Jeffrey Yasskin: I’ve been working on packaging since the beginning of the year. The project came out of our emerging markets - a system might have an expensive or limited data plan - so there is peer-to-peer data sharing. Our team wanted to share web pages in the same way. We’re using MHTML as a stop-gap. MHTML is for a single page, not a web-app. Then I came along and took over the standards effort for that.

Jeffrey Yasskin: https://github.com/WICG/webpackage

Jeffrey Yasskin: I’ll be talking over the github repo that I just posted above. This is where incubations for new web APIs happen. A couple people from this group have already made comments. Every change to all of the proposals is happening here. We put together an initial format for the packages and have started trying to shop it around to other implementors to see what people thing. Because it’s a format, and it sits tightly on top of HTTP, i figured i’d involved the IETF as well. I brought it to their meeting in prague and we fleshed out the use cases, and tried to split into two layers…

Jeffrey Yasskin: I’ll be bringing it back in singapore soon. A way to have a single request/response and how to bundle that. We already have that. For web publications, you will end up with a website that is a publication, so if you stick it in a packaging format, it should just work. We need to figure out the details - such as a set of top-level metadata around the app manifest. Will the metadata fit into that - if not, we can talk about how to change things. I think thats the overview. Do you want the format itself descripted? it is not final….

Garth Conboy: I would suggest that we go the next level down - handy waving explainer - on the format.

Tzviya Siegman: https://github.com/WICG/webpackage/blob/master/explainer.md

Dave Cramer: http://cbor.io

Jeffrey Yasskin: If you’re in the github, there is an explainer -and you can look at the proposal or the overall format. The current sketch is that the whole thing will be a CBOR (binary version of JSON with a few extra features) - it has a sequence of features - content and manifest - where the manifest is actually - well we need a different name for that - there’s an index of sections pointing to the offset of the file. There’s an index that’s pointing at the items. It holds the request headers, and offset, and the length where the response exsists. This all might change based on the IETF discussion.

Ralph Swick: https://github.com/WICG/webpackage/blob/master/explainer.md

Jeffrey Yasskin: The request is where the interesting stuff happens. It has a set of signatures and a certificate on how to trust those signatures. Then there is the manifest - which is the app manifest - and a set of hashes of the sub-resources. There can be a set of hashes for each resource. The thing that is hashed is the concatenation of the request headers, the request, and the body. Two of the three only find the body, the 3rd allows the signer to sign the headers and possibly the URL. I’m expecting the proposal to sign everything. That would allow you to prevent something to be placed at a different URL and you to expect users to trust it.

Jeffrey Yasskin: We also want to consider it similar to a server certificate type, if we cannot do this, we’ll try to do a different signing or certificate type. We allow other signers to vouch for the package, so for instance - if your package wants extra permissions beyond general web, you might want someone who has done a static analysis of the contents, then that would let the user trust the contents.
… An author might be able to represent themselves, instead of a domain. In addition to whatever domain it’s being served from.

Jeffrey Yasskin: For the security of the signatures, i’m expecting to derive the signature algorithm from the public key - instead of letting the attacker to put in their own key. For people who are pushing systems like Jose where you push the algorithm next to the signautre. I think that’s the overview. We do some technical things in order to make it safe to sign with TLS certificates. We use the same encoding of the bytes that TLS does.

Jeffrey Yasskin: We’re going to have some mechanisms for sub-packages. It might also require you load an iFrame from a different origin, so you’d need to embed their certificate. that part of the format is the least nailed down. I expect you’ll just stick the resource next to your own resource. There would be a tree of sub-packages living somewhere in the main package. We have to defend against downgrade attempts on sub-packages. You might need to get the latest sub-packages. You might request that sub-packages have a specific hash or “not before” which might allow others to combine packages with yours. You would then just include a hash.

Jeffrey Yasskin: We’ve talked about external sub-packages, where you include a link to the sub-packages. Electron developers are excited as you can include a link to electron itself, so you would not have the user re-download electron if they already have it.

Garth Conboy: There are a few on the queue, but here are a few questions that I think epople would like more details on. In the classic digital pub, we have epub - which is a static set of content zipped up. What drove to the overall approach of a request/response instead of a zip. What drove to this architecture?

Jeffrey Yasskin: The two big qs? “Why not zip” -> security vulnerabilities, and each file is identified two different ways. You can have two copies, and the index at the end tells you what to use - one is security, and one is performance. There was an android attach called the master key invulnerability. Once piece was verifying based off different sets of keys. One could require implementations to get it right, but better not to have it. Having the index at the end of the file means you cannot stream it - so you have to load the entire file first. Some of the usecases would much rather have the browser load resources as they come - or maybe truncate and see what the browser asks for.

Jeffrey Yasskin: We’re also indexing by URL and request headers, rather than filename - zip is filenames to file contents - in order to handle the web, you would need to handle all the request headers somewhere in the filename. Or pair the request headers with the body - all of this is doable, but made zip a non-natural fit.

Garth Conboy: Maybe 2-3 sentences on where you are now with the IETF, what’s likely at the W3c…

Jeffrey Yasskin: The IETF - following their suggestions - there maybe 2-3 pieces of the web packaging proposal. The first is signed request/response pairs. The second is bundles of request/response pairs. That may happen at the IETF. The 3rd piece is a description of how web browsers load the package - which will happen at the W3C. I have the beginning of a sketch on how that would look but it’s probably not worth looking at yet. We will need a specification on how to load the pages. In Singapore in november, we will present again. There has not been too much response to the latest email post.

Hadrien Gardeur: Two questions - as you said, the design is very different from zip, as it has to package resources on the web. We have two use-cases, one for content on the web, and the other is to distribute a package that is not on the web. Are there any thoughts on what can work for the non-web use-case? The package is supposed to work like the web - but i’m wondering on the client side, is this supposed to work without explicit support from the browsers? Can this work with existing technologies - behaviour should be like a proxy, no?

Tzviya Siegman: DPUB IG Use Cases and Reqs https://www.w3.org/TR/pwp-ucr/ (input doc for this group)

Jeffrey Yasskin: Taking your questions in order, I have not thought about how to use this for publications that do not exist on the web. If you have examples of a publication that you’d want to do package that doesn’t exist as a website, I’d be interested in seeing it. It isn’t my top priority, but it would be good to have. The second question [repeated question - on client side how do you implement] - the obvious way is to build a service worker, and then inject it into the service worker cache. That does 95% of the package. What can never be poly-filled is installing the service worker to begin with. If you are offline, there is no way to get a trusted service worker - so that is one way where a signed signle response could replace the standard web packaged format…

Jeffrey Yasskin: You could then also load the package that way and initialize the service worker cache. Some people in the chrome working team want us to represent all packages this way. When you open a package is it part of the network layer - where the browser makes request and so the package layer makes the requests, or do we represent it as a way to bootstrap a service worker - that installs - then it gets an event saying ‘here is your package’. We could polyfill the package loading in a service worker - saying ‘do this service worker, then load packages’ we’ll do that to test the format, but it’s not done yet.

Jeffrey Yasskin: As for clients - non-browser clients are not my top priority. It will certainly be possible to unpackage a package. In the github, there is a go program that can package and unpackage things - so it’s straightforward to package/unpackage. That way we can fuzz it and everyone who is loading packages has a trustworthy way to do it. That’s what non-browsers will have to do.

Hadrien Gardeur: In that case you’d need to rewrite the request or how things are loaded - the URIs…

Jeffrey Yasskin: It would need to respond to the request using stuff from the package. One thing I need to standardize is that - given a browser request and a package request, how can they be the same. That should be part of the library we build. When you build a package, you pull a set of requests out. When you need to provide a resource, you look at the cache.

Hadrien Gardeur: In a way, it would work like the cache layer of the browser, but if you’re using a webview, it would be very difficult to go between a web-view and the web. Most dedicated reading systems use a web-view.

Jeffrey Yasskin: I’m not intimately familiar with the web-view APIs. In android you might be able to use a service worker inside the web-view to be the proxy. In iOS i’m not sure what resources you have. You migth have to re-write everything to use these…

Garth Conboy: Jeffrey, an example of a publication that doesn’t live on the web - i’m happy to send you an epub - as any of them will fit.

Nick Ruffilo: signing with URL might cause peer-to-peer issues?

Jeffrey Yasskin: Saving a file for offline use - could create a non-signed package - with some notes saying: ‘this is not trusted’. The signed one would require that the original site to sign the package, and then allow the user to download and share that newly signed package. In that case, you’re getting content that is a message through the original source -> one way hops.

Jeffrey Yasskin: Each resource within a package has a physical and a logical URL. The physical is where you get it, the logical is where it is signed as.

Ivan Herman: My worries is that the way these things are deployed as that people who are not closed to the system, there is a close reliance on certificates. It is … the whole user interface of the managment of certificates is poor. We have self-publishers who create a book and put it up on some providers website - I don’t see how they would be able to do all the signature stuff because they have no access to these things.

Jeffrey Yasskin: There are two answers - one is that the author could get their own domain, get a cert for that, and hand the signed book to the publisher as a unit. Because of how the package is designed, the publisher can send subsets of a package to a reader. The second option is the author could send the content to the publisher and the publisher signs the content…

Jeffrey Yasskin: We will need to help teach people how to prepare files - but there will be straightforward ways to get things done.

Wolfgang Schindler: +1 to Ivan

Ivan Herman: There is need for a more straightforward way for generating certificates. That may become a major hurdle in deploying these things.

Jeffrey Yasskin: I expect that some web servers will build these things in

George Kerscher: I would envision normal people to author documents, e.g. a teacher in a school prepares a document for the students. The complexity would be a big issue.

Ralph Swick: you answered my first question: How “friendly” are this specification and Service Workers?

Ralph Swick: can a single package be served to different URIs?

Wolfgang Schindler: An official signature typically has to be paid for which would be a restriction for potential authors and small publishers

Ralph Swick: My second question is a variant of the question Nick asked - if i want to serve a file at vastly different URIs, how difficult is that. Or would the logical/physical matter here.

Jeffrey Yasskin: For service workers - were considering adding a service worker event about getting the package and adding it to the cache. The package can certainly be served from different URIs, the interesting question is - can a package say that a resource could be service from any of these 5 logical URIs. You could index the same URI different times with different request headers. You would then need to have different certificates… We haven’t thought through the reasons or how to do it for having multiple locations.

Garth Conboy: I want the last 5-10 minutes to dish out guilt, so lets take hadrien’s question and that be the end of this topic todya.

Hadrien Gardeur: You mentioned that the chrome team might use service workers. Will chrome or other browsers ship with a default service worker or default network policy? That would make it possible for packages to work….

Jeffrey Yasskin: That’s one of the proposals that our loading team has suggested, that we’ll provide a default service worker for packages. Then none of the rest of the browser has to change. I don’t like that - i prefer service workers be under the control of the origin. We could write something that people could use, but it makes me ‘icky’ to say that we’re going to give a service worker to an origin that hasn’t asked for one.

Hadrien Gardeur: so to get support for packages, we would need to write a service worker.

Wolfgang Schindler: having to write a service worker is a hard job for a team used to publishing in epub format!

Jeffrey Yasskin: Assuming we build service workers in a browser, you won’t need one. But, the loading team is that we might require it. If we do packages in the browser, we should just load the package. If the user has visited the packages, then they have the resources within it.

Hadrien Gardeur: You want native support then in the browsers?

Jeffrey Yasskin: Yes

Ivan Herman: Note https://wicg.github.io/webpackage/draft-yasskin-webpackage-use-cases.html#rfc.section.2.2.1 that includes WP specific use cases

3. Guilts in the group…

Garth Conboy: Thank you Jeffrey for spending time with us, we will thing more seriously about our web package related efforts. We will invite you back. I’m going to turn it over to Tzviya. There is an agenda item with Luke and Baldur - do we want to discuss?

Tzviya Siegman: https://github.com/w3c/wpub/pull/64

Jeffrey Yasskin: Thanks for having me, and please file issues in https://github.com/WICG/webpackage!

Tzviya Siegman: Baldur put in a pull request over the weekend. We can get to work on that over at Github. Lets review that and we can review next week.

Jeff Printy: If anyone is interested in the idea of offline packaging of the web for developing countries, this is an interesting article about the Cuban sneakernet: https://www.theguardian.com/world/2014/dec/23/cuba-offline-internet-weekly-packet-external-hard-drives

Wolfgang Schindler: many thanks to jyasskin!

Tzviya Siegman: https://w3c.github.io/wpub/

Tzviya Siegman: The last item mentioned is to review people who offered to write. Lets talk about re-assignment as there are sections that are completely blank. I don’t want to point fingers, but lets go to section 5 - we just have headers. Even the name of the section is up for grabs. We need someone to write that section. We have also yet to see anything about security.

Dave Cramer: I wanted to mention that the lifecycle of a manifest, and a manifest supply - it is quite complicated. If we piggyback on the web manifest, that would save lots of copy-paste.

Ivan Herman: +1 to Dave

Tzviya Siegman: I agree dave.

Nick Ruffilo: What due dates?

Tzviya Siegman: Need to have the first draft done in the next month

Ivan Herman: Nick was not here the last few weeks - the goal isn’t to have a final perfect text - it’s more to give a kind of evenly spread document that samples all the directions we have to go. We ahve 2 years to work out the details, but we have to know where we want to go.

Tzviya Siegman: I will work with ivan and garth to put together some assignments. In the past this has worked.

Tzviya Siegman: Next week we will hear more on metadata and a mini SMIL note.

Tim Cole: There’s an issue #44 - about how to address locators. I appreciate what people have said there - can we schedule some time to discuss this issue?

Tzviya Siegman: We can add it to the agenda, or we can discuss it between.

Tim Cole: I think it will be interesting - can we add it to the agenda.

Garth Conboy: Thank you all - great attendance, we’ll chat over github and we’ll talk next weekend

4. Resolutions

Resolution #1: minutes approved