Publishing Working Group Telco — Minutes

Date: 2018-01-29

See also the Agenda and the IRC Log

Attendees

Present: Teenya Franklin, Tzviya Siegman, Ric Wright, Baldur Bjarnason, Evan Yamanishi, Toshiaki Koike, Avneesh Singh, Laura Brady, Deborah Kaplan, Rachel Comerford, Jun Gamou, Ivan Herman, Dave Cramer, Nick Ruffilo, Wolfgang Schindler, Lillian Sullam, Luc Audrain, Romain Deltour, Evan Owens, Benjamin Young, Joshua Pyle, Matt Garrish, Marisa DeMeglio, Mateus Teixeira, Peter Krautzberger, Hadrien Gardeur, Ben Schroeter, Zheng Xu, Garth Conboy, Bill Kasdorf, Brady Duga, Tim Cole

Regrets: Vladimir Levantovsky, George Kerscher

Guests: Jeffrey Yasskin

Chair: Tzviya Siegman

Scribe(s): Nick Ruffilo

Content:


Tzviya Siegman: https://www.w3.org/publishing/groups/publ-wg/Meetings/Minutes/2018/2018-01-22-minutes

Tzviya Siegman: Review the minutes from last week - any comments?
… Minutes approved - we have Jeffrey Askin here to talk to us about web packaging

1. Web Packaging

Jeffrey Yasskin: I’m here to answer questions - but I can tell you what has changed. Since the last IETF, their advice was to split the spec into two pieces - which is mostly done. Half is about signing an HTTP request/response pair from a given origin (which is what chrome is doing with amp). The other piece is about bundling a set of request/response pairs. That is probably more interesting to your group but less far along. We have a draft of the bundling proposal, but it’s a flat list of exchanges (Req/Res pair) and I would like feedback from you to know what I would need to include to make it useful for you. What is the minimum number of things that you would need.

Jeffrey Yasskin: Bundling PR: https://github.com/WICG/webpackage/pull/98

Tzviya Siegman: Last week there was extensive discussion over IRC - I followed it - I’m not sure we need to reiterate, but Ben/Hadrien, do you have opinions about this?

Benjamin Young: I don’t know if I technically have any problems with it, but there are broad concerns over developer tooling because it is new. There is the process of decoding and encoding. We should think through what the process of putting these together is and what scenarios we need to pack/unpack. With google working on it, it should be primed for text search, but I don’t have concrete asks of the spec yet.

Dave Cramer: I wanted to ask the basic question about the web-application manifest step.

Jeffrey Yasskin: The web app manifest is the top of the bundle - which resources mean what. It has a start URL - to know which page to display. I imagine that a publication could extend the manifest to include additional metadata.

Dave Cramer: There is not going to be - I get confused because web application don’t serve as a manifest in the way of listing all contents within…

Jeffrey Yasskin: Yes - I believe it’s misnamed, but, that is what it is named…

Dave Cramer: You don’t envision this package requiring that sort of thing?

Tzviya Siegman: for reference https://github.com/w3c/wpub/issues/118

Jeffrey Yasskin: The package has a list at the top which lists all the exchanges. It’s a list of the requests of the exchanges and a list of responses. There has been talk from web performance people for a manifest to be available. I haven’t designed that - I’m probably not the best person to do that, but the webapp manifest could point to the resource manifest.

Benjamin Young: The performance thing is interested - a core thing we’re working on in another JSON manifest … would be items in the packages and what is referenced in the manfiest. A key constraint is that it needs to be ordered - so it would be useful for items to be able to get just what they want…

Jeffrey Yasskin: What order?

Benjamin Young: Reading order. but ultimately an intentional progression from front to back. Right now that doesn’t exist. RSS and ATOM have them in a date-based progression, but no other manifest that has linearity of retrieval. It’s not that it’s necessarily a requirement of packaging, but it’s something we’re encoding.

Jeffrey Yasskin: The package does not have any order for the index or the resources. If a web publication would like to say that things appear in a reading order, that would be fine.

Hadrien Gardeur: I don’t mind that the package would be a list of reqs/resp - but how would this work in the case of the WAM - where this could show as the first resource in the package - how can you be the first if there is no order?

Jeffrey Yasskin: There is a top-level index of sections - there is the manifest request/URL - and the index location. You would generally put the manifest URL in the beginning, then you’d look that up in the index. You would want to put it early in the package. Does that make sense?

Hadrien Gardeur: In your list, which is a manifest - you’d have a member of this list, that would be what is the WAM. Once I know that, I can look at the list of resp/req - Once I’ve identified which URI is required - I can do a range request to only get the content I want?

Jeffrey Yasskin: yes

Hadrien Gardeur: The full list of req/resp is a full list?

Jeffrey Yasskin: It’s a mapping of HTTP requests to the range within the package to the response and payload. Right now it has no structure, but if there is a need for structure - such as a btree, we could add that, but no one said that is essential - so the whole index is loaded.

Hadrien Gardeur: A flat list is fine, as it has a very different role - and we shouldn’t try to do too many things at once. it’s a flat list which gives specific information about how to get specific data. The one thing I want to make sure is possible is that - how you identify the manifest shouldn’t be something that only works with the WAM - not just for our own group, but for any group that works with WAM. There should be a generic way to identify the manifest

Jeffrey Yasskin: The top level is named indexes. The manifest key means WAM - It could be named and we could add other top-level names.

Hadrien Gardeur: Have you thought about generic link elements? With ATOM there are generic links that can point to any source, then you have a rel tag. That way resources are actually embedded and you don’t have to create a name element. You can use whatever rel and media type you want.

Jeffrey Yasskin: that makes sense to think of it that way

Benjamin Young: I agree with Hadrien’s concerns - I feel like extensibility is a bit tangled up in that document and it’s use here, so I’d love to see clarification. The flexibility to use a different manifest would be useful. We need to look through how this uses full reqs/resp. I think it’s great that it’s in there, but what are the ramifications and what things from what places can be included. There is an intent to do origin-based signing, but what are the ramifications of those design choices. We’ve talked about being able to combine multiple resources into a package - if that is a thing, that this is that. Or that a lack of that is OK as well - so what does it mean to package things up in this way. So - are we moving the filesystem to the web, or the web to the filesystem. This is coming from the web to a filesystem - so we need to think through that as well.

Jeffrey Yasskin: Are you going to want external dependencies?

Benjamin Young: External in the package - there are scenarios where that is desired - where we get a publication that is a collection of published resources - that would want to utilize resources that are already on the web.

Mateus Teixeira: +1 (external resources)

Tzviya Siegman: I’m sure there are many people who are confused - Ben - can you do a 30 second summary on how this discussion relates to web package and such.

Benjamin Young: jyasskin: use case documents https://w3c.github.io/dpub-pwp-ucr/

Benjamin Young: The epub zip format is the one best known to the group. In it, it has an initial member of a media type, and an XML manifest. The zip package is made off of a hard drive - and items are in order of whatever the zip tool wants to put them. Another thing points to reading order, then one the TOC. None of these have web knowledge or network knowledge. The single file then gets moved around. In this package format - it encodes more than just a file - it includes the initial request of the server to give you - usually a GET request (i want a PDF version, etc) and then it includes the response “i got it this time, etc” and then the body of the response - which would be analogous to the file stored within the EPUB. It would be the URL and the file. It is also - because it’s based on web packaging - has MIME-like tendencies. There is more information about the existence of this thing than we’ve had in the past. What we cannot do with this format is to go from a non-web-connected hard drive and package a thing and move the thing around. All these items need to be on the web and requested and responded to.

Jeffrey Yasskin: You could fake it - but you’d need to do it from a disk (you’d need a signing key) so you aren’t going to build a package simply from the req/resp, but they are there on behalf of the client. Most of the time they will be generated from a disk with extra metadata.

Benjamin Young: so it’s a little more like service workers where it makes a fake HTTP world - so you make fake requests and responses come back. That’s where signing comes in.

Jeffrey Yasskin: We do expect that if you make a web request that they will have something to do with each other, but they do not have to.

Jeffrey Yasskin: I’ve been using the assumption that if you see a request in a package, if you make the same request, you’ll get the same response - but for instance some of the signature verifications, you can update a signature by re-requesting from the actual web. If you are completely from the back-end you wouldn’t need to fake it. You would probably build things from the server, not from making requests.

Garth Conboy: It sounds like where Benjamin was going - that this format the request URL could be a file URL - it’s not really that relevant. it’s not invalid to say: “there are a bunch of resources that have never been on the web”

Jeffrey Yasskin: You could have a bunch of file URLs but i’m concerned about how that would load - but there are ways of generating URIs that are not accessible via the web.

Jeffrey Yasskin: It should work - signing would be weird, but yeah, it should work

Jeffrey Yasskin: I’m also happy to get more comments as bugs on https://github.com/WICG/webpackage/issues

Benjamin Young: It would be great to get exact questions that need answering. What I was going to mention - related to origin signing - that’s one of the most valuable part of this. If we didn’t go with that, we’d miss out about origin-based. In the history of publishing - what a publisher provides is the authority that they went through the trouble of vetting authority - so they put their name and reputation on it. And it’s something the web terribly lacks and this tool provides.

Jeffrey Yasskin: The current design is missing that - the layering that was asked for is sign exchanges - so you could sign a single or bundle of exchanges. A publisher probably wants to sign an entire book - not the individual pages - so it’s about bundling. That publisher signature can be counter-signature - so individual pages are by the author, and the publisher signs the book - which should work with file-URLS, they aren’t signing as the file-origin…

Nick Ruffilo: what if there was no file and the reading system acted as a mini web server as a local host? Would that open up an opportunity for offline reading?

Jeffrey Yasskin: You cannot get a signing for localhost - so you would leave that resource unsigned. You then would have a publisher sign it as a non-origin signature.

Hadrien Gardeur: I wanted to point out that this is what readium and readium 2 are doing. They are using localhost to serve files in a package. It should work perfectly fine with other packages. This works OK when you have relative URIs. If you have absolute URIs - such as pointing on a web - this doesn’t work very well. We have two situations - where native browser support works well, and one that only works well in a native environment. Having something that works well with pubs that are on the web AND for things that never existed on the web - they are quite different.

Garth Conboy: Jeffrey - you talked about using non-signed resources locally, and the pub signing later - what is required, if anything to be signed?

Jeffrey Yasskin: To sign something, you get a certificate saying who you are. Private/Public key. You sign something and it attaches it to your identity. There are many different type of security - such as TLS, but there are others. There are also certificates that say a signer is a legal entity. If you sign with something like that, a web browser cannot really use that. Although it could be displayed above content to say this entity vouches for this.

Garth Conboy: What is the requirement for signing resources?

Jeffrey Yasskin: There is no requirement that you have to sign a resource in a bundle. There is a requirement from a web browser that - if a resource in a package claims a request URL - to act as a URL, it needs to be signed as it’s origin. That’s where origin signing requirement comes from. If you go to a website and you want to save it - you cannot generate signatures for it - the URL that shows when loading that will be different. In a reading system, that may not matter - you may not have it definitely from that origin. You may still want the publisher to counter-sign it.

Garth Conboy: In the past, Ivan, you were viewing the complexity of origins as a feature and a bug. Does this recent discussion make you happy or not?

Ivan Herman: My problem was different - even if we wanted to go from just the web, because there is a strong wish to have signature, the problem comes that for everyday person’s to have all the signature provided, becomes a headache because of the complexity of the flow of getting a signature - and the tools. it’s not a fault of those around the table… If I want to create a packaged publication, and i want to sign it - i don’t have the tools yet.

Luc Audrain: +1

Wolfgang Schindler: +1 to Ivan’s worries

Jeffrey Yasskin: An individual author may not create a signed version.

Ivan Herman: Because signature is so important, we need to work on making the process better (by asking our friends to make it better/easier). There is a community out there that we should put pressure on.

Jeffrey Yasskin: If you’re running a web server we’d expect you to have a module to be able to create these. it’s straightforward to write software to generate packages as a server - and I’m sure it will get integrated with a wordpress, there will be tools.

Ivan Herman: Let’s be optimistic… I tried to update my personal domain certificate - but the process left me very uneasy, even though it was spelled out for me.

Nick Ruffilo: why is signing important?

Jeffrey Yasskin: If you want to publish something and have it be interactive, and save data, you have to sign - so that it isn’t exposing it to everyone. or you have to be CORS. If you are just in read mode - it will show up in a URL bar, but otherwise you don’t actually need to sign.

Ivan Herman: Stepping away from the tech details - we wanted to see how we could move forward. What I was wondering - the first step - is it possible to use Moby Dick - and try to go through the process of creating a packaged version of that using this concept as a spec. This makes it clearer what is going on - what are the difficulties and easy parts, then work together on that, which might raise issues that are interesting/new. Then see how we can help you with some of the work, Jeffrey. We can always raise issues, but issues are on very specific things. I’m not sure - besides maybe Benjamin and Hadrien, we don’t have a good view as how the whole thing will resolve. I see Benjamin has volunteered for a flowchart?

Tzviya Siegman: Most of us need this… Dave? Have you tried this yet?

Dave Cramer: I’ve done it for different formats, and I’ve played around, but I need to look.

Ivan Herman: We should cross-check with Jeffrey to make sure we have the same thing.

Tzviya Siegman: I was going to comment about signatures, but lets’ make sure dave’s example is current then pass it to Jeffrey.

Jeffrey Yasskin: Note that “webpack” is a different thing from “web packages”, and we’re trying to find a new name for “web package” to avoid that confusion.

Dave Cramer: https everywhere!

Nick Ruffilo: So not everyone would need signatures - just certain types of content (especially ones that are interactive)

Jeffrey Yasskin: Yes…

Benjamin Young: I suggest we write down some apocalyptic scenarios. We talk about idealistic stuff - but we rarely talk about publisher A buying publisher B - where publisher B no longer exists as a domain… That way we can think about how that affects the content. Then how this would map to the Web Package Format - but generally, what are the actions that a publisher would need to get into all the formats.

Jeffrey Yasskin: There’s an issue - all of the signatures in a package, those signatures expire, and fairly quickly. A publication you’d want to trust longer, but I’m not sure how we would be able to,… Publications should not need revalidation all the time - and I’m not sure how to square these things.

Benjamin Young: We should research verifiable claims.

Benjamin Young: this WG https://www.w3.org/2017/vc/WG/

Ivan Herman: Can you summarize in one minute?

Tzviya Siegman: https://www.w3.org/TR/verifiable-claims-use-cases/

Benjamin Young: Verifiable Claims is working on being able to verify the claims made in files that move around that doesn’t require them to permanently exist on the web, or constant money flow into a registry. That way you can make a claim, sign that claim, then pass that file around to those who need it and they can act on that claim. ID-related scenarios - a publisher signing they are - such as a hash even. Those statements need a foundation. Right now we just sort-of trust. What I’m hearing is that the web package format plays inside the existing space, and works with existing services, but knowing the volatility of domains, it has to stay inside that space and is providing more optimizations around cached collections and web distributions, not necessarily chausser’s work for eternity - which is what publishers need - validated verifiable claim…

Jeffrey Yasskin: I think VC is missing some of the apoc - such as what if a cert gets stolen, etc. Revocation is the hard part…

Tzviya Siegman: Any other questions for Jeffrey?

Ivan Herman: How do we step forward?

Jeffrey Yasskin: I’m going to be finishing up the first draft of the bundling spec next week. The PR that I posted to the minutes - please do. Send patches… Look at what I produced, and if it works, great, if not, let me know. The sign exchanges draft i hope they will accept in March. I don’t know after that what timeline is likely to look like. The bundling spec might get discussed in London, and the IETF won’t discuss and it’ll come back to the W3C.

Ivan Herman: I see that the URL you gave refers to IETF work - so none of that is done in W3C?

Jeffrey Yasskin: Bundling - we don’t know yet, but W3C will stay involved.

Ivan Herman: Do we have any timeline on how things would evolve - we have dependencies.

Jeffrey Yasskin: I can take some of the timeline by you - do you have a date for when you’d need a bundling spec sketched?

Ivan Herman: We are supposed to have a candidate rec - tech wise we should be fixed - by end of this year.

Tzviya Siegman: Q1 2019…

Ivan Herman: in about a year - that means the packaging spec should be at our disposal in 8 months?

Jeffrey Yasskin: I should be able to have a finished draft by summer - but I expect we’ll need to iterate on it.

Garth Conboy: Thank you Jeffrey

2. Outreach

Tzviya Siegman: https://docs.google.com/document/d/1kY6NJs5s28S2uXfcB55IdTSAPTnbMeP4rVVajdCaQBc/edit?usp=sharing

Tzviya Siegman: For the first public working drafts - we put together a google doc. If you have people who would be interested in this - if you work for an org that will utilize this - please shout about this.

Tzviya Siegman: https://www.w3.org/publishing/groups/publ-wg/Meetings/F2F/2018.05.Toronto

Tzviya Siegman: We have a F2F coming up in Toronto in May. Garth, Ivan, and I will be talking with our hosts (Kobo) there are information about Hotels - get hotels and flights in order, but we wanted to make sure people were aware of that. If you are new - there is a meeting tomorrow for new members.

Nick Ruffilo: bye!