— Minutes

Date: 2018-04-23

See also the Agenda and the IRC Log


Present: Romain Deltour, Jun Gamou, Jasmine Mulliken, Deborah Kaplan, Avneesh Singh, Tzviya Siegman, Luc Audrain, Ben Dugas, Joshua Pyle, Dave Cramer, Wolfgang Schindler, Adam Sisco, Teenya Franklin, Matt Garrish, Nick Ruffilo, Evan Owens, George Kerscher, Hadrien Gardeur, Vladimir Levantovsky, Nick Brown, Franco Alvarado, Wendy Reid, Chris Maden, Gregorio Pellegrino, Laurent Le Meur, Bill Kasdorf, Benjamin Young, Brady Duga, Tim Cole, Mateus Teixeira, Marisa DeMeglio, Reinaldo Ferraz, Peter Krautzberger, David Stroup, Karen Myers, Murata Makoto

Regrets: Garth Conboy, Ivan Herman, Ben Walters, Mustapha Lazrek


Chair: Tzviya Siegman

Scribe(s): Nick Ruffilo, Dave Cramer


Tzviya Siegman: title: PWG

Murata Makoto: I have been involved in epub 3 and 3.2 but i’m also interested in what’s happening in this group. I work for Keio advanced double3

Nick Brown: I’m Nick and I work for Vital Source - part of Ingram. I’ve been involved with IDPF before and mostly epub3 for the last few years. I work closely with Rick Jonson.

Tzviya Siegman: Welcome.

1. minutes

Tzviya Siegman: https://www.w3.org/publishing/groups/publ-wg/Meetings/Minutes/2018/2018-04-16-pwg.html

Tzviya Siegman: First item - review minutes from last week. Comments?

dave: They should be approved

Tzviya Siegman: Approved.

Resolution #1: Last meeting’s minutes approved

2. offlining

Tzviya Siegman: something we haven’t discussed yet - the concept of offlining. I hoped today we could come together on different approaches to offlining. There is a question of if we do it now or defer. Lets open up the floor. I want to start saying that offline is crucial to publications. I believe there is an important case to offline and it’s extremely important. Especially

Nick Ruffilo: in scholarly publishing - if someone wishes to annotate an HTML document and they are on an airplane or other no-connectivity situations.

Dave Cramer: https://w3c.github.io/dpub-pwp-ucr/#onloffl

Hadrien Gardeur: I think what we could be doing regarding offlining is rely on existing infoset items - if i offline a publication, should it be the default reading order? How do you deal with versions? How do I know there are updates? The technical details are going to be difficult, but explaining how various items in the infoset can be used is a good first step.

Laurent Le Meur: It occurred to me that when we make a relationship with the video business - netflix doesn’t try to offline movies for exactly the same stream. When you want to offline, you download another version. There could be an alternate solution that there is a stream read where you annotate in the reader using a plugin, and if you want to read it offline, you download it and read it in a reading system. This is what the video world is doing now. We could fallback to this solution.

Deborah Kaplan: The question that will come up - and the big question is how to deal with remote resources. Such as an iframe with a quiz in it from a remote resource - will the publication say that certain items cannot be offlined - or do we need legal notation for what can be downloaded. We will need to accommodate the concept that things may not work with offline/download

Tzviya Siegman: I want to explain/remind people why we’ve separated offline and packaging. There are a place where people know that PDF and Epub exist, but they aren’t interested in it - such as the world of scholarly publishing. They have access to PDF or epub - but the act of downloading is not what they want. It’s an extra step to have to download a file and share it. For scholarly publishing, the only available format is PDF because epub did not catch on in the scholarly world. If we say the way to access is the complete package, it won’t catch on in the scholarly world. So having it offline but not downloaded, that’s attractive (in a package)

Romain Deltour: the other elephant in the room is service workers - so if we say that the web publication does not intend to use offline - we need to have a declarative way for offlining, but it would require service workers - although they would live in the reading system not the document. The web has solved offlining in this way, so if we want to do something different, we have to explain why.

Luc Audrain: in our first public working draft we have clearly associated offlining with the resource list - that means that there is a resource list and that means it should be available when you have no connection. We have something straightforward, as soon as we have the resource list and possibly the default reading order, we should enable access to the publication offline. Our first public working draft states that a user agent should do this unless it cannot reach the item, in which case it becomes an external resource. With the resource list we have a good start with the question of offlining.

Hadrien Gardeur: To react to service workers - i am not sure we can say the web has solved offline, but i would advice against this being the only solution. I’m not sure we should say we only have one way. There are some real technical issues regarding offlinability. The browser or dedicated reading app can trigger offlining a publication - which is very different than a service worker is designed. The Service worker is normally allowing the web-app to determine what is offline.

Dave Cramer: I have concerns with service workers as they exist today - especially where they store stuff and how it’s handled. I don’t want my book deleted by the browsers - so we have needs beyond what service workers are doing. It’s a huge topic and I’m wondering if we can let this happen in the first version of the spec. Annotations while offline - that seems really complex and is that required for version 1 of the spec?

Romain Deltour: +1 on the topic of “storage” or “persistence”, which is one the limitation of today’s Web for publications

Evan Owens: There is an interesting practice in scholarly publishing where a user downloads a PDF - crossref has the functionality where there is a link in the PDF that the user can check to see if there is an update. That’s a utility that lets the publisher know if content is updated. Updating is an important issue - the user needs to know if it’s the latest version they are looking at.

Tzviya Siegman: I wanted to talk about storage VS persistent. We need to suss out how persistent something needs. If it’s your ability to go offline for an hour - that’s different than needing to persist for a long time….

Benjamin Young: https://storage.spec.whatwg.org/#persistence

Benjamin Young: https://dauwhe.github.io/html-first/Moby-Dick/

Benjamin Young: It’s come up a few times that: A) we need more nuanced language around ‘offline’ but there are many different states. Offline in the browser is currently temporal - there are quota limits, it’s optimized for your storage, and it lumps it in with cookies. So it’s possible to clear cache and get rid of your books. There is the storage spec at the WhatWG - does have some persistence stuff. It is a way to ask the browser to keep something longer and be less volatile - but it’s not currently spec’d and thought of. With things like web-apps, they are offline-first. That is not to say we can’t pick up from where we are on the web. Service workers are a good way to process requests. Late last year I demo’d a link with an infoset reading order pulled from TOC - then pipes all that through some iframes. the service worker watches those iframes and then keeps those items in a cache. Additionally, it has a manifest if you load in a phone that will signal you can install this thing - it will put an icon on your home screen and you can read offline. It uses the infoset to then read the items offline… It’s saying the URLs, but it can still be worked with offline - but you’re working with the cache - so you don’t have strong persistence. Once we start looking at permanence, we are looking at versioning. So we need to signal that there are new copies… There are lots of issues to file. It can be done with current tech.

Nick Ruffilo: we need some use cases
… for different types of publishers
… when reading fiction/novel on the beach, while turning off wifi
… is a different need for offline
… you might want an update but you might not
… in scholarly, a minor correction might be a major update
… but downloading to a device might be out of scope
… so the industry might want a different solution

Nick Brown: I wanted to make a quick comment about versions, annotations, etc feeling big and scary. Some of our experience here. A significant portion of our reading system platform go towards supporting those things. The last few folks that have spoken have talked about descoping some of those aspects and focus on cache and persistence. That feels like the appropriate place to focus. It’s a nice primitive path to solve some of the complex packages. I have a hard time seeing how the complex ones could be solved simply and within spec. That seems like the right start - and it solves the cache and persistence issues.

Nick Brown: +1 Hadrien - exactly right, and I wouldn’t propose to standardize our solutions to those problems

Jasmine Mulliken: Is there anything we can draw on what the web archiving group has been addressing? Is there something we can draw on? LIke a single work file we can draw on?

Tzviya Siegman: One thing Garth and I talked about - is whether or not service workers are an option - if we’re putting the opportunity on the author of a script - it should be optional - meaning that offlining, or other features could be optional. The reading system needs to support it, but we’re saying that for those people who care about it - ok. If it’s really important, put the script in, and great it will work. The publisher/author need to determine if the users want to use it. Scholarly publishers include scripts for things like mathjax - but making it optional reduces the burden of what is required.

Deborah Kaplan: As for web-archiving, tim is on the queue - Tim is probably best to answer. There are possible relations. I wanted to lean against the way Nick R framed the issue - that some things need or don’t need offline. It doesn’t gain us anything to say “there is a scholarly publishing model that is different.” I can think there are cases where you might need annotations while sitting on the beach. I do like the idea of saying it’s an optional thing. So a publication doesn’t need to be offlinable, but if we define what it means, then a publication CAN be offlinable, so it’s a better way of defining it. It lets publishers/content creators to second guess what the users are doing with the content, but that is acceptable in this case.

Hadrien Gardeur: we can’t truly offline anything for good with today’s technology, the best we can do is cache things and intercept some network requests for a limited period of time

Tim Cole: I put myself on the queue to talk about the importance of taking control of the timing of the cache. Publishers have the responsibility sometimes to make sure that content doesn’t persist - like a library, where content could expire. So it could be useful to tell the UA how long the content should be around for. In terms of optional. In response to web archiving, the question always comes when to take a snapshot - when does something change, and how do I mark the date and time of this change, and how many am I obligated to take. I’m not sure this group is going to solve those issues. but that’s what they’re working on.

Benjamin Young: I wanted to add one thing - the moby dick demo just expresses links to primary resources and covers the TOC. The script that should be considered a polyfill does the actual offlining. It takes the things afforded, and then adds the ability of offline it. There is nothing implicit in the state of the document that made it go offline or prevented it. It’s just an HTML document - but it was shaped in a way that was meant to be consistent, so that something else knowing intent could offline it. The script or the UA does the things required to offline, but the affordances were about linking to the required resources and having proper cache headers so that the service workers were allowed to do that. Affording for it was just the descriptive bits.

Tzviya Siegman: We need some technical meat and need to go beyond affordances.
… lets come away with some action items. Some need to work on use-cases.
… Anyone interested with some use-cases for offlining?
… We need to build this out a bit more - and flesh out a bit more in what we’re looking for - possibly make it optional. Tim talked about making it a timed expiration.

Benjamin Young: Related to the use-cases, a lot that this affects. Versioning, annotation, and both of those relate to identification of the resource itself. The one thing that current caching/offlining gets us is that it’s experiencable off the network without changing the identity. Nothing else has that at the moment. That lets you be away from the network while expericing the thing. This is foundational - that the identity doesn’t change. Such that all the things I do with the publication still have meaning when I transport them. Identity ties into all these concerns.

Tzviya Siegman: Can I ask Benjamin, Tim, and Nick Brown - who touched on some key points - can you work on a few github issues to capture some of what we’ve talked about in this meeting?

3. Infoset update

Luc Audrain: I think we - I need at least some more clarification about what is expected in our first public working draft. We had some discussions - long discussions - about what should we describe in the web publications - it should be very simple, the minimal that will allow people not coming from publishing/epub world to think about what could be a publication on the web. The pendulum went a bit too far on “why not a single HTML page to be a publication.” It seems to me that what we have in our first draft is a reflection of this movement towards a simple infoset to start with for people not coming from the epub world. Now I see a different move. We had these discussions in the interest group - and the 3.1 BFF (Browser friendly format) It looks like this info is coming again. In my opinion, we should re-explain what we are expecting with this BFF. About the infoset, the other point of view - the extremity of the pendulum - is what hadrien said in #176, where hadrien explained the GAP between what we have in epub and the needs. My question is, do you imagine - for web publication - do we need to have a description of all of what is in epub? Not in terms of implementation but in terms of items. That is not clear to me. Reading the infoset as it is now, some items need to be more detailed, but I don’t think we have to describe implementation. I thought until now that the spec we have to now cover everything we need for a first implementation. I’m not sure it’s the good way to do it.
… Do we need to have much more things - all or almost all of what we’ll have in epub 4. Obviously it will be with MAY - for pwp and epub 4 then MAY will then becomes a should or possibly a MUST…

Tzviya Siegman: Is the question whether the infoset - which is rather minimal - is sufficient or if we need to incorporate all the details that came from the gap analysis between epub?

Luc Audrain: yes

Benjamin Young: The things that I want to focus on are that the infoset affords things that we need. I would be concerned if we just inhereted epub 3.1 - for all the reasons it was trying to afford things - which may be some of our issues, but we have a web spec which has different concerns. I would want to really be careful that things were over there, so we’ll move them over here.

Tzviya Siegman: I think it’s worth going through the functionality items that epub provided and see if that’s available here. For example, the epub infoset includes a “link” element. The link item was there to afford external metadata - but that’s available in PWP, just in a different way.

Luc Audrain: The gap analysis goes much further. I agree, we should not copy/paste, but we should redefine in web-language and technologies all the things that are in 3.1 that should be in 4.0

Tzviya Siegman: I think that Hadrien’s list is important to look at. Like what Benjamin said, we need to find ways to dupe the functionality, not the technology.

Hadrien Gardeur: we had to start somewhere - i haven’t looked at the satellite specs, just the core. We need to think carefully where each of the infoset items belong. They might not be important in epub 3 but might be in 4. The example used - link - it’s not limited to external metadata, and we don’t yet have a replacement for it’s functionality.

Tzviya Siegman: OK, so we need to take a look at each item, think about it’s functionality, and make sure it is covered.

Luc Audrain: Starting with hadrien’s list, we should create an action item out of each. Would it come to WP and how it would be expressed in web technologies. In any case we need to understand how to rethink for an open web platform.

Tzviya Siegman: Anyone willing to volunteer?

Luc Audrain: For the infoset as it is now - for the minimal web publication - yes it is stable.

Tzviya Siegman: What I will do is take the issue hadrien raised and break it out into further issues and look into where it belongs going forward.

Luc Audrain: We should have an item on BFF - so it would be completely clear for everyone.

Tzviya Siegman: Exactly what do you mean?

Luc Audrain: Myself at least - I am not very comfortable with what BFF

Tzviya Siegman: BFF is something we did in the past - and it’s used as reference, but we are not going to use it going forward. We have WP.

Hadrien Gardeur: One thing - BFF isn’t entirely in the past. The work done continues with Readium 2 and the web publication manifest. We did the work to transform infoset items. We make sure that it’s round-tripable between epub and the readium manifest. It works well so far - when Luc mentions looking at BFF - i think he means looking at the work that was done.

Tzviya Siegman: Interesting, thank you.
… I will follow up with those who volunteered.
… Affordances task force later today

4. Resolutions