W3C

- DRAFT -

PWG (Publishing Working Group)

23 Apr 2018

Attendees

Present
dkaplan3, Avneesh, tzviya, JunGamo, dauwhe, josh, wolfgang, rdeltour, Teenya, mattg, laudrain, adamsisco, Nick_Brown, Hadrien, evan, George, franco, Vlad, jasminemulliken, Chris_Maden, gpellegrino, Bill_Kasdorf, Benjamin_Young, laurentlemeur, duga, mateus, timCole, pkra, marisa
Regrets
garth, ivan
Chair
Tzviya
Scribe
nickruffilo

Contents


<tzviya> title: PWG

<scribe> scribenick: nickruffilo

Makoto: I have been involved in EPUB 3 and 3.2, but I'm also interested in what's happening in this group. I work for Keio advanced double3.

Nick Brown: I'm Nick and I work for VitalSource, part of Ingram. I've been involved with IDPF before, and mostly with EPUB 3 for the last few years. I work closely with Rick Johnson.

Tzviya: Welcome.

<tzviya> https://www.w3.org/publishing/groups/publ-wg/Meetings/Minutes/2018/2018-04-16-pwg.html

Tzviya: First item - review minutes from last week. Comments?

Minutes

Dave: They should be approved.

Offlining

Tzviya: Approved. Main topic: something we haven't discussed yet, the concept of offlining. I hoped today we could come together on the different approaches to offlining. There is a question of whether we do it now or defer it. Let's open up the floor. I want to start by saying that offline is crucial to publications; there is an important case for offlining, especially in scholarly publishing, where someone may wish to annotate an HTML document while on an airplane or in another no-connectivity situation.

<dauwhe> https://w3c.github.io/dpub-pwp-ucr/#onloffl

Hadrien: I think what we could be doing regarding offlining is rely on existing infoset items: if I offline a publication, should it be the default reading order? How do you deal with versions? How do I know there are updates? The technical details are going to be difficult, but explaining how various items in the infoset can be used is a good first step.

Laurent: It occurred to me, when we make a comparison with the video business, that Netflix doesn't try to offline exactly the same stream it plays online; when you want to offline, you download another version. There could be an alternate solution where there is a streamed read, in which you annotate in the reader using a plugin, and if you want to read offline, you download the publication and read it in a reading system. This is what the video world is doing now. We could fall back to this solution.

Deborah: The big question that will come up is how to deal with remote resources, such as an iframe with a quiz in it pulled from a remote resource. Will the publication say that certain items cannot be offlined, or do we need legal notation for what can be downloaded? We will need to accommodate the possibility that some things may not work when offlined/downloaded.

<Zakim> tzviya, you wanted to point out the difference between offline and package

Tzviya: I want to explain/remind people why we've separated offline and packaging. There are places where people know that PDF and EPUB exist, but they aren't interested in them, such as the world of scholarly publishing. They have access to PDF or EPUB, but the act of downloading is not what they want; it's an extra step to have to download a file and share it. For scholarly publishing, the only available format is PDF, because EPUB did not catch on in the scholarly world. If we say the way to access a publication is the complete package, it won't catch on there either. So having it offline, but not downloaded as a package, is attractive.

Romain: The other elephant in the room is service workers. If we say that the web publication itself does not intend to handle offline, we need a declarative way of offlining, but it would require service workers, although they would live in the reading system, not the document. The web has solved offlining in this way, so if we want to do something different, we have to explain why.

Luc: In our first public working draft we have clearly associated offlining with the resource list: there is a resource list, and that means the publication should be available when you have no connection. We have something straightforward; as soon as we have the resource list, and possibly the default reading order, we should enable access to the publication offline. Our first public working draft states that a user agent should do this unless it cannot reach an item, in which case that item becomes an external resource. With the resource list we have a good start on the question of offlining.
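As a rough sketch of what the resource list and default reading order could give a user agent, assuming hypothetical property names (readingOrder, resources) rather than the draft's normative terms:

    // Hypothetical shape of the two infoset items discussed above.
    interface WebPublicationInfo {
      readingOrder: string[]; // the default reading order, as URLs
      resources: string[];    // the resource list: everything else needed offline
    }

    // All URLs a user agent would need to fetch so the publication
    // remains usable without a connection.
    function urlsToOffline(info: WebPublicationInfo): string[] {
      return [...info.readingOrder, ...info.resources];
    }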

Hadrien: To react on service workers: I am not sure we can say that the web has solved offline, and I would advise against this being the only solution; I'm not sure we should say we have only one way. There are some real technical issues regarding offlinability. The browser or a dedicated reading app can trigger offlining a publication, which is very different from how a service worker is designed: a service worker normally allows the web app to determine what is offline.

Dave: I have concerns with service workers as they exist today, especially where they store things and how that's handled. I don't want my book deleted by the browser, so we have needs beyond what service workers are doing. It's a huge topic, and I'm wondering whether we can let this happen in the first version of the spec. Annotations while offline seems really complex; is that required for version 1 of the spec?

<rdeltour> +1 on the topic of "storage" or "persistence", which is one of the limitations of today’s Web for publications

<Zakim> tzviya, you wanted to discuss SW

Evan: There is an interesting practice in scholarly publishing where a user downloads a PDF: Crossref has functionality where there is a link in the PDF that the user can check to see whether there is an update. That's a utility that lets the publisher know if content is updated. Updating is an important issue; the user needs to know whether they are looking at the latest version.

Tzviya: I wanted to talk about storage vs. persistence. We need to suss out how persistent something needs to be: your ability to go offline for an hour is different from needing something to persist for a long time.
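A minimal sketch of the kind of persistence request being discussed, using the StorageManager API from the WHATWG Storage spec linked just below; the user agent may still refuse or later evict the data:

    // Ask the browser to treat this origin's storage as persistent rather
    // than best-effort ("offline for an hour" vs. long-term persistence).
    async function requestPersistence(): Promise<boolean> {
      if (!navigator.storage || !navigator.storage.persist) {
        return false; // Storage API not available in this user agent
      }
      const already = await navigator.storage.persisted();
      const granted = already || (await navigator.storage.persist());
      const { usage, quota } = await navigator.storage.estimate();
      console.log(`persistent: ${granted}, used ${usage} of ${quota} bytes`);
      return granted;
    }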

<bigbluehat> https://storage.spec.whatwg.org/#persistence

<bigbluehat> https://dauwhe.github.io/html-first/Moby-Dick/

Benjamin: It's come up a few times that (a) we need more nuanced language around 'offline', because there are many different states. Offline in the browser is currently temporal: there are quota limits, it's optimized for your storage, and it's lumped in with cookies, so it's possible to clear the cache and get rid of your books. The Storage spec at the WHATWG does have some persistence stuff; it is a way to ask the browser to keep something longer and be less volatile, but it's not fully spec'd and thought through yet. Things like web apps are offline-first. That is not to say we can't pick up from where we are on the web; service workers are a good way to process requests. Late last year I demoed a link with an infoset reading order pulled from the TOC, which then pipes all of that through some iframes; the service worker watches those iframes and keeps those items in a cache. Additionally, it has a manifest, so if you load it on a phone it will signal that you can install the thing: it puts an icon on your home screen and you can read offline. It uses the infoset to read the items offline...
...: It keeps the same URLs, but it can still be worked with offline; you're working with the cache, though, so you don't have strong persistence. Once we start looking at permanence, we are looking at versioning, so we need to signal that there are new copies. There are lots of issues to file. It can be done with current tech.
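A minimal sketch of the service-worker approach Benjamin describes, with a hypothetical cache name and resource URLs; a reading system could instead collect the URLs from the publication's reading order and resource list:

    // sw.ts - pre-cache the publication's resources and answer cache-first.
    const CACHE_NAME = 'wp-offline-v1';
    const PUBLICATION_URLS = [
      '/moby-dick/index.html',      // hypothetical paths, for illustration only
      '/moby-dick/chapter-001.html',
      '/moby-dick/style.css',
    ];

    self.addEventListener('install', (event: any) => {
      // Fetch and store every listed resource when the worker is installed.
      event.waitUntil(
        caches.open(CACHE_NAME).then((cache) => cache.addAll(PUBLICATION_URLS))
      );
    });

    self.addEventListener('fetch', (event: any) => {
      // Serve from the cache when possible, falling back to the network;
      // the cached copy is what keeps the publication readable offline.
      event.respondWith(
        caches.match(event.request).then((cached) => cached || fetch(event.request))
      );
    });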

<dauwhe> NickRuffilo: we need some use cases

<dauwhe> ... for different types of publishers

<dauwhe> ... when reading fiction/novel on the beach, while turning off wifi

<dauwhe> ... is a different need for offline

<dauwhe> ... you might want an update but you might not

<dauwhe> ... in scholarly, a minor correction might be a major update

<dauwhe> ... but downloading to a device might be out of scope

<dauwhe> ... so the industry might want a different solution

Nick Brown: I wanted to make a quick comment about versions, annotations, etc. feeling big and scary, and share some of our experience here. A significant portion of our reading system platform goes towards supporting those things. The last few folks who have spoken have talked about descoping some of those aspects and focusing on cache and persistence. That feels like the appropriate place to focus: it's a nice primitive path toward solving some of the complex cases, and I have a hard time seeing how the complex ones could be solved simply and within the spec. That seems like the right start, and it addresses the cache and persistence issues.

<Nick_Brown> +1 Hadrien - exactly right, and I wouldn't propose to standardize our solutions to those problems

Jasmine: Is there anything we can draw on from what the web archiving group has been addressing, like a single file for a work?

Tzviya: One thing Garth and I talked about is whether or not service workers are an option. If we're putting the opportunity in the hands of the author via a script, it should be optional, meaning that offlining or other features could be optional. The reading system needs to support it, but we're saying that for those people who care about it: if it's really important, put the script in, and it will work. The publisher/author needs to determine whether users want to use it. Scholarly publishers include scripts for things like MathJax, but making offlining optional reduces the burden of what is required.

Deborah: As for web archiving, Tim is on the queue and is probably best placed to answer; there are possible relations. I wanted to lean against the way Nick R framed the issue, that some things do or don't need offline. It doesn't gain us anything to say "there is a scholarly publishing model that is different"; I can think of cases where you might need annotations while sitting on the beach. I do like the idea of making it an optional thing: a publication doesn't need to be offlinable, but if we define what that means, then a publication CAN be offlinable, which is a better way of defining it. It does let publishers/content creators second-guess what users are doing with the content, but that is acceptable in this case.

<Hadrien> we can't truly offline anything for good with today's technology, the best we can do is cache things and intercept some network requests for a limited period of time

<Zakim> bigbluehat, you wanted to mention affording for offline vs. the actions to offline something

Tim: I put myself on the queue to talk about the importance of taking control of the timing of the cache. Publishers sometimes have the responsibility to make sure that content doesn't persist, like a library where content could expire, so it could be useful to tell the UA how long the content should be around for, perhaps as an optional feature. In response to web archiving, the question always comes down to when to take a snapshot: when does something change, how do I mark the date and time of that change, and how many snapshots am I obligated to take? I'm not sure this group is going to solve those issues, but that's what they're working on.
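A small sketch of the kind of timed expiration Tim mentions; the record and its expiry are hypothetical bookkeeping a reading system might keep alongside the cache, not anything in the draft:

    // Hypothetical record stored when a publication is offlined (e.g. a loan).
    interface OfflineRecord {
      cacheName: string; // name of the Cache holding this publication
      expiresAt: number; // epoch milliseconds after which it must be removed
    }

    async function purgeIfExpired(record: OfflineRecord): Promise<boolean> {
      if (Date.now() <= record.expiresAt) {
        return false; // still within the period the publisher/library allowed
      }
      await caches.delete(record.cacheName); // drop the whole publication cache
      return true;
    }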

Benjamin: I wanted to add one thing: the Moby-Dick demo just expresses links to primary resources and covers the TOC. The script, which should be considered a polyfill, does the actual offlining: it takes the things afforded and adds the ability to offline them. There is nothing implicit in the state of the document that made it go offline or prevented it; it's just an HTML document, but it was shaped in a way that was meant to be consistent, so that something else that knew the intent could offline it. The script or the UA does the things required to offline, but the affordances were about linking to the required resources and having proper cache headers so that the service worker was allowed to do that. Affording for it was just the descriptive bits.

Tzviya: We need some technical meat and need to go beyond affordances.
...: Let's come away with some action items. Some people need to work on use cases.
... Anyone interested in working on some use cases for offlining?
... We need to build this out a bit more and flesh out what we're looking for, possibly making it optional. Tim talked about making it a timed expiration.

Benjamin: Related to the use cases, there is a lot that this affects: versioning, annotation, and both of those relate to identification of the resource itself. The one thing that current caching/offlining gets us is that the publication is experienceable off the network without changing its identity; nothing else has that at the moment. That lets you be away from the network while experiencing the thing. This is foundational: the identity doesn't change, such that all the things I do with the publication still have meaning when I transport them. Identity ties into all these concerns.

Tzviya: Can I ask Benjamin, Tim, and Nick Brown, who touched on some key points, to work on a few GitHub issues to capture some of what we've talked about in this meeting?
... Let's move on to the infoset. Luc?


Infoset update

<George> George back on phone

Luc: I need at least some more clarification about what is expected in our first public working draft. We had discussions - long discussions - about what we should describe in a web publication: it should be very simple, the minimum that would allow people not coming from the publishing/EPUB world to think about what a publication on the web could be. The pendulum went a bit too far towards "why not a single HTML page as a publication." It seems to me that what we have in our first draft reflects this movement towards a simple infoset, to start with, for people not coming from the EPUB world. Now I see a different move. We had these discussions in the Interest Group, and with the 3.1 BFF (browser-friendly format), and it looks like this discussion is coming up again. In my opinion, we should re-explain what we are expecting from this BFF. About the infoset, the other point of view - the other extremity of the pendulum - is what Hadrien said in #176, where he explained the gap between what we have in EPUB and the needs. My question is: for web publications, do we need a description of all of what is in EPUB? Not in terms of implementation, but in terms of items. That is not clear to me. Reading the infoset as it is now, some items need to be more detailed, but I don't think we have to describe implementation. I thought until now that the spec we have now covers everything we need for a first implementation; I'm not sure that is the right way to do it.

scribe: Do we need to have many more things - all or almost all of what we'll have in EPUB 4? Obviously they would be MAYs; for PWP and EPUB 4 the MAY would then become a SHOULD or possibly a MUST...

Tzviya: Is the question whether the infoset - which is rather minimal - is sufficient, or whether we need to incorporate all the details that came from the gap analysis with EPUB?

Luc: yes

<pkra> have to leave early. Sorry.

Benjamin: The thing I want to focus on is whether the infoset affords the things that we need. I would be concerned if we just inherited EPUB 3.1, for all the reasons it was trying to afford things - some of which may be our issues too - but we have a web spec, which has different concerns. I would want to be really careful about assuming that because things were over there, we'll move them over here.

Tzviya: I think it's worth going through the functionality items that EPUB provided and seeing whether that functionality is available here. For example, the EPUB infoset includes a "link" element. The link item was there to afford external metadata, but that's available in PWP, just in a different way.

Luc: The gap analysis goes much further. I agree we should not copy/paste, but we should redefine, in web language and technologies, all the things that are in 3.1 that should be in 4.0.

Tzviya: I think that Hadrien's list is important to look at. As Benjamin said, we need to find ways to duplicate the functionality, not the technology.

Hadrien: We had to start somewhere; I haven't looked at the satellite specs, just the core. We need to think carefully about where each of the infoset items belongs. They might not be important in EPUB 3 but might be in 4. The example used - link - is not limited to external metadata, and we don't yet have a replacement for its functionality.

Tzviya: OK, so we need to take a look at each item, think about its functionality, and make sure it is covered.

Luc: Starting with Hadrien's list, we should create an action item out of each: whether it would come to WP, and how it would be expressed in web technologies. In any case, we need to understand how to rethink these things for the open web platform.

Tzviya: Anyone willing to volunteer?

Luc: For the infoset as it is now - for the minimal web publication - yes it is stable.

Tzviya: What I will do is take the issue Hadrien raised, break it out into further issues, and look into where each item belongs going forward.

Luc: We should have an item on BFF, so that it is completely clear for everyone.

Tzviya: Exactly what do you mean?

Luc: Speaking for myself at least, I am not very comfortable with what BFF is.

Tzviya: BFF is something we did in the past, and it's used as a reference, but we are not going to use it going forward. We have WP.

Hadrien: One thing: BFF isn't entirely in the past. That work continues with Readium 2 and the web publication manifest. We did the work to transform infoset items, and we make sure that it's round-trippable between EPUB and the Readium manifest. It works well so far. When Luc mentions looking at BFF, I think he means looking at the work that was done.

Tzviya: Interesting, thank you.
...: I will follow up with those who volunteered.
... Affordances task force later today

Summary of Action Items

Summary of Resolutions

[End of minutes]

Minutes formatted by David Booth's scribe.perl version 1.152 (CVS log)
$Date: 2018/04/23 17:01:41 $
