17:53:05 RRSAgent has joined #dpub-arch
17:53:05 logging to http://www.w3.org/2016/03/03-dpub-arch-irc
17:53:31 Zakim has joined #dpub-arch
17:54:06 zakim, this will be dpub_archival_tf
17:54:06 I do not see a conference matching that name scheduled within the next hour, TimCole
17:54:34 Meeting: DPub Archival Task Force
17:54:51 Chair: Tim Cole
17:55:15 rrsagent, make log public
17:56:05 Agenda: https://lists.w3.org/Archives/Public/public-digipub-ig/2016Mar/0009.html
17:57:01 TimCole has joined #dpub-arch
17:57:51 TimCole has joined #dpub-arch
17:59:14 astein has joined #dpub-arch
18:03:19 HeatherF has joined #dpub-arch
18:03:36 present+ Heather_Flanagan
18:03:40 rrsagent, Agenda: https://lists.w3.org/Archives/Public/public-digipub-ig/2016Mar/0009.html
18:03:40 I'm logging. I don't understand 'Agenda: https://lists.w3.org/Archives/Public/public-digipub-ig/2016Mar/0009.html', TimCole. Try /msg RRSAgent help
18:04:28 lrosenth has joined #dpub-arch
18:04:35 present+ Leonard
18:04:49 present+ Tim_Cole
18:05:06 present+
18:05:17 present+ astein
18:06:32 Bill_Kasdorf has joined #dpub-arch
18:06:40 present+ Bill_Kasdorf
18:07:11 scribenick: HeatherF
18:07:32 minutes: https://www.w3.org/2016/02/18-dpub-arch-minutes.html
18:07:37 TimCole: Approval of the minutes?
18:07:57 ... minutes approved, no discussion
18:08:06 Topic: Administrivia
18:08:54 From HeatherF: http://www.dpconline.org/advice/preservationhandbook
18:09:03 http://www.archives.gov/digitization/strategy.html
18:09:40 Ayla Stein
18:09:51 TimCole: Will miss the next few meetings; Ayla Stein will co-lead when needed
18:10:15 Topic: Outreach
18:10:47 TimCole: Any updates on the outreach effort?
18:10:58 Bill_Kasdorf: haven't had a chance to reach out yet
18:11:04 TimCole: haven't had a chance to reach out yet
18:11:20 lrosenth: haven't had a chance to reach out yet
18:11:24 We are bad, bad people
18:11:26 Neither have I
18:11:28 haha
18:11:45 TimCole: this will go on the agenda next time. And we mean it!
18:12:03 Topic: Use Cases
18:12:11 http://w3c.github.io/dpub-pwp-arch/Archival-UCR.html#LOCKSS
18:12:29 TimCole: This is the beginning of a use case, posted this morning.
18:12:47 ... This doc is meant to be the place to put use cases and to talk about the requirements for those use cases that are relevant to the PWP vision
18:13:08 LOCKSS: http://www.lockss.org/about/how-it-works/
18:13:23 ... Talking about LOCKSS in particular, see the link about how LOCKSS works.
18:14:10 ... LOCKSS spiders/scrapes publisher websites that it has permission to archive. Then, when someone has permission to access that content, LOCKSS acts as a proxy cache.
18:14:36 ... New versions will be posted and LOCKSS will update, as per usual proxy behavior.
18:14:57 Bill: For LOCKSS, are they accessing the library, or are they accessing the publisher?
18:15:07 TimCole: They are accessing what the library is subscribing to.
18:15:15 I didn't see anyone on the queue
18:15:20 I was!
18:15:24 :-)
18:15:29 q?
18:16:03 TimCole: People coming through the library servers would see what was available in the proxy cache, and LOCKSS would siphon off copies for the archives.
18:16:18 TimCole: CLOCKSS works more directly with the publishers and makes that copying more routine.
18:17:02 ... Some interesting issues come up: when the publisher content goes away and we move to a pure archive/preservation role, LOCKSS serves up the cached copy as long as the Accept headers match what LOCKSS has cached
18:17:53 ... The browser goes to a new version of the content and says "I only accept HTML5"; LOCKSS only has "HTML4"; at which point, LOCKSS will try to migrate on the fly to HTML5 or it will respond with a 406 (Not Acceptable) error
18:18:06 Bill: Will it also say "but we can give you a different format"?
18:18:10 q?
18:18:10 TimCole: not sure
18:18:34 lrosenth: that seems a sensible model. Are we saying that's a model we like, or just saying that this is what it does, these are the facts?
18:18:52 PWP: http://w3c.github.io/dpub-pwp/#arch
18:18:55 TimCole: Right now, we are just collecting the use cases. We should be considering whether the facts in the use cases cause concern.
18:19:25 ... Right now, the PWP talks about Service Workers, which provide a local way to serve resources.
18:19:53 q?
18:20:05 lrosenth: there is no requirement that a PWP must use Service Workers. That said, it has no bearing on this, because we're talking about a server operation, where a server is doing the caching, not the client
18:21:28 ... unclear as to what the issue is here; if we publish in PWP format, then LOCKSS will store that. If we request it as a PWP, then it will be served as a PWP. If I request it in another format, the server may convert it on the fly.
18:21:54 ... the reverse is also true: if I post content in PDF, and instead of asking for that as PDF I ask for it as a PWP, the server may convert it on the fly.
18:22:38 TimCole: Probably correct. Just wondering whether there is something that the client will do with the PWP that may somehow result in the copy that the LOCKSS server cached not having everything that the client needs
18:22:50 q?
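The negotiation behavior described above (serve the cached copy when the Accept header matches, otherwise migrate on the fly or answer 406) can be sketched roughly as follows. This is only an illustration under stated assumptions, not LOCKSS code: the `serve` and `parse_accept` functions, the `migrators` table, and the `application/x-pwp` media type are all hypothetical names invented for the example, and the Accept parsing is deliberately simplified (no type wildcards such as `text/*`).

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical converter signature: takes cached bytes, returns migrated bytes.
Migrator = Callable[[bytes], bytes]


def parse_accept(header: str) -> List[str]:
    """Return the media types in an Accept header, highest q-value first."""
    entries = []
    for position, part in enumerate(header.split(",")):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        quality = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if quality > 0:
            # Sort by descending quality, then by original position.
            entries.append((-quality, position, media_type))
    return [media_type for _, _, media_type in sorted(entries)]


def serve(
    cache: Dict[str, bytes],
    migrators: Dict[Tuple[str, str], Migrator],
    accept: str,
) -> Tuple[int, Optional[str], bytes]:
    """Return (status, media type, body) for one archived resource."""
    wanted = parse_accept(accept)
    # First choice: a cached representation the client already accepts.
    for media_type in wanted:
        if media_type == "*/*" and cache:
            first = next(iter(cache))
            return 200, first, cache[first]
        if media_type in cache:
            return 200, media_type, cache[media_type]
    # Second choice: migrate a cached representation on the fly.
    for media_type in wanted:
        for cached_type, body in cache.items():
            convert = migrators.get((cached_type, media_type))
            if convert is not None:
                return 200, media_type, convert(body)
    # Nothing acceptable and nothing convertible: 406 Not Acceptable.
    return 406, None, b""
```

For example, with a cache holding only a PDF copy and a hypothetical PDF-to-PWP migrator registered, a request for `application/x-pwp` would be answered with converted content, while the same request without any migrator would get a 406.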
18:23:22 Bill: When we look at the Portico model, a very contrasting model, it will be easier to see, given how PWP is being spec'd, how compatible it is with these two important dark archiving schemes.
18:23:39 ... Is there anything we're doing that creates a problem? Is there anything we're doing that can optimize what they're doing?
18:24:05 lrosenth: Right. And for LOCKSS, the answer is "we don't need to care"
18:25:38 ... the packaged versus the unpackaged differentiation becomes interesting. If I publish the PWP unpackaged, and there is no provision in the server to provide a packaged version, then LOCKSS is going to have to do a huge amount of work to archive the PWP
18:26:12 Bill: That is still better than where they are now. If they are now going to a website, what other resources are there that are essential to that content? e.g., fonts, media, etc.
18:26:34 ... PWP is going to make that easier, even in unpackaged form. The PWP must unambiguously get you those resources.
18:27:09 TimCole: The problem is that LOCKSS right now works on the assumption that whatever it's grabbing from a web browser is all it needs. It doesn't know anything about the formats in any great detail. It does not take advantage of a manifest to build a package.
18:27:38 Bill: OK. Then perhaps the PWP would enable LOCKSS to do a better job than it can currently do to archive the publication in a more complete and correct way.
18:27:50 lrosenth: It will still require LOCKSS to do more work, which is fine from our perspective.
18:28:00 TimCole: that's a concern this use case needs to highlight.
18:28:06 http://www.dlib.org/dlib/january05/rosenthal/01rosenthal.html
18:28:24 http://blog.dshr.org/2013/02/rothenberg-still-wrong.html
18:29:10 ... This is a debate that covers both sides of the LOCKSS model.
18:29:30 q?
18:29:32 ... Why it doesn't work, why it does.
18:29:51 ... This will be interesting to look at as we discuss the CLOCKSS model as well.
18:30:13 Bill_Kasdorf: Will also want to talk to Craig Van Dyke, who now works at CLOCKSS
18:32:08 TimCole: so, potential question about packaged versus unpackaged. Will do more refinement on the LOCKSS use case, highlighting what we see as issues, and what we see as non-issues.
18:32:45 ... Also planning on a use case based on Portico.
18:33:19 ... As part of their normalization process, they would develop a package from a manifest, so they could maintain both the content and appearance over time.
18:33:56 lrosenth: Unlike EPUB, PWP does not require that all things in a package or all items referenced in a manifest are from the same site or are self-contained. PWP allows for external references.
18:34:10 q?
18:34:13 ... Someone taking the material and creating a package may not have the rights to the externally referenced material.
18:34:52 TimCole: a very good point. How you deal with normalization is an important issue.
18:35:42 Bill: The fundamental strategy with Portico was to normalize so that they may support format migrations, keeping the format up to date and uniform. However, it can't migrate those external resources; it can only link to them, to the extent the links are stable (which they may not be)
18:36:05 TimCole: Also raises the question that comes up with other formats: when the fonts are not available for whatever reason, what are the fallbacks?
18:36:52 lrosenth: This is why PDF/A exists. We may want to say whether there is a PWP/A. You can do all these things with PWP, but if you want one that is archive-ready, you will need to do these additional things (as requirements)
18:37:07 TimCole: There may be a need for that, even if we don't have to define it fully in the docs we are working with.
18:37:20 q?
18:37:21 lrosenth: We can just come back to the main group and say "we think this is important to do" and leave it at that.
18:37:34 TimCole: For next call, will have a Portico use case.
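The point about external references can be made concrete with a small sketch: before building a package from a manifest, an archive could separate resources hosted with the publication from those hosted elsewhere, since the latter may raise rights and link-stability questions. The code below is an assumption-laden illustration only. PWP does not define this manifest shape; the `split_resources` function, the flat list of resource references, and the example URLs are all hypothetical, and "same site" is approximated here as same scheme and host.

```python
from typing import Iterable, List, Tuple
from urllib.parse import urljoin, urlsplit


def split_resources(
    publication_url: str, resources: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Split manifest references into (same-site, external) absolute URLs."""
    base = urlsplit(publication_url)
    origin = (base.scheme, base.netloc.lower())
    local: List[str] = []
    external: List[str] = []
    for reference in resources:
        # Resolve relative references against the publication URL.
        absolute = urljoin(publication_url, reference)
        parts = urlsplit(absolute)
        if (parts.scheme, parts.netloc.lower()) == origin:
            local.append(absolute)
        else:
            external.append(absolute)
    return local, external


# Hypothetical manifest entries for a publication at example.org.
local, external = split_resources(
    "https://example.org/book/index.html",
    ["chapter1.html", "fonts/body.woff", "https://cdn.example.net/video.mp4"],
)
```

In this sketch the two relative references end up in `local` and the video hosted on another host ends up in `external`; what an archive then does with the external list (copy, link, or flag for rights review) is exactly the open question raised in the discussion.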
18:37:42 Topic: Upcoming Agendas
18:38:07 action: Tim to come up with a use case for Portico that illustrates some of the features of Portico as a model
18:38:28 q+
18:39:16 q?
18:39:18 TimCole: if you don't have a GitHub account to get the use cases updated, send Tim the details
18:39:36 ack {HeatherF}
18:40:00 ack HeatherF
18:40:56 24th works for me
18:41:02 TimCole: propose moving the next call to the 24th of March; Tim will propose to the list
18:42:41 Bill: are we looking to interview and report back on the outreach contacts, or bring them in for calls?
18:43:57 TimCole: If the contacts can meet with us at one of our times, invite them to come in at the half hour (reserving the second half of the call for the discussion). If they can't, we need to do the interview.
18:44:09 works for me
18:44:09 ... they are welcome to come in earlier, of course
18:44:48 ... We have talked a few times about PDF/A; would lrosenth be able to walk us through some of the important lessons from that process?
18:45:08 lrosenth: yes, can do. Have a presentation that has been given in the past that may be useful and relevant. What PDF/A is and is not.
18:45:27 TimCole: We will put that on the agenda for the next call
18:46:18 ... What else needs to be done before our next call? Perhaps start thinking about what other write-ups we might need?
18:47:01 ... What started this group was the need for some text under the Archival and library services section in the PWP white paper. We can talk in that text about the need for a PWP/A approach.
18:47:21 ... We also have a glossary we may want to expand on. Are there any other products we want to expand on?
18:47:25 *crickets*
18:47:27 q?
18:48:25 \o/
18:48:35 ... Then with that, we can adjourn! Talk to you in (probably) three weeks
18:49:31 rrsagent, make log public
18:49:46 rrsagent, draft minutes
18:49:46 I have made the request to generate http://www.w3.org/2016/03/03-dpub-arch-minutes.html TimCole
21:13:02 Zakim has left #dpub-arch