See also: IRC log
<TimCole> Agenda: https://lists.w3.org/Archives/Public/public-digipub-ig/2016Mar/0009.html
<TimCole> scribenick: HeatherF
<TimCole> minutes from 18 Feb: https://www.w3.org/2016/02/18-dpub-arch-minutes.html
TimCole: Approval of the minutes?
... minutes approved, no discussion
<TimCole> From HeatherF: http://www.dpconline.org/advice/preservationhandbook
<lrosenth> http://www.archives.gov/digitization/strategy.html
TimCole: Will miss a few of our scheduled meetings; Ayla Stein will co-lead TF and chair calls when needed.
TimCole: Any updates on the outreach effort?
Bill_Kasdorf: haven't had a chance to reach out yet
TimCole: haven't had a chance to reach out yet
lrosenth: haven't had a chance to reach out yet
We are bad, bad people
<astein> Neither have I
TimCole: this will go on the agenda next time. And we mean it!
<TimCole> http://w3c.github.io/dpub-pwp-arch/Archival-UCR.html#LOCKSS
TimCole: This is the beginnings
of a use case posted this morning.
... This doc is meant to be the place to put use cases, talk
about requirements for those use cases that are relevant to the
PWP vision
<TimCole> more about LOCKSS: http://www.lockss.org/about/how-it-works/
TimCole: This use case talks about LOCKSS in
particular, see link about how LOCKSS works.
... LOCKSS goes and spiders/scrapes publisher websites that
they have permission to archive. Then, when someone has
permission to access that content, LOCKSS acts as a proxy
cache.
... New versions
will be posted and LOCKSS will update, as per usual proxy
behavior.
Bill: For LOCKSS, are they accessing the library, or are they accessing the publisher?
TimCole: They are accessing what the library is subscribing to.
<astein> I didn't see anyone on the queue
TimCole: People coming through
the library servers would see what was available in the proxy
cache, and LOCKSS would siphon off copies for the
archives.
... CLOCKSS works more directly with the publishers and makes
that copying more routine.
... Some interesting issues come up: when the publisher content
goes away, and we move into a pure archive/preservation point,
then LOCKSS serves up the cached copy as long as the Accept
headers match what LOCKSS has cached
... The browser goes to a new version of the content and says
"I only accept HTML5"; LOCKSS only has "HTML4"; at which point,
LOCKSS will try to migrate on the fly to HTML5 or it will
respond with a "406" error
Bill: Will it also say "but we can give you a different format?"
TimCole: not sure
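The negotiation behavior described above can be illustrated with a minimal sketch. This is not LOCKSS code: the function names, the `migrations` set, and the media type strings used in the usage note are hypothetical, and the Accept parsing is simplified (it handles q-values and `*/*`, but not partial wildcards such as `text/*`).

```python
def parse_accept(header):
    """Return the media types in an Accept header, highest q-value first."""
    entries = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default weight when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if q > 0:
            entries.append((q, media_type))
    # sort() is stable, so header order is kept among equal q-values.
    entries.sort(key=lambda entry: -entry[0])
    return [media_type for _, media_type in entries]


def negotiate(accept_header, cached_type, migrations):
    """Decide how an archive cache might answer a request.

    cached_type: media type of the copy held in the cache.
    migrations:  set of (from_type, to_type) pairs the cache can
                 convert on the fly (a hypothetical capability list).
    Returns (status code, media type served, action taken).
    """
    for wanted in parse_accept(accept_header):
        if wanted in ("*/*", cached_type):
            return 200, cached_type, "serve cached copy"
        if (cached_type, wanted) in migrations:
            return 200, wanted, "migrate on the fly"
    return 406, None, "not acceptable"
```

For example, with the made-up labels `application/x-example-html4` for the cached copy and `application/x-example-html5` for the requested format, `negotiate` returns a 406 result when `migrations` is empty and a "migrate on the fly" result when the pair is listed. Whether a real 406 response also lists the formats that are available, as Bill asks, is left open here.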
lrosenth: that seems a sensible model. Are we saying that's a model we like, or just saying that this is what it does, these are the facts?
<TimCole> PWP: http://w3c.github.io/dpub-pwp/#arch
TimCole: Right now, we are just
collecting the use cases. We should be considering whether the
facts in the use cases cause concern.
... Right now, the PWP talks about Service Workers, which
provide a local way to serve resources.
lrosenth: there are no
requirements that the PWP must use Service Workers. That said,
it has no bearing on this, because we're talking about a server
operation, where a server is doing the caching, not the
client
... unclear as to what the issue is here; if we publish in PWP
format, then LOCKSS will store that. If we request it as a PWP,
then it will be served as a PWP. If I request in another
format, the server may convert it on the fly.
... the reverse is also true: if I post content in PDF, and
instead of asking for that as PDF I ask it as a PWP, the server
may convert it on the fly.
TimCole: Probably correct. Just wondering whether there is something that the client will do with the PWP that may somehow result in the copy that the LOCKSS server cached not having everything that the client needs
Bill: When we look at the Portico
model, a very contrasting model, it will be easier to see,
given how PWP is being spec'd, how compatible it is with these
two important dark archiving schemes.
... Is there anything we're doing that creates a problem? Is
there anything we're doing that can optimize what they're
doing?
lrosenth: Right. And for LOCKSS,
the answer is "we don't need to care"
... the packaged versus the unpackaged differentiation becomes
interesting. If I publish the PWP unpackaged, and there is no
provision in the server to provide a packaged version, then
LOCKSS is going to have to do a huge amount of work to archive
the PWP
Bill: That is still better than
where they are now. If they are now going to a website, what
other resources are there that are essential to that content?
e.g., fonts, media, etc.
... PWP is going to make that easier, even in unpackaged form.
The PWP must unambiguously get you those resources.
TimCole: The problem is that LOCKSS right now works on the assumption that whatever it's grabbing from a web browser is all it needs. It doesn't know anything about the formats in any great detail. It does not take advantage of a manifest to build a package.
Bill: OK. Then perhaps the PWP would enable LOCKSS to do a better job than it can currently do to archive the publication in a more complete and correct way.
lrosenth: It will still require LOCKSS to do more work, which is fine from our perspective.
TimCole: that's a concern this use case needs to highlight.
<TimCole> http://www.dlib.org/dlib/january05/rosenthal/01rosenthal.html
<TimCole> http://blog.dshr.org/2013/02/rothenberg-still-wrong.html
TimCole: This is a debate that
covers both sides of the LOCKSS model.
... Why it doesn't work, why it does.
... This will be interesting to look at as we discuss the
CLOCKSS model as well.
Bill_Kasdorf: Will also want to talk to Craig Van Dyke, who now works at CLOCKSS
TimCole: so, potential question
about packaged versus unpackaged. Will do more refinement on
the LOCKSS use case, highlighting what we see as issues, and
what we see as non-issues.
... Also planning on a use case based on Portico.
... As part of their normalization process, they would develop
a package from a manifest, so they could maintain both the
content and appearance over time.
lrosenth: Unlike EPUB, PWP does
not require that all things in a package or all items
referenced in a manifest are from the same site or are
self-contained. PWP allows for external references.
... Someone taking the material and creating a package may not
have the rights to the externally referenced material.
TimCole: a very good point. How you deal with normalization is an important issue.
Bill: The fundamental strategy with Portico was to normalize so that they may support format migrations, keeping the format up to date and uniform. However, it can't migrate those external resources, it can only link to them to the extent the links are stable (which they may not be)
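As a rough illustration of the external-reference concern, the sketch below separates the resources an archive could fetch from the publication's own site from those hosted elsewhere. It assumes, hypothetically, that a manifest can be reduced to a plain list of URL references; no PWP manifest format is implied, and the function name and example URLs are invented.

```python
from urllib.parse import urljoin, urlsplit


def split_resources(base_url, resource_refs):
    """Partition manifest references into same-origin and external URLs.

    base_url:      URL the references are resolved against
                   (for example, the manifest's own location).
    resource_refs: iterable of relative or absolute references.
    Returns (local, external) lists of absolute URLs.
    """
    base = urlsplit(base_url)
    base_origin = (base.scheme, base.netloc)
    local, external = [], []
    for ref in resource_refs:
        absolute = urljoin(base_url, ref)  # resolve relative references
        parts = urlsplit(absolute)
        if (parts.scheme, parts.netloc) == base_origin:
            local.append(absolute)
        else:
            external.append(absolute)
    return local, external
```

An archive building a package might copy everything in `local` and record the `external` entries as links only, since it may lack the rights to copy them and the links may not stay stable.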
TimCole: Also raises the question that comes up with other formats: when the fonts are not available for whatever reason, what are the fallbacks?
lrosenth: This is why PDF/A exists. We may want to say whether there is a PWP/A. You can do all these things with PWP, but if you want one that is archive-ready, you will need to do these additional things (as requirements)
TimCole: There may be a need for that, even if we don't have to define it fully in the docs we are working with.
lrosenth: We can just come back to the main group and say "we think this is important to do" and leave it at that.
TimCole: For next call, will have a Portico use case.
<scribe> ACTION: Tim to come up with a use case for Portico that illustrates some of the features of Portico as a model [recorded in http://www.w3.org/2016/03/03-dpub-arch-minutes.html#action01]
TimCole: if you don't have a GitHub account to get the use cases updated, send Tim the details
<astein> ack {HeatherF}
HeatherF: Will not be available for scheduled call on 17th.
Bill: Also will not be available for scheduled call on 17th.
<astein> 24th works for me
TimCole: propose moving the next call to the 24th of March; Tim will propose to the list
Bill: are we looking to interview and report back on the outreach contacts, or bring them in for calls?
TimCole: If the contacts can meet with us at one of our times, invite them to come in at the half hour (reserving the second half of the call for the discussion). If they can't, we will need to do the interview.
<astein> works for me
TimCole: they are welcome to come
in earlier, of course
... We have talked a few times about PDF/A; would lrosenth be
able to walk us through some of the important lessons from that
process?
lrosenth: yes, can do. Have a presentation that has been given in the past that may be useful and relevant. What PDF/A is and is not.
TimCole: We will put that on the
agenda for the next call
... What else needs to be done before our next call? Perhaps
start thinking about what other write ups we might need?
... What started this group was the need for some text under
the Archival and library services in the PWP white paper. We
can talk in that text about the need for a PWP/A
approach.
... We also have a glossary we may want to expand on. Are there
any other products we want to expand on?
*crickets*
<astein> \o/
scribe: Then with that, we can adjourn! Talk to you in (probably) three weeks