See also: IRC log
<ayla_stein> Agenda: https://lists.w3.org/Archives/Public/public-digipub-ig/2016Apr/0020.html
ayla_stein: objections?
... hearing no objections, approved.
ayla_stein: a use case for LOCKSS
ntay: format migration may not be in scope for the use case
ayla_stein: could you expand on why it is not in scope?
TimC: It seems to me that PWP may make migration on access
... as I understand it, LOCKSS format migration is driven by use
... and when people use a PWP that is unpacked on a server, do they necessarily get an archivable package?
ntay: format on migration may not be current
... LOCKSS access system has not needed this yet.
... risk model of LOCKSS is primarily concerned with the bits.
... don't see PDF, GIF, ASCII as being of concern for client-side rendering obsolescence.
... if GIF could no longer be rendered by a mainstream Web browser
... then LOCKSS might put a mechanism in place so that when a client requests an image but doesn't accept GIF, LOCKSS could migrate to PNG
... But LOCKSS doesn't see this as a use case.
... my understanding is that PWP is not a file in the same sense
... it's more a framework and manifest
... Could still put together a CLOCKSS / LOCKSS
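The migrate-on-access idea ntay describes (serve PNG when a client no longer accepts GIF) can be sketched as a check of the HTTP Accept header. This is a minimal illustration only, not how LOCKSS works: the function names `quality` and `choose_format` are hypothetical, the parsing covers only media ranges and `q` values, and the actual image conversion is left out.

```python
def quality(accept_header: str, media_type: str) -> float:
    """Return the q value the most specific matching media range gives media_type.

    Simplified reading of an Accept header: only exact types, "type/*" and
    "*/*" ranges are recognized. Returns 0.0 when nothing matches.
    """
    main_type = media_type.split("/")[0]
    best_specificity, best_q = -1, 0.0
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_range = parts[0].lower()
        if media_range == media_type:
            specificity = 2
        elif media_range == main_type + "/*":
            specificity = 1
        elif media_range == "*/*":
            specificity = 0
        else:
            continue
        q = 1.0
        for param in parts[1:]:
            if param.lower().startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        if specificity > best_specificity:
            best_specificity, best_q = specificity, q
    return best_q


def choose_format(accept_header, stored="image/gif", migrated="image/png"):
    """Pick the stored format if the client accepts it, else the migrated one.

    Returns None when the client accepts neither. An empty header is treated
    as "anything is acceptable".
    """
    if not accept_header.strip():
        return stored
    if quality(accept_header, stored) > 0:
        return stored
    if quality(accept_header, migrated) > 0:
        return migrated
    return None


print(choose_format("image/png, image/jpeg"))
print(choose_format("image/gif, image/png"))
```

The preserved bits stay untouched in this sketch; only the representation sent to the client changes, which matches the point that the LOCKSS risk model is primarily about the bits.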
TimC: What is the difference between CLOCKSS and LOCKSS?
ntay: same underlying technology
TimC: ntay, can you walk us through the acquisition process?
ntay: 2 mechanisms by which we retrieve content
... 1. Web harvest: collect content as it would be presented to the user
... plug-ins provide extra intelligence to the crawler so it can parse various units that contain multiple publications (e.g., an issue)
... helps it figure out what units it needs to make archival package(s)
... the decision about what to package per publisher is largely being figured out by LOCKSS
TimC: so the manifest of PWP might simplify this.
ntay: yes
TimC: assumes that what is needed for archiving is the same as what is needed for portability
ntay: 2nd method is more of a back-end process
... publishers are making content available on the back end for archiving services
... more typically the source files, e.g., includes PDF but also XML, etc., but may not have all of the presentation (CSS) files
... things are neatly organized in a tree.
TimC: How would you phrase some of these as user stories...
tzviya: the back-end approach is not necessarily relevant for PWP
ntay: in the CLOCKSS model, all content archived is dark until there is a trigger event (publisher goes out of business, natural disaster knocks out servers, etc.)
... if we harvested, the user will see what they are used to seeing.
... back-end acquisition makes it a matter of the quality of the access experience
... to the extent that we can make use cases generic, not necessarily tied to an archiving service
... there may be specific examples from David's blog post that talk about problems in the absence of a manifest
... so this may help us
TimC: use case: an archival service wants to harvest (spider) a PWP, and expects to find in the manifest what it needs to make sure it gets all the right pieces.
ntay: yes
<ayla_stein> yes!
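The harvest use case TimC states can be sketched as a completeness check: compare what the crawler actually fetched against what the manifest lists. The manifest shape used here (a dict with a "resources" list of entries carrying an "href") is purely an assumption for illustration, since no PWP manifest format has been defined in this discussion; `missing_resources` is a hypothetical helper name.

```python
from typing import Iterable, List


def missing_resources(manifest: dict, harvested: Iterable[str]) -> List[str]:
    """Return manifest-listed resources that the harvest did not retrieve.

    Assumes a hypothetical manifest layout:
        {"resources": [{"href": "chapter1.html"}, ...]}
    """
    listed = {entry["href"] for entry in manifest.get("resources", [])}
    return sorted(listed - set(harvested))


# Example data, invented for this sketch.
manifest = {
    "resources": [
        {"href": "index.html"},
        {"href": "chapter1.html"},
        {"href": "style.css"},
    ]
}
harvested = ["index.html", "chapter1.html"]

# An empty result would mean the harvest is complete according to the manifest.
print(missing_resources(manifest, harvested))
```

With a manifest of this kind, the per-publisher crawler plug-in would no longer have to infer which pieces belong to the package.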
ntay: another use case is versioning: if one part of a PWP gets updated, how is that update handled by the archiving service?
tzviya: also a revisioning use case
... e.g., an update for a misspelled word in chapter 3
... vs. a new version of chapter 2.
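One way an archiving service might detect which parts of a PWP changed between two harvests is to compare per-resource checksums. This is a sketch under assumptions: the mapping of resource path to digest and the function names `digest` and `diff_versions` are hypothetical, and nothing here is part of PWP or LOCKSS.

```python
import hashlib


def digest(data: bytes) -> str:
    """SHA-256 hex digest of a resource's bytes."""
    return hashlib.sha256(data).hexdigest()


def diff_versions(old: dict, new: dict) -> dict:
    """Compare two {resource path: digest} mappings from successive harvests."""
    old_paths, new_paths = set(old), set(new)
    return {
        "added": sorted(new_paths - old_paths),
        "removed": sorted(old_paths - new_paths),
        "changed": sorted(p for p in old_paths & new_paths if old[p] != new[p]),
    }


# Example data, invented for this sketch.
first = {
    "chapter2.html": digest(b"chapter two, first edition"),
    "chapter3.html": digest(b"a mispelled word"),
}
second = {
    "chapter2.html": digest(b"chapter two, first edition"),
    "chapter3.html": digest(b"a misspelled word"),
}

print(diff_versions(first, second))
```

A checksum comparison only shows that chapter 3 changed. It cannot tell a typo correction from a new version, so that distinction would have to be declared by the publisher, for example in the manifest.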
ayla_stein: keeping track of errata and retractions
tzviya: we can start these as issues in GitHub
... re errata, these might be done as annotations
... we do have to consider what to do with errata, revisions, versions, etc.
ayla_stein: removal retractions (publisher just removes the item)
... what does the archive service do?
ntay: would be surprised if retraction resulted in deletion from an archive
tzviya: we give retractions their own DOI, separate from the original article's DOI
... shows how people use DOIs for different purposes
ayla_stein: Medusa digital repository at Illinois does include digital monographs
... so what does that archive need to facilitate archiving?
... archivist needs more of a sense of what makes a document valid -- i.e., a health check
... sounds like he needs some sort of archivist validation
TimC: what does validation mean? how does validity change over time?
tzviya: there is an epubcheck system for validating
ayla_stein: it does sound like he wants some way to read the publication and know how to validate it
... might not have to be an external tool
tzviya: ePub has a validator, but has not come up yet for PWP
ntay: so what does epubcheck do?
tzviya: checks HTML, structure, etc.
<tzviya> epubcheck https://github.com/IDPF/epubcheck
ntay: been focused on how PWP will help verify completeness
... not clear whether you could easily check appearance and/or browser compatibility
ayla_stein: not clear that responsive design is of concern yet to the Library / Archive space
ntay: Responsive Design is a best practice for Web Archiving
<ntay> https://library.stanford.edu/projects/web-archiving/archivability
TimC: the basic use case: an archiving service wants to validate a PWP as being adequate for archiving.
... Ayla will write something up.
ayla_stein: archivist will be worried about the range of content that can be included in a PWP, since these technologies change over time
... my understanding that PWP
TimC: if PWP is wide open about what it includes
... does that mean that some PWPs may not be archivable?
tzviya: is this the same issue as comes up when we talk about Archiving the Web?
ayla_stein: Leonard's discussion about PDF/A experience may help also
ntay: my understanding of PDF/A: the ability to embed arbitrary content means you end up with binary blobs
... as archivists we deal with not having control all the time
... so while some formats are easier to archive than others, it isn't that there's a non-archivable format
TimC: use case for making assessment of risk (from archiving perspective)