W3C


DPUB Archival TF

04 Feb 2016

Agenda

See also: IRC log

Attendees

Present
Heather Flanagan, Tim Cole, Tzviya Siegman, Deborah Kaplan, Bill Kasdorf, Markus Gylling
Regrets
Leonard Rosenthol
Chair
Tim Cole
Scribe
HeatherF

<scribe> scribenick: HeatherF

<TimCole> Wiki page: https://www.w3.org/dpub/IG/wiki/Task_Forces/archival

<TimCole> Leonard's email: https://lists.w3.org/Archives/Public/public-digipub-ig/2016Feb/0021.html

TimCole: Reviewing task force goals (see wiki page for initial draft)
...: any changes, either in structure or in content?
... Should we keep the potential for expanded scope of material going beyond the PWP?

mgylling: the problem statement/goal is spot on; any output from this TF should feed the use cases more than the PWP directly.
...: so, produce use cases and functional requirements

dkaplan: agree. Would add that in the long run, the product of this TF would be an archival profile for the PWP.
...: right now, however, we're creating functional requirements and use cases.

tzviya: +1. If other cases come up that fall outside of this remit, we can always record them on the wiki to save for later.

Bill_Kasdorf: as a point of clarification, it seems clear from the existing goals statement that we are focusing on formal archives.
...: we are not talking about a publisher who wants to archive a version of a publication for future use. Is that correct? Should we explicitly state this?

dkaplan: rather than saying that's out of scope, we should consider it a subset. This is about preservation, not just archiving.
...: we are talking about the formal archivist definition of preservation. We are talking about long-term, persistent ability to access content.

<mgylling> +1

TimCole: Suggests that we need to have some mods to the goals, including that we are going to create use cases, and that we need to confine scope to formal archiving.
...: hopefully we won't have to define "formal archiving" from scratch; want to use someone else's.

<dkaplan3> http://www2.archivists.org/glossary/terms/p/preservation

<scribe> ACTION: dkaplan3 to pull together the formal definition of archiving/preservation [recorded in http://www.w3.org/2016/02/04-dpub-arch-minutes.html#action01]

<scribe> ACTION: TimCole to add the creation of use cases to the goals on the wiki [recorded in http://www.w3.org/2016/02/04-dpub-arch-minutes.html#action02]

TimCole: Next topic - experts we should consult. We have some pointers to documents on the wiki; do we need specific people brought in as well?

tzviya: Our goal is to define use cases. To get a broader set of use cases, it would be useful to either interview or invite others from that community.
...: Deborah has formal archival training. (So does Heather)

:-)

TimCole: have been in touch with people at Portico for ideas about their workflows; they ingest data and normalize it on a regular basis. Want to know how the format of what they get impacts their workflow.
...: To avoid duplication, suggests that we keep track on the wiki re: who we are reaching out to.

Bill_Kasdorf: Portico and LOCKSS/CLOCKSS are interesting contrasting organizations in this space.
...: Portico normalizes the content, whereas LOCKSS/CLOCKSS harvests documents and so has a lot of web documents.

dkaplan3: Outreach - yes, we should do that, and not just to organizations, and we should keep a list. Another problem to be aware of: this TF is currently mostly US participants.
...: if we can get someone not anglophone, at least as consulting expert, that would be helpful.

TimCole: What about resources or documents? Anything to add there? Please add if you think of anything.

<Bill_Kasdorf> Also British Library, KB (Nat. Lib. of Netherlands), BNF (Bibliothèque Nationale de France)

<Bill_Kasdorf> Important issue with BL, KB, etc. is that they are mandated to archive content, "legal depository"

TimCole: Regarding logistics, this TF has enough work to keep us busy for a while. Should we get on a regular call schedule? What would be a good timeline for this work?

+1

<mgylling> +1

<tzviya> +1

We will aim for twice a month, though perhaps not at this time.

TimCole: Will search in the range of 10am-12pm Eastern, M-Th. This will narrow down the doodle poll.
...: Emails should go to the main dpub list, but email authors should remember to put in [dpub-arch] in the subject for easier sorting.

<dkaplan3> +1

TimCole: What's our timeline? How long should this TF expect to run?

dkaplan3: As long as we keep the scope narrow, we start with what is not already defined (don't reinvent definitions where we don't have to)

<Zakim> tzviya, you wanted to discuss goals and timeline

tzviya: Let's not let this be something that just happens at the meetings; do work between meetings. We can target writing use cases and seeing how much we can do in three months.

mgylling: +1 to tzviya. In terms of timeline, we haven't set a final delivery date to the larger use case effort, but we will soon. Having a note by TPAC this year (end of September) would be a reasonable target.

and music ensues

mgylling: NISO also has work going on in this space; make sure we don't duplicate effort.

no more classical music. sadness.

<scribe> ACTION: TimCole to reach out to Todd Carpenter at NISO re their work in this space [recorded in http://www.w3.org/2016/02/04-dpub-arch-minutes.html#action03]

TimCole: so, three to four month slot. Target end of May.
...: Is there a deadline on the PWP?

tzviya: The IG chairs need to talk about that.

mgylling: if this group comes up with a new paragraph, that will be enough to refresh the PWP regardless of its state.
...: it is a lightweight process to update that when needed.

TimCole: does anyone have comments on Leonard's presentation re: PDF/A? Might schedule time on a future call for Leonard to talk about this directly.
...: PDF/A is a recognized standard, but probably not sensible to turn everything into PDF/A

tzviya: An interesting presentation, but don't put the cart before the horse. We are not recommending one particular solution here.

dkaplan: There are very good reasons that PDF/A is not the appropriate recommendation. We are (probably) not headed towards ISO standardization. In generating the PDF/A standard, many contacts were made and use cases developed.
...: to the extent that the archival community participated, we should find that input and use it

TimCole: Do people want to start commenting on use cases? What libraries have done historically is collect content from publishers at time of publication, so there are use cases of library services telling publishers what they need
...: but often libraries are coming to content well after publication. That's another category of use cases.

dkaplan: Archivists can ingest just about anything. Anything you come to after-the-fact, anything that hasn't been made as an archival document to begin with, is just like anything else (games, etc) that they might have to archive.

TimCole: so should we make clear some of the potential trade-offs about what happens if you don't consider archival requirements up front?
...: Print materials were simpler. Digital material introduces problems of versioning.

Bill_Kasdorf: Would like to see a basic definition that a PWP is natively amenable to archiving, similar to how EPUB is natively amenable to accessibility

dkaplan: Agree with limitations. Accessibility should be the default, and the same thing with preservation. This is, however, a huge limitation.
...: A document can't be preservable unless it is entirely offline with all its essential elements.

Bill_Kasdorf: That is a fundamental principle of PWP.

TimCole: Even when you can take everything offline, if you try to open it in 5 years, it likely won't look the same as it did at time of publication.

Bill_Kasdorf: What is it that's being preserved? Is it the appearance or the essential content?

dkaplan: That is a question that even in the preservation community must be decided on a document-by-document basis.

Summary of Action Items

[NEW] ACTION: dkaplan3 to pull together the formal definition of archiving/preservation [recorded in http://www.w3.org/2016/02/04-dpub-arch-minutes.html#action01]
[NEW] ACTION: TimCole to add the creation of use cases to the goals on the wiki [recorded in http://www.w3.org/2016/02/04-dpub-arch-minutes.html#action02]
[NEW] ACTION: TimCole to reach out to Todd Carpenter at NISO re their work in this space [recorded in http://www.w3.org/2016/02/04-dpub-arch-minutes.html#action03]
 

Summary of Resolutions

[End of minutes]

Minutes formatted by David Booth's scribe.perl version 1.144 (CVS log)
$Date: 2016/02/08 10:42:57 $