Task Forces/archival

From Digital Publishing Interest Group

Task force on PWP Archival

  • Leader(s):
    • Tim Cole, University of Illinois
    • Ayla Stein, University of Illinois

Members (Please add your name, organization)

  • Deborah Kaplan, Safari Books
  • Leonard Rosenthol, Adobe
  • Tzviya Siegman, Wiley
  • Heather Flanagan, RFC Editor
  • Ayla Stein, University of Illinois
  • Tim Cole, University of Illinois
  • Bill Kasdorf, Apex
  • Markus Gylling, IDPF & DAISY
  • Nicholas Taylor, Stanford University

Meeting Agendas and Minutes

The Task Force will meet at 1 PM Eastern (US) on the first and third Thursdays of February, March, April, and May. Agendas (and all other TF discussions) should be posted to the IG email list, with the subject line prefixed '[dpub-arch]'.

See separate WebEx page for information about joining TF calls.

Goals

The Portable Web Publications for the Open Web Platform (PWP) document includes a limited description of how the PWP needs to support archiving. This task force will review that text, reach out to the library and archive community for more detailed requirements and use cases, and offer feedback to the Digital Publishing Interest Group.

Information of interest includes whether archival institutions have run into technical issues (e.g., missing information) when archiving large numbers of EPUB documents. What information, either in metadata or in the document format, should be provided to make archiving easier? What information is missing, and what must a PWP provide to make the job of archivists easier? What prototypical archiving service use cases drive these requirements?

Some of the answers will point to areas entirely outside the scope of the PWP; capturing that information will be useful as well.

Scope

The TF will focus on formal archiving service and content preservation workflow use cases and requirements that potentially impact and/or inform practices for publishing on the Open Web Platform. This encompasses archiving done at the time of original publication as well as archiving done later in a publication's life cycle (e.g., by harvesting content available on the Web), including:

  • Archiving of their own content done by publishers;
  • Archiving done by trusted third parties working in collaboration with publishers, e.g., Portico, LOCKSS / CLOCKSS;
  • Archiving done by national libraries and other entities serving as depositories of record;
  • Archiving done by general Web archives, e.g., the Internet Archive.

Timeline

The Task Force will strive to complete its work by the end of May 2016.

Outcomes

  1. Revised text and additional references (links) for the PWP white paper.
  2. Archiving use cases illustrating dependencies and requirements relevant to the work of the IG.

Points to Consider

The Task Force will take advantage of existing standards, documented best practices, relevant use cases, community terminology, etc.

Resources

Outreach

TF Members, to avoid duplication of effort, please add here organizations / initiatives you are contacting to learn more about archiving services and requirements.

Glossary

  • Archival Quality: "adj. ~ 1. Records media · Resistant to deterioration or loss of quality, allowing for a long life expectancy when kept in controlled conditions. - 2. Records storage conditions · Not causing harm or reduced life expectancy. Notes: ANSI/AIIM deprecates the use of 'archival' because it is a highly subjective term. Rather, they suggest using measures of 'life expectancy', which are based on empirical tests. While no materials meet the ideal definition of 'archival', many archivists use the term informally to refer to media that can preserve information, when properly stored, for more than a century." [1]
  • Archivist: "n. ~ 1. An individual responsible for appraising, acquiring, arranging, describing, preserving, and providing access to records of enduring value, according to the principles of provenance, original order, and collective control to protect the materials' authenticity and context. - 2. An individual with responsibility for management and oversight of an archival repository or of records of enduring value." [2]
  • Digital preservation: "Digital preservation is the active management of digital content over time to ensure ongoing access." [3]
  • Immutability: "The quality of being unchanging. Notes: The content of a document is fixed in that it is stable and resists change, but it may not be immutable. Words may be erased or added. 'Immutability' connotes a significantly greater resistance to change, such that any change is clearly evident. In information technology, immutability is accomplished by creating a process to demonstrate that the record has not been altered." [4]
  • Life expectancy:
    • "n. ~ The length of time that an item is expected to remain intact and useful" [5]
    • "Life Expectancy (LE) is a term that describes the stability of imaging materials. The standard has always been 'archival.' But when computer folks say archival, they are talking about something that is usable in 2 months. When librarians say archival, they mean forever. Life expectancy is a new term that accommodates both ends of this continuum. The definition of Life Expectancy is the length of time that information is predicted to be [stable]" [6]
  • Persistent object preservation: "A technique to ensure electronic records remain accessible by making them self-describing in a way that is independent of specific hardware and software." [7]
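
The "Immutability" entry above mentions "creating a process to demonstrate that the record has not been altered." One common way to do this is a fixity check: record a cryptographic digest when content is ingested and recompute it later. The following is only a minimal sketch of that idea, not a practice the Task Force has endorsed. The choice of SHA-256, the function names, and the sample record are all assumptions made for illustration.

```python
import hashlib


def sha256_digest(data: bytes) -> str:
    """Return the SHA-256 digest of a record's bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_unaltered(data: bytes, recorded_digest: str) -> bool:
    """Recompute the digest and compare it with the one recorded at ingest."""
    return sha256_digest(data) == recorded_digest


# Hypothetical record; a real archive would read the bytes from storage.
record = b"Example archived publication content"
digest_at_ingest = sha256_digest(record)

print(is_unaltered(record, digest_at_ingest))         # unchanged copy
print(is_unaltered(record + b"!", digest_at_ingest))  # altered copy
```

A matching digest shows only that the bytes are unchanged since the digest was recorded. It says nothing about whether the content can still be rendered, which is the concern of the "Persistent object preservation" entry.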

Background