Beyond Eighteen Wheels

Considerations in Archiving Documents Represented Using the Extensible Markup Language

Liam Quin

Images from www.fromoldbooks.org used by permission. Photographs by Liam Quin.

Outline

what is a document?

a sequence of characters represented digitally on a computer such that the sequence of characters satisfies the productions and constraints of the XML specification
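As a rough illustration of this definition, the sketch below asks a parser whether a sequence of characters is well-formed XML. It uses Python's standard xml.etree.ElementTree module; the function name is_well_formed is invented for this example. The underlying parser is non-validating, so this checks well-formedness only, not validity against a DTD or schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the character sequence parses as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # The parser rejected the sequence: it is not an XML document.
        return False


print(is_well_formed("<note><to>reader</to></note>"))  # True
print(is_well_formed("<note><to>reader</note>"))       # False (mismatched tag)
```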

A Long Time

a long time

The conference programme suggests hundreds of millennia. No matter: the future began in the past.

A document will be said to have been stored for a given period of time if, at the end of that time, the same sequence of characters can be retrieved.
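One common way to test "the same sequence of characters can be retrieved" is to record a digest when the document is deposited and compare it on retrieval. The sketch below is one possible approach, not a prescribed one: the choice of SHA-256, the UTF-8 serialization, and the function names fingerprint and retrieved_intact are all assumptions made for illustration.

```python
import hashlib


def fingerprint(document: str) -> str:
    """Digest of the document's characters, serialized as UTF-8 (an assumption)."""
    return hashlib.sha256(document.encode("utf-8")).hexdigest()


def retrieved_intact(recorded_digest: str, retrieved_document: str) -> bool:
    """Compare the digest recorded at deposit time with the retrieved copy."""
    return fingerprint(retrieved_document) == recorded_digest


original = "<poem>Beyond eighteen wheels</poem>"
recorded = fingerprint(original)  # stored alongside the document at deposit

print(retrieved_intact(recorded, original))                            # True
print(retrieved_intact(recorded, original.replace("eighteen", "18")))  # False
```

The digest is only useful if the algorithm and the serialization are themselves documented, which is the same problem one level up.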

Archives

The place where a document is stored is an archive.

Physical location becomes insignificant in the digital cloud...

Document Context

Once the context of creation is lost, understanding of the artifact is necessarily incomplete.

How an ancient object was used is often a mystery.

So, you need to record the purpose and context.

Tacit and Explicit Knowledge

A funeral oration might be perceived quite differently from a shopping list; a parody differently from a news article. The expected use and implicit shared understanding between document creator and audience in these examples can be lost over a Very Long Time; this tacit knowledge must therefore be documented and made explicit if the archived document is to be interpreted as it was intended.

Relationship to other documents

implicit - e.g. a dictionary or glossary

explicit - if you link terms to the glossary

normative - the specifications that define the way that the document, at some level, is to be interpreted.

Selection as a Political Decision

providing easier access to some documents implies harder access to others: the choice of which documents to archive is (or can be) a political decision every bit as much as decisions about which books to keep on the shelves in a public library.

(Landow)

Consider archiving secondary documents such as research notes.

This increases the burden on Finding Aids.

Where to Store

Licences

If you want your documents to spread like weeds you have to make them freely distributable.

How to store: Physical Substrate

Leave clear instructions outside the box...

An unmarked cassette tape containing a novel...

Logical Layers

A computer disk stores a sequence of bit patterns, arranged in concentric rings; every file is stored as if it were a sequence of integers.

encoding: integer → character
font: character → glyph

There's no general way to determine the version of an “encoded character set” in use for a document. So document it.
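A small sketch of why the encoding has to be recorded: the same stored integers decode to different characters under different encodings, and nothing in the integers themselves says which reading was intended. The byte values and the helper name decode_under are made up for this example; the three encoding names are standard Python codecs.

```python
def decode_under(stored: bytes, encodings):
    """Decode the same stored integers under each named encoding."""
    return {name: stored.decode(name) for name in encodings}


# Four integers as they might sit on disk.
stored = bytes([0x63, 0x61, 0x66, 0xE9])

readings = decode_under(stored, ["latin-1", "cp437", "mac_roman"])
for name, text in readings.items():
    print(name, text)  # three different strings from identical bytes
```

Only the final integer is read differently here, which is exactly the kind of damage that can go unnoticed unless the encoding is documented.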

Declarative and Procedural

E.g. FORTRAN program to plot a circle, vs. an SVG circle

Deducing that a particular Calcomp plotter held the red pen in position five might or might not be trivial.
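The contrast can be sketched in a few lines. The procedural half below emits commands for an imaginary plotter; the PEN and MOVE command names are invented for this illustration and are not real Calcomp instructions. The declarative half is an SVG circle, from which the intent (a red circle of radius 40) can be read back directly.

```python
import math
import xml.etree.ElementTree as ET


def plot_circle(cx, cy, r, pen, steps=8):
    """Procedural: hypothetical plotter commands approximating a circle.

    What colour 'pen 5' draws is not recorded anywhere in the output.
    """
    commands = [f"PEN {pen}"]
    for i in range(steps + 1):
        angle = 2 * math.pi * i / steps
        x = cx + r * math.cos(angle)
        y = cy + r * math.sin(angle)
        commands.append(f"MOVE {x:.1f} {y:.1f}")
    return commands


# Declarative: the document states what is wanted, not how to draw it.
SVG = (
    '<svg xmlns="http://www.w3.org/2000/svg">'
    '<circle cx="50" cy="50" r="40" stroke="red" fill="none"/>'
    "</svg>"
)

circle = ET.fromstring(SVG).find("{http://www.w3.org/2000/svg}circle")

print(plot_circle(50, 50, 40, pen=5)[:3])
print(circle.get("stroke"), circle.get("r"))
```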

Remember Hypercard?

Meanings Change

indifferent and vindictive justice

those who expose their bodies

Don’t put too much burden on words; say the same thing in multiple ways (e.g. in documentation).

Avoid implicit content

Avoid obscure features

Document the significance of markup items

Validate

Check Links
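A minimal sketch of an internal link check, assuming a made-up vocabulary in which links are href attributes of the form #name and targets carry an id attribute. Real documents may use XLink, IDREF attributes, or external URIs, none of which are handled here; the function name broken_internal_links is also invented.

```python
import xml.etree.ElementTree as ET


def broken_internal_links(xml_text: str):
    """Return every '#name' href whose target id is missing from the document."""
    root = ET.fromstring(xml_text)
    ids = {el.get("id") for el in root.iter() if el.get("id") is not None}
    broken = []
    for el in root.iter():
        href = el.get("href")
        if href and href.startswith("#") and href[1:] not in ids:
            broken.append(href)
    return broken


doc = """<doc>
  <p>The <term href="#archive">archive</term> holds the
     <term href="#substrate">substrate</term>.</p>
  <glossary>
    <entry id="archive">The place where a document is stored.</entry>
  </glossary>
</doc>"""

print(broken_internal_links(doc))  # ['#substrate']
```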

Provide for Translations

Give Context

A dream journal might have an introduction that says the author wrote down memories of dreams each day for a year, but the wider context might include that this was part of a therapeutic exercise in working out resentment towards alien visitors, and that, after a year, the writer's perception of the visitors was changed.

Be boring

Creativity is for Content, not Specifications (use existing specs, and archive copies of them)

Structure

Don't assume people will understand 16.08.10 as a date.
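To make the ambiguity concrete, the sketch below tries three plausible field orders on the same string using Python's standard datetime.strptime. The function name possible_dates and the choice of candidate orders are assumptions for illustration; the century is also a guess, since strptime maps two-digit years onto a fixed window.

```python
from datetime import datetime


def possible_dates(text: str):
    """Return every candidate field order under which the text is a valid date."""
    formats = {
        "day.month.year": "%d.%m.%y",
        "month.day.year": "%m.%d.%y",
        "year.month.day": "%y.%m.%d",
    }
    readings = {}
    for label, fmt in formats.items():
        try:
            readings[label] = datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            # Not a valid date under this field order (e.g. month 16).
            pass
    return readings


print(possible_dates("16.08.10"))
# {'day.month.year': '2010-08-16', 'year.month.day': '2016-08-10'}
```

Two readings survive, six years apart. Recording the value in an unambiguous form such as 2010-08-16, or marking up which field is which, removes the guesswork.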

Names, addresses, telephone numbers, anywhere we have a structured notation, should be marked up and/or described.

Conclusions

standards?

Should there be a standard for archiving electronic digital documents?

Do large archiving organizations need help in gathering together relevant specifications?

Questions?

liam@w3.org, Liam R. E. Quin