Considerations in Archiving Documents Represented Using the Extensible Markup Language
Images from www.fromoldbooks.org used by permission. Photographs by Liam Quin.
XML document: a sequence of characters represented digitally on a computer such that the sequence of characters satisfies the productions and constraints of the XML specification
The conference programme suggests hundreds of millennia. No matter: the future began in the past.
A document will be said to have been stored for a given period of time if, at the end of that time, the same sequence of characters can be retrieved.
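One way to test this definition in practice is a fixity check: record a digest when the document is deposited and compare it on retrieval. The sketch below is illustrative only. The function names and sample documents are hypothetical, and hashing the UTF-8 encoding of the characters is an assumption, since the definition speaks of characters rather than bytes.

```python
import hashlib

def fingerprint(characters: str) -> str:
    """Return a SHA-256 digest of the character sequence, encoded as UTF-8 (an assumption)."""
    return hashlib.sha256(characters.encode("utf-8")).hexdigest()

def was_stored(deposited: str, retrieved: str) -> bool:
    """True if the retrieved sequence of characters is the one that was deposited."""
    return fingerprint(deposited) == fingerprint(retrieved)

# Hypothetical sample documents: the second has one corrupted character.
original = "<poem><line>Ozymandias</line></poem>"
damaged = "<poem><line>Ozymandias</line></poen>"

print(was_stored(original, original))  # True
print(was_stored(original, damaged))   # False
```

A digest only detects change; it does not say what the characters mean, which is the concern of the rest of this talk.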
The place where a document is stored is an archive.
Physical location becomes insignificant in the digital cloud...
Once the context of creation is lost, understanding of the artifact is necessarily incomplete.
How an ancient object was used is often a mystery.
So, you need to record the purpose and context.
A funeral oration might be perceived quite differently from a shopping list; a parody differently from a news article. The expected use and implicit shared understanding between document creator and audience in these examples can be lost by the Very Long Time; this tacit knowledge must therefore be documented and made explicit if the archived document is to be interpreted as it was intended.
implicit - e.g. a dictionary or glossary
explicit - if you link terms to the glossary
normative - the specifications that define the way that the document, at some level, is to be interpreted.
providing easier access to some documents implies harder access to others: the choice of which documents to archive is (or can be) a political decision every bit as much as decisions about which books to keep on the shelves in a public library.
(Landow)
Consider archiving secondary documents such as research notes.
This increases the burden on Finding Aids.
If you want your documents to spread like weeds you have to make them freely distributable.
Leave clear instructions outside the box...
An unmarked cassette tape containing a novel...
A computer disk stores a sequence of bit patterns, arranged in concentric rings; every file is stored as if it were a sequence of integers.
encoding: integer → character
    font: character → glyph
There's no general way to determine the version of an “encoded character set” in use for a document. So document it.
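A minimal sketch of why the encoding has to be recorded: the same stored integer yields different characters under different encodings. The byte value and the two encodings are chosen only for illustration. The last lines show an XML encoding declaration, which records the encoding inside the document itself; it does not record which version of the character set was in use, so that still needs separate documentation.

```python
import xml.etree.ElementTree as ET

# One stored integer, read under two different encodings.
stored = bytes([0xE9])
for encoding in ("latin-1", "cp437"):
    character = stored.decode(encoding)
    print(f"{encoding}: 0x{stored[0]:02X} -> U+{ord(character):04X} {character}")
# latin-1 reads the byte as U+00E9 (é); cp437 reads it as U+0398 (Θ).

# With the encoding declared in the document, the parser can decode it unaided.
declared = b'<?xml version="1.0" encoding="ISO-8859-1"?><word>caf\xe9</word>'
print(ET.fromstring(declared).text)  # café
```

The second mapping, from character to glyph, depends on the font and is outside what this sketch can show.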
E.g. FORTRAN program to plot a circle, vs. an SVG circle
Deducing that a particular Calcomp plotter held the red pen in position five might or might not be trivial.
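The contrast can be sketched as follows. In a declarative format the circle and its colour are stated in the document, so a future reader can recover them by reading the markup, without running a program or knowing the plotter. The SVG fragment is a made-up sample; the parsing uses Python's standard library.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Hypothetical sample: the circle's geometry and colour are stated, not computed.
svg_source = f"""<svg xmlns="{SVG_NS}" width="100" height="100">
  <circle cx="50" cy="50" r="40" stroke="red" fill="none"/>
</svg>"""

root = ET.fromstring(svg_source)
for circle in root.iter(f"{{{SVG_NS}}}circle"):
    # Read the description straight from the attributes.
    print({name: circle.get(name) for name in ("cx", "cy", "r", "stroke")})
# {'cx': '50', 'cy': '50', 'r': '40', 'stroke': 'red'}
```

The equivalent procedural program would say only "select pen five"; that pen five was red is knowledge held outside the file.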
Remember HyperCard?
indifferent and vindictive justice
those who expose their bodies
Don’t put too much burden on words; say the same thing in multiple ways (e.g. in documentation).
Avoid obscure features
Document the significance of markup items
Validate
Check Links
Provide for Translations
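Two of the checks above can be sketched with Python's standard library. This is a rough illustration under stated assumptions: it tests well-formedness only (validating against a DTD or schema needs a separate tool), and it assumes a vocabulary in which link targets are `href` attributes pointing at `id` attributes. The function names and sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

def is_well_formed(source: str) -> bool:
    """True if the text parses as XML. This is weaker than validity."""
    try:
        ET.fromstring(source)
        return True
    except ET.ParseError:
        return False

def dangling_internal_links(source: str) -> list[str]:
    """Return internal links (href="#...") whose target id is missing."""
    root = ET.fromstring(source)
    ids = {el.get("id") for el in root.iter() if el.get("id") is not None}
    targets = [el.get("href") for el in root.iter() if el.get("href")]
    return [t for t in targets if t.startswith("#") and t[1:] not in ids]

# Hypothetical sample: one link resolves, one does not.
sample = """<doc>
  <p>See the <a href="#glossary">glossary</a> and the <a href="#index">index</a>.</p>
  <section id="glossary"><term>archive</term></section>
</doc>"""

print(is_well_formed(sample))            # True
print(is_well_formed("<doc><p></doc>"))  # False
print(dangling_internal_links(sample))   # ['#index']
```

Links to external resources need a different check, and arguably a copy of the target in the archive.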
A dream journal might have an introduction that says the author wrote down memories of dreams each day for a year, but the wider context might include that this was part of a therapeutic exercise in working out resentment towards alien visitors, and that, after a year, the writer's perception of the visitors had changed.
Creativity is for Content, not Specifications (use existing specs, and archive copies of them)
Don't assume people will understand 16.08.10 as a date.
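To see the ambiguity concretely, the sketch below reads the same string under two plausible conventions and gets two different days. The format labels are illustrative, and the century is supplied by Python's own two-digit-year rule, which is itself an undocumented assumption of exactly the kind an archive should avoid.

```python
from datetime import datetime

ambiguous = "16.08.10"

# Two plausible conventions for the same characters.
readings = {
    "day.month.year": "%d.%m.%y",
    "year.month.day": "%y.%m.%d",
}
for label, pattern in readings.items():
    print(label, "->", datetime.strptime(ambiguous, pattern).date().isoformat())
# day.month.year -> 2010-08-16
# year.month.day -> 2016-08-10
```

The string could equally be a time of day or a version number. Writing the date in ISO 8601 form with a four-digit year, and saying in the documentation that it is a date, removes the guesswork.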
Names, addresses, telephone numbers, anywhere we have a structured notation, should be marked up and/or described.
Should there be a standard for archiving electronic digital documents?
Do large archiving organizations need help in gathering together relevant specifications?
liam@w3.org, Liam R. E. Quin