Re: [Glossary] Definition of a portable document (and other things...)

Deborah

I'm sorry I didn't note your point earlier in the thread (that's a mea
culpa, not a you-a culpa). I don't necessarily think all of us would agree
on that definition of "curated," but since you are also OK with a different
term, I think it's a moot point.

Re: my git example, you are absolutely right: under some circumstances a
git repo could be considered a document as well, such as where the repo
consists of a dataset of some kind. But in some ways that would be like a
document consisting of all variations and revisions of every edition of
Huckleberry Finn, including all errata etc., combined into a unified
whole... it can be imagined, but AFAIK has never existed. So it's a very
special case, whereas the far more common case I was getting at is a single
instantiation of a particular edition. To me that is all a git snapshot is:
a particular instantiation, concretely and uniquely defined by a single
set of SHA checksums. And if that snapshot is of software, and is
considered a "release", it will be tested and verified to work as a whole.
Likewise, if the snapshot is a publication, it will be verified as a
unit. The git repo, by contrast, consists of all possible versions of all
resources, many of which won't be intended or even able to work together.
I.e. to me it's only a document if all its parts, in specific
instantiations, work together; thus the document itself is a specific
instantiation.

If we said that a portable document had to be verifiable as a set of SHA
checksums, I would be happy (packaging into a single PDF or ZIP archive
being a cheat to avoid needing the checksums). In Leonard's case of an
external video file, it still has a unique checksum (unless it's generated
on the fly by a webcam, in which case the referencing document has a ding
against its "portability" attribute). In Leonard's case of a font
referenced only by name, if there is no checksum for the specific instance
of the font, then that too is a ding on portability (one which, especially
for non-Latin scripts, may have dire consequences for the intelligibility of
the contents).
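To make the checksum criterion a bit more concrete, here is a minimal Python sketch. It assumes a document's resources sit as files under one directory and uses SHA-256 (git itself uses SHA-1); the function names `build_manifest`, `document_id`, `verify` and `portability_dings` are hypothetical, not taken from any spec or tool. The idea is that a single digest over the manifest pins down one specific instantiation, and any external reference without a recorded digest (a webcam stream, a font named only by family) counts as a ding.

```python
from __future__ import annotations

import hashlib
import tempfile
from pathlib import Path
from typing import Optional


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file under root (relative POSIX path) to its SHA-256 hex digest."""
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            manifest[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return manifest


def document_id(manifest: dict[str, str]) -> str:
    """Collapse the manifest into one digest (loosely analogous to a git tree id)."""
    h = hashlib.sha256()
    for rel in sorted(manifest):
        h.update(f"{manifest[rel]}  {rel}\n".encode("utf-8"))
    return h.hexdigest()


def verify(root: Path, expected_id: str) -> bool:
    """True only if every resource is byte-identical to the recorded instantiation."""
    return document_id(build_manifest(root)) == expected_id


def portability_dings(external_refs: dict[str, Optional[str]]) -> list[str]:
    """Hypothetical: external references with no recorded checksum each cost portability."""
    return [ref for ref, digest in external_refs.items() if digest is None]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        (root / "index.html").write_text("<p>Huckleberry Finn, ch. 1</p>")
        (root / "video.mp4").write_bytes(b"\x00\x01")
        doc = document_id(build_manifest(root))
        print(verify(root, doc))  # True: same instantiation
        (root / "index.html").write_text("<p>errata applied</p>")
        print(verify(root, doc))  # False: a different instantiation
```

Packaging everything into one archive would let a single checksum stand in for the manifest, which is the "cheat" mentioned above.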

--Bill


On Tue, Sep 8, 2015 at 3:12 PM, Deborah Kaplan <dkaplan@safaribooksonline.com> wrote:

> Olaf Drümmer wrote:
>
> > Nonetheless I would keep curation out of the text for the definitions,
> > and condense it into 'intended'. Joseph Beuys (German artist) once put a
> > pile of grease somewhere and intended it to be a work of art (not sure
> > how much curation went on while he was doing it, at least it didn't turn
> > into cheese). Some cleaning person did not get the message and…
> > Anyway: that pile of grease would have to be considered a document, its
> > portability only limited by climate/temperature ;-). If Beuys had
> > accidentally dropped a pile of grease of the same shape and size, it
> > would not have been a document.
>
> I am comfortable changing the term; "curated" has a jargon meaning in
> museums, libraries, and archives, and outside of that environment may have
> different connotations.
>
> Bill McCoy <bmccoy@idpf.org> said:
>
> > A computer program to me can validly produce anything we consider a
> > "Portable Web Document". For example a realization of my monthly bank
> > statement will be a document, but it is not curated by a human.
>
> Far up this now lengthy thread (mea culpa!) I discussed how curation by
> computer is very much a form of curation. Humans with intent created the
> tool which generated the monthly bank statement. The bank statement itself
> is simply a serialized view of some cells in your bank's data tables, but
> the choice to create that *specific* view of those cells -- and your choice
> to have your bank generate the PDF or paper, instead of quietly trusting
> Quicken to make some background transactions while it updates its own local
> database -- is what creates a document.
>
> (As Olaf has also said, much more succinctly.)
>
> Bill McCoy <bmccoy@idpf.org> said:
>
> > If an online calendar is simply a UX over a database, then I don't
> > consider it a "document" (whether or not the calendar entries have been
> > curated). But if the calendar system can produce a PDF representation of
> > the calendar, that would be a portable document (but not a "portable web
> > document").
> >
> > Similarly, if you search on Google for "influenza", the results on the
> > left (the search results) are in no way a "web document" (IMO). The
> > sidebar on the right (with navigation via tabs) could be considered a
> > "web document" but is not a "portable web document", and whether it's
> > truly a web document could be debated. The PDF that is generated is
> > certainly a portable document (but not a portable "web" document, as I
> > understand that term). But whether the content of the sidebar was in the
> > first place human-curated or machine-generated via semantic processing
> > is, to me, not decisive as to whether it should be considered a "web
> > document", and certainly not as to whether the PDF should be considered
> > a "portable document". In fact I don't know the answer. So
> > "document-ness", at least to me, has nothing directly to do with human
> > curation.
>
> [and then in a second email]
>
> > Could an entire git repository be a document (in the sense we mean for
> > this activity)? I don't think so. Could a particular snapshot (e.g.
> > current mainline or a named release) of a git repository
>
> From an information science POV, an entire git repository -- or a
> calendar, or a collection of search results, or a search algorithm -- can
> absolutely be a document. The dependency is not whether it can be turned
> into a PDF or an HTML representation (digital paper, as it were); a text
> with embedded video can be a document, and so can a tablet-based
> interactive picture book. The dependency is whether the object as it
> stands is being treated as a document. Places where this has real
> ramifications for digital publication in the academy include:
>
> - In digital theses and dissertations, when a student is required to
> deposit the documents of his doctoral work in an electronic thesis and
> dissertation database as a graduation requirement -- and the documents are
> composed of software products, chemical formulae, or datasets.
>
> - In an archives, when a scholar deposits her life's research, including
> her academic papers, her patented algorithm, several boxes of papers and
> ephemera, petabytes of data, the export of her Microsoft Outlook mailbox,
> and her award-winning website with interactive visualizations of her
> findings. The author writing about that scholar's life work interacts with
> each of these items in the archives, described and catalogued as a
> document, and analyzes each one critically as a complete document.
>
> - In a records management department, in an era where paper or even PDF
> rules and regulations have given way to micro-updates of websites, so the
> recordkeepers must record snapshots of entire web hierarchies as the
> documents recording the institution's history, later to be published in an
> online index for the board of directors.
>
> What makes each of these a "document" is that humans need to understand
> each as a concrete whole. It's not the technology of curation that matters
> -- indeed, in the third example, an automated spider run by the Internet
> Archive does the trick. It's the choice to view the parts as a "document"
> -- to view a dynamic website as a procedures manual, to view a running
> computer program as a dissertation.
>
> Suzanne Briet was the French scholar who came up with the lovely,
> evocative antelope example:
>
> "An antelope running wild on the plains of Africa should not be considered
> a document... But if it were to be captured, taken to a zoo and made an
> object of study, it has been made into a document. It has become physical
> evidence being used by those who study it. Indeed, scholarly articles
> written about the antelope are secondary documents, since the antelope
> itself is the primary document."
>
> Deborah
>
>


-- 

Bill McCoy
Executive Director
International Digital Publishing Forum (IDPF)
email: bmccoy@idpf.org
mobile: +1 206 353 0233

Received on Tuesday, 8 September 2015 23:31:21 UTC