Library terminology informally explained

From Semantic Web Standards
Revision as of 21:09, 16 November 2011 by Jhostage


Some Library Terminology, Informally Explained

This is a public wiki. If you wish to add to this page (including adding terms that you would like someone to define), you need to obtain a W3C wiki account, which is available to anyone. See the link along the left-hand side under "account request."

MulDiCat: Multilingual dictionary of cataloguing

The Multilingual dictionary of cataloguing terms and concepts contains definitions for many terms and concepts used by the library cataloguing community. Definitions are taken from authoritative sources. Terms and definitions are available in English and a variety of other languages.

100, 245, etc.

When librarians speak in three-digit numbers, they are using the names of MARC fields for the data in library records. When a group of fields is referred to, an "X" is used to mean "any digit." So, "6XX" refers to any field in the range 600-699. MARC uses fields from 001 to 899, with the 9XX range reserved for local use.

Some common fields are:

* 100  the author field
* 245  the title field
* 260  the publisher, place of publication, and date
* 300  the pagination, size, etc.
* 6XX  the subject fields, of which 650 is for topical subjects and is the most common
* 7XX  the "added entries": all of the additional authors, titles, and other information that is not part of the main entry.
* 856  the field that carries a URL for the online version of the resource, or closely related information such as tables of contents that are online.
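The "X" wildcard convention can be expressed as a small matching function. This is a hypothetical helper written for illustration (it is not part of any MARC library); it only compares three-character tag strings and treats "X" as "any digit."

```python
def tag_matches(tag: str, pattern: str) -> bool:
    """Return True if a three-digit MARC tag matches a pattern such as '6XX'.

    An 'X' (or 'x') in the pattern stands for any single digit.
    """
    if len(tag) != 3 or len(pattern) != 3 or not tag.isdigit():
        return False
    return all(p in "Xx" or p == t for t, p in zip(tag, pattern))


# Tags from a hypothetical record, filtered to the subject range 6XX
tags = ["100", "245", "260", "300", "650", "651", "700", "856"]
subject_tags = [t for t in tags if tag_matches(t, "6XX")]
print(subject_tags)  # ['650', '651']
```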

access point

In the card catalog, an access point was any element of the record that resulted in a card being added to the catalog. Access points were headings that were filed alphabetically in the catalog. The access point concept was carried over into some computerized catalog software. In these catalogs, a user enters a left-anchored string (the beginning of a heading) and is shown a screen of alphabetically sorted catalog entries that appear before and after that string. The term "access point" is sometimes also used to refer to any searchable part of the bibliographic record, in particular when speaking of fielded searches in OPACs.
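A left-anchored browse of this kind amounts to a binary search into a sorted list of headings, followed by returning a window of neighbors. The sketch below assumes plain case-insensitive ordering; real catalogs use more elaborate normalized sort keys, and the window sizes (`before`, `after`) are arbitrary illustrative parameters.

```python
from bisect import bisect_left


def browse(headings: list[str], query: str, before: int = 2, after: int = 3) -> list[str]:
    """Return a window of sorted headings around a left-anchored query string.

    Comparison is case-insensitive; the window starts `before` entries ahead of
    the first heading that sorts at or after the query.
    """
    ordered = sorted(headings, key=str.casefold)
    keys = [h.casefold() for h in ordered]
    pos = bisect_left(keys, query.casefold())  # first heading >= query
    start = max(0, pos - before)
    return ordered[start:pos + after]


headings = ["Twain, Mark", "Mann, Thomas", "Mann, Heinrich", "Manning, Olivia",
            "Mansfield, Katherine", "Malamud, Bernard", "Melville, Herman"]
for entry in browse(headings, "Mann, T"):
    print(entry)
```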

added entry

Any heading that is not included in the main entry. On the traditional library card, added entries were found at the very bottom of the card and represented where additional cards were filed in the card catalog. Added entries are access points in the catalog.

analytic entry

This is what libraries call a bibliographic record for an article in a journal or magazine or newspaper, or for a chapter in a book. In general, libraries catalog only the "whole": the book or the journal. When they do catalog any parts of those wholes, it is called "analyzing," thus an analytic entry into the catalog. Libraries produce few of these analytic entries; the cataloging of journal articles is done by indexing companies and sold to libraries as services (remember the Reader's Guide in your local library?).

Anglo-American Cataloguing Rules (AACR, AACR2)

Library records are created using a very detailed set of rules that determine exactly what data is included and how it is presented. The current rules in the US, UK and Canada are the Anglo-American Cataloguing Rules, Second Edition, last revised 2005 (ISBN 978-0-8389-3555-2 (loose-leaf)). Work is underway to create the successor to those rules, called Resource Description and Access (RDA).

authority control

See [1] for now. [improve me!]

call number

The call number is what you see on the spine of the book that tells you where the book can be found on the library shelves. The term "call number" dates from times when libraries had closed stacks and users had to request (or "call for") the book using that number. The call number identifies the book. In most libraries today the call number comes from a classification system and represents the main subject of the book. It is also a unique identifier for that physical volume in that library, although this role of identifier has been partially replaced by the barcodes libraries place on books and that are used by the circulation systems.

carrier

The medium of storage of the knowledge resource, as opposed to content, which is the intellectual content of the resource. See: RDA/ONIX framework for resource categorization, version 1.0, August 2006.

class number

Throughout the 19th century, libraries experimented with ordering their books by topic using classification systems. Books are assigned a "class number" (although some systems use letters or combinations of letters and numbers). Using class numbers, books can be placed on the shelves in classification order, thus creating a collection that can be browsed by subject. At the same time, the class number allows the book to be located on the shelf. Two often-used classification systems are the Dewey Decimal Classification and the Library of Congress Classification.

content

The intellectual content of the resource, as opposed to carrier, which is the medium of storage for the content. See: RDA/ONIX framework for resource categorization, version 1.0, August 2006.

continuing resource

Resources that are published not all at once but over time are generally known as journals or "serials." Current library terminology for these is "continuing resource." The term also covers resources that change in place rather than being issued in separate parts, such as Web sites or looseleaf publications that receive update pages, as well as yearbooks and other content that is published on a regular basis. Thus, a yearly almanac or a reference book like Physicians' Desk Reference may be treated as a continuing resource by a library.

corporate author

In some cases, a corporation is considered an author. This is the case for certain government documents and for reports issued by organizations and corporations. In the case of conference proceedings, the conference itself is listed as the author in library bibliographic records.

Contextual Query Language (CQL)

See [2] for now [improve me]

Dewey Decimal Classification (DDC)

Developed in the late 1800s, the DDC is a library classification system that uses numbers in the range 000-999, each of which can be subdivided hierarchically using decimal places. The DDC is heavily used in public libraries in the US and in some college and university libraries. The DDC is now published by OCLC.
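Because DDC notation is hierarchical, the broader classes of a number can often be read off by truncating it: first the decimal digits, then zeroing the integer digits from the right. The sketch below works on the notation alone and assumes a well-formed number with a three-digit base; the published schedules remain authoritative, since the notation does not always express the intended hierarchy.

```python
def broader_ddc(number: str) -> list[str]:
    """List the notationally broader DDC numbers, narrowest first."""
    integer, _, decimal = number.partition(".")
    if len(integer) != 3 or not integer.isdigit():
        raise ValueError(f"expected a three-digit base number: {number!r}")
    result = []
    # Drop decimal digits one at a time: 599.75 -> 599.7 -> 599
    while decimal:
        decimal = decimal[:-1]
        result.append(f"{integer}.{decimal}" if decimal else integer)
    # Then zero the integer digits from the right: 599 -> 590 -> 500
    digits = list(integer)
    for i in (2, 1):
        if digits[i] != "0":
            digits[i] = "0"
            result.append("".join(digits))
    return result


print(broader_ddc("599.75"))  # ['599.7', '599', '590', '500']
```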

expression (FRBR)

"Expression" is one of the four entities in FRBR's hierarchical model of a bibliographic resource. It is defined as "the intellectual or artistic realization of a work in the form of alpha-numeric, musical, or choreographic notation, sound, image, object, movement, etc., or any combination of such forms." Expression sits directly below "work" and, in the case of texts, is the work as it is expressed in a particular language. The library community is struggling with the abstract concepts of work and expression: the actual dividing line between them, and how they can be defined in bibliographic practice, is not clear.

filing/nonfiling

Library documentation may refer to "filing" or "nonfiling" parts of a record. The term "filing" comes from the card catalog days when entries were filed in the catalog. In the computing environment one would use the term "sorting" instead of "filing." Some fields in the MARC record, notably titles, allow the cataloger to indicate up to 9 characters of the beginning of the string that are to be ignored in filing. Thus, a title that begins with "The" has a non-filing value of 4 (t, h, e, plus the space). The value is generally stored in an indicator position in the field.
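The nonfiling count amounts to a simple rule for building a sort key: skip that many leading characters. A minimal sketch, assuming the count has already been read from the indicator (for the 245 title field, the second indicator); the lowercasing is a simplification of real sort-key normalization, and the function name is hypothetical.

```python
def filing_key(title: str, nonfiling: int) -> str:
    """Return the part of a title used for sorting, skipping nonfiling characters.

    `nonfiling` is the count (0-9) stored in the MARC indicator, e.g. 4 for "The ".
    """
    if not 0 <= nonfiling <= 9:
        raise ValueError("nonfiling count must be between 0 and 9")
    return title[nonfiling:].casefold()


titles = [("The Magic Mountain", 4), ("A Farewell to Arms", 2), ("Nineteen Eighty-Four", 0)]
for title, _ in sorted(titles, key=lambda t: filing_key(*t)):
    print(title)  # A Farewell to Arms, The Magic Mountain, Nineteen Eighty-Four
```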

The filing rules in the card catalog were not just a matter of alphanumeric order, and filing in the correct order required human judgment. For example, numbers were filed as if they were spelled out. This means that the book '1984' would be filed as the words "nineteen hundred eighty-four." To make things even more confusing, numbers were filed using the words of the language of the text. An Italian translation of '1984' would be filed as "mille novecento ottanta quattro," and thus quite far away from the English version. With the advent of online catalogs, libraries came to accept straight alphanumeric order as the sort order for headings.

Functional Requirements for Bibliographic Records (FRBR)

FRBR is a conceptual model of bibliographic entries using an entity/relationship model. FRBR was first issued in 1998 as a product of the IFLA Study Group on the Functional Requirements for Bibliographic Records. It is the first such model that has been developed in the library cataloging community. FRBR is limited to the data in bibliographic description. There are related functional requirements for authority data (FRAD).
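The four FRBR entities for the resource itself (work, expression, manifestation, item) can be sketched as nested data classes. This is a simplification for illustration: FRBR defines many more attributes and relationships, and it allows a manifestation to embody more than one expression, so a strict tree like this one is an assumption. The publisher names and barcodes are hypothetical placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    barcode: str  # one physical copy held by a library


@dataclass
class Manifestation:
    title: str
    publisher: str
    items: list[Item] = field(default_factory=list)


@dataclass
class Expression:
    language: str
    manifestations: list[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    title: str
    creator: str
    expressions: list[Expression] = field(default_factory=list)


def count_items(work: Work) -> int:
    """Count every physical item that realizes a work, across all levels."""
    return sum(len(m.items) for e in work.expressions for m in e.manifestations)


zauberberg = Work("Der Zauberberg", "Mann, Thomas", [
    Expression("German", [Manifestation("Der Zauberberg", "Publisher A", [Item("b001")])]),
    Expression("English", [Manifestation("The Magic Mountain", "Publisher B",
                                         [Item("b002"), Item("b003")])]),
])
print(count_items(zauberberg))  # 3
```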

general material designation (GMD)

The "GMD" is a general statement about the physical type of resource and appears in square brackets after the title, e.g.:

Title: 12 Angry Men [videorecording]

It is defined by the International Standard Bibliographic Description rules and the Anglo-American Cataloguing Rules.

heading

Another term from card catalog days, "heading" refers to any part of the bibliographic record that would result in a separate card entry in the catalog. Headings were written or typed at the top of each card, and represented titles, authors, subjects, and series. Most items held by the library had 6 to 8 cards in the card catalog, one for each heading. Today's bibliographic records continue to have fields for these same data elements, and often they are still referred to as "headings."

Note that this library practice, which has many benefits for managing and using authority data, introduces an extra level of indirection into the data, as discussed in the "surrogate" entry.

holdings

Libraries refer to the items they own as "holdings". When you look in a library catalog to see what books or DVDs a library has, you are looking at the catalog of the library's holdings.

integrated library system (ILS)

ILS refers to library systems in which all components make use of a single bibliographic database. Components of an ILS include the online catalog, acquisitions and fund accounting, serials control and check-in, circulation (lending), and other library management functions.

interlibrary loan

Libraries have a complex system of partnerships through which they will lend items from one library to another for patron use. Depending on the country, this may be coordinated regionally or nationally. In this way, the holdings of all of the area's libraries are available to everyone in that region or country, even those in small locales with limited library service.

International Federation of Library Associations and Institutions (IFLA)

As its name states, IFLA is a worldwide organization whose members include library associations from countries around the world as well as many individual libraries. IFLA coordinates many library activities, supports the development of libraries globally, and is the standards body for global library standards like ISBD and FRBR.

International Standard Bibliographic Description (ISBD) and ISBD punctuation

ISBD is a standard that was developed by IFLA to provide uniformity in the cataloging work of the world's national libraries. ISBD covers all published materials that may be held in libraries, such as multimedia resources, maps, and computer data. In the countries that use AACR, one usually hears ISBD referred to as a punctuation standard, which it also is. ISBD punctuation can be thought of as an early text markup standard that places particular punctuation in the printed library record to indicate fields. ISBD punctuation is intended to help people read a displayed library record, but it should also allow for machine analysis of library data when that data is in textual, rather than fielded, form.
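A few of the ISBD punctuation marks can be shown by assembling a title area and a publication area, then using the " / " marker to recover the statement of responsibility from the display string. This is a minimal sketch covering only a handful of elements (title, GMD in square brackets, other title information after " : ", statement of responsibility after " / "); ISBD prescribes many more, and the naive split would fail if a title itself contained " / ". The function names and the sample title and publisher values are hypothetical.

```python
def isbd_title_area(title, gmd=None, other_title=None, responsibility=None):
    """Assemble a simplified ISBD title and statement of responsibility area."""
    text = title
    if gmd:
        text += f" [{gmd}]"             # general material designation
    if other_title:
        text += f" : {other_title}"     # other title information
    if responsibility:
        text += f" / {responsibility}"  # statement of responsibility
    return text


def isbd_publication_area(place, publisher, date):
    """Place : publisher, date"""
    return f"{place} : {publisher}, {date}"


def split_responsibility(area):
    """Recover (title part, statement of responsibility) from a display string."""
    title, sep, responsibility = area.partition(" / ")
    return title, (responsibility if sep else None)


print(isbd_title_area("12 Angry Men", gmd="videorecording"))
area = isbd_title_area("A hypothetical title", other_title="a subtitle",
                       responsibility="by John Smith ; with illustrations by Maggie Jones")
print(area)
print(split_responsibility(area))
```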

International Standard Book Number (ISBN)

The ISBN is a publisher product number that has been used in the book supply chain since 1968. Each published book that is a separate product gets its own ISBN. This means that a hardback version and a paperback version of the same book will have different ISBNs because they are different products with different qualities like size, weight, and price. Library records contain the ISBN where available, but many books in libraries were published before the ISBN became a standard. Although it may seem that each library record should have only one ISBN, library records will often carry the ISBN for both the hardback and the paperback editions so that libraries do not have to add a separate record into their database for each of them. Also, some multivolume works have ISBNs on each volume, and a single library record may represent all of the volumes with all of the ISBNs.
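Because the ISBN is a product number, software can check that one is well formed but cannot tell from it whether two ISBNs belong to the same book. The sketch below validates the ISBN-13 check digit (weights alternating 1 and 3, modulo 10); ISBN-10 uses a different, modulo-11 scheme that is not shown. The test value is the AACR2 ISBN given in the AACR entry.

```python
def isbn13_check_digit(first12: str) -> int:
    """Compute the ISBN-13 check digit from the first 12 digits (weights 1, 3, 1, 3, ...)."""
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(first12))
    return (10 - total % 10) % 10


def is_valid_isbn13(isbn: str) -> bool:
    """Ignore hyphens and spaces, then compare the computed and stated check digits."""
    digits = isbn.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not digits.isdigit():
        return False
    return isbn13_check_digit(digits[:12]) == int(digits[12])


print(is_valid_isbn13("978-0-8389-3555-2"))  # True
```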

item

In library terms, an "item" is an actual physical volume. That said, "itemness" becomes unclear when, for example, a group of journal issues that each have a separate item barcode are bound together into a volume for shelving. The item concept is important because it is the level at which libraries do inventory, report counts of the library's holdings and yearly increases in holdings, and do lending. "Item" is the lowest and most concrete level of the FRBR bibliographic resource description.

library

Library in this report refers to a collection of information resources curated for a designated community and providing services around those resources. In this definition, libraries may be public or private, large or small, and are not limited to any particular types of resources. While discovery and delivery of resources are important services, preservation of resources is a key library activity that is not within the mission of non-library institutions, and therefore should be given particular attention.

Library of Congress Classification (LCC)

LCC is a classification system for libraries. It uses a combination of letters and numbers, and divides the library topically into 21 main classes. Unlike Dewey, the classification number is not hierarchical in nature, meaning that a number like HV21 is not necessarily subordinate to HV2, and that HV21 and HV24 may be unrelated from a taxonomic point of view. LCC is used and maintained at the Library of Congress, and is in use in many university and large public libraries in the United States and elsewhere.
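Because LCC class numbers mix letters and numbers, plain string sorting puts them in the wrong shelf order (HV21 would sort before HV3). A minimal sketch of a shelf-order sort key, assuming a bare class number of one to three capital letters plus a number with an optional decimal part; it ignores Cutter numbers, dates, and volume designations, which a real call number sort must also handle.

```python
import re
from decimal import Decimal

# Class letters followed by a class number with an optional decimal part, e.g. QA76.73
LCC_CLASS = re.compile(r"([A-Z]{1,3})(\d+(?:\.\d+)?)")


def lcc_sort_key(class_number: str) -> tuple[str, Decimal]:
    """Sort by class letters alphabetically, then by the class number numerically."""
    match = LCC_CLASS.fullmatch(class_number.strip().upper())
    if match is None:
        raise ValueError(f"not a simple LCC class number: {class_number!r}")
    letters, number = match.groups()
    return letters, Decimal(number)


numbers = ["HV21", "HV3", "QA76.9", "HV2", "QA76.73", "QA9"]
print(sorted(numbers))                    # plain string order
print(sorted(numbers, key=lcc_sort_key))  # shelf order
```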

Library of Congress Control Number (LCCN)

The LCCN represents the catalog record created by the Library of Congress. It began as the LC Card Number in 1906, when the Library of Congress printed and sold cards to libraries that they could use in their own catalogs. The LCCN identifies the metadata for the resource, not the resource itself.
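Since the LCCN serves as a key for a record, systems that compare LCCNs usually normalize the many ways they have been written (with spaces, hyphens, or suffixes after a slash). The function below follows, as we understand them, the normalization steps published by the Library of Congress: remove blanks, drop a slash and everything after it, and if a hyphen is present, remove it and left-pad the part after it with zeros to six digits. Treat it as a sketch and check LC's documentation before relying on it; it does not validate the input.

```python
def normalize_lccn(lccn: str) -> str:
    """Normalize an LCCN string for comparison (sketch of LC's normalization steps)."""
    s = "".join(lccn.split())   # 1. remove all blanks
    s = s.split("/", 1)[0]      # 2. drop a slash and everything to its right
    if "-" in s:                # 3. remove the hyphen and zero-pad the suffix to 6 digits
        prefix, suffix = s.split("-", 1)
        s = prefix + suffix.zfill(6)
    return s


for raw in ["n78-89035", "85-2 ", "75-425165//r75", " 79139101 /AC/r932"]:
    print(repr(raw), "->", normalize_lccn(raw))
```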

main entry

Dating from book and card catalog practices, each bibliographic entry is represented by a single author and one title at the head of the entry, although other authors and titles may be present on the record in a secondary position. The main entry creates a uniform display for library catalog entries, and is considered by some to serve as an identifier for the resource.

manifestation

A term from FRBR, the "manifestation" is the actual produced resource, such as a book, a music CD or a film on DVD. In modern publishing, manifestations are often mass-produced, and data that refers to the manifestation is valid for all of the items from that printed product (e.g. they all have the same title, publisher name, pagination). For books, the manifestation is usually identified by an ISBN.

MARC

MARC is a general designation for the record format used in many Western libraries and developed as an ANSI standard in the late 1960s. The term "MARC" derives from "MAchine-Readable Cataloging" and can be used to refer to the record format defined in the standard, the library instance of that record format that makes particular choices, and the content standard for creating bibliographic records in that format. The current definition of the record format standard is ISO 2709. The library instance of the MARC standard as well as the content standard are maintained by the Library of Congress. The current version of the library standard is called MARC 21. There are many variations on the standard, the most common of which is UNIMARC.
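The ISO 2709 structure can be shown with a small builder and parser. An ISO 2709 record has a 24-character leader (positions 0-4 hold the record length, 12-16 the base address of the data), a directory of 12-character entries (3-character tag, 4-digit field length, 5-digit start position relative to the base address), and the field data. Fields end with 0x1E, subfields start with 0x1F plus a one-character code, and the record ends with 0x1D. This is a teaching sketch: the other leader positions are filled with placeholder values, control fields (001-009) are assumed to have no indicators, and production data should go through an established MARC library.

```python
FT = b"\x1e"  # field terminator
RT = b"\x1d"  # record terminator
SD = b"\x1f"  # subfield delimiter


def build_record(fields):
    """Build an ISO 2709 record from (tag, field bytes without terminator) pairs."""
    directory, data = b"", b""
    for tag, value in fields:
        chunk = value + FT
        directory += f"{tag}{len(chunk):04d}{len(data):05d}".encode("ascii")
        data += chunk
    directory += FT
    base = 24 + len(directory)
    length = base + len(data) + 1  # +1 for the record terminator
    # Placeholder leader: only the length and base address are computed here
    leader = f"{length:05d}nam a22{base:05d}   4500".encode("ascii")
    return leader + directory + data + RT


def parse_record(raw):
    """Return (leader, [(tag, field bytes)]) using the directory to locate fields."""
    leader = raw[:24]
    base = int(leader[12:17])
    directory = raw[24:base - 1]  # drop the directory's own field terminator
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        fields.append((tag, raw[base + start:base + start + length - 1]))
    return leader.decode("ascii"), fields


def subfields(value):
    """Split a data field into its two indicators and (code, value) subfields."""
    parts = value[2:].split(SD)[1:]
    return value[:2].decode(), [(p[:1].decode(), p[1:].decode("utf-8")) for p in parts]


record = build_record([("001", b"12345"),
                       ("245", b"10" + SD + b"a12 Angry Men" + SD + b"h[videorecording]")])
leader, fields = parse_record(record)
print(subfields(fields[1][1]))  # ('10', [('a', '12 Angry Men'), ('h', '[videorecording]')])
```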

OCLC

OCLC Online Computer Library Center, Inc. is a nonprofit, membership, computer library service and research organization; it was incorporated in 1967 as the not-for-profit Ohio College Library Center. OCLC is the primary metadata service provider in the US, and more than 27,000 libraries in 86 countries and territories use OCLC services. OCLC maintains the largest database of bibliographic records in the world (WorldCat), along with holdings information recording which libraries own the items. Libraries subscribe to OCLC services for bibliographic records, for the management of interlibrary loan requests, and, more recently, as the user interface to the collections of some libraries.

OCLC number

Each record in WorldCat is given an OCLC number. This number generally represents a published item that will be found in many libraries. In theory, each published book (or music CD, or DVD, or other library resource) will have one record in WorldCat with an OCLC number. When a library uses OCLC for its cataloging, the record in the library's database usually retains the OCLC number of the record that was downloaded from OCLC. That number can serve as a globally unique identifier for the record and can be used to link to other records for the same bibliographic resource.
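Linking by OCLC number boils down to grouping records that share the same number. A minimal sketch, assuming each local record has already had its OCLC number extracted; the library names, local record IDs, and OCLC numbers below are all hypothetical.

```python
from collections import defaultdict

# Hypothetical local records: (library, local record id, retained OCLC number)
local_records = [
    ("Library A", "a-101", "12345"),
    ("Library B", "b-77", "12345"),
    ("Library C", "c-9", "67890"),
    ("Library B", "b-80", "67890"),
]


def group_by_oclc(records):
    """Group local records that share an OCLC number, i.e. describe the same resource."""
    groups = defaultdict(list)
    for library, local_id, oclc in records:
        groups[oclc].append((library, local_id))
    return dict(groups)


holdings = group_by_oclc(local_records)
print(holdings["12345"])  # [('Library A', 'a-101'), ('Library B', 'b-77')]
```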

online public access catalog (OPAC)

OPAC is the term for the computerized catalog interface used by the library public. It essentially is the replacement for the card catalog.

pagination

In library records, pagination consists of the paging pattern (Roman numerals, Arabic numerals, etc.) and the highest number used in each pattern, e.g. "xii, 356 p." While this gives an indication of the total number of pages, it is not a precise count because it does not include unnumbered front matter or blank pages. Pagination can also be recorded in leaves rather than pages. (A leaf is a single sheet with two sides; when leaves are numbered, each sheet receives one number, whereas in pagination both sides usually receive a page number.) When a work is in multiple volumes, the library data usually records just the number of volumes (e.g. "3 v.") and not the pagination of each volume.
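Summing the last number of each sequence gives the lower-bound page count described above. A minimal sketch that handles only comma-separated Roman and Arabic sequences ending in "p."; statements in leaves, volumes ("3 v."), or bracketed unnumbered pages are not handled, and the function names are hypothetical.

```python
import re

ROMAN_VALUES = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}


def roman_to_int(numeral: str) -> int:
    values = [ROMAN_VALUES[ch] for ch in numeral.lower()]
    total = 0
    for i, value in enumerate(values):
        # Subtractive notation: a smaller numeral before a larger one is subtracted (iv = 4)
        if i + 1 < len(values) and value < values[i + 1]:
            total -= value
        else:
            total += value
    return total


def minimum_page_count(pagination: str) -> int:
    """Add up the last number of each paging sequence, e.g. 'xii, 356 p.' -> 368."""
    body = re.sub(r"\s*p\.\s*$", "", pagination)  # drop the trailing "p."
    total = 0
    for sequence in body.split(","):
        sequence = sequence.strip()
        total += int(sequence) if sequence.isdigit() else roman_to_int(sequence)
    return total


print(minimum_page_count("xii, 356 p."))  # 368
```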

references

A reference is a pointer from one entry in a catalog to another. There are two types of references: "see" and "see also." A see reference points from an entry point (a heading or term) that is not authorized for use (essentially an "altLabel") to the authorized form (essentially a "prefLabel") that has the same meaning. (Example: "Stray dogs see Feral dogs".) See also references link two authorized headings or terms that the cataloger has determined may be of interest to users looking up either one. (Example: "Clemens, Samuel Langhorne see also Twain, Mark".) These two were essentially the only relationships between resources that existed in the card catalog.
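The two kinds of references can be modeled as two lookup tables in a toy authority file: one mapping unauthorized forms to authorized ones, and one linking authorized headings to related authorized headings. This sketch reuses the two examples above; the table and function names are hypothetical, and a real authority file carries far more structure.

```python
# See references: unauthorized form -> authorized form
SEE = {"Stray dogs": "Feral dogs"}
# See also references: authorized heading -> related authorized headings
SEE_ALSO = {"Clemens, Samuel Langhorne": ["Twain, Mark"]}
AUTHORIZED = {"Feral dogs", "Clemens, Samuel Langhorne", "Twain, Mark"}


def look_up(term):
    """Return (authorized heading, see-also headings) for a term a user typed."""
    heading = SEE.get(term, term)  # follow a see reference if there is one
    if heading not in AUTHORIZED:
        return None, []
    return heading, SEE_ALSO.get(heading, [])


print(look_up("Stray dogs"))                 # ('Feral dogs', [])
print(look_up("Clemens, Samuel Langhorne"))  # ('Clemens, Samuel Langhorne', ['Twain, Mark'])
```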

Resource Description and Access (RDA)

The successor cataloging rules to AACR2, RDA was made available in 2010 as a subscription-based online product, RDA Toolkit. RDA is based at least in part on FRBR concepts.

Search/Retrieve via URL (SRU)

See [3] for now [improve me]

serial

Items that are published over time in discrete parts are called "serials." Common serials are magazines, newspapers, and journals. Libraries also treat some reference books published yearly as serials (e.g. Physicians' Desk Reference, Farmer's Almanac). Recently, libraries began using "continuing resource" as a blanket term to describe these items and other resources published on a continuing basis.

statement of responsibility

The "statement of responsibility" is a string of characters that follows the title in the library catalog record, usually preceded by a slash ("/ "). In the case of books and other printed materials, the content of the statement of responsibility is taken directly from the title page of the resource, and can read something like: "by John Smith with illustrations by Maggie Jones." Its role is to show the user how the resource described itself on the title page.

surrogate

A library catalog is a surrogate for the actual collection. It is made up of brief representations of items in the library collection. A library catalog entry is a surrogate for the item, with key information that describes the item such as author, title, publication information and physical characteristics. The catalog also places items in a topical representation of knowledge using subject headings and classification numbers. These topical entities add another layer of indirection: libraries build artificial knowledge organization systems for the purposes of description, and these systems are treated as first-order objects in their own right. In Semantic Web terms [1], http://libris.kb.se/resource/bib/9800324 identifies a "real-world object", a book: its dc:creator value refers to the author of the book, not to the Swedish library that curates the bibliographic data. But http://viaf.org/viaf/95152561, an authority item, does not identify a real person directly. It identifies a name authority cluster created by VIAF, which eventually leads to a resource standing for the person (http://viaf.org/viaf/24604287/#foaf:Person). See [2] for a short discussion of this.

[1] http://www.w3.org/TR/cooluris/#semweb

[2] http://www.w3.org/TR/2005/WD-swbp-skos-core-guide-20051102/#secmodellingrdf

title page

It has long been the custom in printing to include a page at the front of the book that contains the title, the author(s), the publisher, and often the date of publication. This page may also list other key contributors like translators or illustrators. The back side of this page (the "verso") will usually have details such as copyright information and the ISBN, and for US publications it may carry the Library of Congress Cataloging in Publication (CIP) data. Certain information in the library bibliographic record is transcribed faithfully from the title page, such that the record functions as a surrogate for that page. In particular, these elements are taken directly from the title page: title, statement of responsibility, place of publication, publisher name, date.

tracings

"Tracings" is terminology from the card catalog. Before the time of printed cards (in which each card contains all of the bibliographical information) there was a primary card that had along the bottom a list of all of the headings that would be entered into the catalog for that bibliographic item. These included added authors, series entries, and subject entries. This card served as the control card for the item; if the item were withdrawn from the library, this card would list all of the cards that would need to be removed from the catalog, so that the librarian could "trace" them through the catalog. In electronic catalogs, the term can be used to refer to the set of added entries in the bibliographic record.

work

One of the least clear terms in the bibliographic universe, "work" is casually used to refer to an individual book or resource. In the area of cultural commentary, "work" may be used to refer to all of the intellectual output of an individual (e.g. "Thomas Mann's work spans three decades ..."). However, "work" has taken on a formal meaning when one is speaking in the context of FRBR, and it refers to a single creation at its most general level. Thus, if one refers to the work Der Zauberberg, this includes all versions, printings, and translations of the creation by Mann. A work is inherently abstract in the FRBR definition.

WorldCat

WorldCat is a union catalog (a combined library catalog describing the collections of a number of libraries) which itemizes the collections of OCLC's global cooperative members. It is built and maintained collectively by the participating libraries.

Definitions

The library community has developed many standards with which (meta)data (records with information about entities such as books) can be represented. An issue in charting the available standards in this report is that different terms are being used to describe those standards, in both the library and Linked Data communities. For example, Dublin Core is variously referred to as a metadata element set, a schema, or an (RDF) vocabulary. For clarity and ease of reading we consistently use the following three terms to describe the types of standards. The intention is not to provide an airtight classification, but only to provide intuitive terms which help readability.

Note that many standards, such as the Library of Congress Subject Headings, could be seen as falling under several of the categories below depending on the context in which they are used. In this report, we assign standards to categories based on their "typical" usage.


Metadata element set or element set

Similar terms: RDF vocabulary, (RDF) schema, ontology.

A metadata element set defines classes of entities and attributes (elements) of entities. Usually a metadata element set does not define bibliographic entities, rather it provides elements to be used by others to describe such entities.

Examples: Dublin Core defines elements such as Creator and Date (but DC does not define bibliographic records that use those elements). FRBR defines entities such as Work and Manifestation and elements that link and describe them. MARC21 defines elements (fields) to describe bibliographic records and authorities.

Confusions:

  • MARC21 also defines code lists e.g. for countries and organizations. These are used as values in records, so these code lists should be classified as value vocabularies.

Value vocabulary

Similar terms: thesaurus, code list, classification scheme, subject headings, taxonomy, controlled vocabulary, authority file, digital gazetteer, concept scheme, knowledge organisation system.

A value vocabulary defines concepts (topics, art styles, authors) that are used as values of elements in metadata records. Typically a value vocabulary does not define bibliographic resources such as books but concepts related to bibliographic resources (persons, languages, countries, etc.). They are "building blocks" with which metadata records can be built. Many libraries mandate specific value vocabularies for selecting the values of a particular metadata element (for example, values for Dublin Core Creator should be selected from VIAF). A value vocabulary thus represents a "closed list" of allowed values for an element.

In actual metadata records, the values used can be literals, codes, or identifiers (including URIs), as long as these refer to a specific concept in a value vocabulary.

Examples: LCSH defines topics of books, the Art and Architecture Thesaurus defines art styles (among other things), VIAF defines authorities, and GeoNames defines geographical locations (e.g. cities).
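The "closed list" idea can be sketched as a check of a record's values against the allowed values of a vocabulary. In this sketch the vocabulary is a tiny inlined stand-in for a real code list (three MARC language codes); the record structure and function name are hypothetical. Note that "deu", a valid code in a different vocabulary (ISO 639-2/T), is rejected here, which illustrates that validity is relative to the chosen vocabulary.

```python
# Hypothetical closed list for one element, standing in for a real value vocabulary
LANGUAGE_CODES = {"eng": "English", "ger": "German", "ita": "Italian"}


def invalid_values(record, element, vocabulary):
    """Return the values of `element` in `record` that are not in the vocabulary."""
    return [value for value in record.get(element, []) if value not in vocabulary]


record = {"title": ["Der Zauberberg"], "language": ["ger", "deu"]}
print(invalid_values(record, "language", LANGUAGE_CODES))  # ['deu']
```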

Confusions:

  • In most cases a value vocabulary reuses metadata elements from standards such as DC, ISO 2788 or SKOS. In some cases a value vocabulary also defines new metadata elements. For example, GeoNames defines elements for coordinates, names and postal codes of places. Similarly, VIAF defines elements to describe authorities (corporations, people). One could then say that GeoNames and VIAF are metadata element sets. To prevent this confusion we refer to the whole of GeoNames as a value vocabulary, and to its elements as the GeoNames element set.
  • We classify VIAF and GeoNames as value vocabularies instead of datasets because they are used (or are meant to be used) extensively as value vocabularies in creating records in other datasets, while their metadata elements are not widely reused (as DC elements are). We acknowledge that this distinction depends on the role that the dataset/vocabulary plays rather than on its inherent characteristics. Our viewpoint is indeed debatable, but sufficient for the purposes of our report.

Dataset

Similar terms: collection, metadata record set.

A dataset is a collection of structured metadata -- descriptions of things, such as books in a library. Library records consist of statements about things, where each statement consists of an element ("attribute" or "relationship") of the entity, and a "value" for that element. The elements that are used are usually selected from a set of standard elements, such as Dublin Core. The values for the elements are either taken from value vocabularies such as LCSH, or are free text values.

Note that in the Linked Data context, Datasets do not necessarily consist of clearly identifiable "records" (see entry on Records).

Example: a record could have a Subject element drawn from Dublin Core, and a value for Subject drawn from LCSH.

Confusions:

  • As "sets of structured metadata", Element Sets and Vocabularies could be seen as datasets. This report makes a pragmatic, usage-based distinction between sets of structured metadata specifically about Elements or Concepts and sets of structured metadata about all other sorts of things in the world (here called Datasets).
  • In some cases, individual records in a dataset are themselves used as values in records from another, separate dataset. For example, Jacques Derrida wrote a book that offers commentary on a book by Martin Heidegger. Derrida's book is owned by the Bibliothèque nationale de France (BnF) and is therefore recorded in the BnF dataset; Heidegger's book is owned by the Deutsche Nationalbibliothek (DNB) and similarly recorded in the DNB dataset. BnF staff can indicate, in the record for Derrida's book in the BnF dataset, that it offers commentary on Heidegger's book in the DNB dataset. The statement in the Derrida record might include a Dublin Core Subject property whose value is a reference to the Heidegger record in the DNB dataset. In this case, we would still consider the BnF and DNB datasets to be datasets, not value vocabularies.
  • VIAF, GeoNames and other value vocabularies can be seen as defining records (using metadata elements and values), so one might consider them to be datasets. Our reasons for not classifying them as datasets are given under "Value vocabulary."