Themes: Digital Libraries

What do we mean by digital libraries?

Libraries are a key component of the information infrastructure which underpins modern life. They provide an essential resource for both the public and specialists, for reference and for research.

However, the environment for libraries is changing. The cost of buying and archiving books and journals is increasing, and the space required for physical, paper-based collections is becoming so great that only a few national collections can accommodate more than a fraction of the total output. Meanwhile, digital media have made it economical to produce material electronically, with far smaller space requirements. At the same time, the near-universal availability of high-speed networks and the World Wide Web has revolutionised the distribution of, and access to, information resources anywhere in the world. Together these give an opportunity to reinvent the library as a digital library: a library largely defined by its provision of resources in digital media, and by online access to those resources.

Digital libraries also offer the opportunity for the library to reconsider its function. Traditionally, libraries have collected publications produced by publishers and catalogued them so that they can be retrieved systematically. In a university or research organisation, they have typically supplied subject specialists to help users find relevant resources. In a world of digital libraries, this function can be reconsidered, and two major functions become possible: information gathering and information publishing.

Information gathering.

This is the traditional role of the library: gathering and cataloguing resources for archiving and for use by the library's users.

This traditional role remains, but the emphasis shifts away from storing and cataloguing physical resources towards providing access to resources (which are not necessarily physically moved), together with electronic, searchable catalogues that let users locate the relevant resources electronically.

For information gathering in digital libraries, the major issue is how to locate, among the very large amount of material available in free or subscription repositories, the items most relevant to users in the organisation: that is, how to narrow a search accurately so that it returns the most relevant publications, and only those.

Information publishing.

The library also has the opportunity of becoming the channel by which the organisation disseminates its results to the world.

In this role, the library takes on one of the traditional roles of the publisher: spreading the word about the organisation's output. Publishers are likely to continue to play their role, typically providing a mark of quality through peer review. But through movements such as the Open Archives Initiative http://www.openarchives.org/, institutions are increasingly preserving and disseminating their own publications.

For information publishing via libraries, the main issue is how to provide information about the publications stored in the institutional repository so that they can be accessed by as much of the relevant readership as possible (for research publications, typically other researchers in the field; for other publications, the corresponding target audience). The problem is therefore one of providing accurate catalogue data that can be searched by other users and their agents.

How does the semantic web help?

The key requirement for the Digital Library community is the provision of shared catalogues which can be published and browsed. This requires common metadata to describe the fields of the catalogue (such as author, title, date and publisher), and a common controlled vocabulary so that subject identifiers can be assigned to publications.
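As a minimal sketch of what such a shared catalogue record might look like, the Python below writes one item as (subject, predicate, object) triples. The property URIs come from the real Dublin Core element set namespace; the item and subject-concept URIs under example.org, the field values, and the helper `values` are hypothetical, chosen only for illustration.

```python
# Dublin Core element set namespace (real); everything under example.org is hypothetical.
DC = "http://purl.org/dc/elements/1.1/"

BOOK = "http://example.org/library/item/42"                         # hypothetical item URI
SUBJECT = "http://example.org/thesaurus/concept/digital-libraries"  # hypothetical concept URI

# One catalogue record as (subject, predicate, object) triples.
record = [
    (BOOK, DC + "title", "An Example Title"),
    (BOOK, DC + "creator", "Example Author"),
    (BOOK, DC + "date", "2003"),
    (BOOK, DC + "publisher", "Example Press"),
    # The subject is a URI from a shared controlled vocabulary rather than
    # free text, so every catalogue using that vocabulary names the same concept.
    (BOOK, DC + "subject", SUBJECT),
]


def values(triples, subject, predicate):
    """Return all objects recorded for a given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]
```

Here `values(record, BOOK, DC + "subject")` returns the concept URI rather than a keyword string, which is what allows another catalogue or search engine to recognise the same subject.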

If controlled vocabularies are published in one place and made accessible to all users across the Web, library catalogues can use the same web-accessible vocabularies to catalogue their publications, marking each one with the most relevant terms from the most relevant thesauri for the domain of interest. Search engines can then use the same vocabularies to control and refine their searches, so that the most relevant items of information are returned to the user.
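One common way for a search engine to exploit a shared vocabulary is to expand a query concept to all of its narrower concepts and return only items indexed with one of them. The sketch below assumes a hypothetical thesaurus and catalogue, with concept identifiers abbreviated as `ex:` strings in place of full URIs; it illustrates the idea rather than the behaviour of any particular search engine.

```python
# Hypothetical shared thesaurus: concept -> directly narrower concepts.
NARROWER = {
    "ex:information-science": {"ex:digital-libraries", "ex:cataloguing"},
    "ex:digital-libraries": {"ex:institutional-repositories"},
}

# Hypothetical catalogue: item -> subject concepts assigned from the same thesaurus.
CATALOGUE = {
    "item:1": {"ex:digital-libraries"},
    "item:2": {"ex:institutional-repositories"},
    "item:3": {"ex:astronomy"},
}


def expand(concept):
    """Return the concept plus every concept transitively narrower than it."""
    seen = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        if current in seen:  # guard against cycles in the thesaurus
            continue
        seen.add(current)
        stack.extend(NARROWER.get(current, ()))
    return seen


def search(concept):
    """Return, sorted, the items indexed with the concept or any narrower concept."""
    wanted = expand(concept)
    return sorted(item for item, subjects in CATALOGUE.items() if subjects & wanted)
```

Searching for `ex:digital-libraries` also finds `item:2`, which is indexed only under the narrower `ex:institutional-repositories`, while the unrelated `ex:astronomy` item is excluded.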

The semantic web offers relevant standards and approaches that can help with these problems. It offers open standards that enable vendor-neutral solutions; it offers useful flexibility (structured and semi-structured, formal and informal, open extensibility); and it helps to support decentralized solutions where appropriate. RDF can therefore be used as a common interchange format for catalogue metadata and shared vocabularies, usable by all libraries and search engines across the Web.
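To make the idea of an interchange format concrete, the sketch below serialises triples as N-Triples, a simple line-based RDF syntax in which each line holds one triple terminated by a full stop. The `URI` marker class and the `term` and `to_ntriples` functions are hypothetical helpers; a real system would more likely use an RDF library, and might exchange RDF/XML instead. Only plain literals are handled here (no language tags or datatypes).

```python
class URI(str):
    """Marks a string as a URI reference rather than a literal (hypothetical helper)."""


def term(value):
    """Serialise one RDF term in N-Triples syntax."""
    if isinstance(value, URI):
        return "<" + value + ">"
    # Plain literal: escape backslashes first, then quotes and line breaks.
    escaped = (value.replace("\\", "\\\\")
                    .replace('"', '\\"')
                    .replace("\n", "\\n")
                    .replace("\r", "\\r"))
    return '"' + escaped + '"'


def to_ntriples(triples):
    """Serialise (subject, predicate, object) triples, one per line."""
    return "".join(f"{term(s)} {term(p)} {term(o)} .\n" for s, p, o in triples)


DC = "http://purl.org/dc/elements/1.1/"
item = URI("http://example.org/library/item/42")  # hypothetical item URI

print(to_ntriples([
    (item, URI(DC + "title"), 'An "Example" Title'),
    (item, URI(DC + "creator"), "Example Author"),
]), end="")
```

Running it prints two lines, the first being `<http://example.org/library/item/42> <http://purl.org/dc/elements/1.1/title> "An \"Example\" Title" .`; any RDF-aware library or search engine can read such lines without knowing anything about the catalogue that produced them.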

Whilst other formats can also be used, RDF has some advantages. It is a generic open standard, whereas many alternatives are either proprietary or specific to a particular domain. It standardizes the data model (together with a serialization syntax), whereas alternatives such as direct use of XML focus on the document syntax. Because RDF breaks information down into small independent units (triples) and uses global identifiers (URIs) for all objects, properties and types, information from several sources can be integrated simply by concatenating their sets of triples and following the new relations. The data model is simple enough, and makes few enough assumptions, that it can express both structured and semi-structured data, making integration across heterogeneous sources more straightforward.
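The integration argument can be illustrated with a small sketch: two hypothetical sources use the same URI for an author, so taking the union of their triple sets (concatenation, with exact duplicates collapsing) is enough to answer a query that spans both. Identifiers are abbreviated with `ex:`, `dc:` and `foaf:` prefixes for readability; `dc:creator` and `foaf:name` are real properties, but the data and the `objects` and `follow` helpers are hypothetical.

```python
# Source A (hypothetical library catalogue): an item and its creator's URI.
library_a = {
    ("ex:item42", "dc:title", "An Example Title"),
    ("ex:item42", "dc:creator", "ex:person7"),
}

# Source B (hypothetical staff directory): facts about the same person URI.
library_b = {
    ("ex:person7", "foaf:name", "Example Author"),
    ("ex:person7", "ex:affiliation", "ex:org3"),
}

# Integration is just set union: shared URIs join the two graphs automatically.
merged = library_a | library_b


def objects(triples, subject, predicate):
    """All objects of triples with the given subject and predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def follow(triples, start, path):
    """Follow a sequence of predicates from a start node; return the nodes reached."""
    nodes = {start}
    for predicate in path:
        nodes = {o for n in nodes for o in objects(triples, n, predicate)}
    return nodes
```

`follow(merged, "ex:item42", ["dc:creator", "foaf:name"])` reaches the author's name, whereas neither source on its own can answer that query.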

SWAD-Europe resources relevant to this problem

Thesaurus formats and demonstrators

The main place for the Digital Library community to begin in the SWAD-Europe project is the SWAD-Europe Thesaurus Activity. Here, we provide a set of standard formats and tools, called the Simple Knowledge Organisation System (SKOS), for describing controlled vocabularies and classifications. We also provide some sample thesauri that use these formats, and some demonstration software that allows people and programs to browse and select terms from a thesaurus across the Web.
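As an illustration of what browsing and selecting terms from a SKOS thesaurus involves, the sketch below holds a tiny hypothetical thesaurus as triples using the SKOS properties `skos:prefLabel`, `skos:altLabel` and `skos:broader`, looks up concepts by any of their labels, and walks up the broader hierarchy. It is not the SWAD-Europe demonstrator software or its API; the concept URIs, labels and helper functions are assumptions. Because SKOS allows a concept more than one broader concept, having `broader_path` follow only the first one is a simplification.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"  # SKOS namespace
EX = "http://example.org/thesaurus/"          # hypothetical thesaurus namespace

# A tiny hypothetical thesaurus expressed as triples.
thesaurus = [
    (EX + "c1", SKOS + "prefLabel", "Information science"),
    (EX + "c2", SKOS + "prefLabel", "Digital libraries"),
    (EX + "c2", SKOS + "altLabel", "Electronic libraries"),
    (EX + "c2", SKOS + "broader", EX + "c1"),
    (EX + "c3", SKOS + "prefLabel", "Institutional repositories"),
    (EX + "c3", SKOS + "broader", EX + "c2"),
]

LABELS = (SKOS + "prefLabel", SKOS + "altLabel")


def lookup(label):
    """Concepts whose preferred or alternative label matches, ignoring case."""
    wanted = label.casefold()
    return sorted({s for s, p, o in thesaurus
                   if p in LABELS and o.casefold() == wanted})


def pref_label(concept):
    """The preferred label of a concept, or None if it has none."""
    for s, p, o in thesaurus:
        if s == concept and p == SKOS + "prefLabel":
            return o
    return None


def broader_path(concept):
    """Preferred labels from a concept upwards, following the first broader link."""
    path, seen = [], set()
    while concept is not None and concept not in seen:
        seen.add(concept)
        path.append(pref_label(concept))
        parents = [o for s, p, o in thesaurus
                   if s == concept and p == SKOS + "broader"]
        concept = parents[0] if parents else None
    return path
```

A user typing "electronic libraries" is led to the same concept as "Digital libraries", and a cataloguer can then record that concept's URI as the subject, which is the behaviour the shared-vocabulary argument above depends on.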