Use Case LOCAH
LOCAH (Map Based Data Visualisation)
- Adrian Stevenson, LOCAH Project Manager, UKOLN, University of Bath, UK. http://www.ukoln.ac.uk.
- Jane Stevenson, Archives Hub Manager, Mimas, University of Manchester, UK. http://www.archiveshub.ac.uk.
Background and Current Practice
The Archives Hub is a national service that provides a wealth of rich interdisciplinary information about archives held across the UK. The LOCAH Project is investigating the creation of links between the Hub and other data sources, including DBpedia and the BBC, as well as links with OCLC for name authorities and with the Library of Congress for subject headings. The project aims to make new links between diverse content sources, enabling the free and flexible exploration of data and enabling researchers to make new connections between subjects, people, organisations and places to reveal more about our history and society.
Archive data is by its nature incomplete and often sources are hidden and little known. User studies and log analyses indicate that Archives Hub users frequently search laterally through the descriptions; this gives them a way to make serendipitous discoveries. Linked data is a way of vastly expanding the benefits of lateral search, helping users discover contextually related materials. Creating links between archival collections and other sources is crucial as archives relating to the same people, organisations, places and subjects are often widely dispersed. By bringing these together intellectually, new discoveries can be made about the life and work of an individual or the circumstances surrounding important historical events.
(1) Enable a researcher interested in archives and other resources relating to Winston Churchill to quickly see where all the material is held, using a map-based data visualisation.
(2) Use linked data to aggregate the various archive resources and to enrich that data, facilitating links to resources available outside the archival domain.
- Academic researchers, lecturers, teachers, postgraduates
- Information professionals supporting research and teaching
- Software developers
Use Case Scenario
A researcher, Ella, is interested in people who have created, or who are significantly referred to in, archives about the Second World War, but has been finding it difficult to bring her research materials together, as the relevant archives are so widely distributed. She is particularly interested in ‘Sir Winston Churchill’ and wants to find archives created by him, or archives that have significant references to him. When Ella browses for ‘winston churchill’ on the Archives Hub, she discovers a fair scattering of Churchills as name entries, all apparently the same man, but it is difficult for her to get an overall sense of where these archives are held across the UK. Ella thinks it would be great if she could quickly see where all the materials on or by Churchill are held, to help her plan her research trips. Ella also thinks it would be great if the descriptions could include images and other information about Churchill that she knows is available on Wikipedia.
The aggregation and merging of the Hub data sources is enabled by the use of linked data. Linked data can also provide enrichment by linking to other data sources such as DBpedia. In the Archives Hub Linked Data, each instance of Churchill will have its own URI (a ‘conceptualisation’), and these will link to a single URI for Churchill the man. This allows all the Churchill descriptions to be pulled together. It would be possible to provide three levels of ‘significance’: (i) archives where the person in question is the ‘creator’ (the origination field), (ii) archives where the person in question is in the index terms (indicating they are a significant subject) and (iii) archives where the person is referred to in the text.
Dates would be a straightforward way to narrow the search, as these are normalised for all descriptions.
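The grouping by ‘significance’ and the narrowing by date described above can be sketched roughly as follows. This is only an illustrative sketch in Python, not the LOCAH data model: the `Reference` record, the role names, the `example.org` URIs and the archive identifiers are all hypothetical, and it assumes each description has already been linked to a single person URI and carries normalised start and end years.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical significance levels: 1 = creator, 2 = index term, 3 = text mention.
SIGNIFICANCE = {"creator": 1, "index_term": 2, "text_mention": 3}

@dataclass(frozen=True)
class Reference:
    person_uri: str   # URI for the person (hypothetical example.org URIs below)
    archive_id: str   # identifier of the archival description (hypothetical)
    repository: str   # where the archive is held
    role: str         # one of the keys in SIGNIFICANCE
    start_year: int   # normalised start date of the description
    end_year: int     # normalised end date of the description

def group_by_significance(refs, person_uri, year_from=None, year_to=None):
    """Group archive ids for one person by significance level.

    A description is kept if its date range overlaps [year_from, year_to];
    either bound may be None to leave that side open.
    """
    grouped = defaultdict(list)
    for ref in refs:
        if ref.person_uri != person_uri:
            continue
        if year_from is not None and ref.end_year < year_from:
            continue
        if year_to is not None and ref.start_year > year_to:
            continue
        grouped[SIGNIFICANCE[ref.role]].append(ref.archive_id)
    return dict(sorted(grouped.items()))

# Made-up sample data for illustration only.
CHURCHILL = "http://example.org/id/person/churchill"
OTHER = "http://example.org/id/person/someone-else"
refs = [
    Reference(CHURCHILL, "arch-1", "Repository A", "creator", 1900, 1965),
    Reference(CHURCHILL, "arch-2", "Repository B", "index_term", 1939, 1945),
    Reference(CHURCHILL, "arch-3", "Repository C", "text_mention", 1870, 1899),
    Reference(OTHER, "arch-4", "Repository A", "creator", 1939, 1945),
]

wartime = group_by_significance(refs, CHURCHILL, year_from=1939, year_to=1945)
```

Here `wartime` would hold only the descriptions whose dates overlap 1939 to 1945, keyed by significance level; the `repository` field is what a map-based view would plot.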
Existing Work (optional)
The LOCAH project is currently working to make records from the JISC-funded Archives Hub service, and records from the JISC-funded Copac service, available as Linked Data. In each case, the aim is to provide persistent URIs for the key entities described in that data, dereferencing to documents describing those entities. The information will be made available as web pages in XHTML containing RDFa and also as Linked Data in RDF/XML. SPARQL endpoints will be provided to enable the data to be queried. The aims and objectives of the project are described in more detail on the project blog: http://blogs.ukoln.ac.uk/locah/2010/07/23/locah-project-aims-objectives-and-final-outputs/
Related Vocabularies (optional)
A range of vocabularies and ontologies will be used to represent data produced by the LOCAH project:
The LOCAH project will provide an ontology to describe ISAD(G) standard finding aids and an ontology to describe MODS records for bibliographic resources. More details on the current status of these developments (as at 13th October 2010) are given on the project blog: http://blogs.ukoln.ac.uk/locah/2010/09/28/model-a-first-cut/ and http://blogs.ukoln.ac.uk/locah/2010/10/07/modelling-copac-data/.
As part of this work we expect to use some or all of the following:
- DC Terms
- Bibliontology (BIBO)
Problems and Limitations (optional)
Archives are described hierarchically, and this presents challenges for the output of Linked Data. In addition, descriptions are a combination of structured data and semi-structured data. There is no UK content standard for archival descriptions, which militates against the creation of consistent descriptions.
The ‘extent’ data of the Hub archive(s) is not straightforward to use, as this information is provided in a free text field, and measurements given include number of boxes, items, linear metres, etc. Also, giving an indication of the type of material would only be possible for descriptions that contain this information.
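To illustrate why the free-text extent field is awkward, the following Python sketch tries to pull quantities and units out of extent strings. It is an assumption-laden illustration rather than anything the Hub does: the unit list and the `parse_extent` function are hypothetical, and real extent statements would need a far larger vocabulary and would still leave many strings unparsed.

```python
import re

# Hypothetical, deliberately small unit vocabulary for illustration.
EXTENT_PATTERN = re.compile(
    r"(?P<value>\d+(?:\.\d+)?)\s*"
    r"(?P<unit>box(?:es)?|items?|linear\s+metres?|volumes?)",
    re.IGNORECASE,
)

# Map singular forms to one canonical plural form.
CANONICAL = {
    "box": "boxes",
    "item": "items",
    "linear metre": "linear metres",
    "volume": "volumes",
}

def parse_extent(text):
    """Return a list of (quantity, unit) pairs found in a free-text extent."""
    results = []
    for match in EXTENT_PATTERN.finditer(text):
        # Lower-case and collapse internal whitespace before normalising.
        unit = " ".join(match.group("unit").lower().split())
        unit = CANONICAL.get(unit, unit)
        results.append((float(match.group("value")), unit))
    return results
```

A string with no recognised unit yields an empty list, which is the case a map or summary view would have to treat as ‘extent unknown’ rather than zero.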
There are challenges around the data content. The Hub has numerous examples of inconsistencies, such as where the ‘creator’ is ‘Joe Bloggs and others’ rather than just a name, or where the access points do not have rules or a source associated with them.
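As a small illustration of the ‘Joe Bloggs and others’ problem, a creator string could be cleaned before it is matched to a person URI. The sketch below is hypothetical (the function name and the single pattern it handles are assumptions) and covers only this one inconsistency; it is not a description of the project's processing.

```python
import re

# Matches a trailing "and others" (optionally followed by a full stop).
_AND_OTHERS = re.compile(r"\s+and\s+others\.?\s*$", re.IGNORECASE)

def normalise_creator(raw):
    """Return (name, had_others) with a trailing 'and others' removed."""
    cleaned = raw.strip()
    name = _AND_OTHERS.sub("", cleaned)
    return name, name != cleaned
```

Keeping the `had_others` flag means the information that there were further, unnamed creators is not silently lost.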
Within EAD there are access points, or index terms, associated with the description. These are most commonly subject, name and place. It is only possible to refer to a very general ‘associated with’ relationship between the archive and the index terms provided for it (you cannot tell what the role of a person is if they are indexed – they may be an author, or referred to in some of the archives in some way).
Some additional general problems of providing linked data are described in our project blog post ‘Creating Linked Data: more reflections from the coal face’ at http://blogs.ukoln.ac.uk/locah/2010/09/22/creating-linked-data-more-reflections-from-the-coal-face/. They include:
- A lack of examples within our particular domain (archives)
- A lack of helpful information about how to create a data model
- Uncertainty about what sort of relationships between ‘things’ to include in our modelling
- The level of expertise needed to model data and output RDF
More information on the LOCAH project is available at http://blogs.ukoln.ac.uk/locah. The linked data we intend to provide is not yet published (as of 13th October 2010), but should be available soon. The project team is already testing a development instance of a SPARQL endpoint for the Archives Hub data.