Use Case Collecting material related to courses at The Open University

Owen Stephens LUCERO Project Manager Library and Learning Resources Centre The Open University Walton Hall Milton Keynes, MK7 6AA

T: +44 (0) 1908 858701 F: +44 (0) 1908 653571 E: o..stephens@open.ac.uk

Background and Current Practice

Currently, a student wishing to discover all of the material - books, DVDs, CDs, TV programmes, podcasts, Open Educational Resources, etc. - related to a specific Open University course (aka module) would have to consult a different data source, with a different system and interface, for each type of resource required, explore the results and integrate them manually. In a similar scenario, the same resources are needed by lecturers creating new courses or tutorials, as well as by researchers connecting the results of their research to existing resources.


(1) SEARCH/BROWSE: In this scenario a student finds all the material directly related to their course/module, as well as supplementary material reached through direct links, through cross-repository links such as subject classifications, and through indirect (and possibly external) links such as “the people involved in the creation of the resources”.

(2) URIS, RELATE (new, existing), REUSE-SCHEMAS (SKOS): This is realized by publishing relevant data sets as linked data, where standardised URI schemes are used to connect Open University resources across repositories, as well as through external links, e.g. to datasets at the BBC, UK government, DBpedia and other publication/library resources. Common resources include subject classifications, which are used to classify courses/modules, publications, A/V material and others, and which are expressed in SKOS and mapped to each other. We expect resources such as people, courses, topics and publications to play a central role in connecting Open University datasets to each other, and to external resources.

Target Audience

Open University current and potential students

Use Case Scenario

Fiona is a student who has previously studied the module "The arts past and present (AA100)". She is interested in taking a Level 2 course as a follow-up and is considering "Art and its histories (A216)". Having read the basic description available online, she decides to find out a bit more about the course by looking at the related course material. With a single click from the course description she is able to view a page that lists the 31 related pieces of course material, and is able to see that the course materials include a rich collection of printed and audio-visual material. For some pieces of audio-visual material she also finds links to transcripts, enabling her to understand the type of content further.

A link to her local library service enables her to find that one of the books, 'Gender and Art', is available via her local public library service in Birmingham. She borrows and reads the book in advance of signing up to the course.

While browsing the page of course material, Fiona finds that one part of the course material, "Textiles in Ghana", is also being made openly available on the Open Learn platform (http://openlearn.open.ac.uk/course/view.php?id=3427), and works through this material to get a flavour of the course.

Further to this, she sees additional supplementary material listed including a series of video podcasts also on the Textiles of Ghana, which she watches and which enhances her study of the Open Learn material. Having invested a considerable amount of research into the course, Fiona decides that she is happy to commit both the time and money to studying the course.

Application of linked data for the given use case

The page that Fiona, the student, views with related course material and supplementary material is backed by RDF data provided by The Open University’s open linked data endpoint.

One SPARQL query finds material (books, DVDs, CDs, etc.) that is catalogued in the library. Another query finds relevant material in the Open Learn platform of freely available material. A third query uses a subject classification linked to the course (the classification scheme is expressed as SKOS and stored in the triple store) to find all related podcasts available from http://podcasts.open.ac.uk.
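The three queries described above might be sketched as follows. This is only an illustration: the course URI pattern, the ex: prefix, and the ex:relatesToCourse, ex:LibraryItem, ex:OpenLearnUnit and ex:Podcast terms are hypothetical placeholders, not the vocabulary actually used by the LUCERO project. Only the dcterms: and skos: namespaces are real.

```python
import re

# Hypothetical URI pattern for a course; the real scheme may differ.
COURSE_URI = "http://data.open.ac.uk/course/{code}"

PREFIXES = (
    "PREFIX dcterms: <http://purl.org/dc/terms/>\n"
    "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
    # Placeholder namespace for terms invented for this sketch.
    "PREFIX ex: <http://example.org/lucero-sketch#>\n"
)


def course_uri(code):
    """Build a course URI, rejecting anything that is not a plain course code."""
    if not re.fullmatch(r"[A-Za-z]{1,4}\d{3}", code):
        raise ValueError(f"unexpected course code: {code!r}")
    return COURSE_URI.format(code=code.lower())


def direct_query(code, item_class):
    """Items of one (hypothetical) class that are linked directly to the course."""
    return PREFIXES + f"""
SELECT ?item ?title WHERE {{
  ?item a ex:{item_class} ;
        ex:relatesToCourse <{course_uri(code)}> ;
        dcterms:title ?title .
}}
"""


def library_query(code):
    return direct_query(code, "LibraryItem")


def openlearn_query(code):
    return direct_query(code, "OpenLearnUnit")


def podcast_query(code):
    """Podcasts that share a SKOS subject concept with the course."""
    return PREFIXES + f"""
SELECT ?item ?title WHERE {{
  <{course_uri(code)}> dcterms:subject ?topic .
  ?topic a skos:Concept .
  ?item a ex:Podcast ;
        dcterms:subject ?topic ;
        dcterms:title ?title .
}}
"""
```

Each function returns a query string that could be sent to a SPARQL endpoint. Validating the course code before it is placed in the query avoids building malformed queries from unexpected input.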

All of the data sets available have been transformed into RDF and loaded into the triple store by the LUCERO project (see "Existing Work" below) - the exact mechanism for this depends on the data set. For the library catalogue material this is based on the transformation of a MARC export, which is filtered to include only records related to Open University courses (where we can be highly confident the records were created locally).
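The filtering step for the MARC export could look roughly like the sketch below. It rests on several assumptions that are not taken from the project: records are represented as plain dictionaries mapping a MARC tag to a list of field values, the course code is assumed to appear in a note field (500 or 590), and course codes are assumed to look like "A216" or "AA100". The actual export format and the place where the course is recorded may well differ.

```python
import re

# Assumed shape of an Open University course code, e.g. "A216" or "AA100".
COURSE_CODE = re.compile(r"\b[A-Z]{1,4}\d{3}\b")

# Assumption: course codes are recorded in general (500) or local (590) notes.
NOTE_TAGS = ("500", "590")


def course_codes(record):
    """Return the set of course codes mentioned in a record's note fields."""
    codes = set()
    for tag in NOTE_TAGS:
        for value in record.get(tag, []):
            codes.update(COURSE_CODE.findall(value))
    return codes


def filter_course_records(records):
    """Keep only the records that mention at least one course code."""
    return [record for record in records if course_codes(record)]


if __name__ == "__main__":
    export = [
        {"245": ["Gender and art"], "590": ["Course material for A216"]},
        {"245": ["An unrelated title"], "500": ["Includes index"]},
    ]
    for record in filter_course_records(export):
        print(record["245"][0], sorted(course_codes(record)))
```

A pattern match on free text is fragile, which is one reason inconsistent cataloguing of the course relationship matters for this use case.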

The results of the SPARQL queries are brought together and displayed in a web page by an application, and for each relevant item there is a link back to the full record/item on the originating system, enabling Fiona to link to the material on Open Learn, the podcasts, the full library catalogue record etc.
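As a rough illustration of how an application might bring the results together, the sketch below merges several result sets in the standard SPARQL JSON results format into one list, keeping a link back to each item. The variable names ?item and ?title, the source labels and the sample URIs are assumptions made for this example.

```python
def merge_results(results_by_source):
    """Merge SPARQL JSON result sets into one list of display entries.

    results_by_source maps a label (e.g. "Library") to a parsed SPARQL JSON
    result. Each binding is assumed to carry an ?item URI and, optionally,
    a ?title literal. Items already seen under an earlier source are skipped.
    """
    seen = set()
    merged = []
    for source, result in results_by_source.items():
        for row in result["results"]["bindings"]:
            link = row["item"]["value"]
            if link in seen:
                continue
            seen.add(link)
            # Fall back to the URI when no title was returned.
            title = row.get("title", {}).get("value", link)
            merged.append({"source": source, "title": title, "link": link})
    return merged


if __name__ == "__main__":
    library = {"results": {"bindings": [
        {"item": {"type": "uri", "value": "http://example.org/lib/1"},
         "title": {"type": "literal", "value": "Gender and Art"}},
    ]}}
    podcasts = {"results": {"bindings": [
        {"item": {"type": "uri", "value": "http://example.org/pod/7"}},
    ]}}
    for entry in merge_results({"Library": library, "Podcasts": podcasts}):
        print(f'{entry["source"]}: {entry["title"]} <{entry["link"]}>')
```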

Existing Work

The LUCERO (Linking University Content for Education and Research Online, see http://lucero.open.ac.uk) Project at the Open University (also funded by JISC) is investigating and prototyping the use of linked data technologies and approaches to linking and exposing data for students and researchers.

LUCERO is working on exposing a number of Open University data sets as Linked Data, including:

  • Course Information
  • Research Publications recorded in the Open University Research Online (ORO) repository
  • People information (specifically OU staff)
  • Podcasts
  • Course material metadata - this is descriptive information from the Open University library catalogue covering books and other media (e.g. DVDs, CDs) but excluding online material
  • A number of Open University research and teaching material resources

Through this activity LUCERO aims in particular to answer the following questions:

"What are the workflows, business processes, policies and technologies needed to expose the Open University and related digital content as linked data?"

"How can we integrate linked data technology in a sustainable way to support the research and educational activities of a Further or Higher Education organisation?"

The use case described here is one example of what will be enabled by the project. It is believed that the publication of a substantial number of data sets as Linked Data will enable many other scenarios, including ones not specifically involving Library Linked Data.

Related Vocabularies

A range of ontologies will be used to represent data in the LUCERO project (see 'Existing Work' above). In the use case described it is likely the following vocabularies would be used:

  • DC Terms
  • Bibliontology (BIBO)
  • FOAF
  • W3C media ontology
  • SKOS
  • An ontology to describe courses/modules currently being created by the project

Problems and Limitations

There are no specific technical challenges anticipated that would prevent this use case being realised.

However, there are specific challenges to transferring library data to RDF which the project has encountered. Unless specifically flagged, the challenges below are seen as applying in general to library data recorded in MARC21 using AACR2 cataloguing practices, and are not limited to this specific use case or related to any local practices at the Open University:

  • There is not yet a standard way of representing bibliographic data (or related library data) as RDF (either agreed via standards bodies, or simply through standard practices)
  • Existing library records represent a specific modelling of data, one that tends to emphasise the 'carrier' over the 'content'.
  • For some items, detailed content information is held in free text fields within the library MARC record (e.g. MARC 505 table of contents field)
  • The course material recorded in the library catalogue at the Open University is more heterogeneous in reality than is perhaps expressed in the library catalogue records. Specifically there are questions as to whether it is appropriate to model audio-visual material in the same way as print material – this question applies more generally across the library sector.
  • Some material format information is stored in free text fields (e.g. MARC 300)
  • Some existing work to model library catalogue data seems to be focused on 'bibliographic' material - a specific example is the proposed isbd ontology (http://metadataregistry.org/vocabulary/list.html). It is not clear it would be appropriate to apply isbd properties to audio-visual material
  • Library data can be messy and inconsistent. Where the MARC record does not have a specific place to record a piece of information (e.g. in this use case the course to which the material is related) it can result in inconsistent cataloguing practice over long periods of time. Also, many pieces of data you may wish to standardise as single entities in RDF (e.g. Publisher, Place of Publication) are recorded as free text in MARC, with no 'authority control', and so may require work to map to single entities, or may result in accidental duplicate identities being created for the same entity
  • Best practices in connecting library material and other types of resources, including courses (e.g., through reading lists, references), A/V material, and available open educational material (which might be repurposed from existing resources) need to be created.
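To illustrate the free-text problem noted in the list above (publisher names recorded without authority control), the following sketch groups variant spellings under a crude normalised key, so that a single URI could be minted per group rather than per string. The stop-word list and the sample strings are invented for illustration. Real reconciliation would need considerably more care, since a key like this can both miss true matches and merge distinct entities.

```python
import re
import unicodedata
from collections import defaultdict

# Illustrative words to ignore when comparing names; not an agreed list.
STOP_WORDS = {"the", "ltd", "limited", "inc", "co"}


def normalise(name):
    """Reduce a free-text name to a crude comparison key."""
    # Strip accents, lower-case, and replace punctuation with spaces.
    text = unicodedata.normalize("NFKD", name)
    text = text.encode("ascii", "ignore").decode("ascii").lower()
    text = re.sub(r"[^a-z0-9]+", " ", text)
    words = [word for word in text.split() if word not in STOP_WORDS]
    return "-".join(words)


def group_variants(names):
    """Group the distinct strings that share a normalised key."""
    groups = defaultdict(set)
    for name in names:
        key = normalise(name)
        if key:  # skip strings that normalise to nothing
            groups[key].add(name)
    return {key: sorted(variants) for key, variants in groups.items()}


if __name__ == "__main__":
    publishers = [
        "The Open University.",
        "Open University",
        "OPEN UNIVERSITY :",
        "Yale University Press,",
    ]
    for key, variants in group_variants(publishers).items():
        print(key, variants)
```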

It should be noted that traditional print-centric cataloguing practice was not intended to meet the purposes and requirements outlined in this use case, and that practices change over time. At the Open University, cataloguing practices will change to accommodate linked data developments, informed by the LUCERO project recommendations.

Related Use Cases and Unanticipated Uses

The data sets being exposed by LUCERO (see "Existing Work" above) will enable a very wide range of queries to be answered that would currently require several data sets to be consulted separately, for example 'find all the material authored or produced by a specific member of staff' or 'I enjoyed this podcast, what courses are related?'. There is also the possibility of combining other information to allow the data to be filtered by specific criteria, e.g. by combining information about the accessibility of library resources with course information, it may be possible to allow a student with a disability to filter out courses which include materials that they would not be able to access.

Combined with further data sets related to research publications (from the Open University's repository, ORO), information about Open University locations (campuses, offices, buildings), etc., there will be a rich network of data sets that can be queried via SPARQL to expose relationships.


A fuller description of the LUCERO project is available at http://lucero-project.info/, and the existing data already available is described briefly at http://lucero-project.info/lb/2010/10/first-version-of-data-open-ac-uk/ and http://data.open.ac.uk. Note that Library data is not yet published (as of 15/10/2010), although it is being actively worked on.

The SPARQL endpoint in the use case already exists at http://data.open.ac.uk/query