Use Case AGRIS

From Library Linked Data
Revision as of 10:00, 19 October 2010 by Ebermes

Name

AGRIS indexing and searching

Owner

Food and Agriculture Organization of the United Nations. Contact person: Johannes Keizer <johannes.keizer@fao.org>

Background and Current Practice

Since 1975, the AGRIS (International Information System for the Agricultural Sciences and Technology) database has aggregated and disseminated bibliographic references, such as research papers, studies and theses, catalogued by more than 150 participating institutions in more than 100 countries. Each reference includes metadata such as conferences, researchers, publishers, institutions and subjects. The AGRIS collection covers a wide range of agriculture-related subjects, including forestry, animal husbandry, aquatic sciences and fisheries, human nutrition, and extension. Its content includes unique grey literature such as unpublished scientific and technical reports, theses, conference papers, government publications and more. AGRIS is the largest free collection of metadata on agricultural research, extension and innovation, and it is a valuable tool that gives students, researchers and librarians, especially in developing countries, access to agricultural knowledge.

AGRIS has a knowledge base of 2.6 million records. These records are highly structured and carry numerous semantic relations, both within the knowledge base itself and, potentially, to other resources on the web. At the moment these linkages are exploited neither for efficient use of AGRIS itself nor for linking AGRIS data to related data on the web. An AGRIS record presents itself only as a group of bibliographic data that does not automatically link to semantic equivalents within the AGRIS knowledge base or to other resources on the web. The reference itself is static and too often contains insufficient information.

Analysis shows that end users most often check the AGRIS result, find no full text, and search again for other online resources using other engines. However, the abstracts and other metadata tags included in the results are probably used as indicators that help end users decide whether it is worth pursuing the resource described by the retrieved record.
At the moment, only a “trivial linking” of combined terms from the AGRIS record to specific Google searches has been established. Although this already shows the enormous potential of using AGRIS metadata to link information from the web, it lacks semantic rigor and strength.

Goal

(1)

  • The AGRIS database publishes bibliographic records from worldwide agricultural libraries and documentation centers.
  • Fields are disambiguated using authority control.
  • Keywords from agricultural thesauri are added or removed, either manually or using automatic keyword extraction.
  • Users can access the data using fielded or non-fielded searches.
  • Search strings for other search engines are generated from keywords.
  • Related searches are generated from keywords.
  • Users can browse records by navigating subject hierarchies, related subjects, authors, journal titles, etc.
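The keyword-driven search strings and related searches listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the AGRIS implementation: the search engine base URL is an assumed example, quoting multi-word keywords as phrases is an assumed convention, and turning every pair of keywords into a "related search" is a hypothetical heuristic.

```python
from urllib.parse import urlencode

# Assumed example engine; any service that takes the query in a "q" parameter works alike.
SEARCH_BASE = "https://www.google.com/search"


def build_search_url(keywords, base=SEARCH_BASE):
    """Combine a record's keywords into one query string and return a search URL."""
    # Quote multi-word keywords so the engine treats each one as a phrase.
    terms = [f'"{k}"' if " " in k else k for k in keywords]
    return f"{base}?{urlencode({'q': ' '.join(terms)})}"


def related_searches(keywords):
    """Hypothetical heuristic: every pair of keywords becomes one related search."""
    return [f"{a} {b}" for i, a in enumerate(keywords) for b in keywords[i + 1:]]
```

Because the query is assembled from controlled keywords, the same function could target a different engine or web index simply by changing `base`.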

(2) The AGRIS linked data strategy focuses on two different objectives:

a. To establish AGRIS as a producer of linked data, exploiting the semantic richness of the AGRIS data by creating an open RDF dataset in agricultural sciences and exposing it to other web services that can consume and link to AGRIS data.

b. To establish AGRIS as a consumer of linked data by linking other open datasets to AGRIS, exploiting common vocabulary URIs, especially for subject vocabularies, authority control description schemes and other linked data bibliographic records.
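As an illustration of objective (b), the sketch below links local records to records from an external open dataset whenever both are indexed with the same subject concept URIs. It assumes that both datasets already use shared vocabulary URIs (for example, AGROVOC concepts); the record identifiers and `example.org` URIs in any input are hypothetical placeholders, and the function name is illustrative.

```python
def link_by_shared_subjects(local, external, min_shared=1):
    """Return (local_id, external_id, shared_uris) for record pairs sharing subject URIs.

    Both arguments map a record identifier to a set of subject concept URIs.
    """
    # Inverted index over the external dataset: concept URI -> external record ids.
    index = {}
    for ext_id, uris in external.items():
        for uri in uris:
            index.setdefault(uri, set()).add(ext_id)

    links = []
    for loc_id, uris in local.items():
        # Only external records sharing at least one concept are candidates.
        candidates = set()
        for uri in uris:
            candidates |= index.get(uri, set())
        for ext_id in sorted(candidates):
            shared = uris & external[ext_id]
            if len(shared) >= min_shared:
                links.append((loc_id, ext_id, sorted(shared)))
    return links
```

Raising `min_shared` is one simple way to trade recall for precision; a real linker would probably also weigh authority-controlled authors and journals.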

Target Audience

  • Individual users: students, publishers, librarians, researchers.
  • Information systems: document repositories, library catalogues.
  • Information service providers, such as NISC, Wolters Kluwer and NTIS, that periodically collect the AGRIS datasets for integration into their products.

Use Case Scenario

Data processing for AGRIS:

a. The AGRIS center of Kenya sends a batch of bibliographic records to AGRIS. AGRIS compares the data elements against AGRIS standard vocabularies such as AGROVOC, NAL and UNBIS, and normalizes the element semantics to AGRIS standard element sets. It then compares and disambiguates the content of the elements against the FAO Authority Description Concept Scheme (journals, authors and conferences).

b. The AGRIS indexer uses each new incoming record to search a web index (e.g. YaCy) for related resources, building queries from the title element, the combined author/subject elements, and the conference and journal elements of the record. AgroTagger then performs keyword extraction on the related results to produce a set of relevant related keywords, again based on standard vocabularies and authority descriptions.
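The two processing steps can be sketched as follows. This is a deliberately simplified illustration, not the AGRIS or AgroTagger implementation: the three-concept vocabulary and its `example.org` URIs are hypothetical stand-ins for AGROVOC, matching is exact label lookup after case and whitespace normalization, and keyword extraction is plain label counting over the text of the related results. Fetching results from a web index such as YaCy is left out, and all function names are illustrative.

```python
import re
from collections import Counter

# Hypothetical miniature vocabulary: concept URI -> labels. Real AGROVOC URIs
# and labels would be loaded from the published vocabulary instead.
VOCAB = {
    "http://example.org/agrovoc/c_strawberry": ["strawberries", "strawberry"],
    "http://example.org/agrovoc/c_organic": ["organic agriculture", "organic farming"],
    "http://example.org/agrovoc/c_nutrient": ["nutrients", "nutrient content"],
}


def norm(text):
    """Lower-case and collapse whitespace so labels compare independently of layout."""
    return " ".join(text.lower().split())


# Inverted index: normalized label -> concept URI.
LABEL_INDEX = {norm(label): uri for uri, labels in VOCAB.items() for label in labels}


def normalize_keywords(keywords):
    """Step a: map incoming keywords to concept URIs; return (matched, unmatched)."""
    matched, unmatched = [], []
    for kw in keywords:
        uri = LABEL_INDEX.get(norm(kw))
        if uri:
            matched.append(uri)
        else:
            unmatched.append(kw)
    return matched, unmatched


def build_queries(record):
    """Step b: queries from the title, author+subject pairs, and conference/journal."""
    queries = [record["title"]]
    for author in record.get("authors", []):
        for subject in record.get("subjects", []):
            queries.append(f"{author} {subject}")
    queries += [record[k] for k in ("conference", "journal") if record.get(k)]
    return queries


def extract_keywords(texts, top=3):
    """Count vocabulary labels in result texts; return the most frequent concept URIs."""
    counts = Counter()
    for text in texts:
        t = norm(text)
        for label, uri in LABEL_INDEX.items():
            # Whole-word matching, so a label never fires inside a longer word.
            counts[uri] += len(re.findall(rf"\b{re.escape(label)}\b", t))
    return [uri for uri, n in counts.most_common(top) if n > 0]
```

Unmatched keywords are returned separately rather than dropped, since in practice they would go to manual review or vocabulary maintenance.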

Usage by target audiences:

a. John is a graduate student at Makerere University. He is looking for resources on ‘organic strawberry production’.

b. He goes to the AGRIS web site and searches for ‘Organic strawberry production’, getting 65 results.

c. Alongside his record set, a sidebar shows him other information on ‘Organic strawberry production’, e.g. other databases containing bibliographic data on the topic, a list of known experts in the area, a link to a blog that discusses problems with pesticides in strawberry production, and links to research institutes with projects on plant protection for strawberries.

d. John is particularly interested in comparing the nutrient content of conventionally produced and organic strawberries. In the header of his record set he finds the concept “nutrients”. Clicking this concept narrows his record set to 3 articles about nutrients and organic strawberry production, one of which contains the comparison he is looking for.

e. He clicks the title of the record to get the full description, which shows that the article was written by T. Miller and published in the proceedings of the “3rd conference on Organic Strawberry production”. Along with the bibliographic record for the article, John gets links to instances of the full text of the article, a list of articles that cite this article, a list of T. Miller's other publications, a link to the proceedings of the conference and a link to the website of the conference organizers.

f. John copies the needed information into his notebook and then clicks the keyword “soil properties” (an AGROVOC term) to get another specific record set related to organic strawberry production.
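Steps d and f amount to faceted navigation over the subject concepts attached to each record. A minimal sketch, assuming each record in the result set carries a list of vocabulary concepts (plain labels here for readability; a linked data version would use concept URIs), could rank the concepts shown in the header and narrow the set when one is clicked. The record structure and function names are hypothetical.

```python
from collections import Counter


def facet_counts(records):
    """Count how many records in a result set are indexed with each concept."""
    # dict.fromkeys drops duplicate subjects within a record while keeping order.
    return Counter(s for r in records for s in dict.fromkeys(r["subjects"]))


def header_concepts(records, exclude, top=5):
    """Concepts to offer in the header: the most frequent ones not already in the query."""
    ranked = [c for c, _ in facet_counts(records).most_common() if c not in exclude]
    return ranked[:top]


def narrow(records, concept):
    """Keep only the records indexed with the selected concept (one click on a facet)."""
    return [r for r in records if concept in r["subjects"]]
```

Clicking a keyword in a full record, as in step f, would use the same `narrow` call with the clicked concept applied to a fresh search for the topic.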

Application of linked data for the given use case

  • Express bibliographic records in RDF.
  • Publish HTTP URIs and RDF descriptions for bibliographic records.
  • Provide standard discovery services, e.g. a SPARQL endpoint.
  • Use LOD records from subject vocabularies.
  • Use LOD authority descriptions for disambiguation.
  • Link LOD bibliographic databases at record level.
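The first points above can be illustrated with a small serializer that writes a record as N-Triples using vocabularies listed under Related Vocabularies. This is a sketch under assumptions, not the AGRIS RDF model: the `example.org` URI base, the choice of `bibo:Article` as the class, and the record structure are hypothetical, and only a few properties are shown.

```python
# Namespaces of vocabularies named under "Related Vocabularies".
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCTERMS = "http://purl.org/dc/terms/"
BIBO = "http://purl.org/ontology/bibo/"
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical URI base; a real service would mint its own dereferenceable HTTP URIs.
BASE = "http://example.org/agris/"


def literal(text):
    """Serialize a plain string literal with N-Triples escaping (backslash first)."""
    for raw, esc in (("\\", "\\\\"), ('"', '\\"'), ("\n", "\\n"), ("\r", "\\r")):
        text = text.replace(raw, esc)
    return f'"{text}"'


def record_to_ntriples(record):
    """Return N-Triples lines describing one bibliographic record and its authors."""
    s = f"<{BASE}record/{record['id']}>"
    triples = [
        (s, f"<{RDF_TYPE}>", f"<{BIBO}Article>"),
        (s, f"<{DCTERMS}title>", literal(record["title"])),
    ]
    for author in record.get("authors", []):
        # Authors are linked by URI, so one authority entry can serve many records.
        person = f"<{BASE}person/{author['id']}>"
        triples.append((s, f"<{DCTERMS}creator>", person))
        triples.append((person, f"<{RDF_TYPE}>", f"<{FOAF}Person>"))
        triples.append((person, f"<{FOAF}name>", literal(author["name"])))
    for concept in record.get("subjects", []):
        # Subjects reuse vocabulary concept URIs (e.g. AGROVOC) instead of strings.
        triples.append((s, f"<{DCTERMS}subject>", f"<{concept}>"))
    return [f"{a} {b} {c} ." for a, b, c in triples]
```

Plain string building keeps the example dependency-free; in practice an RDF library would handle serialization and escaping, and the output could be loaded into any triple store that offers a SPARQL endpoint.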

Existing Work

AGRIS metadata elements have been naively mapped to related terms of existing RDF vocabularies such as AGROVOC. A subset of the AGRIS XML data was then converted to RDF and stored in Sesame, an open source framework for storing, querying and analyzing RDF data, which exposes it through a SPARQL endpoint. This allowed us to discover useful information about our data, such as data inconsistencies and relationships between entities.
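A query of the kind that can surface such inconsistencies could look like the sketch below, which asks for person resources sharing an identical `foaf:name` (possible duplicates awaiting disambiguation) and encodes it as a GET request using the `query` parameter of the SPARQL 1.1 Protocol. The endpoint URL is a hypothetical placeholder, the query is illustrative rather than one actually run on the AGRIS data, and `run_query` needs a live endpoint that returns SPARQL JSON results.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical endpoint URL; substitute the repository's actual SPARQL endpoint.
ENDPOINT = "http://example.org/sparql"

# Person resources that share an identical foaf:name: candidates for merging.
DUPLICATE_NAMES = """PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?a ?b WHERE {
  ?a foaf:name ?name .
  ?b foaf:name ?name .
  FILTER (STR(?a) < STR(?b))
}"""


def sparql_get_url(endpoint, query):
    """Encode a query as an HTTP GET URL using the protocol's 'query' parameter."""
    return f"{endpoint}?{urlencode({'query': query})}"


def run_query(endpoint, query):
    """Send the query and return the bindings from the SPARQL JSON results format."""
    req = Request(sparql_get_url(endpoint, query),
                  headers={"Accept": "application/sparql-results+json"})
    with urlopen(req) as resp:
        return json.load(resp)["results"]["bindings"]
```

The `STR(?a) < STR(?b)` filter excludes a resource matching itself and reports each pair only once.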

Related Vocabularies

With the ultimate aim of ensuring interoperability, the following vocabularies were used in the experimental phase:

  • SKOS [1]
  • BIBO [2]
  • FOAF [3]
  • DC and DCTerms [4]

Problems and Limitations

The first of the four rules for publishing data on the Web described by Tim Berners-Lee [5] states that things should be identified with URIs. This is the first major challenge for AGRIS data, which come from heterogeneous sources where the semantics are not always defined. Disambiguating authors, journals and conferences is a particularly daunting task.
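One common first step toward author disambiguation, shown here as a hedged sketch rather than the method AGRIS uses, is blocking: reducing each name string to a coarse key (here surname plus initials, with accents stripped) so that only names sharing a key need to be compared in detail. The key format and function names are assumptions. The sketch handles only the "Surname, Given" and "Given Surname" orders, and it deliberately over-groups: two different people called T. Miller share a key, so other evidence such as affiliations, co-authors or subjects is still needed to decide which variants belong to the same authority entry.

```python
import re
import unicodedata


def author_key(name):
    """Reduce an author string to 'surname|initials' for use as a blocking key."""
    # Strip accents: NFKD splits "ü" into "u" plus a combining mark, which ASCII drops.
    plain = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    if "," in plain:
        surname, given = (p.strip() for p in plain.split(",", 1))
    else:
        parts = plain.split()
        if not parts:
            return ""
        surname, given = parts[-1], " ".join(parts[:-1])
    # Initials from given names split on spaces, dots and hyphens ("J.-P." -> "jp").
    initials = "".join(w[0] for w in re.split(r"[\s.\-]+", given) if w)
    return f"{surname.lower()}|{initials.lower()}"


def group_variants(names):
    """Group name strings by blocking key; each group is a set of merge candidates."""
    groups = {}
    for name in names:
        groups.setdefault(author_key(name), []).append(name)
    return groups
```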

Related Use Cases

References

[1] SKOS Simple Knowledge Organization System. http://www.w3.org/2004/02/skos/

[2] Bibliographic Ontology website. http://bibliontology.com/

[3] The Friend of a Friend (FOAF) project. http://www.foaf-project.org/

[4] Dublin Core Metadata Initiative. http://dublincore.org/

[5] Tim Berners-Lee, Linked Data (Design Issues). http://www.w3.org/DesignIssues/LinkedData.html