Semantic search service across mapped multilingual thesauri in the agriculture domain

Contact e-mail: johannes.keizer # fao.org, margherita.sini # fao.org

Application

General purpose and services to the end user

The application shall provide a semantic search service that uses mapped thesauri to let users search with any terminology, in any language in which the thesauri are available, and retrieve information from resources that may have been indexed with any one of the mapped vocabularies.

Functionality examples

Typical functions are navigating resources, helping to build Boolean searches, and expanding searches by language or synonym.

Special strategies involved in the processing of user actions

Integration between vocabulary-linked functions and other application functions

The information given by the controlled vocabularies and the mappings will be used on top of free-text search. One possible process is to convert a free-text query into concepts, then expand the query with synonyms and translations before searching the text or metadata. Another, which also starts by converting the query into concepts, is to use the mapping links to expand the query that will be sent against text and metadata.
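The first process can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the service's actual implementation: the `Concept` class, the helper names and the word-by-word matching are hypothetical, the labels for code 1939 come from the AGROVOC extract in this document, and the synonym "Cow" is invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One thesaurus concept with its labels, keyed by language code."""
    code: int
    preferred: dict                                 # language -> descriptor label
    synonyms: dict = field(default_factory=dict)    # language -> list of non-descriptors


# Toy data: labels for 1939 and 1391 follow the AGROVOC extract;
# the synonym "Cow" is an invented non-descriptor.
CONCEPTS = [
    Concept(1939, {"EN": "Cows", "FR": "Vache", "ES": "Vaca"}, {"EN": ["Cow"]}),
    Concept(1391, {"EN": "Cattle"}),
]


def build_index(concepts):
    """Map every lower-cased label (preferred or synonym) to its concept."""
    index = {}
    for concept in concepts:
        for label in concept.preferred.values():
            index[label.lower()] = concept
        for labels in concept.synonyms.values():
            for label in labels:
                index[label.lower()] = concept
    return index


def expand_query(query, index):
    """Return the search terms for a free-text query.

    Each word that matches a label is replaced by all labels of its concept
    (translations and synonyms); unmatched words are kept as free text.
    """
    terms = []
    for word in query.lower().split():
        concept = index.get(word)
        if concept is None:
            candidates = [word]
        else:
            candidates = list(concept.preferred.values())
            for labels in concept.synonyms.values():
                candidates.extend(labels)
        for term in candidates:
            if term not in terms:
                terms.append(term)
    return terms


print(expand_query("cow diseases", build_index(CONCEPTS)))
```

A real service would need multi-word label matching and language detection; the sketch only matches single words to keep the expansion step visible.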

Additional references

Project website: http://www.fao.org/aims

Vocabularies

Titles

General characteristics (size, coverage) of the vocabulary

Scope: Agriculture in general, including forestry, fisheries, nutrition, environment, etc. Size: AGROVOC contains 35000 terms per 12 languages, that is 371691 terms (descriptors and non-descriptors). ASC contains 132 categories and subcategories, TKCS contains 100 categories and subcategories, FAOTERM contains 181 subjects ...

Language(s) in which the vocabulary is provided

AGROVOC contains a different number of terms per language:

ASC, TKCS and FAOTERM are available in the five official FAO languages (EN, AR, ZH, ES, FR).

Vocabulary extract

AGROVOC:

Term code: 1939
Term label: EN : Cows, FR : Vache, ES : Vaca, AR : بقرات , ZH : 母牛 , PT : Vaca, CS : krávy, JA : 雌牛 , TH : แม่โค  , SK : kravy, DE : KUH
BT : Cattle (code 1391)
NT : Suckler cows, Dairy cows (26767, 36875)
RT : Heifers, Cow milk, Milk yielding animals, Females (3535, 4833, 15969, 16080)
SNR : Females (15969)
Scope Note : Use only for cattle and zebu cattle; for other species use
"Females" (15969) plus the descriptor for the species

Other schemes, see: http://www.fao.org/aims/ag_classifschemes.jsp

Structure explanation

AGROVOC is made up of terms, each consisting of one or more words and always representing one and the same concept. Terms are divided into descriptors and non-descriptors; only descriptors are currently used for indexing. For each descriptor, a word block is displayed showing its hierarchical and other relations to other terms: BT (broader term), NT (narrower term), RT (related term), UF (non-descriptor). Non-descriptors, which can be thought of as "unused synonyms" of AGROVOC keywords, are followed by a reference (USE operator) to the descriptor of the corresponding concept.

Scope notes are used in AGROVOC to clarify the meaning of both descriptors and non-descriptors. Taxonomic and geographical terms are marked for easy searching, filtering and downloading.

Table of symbols used in the word blocks of descriptors and non-descriptors

Symbol  Meaning        Type
BT      broader term   descriptor
NT      narrower term  descriptor
RT      related term   descriptor
USE     use            non-descriptor
UF      used for       descriptor
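As an illustration of how these symbols might be used in software, the sketch below stores word blocks in a plain dictionary and resolves a non-descriptor to its descriptor through USE. The dictionary layout and the function names are assumptions made for this example, not the AGROVOC database format; the relations for "Cows" follow the vocabulary extract, while the non-descriptor "Cow" is invented.

```python
# Word blocks keyed by term label. Relations for "Cows" follow the extract;
# the non-descriptor "Cow" and its USE/UF pair are made up for illustration.
THESAURUS = {
    "Cows": {
        "BT": ["Cattle"],
        "NT": ["Suckler cows", "Dairy cows"],
        "RT": ["Heifers", "Cow milk", "Milk yielding animals", "Females"],
        "UF": ["Cow"],
    },
    "Cattle": {"NT": ["Cows"]},   # inverse of the BT link above
    "Cow": {"USE": "Cows"},       # non-descriptor pointing to its descriptor
}


def resolve(term):
    """Return the descriptor for a term, following USE for a non-descriptor."""
    return THESAURUS[term].get("USE", term)


def related(term, symbol):
    """Return the terms linked to `term` by BT, NT, RT or UF.

    Non-descriptors are resolved first, since only descriptors carry
    these relations.
    """
    return list(THESAURUS[resolve(term)].get(symbol, []))


print(resolve("Cow"))        # Cows
print(related("Cow", "BT"))  # ['Cattle']
```

Resolving through USE before following any other link keeps indexing and navigation on descriptors only, which matches the rule that non-descriptors are not used for indexing.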

The AIMS project also includes some further links, presented at http://www.fao.org/aims/cs_relationships.htm

Examples of these links are:

Machine-readable representation of the vocabulary

ftp://ftp.fao.org/gi/gil/gilws/aims/kos/agrovoc_formats/skos/2006

Software applications used to create and/or maintain the vocabulary, features lacking for the case

The current vocabulary management system lacks distributed maintenance (maintenance is currently centralised), text mining to help terminologists build the thesaurus, etc.

Structure of the database used to currently manage the vocabulary

The structure of the AGROVOC thesaurus in relational database format is available here: ftp://ftp.fao.org/gi/gil/gilws/aims/kos/agrovoc_formats/db_format.doc

Standards and guidelines considered during the design and construction of the vocabulary

The thesaurus was built in the early 1980s, so different standards have been followed over the years.

Management of changes

A new team has recently been formed to revise the AGROVOC thesaurus. The team consults experts and partners all over the world on changes to the terms. A new distributed maintenance system is under development.

Additional references

http://www.fao.org/aims/ag_intro.htm

Vocabulary mappings

Mapped vocabularies

AGROVOC and the Chinese Agricultural Thesaurus (CAT)

AGROVOC and the NAL (National Agricultural Library) thesaurus

Extracts of Mappings

For AGROVOC-CAT, the mapping was originally represented in Excel and subsequently in OWL.

CAT-ID  CAT-EN               Map       AG-ID  AG-EN                  AG-ID  AG-EN
30854   Senta flammea        Exact     9748   Cheena
50008   Mayetola destructor  Exact-OR  24260  Triticale (gramineae)  7949   Triticales (product)
1160    Two-shear sheep      NT1       3662   Hordeum vulgare

Files are available if needed. For AGROVOC-NAL the work is currently ongoing.

Types of mapping used

The mapping uses the links provided by the SKOS mapping vocabulary.
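To show how mapping links could drive query expansion across vocabularies, here is a small Python sketch that uses the three rows of the AGROVOC-CAT extract. The tuple layout, the function name `agrovoc_targets` and the default choice of which mapping types to follow are assumptions for illustration. In particular, treating "Exact-OR" as "either target is acceptable" is a guess at the intended meaning, not something the extract states.

```python
# Rows copied from the AGROVOC-CAT extract: (CAT id, mapping type, AGROVOC ids).
MAPPINGS = [
    (30854, "Exact", [9748]),
    (50008, "Exact-OR", [24260, 7949]),
    (1160, "NT1", [3662]),
]


def agrovoc_targets(cat_id, allowed=("Exact", "Exact-OR")):
    """Return the AGROVOC ids mapped from a CAT concept.

    Only mapping types listed in `allowed` are followed, so a caller can
    decide whether looser links such as NT1 should widen the query.
    """
    targets = []
    for source, kind, ids in MAPPINGS:
        if source == cat_id and kind in allowed:
            targets.extend(ids)
    return targets


print(agrovoc_targets(50008))                   # [24260, 7949]
print(agrovoc_targets(1160))                    # []
print(agrovoc_targets(1160, allowed=("NT1",)))  # [3662]
```

Restricting expansion to exact links by default favours precision; passing a wider `allowed` set trades precision for recall.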

Additional references

http://www2.db.dk/nkos2005/Margherita%20Sini.pdf

http://www.fao.org/docrep/008/af241e/af241e00.htm#Contents

http://www.efita.net/apps/accesbase/dbsommaire.asp?d=5841&t=0&identobj=TagfclQc&uid=57305290&sid=57305290&idk=1