Semantic search service across mapped multilingual thesauri in the agriculture domain
Contact e-mail: johannes.keizer # fao.org, margherita.sini # fao.org
Application
General purpose and services to the end user
The application shall provide a semantic search service that uses mapped thesauri to let users query with any terminology, in any of the languages in which the thesauri are available, and retrieve information from resources that may have been indexed with any one of the mapped vocabularies.
Functionality examples
Typical functions are navigating resources, helping to build Boolean searches, and expanding searches by language or by synonyms.
Special strategies involved in the processing of user actions
- concept identification, i.e. linking a free-text query to a concept from the controlled vocabularies based on a matching label
- query expansion, using synonyms and translations, but also the hierarchical and associative semantic links found within a single thesaurus, as well as the mapping links between thesauri
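As a rough illustration of concept identification, the sketch below matches a free-text query against the labels of all concepts, in every language, after a simple normalisation (case folding, removal of diacritics, collapsing of whitespace), preferring the longest label that matches at each position. The function names and the in-memory label dictionary are hypothetical. Splitting the query on whitespace is an assumption that does not suit Chinese, Japanese or Thai, and stripping combining marks may conflate labels in scripts where those marks matter. The labels for codes 1939 and 1391 are taken from the AGROVOC extract in this document.

```python
import unicodedata


def normalize(label: str) -> str:
    """Case-fold, strip combining marks and collapse whitespace ('Vaché ' -> 'vache')."""
    decomposed = unicodedata.normalize("NFKD", label.casefold())
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_marks.split())


def build_label_index(labels: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map every normalized label, in any language, to the concept IDs that use it."""
    index: dict[str, set[str]] = {}
    for concept_id, concept_labels in labels.items():
        for label in concept_labels:
            index.setdefault(normalize(label), set()).add(concept_id)
    return index


def identify_concepts(query: str, index: dict[str, set[str]], max_words: int = 4) -> set[str]:
    """Greedy longest match of query word sequences against the label index."""
    words = normalize(query).split()
    found: set[str] = set()
    i = 0
    while i < len(words):
        # Try the longest phrase starting at word i first, then shorter ones.
        for n in range(min(max_words, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in index:
                found |= index[phrase]
                i += n
                break
        else:
            i += 1  # no label starts at this word; skip it
    return found


if __name__ == "__main__":
    index = build_label_index({"1939": ["Cows", "Vache", "Vaca", "KUH", "krávy"],
                               "1391": ["Cattle"]})
    print(identify_concepts("Vache and cattle", index))
```

A production service would more likely rely on a search engine's language analysers and fuzzy matching; the greedy longest match here only prevents a multi-word label from being split into its parts.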
Integration between vocabulary-linked functions and other application functions
The information provided by the controlled vocabularies and the mappings will be used on top of free-text search. One possible process is to convert a free-text query into concepts, then expand the query with synonyms and translations before searching the text or metadata. Another is, starting from the same conversion of the query into concepts, to use the mapping links to expand the query that will be sent against text and metadata.
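Both processes can be sketched in a few lines, assuming the identified concepts, their labels (synonyms and translations) and the mapping links are already available as plain dictionaries. Each concept becomes an OR group of its labels and the groups are combined with AND; setting `follow_mappings` also adds the labels of the mapped concepts from the other thesaurus. The `agrovoc:` and `cat:` prefixes, the concept `cat:example` and its labels are hypothetical, and the output is a generic Boolean query string rather than the syntax of any particular search engine.

```python
def quote(label: str) -> str:
    """Wrap a label in double quotes so multi-word labels are searched as phrases."""
    return '"' + label.replace('"', '\\"') + '"'


def expand_concept(concept: str, labels: dict[str, list[str]],
                   mappings: dict[str, list[str]], follow_mappings: bool) -> list[str]:
    """All labels of a concept, plus the labels of its mapped concepts if requested."""
    candidates = list(labels.get(concept, []))
    if follow_mappings:
        for target in mappings.get(concept, []):
            candidates.extend(labels.get(target, []))
    seen: set[str] = set()
    unique: list[str] = []
    for label in candidates:
        key = label.casefold()
        if key not in seen:  # drop case-insensitive duplicates, keep the first spelling
            seen.add(key)
            unique.append(label)
    return unique


def build_query(concepts: list[str], labels: dict[str, list[str]],
                mappings: dict[str, list[str]], follow_mappings: bool = False) -> str:
    """One OR group per concept, groups joined with AND."""
    groups = []
    for concept in concepts:
        terms = expand_concept(concept, labels, mappings, follow_mappings)
        if terms:
            groups.append("(" + " OR ".join(quote(t) for t in terms) + ")")
    return " AND ".join(groups)


if __name__ == "__main__":
    labels = {"agrovoc:1939": ["Cows", "Vache", "Vaca"],
              "cat:example": ["cow", "vaca"]}  # hypothetical CAT concept
    mappings = {"agrovoc:1939": ["cat:example"]}
    print(build_query(["agrovoc:1939"], labels, mappings, follow_mappings=True))
```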
Additional references
Project website: http://www.fao.org/aims
Vocabularies
Titles
- FAO AGROVOC Thesaurus
- FAO Agris/Caris Classification Scheme (ASC)
- FAO Technical Knowledge Classification Scheme (TKCS)
- FAOTERM SUBJECTS
- Chinese Agricultural Thesaurus (CAT), maintained by the Chinese Academy of Agricultural Sciences
General characteristics (size, coverage) of the vocabulary
Scope: agriculture in general, including forestry, fisheries, nutrition, environment, etc.
Size: AGROVOC contains on the order of 35000 terms per language in 12 languages, i.e. 371691 terms in total (descriptors and non-descriptors). ASC contains 132 categories and subcategories, TKCS contains 100 categories and subcategories, and FAOTERM contains 181 subjects ...
Language(s) in which the vocabulary is provided
AGROVOC contains a different number of terms in each language:
- AR: 25913
- CS: 38667
- DE: 25215
- EN: 39240
- ES: 41624
- FR: 38342
- JA: 38655
- PT: 36428
- SK: 25794
- TH: 25416
- ZH: 36408
ASC, TKCS, FAOTERM come in the 5 FAO official languages (EN, AR, ZH, ES, FR)
Vocabulary extract
AGROVOC:
Term code: 1939
Term labels: EN: Cows, FR: Vache, ES: Vaca, AR: بقرات, ZH: 母牛, PT: Vaca, CS: krávy, JA: 雌牛, TH: แม่โค, SK: kravy, DE: KUH
BT: Cattle (code 1391)
NT: Suckler cows, Dairy cows (codes 26767, 36875)
RT: Heifers, Cow milk, Milk yielding animals, Females (codes 3535, 4833, 15969, 16080)
SNR: Females (15969)
Scope note: Use only for cattle and zebu cattle; for other species use "Females" (15969) plus the descriptor for the species.
For the other schemes, see: http://www.fao.org/aims/ag_classifschemes.jsp
Structure explanation
AGROVOC is made up of terms, each consisting of one or more words and always representing one and the same concept. Terms are divided into descriptors and non-descriptors; only descriptors are currently used for indexing. For each descriptor, a word block is displayed showing its relationships to other terms: BT (broader term), NT (narrower term), RT (related term) and UF (used for, listing the non-descriptors). Non-descriptors, which can be thought of as "non-used synonyms" of AGROVOC keywords, are followed by a reference (USE operator) to the descriptor of the corresponding concept.
Scope notes are used in AGROVOC to clarify the meaning of both descriptors and non-descriptors. Taxonomic and geographical terms are marked for easy searching, filtering and downloading.
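The descriptor/non-descriptor distinction can be modelled with a small data structure, as a sketch: a term with a `use` target is a non-descriptor, indexing always resolves it to that descriptor, and a descriptor's UF entries can be derived from the non-descriptors that point to it. The class and function names are hypothetical, and the non-descriptor "Kine" is invented for illustration; the other labels come from the "Cows" extract in this document.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Term:
    label: str
    use: Optional[str] = None  # set only for non-descriptors: the descriptor to use instead
    bt: list[str] = field(default_factory=list)
    nt: list[str] = field(default_factory=list)
    rt: list[str] = field(default_factory=list)

    @property
    def is_descriptor(self) -> bool:
        return self.use is None


def indexing_term(label: str, terms: dict[str, Term]) -> str:
    """Return the descriptor to index with: the term itself or its USE target."""
    term = terms[label]
    return label if term.is_descriptor else term.use


def word_block(label: str, terms: dict[str, Term]) -> dict[str, list[str]]:
    """Build the word block shown for a term; UF is derived from USE references."""
    term = terms[label]
    if not term.is_descriptor:
        return {"USE": [term.use]}
    uf = sorted(t.label for t in terms.values() if t.use == label)
    return {"BT": term.bt, "NT": term.nt, "RT": term.rt, "UF": uf}


if __name__ == "__main__":
    terms = {
        "Cows": Term("Cows", bt=["Cattle"], nt=["Suckler cows", "Dairy cows"],
                     rt=["Heifers", "Cow milk", "Milk yielding animals", "Females"]),
        "Kine": Term("Kine", use="Cows"),  # hypothetical non-descriptor
        "Cattle": Term("Cattle"),
    }
    print(indexing_term("Kine", terms), word_block("Cows", terms))
```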
Table of symbols used in the word blocks of descriptors and non-descriptors
Symbol  Meaning         Type
BT      broader term    descriptor
NT      narrower term   descriptor
RT      related term    descriptor
USE     use             non-descriptor
UF      used for        descriptor
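Because BT/NT and USE/UF are inverse pairs and RT is symmetric, a maintenance tool can check that every link has its reciprocal. A minimal sketch, assuming links are stored as (subject, symbol, object) triples; the inverse table follows the symbols listed above, the function name is hypothetical, and the non-descriptor "Kine" is invented for the example.

```python
# Inverse of each word-block symbol; RT is its own inverse.
INVERSE = {"BT": "NT", "NT": "BT", "RT": "RT", "USE": "UF", "UF": "USE"}


def missing_reciprocals(links: set[tuple[str, str, str]]) -> set[tuple[str, str, str]]:
    """Return the inverse links that should exist but are absent."""
    return {
        (obj, INVERSE[rel], subj)
        for (subj, rel, obj) in links
        if (obj, INVERSE[rel], subj) not in links
    }


if __name__ == "__main__":
    links = {("Cows", "BT", "Cattle"), ("Cattle", "NT", "Cows"),
             ("Cows", "RT", "Heifers"), ("Kine", "USE", "Cows")}
    print(missing_reciprocals(links))
```

Adding the returned triples back to the set makes it consistent, so the same function can drive either a report or an automatic repair.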
The AIMS project actually defines further relationships, presented at http://www.fao.org/aims/cs_relationships.htm:
- Traditional thesaurus relationships: Broader Term (BT); Narrower Term (NT); Is Referenced in Scope Note (SNX); Scope Note Reference (SNR); See (SEE); Seen For (SF); Use (USE); Used For (UF); Used For+ (UF+)
- Concept-to-Concept specific relationships (subclass of; caused by; member of; part of; etc)
- Term-to-Term relationships (related term; synonym; translation)
- String-to-String relationships (spelling variant; acronym)
Examples of these links are:
("bucket", synonym, "pail")
("Corp.", abbreviation_of, "Corporation")
("Food and Agriculture Organization", acronym, "FAO")
("organisation", spelling_variant, "organization")
("vache" (FR), translation, "cow" (EN))
("African violet", scientific_taxonomic_name, "Saintpaulia")
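Typed links such as these can be stored as simple triples and queried for the strings related to a given one, which is one way a search interface could offer spelling variants, acronyms or translations. This sketch follows each link in both directions and can be restricted to selected relation types; it drops the (FR)/(EN) language tags for brevity, and the function name is hypothetical.

```python
from typing import AbstractSet, Optional

LINKS = [
    ("bucket", "synonym", "pail"),
    ("Corp.", "abbreviation_of", "Corporation"),
    ("Food and Agriculture Organization", "acronym", "FAO"),
    ("organisation", "spelling_variant", "organization"),
    ("vache", "translation", "cow"),
    ("African violet", "scientific_taxonomic_name", "Saintpaulia"),
]


def related_strings(s: str, links: list[tuple[str, str, str]],
                    relations: Optional[AbstractSet[str]] = None) -> set[str]:
    """One-hop neighbours of s, following links in either direction."""
    found: set[str] = set()
    for subj, rel, obj in links:
        if relations is not None and rel not in relations:
            continue  # relation type not requested
        if subj == s:
            found.add(obj)
        elif obj == s:
            found.add(subj)
    return found


if __name__ == "__main__":
    print(related_strings("FAO", LINKS))
    print(related_strings("organization", LINKS, {"spelling_variant"}))
```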
Machine-readable representation of the vocabulary
ftp://ftp.fao.org/gi/gil/gilws/aims/kos/agrovoc_formats/skos/2006
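To give an idea of what a SKOS rendering of the "Cows" record can look like, the sketch below serialises it as Turtle using only string formatting. The namespace `http://example.org/agrovoc/c_` is a placeholder, not the URI scheme of the published AGROVOC SKOS files, and the function name is hypothetical. The SKOS Core properties used (`skos:prefLabel`, `skos:broader`, `skos:narrower`, `skos:related`, `skos:scopeNote`) are standard, but how the published files actually encode AGROVOC should be checked against the files themselves.

```python
from typing import Iterable, Optional

NS = "http://example.org/agrovoc/c_"  # hypothetical namespace, not the real AGROVOC URIs


def esc(text: str) -> str:
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def concept_to_turtle(code: str, labels: dict[str, str],
                      broader: Iterable[str] = (), narrower: Iterable[str] = (),
                      related: Iterable[str] = (), scope_note: Optional[str] = None) -> str:
    """Serialise one concept as a Turtle subject with a ';'-separated predicate list."""
    lines = ["@prefix skos: <http://www.w3.org/2004/02/skos/core#> .", "",
             f"<{NS}{code}> a skos:Concept"]
    for lang, label in labels.items():
        lines.append(f'    ; skos:prefLabel "{esc(label)}"@{lang.lower()}')
    for prop, codes in (("broader", broader), ("narrower", narrower), ("related", related)):
        for other in codes:
            lines.append(f"    ; skos:{prop} <{NS}{other}>")
    if scope_note:
        lines.append(f'    ; skos:scopeNote "{esc(scope_note)}"@en')
    lines[-1] += " ."  # terminate the statement
    return "\n".join(lines)


if __name__ == "__main__":
    print(concept_to_turtle(
        "1939", {"EN": "Cows", "FR": "Vache", "ZH": "母牛"},
        broader=["1391"], narrower=["26767", "36875"],
        related=["3535", "4833", "15969", "16080"],
        scope_note='Use only for cattle and zebu cattle; for other species use '
                   '"Females" (15969) plus the descriptor for the species'))
```

A real converter would more likely use an RDF library (for example rdflib in Python) than hand-written string formatting.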
Software applications used to create and/or maintain the vocabulary, features lacking for the case
The current vocabulary management system lacks, among other features, distributed maintenance (maintenance is currently centralised) and text-mining support to help terminologists build the thesaurus.
Structure of the database used to currently manage the vocabulary
The structure of the AGROVOC thesaurus in relational database format is described here: ftp://ftp.fao.org/gi/gil/gilws/aims/kos/agrovoc_formats/db_format.doc
Standards and guidelines considered during the design and construction of the vocabulary
The thesaurus was built in the early 1980s, so different standards have been followed over the years.
Management of changes
A new team has recently been formed to revise the AGROVOC thesaurus. The team consults experts and partners all over the world on changes to terms. A new distributed maintenance system is under development.
Additional references
http://www.fao.org/aims/ag_intro.htm
Vocabulary mappings
Mapped vocabularies
- AGROVOC and the Chinese Agricultural Thesaurus (CAT)
- AGROVOC and the NAL (National Agricultural Library) Thesaurus
Extracts of Mappings
The AGROVOC-CAT mapping was originally represented in Excel and subsequently in OWL.
CAT-ID  CAT-EN               Map       AG-ID  AG-EN                  AG-ID  AG-EN
30854   Senta flammea        Exact     9748   Cheena
50008   Mayetola destructor  Exact-OR  24260  Triticale (gramineae)  7949   Triticales (product)
1160    Two-shear sheep      NT1       3662   Hordeum vulgare
The mapping files are available on request. Work on the AGROVOC-NAL mapping is ongoing.
Types of mapping used
The mapping uses the links provided by the SKOS mapping vocabulary.
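Following the second process described under "Integration between vocabulary-linked functions and other application functions", mapping rows like those in the AGROVOC-CAT extract can translate CAT concepts into the AGROVOC concepts to search for. A sketch, assuming that "Exact" rows give a single equivalent concept and "Exact-OR" rows an equivalent disjunction of concepts, while other kinds such as "NT1" are left out unless explicitly requested; this reading of the mapping codes, and all names used, are assumptions.

```python
from typing import AbstractSet, Iterable, NamedTuple


class MappingRow(NamedTuple):
    cat_id: int
    kind: str                     # mapping code as in the spreadsheet: "Exact", "Exact-OR", "NT1", ...
    agrovoc_ids: tuple[int, ...]  # one target, or several for an OR mapping

# Assumption: only these kinds are treated as equivalence for query expansion.
EQUIVALENCE_KINDS = frozenset({"Exact", "Exact-OR"})


def to_agrovoc(cat_ids: Iterable[int], rows: Iterable[MappingRow],
               kinds: AbstractSet[str] = EQUIVALENCE_KINDS) -> dict[int, list[int]]:
    """For each CAT concept, the AGROVOC concepts to OR together in the query."""
    by_cat: dict[int, list[int]] = {}
    for row in rows:
        if row.kind in kinds:
            by_cat.setdefault(row.cat_id, []).extend(row.agrovoc_ids)
    return {cid: by_cat.get(cid, []) for cid in cat_ids}


if __name__ == "__main__":
    rows = [MappingRow(30854, "Exact", (9748,)),
            MappingRow(50008, "Exact-OR", (24260, 7949)),
            MappingRow(1160, "NT1", (3662,))]
    print(to_agrovoc([30854, 50008, 1160], rows))
```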
Additional references
http://www2.db.dk/nkos2005/Margherita%20Sini.pdf