GTAA Web Browser

Contact e-mail: vmalaise # few.vu.nl, hennie.brugman # mpi.nl

Application

General purpose and services to the end user

The application provides a way to search for and browse through the GTAA thesaurus' terms as a Web application, giving access to its content on the Web.

Functionality examples

The vocabulary is divided into different facets, some of which are organized according to different hierarchies (Broader Term/Narrower Term for the Subject and Genre facets, additional Categories for the Subject facet).

The browser gives access to terms from one or all of the facets, according to the thesaurus' structure. It lets the user access all Subject terms of a given Category, refine the selection by combining different Categories (most terms belong to more than one Category), and then browse the different elements of the thesaurus structure: Broader Terms and Narrower Terms are displayed as a tree, while Related Terms, non-preferred terms and Scope Notes are displayed in a box. Relationships between terms from different facets (for example, a particular King in the Person facet, the subject Kings, and the country this King rules in the Geographical Location facet) were computed and added to the thesaurus browser as an additional navigation functionality. For facets with no hierarchical structure by default, we provide alphabetical access and, in the case of the Geographical Location facet, added some inclusion links.

A generic alphabetical access is also available for all the facets, including those with a hierarchical structure, by means of a dedicated search box, as detailed later.

Application architecture

The GTAA Web Browser is accessible across the Web: the browser is implemented as a web application that can retrieve thesaurus data from an extensible set of data sources. One of those is Sound & Vision’s primary source of the GTAA, a relational database. Using this source, radio and television professionals will always have the latest modifications of the GTAA available. To accommodate the needs of researchers the browser can also use an RDF/OWL representation of the thesaurus as its data source. This RDF/OWL store can be updated on request using a separate web application.
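The idea of an extensible set of data sources can be sketched as a small interface that each backend implements. This is only an illustration: the names `ThesaurusSource`, `InMemorySource`, `SOURCES` and `lookup` are hypothetical and do not come from the GTAA Browser code, and the in-memory backend merely stands in for the relational database or the RDF/OWL store.

```python
from abc import ABC, abstractmethod


class ThesaurusSource(ABC):
    """Hypothetical interface: anything that can list the terms of a facet."""

    @abstractmethod
    def terms(self, facet: str) -> list[str]:
        ...


class InMemorySource(ThesaurusSource):
    """Stand-in backend; a real one would query a database or an RDF store."""

    def __init__(self, data: dict[str, list[str]]):
        self._data = data

    def terms(self, facet: str) -> list[str]:
        # Unknown facets simply yield an empty list.
        return sorted(self._data.get(facet, []))


# Registry of named sources; supporting a new backend means adding one entry.
SOURCES: dict[str, ThesaurusSource] = {
    "demo": InMemorySource({"Keywords": ["beroepen", "ambachten"]}),
}


def lookup(source_name: str, facet: str) -> list[str]:
    """Route a request to the selected data source."""
    return SOURCES[source_name].terms(facet)
```

With such a registry, the rest of the application only depends on `lookup`, so switching between the production database and a research copy would not require changes elsewhere.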

Special strategies involved in the processing of user actions

Query refinement using the thesaurus structure: As most Subject terms are part of more than one Category, we offer the user a filtering functionality. Categories and sub-categories are displayed together with the number of Terms belonging to them. When the user selects a Category, its Terms are displayed and a box in the Browser is updated with the list of other Categories these Terms belong to, along with the number of overlapping Terms. For example, if a user selects the Category Military Issues, the terms related to Military Issues are displayed, and other overlapping Categories are proposed for narrowing down the number of terms. If the user also selects Traffic and Transportation, they get the list of military vehicles in the thesaurus. They can narrow the query even further by selecting Vessels, in which case the list is reduced to military vessels. The number of terms to be displayed can thus be narrowed down to a dozen in two or three clicks.
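The refinement described above amounts to intersecting Category memberships and counting the remaining overlaps. The following is a minimal sketch under stated assumptions: the term names and their Category assignments are invented for illustration (they are not GTAA data), and `refine` is a hypothetical helper, not the Browser's actual code.

```python
from collections import Counter

# Illustrative data only: term -> set of Categories it belongs to.
TERM_CATEGORIES = {
    "warships": {"Military Issues", "Traffic and Transportation", "Vessels"},
    "tanks": {"Military Issues", "Traffic and Transportation"},
    "barracks": {"Military Issues"},
    "ferries": {"Traffic and Transportation", "Vessels"},
}


def refine(selected: set[str]) -> tuple[list[str], Counter]:
    """Return the terms that belong to every selected Category, plus,
    for each other Category, how many of those terms it also contains."""
    matches = sorted(
        term for term, cats in TERM_CATEGORIES.items() if selected <= cats
    )
    overlaps = Counter(
        cat
        for term in matches
        for cat in TERM_CATEGORIES[term]
        if cat not in selected
    )
    return matches, overlaps
```

Calling `refine({"Military Issues"})` returns the three military terms together with overlap counts for the other two Categories; adding "Traffic and Transportation" and then "Vessels" to the selection shrinks the list step by step, mirroring the clicks in the example.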

All the relationships in the thesaurus are implemented as hyperlinks for navigating its content. As mentioned before, relationships between terms from different facets (a King in the Person facet, the subject Kings and the country which this King rules, for example) were computed and added to the thesaurus browser as an additional navigation functionality, and the inclusion links added for the Geographical Location facet are also browsable as hyperlinks.

Integration between vocabulary-linked functions and other application functions

The Browser contains an alphabetical search box with spell-checking functionality and a list of synonyms computed from general-language dictionaries, as an additional entry point to the thesaurus' preferred terms.
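One way such a search box could behave is sketched below: try an exact match, then a synonym table, then fall back to approximate matching for misspellings. This is an assumption about the general technique, not the Browser's implementation; the `SYNONYMS` table is hand-written here (the real list is computed from dictionaries), and the fuzzy matching uses Python's standard `difflib` rather than whatever spell checker the Browser uses.

```python
import difflib

# Small illustrative subset of preferred terms.
PREFERRED = ["affiches", "ambachten", "beroepen", "boekbinders", "glasblazers"]

# Hypothetical synonym table: alternative word -> preferred term.
SYNONYMS = {"posters": "affiches", "aanplakbiljetten": "affiches"}


def search(query: str) -> list[str]:
    """Map a user query to candidate preferred terms."""
    q = query.strip().lower()
    if q in PREFERRED:
        return [q]
    if q in SYNONYMS:
        return [SYNONYMS[q]]
    # Spell-check fallback: closest preferred terms by string similarity.
    return difflib.get_close_matches(q, PREFERRED, n=3, cutoff=0.8)
```

For instance, `search("posters")` resolves through the synonym table, while a misspelling such as `search("ambachtan")` is caught by the similarity fallback. The cutoff of 0.8 is an arbitrary choice for this sketch.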

Additional references

http://ems01.mpi.nl:8080/GTAABrowser

Véronique Malaisé, Lora Aroyo, Hennie Brugman, Luit Gazendam, Annemieke de Jong, Christian Negru and Guus Schreiber (2006). Evaluating a thesaurus browser for an audio-visual archive. EKAW'06, Prague, October 2006.

Brugman, H., Malaisé, V., Gazendam, L. (2006). A Web Based General Thesaurus Browser to Support Indexing of Television and Radio Programs. In Proceedings of the 5th international conference on Language Resources and Evaluation (LREC 2006). 24-26 May 2006, Genoa, Italy.

Vocabulary

Title

GTAA thesaurus: Gemeenschappelijke Thesaurus Audiovisuele Archieven – Common Thesaurus for Audiovisual Archives

General characteristics (size, coverage) of the vocabulary

The GTAA thesaurus is the primary source of vocabulary used in the documentation process of the Netherlands Institute for Sound and Vision (the Dutch national public audiovisual and radio archives). It covers a wide range of topics, as it is meant to describe anything that can be broadcast on TV or radio.

It contains approximately 160,000 terms. The GTAA terms are divided into six disjoint facets: Keywords (3,800 preferred terms), Locations (14,000), Person Names (97,000), Organization-Group-Other Names (27,000), Maker Names (18,000) and Genres (113 terms).

Language(s) in which the vocabulary is provided

The GTAA is in Dutch.

Vocabulary extract

The vocabulary is shown to the user in the interface of the Web Browser described in the Application section. The extract below shows the constructs around a Term:

Preferred Term: ambachten (crafts)
Related terms: 
ondernemingen (ventures)
beroepen (professions)
artistieke beroepen (artistic professions)
Broader Term: beroepen (professions)
Narrower terms: 
boekbinders (bookbinders)
bouwvakkers (building workers)
glasblazers (glassblowers)
...
Scope Note: niet voor afzonderlijke ambachten maar alleen als verzamelbegrip, bijv. voor (markten van) oude ambachten (not for specific crafts, only in general meaning, e.g. (markets of) old crafts)
Categories:  
05 economie (economy)
09 techniek (technique)

Some terms also have a UF (Use For) relationship to non-preferred terms:

affiches (posters)
Use for: 
aanplakbiljetten
posters

Structure explanation

The thesaurus mainly uses constructs presented in the ISO 2788 standard and commonly used in companies or institutions: amongst others, Broader Term, Narrower Term, Related Term and Scope Notes. Terms from all facets of the GTAA may have Related Terms, Use/Use for and Scope Notes, but only Keywords and Genres can also have Broader Term/Narrower Term relations, organizing them into a set of hierarchies.
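The constructs listed above can be pictured as a simple record with one constraint: hierarchical relations are only allowed in the Keywords and Genres facets. The sketch below is a hypothetical data structure for illustration (the class `Term` and its field names are assumptions, not the thesaurus' actual data model), filled with the "ambachten" example from the vocabulary extract.

```python
from dataclasses import dataclass, field

# Facets that may carry Broader Term/Narrower Term relations.
HIERARCHICAL_FACETS = {"Keywords", "Genres"}


@dataclass
class Term:
    label: str
    facet: str
    broader: list[str] = field(default_factory=list)
    narrower: list[str] = field(default_factory=list)
    related: list[str] = field(default_factory=list)
    use_for: list[str] = field(default_factory=list)
    scope_note: str = ""

    def __post_init__(self):
        # Reject BT/NT relations outside the hierarchical facets.
        has_hierarchy = bool(self.broader or self.narrower)
        if has_hierarchy and self.facet not in HIERARCHICAL_FACETS:
            raise ValueError(
                f"facet {self.facet!r} cannot have Broader/Narrower Terms"
            )


ambachten = Term(
    label="ambachten",
    facet="Keywords",
    broader=["beroepen"],
    narrower=["boekbinders", "bouwvakkers", "glasblazers"],
    related=["ondernemingen", "beroepen", "artistieke beroepen"],
)
```

Constructing a `Term` in, say, the Locations facet with a non-empty `broader` list raises a `ValueError`, which makes the facet rule explicit at creation time.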

Additionally, terms of the Keywords facet are thematically classified into 88 subcategories of 16 top Categories. Although the data model that is used for the thesaurus allows links between terms across facets, no instances of these links currently exist.

Machine-readable representation of the vocabulary

The thesaurus can be accessed via the Web Browser presented in the previous section, at the URL: http://ems01.mpi.nl:8080/GTAABrowser/

Software applications used to create and/or maintain the vocabulary, features lacking for the case

A module is used internally at Sound and Vision to edit and update the thesaurus manually. Currently, this module lacks features such as generating unique identifiers for Preferred terms, checking consistency so that hierarchical or associative relationships only occur between Preferred terms, and providing a Concept-based view of the thesaurus instead of a term-based one.
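Two of the missing features can be illustrated with a short sketch: assigning identifiers to Preferred terms and flagging relationships that involve a non-preferred term. Everything here is hypothetical, including the function names, the `gtaa:000001` identifier scheme and the `(subject, relation, object)` triple layout; it is not a description of the Sound and Vision module.

```python
def assign_ids(preferred: list[str], prefix: str = "gtaa") -> dict[str, str]:
    """Give each Preferred term a unique identifier (hypothetical scheme)."""
    return {
        label: f"{prefix}:{i:06d}"
        for i, label in enumerate(sorted(preferred), start=1)
    }


def find_violations(
    preferred: set[str], relations: list[tuple[str, str, str]]
) -> list[tuple[str, str, str]]:
    """Return relations whose subject or object is not a Preferred term."""
    return [
        (subj, rel, obj)
        for subj, rel, obj in relations
        if subj not in preferred or obj not in preferred
    ]


preferred_terms = {"affiches", "ambachten", "beroepen"}
relations = [
    ("ambachten", "BT", "beroepen"),  # fine: both are Preferred terms
    ("ambachten", "RT", "posters"),   # flagged: "posters" is non-preferred
]
violations = find_violations(preferred_terms, relations)
```

Note that numbering terms by sorted position, as done here, would not stay stable when terms are added; a real system would need persistent identifiers.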

Standards and guidelines considered during the design and construction of the vocabulary

The design of the thesaurus and its data model was partly based on the ISO 2788 recommendation. However, there are constructs that are not mentioned there, such as the classification of keywords into Categories and cross-facet links.

Management of changes

A committee of experts decides on the relevant updates to make to the thesaurus and publishes updated versions, which are uploaded to the documentalists' daily system (called iMMiX). In our project, we will access this update in the form of an XML export and convert it to SKOS. For the moment, we only have access to the Word export of the thesaurus' latest version.
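To give an idea of the target of such a conversion, the sketch below serializes one term as a SKOS concept in Turtle. It is an illustration only: the `ex:` namespace, the concept identifiers and the function `to_skos_turtle` are assumptions, the input is passed as plain arguments rather than parsed from the XML export, and string escaping is deliberately left out.

```python
PREFIXES = (
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
    "@prefix ex: <http://example.org/gtaa/> .\n"  # placeholder namespace
)


def to_skos_turtle(
    concept_id: str,
    pref_label: str,
    alt_labels: tuple[str, ...] = (),
    broader_ids: tuple[str, ...] = (),
    scope_note: str = "",
    lang: str = "nl",
) -> str:
    """Serialize one term as a skos:Concept (no escaping of quotes)."""
    props = [f'skos:prefLabel "{pref_label}"@{lang}']
    # Non-preferred (Use for) terms become alternative labels.
    props += [f'skos:altLabel "{alt}"@{lang}' for alt in alt_labels]
    # Broader Terms become links to other concepts.
    props += [f"skos:broader ex:{b}" for b in broader_ids]
    if scope_note:
        props.append(f'skos:scopeNote "{scope_note}"@{lang}')
    body = " ;\n    ".join(props)
    return f"{PREFIXES}\nex:{concept_id} a skos:Concept ;\n    {body} .\n"


turtle = to_skos_turtle(
    "c2", "affiches", alt_labels=("aanplakbiljetten", "posters")
)
```

In this mapping the Preferred term becomes `skos:prefLabel` and the Use for terms become `skos:altLabel`, which gives the concept-based view that a term-based thesaurus lacks.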

Additional references

Mark van Assem, Véronique Malaisé, Alistair Miles and Guus Schreiber (2006). A method to convert thesauri to SKOS. In Proc. Third European Semantic Web Conference (ESWC'06), Budva, Montenegro, June 2006.

Brugman, H., Malaisé, V., Gazendam, L. (2006). A Web Based General Thesaurus Browser to Support Indexing of Television and Radio Programs. In Proceedings of the 5th international conference on Language Resources and Evaluation (LREC 2006). 24-26 May 2006, Genoa, Italy.