
CHOICE@CATCH ranking of candidate terms for description of radio and TV programs

Contact e-mail: vmalaise # few.vu.nl, hennie.brugman # mpi.nl

Application

General purpose and services to the end user

Radio and television (RTV) programs at the Dutch national broadcasting archive (Sound and Vision) are typically associated with contextual text descriptions (web site texts, subtitles, program guide texts, texts from the production process, etc.). Documentalists at Sound and Vision manually describe RTV programs using these context documents. For this description task they use the GTAA (Gemeenschappelijke Thesaurus Audiovisuele Archieven - Common Thesaurus for Audiovisual Archives). Our project uses natural language processing techniques to automatically extract candidate GTAA terms from the context documents.

The application described in this use case takes these candidate terms as input and ranks them on the basis of the structure of the GTAA thesaurus. The ranking assumes that candidate terms that are mutually connected by thesaurus relations (directly or indirectly) are more likely to be good descriptions than isolated candidate terms. For example, the fact that "Voting" and "Democratization" are related in GTAA by a two-step path (via the "Election" term and two "related-to" links) positively influences the ranking of these terms.
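The ranking idea can be illustrated with a minimal Python sketch. It is not the project's actual algorithm: the scoring rule (count the other candidates reachable within two links), the function names and the isolated term "Gardening" are assumptions made for illustration. Only the Voting/Election/Democratization path comes from the example above.

```python
from collections import deque


def neighbours_within(graph, start, max_steps):
    """Return the terms reachable from `start` in at most `max_steps` links."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        term, dist = frontier.popleft()
        if dist == max_steps:
            continue  # do not expand beyond the step limit
        for nxt in graph.get(term, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    seen.discard(start)
    return seen


def rank_candidates(candidates, graph, max_steps=2):
    """Rank candidates by how many other candidates lie within `max_steps` links.

    Assumed scoring rule, for illustration only.
    """
    candidate_set = set(candidates)
    scores = {
        term: len(neighbours_within(graph, term, max_steps) & candidate_set)
        for term in candidates
    }
    # sorted() is stable, so tied terms keep their input order.
    return sorted(candidates, key=lambda term: -scores[term])


# Toy thesaurus fragment, stored as an undirected adjacency map.
# "Gardening" is a hypothetical isolated term, not taken from GTAA.
GRAPH = {
    "Voting": {"Election"},
    "Election": {"Voting", "Democratization"},
    "Democratization": {"Election"},
    "Gardening": set(),
}

ranked = rank_candidates(["Gardening", "Voting", "Democratization"], GRAPH)
print(ranked)
```

In this toy input, "Voting" and "Democratization" each reach one other candidate through "Election", so both are ranked above the isolated term.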

Ranked terms will be presented to documentalists to speed up their description work.

Functionality examples

Functionality is simple: input is a list of term URIs, output is a ranked list of URIs.

Application architecture

Currently the application is a standalone Java application that is called from the command line with a file containing URIs as argument. At a later stage this application will be implemented as a (SOAP) web service. The application uses a Sesame web repository containing the SKOS version of the GTAA thesaurus to retrieve the 'term context' of the terms in the input list. This term context is stored in a temporary local Sesame repository.
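The source does not say which query language or API is used to fetch the term context from the repository. As one possible illustration, assuming a SPARQL endpoint over the SKOS data, a query for the direct neighbours of a single term could be built as follows. The function name and the example term URI are hypothetical; the SKOS property URIs are the standard ones.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"


def term_context_query(term_uri):
    """Build a SPARQL SELECT query for the direct SKOS neighbours of one term."""
    return (
        f"PREFIX skos: <{SKOS}>\n"
        "SELECT ?relation ?neighbour WHERE {\n"
        f"  <{term_uri}> ?relation ?neighbour .\n"
        "  FILTER (?relation = skos:broader ||\n"
        "          ?relation = skos:narrower ||\n"
        "          ?relation = skos:related)\n"
        "}"
    )


# Hypothetical term URI; real GTAA URIs will look different.
query = term_context_query("http://example.org/gtaa/12345")
print(query)
```

Running one such query per input URI and storing the results locally would correspond to filling the temporary repository described above.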

Special strategies involved in the processing of user actions

The term context mentioned above currently contains all terms that are directly connected to an input term by broader term, narrower term or related term relations. In the future we may want to differentiate between types of thesaurus relations, or we may want to use more complex patterns of thesaurus relations for our ranking algorithm.
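As a rough sketch of this selection step, the following Python code keeps only broader, narrower and related triples that touch an input term and turns them into an adjacency map. The triple representation, the function name and the `ex:` identifiers are assumptions for illustration; the `relations` parameter is where a future differentiation between relation types could plug in.

```python
from collections import defaultdict

SKOS = "http://www.w3.org/2004/02/skos/core#"
CONTEXT_RELATIONS = frozenset(
    {SKOS + "broader", SKOS + "narrower", SKOS + "related"}
)


def build_term_context(input_terms, triples, relations=CONTEXT_RELATIONS):
    """Return an adjacency map of terms directly linked to any input term."""
    wanted = set(input_terms)
    context = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate not in relations:
            continue  # skip labels, notes and other non-structural properties
        if subject in wanted or obj in wanted:
            # Store both directions so every link can be followed either way.
            context[subject].add(obj)
            context[obj].add(subject)
    return dict(context)


# Hypothetical triples; the "ex:" identifiers stand in for real GTAA URIs.
triples = [
    ("ex:Voting", SKOS + "related", "ex:Election"),
    ("ex:Election", SKOS + "related", "ex:Democratization"),
    ("ex:Election", SKOS + "prefLabel", "Election"),
    ("ex:A", SKOS + "broader", "ex:B"),
]

context = build_term_context(["ex:Voting", "ex:Democratization"], triples)
print(context)
```

The label triple and the unrelated `ex:A`/`ex:B` link are dropped, which leaves only the two-step path between the input terms.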

Integration between vocabulary-linked functions and other application functions

The thesaurus-based recommendation system can also be integrated with a recommendation system based on co-occurrences between terms used in existing descriptions of RTV programs.

Additional references

See the use case description "Recommend metadata" on http://ems01.mpi.nl/usecases/

Vocabulary

GTAA, cf. EucGtaaBrowser use case description.