RucRankingForDescription - W3C Semantic Web Deployment Wiki

================================================================
Section 0. Contact and confidentiality
================================================================

Contact e-mail:
vmalaise@few.vu.nl, hennie.brugman@mpi.nl

Do you mind your use case being made public on the working group website and documents?
No

================================================================
Section 1. Application
================================================================

In this section we ask you to provide some information about the 
application for which the vocabulary(ies) and/or vocabulary mappings are being used.

Please note:
-- If your use case does not involve any specific application, but 
consists rather in the description of a specific vocabulary, skip 
straight to Section 2.
-- If your application makes use of links between different 
vocabularies, do not forget to fill in Section 3!


1.1. What is the title of the application?
CHOICE@CATCH ranking of candidate terms for description of radio and TV programs


1.2. What is the general purpose of the application?
      What services does it provide to the end-user?

Radio and television (RTV) programs at the Dutch national broadcasting archive (Sound and Vision) are typically associated with contextual text descriptions (web site texts, subtitles, program guide texts, texts from the production process, etc.). Documentalists at Sound and Vision manually describe RTV programs using these context documents. For this description task they use the GTAA (Gemeenschappelijke Thesaurus Audiovisuele Archieven, the Common Thesaurus for Audiovisual Archives). Our project uses natural language processing techniques to automatically extract candidate GTAA terms from the context documents.

The application described in this use case takes these candidate terms as input and ranks them on the basis of the structure of the GTAA thesaurus. The ranking assumes that candidate terms that are connected to each other by thesaurus relations (directly or indirectly) are more likely to be good descriptions than isolated candidate terms.
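
The ranking idea above can be illustrated with a minimal Java sketch. It is not the project's actual algorithm: the scoring rule (count the other candidate terms reachable within a fixed number of thesaurus relations), the class name ConnectivityRanker, and the maxHops parameter are all assumptions made for illustration. Relations are treated as undirected pairs of term URIs.

```java
import java.util.*;

// Illustrative sketch only; names and scoring rule are assumptions.
class ConnectivityRanker {

    /** Builds an undirected adjacency map from pairs of related term URIs. */
    public static Map<String, Set<String>> buildGraph(List<String[]> relations) {
        Map<String, Set<String>> graph = new HashMap<>();
        for (String[] r : relations) {
            graph.computeIfAbsent(r[0], k -> new HashSet<>()).add(r[1]);
            graph.computeIfAbsent(r[1], k -> new HashSet<>()).add(r[0]);
        }
        return graph;
    }

    /** Counts the other candidates reachable from start within maxHops relations. */
    public static int connectedCandidates(String start, Set<String> candidates,
                                          Map<String, Set<String>> graph, int maxHops) {
        Set<String> visited = new HashSet<>(Collections.singleton(start));
        List<String> frontier = Collections.singletonList(start);
        int count = 0;
        // Breadth-first expansion, one thesaurus relation per hop.
        for (int hop = 0; hop < maxHops && !frontier.isEmpty(); hop++) {
            List<String> next = new ArrayList<>();
            for (String term : frontier) {
                for (String neighbour : graph.getOrDefault(term, Collections.emptySet())) {
                    if (visited.add(neighbour)) {
                        next.add(neighbour);
                        if (candidates.contains(neighbour)) {
                            count++;
                        }
                    }
                }
            }
            frontier = next;
        }
        return count;
    }

    /** Returns the candidates ordered from best connected to most isolated. */
    public static List<String> rank(List<String> candidates,
                                    List<String[]> relations, int maxHops) {
        Map<String, Set<String>> graph = buildGraph(relations);
        Set<String> candidateSet = new HashSet<>(candidates);
        Map<String, Integer> score = new HashMap<>();
        for (String c : candidates) {
            score.put(c, connectedCandidates(c, candidateSet, graph, maxHops));
        }
        List<String> ranked = new ArrayList<>(candidates);
        // List.sort is stable, so candidates with equal scores keep their input order.
        ranked.sort(Comparator.comparing((String c) -> score.get(c)).reversed());
        return ranked;
    }
}
```

With maxHops set to 1 only direct relations count; a value of 2 also rewards candidates that are linked through one intermediate thesaurus term, which is one possible reading of "indirectly".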

Ranked terms will be presented to documentalists to speed up their description work.



*1.3. Provide some examples of the functionality of the application. Try to illustrate all of the functionalities in which the vocabulary(ies) and/or vocabulary mappings are involved.

The functionality is simple: the input is a list of term URIs, and the output is a ranked list of URIs.


1.4. What is the architecture of the application?
      What are the main components?
      Are the components and/or the data distributed across a network, or across the Web?

Currently the application is a standalone Java application that is called from the command line with a file containing URIs as its argument. At a later stage this application will be implemented as a (SOAP) web service.
The application uses a Sesame web repository containing the SKOS version of the GTAA thesaurus to retrieve the 'term context' of the terms in the input list. This term context is stored in a temporary local Sesame repository.
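
As a rough illustration of the command-line entry point, the following Java sketch reads the input file into a list of term URIs. The file format is an assumption (one URI per line, blank lines and lines starting with '#' ignored), and the class name UriListReader is hypothetical; the Sesame access and the ranking step are deliberately left out.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; the input file format is an assumption.
class UriListReader {

    /** Keeps one URI per non-empty line; lines starting with '#' are skipped. */
    public static List<String> parse(List<String> lines) {
        List<String> uris = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty() && !trimmed.startsWith("#")) {
                uris.add(trimmed);
            }
        }
        return uris;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java UriListReader <file-with-term-uris>");
            return;
        }
        // Files.readAllLines decodes the file as UTF-8.
        List<String> uris = parse(Files.readAllLines(Paths.get(args[0])));
        // A real run would hand the URIs to the ranking step; here they are echoed.
        for (String uri : uris) {
            System.out.println(uri);
        }
    }
}
```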


1.5. Briefly describe any special strategy involved in the processing
of user actions, e.g. query expansion using the vocabulary structure.

The term context mentioned under 1.4 currently contains all terms that are directly connected by broader term, narrower term or related term relations. In the future we may want to differentiate between types of thesaurus relations, or we may want to use more complex patterns of thesaurus relations for our ranking algorithm.
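
A minimal Java sketch of how such a term context could be selected is shown below. It works on in-memory triples (subject, predicate, object as strings) instead of a Sesame repository, and the class name TermContext is hypothetical. Only the SKOS property URIs skos:broader, skos:narrower and skos:related are taken from the SKOS vocabulary; everything else is an assumption for illustration.

```java
import java.util.*;

// Illustrative sketch only; operates on plain string triples, not on Sesame.
class TermContext {

    static final String SKOS = "http://www.w3.org/2004/02/skos/core#";

    /** The three thesaurus relations that currently define the term context. */
    static final Set<String> RELATIONS = new HashSet<>(Arrays.asList(
            SKOS + "broader", SKOS + "narrower", SKOS + "related"));

    /**
     * Returns every triple that uses one of the three relations and has an
     * input term as its subject or object, i.e. the directly connected terms.
     */
    public static List<String[]> extract(Collection<String> inputTerms,
                                         List<String[]> triples) {
        Set<String> terms = new HashSet<>(inputTerms);
        List<String[]> context = new ArrayList<>();
        for (String[] t : triples) {
            boolean isThesaurusRelation = RELATIONS.contains(t[1]);
            boolean touchesInput = terms.contains(t[0]) || terms.contains(t[2]);
            if (isThesaurusRelation && touchesInput) {
                context.add(t);
            }
        }
        return context;
    }
}
```

Differentiating between relation types later would then amount to changing RELATIONS or attaching a weight to each property.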



1.6. Are the functionalities associated with the controlled
vocabulary(ies) integrated in any way with functionalities provided by other means? (For example, search and browse using a structured
vocabulary might be integrated with free-text searching and/or some sort of social bookmarking or recommender system.)

We plan to integrate our thesaurus-based recommendation system with a recommendation system based on co-occurrences between terms used in existing descriptions of RTV programs.


1.7. Any additional information, references and/or hyperlinks.

See the use case description "Recommend metadata" on http://ems01.mpi.nl/usecases/


================================================================
Section 2. Vocabulary(ies)
================================================================

Cf. the RucGtaaBrowser use case description.

================================================================
Section 3. Vocabulary Mappings
================================================================

In this section we ask you to provide some information about the 
mappings or links between vocabularies you would like to be able to 
represent using SKOS.

Please note:
-- If your use case does not involve vocabulary mappings or links, you 
may skip this section!

3.1. Which vocabularies are you linking/mapping from/to?

*3.2. Please provide below some extracts from the mappings or links 
between the vocabularies. Use the layout or presentation format that you 
would normally provide for the users of the mappings. Please ensure that 
the examples you provide illustrate all of the different types of 
mapping or link.

3.3. Describe the different types of mapping used, with reference to 
the examples given in paragraph 3.2.

3.4. Any additional information, references and/or hyperlinks.