SWAD-Europe: Semantic Web and Web Services: RDF/XML and SOAP for Web Data Encoding (DRAFT)

Project name:: W3C Semantic Web Advanced Development for Europe (SWAD-Europe)
Project Number:: IST-2001-34732
Workpackage name:: 8. Thesaurus Research Prototype
Workpackage description:: http://www.w3.org/2001/sw/Europe/plan/workpackages/live/esw-wp-8.html
Deliverable title:: N/A
URI:: N/A
Authors:: Nikki Rogers, Dave Beckett
Abstract:: This document contributes to Workpackage 8. It presents the results of our initial efforts to collect a public list of use cases for applications of thesaurus and similar services, typically in distributed environments. These use cases will be used as the basis for defining the SKOS thesaurus web service API. The definition of this API will in turn form the basis for work on Deliverable 8.7: development of a public Research Prototype Demonstrating RDF Thesaurus Technology. Please see for the WSDL of the prototype web service. For more information and reports on thesaurus service API and prototype development, please see the SWAD-Europe Thesaurus Activity page at http://www.w3.org/2001/sw/Europe/reports/thes/ . We intend the development of these use cases to be conducted as a public discussion.

Contents

Section 1: Context/Scope
Section 2: Human End-user Use Cases
Section 3: Machine-to-Machine Use Cases
Section 4: Towards an API specification: is there a common 'set of questions' arising from the above Use Cases?


Section 1: Context/Scope

We consider the use cases both in terms of human end-user requirements of an online thesaurus (or similar) service, and also of machine-to-machine (M2M) requirements.

Section 2: Human End-user Use Cases

1 - Marking up general web resources for exposure in 'the semantic web'

2 - Marking up resources for a specific user community. Similar to 1, but where, for example, the user is a SOSIG (social sciences) cataloguer. The user may wish to be offered preferred and non-preferred terms from more than one candidate thesaurus, with entry points into each thesaurus and the ability to browse each one from those points. In this sense, the use case is "finding the right thesaurus" (as suggested by Dave Reynolds). An advanced version of this scenario is one where a cataloguer wishes to "create my own thesaurus" for a specific purpose (and perhaps share it with other cataloguers) by finding points at which to federate two or more thesauri, and then drawing up (and saving) a subset of the merged thesauri for use in cataloguing.

3 - Alistair Miles' use case: tool support for better searching, and also browsing, using web search engines such as Google. By "better searching" we mean improved query recall (i.e. the user's search term is expanded with synonyms and partial equivalents). By "browsing" we mean helping the user to narrow down or refine their search term(s) so that the search results are more accurate and relevant.

4 - Similar to 3, but in a specialist community environment, such as that of a SOSIG end user.

5 - Dan Brickley's use case: multilingual IMAGE retrieval. This is the case where a user expresses a query to retrieve images that carry embedded metadata, but needs the query term to be translated into different languages. (Are there overlaps here with the SIMILE use cases?)

6 - Charles McCathieNevile's use case: multilingual support. Similar to 5, but for a specific community: an end-user requires translation services, e.g. for the W3C glossary (this requires term mappings across languages in specific contexts).

7 - Parse a document to suggest metadata. As an advanced service option, the service receives a whole document (e.g. one of Steve Cayzer's blog entries, or items, which are metadata about blog entries). It extracts the relevant information from the document in order to infer automatically what the document is about, and so suggests thesaurus terms with which to mark it up. (This is an extension of the "give me a list of the preferred and non-preferred terms in some thesaurus Y matching some submitted keyword" question below.)
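Use case 7 could start from something as simple as counting which known thesaurus entry terms occur in the submitted text. The sketch below assumes a hypothetical in-memory table mapping every entry term (preferred or non-preferred) to its preferred term; the terms are invented. It does whole-phrase matching with no stemming or weighting, so it is only a starting point, not a real automatic-indexing method.

```python
import re
from collections import Counter

# Hypothetical thesaurus: every entry term (preferred or non-preferred) -> preferred term.
ENTRY_TERMS = {
    "semantic web": "Semantic Web",
    "rdf": "RDF",
    "resource description framework": "RDF",
    "thesaurus": "Thesauri",
    "thesauri": "Thesauri",
}

def suggest_terms(text: str, top_n: int = 3) -> list[tuple[str, int]]:
    """Count known entry terms in the text and rank the preferred terms they point to."""
    # Lower-case and collapse punctuation so "RDF," and "rdf" compare equal.
    normalised = " ".join(re.findall(r"[a-z0-9]+", text.lower()))
    counts: Counter = Counter()
    for entry, preferred in ENTRY_TERMS.items():
        # Whole-word / whole-phrase matches only.
        hits = len(re.findall(rf"\b{re.escape(entry)}\b", normalised))
        if hits:
            counts[preferred] += hits
    return counts.most_common(top_n)

doc = ("RDF, the Resource Description Framework, underpins the Semantic Web. "
       "A thesaurus can be published in RDF.")
print(suggest_terms(doc))  # [('RDF', 3), ('Semantic Web', 1), ('Thesauri', 1)]
```

Because non-preferred entry terms resolve to their preferred term, the suggestions are always preferred terms, which is what a cataloguer would mark up with.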

Note 1: At the User Interface level, the user may in many cases require visualisation tools for cross-walking between multiple thesauri, for example a tool like Protégé. We will make a design decision for deliverable 8.7 on whether to use such tools with the demonstrator or to keep the User Interface level out of scope (noting that, as this is a prototype web service, it would be nice to browse thesaurus data online).

Note 2: Again at the User Interface level, we are aware that the kind of browse and search support indicated by use cases 3 and 4 above might confuse the user. For example, when using a browse facility to refine a search term, an end-user may not realise that they are browsing terms, thinking instead that they are browsing resources. Similarly, with thesaurus-enabled search support, the user may find result sets confusing because their original search term often will not appear in the results (synonyms or partial equivalents appear instead). As with Note 1, we will make a design decision for deliverable 8.7 on whether to keep User Interface issues in scope.
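Use cases 3 to 5 all rely on query expansion: the user's term is looked up in a thesaurus and the search is widened with the concept's other labels, which may be synonyms in the same language or translations. The sketch below assumes a hypothetical in-memory store in which each concept has labels grouped by language tag (preferred label first); the concepts and labels are invented. Keeping the user's own term first in the result is one small way to address the confusion described in Note 2.

```python
# Hypothetical concept labels: language tag -> [preferred label, non-preferred labels...].
CONCEPTS = {
    "ex:sunset": {"en": ["sunset", "sundown"], "fr": ["coucher de soleil"],
                  "de": ["Sonnenuntergang"]},
    "ex:dog": {"en": ["dog"], "fr": ["chien"], "de": ["Hund"]},
}

def expand_query(term: str, languages: tuple = ("en",)) -> list[str]:
    """Return the user's term plus every label of the matching concept in `languages`."""
    needle = term.lower()
    for labels in CONCEPTS.values():
        # A concept matches if any of its labels, in any language, equals the term.
        if any(needle == label.lower() for group in labels.values() for label in group):
            expanded = [term]  # the user's own term stays first
            for lang in languages:
                for label in labels.get(lang, []):
                    if label.lower() != needle and label not in expanded:
                        expanded.append(label)
            return expanded
    return [term]  # unknown term: search with it unchanged

print(expand_query("sundown"))                # ['sundown', 'sunset']
print(expand_query("chien", ("en", "de")))    # ['chien', 'dog', 'Hund']
```

The expanded list would then be sent to a search engine, for example as an OR query; the exact query syntax each engine accepts is outside this sketch.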

Section 3: Machine-to-Machine Use Cases

1 - Cross-search support to give ('invisibly') better query recall across a set of data repositories, e.g. extending a tool such as the JISC-funded Subject Portal Project (SPP) cross-search. [Note 2, above, applies.]

2 - Cross-browse support to allow end-users to "seamlessly" browse a hierarchy of categories represented across a set of available data repositories in order to refine their search terms, for example when two or more KOSs (knowledge organisation systems) have been "federated". This could complement an online subject-specialist service, for example the JISC-funded Subject Portal Project (SPP) cross-search.

3 - [Dan Brickley's suggestion, re the Bized/SOSIG trials and the Desire project] Take two different data services (for the Desire project, a couple of internet catalogues at ILRT), each using a different classification scheme, and exploit mappings between the taxonomies to merge their data into a single environment.
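The merging step in use case 3 can be pictured as re-expressing one catalogue's subject categories in the other's scheme via a mapping table, then grouping all records by category. The sketch below is a minimal illustration under invented data: the catalogues, category identifiers and the mapping are hypothetical, and this is not a description of how the Desire trials were actually implemented.

```python
# Hypothetical records from two catalogues, each classified with its own scheme.
CATALOGUE_A = [{"title": "UK labour statistics", "subject": "A:employment"}]
CATALOGUE_B = [
    {"title": "Jobs and the economy", "subject": "B:labour-market"},
    {"title": "Tourism trends", "subject": "B:tourism"},
]

# Hypothetical mapping from scheme B categories to scheme A categories.
B_TO_A = {"B:labour-market": "A:employment"}

def merge(records_a: list, records_b: list, mapping: dict) -> dict[str, list[str]]:
    """Re-express B's subjects in A's scheme and group all record titles by subject."""
    merged: dict[str, list[str]] = {}
    for rec in records_a:
        merged.setdefault(rec["subject"], []).append(rec["title"])
    for rec in records_b:
        # Fall back to B's own category when no mapping exists, so no record is lost.
        subject = mapping.get(rec["subject"], rec["subject"])
        merged.setdefault(subject, []).append(rec["title"])
    return merged

print(merge(CATALOGUE_A, CATALOGUE_B, B_TO_A))
```

Unmapped categories survive as-is, which makes gaps in the mapping visible rather than silently dropping records.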

[Note: JISC IE Geo-spatial centralised service - related scenarios?]
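For the cross-search case (use case 1 of this section), "invisible" recall improvement amounts to expanding the query with thesaurus equivalents before it is sent to each repository, then de-duplicating what comes back. The sketch assumes hypothetical equivalence sets and repositories modelled as simple term-to-record-ID indexes; in practice each repository would be a remote service.

```python
# Hypothetical equivalence sets drawn from a thesaurus (terms invented for illustration).
EQUIVALENTS = [{"unemployment", "joblessness", "worklessness"}]

# Hypothetical repositories: each maps an indexed term to local record identifiers.
REPOSITORIES = {
    "repo-1": {"unemployment": ["r1", "r2"]},
    "repo-2": {"joblessness": ["r7"], "worklessness": ["r2"]},
}

def cross_search(term: str) -> dict[str, list[str]]:
    """Expand `term`, query every repository, and merge hits without duplicates."""
    term = term.lower()
    # Use the whole equivalence set if the term is in one; otherwise just the term.
    expanded = next((s for s in EQUIVALENTS if term in s), {term})
    results = {}
    for repo, index in REPOSITORIES.items():
        ids = sorted({rid for t in expanded for rid in index.get(t, [])})
        if ids:
            results[repo] = ids
    return results

print(cross_search("Unemployment"))
```

Without expansion, "repo-2" would return nothing for "unemployment", since it only indexes the synonyms.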

Section 4: Towards an API specification: is there a common 'set of questions' arising from the above Use Cases?

Assumption: in the following questions we assume that each concept is identified by its single preferred term (and can have multiple non-preferred terms). However, we anticipate that concept URIs may also be part of such questions/exchanges, e.g. "give me the URI of the concept in thesaurus Y identified by preferred term X", or, when checking whether the preferred term for some concept has changed, "give me the preferred term in thesaurus Y for the concept identified by URI X". We may build these use cases in.
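The two URI-based questions are inverse lookups, and the "has the preferred term changed?" check follows from the second one. A minimal sketch, assuming a hypothetical thesaurus Y held as a URI-to-preferred-term table with one preferred term per concept (as assumed above); the URIs under example.org are invented.

```python
from typing import Optional

# Hypothetical thesaurus Y: concept URI -> its single preferred term.
PREF_TERMS = {
    "http://example.org/thesY/c1": "Unemployment",
    "http://example.org/thesY/c2": "Inflation",
}
# Reverse index, built once and case-insensitive, for term -> URI lookups.
URI_BY_TERM = {term.lower(): uri for uri, term in PREF_TERMS.items()}

def uri_for_preferred_term(term: str) -> Optional[str]:
    """'Give me the URI of the concept in thesaurus Y identified by preferred term X'."""
    return URI_BY_TERM.get(term.lower())

def preferred_term_for_uri(uri: str) -> Optional[str]:
    """'Give me the preferred term in thesaurus Y for the concept identified by URI X'."""
    return PREF_TERMS.get(uri)

def preferred_term_changed(uri: str, recorded_term: str) -> bool:
    """True if thesaurus Y now prefers a different term than the one recorded earlier."""
    current = preferred_term_for_uri(uri)
    return current is not None and current != recorded_term

print(preferred_term_changed("http://example.org/thesY/c1", "Joblessness"))  # True
```

Storing the URI alongside the term when marking up a resource is what makes the later change check possible.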

Asking for information pertaining to a single thesaurus

"give me a list of the preferred and non-preferred terms in some thesaurus Y matching some submitted keyword" [for the case where a cataloguer or searcher needs to know any terms that potentially match some keyword, permitting truncation or stemming. The user is generally trying to find an "entry" point into some thesaurus that they are unfamiliar with.]
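A minimal sketch of this keyword question, assuming hypothetical concept records with one preferred term and a list of non-preferred terms. A trailing `*` is taken to request right truncation (an assumed convention); stemming is not attempted. Each result is tagged so the caller can tell preferred from non-preferred terms.

```python
# Hypothetical concept records in thesaurus Y (terms invented for illustration).
CONCEPTS = [
    {"pref": "Economics", "alt": ["Economic science"]},
    {"pref": "Economic policy", "alt": ["Fiscal policy"]},
    {"pref": "Ecology", "alt": []},
]

def match_keyword(keyword: str) -> list[tuple[str, str]]:
    """Return (term, 'preferred' | 'non-preferred') pairs for terms matching `keyword`.

    "econ*" matches any word starting with "econ"; otherwise a whole word of the
    term must equal the keyword. Matching is case-insensitive.
    """
    kw = keyword.lower()

    def matches(term: str) -> bool:
        words = term.lower().split()
        if kw.endswith("*"):
            return any(w.startswith(kw[:-1]) for w in words)
        return kw in words

    results = []
    for concept in CONCEPTS:
        if matches(concept["pref"]):
            results.append((concept["pref"], "preferred"))
        for alt in concept["alt"]:
            if matches(alt):
                results.append((alt, "non-preferred"))
    return results

print(match_keyword("policy"))  # [('Economic policy', 'preferred'), ('Fiscal policy', 'non-preferred')]
```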

"give me a list of the preferred and non-preferred terms in any number of specified/unspecified thesauri matching some submitted keyword" [for the case where a cataloguer or searcher wishes to find thesauri that may be suitable for attaching metadata to resources or for refining search terms, respectively.]

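The multi-thesaurus variant is the same search repeated over a chosen set of thesauri (or all of them), grouping hits by thesaurus so the caller can see which thesauri are candidates. The sketch assumes hypothetical, invented thesauri held as preferred-term-to-non-preferred-terms tables and uses plain substring matching for brevity.

```python
from typing import Optional

# Hypothetical thesauri keyed by an invented name: preferred term -> non-preferred terms.
THESAURI = {
    "social-science": {"Unemployment": ["Joblessness"], "Poverty": []},
    "economics": {"Labour market": ["Unemployment rate"], "Inflation": []},
    "medicine": {"Stress": ["Strain"]},
}

def search_thesauri(keyword: str, names: Optional[list] = None) -> dict[str, list[str]]:
    """Return {thesaurus name: matching terms}, searching `names` or all thesauri if None."""
    kw = keyword.lower()
    selected = names if names is not None else list(THESAURI)
    hits = {}
    for name in selected:
        terms = [t for pref, alts in THESAURI[name].items()
                 for t in [pref, *alts] if kw in t.lower()]
        if terms:  # only thesauri with at least one hit are candidates
            hits[name] = terms
    return hits

print(search_thesauri("unemployment"))
```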
"give me the non-preferred term(s) for some concept X in some thesaurus Y"

"give me the scope note for some concept X in some thesaurus Y"

"give me the broader/narrower/related term for some term Z in some thesaurus Y"
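The three single-concept questions above (non-preferred terms, scope note, broader/narrower/related terms) can all be answered from one concept record. The sketch assumes a hypothetical in-memory model whose field names loosely echo SKOS properties such as skos:prefLabel, skos:altLabel, skos:scopeNote and skos:broader; it is not the SKOS API, and the concepts and scope note text are invented.

```python
from dataclasses import dataclass, field

@dataclass
class Concept:
    pref_label: str
    alt_labels: list[str] = field(default_factory=list)
    scope_note: str = ""
    broader: list[str] = field(default_factory=list)   # URIs of broader concepts
    narrower: list[str] = field(default_factory=list)  # URIs of narrower concepts
    related: list[str] = field(default_factory=list)   # URIs of related concepts

# Hypothetical thesaurus Y: concept URI -> Concept.
THESAURUS_Y = {
    "ex:employment": Concept("Employment", narrower=["ex:unemployment"]),
    "ex:unemployment": Concept("Unemployment", alt_labels=["Joblessness"],
                               scope_note="The state of being without paid work.",
                               broader=["ex:employment"], related=["ex:poverty"]),
    "ex:poverty": Concept("Poverty", related=["ex:unemployment"]),
}
RELATIONS = ("broader", "narrower", "related")

def non_preferred_terms(uri: str) -> list[str]:
    return THESAURUS_Y[uri].alt_labels

def scope_note(uri: str) -> str:
    return THESAURUS_Y[uri].scope_note

def related_terms(term: str, relation: str) -> list[str]:
    """Preferred terms linked to `term` (preferred or non-preferred) by `relation`."""
    if relation not in RELATIONS:
        raise ValueError(f"unknown relation: {relation}")
    for concept in THESAURUS_Y.values():
        if term == concept.pref_label or term in concept.alt_labels:
            return [THESAURUS_Y[u].pref_label for u in getattr(concept, relation)]
    raise KeyError(term)

print(related_terms("Joblessness", "broader"))  # ['Employment']
```

Relations point at URIs rather than terms, so a change of preferred term does not break the hierarchy.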

"give me all "top" (root) terms for a preferred term X/concept Z in some thesaurus Y"
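Finding top terms means following broader links upwards until no broader term remains. A term can have several broader terms (a polyhierarchy), hence "all" top terms. A minimal sketch over a hypothetical, invented hierarchy, with a visited set so shared ancestors or accidental cycles do not cause repeated work:

```python
# Hypothetical hierarchy in thesaurus Y: term -> its broader terms.
BROADER = {
    "Unemployment": ["Labour market", "Social problems"],
    "Labour market": ["Economics"],
    "Social problems": ["Society"],
    "Economics": [],
    "Society": [],
}

def top_terms(term: str) -> list[str]:
    """Return every root reached by following broader links upwards from `term`."""
    roots, seen, stack = [], set(), [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited: shared ancestor or cycle
        seen.add(current)
        parents = BROADER.get(current, [])
        if parents:
            stack.extend(parents)
        else:
            roots.append(current)  # no broader term, so this is a top term
    return sorted(roots)

print(top_terms("Unemployment"))  # ['Economics', 'Society']
```

A term that is already a top term returns itself.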


Asking for information pertaining to multiple thesauri

"give me the equivalent term(s) for term X/concept Y in some target thesaurus/thesauri if it exists, or partial equivalent if it exists"
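The fallback in this question ("equivalent if it exists, or partial equivalent if it exists") can be applied per target thesaurus. The sketch assumes hypothetical cross-thesaurus mappings tagged "exact" or "partial"; those two tags stand in for whatever mapping relations the service eventually supports, and all identifiers are invented.

```python
# Hypothetical mappings: source concept -> (target thesaurus, target concept, kind).
MAPPINGS = {
    "A:unemployment": [
        ("B", "B:joblessness", "exact"),
        ("C", "C:labour-market", "partial"),
    ],
    "A:inflation": [("C", "C:prices", "partial")],
}

def equivalents(concept: str, targets: list[str]) -> dict[str, tuple[str, list[str]]]:
    """For each target thesaurus, return exact equivalents, else partial ones, else nothing."""
    result = {}
    for target in targets:
        maps = [(uri, kind) for t, uri, kind in MAPPINGS.get(concept, []) if t == target]
        exact = [uri for uri, kind in maps if kind == "exact"]
        partial = [uri for uri, kind in maps if kind == "partial"]
        if exact:
            result[target] = ("exact", exact)
        elif partial:
            result[target] = ("partial", partial)
    return result

print(equivalents("A:unemployment", ["B", "C"]))
```

Returning the match kind alongside the terms lets a client warn the user when only a partial equivalent was found.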

The service may also answer questions regarding metadata about thesauri such as "give me all supported semantic relations in some thesaurus Y"
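One way to answer "give me all supported semantic relations in some thesaurus Y" is to derive the answer from the thesaurus data itself, ignoring properties that carry labels or notes rather than linking two concepts. The sketch assumes thesaurus Y is held as hypothetical (subject, property, object) triples; which properties count as non-relations is also an assumption.

```python
# Hypothetical triples making up thesaurus Y.
TRIPLES_Y = [
    ("ex:unemployment", "prefLabel", "Unemployment"),
    ("ex:unemployment", "altLabel", "Joblessness"),
    ("ex:unemployment", "broader", "ex:employment"),
    ("ex:employment", "narrower", "ex:unemployment"),
    ("ex:unemployment", "related", "ex:poverty"),
]
# Properties that label or annotate a concept rather than relate two concepts (assumed).
NON_RELATION_PROPERTIES = {"prefLabel", "altLabel", "scopeNote"}

def supported_relations(triples: list[tuple[str, str, str]]) -> list[str]:
    """Return the distinct semantic relation names used in the thesaurus, sorted."""
    return sorted({p for _, p, _ in triples if p not in NON_RELATION_PROPERTIES})

print(supported_relations(TRIPLES_Y))  # ['broader', 'narrower', 'related']
```

Alternatively, a service could publish this as declared metadata about each thesaurus rather than computing it, which would also cover relations the schema allows but the data does not yet use.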

Maintained by Nikki Rogers

SWAD-Europe: Use Cases for a Thesaurus Service (draft document)
