Seed Use Cases

This page summarises known use cases drawn from existing linked data applications and platforms, and from prototypes developed in R&D projects.

Use Case Template

We aim to present overviews of use cases in an easily digestible form, so please use the following template:

Title

Source Reference
Industry sector
Actors and benefits they get from use case
Summary of use case in a few lines
Language technologies involved
Language resources involved
Specific benefit of using linguistic linked data in use case
Critical assessment of use case, e.g. in commercial use, trial only, research prototype, under-development
Provided by

Linguistic-Linked Data Aware

Use cases specifically designed to employ linguistic linked data:

Ontology Localisation

Title: Ontology Localization
Source Reference: http://cordis.europa.eu/fp7/ict/language-technologies/project-monnet_en.html
Industry sector: Any
Actors and benefits they get from use case: Developers of ontologies or controlled vocabularies that want to localize their ontologies for cross-lingual interoperability
Summary: In many scenarios, the localization of ontologies is an important task, as interoperability needs to be established across languages and borders. The Monnet project has considered in particular the case of financial terminologies / vocabularies such as XBRL and the GAAP-based language-specific vocabularies. To establish interoperability between all these national vocabularies, they need to be aligned. As a basis for automatic or semi-automatic alignment, appropriate translations are needed. Thus, techniques are needed to translate the labels of an ontology into some other target language, which we refer to as "ontology localization".
Language technologies involved: automated terminology extraction and analysis; machine translation systems
Language resources involved: multilingual or monolingual linked data lexicons or dictionaries; multilingual term bases; translation memories
Specific benefit of using linguistic linked data in use case: reuse resources for finding (domain-specific) translation candidates supporting the localization
Provided by: Philipp Cimiano - UNIBI
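
The label translation step described in the summary above can be illustrated with a minimal sketch. It is not the Monnet implementation: the concept URIs, the term base contents and the function name localize_labels are all hypothetical, and the lookup is a plain dictionary match standing in for a query against linked data term bases.

```python
from typing import Dict, List, Tuple

# Hypothetical ontology labels: concept URI -> English label.
ONTOLOGY_LABELS: Dict[str, str] = {
    "http://example.org/fin#Assets": "assets",
    "http://example.org/fin#Equity": "equity",
    "http://example.org/fin#Goodwill": "goodwill",
}

# Hypothetical multilingual term base:
# (source term, target language) -> translation candidates.
TERM_BASE: Dict[Tuple[str, str], List[str]] = {
    ("assets", "de"): ["Aktiva", "Vermögenswerte"],
    ("equity", "de"): ["Eigenkapital"],
}


def localize_labels(
    labels: Dict[str, str],
    term_base: Dict[Tuple[str, str], List[str]],
    target_lang: str,
) -> Dict[str, List[str]]:
    """Return translation candidates per concept; an empty list means none found."""
    return {
        uri: list(term_base.get((label.lower(), target_lang), []))
        for uri, label in labels.items()
    }


candidates = localize_labels(ONTOLOGY_LABELS, TERM_BASE, "de")
for uri, options in candidates.items():
    shown = ", ".join(options) if options else "no candidate found"
    print(f"{uri}: {shown}")
```

Concepts without a candidate (here the hypothetical Goodwill concept) would fall through to machine translation or manual translation.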

Publishing Rich Lexical Knowledge with Ontologies

Title: Publishing Rich Lexical Knowledge with Ontologies
Source Reference: http://www.w3.org/community/ontolex/wiki/IFLA, http://www.w3.org/community/ontolex/wiki/AGROVOC
Industry sector: Localisation, specifically terminology management
Actors and benefits they get from use case: Developers of thesauri, classification schemes, ontologies; Consumers and users of thesauri or classification schemes
Summary: In many cases a deeper linguistic grounding of the terms specified in thesauri is required, e.g. inflectional or other information describing how the terms behave syntactically and semantically. Current models such as SKOS are not sufficient for this purpose, so there is a clear need for an extensive vocabulary to model the linguistic properties of terms in a thesaurus, classification scheme, etc. This need has been documented at http://www.w3.org/community/ontolex/wiki/IFLA for the International Federation of Library Associations and Institutions and at http://www.w3.org/community/ontolex/wiki/AGROVOC for the AGROVOC thesaurus developed by the Food and Agriculture Organization (FAO). Several technological needs arise in the context of such a use case, such as i) easy porting of SKOS resources to ontolex and ii) automatic creation of ontolex resources from non-ontolex resources (SKOS, RDF etc.). The needs of the above example use cases are addressed by the ontolex vocabulary currently in development by the ontolex W3C community group. It provides the vocabulary necessary to define ontology lexica that realize a separate lexical layer, so that the rich linguistic and lexical information can be provided externally to the ontology / thesaurus / classification scheme in question, in a separate file.

The benefit of the ontolex model is that ontology lexica can be published separately from the ontologies / models they lexicalize, which gives a high degree of flexibility to add lexica for additional languages. In particular, this adds modularity, as support for other languages can be provided incrementally as needed.

Language technologies involved: morphological analysers; translation tools
Language resources involved: multilingual or monolingual dictionaries; existing lexical resources to link to
Specific benefit of using linguistic linked data in use case: Reuse of available resources for localization of lexica
Provided by: Philipp Cimiano, UNIBI
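
As a rough illustration of porting SKOS labels into a separate lexical layer, the sketch below turns each label into a lexical entry that points back at its concept. It makes several assumptions: the class and property names (LexicalEntry, canonicalForm, writtenRep, denotes) follow the OntoLex-Lemon model as later published, whereas this page describes the vocabulary as still in development; the concept URIs, the lexicon base URI and the function name skos_labels_to_lexicon are hypothetical; and triples are plain string tuples rather than objects from an RDF library.

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]

# Namespace of the published OntoLex-Lemon model (an assumption relative to
# this page, which predates the final vocabulary).
ONTOLEX = "http://www.w3.org/ns/lemon/ontolex#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def skos_labels_to_lexicon(
    labels: Iterable[Tuple[str, str, str]], lexicon_base: str
) -> List[Triple]:
    """Create one lexical entry per (concept, label, language) tuple.

    The entries live under lexicon_base, i.e. outside the thesaurus itself.
    """
    triples: List[Triple] = []
    for n, (concept, label, lang) in enumerate(labels, start=1):
        entry = f"{lexicon_base}entry{n}"
        form = f"{entry}#form"
        triples += [
            (entry, RDF_TYPE, ONTOLEX + "LexicalEntry"),
            (entry, ONTOLEX + "canonicalForm", form),
            (form, ONTOLEX + "writtenRep", f'"{label}"@{lang}'),
            (entry, ONTOLEX + "denotes", concept),
        ]
    return triples


# Hypothetical thesaurus labels (concept URI, label, language tag).
skos_labels = [
    ("http://example.org/thesaurus/c_maize", "maize", "en"),
    ("http://example.org/thesaurus/c_maize", "maïs", "fr"),
]
lexicon = skos_labels_to_lexicon(skos_labels, "http://example.org/lexicon/")
for triple in lexicon:
    print(triple)
```

Because the entries only reference the concept URI, a lexicon for a further language can be added as another file without touching the thesaurus.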

Aggregation of Lexical and Encyclopaedic Sources

Title: Aggregation of Lexical and Encyclopaedic sources
Source Reference: http://multijedi.org http://babelnet.org http://babelnet.org/2.0
Industry sector: Any
Actors and benefits they get from use case: Developers of dictionaries, encyclopedias, thesauri, ontologies; Consumers and users of large machine-readable, language knowledge resources (e.g. companies building systems that require large amounts of knowledge)
Summary: The alignment and integration of lexicographic, i.e. from dictionaries, and encyclopedic knowledge, i.e. from encyclopedias, is crucial for both developers and consumers of large knowledge resources. However, many online language resources are either based on Wikipedia, e.g. DBpedia, thereby focusing on encyclopedic content, or on dictionaries, such as OmegaWiki or Wiktionary. The MultiJEDI project has considered the case of interlinking and merging several language resources, i.e., WordNet, Wikipedia, OmegaWiki and the Open Multilingual WordNets. To perform the alignment, techniques are needed which link the same meanings available in different resources and decide when to merge the corresponding concepts into unified, multilingual concept representations.
Language technologies involved: word sense disambiguation; machine translation tools
Language resources involved: multilingual or monolingual online dictionaries and encyclopedias
Specific benefit of using linguistic linked data in use case: Reuse of available resources; Alignment to, exploitation and availability of other language resources in the LLOD cloud
Provided by: Roberto Navigli, UNIRM

Multilingual Disambiguation and Entity Linking

Title: Multilingual Disambiguation and Entity Linking
Source Reference: http://babelfy.org (available online)
Industry sector: Any
Actors and benefits they get from use case: Users and consumers of semantically-annotated or semantically-indexed text/data in any language
Summary: While Word Sense Disambiguation, the task of automatically associating meanings with words in context, has typically been a task restricted to a small number of researchers, the recent emergence of the new task of Entity Linking, concerned with linking named entities within text, has opened up new possibilities for a huge number of companies in search of services aimed at semantic indexing and linking of text written in arbitrary languages. To perform Entity Linking, however, large amounts of machine-readable knowledge need to be available, together with effective algorithms for performing the task. While several approaches to Entity Linking exist, the MultiJEDI project has addressed the task of integrating Word Sense Disambiguation with Entity Linking, showing that Wikification and Entity Linking services can greatly benefit from the integration of lexical, i.e. from dictionaries, and encyclopedic knowledge. To do this and keep the task independent of language, large amounts of knowledge, lexicalized and connected in as many languages as possible, need to be made available.
Language technologies involved: word sense disambiguation; entity linking
Language resources involved: BabelNet; multilingual or monolingual online dictionaries and encyclopedias
Specific benefit of using linguistic linked data in use case: performance improvement thanks to linking to and exploiting other LLOD
Provided by: Roberto Navigli and Paola Velardi, UNIRM

Sentiment Analysis

from EUROSENTIMENT: http://eurosentiment.eu/wp-content/uploads/2013/06/EUROSENTIMENT-D2-3-User-Req-Use-Cases-UPM-v1_7-Final.pdf [paul & thierry]

Multilingual and Cross-lingual Sentiment Analysis

Title: Multilingual and Cross-lingual Sentiment Analysis
Source Reference: Slides of WebLyzard at EDF LIDER WS Meeting
Industry sector: Any
Actors and benefits they get from use case: Developers of sentiment analysis / opinion mining systems
Summary: Sentiment analysis and opinion mining systems are heavily used to understand and structure online communication about products, services etc. for marketing purposes. In many cases, analysis of brands across countries and natural languages is crucial. However, adapting a sentiment analysis system to other domains is expensive, requiring sentiment lexica in different languages, which are ideally contextualized.
Language technologies involved: automated sentiment analysis
Language resources involved: contextualized sentiment lexica for multiple languages
Specific benefit of using linguistic linked data in use case: reuse (linked) sentiment lexica in multiple languages to adapt a sentiment analysis system to other languages, lowering the cost for doing so
Provided by: Philipp Cimiano - UNIBI

Annotating News Feeds

from OpenCalais use cases [paul]

Resource Sharing for Named Entity Recognition

from Open NER project

something from DBpedia Spotlight and Apache Stanbol [Sebastian]

Information Management Use Cases

Data management projects that may benefit from linguistic linked data

Crowdsource Media Annotations

from MICO project: http://www.mico-project.eu/use-cases/ [?]

Data integration and analysis value chain

from CODE project; http://code-research.eu/vision [?]

Language analysis in economics and finance

from DOPA project; http://www.dopa-project.eu/index.php/vision/ Source of requirements for language analysis in economics and finance [?]

Cross lingual media retrieval in medicine

from KHRESMOI project http://www.khresmoi.eu/use-cases/ Source of requirements for cross-lingual and multimodal IR [dave]

Media Fragment Management

from MediaMixer project: http://community.mediamixer.eu/usecases [dave]

Language Technology Projects

Projects employing language technology that may benefit from linguistic linked data.

Terminology Extraction

Title: Terminology Extraction for Localisation
Source Reference: FALCON Project
Industry sector: Localisation, specifically terminology management
Actors and benefits they get from use case: Localisation clients who will improve the terminology consistency of source content prior to translation; Language Service Providers who will be able to provide better translation quality through consistency in term translation
Summary: Localisation clients run source text through an automated term identification service that has been trained on a dictionary indexed with references to one or more linked data dictionaries, including their own organisation's. The service returns the source text with terms annotated, e.g. using the ITS 2.0 Terminology data category, each with a reference to a lexical data entry. The term annotations are reviewed by the client's terminologist, who identifies any false positives, dereferencing and reviewing information in the lexical data entry if needed. False positives may be fed back to improve the training corpora of the terminology extraction service. The approved term annotations are then reviewed by a linguist who may, if the dictionary is multilingual and includes the source language, dereference the link of each term, examine any translations present and opt to approve one for use in the job. If a translation is not present, the linguist may provide one if it is judged important for translation consistency. In both cases the term translations are then passed as a multilingual glossary, together with the source text, to the language service provider. If a new translation has been generated, it may be submitted back to the dictionary as a candidate translation for future use.
Language technologies involved: automated terminology extraction; term suggestion and translation review tool
Language resources involved: multilingual or monolingual linked data lexicon or dictionary
Specific benefit of using linguistic linked data in use case
Critical assessment of use case, e.g. in commercial use, trial only, research prototype, under-development
Provided by: Dave Lewis - CNGL/TCD
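
The annotation step in the summary above (terms marked up with the ITS 2.0 Terminology data category and a reference to a lexical data entry) might look roughly like the following sketch. It is not the FALCON service: term identification is reduced to a case-insensitive dictionary match, the lexicon URI and the function name annotate_terms are hypothetical, and the output uses the HTML5 attribute forms its-term and its-term-info-ref defined by ITS 2.0.

```python
import html
import re
from typing import Dict


def annotate_terms(text: str, term_index: Dict[str, str]) -> str:
    """Wrap known terms in <span> elements carrying ITS 2.0 terminology attributes.

    term_index maps a term to the URI of its lexical data entry.
    """
    if not term_index:
        return html.escape(text)
    # Longest terms first, so multi-word terms win over their substrings.
    terms = sorted(term_index, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b", re.IGNORECASE
    )
    lookup = {term.lower(): uri for term, uri in term_index.items()}
    parts = []
    pos = 0
    for match in pattern.finditer(text):
        parts.append(html.escape(text[pos:match.start()]))
        uri = html.escape(lookup[match.group(1).lower()], quote=True)
        parts.append(
            f'<span its-term="yes" its-term-info-ref="{uri}">'
            f"{html.escape(match.group(1))}</span>"
        )
        pos = match.end()
    parts.append(html.escape(text[pos:]))
    return "".join(parts)


# Hypothetical client dictionary with one term.
term_index = {"print head": "http://example.org/lexicon/print-head"}
annotated = annotate_terms("Replace the print head before cleaning the head.", term_index)
print(annotated)
```

A terminologist reviewing the output can dereference the its-term-info-ref URI to inspect the lexical entry; only the exact term "print head" is marked here, while the bare "head" is left alone.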

Parallel Text Curation and MT Retraining

Title: Parallel Text Curation and MT Retraining
Source Reference: FALCON Project
Industry sector: Localisation, specifically use of machine translation and postediting
Actors and benefits they get from use case: Localisation clients who will improve the quality of MT translation of their content over time, both for direct publishing of machine translated content and for improving throughput and discounts of human post-edited translation from LSPs; Language Service Providers who will be able to provide better machine translation quality to clients through tailored selection of parallel text for training an MT engine for a specific job (where clients don't possess their own MT), and who will also be able to offer more competitive discounts based on improved postediting efficiency resulting from higher quality. The predictable improvement of MT for a client can be factored into discounts, or charged as a value-added service if access to the MT engine is offered after the job is completed.
Summary: As a new client translation project is started, the LSP trains an MT engine by selecting the most appropriate parallel text in the required language pairs from both internal and publicly available parallel text in linked data format. This selection can be based on factors such as the similarity in domain, genre and term distribution, or on the experience and quality assessment of the translators/posteditors of previous translations in relation to the client's project. As the translation project progresses, the generated translations are also reviewed and selected for frequent retraining of the MT engine, progressively improving the automated translation stage over the period of the project.
Language technologies involved: machine translation; provenance tracking of postediting of machine translation
Language resources involved: parallel text with quality annotation
Specific benefit of using linguistic linked data in use case: ease in integrating postedit and QA data from different tools and subcontractors
Critical assessment of use case, e.g. in commercial use, trial only, research prototype, under-development: research prototype
Provided by: Dave Lewis - CNGL/TCD
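
One of the selection factors named in the summary above, similarity in term distribution, can be sketched as a cosine similarity between bag-of-words vectors. This is only an illustration under stated assumptions: the corpus names and texts are invented, only the source side of each parallel corpus is compared, and a real selection would also weigh domain, genre and quality annotations.

```python
import math
from collections import Counter
from typing import Dict, List


def term_vector(text: str) -> Counter:
    """Very rough term distribution: lower-cased whitespace tokens."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def select_corpora(project_sample: str, corpora: Dict[str, str], top_k: int) -> List[str]:
    """Rank candidate corpora by similarity to a sample of the client's text."""
    target = term_vector(project_sample)
    ranked = sorted(
        corpora,
        key=lambda name: cosine(target, term_vector(corpora[name])),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical source-side samples of three candidate parallel corpora.
corpora = {
    "printer-manuals": "replace the toner cartridge and print head",
    "legal-contracts": "the party shall indemnify the licensor",
    "recipes": "stir the sauce and add salt",
}
selected = select_corpora(
    "install the toner cartridge then align the print head", corpora, top_k=1
)
print(selected)
```

The same ranking could be re-run as post-edited translations accumulate, so that retraining draws on the text closest to the current job.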

Content Annotation for LT in Localisation

from MLW-LT working group: use cases: http://www.w3.org/International/its/wiki/Use_cases_-_high_level_summary [felix/dave]

Multilingual document search and data mining

from MANTRA project: http://www.mantra-project.eu/ Source of requirements on ML document annotation for search and data mining – biomed domain [dave/kev]

Quality Assessment for Machine Translation

from QTLaunchpad: http://www.qt21.eu/launchpad/ MT quality assessment [felix]

Speeding Up Grammar Generation for Spoken Dialogue Systems

Title: Supporting Development of Dialogue Systems using Linked Data and Ontology Lexica
Source Reference: Portdial Project
Industry sector: Any
Actors and benefits they get from use case: Developers of dialogue systems
Summary: The creation of grammars for dialogue systems is costly. It can be made more cost-efficient by techniques that semi-automatically support the creation of grammars. In the Portdial project, an approach in which grammars are induced in a top-down fashion from an ontology lexicon has been explored and shown to deliver grammars that are highly precise but lack recall. UNIBI has implemented this approach for LTAG (Lexicalized Tree Adjoining Grammars), CCG (Combinatory Categorial Grammars) and GF (Grammatical Framework grammars).

Nevertheless, such grammars can be used to seed the further process of extending the grammar, using other techniques to increase coverage. In addition, the Portdial project has in particular supported the development of so-called pre-terminal rules, which expand non-terminals into a set of named entities. It has been shown that Linked Data can be used to support this use case.

Language technologies involved: Top-down grammar generation
Language resources involved: linked data with labels in different languages; existing language-specific grammars
Specific benefit of using linguistic linked data in use case: reuse (linked) ontology lexica for top-down grammar induction; reuse linked data with labels in many languages to support enhancement of pre-terminal rules in many languages
Provided by: Philipp Cimiano - UNIBI
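
The pre-terminal rules mentioned above, which expand a non-terminal into a set of named entities, can be sketched as follows. This is not the Portdial implementation: the entity classes and labels are invented stand-ins for labels retrieved from linked data, the function names are hypothetical, and the rule notation is a generic BNF-like rendering rather than the format of any particular grammar framework.

```python
from typing import Dict, List, Tuple


def preterminal_rules(
    labels: List[Tuple[str, str, str]], lang: str
) -> Dict[str, List[str]]:
    """Group entity labels of one language by non-terminal name.

    labels holds (entity class, label, language tag) tuples.
    """
    rules: Dict[str, List[str]] = {}
    for entity_class, label, label_lang in labels:
        if label_lang != lang:
            continue
        expansions = rules.setdefault(entity_class.upper(), [])
        if label not in expansions:
            expansions.append(label)
    return {nonterminal: sorted(values) for nonterminal, values in rules.items()}


def render(rules: Dict[str, List[str]]) -> List[str]:
    """Render the rules in a generic BNF-like notation (illustrative only)."""
    return [
        f"<{nonterminal}> ::= " + " | ".join(f'"{value}"' for value in values)
        for nonterminal, values in sorted(rules.items())
    ]


# Hypothetical labels, as they might be retrieved from a linked data source.
labels = [
    ("City", "Athens", "en"),
    ("City", "Αθήνα", "el"),
    ("City", "Chania", "en"),
    ("Airline", "Aegean Airlines", "en"),
]
english_rules = preterminal_rules(labels, "en")
for line in render(english_rules):
    print(line)
```

Calling preterminal_rules with another language tag reuses the same entities to build rules for that language, which is where multilingual labels in linked data help.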

Cross-lingual Information Retrieval

Title: Supporting Cross-lingual Information Retrieval

Source Reference: Organic.Lingua Project
Industry sector: Any
Actors and benefits they get from use case: ???
Summary: ???
Language technologies involved: ?
Language resources involved: ?; ?
Specific benefit of using linguistic linked data in use case: reuse (linked) ontology lexica for top-down grammar induction; reuse linked data with labels in many languages to support enhancement of pre-terminal rules in many languages
Provided by: CELI?

Social Media Analytics

Philipp to describe three use cases.

Multimedia Analytics

There may be some input from use cases for non-linked-data metadata:

IPTC for images
PLUS for images
PBCORE for public broadcasting
W3C media annotation group
