Seed Use Cases

From Linked Data for Language Technology Community Group

This page summarises known use cases drawn from existing linked data applications and platforms, and from prototypes developed in R&D projects.

Use Case Template

We aim to present overviews of use cases in an easily digestible form, so please use the following template:

Title

Source Reference
Industry sector
Actors and benefits they get from use case
Summary of use case in a few lines
Language technologies involved
Language resources involved
Specific benefit of using linguistic linked data in use case
Critical assessment of use case, e.g. in commercial use, trial only, research prototype, under-development
Provided by

Linguistic-Linked Data Aware

Use cases specifically designed to employ linguistic linked data:

Ontology Localisation

Title
Ontology Localization
Source Reference
http://cordis.europa.eu/fp7/ict/language-technologies/project-monnet_en.html
Industry sector
Any
Actors and benefits they get from use case
Developers of ontologies or controlled vocabularies that want to localize their ontologies for cross-lingual interoperability
Summary
In many scenarios, the localization of ontologies is an important task as interoperability needs to be established across languages and borders. The Monnet project has considered in particular the case of financial terminologies / vocabularies such as XBRL and the specific GAAP-based language-specific vocabularies. In order to establish interoperability between all these national vocabularies, they need to be aligned. As a basis for automatic or semi-automatic alignment, appropriate translations are needed. Thus, techniques are needed to translate the labels of an ontology into some other target language, which we refer to as "ontology localization".
Language technologies involved
automated terminology extraction and analysis
machine translation systems
Language resources involved
multilingual or monolingual linked data lexicons or dictionaries
multilingual term bases
translation memories
Specific benefit of using linguistic linked data in use case
reuse resources for finding (domain-specific) translation candidates supporting the localization
Provided by
Philipp Cimiano - UNIBI
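
The label translation step described in the summary above can be illustrated with a minimal sketch. It assumes a small in-memory bilingual lexicon; the entries, the domain tags and the function names (build_index, translation_candidates, localize) are hypothetical and are not taken from the Monnet project. In practice the candidates would be retrieved from linked data lexicons, term bases or translation memories.

```python
from collections import defaultdict

# Hypothetical lexicon entries:
# (source term, source language, target term, target language, domain).
LEXICON = [
    ("assets", "en", "Aktiva", "de", "finance"),
    ("assets", "en", "Ressourcen", "de", "general"),
    ("liabilities", "en", "Verbindlichkeiten", "de", "finance"),
    ("liabilities", "en", "Passiva", "de", "finance"),
]


def build_index(entries):
    """Index entries by (lower-cased source term, source language, target language)."""
    index = defaultdict(list)
    for src, src_lang, tgt, tgt_lang, domain in entries:
        index[(src.lower(), src_lang, tgt_lang)].append((tgt, domain))
    return index


def translation_candidates(label, src_lang, tgt_lang, index, domain=None):
    """Return translation candidates for one label, domain matches first."""
    found = index.get((label.lower(), src_lang, tgt_lang), [])
    # sorted() is stable, so lexicon order is kept within each group.
    ranked = sorted(found, key=lambda candidate: candidate[1] != domain)
    return [target for target, _ in ranked]


def localize(labels, src_lang, tgt_lang, index, domain=None):
    """Map every ontology label to its list of translation candidates."""
    return {
        label: translation_candidates(label, src_lang, tgt_lang, index, domain)
        for label in labels
    }


if __name__ == "__main__":
    index = build_index(LEXICON)
    print(localize(["Assets", "Liabilities", "Goodwill"], "en", "de", index, "finance"))
```

Labels without an entry in the lexicon map to an empty list, which marks them as candidates for machine translation or manual translation.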

Publishing Rich Lexical Knowledge with Ontologies

Title
Publishing Rich Lexical Knowledge with Ontologies
Source Reference
http://www.w3.org/community/ontolex/wiki/IFLA, http://www.w3.org/community/ontolex/wiki/AGROVOC
Industry sector
Localisation, specifically terminology management
Actors and benefits they get from use case
Developers of thesauri, classification schemes, ontologies
Consumers and users of thesauri or classification schemes
Summary
In many cases a deeper linguistic grounding of linguistic elements specified in thesauri is required, e.g. including inflectional or other syntactic information describing how the terms behave syntactically and semantically. However, current models such as SKOS are not sufficient for this purpose. Thus, there is a clear need for an extensive vocabulary to model the linguistic properties of terms in a thesaurus, classification scheme etc. This need has been documented at http://www.w3.org/community/ontolex/wiki/IFLA for the International Federation of Library Associations and Institutions and at http://www.w3.org/community/ontolex/wiki/AGROVOC for the AGROVOC thesaurus developed by the Food and Agriculture Organization (FAO).
Several technological needs arise in the context of such a use case, such as i) easy porting of SKOS resources to ontolex and ii) automatic creation of ontolex resources from non-ontolex resources (SKOS, RDF etc.). The needs of the above example use cases are addressed by the ontolex vocabulary currently in development by the ontolex W3C community group. It provides the vocabulary necessary to define ontology lexica that realize a separate lexical layer, so that rich linguistic and lexical information can be provided externally to the ontology / thesaurus / classification scheme in question, in a separate file.

The benefit of the ontolex model is that ontology lexica can be published separately from the ontologies / models they lexicalize, thus giving a high degree of flexibility to add lexica for additional languages. In particular, this adds modularity, as support for other languages can be provided incrementally as needed.

Language technologies involved
morphological analysers
translation tools
Language resources involved
multilingual or monolingual dictionaries
existing lexical resources to link to
Specific benefit of using linguistic linked data in use case
Reuse of available resources for localization of lexica
Provided by
Philipp Cimiano, UNIBI


Aggregation of Lexical and Encyclopaedic Sources

Title
Aggregation of Lexical and Encyclopaedic sources
Source Reference
http://multijedi.org http://babelnet.org http://babelnet.org/2.0
Industry sector
Any
Actors and benefits they get from use case
Developers of dictionaries, encyclopedias, thesauri, ontologies
Consumers and users of large machine-readable, language knowledge resources (e.g. companies building systems that require large amounts of knowledge)
Summary
The alignment and integration of lexicographic, i.e. from dictionaries, and encyclopedic knowledge, i.e. from encyclopedias, is crucial for both developers and consumers of large knowledge resources. However, many online language resources are either based on Wikipedia, e.g. DBpedia, thereby focusing on encyclopedic content, or on dictionaries, such as OmegaWiki or Wiktionary. The MultiJEDI project has considered the case of interlinking and merging several language resources, i.e., WordNet, Wikipedia, OmegaWiki and the Open Multilingual WordNets. To perform the alignment, techniques are needed which link the same meanings available in different resources and decide when to merge the corresponding concepts into unified, multilingual concept representations.
Language technologies involved
word sense disambiguation
machine translation tools
Language resources involved
multilingual or monolingual online dictionaries and encyclopedias
Specific benefit of using linguistic linked data in use case
Reuse of available resources
Alignment to, exploitation and availability of other language resources in the LLOD cloud
Provided by
Roberto Navigli, UNIRM


Multilingual Disambiguation and Entity Linking

Title
Multilingual Disambiguation and Entity Linking
Source Reference
http://babelfy.org (available online)
Industry sector
Any
Actors and benefits they get from use case
Users and consumers of semantically-annotated or semantically-indexed text/data in any language
Summary
While Word Sense Disambiguation, the task of automatically associating meanings with words in context, has typically been a task restricted to a small number of researchers, the recent emergence of the new task of Entity Linking, concerned with linking named entities within text, has opened up new possibilities for a huge number of companies in search of services aimed at semantic indexing and linking of text written in arbitrary languages.
To perform Entity Linking, however, large amounts of machine-readable knowledge need to be available, together with effective algorithms for performing the task.
While several approaches to Entity Linking exist, the MultiJEDI project has addressed the task of integrating Word Sense Disambiguation with Entity Linking, showing that Wikification and Entity Linking services can greatly benefit from the integration of lexical, i.e. from dictionaries, and encyclopedic knowledge.
To do this and keep the task independent of language, large amounts of knowledge, lexicalized and connected in as many languages as possible, need to be made available.
Language technologies involved
word sense disambiguation
entity linking
Language resources involved
BabelNet
multilingual or monolingual online dictionaries and encyclopedias
Specific benefit of using linguistic linked data in use case
performance improvement thanks to linking to and exploiting other LLOD
Provided by
Roberto Navigli and Paola Velardi, UNIRM

Sentiment Analysis

from EUROSENTIMENT: http://eurosentiment.eu/wp-content/uploads/2013/06/EUROSENTIMENT-D2-3-User-Req-Use-Cases-UPM-v1_7-Final.pdf [paul & thierry]

Multilingual and Cross-lingual Sentiment Analysis

Title
Multilingual and Cross-lingual Sentiment Analysis
Source Reference
Slides of WebLyzard at EDF LIDER WS Meeting
Industry sector
Any
Actors and benefits they get from use case
Developers of sentiment analysis / opinion mining systems
Summary
Sentiment analysis and opinion mining systems are heavily used to understand and structure online communication about products, services etc. for marketing purposes. In many cases, analysis of brands across countries and natural languages is crucial. However, adapting a sentiment analysis system to other languages and domains is expensive, requiring sentiment lexica in different languages, which are ideally contextualized.
Language technologies involved
automated sentiment analysis
Language resources involved
contextualized sentiment lexica for multiple languages
Specific benefit of using linguistic linked data in use case
reuse (linked) sentiment lexica in multiple languages to adapt a sentiment analysis system to other languages, lowering the cost for doing so
Provided by
Philipp Cimiano - UNIBI

Annotating News Feeds

from OpenCalais use cases [paul]

Resource Sharing for Named Entity Recognition

from Open NER project

something from DBpedia Spotlight and Apache Stanbol [Sebastian]

Information Management Use Cases

Data management projects that may benefit from linguistic linked data

Crowdsource Media Annotations

from MICO project: http://www.mico-project.eu/use-cases/ [?]

Data integration and analysis value chain

from CODE project; http://code-research.eu/vision [?]

Language analysis in economics and finance

from DOPA project; http://www.dopa-project.eu/index.php/vision/ Source of requirements for language analysis in economics and finance [?]

Cross lingual media retrieval in medicine

from KHRESMOI project http://www.khresmoi.eu/use-cases/ Source of requirements for cross lingual and multimodal IR [dave]

Media Fragment Management

from MediaMixer project: http://community.mediamixer.eu/usecases [dave]

Language Technology Projects

Projects employing language technology that may benefit from linguistic linked data.

Terminology Extraction

Title
Terminology Extraction for Localisation
Source Reference
FALCON Project
Industry sector
Localisation, specifically terminology management
Actors and benefits they get from use case
Localisation clients who will improve the terminology consistency of source content prior to translation
Language Service Providers who will be able to provide better translation quality through consistency in term translation
Summary
Localisation clients run source text through an automated term identification service which has been trained on a dictionary indexed with references to one or more linked data dictionaries, including their own organisation's.
The automated term identification service returns the source text with terms annotated, e.g. using ITS2.0 terminology data category, with a reference to a lexical data entry.
The term annotations are reviewed by the client's terminologist who identifies any false positives, dereferencing and reviewing information in the lexical data entry if needed. False positives may be fed back to improve the training corpora of the terminology extraction service.
The approved term annotations are then reviewed by a linguist who may, if the dictionary is multilingual and includes the source language, dereference the links of the terms, examine any translations present and opt to approve them for use in the job. If a translation is not present, the linguist may provide one if it is judged to be important for translation consistency. In both cases the term translations are then passed as a multilingual glossary together with the source text to the language service provider. If a new translation has been generated, this may be submitted back to the dictionary as a candidate translation for future use.
Language technologies involved
automated terminology extraction
term suggestion and translation review tool
Language resources involved
multilingual or monolingual linked data lexicon or dictionary
Specific benefit of using linguistic linked data in use case
Critical assessment of use case, e.g. in commercial use, trial only, research prototype, under-development
Provided by
Dave Lewis - CNGL/TCD
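
The annotation step in the summary above, where terms are marked up with the ITS 2.0 Terminology data category and a reference to a lexical data entry, could look roughly like the following sketch. It uses the HTML attributes its-term and its-term-info-ref defined by ITS 2.0; the dictionary, the example.org URIs and the function name annotate_terms are hypothetical, and simple dictionary lookup stands in for the trained term identification service, which is not specified here.

```python
import html
import re

# Hypothetical term dictionary: lower-cased surface form -> URI of a lexical
# data entry in a linked data dictionary.
TERM_DICTIONARY = {
    "translation memory": "http://example.org/lexicon/translation_memory",
    "term base": "http://example.org/lexicon/term_base",
}


def annotate_terms(text, dictionary):
    """Return HTML in which known terms carry ITS 2.0 Terminology attributes."""
    if not dictionary:
        return html.escape(text)
    # Try longer terms first so multi-word terms win over shorter overlaps.
    terms = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in terms) + r")\b",
        re.IGNORECASE,
    )
    parts, last = [], 0
    for match in pattern.finditer(text):
        # Escape the plain text between the previous match and this one.
        parts.append(html.escape(text[last:match.start()]))
        uri = dictionary[match.group(1).lower()]
        parts.append(
            '<span its-term="yes" its-term-info-ref="%s">%s</span>'
            % (html.escape(uri, quote=True), html.escape(match.group(1)))
        )
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)


if __name__ == "__main__":
    print(annotate_terms("Load the Term Base first.", TERM_DICTIONARY))
```

A terminologist reviewing the output can follow the its-term-info-ref link to inspect the lexical entry before approving or rejecting the annotation.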

Parallel Text Curation and MT Retraining

Title
Parallel Text Curation and MT Retraining
Source Reference
FALCON Project
Industry sector
Localisation, specifically use of machine translation and postediting
Actors and benefits they get from use case
Localisation clients who will improve the quality of MT translation of their content over time, both for direct publishing of machine translated content and for improving throughput and discounts of human post-edited translation from LSPs.
Language Service Providers who will be able to provide better machine translation quality to clients through tailored selection of parallel text for training an MT engine for a specific job (where clients don't possess their own MT), and who will also be able to offer more competitive discounts based on improved postediting efficiency resulting from higher quality. The predictable improvement of MT for a client can be factored into discounts, or charged as a value-added service if access to the MT engine is offered after the job is completed.
Summary
As a new client translation project is started, the LSP trains an MT engine by selecting the most appropriate parallel text in the required language pairs from both internal and publicly available parallel text in linked data format. This selection can be based on factors such as the similarity in domain, genre and term distribution or on the experience and quality assessment of the translators/posteditors of previous translations in relation to the client's project.
As the translation project progresses, the generated translations are also reviewed and selected for frequent retraining of the MT engine, improving the automated translation stage progressively over the period of the project.
Language technologies involved
machine translation
provenance tracking of postediting of machine translation
Language resources involved
parallel text with quality annotation
Specific benefit of using linguistic linked data in use case
ease in integrating postedit and QA data from different tools and subcontractors
Critical assessment of use case, e.g. in commercial use, trial only, research prototype, under-development
research prototype
Provided by
Dave Lewis - CNGL/TCD
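
One of the selection factors named in the summary above, similarity in term distribution, can be sketched as a ranking of candidate corpora by cosine similarity of term frequencies against a sample of the client's source content. This is only an illustrative assumption about how such a selection might be computed; the function names, the whitespace tokenisation and the sample corpora are hypothetical, and domain, genre and translator quality metadata are not modelled.

```python
import math
from collections import Counter


def term_vector(text):
    """Build a term-frequency vector from whitespace-separated tokens."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_parallel_text(project_sample, corpora, top_n=1):
    """Rank corpora (name -> source-side text) by similarity to the project sample."""
    target = term_vector(project_sample)
    ranked = sorted(
        corpora,
        key=lambda name: cosine(target, term_vector(corpora[name])),
        reverse=True,
    )
    return ranked[:top_n]


if __name__ == "__main__":
    corpora = {
        "legal": "the contract party shall terminate the contract",
        "medical": "the patient dose shall be reduced",
    }
    print(select_parallel_text("contract termination by either party", corpora))
```

The same ranking could be rerun as newly postedited translations are added, in line with the frequent retraining described in the summary.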

Content Annotation for LT in Localisation

from MLW-LT working group: use cases: http://www.w3.org/International/its/wiki/Use_cases_-_high_level_summary [felix/dave]

Multilingual document search and data mining

from MANTRA project: http://www.mantra-project.eu/ Source of requirements on ML document annotation for search and data mining – biomed domain [dave/kev]

Quality Assessment for Machine Translation

from QTLaunchpad: http://www.qt21.eu/launchpad/ MT quality assessment [felix]

Content Analytics Marketplace

from Annomarket https://annomarket.eu/ [?]

News Content Analysis

from XLIKE: http://www.xlike.org/wp-content/uploads/2012/03/D1.2.1-Requirements-for-early-prototypes-v10.pdf news content analysis [?]

LT innovation

LT innovate – industrial views on LT development, http://www.lt-innovate.eu/resources/document/lt-industry-definition-taxonomy [?]

Speeding Up Grammar Generation for Spoken Dialogue Systems

Title
Supporting Development of Dialogue Systems using Linked Data and Ontology Lexica
Source Reference
Portdial Project
Industry sector
Any
Actors and benefits they get from use case
Developers of dialogue systems
Summary
The creation of grammars for dialogue systems is costly. It can be made more cost-efficient by techniques that semi-automatically support the creation of grammars. In the Portdial project, an approach by which grammars are induced in a top-down fashion from an ontology lexicon has been explored and shown to deliver grammars that are highly precise but lack recall. UNIBI has implemented this approach for LTAG (Lexicalized Tree Adjoining Grammars), CCG (Combinatory Categorial Grammars) and GF (Grammatical Framework Grammars).

Nevertheless, such grammars can be used to seed the further process of extending the grammar, using other techniques to increase coverage. In addition, the Portdial project has in particular supported the development of so-called pre-terminal rules, which expand non-terminals into a set of named entities. It has been shown that Linked Data can be used to support this use case.

Language technologies involved
Top-down grammar generation
Language resources involved
linked data with labels in different languages
existing language-specific grammars
Specific benefit of using linguistic linked data in use case
reuse (linked) ontology lexica for top-down grammar induction
reuse linked data with labels in many languages to support enhancement of pre-terminal rules in many languages
Provided by
Philipp Cimiano - UNIBI
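
The pre-terminal rules mentioned in the summary above, which expand a non-terminal into a set of named entities, can be illustrated with a minimal sketch that builds one rule per language from multilingual linked data labels. The entity URIs, the labels, the function name preterminal_rule and the BNF-like rule notation are all hypothetical; the sketch does not reproduce the grammar formats used in the Portdial project.

```python
# Hypothetical linked data labels: entity URI -> {language tag: label}.
ENTITY_LABELS = {
    "http://example.org/resource/Vienna": {"en": "Vienna", "de": "Wien"},
    "http://example.org/resource/Rome": {"en": "Rome", "de": "Rom"},
    "http://example.org/resource/Athens": {"en": "Athens"},
}


def preterminal_rule(nonterminal, labels_by_entity, lang):
    """Build one rule expanding `nonterminal` into the labels available in `lang`."""
    # Entities without a label in the requested language are skipped.
    alternatives = sorted(
        {labels[lang] for labels in labels_by_entity.values() if lang in labels}
    )
    quoted = " | ".join('"%s"' % alternative for alternative in alternatives)
    return "%s -> %s" % (nonterminal, quoted)


if __name__ == "__main__":
    for lang in ("en", "de"):
        print(lang, preterminal_rule("CITY", ENTITY_LABELS, lang))
```

Because the labels are keyed by language tag, the same entity set yields a rule for each language in which labels exist, which is the reuse benefit listed for this use case.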

Supporting Cross-lingual Information Retrieval

Title
Supporting Cross-lingual Information Retrieval
Source Reference
Organic.Lingua Project
Industry sector
Any
Actors and benefits they get from use case
???
Summary
???
Language technologies involved
?
?
?
?
Specific benefit of using linguistic linked data in use case
reuse (linked) ontology lexica for top-down grammar induction
reuse linked data with labels in many languages to support enhancement of pre-terminal rules in many languages
Provided by
CELI?

Social Media Analytics

Philipp to describe three use cases.

Multimedia Analytics

May be some input from use cases for non linked data meta-data: