Seed Use Cases
This page presents summaries of known use cases drawn from existing linked data applications and platforms, and from prototypes developed in R&D projects.
Use Case Template
We aim to present overviews of use cases in an easily digestible form, so please use the following template:
Title
- Source Reference
- Industry sector
- Actors and benefits they get from use case
- Summary of use case in a few lines
- Language technologies involved
- Language resources involved
- Specific benefit of using linguistic linked data in use case
- Critical assessment of use case, e.g. in commercial use, trial only, research prototype, under-development
- Provided by
Linguistic Linked Data Aware
Use cases specifically designed to employ linguistic linked data:
Ontology Localisation
- Title
- Ontology Localization
- Source Reference
- http://cordis.europa.eu/fp7/ict/language-technologies/project-monnet_en.html
- Industry sector
- Any
- Actors and benefits they get from use case
- Developers of ontologies or controlled vocabularies that want to localize their ontologies for cross-lingual interoperability
- Summary
- In many scenarios, the localization of ontologies is an important task, as interoperability needs to be established across languages and borders. The Monnet project has in particular considered the case of financial terminologies / vocabularies such as XBRL and the language-specific, GAAP-based vocabularies. To establish interoperability between all these national vocabularies, they need to be aligned, and appropriate translations are needed as a basis for automatic or semi-automatic alignment. Thus, techniques are needed to translate the labels of an ontology into another target language, which we refer to as "ontology localization".
- Language technologies involved
- automated terminology extraction and analysis
- machine translation systems
- Language resources involved
- multilingual or monolingual linked data lexicons or dictionaries
- multilingual term bases
- translation memories
- Specific benefit of using linguistic linked data in use case
- reuse resources for finding (domain-specific) translation candidates supporting the localization
- Provided by
- Philipp Cimiano - UNIBI
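As a rough illustration of the label-translation step, the following Python sketch looks up each ontology label in a multilingual lexicon, prefers a candidate tagged with the ontology's domain, and otherwise falls back to a machine translation system. The lexicon layout, the example URIs and the `fake_mt` function are hypothetical stand-ins for a linked data lexicon and an MT service; this is not the Monnet implementation.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical lexicon layout: (lower-cased source label, target language) ->
# list of (candidate translation, domain) pairs, standing in for entries
# retrieved from linked data lexica, term bases or translation memories.
Lexicon = Dict[Tuple[str, str], List[Tuple[str, str]]]
MT = Callable[[str, str], str]  # (text, target language) -> translation


def localize_label(label: str, target_lang: str, domain: str,
                   lexicon: Lexicon, mt_fallback: MT) -> str:
    """Translate one ontology label, preferring domain-specific lexicon entries."""
    candidates = lexicon.get((label.lower(), target_lang), [])
    for text, cand_domain in candidates:
        if cand_domain == domain:
            return text
    if candidates:  # no domain match: take any lexicon candidate
        return candidates[0][0]
    return mt_fallback(label, target_lang)  # no lexicon entry: ask the MT system


def localize_ontology(labels: Dict[str, str], target_lang: str, domain: str,
                      lexicon: Lexicon, mt_fallback: MT) -> Dict[str, str]:
    """Map each concept URI to a label in the target language."""
    return {uri: localize_label(label, target_lang, domain, lexicon, mt_fallback)
            for uri, label in labels.items()}


lexicon: Lexicon = {
    ("equity", "de"): [("Gerechtigkeit", "general"), ("Eigenkapital", "finance")],
}
labels = {"http://example.org/fin#Equity": "Equity",
          "http://example.org/fin#Revenue": "Revenue"}


def fake_mt(text: str, lang: str) -> str:
    """Hypothetical stand-in for a real MT system."""
    return f"[{lang}-MT] {text}"


print(localize_ontology(labels, "de", "finance", lexicon, fake_mt))
```

Here the finance-domain candidate "Eigenkapital" wins over the general-language one, while "Revenue", which has no lexicon entry, is passed to the stand-in MT function.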
Publishing Rich Lexical Knowledge with Ontologies
- Title
- Publishing Rich Lexical Knowledge with Ontologies
- Source Reference
- http://www.w3.org/community/ontolex/wiki/IFLA, http://www.w3.org/community/ontolex/wiki/AGROVOC
- Industry sector
- Localisation, specifically terminology management
- Actors and benefits they get from use case
- Developers of thesauri, classification schemes, ontologies
- Consumers and users of thesauri or classification schemes
- Summary
- In many cases a deeper linguistic grounding of linguistic elements specified in thesauri is required, e.g. including inflectional or other syntactic information describing how the terms behave syntactically and semantically. However, current models such as SKOS are not sufficient for this purpose. Thus, there is a clear need for an extensive vocabulary to model the linguistic properties of terms in a thesaurus, classification scheme etc. This need has been documented at http://www.w3.org/community/ontolex/wiki/IFLA for the International Federation of Library Associations and Institutions and at http://www.w3.org/community/ontolex/wiki/AGROVOC for the AGROVOC thesaurus developed by the Food and Agriculture Organization (FAO).
- Several technological needs arise in the context of such a use case, such as i) easy porting of SKOS resources to ontolex and ii) automatic creation of ontolex resources from non-ontolex resources (SKOS, RDF etc.). The needs of the example use cases above are addressed by the ontolex vocabulary currently being developed by the ontolex W3C community group. It provides the vocabulary needed to define ontology lexica, which realize a separate lexical layer that supplies rich linguistic and lexical information externally to the ontology / thesaurus / classification scheme in question, in a separate file.
The benefit of the ontolex model is that ontology lexica can be published separately from the ontologies / models they lexicalize, which gives a high degree of flexibility in adding lexica for additional languages. In particular, this adds modularity, as support for other languages can be provided incrementally as needed.
- Language technologies involved
- morphological analysers
- translation tools
- Language resources involved
- multilingual or monolingual dictionaries
- existing lexical resources to link to
- Specific benefit of using linguistic linked data in use case
- Reuse of available resources for localization of lexica
- Provided by
- Philipp Cimiano, UNIBI
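To make the idea of a separate lexical layer concrete, the following Python sketch serialises a small ontology lexicon as Turtle in its own document, pointing at thesaurus concepts only by URI. It uses class and property names from the OntoLex-Lemon vocabulary (`ontolex:`, `lime:`) as eventually published by the W3C community group, plus LexInfo for part of speech; the draft vocabulary referred to above may have used different names. The lexicon and concept URIs are hypothetical, and literals are not escaped in this sketch.

```python
PREFIXES = """@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix lime: <http://www.w3.org/ns/lemon/lime#> .
@prefix lexinfo: <http://www.lexinfo.net/ontology/2.0/lexinfo#> .
"""


def lexicon_turtle(lexicon_uri, language, entries):
    """Serialise one language's lexicon as a standalone Turtle document.

    entries: list of (local id, written form, LexInfo part of speech,
    URI of the thesaurus concept the entry lexicalizes).
    """
    entry_uris = [f"{lexicon_uri}#{eid}" for eid, _, _, _ in entries]
    entry_list = ", ".join(f"<{u}>" for u in entry_uris)
    chunks = [PREFIXES,
              f"<{lexicon_uri}> a lime:Lexicon ;\n"
              f'    lime:language "{language}" ;\n'
              f"    lime:entry {entry_list} .\n"]
    for uri, (_, form, pos, concept) in zip(entry_uris, entries):
        # The thesaurus itself is never modified: entries only reference it.
        chunks.append(
            f"<{uri}> a ontolex:LexicalEntry ;\n"
            f"    lexinfo:partOfSpeech lexinfo:{pos} ;\n"
            f'    ontolex:canonicalForm [ ontolex:writtenRep "{form}"@{language} ] ;\n'
            f"    ontolex:sense [ ontolex:reference <{concept}> ] .\n")
    return "\n".join(chunks)


# A German lexicon for a (hypothetical) thesaurus, publishable as its own file.
ttl = lexicon_turtle("http://example.org/lexicon/de", "de", [
    ("mais", "Mais", "noun", "http://example.org/thesaurus/maize"),
    ("reis", "Reis", "noun", "http://example.org/thesaurus/rice"),
])
print(ttl)
```

A lexicon for another language would be a second, independent document referencing the same concept URIs, which is the modularity described above.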
Aggregation of Lexical and Encyclopaedic Sources
- Title
- Aggregation of Lexical and Encyclopaedic sources
- Source Reference
- http://multijedi.org, http://babelnet.org, http://babelnet.org/2.0
- Industry sector
- Any
- Actors and benefits they get from use case
- Developers of dictionaries, encyclopedias, thesauri, ontologies
- Consumers and users of large machine-readable, language knowledge resources (e.g. companies building systems that require large amounts of knowledge)
- Summary
- The alignment and integration of lexicographic knowledge (from dictionaries) and encyclopedic knowledge (from encyclopedias) is crucial for both developers and consumers of large knowledge resources. However, many online language resources are based either on Wikipedia, e.g. DBpedia, and thus focus on encyclopedic content, or on dictionaries, such as OmegaWiki or Wiktionary. The MultiJEDI project has considered the case of interlinking and merging several language resources, namely WordNet, Wikipedia, OmegaWiki and the Open Multilingual WordNets. To perform the alignment, techniques are needed that link the same meanings across different resources and decide when to merge the corresponding concepts into unified, multilingual concept representations.
- Language technologies involved
- word sense disambiguation
- machine translation tools
- Language resources involved
- multilingual or monolingual online dictionaries and encyclopedias
- Specific benefit of using linguistic linked data in use case
- Reuse of available resources
- Alignment to, exploitation and availability of other language resources in the LLOD cloud
- Provided by
- Roberto Navigli, UNIRM
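The link-and-merge step can be illustrated with a deliberately simple heuristic: link two senses from different resources when their sets of multilingual lexicalizations overlap strongly (Jaccard similarity), then merge linked senses into unified concepts with union-find. This is a swapped-in toy technique for illustration, not the mapping algorithm used in MultiJEDI or BabelNet; the sense identifiers and the threshold are hypothetical.

```python
from itertools import combinations


def merge_concepts(senses, threshold=0.5):
    """Group senses from different resources into unified concepts.

    senses: dict mapping a sense id to a set of (language, lemma) pairs.
    Two senses are linked when the Jaccard overlap of their lexicalizations
    reaches `threshold`; linked senses are merged transitively (union-find).
    """
    parent = {s: s for s in senses}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(senses, 2):
        union = senses[a] | senses[b]
        overlap = len(senses[a] & senses[b]) / len(union) if union else 0.0
        if overlap >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for s in senses:
        groups.setdefault(find(s), []).append(s)
    # Each unified concept: its member senses and all their lexicalizations.
    return [(sorted(members), set().union(*(senses[m] for m in members)))
            for members in groups.values()]


# Hypothetical senses from a lexical resource (wn:) and an encyclopedia (wiki:).
senses = {
    "wn:bank#finance": {("en", "bank"), ("it", "banca"), ("de", "Bank")},
    "wiki:Bank": {("en", "bank"), ("it", "banca"), ("fr", "banque")},
    "wn:bank#river": {("en", "bank"), ("it", "riva")},
}
for members, lexicalizations in merge_concepts(senses):
    print(members, sorted(lexicalizations))
```

The financial sense and the encyclopedic entry are merged and pool their German, French and Italian lexicalizations, while the river sense stays a separate concept.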
Multilingual Disambiguation and Entity Linking
- Title
- Multilingual Disambiguation and Entity Linking
- Source Reference
- http://babelfy.org (available online)
- Industry sector
- Any
- Actors and benefits they get from use case
- Users and consumers of semantically-annotated or semantically-indexed text/data in any language
- Summary
- While Word Sense Disambiguation, the task of automatically associating meanings with words in context, has traditionally interested only a small number of researchers, the recent emergence of Entity Linking, the task of linking named entities in text, has opened up new possibilities for the many companies seeking services for the semantic indexing and linking of text written in arbitrary languages.
- To perform Entity Linking, however, large amounts of machine-readable knowledge need to be available, together with effective algorithms for performing the task.
- While several approaches to Entity Linking exist, the MultiJEDI project has addressed the task of integrating Word Sense Disambiguation with Entity Linking, showing that Wikification and Entity Linking services can greatly benefit from the integration of lexical, i.e. from dictionaries, and encyclopedic knowledge.
- To do this and keep the task independent of language, large amounts of knowledge, lexicalized and connected in as many languages as possible, need to be made available.
- Language technologies involved
- word sense disambiguation
- entity linking
- Language resources involved
- BabelNet
- multilingual or monolingual online dictionaries and encyclopedias
- Specific benefit of using linguistic linked data in use case
- performance improvement from linking to and exploiting other LLOD resources
- Provided by
- Roberto Navigli and Paola Velardi, UNIRM
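The following Python sketch illustrates why a single inventory covering both word senses and named entities helps: every candidate meaning, whether a concept or an entity, is scored by a simplified Lesk-style overlap between its related words and the context. The inventory and its identifiers are hypothetical, and this overlap heuristic is a much simpler stand-in for the approach behind Babelfy.

```python
import re

# Hypothetical sense inventory: surface form -> candidate meanings, each with a
# set of related words (lexicalizations in several languages, gloss words).
# Word senses ("concept:") and named entities ("entity:") share one inventory.
INVENTORY = {
    "jaguar": [
        ("concept:jaguar_animal", {"cat", "felino", "predator", "forest", "animal"}),
        ("entity:Jaguar_Cars", {"car", "auto", "engine", "british", "britannica"}),
    ],
}


def link(text, inventory=INVENTORY):
    """Return (token, chosen meaning) for every token found in the inventory."""
    tokens = re.findall(r"\w+", text.lower())
    context = set(tokens)
    links = []
    for tok in tokens:
        candidates = inventory.get(tok)
        if not candidates:
            continue
        # Simplified Lesk: pick the meaning whose related words overlap the
        # context most (ties go to the first candidate listed).
        best_id, _ = max(candidates, key=lambda c: len(c[1] & context))
        links.append((tok, best_id))
    return links


print(link("The jaguar is a big cat of the forest"))
# [('jaguar', 'concept:jaguar_animal')]
print(link("Il nuovo motore Jaguar: un'auto britannica"))
# [('jaguar', 'entity:Jaguar_Cars')]
```

Because the related-word sets mix languages, the same inventory disambiguates the English and the Italian sentence.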
Sentiment Analysis
from EUROSENTIMENT: http://eurosentiment.eu/wp-content/uploads/2013/06/EUROSENTIMENT-D2-3-User-Req-Use-Cases-UPM-v1_7-Final.pdf [paul & thierry]
Multilingual and Cross-lingual Sentiment Analysis
- Title
- Multilingual and Cross-lingual Sentiment Analysis
- Source Reference
- Slides of WebLyzard at EDF LIDER WS Meeting
- Industry sector
- Any
- Actors and benefits they get from use case
- Developers of sentiment analysis / opinion mining systems
- Summary
- Sentiment analysis and opinion mining systems are heavily used to understand and structure online communication about products, services etc. for marketing purposes. In many cases, analysis of brands across countries and natural languages is crucial. However, adapting a sentiment analysis system to other languages and domains is expensive, as it requires sentiment lexica in different languages, which ideally are contextualized.
- Language technologies involved
- automated sentiment analysis
- Language resources involved
- contextualized sentiment lexica for multiple languages
- Specific benefit of using linguistic linked data in use case
- reuse (linked) sentiment lexica in multiple languages to adapt a sentiment analysis system to other languages, lowering the cost for doing so
- Provided by
- Philipp Cimiano - UNIBI
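A minimal sketch of lexicon-based scoring with contextualized lexica: the polarity of a word is looked up per language and domain, falling back to a domain-independent entry. The lexicon entries are hypothetical; in this use case they would come from linked sentiment lexica, so supporting a further language means adding entries rather than changing the scorer.

```python
import re

# Hypothetical contextualized lexicon: (language, word, domain) -> polarity.
# A domain of None gives the default polarity used when no domain entry exists.
LEXICON = {
    ("en", "unpredictable", "movies"): 1.0,
    ("en", "unpredictable", "cars"): -1.0,
    ("en", "great", None): 1.0,
    ("es", "impredecible", "movies"): 1.0,
    ("es", "impredecible", "cars"): -1.0,
    ("es", "genial", None): 1.0,
}


def polarity(word, lang, domain, lexicon=LEXICON):
    """Domain-specific polarity if known, else the default, else neutral (0.0)."""
    if (lang, word, domain) in lexicon:
        return lexicon[(lang, word, domain)]
    return lexicon.get((lang, word, None), 0.0)


def score(text, lang, domain, lexicon=LEXICON):
    """Sum word polarities; > 0 reads as positive, < 0 as negative."""
    words = re.findall(r"\w+", text.lower())
    return sum(polarity(w, lang, domain, lexicon) for w in words)


print(score("The plot was unpredictable", "en", "movies"))    # 1.0
print(score("La dirección es impredecible", "es", "cars"))    # -1.0
```

The same word flips polarity between domains, which is why contextualized lexica matter when a system is carried over to new markets.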
Annotating News Feeds
from OpenCalais use cases [paul]
Resource Sharing for Named Entity Recognition
from Open NER project
something from DBpedia Spotlight and Apache Stanbol [Sebastian]
Information Management Use Cases
Data management projects that may benefit from linguistic linked data
Crowdsource Media Annotations
from MICO project: http://www.mico-project.eu/use-cases/ [?]
Data integration and analysis value chain
from CODE project: http://code-research.eu/vision [?]
Language analysis in economics and finance
from DOPA project; http://www.dopa-project.eu/index.php/vision/ Source of requirements for language analysis in economics and finance [?]
Cross lingual media retrieval in medicine
from KHRESMOI project: http://www.khresmoi.eu/use-cases/ Source of requirements for cross-lingual and multimodal IR [dave]
Media Fragment Management
from MediaMixer project: http://community.mediamixer.eu/usecases [dave]
Language Technology Projects
Projects employing language technology that may benefit from linguistic linked data.
Terminology Extraction
- Title
- Terminology Extraction for Localisation
- Source Reference
- FALCON Project
- Industry sector
- Localisation, specifically terminology management
- Actors and benefits they get from use case
- Localisation clients who will improve the terminology consistency of source content prior to translation
- Language Service Providers who will be able to provide better translation quality through consistency in term translation
- Summary
- Localisation clients run source text through an automated term identification service that has been trained on a dictionary indexed with references to one or more linked data dictionaries, including their own organisation's dictionary.
- The automated term identification service returns the source text with terms annotated, e.g. using ITS2.0 terminology data category, with a reference to a lexical data entry.
- The term annotations are reviewed by the client's terminologist who identifies any false positives, dereferencing and reviewing information in the lexical data entry if needed. False positives may be fed back to improve the training corpora of the terminology extraction service.
- The approved term annotations are then reviewed by a linguist who, if the dictionary is multilingual and includes the source language, may dereference the links of the terms, examine any translations present and opt to approve them for use in the job. If a translation is not present, the linguist may provide one if it is judged important for translation consistency. In both cases the term translations are then passed, as a multilingual glossary, together with the source text to the language service provider. If a new translation has been generated, it may be submitted back to the dictionary as a candidate translation for future use.
- Language technologies involved
- automated terminology extraction
- term suggestion and translation review tool
- Language resources involved
- multilingual or monolingual linked data lexicon or dictionary
- Specific benefit of using linguistic linked data in use case
- Critical assessment of use case, e.g. in commercial use, trial only, research prototype, under-development
- Provided by
- Dave Lewis - CNGL/TCD
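The annotation returned by the term identification service can be sketched as below: dictionary terms are matched in the source text (longest term first) and wrapped in HTML `span` elements carrying the ITS 2.0 Terminology attributes `its-term` and `its-term-info-ref`, the latter pointing at the term's lexical data entry. The term dictionary and its URIs are hypothetical, and a plain dictionary lookup stands in for the trained extraction service.

```python
import html
import re

# Hypothetical term dictionary: lower-cased term -> URI of its entry in a
# linked data dictionary (placeholder URIs).
TERMS = {
    "translation memory": "http://example.org/lexicon/translation_memory",
    "term base": "http://example.org/lexicon/term_base",
    "term": "http://example.org/lexicon/term",
}


def annotate_terms(text, terms=TERMS):
    """Wrap dictionary terms in spans carrying ITS 2.0 Terminology attributes."""
    # Longer terms first, so "term base" is preferred over "term".
    alternatives = sorted(terms, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, alternatives)) + r")\b",
                         re.IGNORECASE)
    out, pos = [], 0
    for m in pattern.finditer(text):
        out.append(html.escape(text[pos:m.start()]))
        ref = html.escape(terms[m.group(1).lower()])
        out.append(f'<span its-term="yes" its-term-info-ref="{ref}">'
                   f"{html.escape(m.group(1))}</span>")
        pos = m.end()
    out.append(html.escape(text[pos:]))
    return "".join(out)


print(annotate_terms("Import the Term Base before the translation memory."))
```

The terminologist's review would then operate on these spans, removing false positives and following `its-term-info-ref` to inspect the lexical entry.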
Parallel Text Curation and MT Retraining
- Title
- Parallel Text Curation and MT Retraining
- Source Reference
- FALCON Project
- Industry sector
- Localisation, specifically use of machine translation and postediting
- Actors and benefits they get from use case
- Localisation clients who will improve the quality of machine translation of their content over time, both for direct publishing of machine-translated content and for improving the throughput of, and discounts on, human post-edited translation from LSPs.
- Language Service Providers who will be able to provide better machine translation quality to clients through tailored selection of parallel text for training an MT engine for a specific job (where clients don't possess their own MT), and who can also offer more competitive discounts based on the improved postediting efficiency that results from higher quality. The predictable improvement of MT for a client can be factored into discounts, or charged as a value-add service if access to the MT engine is offered after the job is completed.
- Summary
- As a new client translation project is started, the LSP trains an MT engine by selecting the most appropriate parallel text in the required language pairs from both internal and publicly available parallel text in linked data format. This selection can be based on factors such as similarity in domain, genre and term distribution, or on the experience and quality assessment of the translators/posteditors of previous translations in relation to the client's project.
- As the translation project progresses, the generated translations are also reviewed and selected for frequent retraining of the MT engine, improving the automated translation stage progressively over the period of the project.
- Language technologies involved
- machine translation
- provenance tracking of postediting of machine translation
- Language resources involved
- parallel text with quality annotation
- Specific benefit of using linguistic linked data in use case
- ease in integrating postedit and QA data from different tools and subcontractors
- Critical assessment of use case, e.g. in commercial use, trial only, research prototype, under-development
- research prototype
- Provided by
- Dave Lewis - CNGL/TCD
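One way to implement selection of parallel text by similarity in term distribution is to rank candidate corpora by the cosine similarity between their source-side bag of words and the client's project content, optionally weighted by a quality score derived from post-editing and QA annotations. The corpus identifiers, texts and the multiplicative quality weighting are illustrative assumptions, not the FALCON method.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Bag-of-words term frequencies (no stop-word removal in this sketch)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def rank_corpora(project_text, corpora, quality=None):
    """Rank corpus ids by similarity to the project, most similar first.

    corpora: corpus id -> source-side text.
    quality: optional corpus id -> weight in [0, 1] (assumed to come from
    post-editing / QA annotations); missing ids default to 1.0.
    """
    quality = quality or {}
    target = term_vector(project_text)
    scores = {cid: cosine(target, term_vector(text)) * quality.get(cid, 1.0)
              for cid, text in corpora.items()}
    return sorted(scores, key=scores.get, reverse=True)


project = "reset the router and update the firmware"
corpora = {
    "legal": "the parties agree to the terms of the contract",
    "it-support": "update firmware on the router then reset",
}
print(rank_corpora(project, corpora))                         # ['it-support', 'legal']
print(rank_corpora(project, corpora, {"it-support": 0.5}))    # ['legal', 'it-support']
```

The weighted call shows how a poor quality assessment can demote an otherwise well-matching corpus; how such weights would actually be derived is left open here.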
Content Annotation for LT in Localisation
from MLW-LT working group: use cases: http://www.w3.org/International/its/wiki/Use_cases_-_high_level_summary [felix/dave]
Multilingual document search and data mining
from MANTRA project: http://www.mantra-project.eu/ Source of requirements on ML document annotation for search and data mining – biomed domain [dave/kev]
Quality Assessment for Machine Translation
from QTLaunchpad: http://www.qt21.eu/launchpad/ MT quality assessment [felix]
Content Analytics Marketplace
from Annomarket https://annomarket.eu/ [?]
News Content Analysis
from XLIKE: http://www.xlike.org/wp-content/uploads/2012/03/D1.2.1-Requirements-for-early-prototypes-v10.pdf news content analysis [?]
LT innovation
LT-Innovate: industrial views on LT development, http://www.lt-innovate.eu/resources/document/lt-industry-definition-taxonomy [?]
Speeding Up Grammar Generation for Spoken Dialogue Systems
- Title
- Supporting Development of Dialogue Systems using Linked Data and Ontology Lexica
- Source Reference
- Portdial Project
- Industry sector
- Any
- Actors and benefits they get from use case
- Developers of dialogue systems
- Summary
- The creation of grammars for dialogue systems is costly. It can be made more cost-efficient by techniques that semi-automatically support the creation of grammars. In the Portdial project, an approach that induces grammars top-down from an ontology lexicon has been explored and shown to deliver grammars with high precision but low recall. UNIBI has implemented this approach for LTAG (Lexicalized Tree Adjoining Grammar), CCG (Combinatory Categorial Grammar) and GF (Grammatical Framework).
Nevertheless, such grammars can be used as a seed for further extending the coverage of the grammar using other techniques. In addition, the Portdial project has in particular supported the development of so-called pre-terminal rules, which expand non-terminals into a set of named entities. It has been shown that Linked Data can be used to support this use case.
- Language technologies involved
- Top-down grammar generation
- Language resources involved
- linked data with labels in different languages
- existing language-specific grammars
- Specific benefit of using linguistic linked data in use case
- reuse (linked) ontology lexica for top-down grammar induction
- reuse linked data with labels in many languages to support enhancement of pre-terminal rules in many languages
- Provided by
- Philipp Cimiano - UNIBI
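The two ways linked data feeds grammar development here can be sketched in Python using a plain context-free rule notation (not the LTAG, CCG or GF formalisms mentioned above): pre-terminal rules are built from multilingual entity labels, and question rules are derived top-down by inserting an ontology lexicon entry into a per-language frame. The entities, the lexicon entry and the hand-written frames are hypothetical.

```python
# Hypothetical linked data: entity URI -> (class, labels per language).
ENTITIES = {
    "http://example.org/Munich": ("City", {"en": "Munich", "de": "München"}),
    "http://example.org/Cologne": ("City", {"en": "Cologne", "de": "Köln"}),
}
# Hypothetical ontology lexicon: property -> noun lexicalization per language.
LEXICON = {"populationTotal": {"en": "population", "de": "Einwohnerzahl"}}
# Hand-written question frames per language (an assumption of this sketch).
FRAMES = {"en": 'QUESTION -> "what is the {noun} of" {arg}',
          "de": 'QUESTION -> "wie hoch ist die {noun} von" {arg}'}


def preterminal_rules(entities, lang):
    """One rule per class, expanding it into the entity labels in `lang`."""
    by_class = {}
    for cls, labels in entities.values():
        if lang in labels:
            by_class.setdefault(cls.upper(), []).append(labels[lang])
    return [f"{nt} -> " + " | ".join(f'"{label}"' for label in sorted(names))
            for nt, names in by_class.items()]


def question_rules(lexicon, frames, lang, arg_nt):
    """Instantiate the language's frame with each property's lexicalization."""
    return [frames[lang].format(noun=forms[lang], arg=arg_nt)
            for forms in lexicon.values() if lang in forms]


grammar_de = preterminal_rules(ENTITIES, "de") + question_rules(LEXICON, FRAMES, "de", "CITY")
print("\n".join(grammar_de))
```

Switching `lang` to "en" produces the English rules from the same data, which is the multilingual reuse described in this use case.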
Supporting Cross-lingual Information Retrieval
- Title
- Supporting Cross-lingual Information Retrieval
- Source Reference
- Organic.Lingua Project
- Industry sector
- Any
- Actors and benefits they get from use case
- ???
- Summary
- ???
- Language technologies involved
- ?
- ?
- ?
- ?
- Specific benefit of using linguistic linked data in use case
- reuse (linked) ontology lexica for top-down grammar induction
- reuse linked data with labels in many languages to support enhancement of pre-terminal rules in many languages
- Provided by
- CELI?
Social Media Analytics
Philipp to describe three use cases.
Multimedia Analytics
Maybe some input from use cases for non-linked-data metadata:
- IPTC for images
- PLUS for images
- PBCore for public broadcasting
- W3C media annotation group