LIDER roadmapping activities

From Linked Data for Language Technology Community Group

Introduction

Purpose

This document summarizes key findings from roadmapping activities undertaken by the LIDER project. These activities feed directly into the work of the LD4LT group on use cases and requirements for linked data and language technologies.

Structure

The document provides information for each activity: time and place, related event, target audience, etc. It then summarizes key points on use cases and requirements gathered during the activity. Finally, it provides links to more detailed information.

LIDER roadmapping workshops

LIDER roadmapping workshop, 21st March 2014, Athens

General information

Summary of outcomes

  • The workshop led to work on converting the META-SHARE language resource metadata to RDF. This is now undertaken in the LD4LT group.
  • Participants emphasized the need for general best practices and a general architecture for providing linguistic linked data, both metadata and the data itself.
  • Various industries have started to become aware of linguistic linked data.
  • Information about linguistic linked data sets should be made available.
  • Future workshops should use summaries of previous events as an introduction.
  • Legal experts should be brought to the table.
  • The work plan of the underlying LIDER project should be made clear to the public including milestones and activities.
  • Showcases on benefits of linked data in various domains like agriculture, medical, education etc. would be beneficial.

LIDER roadmapping workshop, 8-9 May 2014, Madrid

General information

Summary of outcomes

  • About text analytics:
    • Users of text analytics tools need: adaptability to their content domain, customization (e.g. import of taxonomies), flexible input & output formats and processing mechanisms (API, offline, ...).
    • Sentiment resolution is an important functionality for many text analytics usage scenarios.
    • When deciding on solutions, users take capacity (volume, performance, latency) and cost into account.
    • Multilingual text analytics is an area that so far has not seen a lot of activity in industry.
  • About multilingual aspects of Wikipedia:
    • Various types of background information, including information from Wikipedia, can help translators.
    • Data models for structured information in Wikipedia, e.g. Wikidata, do not rely on linked data, but conversions to and from linked data are possible.
    • One challenge is the integration of resources like Wikidata, BabelNet or DBpedia.
  • On language resources and language resource metadata:
    • One has to be careful in classifying resources, for example language resources (such as lexica) versus general knowledge bases like Wikipedia.
    • A (diagram of a) linguistic linked data cloud needs to clearly distinguish between different types of resources.
    • The quality of language resources is sometimes hard to evaluate.
    • Interoperability of language resource metadata is a key prerequisite for working with the linguistic linked data cloud.
  • On multilingual corpus transformation:
    • NIF can serve as a pivot format for linked data based corpus representation and processing.
    • Several presentations demonstrated the feasibility of a NIF based approach in different usage scenarios, like RDF representation of existing, XML-based corpora, or integration of linguistic information with localization process metadata.
    • Tooling should be made available to work with linguistic linked data; best practices are needed as well.

Many use cases and requirements discussed at the workshop related to providing access to more structured/annotated data sources and allowing simple connectivity via APIs and standards.

LIDER roadmapping workshop, 4-6 June 2014, Dublin

General information

Summary of outcomes

  • There is a need for a common API to text analysis services, live updates of linked data sources, user feedback mechanisms, and annotation relevance indicators.
  • "Too much information is no information": linked data information can help the translator only if it does not lead to an information overflow.
  • A stand-off annotation mechanism is needed to deal with annotation overlap. NIF could be a solution.
  • For the localization industry, licensing metadata is of key importance. Only with such metadata can one also work with internal (i.e. closed) linked data.
  • Terminology and linked data is a hot topic discussed also in the LD4LT group. Currently there is no standard mapping of the TBX format to RDF.
  • Bitext (= aligned text of a source and one or several translations) could be exposed as linked data, as an alternative to TMX.
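
The stand-off annotation point above can be illustrated with a minimal sketch. It is not NIF tooling and not taken from the workshop: the class, the labels, the sample sentence and the example.org URI are all hypothetical. It only shows the underlying idea that annotations stored apart from the text, as character offsets, may overlap freely, which inline markup cannot express. The fragment identifiers follow the RFC 5147 character-range style that NIF builds on; details should be checked against the NIF specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """A stand-off annotation: it points into the text instead of wrapping it."""
    begin: int  # character offset where the span starts (inclusive)
    end: int    # character offset where the span ends (exclusive)
    label: str  # hypothetical annotation type, for illustration only

    def anchor(self, text: str) -> str:
        """Return the substring of the text that this annotation covers."""
        return text[self.begin:self.end]

    def fragment(self, base_uri: str) -> str:
        """Build an RFC 5147 style character-range URI for the span."""
        return f"{base_uri}#char={self.begin},{self.end}"


def overlaps(a: Annotation, b: Annotation) -> bool:
    """True if the two spans share at least one character."""
    return a.begin < b.end and b.begin < a.end


# Hypothetical sample data: two spans that share the word "City".
text = "New York City Hall is closed."
annotations = [
    Annotation(0, 13, "Place"),     # covers "New York City"
    Annotation(9, 18, "Building"),  # covers "City Hall"
]

if __name__ == "__main__":
    for ann in annotations:
        print(ann.fragment("http://example.org/doc1"), ann.label, repr(ann.anchor(text)))
    print("overlap:", overlaps(annotations[0], annotations[1]))
```

Because the text itself is never modified, further annotation layers (for example terminology or localization metadata) can be added later without conflicting with existing ones.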

LIDER roadmapping workshop, 2 September 2014, Leipzig

General information

Summary of outcomes

  • Sharing of linked data involving a cooperative data curation
  • Providing more resources for micro-domains that generalize and can be shared
  • Avoid knowledge silos by placing more emphasis on linking, both in communities and in enterprises
  • Focus on more high-quality open data
  • Work on deeper analysis and more semantics to enable semantic search for things, rather than strings
  • Clarification of what accuracy rates of linked data analytics are reasonable for clients with high statistical result expectations

LIDER roadmapping workshop, 6 July 2015, Rome

General information

Summary of outcomes

The workshop brought together the multimedia community with the linguistic linked data community. The main topic was the analysis of cross-media usage of linguistic linked data within Italian public bodies and projects in private companies. A concrete outcome was the potential for a collaboration with the Chamber of Deputies for improving the publishing and production of linked data and multimedia content.

LIDER roadmapping workshop, 13 July 2015, Munich

General information

Summary of outcomes

The workshop brought together the healthcare community with the linguistic linked data community. Providers of several types of data sources in this area and providers of analytics solutions for the health care sector came together and introduced the current state and needs. Several issues came up frequently, like: the integration and harmonization of data on the semantic level, the availability of data, the quality and usability of content analytics systems, and the need to educate people about what technologies to use for what purposes.

LIDER roadmapping workshop, 20 October Madrid

General information

Summary of outcomes

The event helped to increase awareness of linguistic linked data in focused communities such as digital humanities and linguistics. It brought together linguistics, digital humanities, computer science, and the public sector. Some key outcomes and topics discussed at the event are:

  1. Dissemination of the guidelines and best practices for Linguistic Linked Data, as well as of the Ontolex-lemon model.
  2. Application of linked data technologies in digital humanities. Several success stories were presented at the workshop, such as the Spanish National Library linked data portal. Libraries and related use cases are ready for the application of linguistic linked data techniques in the short term.
  3. Promotion of language technologies by the Spanish government.

Other activities relevant for roadmapping (in chronological order)

XML Prague 2014

General information

Summary of outcomes

  • Various groups of users can be identified for linked data aware content analytics tooling, from an individual content author to a content architect and indexing specialists working with masses of content.
  • For all these users the usability of content analytics tools is of high importance.
  • Tooling needs to be available in the right part of the content production workflow.
  • Addressing certain standardization challenges can help the interoperability of metadata produced by content analytics applications.
  • These applications only add value for the end user if they are tailored to selected domains and languages.
  • In that way content analytics also has the potential to become a booster for business models demonstrating the value of public open data.

MultilingualWeb Workshop 2014

General information

Summary of outcomes

  • The relation between Wikidata and DBpedia was a main topic of discussion. The workshop helped to understand the different and often complementary viewpoints of these efforts for creating multilingual structured data sources.
  • Several participants and presentations stressed the need for standards. Multilingual, digital content workflows, including curated and crowd sourced data, need to be based on standards.
  • The embedding of linguistic linked data into general Web technology development is needed.
  • The quality of linked data resources was a recurring topic at this event and at many other LIDER roadmapping events.
  • For users of analytics solutions, details of the technical approach are often less important than other relevant dimensions such as capacity (volume, performance, latency) and cost of solutions.
  • Interoperability of language resource metadata is key for all applications.

Tutorial on Linked Data for Language Technologies

General information

  • Date/Place: 26-31.05.2014, Reykjavik
  • Audience: PhD students, post-docs, or industry people working with NLP applications
  • Related event: LREC 2014

Summary of outcomes

  • Dissemination of linguistic linked data technologies; research related community building

Soap! Conference 2014

General information

Summary of outcomes

  • Like other content related areas, technical documentation suffers from more and more content being available on the Web. The industry asks: how to survive the competition and make a difference?
  • Content with additional assets, like metadata to ease localization and further metadata for contextualization, personalization, or search engine optimization, could become the differentiator.
  • The terms semantic web and linked data are not well known in industry. High level business people who decide about budgets for content workflow tools etc. see the semantic web as an academic area.
  • One has to focus on the functionality and talk about metadata; then chances are higher to convince people to invest in more technologies.
  • One application area of linguistic linked data in technical documentation was discussed: users who have a problem with a device, software etc. search on the Web and often find the answer in manual content. But since most of the content is currently not enriched with metadata (= linked data annotations, schema.org, …), they search for keywords. Having question answering functionality for technical documentation may be attractive for many industry areas, and may become a differentiator from competitors.
  • Existing technical documentation content already contains a lot of information which is currently lost in the process from authoring to Web publishing. There is an opportunity for this existing content, too, to become a valuable asset in the above applications or others.

Building the Multilingual Web of Data Tutorial

General information

  • Date/Place: 20.10.2014, Riva del Garda
  • Main page: http://iswc2014.semanticweb.org/
  • Audience: Semantic Web Community researchers, developers, and practitioners with interest in multilingualism and/or NLP techniques in the context of the Semantic Web.
  • Related event: ISWC 2014

Summary of outcomes

  • Evangelization of linguistic linked data technologies; research related community building

EKAW 2014

General information

SHARE-PSI Roadmapping session on Linguistic Linked Data and commercial use

General information

Big Data Networking Day 2015

General information

  • Date/Place: 16.01.2015, Brussels
  • Audience: H2020 ICT-16 Project Participants

Summary of outcomes

  • LIDER participants contributed to discussions about research and innovation planning in the area of language technologies. Here, LIDER cooperated with the Horizon 2020 projects CRACKER and LT-OBSERVATORY.

III Jornadas esDBpedia

General information

Summary of outcomes

The goals discussed were to increase the amount and quality of semantic data stored in esDBpedia, and the number of links to other semantic data silos in Spanish, to make esDBpedia the core of semantic data in Spanish. This event helped to increase the awareness of linked data, the creation of semantic data and storage technologies.

Linked Data and Machine Translation Meeting 2015

General information

  • Date/Place: January 2015, Berlin
  • Audience: key research groups in the realm of machine translation and linguistic linked data.

Summary of outcomes

  • Promote usage info on LLD resources in MT, to demonstrate their usefulness to the MT community.
  • Enhance tabular corpora with LLD metadata to enable more open, repeatable data management for shared MT training and evaluation tasks.
  • Develop “killer application showcases” demonstrating the use of MT for making the linked data Web multilingual, and vice versa. This is a two-way relationship between MT and LLD. Currently it seems that the benefits of LLD for MT are easier to showcase, e.g. by using a multilingual LLD resource for MT training.
  • Consider joint research proposals. A project consortium focusing on linked data experts who understand multilingual aspects is key.
  • It was agreed that any strategic research direction for joint MT and LLD research should be informed by current public consensus building efforts with the aim of maximising industry involvement. Ongoing public consensus building activities include:
    • Metadata for language resources developed with META-SHARE input at the Linked Data for Language Technology (LD4LT) community group at the W3C.
    • Discussion on open data management for public automated translation at the ITS interest group at the W3C, which may provide useful input to the CEF for MT services.
    • An initial LLD research roadmap developed by the LIDER project, due to be reviewed and revised at the LD4LT group.
    • Best practice on multilingual linked open data which will address linked data profiles for terminology and parallel text resources.

XML Prague 2015

General information

Summary of outcomes

  • The relation of XML based toolchains to linked data toolchains has been identified as an important topic for getting real world application of (linguistic) linked data in XML based digital content workflows. Tackling this topic could be a task for future projects.


SHARE-PSI Roadmapping session on linguistic linked data sets for PSI

General information

MultilingualWeb Workshop 2015 and Riga Summit on the Multilingual Digital Single Market

General information

Summary of outcomes

LIDER key people contributed via presentations, panel discussions, a LIDER booth, demos and posters to all three days of the Riga Summit: the META-FORUM event on day 1, the summit event on day 2, and the MultilingualWeb workshop and the Connecting Europe Facility event on day 3. In this way, LIDER raised awareness about linguistic linked data among a large and diverse audience.

LIDER Sessions at the BDVA Summit 2015

General information

Summary of outcomes

A blog post by the Big Data Europe project, which co-organised the session on standardisation, summarizes the outcomes of that session. The session brought together stakeholders from various big data related communities and was one input to the planning of the European Commission survey on standards in the digital single market.

The session on the multilingual digital single market brought together key people from industry and research who are interested in the bridge between language technologies and big data. It helped to continue and strengthen the already existing awareness in the big data community that language accounts for the largest amount of data on the Web, and of the needs and opportunities for handling big, multilingual data.

4th Workshop on the Multilingual Semantic Web (MSW-4)

General information

Summer Datathon on Linguistic Linked Data 2015

General information

Summary of outcomes

The datathon increased the awareness of linguistic linked data and attracted many people interested in the topic, putting them in contact with other experts in various fields. The participants were encouraged to join related groups in W3C and the Open Knowledge Foundation, and to stay in touch with the linked data community. They were also encouraged to finish the datathon work items, to adopt guidelines using e.g. the LIDER reference cards, and to use LingHub for discovering language resources. They should increase the amount of linguistic linked data and consider work on new usage scenarios and linguistic linked data aware applications.


20th European symposium on Languages for Special Purposes

General information

Summary of outcomes

At the event, usage of linguistic linked data in the realm of terminology was discussed. Data integration is a key challenge for terminological resources. This relates to integration of terminology resources in one language or across languages, as well as integration of terminological knowledge with world knowledge or other types of linguistic resources. The Lemon model can help to tackle this challenge, and tools like the TBX2RDF converter can ease application development.

ACL 2015

General information

Summary of outcomes

tbd


EDF 2015

General information

  • Date/Place: 15-16.11.2015 Luxembourg
  • Main page: http://2015.data-forum.eu/
  • Audience: industry, research, policy makers, and community

Summary of outcomes

  • Raise awareness of linguistic linked data and of the main goals achieved by the LIDER project.
  • Disseminate the main outcomes of the project, namely the guidelines and best practices for linguistic linked data, through the video and through printed material such as leaflets and reference cards.
  • Set up and support a community, and attract people who are interested in linked data or who can benefit from the linked data technologies produced by LIDER.
  • Networking and symbiosis with other projects.