Main Page

From Linked Data for Language Technology Community Group

Activities

We also invite you to participate in any of the upcoming group calls and roadmapping workshops being organised by the group.

Calls and workshops are announced over the mailing list. We pursue three major activities:

  • Roadmapping activities (primarily 2013-2015)
    • Roadmapping did not stop in 2015; it now continues in more focused sub-discussions, namely on web standards for language resource metadata and linguistic annotation
  • Metadata vocabularies for language resources (since 2015)
  • Web standards for linguistic annotation (since 2019)

Web standards for linguistic annotations

Since 2013, LD4LT has embraced the NLP Interchange Format (NIF) as a community standard for representing linguistic annotations on the web. However, NIF competes and partially overlaps with other web standards such as Web Annotation, with ISO standards that primarily build on XML technologies, and with community- or platform-specific solutions such as TEI-XML (for the Digital Humanities) and the LAPPS Interchange Format (for Galaxy). We identified the need to harmonize these efforts and are currently working towards a synthesis.

  • GitHub: repository
  • We hold (more or less) regular teleconferences in collaboration with the COST Action Nexus Linguarum, announced via the LD4LT mailing list
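
To illustrate what representing a linguistic annotation as web data can look like, the following minimal sketch describes a text span in the style of NIF, using only the Python standard library. The document URI, the sample text, and the helper names `literal` and `nif_span` are illustrative assumptions. The class and property names follow our reading of the NIF 2.0 Core ontology and should be checked against the ontology itself before use.

```python
# Sketch only: the document URI, helper names, and sample text are
# illustrative assumptions, not part of any LD4LT deliverable.
NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD = "http://www.w3.org/2001/XMLSchema#"


def literal(value, datatype=None):
    """Serialize a value as an N-Triples literal (basic escaping only)."""
    escaped = (
        str(value)
        .replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )
    return f'"{escaped}"' + (f"^^<{datatype}>" if datatype else "")


def nif_span(doc_uri, text, begin, end):
    """Return N-Triples lines describing text[begin:end] as a NIF string."""
    # Offset-based fragment identifiers address the whole text and the span.
    context = f"<{doc_uri}#char=0,{len(text)}>"
    span = f"<{doc_uri}#char={begin},{end}>"
    index_type = XSD + "nonNegativeInteger"
    triples = [
        (context, f"<{RDF_TYPE}>", f"<{NIF}Context>"),
        (context, f"<{NIF}isString>", literal(text)),
        (span, f"<{RDF_TYPE}>", f"<{NIF}String>"),
        (span, f"<{NIF}referenceContext>", context),
        (span, f"<{NIF}beginIndex>", literal(begin, index_type)),
        (span, f"<{NIF}endIndex>", literal(end, index_type)),
        (span, f"<{NIF}anchorOf>", literal(text[begin:end])),
    ]
    return [f"{s} {p} {o} ." for s, p, o in triples]


if __name__ == "__main__":
    for line in nif_span("http://example.org/doc1", "Linked data helps NLP.", 0, 11):
        print(line)
```

Each annotated span becomes a resource whose URI encodes its character offsets, so further annotations (part of speech, entity links, etc.) can be attached to it as ordinary RDF statements.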

OWL Metamodel for Language Resources

With the aim of converting the data and metadata of Language Resources (LRs) into the cloud of Linguistic Linked Data, an OWL metamodel is being developed based on the input and previous experience of well-established LR communities such as META-SHARE. In this section we are collecting pointers and materials related to that topic.

Roadmapping activities

During an initial phase, LD4LT focused on roadmapping activities and surveys regarding the needs and potential of linked data in language technology. This was conducted in conjunction with the FP7 project LIDER: Linked Data as an enabler of cross-media and multilingual content analytics for enterprises across Europe, with a peak in activity in 2015. On this basis, we then focused on more specialized aspects such as language resource metadata and exchange formats for linguistic annotations.

  • Summary of LD4LT Roadmapping workshops
    • 05/03/2015: 1st session on LIDER reference architecture NOTE: this had been announced to be 3 March, but the call will be 5 March.
    • 19/03/2015: Presentation of linghub to the LD4LT community
    • 02/04/2015: Presentation on the Digital Single Market and linguistic linked data enhanced content analytics

Use Case questionnaire

Part of the roadmapping activities is to develop use cases that explore the relevance of linguistic linked data in different industrial and governmental applications. Initial input on such use cases is being gathered through an online questionnaire currently offered by the group, and these use cases will be elaborated on in face-to-face sessions at the above workshops. To seed this process, however, a number of suggested use cases are being summarised on this wiki based on previous or ongoing applications.

The results of an initial survey conducted by the LIDER project into requirements and use cases related to linguistic linked data are now available. It is interesting to read this in tandem with results of a survey just released by LT-Innovate on interest in a European Language Cloud.

Events

Past Events

Collaborative Landscape

The LD4LT Community Group aims to work closely with other active communities with an interest in building consensus on interoperability of linguistic data using linked data.

W3C Best Practice for Multilingual Linked Open Data Community Group

This group focusses on capturing best practice in publishing and using linguistic linked data.

LD4LT collaborates on improving the understanding of the use cases to which these best practices may apply.

W3C CG Ontology-Lexica (OntoLex)

W3C CG OntoLex published the OntoLex vocabulary and maintains existing and emerging OntoLex modules, as well as demonstrations and best practice for the representation of lexica and machine-readable dictionaries on the web of data.

LD4LT collaborates with OntoLex on metadata for lexical resources as well as on linking textual data and dictionaries.

COST Action CA18209 - European network for Web-centred linguistic data science (NexusLinguarum, 2019-2024)

The main aim of this Action is to promote synergies across Europe between linguists, computer scientists, terminologists, and other stakeholders in industry and society, in order to investigate and extend the area of linguistic data science. We understand linguistic data science as a subfield of the emerging “data science”, which focuses on the systematic analysis and study of the structure and properties of data at a large scale, along with methods and techniques to extract new knowledge and insights from it. Linguistic data science is a specific case, which is concerned with providing a formal basis to the analysis, representation, integration and exploitation of language data (syntax, morphology, lexicon, etc.). In fact, the specificities of linguistic data are an aspect largely unexplored so far in a big data context.

In order to support the study of linguistic data science in the most efficient and productive way, the construction of a mature holistic ecosystem of multilingual and semantically interoperable linguistic data is required at Web scale. Such an ecosystem, unavailable today, is needed to foster the systematic cross-lingual discovery, exploration, exploitation, extension, curation and quality control of linguistic data. We argue that linked data (LD) technologies, in combination with natural language processing (NLP) techniques and multilingual language resources (LRs) (bilingual dictionaries, multilingual corpora, terminologies, etc.), have the potential to enable such an ecosystem that will allow for transparent information flow across linguistic data sources in multiple languages, by addressing the semantic interoperability problem.

LD4LT collaborates with Nexus Linguarum mostly in the context of their WG1 "Linked-Data based Language Resources". From 2019 to 2024, Nexus Linguarum plays a coordinating role for several W3C CGs, complementing task-oriented CGs with a global perspective on linguistic data science.

Open Linguistics Working Group (OWLG)

LD4LT partially sprang out of the Open Knowledge Foundation (OKF) working group on Open Data in Linguistics (OWLG). In 2020, OKF infrastructures were restructured: OKF WGs were dissolved and their mailing lists integrated into a shared forum.

The OWLG continues to exist as a Google Group, but the main locus of activity of this community has moved to the COST Action Nexus Linguarum. It is possible, however, that OWLG activities will increase again after the end of Nexus Linguarum in 2024, as Nexus has temporarily taken over its function of providing an umbrella over, and an exchange between, different language technology CGs.

Main activities of the OWLG include creating and maintaining the Linguistic Linked Open Data (LLOD) cloud diagram and organizing a series of international workshops on Linked Data in Linguistics (LDL).

W3C Data Activity

Works to facilitate potentially Web-scale data integration and processing. It does this by providing standard data exchange formats, models, tools, and guidance.

LD4LT will promote awareness of best practice from the Data Activity and standardised vocabularies (e.g. DCAT, ORG, PROV) from the W3C to identify further language-specific use cases and vocabularies that may be advanced in collaboration with other groups, e.g. BP-MLOD.

W3C Internationalization Activity

Works with W3C working groups and liaises with other organizations to make it possible to use Web technologies with different languages, scripts, and cultures. It includes the ITS Interest Group, which is maintaining an RDF vocabulary for text annotation based on the Internationalization Tag Set 2.0.

LD4LT will promote awareness of existing guidelines and best practice promoted in relation to web content, and also assist in identifying use cases needing best practice in interlinking multilingual web and linguistic linked data, and promote these via the BP-MLOD Community Group.

Past collaborations

In its founding era, LD4LT collaborated with the W3C Linked Data Models for Emotion and Sentiment Analysis Community Group, but this group has been inactive since 2020.

Relics

Update or remove.

Teleconference logistics

Tracker, mailing list etc.

ISSUE and ACTION tracker

public-ld4lt@w3.org mailing list archive

LD4LT Community Group Public Landing page