Linked Data for Language Technology (LD4LT) Group Kick-Off and Roadmap Meeting, 21 March, Athens, Greece
Linked Data (LD) has proven beneficial in many new and unforeseen ways for Language Technology (LT) and the newly gained interoperability and availability of LT data and services is currently receiving industry adoption. With the foundation of the LD4LT W3C community group, we would like to start the discussion and analyse current trends as well as offer a crystallization point to coordinate the development of future LD-based LT applications. See the agenda.
All feedback is welcome and participation is open to all interested organisations and individuals from industry and academia.
The LD4LT Group Kick-Off and Roadmap Meeting is supported by the LIDER project, the MultilingualWeb community, the NLP2RDF project, the Working Group for Open Data in Linguistics as well as the DBpedia Project.
As input to the discussion and the work of the LD4LT group, please consider filling in the first LIDER survey. During the kick-off meeting, via the survey, and in the LD4LT group, you can share your view on how linked data and language technology should benefit each other.
W3C today announced the seventh MultilingualWeb workshop, to be held 7-8 May 2014 in Madrid, Spain. It is part of a series of events exploring the mechanisms and processes needed to ensure that the World Wide Web lives up to its potential around the world and across barriers of language and culture.
This workshop is made possible by the generous support of the LIDER project. As part of the event, LIDER will organize a roadmapping workshop on linked data and content analytics.
Anyone may attend all sessions at no charge and the W3C welcomes participation by both speakers and non-speaking attendees. Early registration is encouraged due to limited space.
Building on the success of six highly regarded previous workshops, this workshop will emphasize new technology developments that lead to new opportunities for the Multilingual Web. The workshop brings together participants interested in the best practices and standards needed to help content creators, localizers, language tools developers, and others meet the challenges of the multilingual Web. It provides further opportunities for networking across communities. We are particularly interested in speakers who can demonstrate novel solutions for reaching out to a global, multilingual audience.
ITS 2.0 provides a foundation for integrating automated processing of human language into core Web technologies. ITS 2.0 bears many commonalities with its predecessor, ITS 1.0, but provides additional concepts that are designed to foster the automated creation and processing of multilingual Web content.
Work on application scenarios for ITS 2.0 and gathering of usage and implementation experience will now take place in the ITS Interest Group.
The Unicode Consortium has announced Version 6.3 of the Unicode Standard and with it, significantly improved bidirectional behavior. The updated Version 6.3 Unicode Bidirectional Algorithm now ensures that pairs of parentheses and brackets have consistent layout and provides a mechanism for isolating runs of text.
Based on contributions from major browser developers, the updated Bidirectional Algorithm and five new bidi format characters will improve the display of text for hundreds of millions of users of Arabic, Hebrew, Persian, Urdu, and many other languages. The display and positioning of parentheses will better match the behavior that users expect. By using the new methods for isolating runs of text, software will be able to construct messages from different sources without jumbling the order of characters. The new bidi format characters correspond to features in markup (such as in CSS). Overall, these improvements also bring greater interoperability and make it easier to insert text and assemble user interface elements.
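As an illustration of isolating runs of text when a message is assembled from different sources, here is a minimal Python sketch. It uses the isolate characters added in Unicode 6.3 (LRI U+2066, RLI U+2067, FSI U+2068, PDI U+2069). The helper names isolate and build_message are hypothetical and not part of any library, and whether the result displays correctly still depends on the rendering software implementing the 6.3 algorithm.

```python
# Isolate format characters introduced in Unicode 6.3.
LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
FSI = "\u2068"  # FIRST STRONG ISOLATE (direction taken from the text itself)
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE (closes any of the three above)


def isolate(text, direction=None):
    """Wrap text in an isolate so it cannot reorder surrounding characters.

    direction is "ltr", "rtl", or None (hypothetical parameter; None means
    "let the first strong character decide", i.e. FSI).
    """
    opener = {"ltr": LRI, "rtl": RLI, None: FSI}[direction]
    return f"{opener}{text}{PDI}"


def build_message(template, **parts):
    """Fill a template, isolating every inserted value (hypothetical helper)."""
    return template.format(**{key: isolate(value) for key, value in parts.items()})


# A Hebrew user name (written with escapes) inserted into an English sentence.
user_name = "\u05d9\u05d5\u05e1\u05d9"
message = build_message("{user} posted {count} photos", user=user_name, count="3")
```

Using FSI by default is a design choice for values whose direction is not known in advance; when the direction is known, passing "ltr" or "rtl" is more predictable.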
The improvements come with new rigor: the Consortium now offers two reference implementations and greatly improved testing and test data.
In a major enhancement for CJK usage, this new version adds standardized variation sequences for all 1,002 CJK compatibility ideographs. These sequences address a well-known issue of the CJK compatibility ideographs — that they could change their appearance when any process normalized the text. Using the new standardized variation sequences allows authors to write text which will preserve the specific required shapes of these CJK ideographs, even under Unicode normalization.
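The normalization issue described above can be reproduced with Python's standard unicodedata module. This is only a sketch: the pairing of U+8C48 with variation selector U+FE00 as the standardized sequence for U+F900 is stated here from memory and should be checked against the StandardizedVariants.txt data file. Whether the sequence is drawn with the intended glyph depends on font support.

```python
import unicodedata

COMPAT = "\uF900"    # a CJK compatibility ideograph
UNIFIED = "\u8C48"   # the unified ideograph it canonically decomposes to
VS1 = "\uFE00"       # VARIATION SELECTOR-1

# Normalization silently replaces the compatibility ideograph,
# so the distinct shape the author asked for is lost.
normalized = unicodedata.normalize("NFC", COMPAT)
lost_after_nfc = normalized == UNIFIED

# Assumed standardized variation sequence for U+F900 (verify against
# StandardizedVariants.txt): unified ideograph + variation selector.
sequence = UNIFIED + VS1

# Variation selectors have no decomposition, so the sequence is
# unchanged by both composed and decomposed normalization forms.
survives_nfc = unicodedata.normalize("NFC", sequence) == sequence
survives_nfd = unicodedata.normalize("NFD", sequence) == sequence
```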
Version 6.3 includes other improvements as well:
- Improved Unihan data to better align with ISO/IEC 10646
- Better support for Hebrew word break behavior and for ideographic space in line breaking
The MultilingualWeb-LT Working Group has published a Proposed Recommendation of Internationalization Tag Set (ITS) Version 2.0. The technology described in this document provides a foundation for integrating automated processing of human language into core Web technologies. ITS 2.0 bears many commonalities with its predecessor, ITS 1.0, but provides additional concepts that are designed to foster the automated creation and processing of multilingual Web content. ITS 2.0 focuses on HTML and on XML-based formats in general, and can leverage processing based on the XML Localization Interchange File Format (XLIFF), as well as the Natural Language Processing Interchange Format (NIF). Comments are welcome through 22 October.
The draft implements all changes since the previous publication of 11 April 2013. There are no remaining open issues. The Working Group is planning to finalize ITS 2.0 now: this is your last chance to provide feedback! The Last Call period ends 11 June.
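To give a rough idea of what automated processing of such metadata can look like, the Python sketch below reads the HTML translate attribute, which ITS 2.0 uses for its Translate data category in HTML, and labels each text run as translatable or not. It is a simplified illustration and not an ITS 2.0 processor: the class name TranslatableText is hypothetical, global rules and all other data categories are ignored, and only the literal values "yes" and "no" are handled.

```python
from html.parser import HTMLParser

# Elements without an end tag; they must not be pushed onto the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class TranslatableText(HTMLParser):
    """Collect (text, translatable) pairs, inheriting 'translate' from parents."""

    def __init__(self):
        super().__init__()
        self.stack = [True]   # content is translatable by default
        self.segments = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        value = (dict(attrs).get("translate") or "").lower()
        if value == "no":
            flag = False
        elif value == "yes":
            flag = True
        else:
            flag = self.stack[-1]   # no explicit value: inherit from parent
        self.stack.append(flag)

    def handle_endtag(self, tag):
        if tag not in VOID_ELEMENTS and len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.segments.append((text, self.stack[-1]))


parser = TranslatableText()
parser.feed('<p>Click <code translate="no">resume</code> to <b>continue</b>.</p>')
segments = parser.segments
```

A translation tool could pass only the segments flagged True to a machine translation service and copy the others through unchanged.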
ITS 2.0 provides metadata to foster the adoption of the multilingual Web.
A report summarizing the MultilingualWeb workshop in Rome is now available from the MultilingualWeb site. It contains a summary of each session with links to presentation slides and more detailed scribing done on site in Rome. Links to video for each session will be posted soon.
With approximately 150 attendees, the Rome Workshop focused on the theme “Making the Multilingual Web Work” and emphasized information about the best practices and standards that help content creators and localizers ensure that the World Wide Web lives up to its name, across boundaries of language and culture. Attendees heard from a variety of perspectives, with fruitful dialogue between various stakeholder groups involved in trying to expand the multilingual scope of the Web.
Taking place over two days (12 and 13 March 2013) at the headquarters of the UN’s Food and Agriculture Organization (FAO), the Workshop featured twenty-four conference-style presentations, seven poster presentations, and an “open space” discussion with six breakout sessions focusing on key topics that emerged during the Workshop. In addition, it showcased technology implementations of the forthcoming Internationalization Tag Set (ITS) 2.0 standard.
Unicode Bidirectional Algorithm basics is a repackaging of the initial part of “What you need to know about the bidi algorithm and inline markup” as a standalone article. It provides a gentle introduction to the behaviour of the Unicode Bidirectional Algorithm, and helps you understand why bidirectional text in Arabic, Hebrew, Thaana, Urdu, etc. behaves the way it does.
The MultilingualWeb-LT Working Group published a First Public Working Draft of Metadata for the Multilingual Web – Usage Scenarios and Implementations. This document introduces a variety of usage scenarios and applications for the Internationalization Tag Set (ITS) 2.0, ranging from simple machine translation or human translation quality checks to training for machine translation systems or automatic text analysis. Many of the underlying implementations will be showcased in the upcoming W3C MultilingualWeb Workshop, 12-13 March in Rome.