A report summarizing the MultilingualWeb workshop in Madrid is now available from the MultilingualWeb site. It contains a summary of each session, with links to presentation slides and to minutes taken during the workshop. The workshop, held together with the associated LIDER roadmapping workshop, was a huge success, with approximately 110 participants. It was hosted by Universidad Politécnica de Madrid and sponsored by the EU-funded LIDER project, Verisign, and Lionbridge.
A new workshop in the MultilingualWeb series is planned for 2015.
This document builds upon the Character Model for the World Wide Web 1.0: Fundamentals to provide authors of specifications, software developers, and content developers with a common reference on string matching on the World Wide Web and thereby increase interoperability. String matching is the process by which a specification or implementation defines whether two string values are the same or different from one another.
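To illustrate why "the same or different" needs a definition, the following minimal sketch compares strings three ways using only Python's standard library. The function names are hypothetical and chosen for this illustration. The sketch shows the general problem only; it is not the matching algorithm the specification recommends.

```python
import unicodedata


def code_point_equal(a: str, b: str) -> bool:
    """Compare two strings code point by code point, with no preprocessing."""
    return a == b


def normalized_equal(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two strings after applying the same Unicode normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


def caseless_equal(a: str, b: str) -> bool:
    """Compare two strings after normalization and full case folding.

    Illustrative only: a specification has to decide for itself whether
    case differences are significant for its identifiers.
    """
    def fold(s: str) -> str:
        decomposed = unicodedata.normalize("NFD", s)
        return unicodedata.normalize("NFD", decomposed.casefold())

    return fold(a) == fold(b)


# "é" as one precomposed code point versus "e" plus a combining acute accent.
precomposed = "\u00e9"
decomposed = "e\u0301"

print(code_point_equal(precomposed, decomposed))  # False
print(normalized_equal(precomposed, decomposed))  # True
print(caseless_equal("Straße", "STRASSE"))        # True
```

The two forms of "é" look identical to a reader but differ as code point sequences, so the answer to "are these the same string?" depends on which comparison is chosen.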
The main target audience of this specification is W3C specification developers. The specification, or parts of it, can be referenced from other W3C specifications, and it defines conformance criteria for W3C specifications as well as for other specifications.
This version of this document represents a significant change from its previous edition. Much of the content is changed and the recommendations are significantly altered. This fact is reflected in a change to the name of the document from “Character Model: Normalization” to “Character Model for the World Wide Web: String Matching and Searching”.
Version 7.0 of the Unicode Standard is now available, adding 2,834 new characters. This latest version adds the new currency symbols for the Russian ruble and Azerbaijani manat, approximately 250 emoji (pictographic symbols), many other symbols, and 23 new lesser-used and historic scripts, as well as character additions to many existing scripts. These additions extend support for written languages of North America, China, India, other Asian countries, and Africa. See the link above for full details.
Most of the new emoji characters derive from characters in long-standing and widespread use in Wingdings and Webdings fonts.
Major enhancements were made to the Indic script properties. New property values were added to enable a more algorithmic approach to rendering Indic scripts. These include properties for joining behavior, new classes for numbers, and a further division of the syllabic categories of viramas and rephas. With these enhancements, the default rendering for newly added Indic scripts can be significantly improved.
Unicode character properties were extended to the new characters. Existing characters received enhancements to their Script and Alphabetic properties and to their casing and line-breaking behavior. There were also nearly 3,000 new Cantonese pronunciation entries, as well as new or clarified stability policies for promoting interoperable implementations.
Two other important Unicode specifications are maintained in synchrony with the Unicode Standard, and have updates for Version 7.0. These will be released at the same time:
A Last Call Working Draft of Encoding has been published.
While encodings have been defined to some extent, implementations have not always implemented them in the same way, have not always used the same labels, and often differ in dealing with undefined and former proprietary areas of encodings. This specification attempts to fill those gaps so that new implementations do not have to reverse engineer encoding implementations of the market leaders and existing implementations can converge.
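As an illustration of the label problem, the sketch below resolves a user-supplied encoding label to a single encoding name. The table is a deliberately tiny, hand-picked subset written for this example and is not the specification's full label list. The function names are hypothetical.

```python
from typing import Optional

# Tiny illustrative subset of a label table (assumption: the real table
# in the specification is much larger). Keys are lowercase labels,
# values are the encoding names they resolve to.
LABELS = {
    "utf-8": "UTF-8",
    "utf8": "UTF-8",
    "unicode-1-1-utf-8": "UTF-8",
    "windows-1252": "windows-1252",
    "latin1": "windows-1252",
    "iso-8859-1": "windows-1252",
    "ascii": "windows-1252",
    "us-ascii": "windows-1252",
}

# Tab, line feed, form feed, carriage return, and space.
ASCII_WHITESPACE = "\t\n\f\r "


def ascii_lower(s: str) -> str:
    """Lowercase only A-Z, leaving every other character untouched."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


def get_encoding(label: str) -> Optional[str]:
    """Resolve a label to an encoding name, or None if it is unknown."""
    key = ascii_lower(label.strip(ASCII_WHITESPACE))
    return LABELS.get(key)


print(get_encoding(" Latin1\n"))      # windows-1252
print(get_encoding("UTF8"))           # UTF-8
print(get_encoding("no-such-label"))  # None
```

Lowercasing is restricted to ASCII letters on purpose: a Unicode-aware lowercase would also map characters such as U+212A KELVIN SIGN to "k", which could make a non-ASCII label match an entry it should not.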
The body of this spec is an exact copy of the WHATWG version as of the date of its publication, intended to provide a stable reference for other specifications. We are hoping for people to review the specification and send comments about any technical areas that need attention (see the Status section for details).
Please send comments by 1 July 2014.
The Unicode Consortium is pleased to announce the release of version 2014-05-16 of the Unicode Ideographic Variation Database (IVD). This release registers the new Moji_Joho collection, along with the first 10,710 sequences in that collection, 9,685 of which are shared by the registered Hanyo-Denshi collection. Details can be found at http://www.unicode.org/ivd/.
The slides from the MultilingualWeb workshop (including several posters) and the LIDER roadmapping workshop are now available for download. Additional material (videos of the presentations, a workshop report and more) will follow in the coming weeks. Stay tuned.
See the program. The keynote speaker will be Alolita Sharma, Director of Language Engineering from the Wikimedia Foundation. She will be followed by a strong lineup in sessions entitled Developers, Creators, Localizers, Machines, and Users, including speakers from Microsoft, Wikimedia Foundation, the UN FAO, W3C, Yandex, SDL, Lionbridge, Asia Pacific TLD, Verisign, DFKI, and many more. On the afternoon of the second day we will hold Open Space breakout discussions. Abstracts and details about an additional poster session will be provided shortly.
The program will also feature an LD4LT event on May 8-9, focusing on text analytics and the usefulness of Wikipedia and DBpedia for multilingual text and content analytics, and on language resources and aspects of converting selected types of language resources into RDF.
Participation in both events is free. See the Call for Participation for details about how to register for the MultilingualWeb workshop. The LD4LT event requires a separate registration and you have the opportunity to submit position statements about language resources and RDF.
If you haven’t registered yet, note that space is limited, so please be sure to register soon to ensure that you get a place.
The MultilingualWeb workshops, funded by the European Commission and coordinated by the W3C, look at best practices and standards related to all aspects of creating, localizing and deploying the multilingual Web. The workshops are successful because they attract a wide range of participants, from fields such as localization, language technology, browser development, content authoring and tool development, to create a holistic view of the interoperability needs of the multilingual Web.
We look forward to seeing you in Madrid!
We would like to remind you that the deadline for speaker proposals for the 7th MultilingualWeb Workshop (May 7–8, 2014, Madrid, Spain) is on Friday, March 14, at 23:59 UTC.
Featuring a keynote by Alolita Sharma (Director of Engineering, Wikipedia) and breakout sessions on linked open data and other critical topics, this Workshop will focus on the advances and challenges faced in making the Web truly multilingual. It provides an outstanding and influential forum for thought leaders to share their ideas and gain critical feedback.
While the organizers have already received many excellent submissions, there is still time to make a proposal, and we encourage interested parties to do so by the deadline. With roughly 200 attendees anticipated for the Workshop from a wide variety of profiles, we are certain to have a large and diverse audience that can provide constructive and useful feedback, with stimulating discussion about all of the presentations.
For more information and to register, please visit the Madrid Workshop Call for Participation.
Alolita Sharma (Wikipedia) to deliver keynote at 7th Multilingual Web Workshop (May 7–8, 2014, Madrid)
We are pleased to announce that Alolita Sharma, Director of Engineering for Internationalization and Localization at Wikipedia, will deliver the keynote at the 7th Multilingual Web Workshop, “New Horizons for the Multilingual Web,” in Madrid, Spain (7–8 May 2014).
With over 30 million articles in 286 languages as of January 1, 2014, Wikipedia has now become one of the largest providers of multilingual content in the world. Because of its user-generated and constantly changing content, many traditional processes for managing multilingual content on the web either do not work or do not scale well for Wikipedia. Alolita Sharma’s keynote will highlight Wikipedia’s diversity in multilingual user-generated content and the language technologies that Wikipedia has had to develop to support its unprecedented growth of content. She will also discuss the many challenges Wikipedia faces in providing language support for the mobile web.
The Multilingual Web Workshop series brings together participants interested in the best practices, new technologies, and standards needed to help content creators, localizers, language tools developers, and others address the new opportunities and challenges of the multilingual Web. It will provide for networking across communities and building connections.
Registration for the Workshop is free, and early registration is recommended since space at the Workshop is limited.
There is still opportunity for individuals to submit proposals to speak at the workshop. Ideal proposals will highlight emerging challenges or novel solutions for reaching out to a global, multilingual audience. The deadline for speaker proposals is March 14, but early submission is strongly encouraged. See the Call for Participation for more details.
This workshop is made possible by the generous support of the LIDER project, which will organize a roadmapping workshop on linked data and content analytics as one of the tracks at the Multilingual Web Workshop.
The workshop is a free community event – there is no admission fee for participants, but registration is required.
You are encouraged to provide a title for a position statement in your registration form. This is a simple, short statement that summarizes your ideas / technologies / use cases related to Linked Data and Language Technology.
As input to the discussion and the work of the LD4LT group, you may also want to fill in the first LIDER survey.