W3C

XML Technology

XML technologies include XML, XML Namespaces, XML Schema, XSLT, Efficient XML Interchange (EXI), and other related standards.

XML Essentials

XML is supported by a set of essential technologies such as the infoset and namespaces. They address issues that arise when using XML in specific application contexts.

Schema

Formal descriptions of vocabularies create flexibility in authoring environments and quality control chains. W3C’s XML Schema, SML, and data binding technologies provide the tools for quality control of XML data.

Security

Manipulating data with XML sometimes requires integrity, authentication and privacy. XML Signature, XML Encryption, and XKMS can help create a secure environment for XML.

Transformation

Very frequently one wants to transform XML content into other formats (including other XML formats). XSLT and XPath are very powerful tools for creating different representations of XML content.

Query

XQuery (supported by XPath) is a query language for XML to extract data, similar to the role of SQL for databases, or SPARQL for the Semantic Web.
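As a rough illustration of querying XML for data, the sketch below uses the limited XPath subset in Python's standard library. It is not XQuery, which the standard library does not implement. The catalog document, its element names, and the helper `titles_in_language` are hypothetical and exist only for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, invented for this illustration.
CATALOG = """<catalog>
  <book lang="en"><title>XML Basics</title></book>
  <book lang="de"><title>XSLT Kochbuch</title></book>
  <book lang="en"><title>XPath in Practice</title></book>
</catalog>"""


def titles_in_language(xml_text, lang):
    """Return the titles of all books whose lang attribute equals `lang`."""
    root = ET.fromstring(xml_text)
    # ElementTree supports a small XPath subset, including
    # attribute-equality predicates such as [@lang='en'].
    # `lang` is interpolated unescaped, so pass trusted values only.
    path = f"./book[@lang='{lang}']/title"
    return [title.text for title in root.findall(path)]


english_titles = titles_in_language(CATALOG, "en")
print(english_titles)
```

The path expression plays a role loosely comparable to a `WHERE` clause in SQL: it selects only the nodes that satisfy the predicate.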

Components

The XML ecosystem uses additional tools to create a richer environment for using and manipulating XML documents. These components include style sheets, XLink, xml:id, XInclude, XPointer, XForms, XML Fragments, and XML Events.

Processing

A processing model defines what operations should be performed in what order on an XML document.

Internationalization

W3C has worked with the community on the internationalization of XML, for instance for specifying the language of XML content.
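One common way to specify the language of XML content is the `xml:lang` attribute. The sketch below reads it with Python's standard library. The sample document and the helper `paragraph_languages` are hypothetical, and the inheritance handling is deliberately simplified to a single level (child falls back to the root element's value).

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample document for this illustration.
DOC = '<doc xml:lang="en"><p>Hello</p><p xml:lang="fr">Bonjour</p></doc>'


def paragraph_languages(xml_text):
    """Pair each <p> text with its language, falling back to the root's xml:lang."""
    root = ET.fromstring(xml_text)
    default_lang = root.get(XML_LANG)
    return [(p.text, p.get(XML_LANG, default_lang)) for p in root.findall("p")]


print(paragraph_languages(DOC))
```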

Publishing

XML grew out of the technical publication community. Use XSL-FO to publish even large or complex multilingual XML documents to HTML, PDF or other formats; include SVG diagrams and MathML formulas in the output.

News

This document builds upon the Character Model for the World Wide Web 1.0: Fundamentals to provide authors of specifications, software developers, and content developers with a common reference on string matching on the World Wide Web and thereby increase interoperability. String matching is the process by which a specification or implementation defines whether two string values are the same or different from one another.

The main target audience of this specification is W3C specification developers. This specification and parts of it can be referenced from other W3C specifications and it defines conformance criteria for W3C specifications, as well as other specifications.

This version of this document represents a significant change from its previous edition. Much of the content is changed and the recommendations are significantly altered. This fact is reflected in a change to the name of the document from “Character Model: Normalization” to “Character Model for the World Wide Web: String Matching and Searching”.
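To illustrate why string matching benefits from a common reference, the sketch below compares two strings that render identically but differ at the code point level. Normalizing to NFC before comparing is only one possible matching definition, chosen here as an assumption for illustration; it is not presented as the recommendation of the specification. The helper name `same_after_nfc` is hypothetical.

```python
import unicodedata


def same_after_nfc(a, b):
    """Compare two strings after normalizing both to Unicode NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE as one code point
decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT

print(precomposed == decomposed)                # code point comparison
print(same_after_nfc(precomposed, decomposed))  # normalized comparison
```

Whether two such strings count as "the same" depends entirely on which comparison a specification prescribes, which is the interoperability problem described above.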

The 4th LIDER roadmapping workshop and LD4LT event will take place on 2 September in Leipzig, Germany. It will be co-located with the SEMANTiCS conference.

The goal of the workshop is to gather input from experts and stakeholders in the area of content analytics, and to identify areas and tasks in content analytics where linked data and semantic technologies can contribute. The workshop will be organised as part of MLODE 2014 and will be preceded by a hackathon on 1 September.

The event is supported by the LIDER EU project, the MultilingualWeb community, the NLP2RDF project, and the DBpedia Project.

Version 7.0 of the Unicode Standard is now available, adding 2,834 new characters. This latest version adds the new currency symbols for the Russian ruble and Azerbaijani manat, approximately 250 emoji (pictographic symbols), many other symbols, and 23 new lesser-used and historic scripts, as well as character additions to many existing scripts. These additions extend support for written languages of North America, China, India, other Asian countries, and Africa. See the link above for full details.

Most of the new emoji characters derive from characters in long-standing and widespread use in Wingdings and Webdings fonts.

Major enhancements were made to the Indic script properties. New property values were added to enable a more algorithmic approach to rendering Indic scripts. These include properties for joining behavior, new classes for numbers, and a further division of the syllabic categories of viramas and rephas. With these enhancements, the default rendering for newly added Indic scripts can be significantly improved.

Unicode character properties were extended to the new characters. The old characters have enhancements to Script and Alphabetic properties, and casing and line-breaking behavior. There were also nearly 3,000 new Cantonese pronunciation entries, as well as new or clarified stability policies for promoting interoperable implementations.

Two other important Unicode specifications are maintained in synchrony with the Unicode Standard, and have updates for Version 7.0. These will be released at the same time:

UTS #10, Unicode Collation Algorithm: the standard for sorting Unicode text
UTS #46, Unicode IDNA Compatibility Processing: for processing of non-ASCII URLs (IDNs)

The LIDER project has published a report on the first Linked Data for Language Technology event, which was held on 21 March alongside the European Data Forum in Athens. Read the report.

Industry stakeholders from many areas (localization, publishing, language technology applications, etc.) and key researchers from linked data and language technology discussed promises and challenges around linguistic linked data. The report summarizes all presentations and includes an initial list of use cases and requirements for linguistic linked data. This and the overall outcome of the event will feed into the work of the LD4LT group (see especially the latest LD4LT draft of use cases), and the field of multilingual linked data in general.

The LD4LT group is part of the MultilingualWeb community – learn more about related projects.

A Last Call Working Draft of Encoding has been published.

While encodings have been defined to some extent, implementations have not always implemented them in the same way, have not always used the same labels, and often differ in dealing with undefined and former proprietary areas of encodings. This specification attempts to fill those gaps so that new implementations do not have to reverse engineer encoding implementations of the market leaders and existing implementations can converge.
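The kind of divergence described above can be sketched with Python's own codec registry, used here purely as one example implementation. Python resolves several labels to the same Latin-1 codec, whereas the Encoding specification is understood to map the label `iso-8859-1` to windows-1252, so the same byte can decode differently depending on the implementation. The helper `canonical_name` is hypothetical.

```python
import codecs


def canonical_name(label):
    """Return the codec name Python resolves an encoding label to."""
    return codecs.lookup(label).name


# Different labels, same codec in Python's registry.
print(canonical_name("latin1") == canonical_name("ISO-8859-1"))

RAW = b"\x80"
as_latin1 = RAW.decode("iso-8859-1")          # a C1 control character, U+0080
as_windows_1252 = RAW.decode("windows-1252")  # the euro sign, U+20AC

print(as_latin1 == as_windows_1252)
```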

The body of this spec is an exact copy of the WHATWG version as of the date of its publication, intended to provide a stable reference for other specifications. We are hoping for people to review the specification and send comments about any technical areas that need attention (see the Status section for details).

Please send comments by 1 July 2014.

On 4 June, as part of the Localization World conference in Dublin, the FEISGILTT event will again provide an opportunity to discuss the latest developments around localization and multilingual Web technologies. The event is sponsored by the LIDER project.

Highlights include updates about ITS 2.0 and XLIFF 2.0, and a session about usage scenarios for linguistic linked data in localization. Speakers include Kevin O’Donnell (Microsoft), Bryan Schnabel (Tektronix), Yves Savourel (Enlaso) and many more.

Register now to meet the key players around standards that will influence today’s and future business.

The slides from the MultilingualWeb workshop (including several posters) and the LIDER roadmapping workshop are now available for download. Additional material (videos of the presentations, a workshop report and more) will follow in the coming weeks – stay tuned.

The MultilingualWeb workshop on 7-8 May will be streamed live! Follow the event online if you cannot make it to Madrid. For details about speakers and presentations see the workshop program. The workshop is supported by the LIDER project and sponsored by Verisign and Lionbridge.

See the program. The keynote speaker will be Alolita Sharma, Director of Language Engineering at the Wikimedia Foundation. She is followed by a strong lineup in sessions entitled Developers, Creators, Localizers, Machines, and Users, including speakers from Microsoft, Wikimedia Foundation, the UN FAO, W3C, Yandex, SDL, Lionbridge, Asia Pacific TLD, Verisign, DFKI, and many more. On the afternoon of the second day we will hold Open Space breakout discussions. Abstracts and details about an additional poster session will be provided shortly.

The program will also feature an LD4LT event on 8-9 May, focusing on text analytics and the usefulness of Wikipedia and DBpedia for multilingual text and content analytics, and on language resources and aspects of converting selected types of language resources into RDF.

Participation in both events is free. See the Call for Participation for details about how to register for the MultilingualWeb workshop. The LD4LT event requires a separate registration, and you have the opportunity to submit position statements about language resources and RDF.

If you haven’t registered yet, note that space is limited, so please be sure to register soon to ensure that you get a place.

The MultilingualWeb workshops, funded by the European Commission and coordinated by the W3C, look at best practices and standards related to all aspects of creating, localizing and deploying the multilingual Web. The workshops are successful because they attract a wide range of participants, from fields such as localization, language technology, browser development, content authoring and tool development, etc., to create a holistic view of the interoperability needs of the multilingual Web.

We look forward to seeing you in Madrid!

Register now for the recently announced workshop on Linked Data, Language Technologies and Multilingual Content Analytics (8-9 May, Madrid). A preliminary agenda has been created and the registration form is available.

If you are interested in contributing a position statement, please indicate this in the dedicated field in the registration form. The workshop organizers will come back to you with questions to answer in the position statement. We will then select which statements are appropriate for presentations on 9 May, and inform you by 28 April.

We are looking forward to seeing you in Madrid, both for this event and the MultilingualWeb workshop!