Files and Scripts Used in Managing W3C Translations

This document describes the files used to store all W3C-related translations in one place, as well as the scripts that generate different "views" of the same data. Beyond its own practical importance, the management of translations at W3C is also a modest showcase for various W3C technologies. Data are stored in several RDF files, some maintained specifically for the translations, others of more general interest. The fact that RDF-based information originating from different sources can be combined easily in one project shows the value of the Semantic Web approach. In addition, all the generated files are based on Unicode and follow all the guidelines of the W3C Internationalization Activity.

(If you are a W3C Team member, and you want to know how to update the information on the W3C site, please consult the additional information file.)

The RDF Files

The primary RDF file containing the translations is RDFData/Trans2005.rdf (to be followed by Trans2006.rdf, Trans2007.rdf, etc). The file contains two types of information: some information on the languages themselves, followed by the translation data, grouped by language. To make this file easier to edit, the translators' data are factored out into a separate file, RDFData/translators.rdf.

For each translation, there is a reference to the original document (using the property trans:translationFrom). The resource referred to by this property is the dated URI of the document; most importantly, it is the same resource as the one used in another RDF file on technical reports at W3C: tr.rdf.

The caveat with tr.rdf, however, is that it does not include all documents that are frequently translated (eg, W3C in 7 points, WAI Quick Tips), nor does it include information like domain, or shorter titles. To add this missing information, a third RDF file is available: recs.rdf. This RDF file contains some additional data for all Recommendations, plus some entries for documents like the ones cited above.
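
How the three files complement each other can be sketched without RDFLib by treating each file as a set of (subject, predicate, object) triples. This is only an illustration under stated assumptions: apart from trans:translationFrom, every property name, URI, and value below is a hypothetical placeholder, and plain Python sets stand in for real RDF graphs.

```python
# Each "file" is modeled as a set of (subject, predicate, object) triples.
# All URIs and values are hypothetical placeholders, not real W3C data.
DOC = "http://example.org/TR/2000/REC-sample-20000101"  # made-up dated URI
TRANSLATION = "http://example.org/translations/sample-fr"

translations = {
    (TRANSLATION, "trans:translationFrom", DOC),  # property named in the text
    (TRANSLATION, "ex:language", "fr"),           # hypothetical property
}
tr = {
    (DOC, "ex:title", "Sample Recommendation"),   # hypothetical property
}
recs = {
    (DOC, "ex:shortName", "sample"),              # hypothetical property
    (DOC, "ex:domain", "Sample Domain"),          # hypothetical property
}

# Merging RDF graphs amounts to taking the union of their triples.
merged = translations | tr | recs


def describe_original(graph, translation):
    """Collect what the merged graph says about a translation's original."""
    originals = [o for s, p, o in graph
                 if s == translation and p == "trans:translationFrom"]
    return {p: o for s, p, o in graph if s in originals}


info = describe_original(merged, TRANSLATION)
print(info["ex:title"], "/", info["ex:shortName"], "/", info["ex:domain"])
```

Because the translation entry and the other two files name the same resource, the union is enough to connect them; no explicit join key has to be maintained.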

The rest of the translation data, like the language descriptions, are fairly straightforward and do not really need further explanation.

All the RDF files at a glance

File: Role/Description
Translation files per year (eg, Trans2005.rdf): Main source of translation data. Note that translations up to the end of 2004 are collected in TransTo2004.rdf; the switch to one RDF file per year was made only in 2005.
translators.rdf: Contact list of the most important translators. The contact data of each person who appears at least twice as a translator are stored here, to save space and to reduce possible errors in updates. Note that for persons whose names are written in non-Latin scripts (Arabic, Chinese, etc), a Latinized version is also stored, if it is known, to make the display more accessible.
langInfo.rdf: Additional information on languages (native name, name of the language in English, etc).
tr.rdf: Central RDF file on W3C Technical Reports.
recs.rdf: Additional information on Recommendations (eg, short name, W3C domain).
extras.rdf: Sometimes documents get translated that do not appear in tr.rdf: notes, working drafts, web pages, quick tips, etc. The relevant data are stored in this file.
docGroups.rdf: Definition of groups of documents that can be considered as one when querying translations (eg, DOM Level 2).
transSchema.rdf: The RDF Schema file for trans.rdf.
langSchema.rdf: The RDF Schema file for langInfo.rdf.

All these files are publicly available.

The Generated (XHTML) Files

The XHTML files are generated by Python scripts that rely on the RDFLib module. The following files are generated:

File: Role
OverviewLang.html: Overview of all translations, ordered by language. Each language can be addressed directly using the ISO language code as a fragment identifier.
OverviewTech.html: Overview of all translations, ordered by domain and document. Each document can be addressed directly using, as a fragment identifier, the document ID used in the W3C TR pages (ie, its "short name", essentially the last part of its undated URI).
Overview.html: The "home page" of the translations. It contains menus for all the various entries, which refer to CGI script calls.


CGI Entries

The set of Python scripts also has a set of (public) CGI entry points. Both the language and the technology view can be queried, and each can be requested in various "views". Here are the six possibilities (all examples use French translations and CSS1 as a technology):

URI Description
Query the translations of CSS1, return a full XHTML page. The identifier of the technology is the document ID used in the W3C TR pages as a fragment identifier (you can also look at the menus on the translations' home page for the exact codes). See also note below.
Query the translations of CSS1, return a partial HTML only: a dl list, enclosed in a div. Callers can use this output by incorporating it into their own pages.
Query the translations of CSS1, return an XML encoded RDF with the relevant information.
Query all French translations, return a full XHTML page. See also note below.
/2003/03/Translations/byLanguage?language=fr&output=PartialHTML Query all French translations, return a partial HTML only: a dl list, enclosed in a div.
/2003/03/Translations/byLanguage?language=fr&output=RDF Query all French translations, return an XML encoded RDF with the relevant information.
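
As a minimal sketch, a caller could assemble the byLanguage query URIs shown above as follows. The host name is an assumption (the entries are given as server-relative paths), and the helper name by_language_uri is made up for illustration; only the path, the parameter names, and the output values PartialHTML and RDF come from the table.

```python
from urllib.parse import urlencode

# Assumption: the scripts are served from the main W3C host.
BASE = "http://www.w3.org/2003/03/Translations/byLanguage"


def by_language_uri(language, output):
    """Build a byLanguage query URI for a language code and an output view."""
    return BASE + "?" + urlencode({"language": language, "output": output})


print(by_language_uri("fr", "PartialHTML"))
print(by_language_uri("fr", "RDF"))
```

Using urlencode rather than string concatenation keeps the query string correctly escaped if a parameter value ever contains reserved characters.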

W3C technologies are sometimes published as a collection of Recommendations, rather than as one document (for example, XML Schemas is published as Parts 0, 1, and 2). A separate RDF file (docGroups.rdf) describes these groups, with resources identified by rdf:ID. In the calls above, the technology identifier can also be one of these rdf:ID values, in which case the translations are listed for all constituents.
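
The group lookup described above might work along the following lines. This is a sketch only: the group name, document IDs, and records are invented, and a plain dictionary stands in for the RDF description of the groups.

```python
# Hypothetical group table: an rdf:ID-style group name mapped to its parts.
GROUPS = {
    "sample-group": ["sample-part0", "sample-part1", "sample-part2"],
}

# Hypothetical translation records: (document ID, language code).
TRANSLATIONS = [
    ("sample-part0", "fr"),
    ("sample-part1", "hu"),
    ("other-doc", "fr"),
]


def translations_for(technology):
    """Return records for one document, or for every constituent of a group."""
    members = GROUPS.get(technology, [technology])
    return [record for record in TRANSLATIONS if record[0] in members]


print(translations_for("sample-group"))
print(translations_for("other-doc"))
```

A plain document ID falls through to a one-element list, so callers do not need to know whether an identifier names a group or a single document.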

Some languages have local "versions": fr-ca for Canadian French, pt-br for Brazilian Portuguese, etc. These codes can be queried directly if one is interested in, say, Brazilian Portuguese translations only. If the "main" language code is used, all local versions are displayed as well (if there are translations marked as such).
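
The matching rule just described can be expressed as a simple prefix test. The function below is a hypothetical illustration of that rule, not the code used by the scripts.

```python
def language_matches(requested, tag):
    """True if `tag` is the requested code or a local version of it."""
    requested, tag = requested.lower(), tag.lower()
    return tag == requested or tag.startswith(requested + "-")


# A main code also selects its local versions...
print(language_matches("fr", "fr-ca"))
# ...but a local code selects only itself.
print(language_matches("pt-br", "pt"))
```

Requiring the hyphen after the prefix keeps unrelated codes that merely share leading letters from matching.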

The scripts return the information in Unicode, more specifically in UTF-8. That is, if you plan to include the output in your own XHTML pages, for example, your server should declare the UTF-8 encoding in the HTTP response header.
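
A minimal sketch of what this means for a caller: the fragment is decoded as UTF-8, embedded in a page, and the page is sent with a matching charset declaration. The function name, page skeleton, and sample fragment are all hypothetical; a real page would be a complete XHTML document.

```python
def wrap_fragment(fragment_bytes, title):
    """Decode a UTF-8 fragment and embed it in a page served as UTF-8."""
    fragment = fragment_bytes.decode("utf-8")
    body = (
        f"<html><head><title>{title}</title></head>"
        f"<body>{fragment}</body></html>"
    )
    # The charset in the header must agree with the encoding of the payload.
    headers = {"Content-Type": "text/html; charset=utf-8"}
    return headers, body.encode("utf-8")


# Stand-in for a fragment returned by the scripts (invented content).
sample = "<div><dl><dt>Français</dt></dl></div>".encode("utf-8")
headers, payload = wrap_fragment(sample, "Translations")
print(headers["Content-Type"])
```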

“Comma” tool

Another way of getting to the information on translations is to use the “comma” tool facility of the W3C server. Eg, to find the translations of a document, a URI formed by appending ,translations to the URI of that document can also be used. It returns the same information as the CGI entry in full HTML form. This can also be combined with language information: appending ,translations-hu instead returns the list of Hungarian translations of that document (if any).

The only caveat with this tool is that it has to be used with “real” documents, ie, the tool cannot be used with the groups of documents described in the previous section. Nor can the comma tool return anything other than fully formatted HTML pages.
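
The construction of such URIs can be sketched as follows. The helper name and the example document URI are hypothetical; only the ,translations suffix and the optional language code after a hyphen come from the description above.

```python
def comma_translations_uri(document_uri, language=None):
    """Append the comma-tool suffix, optionally restricted to one language."""
    suffix = ",translations"
    if language:
        suffix += "-" + language
    return document_uri + suffix


# Hypothetical document URI, used only for illustration.
print(comma_translations_uri("http://example.org/TR/sample"))
print(comma_translations_uri("http://example.org/TR/sample", "hu"))
```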

Ivan Herman, Head of Offices
Last revised: $Date: 2006/08/24 15:22:18 $