This document describes the files used to store all W3C-related translations in one place, as well as the scripts that generate different "views" of the same data. Beyond its own importance and interest, managing translations at W3C may be considered a modest showcase for the use of various W3C technologies. Data are stored in several RDF files: some are maintained specifically for the translations, while others are of more general interest and are generated by, and aimed at, other purposes too. The queries are made using SPARQL, a query language for RDF data. The fact that RDF-based information originating from different sources can be combined easily in one project shows the value of the Semantic Web approach. Also, all the generated files are based on Unicode and follow the guidelines of the W3C Internationalization Activity.
(If you are a W3C Team member, and you want to know how to update the information on the W3C site, please consult the additional information file.)
The primary RDF file containing the translations is: /2003/03/Translations/RDFData/Trans2006.rdf
(it was Trans2005.rdf, and will be Trans2007.rdf,
Trans2008.rdf, etc). To facilitate editing this file, the
translators' data is factored out into a separate /2003/03/Translations/RDFData/translators.rdf
file; similarly, some information on languages is in /2003/03/Translations/RDFData/langInfo.rdf.
For each translation, there is a reference to the original document (using
the property trans:translationFrom). The resource referred to by
this property is, usually, the dated URI of the document; it is the same
resource as used in another RDF file on technical reports at W3C: tr.rdf.
The caveat with tr.rdf is that it does not include information
like shorter titles, quite necessary for, eg, pull-down menus. To add this
missing information, a third RDF file is available: /2003/03/recs.rdf. This RDF file contains
some additional data for all Recommendations, plus some entries for
documents like the ones cited above.
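The way data from several files can be combined through a shared resource can be sketched in plain Python. This is only an illustration, not the scripts used at W3C: triples are modelled as tuples instead of using an RDF library, all URIs are placeholders under example.org, and the properties `ex:title` and `ex:shortTitle` are hypothetical names (only trans:translationFrom comes from the text).

```python
# Triples are (subject, predicate, object) tuples; a real system would use an RDF library.
TRANS_FROM = "trans:translationFrom"   # property named in the text
TITLE = "ex:title"                     # hypothetical, stands in for data from tr.rdf
SHORT_TITLE = "ex:shortTitle"          # hypothetical, stands in for data from recs.rdf

# Placeholder URIs, invented for this example.
DOC = "http://example.org/TR/2004/REC-sample-20040210/"
TRANSLATION = "http://example.org/translations/sample-fr.html"

translations = [(TRANSLATION, TRANS_FROM, DOC)]
tech_reports = [(DOC, TITLE, "Sample Specification Version 1.0")]
recs = [(DOC, SHORT_TITLE, "Sample 1.0")]


def merge(*sources):
    """Combine triples from several sources into one graph (a set of triples)."""
    graph = set()
    for source in sources:
        graph.update(source)
    return graph


def objects(graph, subject, predicate):
    """Return the objects of all triples matching subject and predicate, sorted."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


def short_titles_of_originals(graph, translation):
    """Follow trans:translationFrom, then look up the short title of each original."""
    titles = []
    for original in objects(graph, translation, TRANS_FROM):
        titles.extend(objects(graph, original, SHORT_TITLE))
    return titles


graph = merge(translations, tech_reports, recs)
```

Because the three sources use the same URI for the original document, merging them is enough to answer a question that none of them can answer alone, for example `short_titles_of_originals(graph, TRANSLATION)`.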
All documents get an internal 'id' that is used as a code in, eg, the
pull-down menus. This id is identical to what is used on the /TR page. However, in some (rare) cases, this id does
not identify the document unambiguously. This does not happen to documents on the /TR
pages but does happen for tutorials, FAQs, web pages. For these cases an
explicit id may be necessary, and is stored in the RDF file.
| File | Role/Description |
|---|---|
| Translation files per year, eg, Trans2005.rdf | Main source of translation data. Note that translations up to the end of 2004 are collected in TransTo2004.rdf; it is only starting with 2005 that this was changed to a per-year RDF file. |
| translators.rdf | Contact list of the most important translators. Each translator has his/her contact stored here to reduce space and possible errors in updates. Note that for persons with non-Latin scripts in their names (Arabic, Chinese, etc), a latinized version is also stored, if the data is known, to make the display more accessible. This RDF file also includes (when available) the email addresses of the translators, although that address is never displayed on the generated HTML output. |
| langInfo.rdf | Additional information on languages (native name, name of the language in English, etc). The languages are identified by their ISO codes (usually the two-letter codes). |
| tr.rdf | RDF file on W3C Technical Reports (generated automatically whenever a new entry is made to the technical reports page of W3C). |
| recs.rdf | Additional information on Recommendations (eg, short name). |
| extras.rdf | Sometimes documents get translated that do not appear in tr.rdf: notes, tutorials, web pages, quick tips, etc. The relevant data are stored in this file; this includes the short name and the categorization for the advanced search (eg, I18N document, tutorial, etc). In a few cases an explicit id value is also necessary; this is stored with the document’s resource, too. |
| docGroups.rdf | Definition of groups of documents that can be considered together when querying translations (eg, DOM Level 2) and when displayed in the pull-down menus. |
| extraControls.rdf | Some extra properties (eg, groups) that are relevant for the new system only and not for the older translation management system. If the old system is declared obsolete, this file may disappear and may be merged into docGroups.rdf. |
| transSchema.rdf | The RDF Schema file for trans.rdf |
| langSchema.rdf | The RDF Schema file for langInfo.rdf |
All these files are public.
The management can be roughly divided into two parts: off-line (ie, when a new translation is added to the data) and on-line (ie, when a query is made to the data). Both parts rely on the same set of tools and on the same principle: the RDF data is parsed, a SPARQL query is run against it, and the result of the query is turned into displayable output.
An interesting technical observation is that most of the time is taken up by the first and the last step, ie, parsing the RDF data and displaying the information properly on the screen. The SPARQL query itself is, comparatively, very quick.
When the RDF data is updated, a script is run to generate the Overview page, the advanced query page, the news archive, the RSS feed, etc. The query page is an XHTML form; its target is a CGI script that retrieves data on-line using the same principles and tools. The entries of the pull-down menus (set of languages, available documents in our TR pages, etc) reflect the current state of the RDF data. The menus generated on the Overview page (referring to, say, translations for one language) are simply shortcuts for more complicated CGI calls.
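As an illustration of how a pull-down menu can reflect the current state of the data, the following minimal sketch builds an XHTML `select` element from a mapping of language codes to display names. The function name, the parameter name `lang`, and the sample data are assumptions made for this example; the actual generation script is not shown in this document.

```python
from html import escape


def language_menu(languages, name="lang"):
    """Build an XHTML <select> from a mapping of language code -> display name.

    The mapping stands in for what would be read from the language data;
    entries are sorted by code so that the output is reproducible.
    """
    lines = [f'<select name="{escape(name)}">']
    for code in sorted(languages):
        label = escape(languages[code])
        lines.append(f'  <option value="{escape(code)}">{label}</option>')
    lines.append("</select>")
    return "\n".join(lines)


# Sample data, invented for the example.
menu = language_menu({"fr": "français", "de": "Deutsch"})
```

Escaping every value keeps the generated markup well-formed even if a name contains characters such as `&` or `<`.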
Several additional measures have been taken to speed up operations.
Links subdirectory) that are generated in this phase
(by internally issuing the appropriate queries and storing the results in
the files). Ie, the full

The on-line query is based on a CGI script (mapped from a URI in W3C’s date space). Each query item in the call's URI translates into a new graph pattern in a SPARQL query. This query is used to retrieve the relevant translation data. The “rest” of the processing is to turn this data into readable XHTML.
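The mapping from query items to graph patterns can be sketched as follows. This is a hedged illustration only: the parameter names `lang` and `doc`, the property `ex:language`, and both namespace URIs are assumptions made for the example (only trans:translationFrom is named in this document), and the real CGI script may build its query differently.

```python
from urllib.parse import parse_qs

# Placeholder namespaces; the real namespace URIs are not given in this document.
PREFIXES = (
    "PREFIX trans: <http://example.org/trans#>\n"
    "PREFIX ex: <http://example.org/ex#>\n"
)

# Hypothetical mapping from CGI parameter names to graph-pattern templates.
PATTERNS = {
    "lang": '?translation ex:language "{value}" .',
    "doc": "?translation trans:translationFrom <{value}> .",
}


def check_value(value):
    """Reject values that could break out of a SPARQL literal or URI."""
    if any(c in value for c in '<>"\\{}') or any(c.isspace() for c in value):
        raise ValueError(f"unsafe query value: {value!r}")
    return value


def build_query(query_string):
    """Turn each known query item into one graph pattern of a SELECT query."""
    params = parse_qs(query_string)
    patterns = []
    for name in sorted(params):
        template = PATTERNS.get(name)
        if template is None:
            continue  # unknown query items are ignored in this sketch
        for value in params[name]:
            patterns.append("  " + template.format(value=check_value(value)))
    body = "\n".join(patterns)
    return f"{PREFIXES}SELECT ?translation WHERE {{\n{body}\n}}"
```

For example, `build_query("lang=fr&doc=http://example.org/TR/sample/")` yields a query with two graph patterns, one per query item. Validating the values before inserting them matters because they come straight from the request URI.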
Note that the return format of the CGI call can be Full HTML, partial HTML, or pure RDF (the query page includes a radio button to choose among those). These mean:
Partial HTML is an XHTML fragment (a div element) that can be included in another XHTML page
(e.g., via PHP include or something similar). The scripts return the
information in Unicode, more specifically in UTF-8.
This is important to know if you plan to use that fragment in your own
page.

To speed up processing, a caching mechanism is used. The results of queries are stored in internal, hidden XHTML files. At query time the dates of those files are compared with the date of the binary pickle data (see above) and the query is issued only when really necessary.
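The date comparison behind such a cache can be sketched as below. This is a minimal illustration under stated assumptions, not the actual implementation: the function names are hypothetical, file modification times stand in for "dates", and `run_query` is any callable that returns the XHTML text for a query.

```python
import os


def cache_is_fresh(cache_path, data_path):
    """True if the cached result exists and is at least as new as the data file."""
    try:
        return os.path.getmtime(cache_path) >= os.path.getmtime(data_path)
    except OSError:
        return False  # a missing file means the cache cannot be trusted


def cached_query(cache_path, data_path, run_query):
    """Return the cached result if fresh; otherwise run the query and store it."""
    if cache_is_fresh(cache_path, data_path):
        with open(cache_path, encoding="utf-8") as f:
            return f.read()
    result = run_query()
    with open(cache_path, "w", encoding="utf-8") as f:
        f.write(result)
    return result
```

With this scheme, updating the data file is enough to invalidate every cached result: no explicit cache clearing is needed, at the cost of one file-date comparison per request.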
Another way of getting to the information on translations is to use the
“comma” tool facility of the W3C server. Eg, if one wants to find the
translation of the document http://www.w3.org/TR/owl-features/,
the URI of the form: http://www.w3.org/TR/owl-features/,translations
can also be used. It returns the same information as the CGI entry in full
HTML form.
The only caveat with this tool is that it has to be used with “real” documents, ie, the tool cannot be used with the groups of documents described in the previous section. Nor can the comma tool return anything other than fully formatted XHTML pages.
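Forming such a URI programmatically is a matter of appending the `,translations` suffix to the document's URI, as in the example above. The helper below is a hypothetical convenience function, not part of any W3C tool; it only builds the URI and does not check that the document exists.

```python
def translations_uri(document_uri):
    """Return the "comma" tool URI listing the translations of a document."""
    if not document_uri.startswith(("http://", "https://")):
        raise ValueError("expected an absolute HTTP(S) URI")
    return document_uri + ",translations"


uri = translations_uri("http://www.w3.org/TR/owl-features/")
```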