A translate attribute was recently added to HTML5. At the three MultilingualWeb workshops we have run over the past two years, the idea of this kind of ‘translate flag’ has constantly excited strong interest from localizers, content creators, and folks working with language technology.
How it works
Typically, authors or automated script environments will put the attribute in the markup of a page. In industrial translation scenarios, localizers may also add the attribute during the translation preparation stage, as a way of avoiding the multiplicative effects of dealing with mistranslations in a large number of languages.
There is no effect on the rendered page (although you could, of course, style it if you found a good reason for doing so). The attribute will typically be used by workflow tools when the time comes to translate the text – be it by the careful craft of human translators, or by quick gist-translation APIs and services in the cloud.
The attribute can appear on any element, and it takes just two values: yes or no. If the value is no, translation tools should protect the text of the element from translation. The translation tool in question could be an automated translation engine, like those used in the online services offered by Google and Microsoft. Or it could be a human translator’s ‘workbench’ tool, which would prevent the translator inadvertently changing the text.
Setting this translate flag on an element applies the value to all contained elements and to all attribute values of those elements.
You don’t have to use translate="yes" to mark text as translatable. If a page has no translate attribute, a translation system or translator should assume that all the text is to be translated. The yes value is likely to see little use, though it could be very useful if you need to override a translate flag on a parent element and indicate some bits of text that should be translated. For example, you may want to translate the natural language text in examples of source code, but leave the code itself untranslated.
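As a rough illustration of the inheritance and override behaviour described above, the sketch below flags a source code sample as untranslatable and then switches translation back on for the comment inside it. The page and its sample text are invented for this illustration, and whether a particular tool honours the yes override is something you would need to check for that tool.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Translate flag inheritance (illustrative sketch)</title>
</head>
<body>
  <!-- No translate attribute here, so this text is translatable by default. -->
  <p>The following function adds two numbers:</p>

  <!-- translate="no" applies to everything inside the pre element,
       including the nested code element. -->
  <pre translate="no"><code>function add(a, b) {
  <span translate="yes">// Return the sum of the two arguments.</span>
  return a + b;
}</code></pre>
</body>
</html>
```

Only the paragraph and the comment inside the span are meant to reach the translator; the function name, parameters, and syntax stay protected.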
Why it is needed
You come across a need for this quite frequently. There is an example in the HTML5 spec about the Bee Game. Here is a similar, but real example from my days at Xerox, where the documentation being translated referred to a machine with text on the hardware that wasn’t translated.
<p>Click the Resume button on the Status Display or the
<span class="panelmsg" translate="no">CONTINUE</span> button
on the printer panel.</p>
Here are a couple more (real) examples of content that could benefit from the translate attribute. The first is from a book, quoting a title of a work.
<p>The question in the title <cite translate="no">How Far Can You Go?</cite> applies to both the undermining of traditional religious belief by radical theology and the undermining of literary convention by the device of "breaking frame"...</p>
The next example is from a page about French bread – the French for bread is ‘pain’.
<p>Welcome to <strong translate="no">french pain</strong> on Facebook. Join now to write reviews and connect with <strong translate="no">french pain</strong>. Help your friends discover great places to visit by recommending <strong translate="no">french pain</strong>.</p>
So adding the translate attribute to your page can help readers better understand your content when they run it through automatic translation systems, and can save a significant amount of cost and hassle for translation vendors with large throughput in many languages.
What about Google Translate and Microsoft Translator?
Both Google and Microsoft online translation services already provided the ability to prevent translation of content by adding markup to your content, although they did it in (multiple) different ways. Hopefully, the new attribute will help significantly by providing a standard approach.
Both Google and Microsoft currently support class="notranslate", but replacing a class attribute value with an attribute that is a formal part of the language makes this feature much more reliable, especially in wider contexts. For example, a translation prep tool would be able to rely on the meaning of the HTML5 translate attribute always being what is expected. It also becomes easier to port the concept to other scenarios, such as other translation APIs or localization standards such as XLIFF.
As it happens, the online service of Microsoft (who actually proposed a translate flag for HTML5 some time ago) already supported translate="no". This, of course, was a proprietary attribute until now, and Google didn’t support it. However, just yesterday morning I received word, by coincidence, that WebKit/Chromium has just added support for the translate attribute, and yesterday afternoon Google added support for translate="no" to its online translation service. See the results of some tests I put together this morning. (Neither yet supports the translate="yes" override.)
In these proprietary systems, however, there are a good number of other non-standard ways to express similar ideas, even just sticking with Google and Microsoft.
Microsoft apparently supports style="notranslate". This is not one of the options Google lists for their online service, but on the other hand they have things that are not available via Microsoft’s service.
For example, if you have an entire page that should not be translated, you can add <meta name="google" value="notranslate"> inside the head element of your page and Google won’t translate any of the content on that page. (However, they also support <meta name="google" content="notranslate">.) This shouldn’t be Google specific, and a single way of doing this, i.e. translate="no" on the html tag, is far cleaner.
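A minimal sketch of that single, standard approach might look like the following. The page content, including the part numbers, is made up purely for illustration.

```html
<!DOCTYPE html>
<!-- translate="no" on the root element flags the whole page
     as not to be translated; no vendor-specific meta element is needed. -->
<html lang="en" translate="no">
<head>
  <meta charset="utf-8">
  <title>Part numbers</title>
</head>
<body>
  <p>PN-1001-A, PN-1001-B, PN-1002</p>
</body>
</html>
```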
It’s also not made clear, by the way, when dealing with either translation service, how to make sub-elements translatable inside an element where translate has been set to no, which may sometimes be needed.
As already mentioned, the new HTML5 translate attribute provides a simple and standard feature of HTML that can replace and simplify all these different approaches, and will help authors develop content that will work with other systems too.
Can’t we just use the lang attribute?
It was inevitable that someone would suggest this during the discussions around how to implement a translate flag; however, overloading language tags is not the solution. For example, a language tag can indicate which text is to be spellchecked against a particular dictionary. This has nothing to do with whether that text is to be translated or not. They are different concepts. In a document that has lang="en" on the html element, if you set lang="notranslate" lower down the page, that text will now not be spellchecked, since the language is no longer English. (Nor, for that matter, will styling work, voice browsers pronounce correctly, etc.)
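To make the difference concrete, here is a small sketch contrasting the overloaded lang value with the translate attribute. The sentence and page are invented for illustration.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>lang versus translate (illustrative sketch)</title>
</head>
<body>
  <!-- Overloaded lang: the span no longer declares itself as English,
       so spellchecking, language-based styling and speech output
       lose the information they rely on. -->
  <p>Press the <span lang="notranslate">CONTINUE</span> button.</p>

  <!-- Separate concepts: the span still inherits lang="en" from the
       html element, and translate="no" only protects its text
       from translation. -->
  <p>Press the <span translate="no">CONTINUE</span> button.</p>
</body>
</html>
```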
Going beyond the translate attribute
The W3C’s ITS (International Tag Set) Recommendation proposes the use of a translate flag such as the attribute just added to HTML5, but also goes beyond that in describing a way to assign translate flag values to particular elements or combinations of markup throughout a document or set of documents. For example, you could say, if it makes sense for your content, that by default, all p elements with a particular class name should have the translate flag set to no for a specific set of documents.
Microsoft offers something along these lines already, although it is much less powerful than the ITS approach. If you use <meta name="microsoft" content="notranslateclasses myclass1 myclass2" /> anywhere on the page (or as part of a widget snippet), any of the CSS classes listed after “notranslateclasses” will behave the same as the “notranslate” class.
Microsoft and Google’s translation engines also don’t translate content within code elements. Note, however, that you don’t seem to have any choice about this: there don’t seem to be instructions about how to override this if you do want your code element content translated.
By the way, there are plans afoot to set up a new MultilingualWeb-LT Working Group at the W3C in conjunction with a European Commission project to further develop ideas around the ITS spec, and create reference implementations. They will be looking, amongst many other things, at ways of integrating the new translate attribute into localization industry workflows and standards. Keep an eye out for it.