ISSUE-296: Remove xml:lang placement restrictions from IMSC

xml:lang constraints in IMSC

Raised by:
Nigel Megitt
Opened on:
The EBU XML Subtitles group raises this concern regarding the constraints on the placement of xml:lang within IMSC:

The use of xml:lang in IMSC is contrary to the accepted use (and recommended best practice) of this attribute from the XML standard [1]. The IMSC document [2] states in section 4.4 Language that “All instances of the xml:lang attribute within a subtitle document SHALL have identical values.”
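To make the effect of the quoted constraint concrete, the following is a minimal sketch of a check for it, written with Python's standard xml.etree.ElementTree. The function names and the sample document are illustrative assumptions, not part of IMSC, and the sketch treats a document with no xml:lang attribute at all as passing, which IMSC may not intend.

```python
import xml.etree.ElementTree as ET

# The "xml" prefix is always bound to this namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Illustrative document: the second <p> changes language.
SAMPLE = """<tt xmlns="http://www.w3.org/ns/ttml" xml:lang="en">
  <body><div>
    <p>Hello</p>
    <p xml:lang="fr">Bonjour</p>
  </div></body>
</tt>"""


def xml_lang_values(doc_text):
    """Return the set of distinct xml:lang values that appear in the document."""
    root = ET.fromstring(doc_text)
    return {el.get(XML_LANG) for el in root.iter() if el.get(XML_LANG) is not None}


def satisfies_identical_lang_rule(doc_text):
    """True if every xml:lang attribute in the document has the same value."""
    return len(xml_lang_values(doc_text)) <= 1


if __name__ == "__main__":
    print(sorted(xml_lang_values(SAMPLE)))
    print(satisfies_identical_lang_rule(SAMPLE))
```

Under this reading, a document that marks a single French line inside otherwise English content fails the check, even though it follows the W3C guidance discussed in this issue.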


In the W3C document “Best Practices for XML Internationalization” [3] published by the W3C Working Group on 13 February 2008 (over 5 years ago and without subsequent amendment or correction), section 3 “When Authoring XML Content” states that

“Authors of XML content should consider the following best practices:”.

Best Practice: Specifying the language of content. Use xml:lang (or its equivalent in your schema) on the root element of the document, and on each element where the language of the content changes.


Clearly the best practice is to use xml:lang to correctly identify the language of any element within a document when it differs from the surrounding elements.

This best practice advice is further reinforced by another document from the W3C: xml:lang in XML document schemas [4] in the section “When to use xml:lang”.

Content directly associated with the XML document (either contained within the document directly or considered part of the document when it is processed or rendered) should use the xml:lang attribute to indicate the language of that content. xml:lang should be reserved for content authors to directly label any natural language content they may have.


xml:lang is defined by XML 1.0 as a common attribute that can be used to indicate the language of any element's contents. This includes any human readable text, as well as other content (such as embedded objects like images or sound files) contained by the element in which it appears. The xml:lang value applies to any sub-elements contained by the element. It also applies to attribute values associated with the element and sub-elements (though using natural language in attributes is not best practice). The value of the xml:lang attribute is a language tag defined by BCP 47.
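The inheritance rule described in the passage above can be sketched as a small tree walk. This is an illustrative sketch using Python's standard xml.etree.ElementTree; the function name effective_languages and the sample document are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# The "xml" prefix is always bound to this namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Illustrative document: xml:lang on the root, overridden on one <p>.
DOC = """<tt xmlns="http://www.w3.org/ns/ttml" xml:lang="en">
  <body><div>
    <p>Good evening</p>
    <p xml:lang="fr">Bonsoir <span>tout le monde</span></p>
  </div></body>
</tt>"""


def effective_languages(element, inherited=None):
    """Yield (element, language) pairs in document order.

    An element's own xml:lang wins; otherwise it takes the value
    inherited from its nearest ancestor that declares one.
    """
    lang = element.get(XML_LANG, inherited)
    yield element, lang
    for child in element:
        yield from effective_languages(child, lang)


if __name__ == "__main__":
    root = ET.fromstring(DOC)
    for el, lang in effective_languages(root):
        # Strip the namespace part of the tag for readability.
        print(el.tag.split("}")[-1], lang)
```

Here the span inside the French paragraph carries no xml:lang of its own, yet resolves to "fr" because the value applies to sub-elements.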

We propose that the IMSC document should be corrected to accurately reflect the established guidelines.

There is additionally a specific use case for permitting multiple languages to be indicated within content: a distributed subtitle document may be used for alternative purposes, such as processing by text-to-speech engines to generate 'spoken subtitles', in which language-appropriate speech synthesis models may be required depending on the content.
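As a rough illustration of this use case, the sketch below pairs each subtitle paragraph with a speech synthesis voice chosen from its language. Everything in it is hypothetical: the voice names, the VOICES table and the speech_jobs function are invented for the example, and it simplifies by treating each p element as being in a single language, ignoring any xml:lang on nested spans.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
TTML_P = "{http://www.w3.org/ns/ttml}p"

# Hypothetical mapping from language tag to a synthesis voice name.
VOICES = {"en": "voice-en", "fr": "voice-fr"}

# Illustrative document with three paragraphs in different languages.
SUBTITLES = """<tt xmlns="http://www.w3.org/ns/ttml" xml:lang="en">
  <body><div>
    <p>Good evening</p>
    <p xml:lang="fr">Bonsoir</p>
    <p xml:lang="de">Guten Abend</p>
  </div></body>
</tt>"""


def speech_jobs(doc_text, voices=VOICES, default_voice="voice-default"):
    """Return (voice, text) pairs, one per <p>, in document order."""
    jobs = []

    def walk(element, inherited):
        lang = element.get(XML_LANG, inherited)
        if element.tag == TTML_P:
            # Simplification: the whole paragraph uses the paragraph's language.
            text = "".join(element.itertext()).strip()
            jobs.append((voices.get(lang, default_voice), text))
            return
        for child in element:
            walk(child, lang)

    walk(ET.fromstring(doc_text), None)
    return jobs


if __name__ == "__main__":
    for voice, text in speech_jobs(SUBTITLES):
        print(voice, text)
```

If every xml:lang in the document had to carry the same value, a processor like this could not tell the French and German lines from the English one, which is the concern raised here.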

This is related to ISSUE-295.
[nigel]: pal proposes removing the xml:lang constraint

11 Nov 2013, 09:09:21

[nigel]: pal removes proposal to restrict xml:lang in IMSC though may re-instate it depending on CFF response

11 Nov 2013, 09:14:18

[nigel]: proposal accepted, pal to make edit

15 Nov 2013, 08:09:26

Fixed. See

Pierre-Anthony Lemieux, 6 Jun 2014, 23:30:58

