ISSUE-296: Remove xml:lang placement restrictions from IMSC

xml:lang constraints in IMSC

State: CLOSED
Product: TTML IMSC 1.0
Raised by: Nigel Megitt
Opened on: 2013-11-07
Description:
The EBU XML Subtitles group raises this concern re the constraints of placement of xml:lang within IMSC:

The use of xml:lang in IMSC is contrary to the accepted use (and recommended best practice) of this attribute from the XML standard [1]. The IMSC document [2] states in section 4.4 Language that “All instances of the xml:lang attribute within a subtitle document SHALL have identical values.”

[1] http://www.w3.org/TR/REC-xml/
[2] https://dvcs.w3.org/hg/ttml/raw-file/tip/ttml-ww-profiles/ttml-ww-profiles.html#language
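As an illustration of what the quoted constraint means in practice, the following is a minimal sketch of a check for it, not part of IMSC or of any validator. The function names and the sample document are hypothetical, and values are compared as exact strings, which is a simplifying assumption (language tags are case-insensitive).

```python
import xml.etree.ElementTree as ET

# Clark-notation name that ElementTree uses for the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def xml_lang_values(document_text):
    """Return the set of distinct xml:lang values found in the document."""
    root = ET.fromstring(document_text)
    return {el.attrib[XML_LANG] for el in root.iter() if XML_LANG in el.attrib}


def satisfies_identical_values_rule(document_text):
    """True if every xml:lang attribute in the document has the same value.

    Assumption: exact string comparison; no language-tag normalisation.
    """
    return len(xml_lang_values(document_text)) <= 1


# Hypothetical sample: an English document containing one French phrase.
MIXED = """<tt xmlns="http://www.w3.org/ns/ttml" xml:lang="en">
  <body><div>
    <p>She said <span xml:lang="fr">bonjour</span> to everyone.</p>
  </div></body>
</tt>"""

print(satisfies_identical_values_rule(MIXED))  # prints False
```

Under the quoted constraint a document like the sample, which labels a foreign-language phrase on the element where the language changes, would be rejected.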

In the W3C document “Best Practices for XML Internationalization” [3], published as a W3C Working Group Note on 13 February 2008 (over 5 years ago and without subsequent amendment or correction), section 3 “When Authoring XML Content” states that

“Authors of XML content should consider the following best practices:”

Best Practice: Specifying the language of content. Use xml:lang (or its equivalent in your schema) on the root element of the document, and on each element where the language of the content changes.

[3] http://www.w3.org/TR/xml-i18n-bp/#AuthoringTime

Clearly the best practice is to use xml:lang to identify the language of any element within a document when it differs from that of the surrounding elements.

This best practice advice is further reinforced by another W3C document, “xml:lang in XML document schemas” [4], in the section “When to use xml:lang”:

Content directly associated with the XML document (either contained within the document directly or considered part of the document when it is processed or rendered) should use the xml:lang attribute to indicate the language of that content. xml:lang should be reserved for content authors to directly label any natural language content they may have.

[4] http://www.w3.org/International/questions/qa-when-xmllang

xml:lang is defined by XML 1.0 as a common attribute that can be used to indicate the language of any element's contents. This includes any human readable text, as well as other content (such as embedded objects like images or sound files) contained by the element in which it appears. The xml:lang value applies to any sub-elements contained by the element. It also applies to attribute values associated with the element and sub-elements (though using natural language in attributes is not best practice). The value of the xml:lang attribute is a language tag defined by BCP 47.
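The inheritance behaviour described above (the value applies to sub-elements unless they carry their own xml:lang) can be sketched as follows. This is an illustrative sketch only; the function name and sample document are hypothetical and it handles element content, not attribute values.

```python
import xml.etree.ElementTree as ET

# Clark-notation name that ElementTree uses for the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_languages(element, inherited=None):
    """Yield (local element name, effective language) in document order."""
    # An element's own xml:lang overrides whatever it inherited.
    lang = element.attrib.get(XML_LANG, inherited)
    local_name = element.tag.rsplit("}", 1)[-1]  # drop the namespace part
    yield local_name, lang
    for child in element:
        yield from effective_languages(child, lang)


# Hypothetical sample document.
SAMPLE = """<tt xmlns="http://www.w3.org/ns/ttml" xml:lang="en">
  <body><div>
    <p>Hello <span xml:lang="de">Guten Tag</span></p>
    <p>Goodbye</p>
  </div></body>
</tt>"""

for name, lang in effective_languages(ET.fromstring(SAMPLE)):
    print(name, lang)
```

Here only the span resolves to "de"; every other element inherits "en" from the root.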

We propose that the IMSC document should be corrected to accurately reflect the established guidelines.

There is additionally a specific use case for permitting multiple languages to be indicated within content: this is to permit the use of distributed subtitle documents for alternative purposes, such as processing by text-to-speech engines to generate 'spoken subtitles', in which language-appropriate speech synthesis models may be required depending on the content.
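A rough sketch of how a 'spoken subtitles' processor might use per-element xml:lang is given below. It splits a subtitle into text runs tagged with their effective language so that each run could be routed to a suitable synthesis model. The function name, the sample, and the VOICES table are all hypothetical; no real text-to-speech API is implied.

```python
import xml.etree.ElementTree as ET

# Clark-notation name that ElementTree uses for the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical mapping from language tag to a speech synthesis voice name.
VOICES = {"en": "voice-en", "fr": "voice-fr"}


def language_runs(element, inherited=None):
    """Yield (language, text) runs in reading order."""
    lang = element.attrib.get(XML_LANG, inherited)
    if element.text and element.text.strip():
        yield lang, element.text.strip()
    for child in element:
        yield from language_runs(child, lang)
        # Text after a child element belongs to the parent, so it takes
        # the parent's language rather than the child's.
        if child.tail and child.tail.strip():
            yield lang, child.tail.strip()


# Hypothetical sample subtitle.
SAMPLE = '<p xml:lang="en">She said <span xml:lang="fr">bonjour</span> to everyone.</p>'

for lang, text in language_runs(ET.fromstring(SAMPLE)):
    print(VOICES.get(lang, "default-voice"), "->", text)
```

If all xml:lang values in the document had to be identical, the French run could not be distinguished and would be passed to the English voice.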

This is related to ISSUE-295.
Related Action Items:
Related emails:
  1. {minutes} TTWG Meeting 19/6/2014 (from nigel.megitt@bbc.co.uk on 2014-06-19)
  2. RE: {agenda} TTWG Meeting 19/6/2014 (from mdolan@newtbt.com on 2014-06-18)
  3. {agenda} TTWG Meeting 19/6/2014 (from nigel.megitt@bbc.co.uk on 2014-06-18)
  4. RE: {agenda} TTWG Meeting 12/6/2014 (from mdolan@newtbt.com on 2014-06-12)
  5. Re: {agenda} TTWG Meeting 12/6/2014 (from pal@sandflow.com on 2014-06-11)
  6. Re: {agenda} TTWG Meeting 12/6/2014 (from nigel.megitt@bbc.co.uk on 2014-06-11)
  7. Re: {agenda} TTWG Meeting 12/6/2014 (from pal@sandflow.com on 2014-06-11)
  8. Re: {agenda} TTWG Meeting 12/6/2014 (from nigel.megitt@bbc.co.uk on 2014-06-11)
  9. {agenda} TTWG Meeting 12/6/2014 (from nigel.megitt@bbc.co.uk on 2014-06-11)
  10. Minutes for 12/12/13 (from nigel.megitt@bbc.co.uk on 2013-12-12)
  11. Revised IMSC ED (from pal@sandflow.com on 2013-12-11)
  12. TTWG Agenda for 12/12/13 (from nigel.megitt@bbc.co.uk on 2013-12-11)
  13. TTWG Minutes for 5/12/13 (from nigel.megitt@bbc.co.uk on 2013-12-05)
  14. TTML Minutes for 15/11/13 (from nigel.megitt@bbc.co.uk on 2013-11-21)
  15. TTML Minutes for 11/11/13 (from nigel.megitt@bbc.co.uk on 2013-11-21)
  16. ISSUE-296 (xml:lang constraints in IMSC): Remove xml:lang placement restrictions from IMSC [IMSC] (from sysbot+tracker@w3.org on 2013-11-07)

Related notes:

[nigel]: pal proposes removing the xml:lang constraint

11 Nov 2013, 09:09:21

[nigel]: pal removes proposal to restrict xml:lang in IMSC though may re-instate it depending on CFF response

11 Nov 2013, 09:14:18

[nigel]: proposal accepted, pal to make edit

15 Nov 2013, 08:09:26

Fixed. See https://dvcs.w3.org/hg/ttml/diff/2720fa1fae9c/ttml-ww-profiles/ttml-ww-profiles.source.html

Pierre-Anthony Lemieux, 6 Jun 2014, 23:30:58

Changelog:

Created issue 'Remove xml:lang placement restrictions from IMSC' nickname xml:lang constraints in IMSC owned by Nigel Megitt on product IMSC, description as given under Description above, non-public

Nigel Megitt, 7 Nov 2013, 13:50:48

Status changed to 'open'

Glenn Adams, 14 Nov 2013, 04:32:06

Status changed to 'pending review'

Pierre-Anthony Lemieux, 6 Jun 2014, 23:30:58

Status changed to 'closed'

19 Jun 2014, 14:28:28

