This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 28263 - [webvtt] "valid" language tags [I18N-ISSUE-429]
Summary: [webvtt] "valid" language tags [I18N-ISSUE-429]
Status: RESOLVED MOVED
Alias: None
Product: TextTracks CG
Classification: Unclassified
Component: WebVTT
Version: unspecified
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: This bug has no owner yet - up for the taking
QA Contact: Web Media Text Tracks CG
URL:
Whiteboard: widereview
Keywords: changeDeclined, decided
Depends on:
Blocks:
 
Reported: 2015-03-22 00:15 UTC by Silvia Pfeiffer
Modified: 2016-10-11 18:27 UTC
CC List: 6 users

Description Silvia Pfeiffer 2015-03-22 00:15:13 UTC
Feedback by Addison Phillips from W3C I18N group:
http://lists.w3.org/Archives/Public/public-tt/2015Mar/0062.html

I18N comment: https://www.w3.org/International/track/issues/429

http://www.w3.org/TR/webvtt1/#dfn-webvtt-cue-language-span

The description of the cue language span feature reads in part:

--
A WebVTT cue span start tag "lang" that requires an annotation; the annotation represents the language of the following component, and must be a valid BCP 47 language tag. [BCP47]
--

The term 'valid' has specific meaning in BCP 47. It requires that all of the subtags in the tag be registered in the IANA registry (or that the tag itself be one of the grandfathered tags). Is this the intended meaning/requirement here? An alternative would be to require "well-formed" language tags (consistent with the BCP 47 grammar but not necessarily consisting of valid subtags).

Please note that there is nothing wrong with requiring validity, if that's your intention. It is only that this is a higher standard than the spec author may have supposed.
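
To make the distinction concrete, here is a minimal Python sketch of the two checks. It is an illustration under stated assumptions, not a conforming BCP 47 implementation: the pattern covers only a simplified subset of the RFC 5646 grammar (language, script, region, variants), and `REGISTRY` is a hypothetical hand-picked excerpt standing in for the IANA Language Subtag Registry. The names `is_well_formed` and `is_valid` are invented for this example.

```python
import re

# Simplified subset of the RFC 5646 "langtag" grammar: language, optional
# script, optional region, any number of variants. Extlang, extensions,
# private use and grandfathered tags are deliberately left out.
_LANGTAG = re.compile(
    r"(?P<language>[a-z]{2,8})"
    r"(?:-(?P<script>[a-z]{4}))?"
    r"(?:-(?P<region>[a-z]{2}|[0-9]{3}))?"
    r"(?P<variants>(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)",
    re.IGNORECASE,
)

# Hypothetical excerpt standing in for the IANA Language Subtag Registry.
REGISTRY = {
    "language": {"en", "ja", "de"},
    "script": {"latn", "jpan"},
    "region": {"us", "gb", "jp", "de"},
    "variant": {"1901", "1996"},
}


def is_well_formed(tag):
    """Syntax check only: does the tag fit the (simplified) grammar?"""
    return _LANGTAG.fullmatch(tag) is not None


def is_valid(tag):
    """Registry check: well-formed and every subtag is in REGISTRY."""
    match = _LANGTAG.fullmatch(tag)
    if match is None:
        return False
    subtags = [("language", match.group("language"))]
    if match.group("script"):
        subtags.append(("script", match.group("script")))
    if match.group("region"):
        subtags.append(("region", match.group("region")))
    subtags += [("variant", v) for v in match.group("variants").split("-") if v]
    return all(value.lower() in REGISTRY[kind] for kind, value in subtags)
```

Under these assumptions, `jp` is well-formed but not valid (it fits the grammar, yet Japanese is registered as `ja`), while `ja_JP` is not even well-formed because of the underscore.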
Comment 1 Philip Jägenstedt 2015-03-23 03:38:54 UTC
Yep, this is intentional, and it matches the language for the lang attribute in HTML:
https://html.spec.whatwg.org/#the-lang-and-xml:lang-attributes
Comment 2 Addison Phillips 2015-03-24 01:08:53 UTC
(In reply to Philip Jägenstedt from comment #1)
> Yep, this is intentional, and it matches the language for the lang attribute
> in HTML:
> https://html.spec.whatwg.org/#the-lang-and-xml:lang-attributes

Okay, although I generally give a health warning for implementers when a spec says the tag must be valid. The RFC 2119 keyword "must" and the BCP 47 term "valid" together impose a specific burden that implies, among other things, some sort of failure mode for well-formed but invalid tags. I suspect that HTML even means "well-formed"... but I'm perfectly happy with validity checks.
Comment 3 Philip Jägenstedt 2015-03-24 07:20:39 UTC
Oh, the purpose of the syntax is perhaps not clear. The syntax is only used for defining validity, i.e. for someone trying to implement a validator. To actually implement WebVTT one must follow the parsing section, which doesn't use the syntax in any way at all. When parsing, there is no checking of the format at all. This is also how HTML works: whatever is in the lang="" attribute will end up in the DOM.

Would it have helped to have a big warning at the top of the syntax section saying that it is only used for defining validity, not processing?
Comment 4 David Singer 2015-03-24 21:53:20 UTC
so, it sounds like we do mean "well-formed" as against BCP 47 (which is a syntax check) rather than "valid" (which is a registry check)?
Comment 5 Philip Jägenstedt 2015-03-25 04:29:30 UTC
(In reply to David Singer from comment #4)
> so, it sounds like we do mean "well-formed" as against BCP 47 (which is a
> syntax check) rather than "valid" (which is a registry check)?

I really think validity is what this should express. If someone writes <lang jp> instead of <lang ja>, a WebVTT validator should warn them about that, as an HTML validator does.
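
The split described in comments 3 and 5 (processing keeps whatever annotation it finds, while a validator complains) could be sketched roughly as follows. This is not the WebVTT cue text parsing algorithm: the regular expression, the function names `extract_lang_annotations` and `lint_lang_annotations`, and the tiny `KNOWN_LANGUAGES` set are all assumptions made for illustration, and only the primary language subtag is checked.

```python
import re

# Hypothetical, tiny stand-in for the IANA Language Subtag Registry.
KNOWN_LANGUAGES = {"en", "ja", "de", "fr"}

# Rough match for a <lang> start tag with optional classes and an annotation.
_LANG_START_TAG = re.compile(r"<lang(?:\.[^\s>]*)?[ \t]+([^>]*)>")


def extract_lang_annotations(cue_text):
    """Processing side: return each <lang> annotation without judging it."""
    return [m.group(1).strip() for m in _LANG_START_TAG.finditer(cue_text)]


def lint_lang_annotations(cue_text):
    """Validator side: warn about unknown primary language subtags."""
    warnings = []
    for annotation in extract_lang_annotations(cue_text):
        primary = annotation.split("-")[0].lower()
        if primary not in KNOWN_LANGUAGES:
            warnings.append(
                f"unknown language subtag {primary!r} in <lang {annotation}>"
            )
    return warnings
```

With this sketch, `<lang jp>` still yields the annotation `jp` on the processing side, and only the lint step reports it; `<lang ja>` produces no warning.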
Comment 6 Richard Ishida 2015-10-30 02:54:51 UTC
https://github.com/w3c/webvtt/issues/217
Comment 7 Silvia Pfeiffer 2016-10-11 18:27:46 UTC
https://github.com/w3c/webvtt/pull/256 is the pull request that fixed this.