This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
Feedback by Addison Phillips from W3C I18N group: http://lists.w3.org/Archives/Public/public-tt/2015Mar/0058.html

I18N comment: https://www.w3.org/International/track/issues/425

http://www.w3.org/TR/webvtt1/#webvtt-file-structure

Various constructs such as 'cue identifier' are described as being:

-- ...any sequence of one or more characters not containing the substring "-->"... --

The document makes it clear that this is a sequence of Unicode characters. However, it leaves open whether different Unicode character sequences that represent the same semantic string identifier (see Charmod [1] and Charmod-Norm [2]) are considered "the same". As currently written, different UTF-8 byte sequences are considered distinct.

We suggest that identifiers using distinct code point sequences be considered distinct (that is, that you be what we call a "non-normalizing Specification"). In that case you should include at least a health warning about the dangers of using different character sequences.

[1] http://www.w3.org/TR/charmod/
[2] http://www.w3.org/TR/charmod-norm/, particularly http://www.w3.org/TR/charmod-norm/#formal-language and http://www.w3.org/TR/charmod-norm/#non-normalizing
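To illustrate the concern: two canonically equivalent spellings of the same visible string have different code point sequences and therefore different UTF-8 byte sequences, so a non-normalizing comparison treats them as distinct. The sketch below uses Python's standard `unicodedata` module; the helper `same_identifier` is hypothetical and simply models exact code point comparison, not any particular WebVTT implementation.

```python
import unicodedata

# Two spellings of "café":
#   precomposed uses U+00E9 LATIN SMALL LETTER E WITH ACUTE
#   decomposed uses "e" followed by U+0301 COMBINING ACUTE ACCENT
precomposed = "caf\u00e9"
decomposed = "cafe\u0301"


def same_identifier(a: str, b: str) -> bool:
    """Hypothetical non-normalizing comparison: exact code point equality."""
    return a == b


# The code point sequences differ, so the identifiers are distinct...
print(same_identifier(precomposed, decomposed))  # False
# ...and so do their UTF-8 encodings.
print(precomposed.encode("utf-8"))  # b'caf\xc3\xa9'
print(decomposed.encode("utf-8"))   # b'cafe\xcc\x81'
# Yet they are canonically equivalent: NFC maps both to the same sequence.
print(unicodedata.normalize("NFC", precomposed)
      == unicodedata.normalize("NFC", decomposed))  # True
```

This is the situation a "health warning" would cover: authors who type the same identifier with different input methods may end up with strings that look identical but never match.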
I agree with the comment. It applies not only to cue identifiers but also to e.g. voices, classes, cue data, etc. Maybe we can add a new subsection under Conformance?

[[
Unicode normalization

This specification does not normalize Unicode text as part of its processing requirements.

Example: A cue whose identifier consists of the character U+212B ANGSTROM SIGN will not match a selector targeting a cue whose ID consists of the character U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE.
]]
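A minimal sketch of the proposed example, assuming ID matching is plain code point equality. `id_selector_matches` is a hypothetical stand-in for whatever matching logic an implementation uses, and `unicodedata` is used only to show that the two characters are canonically equivalent. U+212B has a singleton canonical decomposition to U+00C5, so NFC turns one into the other, but a non-normalizing matcher never applies that mapping.

```python
import unicodedata

ANGSTROM_SIGN = "\u212b"  # U+212B ANGSTROM SIGN
A_WITH_RING = "\u00c5"    # U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE


def id_selector_matches(selector_id: str, cue_id: str) -> bool:
    """Hypothetical non-normalizing matcher: exact code point comparison."""
    return selector_id == cue_id


# A cue identified by U+212B is not matched by a selector for U+00C5...
print(id_selector_matches(A_WITH_RING, ANGSTROM_SIGN))  # False
# ...even though NFC normalization would map U+212B to U+00C5.
print(unicodedata.normalize("NFC", ANGSTROM_SIGN) == A_WITH_RING)  # True
```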
I've prepared https://github.com/w3c/webvtt/pull/195
The text in the pull request has the right intention, but it has a serious flaw. It says:

-- This specification requires that Unicode text must not be normalized. --

I think you mean to say something more like:

-- Implementations of this specification MUST NOT normalize Unicode text during processing. --
https://github.com/w3c/webvtt/pull/195 was merged on June 10. Are any further changes needed?
I've made the change suggested by Addison; see https://github.com/w3c/webvtt/pull/207
Merged https://github.com/whatwg/webvtt/pull/207
Satisfied by this change.