10944 – Add speech synthesis features to WebVTT descriptions

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Status:	NEW

Alias:	None

Product:	TextTracks CG
Classification:	Unclassified
Component:	WebVTT
Version:	unspecified
Hardware:	All All

Importance:	P2 normal
Target Milestone:	---
Assignee:	Silvia Pfeiffer
QA Contact:	This bug has no owner yet - up for the taking

URL:
Whiteboard:	v2
Keywords:	media, NotInW3CSpecYet

Reported:	2010-10-01 07:27 UTC by Masatomo Kobayashi
Modified:	2015-10-05 00:56 UTC
CC List:	8 users

Description Masatomo Kobayashi 2010-10-01 07:27:06 UTC

Section affected: http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#websrt-0

WebSRT, especially voice declaration and cue text, seems to have only caption-specific features. As WebSRT is supposed to be used to provide synthesized audio descriptions (and possibly synthesized sign languages in the near future), additional options for audio descriptions would be needed. Or these optional features could be moved to other places, e.g., CSS.

Comment 1 Tab Atkins Jr. 2010-10-01 14:55:46 UTC

What makes you think that WebSRT will provide synthesized audio or SL?  WebSRT is a captioning format.  Perhaps some other feature of <video> or UAs will provide those, but I don't think it's something that WebSRT is meant to handle.

Comment 2 John Foliot 2010-10-01 15:19:59 UTC

(In reply to comment #1)
> What makes you think that WebSRT will provide synthesized audio or SL?  WebSRT
> is a captioning format.  Perhaps some other feature of <video> or UAs will
> provide those, but I don't think it's something that WebSRT is meant to handle.

WebSRT is, as far as I can tell, a format for applying time-stamped data for the synchronization of texts to media. It can be used for captions, subtitling and possibly other uses as well. It is not yet part of the W3C specification, and is in fact a draft spec produced by WHATWG. (I am unaware of any production-ready examples in the wild.)

If WebSRT cannot also be used to deliver synthesized audio, then it might not be the right candidate as a baseline time-stamp format, as this need is both clear and real: descriptive text is an identified requirement that has both legal precedent and real-world examples. As well, using the time/synchronization "time-stamps" we should be able to provide descriptive texts to non-sighted users, and the IBM team (of which Masatomo is a part) has already developed a proof of concept that uses time-stamped texts and synthesized voices to deliver this requirement.

It is for these reasons that WebSRT has not yet been incorporated into the W3C spec - the assessment of the suitability or non-suitability of that possible format has not yet been completed.

Tab, if this is a topic of interest to you, I encourage you (and others) to review http://www.w3.org/WAI/PF/HTML/wiki/Media_Accessibility_Checklist - in fact, one of the next steps is to start mapping potential solutions against this checklist, looking for holes and defects. Help here would be greatly appreciated - feel free to ping the public-html-a11y@w3.org mailing list and preface the subject with [media] - we can use all the help we can get. (Come on in, grab a spot!)

Comment 3 Ian 'Hixie' Hickson 2010-10-05 22:00:24 UTC

What audio description features are you missing?

Comment 4 Masatomo Kobayashi 2010-10-06 11:37:20 UTC

Based on our study, the speech synthesis features that will be required are the "voice family" (specific names or gender) and "volume". The "pause", "rate", "pitch", and "balance" will also be useful. The CSS Speech Module covers these features, just as traditional CSS covers some WebSRT cue settings such as size and position.
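
As a rough illustration of how the features listed above line up with CSS Speech Module properties, here is a minimal TypeScript sketch that renders a settings object as a CSS rule. The `SpeechSettings` interface, the `toSpeechCss` helper, and the `.description` selector are all hypothetical names made up for this example; neither WebSRT nor the CSS Speech Module defines how description cues would be selected, so treat this as an assumption rather than a proposal.

```typescript
// Hypothetical settings object; each field mirrors one feature named above.
interface SpeechSettings {
  voiceFamily?: string; // e.g. a gender keyword or a specific voice name
  volume?: string;      // value for voice-volume, e.g. "loud"
  rate?: string;        // value for voice-rate, e.g. "fast"
  pitch?: string;       // value for voice-pitch, e.g. "low"
  balance?: string;     // value for voice-balance, e.g. "left"
  pause?: string;       // value for pause, e.g. "200ms"
}

// Render the settings as one CSS rule using CSS Speech Module property names.
// The selector is supplied by the caller and is purely illustrative.
function toSpeechCss(selector: string, s: SpeechSettings): string {
  const pairs: Array<[string, string | undefined]> = [
    ["voice-family", s.voiceFamily],
    ["voice-volume", s.volume],
    ["voice-rate", s.rate],
    ["voice-pitch", s.pitch],
    ["voice-balance", s.balance],
    ["pause", s.pause],
  ];
  const lines: string[] = [];
  for (const [prop, value] of pairs) {
    // Skip features the author did not specify.
    if (value !== undefined) {
      lines.push(`  ${prop}: ${value};`);
    }
  }
  return `${selector} {\n${lines.join("\n")}\n}`;
}
```

Calling `toSpeechCss(".description", { voiceFamily: "female", volume: "loud" })` would yield a rule containing only `voice-family` and `voice-volume` declarations, leaving the optional features unset.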

As an audio-description-specific feature, it should be possible to specify the behavior when the speech synthesis has not finished by the cue's end time.

Comment 5 Ian 'Hixie' Hickson 2010-10-12 08:49:09 UTC

The stuff covered by CSS should just be covered by CSS. Before I spec that, though, I'd like some implementation experience so that we can make sure that's sane. Please let me know when there's an implementation of this feature so that I can study it and ask the relevant implementors for their experience.

For example, the case you bring up of a description that's too long to play coherently in the cue's time span is a good one. What do implementors find their users want? Should the video slow down? Pause? Should the API expose this? These are all questions that it'd be good to get figured out before we specify it.
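
To make the open question concrete, the following TypeScript sketch computes what each of the two options mentioned above (pausing the video, or slowing it down) would require for a description that overruns its cue. Everything here is hypothetical: the `planOverrun` function, the `OverrunPolicy` names, and the assumption that the duration of the synthesized speech is known in advance. It does not describe any existing implementation or API.

```typescript
// Hypothetical policy names for the two options raised in the comment above.
type OverrunPolicy = "pause-video" | "slow-video";

interface OverrunAction {
  kind: "none" | OverrunPolicy;
  pauseSeconds?: number; // how long the video would be held (pause-video)
  playbackRate?: number; // rate to apply during the cue span (slow-video)
}

// Decide what is needed so that speech lasting `speechSeconds` fits a cue
// running from `cueStart` to `cueEnd` (all values in seconds).
// Assumption: the speech duration can be measured or estimated beforehand.
function planOverrun(
  cueStart: number,
  cueEnd: number,
  speechSeconds: number,
  policy: OverrunPolicy
): OverrunAction {
  const cueSeconds = cueEnd - cueStart;
  if (cueSeconds <= 0) {
    throw new RangeError("cue end must be after cue start");
  }
  if (speechSeconds <= cueSeconds) {
    return { kind: "none" }; // the description already fits
  }
  if (policy === "pause-video") {
    // Hold the video for the part of the speech that overruns the cue.
    return { kind: "pause-video", pauseSeconds: speechSeconds - cueSeconds };
  }
  // Stretch the cue span: playing cueSeconds of media at rate r takes
  // cueSeconds / r seconds, so r = cueSeconds / speechSeconds.
  return { kind: "slow-video", playbackRate: cueSeconds / speechSeconds };
}
```

For a cue spanning 10 s to 14 s with 8 s of speech, the pause policy would hold the video for 4 s, while the slow-down policy would play the cue span at half speed. Which of these users actually prefer is exactly the implementation experience the comment asks for.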

Comment 6 Silvia Pfeiffer 2013-07-08 11:44:42 UTC

Re-opening and assigning to TextTracks CG, where this will need to be specified for WebVTT.