TTML/changeProposal014
Audio Rendering - CLOSED-IMPLEMENTED
- Owner: Nigel Megitt.
- Started: 14/06/13
- Closed: 20/01/15
Issues Addressed
ISSUE-10 - Embedded Audio - Closed - [1].
Summary
TTML is currently predicated on the visual representation of timed text. It may additionally be useful to present the timed text as audio, e.g. for accessibility purposes. That audio presentation may be pre-recorded or generated by text-to-speech technology, in both cases using the document instance as a source. ISSUE-10 concerns links to external audio resources. Additionally, if the TTML is to be used as a source for generating "audio description" tracks (the European term; the equivalent US term is "video description"), then extra markup may be useful to guide the conversion to audio, regardless of whether that converter is a human or a machine. For example, pronunciation guidance may be needed.
For both the above use cases and more general metadata capture for subtitles and captions, markup of emotion would also be a useful addition to TTML.
Pronunciation markup may be referenced as an external PLS document.
Emotions may be expressed using EmotionML; however, further work is needed to define how to extend TTML with EmotionML semantics.
Speech synthesis markup is probably not needed here; if it were, SSML would appear to be relevant.
Edits to be applied
Strawman
Pronunciation lexicon
Add a lexicon element to tt:head using the same semantics as in SSML §3.1.5.1.
Not required, since the lexicon element would be in a non-TTML namespace, and it is already permitted to add foreign (non-TTML) namespace XML. TTML-only processors must ignore it, but any TTML + PLS processors can do whatever they see fit with it.
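For illustration, such a lexicon reference could be carried as foreign-namespace content in tt:head. This is a sketch only: the ssml prefix, the lexicon URI, and how a TTML + PLS processor would use it are assumptions, not anything defined by TTML.

```xml
<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:ssml="http://www.w3.org/2001/10/synthesis">
  <head>
    <!-- Foreign-namespace element: TTML-only processors must ignore it;
         a TTML + PLS processor could fetch and apply the referenced lexicon. -->
    <ssml:lexicon uri="http://example.org/names.pls"
                  type="application/pls+xml"/>
  </head>
  <body>...</body>
</tt>
```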
Emotion markup
- DONE
- Find a way to integrate EmotionML semantics into TTML
No edits are required: adding XML content from the EmotionML namespace into TTML is already permitted - as it's a foreign namespace TTML-only processors must ignore it, but any TTML + EmotionML processors can do whatever they see fit with it.
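As an illustration, an emotion annotation might be carried inside a p's metadata as foreign-namespace content. A sketch only: the emo prefix, the attachment point, and the choice of EmotionML vocabulary are assumptions.

```xml
<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:emo="http://www.w3.org/2009/10/emotionml">
  <body>
    <div>
      <p begin="0s" end="2s">
        <metadata>
          <!-- Foreign-namespace content: TTML-only processors must ignore it -->
          <emo:emotion
              category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories">
            <emo:category name="excited"/>
          </emo:emotion>
        </metadata>
        We won!
      </p>
    </div>
  </body>
</tt>
```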
Map external audio cue fetch and playback to JavaScript
Use the enter event associated with cues, together with getCueAsAudio(), thus:
    // Strawman API: addEventHandler/removeEventHandler and getCueAsAudio()
    // are proposed here, not existing platform methods.
    cue.addEventHandler("enter", function(sender) {
        var theVideo = document.getElementById("theVideo");
        var savedVolume = theVideo.volume;
        theVideo.muted = true;
        // Watchdog on the video in case it overruns the description duration
        var h = theVideo.addEventHandler("timeupdate", function(video) {
            if (video.currentTime > cue.endTime) video.pause();
        });
        var myAudio = sender.getCueAsAudio(); // if this is too slow, fetch outside the handler
        myAudio.addEventHandler("ended", function(description) {
            theVideo.removeEventHandler(h);
            theVideo.muted = false;
            theVideo.volume = savedVolume;
            if (theVideo.paused) theVideo.play();
        });
        myAudio.play();
    });
Edits applied
Audio resources were added as a specific type of generic data resource, in change https://dvcs.w3.org/hg/ttml/rev/d16d284100b9
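In outline, this allows an audio recording to be embedded as a data resource and referenced from content. The fragment below is a rough sketch of that direction only; the element and attribute names follow the TTML2 drafts as the author understands them and should be checked against the specification.

```xml
<tt xmlns="http://www.w3.org/ns/ttml">
  <head>
    <resources>
      <!-- Embedded audio resource; base64 payload elided -->
      <data xml:id="desc1" type="audio/mpeg">...</data>
    </resources>
  </head>
  <body>
    <div>
      <!-- An audio description cue that plays the embedded resource -->
      <p begin="10s" end="15s">
        <audio src="#desc1"/>
        He walks slowly to the door.
      </p>
    </div>
  </body>
</tt>
```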
Impact
References
- Pronunciation Lexicon Specification (PLS)
- Emotion Markup Language (EmotionML)
- Speech Synthesis Markup Language (SSML)