

  • Owner: Nigel Megitt.
  • Started: 14/06/13
  • Closed: 20/01/15

Issues Addressed

ISSUE-10 - Embedded Audio - Closed - [1].


TTML is currently predicated on the visual representation of timed text. It may be useful additionally to present the timed text in audio, e.g. for accessibility purposes. That audio presentation may have been pre-recorded or generated by text to speech technology, in both cases using the document instance as a source. Issue-10 concerns links to external audio resources. Additionally, if the TTML is to be used as a source for generating audio tracks for "audio description" (European term) or "video description" (equivalent US term) then extra markup may be useful to guide the conversion to audio, regardless of whether that converter is a human or a machine. For example pronunciation guidance may be needed.

For both the above use cases and more general metadata capture for subtitles and captions, markup of emotion would also be a useful addition to TTML.

Pronunciation markup may be referenced as an external PLS document.

Emotions may be expressed using EmotionML; however, further work is needed to define how to extend TTML with EmotionML semantics.

Speech synthesis markup is probably not needed here; if it were needed, SSML would appear to be relevant.

Edits to be applied


Pronunciation lexicon

Add a lexicon element to tt:head using the same semantics as in SSML §

Not required, since the lexicon element would be in a non-TTML namespace, and it is already permitted to add foreign (non-TTML) namespace XML. TTML-only processors must ignore it, but any TTML + PLS processors can do whatever they see fit with it.
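As an illustrative sketch of this resolution, a document instance might reference an external PLS document by carrying an SSML-namespace lexicon element in tt:head; the file name here is a placeholder, and the attributes follow the SSML lexicon element:

              <tt xmlns="http://www.w3.org/ns/ttml"
                  xmlns:ssml="http://www.w3.org/2001/10/synthesis">
                <head>
                  <!-- foreign-namespace element: TTML-only processors ignore it -->
                  <ssml:lexicon uri="pronunciations.pls"
                                type="application/pls+xml"/>
                </head>
                <body/>
              </tt>

A TTML + PLS processor could resolve the uri and apply the lexemes when generating audio; a TTML-only processor must prune the element.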

Emotion markup

Find a way to integrate EmotionML semantics into TTML

No edits are required: adding XML content from the EmotionML namespace into TTML is already permitted - as it's a foreign namespace TTML-only processors must ignore it, but any TTML + EmotionML processors can do whatever they see fit with it.
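For example, an emotion annotation might be attached to a paragraph via a metadata child, using the EmotionML namespace; the placement inside metadata and the choice of category set are assumptions for illustration:

              <tt xmlns="http://www.w3.org/ns/ttml"
                  xmlns:emo="http://www.w3.org/2009/10/emotionml">
                <body>
                  <div>
                    <p begin="10s" end="12s">
                      <metadata>
                        <!-- foreign-namespace content: ignored by TTML-only processors -->
                        <emo:emotion category-set="http://www.w3.org/TR/emotion-voc/xml#big6">
                          <emo:category name="fear"/>
                        </emo:emotion>
                      </metadata>
                      Don't go in there!
                    </p>
                  </div>
                </body>
              </tt>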

Map external audio cue fetch and playback to Javascript

Use the onenter handler associated with cues, and the proposed getCueAsAudio() method, thus:

              var theVideo = document.getElementById("theVideo");
              var savedVolume = theVideo.volume;
              theVideo.volume = 0;  // duck the programme audio while the description plays
              // set watchdog on video in case it overruns the description duration
              var watchdog = function() {
                  if (theVideo.currentTime > cue.endTime) theVideo.pause();
              };
              theVideo.addEventListener("timeupdate", watchdog);
              // getCueAsAudio() is the proposed API; if this is too slow, do it outside the handler
              var myAudio = cue.getCueAsAudio();
              myAudio.addEventListener("ended", function() {
                  theVideo.volume = savedVolume;
                  theVideo.removeEventListener("timeupdate", watchdog);
              });
              myAudio.play();

Edits applied

Audio resources added as a specific type of generic data resource, in change



Pronunciation Lexicon Specification
Emotion Markup Language
Speech Synthesis Markup Language