Re: [css3-speech] fwd comments from Robert Brown

On 29 Sep 2011, at 22:23, fantasai wrote:
>> 1. Other than screen reading, what other use cases are there for implementing
>> the speech synthesis component of a webapp's user interface as style attributes ?
> 
> I believe text-to-speech for ebooks was one of the main drivers in this round of editing.

That is correct. More precisely, CSS Speech Level 3 is one of the three components that allow authors to control speech synthesis in the upcoming EPUB 3 specification (the industry standard for e-books). See here:

http://epub-revision.googlecode.com/svn/trunk/build/30/spec/epub30-overview.html#sec-tts

Note that although the EPUB 3 release schedule provided the motivation (and resources) to resume work on the CSS Speech module, the design changes / improvements we have introduced are not specific to e-books.

>> 2. If screen reading is the key scenario, who is the target user? I can't speak
>>   on behalf of the visually impaired, but feedback I've heard in the past is
>>   that the ability for the user to explicitly select the TTS voice and playback
>>   speed is highly desirable in this scenario.
> 
> The CSS user stylesheet model should be able to accommodate voice-family preferences:
>  http://www.w3.org/TR/CSS21/cascade.html#cascade
> Wrt speed control, I can imagine that being treated similar to text zoom.

Yes indeed, as per fundamental accessibility guidelines.
This ties into your "webapp" comment at point #1:

Some EPUB reading systems are purely web-based (i.e. the user interface itself is presented using HTML/CSS/JavaScript). Web browsers claiming support for CSS-Speech are expected to provide users with the means to override author styles (see Fantasai's description), which in this instance can potentially impact both the reader's UI and the publication's contents. Note that such reading systems usually manipulate the publication before it gets rendered (dynamic pagination is the obvious use-case), so such a webapp would have the capability to programmatically alter the publication's stylesheets based on user settings captured via the HTML/JavaScript interface (e.g. change typeface / voice, increase/decrease font size / speech rate).

Other reading systems wrap a web-browser component inside a "native" user-interface shell (i.e. the browser component is hosted mainly to render the publication's contents, not to expose UI controls), in which case user customisations are conducted via dedicated user-interface controls (which may rely on the CSS user stylesheet mechanism, behind the scenes).
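To make the programmatic-override scenario concrete, here is a minimal sketch of how a web-based reading system might generate a user-preference stylesheet. The 'voice-family' and 'voice-rate' property names are from the CSS Speech Level 3 draft; the `buildUserSpeechCSS` helper and the shape of its settings object are hypothetical, purely for illustration:

```javascript
// Minimal sketch (hypothetical helper): turn user settings captured via
// the webapp's HTML/JavaScript interface into a CSS override string.
// 'voice-family' and 'voice-rate' are CSS Speech Level 3 properties.
function buildUserSpeechCSS(settings) {
  var rules = [];
  if (settings.voiceFamily) {
    rules.push("voice-family: " + settings.voiceFamily + ";");
  }
  if (settings.voiceRate) {
    rules.push("voice-rate: " + settings.voiceRate + ";");
  }
  // '!important' user declarations win over author styles in the cascade.
  return "body { " + rules.map(function (r) {
    return r.replace(";", " !important;");
  }).join(" ") + " }";
}

// In a browser, the webapp would then inject the result into the
// publication's rendered document, e.g.:
//   var style = document.createElement("style");
//   style.textContent =
//     buildUserSpeechCSS({ voiceFamily: "female", voiceRate: "120%" });
//   document.head.appendChild(style);
```

The same approach works for visual preferences (typeface, font size), which is why these reading systems can offer a single settings panel covering both modalities.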

>> 3. How is the user envisaged to interact with a webapp that uses this capability?
>>   For example, how do they interrupt to select a recently spoken element (e.g.
>>   to select an item from a list)? Does the webapp have any shuttle control
>>   (pause/resume, skip forward/back, etc), or is that exclusively provided by the UA?
> 
> At this point I don't believe there are any controls for this on the author side,
> and UA navigation UI is out-of-scope for us.

I agree, this is a user-agent issue. I envision that a web browser claiming support for CSS-Speech would need to integrate tightly with the underlying speech processor, in order to expose controls such as play/pause/resume and rewind/fast-forward.

>> 4. How is the playback of rendered speech coordinated with the visual display?
>>   For example, it's common for words or groups of words to be highlighted as
>>   they're spoken (presumably by applying a different style).
> 
> For multi-modal rendering, there's a proposal to use selectors for this:
>  http://www.w3.org/TR/selectors4/#time-pseudos

Yes, visual highlighting of the "active" TTS element is currently not within the scope of CSS-Speech. I think it is a cross-cutting concern that should be addressed separately. For example, the ':past' and ':future' pseudo-classes defined in HTML5 apply only to WebVTT captions (yes, the naming convention conflicts with the current CSS Selectors Level 4 proposal). There is also a SMIL Timesheets proposal that defines the presentational aspects of time-based activation. These efforts need to be harmonised, and this will probably happen beyond the CSS Speech Level 3 timeframe.

See:

http://www.w3.org/TR/html5/rendering.html#the-:past-and-:future-pseudo-classes

http://www.whatwg.org/specs/web-apps/current-work/#the-':past'-and-':future'-pseudo-classes

http://wam.inrialpes.fr/timesheets/docs/timeAction.html
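To illustrate the kind of harmonisation at stake, here is a hedged sketch of what author-side highlighting might look like with the time-dimensional pseudo-classes from the Selectors Level 4 proposal linked above (':current', ':past', ':future'). This is not part of CSS Speech Level 3, and both the naming and the activation model are unsettled:

```css
/* Hypothetical: highlight the element currently being spoken,
   and dim the ones already rendered, using the proposed
   time-dimensional pseudo-classes from Selectors Level 4. */
p:current {
  background-color: yellow;
}
p:past {
  color: gray;
}
```

Whether the "currently spoken" state is driven by the TTS engine, by WebVTT cues, or by a SMIL Timesheets-style timing layer is exactly the question that needs harmonising.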

>> 5. I'm curious to know what user agents are actively interested in implementing this?
> 
> I don't have an answer to that; the interest in progressing the module came from
> the EPUB3 WG at IDPF.

The next generation of EPUB and DAISY (Digital Talking Books) reading systems (well, the ones that will claim support for text-to-speech) are directly impacted by CSS3-Speech developments.

I invite you to take a look at this list of potential implementations, and please feel free to contribute (the link to the mailing-list discussion is given on the wiki page):

http://wiki.csswg.org/spec/css3-speech#testing-reference-implementations

Thanks for your comments Robert!
Kind regards, Daniel

Received on Friday, 30 September 2011 09:28:14 UTC