Improving Spoken Presentation of Content -- 17 Sep 2019

<scribe> Meeting: Improving Spoken Presentation of Web Content

scribe+

MH: I will give you background on what we are doing in the Pronunciation TF under APA.

<Makoto_> presen+

chair+ Irfan

MH: the idea behind the Pronunciation TF, which is under APA; the facilitator is Irfan
... Pearson, DAISY and the College Board are participating, supported by Microsoft

<Irfan> https://w3c.github.io/pronunciation/user-scenarios/

MH: the need comes from the education community; Pearson and the College Board do educational assessments. The TF has been active since Oct 2018.

<Irfan> https://w3c.github.io/pronunciation/gap-analysis/

<Irfan> https://w3c.github.io/pronunciation/use-cases/

MH: first working drafts were just published: gap analysis, use cases and user scenarios.
... students using AT and read-aloud technology were listening to content that was being misspoken.
... in an education setting even slight problems are major: if a word is not spoken exactly as the teacher says it, that is a problem. Read-aloud tools can also assist language learners
... and people with learning disabilities in understanding content on the web.
... voice-based assistants (Alexa, Siri, Google Home, etc.).
... how do we enable content authors to make these systems speak the content correctly?
... We can't do this yet in HTML. Audio books in EPUB with TTS, reading an ebook aloud, or producing books en masse using TTS are use cases here.
... active spoken content is critical in the publishing and educational domains.
... we are trying to solve the problem today.
... there are hacks today: improper use of the ARIA standard with aria-label, but that only helps SR users, not read-aloud tools.
... data attributes are used in proprietary products, but with no interoperability. Refreshable braille: ETS, Pearson and others put pronunciation text into ARIA; it is sent to the speech synth but is then shown incorrectly on the braille display, which is a real problem.
... looking for a standards-based solution. SSML: a growing number of speech engines support it, e.g. Amazon Polly. CSS Speech is dead.

AT have nothing to support.

scribe: it is the author's decision. Speech synths are getting better, but in the education context the author is best placed to specify the spoken content. In the US there is a consortium for spoken math content.
... people put commas in the text to add pauses, but this causes issues on the braille display, which shows ,,,, etc. for a long pause.
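
A minimal sketch of the alternative to the comma hack, assuming the speech path can accept SSML: the visible text (and so the braille output) stays clean, and the pause is carried by the SSML `<break>` element. The function name and the sample sentence are hypothetical and were not discussed in the meeting.

```python
from xml.sax.saxutils import escape


def speak_with_pause(before: str, after: str, pause_ms: int) -> str:
    """Build SSML that pauses between two text runs without altering the text."""
    # <break time="..."/> is standard SSML; no extra commas are added to the text.
    return (
        "<speak>"
        f'{escape(before)}<break time="{pause_ms}ms"/>{escape(after)}'
        "</speak>"
    )


print(speak_with_pause("Read the question.", " Then choose an answer.", 750))
```

Only the speech engine sees the pause; a braille display rendering the original text would show no ",,,," run.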

SSML is a great standard; CSS Speech is dead; PLS (Pronunciation Lexicon Specification) is another specification, for lexicons.

scribe: PLS can be domain specific, say in chemistry.
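
A rough sketch of what a domain-specific PLS lexicon could look like, generated with Python's standard library. The chemistry entry (NaCl read as "sodium chloride") and the function name are hypothetical illustrations, not examples from the meeting; the element names and namespace follow PLS 1.0.

```python
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_lexicon(aliases: dict[str, str], lang: str = "en-US") -> str:
    """Serialize a PLS lexicon that maps written forms to spoken aliases."""
    ET.register_namespace("", PLS_NS)  # emit PLS as the default namespace
    lexicon = ET.Element(
        f"{{{PLS_NS}}}lexicon",
        {"version": "1.0", "alphabet": "ipa", f"{{{XML_NS}}}lang": lang},
    )
    for grapheme, alias in aliases.items():
        lexeme = ET.SubElement(lexicon, f"{{{PLS_NS}}}lexeme")
        ET.SubElement(lexeme, f"{{{PLS_NS}}}grapheme").text = grapheme
        ET.SubElement(lexeme, f"{{{PLS_NS}}}alias").text = alias
    return ET.tostring(lexicon, encoding="unicode")


print(build_lexicon({"NaCl": "sodium chloride"}))
```

A chemistry publisher could keep one such lexicon per subject area rather than annotating every occurrence in the content.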

<Makoto_> In the context of EPUB3, we have a standard attribute for embedding SSML within HTML content documents.

<Makoto_> It has been very heavily used in Japan. For example, by the biggest textbook publisher (Tokyo Shoseki).

scribe: Gap analysis: change of language of content, gender, phonetic pronunciation, substitution, pitch, volume, emphasis, say-as; see the Gap Analysis document.
... example: a zip code won't be read as separate digits.
... pausing is an issue.
... HTML lets us mark up language and semantic emphasis; language has support, but emphasis, although a capability in HTML, is not widely supported.
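
For the zip code example above, a minimal sketch of how SSML `say-as` could request digit-by-digit reading. The helper name and the sample zip code are hypothetical, and the `digits` value is an assumption: the set of supported `interpret-as` values depends on the speech engine.

```python
from xml.sax.saxutils import escape, quoteattr


def say_as(text: str, interpret_as: str) -> str:
    """Wrap text in an SSML say-as element with the given interpretation hint."""
    return f"<say-as interpret-as={quoteattr(interpret_as)}>{escape(text)}</say-as>"


# "digits" is assumed to be supported by the target engine.
ssml = f"<speak>The ZIP code is {say_as('02139', 'digits')}.</speak>"
print(ssml)
```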

ARIA does not solve the problem; it could be used for substitution, but this would be a hack.

PLS helps with phonetic pronunciation.

CSS Speech did rate, pitch and volume, but not much else.

SSML does cover all of these potential gaps.

Makoto: a Japanese publishing company uses SSML, but it costs 4X more

MH: that's only for phonetic pronunciation; we could make it easier to mark up the language.
... say-as digits/numbers, emphasis, break, verbosity: we want to expose these to content creators.

<Makoto_> I am afraid that I have to go to a JLreq TF meeting.

Inline SSML within HTML has been a nonstarter; from talking with AT vendors, at this point they are not looking to support inline SSML

scribe: attribute model as in EPUB3, like data-ssml or just ssml, but these are only hacks.

<Makoto_> But let me ask whether the API between browser engines or EPUB reading systems and T2S engines.

Key points: content authors must be able to encode SSML into HTML.

<Makoto_> Text only? Or DOM tree? This issue was raised in the joint meeting of the CSS and I18N.

AT and other speech producers must be able to consume the SSML from the content.

TTS must consume the SSML and render the correct speech.

<Makoto_> BTW, DAISY people in Japan are very skeptical about the use of ruby for T2S.

Apple can map most of SSML to its native speech; it would be great to support this

whsieh: Apple's position on this has not changed

<whsieh> CharlesL: whsieh here from Apple (sorry!)

MH: in 2015 we worked with IMS on adding SSML to the QTI standard (an authoring profile that allowed authors to use SSML in test questions), but QTI gets translated to HTML and the SSML support is then lost.

<Roy> See the Pronunciation Overview at https://www.w3.org/WAI/pronunciation/

MH: the attribute approach, data-ssml, has some support
... simple JSON value pairs
... some vendors seem to think this is a doable option.
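
A rough sketch of how a speech consumer might turn such attributes into SSML, using only Python's standard library. The JSON shape assumed here (`{"ssml-element": {"attribute": "value"}}` inside `data-ssml`) is an assumption about the proposal, and the class and function names are hypothetical.

```python
import json
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr


class DataSsmlToSsml(HTMLParser):
    """Collect text and data-ssml hints from HTML and emit an SSML string."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []  # SSML fragments in document order
        self.stack = []  # (html tag, SSML elements opened for it)

    def handle_starttag(self, tag, attrs):
        opened = []
        raw = dict(attrs).get("data-ssml")
        if raw:
            # Assumed shape: {"say-as": {"interpret-as": "characters"}}
            for element, attributes in json.loads(raw).items():
                attr_text = "".join(
                    f" {name}={quoteattr(value)}" for name, value in attributes.items()
                )
                self.parts.append(f"<{element}{attr_text}>")
                opened.append(element)
        self.stack.append((tag, opened))

    def handle_endtag(self, tag):
        # Ignore stray end tags; otherwise unwind to the matching start tag,
        # which also discards void elements such as <br> left on the stack.
        if tag not in [name for name, _ in self.stack]:
            return
        while self.stack:
            name, opened = self.stack.pop()
            for element in reversed(opened):
                self.parts.append(f"</{element}>")
            if name == tag:
                break

    def handle_data(self, data):
        self.parts.append(escape(data))


def html_to_ssml(html: str) -> str:
    parser = DataSsmlToSsml()
    parser.feed(html)
    parser.close()
    return "<speak>" + "".join(parser.parts) + "</speak>"


sample = '<p>Angle <span data-ssml=\'{"say-as": {"interpret-as": "characters"}}\'>CAB</span> is 30 degrees.</p>'
print(html_to_ssml(sample))
```

The visible text is untouched, so a braille display would still show "CAB", while the speech path receives the `say-as` hint.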

Irfan: there is a wiki page for the example

<Irfan> https://github.com/w3c/pronunciation/wiki

MH: example: angle CAB is 30deg; instead of AT saying "cab", C A B should be interpreted as separate characters.
... no speech synth can do coordinates in math; there is a substitution method, where pm gets expanded to picometer, for example.
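
A minimal sketch of the substitution method using the SSML `sub` element, assuming the author supplies the expansions explicitly (since "pm" could also be a time of day, the choice has to be the author's). The function name and the sample sentence are hypothetical.

```python
import re
from xml.sax.saxutils import escape, quoteattr


def expand_terms(text: str, expansions: dict[str, str]) -> str:
    """Wrap each listed term in <sub alias="..."> so it is spoken as its expansion."""
    if not expansions:
        return "<speak>" + escape(text) + "</speak>"
    # Match only whole words so that, e.g., "pm" inside "rpm" is left alone.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, expansions)) + r")\b")
    pieces, last = [], 0
    for match in pattern.finditer(text):
        pieces.append(escape(text[last:match.start()]))
        token = match.group(1)
        pieces.append(f"<sub alias={quoteattr(expansions[token])}>{escape(token)}</sub>")
        last = match.end()
    pieces.append(escape(text[last:]))
    return "<speak>" + "".join(pieces) + "</speak>"


print(expand_terms("The bond length is 154 pm.", {"pm": "picometers"}))
```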

Judy: noun and verb are pronounced with different emphasis.

MH: we haven't seen that in practice.
... approaches: creating web components, inline SSML, multiple attributes
... a survey has been sent to speech consumers asking which options are acceptable.

Omar: our use case uses SSML for a chatbot that services customers, not for a11y/SR; we send the voice file

<Judy> [Judy: Wow! (Markku, comprehensive overview, thank you!]

Omar: we would have to stop doing that from the backend to support SR. The other issue is supporting other languages.

Janina: when we get to the normative part of the spec, we will need to specify language to ensure all TTS is already loaded.

Judy: isn't that a guideline in WCAG?

Janina: with inline you must declare the languages

aaron: we could automate a prescan.

Omar: but will that be a refresh of page?

aaron: should not be a problem, nor require a refresh.
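
A minimal sketch of what such a prescan might do, assuming languages are declared with the HTML `lang` attribute: collect every declared language before speaking, so the matching TTS voices can be loaded up front. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class LangPrescan(HTMLParser):
    """Record each distinct lang attribute value in document order."""

    def __init__(self):
        super().__init__()
        self.languages = []

    def handle_starttag(self, tag, attrs):
        lang = dict(attrs).get("lang")
        if lang and lang not in self.languages:
            self.languages.append(lang)


def languages_in(html: str) -> list[str]:
    scanner = LangPrescan()
    scanner.feed(html)
    scanner.close()
    return scanner.languages


page = '<html lang="en"><p>Hello, <span lang="fr">bonjour</span>, <span lang="ja">konnichiwa</span>.</p></html>'
print(languages_in(page))
```

Because the scan only reads the markup, it would not by itself require reloading the page.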

MH: WCAG 2.2 might help us; description of spoken content is a AAA requirement, and we would like to see it as AA.

Arabic: Arabic terms depend on the context of the sentence to add specific diacritics.

MH: ruby text

<Irfan> https://w3c.github.io/pronunciation/use-cases/#use-case-ruby

Judy: I found the overview helpful but information dense; I think a very high-level overview would be good.

<Judy> ack (long

Bobby: requirements for Japanese text layout: the ruby model has technical issues; markup places ruby above or on the right side
... issues with the ruby model in supporting the Japanese language: users can hear these annotations twice; should just skip the ruby base.
... potentially also an issue with pronunciation with ruby annotations in Chinese, traditional / simplified

MH: one challenge is getting all the stakeholders in the same room; we haven't had any Chinese companies take part in the TF. It would be great to get review from Apple and Google, and to get Apple's involvement. We welcome more input, more eyes looking at what we are doing.
... I am looking at Avneesh, representing the Publishing community.

Avneesh: Matt Garrish has already been assigned to review your specification.

Irfan: the FPWD has been published; we will add more, such as to the gap analysis, and add examples

use cases need more examples based on the feedback here. We have some timelines and are working towards meeting those.

MH: ETS has some testing tools to explore the different markup approaches; they tend to work across platforms, but for Mac there is extra JS to do the mappings.

Irfan: the Survey, we got some feedback but still waiting, so we extended the date to next week.

MH: we will send out further surveys as we get closer to some recommendations; we are reaching out to AT vendors and consumers: the Amazons, Googles, etc.

I was working on an Alexa skill that would take content from the web and if there was SSML contained then it would be spoken correctly.

MH: contenteditable is an important case

Can speech markup be done for input text? JS range case.

Irfan: HTML contenteditable, JS can manipulate this…

MH: a master's student could take on WYHIWYS (What You Hear Is What You See)
... costs: how do we make this easier, cheaper and easy to maintain
... thank you all for coming here today; ruby text, the cost of SSML and text entry input were all great topics to bring up.

Thanks everyone. great discussion.
