24 Oct 2018




Mark: Irfan and I are part of the accessibility group at ETS. We focus on how to make educational assessments such as Praxis, TOEFL, and GRE accessible.
... for students who are language learners, a lot of the test content is delivered through speech synthesis such as read-aloud or TTS
... increasing demand from agencies to pronounce things properly
... we heard from state agencies that they want a solution to deal with pro[CUT]
... it's a real problem and we took this to a standards body, the IMS consortium
... we were successful in the offering process, which is SSML
... the problem is that all of this test content eventually gets rendered in HTML, and that's where the AT hits
... we don't have any standard mechanism but have some hacky solutions
... every vendor in the education space, serving tens of millions of students, uses its own mechanism to serve the content. We need a solution through the ARIA pathway
... came up with the ARIA-ssml solution
... went through a couple of iterations. As of now we have created a task force.


nigel: we did some implementation including rate and pitch.
... the equivalent of voice does not exist

rob: when you say voice, do you mean tone or phonetics?
... do you mean male voice vs female voice, or English vs French?

nigel: not a particular specific voice, just voice

<foolip> foolip is Philip Jägenstedt

mark: the BBC takes their best presenter and generates the voice font.

nigel: we can open a board of this idea

foolip: firefox has some bundle of this

mark: i like the idea

rob: for male vs female voice, it's not just the color; a different language bundle requires different phonetics in the language model
... there is a different phonetic set for American English
... there are some sounds that simply do not exist

foolip: it's pretty funny... I tried

rob: it simply doesn't exist.

mark: some countries have robust speech engines.
... but this is a different discussion.
... so we see the consumers here. We have different classes: AT, and third parties that convert text to speech

<foolip> I think I was saying that there's probably no equivalent to OpenType for voices, i.e. no file format that you give to more than one voice engine.

mark: read aloud is broadly available. MS opened up with their Office projects
... Amazon Echo, Cortana, Apple home products: they can leverage SSML in their skills
... that SSML quality is not available on the web
... we see a need in the educational domain and other public domains where pronunciation is critical
... people might be improperly using ARIA in ways that may fix the problem, but there is no consistent approach
... when you have visually impaired users and you use ARIA to label content, that also shows up on the braille display. We don't want that.
... it impacts students, who can easily get confused by the braille display
... searching for an area that can bring SSML into HTML
... Apple did have some implementation of CSS3 Speech in iOS but it is no longer an active project
... AT would need to support standards but has nothing to follow
... the AT and TTS shouldn't make assumptions on presentation based on their own rules
... we have SSML 1.1, which is a W3C Recommendation
... there is another approach, PLS, which solves education-specific problems but doesn't solve all the problems
... we looked at SSML and context and identified some properties that need to be worked on, such as say-as, phoneme, sub, emphasis, break, prosody
... we have used it in some of the assessment content

say-as controls the speaking mode of the term, e.g. how you want a numeric value to be read.
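
As a rough illustration of the point about numeric values, the sketch below shows how a speech front end might expand the same string differently depending on a say-as hint. The function name `say_as` and the two hint values handled here ("digits" and "characters") are assumptions for illustration only; SSML 1.1 itself does not fix the set of interpret-as values, and real engines differ.

```python
# Hypothetical sketch: expand a term according to a say-as style hint.
DIGIT_NAMES = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def say_as(value: str, interpret_as: str) -> str:
    """Return the words a synthesizer might speak for `value`."""
    if interpret_as == "digits":
        # Read each ASCII digit on its own and skip anything else.
        return " ".join(DIGIT_NAMES[int(ch)] for ch in value if ch in "0123456789")
    if interpret_as == "characters":
        # Spell the term out character by character.
        return " ".join(value)
    # Unknown hint: leave the text for the engine's default handling.
    return value


print(say_as("2018", "digits"))     # two zero one eight
print(say_as("W3C", "characters"))  # W 3 C
```

Without a hint, an engine is free to read "2018" as a year, a cardinal number, or a digit string, which is the inconsistency described here.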

scribe: we have the use cases for all of these within the educational model

... we have been hitting our heads against the wall for a number of years looking for a solution

scribe: we have looked at it, and an attribute model seems to be a good path
... EPUB 3 brought in some SSML attributes via a namespace
... namespaced attributes don't go there
... data-* attributes work
... whose problem is that? content, AT, TTS engines

mark: Amazon Ivona, Polly and other voices support SSML
... we have looked at a subset of SSML functions that we can match
... in the model that we are looking at, we are using SSML properties
... we propose to define a new attribute, data-SSML

the attribute value is a JSON structure which can contain SSML function and property-value pairs to be applied to the content contained by the element


mark: AT can consume and generate TTS output strings based on the JSON data
... some examples
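
A minimal sketch of what such a consumer might do, assuming the attribute value maps each SSML function name to an object of property/value pairs, for instance `{"say-as": {"interpret-as": "characters"}}`. The function name `data_ssml_to_ssml` and the exact JSON shape are illustrative assumptions, not something recorded in these minutes. Empty elements such as break are not handled.

```python
import json
from xml.sax.saxutils import escape, quoteattr


def data_ssml_to_ssml(attr_value: str, text: str) -> str:
    """Wrap `text` in the SSML elements described by a data-SSML JSON value.

    Assumed shape: {"<ssml-function>": {"<property>": "<value>", ...}, ...}
    The first key in the JSON object becomes the outermost element.
    """
    functions = json.loads(attr_value)
    ssml = escape(text)  # escape &, < and > in the element's text content
    # Build from the inside out so the first key ends up outermost.
    for name, props in reversed(list(functions.items())):
        attrs = "".join(f" {key}={quoteattr(str(val))}" for key, val in props.items())
        ssml = f"<{name}{attrs}>{ssml}</{name}>"
    return ssml


# Hypothetical attribute value as it might appear on an HTML element.
print(data_ssml_to_ssml('{"say-as": {"interpret-as": "characters"}}', "W3C"))
# <say-as interpret-as="characters">W3C</say-as>
```

An AT or read-aloud tool that already speaks SSML could pass the resulting string to its TTS engine, while a consumer that does not understand the attribute would simply ignore it and read the element's text.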



nigel: at the BBC we never use TTS
... but I understand there are a lot of cases where we can use that

mark: we had a situation where one of the Amazon voices was being used for an elementary educational app. The comment was that the speech was too fast, but it was the default rate.
... it was based on an actual human who was the model for this speech voice, but it was fast.
... speech rate in words per minute is no longer a valid way to determine this
... a normal default rate is not a valid parameter

rob: for the advertisement use case, you might want a very quick voice, such as for an insurance policy. But for an educational voice, you may want something slow and clear.
... that might be more appropriate for different use cases

mark: these are interesting test cases

rafal: by JSON format, do you mean a separate file for the lexicon?

