See also: IRC log
<Hixie> http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2009-July/021125.html
<chaals> ScribeNick: chaals
<scribe> Scribe: Chaals
<dsinger> wondering where a few people are: silvia, judy, and a few others
JF: I am John Foliot. I want to show the
Captioning service from Stanford at some point.
... work on accessibility services at Stanford
JS: Janina Sajka, chair WAI Protocols and Formats. Want to present some thoughts about the need for a controls API
SC: Sally Cain, RNIB. Member of W3C PF group. Echoing what Janina said, want to talk about audio description
AT: Hiro from IBM lab Tokyo. We have been working on this stuff for a decade. Am a general chair for W4A. Want to introduce our audio description stuff.
MM: Matt May, Adobe. Want to talk about what we learned from accessibility in Flash, and the authoring tools context
MC: Michael Cooper, work as WAI staff. We're interested in W3C technology having accessibility baked in. I am interested in using existing stuff rather than new things where effective.
Frank: Microsoft, here to listen
KH: Ken Harrenstien, Google. Work on captioning systems - would like to show a bit of what we have done, and have tried to avoid standards work for ages (but failed now ;) )
Joakim: researcher at Ericsson, everyday job is
on indexing (photo tagging, media services, ...)
... co-chair of media annotation group at W3C
... want to talk about what we have done at Ericsson.
FS: Felix Sasaki, from University of Applied Sciences in Potsdam. Teaching metadata and in media annotation workshop. Not presenting in particular
Marisa: DAISY Consortium developer (we make
Digital Talking Book standards)
... here to learn
... can present a bit about DAISY and possibilities with HTML5
DB: Dick Bulterman, co-chair of SYMM (group at
W3C doing SMIL)
... I have been working on this for 14 years :( Researcher at CWI in
Amsterdam, interested in authoring systems for multimedia presentations.
... would like to talk about what SMIL has done, maybe demo a captioning
system we have for YouTube and some separate caption streams allowing 3rd-party
personalisation
IH: Ian Hickson, Google, HTML5 editor.
CL: Chris Lilley, W3C Hypertext CG cochair, CSS and SVG group, etc. Want to make sure that whatever we can do will be usable in SVG as well as HTML. Interested in i18n question - make sure you can have different languages.
PLH: Philippe le Hegaret, W3C. Responsible for
HTML, and video, within W3C. Late-arriving participant in timed text working
group. Hoping to get that work finished, have a demo of timed text with
HTML5.
... Didn't hear anyone wanting to present the current state in HTML5
JF: Also representing Pierre-Antoine Champin from Liris (France) who is working on this.
EC: Eric Carlson, Apple. Mostly responsible for engineering HTML5 media elements in WebKit
CMN: Chaals, Opera. In charge of standards, hope not to present anything but interested in i18n and use of accessibility methods across different technologies without reinventing wheels
DS: Dave Singer, Apple, head of multimedia standards. Interested in building up a framework that gets better accessibility over time.
SP: Silvia Pfeiffer, work part time for Mozilla, trying to figure out how to get accessibility into multimedia (including looking at karaoke and various other time-based systems). Have some demo stuff to show, but want to review the requirements we have...
GF: Geoff Freed, NCAM, want to talk about captions and audio description in HTML5
JB: Judy Brewer, head of WAI at W3C. Interested in how the options for accessible media affect the user, especially when there are multiple options.
DS: Please do not speak over each other, speak clearly and slowly so interpreters and scribe can follow.
DB: SMIL current experience
[we get: DAISY, Geoff, Google, Timed Text, Stanford Captioning, Silvia, Matt]
DS: go like this:
1. Geoff - about multimedia
2. John, Stanford
3. Ken, Google stuff
4. Marisa, Daisy
James Craig rolls in from Apple
5. Silvia, stuff she has done
6. Matt (Flash)
7. Dick - SMIL-based captioning
8. Philippe, Timed Text
2.5. Pierre's video
<dsinger> geoff: how do we do your presentation?
GF: Want to talk a little about my concerns.
... we also did some Javascript real-time captioning
... We have been playing around at NCAM with some captioning stuff that might
work with video as is in HTML5, using real-time captioning.
... we have been testing by stripping captions from a broadcast stream and
embedding them in a page using javascript. We have a way to change channels -
e.g. to have live captioning for an event like this
... or it could be used as a way to stream caption data over different
channels.
... not the ideal way, it involves some outside work
... would like to see something like a caption element rather than having to
inject stuff with JS because that is probably not the most efficient way.
... We have a demo, I will send a screenshot
JF: My role is assisting content providers to get
accessible content online. Video has been a problem for some time.
... Feedback from people with Video was that they found it expensive and
difficult to actually make it happen when they were just staff doing something
else, not video experts.
... We made a workflow that allows staff to get captioned video online more or
less by magic.
... We set up a user account, and they can upload a media file.
http://captiontool.stanford.edu/public/UploadFile.aspx
scribe: We have contracted with professional
transcription companies (auto-transcription was not accurate enough for us) to
do transcripts.
... $5/minute to get 24-hour turn-around, $1.50 / minute for 5-day turnaround
- $90 per hour of video.
... System allows us to use multiple transcription services. We have created
some custom dictionaries (e.g. people's names, technical terms we use, and so
on)
... Content owners can also add special terms if they use something odd, to
improve accuracy.
... If you already have a transcript you can upload that instead. (We then do
the remaining work for free)
JF: Upload file, and we generate multiple formats
- FLV, MP4, MP3 (which is sent to transcript company).
... email is sent to transcription company when we have the file, and one to
content producer so they can start putting their content online even if they
haven't yet received the captions.
... When transcription is done it is returned by the transcription company
into the web interface.
... Then we do some automatic timestamp generation to turn transcript into
various formats.
... User gets an email saying they have their captions, and we give them some
code to copy that incorporates the captions into the web.
... This is not quite shrink-wrapped and painless, but it is pretty close. You
still have to shift some files yourself from server to server, but we are
working on automating these steps too.
... We rolled out the system at the end of the summer, have some users now and
are talking to campus content creators to roll it out widely.
... Scalability of production is important. Stanford makes maybe 200 hours of
video / week, and a bunch of that has archival value. So we need to be able to
run it simply, and scalably.
SP: You mentioned a company that does timestamping
JF: Docsoft http://docsoft.com
SP: Open source?
JF: Nope. Shrinkwrap product based on Dragon.
Speech recognition not close enough for datamining, but good enough to automate
timestamping completely.
... We focus on the datamining aspect of this as much as the accessibility
JC: Have you experimented with the UI in the video player for this?
JF: Not at this point. Also talking to reelsurfer about this. We are looking at it.
SP: Linking directly - should mention that there is a W3C Media Fragments group, looking at standards to directly address into a time with a URI
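The temporal addressing SP mentions uses a URI fragment such as video.ogv#t=10,20. A minimal parser sketch (the function name is ours; only the plain-seconds form of the draft Media Fragments syntax is handled, not the npt:/SMPTE/clock forms):

```javascript
// Parse the temporal part of a Media Fragments URI, e.g. "video.ogv#t=10,20".
// Sketch only: handles "t=10", "t=10,20" and "t=,20" in plain seconds.
function parseTemporalFragment(uri) {
  const hash = uri.split('#')[1] || '';
  for (const pair of hash.split('&')) {
    const [key, value] = pair.split('=');
    if (key !== 't' || value === undefined) continue;
    const [start, end] = value.split(',');
    return {
      start: start ? parseFloat(start) : 0,                // omitted start means 0
      end: end !== undefined ? parseFloat(end) : Infinity  // omitted end means "to the end"
    };
  }
  return null; // no temporal fragment present
}
```

A player could feed the returned range into its seek logic when loading the URI.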
JF: We see people putting stuff onto YouTube - and that will use our caption materials if they are there. We are giving people the ability to do this stuff...
<scribe> ACTION: John to provide a link that shows us some of how the stanford system works and looks for users [recorded in http://www.w3.org/2009/11/01-media-minutes.html#action01]
<ChrisL2> Please let us have your sample code for embedding, so we can see what people are using today
Video is presented (slideset and commentary)
<ChrisL2> Wonder what "speaker diarization" is ?
SP: Tracking who is speaking when
http://www.youtube.com/watch?v=QRS8MkLhQmM
(Note that the video *relies* on the captions)
KH: Video is spoken and signed in different languages. Trying to clarify it isn't for a handful of poor deaf people, but for everyone.
(Luckily everyone here gets that)
PLH: Are the captions burned into the file or separate?
KH: Separate. I will explain that...
... A bit about numbers. Google has a very large number of videos, and
everything has to work at scale.
... Numbers are important; going back to what John was saying, we show there is
value for a lot of people. Now we have numbers like how many people are using
captions on YouTube.
... those numbers can drive the adoption of captioning.
... we offer video searching of caption data.
(demo of actually searching, digging into a particular term's occurrence in a video)
PLH: Is there anything in the usage numbers about how much people use video and whether captioning has an influence on it?
KH: Don't have numbers to give, but they are good
enough that I get more support in Google than I used to :)
... want to let people see the numbers for their own video.
[Shows translation of captioning]
PLH: Speech to text, or caption as a source?
KH: Taking the captions as a source.
SP: Has Google included the offset search for YouTube into the Google engine?
KH: Yep. That was what I just showed.
SP: Captions can be used to improve search results. Is Google using them for indexing videos?
KH: Yes
DB: If I post a video, and someone else wants to
caption later, how does that work? Can anybody caption anybody else's video?
... can I restrict who uses the content?
KH: There are several websites that allow this. You can do it with or without permission.
DB: So I effectively make a new copy of the base video?
KH: Sort of depends on how you do it. For YouTube you need to be the owner - or contact the owner.
JF: There are other 3rd-party tools that pull
video and transcripts from separate places.
... The value of captions gets attention when people see the translation of
captions
KH: Behind the scenes...
http://video.google.com/timedtext?v=aC_7NzXAJNI&lang=en
SP: You use a new XML format instead of SRT - for some particular reason?
KH: We control it. SRT isn't always so
sweet....
... we actually produce various formats (incl. srt among others). It's easy to
add new ones, so we don't care which format people have.
http://video.google.com/timedtext?v=aC_7NzXAJNI&lang=en&tlang=ru is a live-generated translation
[screenshot]
Caption text is stripped from the broadcast and injected in real time using Javascript.
various UI controls on top.
Transcript button can also give you a full log.
Break - 17 minutes.
<ChrisL2> ScribeNick: ChrisL2
<chaals> [Please do not use camera flash]
Marisa gives a presentation
John introduces a guest account for the Stanford facility to experiment with the transcription service
[Marisa introduces DAISY with a demo]
<jcraig> http://captiontool.stanford.edu/ user: W3C_HTML5 pass: Accessible!
DAISY uses SMIL to synchronise the audio with the html text
MD: DAISY 4, being defined now, adds forms and video.
Has an authoring and a distribution side. Can HTML5 be used for distribution?
... Native browser support for audio and video is good, but we see barriers in
HTML5, for example extensibility. How do we add sidebars? Needs
more roles than are built in.
... Want to avoid downgrading the content to represent it
JS: DAISY did a demo with two filmings of Henry V, scene-by-scene sync for comparative cinematography plus an avatar for lip readers. Still interest in multiple synced media levels?
MD: Will be able to answer that next week
JS: Not clear if it fits in HTML 5 or 5.1 or 6
MD: Interested to use SMIL and HTML5 together to
get that synchronisation
... also forms and annotations
... constrained by implementations
SP: Demos with three Firefox builds - standard,
with screenreader, and custom patched to add native accessibility support
... Need for signing captions for hard of hearing and deaf
... audio descriptions
... textual transcripts for screen reader or braille
... All of these create multiple content tracks. Multiple audio, text, and
video tracks for a composite media element
... need a standard JavaScript interface to control this
... and content negotiation reduces the download to just the items of interest
... Text tracks are special and much more valuable outside the multimedia
container. Easier to search and index than when buried in the media container.
... aids editability, crowdsourcing, CMS integration
... sharable
Proposes an itext element with @lang, @type, @charset, @src, @category and @display
<silvia> https://wiki.mozilla.org/Accessibility/HTML5_captions
sent to whatwg and html5 list, not much discussion there, but feedback directly on the wiki
[demo with video and selectable captions, subtitles]
SP: Issue is that all timed text tracks are
treated identically. So next proposal identifies some categories, not all
supported in the implementation
... but these are the categories in use based on 10 years of collecting
them
[demo with compiled firefox, rather than using scripting]
(prettier interface compared to the scripted one, same functionality; captions have a transparent background now like proper subtitles)
CL: Question on encodings vs display
PLH: Why is this an issue, Mozilla does this
SP: Uses HTML existing code for layout
PLH: default captioning language
SP: display="auto" selects automatic language
negotiation. Other options are none and force
... user can override author choice, but author should be able to express
their design too
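The display semantics described above can be sketched as a small selection function. This is our reading of the proposal, not code from it: the function and parameter names are ours, and the user-override behaviour is our assumption based on "user can override author choice".

```javascript
// Sketch of resolving the proposed itext display attribute for one track.
// Values from the proposal: "auto" (language negotiation), "none" (off by
// default), "force" (always on). An explicit user choice wins over the author.
function trackIsActive(track, userLangs, userOverride) {
  if (userOverride === 'on') return true;   // user preference dominates
  if (userOverride === 'off') return false;
  switch (track.display) {
    case 'force': return true;
    case 'none':  return false;
    case 'auto':  return userLangs.includes(track.lang); // language negotiation
    default:      return false;
  }
}
```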
... second proposal has a grouping element itextlist to express common
categories etc rather than repeating them
https://wiki.mozilla.org/Accessibility/HTML5_captions_v2
The DOM interface allows checking whether a given itext element is active
PLH: Are you generating events?
SP: Yes, onenter and onleave for new caption blocks or segments, so can listen for that
PLH: charset because srt does not indicate the charset?
SP: yes. Some formats don't provide this
... charset is optional, some formats self describing and don't need it
... no registered mime type for srt
[discussion on where SRT is defined]
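SRT has no registered MIME type and no single spec home, but the wire format is simple; a minimal parser sketch (assuming the common numbered-block layout with `HH:MM:SS,mmm` timestamps; real-world SRT is looser, with BOMs, stray blank lines and styling tags):

```javascript
// Convert "HH:MM:SS,mmm" to seconds.
function parseSrtTime(t) {
  const [h, m, rest] = t.split(':');
  const [s, ms] = rest.split(',');
  return (+h) * 3600 + (+m) * 60 + (+s) + (+ms) / 1000;
}

// Minimal SRT parser sketch: blank-line-separated cues, each with an index
// line, a "start --> end" timing line, then one or more text lines.
function parseSrt(text) {
  return text.trim().split(/\r?\n\r?\n/).map(block => {
    const lines = block.split(/\r?\n/);
    const [start, end] = lines[1].split(' --> ').map(parseSrtTime);
    return { start, end, text: lines.slice(2).join('\n') };
  });
}
```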
SP: currentText api shows the currently displayed
text, so a script can manipulate the text or display it
... also a currenttime interface
... works the same for external text tracks or for ones in the media
container
JC: So can do searching or access text that will be displayed later
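The currentText lookup and the onenter/onleave events discussed above can be sketched over a parsed cue list. Function names and data shapes here are ours, not the proposal's API:

```javascript
// Return the text of the cue active at a point in time, or null.
function currentText(cues, time) {
  const active = cues.find(c => time >= c.start && time < c.end);
  return active ? active.text : null;
}

// Fire hypothetical onenter/onleave callbacks when the active cue changes.
// The returned function would be called from a media timeupdate handler.
function watchCues(cues, callbacks) {
  let last = null;
  return function onTimeUpdate(time) {
    const now = currentText(cues, time);
    if (now !== last) {
      if (last !== null && callbacks.onleave) callbacks.onleave(last);
      if (now !== null && callbacks.onenter) callbacks.onenter(now);
      last = now;
    }
  };
}
```

Because the cue list is plain data, a script can also search cues that will be displayed later, as JC notes.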
[demo of v2 using scripting implementation, changing language on the fly]
[demo of firefox with a screen reader, firevox]
[demo had audio descriptions for vision impaired, and text-to-speech audio description. Uses ARIA.]
ARIA live region
SC: Will that use the default screenreader, or only the one in the browser?
SP: the default
CMN: How do you add audio?
SP: needs native support in the browser with an
interface so it's the same for internal and external sources ... audio and video
need to be synchronised
... dynamic composition on server is recommended way to do that
CMN: Issue with the third-party signing track; the third party has no access to the server where the video lives
SP: text is special, but audio could be treated
like that too. To be discussed
... needs a spec
CMN: Signed video is similarly special to subtitle text
JC: Can the audio pause the video for long descriptive text (for time to read it or have it spoken)
FO: If the text is inside the container and there is external text too what happens?
(we agree that the priority needs to be defined)
FO: User needs to decide, no one size fits all solution
SP: yes we need the flexibility there and have the api to make it workable
JS: Resource discovery
CMN: So need to think about an API that finds internal as well as external tracks and treats them uniformly
HT: Can itext elements be added or changed dynamically?
SP: yes
[SP demonstrates a video hosting site with Ogg audio, video, and subtitle/captioning/transcript support]
(all done in HTML5 and script)
http://oggify.com/ under development
SP: Like youtube except with open source and open formats and script plus HTML5
FS: Where does the timing info come from?
SP: From the subtitle file, they all have start and end times
JS: Nested structural navigation is important.
chapters, sections etc
... access to next scene / next act would be good
SP: Timed text tracks have DVD-style chapter
markers
... linear though, not hierarchical due to limitations of flat file
MD: Yes DAISY infers structure from heading levels
SP: Complex to bring in generic HTML and then
display it on another HTML stream ... security issues
... media fragments wg is specifying how to jump to named offsets as well as
time offsets
... not finished yet
JS: Direct access also for bookmarking as well as flipping through
SP: Chapter markers and structure exposes the structural content of the video, for speed reading among others. can do it with URIs so bookmarkable and can be in the history
[Matt talks about history of accessible captioning in Flash]
MM: Two minimal requirements: video support in
Flash since Flash 6, and later the ability to insert cue points to associate
captions with the video, in Flash 8
... several attempts to create captioning, but they were unsynced so
unsuccessful. Result was lack of adoption and a thousand hacks to try and do
it
... cue points got us closer, reliable feature
... starting with Flash 8, reliable caption sync but no standard way to do it.
Usually embedded in the container, so hard-coded fonts, done in ActionScript.
Content buried inside script
... Came to realisation that inside-only was a naive approach, looked for
alternatives. In flash 9 we supported timed text dfxp
... can associate the FLVPlaybackCaptioning component, which takes an external timed text
file for captions
... used an existing standard, tt was there
... not re-re-re-reinventing the wheel
... third parties can build captions and authoring software
... hopeful that other formats adopt DFXP as well
... Breaking out the captions, as Sylvia and Ken mentioned, is important for
any standard. Can embed, but that's just the first step. Only download the
required captions. Allows third parties to add their own content. Crowdsourcing
captions
... dealing with third parties adding captions later
... For html5, also important to have captions *inline* in the html document
itself. Not complex to add
CL: SMIL also found a need to have text inline as an option
DB: One option
SP: most of my demos are at http://www.annodex.net/~silvia/itext/
MM: Everyone is familiar with magpie
[we aren't]
MM: Shows this is well-tried territory, so HTML5 should be able to use existing authoring tools and workflows to do this
<jcraig> MAGpie is a captioning tool from NCAM
MM: So please consider existing solutions
<jcraig> http://ncam.wgbh.org/webaccess/magpie/
JC: Many content providers of video have no idea how captioning works, as other people do it
MM: Using captions in AJAX, inserting via the DOM in real time. Issues of scrolling.
[lunch break]
<silvia> Dick Bulterman (sp?) will talk about SMIL
<scribe> ScribeNick: sylvia
<dsinger> ScribeNick: silvia
chaals: I will scribe *for a bit* :-)
<chaals> Scribe: Silvia
"Supporting Accessible Content" - lessons from SMIL 1/2/3/
co-chair of SMIL working group
head of distributed & interactive systems group at CWI
interest for long time in working with multimedia
temporal & spatial synchronisation
take-home message: a11y isn't about a particular media format
it is about supporting selectivity among peer encodings
e.g. different encoding of same content for different situations, e.g. when driving/reading/conference
it is about a coordination mechanism to manage selection of media streams
difficulty is that they change over time
it is about providing 'situational accessibility' support
nobody wants special-purpose tools
<plh> --> http://www.w3.org/2009/Talks/1031-html5-video-plh/Overview.xhtml plh's slides
we should solve the problem for everybody
we need to make the temporal and spatial synchronisation explicit to be able to do the complex things
what is accessible content?
what kind of things would need to be done with a video object
it could be a svg object or another object, too
- want to add subtitles
- want to add captions and labels
labels are a cheap and simple way to communicate what is being visible in the video
"you are about to see my son play an instrument"
- line art & graphics
it would be nice to have a uniform model for all types of objects
- audio descriptions
- semantic discriminators
people want things at different levels of detail at different times
Some experiences from our SMIL work:
- not all encodings will be produced by the same party
e.g. even while Disney owns the video, they may not be the ones to create the captions
- not all content will be co-located
e.g. a video may be on one server, but content enrichments will be on many different servers
if you want highly synchronised audio and video, they practically have to be in the same file
the network delays can easily add up to make it impossible to synchronise them
but you can create some things on different servers
- there may be complex content dependencies
- each piece of content may not be aware of the complete presentation
*SMIL Support for Alternative Content*
SMIL 1.0 in 1996
<switch> : selection based on system test attributes
(language, bitrate, captions)
support alternative selection of parallel tracks
demo of MIT Prof. Walter Lewin
in SMIL:
<video src="MITguy" … />
<switch>
  <text src="A" systemLanguage="nl" systemCaptions="on"/>
  <text src="B" systemCaptions="on"/>
</switch>
<ChrisL2> ther eis an implicit PAR around that example so the video and the switch play in parallel
document order is only fallback - user preference dominates it
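The selection rule in the slide's example can be sketched as a function: the first switch child whose system test attributes all match the user's settings wins, with document order only breaking ties among matching children. The data shapes below are ours, not SMIL syntax:

```javascript
// Sketch of SMIL <switch> selection over systemLanguage/systemCaptions.
// A missing test attribute always passes; the first fully matching child wins.
function selectFromSwitch(children, settings) {
  return children.find(child =>
    (!child.systemLanguage || settings.languages.includes(child.systemLanguage)) &&
    (!child.systemCaptions || child.systemCaptions === (settings.captions ? 'on' : 'off'))
  ) || null;
}
```

With the example above, a Dutch-speaking user with captions on gets text A; a captions-on user with another language falls through to B.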
*SMIL 2.0 (2001)*
- custom test attributes
- added <excl> tag for pre-emptive inclusion of content
<excl> provides support for audio descriptions
event-based activation
demo of a video, which pauses on a scheduled pre-empt to wait for an audio description to be played
when that audio description is done, the video continues
*SMIL 3.0 (2008)*
- number of different extensions
- <smilText>: another timed text format - allows embedded hyperlinks, allows style sheets, allows motion, allows fine-grained text
streamable labels, captions, mW events
- smilState: allows coordination via data model
- timed, decentralized metadata
- media pan & zoom (temporal focus, e.g. Ken Burns effect, coupled with audio description)
demo of a web page with three buttons used to influence the presentation
* SmilText *
Why not simply reuse DFXP?
- it was not intended to be embedded in SMIL
- it isn't a streaming format
- it doesn't allow mix of absolute/relative/event timing
- it doesn't handle motion text
- layout + style processing are idiosyncratic
SMIL needed to support live TV streaming and supporting live captioning wasn't possible with DFXP
smilText was explicitly designed to map well with DFXP
there is still an easy mapping
- smilText is functionally compatible, with a direct mapping to DFXP
- smilText is also a direct replacement for RealText
*What will HTML5 need?*
* video object can't always determine timeline
- need external-to-video notion of temporal/spatial coordination
* Simple media control (start/stop/pause) are not rich enough
- impossible to enumerate all of the things that you may want to start/stop/pause in parallel
it might be a good idea to create a middleware that handles e.g. pausing across all involved elements
* need to support embedded and external companion content
*T/S coordination info: where?*
temporal/spatial coordination should go into a script? a web page header? a flash object? into SMIL?
a companion media object?
- very laborious for fine-grained timing
in script controlling directive activation?
- probably not, because it doesn't scale
In companion synchronization specification
- good for extensibility
- options:
— fully external
— timesheet
— internal
timesheets are like style sheets for synchronising timing
html+time is an example
there is no right answer - but it is important to have all the flexibility
*Editing Complex Presentations*
Demo: GRiNS (1996-2004)
authoring sw for smil presentations
- interactive navigation
- scalable presentations
demo: BBC did a 40min newscast with a structured view and direct access using SMIL
a structured view gives you all the possibilities of navigating the presentation in the way that you want it
*Adding Captions to 3rd Party Videos*
An Aside:
- helping the community to share comments on videos that other people own
demo: Ambulant Captioner
*Adding Captions & Labels (and Context)*
After selection, add comments
helps caption authoring
does predictive timing on your captions
it helps people produce timing more easily
*Putting Navigation into Captions*
captions are being used to provide navigation
temporal hyperlinking
intra-clip navigation
inter-clip navigation
*More about SMIL & Accessibility*
SMIL 3.0 Book:
find it on xmediaSmil.net
captioning tool: http://www.ambulantPlayer.org/smilTextWebApp/
together anywhere anytime project: http://www.ta2-project.eu
smilText: http://code.google.com/smiltext-javascript/
ugliest demo section :-)
JS SMIL Player: ambulantPlayer.org
timesheets: w3.org/TR/Timesheets
re-usable timing
that ends the presentation of DB on SMIL
questions?
joakim: is SMIL used?
MMS is based on SMIL
Quicktime media player
digital signage
quicktime on the desktop had a smil implementation
windows media player
windows media player uses it to pre-empt national security events
realplayer
joakim: playlist of product presentations?
supermarkets in France or Finland use SMIL for kiosks
interactive selectivity is one of the big things there
having the target of a hyperlink change over time
we see a lot of ways of deployment of SMIL, but it's been frustrating that it's not been used more
things move incredibly slowly
even with dozens of years of experience with interactive multimedia, we still only have a <video> tag in html
SMIL has the power that easy things can be done easily, but also difficult things in a simple way
finishes Dick's presentation
<plh> --> http://www.w3.org/2009/Talks/1031-html5-video-plh/Overview.xhtml plh's slides
next: Philippe on the state of timed text (DFXP)
shows a html5 video demo
demo: www.w3.org/2009/Talks/1031-html5-video-plh/Overview.xhtml#(2)
<ChrisL2> http://www.w3.org/2009/Talks/1031-html5-video-plh/
Timed Text
came out of the TV world
started in 2003
original idea was to have an authoring format for the web
as Adobe used it in Flash, it became increasingly a delivery format
demo: NCAM flash player with DFXP
<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:tts="http://www.w3.org/ns/ttml#styling">
  <body>
    <div>
      <p begin="0s" end="10s">
        This word must be
        <span tts:color='red'>red</span>
        <br /> and this one
        <span tts:color='green'>green</span>.
      </p>
    </div>
  </body>
</tt>
highly controversial use of namespaces
*online captioning*
# Parameters: frame rate, ... (SMPTE, SMIL)
# Styling: XSL FO 1.0, CSS 2
# Layout and region
# Timing model: SMIL
# Basic Animation: SMIL, SVG
# Metadata
example timed text document
it's possible to use it in a streaming context, but you have to be careful what additions you make to the file
test suites
http://www.w3.org/2008/12/dfxp-testsuite/web-framework/START.html
one demo is with HTML5 using javascript to synchronise with the <video> element
the list of which features are supported in which player is given at http://www.w3.org/2009/05/dfxp-results.html
Adobe and MS players are still prototypes
JW player's support is disastrous
WGBH support is also not that great
I don't know what the plan is to update those implementations
plh shows different web browsers supporting html5 and MS/Flash implementation in test interface
*Recent Progress*
finishing on the testing
published last call
dynamic flow still needs testing
we're waiting on the implementation from Samsung
trying to become W3C recommendation by Dec 2009
but we need the dynamic flow implementations first
SP: when did you last update the HTML5 DFXP implementation?
plh: I need to update it a bit, but there is cool stuff that can be done with DFXP
finishes plh demo
next up joakim
"Media Annotations Working Group - overview"
we started a year ago
I'm co-chair, Felix used to be staff contact
*purpose*
- facilitate metadata integration for media objects in the Web, such as video, audio and images
- means is to define an Ontology and API for metadata
we're part of the Web Video work in W3C
we're trying to re-use what exists
vision is to make it easy to use
the ontology relates the different metadata formats
the mapping between different formats is provided by the working group
some formats in sight:
XMP, Dublin Core, ID3, etc.
*Definition of metadata properties for multimedia objects*
example properties
* ma:contributor
* ma:language
* ma:compression
http://www.w3.org/TR/mediaont-10/
*Relating properties to existing formats*
gives an example where ma:contributor maps to media:credit@role in YouTube Data API protocol
ma:contributor maps to dc:creator in XMP
semantic mappings: exact, related to, more specific/more general
syntactic mappings e.g.
unicode string, also given
*API for Media Resources 1.0*
example for ma:contributor property
consists of id and role
API was published 2 weeks ago and is in first public working draft
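The mapping-table idea behind the ontology can be sketched as data plus a lookup. The two entries come from the examples given in the talk (ma:contributor to media:credit@role in the YouTube Data API, and to dc:creator in XMP); the table shape, function name, and relation labels attached to each entry are our assumptions, drawing on the exact / related / more-specific categories mentioned earlier:

```javascript
// Sketch of a mapping table from ma: properties to other metadata formats.
const mappings = {
  'ma:contributor': [
    { format: 'YouTube Data API', property: 'media:credit@role', relation: 'exact' },   // relation assumed
    { format: 'XMP',              property: 'dc:creator',        relation: 'related' }  // relation assumed
  ]
};

// Look up the counterpart of an ma: property in a given format, or null.
function mapProperty(maProperty, format) {
  const entry = (mappings[maProperty] || []).find(m => m.format === format);
  return entry ? entry.property : null;
}
```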
*Challenges*
- General
— reading is easy part, but how to (or if to) write "ma:" properties into media files
— getting verification of mappings (needs to be based on actual usage, not on the specifications)
- specific to accessibility
— how does the media annotations approach fit a11y needs?
— are there a11y related attributes that are missing?
*Resources*
provides links to home page, requirements, ontology, and API spec
Marisa: how would you use this?
DAISY is using the book ontology
we're interested in this ontology
Felix: it's easy to create the mapping table, but to get the feedback on mapping ontologies to each other is difficult
also, it's different what is being written into the spec and what is used in the wild
Doug: are they mis-using some of the bits in the ontology for other reasons?
Felix: no, just using subsets
ends joakim's presentation
Ian's presentation next
"Where are we in HTML5 with <video>?"
-html5 has an audio and a video element
- basically the same element
- defined as a single abstract concept of a media element
- the ui is basically up to the browser
- common codec is a challenge
http://www.w3.org/2009/Talks/1031-html5-video-plh/Overview.xhtml#%282%29
what we're seeing there is: the part above the controls is the video element
the part below the video is SVG - it could use HTML div/buttons etc
controls in this example are all scripted
when you hit the play button, it sends video.play() and then the video starts playing
there is scripted access to the loudness control
if you enable the video UA controls, they are also shown
browser UI and scripted UI are in sync, so if you silence the video through either, the other reacts
the API gives you further information
e.g. state of network
playback/buffering state
you can seek
it basically supports streaming content
browser is exposing a slowly moving window into the video
playback rate can be changed
you can make it loop
autoplay
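The scripted-controls pattern Ian describes can be sketched as small helpers written against anything with the media element API shape (play/pause, paused, volume, currentTime), so custom SVG or HTML buttons can drive it; the helper names are ours:

```javascript
// Toggle playback from a custom play/pause button.
function togglePlay(media) {
  if (media.paused) media.play(); else media.pause();
}

// Scripted access to loudness; the media element's volume is in [0, 1].
function setVolume(media, v) {
  media.volume = Math.min(1, Math.max(0, v));
}

// Relative seek; in a browser, currentTime is clamped to the seekable range.
function seekBy(media, seconds) {
  media.currentTime += seconds;
}
```

Because the browser keeps its own UI in sync with the API, these helpers and the native controls stay consistent with each other.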
the goal was not to do SMIL
if you needed the whole support of SMIL, you'd use SMIL
DB: we need to be careful about going down
this road
... similar things need to be called the same only if they are completely the
same
Ian:
2-3 ways that a11y is built into the API
* tracks that are built into the media resource
not currently exposed in the API
the browsers are expected to expose that if it's available
* javascript overlays
missing the cue ranges API now
* <source> element is not just used for different codecs, but also for different bitrate/quality videos
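The multi-source selection just described can be sketched as markup; the file names and the media condition are hypothetical, but the mechanism (the browser takes the first source it can play, using type and media to skip unsuitable codecs or bitrates) is the one the spec defines.

```html
<video controls>
  <!-- Browser picks the first source it can play. -->
  <source src="talk-hi.ogv" type="video/ogg" media="screen and (min-width: 800px)">
  <source src="talk-lo.ogv" type="video/ogg">
  <source src="talk.mp4"    type="video/mp4">
  Fallback content for user agents without video support.
</video>
```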
Next version is expected to have something similar to what silvia suggested <itext>
the main thing blocking this now is that the things already in the spec aren't even implemented
what's already in the spec needs to be implemented solidly before more spec is added
a test suite for this is still missing
* extensibility
HTML5 has successful extensibility mechanisms
ChrisLilly:
CL: <video> implementation being incomplete may also be because the spec is incomplete, so extending the spec would be better than waiting for full implementation
DB: you need to bring people in and you need a
level of functionality that is more attractive than just video playback
... you need to have a roadmap
Ian: what Silvia is proposing is pretty much what we need
DB: srt timing model is different to SMIL and
different from others
... if we come up with multiple timing models, that don't work together - in
particular with multitrack audio/video - it might be better to change the
timing model
... the impression I have is that we may need to look at <video> more
indepth and extend it more, instead of making it too small
... is missing the discussion on the timing model
Ian: my understanding is that it's using the timing model that people are expecting
DB: there are many practical issues for the
choice of the timing model - we think <video>'s timing model is too
restricted
... we may need a means to extend this better
Doug: from what I understand HTML5, no decision has been made on the choice of the timing model and SMIL still fits into it
Ian: the design of the element was based on the idea that any timed actions that need to be done would use SMIL
Doug: the SMIL stuff that is in SVG could be reused in HTML5 - and since browser vendors like to reuse things, it's inclined to reuse that
Michael: no support currently for resource discovery?
Ian: not yet, but it seems silvia has a plan
Marisa: is there any approach to HTML5 where people are creating similar things to a digital talking book with overlays and synchronised SMIL scripts etc?
Ian: if the goal is to specifically synchronise the playback of audio and video, then the approach should be SMIL
joakim: I think media discovery would be a perfect way to use media annotations, e.g. two different versions of a video
Michael: basically the a11y of the video file is
left to the video format
... there are characteristics that you need to know about, whether it has
captions etc, so given that we have the capacity for scripted controls, then
the selector should be in the API
Ian: silvia's itext has that as a proposal
Matt: how do we determine what belongs in the API and what does not?
There's a process for how to extend HTML5
<Hixie> my answer was "that's a judgement call, there's no general rule" :-)
<plh> --> http://www.w3.org/2009/10/W3C-AccessibilityPA.pdf Dick Bulterman slides
<fsasaki> scribe: fsasaki
now judy brewer presentation
judy: interested in quality of acc. support
... current support in HTML5 may be insufficient for captions etc.
... proliferation of different caption etc. formats is a problem
... silvia described high-level requirements for captions etc., liked that
... best practices are good, but need an overall approach of sets of
requirements
... and description of relations between existing standards / approaches
<plh> --> http://media.w3.org/2009/10/ACAV.ogv ACAV Project video
judy: html5 joint taskforce might be a place to
define what those requirements should be
... Geoff said that html5 caption solutions are insufficient
... captions should be in html5 like video in html5, that is its own
element
... deaf community features are not sufficiently represented
... srt is easy to write with text editor
... but that might mean choose fast gain over long term benefit
silvia: a clarification: itext element tries to
be format independent
... srt could be a good baseline format, not XML, but that can be
discussed
... allows for linking to any format that your browser supports
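To make the "SRT as a baseline format" idea concrete, here is a hedged sketch of how a script (or an itext-style implementation) might interpret SRT cue times. The function names are hypothetical and SRT handling is not part of HTML5 itself; the timestamp syntax (hours:minutes:seconds,milliseconds) is standard SRT.

```javascript
// Parse an SRT timestamp like "00:01:02,500" into seconds (62.5).
function parseSrtTime(s) {
  const m = /^(\d+):(\d+):(\d+),(\d+)$/.exec(s.trim());
  if (!m) throw new Error("bad SRT timestamp: " + s);
  return (+m[1]) * 3600 + (+m[2]) * 60 + (+m[3]) + (+m[4]) / 1000;
}

// Return the cue (if any) that should be shown at playback time t.
function activeCue(cues, t) {
  return cues.find(c => t >= c.start && t < c.end) || null;
}
```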
judy: Geoff said audio and video should be on the
same level, e.g. audiodesc and videodesc element
... time is perfect for html5 to have good solution with caption-specific
elements
... externally referencing captions is another need, formulated by Geoff
dick: understand why geoff said that external captions might be good
<ChrisL2> http://www.evertz.com/resources/eia_608_708_cc.pdf
<plh> --> http://www.w3.org/2009/10/MarisaDeMeglio.DAISY.pdf DAISY slides
dick: but would be a shame if we had two mechanisms (internal and external) to handle the same thing
<plh> --> http://www.w3.org/2009/10/html5 access mawg-20091101.ppt Joakim's slides
now presentation by dave singer
dave: good acc. needs three things: good specs,
uptake by authors and users and user agents
... it is easy on one of the three
... we can do better than TV, not replicating what is there in the non-web
world
... timed acc. problems. e.g. captions in audio, video contrast
... some people need high, some people low contrast
... general time management: acc. by flipping information with the media
... rate preferences slower than normal rate sometimes necessary for acc.
... question of untimed acc. : having a transcript
... also untimed: longdesc, fallback not only for non-support, but also for
support of e.g. video, but not for a specific user
... a question: inside or outside the video container?
dave: inside: media container can have overlay
time tracks.
... no mechanism for outside avail. yet, so synchronization with inside is
easier to achieve
... meeting users needs: select the resource that the user needs
... choices: by preference, or by action, or both?
hypothesis: out of scope is a user preference repository
<ChrisL2> [Judy explains seizure disorders]
dave describes possible choice approaches for user needs, see slide 8 of presentation
<ChrisL2> Scribe: Chris
dave: it matters who renders captions engine or somewhere else
silvia: could it all be done by web engine?
ian: depends on media framework used
... we can expose API, but depends on media framework if it is possible or
not
dave: scripted accessibility
Janina: you might prefer additional content depending on the specific part of the media
dave: in HTML5, the approach is to link to SMIL
for synchronization
... in future we want to have a video in different areas of a page
dick: several ways of doing things
... be careful of controling things: scripted control vs. declarative
control
... "scripting" is not always the term you want to use
chaals: worth seeing in html5 how to get there
James: flash already supports some of these things
dave: about sign language, there was just one code
silvia, felix: has been solved in latest version of bcp47
dave: summary: what do we need what is not in
HTML5, what should be in a best practices document?
... cue ranges are very important
... describing user preferences, probably informative in HTML5 view, since in
other specs
... and script access to control features of the media
... and CSS media queries
silvia: about cue ranges
... everybody means something else talking about it
... making the requirements for it clear would help
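One way to pin down what "cue ranges" could mean is the enter/exit semantics people usually intend: a watcher that reports when the playhead enters or leaves a time range. This is an illustrative sketch only; makeCueWatcher is a hypothetical name and the cue-ranges API that was dropped from the spec looked different.

```javascript
// Hypothetical sketch of cue-range semantics: given time ranges, report
// "enter" and "exit" events as the playhead moves (e.g. on timeupdate).
function makeCueWatcher(ranges) {
  const inside = new Set();  // ranges the playhead is currently within
  return function onTimeUpdate(t) {
    const events = [];
    for (const r of ranges) {
      const hit = t >= r.start && t < r.end;
      if (hit && !inside.has(r)) { inside.add(r); events.push({ type: "enter", range: r }); }
      if (!hit && inside.has(r)) { inside.delete(r); events.push({ type: "exit", range: r }); }
    }
    return events;
  };
}
```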
dave: look not only in captioning, but at the big picture
james: interest in universal design for all
users
... including technical needs, e.g. related to band width
... looked into content selection without knowing preferences
dave: web developers asked for "how do we find
out if a user uses a screen reader"
... currently, SWF / Flash is the only means to achieve that
... that has some security implications
... in CSS media query you download everything, different than content
selection before download
... potential that user wants to share preferences with certain web servers
... if there are methods in video element like getCurrentCaptions, that would
have security implications as well
... certain security restrictions could be more lax for certain users
shepazu: "cross origin resource sharing" is the
way we are going to do this
... a question: what do people think about privacy concerns?
james: thought of some preferences, e.g. color-blindness, which you might not want to convey via your browser
dave: very important aspect
matt: heuristics can be used to determine whether a user uses a screen reader
<plh> --> http://www.w3.org/2009/10/dws-access-workshop.ppt Dave' slides
matt: even Flash is no guarantee that information is not conveyed
dave: platform integration is a problem, e.g. if
the screenreader does not find a play-button
... scripting does not integrate well with platform specific heuristics to
find things
janina: important that API can access control
information
... what is the default list of controls being exposed
... if it is left to developer to come up with controls, there will be a small
set of acc. controls
doug: I am editor of DOM 3 events
... hardware controls are a way of bypassing the problem
james: having video controls standardized methods would help
<silvia> http://www.marcozehe.de/2009/06/11/exposure-of-audio-and-video-elements-to-assistive-technologies/
james: system could have its own key etc. usages
silvia: there is acc. of controls by shortcuts
implemented by browser vendors
... important for all users, not only acc.
... through javascript interfaces there is also access to buttons
... that kind of control is on the roadmap for firefox
@@: for webkit as well
janina: have browser folks looked into structural
navigation?
... e.g. subscene to subscene, ...
... could be in a 3-hours physics presentation very important
silvia: the @@@@ website works in this, that is
navigation markers
... not hierarchical yet, nobody has looked into that yet
janina: DAISY did that to some extent
silvia: if we have time-aligned text tracks, that
can go back into video
... both search and a structural overview are important scenarios
dick: producer and consumer aspect are important, latter e.g. user annotations
dave: likely that we will get automated scene detection etc., not manual annotation
janina: there are also use cases for manual creation
john: needs to be a method to do the automatic way too
dick: in SMILtext, people used links in captions
<chaals> [+1 for IPTV and other TV-based standards being important liaisons to keep in mind]
sally: doing audio description, but also looking
into IPTV
... audio description often underrepresented, but is very important
... our broadcasting team says, audio description is mixed in content, or
separate
... think that separation is better, or at least awareness that sometimes it
is mixed
... using existing guidance from here is good, but how do we integrate work in
ISO or ETSI?
... routes of delivery are various, e.g. eTest, eAssignments etc. These also need
to be taken into account
takagi-san: work is part of a Japanese government
program
... only 0.5% of TV programs have audio descriptions
... huge expectation to provide more audio description also on the web
... cost is most important in audio description
... requires special expertise, human skilled narrator
... text-to-speech has become very good today
<chaals> [impressive demo of emotionally inflected Text to Speech]
takagi-san: project is NICT (Media Accessibility platform)
Takagi-san shows video describing the benefits of Media Access. platform
dave: all Text-to-speech is done automatically?
Takagi-san: yes, we just have language tagging
... we use also a kind of ruby
... we also plan to use emotional markup
... choices in audio description are: human prerecorded voice, prerecorded
tts, server-side tts, client-side tts
... not sure yet what is most appropriate
<ChrisL2> http://www.w3.org/TR/emotionml/ Emotion Markup Language (EmotionML) 1.0
Takagi-san: content protection is another requirement
john: how do you do the synchronization?
Takagi-san: used time-seeking functions from Windows Media Player
silvia: you have high-quality speech synthesis
... could be a web-service: user sends a text description, turns that into
audio, potentially on a different server
dick: source from TTS: where does it come from?
Takagi-san: from a script
dave: somebody typed in a scene description
dick: there are legal issues about changing the content flow e.g. for Disney
Takagi-san: yes, discussing always with content
providers like Disney
... legal issue is really important
... that is why we work with Japanese government
janina: this is a problem to be solved on government level, not industry
short break, short discussion after that
dave: need a paper or something describing how to get good acc. into html5
michael: made some notes related to that, and
about existing solutions
... many people said that there are non-acc. use cases as well
<chaals> Scribe: Chaals
MC: Need to gather use cases and requirements
... gather the non-accessibility use cases
... and the existing solutions.
... There are proposals about technology - which are proposals, which are
solutions
... How do we make sure that the technology makes it possible to meet WCAG
requirements (not limited to them, but at least getting to that level)
[MC is staff contact for WCAG]
DS: Non-accessibility uses: I am a bit hard of hearing, and there will be stuff I cannot catch. I rewind, mute, and watch the captions. Would like to be able to call up the captions for the last section in parallel. Is that a non-accessibility use case?
JS: Another one is the ability to slow audio as a way to increase comprehensibility
JB: Don't think that is a non-accessibility use case.
JC: The semantics of "accessibility" is a bit in
flux.
... preference for low-bandwidth - is that accessibility?
<silvia> https://wiki.mozilla.org/Accessibility/Video_a11y_requirements
CMN: Question - where to do this? PF? HTML Accessibility TF? somewhere else? I suggest HTML accessibility task force
SP: Have written a requirements doc for Mozilla that would be a good basis.
[/me has a brief skim and thinks it is a very good basis to steal things from]
Doug: Should we start from something like this?
MC: We need a "how to do accessibility" and then gap analysis of HTML and what it does now (and what it needs)
DB: We can comment on a document that does this from the SYMM group.
<dsinger> http://www.w3.org/WAI/PF/html-task-force
JS: Makes a lot of sense that we take the work in the HTML Accessibility Task Force.
MC: Concerned that the HTML Accessibility task
force has a heavy load.
... concerned that the current make-up of the task force lacks video
expertise
JS: Maybe a group within that group would take the work.
Doug: People copy code. Maybe we should also look at the things that people can already do in HTML 5 and show how to do what is already possible.
SP: There is a fear that at the next level of HTML5 there is no way we can put more into it. How serious is the prospect of a complete freeze?
<shepazu> creating tutorials also helps discover gaps in functionality
IH: The consideration is not serious - the spec will continue to evolve. The rate-limiter is how fast the browsers implement. It might be that we make a feature-freeze and the work goes into HTML 5+some ... right now the video implementations are still flaky, and interoperability is still poor. By the time that is solved, I see no reason not to add better stuff.
CL: Heard you say that you have code on a private branch. Is there a chicken-and-egg problem?
SP: The trial implementation is available to influence the spec already.
[Process discussion about our process]
<plh> --> http://www.w3.org/2009/10/VideoA11yWorkshop.txt Silvia's slides
JC: Even if we are going with e.g. itext, should we consider requirements on media formats (e.g. the need for dealing with some external captions and some internal captions)?
[several]: Yes
CMN: Is there anyone prepared to take on the editing of a requirements and use cases document?
JB: Think we should follow the idea of working in the HTML access task force
DaveS: Think we should try to make it happen there.
MC: This means we need people to join the task force, which means being a member of HTML-WG
SP: I've done requirements work and implementation for itext and tried to validate it. So far still seems good, but I am happy that this goes to others looking and figuring out if it is what we need, or what we need to change.
<dsinger> ACTION: everyone to look at joining the HTML accessibility TF [recorded in http://www.w3.org/2009/11/01-media-minutes.html#action02]
<janina> HTML A11y TF is at:
<janina> http://www.w3.org/WAI/PF/html-task-force
<dsinger> ACTION: everyone to review Silvia's requirements document [recorded in http://www.w3.org/2009/11/01-media-minutes.html#action03]
SP: think itext seems to be going down the right track, but will be making sure that I am not just going off on my own and ignoring what others want and do.
JB: Right. I am concerned about fragmentation - but that is what we keep an eye on in the process of going forward.
DB: Sounds like so far you are happy with what
you have - and agree that it's nice. The part of the process that brings things
to working groups for comment I want to be sure that there is a way to comment
and say changes are needed, if that should be the case.
... also worried about what we do with SRT etc, who owns the problem, how it
fits in with re-using the technology, etc.
SP: Sure. This is happening in the HTML WG.
<Zakim> MichaelC, you wanted to say if we want video to be a focused group within HTML accessibility task force, need all the interested people to join (call for participation should go
KH: I'm new to the process, so...
... It sounded like Ian wanted something implemented before it could be
accepted as part of the spec.
... So the process is "have an idea, implement, get others to implement". But
if we implemented it, do we not have to get anyone else to do so?
IH: Stuff in the spec has to be implemented better before we get to adding more stuff. It definitely helps to have experimental implementations to decide what should be the standards. In practice, they get shipped, people start relying on it, and then we are stuck with not breaking that. Which isn't ideal, but the reality is somewhere in the middle
DSing: We don't want the specs or implementations to get too far ahead of each other.
Doug: If a few pages do something weird, we can insist they change...
IH: Depends on who/what they are.
... poster child was canvas. We found some serious problems, and we had to
figure out how to change it. We changed some of it, but then we found more that
would cause huge problems to fix it further, and we were kind of stuck with
this.
PLH: I don't believe HTML5 can move forward
without addressing video accessibility. We have to figure that out. Best we can
do is review proposals and provide feedback, not just follow the first
implementation and bless it without thinking.
... But it will not be acceptable to simply move forward without getting it
right.
DaveS: John and I are volunteering to take responsibility for video accessibility within the HTML Accessibility task force at least for now - chase people into it, get documents together, etc.
<scribe> ACTION: Silvia to put links to existing content into the task force wiki [recorded in http://www.w3.org/2009/11/01-media-minutes.html#action04]
<scribe> ACTION: DaveS and JohnF to take responsibility to drive this work into existence in the HTML Accessibility Task Force [recorded in http://www.w3.org/2009/11/01-media-minutes.html#action05]
THANKS to John, and to Dave, for making it happen.
ADJOURNED
Present: Judy_Brewer, Dick_Bulterman, Sally_Cain, Eric_Carlson, James_Craig, Michael_Cooper, Marisa_DeMeglio, John_Foliot, Geoff_Freed, Ken_Harrenstien, Philippe_le_Hégaret, Ian_Hickson, Chris_Lilley, Charles_McCathieNevile, Matt_May, Frank_Olivier, Silvia_Pfeiffer, Janina_Sajka, Felix_Sasaki, David_Singer, Joakim_Söderberg, Hironobu_Takagi, Victor_Tsaran, Doug_Schepers
Scribes: Chaals, Silvia, fsasaki, Chris
Date: 01 Nov 2009
Minutes: http://www.w3.org/2009/11/01-media-minutes.html