W3C

- DRAFT -

Web&TV IG

23 Jul 2014

Agenda

See also: IRC log

Attendees

Present
Paul_Higgs, Kazuyuki, Bin_Hu, yosuke, ddavis, CyrilRa, kawada
Regrets
Chair
Yosuke
Scribe
ddavis

Contents

UC2-1 Audio Fingerprinting
UC2-2 Audio Watermarking
UC2-3 Identical Media Stream Synchronization
UC2-4 Related Media Stream Synchronization
UC2-5 Triggered Interactive Overlay
UC2-6 Clean Audio
Joint meeting with the Accessibility TF
Summary of Action Items

<yosuke> https://www.w3.org/2011/webtv/wiki/New_Ideas

yosuke: Let's look through the use cases.
... The reviewer of the first use case is me.
... This is a simple use case.

https://www.w3.org/2011/webtv/wiki/New_Ideas

UC2-1 Audio Fingerprinting

<kaz> https://www.w3.org/2011/webtv/wiki/New_Ideas#UC2-1_Audio_Fingerprinting

yosuke: I think there are three entities web developers need to specify.
... The first is audio source - mic, etc.
... The second is the fingerprint generation algorithm.
... The third is the fingerprint database, e.g. on the web.
... These three things are enough to declare a fingerprinting service.
... In addition, if we have a timeout or duration, we can have better control.
... So, this interface should be an asynchronous interface, e.g. JavaScript promises.
... Because it will take time to resolve the fingerprint from an online service.
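
A minimal sketch of what such a declaration could look like, assuming a hypothetical identifyAudio() function; the names, parameters and response shape below are illustrative only and not part of any existing specification:

    // Hypothetical API sketch only - identifyAudio() does not exist in any spec.
    // It bundles the three inputs discussed (audio source, fingerprint algorithm,
    // fingerprint database) plus a timeout, and returns a promise because the
    // lookup against an online service takes time.
    declare function identifyAudio(config: {
      source: MediaStream;   // audio source, e.g. a microphone stream
      algorithm?: string;    // fingerprint algorithm (often fixed per service)
      database: string;      // URL of the fingerprint lookup service
      timeoutMs?: number;    // give up if no match within this time
    }): Promise<{ contentId: string }>;

    async function demo() {
      const mic = await navigator.mediaDevices.getUserMedia({ audio: true });
      const result = await identifyAudio({
        source: mic,
        database: "https://fingerprints.example.com/lookup",
        timeoutMs: 5000,
      });
      console.log("Matched content:", result.contentId);
    }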

PaulHiggs: What do you mean by generation?

yosuke: You need to generate a fingerprint from the audio source.

PaulHiggs: What you're trying to recognise is in the audio source.
... and then process it. I don't think you're creating anything, rather returning an identifier for the audio source.

yosuke: In many cases, fingerprinting services use only one algorithm for their services.
... In that case, we don't need to specify which algorithm we need.

PaulHiggs: Are we confusing watermarking and fingerprinting?
... Watermarking would take an extra identifier encoded as an inaudible tone.

CyrilRa: On the backend you have to have a hash.
... Would you have the hashing done on the server-side or send a sample?

PaulHiggs: I thought fingerprinting was sending an audio sample.

CyrilRa: So then you'd send it to a recognition service.

PaulHiggs: You could do a local hash but that's not generation, that's hashing.
... If the second item said hashing that would be fine.

yosuke: So the front end gets the audio, the back-end service generates a fingerprint.
... In other services, the front-end gets a hash and sends that to the backend.
... I'll do some research about existing fingerprinting services.
... If it's just in audio clips then we don't need to clarify the generation in the use case.
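
For the variant where the front end only captures audio and the back end computes the fingerprint, a rough sketch using the standard getUserMedia and MediaRecorder APIs could look as follows; the recognition service URL and its response format are assumptions for illustration:

    // Sketch of the "send an audio sample to the back end" variant.
    // getUserMedia and MediaRecorder are standard web APIs; the /recognize
    // endpoint is hypothetical.
    async function captureAndRecognize(durationMs: number): Promise<unknown> {
      const mic = await navigator.mediaDevices.getUserMedia({ audio: true });
      const recorder = new MediaRecorder(mic);
      const chunks: Blob[] = [];
      recorder.ondataavailable = (e) => chunks.push(e.data);

      // Record a short clip, then stop.
      await new Promise<void>((resolve) => {
        recorder.onstop = () => resolve();
        recorder.start();
        setTimeout(() => recorder.stop(), durationMs);
      });

      // Hand the raw clip to the recognition service, which computes the
      // fingerprint (or hash) server-side and returns any match.
      const clip = new Blob(chunks, { type: recorder.mimeType });
      const response = await fetch("https://recognizer.example.com/recognize", {
        method: "POST",
        body: clip,
      });
      return response.json();
    }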

Bin_Hu: It seems we have two functions - one is a database service and one is audio clip matching.
... The algorithm to match the audio is in the implementation so I think it's a good starting point to do more research about what existing services provide.

kaz: I was wondering if we should think of a model like EME for this.
... EME has a model for the idea of the mechanism.
... Maybe we could use that as a starting point for the fingerprinting discussion.

<kaz> EME

yosuke: You mean we should create a diagram to understand the architecture?

kaz: Right

yosuke: OK, I'll create a diagram based on my understanding.
... I'll create that and Daniel can check it.

kaz: That's great.

UC2-2 Audio Watermarking

<kaz> https://www.w3.org/2011/webtv/wiki/New_Ideas#UC2-2_Audio_Watermarking

yosuke: Next use case is audio watermarking
... I think watermarking is much simpler than fingerprinting because we don't need a backend service to generate the watermark.

ddavis: Do you still need a backend?

PaulHiggs: No, the data is within the audio stream, as long as you know the algorithm.
... If someone wanted to, they could encode a link to another service.

CyrilRa: Fingerprinting helps you identify what audio was played, watermarking helps you take action.
... You have to have audio triggers that can be recognised by your client with watermarking.
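
As an illustration only, a client-side trigger could watch for a known inaudible carrier with the Web Audio API's AnalyserNode; real watermarking schemes are more sophisticated and, as noted below, not standardized:

    // Illustrative sketch: assumes the watermark is a single near-ultrasonic
    // carrier at a known frequency. The frequency and threshold are arbitrary.
    function watchForCarrier(media: HTMLMediaElement, carrierHz: number,
                             onDetected: () => void): void {
      const ctx = new AudioContext();
      const source = ctx.createMediaElementSource(media);
      const analyser = ctx.createAnalyser();
      analyser.fftSize = 2048;

      source.connect(analyser);
      analyser.connect(ctx.destination); // keep the audio audible

      const bins = new Uint8Array(analyser.frequencyBinCount);
      const binWidthHz = ctx.sampleRate / analyser.fftSize;
      const targetBin = Math.round(carrierHz / binWidthHz);

      const poll = () => {
        analyser.getByteFrequencyData(bins);
        if (bins[targetBin] > 200) {   // arbitrary detection threshold
          onDetected();
        }
        requestAnimationFrame(poll);
      };
      poll();
    }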

PaulHiggs: You can think of it like old Teletext scanlines that used to be in the signal.

Bin_Hu: For watermarking, the service provider has to encode something within the stream.
... Are standards bodies such as MPEG planning to create a standard for embedding this?

CyrilRa: There's no standard I can think of.

Bin_Hu: From a W3C perspective, are we planning to support such a format, or accept that it's out of scope?
... Lots of new codecs are coming out, e.g. within MPEG, so are we going to look into the method of multiplexing?
... Or will it be left to the implementation so that we won't be directly involved?

CyrilRa: That's one of the main issues with watermarking - you need to know what you're looking for first.
... There's some work being done on the fingerprinting side where you have a backend, and the frontend (player) is capturing samples constantly.
... You then match between these two at exactly the right time.
... That's a way of overcoming the burden of having something inaudible embedded, and also of having to know what to look for.

UC2-3 Identical Media Stream Synchronization

https://www.w3.org/2011/webtv/wiki/New_Ideas#UC2-3_Identical_Media_Stream_Synchronization

kaz: Originally this had the HTML task force and Web sockets.
... I also added SMIL by the Timed Text WG.
... Also SCXML is a new version of SMIL and these can be used for synchronization.

yosuke: As a next step, we need to clarify what the requirements are.

kaz: My understanding is delivering a single stream to multiple destinations at the same time.

yosuke: In many cases, the bandwidth or transport system is different so they have different buffers or time lag.
... The exact timing could be different, so we need to think about how to adjust the synchronisation between different devices.

kaz: You mean how to keep the multiple streams (with identical content) synchronised.

yosuke: Yes. Maybe a player on a "better" device would have to wait to achieve synchronization with other slower devices.

kaz: So maybe we should add that point.
... What if the system is using DASH?
... It's even more complicated but we should think about that as well.

yosuke: DASH can help with this use case.
... If DASH is used, the client will have better presentation timing.
... Probably we need a more generic API to synchronize.
... The use case is simple but the technology could be complicated.
... For example, I have a video element and my girlfriend has the same content separately. We'd like to match the timing to achieve synchronisation.
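
A minimal sketch of that idea, assuming a hypothetical sync service that both players poll for a shared reference position (the URL and response shape are illustrative only):

    // Align a local video element with a shared reference timeline.
    // The sync service and its JSON response are assumptions for illustration.
    async function alignWithReference(video: HTMLVideoElement): Promise<void> {
      const res = await fetch("https://sync.example.com/timeline"); // hypothetical
      const { positionSeconds } = await res.json();

      const drift = positionSeconds - video.currentTime;
      if (Math.abs(drift) > 1.0) {
        // Large drift: seek (the "better" device effectively waits here).
        video.currentTime = positionSeconds;
      } else if (Math.abs(drift) > 0.05) {
        // Small drift: nudge the playback rate instead of seeking.
        video.playbackRate = drift > 0 ? 1.05 : 0.95;
        setTimeout(() => { video.playbackRate = 1.0; }, Math.abs(drift) * 20000);
      }
    }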

kaz: Maybe we could use WebRTC.

PaulHiggs: I don't know if we need WebRTC. This is not sharing streams.
... I'm watching something and a friend is watching the same thing from the same source, not re-streaming it.

kaz: So probably without WebRTC.

CyrilRa: What you'd need is a sync service.

kaz: Yes, what we need is a very generic timeline mechanism.

yosuke: Could you make a note please on the wiki?

kaz: Will do.

yosuke: Next use case.

UC2-4 Related Media Stream Synchronization

https://www.w3.org/2011/webtv/wiki/New_Ideas#UC2-4_Related_Media_Stream_Synchronization

kaz: This is similar

yosuke: We can talk about this next time.

UC2-5 Triggered Interactive Overlay

<kaz> https://www.w3.org/2011/webtv/wiki/New_Ideas#UC2-5_Triggered_Interactive_Overlay

ddavis: What key events were you thinking of?

Bin_Hu: E.g. during the World Cup, if there's a goal, that would trigger an event.

yosuke: The basic way to deliver such metadata is using a text track.
... What you're talking about is additional information. If we implement that we could use HTML5 text track.
... Is that correct?

Bin_Hu: Text tracks may be a fundamental way, but a live event is not predictable - it may not be possible to add that in a text track.
... Maybe additional information could be pulled in out-of-band.
... Text track could be possible if not a live event.
... Or advertising is another situation.

kaz: Maybe the event can be sent to another channel. The destination channel is what the viewer is looking at.
... E.g. if we're watching Harry Potter the info could be in the text track for some event.
... There is a service in Japan like YouTube called NicoNico
... You can add lots of annotations to a video using timings.
... Those kind of annotations could be a trigger to these events.

Bin_Hu: Exactly.
... This would be encoded in-band.
... The platform implementation would be able to decode this and dispatch the events.

kaz: So the point of the use case is to send such events and show an overlay.

Bin_Hu: Events like "start overlay" and "dismiss overlay" should be supported.
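
A brief sketch of how an HTML5 metadata text track could carry such triggers out-of-band; the cue payload format used here (JSON with a text field) is purely hypothetical:

    // Metadata text track cues used as overlay triggers (illustrative only).
    function wireOverlayTriggers(video: HTMLVideoElement, overlay: HTMLElement): void {
      const track = video.addTextTrack("metadata", "triggers");
      track.mode = "hidden"; // cues still fire, but are not rendered

      // Out-of-band example: an event (e.g. a goal) between 120s and 150s.
      track.addCue(new VTTCue(120, 150, JSON.stringify({ text: "GOAL!" })));

      track.oncuechange = () => {
        const active = track.activeCues;
        if (active && active.length > 0) {
          const data = JSON.parse((active[0] as VTTCue).text);
          overlay.textContent = data.text ?? "";
          overlay.hidden = false;   // "start overlay"
        } else {
          overlay.hidden = true;    // "dismiss overlay"
        }
      };
    }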

yosuke: Next use case: Clean Audio

UC2-6 Clean Audio

https://www.w3.org/2011/webtv/wiki/New_Ideas#UC2-6_Clean_Audio

yosuke: I added a section called Initial Gap Analysis
... If clean audio tracks are provided through an HTML5 audio element, you can select them through existing interfaces.
... If they're provided in-band, you can use the in-progress in-band resource tracks specification.
... There's another feature: a therapist can adjust the acoustic features of audio tracks to assist a disabled user.
... You can achieve this using the Web Audio API.
... There are examples of audio equalizers already.
... So only one point remains - if you use encrypted media extensions for your media tracks it's extremely unlikely the audio could be modified.
... So I think we should ask the accessibility task force about this point. EME can decrease media accessibility.
... I thought I should check dependencies with existing web standards and we can basically achieve this use case with existing standards.
... From a practical viewpoint, clean audio is helpful for disabled people.
... An API to achieve this use case is not so helpful.
... Promoting the use case itself or encouraging media service providers is a key point to improve accessibility.
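
As a rough sketch of that adjustment, the Web Audio API can filter an (unencrypted) media element's audio; the filter frequencies and gain below are illustrative values only:

    // Minimal "clean audio" equalizer sketch: cut low-frequency background and
    // boost the speech band. Works only if the element's audio is accessible
    // to script (see the EME caveat above). Values are illustrative.
    function applyCleanAudioEq(media: HTMLMediaElement): void {
      const ctx = new AudioContext();
      const source = ctx.createMediaElementSource(media);

      const lowCut = ctx.createBiquadFilter();
      lowCut.type = "highpass";
      lowCut.frequency.value = 150;       // reduce low-frequency rumble

      const speechBoost = ctx.createBiquadFilter();
      speechBoost.type = "peaking";
      speechBoost.frequency.value = 2500; // typical speech-intelligibility band
      speechBoost.gain.value = 6;         // dB boost

      source.connect(lowCut);
      lowCut.connect(speechBoost);
      speechBoost.connect(ctx.destination);
    }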

ddavis: So it's more about awareness.

yosuke: The EME part is important, but apart from that we can achieve this use case with existing standards.
... We could make a note about how to do this which can help service providers.
... We can also ask the EME guys and accessibility task force about the potential drawback of using EME.

ddavis: Sounds like a good idea.

<kaz> Media Accessibility User Requirements

kaz: The current draft of the Media Accessibility User Requirements doesn't include encryption.
... We can talk about it with the media accessibility task force and HTML media task force.

yosuke: Kaz or Daniel, could you pass on this feedback?

kaz: Yes, next Monday is the next media accessibility call.

yosuke: I'll create a note about how to implement clean audio with existing web standards.
... After that I'd like to ask the accessibility task force to review it.

ddavis: I'm sure they'd be happy to do that.

Joint meeting with the Accessibility TF

yosuke: Any further questions or comments?

kaz: During the previous call I had a task to speak with the media accessibility task force about meeting during TPAC in October.
... They're also interested in a joint session.

yosuke: We could have a joint session during the TV IG meeting or we can join their meeting. Do you have any ideas?

kaz: My suggestion is to join their meeting.
... We already have the TV Control API CG joining our TV IG meeting.

yosuke: What's the next step?

kaz: If it's OK, let's ask whether we can join their meeting. I can suggest this at our next joint call.

yosuke: If we have an accessibility session, it would not be a long session.
... They could reach more people if they come to our meeting.
... We could give them a 10-20 minute session and people could learn from them.
... Then, if IG people are interested in accessibility, they could join their meeting.

kaz: We could have our meeting with them joining on Monday, and then we join them on Tuesday.

<kaz> TPAC schedule

yosuke: Any other business?
... Thank you - meeting is adjourned.

<yosuke> Thank you very much for scribing the meeting, Daniel.

You're welcome.

Thanks Kaz

Summary of Action Items

[End of minutes]

Minutes formatted by David Booth's scribe.perl version 1.138 (CVS log)
$Date: 2014-07-23 14:27:41 $
