Meeting minutes
Time alignment for media synchronization will be discussed in this session
Komatsu: Media over QUIC, no head of line blocking
… In video cases, each frame is transferred over an independent QUIC stream
… That's the main difference compared to HLS or DASH
Media over QUIC is a relay protocol over QUIC or HTTP/3, often featured for live streaming, although the use cases are not limited to it.
Komatsu: CMAF is fragmented MP4. Latency ranges from 50 seconds down to 2 seconds, depending on how you use it
… With MoQ, per-frame transfer is used, so that each QUIC stream contains under 34 milliseconds duration of data (sketched below)
… For realising low-latency live streaming services, this duration is important
… You can get very low latency services over MoQ
… HLS and DASH have flexibility; similarly, MoQ supports both live and on-demand services
… Synchronising A/V data and arbitrary data is interesting
… Here's a demo to show low latency and sync
MoQT is flexible enough that developers could handle synchronization between multiple types of data such as audio and video
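A minimal sketch of the per-frame transfer described above, assuming a hypothetical moqtTrack.sendObject() publishing API on an established MoQT session; the WebCodecs VideoEncoder API is real:

```typescript
// Hypothetical MoQT track handle; the actual publishing API depends on the library used.
declare const moqtTrack: {
  sendObject(payload: Uint8Array, opts: { captureTimestamp: number }): void;
};

// WebCodecs VideoEncoder emits one EncodedVideoChunk per frame.
const encoder = new VideoEncoder({
  output: (chunk: EncodedVideoChunk) => {
    const payload = new Uint8Array(chunk.byteLength);
    chunk.copyTo(payload);
    // Each encoded frame (~33 ms at 30 fps) is sent as its own MoQT object on its
    // own QUIC stream, so one delayed frame does not block the frames behind it.
    moqtTrack.sendObject(payload, { captureTimestamp: chunk.timestamp });
  },
  error: (e) => console.error(e),
});
encoder.configure({ codec: "vp8", width: 1280, height: 720 });
```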
Komatsu: The sender sends the video, audio, and auxiliary data. Then it's transferred to a relay server
… We have this in the cloud
… We use moxygen, developed by Meta
… [Shows demo with sender and receiver]
… There's very low latency
… Glass to glass delay is under about 100ms
… Now I'll demo data synchronisation
… [Demo shows real time face detection]
… We can also send MIDI data
… With this data we can provide live services
MoQT Synchronization Demo: face landmarks/avatar data and video synchronization is performed
Komatsu: [Demo shows a virtual avatar overlaid in the video image]
… Just the data is transferred, and the avatar is rendered on the receiver side
… I think this is a fantastic feature of MoQ
… Now I'll explain about the synchronisation. The diagram shows the sender side
Sending only the point cloud data enables developers to render the 3D avatar on the subscriber side with their own preferences
Komatsu: Video image is transferred to WASM
… MIDI data will be transmitted to Web MIDI
Audio, video, and other data are multiplexed into a single MoQT session using multiple tracks on the sender side
Komatsu: In the MoQ context, we can get capture timestamps
… MOQT is the MoQ Transport protocol
… We can send each data in a track: audio, video, data
… Send over MOQT to the relay server
… On the receiver side, the browser receives the MOQT
… Get raw image data from each frame with the capture timestamp, and the MIDI data, and synchronise rendering
… MIDI can be used with synthesizers etc
… In live entertainment cases, you can show a live show on the screen, and with MIDI data, the piano sound can be enjoyed by the viewers
On the receiver side AV and other data are synchronized according to the capture timestamp
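A minimal sketch of that receiver-side alignment, assuming the MoQT library delivers decoded video frames and data objects along with their capture timestamps; the slot buffering and the drawAvatar helper are hypothetical:

```typescript
// Per-timeslot buffer, keyed by the sender's capture timestamp.
type Slot = { frame?: VideoFrame; landmarks?: number[] };
const slots = new Map<number, Slot>();

function store(ts: number, patch: Partial<Slot>) {
  slots.set(ts, { ...(slots.get(ts) ?? {}), ...patch });
}

// Called for each timeslot: render whatever arrived for that capture timestamp together.
function renderSlot(ts: number, ctx: CanvasRenderingContext2D) {
  const slot = slots.get(ts);
  if (!slot?.frame) return;                 // hold the data until its video frame arrives
  ctx.drawImage(slot.frame, 0, 0);
  if (slot.landmarks) drawAvatar(ctx, slot.landmarks);
  slot.frame.close();
  slots.delete(ts);
}

declare function drawAvatar(ctx: CanvasRenderingContext2D, lm: number[]): void; // hypothetical renderer
```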
Komatsu: Rendering not only to screen, but orchestrated to external devices
… How do I synchronise the data?
… On the sender side, the browser uses rAF, which clocks at 16ms (60 fps screen update)
… Get the video image data, and data with the same timing. On the receiver side, get from each timeslot and render
Synchronizing face landmarks and AV is relatively easy, as the landmarks are captured at the same timing as the requestAnimationFrame callback
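A minimal sketch of that sender-side loop, assuming hypothetical publishVideo()/publishData() helpers on the MoQT session and a detectLandmarks() function (e.g. WASM face detection); requestAnimationFrame and VideoFrame are real APIs:

```typescript
declare function detectLandmarks(v: HTMLVideoElement): number[];      // hypothetical
declare const session: {
  publishVideo(frame: VideoFrame, captureTs: number): void;           // hypothetical
  publishData(payload: unknown, captureTs: number): void;             // hypothetical
};

const video = document.querySelector("video")!;

function onFrame(now: DOMHighResTimeStamp) {
  const captureTs = now;                               // one shared timestamp per ~16 ms slot
  const frame = new VideoFrame(video, { timestamp: Math.round(captureTs * 1000) });
  const landmarks = detectLandmarks(video);
  session.publishVideo(frame, captureTs);              // video and landmarks carry the same
  session.publishData(landmarks, captureTs);           // capture timestamp for receiver-side sync
  requestAnimationFrame(onFrame);
}
requestAnimationFrame(onFrame);
```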
Komatsu: With external MIDI devices the data is asynchronous. Inside the 16 ms slot, a MIDI event is fired in the browser
… Playing to external MIDI devices on the receiver side, the video clock isn't enough, because there's a time interval
… Web MIDI has a send() method, where you can indicate a time interval. MIDI works well on the receiver side
… Concern is the time lag on the input side
… In this model, once MIDI data is transferred to the browser, it goes to the event buffer, then an event is emitted
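A minimal sketch of the receiver-side MIDI playout just described; Web MIDI's send() with a DOMHighResTimeStamp is a real API, while captureToLocal(), the mapping from the sender's capture timestamp to a local playout time, is an assumed helper:

```typescript
// Hypothetical: converts the sender's capture timestamp into a local playout time,
// adding a fixed offset so the events keep their original spacing.
declare function captureToLocal(captureTs: number): DOMHighResTimeStamp;

const access = await navigator.requestMIDIAccess();
const output = [...access.outputs.values()][0];

function playMidiEvent(bytes: number[], captureTs: number) {
  // The optional timestamp asks the browser to hold the message until that
  // instant, instead of sending it whenever this call happens to run.
  output.send(bytes, captureToLocal(captureTs));
}
```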
Concern about time lag of MIDIInput: is there a time lag between the device driver and the event emit? Do melody and rhythm change because of that?
Komatsu: With JavaScript we can get the capture time at the time of event emission, not the time of input
… Example of 120 bpm music, each note could be 62.5 ms apart
… If the time lag is 3ms, it makes a 6% fluctuation
… There are other use cases beyond entertainment: remote gaming, with the time lag of the Gamepad API
… Remote robot control over WebUSB. Is there a time interval argument to transferOut data, similar to WebMIDI?
There are other cases with the same kind of problems such as remote gaming, remote robot control and remote drawing
Komatsu: Similar problem in Smart City
Kaz: This would be useful for Smart City, as there are many components, devices, sensors to be connected with each other depending on the user need. So this can be an interesting mechanism
… Given the potential time lag between client and server, would it help to have a real-time OS on each side to manage the time synchronisation?
question from kaz: would it be ideal for both publisher/subscriber to have some kind of real-time operating system?
Komatsu: A real-time OS could be considered
… There's jitter in the network itself; whether it works or not is a question
… My idea is that event objects have a capture timestamp property
… That would be enough for the internet cases I've seen
… What accuracy is required depends on the use case
answer from komatsu: putting a capture timestamp in event objects might be enough for most use cases
Komatsu: Don't want to talk about details of API changes, but instead talk about whether this is a question or not
… Is timeline alignment really a problem? What use cases should be considered?
… Worth discussing?
… Any other related topics to cover?
Song: Excellent presentation. I raised a similar topic in the Web & Networks IG about cloud gaming
… China Mobile launched a cloud gaming service last month, which has millions of subscribers
… We transmit the data with WebRTC, the time duration is 15 ms in the network across China
… Rendering is 15-20ms, still acceptable
… The biggest part for end to end commercial use is translation of mouse and keyboard events for games, this can cost 90 ms
… For every business case, e.g., digital twins, could be very different
… With the data example I mentioned, we get complaints from game and web developers
song: in the new cloud gaming service from China Mobile, data is sent via WebRTC taking about 15ms, rendering takes 15-20ms, and user events in games take about 90ms
Song: The infrastructure is based on IP network, which is best effort
… The request we get from game companies is a deterministic network
… The headache for us is breaking the network conditions for millions of users
… In the Web & Networks IG, we have 2 solutions. One is MoQ. That's in 3GPP release 20, called 6G
… That can change the synchronisation systematically, the router, switch, radio access, coordinate the MIDI with the time clock in the device. Long term solution
… Second is to use cloud-edge-client coordination. Since we can't change the best-effort IP network, this is why WNIG incubates the cloud edge work
… What do you think?
Komatsu: Delay would fluctuate, does that cause confusion for gaming services?
Song: Can follow up with you
Bernard: There are several classes of use case: syncing media+events from a single participant. Will discuss in the Media WG tomorrow
… Trickier is syncing from multiple clients. We found we need additional extensions, both to the network and the web
… WebRTC WG is working on absolute capture timestamp, synced to the server's wall clock
Bernard: synchronizing between multiple participants would be way more difficult
… We're investigating the timing information necessary, then in the media, everything to be synchronised will need the capture timestamp
Komatsu: I agree. NTP. Depends on the use case. The current WebRTC activity should be considered
Bernard: We want to make it general, not only WebRTC but MOQ and across the board
Harald: A lot of these problems are familiar
… To sync, you need to know the delay between the real-world event happening and when it is captured
… Differs by device, which is awkward
… Jitter in the network, difference in delays. That has to be handled with jitter buffers, which introduces delay
… A too short jitter buffer means losing stuff
… That's the trade-off
… When we did WebRTC for Stadia, we had a concept called ??
… You'd sync events that happened before, so the effects are visible. You wish for time travel!
… Timestamping everything at the outset is a good starting point
Komatsu: With MoQ we can manipulate the jitter buffer in the browser
jya: We tried doing media sync with Media Session. Not to this level of synchronicity
Paul: Look at the Proceedings of the Web Audio conference over the years
… We're able to do sub-millisecond sync of real time audio
… In general, the question is tradeoff between latency and resilience
… Need to consider clock domain crossing. Clocks on the sender side are different from clocks on the receiving side. Need a source of truth, and re-clock and resample so there are no gaps
… This means the relationship between the audio and MIDI events is preserved; then you offset that by the latencies (audio output, video, etc) and reclock everything
… Important to preserve the relationship between the two
… Typically between two sound cards there can be a 0.08% difference in clock rate. If you watch a movie for 1 hour, that's nearly 3 seconds of skew, so it's broken and needs to be taken care of
… Installation at WAC showed real time audio streams playing across different audio devices nicely. There is hope, but it's a clock thing. Delay vs resilience is the question
Jer: To add to jya's point, we were going for about a second
Michael: I'm in Audio WG, co-editor of Web MIDI. We have an issue about syncing MIDI with Web Audio on the same host
… IMO jitter in MIDI is more important than latency. Now is a good time to add things to the spec, if those are easy
Kaz: Given those potential and promising technologies, I wonder what kind of mechanism would be preferred to handle sync of multiple threads? Interesting to think about the control mechanism
Paul: Web Audio gives you the system and audio clocks, so you can calculate the slope. rAF(), two timestamps, understand the slope and drift. With this, it's possible
… Real time OS might be overkill. We get good results with commercial OS's
… If we're looking at variation about 1ms, a regular computer with proper scheduling classes will go far
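A minimal sketch of the slope estimation Paul describes, using AudioContext.getOutputTimestamp() (a real Web Audio API) and two samples taken some time apart; the 10-second interval is arbitrary and the context is assumed to be running:

```typescript
const ctx = new AudioContext();

const t0 = ctx.getOutputTimestamp();        // { contextTime (s), performanceTime (ms) }
setTimeout(() => {
  const t1 = ctx.getOutputTimestamp();
  const audioElapsed = (t1.contextTime! - t0.contextTime!) * 1000;   // ms on the audio clock
  const sysElapsed = t1.performanceTime! - t0.performanceTime!;      // ms on the system clock
  const slope = audioElapsed / sysElapsed;   // ~1.0; e.g. 1.0008 means the audio clock runs 0.08% fast
  console.log(`audio clock runs at ${slope.toFixed(6)}x the system clock`);
}, 10_000);
```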
Komatsu: To wrap up, I want to talk about next steps
… Community group, or existing CG or IG?
Chris: You're welcome to bring this to MEIG if you want to discuss more about use cases and requirements
Harald: Attend the Media WG / Web RTC meeting where we'll discuss sync
Komatsu: Thank you all!
[adjourned]