Metadata in Production Workflows

Presenter: Bruce Devlin (SMPTE)
Duration: 6 minutes


So, you want to watch video on the web. There are a bunch of standard APIs and protocols for getting moving images and sound into your browser. Sure, some of them are super high latency, and super high compression, while others are super low compression and super low latency. But by and large, you can pick an operating point and find enough tooling to get 90 to 100 percent of the job done for you.

Now, you decide to go the other way. You want to ingest some video. Again, there are a bunch of APIs and protocols that will allow you to, well, vary the compression ratios, and the latency, and the quality, and the reliability, and the speed, and the realtime-ness, and a bunch of other things based on your needs, to get an ingest working. It's not quite as slick, but it works.

Now, you want to add some production metadata to the ingest and get it into your cloud production system. I normally class this sort of metadata as "exotic", because there are, well, loads of different types, and each type has typically got a tiny little usage, and there are very few really good infrastructures or frameworks for solving the general case. I've seen mixes of Arduino and Raspberry Pi, with RTP feeds over WiFi, with custom rate control, varying file types, different ways of portraying time, different synchronizations, different error corrections, and, meanwhile, the common problem remains the same.

I'd like to get this metadata from that thing, over a timeline, onto my video and audio, while I'm shooting, and I want to be able to figure out which metadata, from which device, was associated with which take, from which camera, during the shoot. Simple, huh?

Well, none of this is really new. We've been bundling these types of systems together since the early days of cinema and television. What's changed is that we now have the ability to pull vast quantities of data into a common cloudy store, and apply as much compute, and as much... stuff as you can afford in the Cloud. The wonderful folks at Arri, Nablet, and TrackMen helped put together some kit to explore the outline of a generic solution, and we documented that piece of work on the website mxf-live.io. You can see that we captured some data from the inside of the Arri camera, as well as an external data feed with the pan, the tilt, and the roll from the tripod head, encapsulated it into a standard file format, stuck it across the WiFi, and then serialized it onto the editor's timeline, so that we could do post-production on the live feed, with the captured metadata. Great! Job done.

Not quite. We figured out that there are really four major sorts of metadata, split along two axes. The first axis is pretty simple: is it binary, or is it text? And this is important, because that turns out to be a good predictor of how you're gonna process the metadata downstream. The second axis is whether the data is isochronous, in other words, one sample for each clock tick, every clock tick, or whether it's kinda lumpy, with embedded timing.

Let's look at an example of metadata for each of those quadrants. Isochronous binary is pretty common. That's the sort of thing you'd find for the position of the lens, sampled on every chirp of its piezoelectric motor, or maybe sampled every frame. Isochronous text, this might be all the Dolby Vision high dynamic range metadata properties, stored as an XML document for every frame of the video. Blobs of binary data, this might be an event track, signaling which smoke machine was turned on, and at what time of day. Blobs of text data, well, this is pretty common. That's exactly what closed captioning and subtitles are, where each phrase of text is labeled with the timing information for that phrase.
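As a rough illustration of those two axes and four quadrants, here is a minimal Python sketch; the type names, field names, and sample rates below are illustrative assumptions for this example, not drawn from the talk or from any standard.

```python
from dataclasses import dataclass
from enum import Enum
from fractions import Fraction
from typing import Optional

class Encoding(Enum):
    BINARY = "binary"   # e.g. raw lens position samples
    TEXT = "text"       # e.g. per-frame XML, captions

class Timing(Enum):
    ISOCHRONOUS = "isochronous"  # one sample per clock tick, every tick
    EVENT = "event"              # lumpy blobs with embedded timing

@dataclass
class MetadataTrack:
    encoding: Encoding
    timing: Timing
    # Only isochronous tracks carry a fixed sample rate (samples per second);
    # event tracks timestamp each blob individually.
    sample_rate: Optional[Fraction] = None

# The four quadrants from the talk, at an assumed 24 fps sample rate:
lens_position   = MetadataTrack(Encoding.BINARY, Timing.ISOCHRONOUS, Fraction(24))
dolby_vision    = MetadataTrack(Encoding.TEXT,   Timing.ISOCHRONOUS, Fraction(24))
smoke_machine   = MetadataTrack(Encoding.BINARY, Timing.EVENT)
closed_captions = MetadataTrack(Encoding.TEXT,   Timing.EVENT)
```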

Once you've got these four fundamental types, you can then look at transport. It helps to be able to store this timed data as a serializable stream of packets, and for that we chose MXF, because of the available infrastructure, and the ability to represent clocks as rational numbers, so the precise timing can be maintained without the risk of drift if the systems are left running for long periods of time, like weeks, and months, and years. Those packets can then be mapped and layered with different transports, like WebRTC, to get them from where they are, to where they need to be.
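To make the drift argument concrete, here is a minimal sketch of why an exact rational edit rate matters over long runs. Python's Fraction stands in for MXF's rational clocks, and the four-week figure is illustrative.

```python
from fractions import Fraction

# 29.97 fps is exactly 30000/1001; a rounded float rate is not.
EDIT_RATE = Fraction(30000, 1001)

def frame_to_seconds(frame_index: int) -> Fraction:
    """Exact timestamp for a frame; no rounding, so no drift."""
    return frame_index / EDIT_RATE

# After roughly four weeks of continuous capture:
frames = int(Fraction(4 * 7 * 24 * 3600) * EDIT_RATE)
exact = frame_to_seconds(frames)   # exact rational seconds
rounded = frames / 29.97           # the rounded rate accumulates error
print(f"drift after 4 weeks: {rounded - float(exact):.3f} s")
# Prints a drift of a couple of seconds: enough to lose frame accuracy.
```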

And now, it starts to get interesting. Whilst MXF is pretty handy for the hardware and firmware layers, there are open source projects, like OpenTimelineIO from Pixar and the ASWF, that are much more friendly for interacting with that data in a generic, product-independent way. Having said that, we know that transcoding video is lossy, and transcoding metadata may actually destroy its usefulness. So, wherever possible, retaining the possibly bulky metadata in its original form is vital, and computing simplified proxies for that metadata becomes important for visualization. If you're going to do all of that in the general case, then managing identifiers of the metadata type, and how it's associated with other assets, could become a complexity nightmare, unless you design some sort of common framework for associating elements, based on simple identifiers, such as URIs.
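One way such a framework might look, as a purely hypothetical sketch: every metadata track and every asset gets a URI, and associations are plain triples over those URIs. The URI schemes and helper names below are invented for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Association:
    metadata_uri: str   # identifies the metadata track
    asset_uri: str      # identifies the asset it relates to
    relationship: str   # e.g. "describes", "proxy-of"

registry: set[Association] = set()

def associate(metadata_uri: str, asset_uri: str,
              relationship: str = "describes") -> None:
    registry.add(Association(metadata_uri, asset_uri, relationship))

def metadata_for(asset_uri: str) -> list[str]:
    return [a.metadata_uri for a in registry if a.asset_uri == asset_uri]

# Hypothetical URIs tying tripod-head telemetry and its proxy to a take:
associate("urn:example:tripod-head:pan-tilt-roll", "urn:example:camera-a:take-12")
associate("urn:example:tripod-head:proxy", "urn:example:camera-a:take-12", "proxy-of")
print(metadata_for("urn:example:camera-a:take-12"))
```

Keeping the association layer this thin means a new metadata type only needs a new URI, not a new schema, which is what makes the general case tractable.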

And that's as far as we've got. There's interest from studios, and vendors, and a whole bunch of people, and I hope to put more time into it, personally, when I pass on my role as SMPTE Standards Vice President to someone else in January, and if it interests you, then I'd love to talk. I'm sure the brain trust of W3C and SMPTE can genuinely create something here that's useful for the world of media, as well as for other verticals. Thanks!
