<scribe> scribenick: cpn
<RobSmith> Slides: https://w3c.github.io/sdw/proposals/geotagging/webvmt/meetings/2020-10-26_TPAC_Breakout.pdf
Rob: I'll give a presentation, then
discuss video metadata for moving objects, including the OGC Testbed-16 work
... I hope you're familiar with the ongoing GitHub discussion
... I'm an invited expert in the Spatial Data on the Web group,
leading on WebVMT, an open format for synchronizing location
metadata with video on the web
... Overview of GitHub process. Three discussion topics: aligning
arbitrary data with web media; interpolation to calculate
intermediate values; attributes for moving objects and
sensors
... WebVMT is designed to allow sync of the metadata parser with
the media playhead.
... Video metadata is anything in the media file that isn't audio
or video
... The simplest example of metadata is the file size
... In this case, we're interested in location data associated with
the video
... Parsing up to the current playback time provides sufficient
information for display, including interpolation
... At capture time, if you're moving from point A to B, you haven't
arrived yet, so there's a limit to what you can do. Not so for
playback, where the whole journey is known
... It uses a modular cue format, so one cue doesn't affect others.
It supports mixed interpolation schemes, e.g., stepped.
... In sections with no data, you need to distinguish between missing
data and static data
... Base commands, written at capture, are standalone. Modified by
optional subcommands for interpolation and animation
... Why talk about moving objects and sensors, what's
special?
... It's persistent metadata collected over a time interval
... Contrast with subtitles, which are completely separate from each
other. One does not affect another
... With location data, previous values are important and
associated
... Samples are taken at discrete points, can be interpolated
... Timed metadata occurs at points in time, global metadata
applies to the whole of the file or stream
... A use case for this is in media broadcast, a live sports event,
where metadata could be added to describe which player is in
possession of the ball
... For moving objects, values are derived from changing location
over time. This is timed metadata. Most obvious is the location
itself, sampled and interpolated to get any position in time
... Distance and heading, speed are lightweight calculations,
minimal overhead
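As an illustration of the calculations Rob describes (this is a sketch, not part of WebVMT itself; the function names and sample values are hypothetical), the location can be linearly interpolated between timestamped samples, and distance and speed derived cheaply:

```python
import math

def interpolate(p0, p1, t):
    """Linearly interpolate between two timestamped (time_s, lat, lng)
    samples; t is a query time between them."""
    t0, lat0, lng0 = p0
    t1, lat1, lng1 = p1
    f = (t - t0) / (t1 - t0)
    return (lat0 + f * (lat1 - lat0), lng0 + f * (lng1 - lng0))

def distance_m(lat0, lng0, lat1, lng1):
    """Great-circle distance in metres (haversine formula)."""
    r = 6371000.0  # mean Earth radius in metres
    phi0, phi1 = math.radians(lat0), math.radians(lat1)
    dphi = math.radians(lat1 - lat0)
    dlam = math.radians(lng1 - lng0)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi0) * math.cos(phi1) * math.sin(dlam / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def speed_mps(p0, p1):
    """Average speed between two timestamped samples, in m/s."""
    return distance_m(p0[1], p0[2], p1[1], p1[2]) / (p1[0] - p0[0])
```

This is the "lightweight calculation, minimal overhead" point: once the interpolated positions are available, distance and speed fall out of a few trigonometric operations per sample pair.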
... Also sensor data, a sequence of timed observations, not timed
locations. The metadata tends to be global - type: string,
enumeration, number, etc
... Recording the units is important. Could be stored as a string.
Adding notes for annotation
... Range for numeric values, e.g., latitude is from -90 to +90
degrees, is useful for numeric types
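As a sketch of the global sensor metadata Rob lists (type, units, range, description), the fields could be modelled and checked like this; the field names and the example channel are illustrative, not taken from the WebVMT spec:

```python
# Illustrative global metadata for one sensor channel (hypothetical
# field names; WebVMT defines its own syntax for this).
heart_rate = {
    "type": "number",
    "units": "bpm",
    "range": (30, 220),  # valid numeric range, like latitude's -90..+90
    "description": "Rider heart rate monitor",
}

def in_range(meta, value):
    """Check a timed observation against the channel's declared range."""
    lo, hi = meta["range"]
    return lo <= value <= hi
```

The small storage requirement is visible here: the global metadata is written once per channel, while the timed observations are just numbers validated against it.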
... Small storage requirement, makes them good candidates
... Comparing moving objects and sensors. For a moving object,
description is a good global metadata field. For sensors, the
interpolated timed value is the equivalent of location
... Aim for this session is to create a list of attributes suitable
for a Web API that's lightweight and compact
... WebVMT is a format for geo-tagged video. My goal is to get this
onto the Rec track as a standard, and to build an online community
... I want to gather expert feedback in this session
... Spans separate communities: geo-spatial, web, and broadcast
media
... Covers a wide range of devices: drones, dashcams, body-worn
cameras. These are in separate silos; I want to bring them together
... Attributes: distance, speed, heading. Type, units, description
for sensors
... I want to align with existing standards. Moving Features is a
geo-spatial standard; WebVMT is aligned with that
... OGC SensorThings is about data sync and interpolation; align
these new features with that
... WebVMT shouldn't replace existing standards, make them
accessible online. Mapping from existing standards, GPMF, MISB
<RobSmith> MISB link: https://www.gwg.nga.mil/misb/
<RobSmith> GPMF link: https://gopro.github.io/gpmf-parser/
Rob: WebVMT can encapsulate arbitrary
data in JSON form
... Distinguish moving object and sensor data. Are these attributes
valuable? What are the common use cases?
Scott: Is this limited to outdoor locations, or can it be used indoor?
Rob: There's no limitation
Scott: We've done some indoor
projects at OGC in the past; there's an OGC candidate standard. There
might be a spawning of new use cases there
... I'm a project manager at OGC. We're running a project to
develop demonstrators that Rob is contributing to. We have a
parallel interest here, looking for areas of mutual interest with
W3C
Rob: I've been involved in Testbed-16. The aim there is to extract MISB metadata, i.e. location data, from video; the data is in-band in the file, and is then presented out of band to make it more accessible
Christine: I'm an IE at W3C, co-chair
of PING
... I'd like to hear about the privacy and security model of the
API
... Ed Parsons session on Wednesday is about responsible use of
geospatial information, may be of interest
<RobSmith> https://www.w3.org/2020/10/TPAC/breakout-schedule.html#ResGeo
Rob: What would you like to see in terms of privacy?
Christine: There's work done in W3C
about privacy design for sensor APIs. Geospatial is an interesting
case, I see potential misuse of the API or unintended leaking that
may cause a privacy risk
... I'm here to learn about what you're doing
Rob: There's a search use case I'm
investigating, to be able to search by location, e.g. searching a
video archive by location
... A potential problem there is scraping of the machine readable
location data
... By separating the metadata to make it more efficient, you can
apply permissions, so you could protect the audio or video with
separate security permissions.
... Also monitor access to the audio or video content. It would be
obvious if there were accesses to the video content. I'm concerned
with unique personal signatures, e.g., images of people that
identify where they are
... We'll discuss in the Wednesday session from different
perspectives
Chris: Where do these APIs fit into an architecture?
Rob: For moving objects and sensors,
the API would be in the browser, as a way to access the location
metadata - distance, speed - via JavaScript
... The search use case is through a search engine, so a web
crawler can index the video archive, so results are returned from a
web page
... I suppose it's part of the browser to present the location data
in a common format, so web apps can be built on top of it
... It makes the data accessible in the browser
Chris: Does the browser itself need to understand the location metadata, or is it handled in a JS web app?
Rob: I want to be able to return data
with a type.
... We're prototyping an extension to HTML to deliver timed video
metadata through HTML in WICG, coming up with use cases
... Subtitles are well supported, but metadata isn't. That's the
gap we're working on
... We've proposed a DataCue with arbitrary data and a type field,
which would be a label to say what the format is
... For example, you could have org.webvmt.example as data
type
... So that defines the format of the arbitrary data, and it's up
to the recipient to parse the data, based on the knowledge of the
particular format
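As a sketch of how a recipient might act on that type field (the registry and function names below are hypothetical illustrations, not part of the DataCue proposal), parsers can be registered per type string and dispatched to, with unknown types ignored:

```python
import json

# Hypothetical registry: a cue carries arbitrary data plus a type
# string, and the recipient parses the data based on that type.
parsers = {}

def register(type_name):
    """Decorator that registers a parser for one cue type string."""
    def wrap(fn):
        parsers[type_name] = fn
        return fn
    return wrap

@register("org.webvmt.example")
def parse_webvmt(payload):
    # For this illustration, assume the payload is a JSON string.
    return json.loads(payload)

def handle_cue(type_name, payload):
    """Dispatch a cue's payload to the parser registered for its type."""
    parser = parsers.get(type_name)
    if parser is None:
        return None  # unknown type: ignore the cue
    return parser(payload)
```

The design point is that the cue format itself stays agnostic: only the registered recipient needs knowledge of what "org.webvmt.example" data looks like.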
Anil: I'm new to W3C. I saw some examples on GitHub. Are you considering weather data in these descriptions?
Rob: Yes. WebVMT can encapsulate any
data type, anything that can be captured in JSON form. We'd link
that to the DataCue structure with an appropriate weather-specific
type field
... There's an implicit schema that defines the data structure.
Anil: I'd love to see a weather example
Rob: Please do post an example or a description, we can add as a use case
Mahmoud: What exactly is being synchronized? A video could have many objects that are all moving
Rob: We support multiple objects, so
you can have as many as you like. The camera itself is moving, and
anything in the field of view can be described
... For example, dashcams in a car race on a track. Aggregate all
location data together to see relative positions of the cars, also
from trackside
... For a rally race, where it's sequential, you could superimpose
runs from different cars
Mahmoud: Is there a way to relate paths to objects in the video?
Rob: You'd represent the location of the object on a map. All would be recorded, so the display would be responsible for choosing which paths to show, using identifiers. All the information is recorded; it's up to the implementation how to display it
David: How about frame-level synchronization, modifying a video based on telemetry. Is that in scope? Roll and shutter, etc, so you can manipulate the media after capture
Rob: Yes. WebVMT is a format, the implementation is a separate issue. There's no limitation on the timing accuracy. For real time display, it may not be accurate enough in a browser
David: The camera may have more critical timing than other objects, for example a heart rate monitor. You can't do some time-critical things, depending on where the data came from. Label data as time critical or not.
Rob: We allow instantaneous events,
time critical cues, or at some time in the interval. Let's take
this use case offline to discuss.
... Returning to the session goals. This is a good list of
attributes and sensor data. For moving object: location, can derive
distance, speed, heading. Description of the moving object.
... For sensor data, timed metadata is the value, can be
interpolated. Global metadata includes type, units, range,
description.
... Is this reasonable? Any objections?
Mahmoud: One suggestion, cumulative distance - the distance travelled from the beginning to now. Could be useful in the racing use case
Rob: I think that's already
covered
... I'll summarise this and update GitHub with conclusions.
... Suggest we follow up from that. Please follow the GitHub
issue
... Thank you!