<scribe> scribenick: cpn
<RobSmith> Slides: https://w3c.github.io/sdw/proposals/geotagging/webvmt/meetings/2020-10-26_TPAC_Breakout.pdf
Rob: I'll give a presentation, then
discuss video metadata for moving objects, including the OGC Testbed-16 work
... I hope you're familiar with the ongoing GitHub discussion
... I'm an invited expert in the Spatial Data on the Web group,
leading on WebVMT, an open format for synchronizing location
metadata with video on the web
... Overview of GitHub process. Three discussion topics: aligning
arbitrary data with web media; interpolation to calculate
intermediate values; attributes for moving objects and
sensors
... WebVMT is designed to allow sync of the metadata parser with
the media playhead.
... Video metadata is anything in the media file that isn't audio
or video
... The simplest example of metadata is the file size
... In this case, we're interested in location data associated with
the video
... Parsing up to the current playback time provides sufficient
information for display, including interpolation
... At capture time, if you're moving from point A to B, you haven't
arrived yet, so there's a limit to what you can do. Not so for
playback, where the whole journey is known
... It uses a modular cue format, so one cue doesn't affect others.
It supports mixed interpolation schemes, e.g., stepped.
... In sections with no data, you need to distinguish between missing
data and static data
... Base commands, written at capture, are standalone. Modified by
optional subcommands for interpolation and animation
... Why talk about moving objects and sensors, what's
special?
... It's persistent metadata collected over a time interval
... Contrast with subtitles, which are completely separate from each
other. One does not affect another
... With location data, previous values are important and
associated
... Samples are taken at discrete points, can be interpolated
... Timed metadata occurs at points in time, global metadata
applies to the whole of the file or stream
... A use case for this is in media broadcast, a live sports event,
where metadata could be added to describe which player is in
possession of the ball
... For moving objects, values are derived from changing location
over time. This is timed metadata. Most obvious is the location
itself, sampled and interpolated to get any position in time
... Distance and heading, speed are lightweight calculations,
minimal overhead
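As an illustration of the calculations Rob describes (this is a sketch, not part of WebVMT itself; the function names and sample values are hypothetical), the location can be linearly interpolated between timestamped samples, and distance and speed derived cheaply:

```python
import math

def interpolate(p0, p1, t):
    """Linearly interpolate between two timestamped (time_s, lat, lng)
    samples; t is a query time between them."""
    t0, lat0, lng0 = p0
    t1, lat1, lng1 = p1
    f = (t - t0) / (t1 - t0)
    return (lat0 + f * (lat1 - lat0), lng0 + f * (lng1 - lng0))

def distance_m(lat0, lng0, lat1, lng1):
    """Great-circle distance in metres (haversine formula)."""
    r = 6371000.0  # mean Earth radius in metres
    phi0, phi1 = math.radians(lat0), math.radians(lat1)
    dphi = math.radians(lat1 - lat0)
    dlam = math.radians(lng1 - lng0)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi0) * math.cos(phi1) * math.sin(dlam / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def speed_mps(p0, p1):
    """Average speed between two timestamped samples, in m/s."""
    return distance_m(p0[1], p0[2], p1[1], p1[2]) / (p1[0] - p0[0])
```

This is the "lightweight calculation, minimal overhead" point: once the interpolated positions are available, distance and speed fall out of a few trigonometric operations per sample pair.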
... Also sensor data, a sequence of timed observations, not timed
locations. The metadata tends to be global - type: string,
enumeration, number, etc
... Recording the units is important. Could be stored as a string.
Adding notes for annotation
... Range for numeric values, e.g., latitude is from -90 to +90
degrees, is useful for numeric types
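As a sketch of the global sensor metadata Rob lists (type, units, range, description), the fields could be modelled and checked like this; the field names and the example channel are illustrative, not taken from the WebVMT spec:

```python
# Illustrative global metadata for one sensor channel (hypothetical
# field names; WebVMT defines its own syntax for this).
heart_rate = {
    "type": "number",
    "units": "bpm",
    "range": (30, 220),  # valid numeric range, like latitude's -90..+90
    "description": "Rider heart rate monitor",
}

def in_range(meta, value):
    """Check a timed observation against the channel's declared range."""
    lo, hi = meta["range"]
    return lo <= value <= hi
```

The small storage requirement is visible here: the global metadata is written once per channel, while the timed observations are just numbers validated against it.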
... Small storage requirement, makes them good candidates
... Comparing moving objects and sensors. For a moving object,
description is a good global metadata field. For sensors, the
interpolated timed value is the equivalent of location
... Aim for this session is to create a list of attributes suitable
for a Web API that's lightweight and compact
... WebVMT is a format for geo-tagged video. My goal is to get this
onto the Rec track as a standard, and to build an online community
... I want to gather expert feedback in this session
... Spans separate communities: geo-spatial, web, and broadcast
media
... Covers a wide range of devices: drones, dashcams, body-worn
cameras. These are in separate silos; I want to bring them together
... Attributes: distance, speed, heading. Type, units, description
for sensors
... I want to align with existing standards. Moving Features is a
geo-spatial standard; WebVMT is aligned with that
... OGC SensorThings is about data sync and interpolation; align
these new features with that
... WebVMT shouldn't replace existing standards, make them
accessible online. Mapping from existing standards, GPMF, MISB
<RobSmith> MISB link: https://www.gwg.nga.mil/misb/
<RobSmith> GPMF link: https://gopro.github.io/gpmf-parser/
Rob: WebVMT can encapsulate arbitrary
data in JSON form
... Distinguish moving object and sensor data. Are these attributes
valuable? What are the common use cases?
Scott: Is this limited to outdoor locations, or can it be used indoor?
Rob: There's no limitation
Scott: We've done some indoor
projects at OGC in the past; there's an OGC candidate standard. There
might be a spawning of new use cases there
... I'm a project manager at OGC. We're running a project to
develop demonstrators that Rob is contributing to. We have a
parallel interest here, looking for areas of mutual interest with
W3C
Rob: I've been involved in Testbed-16. The aim there is to extract MISB metadata, i.e. location data, from video; the data is in-band in the file, and is then presented out of band to make it more accessible
Christine: I'm an IE at W3C, co-chair
of PING
... I'd like to hear about the privacy and security model of the
API
... Ed Parsons session on Wednesday is about responsible use of
geospatial information, may be of interest
<RobSmith> https://www.w3.org/2020/10/TPAC/breakout-schedule.html#ResGeo
Rob: What would you like to see in terms of privacy?
Christine: There's work done in W3C
about privacy design for sensor APIs. Geospatial is an interesting
case, I see potential misuse of the API or unintended leaking that
may cause a privacy risk
... I'm here to learn about what you're doing
Rob: There's a search use case I'm
investigating, to be able to search by location, e.g. searching a
video archive by location
... A potential problem there is scraping of the machine readable
location data
... By separating the metadata to make it more efficient, you can
apply permissions, so you could protect the audio or video with
separate security permissions.
... Also monitor access to the audio or video content. It would be
obvious if there were accesses to the video content. I'm concerned
with unique personal signatures, e.g., images of people that
identify where they are
... We'll discuss in the Wednesday session from different
perspectives
Chris: Where do these APIs fit into an architecture?
Rob: For moving objects and sensors,
the API would be in the browser, as a way to access the location
metadata - distance, speed - via JavaScript
... The search use case is through a search engine, so a web
crawler can index the video archive, so results are returned from a
web page
... I suppose it's part of the browser to present the location data
in a common format, so web apps can be built on top of it
... It makes the data accessible in the browser
Chris: Does the browser itself need to understand the location metadata, or is it handled in a JS web app?
Rob: I want to be able to return data
with a type.
... We're prototyping an extension to HTML to deliver timed video
metadata through HTML in WICG, coming up with use cases
... Subtitles are well supported, but metadata isn't. That's the
gap we're working on
... We've proposed a DataCue with arbitrary data and a type field,
which would be a label to say what the format is
... For example, you could have org.webvmt.example as data
type
... So that defines the format of the arbitrary data, and it's up
to the recipient to parse the data, based on the knowledge of the
particular format
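As a sketch of how a recipient might act on that type field (the registry and function names below are hypothetical illustrations, not part of the DataCue proposal), parsers can be registered per type string and dispatched to, with unknown types ignored:

```python
import json

# Hypothetical registry: a cue carries arbitrary data plus a type
# string, and the recipient parses the data based on that type.
parsers = {}

def register(type_name):
    """Decorator that registers a parser for one cue type string."""
    def wrap(fn):
        parsers[type_name] = fn
        return fn
    return wrap

@register("org.webvmt.example")
def parse_webvmt(payload):
    # For this illustration, assume the payload is a JSON string.
    return json.loads(payload)

def handle_cue(type_name, payload):
    """Dispatch a cue's payload to the parser registered for its type."""
    parser = parsers.get(type_name)
    if parser is None:
        return None  # unknown type: ignore the cue
    return parser(payload)
```

The design point is that the cue format itself stays agnostic: only the registered recipient needs knowledge of what "org.webvmt.example" data looks like.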
Anil: I'm new to W3C. I saw some examples on GitHub. Are you considering weather data in these descriptions?
Rob: Yes. WebVMT can encapsulate any
data type, anything that can be captured in JSON form. We'd link
that to the DataCue structure with an appropriate weather-specific
type field
... There's an implicit schema that defines the data structure.
Anil: I'd love to see a weather example
Rob: Please do post an example or a description, we can add as a use case
Mahmoud: What exactly is being synchronized? A video could have many objects that are all moving
Rob: We support multiple objects, so
you can have as many as you like. The camera itself is moving, and
anything in the field of view can be described
... For example, dashcams in a car race on a track. Aggregate all
location data together to see relative positions of the cars, also
from trackside
... For a rally race, where it's sequential, you could superimpose
runs from different cars
Mahmoud: Is there a way to relate paths to objects in the video?
Rob: You'd represent the location of the object on a map. All would be recorded, so the display would be responsible for choosing which paths to show, using identifiers. All the information is recorded; it's up to the implementation how to display it
David: How about frame-level synchronization, modifying a video based on telemetry. Is that in scope? Roll and shutter, etc, so you can manipulate the media after capture
Rob: Yes. WebVMT is a format, the implementation is a separate issue. There's no limitation on the timing accuracy. For real time display, it may not be accurate enough in a browser
David: The camera may have more critical timing than other objects, for example a heart rate monitor. You can't do some time-critical things, depending on where the data came from. Label data as time critical or not.
Rob: We allow instantaneous events,
time critical cues, or at some time in the interval. Let's take
this use case offline to discuss.
... Returning to the session goals. This is a good list of
attributes and sensor data. For moving object: location, can derive
distance, speed, heading. Description of the moving object.
... For sensor data, timed metadata is the value, can be
interpolated. Global metadata includes type, units, range,
description.
... Is this reasonable? Any objections?
Mahmoud: One suggestion, cumulative distance - the distance travelled from the beginning to now. Could be useful in the racing use case
Rob: I think that's already
covered
... I'll summarise this and update GitHub with conclusions.
... Suggest we follow up from that. Please follow the GitHub
issue
... Thank you!