Media Fragments Working Group Teleconference -- 09 Dec 2008

<trackbot> Date: 09 December 2008

<raphael> scribenick: raphael

1. Admin

I would like us to discuss the composition of the group, to find out whether more people/companies are about to join

YouTube/Google Video: Ken Harrenstien is more interested in Media Annotations

scribe: it would be interesting to have someone who has implemented fragment access

<scribe> ACTION: Yves to find out with Philippe who from Google would be interested to join [recorded in http://www.w3.org/2008/12/09-mediafrag-minutes.html#action01]

<trackbot> Created ACTION-17 - Find out with Philippe who from Google would be interested to join [on Yves Lafon - due 2008-12-16].

<scribe> ACTION: Raphael to see with Marie Claire who from Daily Motion can join [recorded in http://www.w3.org/2008/12/09-mediafrag-minutes.html#action02]

<trackbot> Sorry, couldn't find user - Raphael

<Yves> ACTION-2 on Troncy

<Yves> ACTION-2 due December 16 2008

<trackbot> ACTION-2 Set up a questionary for seond MediaFrag F2F in Gent (8. and 9. Dec) due date now December 16 2008

Frank (Canon): it would be difficult for Canon to join in 2009

Adobe: Larry Masinter answered; he does not yet have someone to nominate for the group, but Adobe strongly supports this group

<scribe> ACTION: Troncy to check with Karen about Blinx joining or not W3C and Colm the WG [recorded in http://www.w3.org/2008/12/09-mediafrag-minutes.html#action03]

<trackbot> Created ACTION-19 - Check with Karen about Blinx joining or not W3C and Colm the WG [on Raphaël Troncy - due 2008-12-16].

<scribe> ACTION: Michael to check with Wolfgang whether he is still interested in this WG [recorded in http://www.w3.org/2008/12/09-mediafrag-minutes.html#action04]

<trackbot> Created ACTION-20 - Check with Wolfgang whether he is still interested in this WG [on Michael Hausenblas - due 2008-12-16].

<nessy> I'm idling

<nessy> will have go for 2 hours, but back then

<scribe> ACTION: Erik to check with Philippe the status of Cisco (Paul Bosso), Apple (Dave Singer or Eric Carlson) [recorded in http://www.w3.org/2008/12/09-mediafrag-minutes.html#action05]

<trackbot> Created ACTION-21 - Check with Philippe the status of Cisco (Paul Bosso), Apple (Dave Singer or Eric Carlson) [on Erik Mannens - due 2008-12-16].

<scribe> ACTION: Raphael to check with Karen the status of Fox Interactive, if they could have an interest in the group [recorded in http://www.w3.org/2008/12/09-mediafrag-minutes.html#action06]

<trackbot> Sorry, couldn't find user - Raphael

<Yves> ACTION: Raphaël to check with Karen the status of Fox Interactive, if they could have an interest in the group [recorded in http://www.w3.org/2008/12/09-mediafrag-minutes.html#action07]

<trackbot> Sorry, couldn't find user - Raphaël

Erik: should we have a stronger liaison with HTML5?

Yves: we have work to do, it is good to keep contact, but we could ask more feedback when we have better documents
... same for browser vendors

2. Discussion Existing Technologies

On the wiki: http://www.w3.org/2008/WebVideo/Fragments/wiki/Existing_Technologies_Survey

Tom (IBBT) going through the wiki

Presentation also available at: http://www.w3.org/2008/WebVideo/Fragments/meetings/2008-12-09-f2f_ghent/IBBT-State_of_the_art.pptx

Tom: first explain what SMIL can do (Jack will be here later today and tomorrow)
... MPEG-7 (see slide 3)

Raphael: should we discuss the format for representing the time point?

Yves: you can adopt the ISO Dates one, the XML Schema one
... the MPEG-7 one is based on XML Schema, minus the Time Zone, but adding the frame number

<Yves> ttp://www.w3.org/2002/ws/databinding/edcopy/report/all.html

Better: http://www.w3.org/2002/ws/databinding/edcopy/report/all.html

Yves: we should say we consider only time that is local to the media
... so we don't care about time zones for example

Tom: SVG has no temporal fragment
... TimedText: it shows text at a given time
... seems to have another format for representing time point
... CMML derives from Annodex, it requires an off file that has been annotated

Yves: Silvia points out the problem of accessing a given frame if it is not an I-Frame

<Yves> (when referencing only time)

Raphael: we can decide to always go to the previous I-Frame, that precedes a time point

Tom: CMML specifies time with npt, smpte and clock

Raphael: should we do the same?

Tom: default seems to be npt

Raphael: we will come back to these questions when Silvia is on the phone

Tom: CMML/Annodex/TemporalURI has no spatial Fragment
... can select Tracks (such as in a CD)
... has the notion of naming a fragment and refer to this name

Yves: is there an error in the named fragment example? Should the '/' be escaped for selecting the tracks 'a' and 'b' ?

Tom: MPEG-21 has 4 different schemes (ffp, offset, mp, mask)
... offset works in bytes range
... mp scheme has the time dimension (npt, smpte, utc, mpeg-7) and the spatial dimension (polygon, rectangle, ellipse)
... mask is similar to kind of naming a fragment
... HTML5 (see slide 5)
... no support for fragmentation or time reference (like in SVG)
... has Time Datatypes: Date, Time, Date and Time, Time Zones (UTC: add a Z at the end; others: add time difference to UTC with + or -)
... values come from XML Schema (perhaps with one small difference, since the seconds can be omitted)

Raphael: go through the spatial fragments specifications (image maps, MPEG-7, SVG)

Erik: how will this technology survey be used?

Raphael: we will provide, either informally or in the spec, a mapping between the URI scheme and these various XML syntaxes

Yves: Since we want a URI scheme, we will not support everything we have seen, but the maximal possible subset

<nessy> back now

<rtroncy> Silvia, we have a number of questions for you :-)

Question 1: We reviewed http://www.w3.org/2008/WebVideo/Fragments/wiki/Existing_Technologies_Survey#CMML

scribe: we wonder if there is not a mistake in the URI example, and if the '/' should not be escaped

Silvia: first we use a '-' and then move to '/'

<Yves> in the examples on CMML, the / after the ? should be escaped

<Yves> => %2f

Silvia: I think it is ok to have the '/' in the fragment ('#') but not for the query ('?')

Silvia will check whether there is a syntax error or not
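
As a rough illustration of the escaping Yves refers to (`/` becoming `%2f`), the Python sketch below percent-encodes a selector before placing it in the query part of a URI. The host `example.org`, the file name, and the `track` parameter name are placeholders, not the actual example from the wiki, and whether escaping is required at all is the open question Silvia will check.

```python
from urllib.parse import quote, urlsplit

# Hypothetical selector containing a slash, standing in for the "a" / "b" case.
selector = "a/b"

# quote() leaves "/" untouched by default; safe="" asks it to escape "/" as well.
escaped = quote(selector, safe="")

# example.org, video.ogv and the "track" parameter are illustrative assumptions.
uri = "http://example.org/video.ogv?track=" + escaped

# Splitting the URI back shows the slash stays percent-encoded in the query.
query = urlsplit(uri).query
print(escaped)
print(query)
```

Python emits the hex digits in upper case (`%2F`); percent-encodings are case-insensitive, so this is equivalent to the `%2f` written above.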

Question 2: CMML covered 3 schemes for representing time point: npt, smpte and clock

Raphael: should we do the same?

Silvia: we wanted to be interoperable with all formats
... in practice, people tend to use the 'npt' scheme
... maybe it is better to talk with video professionals
... they need to access the frame level
... I think that for most use cases, the npt scheme is accurate enough
... npt is the default scheme in CMML
... don't confuse: ntp, the network time protocol (unix) and npt, what we are discussing
... npt = normal playback time
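
To make the npt (normal play time) scheme concrete, here is a minimal Python sketch that converts an npt value into seconds from the start of the media. It assumes the two value shapes defined for npt in RTSP (plain seconds, or hours:minutes:seconds with an optional fraction); the helper name `parse_npt` is hypothetical, and the syntax the group will eventually adopt may differ.

```python
import re

# Assumed value shapes, modeled on the npt forms used in RTSP:
#   "12", "12.5"            -> seconds
#   "0:02:01", "0:02:01.5"  -> hours:minutes:seconds[.fraction]
_NPT_SEC = re.compile(r"^\d+(\.\d*)?$")
_NPT_HHMMSS = re.compile(r"^(\d+):([0-5]\d):([0-5]\d)(\.\d*)?$")


def parse_npt(value: str) -> float:
    """Return the offset in seconds from the start of the media (hypothetical helper)."""
    if _NPT_SEC.match(value):
        return float(value)
    m = _NPT_HHMMSS.match(value)
    if m:
        hours, minutes, seconds = int(m.group(1)), int(m.group(2)), int(m.group(3))
        frac = m.group(4)
        # A bare trailing "." carries no fractional digits.
        fraction = float("0" + frac) if frac and len(frac) > 1 else 0.0
        return hours * 3600 + minutes * 60 + seconds + fraction
    raise ValueError(f"not an npt value: {value!r}")


print(parse_npt("12.5"))
print(parse_npt("0:02:01"))
```

Because npt is relative to the start of the media, no time zone handling is needed, which matches the earlier remark that only media-local time is considered.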

Raphael: Question 3: Frame access, should we always go to the last I-Frame that precedes the time point we want to access

Silvia: depends on what the codec allows
... with Theora, we always jump to the previous I-Frame
... we need to be accurate when we store the fragment (cache), it seems less important on the client side

Davy: I agree with Silvia, we might want to provide some guidelines for some specific formats
... we cannot define an algorithm that says that a time point corresponds to a particular frame for all encoding cases, it's not possible

Silvia: we can say that previous I-Frame is accurate enough
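
The "previous I-Frame" rule can be sketched as a lookup in a sorted list of keyframe timestamps. This is only an illustration under the assumption that such an index is available for the file; the function name and the timestamps are made up, and real codecs and containers expose this information in format-specific ways.

```python
from bisect import bisect_right


def previous_keyframe(keyframe_times, t):
    """Return the latest keyframe time <= t, or the first keyframe if t precedes them all.

    keyframe_times must be a non-empty list sorted in ascending order.
    """
    i = bisect_right(keyframe_times, t)
    return keyframe_times[max(i - 1, 0)]


# Made-up I-Frame timestamps, in seconds.
keyframes = [0.0, 2.0, 4.0, 6.0]

print(previous_keyframe(keyframes, 5.3))
print(previous_keyframe(keyframes, 4.0))
```

A request for 5.3 s would therefore start at the keyframe at 4.0 s, which is the sense in which the result is "accurate enough" rather than frame-exact.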

Raphael: what will the cache finally store?

Silvia: the cache will store what the server is serving, and recompose fragments based on bytes, not using the URI requested by the UA

<davy> http://www.w3.org/2008/WebVideo/Fragments/wiki/HTTP_implementation

3. Define Types of Addressing

<Silvia> zakim: mute me

http://www.w3.org/2008/WebVideo/Fragments/wiki/Types_of_Fragment_Addressing

Erik: page prepared by Davy (with Guillaume input?)

Davy: this page originates from the list of issues we discussed during our 1st face to face meeting
... Track: whether a media format supports tracks or not depends on the Container format, but not the Coding format

Frank: do you consider all the video quality levels to be in one track? for example in a media adaptation use case

Davy: I do not think that a different quality of the video is a fragment

Raphael: discussion about what the boundaries of the track definition are

Yves: examples such as multiple camera angles, multiple resolution of the same video in the same stream, audio languages, subtitles: are all of these tracks ?

Silvia: the boundary should be what the encapsulation format exposes
... or rather the container format

Raphael: if the container format exposes the notion of tracks, we could address them, otherwise, we should NOT invent them
... we look at the table

Silvia: different camera angles can be seen as multiple video tracks
... different resolution: encoding format does not work that way, they tend to provide different files
... we should not worry about that now, can be dealt with later

Davy: Temporal dimension: need to take into account the precision we can get in the time point
... Spatial dimension: we cannot generally extract a region, and we have not yet decided whether to consider only rectangular regions or arbitrary shapes
... Name dimension: again depends on the container format! For example, one can include a CMML or TimedText description in a MP4 or Ogg container

Silvia: QuickTime has 'QuickTimeText' that can be used to jump to a dvd chapter

<Silvia> cueranges

Silvia: Flash has cueranges

<Silvia> http://www.apple.com/quicktime/tutorials/texttracks.html

<Silvia> ups, s/cueranges/cuepoints/

Yves: http://help.adobe.com/en_US/Soundbooth/2.0/WSA5A1DDFB-6BE2-4486-BE0C-A10CEEF119ADa.html ?

Davy: the table is not complete yet, for some formats, I couldn't figure out what is possible or not
... summary is that generally, the temporal dimension is not a problem
... for the spatial dimension, this is more problematic!
... a ROI can be extracted with H264, but this is not a crop, rather a decrease in quality
... but generally not possible to extract a region in the compressed domain

Raphael: it is not clear what to do with a spatial fragment
... my suggestion would be that the server send the whole picture, but the UA does something with the fragment, e.g. highlight the region

Davy: for a mobile use case, it makes more sense to not download the whole image, but just the region

Frank: why not specify in the URI whether the client wants to download the complete resource or not?

Yves: can be done in HTTP with an extension
... the discovery phase will be: server, tell me what do you support
... for example, using the OPTIONS method, or some parameters in the GET, there are many options
... we can then implement the OPTIONS response, or content negotiation
... discovery is always painful!

Silvia: we always found that discovery was difficult
... we had to find out which tracks were available

<Silvia> http://wiki.xiph.org/index.php/ROE

Silvia: we discussed a format, named ROE, which is a media file format description
... this is currently used in Metavid
... I'm not sure how the discovery and selection should be handled by URI or not
... the UA asks for the ROE file, parse the XML and knows which tracks are available, the UA can then request the right track

<Silvia> e.g. ?track=a1,v1,sub1,cap1
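
A minimal Python sketch of how a server or UA might read such a track selection from the query part of a URI. The `track` parameter and the track names come from Silvia's example; the host, file name, and the helper `requested_tracks` are assumptions for illustration, and the group has not yet settled on this syntax.

```python
from urllib.parse import urlsplit, parse_qs


def requested_tracks(uri):
    """Return the track names listed in a comma-separated 'track' query parameter."""
    query = urlsplit(uri).query
    values = parse_qs(query).get("track", [])
    tracks = []
    for value in values:
        # Each value may carry several comma-separated track names.
        tracks.extend(name for name in value.split(",") if name)
    return tracks


# example.org and video.ogv are placeholders.
print(requested_tracks("http://example.org/video.ogv?track=a1,v1,sub1,cap1"))
```

In the ROE workflow described above, the names would first be discovered from the ROE file and then used to build a request like this one.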

Raphael: can we find the track description in some headers of the container format?

Davy: it depends on the format, it might be the case

Silvia: I agree, it didn't exist for ogg, that's why we invented ROE
... I'm in favor of specifying a syntax, even though just one format will be able to deal with it
... so have a way of specifying tracks and we may list later on which codec and container formats can process

Davy: the audio encoding formats are just relevant for the temporal dimension
... the still images format: JPEG2000 is pretty advanced
... the container formats: mov, mp4, 3gp allow selecting tracks and names, but we need to modify some values (for example change the length field)
... for other formats such as MXF, ASF, I put question marks

<scribe> ACTION: Davy to complete the table, trying to get the answer for the current question marks, except when this is a closed format [recorded in http://www.w3.org/2008/12/09-mediafrag-minutes.html#action08]

<trackbot> Created ACTION-22 - Complete the table, trying to get the answer for the current question marks, except when this is a closed format [on Davy Van Deursen - due 2008-12-16].

Davy: some formats are then useless for our purpose, because they will support nothing (e.g. WAV, AIFF, AU, XMF)

Raphael: it is still interesting to report this information in the document
... Summary: we agree to cover these 4 dimensions
... perhaps the syntax will be simpler for the temporal dimension, since it will be 99% of our use cases
... perhaps the temporal dimension will be the default one
... up to the WG to decide when we talk about the syntax

LUNCH TIME

Silvia, quick poll

<Silvia> yes?

Media Annotations is willing to organize the next joint face to face meeting in Barcelona

prior to the WWW conference in Madrid

<Silvia> awww - I'd love to go there!

potential dates are: 16 and 17 of April

WWW conference will then be 20-24 of April

so you have to spend the week-end in Barcelona and/or Madrid

scribe: we can also go to the beach :-)

will you be able to make it ?

<Silvia> maybe

<Silvia> will need to see from the biz POV and whether I can get Mozilla to sponsor it

depends on your funding ? Mozilla?

<Silvia> (or some of it)

good, but I note your interest

4. Implementation Issues (protocol & caching)

<Silvia> when you're in australia, meeting people in your field is of major interest, since everybody is so far away

Yves leads the discussion

Yves: look at http://www.w3.org/2008/WebVideo/Fragments/wiki/HTTP_Fragment_Caches
... there is a discussion in the mailing list between myself, Silvia and others
... about the solutions recommended by annodex, the 4-way handshakes
... and I was discussing the alternative 2-way handshakes
... both are limited, because it will be difficult to access track fragments, and even worse spatial fragments, because transcoding might be required

Silvia: we should not use fragment when a transcoding operation is needed

Yves: when you deal with tracks, do you think we can handle everything in the compressed domains?

Silvia: yes, tracks are dealt with by the container formats

Davy: yes it depends on the format
... why do you think the outcome of a transcoding operation is not a fragment anymore ?

Silvia: because there is no one to one mapping between the bytes of the original file and the outcome file
... i'm talking about physical fragment and not logical fragment

Yves: I argue that a fragment in the URI spec does not specify if it is a compressed resource
... it is a part of the resource

Silvia: I argue that a fragment of an original resource must be a part of the resource

Yves: I do not argue
... you can have a lossless transformation process, but there is not a single byte range process

Silvia: I didn't argue about having a single or multiple byte ranges

Yves and Davy disagree

Yves: if you transcode to a different format, yes, this is a different resource
... but if you transcode to the same format, I would consider this is the same resource
... so a valid fragment
... example, get all <H1> in a HTML page, this is a fragment
... it might not be a continuous fragment, so difficult to cache, but it is still a fragment

Silvia: YES, but you're not changing the bytes, you have the same bytes

Yves: ok, but if you use FLAC, which is lossless, you will have a valid fragment
... mp3 is definitely not the same thing
... the fact that the stored bytes are different is not relevant
... the criterion is what you get, what you watch

Raphael: ok, but how will caches handle that?

Yves: caches are not forced to store _all_ fragments, need to be specified
... merging does not involve a simple concatenation of bytes ... you already add information for serving the fragments

Silvia: I think we will have a lot of pitfalls with this path

Yves: which ones ? I want to see examples

Silvia: take samples of a FLAC file, decode them, and re-encode them, you will not get a playable piece ???
... I just do not see this extra complexity happening

Yves: perhaps, but nothing prevents doing that?
... I'm arguing that such an operation done in the cache might be more efficient

Silvia: I don't want to support transcoding with loss of information

Yves: agree, fair thing to do
... but not all merging operations have to be done with transcoding

Raphael: which formats are we talking about ?

Davy: most of those in use lose information anyway

Yves: if you do a lossy transformation, then you will get another resource, so another URI
... but it can be done transparently using content negotiation
... another issue is that annodex creates an unlimited number of sub-resources
... because it uses the ? and not a real fragment

Silvia: we did that because we thought it was not appropriate to use the '#' ... but I'm happy to use now the hash

Yves: i just want to come to the header (and footer) of the current annodex solution ... that is done in the compressed domain
... the solution I'm talking about has headers modified
... smart caches will have a way of doing merge in the compressed domain
... it can be done in specialized proxies (dedicated to media)

<Silvia> in Ogg, it is not possible to have a video file with a gap at the beginning and a gap in the middle

<Silvia> it will not result in a valid resource

<Silvia> thus, if you have more than one segment, it needs to be done as video playlists

Silvia: the solution we advocated in Annodex will not store n times the overlap

Yves: in my solution, we will store just the complete playable files
... so we are talking about the same thing, except that in my case, we store additional headers and footers

Hi Conrad ...

<conrad> hi raphael :-)

Conclusion: Yves advocates storing and caching what the server is serving: playable resources, i.e. the bytes corresponding to the requested fragment, enhanced with the appropriate header/footer depending on the encoding format

Yves: same as byte ranges, if there is overlap, the cache will merge them
... we are talking about smart caches ... the other ones will not cache them
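
The merging of overlapping byte ranges that Yves compares this to can be sketched as follows. This is a generic interval merge over inclusive (first, last) byte offsets, written only to illustrate the idea; it is not the Annodex or any proxy's actual algorithm, and it ignores the format-specific headers and footers discussed above.

```python
def merge_ranges(ranges):
    """Merge overlapping or adjacent inclusive (first, last) byte ranges."""
    merged = []
    for first, last in sorted(ranges):
        if merged and first <= merged[-1][1] + 1:
            # Overlaps or touches the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], last))
        else:
            merged.append((first, last))
    return merged


# Made-up ranges already held by a cache for one resource.
cached = [(400, 999), (0, 499), (2000, 2999)]
print(merge_ranges(cached))
```

With these sample ranges the first two collapse into a single span while the third stays separate, so the overlapping bytes are stored only once.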

Silvia: I would favor going mainly for byte ranges and seeing immediate implementations
... and see later what can be improved

Yves: if you want to do only byte ranges, you should do it in such a way that it is still a fragment
... the solution I was advocating does that naturally, but requires smart caches

<Silvia> The byte range based proposal for caching web proxies in annodex is one that can be supported by existing web proxies

<Silvia> therefore I suggest we support that first

<Silvia> the difference between this and what Yves proposes is that the recomposition intelligence goes into the server or into the web proxy

Raphael: but does the annodex solution imply storing additional information such as header/footer ?

<Silvia> in Yves' case, the web proxy has to know about all the encoding formats and needs to understand how to recompose them

Erik: I wonder if your two solutions are orthogonal or not?

<Silvia> I was proposing to allow both

Raphael: Silvia, YES, but Yves talked about smart dedicated web media proxies

Yves: I agree with supporting both, but I would add we put too much emphasis on caching, given that most of the traffic is not cached anyway
... so we should not spend too much time on caches

<Silvia> it's done through services like Akamai

<Yves> who are not using HTTP caches for that (at least the CDN I know of)

<Silvia> no, they are using proprietary solutions

<Silvia> and thus avoiding the existing Web proxy infrastructure

<Silvia> but that's outside of what we need to worry about :)

<Yves> yes :)

<Silvia> say hi to the media annotations guys :)

<Silvia> I will continue to hang out here

Coffee break

5. Joint Session with Media Annotations WG

joint meeting MF & MA

<erik> poll third F2F: 16/04-17/04 @ Barcelona (prior to WWW conference @ Madrid)

<erik> everybody finds it a good idea

<erik> Raphael to talk about status of MF group

<erik> http://www.w3.org/2008/WebVideo/Fragments/wiki/Main_Page

<erik> Raphael summarizes wiki-pages under "Preparation of Working Draft"

<erik> * Use Cases

<erik> ... http://www.w3.org/2008/WebVideo/Fragments/wiki/Use_Cases_%26_Requirements_Draft

<erik> ... functional and non-functional requirements are also part of that UC page

<erik> * Communication between client and server

<erik> ... http://www.w3.org/2008/WebVideo/Fragments/wiki/HTTP_implementation (to be elaborated soon)

<erik> ... 2-way & 4-way handshake

<erik> * Existing technologies

<erik> ... http://www.w3.org/2008/WebVideo/Fragments/wiki/Existing_Technologies_Survey

<erik> ... cover all technologies out there & in the end convert our solution back to existing ones

<erik> Question: what type of fragments will be possible?

<erik> ... temporal for sure in v1

<erik> ... temporal & spatial for video is most difficult one in v2

<erik> ... also tracks are in scope in v1

<erik> ACTION: Erik (together with Jean-Pierre) to add TV-Anytime also to Existing Technologies Survey [recorded in http://www.w3.org/2008/12/09-mediafrag-minutes.html#action09]

<trackbot> Created ACTION-23 - (together with Jean-Pierre) to add TV-Anytime also to Existing Technologies Survey [on Erik Mannens - due 2008-12-16].

<erik> ability of using XMP for the notion of tracks ... this seems possible (link MF & MA) ... to be investigated

<scribe> scribenick: erik

UNKNOWN_SPEAKER: common scenario from MA (description of resources) to MF (communication client/server through content negotiation) for selecting tracks

<fsasaki> http://www.w3.org/2008/WebVideo/Annotations/wiki/XMP

<fsasaki> Ingredients

<scribe> ACTION: Erik (through extra info from Felix) to ask Adobe (Larry) more info about xmpMM:Ingredients [recorded in http://www.w3.org/2008/12/09-mediafrag-minutes.html#action10]

<trackbot> Created ACTION-24 - (through extra info from Felix) to ask Adobe (Larry) more info about xmpMM:Ingredients [on Erik Mannens - due 2008-12-16].

Felix to talk about status of MA group

* stating problem of information loss when mapping (setting) information from one format to generic MA ontology

scribe: Raphael: is common subset on metalevel not enough?
... felix: looked at existing meta-models today ... all "getting" models, not "setting" ... issues (protocol, information loss)
... Raphael: maybe MF can give some input via explanation of our table ... within ...
... http://www.w3.org/2008/WebVideo/Fragments/wiki/Types_of_Fragment_Addressing
... summary: 5th column ... everywhere where there is a "1"&"2" it is possible to add metadata within

<raphael> Raphael: in case some metadata are embedded into the header of a media resource, should we be able to have access to it using a fragment ?

<raphael> ... using which dimension ? the 'name' dimension ?

felix: will named fragments be possible?
... Raphael: yes (i18n will be a problem to handle though)

* XMP overview

scribe: http://www.w3.org/2008/WebVideo/Annotations/wiki/XMP
... Raphael: what about collisions of types/values?

<Daniel> http://dev.w3.org/2008/video/mediaann/mediaont-api-1.0/mediaont-api-1.0.html

scribe: only get-functions for the moment (cfr. "setting"-problem)

<fsasaki> http://www.w3.org/2008/WebVideo/Annotations/wiki/FeaturesTable

<fsasaki> http://dev.w3.org/2008/video/mediaann/mediaont-req/mediaont-req.html

scribe: UC document

formal review of MA UC Doc by MF (probably before 31/12/08)

formal review of MF UC Doc by MA (and others SVG, HTML5, TimedText) (probably before 31/01/09)

<raphael> adjourn

<raphael> thx the organizers

<vmalais> logout

- DRAFT -
