Re: Review: Use Case & Requirements Draft

Hi Raphael, all,

Great feedback, thanks! I have some comments inline and also some
questions for everybody.


On Sat, Nov 8, 2008 at 4:34 AM, Raphaël Troncy <Raphael.Troncy@cwi.nl> wrote:
> * Section 1.1:
>  - (intro): I don't really like the term "secondary resource". I understand
> what you mean but the terms 'primary' and 'secondary' are sometimes
> ambiguous, and even in the broadcast world, they use them with a different
> semantics (secondary resources meaning the textual documents that go with
> the primary video resource). I would suggest to use instead the term 'part
> of', so: "A media fragment URI allows to address this part of the resource
> directly and thus enables the User Agent to provide AND PLAY just the
> relevant fragment".

This is not a term I have selected or invented. It is the term used
in the standard: http://www.ietf.org/rfc/rfc3986.txt , section 3.5 on
fragments. I have added a link to the RFC to clarify this.
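
To illustrate what the RFC means, here is a quick Python sketch - note
that the "#t=10,20" temporal syntax is made up for illustration and not
something we have agreed on:

  # The fragment identifies a "secondary resource" (RFC 3986, section
  # 3.5) and is resolved by the user agent - it is never sent to the
  # server as part of the request.
  from urllib.parse import urldefrag

  uri = "http://example.com/video.ogv#t=10,20"
  primary, fragment = urldefrag(uri)

  print(primary)   # http://example.com/video.ogv (what the server sees)
  print(fragment)  # t=10,20 (interpreted by the user agent)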

Also, I have replaced the word "provide" with "receive", because that
is what I really meant (even though "provide" would have been
user-facing and "receive" is now server-facing).


>  - (scenario 2): I'm not sure we want to say that only the region of an
> image should be displayed. What about saying: "Tim wants the region of the
> photo highlighted and the rest grey scaled"?

If we get the rest of the image in grey scale, then we have to receive
the rest of the data, so it is no longer a fragment. In that case, you
would receive the whole image and apply a transformation to parts of
it. That is not what media fragments would be for, IMO.


> * Section 1.2:
>  - This use case seems to be an application of 1.1, that is, linking for
> bookmarking. What about specializing 1.1 with this one? I will have the same
> remark for 1.4, see below.

I think it makes sense to keep these use cases separate. From a user
agent POV, the process of receiving, decoding and displaying is a very
different use case from storing a URI. The question to me would be:
what do we gain by merging them into one, other than a shorter list
and the risk that people point out use cases we have missed?


> * Section 1.3:
>  - (scenario 2): It is an interesting accessibility scenario, but I think
> the description should be a bit extended. What do you mean by "audio
> representations"? Audio tracks? Additional audio annotations? Both? When you
> said "... by using the tab control ...", do you have in mind a screen
> reader? Since this is not obvious for the casual reader, I would suggest to
> describe what exactly do you mean.

An audio representation is e.g. an extra audio track that provides a
description of what is happening in a video, or an extra text track
with the same description that can be rendered through braille or
through a TTS engine. Generally, accessibility people would know what
it is, but I have extended the second example and hopefully it is more
comprehensible now.


> * Section 1.4:
>  - This use case is again for me an application of 1.1, that is this time
> linking for recomposing (making playlist)

Recomposing poses very different challenges from an application that
just plays content back. I can see where you are coming from - for you
it is just a piece of content delivered to a user agent and you don't
really care what the user agent does with it. However, I am looking at
use cases from the user's POV. A user would not regard watching a
video as the same use case as editing clips together. I am just wary
that we might overlook some things if we throw these use cases
together too much.


>  - (scenario 2): I let Jack answer regarding the possibility of SMIL to be
> used as a background image ;-)

That's where I hoped an answer would come from. ;-)


>  - (scenario 4): Should we mention some formats for composing playlist of
> mp3 songs, formats that would allow to make use of media fragment URIs?

I have mentioned xspf, SMIL and RSS (which could be media RSS or
iTunes RSS) in the examples. Do you want to make these more explicit?
I don't think it is our mandate to analyse playlist formats, which is
why I left that out and just used some prominent examples.
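
To make one of those examples concrete, here is a quick Python sketch
that emits an xspf playlist whose track locations carry media fragment
URIs - the "#t=start,end" syntax is again just a placeholder, not an
agreed-upon scheme:

  # Build an xspf playlist whose <location> elements point at
  # (hypothetical) temporal media fragments of mp3 songs.
  from xml.sax.saxutils import escape

  clips = [
      ("http://example.com/song1.mp3#t=0,30", "Song 1, first 30 seconds"),
      ("http://example.com/song2.mp3#t=60,90", "Song 2, chorus"),
  ]

  tracks = "\n".join(
      f"    <track><location>{escape(uri)}</location>"
      f"<title>{escape(title)}</title></track>"
      for uri, title in clips
  )
  print('<playlist version="1" xmlns="http://xspf.org/ns/0/">\n'
        '  <trackList>\n' + tracks + '\n'
        '  </trackList>\n'
        '</playlist>')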


> * Section 1.5:
>  - (scenario 1): I think we should describe further this use case. For
> example, precise that Raphael does not create RDF descriptions of objects of
> his photos, but rather systematically annotates some highlighted regions in
> his photo that depicts his friends, families, or the monuments he finds
> impressive. This could be then further linked to the search use case (Tim).

Go ahead - this is why the use case bears your name. :-)
I hoped people would jump on "their" use cases and make them more complete!


> * Section 1.6:
>  - (scenario 1): I find it out-of-scope. I think it is worth to let it in
> the document but to say that this is where we think it is out of scope ...
> if everybody agrees :-)

I tend to agree. However, I would like to qualify why we think it's
out of scope. I think what should be in scope for fragments is where
we create what we called at the F2F "croppings" of the original file,
i.e. where we take the original file, make a selection of bytes, and
return these to the user agent. No transformation or recoding is done
on the original data (apart from potentially changing a file header).
This however means that as soon as we have a codec that is adaptable,
i.e. where a lower framerate version or a smaller image resolution can
be created without having to re-encode the content, we may have to
consider such use cases as being part of media fragmentation.
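
To make the "cropping" idea concrete, here is a rough Python sketch.
The constant-bitrate mapping and the fixed header length are invented
simplifications; a real container format needs a seek index to map
times to bytes:

  # A "cropping": map the requested time range onto byte offsets and
  # return those bytes untouched - no decoding or re-encoding, only
  # the file header is (possibly rewritten and) carried over.
  HEADER_LEN = 4096          # assumed size of the file header
  BYTES_PER_SECOND = 64000   # assumed constant bitrate of the payload

  def crop(path, start_s, end_s):
      first = HEADER_LEN + int(start_s * BYTES_PER_SECOND)
      last = HEADER_LEN + int(end_s * BYTES_PER_SECOND)
      with open(path, "rb") as f:
          header = f.read(HEADER_LEN)  # rewritten maybe, re-encoded never
          f.seek(first)
          payload = f.read(last - first)
      return header + payload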

Maybe what we can agree on is that this is potential for future work
as such codecs evolve and become more common. It is not our general
use case right now though.

What do ppl think?


>  - (scenario 2): I find this scenario also really borderline / out-of-scope.
>  As it has been pointed out during the face to face meeting in Cannes, the
> interactivity seems to be the most important aspect in the map use cases
> (reflecting by zooming in/out, panning over, etc.) and I guess we don't want
> that in our URI scheme. Do we?

I included this because we had a map use case. I'd be quite happy to
decide this to be out of scope, but wanted to give the proponents of
that use case a chance to speak up.


>  - (scenario 3 and 4): I love them since they introduce the need for
> localizing tracks within media, but I would suggest to merge these 2
> scenarios. Are they supposed to express different needs? I cannot see that.

I think blind users and karaoke users are two very different examples
of use. I'd like to keep these separate.
My understanding is that use cases help us create requirements (i.e.
technical needs). They are not meant to be non-overlapping for
technical needs. In fact, the more overlap we have for a technical
need from different use cases, the more important that technical need
becomes.


> * Section 2:
>
> I have hard time to understand what do you mean with these technology
> requirements. I understand the need for enabling other Web technologies to
> satisfy their use cases but I'm not sure this is strong enough to make a
> next headline. Actually, I can easily see all the subsections merged with
> the existing use cases, see below. Therefore, I would suggest to remove the
> section 2.

So, section 2 looks at media fragments from a technology POV, not from
a user POV. Yes, most of these technology requirements are expressed
in the use cases above. However, they are not expressed as such. It is
a different dimension along which to describe the needs that we have.

I have tried to write an introduction to this section. I believe it is
important to explicitly spell out these dimensions, so that people
from those areas that have a large interest in media fragment URIs
understand that they are being catered for.


> * Section 2.1:
>  - (scenario 1): this scenario introduces the need for having fragment names
> (or labels) in addition to their boundaries specifications in the URIs. I
> think we could add this scenario in the section 1.5 related to the
> annotations of media fragments.

I don't think the necessity of defining anchors originates simply
from the need to attach metadata to segments. In fact, I found it
really hard to argue this point from a simple user's POV, while it
makes total sense from a technology POV. This is why I have created
this section.


> * Section 2.2:
>  - this scenario seems to me connected to the scenario 2 of the section 1.2
> (bookmarking media fragments)

Hmm - 1.2 only has one scenario. I don't understand...


>  - CMML needs a reference

I did not reference xspf, SMIL, media RSS or any of the other
technologies that were cited in the scenarios either. We should either
cite them all or none of them. I just expected ppl to use Google for
that. :-)


> * Section 2.3:
>  - this scenario seems to me connected to the scenario 3 of the section 1.4
> (media recomposition)

Hmm ... there is no playlist in this example. It just explains the
"clipping". I do not see a connection between these two use cases.


>  - typo: "... while travelling to make the most of his time." => "... while
> traveLing to make the most of his time USEFUL".

It's a wiki, mate. :-)
Also, that "useful" at the end doesn't make sense from an English
language POV. I've instead added a comma, which might clarify the
sentence.


> * Section 3.1:
>  - I suggest to add a schema that corresponds to Silvia's drawing [2]

Is somebody with graphics skills around?


> * Section 3.3:
>  - "secondary" => same remark than previously, what about using the term
> "part of a resource" to designate the fragment?

"Secondary" is a technical term defined by the URI standard, see comment above.


>  - Do we also want to cover the case where the user wants to explicitly
> create a new resource?

I thought we discussed this at the meeting and decided against it.
This is why the use of the "#" is the only way to go: a fragment
addresses part of the existing resource and is resolved by the user
agent, whereas explicitly creating a new resource would need something
like a "?" query that the server resolves.


> * Section 3.7:
>  - Do we have access to referencable figures and/or estimates regarding  how
> the video traffic is spread in Internet nowadays in terms of protocols? I
> mean, how much traffic goes through http, rtsp, p2p? Silvia, does your
> company provide such stats?

No, we don't produce this kind of statistics. ComScore and Nielsen do,
to some extent.

This might give you an indication:
http://www.techcrunch.com/2008/10/31/yahoo-does-something-right-leapfrogs-to-no-2-spot-in-web-video/
and
http://www.techcrunch.com/2008/09/10/google-trounces-web-video-competitors-with-5-billion-views/
These essentially tell us that the largest *number of* videos is
viewed on YouTube/Google, MySpace and Yahoo, which means: Flash over
HTTP.


On the other hand, these two articles:
http://blogs.techrepublic.com.com/tech-news/?p=2078
and
http://www.betanews.com/article/Comcast_opens_up_negotiations_with_BitTorrent_on_bandwidth/1206629393
suggest that the most *bandwidth* on the Internet is being consumed by
BitTorrent over TCP.


I think that nowadays the share of RTP/RTSP is minuscule compared to
these other two.
Yet, I have found that YouTube provides video to mobile devices over
RTSP: http://www.techcrunch.com/2008/01/24/youtube-goes-more-mobile/ .
So, go figure...


> * Section 3.8:
>  - I find this requirement very strong and I feel we are still discussing
> the issue. Perhaps we can phrase that as: "we should avoid to decode and
> recompress media resource" ?

I'd like to have this discussion and come to a clear conclusion,
because it will make things a lot more complicated if we allow
recompression. Davy and I have discussed this thoroughly. Can ppl
express their opinions on this? In fact, is anyone for allowing
recompression (i.e. transcoding) in the media fragment URI addressing
process?


Cheers,
Silvia.

Received on Saturday, 8 November 2008 03:35:24 UTC