<scribe> scribenick: kaz
<scribe> scribe: kaz
<kaz> Recorded video from this call
ChrisN: A few topics today, we've
been doing a series of calls looking at the APIs in scope of the
Media WG.
... The two main ones not covered so far are Media Playback Quality
and Autoplay Policy Detection.
... Can also mention planning for the IG F2F meeting at TPAC.
Anything else?
Song: Can I present the bullet curtain proposal? Can wait until the next call, though.
ChrisN: Yes
Jeff: I wanted to mention that Mark
Vickers would like to step down as co-Chair after TPAC.
... I've had good dialog with current chairs about new co-chairs,
have some good candidates. If anyone here also would like to
suggest someone, please reach out to me.
<Barbara_H> Should we review Media Capture in this group at some point?
ChrisC: I've recently taken a closer
look at this API. Google has been talking about how to do better
reactive playback quality signals for a while, now scheduled some
time to look at it.
... The API is very simple, the most useful property is the number
of dropped frames relative to the number of decoded frames.
... Sites like YouTube, Netflix, and others use this to adapt
between bitrate quality levels to ensure not too many frames are
dropped, for a good experience.
... The API is shipped in most major UAs, with the notable
exception of Chrome.
... This is about prioritisation, and some concerns about the spec.
The API isn't perfect, the definitions in the spec are coarsely
written, e.g.,
... what is a dropped frame, exactly? We can drop frames for
different reasons, are some more important than others?
... That said, everybody's shipped it, and there's a prefixed
dropped frame attribute and total frames attribute in Chrome.
... Websites largely treat these as interoperable, but there's some
evidence that they're not. We should make this true, if it's not
currently true.
... In the next quarter and beyond, our priority is to get this
shipped, it's unpleasant that it isn't.
... A prefixed version is shipped in Chrome, something we'd like to
eliminate.
... I'm planning to spend some time on the spec, clarify the
definitions, and work with other UAs to make sure it's
interoperable.
... For dropped frames, it's about performance. It wouldn't be
right to count a frame as dropped if the machine had it ready in
advance, but the monitor refresh rate is so high that you never saw
the frame.
... Would be a poor choice for the site to adapt to that
condition.
Kaz: Any resource on the API?
<chcunningham> https://w3c.github.io/media-playback-quality/
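As a sketch of the poll-based usage just described: `getVideoPlaybackQuality()` and its `totalVideoFrames` / `droppedVideoFrames` attributes come from the draft spec, while the 1-second interval and the 10% threshold below are illustrative assumptions, not anything the spec mandates.

```javascript
// Pure helper: fraction of frames dropped between two quality snapshots.
function droppedRatioBetween(prev, curr) {
  const decoded = curr.totalVideoFrames - prev.totalVideoFrames;
  const dropped = curr.droppedVideoFrames - prev.droppedVideoFrames;
  return decoded > 0 ? dropped / decoded : 0;
}

// Browser wiring (requires a page with a <video> element).
// Interval and threshold are illustrative choices.
function monitorQuality(video, onTooManyDrops) {
  let prev = video.getVideoPlaybackQuality();
  setInterval(() => {
    const curr = video.getVideoPlaybackQuality();
    if (droppedRatioBetween(prev, curr) > 0.1) {
      onTooManyDrops(); // e.g. step down one bitrate level
    }
    prev = curr;
  }, 1000);
}
```

A player of the kind ChrisC mentions would call `monitorQuality(video, stepDownBitrate)` once playback starts.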
ChrisC: Another property we want to
clarify, or remove altogether, is corrupted frames.
... It's not something we've been able to implement in
Chrome.
... If the video is corrupted, you'll generally get a decode error.
If a single frame is corrupted, you may never know about it.
... The end user may see artifacts, but the next i-frame will come
along and fix the corruption, or there may be a decode error that
terminates the playback.
... So there isn't really a situation in Chrome where we could
accumulate a count of corrupt frames.
... Not clear what that would mean to app developers.
... I'll nominate to cut this from the spec.
... We'd also like to make some additions and some ergonomics
changes, mostly backwards compatible.
... This API is poll based. I did a quick survey in the video-dev
Slack group about what players are using, what polling
interval.
... Seems to be largely on the order of 1 second, but we don't have
a lot of data. Polling is unpleasant.
... We're open to discussing adding an event when the dropped frame
count increases, which is hopefully less than once per
second.
... Alternatively, something like the performance observer API
where you could subscribe to changes you're interested in.
... We can't raise an event on every change, as the decoded frames
total is constantly changing.
... Safari added a display composited frames field, which counts
frames rendered using the hardware overlay path.
... It's not in the spec, people have asked for it in Chrome, seems
useful and straightforward to add.
... Other things from FOMS last year: signalling the frame rate,
active bit rate, active codec.
... This would be helpful for sites to adapt. Would be nice for the
API to surface at a clean boundary when the stack has reflected the
adaptation.
... Codec is tricky. Chrome is increasingly strict about requiring
valid codec strings in canPlayType and Media Capabilities, to give
unambiguous answers.
... It's hard to produce a codec in the reverse direction. A valid
codec string with profile and level, we know large parts of
that.
... For MP4, could do that reliably, but tricky for every container
everywhere.
... Not unsolvable, it's something we need to think about. Surfacing
the codec at this point is less important than those other
things.
... Another thing is decoder health signals. In the
HTMLMediaElement spec, there's readyState: HAVE_CURRENT_DATA,
HAVE_FUTURE_DATA, HAVE_ENOUGH_DATA.
... Not clear to me what these mean in terms of being about
network, or if my computer has decoded enough frames to pre-roll
and display the data.
... Seems to be mostly about network. It's described in the spec
under the resource fetch algorithm.
... This means there aren't clear signals for when the decoder is
not keeping up.
... In Chrome, HAVE_CURRENT_DATA is signalled either in the network
condition or in the decoder condition, so it's a conflation of
things.
... It would be interesting to have a separate signal to say if
your computer is not able to decode enough frames and the video is
frozen on the screen for a moment, this is an underflow caused by
the system. It may or may not be reflected in an increment to
droppedFrames, but would be an interesting signal for
adaptation.
... Another is time to decode. If your frame rate is 30
frames/second and you're decoding slower than that, another
interesting signal to consider for adapting.
... Those are my high level plans. Any questions or proposals for
things to add to the API?
ChrisN: I have a question about dropped frames, does this become a proxy for CPU usage on the device, if it can't keep up with decoding and rendering?
ChrisC: Yes it is, sort of. There are some cases where the CPU may be quite high and you're not dropping frames, so not a perfect proxy, but it shows that the system is under strain.
ChrisN: Thinking generally about QoE
measures, inability to fetch the resource, or buffer underruns,
things that stall the playback.
... There are other APIs we can use to get that kind of
information, so not everything needs to come from Media Playback
Quality?
ChrisC: This is true. If you're using
MSE, you have a full picture of how the network looks, regarding
fetching the chunks.
... There's the readyState on the HTML video element. I believe
this to be more a resource fetch and network thing,
... although in Chrome these are mixed up, not sure what other
UAs have done.
ChrisN: Anyone else have thoughts or things you want to see from this API?
(none)
ChrisN: Looking back, there's a WHATWG page about video metrics
<cpn> https://wiki.whatwg.org/wiki/Video_Metrics
ChrisN: This has a survey of metrics
available from different media players at the time.
... This links to a couple of bugs, which were referred to the Web
Performance group as more general network-type issues.
... Not sure to what extent this list is still relevant today, but
may be worth looking at again.
ChrisC: That's definitely true. It
has some things we have today, some things we don't.
... I see some definitions of bitrate here, start up time.
... Some of them could be hard to do, like latency, where the app
knows better than the UA.
... But this is a good resource, still relevant.
ChrisN: My next question is about
codecs. There's a negotiation between the web app and the UA,
... using Media Capabilities API to understand what codecs are
available,
... and what's compatible with the content I want to fetch.
... What role do you see for Media Playback Quality, e.g., is it
where the UA makes adaptation decisions?
ChrisC: We see them working in
tandem. With Media Capabilities, you'll get a clear answer about
what codecs are supported.
... But as many codecs are decoded in software, or as new codecs
come out, like AV1, push the limits of what the system can
do,
... decoders are being optimised, the proliferation of 4K,
etc.
... Media Capabilities is not a guarantee. We say the codec is
supported and will be smooth.
... But the smooth part of that claim is a prediction. It's
possible the user will begin playback of something that has
historically been smooth,
... but then also begin to play a resource-intensive video game, and
this would affect video playback.
... We encourage people to use the Media Capabilities API to know
up front what the limitations are, what codecs might be power
efficient on the device, so they can do the optimal thing for the
user.
... Then we encourage players to listen for these reactive metrics
to see if anything about the predictions has changed.
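The "predict up front, react later" pattern ChrisC describes might look like this sketch. The `decodingInfo` function is injected so the selection logic is testable; in a page it would be `navigator.mediaCapabilities.decodingInfo.bind(navigator.mediaCapabilities)`. The configuration list and its labels are hypothetical.

```javascript
// Pick the first configuration Media Capabilities predicts will be
// smooth; the caller then watches Media Playback Quality to react if
// the prediction turns out wrong (e.g. a game starts competing for CPU).
async function pickSmoothConfig(configs, decodingInfo) {
  for (const config of configs) {
    const info = await decodingInfo(config);
    if (info.supported && info.smooth) return config;
  }
  return null; // nothing predicted smooth; caller falls back to a low tier
}
```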
ChrisN: I understand that this was part of MSE, then dropped because of lack of interoperable implementations needed for a W3C Rec.
ChrisC: That's right. It's good that
it was dropped, as it's something that people should be aware of
regardless of Media Source Extensions.
... It does feel at home on the media element.
ChrisN: And so it could be applied to other video resources, e.g., getUserMedia. Is it hooked up there as well?
ChrisC: Absolutely. I think all the same notions apply, you could be dropping frames. Also the new thing from Apple, could be applied.
ChrisN: Any other comments/questions?
Barbara: The relationship with Media Capabilities, any thoughts on how those two would work together?
ChrisC: Those two specs are totally
separate. Internally in Chrome, we use droppedFrames as a
historical signal for what your capabilities would be.
... The Media Capabilities spec doesn't require that; it allows
implementations to use whatever heuristics they like. Firefox and
Safari probably do something different than Chrome.
... There could be an opportunity for the specs to reference each
other, for certain definitions. For example, the Media Playback
Quality may surface when a decode is hardware accelerated,
something Media Capabilities already does.
Barbara: From a hardware vendor's perspective, thinking about changing the configuration based on information either from the playback or media capabilities, on which hardware decoder to be using.
ChrisC: Media Capabilities has a
isPowerEfficient property, which is not quite the same as being
hardware decoded, but is meant to be effectively the same thing,
for developers who want to save battery.
... If you're just concerned about performance, then use
smoothness.
... The reason it's not exactly hardware decoded, and described as
power efficient, is that at lower resolutions and lower frame rates
all video is more or less the same in terms of efficiency, whether
it goes through hardware or software.
... Media pipelines below a certain resolution cutoff will opt to
decode in software, even if they have the hardware resources.
... It could be interesting to define some of these terms in one of
the specs, and cross reference.
Barbara: I'm glad to see that both specs are done within one Working Group.
Will: The Media Playback Quality
interface has absolute count of frames. That means anyone who wants
to monitor in terms of frame rate or dropped frame rate has to
implement some sort of divide by time or clock.
... It would be useful if the API could provide dropped frame rate,
and the UA keeps track of the frame rate. Often it's the frame rate
that would trigger a switch, rather than the absolute count.
ChrisC: That's a great point. We'd have to figure out a good boundary for a window for the running average, assuming you don't want it windowed over the entire playback.
Will: Correct. A reasonable one would
be 1 second; we could get feedback from player developers.
... More importantly, how that correlates. Typically you want that
as a quick predictor to be able to react,
... so I'd prefer a shorter time window that has more variance as
you play through, but you can use it as a switching metric, which I
believe is the intent.
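Until a UA-provided rate exists, the windowed rate Will asks for has to be computed app-side from the absolute counters, roughly like the sketch below (the 1-second window length is the value floated on the call; the shape of the samples is an assumption).

```javascript
// samples: array of { t, dropped } snapshots, oldest to newest, where
// t is a timestamp in seconds and dropped is the absolute
// droppedVideoFrames counter polled at that time. The caller keeps
// only samples within the chosen window (e.g. the last second).
function droppedFramesPerSecond(samples) {
  if (samples.length < 2) return 0;
  const first = samples[0];
  const last = samples[samples.length - 1];
  const dt = last.t - first.t;
  return dt > 0 ? (last.dropped - first.dropped) / dt : 0;
}
```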
ChrisC: I like it.
ChrisN: What's the best way to send feedback, as issues in GitHub?
ChrisC: Yes, or you can contact me personally.
Chris: Would you like to give an introduction to the background, and the problem this API intends to address?
Becca: I'm a software engineer at
Google, been working on autoplay for a while.
... Right now, there's no way for a website to detect whether it
can autoplay before it calls play() on the media element.
... We want to give sites a way to determine whether a video is
going to play or not.
... This is so they can pick different content or change the
experience. For example, they might want to mute the video, or do
other things.
... The API is relatively simple. There's something at the document
level that determines the autoplay policy.
... This can either be: autoplay is allowed, muted autoplay is
allowed - in that case the site may want to change the source or
change the UI - or autoplay is disallowed and it will be
blocked.
... There's another check at the media element level, which returns
a boolean.
... This is designed so that you can set the source on the media
element, then query whether it will autoplay.
... This will look at the document autoplay policy, and the
metadata on the source, e.g., is there an audio track? Then it will
return whether it can autoplay, before you call play().
... We're still working out the ergonomics of the API. I'd be happy
to take questions or suggestions.
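A sketch of how a page might branch on the document-level policy. The value names ("allowed", "allowed-muted", "disallowed") follow the draft explainer and may change; the helper and its return strings are purely illustrative.

```javascript
// Map the proposed document-level autoplay policy to a player action.
function autoplayAction(policy) {
  switch (policy) {
    case 'allowed':
      return 'play';           // play with sound
    case 'allowed-muted':
      return 'mute-then-play'; // switch source or mute, then play
    default:
      return 'show-poster';    // autoplay blocked: wait for a gesture
  }
}
```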
ChrisN: Could you please expand on the difference between the document level and media element level?
Becca: The document level gives you
the autoplay policy: allowed or disallowed. At the media element
level, it's taking into account the source.
... Some browsers have some autoplay restrictions at the media
element level, so this will take those into account.
ChrisN: And so the purpose is to allow the web app to make decisions on what content to present based on the policy that's applied.
Becca: A common use case is if
autoplay is allowed but only for muted content, some sites may want
to switch the source and play different content.
... Or update the UI and show something to the user, to show
autoplay is blocked.
ChrisN: I remember when the autoplay
restrictions were implemented a couple of years ago, it was
controversial,
... e.g., in the Web Audio community where they were
programmatically generating audio content.
... This doesn't apply just to media coming from a server, also
synthesized content from Web Audio.
Becca: Yes, the document level autoplay also applies to Web
Audio. We don't have a plan to add a method to AudioContext,
... but you can use the document level API to check before creating
an AudioContext.
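A sketch of the Web Audio check Becca describes. A document-level `autoplayPolicy` attribute is the explainer's proposed name, not a shipped API; the document and constructor are injected so the logic is testable outside a browser.

```javascript
// Only construct an AudioContext when the document-level policy says
// playback would not be blocked; otherwise wait for a user gesture.
function maybeCreateAudioContext(doc, AudioCtx) {
  if (doc.autoplayPolicy === 'disallowed') return null;
  return new AudioCtx();
}
```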
... Any questions?
Mark: You mentioned two cases: a
media source with no audio track, and a site attempting to play
video with an audio track but the media element is muted.
... Are those cases considered differently for autoplay policy, or
are they equivalent?
Becca: They should be essentially equivalent.
Mounir: Not all browsers actually check for the audio track. So if you want to be cross-browser compatible you may just want to mute instead of removing the audio track.
ChrisN: And the circumstances where autoplay is allowed will vary between UAs, as they can be using different criteria.
Becca: That's correct, and why we recommend sites use this API rather than trying to predict what the policy is.
ChrisN: As a developer, is there a way that I can test how my site behaves with each different policy level applied to it?
Becca: Right now there is no way to test that, but it's something we can consider adding, I think.
ChrisN: We had difficulty with our
player component, which was in an iframe, and the user interaction
in the containing page,
... and we wanted to send a message to the iframe to tell it to
play, and this was blocked by autoplay.
Becca: For Chrome, there's an
autoplay policy command line switch that you can use to set
different autoplay policies.
... For example, one of these is "no user gesture required", which
would allow anything to autoplay.
Greg: How is the media element level API different to calling play() and having it reject, and then reacting to that?
Becca: It's kind of similar. Autoplay allows you to check before calling play(), so you can change the source or make a change in the UI.
Greg: I could call play(), then if it
rejects I could mute then call play() again. Or, if it rejects,
we'd leave up a still image.
... Also, IIUC, this API requires you to have metadata, so it seems
you have to do the work to get ready to play, so it seems similar
to calling play() and seeing what happens.
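The play()-and-catch pattern Greg describes is commonly written like this today (a sketch; the returned status strings are illustrative):

```javascript
// Try to autoplay with sound; on rejection, fall back to muted
// playback; if that also rejects, leave the poster/still image up.
async function tryAutoplay(video) {
  try {
    await video.play();
    return 'playing';
  } catch (_err) {
    video.muted = true;
    try {
      await video.play();
      return 'playing-muted';
    } catch (_err2) {
      return 'blocked';
    }
  }
}
```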
Will: Feature detection vs error is
never a robust architecture. While you can call play() and see that
it doesn't play, there are other reasons it may not play besides
autoplay.
... So I think it's a cleaner implementation if there's an explicit
API to figure out the autoplay vs relying on error conditions on a
play request.
Greg: Does play reject for any other reason?
Will: I don't know for sure, but it might in future, so your assumption today becomes brittle whereas an API explicitly about autoplay gives a clearer picture.
Mounir: One reason is specific to
Safari on iOS. They have a rule that autoplay is per-element, so we
want to be mindful of that, and have a per-element API.
... What a lot of websites do is create an element, try to play
using it, the element has no audible track, or no source. If it
rejects, they know it can't autoplay. Then they create the real
element after that.
... That can be done, and would work as well as the media element
API, but on Safari on iOS, not a good way of doing things, as the
second media element you create would not be allowed to play
because of the user gesture.
... The document level API solves other problems, but the media
element level API is there to solve this Safari iOS issue.
ChrisN: What is the current status? There isn't a draft specification on GitHub, is this being worked on at the moment?
Becca: We're still working on it, there's discussion on the shape of the API, and we'll discuss at TPAC and hopefully resolve.
ChrisN: I saw some discussion on whether the API should be synchronous or asynchronous, don't need to go into that now.
Mark: There is an explainer document in a branch.
Becca: Yes, it's in a pull request
<cpn> https://github.com/w3c/autoplay/blob/beccahughes-explainer/explainer.md
Mark: If the document level API says that autoplay or muted autoplay is allowed, is that a guarantee it's allowed, or do I also need to check the media element API?
Becca: If the document level API says
it's allowed, you should not have to check at the media element
level.
... We're thinking of adding a fourth state at the document level,
which is unknown, so please check the media element level.
Greg: Would the media element API detect that it's in a user gesture?
Becca: Yes, it should be able to.
ChrisN: Any further questions?
(none)
ChrisN: Thank you Becca, this was really helpful. We can send feedback on the GitHub repo?
Becca: Any issues are welcome
ChrisN: Also please join the Media WG as well.
<Song> https://w3c.github.io/danmaku/index_en.html
ChrisN: Can you give an introduction, as people might not be familiar?
Song: There's a description in the
introduction. The name comes from the Japanese word
"Dan-maku".
... After discussion, we chose "Bullet Chatting" as the official
name.
... It's for dynamic comments floating over a video or static
image, at a specific point in time in the video.
... It brings an interesting and unexpected experience to the
viewer.
... There's a picture...
Kaz: (suggests Song share his screen on WebEx)
Song: Figure 1 shows a typical bullet
chatting screen. There's text floating over the video, which makes
watching the video more fun.
... We did some research across several solution providers, to
compare the attributes and properties they use: appearance, time
duration, font size, colour, timeline, and container.
... (1.2) As characteristics, there's the independence of space,
deterministic rendering, uniformity of modes.
... The important feature of bullet chatting is that it's quite
flexible compared to traditional comment presentations.
... (1.3) There are four modes of bullet chatting: rolling, reverse
bar, top and bottom, basically four directions for the text.
... For example, rolling mode is the most used mode, which scrolls
the text from the right to the left.
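The rolling mode Song describes can be sketched with the Web Animations API; the 120 px/s speed and the styling choices are illustrative assumptions, not part of the proposal.

```javascript
// Time a comment needs to cross the container plus its own width.
function rollingDurationMs(containerWidth, textWidth, pxPerSec = 120) {
  return ((containerWidth + textWidth) / pxPerSec) * 1000;
}

// Browser wiring: float one comment right-to-left over the video area.
function fireComment(container, text) {
  const el = document.createElement('span');
  el.textContent = text;
  el.style.position = 'absolute';
  el.style.left = '100%';
  el.style.whiteSpace = 'nowrap';
  container.appendChild(el);
  const distance = container.clientWidth + el.clientWidth;
  el.animate(
    [{ transform: 'translateX(0)' },
     { transform: `translateX(-${distance}px)` }],
    { duration: rollingDurationMs(container.clientWidth, el.clientWidth),
      easing: 'linear' }
  ).onfinish = () => el.remove();
}
```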
... (1.4) Regarding the commercial operation of bullet chatting,
here are some figures. For example, iQiyi is a top video service in
China, with 575 million monthly active users.
... For Niconico in Japan, the usage is also high. Every service
provider listed here provides bullet chatting for every
player.
... The functionality can cover lots of subscribers of the movie
content.
... (2) WebVTT and TTML are relevant to the implementation of
bullet chatting.
... (3) As background for the use case, Figure 2 shows the typical
chat room, the comments scroll on the right hand side at a fixed
speed.
... Figure 3 shows bullet chatting, where we present the comments
on screen.
... The advantage of bullet chatting over a typical chat room is
that in a chat room the messages scroll by quickly.
... For bullet chatting, there's a higher density of information
because it's presented over the full screen video, so there's a
wider display area, which gives a better user experience reading
individual messages.
... In a normal chat room, every message scrolls up at the same
speed, so it's hard to handle any single message specially, but in
bullet chatting mode each message moves separately, with its own
path and update frequency, so it's easier for users to read the
comments.
... (Figure 6) In a typical chat room, if you're watching the video
content in the player, it's difficult to concentrate on both the
video area and the comment area.
... In the bullet chatting mode, you don't need to move your eye
from the left to the right.
... There's also some advantage for the reading habit, as with
bullet chatting the moving direction is right to left, and for
people who have a habit of reading left to right it can be more
convenient for them to read and understand the whole message.
... In common understanding, having text floating over the video
can be distracting for the user. But another perspective from
social psychology is that, even when watching a video alone, seeing
the comments gives a feeling of joining a group activity, and
placing the comments over the video makes the user more cheerful.
... Bullet chatting can show the text and video together in a
comfortable way, with a sense of participation in a group activity
for the user.
... Without moving their eyes from the video content, the user is
able to read others' comments on the specific scene or upcoming
clip. This increases everyone's social presence, particularly for
millennials.
... (4.1) The comment content can be used for on-demand video
interaction, or live streaming interaction.
... (4.2) The chatting can be a direct interaction between the
anchor and the viewers in a live stream.
... (4.3) Regarding the use of the bullet chatting data, if there
are lots of comments at a specific point in the video, this
indicates a lot of users are interested in that specific point, so
it can be used for data analysis of consumer behaviour.
... (4.4) It can also increase the effect of the video
content.
... (4.5) Another usage is for interaction within a web page. For
example, with WebVTT the comments are tied to a video, but bullet
chatting can apply simply to any webpage.
... (4.6) With an interactive wall, the host can present comments
from attendees on a wall. In this case there's no relationship
between the comments and the video. From our understanding, this is
a case that WebVTT can't cover.
... (4.7) Masking is used to avoid conflicts between the comments
and the video content; the comments will avoid overlapping the
people on the screen.
... (4.8) Another example is non-text bullet chatting, for
example, emojis.
... (5) There's a recommended API. I won't go into the details.
Anyone who's interested can read the details in GitHub.
... (6) A gap analysis of bullet chatting and WebVTT. WebVTT is a
file format for creating external text track resources, which is a
fairly fixed format. Bullet chatting is more flexible in presenting
text over the video content.
... Considering bullet chatting as a subset of WebVTT, there are
difficulties, such as the interaction and tracking. The WebVTT
formatting is done in a fixed way, so we think bullet chatting
could be a separate design rather than a subset of WebVTT. That's
the main result from this proposal.
ChrisN: Thank you. This has really progressed since the last time we spoke about this.
Song: There are some other companies involved in this proposal, the video service members of W3C.
ChrisN: What's the next step? Do you want feedback from other IG members?
Song: If the other people are
interested, we plan to collect issues and use cases in
GitHub.
... We can make a proposal based on the use cases, complemented by
other members.
ChrisN: I would suggest more detailed discussion with TTWG members, as they develop WebVTT, so that's a good place to raise those things.
Song: I agree.
ChrisN: Thank you for sharing this, good to see the progress.
<cpn> scribenick: cpn
Kaz: Is there any issue with the
rights in the pictures used (figure 10, 11, etc)? We should use
content right free pictures.
... When Angel Li from Alibaba gave a presentation on this at AC
meeting in Quebec in April, I mentioned Niconico's work from Japan,
also EmotionML for additional emotion related information.
... This is a mixture of a timed text format, with positional and
emotion information. It's a kind of extension to WebVTT.
... I agree with Chris, to talk to TTWG, but also look at
EmotionML, JSON-LD and other semantic notation.
<kaz> scribenick: kaz
Song: I understand your feedback.
We'll speak with the TTWG members, to see if we can make it a
subset of WebVTT or a new API.
... We'll clarify the copyrights. Thanks for the reminder.
ChrisN: Thank you. I look forward to
continuing this conversation.
... Are you planning a meeting at TPAC for this?
(Discussion of meeting possibilities at TPAC: break out session, meeting with TTWG)
ChrisN: I am currently working with
the IG co-chairs on the agenda for the F2F meeting at TPAC.
... The timing of TPAC isn't ideal, due to overlap with IBC, so not
everyone can come.
<cpn> https://www.w3.org/2011/webtv/wiki/Face_to_face_meeting_during_TPAC_2019
ChrisN: We want to use some time in
the afternoon for open discussion on future directions for media on
the web.
... Look ahead, think about use cases and requirements for future
capabilities.
... If you are coming to TPAC, please take a look at the agenda.
This is our meeting, so please let me know if you have
suggestions.
Kaz: September 3?
ChrisN: Yes, again please let us know
about topics.
... Chris and Becca, thank you for your presentations!
[adjourned]