27249 – Correct the “All or Nothing” Approach Currently Implemented For HTMLMediaElement / MSE

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Status:	RESOLVED WONTFIX

Alias:	None

Product:	HTML WG
Classification:	Unclassified
Component:	HTML5 spec
Version:	unspecified
Hardware:	All All

Importance:	P2 enhancement
Target Milestone:	---
Assignee:	This bug has no owner yet - up for the taking
QA Contact:	HTML WG Bugzilla archive list

Reported:	2014-11-05 14:39 UTC by Karan Lyons
Modified:	2016-04-25 17:06 UTC
CC List:	8 users

Description Karan Lyons 2014-11-05 14:39:35 UTC

(Apologies for the length and possibly poor nomenclature of the below.)

Currently if you use an HTMLMediaElement:
1) The browser automatically determines the file types / codecs of the sources and chooses the “best” source to play.
2) The browser automatically parses the metadata of the chosen source in order to enable abilities like seeking, getting the duration, etc.
3) The browser implements some version of source buffering, but offers you no control over it: not even starting and stopping buffering at the very least, let alone things like setting how much to buffer ahead.
4) The browser exposes *none* of this known metadata (file type, codec, file structure for seeking, bitrate, etc.) with the exception of the height, width, and duration.

If you use MSE:
1) The browser expects you to know ahead of time if the data you’re putting into a SourceBuffer is decodable by the browser.
2) The browser expects you to know the metadata associated with the source in order to set the duration and implement seeking *yourself*.
3) The browser implements no buffering.

Those are your only two options. If you want to be able to do something as simple as starting or stopping buffering, you need to know all the associated metadata for your file (file type, codec, stco atom in the case of QT containers, height, width, duration, perhaps more). Your options are either to use a prepared manifest of all that required data, which is only possible if you know what content you’re to play ahead of time, or determine all that information dynamically, for which as a developer you’re now forced to use far less performant javascript, *despite the fact that the browser already implements fast compiled functions to do all this work*.

This “all or nothing” approach permeates both specs. As one example, HTMLMediaElement will tell you if it can play a source with canPlayType(), which requires a string of the format “filetype; codecs”. However it provides no way to *get* the format of a file, despite the fact that it can quite clearly determine this information (how else does it find the “best” source to play?).

Another example is that readyState change events, though fired based on buffer data, contain no snapshot of the TimeRanges buffered, despite the fact that those TimeRanges can and will change in time between the event being triggered and an event listener running and iterating through HTMLMediaElement.buffered.
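
One partial workaround, sketched here in JavaScript, is to copy `buffered` into a plain array as the first step of the listener. This does not recover the ranges as they were when the event was queued; it only keeps them from shifting while the handler does its work. `snapshotBuffered` is a hypothetical helper name, not part of any spec.

```javascript
// Copy a TimeRanges-like object into a plain array of { start, end } pairs.
// Works on anything exposing `length`, `start(i)` and `end(i)`.
function snapshotBuffered(ranges) {
  const snapshot = [];
  for (let i = 0; i < ranges.length; i++) {
    snapshot.push({ start: ranges.start(i), end: ranges.end(i) });
  }
  return snapshot;
}

// Hypothetical usage in a page (browser only, `video` is assumed to exist):
// video.addEventListener("canplay", () => {
//   const frozen = snapshotBuffered(video.buffered);
//   // `frozen` stays fixed even if more data arrives during this handler.
// });
```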

MSE seems to have fewer of these problems, though an obvious one is the requirement for a type string with addSourceBuffer(), despite (again) there being no way to *get* the type of a buffer/file.
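
To make the one-way nature of this concrete, here is a minimal sketch of the part that does exist: building a type string and checking it before calling addSourceBuffer(). The helper names `mediaType` and `addBufferIfSupported` are made up for illustration, and the codec values in the usage comment are only example inputs that the developer still has to know up front.

```javascript
// Build a type string of the form: container; codecs="a, b"
function mediaType(container, codecs = []) {
  return codecs.length === 0
    ? container
    : `${container}; codecs="${codecs.join(", ")}"`;
}

// Add a SourceBuffer only if the type is reported as supported.
// `isTypeSupported` is injectable so the logic can run outside a browser;
// in a page it defaults to the static MediaSource.isTypeSupported().
function addBufferIfSupported(
  mediaSource,
  type,
  isTypeSupported = (t) => MediaSource.isTypeSupported(t)
) {
  if (!isTypeSupported(type)) return null;
  return mediaSource.addSourceBuffer(type);
}

// Hypothetical usage in a page:
// const type = mediaType("video/mp4", ["avc1.42E01E", "mp4a.40.2"]);
// const sourceBuffer = addBufferIfSupported(mediaSource, type);
```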

This honestly seems fairly difficult to fix since it’s a bug in the thinking behind the implementations and not just some aspect of the implementations themselves. Moreover, it requires more to be changed in the HTMLMediaElement spec than MSE’s. But I do think it’s very important that these shortcomings are addressed.

At the very least, allowing greater control over the buffer (where by “greater” I just mean the ability to start and stop the buffer, and *perhaps* control its length) would allow most developers to stick with HTMLMediaElement as opposed to being forced to use the more complex MSE.

At the most there needs to be a far greater degree of transparency in all the work the browser does regarding media. Metadata, file type, codec information, etc. should *all* be exposed to the developer, along with functions for determining them from an arbitrary SourceBuffer/ArrayBuffer/Blob/etc. The aim should be to ensure that the handling of media can continue to be dynamic (e.g. no manifests) and performant (e.g. no javascript based container parsing to get metadata) regardless of where along the continuum of HTMLMediaElement <-> MSE the developer chooses to reside.

Comment 1 Silvia Pfeiffer 2014-11-06 09:33:41 UTC

>     1) The browser automatically determines the file types / codecs of the
> sources and chooses the “best” source to play.

That's not how it works. Out of a list of <source> elements, the browser picks the *first* one it can play, even if it's not the best one.
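
A rough sketch of that rule, in JavaScript: walk the candidates in order and stop at the first one that is not ruled out. This is a simplification under stated assumptions. It only looks at the declared type, whereas a browser also moves on to the next candidate when loading fails. `pickFirstPlayable` is a hypothetical name, and `canPlayType` is passed in so the logic can run outside a browser.

```javascript
// Return the first candidate whose declared type is not rejected outright.
// canPlayType() returns "", "maybe" or "probably"; only "" is a definite no.
function pickFirstPlayable(sources, canPlayType) {
  for (const source of sources) {
    // A source without a declared type cannot be ruled out ahead of time.
    if (!source.type || canPlayType(source.type) !== "") return source;
  }
  return null;
}

// Hypothetical usage in a page:
// const video = document.querySelector("video");
// const candidates = [...video.querySelectorAll("source")];
// const chosen = pickFirstPlayable(candidates, (t) => video.canPlayType(t));
```

Note that a later candidate rated "probably" loses to an earlier one rated "maybe": order wins, not quality.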

> At the very least allowing greater control over the buffer (whereby
> “greater” I just mean the ability to start and stop the buffer, and
> *perhaps* control its length) would allow most developers to stick with
> HTMLMediaElement as opposed to being forced to use the more complex MSE.

We've had discussions about exposing the ability to control buffering before. That's how the @preload attribute was created. It was, however, always deemed that the browser has more in-depth knowledge about the CPU state, the network state, the memory situation and similar information than the Web developer. Thus, browsers should be in a better position to decide about how much to buffer.

> At the most there needs to be a far greater degree of transparency in all
> the work the browser does regarding media. Metadata, file type, codec
> information, etc. should *all* be exposed to the developer, along with
> functions for determining them from an arbitrary
> SourceBuffer/ArrayBuffer/Blob/etc. The aim should be to ensure that the
> handling of media can continue to be dynamic (e.g. no manifests) and
> performant (e.g. no javascript based container parsing to get metadata)
> regardless of where along the continuum of HTMLMediaElement <-> MSE the
> developer chooses to reside.

There has been a desire to expose more metadata about the selected resource in the past. The question about metadata is always: which to pick. A whole W3C Working Group has discussed this issue and come up with a spec: http://www.w3.org/2008/WebVideo/Annotations/ . However, I am not aware of any efforts of browsers to implement these.

Comment 2 Karan Lyons 2014-11-06 18:12:09 UTC

Yes, it picks the first source it can play. I couldn’t find that word so settled for “best” in quotes. Sorry for the confusion there. The browser still is doing some amount of work to determine the file type and codecs of the various sources, but this information isn’t exposed to the developer (nor is there any paired function to go with canPlayType, something like getMediaType(URI)).

The preload attribute is useful, but doesn’t help in this case. I agree that theoretically the browser should be in the best position to determine how much to buffer, but removing control entirely from the developer drastically limits their options within HTMLMediaElement. Right now, as far as I can tell, the only way to mimic HTMLMediaElement’s streaming of data is with XHR hacks: stream the binary data as text, convert it back to an ArrayBuffer on progress events, and then appendBuffer() it. After all those machinations there does not *appear* to be a way to then remove some amount of the data (it might be covered by appendWindow; I’m still not certain). So there are definitely use cases that aren’t currently covered by either HTMLMediaElement or MSE, but could easily be if either implementation opened up just a *bit* more control to the developer. I understand that the question of where to draw that line is a difficult one, and though I can’t give a definitive answer, I at least can say that it needs to be moved a tad.
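
On the question of removing data: the MSE spec does define SourceBuffer.remove(start, end), which may cover this case. Below is a minimal sketch of evicting everything more than a few seconds behind the playhead. `evictBehind` and `keepSeconds` are hypothetical names, and the function takes the SourceBuffer as a parameter so the logic can be exercised with a stand-in object.

```javascript
// Remove everything more than `keepSeconds` behind the playhead.
// Returns the removed range, or null if nothing was removed.
function evictBehind(sourceBuffer, currentTime, keepSeconds) {
  // remove() throws while an append or remove is still in progress.
  if (sourceBuffer.updating) return null;
  if (sourceBuffer.buffered.length === 0) return null;

  const start = sourceBuffer.buffered.start(0);
  const end = currentTime - keepSeconds;
  // remove() also rejects an empty or inverted range.
  if (end <= start) return null;

  sourceBuffer.remove(start, end);
  return { start, end };
}

// Hypothetical usage in a page, keeping 10 seconds of back-buffer:
// video.addEventListener("timeupdate", () => {
//   evictBehind(sourceBuffer, video.currentTime, 10);
// });
```

The removal is asynchronous: the SourceBuffer stays in the updating state until it fires `updateend`, so further appends have to wait for that event.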

At the very least, regarding metadata, anything the browser is expected to use in order to set up and run playback should also be available to the developer. The first obvious one to me would be file type and codec information. This doesn’t have to change in HTMLMediaElement, but in MSE at least there shouldn’t be a need for manifests and the predetermining of media information when we’ve proven with HTMLMediaElement that this is stuff the browser can handle. There should be methods available in order to leverage what the browser can already do.

One shouldn’t have to write a streaming mp4/webm parser in javascript in order to get codec information for MediaSource.addSourceBuffer() when it’s clear the browser already has some way of getting this information or handling bad file types/codecs. (I truly can’t think of another way, and I’m not trying to be facetious. I looked through the specs, tried everything I could think of, and then wrote a partial mp4 parser to process streaming data from XHR with an overridden MIME type.)
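
To give a sense of what such a parser involves, here is a minimal sketch of only its first layer: listing the top-level boxes of an MP4 (ISO BMFF) byte stream. It is not the parser described above, and `listBoxes` is a hypothetical name. Codec information sits several levels deeper (in the sample description box under moov), so a real parser has to recurse and decode box contents as well.

```javascript
// List top-level ISO BMFF boxes found in `bytes` (a Uint8Array).
// Stops at the first incomplete header, so it can be re-run as data arrives.
function listBoxes(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const boxes = [];
  let offset = 0;

  while (offset + 8 <= bytes.byteLength) {
    let size = view.getUint32(offset); // 32-bit big-endian box size
    const type = String.fromCharCode(
      bytes[offset + 4], bytes[offset + 5], bytes[offset + 6], bytes[offset + 7]
    );
    let headerSize = 8;

    if (size === 1) {
      // A 64-bit "largesize" follows the type field.
      if (offset + 16 > bytes.byteLength) break;
      size = Number(view.getBigUint64(offset + 8));
      headerSize = 16;
    } else if (size === 0) {
      // The box extends to the end of the data.
      size = bytes.byteLength - offset;
    }
    if (size < headerSize) break; // malformed; avoid looping forever

    // `complete` is false when the payload has not fully arrived yet.
    boxes.push({ type, offset, size, complete: offset + size <= bytes.byteLength });
    offset += size;
  }
  return boxes;
}
```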

Comment 3 Philip Jägenstedt 2014-11-14 13:34:08 UTC

What's the use case for exposing container and codec formats for plain HTTP playback? Other metadata like the usual ID3 tags in MP3 I can see, but the codec information seems like a nice-to-have with no obvious use case.

As for control over the buffering, what problems are you trying to solve? Bandwidth waste? The built-in heuristics being too pessimistic and ending up buffering? Something else?

Comment 4 Karan Lyons 2014-11-14 19:18:28 UTC

For codec information there’s the desire for API symmetry, and more practically MSE SourceBuffers require that you specify the filetype and codec information. Given that this is determinable by the browser, the ability to get that information should be exposed to the developer. It also can be useful to have in plain HTMLMediaElement scenarios, for example: given a list of <source>s with varying quality and multiple filetypes (say HD and SD versions of audio/video in ogg and mp4), allow the user to switch between the two quality versions, supplying them with the correct filetype/codec based on what their browser can play. Requiring the end user or developer to supply codec information as an attribute on the source is less than ideal, again given that the browser is fully capable of working this out itself. And guessing based on file extension (the browser is currently playing “.mp4”, so the other mp4s *probably* work) is again not ideal because, as unlikely as that is to be incorrect, it’s still a guess.

As for the buffer scenario, the goal is to account for waste in various scenarios. A good example (and one very similar to my own) is handling media playback when the user’s actions influence the amount of media you want to play post initialization. Even being able to do some form of media fragment URI-like functionality would suffice (but *without* having to reload the source, which would require a momentary media dropout, and is thus a bad user experience). This can be useful for things like preroll ads that aren’t required to be watched the whole way through, or switching source qualities on the fly without requiring the user to experience a dropout, among other cases. Controlling the buffer is important for quality switching, as buffering two sources makes it harder to move up to a higher quality. Going full MSE to solve this problem is also not a good solution at this time, again because it requires up-front knowledge about filetypes/codecs, something you can’t provide if you don’t control the sources, and that (again) the browser is fully capable of figuring out itself.

There is also certainly the matter of browser heuristics for buffering varying from vendor to vendor, and while it would certainly be great to have full control over buffering (for example, setting the “window” as a TimeRange of data that you’d like to be buffered, which can be updated at any time), I can certainly understand some of the argument against it. Even still, it’d provide some very helpful functionality for many developers looking to do more complex work with HTML5 media.
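
As a sketch of what the “window” idea could mean in practice: given a desired window and the ranges already buffered, work out which spans still need to be fetched. No such browser API exists; `missingRanges` is a hypothetical helper, and it assumes the buffered ranges arrive as a plain array of { start, end } objects that are sorted and non-overlapping.

```javascript
// Return the gaps inside `windowRange` that `buffered` does not cover.
// `buffered` must be sorted by start time and non-overlapping.
function missingRanges(windowRange, buffered) {
  const gaps = [];
  let cursor = windowRange.start;

  for (const range of buffered) {
    if (range.end <= cursor) continue;         // entirely before the cursor
    if (range.start >= windowRange.end) break; // entirely after the window
    if (range.start > cursor) gaps.push({ start: cursor, end: range.start });
    cursor = Math.max(cursor, range.end);
    if (cursor >= windowRange.end) break;      // window fully covered
  }

  if (cursor < windowRange.end) gaps.push({ start: cursor, end: windowRange.end });
  return gaps;
}

// Example: window 10-40s with 0-15s, 20-25s and 38-60s already buffered
// leaves 15-20s and 25-38s to fetch.
```

Anything buffered outside the window would be a candidate for eviction, and updating the window at any time just means recomputing the gaps.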

Comment 5 Silvia Pfeiffer 2014-11-14 21:03:40 UTC

Some browsers support MPEG-DASH in the video element. I think it solves some of your problems.

Comment 6 Arron Eicholz 2016-04-25 17:06:24 UTC

HTML5.1 Bugzilla Bug Triage: Incubation needed

This bug constitutes a request for a new feature of HTML. Per the current guidelines [1], rather than tracking such requests as bugs or issues, please create a proposal outlining the desired behavior, or at least a sketch of what is wanted (much of which is probably contained in this bug), and start the discussion/proposal in the WICG [2]. As your idea gains interest and momentum, it may be brought back into HTML through the Intent to Migrate process [3].
[1] https://github.com/w3c/html#contributing-to-this-repository
[2] https://www.w3.org/community/wicg/
[3] https://wicg.github.io/admin/intent-to-migrate.html