Wide review of Media Capture and Streams

The WebRTC and Device APIs Working Group requested wide review of the Media Capture and Streams document as part of the process-2005 Last Call review process in April 2015.

No formal objections have been raised against the specification.

Input from W3C Groups

Commenter | Comment | Working Group decision | Commenter reply
LC-3027 Phillips, Addison <addison@lab126.com> on behalf of Internationalization Working Group (archived comment)
https://www.w3.org/International/track/issues/465

http://www.w3.org/TR/2015/WD-mediacapture-streams-20150414/#widl-MediaStreamError-message

The 'message' value of MediaStreamError has no associated language or base direction information. Multiple 'message' values in different languages are not provided for. Since the message is generated by the user-agent, this may already be appropriately localized, but what about remote or hosted cases? And having the language and direction metadata may be useful when presenting the string in JS alerts or in an HTML context.

Addison Phillips
Globalization Architect (Amazon Lab126)
Chair (W3C I18N WG)
We recognize that the internationalization of error messages is a huge unsolved problem for the Web platform.

These error messages are primarily intended for debugging and not for display in the user interface.

Should error messages need better internationalization support, appropriate tooling needs to be added at the EcmaScript or WebIDL level, at which time the WG will happily consider making use of such tools.

The Working Group does not plan to make any change based on this comment.
no: can understand why you're not changing the design and will not formally object later
LC-3026 Phillips, Addison <addison@lab126.com> on behalf of Internationalization Working Group (archived comment)
The first issue is I18N-ISSUE-464:

http://www.w3.org/TR/2015/WD-mediacapture-streams-20150414/#widl-MediaStreamTrack-label

The 'label' value is described as follows:

--
User Agents MAY label audio and video sources (e.g., "Internal microphone" or "External USB Webcam"). The MediaStreamTrack.label attribute MUST return the label of the object's corresponding source, if any. If the corresponding source has or had no label, the attribute MUST instead return the empty string.
--

Since the value is intended to contain natural language text, probably for consumption/display to the end-user, maybe it should be possible to determine or set the language (@lang) and base direction (@dir) of the text. This will allow the text to be displayed properly in different contexts: most text rendering systems depend on this information to do a good job, even within the same display context.

In addition, it may be useful to allow multiple labels in different languages (although generally the source's label is applied by the user's user-agent, and so will be appropriately localized??)

[1] https://www.w3.org/International/track/products/78

Addison Phillips
Globalization Architect (Amazon Lab126)
Chair (W3C I18N WG)
The Working Group feels the topic of localizing human readable strings in JavaScript needs to be solved at the platform level rather than in this particular specification.

We tried getting input and insights from the Internationalization Working Group on how to progress such an addition to the platform, to no avail.
LC-3023 Joe Berkovitz <joe@noteflight.com> on behalf of Web Audio Working Group (archived comment)
The Web Audio WG so far has identified one key item that we would like to
see addressed. The MediaDeviceInfo result from enumerateDevices() (
http://www.w3.org/TR/2015/WD-mediacapture-streams-20150414/#idl-def-MediaDeviceInfo)
lacks information that is typically available in the underlying OS
implementations that we think would be very helpful for implementations:



· Channel count and configuration (Mono, Stereo, 5.1, 7.1, etc…)

· Physical Output (Headphone, Speaker, HDMI, …)

· Latency (this matters a lot for gaming -- it will be very low for
on-board hardware, perhaps quite high for wireless audio bridging like
Apple TV)

· Output capabilities (bitstream passthrough vs PCM – relevant in
digital media adapter cases (Chromecast, etc))


It is perhaps sufficient from a user interface point of view to have a
string to display, but for a program to be able to either adapt to the user
selection or to guide and default the user selection, the above are pretty
important characteristics, at least in some use cases. Many if not most of
the host OSes that user agents run on expose these sorts of output device
characteristics.


Aside from the difficulty with enumerating devices, there is also perhaps a
need to make it possible for applications to query the set of available
devices with respect to the above characteristics. MediaTrackConstraints and
MediaTrackSettings do not currently include constraint attributes that map
to items in the above list. And even if they do, arriving at a
practical goodness-of-fit
metric that can be generalized across a spectrum of audio apps may be
difficult.



The same concerns apply to the set of input devices.
After discussions in the TF and with the commenter, two changes have been implemented:
- Add a channel count for input devices (https://github.com/w3c/mediacapture-main/pull/210)
- Add capability to discover input device capabilities via enumerateDevices (https://github.com/w3c/mediacapture-main/pull/211)

Output devices and their capabilities are more in scope for the Audio Output Devices API document (https://github.com/w3c/mediacapture-output); this request should be raised as a comment on that document or, alternatively, as a proposed new document outlining a suitable API.
yes
GH-212 Yan Zhu (on behalf of the TAG)
The W3CTAG is asking new specs to include a section with answers to the questions at https://w3ctag.github.io/security-questionnaire/. For an example, see w3ctag/security-questionnaire#1. If possible, I'd encourage expanding the Privacy and Security Considerations section to include answers to all of the questions in the questionnaire. (Or add an appendix with the answers.)
We completed our privacy & security consideration section based on the answers we developed on the questionnaire.
No formal sign-off from the TAG
GH-249 Nick Doty on behalf of the Privacy Interest Group
You've heard from the TAG already about whether use of the API ever makes sense in unprivileged contexts. That is, when the user is asked for permission to access their camera, do they understand that they're granting this permission to all network attackers as well as the site they think they're talking to? I suspect this PING email thread is not going to change your minds about that already discussed topic. However, it would be worthwhile to note this security threat in the security considerations section and to note for user agent implementers the difficulty for this permission prompt
We documented the trade-off in the security & privacy section.
yes
GH-250 Nick Doty on behalf of the Privacy Interest Group
Best Practice 2 is in a section entitled "Implementation Suggestions", but contains a normative MUST statement. If this is an interoperability requirement and MUST is defined as in 2119, then I think "suggestions" (and indeed, "best practice") is probably incorrect terminology.
We clarified that the requirement is imposed by a separate spec (RTCWEB-SECURITY-ARCH).
yes
GH-251 Nick Doty on behalf of the Privacy Interest Group
can we mark the fingerprintability of the device enumeration section?
We added fingerprinting marks to all sections where potential fingerprinting can emerge.
yes
GH-252 Nick Doty on behalf of the Privacy Interest Group

To say that such an identifier MUST persist across browsing sessions is a guarantee that the requirement won't be satisfied. Many users, for example, configure their browsers to delete all cookies on closing the browser. How about:

"Identifiers MAY be persisted across browsing sessions. Persistent identifiers let the application save, identify the availability of, and directly request specific sources."

Any site that assumes that identifiers will persist will set themselves up for failure (for example, when the user clears cookies); the spec should not encourage that false assurance.

We added text to clarify that persistent identifiers (including device ids) are to be cleared with cookies, and that the identifiers don't persist unless device access has been granted.
Cool. It sounds like that would address the concern about access to persistent deviceIds prior to a permission grant.
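Since a stored deviceId may be cleared along with cookies, an application is better off treating a saved identifier as a preference rather than a requirement. The sketch below illustrates one way to do that; the helper name `audioRequestForSavedDevice` and its fallback policy are hypothetical and not taken from the specification. It uses an `ideal` deviceId constraint so that a request still succeeds when the saved identifier no longer matches any device.

```typescript
// Hypothetical helper (not from the spec): builds a getUserMedia-style
// constraints object from an identifier the application saved earlier.
interface DeviceIdConstraint {
  ideal: string;
}

interface AudioRequest {
  audio: true | { deviceId: DeviceIdConstraint };
}

function audioRequestForSavedDevice(savedDeviceId: string | null): AudioRequest {
  // No saved identifier (first visit, or storage was cleared): accept any microphone.
  if (savedDeviceId === null || savedDeviceId === "") {
    return { audio: true };
  }
  // "ideal" expresses a preference; unlike "exact", it does not reject the
  // request when the identifier is no longer valid.
  return { audio: { deviceId: { ideal: savedDeviceId } } };
}
```

In a browser, the returned object could be passed to `navigator.mediaDevices.getUserMedia()`.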
Nick Doty on behalf of the Privacy Interest Group
The browser must provide mechanisms for users to revise and even completely revoke consent to use device resources such as camera and microphone
We reworked our permission system to be based on the Permissions API, where revocation is addressed.
no response
GH-267 and GH-268 Nick Doty on behalf of the Privacy Interest Group
Permissions for getUserMedia seem to be specific to entry script origin. Is this what users will expect? For example, if I grant and persist permission to callmyfriends.com to use their service and later I browse to example.com which has an embedded iframe of callmyfriends.com , will users be shocked to see their camera turn on and a picture of themselves on the screen? Permission breadth may be a flexible option for the user agent ("Optionally, e.g., based on a previously-established user preference, for security reasons"), but it might be useful for the spec to establish some expectations here. Top-level origin/embedded origin pairs, for example, might be a useful model, as in some implementations of Geolocation.
We revised our permission model to be double-keyed by the top-level origin and the entry-script origin; furthermore, iframes will have to be explicitly allowed to use getUserMedia, via a new allowusermedia attribute.
yes
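The double-keyed model can be pictured as a lookup on the pair (top-level origin, entry-script origin) rather than on a single origin. The following is a toy model for illustration only, not how any user agent implements it; the class name `DoubleKeyedPermissions` and its methods are hypothetical.

```typescript
// Toy model (hypothetical, for illustration): a permission grant is recorded
// for a pair of origins, so a grant made while visiting one top-level site
// does not carry over when the same origin is embedded elsewhere.
class DoubleKeyedPermissions {
  private readonly granted = new Set<string>();

  // Serialize the pair unambiguously so it can be used as a Set key.
  private static key(topLevelOrigin: string, entryScriptOrigin: string): string {
    return JSON.stringify([topLevelOrigin, entryScriptOrigin]);
  }

  grant(topLevelOrigin: string, entryScriptOrigin: string): void {
    this.granted.add(DoubleKeyedPermissions.key(topLevelOrigin, entryScriptOrigin));
  }

  isGranted(topLevelOrigin: string, entryScriptOrigin: string): boolean {
    return this.granted.has(DoubleKeyedPermissions.key(topLevelOrigin, entryScriptOrigin));
  }
}
```

With the commenter's example, a grant recorded for callmyfriends.com as both top-level and entry-script origin would not apply when callmyfriends.com runs inside an iframe on example.com, because the top-level half of the key differs.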
CSP as a signal for permission persistence, Nick Doty on behalf of the Privacy Interest Group
It might make most sense for browsers just to check at the time that a permission is first granted whether a relevant CSP is present and use that as a signal in determining whether to persist the permission grant.
Tying CSP to the persistence of permissions would add significant complexity to permission management (tracking the evolution of CSP policies for a given site over time) and would require different interpretations of CSP depending on whether the policy comes from HTTP headers or from embedded markup. Yet that change would only very narrowly reduce a potential attack surface. The Working Group decided against that trade-off. Such a change would be better considered by the WebAppSec Working Group.
yes
Simultaneous hardware events across origins, Nick Doty on behalf of the Privacy Interest Group
firing a devicechange event simultaneously in different browsing contexts (including tabs or iframes not in the foreground, or in different browsers altogether, that have not asked for any permissions) creates a risk of unexpected correlation of browsing activity
We restricted the devicechange event firing to Web apps with permissions to list devices.
no response

Input from Individuals

Commenter | Comment | Working Group decision | Commenter reply
LC-3010 Elliott Sprehn <esprehn@chromium.org> (archived comment)
== enumerateDevices should be getAll() to match other APIs ==
```mediaDevices.getAll()``` is pretty clear and matches other APIs
like the Cache API in SW.
The Working Group does not contemplate any change based on this comment.
No response from commenter
LC-3020 Kuu Miyazaki <miyazaqui@gmail.com> (archived comment)
Two kinds of MediaStreamTracks, 'audio' and 'video' are defined in the spec.
But shouldn't we add another kind, 'video+audio (muxed)'?
I thought there might be a platform that doesn't support separate
sources for audio and video but only supports an encoded/muxed stream
as a source.
In such case, it would be hard for User Agent to implement for
instance removeTrack.
This has not been discussed in the Task Force, but it seems that the relevant cases can be handled with the current set of APIs.

No change in response to this comment is contemplated.
No response from commenter
LC-3009 Garrett Smith <dhtmlkitchen@gmail.com> (archived comment)
How can you apply a filter chain for video capture to a VIDEO element's src?
This document does not aim to solve the problem of defining filter chains for video. A later project can build on this basis, but the Working Group feels that it is out of scope for the 1.0 version of this document.

No change in response to this comment is contemplated.
yes
LC-3015 Anne van Kesteren <annevk@annevk.nl> (archived comment)
== MediaStreamTrackEvent's track member is not nullable ==
Either you need to make this nullable or you need to require the
dictionary argument and not give that a default value of null.

See https://github.com/w3c/mediacapture-main/issues/160
The Working Group has considered this comment, and has made the corresponding change in its Working Document: https://github.com/w3c/mediacapture-main/commit/8d330d290d8318c57628a1f9c6f275fb58a86cc8
yes
LC-3016 Anne van Kesteren <annevk@annevk.nl> (archived comment)
== Remove "Direct Assignment to Media Elements" ==
It conflicts with the definition given in the HTML Standard, which
also allows for setting a `Blob` object and such. Given that it's
integrated there, providing a pointer seems better.

See https://github.com/w3c/mediacapture-main/issues/161
The Working Group agrees the specification needs to defer to the existing description of assignments made in the HTML specification and has modified the specification accordingly.

The Media Capture and Streams spec keeps the parts that are not yet reflected in the HTML specification (as reported in https://www.w3.org/Bugs/Public/show_bug.cgi?id=28785 ).
yes
LC-3017 Anne van Kesteren <annevk@annevk.nl> (archived comment)
== MediaStreamError should not be an interface ==
Errors in the platform are represented by JavaScript `Error` object
subclasses.

See https://github.com/w3c/mediacapture-main/issues/162
We believe we have addressed this issue through a revision of the error handling in the spec, most importantly PR #194, which added the "overconstrained error".

We believe this change, and associated other changes, together resolve the issue.
yes
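Because errors are reported as `Error`-style objects, and because their messages are intended for debugging rather than display, applications can branch on the error `name` instead of parsing `message`. The sketch below assumes the error names used in the later published specification (`NotAllowedError`, `NotFoundError`, `OverconstrainedError`), which may differ from the draft under review; the `classifyCaptureError` function and the `CaptureFailure` type are hypothetical.

```typescript
// Minimal structural view of a rejected getUserMedia() promise value.
// OverconstrainedError additionally carries the name of the failed constraint.
interface CaptureErrorLike {
  name: string;
  message?: string;
  constraint?: string;
}

// Hypothetical application-level categories (not defined by the spec).
type CaptureFailure =
  | { kind: "denied" }
  | { kind: "no-device" }
  | { kind: "overconstrained"; constraint: string }
  | { kind: "other"; name: string };

function classifyCaptureError(err: CaptureErrorLike): CaptureFailure {
  switch (err.name) {
    case "NotAllowedError":
      return { kind: "denied" };
    case "NotFoundError":
      return { kind: "no-device" };
    case "OverconstrainedError":
      // Fall back to an empty string if the constraint name is absent.
      return { kind: "overconstrained", constraint: err.constraint ?? "" };
    default:
      return { kind: "other", name: err.name };
  }
}
```

The application can then choose its own localized user-facing text per category, which sidesteps the unlocalized `message` string.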
LC-3018 Anne van Kesteren <annevk@annevk.nl> (archived comment)
== Please use [Exposed] ==
That way it is much clearer what is exposed to Window and/or Worker.

See https://github.com/w3c/mediacapture-main/issues/163
The Working Group discussion concluded that explicit marking with [Exposed] is valuable, but did not find any support for adding exposure to workers at this time; thus, the Working Group decided to add [Exposed=Window] to all relevant WebIDL constructs and has reflected this in its working document, as illustrated in https://github.com/w3c/mediacapture-main/commit/6a479c794deeaf1bba40d87ae1299827cfa79773#diff-ea76d38900f79cfae8f60e5f7cf16dd1
yes
LC-3022 Charlie Kehoe <ckehoe@google.com> (archived comment)
Some applications involve listening to audio for a potentially extended
period of time (with user consent, of course), and are not particularly
latency-sensitive. An example would be the "Ok Google" hotwording available
on the Chrome new tab page, or other types of continuous speech
recognition. For these applications, a typical low-latency audio
configuration can lead to excessive power usage. I've measured 20% CPU
usage for audio capture in Chrome, for example.

My proposed solution is to offer a way to change the audio buffer size.
This enables a tradeoff between latency and power usage. For example, a
member could be added to MediaTrackConstraintSet
<http://w3c.github.io/mediacapture-main/getusermedia.html#dictionary-mediatrackconstraintset-members>
:

dictionary MediaTrackConstraintSet {
...
audioBufferDurationMs of type ConstrainLong
};

This would be an integer number of milliseconds. Perhaps the name could
mention latency instead (e.g. audioLatencyMs).

How does this simple change sound?

- Charlie
The Working Group agreed to add a way for the application to control the audio buffer size by means of a new MediaStreamTrack constraint to represent latency.
yes
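As a rough illustration of how an application might use such a constraint, the sketch below builds a constraints object for a latency-tolerant, power-sensitive use case. It assumes the constraint is named `latency` and is expressed in seconds, as in the later published specification; the commenter's proposal used milliseconds, so the hypothetical helper `lowPowerAudioRequest` converts.

```typescript
// Assumed shape: a "latency" constraint in seconds (not confirmed by this
// document, which only says a latency constraint was agreed).
interface LatencyAudioRequest {
  audio: { latency: { ideal: number } };
  video: false;
}

// Hypothetical helper: takes the desired buffer duration in milliseconds,
// as in the commenter's proposal, and converts it to seconds.
function lowPowerAudioRequest(bufferMs: number): LatencyAudioRequest {
  if (!Number.isFinite(bufferMs) || bufferMs <= 0) {
    throw new RangeError("bufferMs must be a positive, finite number");
  }
  return {
    audio: { latency: { ideal: bufferMs / 1000 } },
    video: false,
  };
}
```

Using `ideal` lets the user agent pick the closest latency it supports instead of failing the request outright.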
LC-3025 Iñaki Baz Castillo <ibc@aliax.net> (archived comment)
Hi,

The current draft states that both onaddtrack and onremovetrack events
"are not fired when the script directly modifies the tracks of
a MediaStream".

I don't like that. When I call close on a WebSocket I get the onclose
event. Events indicate that something happened regardless who or what
caused it.

I see no reason at all to just fire those events due to a track
modification made by the script in a MediaStream.
A note was added to the document that clarifies that the addtrack and removetrack events are defined to be used by other specs that use the MediaStream API and need to notify the script that the User Agent has updated a MediaStream's track set "from the background".

[1] https://github.com/w3c/mediacapture-main/commit/13ad8737791455ffae8f9f91c018d8aa896ca379
yes
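Since addtrack and removetrack are reserved for changes made by the User Agent, a script that wants to be told about its own modifications has to notify itself. One possible approach is a small wrapper, sketched here; the `addTrackAndNotify` function and the `TrackLike` and `StreamLike` types are hypothetical and only mirror the `addTrack()` and `getTracks()` methods of MediaStream structurally.

```typescript
// Minimal structural types standing in for MediaStreamTrack and MediaStream.
interface TrackLike {
  id: string;
}

interface StreamLike<T extends TrackLike> {
  addTrack(track: T): void;
  getTracks(): T[];
}

// Hypothetical wrapper: adds the track and invokes the callback itself,
// because the stream will not fire "addtrack" for script-driven changes.
// Returns false, without notifying, if the track is already in the stream.
function addTrackAndNotify<T extends TrackLike>(
  stream: StreamLike<T>,
  track: T,
  onAdded: (track: T) => void
): boolean {
  const alreadyPresent = stream.getTracks().some((t) => t.id === track.id);
  if (alreadyPresent) {
    return false;
  }
  stream.addTrack(track);
  onAdded(track);
  return true;
}
```

The same callback can also be registered as the stream's addtrack handler, so that User Agent changes and script changes reach a single code path.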
LC-3021 Justin Uberti <juberti@google.com> (archived comment)
In Sections 4.3.6 and 4.3.7, the various parameters that can be specified
are declared, but there is no text that defines their exact meaning. Are
they defined somewhere else, or did I just miss them?

e.g.
http://w3c.github.io/mediacapture-main/getusermedia.html#media-track-constraints
4.3.7.2 Dictionary MediaTrackConstraintSet
<http://w3c.github.io/mediacapture-main/getusermedia.html#idl-def-MediaTrackConstraintSet>
Members
aspectRatio of type ConstrainDouble
<http://w3c.github.io/mediacapture-main/getusermedia.html#idl-def-ConstrainDouble>
deviceId of type ConstrainDOMString
<http://w3c.github.io/mediacapture-main/getusermedia.html#idl-def-ConstrainDOMString>
echoCancellation of type ConstrainBoolean
<http://w3c.github.io/mediacapture-main/getusermedia.html#idl-def-ConstrainBoolean>
facingMode of type ConstrainDOMString
<http://w3c.github.io/mediacapture-main/getusermedia.html#idl-def-ConstrainDOMString>
frameRate of type ConstrainDouble
<http://w3c.github.io/mediacapture-main/getusermedia.html#idl-def-ConstrainDouble>
groupId of type ConstrainDOMString
<http://w3c.github.io/mediacapture-main/getusermedia.html#idl-def-ConstrainDOMString>
height of type ConstrainLong
<http://w3c.github.io/mediacapture-main/getusermedia.html#idl-def-ConstrainLong>
sampleRate of type ConstrainLong
<http://w3c.github.io/mediacapture-main/getusermedia.html#idl-def-ConstrainLong>
sampleSize of type ConstrainLong
<http://w3c.github.io/mediacapture-main/getusermedia.html#idl-def-ConstrainLong>
volume of type ConstrainDouble
<http://w3c.github.io/mediacapture-main/getusermedia.html#idl-def-ConstrainDouble>
width of type ConstrainLong
<http://w3c.github.io/mediacapture-main/getusermedia.html#idl-def-ConstrainLong>
The parameters are defined in section 4.3.9 of the document.
yes
LC-3024 Martin Thomson <martin.thomson@gmail.com> (archived comment)
On 20 April 2015 at 05:40, Harald Alvestrand <harald@alvestrand.no> wrote:
> There's more text on what they mean in section 14.1, "Track
> Constrainable Property Registration".


Can we remove the registry? Is there any reason that we can't simply
maintain the document with the definitions of the things we are using?
The reference to the registry has been removed from the document. Recorded here: https://lists.w3.org/Archives/Public/public-media-capture/2015Oct/0018.html
yes
LC-3011 Anne van Kesteren <annevk@annevk.nl> (archived comment)
== Language does not seem tamper-free ==
E.g. if I overwrite `MediaStream.prototype.addTrack` does that affect
`MediaStream`'s constructor? I think both `addTrack()` and the
constructor are meant to invoke a "private" algorithm. This happens
throughout the specification.

See https://github.com/w3c/mediacapture-main/issues/158
The WG agreed to use private algorithms instead of the exposed API to reference the intended functionality, as recommended by the reporter of this comment.

PR #167 fixes this issue and a similar issue in the MediaStream.clone() method.

[1] https://github.com/w3c/mediacapture-main/pull/167
yes
LC-3012 Anne van Kesteren <annevk@annevk.nl> (archived comment)
== MediaStream's active attribute ==
The link from the constructor to "active" indicates a `MediaStream`
state of active/inactive. However, the prose doesn't reference to this
as a state but rather as something that is a boolean. This is rather
confusing.

See https://github.com/w3c/mediacapture-main/issues/159
The Working Group agreed that the value for the active state should not be calculated in the MediaStream constructor, but instead be defined by the stream's track set.

PR #168 [1] removed the concerned text.

[1] https://github.com/w3c/mediacapture-main/pull/168
yes
LC-3014 Harald Alvestrand <harald@alvestrand.no> (archived comment)
In the current specification, we have two concepts related to sources
and tracks:

- A track can be stop()ed, in which case it is ended.
- A track can be detached from its source.

The text says:

A) in terminology for "source", we have:

Sources are detached from a track when the track is ended for any reason.

B) Under "Life-cycle and Media Flow", we have:

A MediaStreamTrack can be detached from its source. It means that the
track is no longer dependent on the source for media data. If no other
MediaStreamTrack is using the same source, the source will be stopped.
MediaStreamTrack attributes such as kind and label must not change
values when the source is detached.

C) Under the "enabled" attribute of a track, we have:

On getting, the attribute must return the value to which it was last
set. On setting, it must be set to the new value, regardless of whether
the MediaStreamTrack object has been detached from its source or not.

D) Under the "stop" function for a track, we have:

3. Set track's readyState attribute to ended.

4. Detach track's source.

It seems to me that this is one concept more than we need.
Whether there is a relationship between a stopped track and its source
or not is an implementation detail, and we shouldn't be constraining it
in our API description.

So my suggestion:

In A, C and D, simply remove the text that refers to "Detach".

In B, instead say:

If all MediaStreamTracks that are using the same source are ended, the
source will be stopped.

I think that simplifies the terminology, and doesn't change any
observable property of the API.

What do people think?
(If others like it, I'll file a bug for it.)

Harald
The Working Group agrees with the suggestion, and commit [1] removes the concept of detaching a source from its track.

[1] https://github.com/w3c/mediacapture-main/commit/8a0561644d0f7d922ccf15f8dd3e7bb725b6163f
yes
LC-3013 Nigel Megitt <nigel.megitt@bbc.co.uk> (archived comment)
Does this work include the capture of related streams of media and/or data
with common timing references, such as captions or subtitles?
This document does not deal with streams of data related to the media. Such functions could be contemplated as further work once this document is done, but the Working Group does not suggest adding this functionality at this time.

The specification has been updated to clarify its scope (captured audio and video media) and to give more direction on how it can be extended to handle new types of synchronized data.
yes
Hardware fingerprinting mitigation 312, Kolanich
Maybe we should prescribe browser vendors to mitigate the identified possibilities of hardware fingerprinting? For example detect dead pixels (and any other static defects) on webcams, add some fake dead pixels, discard and interpolate them.
The Working Group doesn't plan on addressing further fingerprinting mitigations, where each browser can innovate. Should such innovation lead to clearly accepted mitigation practices, the group can revisit this in a later iteration of the document.
No response
GH-311, Kolanich

Replace device enumeration API with API returning a secure widget

the API should return a secure DOM node (see this discussion about how such elements should behave, btw we need a standard for this), which allows a user to select a source but disallow the webpage to know all the properties of the sources it is possible to prevent a webpage from knowing.

This will allow a user to select the device but won't allow a webpage to get information about hardware/environment directly.

There is currently no support for such a secure DOM node, and the commenter didn't bring new information as to why the very limited amount of information that enumerateDevices() exposes prior to authorization justifies such a change.
No reply
GH-304 jnoring
gUM firing repeatedly for the same page load
(i.e. getUserMedia allows an implementation to prompt for permission too often)
The spec leaves the particular prompting strategy to the implementations; the fact that some implementations can lead to poor UX is not something the spec should try banning.
No reply
GH-303 jnoring
getUserMedia: spec does not define what happens when browser gUM dialog disappears completely
This is linked to a specific implementation issue, not a spec issue.
No reply
GH-299 jnoring
Unclear how to associate current active device with enumerated devices
The spec already addresses how to do this, but implementations haven't caught up with it yet.
Accepted
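One way to make this association, presumably the one the reply has in mind, is to compare the `deviceId` reported in a track's settings with the `deviceId` of each enumerated device. The sketch below assumes that approach; the `findActiveDevice` function and the `DeviceInfoLike` and `TrackSettingsLike` types are hypothetical and only mirror the relevant fields of MediaDeviceInfo and MediaTrackSettings.

```typescript
// Structural stand-ins for MediaDeviceInfo and MediaTrackSettings.
interface DeviceInfoLike {
  deviceId: string;
  kind: string;
  label: string;
}

interface TrackSettingsLike {
  deviceId?: string;
}

// Hypothetical helper: returns the enumerated device of the given kind whose
// deviceId matches the track's current settings, or undefined if none matches.
function findActiveDevice<D extends DeviceInfoLike>(
  settings: TrackSettingsLike,
  devices: readonly D[],
  kind: string
): D | undefined {
  if (!settings.deviceId) {
    return undefined;
  }
  return devices.find((d) => d.kind === kind && d.deviceId === settings.deviceId);
}
```

In a browser, `settings` would come from `track.getSettings()` and `devices` from `navigator.mediaDevices.enumerateDevices()`; the `kind` filter matters because input and output entries are listed together.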
GH-236 Tobie Langel
API to expose the angle of view of a camera
Additional constraints should be done as extensions to the main spec at this point; we clarified how to write and design such extensions. Media Capture Depth Stream defines related constraints.
No reply
GH-202 Jeffrey Yasskin
https://w3c.github.io/mediacapture-main/#idl-def-MediaDeviceInfo defines the groupId field to identify a physical device. http://webaudio.github.io/web-midi-api/#idl-def-MIDIPort defines the id field to identify the device. https://webbluetoothcg.github.io/web-bluetooth/#bluetoothdevice defines the instanceId field to identify a device. A single physical device could appear in more than one of these APIs, and it'd be nice to let web pages figure out that it's a single device, to the extent the browser knows.
These device ids aren't logically equivalent at this point, but further coordination on the topic has been raised as a TAG issue.
OK in issue
GH-196, Domenic Denicola
Mis-use of public algorithms in internal ones
Fixed
No other similar error spotted
GH-192, Owen Campbell-Moore
UAs should allow clients to control the focusing of the device's camera.
This should be developed as part of an extension; we improved the documentation on how to build such extensions.
no reply
GH-191, Owen Campbell-Moore
UAs should allow clients to specify, and dynamically modify a zoom attribute.
This should be developed as part of an extension; we improved the documentation on how to build such extensions.
no reply
GH-127 Steven Sokol
Lack of timeout / cancellation leads to UI inconsistencies
getUserMedia relies on promises; when promises get cancelable, this will be handled, but it doesn't make sense for this group to define an ad-hoc mechanism for this spec alone.
no reply