Wide review of Media Capture and Streams

Input from W3C Groups

Commentor	Comment	Working Group decision	Commentor reply
LC-3027 Phillips, Addison `<addison@lab126.com>` on behalf of Internationalization Working Group (archived comment)	https://www.w3.org/International/track/issues/465 http://www.w3.org/TR/2015/WD-mediacapture-streams-20150414/#widl-MediaStreamError-message The 'message' value of MediaStreamError has no associated language or base direction information. Multiple 'message' values in different languages are not provided for. Since the message is generated by the user-agent, this may already be appropriately localized, but what about remote or hosted cases? And having the language and direction metadata may be useful when presenting the string in JS alerts or in an HTML context. Addison Phillips Globalization Architect (Amazon Lab126) Chair (W3C I18N WG)	We recognize that the internationalization of error messages is a huge unsolved problem for the Web platform. These error messages are primarily intended for debugging and not for display in the user interface. Should error messages need better internationalization support, appropriate tooling needs to be added at the ECMAScript or WebIDL level, at which time the WG will happily consider making use of such tools. The Working Group doesn't plan on making any change based on this comment.	no: can understand why you're not changing the design and will not formally object later
LC-3026 Phillips, Addison `<addison@lab126.com>` on behalf of Internationalization Working Group (archived comment)	The first issue is I18N-ISSUE-464: http://www.w3.org/TR/2015/WD-mediacapture-streams-20150414/#widl-MediaStreamTrack-label The 'label' value is described as follows: -- User Agents MAY label audio and video sources (e.g., "Internal microphone" or "External USB Webcam"). The MediaStreamTrack.label attribute MUST return the label of the object's corresponding source, if any. If the corresponding source has or had no label, the attribute MUST instead return the empty string. -- Since the value is intended to contain natural language text, probably for consumption/display to the end-user, maybe it should be possible to determine or set the language (@lang) and base direction (@dir) of the text. This will allow the text to be displayed properly in different contexts: most text rendering systems depend on this information to do a good job, even within the same display context. In addition, it may be useful to allow multiple labels in different languages (although generally the source's label is applied by the user's user-agent, and so will be appropriately localized??) [1] https://www.w3.org/International/track/products/78 Addison Phillips Globalization Architect (Amazon Lab126) Chair (W3C I18N WG)	The Working Group feels the topic of localizing human readable strings in JavaScript needs to be solved at the platform level rather than in this particular specification.	We tried getting input and insights from the Internationalization Working Group on how to progress such an addition to the platform, to no avail.
LC-3023 Joe Berkovitz `<joe@noteflight.com>` on behalf of Web Audio Working Group (archived comment)	The Web Audio WG so far has identified one key item that we would like to see addressed. The MediaDeviceInfo result from enumerateDevices() ( http://www.w3.org/TR/2015/WD-mediacapture-streams-20150414/#idl-def-MediaDeviceInfo) lacks information that is typically available in the underlying OS implementations that we think would be very helpful for implementations: · Channel count and configuration (Mono, Stereo, 5.1, 7.1, etc…) · Physical Output (Headphone, Speaker, HDMI, …) · Latency (this matters a lot for gaming -- it will be very low for on-board hardware, perhaps quite high for wireless audio bridging like Apple TV) · Output capabilities (bitstream passthrough vs PCM – relevant in digital media adapter cases (Chromecast, etc)) It is perhaps sufficient from a user interface point of view to have a string to display, but for a program to be able to either adapt to the user selection or to guide and default the user selection, the above are pretty important characteristics, at least in some use cases. Many if not most of the host OSes that user agents run on expose these sorts of output device characteristics. Aside from the difficulty with enumerating devices, there is also perhaps a need to make it possible for applications to query the set of available devices with respect to the above characteristics. MediaTrackConstraints and MediaTrackSettings do not currently include constraint attributes that map to items in the above list. And even if they do, arriving at a practical goodness-of-fit metric that can be generalized across a spectrum of audio apps may be difficult. The same concerns apply to the set of input devices.	After discussions in the TF and with the commenter, two changes have been implemented: - Add a channel count for input devices (https://github.com/w3c/mediacapture-main/pull/210) - Add capability to discover input device capabilities via enumerateDevices (https://github.com/w3c/mediacapture-main/pull/211) Regarding output devices and capabilities, this is more in scope for the Audio Output Devices API document (https://github.com/w3c/mediacapture-output) and should be brought as a comment to that document or alternatively as a proposed new document outlining a suitable API.	yes
GH-212 Yan Zhu (on behalf of the TAG)	The W3CTAG is asking new specs to include a section with answers to the questions at https://w3ctag.github.io/security-questionnaire/. For an example, see w3ctag/security-questionnaire#1. If possible, I'd encourage expanding the Privacy and Security Considerations section to include answers to all of the questions in the questionnaire. (Or add an appendix with the answers.)	We completed our privacy & security consideration section based on the answers we developed on the questionnaire	No formal sign-off from the TAG
GH-249 Nick Doty on behalf of the Privacy Interest Group	You've heard from the TAG already about whether use of the API ever makes sense in unprivileged contexts. That is, when the user is asked for permission to access their camera, do they understand that they're granting this permission to all network attackers as well as the site they think they're talking to? I suspect this PING email thread is not going to change your minds about that already discussed topic. However, it would be worthwhile to note this security threat in the security considerations section and to note for user agent implementers the difficulty for this permission prompt	We documented the trade-off in the security & privacy section	yes
GH-250 Nick Doty on behalf of the Privacy Interest Group	Best Practice 2 is in a section entitled "Implementation Suggestions", but contains a normative MUST statement. If this is an interoperability requirement and MUST is defined as in 2119, then I think "suggestions" (and indeed, "best practice") is probably incorrect terminology.	We clarified that the requirement is imposed by a separate spec (RTCWEB-SECURITY-ARCH)	yes
GH-251 Nick Doty on behalf of the Privacy Interest Group	can we mark the fingerprintability of the device enumeration section?	We added fingerprinting marks to all sections where potential fingerprinting can emerge	yes
GH-252 Nick Doty on behalf of the Privacy Interest Group	To say that such an identifier MUST persist across browsing sessions is a guarantee that the requirement won't be satisfied. Many users, for example, configure their browsers to delete all cookies on closing the browser. How about: "Identifiers MAY be persisted across browsing sessions. Persistent identifiers let the application save, identify the availability of, and directly request specific sources." Any site that assumes that identifiers will persist will set themselves up for failure (for example, when the user clears cookies); the spec should not encourage that false assurance.	We added text to clarify that persistent identifiers (including device ids) are to be cleared with cookies, and that the identifiers don't persist unless device access has been granted.	Cool. It sounds like that would address the concern about access to persistent deviceIds prior to a permission grant.
Nick Doty on behalf of the Privacy Interest Group	The browser must provide mechanisms for users to revise and even completely revoke consent to use device resources such as camera and microphone	We reworked our permission system to be based on the Permissions API, where revocation is addressed	no response
GH-267 and GH-268 Nick Doty on behalf of the Privacy Interest Group	Permissions for getUserMedia seem to be specific to entry script origin. Is this what users will expect? For example, if I grant and persist permission to callmyfriends.com to use their service and later I browse to example.com which has an embedded iframe of callmyfriends.com, will users be shocked to see their camera turn on and a picture of themselves on the screen? Permission breadth may be a flexible option for the user agent ("Optionally, e.g., based on a previously-established user preference, for security reasons"), but it might be useful for the spec to establish some expectations here. Top-level origin/embedded origin pairs, for example, might be a useful model, as in some implementations of Geolocation.	We revised our permission model to be double-keyed by the top-level origin and the entry-script origin; furthermore, iframes will have to be explicitly allowed to use getUserMedia, via a new `allowusermedia` attribute	yes
CSP as a signal for permission persistence, Nick Doty on behalf of the Privacy Interest Group	It might make most sense for browsers just to check at the time that a permission is first granted whether a relevant CSP is present and use that as a signal in determining whether to persist the permission grant.	Tying CSP to the persistence of permissions would add significant complexity to permission management (tracking the evolution of CSP policies for a given site over time) and require a different interpretation of CSP depending on whether the policy comes from HTTP headers or embedded markup. Yet that change would only very narrowly reduce a potential attack surface. The Working Group decided against that trade-off. Such a change would be better considered by the WebAppSec Working Group.	yes
Simultaneous hardware events across origins, Nick Doty on behalf of the Privacy Interest Group	firing a devicechange event simultaneously in different browsing contexts (including tabs or iframes not in the foreground, or in different browsers altogether, that have not asked for any permissions) creates a risk of unexpected correlation of browsing activity	We restricted the `devicechange` event firing to Web apps with permissions to list devices	no response
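
The GH-252 row above warns that a site must not assume device identifiers persist (they are cleared with cookies and are not exposed before access is granted). A minimal sketch of how an application might cope with that, assuming a previously stored identifier; the helper names and the `savedDeviceId` parameter are hypothetical and not taken from the specification:

```javascript
// Sketch only: helper names and `savedDeviceId` are illustrative assumptions.
// `devices` is the array that mediaDevices.enumerateDevices() resolves with.
function chooseAudioConstraints(devices, savedDeviceId) {
  // An empty or missing id can never identify a device.
  const stillPresent =
    Boolean(savedDeviceId) &&
    devices.some(
      (d) => d.kind === 'audioinput' && d.deviceId === savedDeviceId
    );
  // Ask for the remembered microphone only if it is still listed;
  // otherwise fall back and let the user agent choose a default.
  return stillPresent
    ? { audio: { deviceId: { exact: savedDeviceId } } }
    : { audio: true };
}

// Browser-side usage; `mediaDevices` would be navigator.mediaDevices.
async function openPreferredMicrophone(mediaDevices, savedDeviceId) {
  const devices = await mediaDevices.enumerateDevices();
  return mediaDevices.getUserMedia(
    chooseAudioConstraints(devices, savedDeviceId)
  );
}
```

Keeping the selection logic separate from the browser calls makes the fallback path easy to exercise without a camera or microphone attached.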

Input from Individuals

Commentor	Comment	Working Group decision	Commentor reply
LC-3010 Elliott Sprehn `<esprehn@chromium.org>` (archived comment)	== enumerateDevices should be getAll() to match other APIs == ```mediaDevices.getAll()``` is pretty clear and matches other APIs like the Cache API in SW.	The Working Group does not contemplate any change based on this comment.	No response from commenter
LC-3020 Kuu Miyazaki `<miyazaqui@gmail.com>` (archived comment)	Two kinds of MediaStreamTracks, 'audio' and 'video' are defined in the spec. But shouldn't we add another kind, 'video+audio (muxed)'? I thought there might be a platform that doesn't support separate sources for audio and video but only supports an encoded/muxed stream as a source. In such case, it would be hard for User Agent to implement for instance removeTrack.	This has not been discussed in the Task Force, but it seems that the relevant cases can be handled with the current set of APIs. No change in response to this comment is contemplated.	No response from commenter
LC-3009 Garrett Smith `<dhtmlkitchen@gmail.com>` (archived comment)	How can you apply a filter chain for video capture to a VIDEO element's src?	This document does not aim to solve the problem of defining filter chains for video. A later project can build on this basis, but the Working Group feels that it is out of scope for the 1.0 version of this document. No change in response to this comment is contemplated.	yes
LC-3015 Anne van Kesteren `<annevk@annevk.nl>` (archived comment)	== MediaStreamTrackEvent's track member is not nullable == Either you need to make this nullable or you need to require the dictionary argument and not give that a default value of null. See https://github.com/w3c/mediacapture-main/issues/160	The Working Group has considered this comment, and has made the corresponding change in its Working Document https://github.com/w3c/mediacapture-main/commit/8d330d290d8318c57628a1f9c6f275fb58a86cc8	yes
LC-3016 Anne van Kesteren `<annevk@annevk.nl>` (archived comment)	== Remove "Direct Assignment to Media Elements" == It conflicts with the definition given in the HTML Standard, which also allows for setting a `Blob` object and such. Given that it's integrated there, providing a pointer seems better. See https://github.com/w3c/mediacapture-main/issues/161	The Working Group agrees the specification needs to defer to the existing description of assignments made in the HTML specification and has modified the specification accordingly. The Media Capture and Streams spec keeps the parts that are not yet reflected in the HTML specification (as reported in https://www.w3.org/Bugs/Public/show_bug.cgi?id=28785 ).	yes
LC-3017 Anne van Kesteren `<annevk@annevk.nl>` (archived comment)	== MediaStreamError should not be an interface == Errors in the platform are represented by JavaScript `Error` object subclasses. See https://github.com/w3c/mediacapture-main/issues/162	We believe we have addressed this issue through a revision of the error handling in the spec, most importantly PR #194, which added the "overconstrained error". We believe this change, and associated other changes, together resolve the issue.	yes
LC-3018 Anne van Kesteren `<annevk@annevk.nl>` (archived comment)	== Please use [Exposed] == That way it is much clearer what is exposed to Window and/or Worker. See https://github.com/w3c/mediacapture-main/issues/163	The Working Group discussion concluded that explicit marking with [Exposed] is valuable, but did not find any support for adding exposure to workers at this time; thus, the Working Group decided to add [Exposed=Window] to all relevant WebIDL constructs and has reflected this in its working document as illustrated in https://github.com/w3c/mediacapture-main/commit/6a479c794deeaf1bba40d87ae1299827cfa79773#diff-ea76d38900f79cfae8f60e5f7cf16dd1	yes
LC-3022 Charlie Kehoe `<ckehoe@google.com>` (archived comment)	Some applications involve listening to audio for a potentially extended period of time (with user consent, of course), and are not particularly latency-sensitive. An example would be the "Ok Google" hotwording available on the Chrome new tab page, or other types of continuous speech recognition. For these applications, a typical low-latency audio configuration can lead to excessive power usage. I've measured 20% CPU usage for audio capture in Chrome, for example. My proposed solution is to offer a way to change the audio buffer size. This enables a tradeoff between latency and power usage. For example, a member could be added to MediaTrackConstraintSet <http://w3c.github.io/mediacapture-main/getusermedia.html#dictionary-mediatrackconstraintset-members> : dictionary MediaTrackConstraintSet { ... audioBufferDurationMs of type ConstrainLong }; This would be an integer number of milliseconds. Perhaps the name could mention latency instead (e.g. audioLatencyMs). How does this simple change sound? - Charlie	The Working Group agreed to add a way for the application to control the audio buffer size by means of a new MediaStreamTrack constraint to represent latency.	yes
LC-3025 Iñaki Baz Castillo `<ibc@aliax.net>` (archived comment)	Hi, The current draft states that both onaddtrack and onremovetrack events "are not fired when the script directly modifies the tracks of a MediaStream". I don't like that. When I call close on a WebSocket I get the onclose event. Events indicate that something happened regardless who or what caused it. I see no reason at all to just fire those events due to a track modification made by the script in a MediaStream.	A note was added to the document that clarifies that the addtrack and removetrack events are defined to be used by other specs that use the MediaStream API and need to notify the script that the User Agent has updated a MediaStream's track set "from the background". [1] https://github.com/w3c/mediacapture-main/commit/13ad8737791455ffae8f9f91c018d8aa896ca379	yes
LC-3021 Justin Uberti `<juberti@google.com>` (archived comment)	In Sections 4.3.6 and 4.3.7, the various parameters that can be specified are declared, but there is no text that defines their exact meaning. Are they defined somewhere else, or did I just miss them? e.g. http://w3c.github.io/mediacapture-main/getusermedia.html#media-track-constraints 4.3.7.2 Dictionary MediaTrackConstraintSet <http://w3c.github.io/mediacapture-main/getusermedia.html#idl-def-MediaTrackConstraintSet> Members: aspectRatio of type ConstrainDouble <http://w3c.github.io/mediacapture-main/getusermedia.html#idl-def-ConstrainDouble> deviceId of type ConstrainDOMString <http://w3c.github.io/mediacapture-main/getusermedia.html#idl-def-ConstrainDOMString> echoCancellation of type ConstrainBoolean <http://w3c.github.io/mediacapture-main/getusermedia.html#idl-def-ConstrainBoolean> facingMode of type ConstrainDOMString <http://w3c.github.io/mediacapture-main/getusermedia.html#idl-def-ConstrainDOMString> frameRate of type ConstrainDouble <http://w3c.github.io/mediacapture-main/getusermedia.html#idl-def-ConstrainDouble> groupId of type ConstrainDOMString <http://w3c.github.io/mediacapture-main/getusermedia.html#idl-def-ConstrainDOMString> height of type ConstrainLong <http://w3c.github.io/mediacapture-main/getusermedia.html#idl-def-ConstrainLong> sampleRate of type ConstrainLong <http://w3c.github.io/mediacapture-main/getusermedia.html#idl-def-ConstrainLong> sampleSize of type ConstrainLong <http://w3c.github.io/mediacapture-main/getusermedia.html#idl-def-ConstrainLong> volume of type ConstrainDouble <http://w3c.github.io/mediacapture-main/getusermedia.html#idl-def-ConstrainDouble> width of type ConstrainLong <http://w3c.github.io/mediacapture-main/getusermedia.html#idl-def-ConstrainLong>	The parameters are defined in section 4.3.9 of the document.	yes
LC-3024 Martin Thomson `<martin.thomson@gmail.com>` (archived comment)	On 20 April 2015 at 05:40, Harald Alvestrand <harald@alvestrand.no> wrote: > There's more text on what they mean in section 14.1, "Track > Constrainable Property Registration". Can we remove the registry? Is there any reason that we can't simply maintain the document with the definitions of the things we are using?	The reference to the registry has been removed from the document. Recorded here: https://lists.w3.org/Archives/Public/public-media-capture/2015Oct/0018.html	yes
LC-3011 Anne van Kesteren `<annevk@annevk.nl>` (archived comment)	== Language does not seem tamper-free == E.g. if I overwrite `MediaStream.prototype.addTrack` does that affect `MediaStream`'s constructor? I think both `addTrack()` and the constructor are meant to invoke a "private" algorithm. This happens throughout the specification. See https://github.com/w3c/mediacapture-main/issues/158	The WG agreed to use private algorithms instead of the exposed API to reference the intended functionality as recommended by reporter of this comment. PR #167 fixes this issue and a similar issue in the MediaStream.clone() method. [1] https://github.com/w3c/mediacapture-main/pull/167	yes
LC-3012 Anne van Kesteren `<annevk@annevk.nl>` (archived comment)	== MediaStream's active attribute == The link from the constructor to "active" indicates a `MediaStream` state of active/inactive. However, the prose doesn't reference to this as a state but rather as something that is a boolean. This is rather confusing. See https://github.com/w3c/mediacapture-main/issues/159	The Working Group agreed that the value for the active state should not be calculated in the MediaStream constructor, but instead be defined by the stream's track set. PR #168 [1] removed the concerned text. [1] https://github.com/w3c/mediacapture-main/pull/168	yes
LC-3014 Harald Alvestrand `<harald@alvestrand.no>` (archived comment)	In the current specification, we have two concepts related to sources and tracks: - A track can be stop()ed, in which case it is ended. - A track can be detached from its source. The text says: A) in terminology for "source", we have: Sources are detached from a track when the track is ended for any reason. B) Under "Life-cycle and Media Flow", we have: A MediaStreamTrack can be detached from its source. It means that the track is no longer dependent on the source for media data. If no other MediaStreamTrack is using the same source, the source will be stopped. MediaStreamTrack attributes such as kind and label must not change values when the source is detached. C) Under the "enabled" attribute of a track, we have: On getting, the attribute must return the value to which it was last set. On setting, it must be set to the new value, regardless of whether the MediaStreamTrack object has been detached from its source or not. Under the "stop" function for a track, we have: 3. Set track's readyState attribute to ended. 4. Detach track's source. It seems to me that this is one concept more than we need. Whether there is a relationship between a stopped track and its source or not is an implementation detail, and we shouldn't be constraining it in our API description. So my suggestion: In A, C and D, simply remove the text that refers to "Detach". In B, instead say: If all MediaStreamTracks that are using the same source are ended, the source will be stopped. I think that simplifies the terminology, and doesn't change any observable property of the API. What do people think? (If others like it, I'll file a bug for it.) Harald	The Working Group agrees with the suggestion, and commit [1] removes the concept of detaching a source from its track. [1] https://github.com/w3c/mediacapture-main/commit/8a0561644d0f7d922ccf15f8dd3e7bb725b6163f	yes
LC-3013 Nigel Megitt `<nigel.megitt@bbc.co.uk>` (archived comment)	Does this work include the capture of related streams of media and/or data with common timing references, such as captions or subtitles?	This document does not deal with streams of data related to the media. Such functions could be contemplated as further work once this document is done, but the Working Group does not suggest adding this functionality at this time. The specification has been updated to clarify its scope (captured audio and video media) and to give more directions on how it can be extended to handle new types of synchronized data.	yes
GH-312, Kolanich	Hardware fingerprinting mitigation: Maybe we should prescribe browser vendors to mitigate the identified possibilities of hardware fingerprinting? For example detect dead pixels (and any other static defects) on webcams, add some fake dead pixels, discard and interpolate them.	The Working Group doesn't plan on addressing further fingerprinting mitigations, where each browser can innovate. Should such innovation lead to clearly accepted mitigation practices, the group can revisit this in a later iteration of the document.	No response
GH-311, Kolanich	Replace device enumeration API with API returning a secure widget the API should return a secure DOM node (see this discussion about how such elements should behave, btw we need a standard for this), which allows a user to select a source but disallow the webpage to know all the properties of the sources it is possible to prevent a webpage from knowing. This will allow a user to select the device but won't allow a webpage to get information about hardware/environment directly.	There is currently no support for such a secure DOM node, and the commentor didn't bring new information as to why the very limited amount of information that enumerateDevices() exposes prior to authorization justifies such a change.	No reply
GH-304 jnoring	gUM firing repeatedly for the same page load (i.e. getUserMedia allows an implementation to prompt for permission too often)	The spec leaves the particular prompting strategy to the implementations; the fact that some implementations can lead to poor UX is not something the spec should try to ban	No reply
GH-303 jnoring	getUserMedia: spec does not define what happens when browser gUM dialog disappears completely	This is linked to a specific implementation issue, not a spec issue	No reply
GH-299 jnoring	Unclear how to associate current active device with enumerated devices	The spec already addresses how to do this, but implementations haven't caught up with it yet	Accepted
GH-236 Tobie Langel	API to expose the angle of view of a camera	Additional constraints should be done as extensions to the main spec at this point; we clarified how to write and design such extensions. Media Capture Depth Stream defines related constraints	No reply
GH-202 Jeffrey Yasskin	https://w3c.github.io/mediacapture-main/#idl-def-MediaDeviceInfo defines the groupId field to identify a physical device. http://webaudio.github.io/web-midi-api/#idl-def-MIDIPort defines the id field to identify the device. https://webbluetoothcg.github.io/web-bluetooth/#bluetoothdevice defines the instanceId field to identify a device. A single physical device could appear in more than one of these APIs, and it'd be nice to let web pages figure out that it's a single device, to the extent the browser knows.	These device ids aren't logically equivalent at this point, but further coordination on the topic has been raised as a TAG Issue	OK in issue
GH-196, Domenic Denicola	Mis-use of public algorithms in internal ones	Fixed	No other similar error spotted
GH-192, Owen Campbell-Moore	UAs should allow clients to control the focusing of the device's camera.	This should be developed as part of an extension; we improved the documentation on how to build such extensions	no reply
GH-191, Owen Campbell-Moore	UAs should allow clients to specify, and dynamically modify a zoom attribute.	This should be developed as part of an extension; we improved the documentation on how to build such extensions	no reply
GH-127 Steven Sokol	Lack of timeout / cancellation leads to UI inconsistencies	getUserMedia relies on promises; when promises get cancelable, this will be handled, but it doesn't make sense for this group to define an ad-hoc mechanism for this spec alone	no reply

Commentor

Comment

Working Group decision

Commentor reply

LC-3010 Elliott Sprehn <esprehn@chromium.org> (archived comment)

== enumerateDevices should be getAll() to match other APIs ==
```mediaDevices.getAll()``` is pretty clear and matches other APIs
like the Cache API in SW.

The Working Group does not contemplate any change based on this comment.

No response from commenter

LC-3020 Kuu Miyazaki <miyazaqui@gmail.com> (archived comment)

Two kinds of MediaStreamTracks, 'audio' and 'video' are defined in the spec.
But shouldn't we add another kind, 'video+audio (muxed)'?
I thought there might be a platform that doesn't support separate
sources for audio and video but only supports an encoded/muxed stream
as a source.
In such case, it would be hard for User Agent to implement for
instance removeTrack.

This has not been discussed in the Task Force, but it seems that the relevant cases can be handled with the current set of APIs.

No change in response to this comment is contemplated.

No response from commenter

LC-3009 Garrett Smith <dhtmlkitchen@gmail.com> (archived comment)

How can you apply a filter chain for video capture to a VIDEO element's src?

This document does not aim to solve the problem of defining filter chains for video. A later project can build on this basis, but the Working Group feels that it is out of scope for the 1.0 version of this document.

No change in response to this comment is contemplated.

yes

LC-3015 Anne van Kesteren <annevk@annevk.nl> (archived comment)

== MediaStreamTrackEvent's track member is not nullable ==
Either you need to make this nullable or you need to require the
dictionary argument and not give that a default value of null.

See https://github.com/w3c/mediacapture-main/issues/160

The Working Group has considered this comment, and has made the corresponding change in its Working Document https://github.com/w3c/mediacapture-main/commit/8d330d290d8318c57628a1f9c6f275fb58a86cc8

yes

LC-3016 Anne van Kesteren <annevk@annevk.nl> (archived comment)

== Remove "Direct Assignment to Media Elements" ==
It conflicts with the definition given in the HTML Standard, which
also allows for setting a `Blob` object and such. Given that it's
integrated there, providing a pointer seems better.

See https://github.com/w3c/mediacapture-main/issues/161

The Working Group agrees the specification needs to defer to the existing description of assignments made in the HTML specification and has modified the specification accordingly.

The Media Capture and Streams spec keeps the parts that are not yet reflected in the HTML specification (as reported in https://www.w3.org/Bugs/Public/show_bug.cgi?id=28785 ).

yes

LC-3017 Anne van Kesteren <annevk@annevk.nl> (archived comment)

== MediaStreamError should not be an interface ==
Errors in the platform are represented by JavaScript `Error` object
subclasses.

See https://github.com/w3c/mediacapture-main/issues/162

We believe we have addressed this issue through a revision of the error handling in the spec - most importantly PR #194, which added the "overconstrained error".

We believe this change, and associated other changes, together resolve the issue.

yes

LC-3018 Anne van Kesteren <annevk@annevk.nl> (archived comment)

== Please use [Exposed] ==
That way it is much clearer what is exposed to Window and/or Worker.

See https://github.com/w3c/mediacapture-main/issues/163

The Working Group discussion concluded that explicit marking with [Exposed] is valuable, but did not find any support for adding exposure to workers at this time; thus, the Working Group decided to add [Exposed=Window] to all relevant WebIDL constructs and has reflected this in its working document, as illustrated in https://github.com/w3c/mediacapture-main/commit/6a479c794deeaf1bba40d87ae1299827cfa79773#diff-ea76d38900f79cfae8f60e5f7cf16dd1

yes

LC-3022 Charlie Kehoe <ckehoe@google.com> (archived comment)

Some applications involve listening to audio for a potentially extended
period of time (with user consent, of course), and are not particularly
latency-sensitive. An example would be the "Ok Google" hotwording available
on the Chrome new tab page, or other types of continuous speech
recognition. For these applications, a typical low-latency audio
configuration can lead to excessive power usage. I've measured 20% CPU
usage for audio capture in Chrome, for example.

My proposed solution is to offer a way to change the audio buffer size.
This enables a tradeoff between latency and power usage. For example, a
member could be added to MediaTrackConstraintSet
<http://w3c.github.io/mediacapture-main/getusermedia.html#dictionary-mediatrackconstraintset-members>
:

dictionary MediaTrackConstraintSet {
...
audioBufferDurationMs of type ConstrainLong
};

This would be an integer number of milliseconds. Perhaps the name could
mention latency instead (e.g. audioLatencyMs).

How does this simple change sound?

- Charlie

The Working Group agreed to add a way for the application to control the audio buffer size by means of a new MediaStreamTrack constraint to represent latency.

yes
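
To illustrate the LC-3022 resolution, the sketch below builds a getUserMedia constraints object that asks for a relaxed audio latency. The resolution only states that a constraint representing latency was added; the member name `latency`, its unit (seconds), and the helper name `buildRelaxedAudioConstraints` are assumptions made for this example and should be checked against the current specification text.

```javascript
// Hypothetical helper: builds constraints for a capture that is not latency-sensitive.
// Assumption: the constraint is named `latency` and is expressed in seconds.
function buildRelaxedAudioConstraints(latencySeconds) {
  if (!Number.isFinite(latencySeconds) || latencySeconds <= 0) {
    throw new RangeError("latencySeconds must be a positive, finite number");
  }
  return {
    // `ideal` expresses a preference; the user agent may pick another value.
    audio: { latency: { ideal: latencySeconds } },
    video: false,
  };
}
```

In a browser, the returned object could be passed to `navigator.mediaDevices.getUserMedia()`. Using `ideal` rather than `exact` is a deliberate choice here: it states a preference without making the request fail when the value cannot be met.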

LC-3025 Iñaki Baz Castillo <ibc@aliax.net> (archived comment)

Hi,

The current draft states that both onaddtrack and onremovetrack events
"are not fired when the script directly modifies the tracks of
a MediaStream".

I don't like that. When I call close on a WebSocket I get the onclose
event. Events indicate that something happened regardless who or what
caused it.

I see no reason at all to just fire those events due to a track
modification made by the script in a MediaStream.

A note was added to the document that clarifies that the addtrack and removetrack events are defined to be used by other specs that use the MediaStream API and need to notify the script that the User Agent has updated a MediaStream's track set "from the background".

[1] https://github.com/w3c/mediacapture-main/commit/13ad8737791455ffae8f9f91c018d8aa896ca379

yes

LC-3021 Justin Uberti <juberti@google.com> (archived comment)

In Sections 4.3.6 and 4.3.7, the various parameters that can be specified
are declared, but there is no text that defines their exact meaning. Are
they defined somewhere else, or did I just miss them?

e.g.
http://w3c.github.io/mediacapture-main/getusermedia.html#media-track-constraints
4.3.7.2 Dictionary MediaTrackConstraintSet
<http://w3c.github.io/mediacapture-main/getusermedia.html#idl-def-MediaTrackConstraintSet>
Members
aspectRatio of type ConstrainDouble
<http://w3c.github.io/mediacapture-main/getusermedia.html#idl-def-ConstrainDouble>
deviceId of type ConstrainDOMString
<http://w3c.github.io/mediacapture-main/getusermedia.html#idl-def-ConstrainDOMString>
echoCancellation of type ConstrainBoolean
<http://w3c.github.io/mediacapture-main/getusermedia.html#idl-def-ConstrainBoolean>
facingMode of type ConstrainDOMString
<http://w3c.github.io/mediacapture-main/getusermedia.html#idl-def-ConstrainDOMString>
frameRate of type ConstrainDouble
<http://w3c.github.io/mediacapture-main/getusermedia.html#idl-def-ConstrainDouble>
groupId of type ConstrainDOMString
<http://w3c.github.io/mediacapture-main/getusermedia.html#idl-def-ConstrainDOMString>
height of type ConstrainLong
<http://w3c.github.io/mediacapture-main/getusermedia.html#idl-def-ConstrainLong>
sampleRate of type ConstrainLong
<http://w3c.github.io/mediacapture-main/getusermedia.html#idl-def-ConstrainLong>
sampleSize of type ConstrainLong
<http://w3c.github.io/mediacapture-main/getusermedia.html#idl-def-ConstrainLong>
volume of type ConstrainDouble
<http://w3c.github.io/mediacapture-main/getusermedia.html#idl-def-ConstrainDouble>
width of type ConstrainLong
<http://w3c.github.io/mediacapture-main/getusermedia.html#idl-def-ConstrainLong>

The parameters are defined in section 4.3.9 of the document.

yes

LC-3024 Martin Thomson <martin.thomson@gmail.com> (archived comment)

On 20 April 2015 at 05:40, Harald Alvestrand <harald@alvestrand.no> wrote:
> There's more text on what they mean in section 14.1, "Track
> Constrainable Property Registration".

Can we remove the registry? Is there any reason that we can't simply
maintain the document with the definitions of the things we are using?

The reference to the registry has been removed from the document. Recorded here: https://lists.w3.org/Archives/Public/public-media-capture/2015Oct/0018.html

yes

LC-3011 Anne van Kesteren <annevk@annevk.nl> (archived comment)

== Language does not seem tamper-free ==
E.g. if I overwrite `MediaStream.prototype.addTrack` does that affect
`MediaStream`'s constructor? I think both `addTrack()` and the
constructor are meant to invoke a "private" algorithm. This happens
throughout the specification.

See https://github.com/w3c/mediacapture-main/issues/158

The WG agreed to use private algorithms instead of the exposed API to reference the intended functionality as recommended by reporter of this comment.

PR #167 fixes this issue and a similar issue in the MediaStream.clone() method.

[1] https://github.com/w3c/mediacapture-main/pull/167

yes
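
The tampering concern in LC-3011 can be sketched with a toy class. This is not the specification's algorithm text: `SketchStream`, `addTrackInternal`, and the `WeakMap` storage are hypothetical names used only to show a constructor that relies on a private routine instead of the public, overridable method.

```javascript
// Hypothetical model, not the spec's algorithm text.
const trackSets = new WeakMap();

// "Private" algorithm: page script cannot reach or replace it.
function addTrackInternal(stream, track) {
  trackSets.get(stream).add(track);
}

class SketchStream {
  constructor(tracks = []) {
    trackSets.set(this, new Set());
    for (const track of tracks) {
      // Deliberately not this.addTrack(track), which a page could overwrite.
      addTrackInternal(this, track);
    }
  }

  addTrack(track) {
    addTrackInternal(this, track);
  }

  getTracks() {
    return [...trackSets.get(this)];
  }
}
```

With this shape, overwriting `SketchStream.prototype.addTrack` changes what later calls to `addTrack()` do, but has no effect on how the constructor populates the track set.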

LC-3012 Anne van Kesteren <annevk@annevk.nl> (archived comment)

== MediaStream's active attribute ==
The link from the constructor to "active" indicates a `MediaStream`
state of active/inactive. However, the prose doesn't reference to this
as a state but rather as something that is a boolean. This is rather
confusing.

See https://github.com/w3c/mediacapture-main/issues/159

The Working Group agreed that the value for the active state should not be calculated in the MediaStream constructor, but instead be defined by the stream's track set.

PR #168 [1] removed the concerned text.

[1] https://github.com/w3c/mediacapture-main/pull/168

yes
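
A minimal sketch of the LC-3012 outcome: the active state is derived from the track set rather than stored by the constructor. It assumes a stream counts as active while at least one of its tracks has not ended. The function name `isStreamActive` and the plain-object tracks are hypothetical.

```javascript
// Simplified model: a stream is active while at least one track has not ended.
// Tracks are plain objects with a `readyState` of "live" or "ended".
function isStreamActive(tracks) {
  return tracks.some((track) => track.readyState !== "ended");
}
```

Because the value is computed on demand, no separate state has to be kept in sync when tracks are added, removed, or ended.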

LC-3014 Harald Alvestrand <harald@alvestrand.no> (archived comment)

In the current specification, we have two concepts related to sources
and tracks:

- A track can be stop()ed, in which case it is ended.
- A track can be detached from its source.

The text says:

A) in terminology for "source", we have:

Sources are detached from a track when the track is ended for any reason.

B) Under "Life-cycle and Media Flow", we have:

A MediaStreamTrack can be detached from its source. It means that the
track is no longer dependent on the source for media data. If no other
MediaStreamTrack is using the same source, the source will be stopped.
MediaStreamTrack attributes such as kind and label must not change
values when the source is detached.

C) Under the "enabled" attribute of a track, we have:

On getting, the attribute must return the value to which it was last
set. On setting, it must be set to the new value, regardless of whether
the MediaStreamTrack object has been detached from its source or not.

Under the "stop" function for a track, we have:

3. Set track's readyState attribute to ended.

4. Detach track's source.

It seems to me that this is one concept more than we need.
Whether there is a relationship between a stopped track and its source
or not is an implementation detail, and we shouldn't be constraining it
in our API description.

So my suggestion:

In A, C and D, simply remove the text that refers to "Detach".

In B, instead say:

If all MediaStreamTracks that are using the same source are ended, the
source will be stopped.

I think that simplifies the terminology, and doesn't change any
observable property of the API.

What do people think?
(If others like it, I'll file a bug for it.)

Harald

The Working Group agrees with the suggestion, and commit [1] removes the concept of detaching a source from its track.

[1] https://github.com/w3c/mediacapture-main/commit/8a0561644d0f7d922ccf15f8dd3e7bb725b6163f

yes
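
The rule proposed in LC-3014 ("if all MediaStreamTracks that are using the same source are ended, the source will be stopped") can be modelled in a few lines. `SketchSource` and `SketchTrack` are hypothetical classes for illustration only; how a real user agent relates tracks to sources remains an implementation detail, as the comment itself argues.

```javascript
// Hypothetical model: the source stops once every track using it has ended.
// No "detached" state is modelled.
class SketchSource {
  constructor() {
    this.tracks = new Set();
    this.stopped = false;
  }

  createTrack() {
    const track = new SketchTrack(this);
    this.tracks.add(track);
    return track;
  }

  // Called by a track when it ends; stops the source if no live track remains.
  trackEnded() {
    const anyLive = [...this.tracks].some((t) => t.readyState === "live");
    if (!anyLive) this.stopped = true;
  }
}

class SketchTrack {
  constructor(source) {
    this.source = source;
    this.readyState = "live";
  }

  stop() {
    if (this.readyState === "ended") return; // stopping twice is a no-op
    this.readyState = "ended";
    this.source.trackEnded();
  }
}
```

In this model, the only observable states are a track's `readyState` and whether the source is still running, which matches the claim that dropping "detach" changes no observable property of the API.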

LC-3013 Nigel Megitt <nigel.megitt@bbc.co.uk> (archived comment)

Does this work include the capture of related streams of media and/or data
with common timing references, such as captions or subtitles?

This document does not deal with streams of data related to the media. Such functions could be contemplated as further work once this document is done, but the Working Group does not suggest adding this functionality at this time.

The specification has been updated to clarify its scope (captured audio and video media) and to give more directions on how it can be extended to handle new types of synchronized data.

yes

GH-312, Kolanich

Hardware fingerprinting mitigation

Maybe we should prescribe browser vendors to mitigate the identified possibilities of hardware fingerprinting? For example detect dead pixels (and any other static defects) on webcams, add some fake dead pixels, discard and interpolate them.

The Working Group doesn't plan on addressing further fingerprinting mitigations, where each browser can innovate. Should such innovation lead to clearly accepted mitigation practices, the group can revisit this in a later iteration of the document

No response

GH-311, Kolanich

Replace device enumeration API with API returning a secure widget

the API should return a secure DOM node (see this discussion about how such elements should behave, btw we need a standard for this), which allows a user to select a source but disallow the webpage to know all the properties of the sources it is possible to prevent a webpage from knowing.

This will allow a user to select the device but won't allow a webpage to get information about hardware/environment directly.

There is currently no support for such a secure DOM node, and the commentor didn't bring new information as to why the very limited amount of information that enumerateDevices() exposes prior to authorization justifies such a change.

No reply

GH-304 jnoring

gUM firing repeatedly for the same page load

(i.e. getUserMedia allows an implementation to prompt for permission too often)

The spec leaves the particular prompting strategy to the implementations; the fact that some implementations can lead to poor UX is not something the spec should try to ban

No reply

GH-303 jnoring

getUserMedia: spec does not define what happens when browser gUM dialog disappears completely

This is linked to a specific implementation issue, not a spec issue

No reply

GH-299 jnoring

Unclear how to associate current active device with enumerated devices

The spec already addresses how to do this, but implementations haven't caught up with it yet

Accepted
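
The GH-299 decision does not say which mechanism it has in mind. One plausible reading, sketched below as an assumption, is to compare the `deviceId` reported in a track's settings with the `deviceId` of each enumerated device. The helper name `findDeviceForSettings` is hypothetical, and the inputs are plain objects shaped like the results of `track.getSettings()` and `enumerateDevices()`.

```javascript
// Hypothetical helper: finds the enumerated device that backs a track.
// `settings` is shaped like the result of track.getSettings();
// `devices` is shaped like the resolved value of enumerateDevices().
function findDeviceForSettings(settings, devices) {
  if (!settings || !settings.deviceId) return undefined;
  return devices.find((d) => d.deviceId === settings.deviceId);
}
```

The function returns `undefined` when the settings carry no `deviceId`, which fits the decision's remark that implementations had not yet caught up with the spec.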

GH-236 Tobie Langel

API to expose the angle of view of a camera

Additional constraints should be done as extensions to the main spec at this point; we clarified how to write and design such extensions. Media Capture Depth Stream defines related constraints

No reply

GH-202 Jeffrey Yasskin

https://w3c.github.io/mediacapture-main/#idl-def-MediaDeviceInfo defines the groupId field to identify a physical device. http://webaudio.github.io/web-midi-api/#idl-def-MIDIPort defines the id field to identify the device. https://webbluetoothcg.github.io/web-bluetooth/#bluetoothdevice defines the instanceId field to identify a device. A single physical device could appear in more than one of these APIs, and it'd be nice to let web pages figure out that it's a single device, to the extent the browser knows.

These device ids aren't logically equivalent at this point, but further coordination on the topic has been raised as a TAG Issue

OK in issue

GH-196, Domenic Denicola

Mis-use of public algorithms in internal ones

Fixed

No other similar error spotted

GH-192, Owen Campbell-Moore

UAs should allow clients to control the focusing of the device's camera.

This should be developed as part of an extension; we improved the documentation on how to build such extensions

no reply

GH-191, Owen Campbell-Moore

UAs should allow clients to specify, and dynamically modify a zoom attribute.

This should be developed as part of an extension; we improved the documentation on how to build such extensions

no reply

GH-127 Steven Sokol

Lack of timeout / cancellation leads to UI inconsistencies

getUserMedia relies on promises; when promises get cancelable, this will be handled, but it doesn't make sense for this group to define an ad-hoc mechanism for this spec alone

no reply