W3C

Second Screen WG/CG TPAC F2F - Day 2/2

07 November 2017

Meeting Minutes

Welcome

Anssi: Welcome to the Second Screen CG meeting. Yesterday we focused on the APIs developed by the WG. Today, we're going to focus on a set of protocols called the Open Screen Protocol for these APIs
… Lots of developments in this area. Mark Foltz has been doing most of the investigation work here.
… Round of introduction

MarkW: From Netflix. Some services based on Second Screen use cases. We were involved but then we stopped because we switched our Web site to HTTPS. Now looking at it again to see what solutions emerge.

Tomoyuki: KDDI. We provide set-top boxes for cable operators.

Igarashi: Sony Corporation. Yesterday, I showed some broadcaster use cases for second screen during the joint session with the Media & Entertainment IG. I would like to apply second screen to broadcaster use cases.

schien: Mozilla Taiwan. Previously working on Firefox OS for TV. Now willing to see agreement on the protocol to confidently implement the APIs in Firefox.

Francois: W3C. Staff contact. Would love to see the outcomes of the protocol to be able to tell a fantastic interoperability story.

Anssi: Intel. Chair of the group. Very happy to see progress in this group, in response to TAG's comments.

MarkF: Google. My team implements the APIs. Driving the discussions on the protocols now.

Chris: BBC. We've been involved in the companion screen mechanism in HbbTV. Very interesting to be able to use the WG APIs.

Louay: Fraunhofer FOKUS. Research institution. Multiscreen, 360 video domains.

Steven: Google. Involved in second screen and cast devices for several years now.

Geunhyung: Dong-Eui University. Interested in the second screen architecture using the Web.

Yam: ACCESS. Browser vendor, game console, set-top box, etc.
… I'm trying to keep an eye on evolution of the APIs.

Kosuke: JCBA. I usually work in the division of making programs.
… A bit far from this topic, but I'm interested to hear about it.

Masaya: NHK. Working on a protocol. I presented a Hybridcast update yesterday.

Agenda bashing

Anssi: Whole day for the Open Screen Protocol, although we'll save some time at the end to finalize WG rechartering discussions.
… Mark, which parts of the protocol suite would benefit from discussion here?

MarkF: I think we should take a look at almost everything.
… As there are more open discussions on specific issues, we can take a closer look.
… Transport and discovery have multiple proposals, we should discuss that more thoroughly.
… And authentication, because that's the main area for which we do not have proposals yet.

Open Screen Protocol

[MarkF quickly going through documents in the Open Screen Protocol]

MarkW: Application level protocol could be done on top of multiple protocols?

MarkF: Yes. We will mandate one transport. There has to be some mapping, but the content should be independent of that.
… Message-oriented or stream-oriented protocol.

MarkW: When you say authentication, do you include securing the exchanges?

MarkF: Authentication also includes privacy and protection against man-in-the-middle attacks.
… Certificate management will be done in parallel to this.

Requirements for Open Screen Protocol

<anssik> Requirements for Open Screen Protocol

MarkF: Will go over this quickly.
… Scope is UAs that implement the Presentation API (presentation of entire documents) and the Remote Playback API (presentation of media elements).
… The work has focused on 2-UA mode where we send the URL to render to a receiver device.
… 1-UA mode is not something that we're going to pursue in this first iteration.
… First, we need to know if there is a presentation display available on the same Local Area Network, with multicast discovery mechanism.
… We also need to query whether a presentation display can render a particular presentation URL. Depends on URL scheme for instance.
… The Presentation API allows the controlling browsing context to connect, reconnect, and then exchange messages. Messages can be text and binary. No size limit, but implementations may impose restrictions based on capabilities.
… Either the user or the page should be able to terminate presentations started in the past.
… We haven't gotten contributions on the Remote Playback API front. In many ways, it is simpler.
… We hope that this will be a second application layer protocol that can reuse the rest.
… And that can be done in a second phase.
… We also looked at non functional requirements. We want protocols to be implemented on a wide range of devices. Low-end devices included. Essentially, Raspberry PI 2 is the basic device that we'd like to support.
… We'll use that for our benchmarking work.
… It should be similar to low-end smartphones released in the past few years.
… We want to minimize the amount of network traffic, and minimize power consumption.
… We do want to provide some privacy and security guarantees to prevent other entities present on the network from intercepting messages.
… That requires having a good story on authentication, but also on privacy and security.
… This affects how much information we can exchange during the discovery process. If it's private, it should be exchanged at a later phase.
… We prefer protocols that minimize latency. Quick discovery, quick notification when a device is no longer available.
… For authentication, one model involves pairing. We should try to use that pairing as much as possible and not require another pairing each time.
… If the user has a preferred locale, we should be able to support that.

MarkF: Finally, vendors may want to add additional features on top of the protocol, so we need to have some extensibility point so that they don't have to create additional protocols. For instance, Chrome has additional code to set up Cast devices. Vendor specific, but it makes sense to reuse the same protocols.

Anssi: Any feedback on the requirements?

Chris: One question. Timeliness of the message delivery. I'm thinking about Remote Playback API when you may want to pass the "currentTime" back to the controller. Could be useful for synchronization.

Anssi: HbbTV synchronization?

Chris: Not only that.

MarkF: There is a requirement in the UX section. No specific figure, if you have one we could try to aim for it.
… For doing really fine-grained synchronization, you may want something else at the network layer.

Sample Device Specifications

<anssik> Sample Device Specifications

MarkF: Basically, I looked around and tried to find some specifications for available devices.
… Roku, Fire TV. Cast devices fall in that ballpark as well. What I found is that the Raspberry PI 2 is close to all of these devices. A bit slower. So I feel it's a reasonable target.
… If others have other devices in mind, it would be great to share. I did not go back before 2015, because I expect these protocols to be adopted by newer devices.

Igarashi: What is the most critical feature? CPU? Memory?

MarkF: I suspect it's a combination of CPU and memory.
… In order to support modern network stack, you need a reasonable CPU.
… And then memory to process the document.

Igarashi: Rendering of the document is probably the most critical part.

MarkF: Yes. I don't really expect these specifications to constrain any of the protocol choices that we could have to make.

Anssi: Maybe the Media & Entertainment IG has experience with TV devices.

Igarashi: But most TV vendors do not expose that information.

MarkF: OK, having concrete information would still be useful.
… We'll do a bit of research, we'll see what I can come up with.

Igarashi: OK, I think we should just clarify that it does not impact the protocol itself, but the rendering of documents.

Chris: CPU is usually quite low, not a lot of memory, but indeed usually not explicit.

MarkF: The other area that has not been researched yet is the hardware crypto capabilities of the various devices. The security folks certainly prefer that cryptographic operations be done in the hardware when possible. That may also show up in this list if I get information.

Louay: Is the rendering part of the protocol?

MarkF: No.

Louay: So we don't need to consider the rendering as part of our evaluation.

MarkF: As far as our evaluation is concerned, I was more interested in getting the best numbers practically achievable for the protocol. More realistic scenarios would include rendering of documents.

Action: mfoltzgoogle to clarify that device requirements are driven by rendering Web content and not the network stack

Action: mfoltzgoogle to consult internally to find out if smart TVs fit within these requirements

Action: mfoltzgoogle to add hardware cryptography capabilities to device specs

Action: cpn to propose latency ranges for media synchronization use cases (lip sync vs frame sync)

[Some side discussion on synchronization requirements. 10ms for audio sync, 25ms for frame sync. But latency of the messages exchanged on the network is not necessarily relevant to that]

MarkF: Does the HbbTV specification mandate any type of constraint there?

Chris: I'll have to look closer.
… Message passing is JSON over Websocket. But clock synchronization uses a UDP mechanism.

Yam: Some use cases will require low latency. In the general case, my sense is that rendering will be the bottleneck for UX.

MarkF: Yes.

Yam: Today, we'll talk about the protocol, but will rendering be benchmarked?

Anssi: Generally, Web specs do not require specific hardware requirements.

MarkF: I'm not committing to cover the rendering part yet, that's another part of the work. If people want to do additional tests using content that they are interested in, that would be good.

Steven: It could be good to have a short list of content that we know we'd like to render using this protocol.
… It would be easy to identify cases where devices fall down from a rendering performance perspective, but they may not be representative of the use cases that we will most likely want to see addressed.

MarkF: That's a fair action to take.

[Matt from Youtube joining the meeting remotely]

Matt: Google. YouTube team managing the second screen experience across the board. Interested in open source community. Front-end platform perspective.

MarkF: We reviewed requirements and discussed benchmarking with and without content.

Discovery

<anssik> Discovery protocol evaluation template

MarkF: Setting the framework. You can submit proposals, using the provided template on GitHub.

MarkF: [going through the template]

Francois: Just wondering about the overall goal. Main goal is to find one and only one protocol to mandate, right?

Anssi: Yes.

Francois: So no real need to list "alternative" discovery protocols that we would not really expect to be mandated as default protocol.

MarkF: Evaluating the 2 proposals on the table. We found out that both proposals have strengths and weaknesses.
… Cast devices support both.
… About 7% of devices each way can only be discovered using one of the two protocols.
… The most reliable solution would be to have the controlling device implement both mDNS, and SSDP.
… SSDP is simpler to implement and used in DIAL.
… We should figure out if the battery life and network impact is sufficient that we might want to revisit that decision.

Steven: The most frustrating for users is when they have a device that does not show up. That's what they tell us.
… I feel there are ways to mitigate the battery and network impact.

MarkF: Right. There are ways to reduce the frequency of discovery polls in both mechanisms.
… Focusing on the availability first seems better.
… More implementation work, so feedback from other implementers would be good.

MarkF: 85% of devices can be discovered through both mechanisms. 8% only through mDNS. 6% only through DIAL.
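
[Sketch: the coverage implied by the figures just quoted, worked through as simple arithmetic. The three shares come from the minutes; treating them as disjoint groups and treating "DIAL only" as "SSDP only" are assumptions.]

```python
# Shares of devices by discovery path, as quoted in the minutes (percent).
both = 85        # discoverable through either mechanism
mdns_only = 8    # discoverable only through mDNS
ssdp_only = 6    # discoverable only through DIAL (assumed here to mean SSDP)

mdns_coverage = both + mdns_only              # controller implements mDNS only
ssdp_coverage = both + ssdp_only              # controller implements SSDP only
dual_coverage = both + mdns_only + ssdp_only  # controller implements both

print(mdns_coverage, ssdp_coverage, dual_coverage)
```

[Under these assumptions, a controller that implements only one mechanism misses the devices reachable solely through the other, which is the gap the dual mode is meant to close.]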

[Side discussion on network settings that can impact discovery]

Schien: In Mozilla, we tried to implement both of them before.
… In our observations, mDNS is slightly more reliable than SSDP.
… SSDP, you try multicast at first, then unicast on different ports which can be blocked.
… mDNS, same port, so the router may be happier to leave the port open for unicast.

Igarashi: Do you have requirements to support discovery across subnets?
… Most SSDP implementations do not think beyond the subnet.

Schien: Not only the router. The switch or Wifi AP.

Igarashi: In case of SSDP, most devices use TTL 1.

MarkF: My understanding is that both mechanisms use multicast.

Igarashi: it's possible to support subnets by setting TTL greater than 1.
… It clearly depends on the network configuration though. We cannot guarantee that working across subnets will work.

MarkF: Correct.
… That's something we can look into. The common use case is a single subnet.

Igarashi: In that case, the reality of discovery is relatively the same in both protocols.

MarkF: I feel that we still need to understand better if there is a way to mitigate the issues we see with mDNS before we pick it up.
… Now there are advantages to use both. The dual mode would be a good path forward if it's ok with implementors.

Anssi: Is it the de facto way to do discovery?

MarkF: I think this is pretty unique to Cast actually. We can track down the data by platform to see if we find a particular pattern.
… It may provide some insight into which protocol to put priority on.

Anssi: The UA is usually the one that needs to do the hard work, so that things work better for developers. We should optimize for developers.

MarkW: You could imagine devices that embed multiple browsers. How can you choose which browser you want to use to render the URL?

MarkF: That's a new thing for us. Expectation would be that it would appear as different devices.

Discovery - SSDP

<anssik> SSDP evaluation

Louay: Discovery layer for UPnP
… Also part of other protocols. Used by DIAL and HbbTV 2.0.
… Two layers: discovering, device description. Device description is not part of SSDP per se, but useful and needed to describe the device.
… Some consideration on similar discussion in Physical Web project.
… Already implemented in the Android Physical Web application.
… Back to second screen. The workflow has a control point. That's the device doing the discovery. Root device is the TV. That's the device advertising itself.
… All the control points need to listen to a multicast address.
… Assuming you missed the info, you can send a multicast search request and you will get a unicast response from available devices.
… You can specify which search target (ST) you're interested in. In UPnP, this can be media renderers. In our case, this should be compatible second screen devices.
… The response by the root device includes the address of an endpoint to use for further communication
… The root device may also send ByeBye messages.
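
[Sketch: the search step described above, as a control point might build the multicast search request and read the unicast response. The multicast address, port, and M-SEARCH header layout follow common SSDP usage; the search target value and the sample response are hypothetical, since the group has not defined them. No packets are sent here.]

```python
SSDP_ADDR = "239.255.255.250"  # well-known SSDP IPv4 multicast address
SSDP_PORT = 1900               # well-known SSDP port
# Hypothetical search target (ST); not defined by the group.
EXAMPLE_ST = "urn:example-org:service:openscreen:1"


def build_msearch(st: str, mx: int = 2) -> bytes:
    """Build the multicast search request sent by the control point."""
    lines = [
        "M-SEARCH * HTTP/1.1",
        f"HOST: {SSDP_ADDR}:{SSDP_PORT}",
        'MAN: "ssdp:discover"',
        f"MX: {mx}",
        f"ST: {st}",
        "",
        "",  # the two empty items produce the terminating blank line
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_response(datagram: bytes) -> dict:
    """Parse a unicast search response into a dict of upper-cased headers."""
    text = datagram.decode("ascii", errors="replace")
    head, _, _ = text.partition("\r\n\r\n")
    status, *header_lines = head.split("\r\n")
    if not status.startswith("HTTP/1.1 200"):
        raise ValueError("not a successful SSDP search response")
    headers = {}
    for line in header_lines:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().upper()] = value.strip()
    return headers


# Made-up response from a root device, for illustration only.
sample = (
    b"HTTP/1.1 200 OK\r\n"
    b"CACHE-CONTROL: max-age=1800\r\n"
    b"ST: urn:example-org:service:openscreen:1\r\n"
    b"LOCATION: http://192.0.2.10:8080/dd.xml\r\n"
    b"\r\n"
)
print(parse_response(sample)["LOCATION"])
```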

MarkF: In our implementation, we don't listen to alive and byebye messages.

Louay: Right.
… This affects the number of network requests you need to make.

MarkF: So de facto, the control point triggers the discovery.

Francois: Alive and ByeBye messages are optional in SSDP?

Igarashi: They are mandatory.

Louay: But they can be ignored by the Control Point.

Igarashi: But not always implemented.
… Meant to optimize search when there are many devices available.

MarkF: It improves the latency of discovery. As soon as a device connects to the network, we know that it is there.
… In Chrome, we poll every 2 minutes. The advertisement does help.

Louay: Moving on with the device description. The "Location" header contains a link to the XML description. In DIAL, they also send the endpoint of the root device.
… For each device, in our case, we need to make a GET request to retrieve the URL of the device description.
… We propose to return more information in the first response to avoid having to make two requests.
… Such as the friendly name, and the supported protocols.

MarkF: This is to optimize the process of matching controllers with compatible receivers for the presentation URL.
… Caveat is that it is a broad whitelist. And it is advertising more information to everyone on the local network.
… The reason why the friendly name is an ugly name is that SSDP does not support UTF-8. So base64 is used here.

Francois: Any limit to the friendly name length?

MarkF: The whole message has to fit within one packet. Roughly 1200 bytes.
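
[Sketch: how a receiver could carry a non-ASCII friendly name in an ASCII-only header through base64, and check that the whole response still fits in one packet. The header name is hypothetical, and the 1200-byte budget is the rough figure quoted above rather than a specified limit.]

```python
import base64

MAX_DATAGRAM = 1200  # rough single-packet budget mentioned in the discussion


def encode_friendly_name(name: str) -> str:
    """UTF-8 encode the name, then base64 it so the header stays ASCII."""
    return base64.b64encode(name.encode("utf-8")).decode("ascii")


def decode_friendly_name(value: str) -> str:
    """Reverse of encode_friendly_name; rejects malformed base64."""
    return base64.b64decode(value, validate=True).decode("utf-8")


def build_response(headers: dict) -> bytes:
    """Serialize a search response and enforce the single-packet budget."""
    lines = ["HTTP/1.1 200 OK"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    lines += ["", ""]
    data = "\r\n".join(lines).encode("ascii")
    if len(data) > MAX_DATAGRAM:
        raise ValueError(f"response is {len(data)} bytes, over {MAX_DATAGRAM}")
    return data


# "FRIENDLY-NAME.EXAMPLE.ORG" is a made-up custom header name.
encoded = encode_friendly_name("リビングのテレビ")
packet = build_response({"FRIENDLY-NAME.EXAMPLE.ORG": encoded})
print(len(packet), decode_friendly_name(encoded))
```

[Base64 expands the encoded name by roughly a third, so the effective limit on the friendly name is noticeably smaller than the packet budget.]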

Louay: One other option is that the controller sends the presentation URL in the search request, so that the receiver can decide whether it sends a response or not.

MarkF: That could be a third option.

Louay: According to UPnP, we can add additional parameters to the response, used here.

MarkF: Custom headers include the friendly name, the receiver IP and port, additionally the list of URL protocols that it supports, and the hosts is a comma-delimited list of URL hosts that are compatible. The last two are not mandatory.

Schien: For the Hosts, do we want to allow wildcards?

MarkF: Potentially. *.domainname.ext for instance.
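
[Sketch: how a controller might match a presentation URL against a comma-delimited hosts list that allows a leading wildcard. The matching rule is an assumption: "*.example.com" is taken to match any subdomain but not the bare domain. Nothing of the sort was agreed in the meeting.]

```python
from urllib.parse import urlsplit


def host_matches(pattern: str, host: str) -> bool:
    """Match one host against one pattern, with optional leading '*.'."""
    pattern = pattern.strip().lower()
    host = host.strip().lower().rstrip(".")
    if pattern.startswith("*."):
        suffix = pattern[1:]  # e.g. ".example.com"
        return host.endswith(suffix) and len(host) > len(suffix)
    return host == pattern


def receiver_accepts(hosts_header: str, url: str) -> bool:
    """True if the URL's host matches any entry of the comma-delimited list."""
    host = urlsplit(url).hostname or ""
    patterns = [p for p in hosts_header.split(",") if p.strip()]
    return any(host_matches(p, host) for p in patterns)


hosts = "*.example.com, video.example.net"
print(receiver_accepts(hosts, "https://app.example.com/receiver.html"))
print(receiver_accepts(hosts, "https://example.com/"))
```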

Louay: So third option is about sending the presentation URL to the receiver. No need to send back more information about protocols and hosts. But then it has privacy implications.

MarkF: Yes, in general I'm not in favor of protocols that broadcast in the clear information from a secure context.

Louay: For reliability, UPnP recommends sending all SSDP messages 2 or 3 times.
… The controller can detect join and leave immediately if it listens to corresponding messages.
… Note the MX SSDP header that makes it mandatory for the client to wait a random amount of time between 0 and MX in between messages. That affects latency.

MarkF: Typically, we do see a delay that is much more related to our polling time.

Louay: Do you know what MX values you're using usually?

MarkF: I can't speak for Youtube, but I can find out what we're using in Chrome.

Louay: Network efficiency depends on the number of devices available in the network.
… Regarding power efficiency, it depends on the method used. If the device needs to fetch and parse the device description, this will negatively impact the power efficiency.
… It is very easy to implement SSDP. Two implementations listed here, in Node.js and a Cordova plugin. We can use it for benchmarking.

MarkF: Most implementations use the one from upnp, which has been available for a long time.
… I'm not saying I would recommend it, but most implementations use it.

Louay: I'll complete the list to add libupnp.

Action: Louay to complete the list of implementations for SSDP, add e.g. libupnp

Action: mfoltzgoogle to find out the MX value used in Chrome

Louay: IPv4 and IPv6 are both supported. SSDP is implemented on lots of CE devices, at least on TV devices.
… For standardization, SSDP is part of the UPnP device architecture. It is used in many devices, not necessarily only DLNA certified devices.
… Note that the spec is now in the Open Connectivity Foundation (OCF), which requires membership and licensing to access the UPnP test suite and obtain UPnP certification. But we should not need that.

MarkF: That is my interpretation as well.

Louay: There is no specific SSDP document. It's all part of UPnP.

MarkF: It seems to be a de facto standard.

Francois: Any licensing required to implement the spec?

MarkF: I don't think so. You're licensing the test suite.

Igarashi: I think licensing is required in theory to implement the spec. Part of UPnP licensing.

Louay: For privacy, it depends on the method chosen. Some information leaked on the network, either on the controlling or receiving side.

MarkF: For security, SSDP has been around for a long time. There have been a few exploits because of bad implementations. Some routers expose SSDP to their outer interface, allowing external attackers to mess with your local network.
… Also, poor input handling, with request/response not being parsed correctly, leading to security exploits.
… Whatever implementation is adopted, it needs to be audited and fuzz tested. It should make sure that it only handles requests coming from the LAN, not from the WAN.
… There is a class of attacks where amplification is used for DDoS exploits. We need to take care of that.
… A report showed that many existing UPnP devices have security issues with SSDP.
… It's an implementation issue.

Louay: It may make sense to have guidelines for implementations.

MarkF: If we can come up with a valid implementation, that can be good as well. We use our own implementation in Chrome for instance.
… The question going forward, if we adopt that mechanism, is, do we align it with DIAL, in which case we're more going to set parameters in the XML, or rather we're going to customize SSDP to our needs.
… If there were a potential to support a broader set of devices that support DIAL, we're in favour of that, even though it makes the implementation more complex.
… Something more along the lines of Method 1.

<anssik> Method 1

MarkW: The question, if you're targeting existing implementations, is: if you require additional parameters, do existing devices support that already, or do they need to be updated?

MarkF: [Reflections on backward compatibility and future looking approach]

Schien: We use a different namespace and different service type (ST). For legacy DIAL applications, you need to support different types. You're trying to discover totally different devices.

MarkF: Maybe we can decouple the discussion between supporting DIAL and supporting SSDP.

MarkW: In the latest DIAL specification, applications do not get to influence the XML description. Within the XML response, there is a field that applications can put additional parameters in. One of them is for the Presentation API.
… That's your little space that you have.

MarkF: So the flow would be DIAL discovery, then launch the application, then get that information?

MarkW: You don't need to launch the application.

Louay: A GET request will give you that info.

MarkW: Right. And a POST request to launch the application.

Louay: In HbbTV, the receiver application is a browser plus some broadcast thing. I think Samsung has its own way to launch pure HTML documents, using their own DIAL application.

MarkF: That's an interesting idea. If we are going to specify some newer discovery mechanisms and transports, I would recommend doing that as opposed to putting that into DIAL.

Louay: Yes, DIAL does not include transport anyway.

MarkF: It sounds like 2 bits of work come out of this: focus on new devices without DIAL. And then support for legacy DIAL devices.

[Note Method 3 is actually the most privacy preserving option, no presentation URL leaked as opposed to what was minuted earlier]

Francois: Are we in a position to choose a method already?

MarkF: Method 3 is the easiest one to implement. Additional information will be obtained from the application level protocol.

Schien: From a privacy perspective, you will still need to share the presentation URL with the device you discovered before user consent.
… So eavesdropping is still possible.
… If the user has to fall back between devices because one does not support the URL, it's a bad UX.
… The second choice is to build a list of devices that is compatible with the content to present, but a malicious device can then record the presentation URL.
… It does not matter if the connection is secure or not, it will get the information.

MarkF: If you send the request before the user has paired, then that is a problem.
… Either we require pairing before leaking the presentation URL or we assume that it is not too big of an issue.

Francois: Do we have existing use cases where the mechanism in method 2 is not enough? Can the controller make the decision each time and can that be enough?

MarkF: For Cast devices, there are only a restricted number of white listed domains. But then there are other use cases. I think we have to support the whole range of possibilities.
… Custom schemes are the ones that require white listing. Maybe we can say that whitelisting is not supported in HTTPS.
… And then focus filtering on other schemes.

Francois: But then, could we say that this is just an extensibility point for vendor specific features? The APIs do not say what happens with non HTTP URLS, typically.

MarkF: Yes. Then filtering would not work for HTTPS.

Discovery - mDNS

<anssik> mDNS evaluation

MarkF: Moving quickly through this. Multicast DNS extends DNS to local queries on the network.
… You send a multicast DNS query to a specific service type. The responder is the service that responds to the query.

MarkF: The responder returns 4 records: the domain name, which is usually the friendly name.
… The text record advertises a set of key/value pairs.
… The SRV record is a service record. It's just a port that you can use in this case.
… And then an address record.
… One important aspect of mDNS that makes it different from SSDP is the use of TTL per record.
… Two RFCs that go into details of implementations of mDNS. It's been around for many years. Started as Bonjour, an Apple protocol.
… Supported by many platforms today.
… Discovery is pretty straightforward.
… The two things to drill on: the friendly name. It can be in the SRV record. There is a length limit and also limited to ASCII.
… Not a good way to support locale strings.
… We have to see if people are using friendly names that cannot be encoded in the length available.
… Hopefully 255 would be enough.
… The other aspect is that we don't have a good way to eliminate non compatible devices, but we should be able to use TXT records.
… Cache accuracy can depend on a number of mechanisms. Revalidation. There are a couple of other things. I think Chrome does most of these. There is some implementation complexity to deal with cache issues.
… If you're on a platform that has its own mDNS listener, you may run into port conflicts, or run into issues where you cannot receive mDNS responses. Also generic networking problems with firewalls blocking these messages.
… I will try to share some of the data that we have broken down by platform.
… The network and power efficiency is highly dependent on TTL. One thing we should do as a group is recommend some TTL values.
… Platform support for mDNS is quite good.
… Apple provides an open source implementation. There's also a more complete implementation called Avahi for Linux devices.
… Standardization status: still a bit fuzzy in IETF but pretty stable
… The security history is similar although not quite as bad as UPnP [going through reported attacks]
… A lot of the same criteria apply. We should have guidelines. Make sure that implementors respect due diligence.
… Strengths are availability across platforms. Caching can be an advantage as well. DNS packets are more restricted. That's a downside.
… It should be enough though.
… This is definitely a protocol we want to support.
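
[Sketch: a minimal per-record TTL cache of the kind the mDNS discussion above relies on. This is an illustration only: the record names are made up, revalidation is left out, and a TTL of 0 is treated as an immediate removal, which simplifies how goodbye records are normally handled.]

```python
import time


class RecordCache:
    """Keeps discovered records until their individual TTL runs out."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # (name, rtype) -> (value, expires_at)

    def put(self, name, rtype, value, ttl):
        """Store or refresh a record; a TTL of 0 removes it (simplified)."""
        if ttl == 0:
            self._entries.pop((name, rtype), None)
            return
        self._entries[(name, rtype)] = (value, self._clock() + ttl)

    def get(self, name, rtype):
        """Return the cached value, or None if absent or expired."""
        entry = self._entries.get((name, rtype))
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[(name, rtype)]
            return None
        return value


cache = RecordCache()
# Hypothetical instance name, service type, host, and port.
cache.put("TV._example._udp.local", "SRV", ("tv.local", 8009), ttl=120)
print(cache.get("TV._example._udp.local", "SRV"))
```

[Longer TTLs mean fewer queries and less power use, but a stale entry lingers longer after a device disappears, which is why recommended TTL values matter.]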

Schien: Based on our previous experiments, it would be easier to use mDNS. Adding new things in the TXT record is much easier than adding things in the headers of SSDP.
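
[Sketch: why extending the TXT record is straightforward. Its payload is a sequence of length-prefixed "key=value" strings, each at most 255 bytes. The key names used here ("protocols", "hosts") are hypothetical, not proposed by the group.]

```python
def encode_txt(pairs: dict) -> bytes:
    """Encode key/value pairs as length-prefixed 'key=value' strings."""
    out = bytearray()
    for key, value in pairs.items():
        entry = f"{key}={value}".encode("utf-8")
        if len(entry) > 255:
            raise ValueError(f"entry for {key!r} exceeds 255 bytes")
        out.append(len(entry))  # single length byte
        out += entry
    return bytes(out)


def decode_txt(rdata: bytes) -> dict:
    """Decode length-prefixed 'key=value' strings back into a dict."""
    pairs = {}
    i = 0
    while i < len(rdata):
        n = rdata[i]
        entry = rdata[i + 1:i + 1 + n]
        i += 1 + n
        key, _, value = entry.decode("utf-8").partition("=")
        pairs[key] = value
    return pairs


rdata = encode_txt({"protocols": "https", "hosts": "*.example.com"})
print(len(rdata), decode_txt(rdata))
```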

MarkF: Yes.

Chris: Worried about the size limit.

MarkF: Two limits. Ethernet packet size (about 1400). And then DNS packets over 512 bytes are invalid.
… I'll check whether we run into problems with Cast devices.
… The one downside is the issue with port conflict. We've spent some time thinking about workarounds. If you have Chrome and Firefox on the same desktop, how do we allow them to cooperate?
… We should think of ways to make it easy.

Schien: Also, people on Linux that run Avahi will already have the service running in the background, blocking the port for Firefox and Chrome.

[Side discussion on ways to work around situations that prevent mDNS discovery]

MarkF: Let's revisit this quickly after lunch and evaluate consensus.

[Lunch break until 13:30]

[MarkF showing testing lab with 16 Raspberry PI]

MarkF: I'll probably set up a GitHub repo with the source code running on these devices.

MarkF: Back to mDNS, any further question specific to mDNS?

[None heard]

MarkF: As long as we see a gap in devices that can be found using only SSDP, we're still in favor of dual mode.
… We can try to get a little bit more of data. We can also try to improve mDNS, but there will always be issues that make it not an option for some folks.

Schien: For dual mode. For SSDP, the display name is the UUID. For mDNS, it is the display name. How are you going to synchronize the information coming from these two protocols?

MarkF: I think we're using IP/port for the transport endpoint. The second option is to put the device UUID in the TXT record in mDNS.
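
[Sketch: merging results from both discovery mechanisms by transport endpoint, following the IP/port idea just mentioned. The tuple layout and the rule that the mDNS name wins over the SSDP identifier are assumptions made for illustration.]

```python
from dataclasses import dataclass, field


@dataclass
class Receiver:
    address: str
    port: int
    name: str = ""
    sources: set = field(default_factory=set)


def merge(discoveries):
    """Merge (source, address, port, name) tuples into one entry per endpoint."""
    merged = {}
    for source, address, port, name in discoveries:
        rx = merged.setdefault((address, port), Receiver(address, port))
        rx.sources.add(source)
        # Assumption: the mDNS display name is preferred over the SSDP UUID.
        if name and (source == "mdns" or not rx.name):
            rx.name = name
    return list(merged.values())


found = merge([
    ("ssdp", "192.0.2.10", 8009, "uuid:1234"),
    ("mdns", "192.0.2.10", 8009, "Living Room TV"),
    ("mdns", "192.0.2.11", 8009, "Kitchen"),
])
for rx in found:
    print(rx.address, rx.port, rx.name, sorted(rx.sources))
```

[Keying on IP/port avoids changing either advertisement; carrying the device UUID in the mDNS TXT record would instead give a key that survives an address change.]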

Authentication

MarkF: We don't have a concrete proposal for authentication yet.

<anssik> [Meta] Propose security mechanisms for Open Screen protocol #13

MarkF: Choice of model for authenticating the device and whether it's identified on the transport.
… Two models that we think make sense:
… 1/ The Cast model. TLS and self-signed certificates. Certificate management question is sort of complicated.

Schien: And certificates expire...

MarkF: 2/ A user-facilitated pairing mechanism, using PAKE for instance, to establish trust in a specific device.
… 3/ Application provider could provide some trust mechanism. Some challenge that will be signed with a key obtained from the application provider. We would be doing the authentication on behalf of the application provider.
… My concern with this third approach is that the authentication is then tied to the origin of the application.
… Next step would be to define the user pairing protocol more concretely.
… For QUIC, it would be TLS 1.3. For RTCDataChannel, it would be more DTLS.
… The installable certificate mechanism, I need to assess support and interest here.
… The third model would require engagement from interested third parties.

<schien> https://wiki.mozilla.org/WebAPI/PresentationAPI:Protocol_Draft#Device_Pairing

Francois: For 2/, I remember you presented something Shih-Chiang. Is that something you can submit as a proposal to the discussion?

Schien: Yes!

MarkF: we'll need something a bit more generic that we can transfer to our application protocol.

Schien: OK. Goal is to exchange keys between two parties without sending the keys during the initial phases. What they did was some maths. Each peer would transfer half of the components to calculate the key.
… The receiver user agent would display something like a 4 digit passcode on screen. If the user can enter the same passcode on the controller, then both peers can compute the same key without exchanging it.
… Then, the goal is to verify the keys. Here, we're using double-hash to check keys.
… If the controlling side can finally re-identify the key received from the receiving side, then both know that they have completed the challenge and can now use the key.
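
[Sketch: one way a double-hash key check can work once both peers have computed a key. The message order (controller sends the double hash, receiver answers with the single hash) and the use of SHA-256 are assumptions, not details taken from the Mozilla draft. The PAKE exchange that produces the key is not shown.]

```python
import hashlib
import hmac


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def controller_challenge(key: bytes) -> bytes:
    """Controller proves knowledge of the key by sending H(H(key))."""
    return h(h(key))


def receiver_reply(key: bytes, challenge: bytes):
    """Receiver checks the challenge; on success it answers with H(key)."""
    if not hmac.compare_digest(h(h(key)), challenge):
        return None
    return h(key)


def controller_verify(key: bytes, reply) -> bool:
    """Controller checks that the reply matches its own H(key)."""
    return reply is not None and hmac.compare_digest(h(key), reply)


def confirm(controller_key: bytes, receiver_key: bytes) -> bool:
    """Run the two-message check between the two peers' keys."""
    challenge = controller_challenge(controller_key)
    reply = receiver_reply(receiver_key, challenge)
    return controller_verify(controller_key, reply)


print(confirm(b"k" * 32, b"k" * 32), confirm(b"k" * 32, b"x" * 32))
```

[Neither message reveals the key itself, and a peer that derived a different key, for instance because a wrong passcode was entered, fails the check.]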

MarkF: Is that passcode entered each time?

Schien: No. If there is no known key, we will trigger this negotiation. If we already saved the key, then during the first message, we'll try to trigger the key verification to know that both sides have the same key already to avoid user gesture.
… Actually, if we already have the key, then instead of using the passcode as J-PAKE input, we will use the key to generate a new key.

MarkF: How many bits is the key?

Schien: That is configurable in the J-PAKE protocol. I don't quite remember what our previous parameters were.

MarkF: I think the difference in how I was thinking of using the PAKE is that I was thinking to exchange public keys that would have a longer time span.
… PAKE used to challenge ownership of the public keys associated with private keys generated on each side.
… If it is DTLS-based, you can use that to have the identity assertion as well.
… You can use that to verify the identity, without having to pass that through the cloud.
… The main difference is the key lifetime. We were looking for a solution that does not require a user gesture for each session.
… If there were a way to add support for key rotation.

Francois: But that's already the case, right?

Schien: Yes. The second time, if both ends remember the same key, they use that as input into the PAKE protocol. So a new key gets generated.
… That said, 4 round trips are needed each time to compute the key, no way to avoid these 4 round trips in the model we propose.

MarkF: Yes. There is the number of round trips and the complexity of operations, and it does not seem too bad here.

Schien: Yes, we did it on mobile devices, and that is fast.

Louay: It has to be done each time you want to send a message?

Schien: No, only for initial connection.
… From a security perspective, this is good, because multiple man-in-the-middle attacks cannot break it as long as the attacker does not know the initial passcode.

MarkF: If you wanted to have the receiver device provide some assertion of its identity, would that be done by having the user agent challenge the device with the passcode?

Schien: PAKE is for anonymous authentication. No device identity, the only guarantee is that both peers agreed on something.

MarkF: This could be leveraged to accept a public key from the device, as I alluded before.

Schien: Previously, when we were developing this protocol, we were trying to avoid any pre-installation of certificates.
… We use this mechanism to replace the TLS key exchange phase. After this you can do the same level of security that TLS can provide.

Tomoyuki: Note integration of TLS and J-PAKE is proposed in an expired IETF draft.

MarkF: Our ideal would be to combine this with some identity to increase confidence. That may involve extra exchanges to be able to assert that at the transport level.
… This may require extra steps on that protocol.
… I'm hopeful we can find a solution that does not require baking in keys.

<tomoyuki> FYI: there is an internet draft that proposes J-PAKE integration into TLS: https://tools.ietf.org/html/draft-cragie-tls-ecjpake-01 (but it has expired)

MarkF: This certainly looks feasible to implement.
… I want to study that more.
… Also, from Chrome's perspective, the UX is equally important, so I'd like to evaluate that as well.

Action: mfoltzgoogle to review Internet-Draft for ecjpake

Action: mfoltzgoogle to propose adding identity assertion step to J-PAKE proposal.

Action: schien to contribute J-PAKE proposal to repo

Francois: Wondering about other potential proposals for this, e.g. from the HTTPS in local network CG?

Tomoyuki: Not for the moment, we're evaluating solutions and we'll have a breakout session tomorrow.

MarkF: Does Mozilla have an internal security process for new features?

Schien: Yes. The J-PAKE approach is used in our previous remote control features which allowed Firefox for Android to install plug-ins and connect to TV and turn into a remote controller.
… This mechanism was provided by our security team.

MarkF: We had a similar feature for Chrome for remote desktop. Similar features have been through security review but have not been exposed yet.
… I may share more when we have something more concrete, to compare approaches and see if there are significant differences.

Action: mfoltzgoogle to review J-PAKE proposal with similar features in Chrome (remote desktop pairing)

Control protocol

MarkF: Control protocol should be non-controversial, although we received feedback from the TAG not to define a new binary format, so a lot of it needs to be thrown away and reformulated.
… One thing that I want to make sure of is that when a connection is established, each peer can take whatever role it needs to.
… The browser could then also accept requests for presentation as well.
… The second thing is that often the same structure is passed across messages. Lot of duplication. Converting to a better data format should help remove that redundancy.

[Looking at CBOR, binary serialization of JSON, roughly]

MarkF: Objections to JSON were: security guys do not like untrusted code, even in JSON. Then it's actually harder to specify it, because anything can be of any type.

MarkW: Yes, I don't think it helps you with validation.
… Security analysis may be easier on a specific chunk of code than on a generic parser. On the other hand, a well-established generic parser might be easier to assess.

MarkF: Protocol buffers was also mentioned. Pretty well-defined syntax, but requires a lot of JavaScript to manipulate. Might expose 100Kb of additional code.
… Not my preferred choice for now.
… But then, if we have binary JSON on the line and JSON exposed to the JavaScript layer, that may be good.

[In summary, 4 options to evaluate: custom binary protocol, JSON, CBOR, protocol buffers]
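To make the JSON versus CBOR comparison concrete, here is a small sketch that encodes the same message both ways. The encoder covers only a subset of CBOR (non-negative integers, text strings, arrays, maps), and the message fields are made up for the example; they are not taken from the control protocol draft.

```python
import json
import struct


def _head(major: int, n: int) -> bytes:
    """Encode the CBOR initial byte and argument for the given major type."""
    if n < 24:
        return bytes([(major << 5) | n])
    if n < 1 << 8:
        return bytes([(major << 5) | 24, n])
    if n < 1 << 16:
        return bytes([(major << 5) | 25]) + struct.pack(">H", n)
    if n < 1 << 32:
        return bytes([(major << 5) | 26]) + struct.pack(">I", n)
    return bytes([(major << 5) | 27]) + struct.pack(">Q", n)


def cbor_encode(value) -> bytes:
    """Encode non-negative ints, str, list and dict as CBOR (subset only)."""
    if isinstance(value, bool):
        raise TypeError("booleans are not handled in this sketch")
    if isinstance(value, int):
        if value < 0:
            raise ValueError("negative integers are not handled in this sketch")
        return _head(0, value)                      # major type 0: unsigned int
    if isinstance(value, str):
        raw = value.encode("utf-8")
        return _head(3, len(raw)) + raw             # major type 3: text string
    if isinstance(value, list):
        return _head(4, len(value)) + b"".join(cbor_encode(v) for v in value)
    if isinstance(value, dict):
        out = _head(5, len(value))                  # major type 5: map
        for k, v in value.items():
            out += cbor_encode(k) + cbor_encode(v)
        return out
    raise TypeError(f"unsupported type: {type(value).__name__}")


# Hypothetical control message, used only to compare encoded sizes.
message = {"type": "presentation-start", "requestId": 7,
           "url": "https://example.com/"}
as_json = json.dumps(message, separators=(",", ":")).encode("utf-8")
as_cbor = cbor_encode(message)
print(len(as_json), len(as_cbor))
```

The binary form is smaller here, but field names are still repeated in every message, so the duplication concern would still need to be addressed in the message design.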

Transport protocol - QUIC

<anssik> Transport evaluation template

MarkF: First, template for proposals. Fairly similar to the one for discovery.
… [going through template]

MarkF: [notes the importance of latency, a large message should not block smaller ones]
… First proposal is QUIC. I spoke with one of the developers of it.
… This has been in development at IETF for a little while. It started as SPDY, replacement of HTTP. Once a TLS-like authentication has been done, it's 0 round-trip to establish a connection.

<anssik> QUIC proposal

MarkF: [going through QUIC properties]
… Every connection has an ID. That ID can be used to restart a connection without having to re-negotiate.
… Two IETF specs. One is the wire spec. The other is about TLS use. Quite long. How streams are established, how multiplexing works, rules, etc.
… What is interesting is how we can map the Presentation API.
… Stream 0 would be reserved for control channel. Then each application can allocate a stream ID.
… That way, the protocol can multiplex data between connection streams and not block things.
… We have to devise a framing protocol. That's currently how the control protocol is structured.
… One open issue to investigate is whether WebSockets are possible on QUIC.
… Need to check whether it's possible to do that.
… [going through presentation mapping sections]
… QUIC could be used to terminate presentation. Control messages should be preferred for things triggered by the API, and use QUIC for other types of network failures.
… [latency of establishment depends on authentication protocol]
… Regarding implementation, I can have a look at how easy it would be to implement outside of Chromium.
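The stream mapping and the need for a framing layer can be sketched as below. This assumes a simple length-prefixed framing and plain counter stream IDs; neither is specified in the minutes, the counter does not follow QUIC's actual stream numbering rules, and `StreamAllocator`, `frame` and `unframe` are hypothetical names.

```python
import struct

CONTROL_STREAM_ID = 0  # reserved for the control channel, as proposed


class StreamAllocator:
    """Hand out one stream ID per presentation connection."""

    def __init__(self):
        self._next = CONTROL_STREAM_ID + 1
        self._by_connection = {}

    def stream_for(self, connection_id: str) -> int:
        if connection_id not in self._by_connection:
            self._by_connection[connection_id] = self._next
            self._next += 1
        return self._by_connection[connection_id]


def frame(payload: bytes) -> bytes:
    """Prefix a message with its length as a 4-byte big-endian integer."""
    return struct.pack(">I", len(payload)) + payload


def unframe(buffer: bytes):
    """Split a byte stream into complete messages.

    Returns (messages, leftover), where leftover holds an incomplete frame.
    """
    messages = []
    offset = 0
    while len(buffer) - offset >= 4:
        (length,) = struct.unpack_from(">I", buffer, offset)
        if len(buffer) - offset - 4 < length:
            break  # wait for more data
        start = offset + 4
        messages.append(buffer[start:start + length])
        offset = start + length
    return messages, buffer[offset:]


# Usage: two presentation connections get distinct streams.
streams = StreamAllocator()
first_stream = streams.stream_for("connection-a")
second_stream = streams.stream_for("connection-b")
messages, leftover = unframe(frame(b"hello") + frame(b"world"))
```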

Schien: I think there are available libraries. Mozilla has one, I think, and I believe there are 4-5 implementations available already. I can ask our team for a pointer.

MarkF: Yes, that would be valuable.
… I don't anticipate any particular issue with our hardware requirements, but I can ask around.
… QUIC is on the standards track at IETF. Last Call expected next year. It should be a pretty stable spec already. I'll check on whether there are expected changes.
… The only security layer is TLS 1.3. There's potential we can use J-PAKE for stream 1 but the behavior is not super well defined.

Schien: TLS 1.3 is still on its way to standardization.

MarkF: Yes. You can perhaps check about the timeline with Mozilla folks involved in the spec.

Schien: I think it's stable. However, I think we see interoperability issues with TLS 1.3 right now, which prevents deployment.

Action: schien to check the timeline of TLS 1.3 deployments from Mozilla's editors

MarkF: My thoughts about QUIC depend on:
… 1/ can we find a standalone implementation?
… 2/ can we take advantage of the quick connection, re-connection?
… 3/ is TLS 1.3 stable enough to use as part of our transport scheme?
… Otherwise, because it has this stream mapping, I feel it's a good fit for what we're trying to achieve.

Schien: The controller establishes a connection to the UDP port reserved by the receiver. Control messages and application messages would go to the same connection?

MarkF: Correct.

Schien: My proposal is more having a standalone TCP server that can be using QUIC. And then we spawn a new connection so that we can isolate the transport line for that particular presentation.
… that would be the major difference between our proposals.

MarkF: We use WebRTC for other products, including Chrome for EDU app. At least, I'm not aware of a way to use an RTCDataChannel without using a separate control channel.

Transport protocol - RTCDataChannel

<anssik> WebRTC Data Channel proposal

Schien: Data channel is a protocol that is part of the WebRTC framework, used for transmitting non-media messages.
… Benefits are that it is already implemented in all major browsers.
… Inherently, WebRTC is trying to do peer to peer connections that can support heterogeneous network environments. It would be easier if we use it to extend to non LAN use cases.
… It already defines data framing format for transmitting commands. So less work for us.
… The current design specifications are in IETF. First part is the establishment protocol and the second part is the actual data formatting on top of the data channel.
… This is just a brief introduction. UDP based, DTLS for encryption, SCTP for control messages.

<Louay> I just checked WebRTC Support on Safari 11.0.1 for Mac -> the following APIs are supported: WebRTC 1.0, ObjectRTC API for WebRTC, Data channel

Schien: On the protocol side, data channel can mesh multiple data channels onto a single connection. Multiple data channels connecting two fixed endpoints are already meshed.

MarkF: Multiplexing over a single transport, then.

Schien: Yes.
… Previously, my design was to have a separate control server protocol to exchange control commands. In those control commands, to create a new one, a new data channel would be created.

Louay: So server acting as a signaling server?

Schien: Yes.
… Signaling server can be used to exchange ICE candidates and the SDP for bootstrapping the RTC data channel connection.
… The server could also be used to broadcast "there is a presentation to be joined" messages in the future.

MarkF: For this server, using QUIC could be a good solution. Establishing a TCP connection is going to consume more power. Using QUIC would provide benefits.

Francois: But if you use QUIC already, what's the reason to use data channels? And not QUIC directly?

Schien: Right. For each presentation connection, it's bound to one data connection.

MarkF: Will there be separate data channels for separate connections?

Schien: Separate data channels. But under the hood, between the same controller and receiver, no matter how many data channels, it can be bound to the same DTLS connection.

MarkF: I was trying to understand how many connections might be required and if that scales.

Schien: Per user agent.

MarkF: If there are several user agents trying to connect to the same presentation, they will all negotiate the offers/answers at the same time, which may slow things down.

Louay: what is the presentation API signaling data channel?

Schien: It would go through the signaling server.

MarkF: So the advantage is that it maps pretty directly into the Presentation API and it has data framing.
… Do you know if there is fairness or head-of-line blocking for multiple messages on the same line?

Schien: I do not know. I can check.

[Discussion on fairness among buffers when 3 buffers need to be sent]
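The minutes do not record the details of the fairness discussion. As an illustration only, one possible policy is round-robin interleaving: each pending buffer gets to send one chunk per round, so a small message is not stuck behind a large one. The function name and chunk size below are assumptions, and this is not a description of how SCTP or QUIC actually schedule data.

```python
from collections import deque


def interleave(buffers: dict, chunk_size: int):
    """Yield (stream_id, chunk) pairs, one chunk per stream per round."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    queue = deque((sid, memoryview(data))
                  for sid, data in buffers.items() if data)
    while queue:
        sid, view = queue.popleft()
        yield sid, bytes(view[:chunk_size])
        rest = view[chunk_size:]
        if len(rest):
            queue.append((sid, rest))  # back of the line for the next round


# Usage: three buffers of different sizes share the same transport.
pending = {1: b"A" * 10, 2: b"bb", 3: b"cccc"}
schedule = list(interleave(pending, chunk_size=4))
```

With this policy the short buffer on stream 2 is fully sent in the first round, before the large buffer on stream 1 completes.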

Schien: On the same data channel, the order of messages is respected.

Louay: The control messages are sent on a different data channel.

Schien: For each presentation connection, we have a data channel for it. So separate data channels per connection.

Schien: We can still reuse the same DTLS connection for this. Multiplexing is done under the hood. No dependency between data channels. No blocking between messages on different data channels.
… [going through details of workflow to launch a presentation and establishing a data channel]
… For establishing the data channel, the controlling side will prepare the offer. The receiving side prepares the answer. Then the ICE procedure gets initiated to establish the right way to connect the two devices. Possible IP addresses are sent to the other side over the control channel.
… Then both ends can initiate the negotiation and establishment.

MarkF: Would it be possible for the candidates to be exchanged during discovery?
… The receiver should know that beforehand, which would reduce the time of establishment.
… The only time you require ICE candidate is for STUN traversal. If you already know IP addresses, that's not needed.

Schien: I need to take a look at the latest SDP descriptions.
… The local IP address is the first address you will know. I remember there is an entry for it in the SDP. If that's the case, then you can avoid candidate generation step during the ICE procedure.
… For the Remote Playback API support, we might be able to create an RTC connection to transmit the media stream. We could reuse the existing RTC technologies to do so.
… For the fling case, we still need to introduce an application protocol to be sent over the presentation connection data channel.

MarkF: In the past, we've tried to use RTC media transport for screen mirroring. It's not the greatest fit. WebRTC has made some improvements in the last few years, so we could look at that again.
… There are several challenges that need to be overcome.
… For mirroring, it's not really good. For remoting, it should be OK. We haven't done experiments.

Schien: I can imagine some difficulty because the data channel is in the page context, and you want to pass the media stream through that.

MarkF: Not the biggest problem in Chrome, but we have others.

Schien: From a reliability perspective, the protocols are well-suited. Re-negotiation is also included in the protocol.
… For the latency part, it will take three steps:
… 1/ control channel between two endpoints
… 2/ establish the first data channel
… 3/ an extra round of protocols to spawn a new data channel on top of the existing one when needed.
… It's not best designed to reduce the overhead of establishing data connections. It increases reliability, but it adds extra steps for that.

MarkF: At what point does DTLS negotiation happen? When the first data channel is established?

Schien: Yes.
… I don't have data on how long it takes to transmit a message through the data channel. To be figured out.
… From an implementation/deployment perspective. One well-known library on webrtc.org, shared across browsers. Less interoperability issue between browsers.

Louay: but the other endpoint is not a Web browser.

Francois: right, but all browsers will have the same problem.

MarkF: I imagine that device makers may choose not to include all of the RTC features to support that.
… I will look around, but it might be challenging to remove media-specific features from the library.

Schien: Yes, I think they embedded the whole RTP thing.
… Library size might be too big for embedded devices.

Action: mfoltzgoogle to investigate whether Data Channel can be factored out from the WebRTC.org library

MarkF: There is a framework for identifying providers. IdpProviders. It takes the fingerprint of a public key. This is what we could use for J-PAKE.

Schien: For hardware requirements, existing implementations might have issues on the library size. Since we are not really using the media part of the WebRTC, CPU is probably not an issue.
… Maintaining a long-lived data channel may not be power efficient. Some keep-alive messages to exchange.
… Standardization is on-going at IETF.
… For security of data channels, they have a security overview. We must have an encrypted signaling channel for exchanging SDP.
… We still need to investigate that part.
… The rest is encrypted. There is also a STUN keepalive feature, sort of a constant pinging to the trusted third party so that they know that the data channel is still alive and has not been tampered with.
… The UX highly depends on the latency.

MarkF: It might be useful to use a plain TCP to monitor the time it takes to establish the data channel.
… I think the main advantage is that it provides a path towards guaranteeing that there is a direct connection path.
… For the scope of what we're taking up now, I'm still looking at the advantages of reusing QUIC.

Schien: If we're going to use QUIC as our major transport protocol, we might still take some part of the RTC stack, e.g. ICE for heterogeneous network support.

MarkF: I think we need to have a path to RTC data channel if we need to support heterogeneous networks.

Schien: I note that I heard that it is not going that well to integrate WebSockets on top of QUIC.

MarkF: It sounds like we may need to look at a hybrid approach where we use QUIC as the main transport protocol but leverage RTC data channel features so that we can move to a peer connection in more heterogeneous environments.

Resolved: Take a hybrid approach where we use QUIC as the primary transport protocol, while leveraging RTC data channel features to move to a peer connection in more heterogeneous environments later on.

Revisit Charter 2018

<anssik> Yesterday's Charter 2018 discussion

Anssi: We discussed the charter yesterday already.

Anssi: We heard yesterday that implementations will likely not be ready next year.

Francois: Good story around the open screen protocol.

Anssi: Yes, we satisfy conditions for transitioning the spec to Rec by working on the protocol.

[Discussion on length of extension. 1 year or 2 years]

MarkF: I would prefer a short extension of 1 year, because we'll know one year from now whether we'll succeed or not.

Anssi: The features for v2, I would argue that they still fit in the charter.

<anssik> Presentation API v2 features

[Plan is to define a testing API surface that would be a section in the Presentation API and that could be used to automate testing]

[Some extensions to WebDriver will be needed as well]

MarkF: I would prefer the testing API surface to be in a normative spec.
… So that developers can test their own applications.

[need to figure out the logistics of where to put the testing API. In the Presentation API Level 2 spec]

[Expected completion for Presentation API should be Q4 2019, so not done by end of next year]

Anssi: Then Mark made a proposal for remote window API, which would make for a concrete v2 feature.

[v2 feature seems to fall in scope of the WG]

[No need to change the language in the description of the Presentation API Level 2]

[Question on the expected completion for Remote Playback API. Q4 2018 should work, with question mark on implementations]

[Text on the test suite could be updated to mention the presentation testing API]

[Update liaisons section]

[Need to wrap up by the end of the month]

Action: Francois to put charter in a GitHub repo

[Some discussion on how and where to map the API onto the protocol]

[Also need to justify push of completion deadline for Presentation API]

Meeting next year

MarkF: I would propose to have teleconferences on a regular basis while there is progress on the protocol. And then early next year to see if a F2F during spring 2018 would be good.

Anssi: Nice 2 days, thanks for attending!

Summary of Action Items

  1. mfoltzgoogle to clarify that device requirements are driven by rendering Web content and not the network stack
  2. mfoltzgoogle to consult internally to find out if smart TVs fit within these requirements
  3. mfoltzgoogle to add hardware cryptography capabilities to device specs
  4. cpn to propose latency ranges for media synchronization use cases (lip sync vs frame sync)
  5. Louay to complete the list of implementations for SSDP, add e.g. libupnp
  6. mfoltzgoogle to find out the MX value used in Chrome
  7. mfoltzgoogle to review Internet-Draft for ecjpake
  8. mfoltzgoogle to propose adding identity assertion step to J-PAKE proposal.
  9. schien to contribute J-PAKE proposal to repo
  10. mfoltzgoogle to review J-PAKE proposal with similar features in Chrome (remote desktop pairing)
  11. schien to check the timeline of TLS 1.3 deployments from Mozilla's editors
  12. mfoltzgoogle to investigate whether Data Channel can be factored out from the WebRTC.org library
  13. Francois to put charter in a GitHub repo

Summary of Resolutions

  1. Take a hybrid approach where we use QUIC as the primary transport protocol, while leveraging RTC data channel features to move to a peer connection in more heterogeneous environments later on.
Minutes formatted by Bert Bos's scribe.perl version 2.37 (2017/11/06 19:13:35), a reimplementation of David Booth's scribe.perl. See CVS log.