Second Screen WG/CG F2F - Day 1/2

Meeting minutes

Agenda bashing

mfoltzgoogle: We'll start with introductions. We've been working on the spec for about 1.5 years.
… We've more or less met all requirements for a 1.0 draft, published some time ago.
… I'll start the day by looking at what we've done in the WG and CG in the past 4 years, and the implementation status.
… Then Peter will provide an overview of the draft 1.0 spec of the Open Screen Protocol, with changes since TPAC 2018.
… The first area we still have questions on is authentication. Two main directions. One is PAKE, another is question-answer challenge. We got some feedback internally from security folks that could change the direction we choose, so we want to discuss that today.

anssik: If that feedback from Chrome security could be public, that would help with documenting horizontal review for the spec.

mfoltzgoogle: Then a few more general issues. How do we use TLS 1.3 for instance? Mitigations for remote network attackers, etc. We'll touch on a few of those.
… The remaining topics for discussion are more details on messages exchanged by the two parties. How IDs are handled. A few others.

anssik: I'll have a question. Currently we map from protocol to API. Do we want to do two-way mapping so that the API spec mandates use of the protocol?

mfoltzgoogle: I will have to think about that. I think I would be fine adding implementation notes on how to use the Open Screen Protocol.
… I certainly think that the Open Screen Protocol should be mentioned in the API spec. Whether it's normative or not, I don't know.

anssik: Yes, it's an implementation choice.

mfoltzgoogle: At the conclusion of day 1, the goal is to have directions on ways to resolve most issues against the 1.0 spec, so that we implement the resolutions, e.g. by TPAC.

[Group runs a quick round of introductions]

anssik: Small but very effective group :)
… Any immediate question on agenda for day 1?

Overview of Group Work

Open Screen Protocol slides (3-17)

mfoltzgoogle: A bit of history. Work started before I joined with a breakout session at TPAC. The Presentation API was the first idea that came out of these discussions, incubated in the Second Screen CG for a year or so.
… Then we discussed transition to a Working Group, adding Remote Playback API. But we left implementations up to browser vendors.
… We knew we'd have to focus on interoperability at some point, which is the reason why we kicked off work on the Open Screen Protocol.
… Not much work in the WG in the meantime, with most of the work happening in the CG to prepare the Open Screen Protocol.
… In April of this year, we assembled the different bits that compose the Open Screen Protocol and released v1.0.
… The Presentation API allows a Web app to ask the user to present a different page on a second device.
… What the browser needs to do is, given a URL, figure out which devices can render it, ask the user to select a device, and establish the connection to run the presentation.
… A controlling page can close the presentation, or close the connection without closing the presentation and reconnect later.
… One example is a slideshow of images. It shows the different features of the API.
… In Chrome, this works with connected displays and Chromecast devices (not sure about cloud displays)
… The other specification is the Remote Playback API. As opposed to presenting another page, it focuses on remoting an audio or video element.
… The user or the page can ask the browser to look for compatible displays to remote playback of media.
… What is the CG really focused on? For these two features, we're addressing interoperability through protocol incubation.
… We wanted to make sure that we could handle the case where the controller and receiver are on the same local area network (LAN). We also decided to focus on the 2-UA mode.
… For the remote playback, we focused on the case where the media has a "url" attribute.
… In other words, we chose not to focus on streaming to start with. That said, a bit of work has been done in that area too.
… About functional requirements, we wanted to make sure that we could cover all the parts in the APIs that are left as "implementation details".
… [Mark reviews functional requirements slide]
… We also considered non-functional aspects. Make sure UX can be good, make sure we preserve privacy and security. Make sure that implementations can be efficient in terms of memory and battery since low-end devices may be used as receivers.
… We realized that there are lots of different scenarios that people are looking into (e.g. VR, cloud gaming), so we wanted to make sure that things can be as extendable as possible.
… We brainstormed about the different layers of the protocol stack. Before endpoints can communicate, they need to discover each other, authenticate each other, agree on a transport mechanism that they can use to exchange messages.
… For discovery, we chose mDNS / DNS-SD. That gives you an IP and port that can be used to establish a connection.
… For authentication, we're going to use TLS 1.3 with mutual authentication.
… The transport layer will be based on QUIC.
… To actually provide the message syntax, we looked at different possibilities, and decided to go with CBOR.

anssik: A lot of this work relies on work done in IETF.
… Question on where the work should happen. Overlap between people at the API level so work happening in CG for now.

mfoltzgoogle: We analyzed requirements, possible technical solutions. We took some time defining authentication approaches: challenge/response using HKDF, and J-PAKE.
… The v1 draft needs more feedback but is complete in terms of addressing requirements.
… We've worked on an Open Screen Protocol Library that implements part of the 1.0 spec.
… Hope is that the library will help drive adoption, as happened in WebRTC.
… We've tried hard to minimize dependencies, so that others can adopt the library easily.
… It's complicated, we'll get to that, but goal is to make it work both inside Chromium and outside Chromium.
… We're still debating authentication mechanisms, some ID issues, some message details, and then capabilities and extensions.
… Above and beyond the spec itself, a few items that we may want to complete. We never fully defined what the requirements are for the Remote Playback API. The TAG asked for an explainer, which we started to draft.

anssik: We're doing that a bit backwards. Most groups come with an explainer first and then work on a spec. We have the spec already, and now writing an explainer.

mfoltzgoogle: Another document should talk about pros and cons of specifying custom schemes for use in the Presentation API. Many use cases rely on non HTTP schemes, such as Cast, DIAL, hbbtv. It seems important to reflect on how to do that properly.
… Also additional security analysis, and a few items that we'll dig into on Day 2.

anssik: Regarding horizontal reviews, the TAG has indicated interest to review CG work.
… I don't know about others.

francois: Good question, some may want to look into it. Depends what we want to do next, we can ask the different horizontal groups, e.g., security, accessibility for remote playback API around synchronization.

anssik: this is unusual, the W3C process doesn't recognise CG work
… let's start with TAG review, then we can ask the other groups

anssik: No clear guidance for CG work, that's the conclusion for now.

mfoltzgoogle: I know some groups may be interested in looking into some aspects of it. For instance, the Media & Entertainment IG has been looking at synchronization aspects.
… I would rather have them look at specific pieces of the protocol, so as not to write too many explainers.

anssik: OK, let's work on this tomorrow.

mfoltzgoogle: OK. As mentioned earlier, a parallel project is to develop the Open Screen Protocol library.
… That library is slowly converging to what the spec defines.
… Full mDNS support, full QUIC support. We have a platform abstraction layer that allows porting the code to different platforms.
… We have CBOR support via code generation based on CDDL to create parsers. Saves a lot of programming.
… We recently landed all the messages needed to support the Presentation API.
… All of these features have been demonstrated in examples in C++ and Go. Peter will give us the data.
… The things we have not yet finished: we haven't finished doing authentication, in part because we're still discussing them. Also Remote Playback. We're also planning to integrate the library in Chromium for the controlling part.
… We have been doing some exploration on media streaming. We may or may not want to add that into the scope of the group. We had a few discussions on doing LAN traversal using ICE. Not really in our CG scope either.

anssik: What's the driving use case for LAN traversal? From mobile to screen?

mfoltzgoogle: Main driver is education use cases.
… Or when you're at your friend's house.
… No direct network connection.
… We also did as a group other investigations to discover devices. mDNS has a few hiccups. You may not be able to use multicast at all, e.g. for LAN traversal. Possibilities include Bluetooth, NFC, QR codes, etc.
… Also Peter has been looking at implementations based on lower primitives such as WebTransport/WebCodecs.
… There may be feature requests for a V2 as well.

anssik: Have people in the room thought about requirements for a V2? I'm thinking about HbbTV for instance. It would be good to document these requirements somewhere, just to make sure we have that data.
… A good topic for day 2.

Brief Overview of Draft 1.0 Spec

Open Screen Protocol slides (18-26)

Peter: There's the bucket of things we agreed on and that have been done, some remaining to do, and then things we have not yet agreed on.
… [going through lists on slides]
… About CDDL, we need to indicate type key for a given message, although I'm not sure we agreed on that actually, probably not in the right list
… We agreed on messages for both specs, with Remote Playback messages being extensive, now done.
… For streaming, we agreed on the concept of an audio- and video-frame. That's now in there.
… Some things we did but still need agreement on.
… First is thus the need to indicate a type key for a message.

PROPOSED: Keep comment in CDDL to indicate type key for a given message

anssik: Seems the most straightforward way

Resolved: Keep comment in CDDL to indicate type key for a given message

Peter: mDNS has a limit for the size of the display name.
… It's allowed to have a longer display name but what comes across mDNS is going to be truncated.
… We want to make sure that the agent must compare the truncated one with the full one and make sure it's a prefix.

anssik: Would the user see the truncated name based on the mDNS before seeing the full name?

Peter: It's possible that the browser would display the names before that, but the check ensures that the prefix does not change. When you truncate, you put as many characters as possible. The other endpoint can tell the string was truncated thanks to the last character.
… 64 characters, I think
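The prefix check Peter describes can be sketched as follows. This is only an illustration: the function names are hypothetical, the limit of 64 is the unverified figure mentioned in the discussion, it is assumed here to count UTF-8 bytes, and the "last character" truncation marker is not modelled.

```python
def truncate_name(full_name: str, limit: int = 64) -> str:
    """Truncate to at most `limit` UTF-8 bytes without splitting a character.

    The limit and its unit (bytes) are assumptions for this sketch.
    """
    encoded = full_name.encode("utf-8")
    if len(encoded) <= limit:
        return full_name
    # errors="ignore" drops a trailing partial multi-byte sequence, if any.
    return encoded[:limit].decode("utf-8", errors="ignore")


def is_consistent(advertised_name: str, full_name: str) -> bool:
    """Return True if the name seen over mDNS is a prefix of the full name."""
    return full_name.startswith(advertised_name)
```

An agent following this sketch would reject (or at least flag) a receiver whose full name, learned later, does not start with the name it advertised during discovery.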

Eric: It would be a mistake for user agents to display to users anything other than a full name.
… There should be advice to only display the fullname
… With Unicode, Emojis, it's easy to fill 64 characters.

anssik: Just make sure that implementers are aware of the issue.

Eric: make it a should not display truncated strings.

mfoltzgoogle: The other item is that we may also mention the name collision protocol in the mDNS spec. If distinct endpoints advertise the same name, the mechanism allows them to figure that out and resolve the collision.

anssik: Also how to distinguish strings that look similar but use different (similar-looking) characters
… That's a UI guideline. It may fit more in the API spec

Peter: I believe we could put that in the protocol spec so that agents which are not browsers are also aware of that.

PROPOSED: Add recommendation at the SHOULD level for agents to display fullname (as opposed to display truncated display names)

scottlow: Should the spec mandate the receiver to show its own name?

Peter: But you wouldn't want all displays to suddenly show their name.

mfoltzgoogle: For first authentication, yes, it would be reasonable for receiver to show its name.
… Chromecast dongles show their name when they are on and no application is launched. It may be tricky to require receivers to do so though.

anssik: Maybe in the future, you have meeting rooms where you have screens everywhere. Hard to tell how to do it today though.

mfoltzgoogle: Separate from authentication, some way to disambiguate receivers that have similar names would be useful.

Peter: In the future, we may look into other types of discovery and authentication mechanisms.

anssik: This seems like a good v2 feature. "Find my second screen!". Scott, if you want to raise it on GitHub, that would be good.

Resolved: Add recommendation at the SHOULD level for agents to display fullname (as opposed to display truncated display names)

<scottlow> V2 issue raised on GitHub around "pinging" a receiver

Peter: Along with the name being shown, we added a mechanism with two flags inside the agent info to tell whether the agent can do audio/video or audio only.
… Currently, the flags are just bits. We may expand on them to show which protocols and formats you speak.

Eric: I just want to make sure we bake it in.

Peter: After authentication, you have access to much more information. This is before authentication.

anssik: This would only be used for the UI?

Peter: Not only, you can use this to know that the device is an audio device if you're going to do streaming.

<mfoltzgoogle> Pull request for additional capabilities in agent-info

Peter: For any of the protocol, you'll want to use different icons.

mfoltzgoogle: It's been discussed but we don't expose device capabilities right now. We don't allow filtering devices out based on capabilities for now.

Peter: It's not just for Remote Playback. There are different use cases where you want to use different icons.

Peter: Regardless of the agreement on booleans or anything, some agreement on agents having a way to describe audio/video capabilities

Resolved: Agreement to use receives-audio/receives-video capabilities as specified

Peter: Moving on to length prefix

mfoltzgoogle: CBOR messages all have an inherent length. The length prefix would simply have allowed the client to know how many bytes it needs before it sends that for parsing. But that's not needed.

Peter: The parse tells you whether it's done.
… The question of what we do for forward compatibility when we change structure of the fields, that's something that we should create an issue about.

PROPOSED: Don't do length prefix for CBOR messages

Resolved: Don't do length prefix for CBOR messages
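To illustrate why a length prefix is redundant, here is a minimal sketch of how a reader can find the end of a CBOR data item from its own encoding. It assumes definite-length items only (indefinite-length and reserved encodings are rejected), and the names `NeedMoreData` and `item_end` are hypothetical, not taken from the spec or the library.

```python
class NeedMoreData(Exception):
    """Raised when the buffer ends before the item is complete."""


def _read_head(buf: bytes, pos: int):
    """Read the initial byte and its argument; return (major, argument, next_pos)."""
    if pos >= len(buf):
        raise NeedMoreData
    major, info = buf[pos] >> 5, buf[pos] & 0x1F
    if info < 24:
        return major, info, pos + 1
    if info > 27:
        raise ValueError("indefinite-length or reserved encoding not handled in this sketch")
    size = 1 << (info - 24)  # 1, 2, 4 or 8 argument bytes
    end = pos + 1 + size
    if end > len(buf):
        raise NeedMoreData
    return major, int.from_bytes(buf[pos + 1:end], "big"), end


def item_end(buf: bytes, pos: int = 0) -> int:
    """Return the offset just past the CBOR item that starts at `pos`."""
    major, arg, pos = _read_head(buf, pos)
    if major in (2, 3):  # byte string / text string: `arg` payload bytes follow
        if pos + arg > len(buf):
            raise NeedMoreData
        return pos + arg
    if major == 4:  # array of `arg` items
        for _ in range(arg):
            pos = item_end(buf, pos)
        return pos
    if major == 5:  # map of `arg` key/value pairs
        for _ in range(2 * arg):
            pos = item_end(buf, pos)
        return pos
    if major == 6:  # tag wrapping exactly one item
        return item_end(buf, pos)
    return pos  # integers, simple values and floats end with the head
```

A receiver can keep buffering while `NeedMoreData` is raised and hand the bytes to the real parser once `item_end` succeeds, or simply let the parser itself report completion as Peter notes.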

Peter: Separately, we should create an issue for when you can add/remove fields and how you can do that (forward-compatibility)
… Now, for the type key prefix, I chose a QUIC uvarint (2 bits for size). I looked into CBOR tags. I looked into 1-bit varints. The only downside is that the number of values you can get is 64. You may get 128 with other mechanisms.

mfoltzgoogle: Better than CBOR where you have only 24.

Eric: How many do you have now?

Peter: Around 50.
… I divided them into the ones that should be small and ones that don't matter. The total space is enormous.

Resolved: Use a QUIC uvarint (2 bits for size) for type key prefix
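For reference, the QUIC variable-length integer encoding uses the two most significant bits of the first byte to signal a total length of 1, 2, 4 or 8 bytes. The sketch below encodes and decodes such integers; the function names are illustrative and not taken from the Open Screen Protocol library.

```python
from typing import Tuple


def encode_varint(value: int) -> bytes:
    """Encode `value` in the shortest QUIC variable-length form."""
    if value < 0:
        raise ValueError("value must be non-negative")
    for prefix, size in enumerate((1, 2, 4, 8)):
        usable_bits = 8 * size - 2
        if value < (1 << usable_bits):
            return (value | (prefix << usable_bits)).to_bytes(size, "big")
    raise ValueError("value does not fit in 62 bits")


def decode_varint(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one varint at `pos`; return (value, next_pos)."""
    size = 1 << (buf[pos] >> 6)  # top two bits select 1, 2, 4 or 8 bytes
    if pos + size > len(buf):
        raise ValueError("truncated varint")
    raw = int.from_bytes(buf[pos:pos + size], "big")
    return raw & ((1 << (8 * size - 2)) - 1), pos + size
```

With this encoding, type keys 0 to 63 fit in a single byte, which is the limit of 64 small values mentioned in the discussion; larger keys simply take 2, 4 or 8 bytes.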

Peter: The next two go together. Related to the Remote Playback API. Plenty of mechanisms, notably around texttracks.
… You can change text tracks or change existing ones by adding a cue. You can also change the mode of the text track.
… At TPAC, we had not figured out a way to do all of this.
… The only limitation is that, when you add a cue, there are many things that you may want to do related to positioning, etc. Not addressed here.
… At least, it lays the foundations to be able to add/remove cues and manage text tracks.
… Get placement, positioning is v2, but the ability to manipulate text tracks and cues is v1.

Eric: Right, this seems sufficient for WebVTT support.

[Discussion on generic cues]

Chris: We want to be able to support TTML and IMSC cues, which we can't do with Remote Playback API as it stands

Eric: Generic Cue, discussed at FOMS is the planned solution there

Chris: We are also proposing DataCue, in early stage incubation, could be a V2 feature to discuss

Resolved: changed-text-tracks on remote playback controls allows for adding and removing cues instead of separate method

Resolved: added-text-tracks allows for adding text tracks

Peter: Now two slides that should go faster. First things that probably don't need agreement.
… First thing is that we used "agent" for an implementation of the Open Screen Protocol. The next is that we borrowed "controller" and "receiver" from API specs
… In the context of streaming, we used "sender" and "receiver".

francois: I note that controller isn't the exact same term as used in the Presentation API, but that's explained in the spec
… Reading the current spec, I'm not clear as to what an agent needs to implement
… In the Presentation API, an agent may want to take on different roles, partial implementations, etc
… Would it be better to precisely define what an implementation needs to support?

Peter: We could say what all agents must implement. For example, it's not required to implement both Remote Playback and Presentation API, or either - you could just do streaming

francois: What do I need to test for, when testing an implementation, i.e., normative statements? It's not entirely clear to me

anssik: What is the best practice for test suites for protocol work at IETF?

Peter: Typically there are interoperability events, where implementers come together

mfoltzgoogle: I think the spec should map which parts should be implemented by different conformance classes
... This would help clarify what conformance to the spec means

Peter: Good idea to map conformance classes to capabilities

Action: Mark Foltz to document which parts of the specs apply to which conformance class from an API perspective

Peter: On to the hash used for mDNS fingerprint. sha-512 or sha-256, but not md5 or md2. We just want to make sure we're not using old insecure ones.
… The mDNS timestamp, we ended up renaming it "mv" instead of "ts".
… Maybe we can save the bit on authentication for later, but if we go with no JPAKE, there are certain parameters that you need to pick for HKDF.
… Maybe we don't need a resolution here, I just wanted people to be aware that these little decisions were made.
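The rule about fingerprint hashes can be expressed as a small allowlist. This is a hedged sketch: the algorithm labels, the hex output and the idea of hashing raw certificate bytes are assumptions made for illustration, not the format defined by the spec.

```python
import hashlib

# Only the hashes named in the discussion are accepted; md5, md2 and any
# other algorithm are rejected. The string labels are hypothetical.
ALLOWED_HASHES = {"sha-256": "sha256", "sha-512": "sha512"}


def fingerprint(certificate_bytes: bytes, algorithm: str) -> str:
    """Return the hex digest of `certificate_bytes`, or raise for disallowed hashes."""
    try:
        hashlib_name = ALLOWED_HASHES[algorithm]
    except KeyError:
        raise ValueError(f"hash algorithm not allowed: {algorithm}") from None
    return hashlib.new(hashlib_name, certificate_bytes).hexdigest()
```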

Nigel: Only comment is that, over time, parameters may need to change.

Peter: Some of these are on the wire things that the agents can negotiate.
… The tuning of the Presentation ID was initially done by the controller. But we realized that it's easier if the receiver chooses the ID to make scoping easier. It didn't really have any impact on the protocol.
… Next point is that originally we had HTTP headers as a big blob of HTTP/1.1. Doesn't make sense, so replaced with key/value pairs.
… The protocol was designed to be completely independent of the APIs so that you could implement it between two non-browsers. There's now a separate section for the mapping between the APIs and the protocol.
… In the Remote Playback API, when you want to determine which audio track is enabled/disabled, now using a set of IDs and not booleans.
… For streaming, I showed a payload of "any", but "bytes" is more logical.
… "frame sequence number" instead of "frame ID"
… Some streaming capabilities have been added, e.g. color profiles, native resolutions, minimal video bit rate, max audio channels, etc.

[Some discussion on color profiles, Media Capabilities and reference to CSS spec]

<anssik> Display capabilities in Media Capabilities

Peter: So we should look at the Media Capabilities which then references CSS.

Eric: Exactly.

Peter: Moving on to things that have not been done.
… First one is PAKE or not.
… Another one is the possibility to extend capabilities. Links to #123 with a PR #171 that could perhaps help address it.

mfoltzgoogle: What's in v1 is the framework for extension, but touch screen example is v2.

Peter: Extensions could use numbers to express support for capabilities.

Eric: Information is only between sender and receiver?

Peter: Yes, right now, there's no API for it. However, there's one place where I thought this could be useful.
… E.g. prompt and only show things in the list that have this set of capabilities.

Eric: How does an application know whether to prompt? It seems that exposing this information to applications increases fingerprinting surface
… Could use it to detect whether a user is at work or at home, for example

Peter: This example (slide 170) shows how API capabilities map to the protocol

Peter: Next is about 0-length connection ID.

PROPOSED: PR #171 is good for landing

PROPOSED: PR #171 is good for landing to address issue 123

Resolved: PR #171 is good for landing to address issue 123

Peter: The QUIC protocol changed recently with regard to connection IDs. Now, the short header does include one of the connection IDs all the time, which means that you have to decide whether you're going to use 0-length connection IDs. The big limitation here is that you cannot have connection migration.
… If I change my IP and port, and if I don't send that connection ID, you're not going to know that it's me.
… However, 0-length connection IDs allow reducing the size of packets.
… Connections are pretty ephemeral anyway, so not a big deal.
… I think we should say that we're going to use 0-length connection ID. That's issue #169.
… I put it at the SHOULD level in PR #170. Maybe between client and server, you might want to use real connection IDs.

Eric: What percentage of the message would be used by the connection ID?

Peter: It depends on the length of the number you're going to use. It could be two bytes. In a lot of cases, this may not be a big deal.
… If we were then to do QUIC with ICE, then you would never have a connection ID.

mfoltzgoogle: The other use for this is if you have proxies in place.

Peter: Yes, in that context, it might be useful to have connection IDs, which is why I stuck to the SHOULD level.

PROPOSED: PR #170 is good for landing to address issue #169

Resolved: PR #170 is good for landing to address issue #169

Peter: One of the decisions when writing the spec was to pick type keys. We picked some. One thing we could do is to talk about it.
… A possible V2 thing is remote decoding. Instead of giving a URL, you would stream media over the wire. I wrote a pull request that describes how to stream media over the wire.
… It's possible, but we need to decide whether that's v1 or v2.

Chris: This also affects Media Capabilities API, as you'd want to use the capabilities of the remote device rather than the controller, to decide which segment representation to request

anssik: Isn't that like 1-UA mode?

mfoltzgoogle: The basic capability needed for media remoting or 1-UA is streaming.

anssik: We heard from Eric that he would like to see this earlier than later

mfoltzgoogle: It doesn't delay our current roadmap for the spec, like wide review or the like.
… It expands the scope of the spec.

anssik: The API doesn't distinguish between 1-UA and 2-UA.

mfoltzgoogle: Yes, the API was designed to be agnostic as much as possible to the underlying mode.
… Some things are easier in a given mode.

anssik: So no change to the API. So it's only about expanding the scope of the protocol.

mfoltzgoogle: Yes. We'd have to look further into things such as color support, codec capabilities, etc.

anssik: Streaming would help get more implementations.
… I don't know if we can take a decision right now.
… We don't want to do work that is not useful.

mfoltzgoogle: We're not going to be able to support all of the features.

Eric: it's a very important feature.

anssik: OK, let's come back to this issue tomorrow and see if we can take a decision.

Eric: One question about <source> is what happens if application changes it after remoting. Does it get re-evaluated?
… Are there issues with syncing up again?

Peter: From a protocol perspective, question is whether the information gets sent to the receiver.
… so that it can make its own decision.

Eric: Yes. Depending on the capabilities of the decoder on the receiving side, you may get a different choice.

Peter: so question is whether the selection is done by the controller or the receiver.
… If I were to make a PR, I would need to have a complete set of all the data that needs to be passed over.

mfoltzgoogle: Yes, I don't think it's feasible to have the controller evaluate those on the receiver's behalf. It should be up to the receiver to choose the source.
… The Remote Playback API does not say anything about that. It just talks about synchronizing state.

[Lunch break]

Action: Mark to review open issues and tag as v1-spec as needed

Algorithm for what messages to send when local/remote media element changes (#158)

Peter: How precisely do we need to describe how a user agent should send remote playback control messages, when the state changes on the controlling media element?

Eric: For UAs to have the same behaviour, it would need to be verbose, as per the HTML spec

Eric: More will be needed, as HTML only talks about this for local playing content
... e.g., the stalled event when no data is received. The receiver side would have to have the same thing
... You would have to send a message back, and make those associations explicit

Mark: I'm proposing to refer to parts of HTML
... There probably aren't that many, things that change without input from the user
... 'progress', 'stalled' events - a handful of events where you need to send state back

Peter: Using 'stalled' as an example, we have a bit in the message for that, but it doesn't currently say *when* you should send the message

Eric: Would be enough to say the message needs to be sent when HTML wants to generate the event

Peter: What if the HTML spec changes?

Eric: It's unlikely, it's been stable for a couple of years
... There's a difference between state changes caused by sender input, and changes that happen at the receiver
... Only a few things can change without input from the controller

Peter: Can the receiver UA allow the user to make changes?

Eric: Yes

Peter: Can we say, when the state looks different from the JavaScript perspective, a message should be sent?

Eric: Yes

Eric: We should talk about 'muted' and 'volume'. Should it be possible to control the audio characteristics of the receiver from the controller?
... Does the volume attribute represent the volume at the controller or receiver, and do they have to be in sync?

Peter: We do have 'volume' and 'muted', which syncs the two media elements, equivalent to executing from JavaScript

Eric: With AirPlay, changing the volume on the controller doesn't affect the receiver. You use the remote control to change volume on the device

Peter: So as written now, it would change the volume, not the hardware volume but the attenuation

Anssi: Is this an area where we need flexibility for implementers?

Mark: Yes. From our experience, having a protocol to support the hardware volume is important, so the protocol needs to distinguish those cases
... Should add control of the hardware volume as a separate protocol feature

Peter: I propose: when fastSeek(), play(), or pause() are called, send a message; or if the attributes observably change, send a message

Eric: It would be logical to update it on progress events, should be every 350ms

Peter: What things are on demand? There's currentTime

Eric: That might be all

Action: Peter to raise a pull request for issue 158
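One way to read the proposal "if the attributes observably change, send a message" is as a diff between two snapshots of the media element state. The sketch below is purely illustrative: the attribute subset and the function name are assumptions, and the actual algorithm is left to the pull request for issue 158.

```python
# Hypothetical subset of media element attributes visible to JavaScript.
OBSERVED_ATTRIBUTES = ("paused", "muted", "volume", "playbackRate")


def changed_attributes(before: dict, after: dict) -> dict:
    """Return the observed attributes whose value differs between two snapshots."""
    return {
        name: after.get(name)
        for name in OBSERVED_ATTRIBUTES
        if before.get(name) != after.get(name)
    }


def should_send_update(before: dict, after: dict) -> bool:
    """A state message is only needed when something observable changed."""
    return bool(changed_attributes(before, after))
```

In this reading, currentTime is deliberately left out of the snapshot, since the discussion treats it as something to report on demand or on progress events rather than on every change.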

Peter: Other remaining issues not done relate to codec names and mime types

Peter: Three options: put the whole mime type, use the mime type without the prefix

Eric: using without the prefix won't work, there are lots of audio types

Peter: For well known types, we'd want a known string there

Eric: Also info on profile level?
... If we do that, we could use extended mime type

Peter: How complex would that get?

Eric: It's different for every codec, each codec defines its own

Mark: To support media remoting on top of streaming, we'd want to send the same fidelity of data about the media stream as is available through HTML
... Are extended mime types the most accurate way to describe them?

Peter: Could put the codec string from the mime type?

Eric: Yes, it's the best we have
... rfc6381

Mark: Are there RFCs for the syntax for describing different codecs?
... Need consistency for how to interpret the strings

Peter: Last one is about capabilities for HDR

Eric: It's in scope for the new Media WG

Mark: There's whether the media engine understands the metadata, and whether the display is capable of showing it

Eric: Don't try to solve it here, wait for it to be solved for HTML media, then use that here

Anssi: This concludes the overview session

Authentication

Open Screen Protocol slides (27-40)

Peter: At TPAC, we decided to investigate challenge/response, simpler than J-PAKE
... We specced it, cleared it with security people, then realised there's an issue
... The security folks recommended using scrypt to harden the PIN number
... This requires specifying how "memory hard" it should be
... Could be 32MB RAM
... We asked if we could get away with less, but awaiting reply on that
... So we made a PR for J-PAKE as a back-up plan. Talking to the security folks, not clear which PAKE to use (SPAKE, J-PAKE, OPAQUE)
... SPAKE2 nice, has implementations
... J-PAKE has an RFC that's done. Security folks don't have clear advice
... Also not clear how "memory hard" it is
... If challenge/response doesn't need a lot of memory, we'd use that
... If PAKE doesn't need a lot, we could use that
... otherwise, not sure what we'd do!
... We didn't find any C++ implementations of J-PAKE, so we'd have to implement ourselves
... SPAKE2 has an implementation, but the spec is draft
... We're hoping to move ahead with challenge/response
... Want input from security reviewers
... Someone from IETF suggested asking their review, but requires writing an IETF draft

Mark: Did the relationship between memory and PIN entropy come up?

Peter: Yes, there is a relationship. If the PIN is very large, we wouldn't need the scrypt stage at all

Eric: We wouldn't want more than about 8
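The trade-off between scrypt memory cost and PIN entropy can be made concrete with a little arithmetic. In this sketch, scrypt's dominant memory term is taken as roughly 128 × N × r bytes; the parameter values, the helper names and the use of Python's `hashlib.scrypt` are assumptions for illustration, not choices made by the group.

```python
import hashlib
import math


def scrypt_memory_bytes(n: int, r: int) -> int:
    """Approximate memory needed by scrypt (dominant term only)."""
    return 128 * n * r


def pin_search_space(digits: int) -> int:
    """Number of possible numeric PINs of the given length."""
    return 10 ** digits


def pin_entropy_bits(digits: int) -> float:
    """Entropy of a uniformly random numeric PIN, in bits."""
    return math.log2(pin_search_space(digits))


def harden_pin(pin: str, salt: bytes, n: int = 2 ** 10) -> bytes:
    """Stretch a PIN with scrypt. The small default N keeps this demo cheap (about 1 MiB)."""
    return hashlib.scrypt(pin.encode("utf-8"), salt=salt, n=n, r=8, p=1, dklen=32)
```

For example, N = 2^15 with r = 8 is one parameter choice that lands on the 32 MB figure mentioned above, while an 8-digit random PIN offers about 26.6 bits of entropy, which is why the per-guess cost matters so much for an offline attacker.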

Nigel: I'm not sure these things are orthogonal. scrypt is useful for mapping a small dictionary of passwords into something of larger entropy
... making it invisible to a man in the middle
... How to stop someone precomputing the mapping between all common passwords and the handshake?
... The latest draft of SPAKE introduces scrypt. They don't have a robust way to protect against simple passwords and rainbow tables
... Using a salt string, there's a race to produce an acceptable challenge/response as a legitimate describes
... scrypt slows down the attack
... I think you still have the problem of an attacker knowing you use JPAKE and a limited number of passwords
... The latest draft of SPAKE has memory hard functions. A PAKE algorithm gives you a shared secret
... You may end up with the same weaknesses, needing large passwords, or use scrypt. There's a tradeoff

Peter: It wasn't clear to me that the PAKE was more secure. Does it need a memory-hard function?
... SPAKE2 spec says the MHF is out of scope of the document
... The parameter selection is out of scope

Nigel: So scope it to the most constrained device, but then your attacker has much more compute resource than you do

Mark: The memory requirement and search space determines what the attacker needs to do

Mark: Regarding common passwords, there are solutions that don't allow the user to set the password, but generate them from a strong RNG
… That also changes the parameters of the attacker's solution, to get real entropy

Nigel: I agree, I think that's the right thing to do

Peter: The part of the spec is the cost of the scrypt, if we can get the number down to 10, would be good

Peter: Moving on to PINs. Need to resolve who shows the PIN and who enters the PIN
… Both sides could include an auth-capabilities message that includes a value indicating PSK ease of input
… 100 is super easy, 0 is impossible to enter
… Whichever side is easier gets to do the entry
… In a tie, the server presents, and the client inputs
… For example, a phone has easy numeric input, and the TV says it's possible, but harder, so here the TV shows the number or QR code and the phone enters
… Or, a phone and a speaker, or a TV and speaker
… It requires two fields: the ease of input and the type to use (numeric, alphanumeric, QR code)
… We discovered the need for this when we tried to implement it
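The decision rule Peter describes can be sketched as follows. The `AuthCapabilities` class and its field names are hypothetical stand-ins for the proposed auth-capabilities message, and the 0 to 100 scale follows the description above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AuthCapabilities:
    ease_of_input: int  # 0 = impossible to enter, 100 = very easy
    input_type: str     # e.g. "numeric", "alphanumeric" or "qr-code"


def choose_roles(client: AuthCapabilities, server: AuthCapabilities) -> Tuple[str, str]:
    """Return (presenter, inputter): the side with easier input does the entry.

    In a tie the server presents and the client inputs.
    """
    for caps in (client, server):
        if not 0 <= caps.ease_of_input <= 100:
            raise ValueError("ease_of_input must be between 0 and 100")
    if server.ease_of_input > client.ease_of_input:
        return ("client", "server")
    return ("server", "client")
```

For the phone and TV example, a phone client with ease 90 and a TV server with ease 30 yields `("server", "client")`: the TV shows the code and the phone enters it.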

Mark: This is sent pre-authentication. Are there any downgrade attacks possible?
… It doesn't change the cryptographic parameters of the challenge?

Peter: That's right

Peter: With Asian languages, alphanumeric is more difficult so they tend to stick to numeric codes
… The QR code would have to encode the same PIN, as you'd display both together

Anssi: In the speaker / TV case, could it be done without user interaction?

Nigel: There's also a human aspect to this, as people want it to be straightforward, whereas the protocol demands high entropy
… It's hard to see how choosing either one for input would go wrong

Resolved: Agreed to use the auth-capabilities message to decide where to input the pairing code

Chris: We did a study into the human aspects: https://dl.acm.org/citation.cfm?id=2858377

Mark: We may want to get accessibility review on this

<anssik> [discussing issue #111]

<anssik> "The challenger must limit the time the responder has to send a response to 60 seconds (to avoid the possibility of brute-force attacks.)" https://‌webscreens.github.io/‌openscreenprotocol/#authentication

Peter: We propose having a field to indicate the minimum bits of entropy, from 20 to 60, and the default is 20

Mark: How would an agent choose the entropy? Hardware limitations

Peter: It's a balance between difficulty of input and security

Mark: This allows us to change the requirement over time, so we may want to increase the number of bits
… Seems like a good thing to negotiate
… Typically, the controller will be the one setting the minimum

Peter: An attempt to downgrade would result in auth failure

Nigel: So it's an insistence

Mark: So the spec should include an analysis of the downgrade

Resolved: Move ahead with psk-min-bits-of-entropy, with a range of 20 to 60 to resolve #111
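As a rough illustration of what a 20 to 60 bit range means for the user, the sketch below converts a bit requirement into a PIN length and generates a PIN from a strong RNG. Taking the larger of the two agents' minimums is an assumption about how the negotiation could work, and all function names are hypothetical.

```python
import secrets

MIN_BITS, MAX_BITS = 20, 60  # range agreed for psk-min-bits-of-entropy


def effective_min_bits(local_min: int, remote_min: int) -> int:
    """Assumed rule: the stricter (larger) of the two minimums wins."""
    for bits in (local_min, remote_min):
        if not MIN_BITS <= bits <= MAX_BITS:
            raise ValueError("minimum must be between 20 and 60 bits")
    return max(local_min, remote_min)


def symbols_needed(bits: int, alphabet_size: int) -> int:
    """Smallest length n such that alphabet_size ** n >= 2 ** bits."""
    length, space = 0, 1
    while space < 2 ** bits:
        space *= alphabet_size
        length += 1
    return length


def generate_numeric_pin(bits: int) -> str:
    """Generate a uniformly random numeric PIN with at least `bits` of entropy."""
    digits = symbols_needed(bits, 10)
    return "".join(secrets.choice("0123456789") for _ in range(digits))
```

Under these assumptions the default of 20 bits needs a 7-digit numeric PIN, while 60 bits needs 19 digits, which shows the balance between difficulty of input and security that Peter mentions.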

<anssik> [discussing issue #135]

Peter: For issue 135, the kinds of certificates. Should we support EC certs or RSA?
… We should require acceptance of EC certificates, but can use RSA if you want

Nigel: TLS 1.3 doesn't do RSA encryption, so this goes with the flow of the community

Peter: What certificate extensions should we use? Mark suggested extensions should be ignored

Mark: In the future, we may want to specify attributes of the certificates that are used, but not ready to spec what those are now
… Implementations today should not look at extensions, unless added to the spec
… TLS 1.3 extensions don't seem necessary in our application, but I'd like to see how implementations make use of extensions to decide what the spec should say about that

Nigel: This is a wise approach. Until you have a use case, you don't know you need an extension

Resolved: For issue 135, require acceptance of EC certs, ignore cert extensions, and no requirement for TLS1.3 extensions

<anssik> [discussing issue #118]

Peter: Issue 118 relates to the UI for trusted and untrusted data
… Should we say anything in the spec about what the text that displays the PIN is like?
… Do I show whether you're authenticated or not, or show previous failed auth attempts?
… Do we need to show the auth state of the name or icon before authn?

Anssi: Generally, specs don't go into the UX, each platform has its own guidelines

Eric: We'd want to do some experimentation, as we don't have much experience with this

Chris: Non-normative text to illustrate the flow?

Peter: We may want to say that agents should make it clear which other agents are authenticated or not, for example.

Mark: We may be able to write some general principles
… The spec talks about flagging devices that may be trying to impersonate other devices
… Could go in the main spec or a separate paper. I agree we shouldn't require specific UI in the spec

Resolved: The spec should not require specific UX but may want to give guidance on particular aspects, e.g., showing whether an agent is authenticated or not

Security and Privacy

Open Screen Protocol slides (41-55)

Mark: Some other open security and privacy issues, not covered by the auth section
… We talk about TLS1.3 being important for the security architecture for OSP
… Avoid issues with past TLS implementations
… A few key issues that any application should be aware of, some potential attack vectors
… Attempts to change how the handshake is done, change modes of use, downgrades, ciphers have tradeoffs
… attacks based on timing and length of payloads
… Content of keys and secrets
… If the key is compromised, does that compromise future sessions: forward secrecy
… 0-RTT, early data, which can potentially improve performance, but replay attacks are possible
… RFC 8446, Appendices C-E, is good background material
… For each attack vector, there are high level things we can do to make it more resistant to attack
… We can forbid OSP endpoints from downgrading to TLS 1.2

Peter: Does QUIC require TLS 1.3?

Mark: So we may get this for free...

Peter: (checks) It does require it

Mark: The solution for avoiding cipher based attacks is to require longer ciphers
… We should do some benchmarking to guide our choices
… For timing attacks, some are based on time to encrypt the payload. TLS 1.3 requires use of constant time ciphers
… We discussed using TLS pre-shared keys, but in the spec I don't think we require them to be used

Peter: We aren't using them

Mark: That removes issues with compromise of pre-shared keys
… For replay attacks, there are advantages to using early data, we'd want to use for certain message types

Peter: The simplest way to avoid 0-RTT problems is to not use it. I don't think we have a use case that would benefit from 0-RTT
… For now, let's not use 0-RTT and reassess if we think there's benefit

(general agreement)

Peter: The QUIC WG and TLS WG are pushing for only strong ciphers, not sure we need to do more than use what they've chosen

Mark: I wanted to check which ciphers have good hardware acceleration support on ARM chipsets
… Impacts for CPU requirement, particularly for media streaming
… It basically comes down to which key length you use with AES, or ?. Wasn't clear whether block based ciphers are a good fit for streaming applications
… For items 1,3,4,5, we propose to update the spec to require TLS 1.3 and constant time ciphers. For 3 and 4, note that those features won't be used. For 2, consider hardware requirements when recommending ciphers

Nigel: Regarding pre-shared keys with TLS 1.3. There are two circumstances where it's used: out of band or resumption, something we should note in the spec

Mark: Session resumption is part of the spec

Peter: The application could rule it out, to avoid storing state between sessions

Action: Peter to research if there is advantage to session resumption outside of early data

Peter: Resuming a connection could use less power. Then we'd have to figure out what needs to be stored for session resumption

Eric: Are there any issues with keys? There are for content that needs authentication. What happens if you start playback on the controller then continue remotely. Do you require authentication?

Peter: No, you shouldn't require it.

Eric: So how does it work for encrypted content?

Peter: The keys for DRM are completely separate

Mark: This is about securing the connection between controller and receiver. It doesn't describe how the content is encrypted in addition.

Peter: I'm OK with banning 0-RTT and TLS less than 1.3.

Mark: I think it depends on the ciphers we choose for item 2.
… I'd like to understand better the trade-off with constant time AEAD ciphers.
… Most of the ciphers for TLS 1.3 are constant time.
… We may get it for free
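The "TLS 1.3 only" policy can be illustrated with Python's standard `ssl` module. This is only a sketch of the policy: `ssl` provides TLS over TCP, whereas the Open Screen Protocol runs over QUIC, and the function name is hypothetical.

```python
import ssl


def make_tls13_only_context() -> ssl.SSLContext:
    """Build a client context that refuses to negotiate anything below TLS 1.3."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # Forbid downgrade to TLS 1.2 or earlier.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    return ctx
```

A handshake with a peer that only offers TLS 1.2 then fails instead of silently downgrading.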

<anssik> [discussing issue #131]

Mark: Issue 131. What can network agents outside the LAN potentially do, if they can route traffic to OSP agents on the LAN
… This happens a lot, as home routers are very bad, and allow internet traffic to be routed to internal endpoints due to poor UPnP implementations
… Simplest thing to do is nothing. The worst that happens is a DoS, where you swamp the device with authn requests or failed handshakes
… Or we can tell the user if we detect attempts from unexpected network endpoints
… For mitigation, one idea I had (but am not convinced of) is to put something in the early handshake that could only be found through mDNS
… Most restrictive, we could ban connections from non-private IP addresses
… https://www.theverge.com/2019/1/2/18165386/pewdiepie-chromecast-hack-tseries-google-chromecast-smart-tv

Mark: Better to do earlier, before attacker can have any side effects on the target device. Ideal if the extra data is provided before a PIN prompt is shown

Eric: Required, rather than ideal

Mark: Don't want to prevent us from using ICE in the future
… Advertise a token through another mechanism, eg, Bluetooth

Nigel: Is there information that's already present that could be used, rather than having to generate a separate token?

Mark: We currently advertise the fingerprint, but that's not unique information, we'd want something that's only accessible to the discovery mechanism

Resolved: Add a token, to be advertised through mDNS, and require it in the authn request prior to PIN display
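A minimal sketch of two of the mitigations discussed: the mDNS-advertised token checked before any PIN prompt, and the more restrictive option of rejecting non-private addresses. The function names and the token length are assumptions. The address check is shown only for comparison, since the group preferred the token so that ICE is not ruled out later.

```python
import hmac
import ipaddress
import secrets


def new_discovery_token() -> str:
    """Random token the agent would advertise via mDNS (length is an assumption)."""
    return secrets.token_urlsafe(16)


def may_show_pin_prompt(advertised_token: str, presented_token: str) -> bool:
    """Only show a PIN prompt if the peer proves it saw the local advertisement."""
    return hmac.compare_digest(advertised_token.encode(), presented_token.encode())


def is_local_peer(address: str) -> bool:
    """Most restrictive option: accept only private or link-local addresses."""
    ip = ipaddress.ip_address(address)
    return ip.is_private or ip.is_link_local
```

Because the token check runs before the prompt is displayed, a remote attacker who cannot see mDNS traffic on the LAN cannot cause any visible side effect on the target device.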

Mark: In the Presentation API spec, we had privacy review feedback. If you start a presentation, and another party also connects, then the UA should be able to notify the user that this happened
… So that you know that information you share may be visible to someone else
… The protocol didn't have a way to notify all controllers of the individual connections added to a presentation

Anssi: Should the joining party also be notified?

Mark: This should address both cases. To open a connection to a presentation, you send a request and get a response
… We can add the number of presentation connections and send an event to the other connections with the number
… So everyone who is connected has the same view of the number of connections

Resolved: Add a presentation connection changed event that includes the number of connections to the presentation
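The resolution can be sketched as a small bookkeeping class. The class and method names are hypothetical. The idea is that the response to a joining connection carries the current count, and every other connection receives an event with the same count.

```python
from typing import Callable, Dict, Optional


class Presentation:
    """Tracks connections and tells every party how many there are."""

    def __init__(self) -> None:
        self._listeners: Dict[str, Callable[[int], None]] = {}

    def connect(self, connection_id: str, on_count_changed: Callable[[int], None]) -> int:
        """Add a connection; the return value stands in for the count in the response."""
        self._listeners[connection_id] = on_count_changed
        self._notify(exclude=connection_id)
        return len(self._listeners)

    def close(self, connection_id: str) -> None:
        """Remove a connection and send the new count to the remaining ones."""
        self._listeners.pop(connection_id, None)
        self._notify()

    def _notify(self, exclude: Optional[str] = None) -> None:
        count = len(self._listeners)
        for cid, listener in self._listeners.items():
            if cid != exclude:
                listener(count)
```

In this sketch both the existing parties and the joining party end up with the same view of the number of connections, which covers both cases raised by Anssi.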

Mark: Issue 114, trusted displays. This is a complicated topic. It's important to come up with ways to distinguish levels of trust, and a lot of requirements to think about
… No concrete proposal yet.
… We've focused on MITM, ensuring data remains private, we haven't focused on providing provable properties of the device (manufacturer, software, specific name, etc)
… If we want this, we should add an attestation protocol.
… What facts do we want to verify? Where do they come from: the manufacturer or the software?
… Information to verify manually by the user.
… Who cares about doing the verification: the user agent or the webpage? Is it for the user or the application?
… When we've gathered some of this information, can make a proposal. But would be speculative to do it now
… My request to the group is to think about these questions and feed back. We'll want to get internal input from Google

Peter: What do other browsers feel about the question of streaming from tab capture when there's encrypted content on the page?
… Cast allows that now, but only because of the certificate on the device

Eric: Also, what if I'm signed in and play encrypted content, then fling the URL? The other device would need to be able to decrypt using the keys already exchanged

Mark: Netflix are interested in this

Eric: AirPlay has a mechanism to share information such as cookies
… A future version of AirPlay that's OSP based would have to have this

Mark: It requires a solution for managing root keys

Peter: We can standardise the mechanism, but then it's up to vendors which root keys they support

Mark: We want to avoid having all receivers having to be certified by all controllers

Nigel: We have to be careful, our experience as a broadcaster, getting trust in a horizontal market is very difficult
… Also implies management and a compliance regime

Peter: Feels like a V2 issue

Mark: Knowing what information is important from one agent to another will help start that process (manufacturer, serial number, certificates, ...)

Peter: The use case is "can this agent be trusted with encrypted content"

Mark: That usually implies something about the hardware, where it came from, and a trusted software stack

Resolved: Issue 114, defer to V2, and invite feedback on which information is important to establish trust between agents

Presentation API Protocol Issues

Open Screen Protocol slides (70-75)

Mark: We realised that, for a couple of messages, there's nothing the embedder or JavaScript needs to know about what happened
… We could simplify the protocol around closing connections and terminating presentations
… This would simplify implementations, but we'd lose some debugging information
… When you close a connection from the receiver, we send an event to the controller saying the connection is closed
… Proposal to send a close event, then send a change event to all other parties. Then we could remove close-request and close-response, as it doesn't require a response.

Peter: A close response doesn't go anywhere in the Presentation API

Mark: The channel is basically useless once one side closes it

Peter: And the next time it's used, you'd get an error anyway

Mark: Terminate works the same way. When you decide to terminate, you only signal one side.
… But you can end up with a presentation that's still running that you can't terminate, unless you reconnect
… If we change the spec to give the controller feedback on termination, we could use request/response, or use an event

Peter: Seems strange for the controller to send a terminate event, as it's something that occurs at the receiver

Mark: An event is more a request without a response

Peter: What about a receiver that refuses to terminate, there'd be no way to know

Mark: We don't have a way in the spec to see if a presentation is still running
… Keeping the response would be helpful for debugging issues
… But, if we follow the spec strictly, it's not required
… I think Chrome would want to know if termination requests were failing

Resolved: Remove request/response messages for presentation close, and keep request/response for terminate and explain why this is needed

Streaming and Capabilities

Open Screen Protocol slides (87-92)

Peter: Showed an idea on how we can have audio and video streams
… [showing streams in the spec]
… Basically, anything that does not change very often goes into metadata, e.g. cvr or frame size. A frame can reference metadata that has been negotiated previously and does not need to send it over again.
… For video, you want to be prepared for things such as temporal scalability, where you need to reference frames that can be skipped.
… For time scale, you don't need to put the numerator and the denominator in the packet, only the numerator is enough.
… The part that is in the pull request is the concept to start or stop a session. The sender indicates the codec it supports. Rather than having the receiver select, the receiver could say which codec profiles it supports.
… You may want to specify the encoding and screen resolution.
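The split between rarely changing metadata and per-frame data can be sketched as below. The type and field names are hypothetical and the 90000 tick rate in the usage note is only an example. The point is that the time scale (denominator) is sent once in metadata, while each frame carries only its timestamp in ticks (numerator).

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Dict


@dataclass(frozen=True)
class StreamMetadata:
    codec: str
    time_scale: int  # ticks per second, sent once (the denominator)


@dataclass(frozen=True)
class Frame:
    metadata_id: int  # reference to previously negotiated metadata
    timestamp: int    # in ticks (the numerator), sent with every frame


def presentation_time(frame: Frame, metadata: Dict[int, StreamMetadata]) -> Fraction:
    """Presentation time in seconds, combining frame ticks with the stored time scale."""
    meta = metadata[frame.metadata_id]
    return Fraction(frame.timestamp, meta.time_scale)
```

For example, with a time scale of 90000 stored under metadata id 1, a frame with timestamp 45000 maps to exactly half a second.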

Eric: Do we need the same thing in the offer?

Peter: That is a good idea.

mfoltzgoogle: Aspect ratio? Which side is responsible for producing output that matches the aspect ratio?

Peter: So far, I'm assuming the receiver can do that.

Peter: I can usually include a sync time in some frame that the device can use to sync things up.

Eric: Is it out of scope to send video to a receiver and the audio to the speakers?

Peter: Yes.

mfoltzgoogle: We don't have a protocol yet to do that.

Peter: If you have some ideas

Eric: I don't, but I know some people that do. I know it is an issue.

anssik: Don't know if you sync clocks across devices

Eric: yes, through PTP.
… Sending audio and video here, and also sending video elsewhere. Something that we'd definitely want to handle at some point.

anssik: There's a community group, called the Multi-Device Timing Community Group, who has been looking into this
… They have a proof of concept for synchronizing media, which was pretty convincing.
… Potentially an issue to open?

Peter: Yes, we should track this somewhere. I will look into what solutions may be possible.

Peter: Moving to remoting. Remoting is live streaming. The media is already there.
… You're trying to transfer from one buffer to another buffer. As in MSE. Different from streaming.
… The way remoting can work is that the receiver sends capabilities to the sender just as we discussed at TPAC, and then, rather than offering encodings, the sender just sends the encoding to establish a reference and starts pushing the media.

Eric: That's assuming that the app has enough information about the capabilities of the receiver.

Peter: Yes, this supposes the application has access to capabilities information.
… Need to know maximum bitrate and codec support.
… Transcoding is always a possible fallback.
… Size is another dimension.
… e.g. 720p vs. 4K.

cpn: Also whether it can decode in software or hardware

Peter: Yes, this is all assuming the app has the information or that the user agent can transcode.

Eric: I'd like to avoid the necessity to transcode.
… Decrypt, decode, encrypt, encode. You may not be contractually allowed to transmit unencrypted frames and may not have encrypt capabilities.
… We need to come up with a mechanism whereby the application could offer the different possibilities and the receiver could select one without revealing too much information.
… The application is the only one that knows what its server source can offer.

mfoltzgoogle: Then we would need a way for the app to know which one of these offers is chosen.

Eric: Yes, it would have to know.

Peter: Essentially, it's about having the streaming offer/response exchanges at the API level.
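One way to picture the offer/response exchange Eric describes is the sketch below, in which the application lists the encodings its server can provide and the receiver answers with the index of one it supports. The function name and the codec strings are illustrative assumptions, not part of any proposed API.

```python
from typing import Optional, Sequence, Set


def select_offer(offers: Sequence[str], supported: Set[str]) -> Optional[int]:
    """Return the index of the first offered encoding the receiver supports.

    Only the chosen index goes back to the application, so the receiver's
    full capability set is never revealed.
    """
    for index, codec in enumerate(offers):
        if codec in supported:
            return index
    return None
```

Returning the index also tells the application which offer was chosen, which is the requirement mfoltzgoogle raises. A result of `None` would mean no offer is playable without transcoding.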

mfoltzgoogle: Through Media Capabilities, the app is already able to tell capabilities.

Peter: This is the opposite though.

mfoltzgoogle: Yes, it's better from a fingerprinting point of view.