W3C

Second Screen WG - TPAC F2F - Day 1/2

25 October 2018

Meeting minutes

Introduction

anssik: Quick round of introductions would be good as I see some new faces. I'm Anssi, from Intel, chair of the group.
… Group started as a breakout session 4 years ago. Then we had a Working Group focused on APIs
… Two years ago, following feedback received from Mozilla, the work expanded into protocol work done in the Second Screen Community Group.

Peter: From Google, I have been working on WebRTC for many years. Among other things, I'm chair of IETF ICE group.
… Joined the open screen protocol discussion some time ago.

MarkFoltz: From Google. I've been involved in the group since very early on. Editor of the Presentation API, and driving the work on the Open Screen Protocol.

Mounir: From Google. Involved in the implementation of the Remote Playback API in Chrome.

Francois: Dev rel at Google.

Chahoud: From Microsoft. Ex-HTML and working on media, although I'm moving to Web Payments these days.

GeunHyung: Involved in this group since last year. Interested in media services on the Web.
… HTML Converged Technology forum in Korea

George: I'm with the DAISY Consortium.
… Been working on accessibility topics at W3C forever.
… Primary interest right now on digital publishing. Merge between IDPF and W3C.

Eric: From Apple. We've been following the work in this group since the beginning.

MarkArita: From Intel, want to check where this group is heading.

Takio: From Yahoo Japan.

Masaya: From NHK. Interested in the protocol and APIs for my work. Japanese broadcasting system has similar features. Unfortunately, the open screen protocol was not ready on time. Interested in alignment.

Chris: From BBC. Co-chair of the Media & Entertainment IG in W3C.
… The Presentation API and Remote Playback API enable important use cases for us.
… Looking at HbbTV compatibility here, as HbbTV has a Companion Device functionality.

Francois: From W3C. Team contact of the group. Also Media & Entertainment Champion at W3C.

Tomoyuki: From KDDI. In this group in particular, I'm working on the test suite for the Presentation API.

anssik: Thanks for the intro. Happy to see browser vendors and the broader ecosystem in this room.
… I'm not going to bore you with the charter details of the WG and CG.
… The high-level description is that the WG is scoped to the APIs: the Presentation API and the Remote Playback API, both currently published as Candidate Recommendations.
… Implementation in Chrome, but we'd love to get other implementations, so that we can move the specs forward on the Rec track.

MarkFoltz: For the Presentation API, it's a bit complicated because there are two classes of conformance. For the Controller part, Chrome supports it.
… For the receiver part, we're mirroring the tab locally and streaming that tab to the remote device (1-UA case).
… For the Remote Playback API, pretty complete implementation in Chrome for Android.
… Main target is Chromecast devices.
… Implementation in Chrome for desktop is much more limited.

anssik: I've heard the test suite for the Remote Playback API is not complete enough and blocking other implementers for now.

MarkFoltz: My team is not currently working on test cases for that.

anssik: About Open Screen Protocol implementations?

MarkFoltz: Some of the pieces are there, others are not. I'll get into more details tomorrow.

Agenda bashing

anssik: Thanks to Mark and Peter for preparing a detailed agenda. It's fair to say that you're doing most of the work, and we'll be hearing a lot from you at this F2F.
… Does anyone have changes to propose to the agenda?

Detailed agenda

Chris: I have a colleague who will join tomorrow for the J-PAKE discussion.

[Figuring out how to update the agenda to have a discussion on J-PAKE on day 2]

Overview of the Open Screen Protocol

Open Screen Protocol slides (3-18)

MarkFoltz: I'm going to spend a little bit of time giving an overview of why we're here, history of the Community Group in particular, some of the directions taken over the last few meetings.
… Review the decisions taken last year, and in Berlin earlier this year.
… I'll hand over to Peter to talk about details of receivers
… Starting with a bit of history, the CG was formed in 2013 following a breakout session. Initially, we incubated the Presentation API, which graduated to the Second Screen Working Group.
… After that, the WG was mostly done, but there was a need to focus the efforts on interoperability between controllers and receivers.
… That's why the CG rechartered to look into that.
… Various discussions in meetings, looking at requirements, alternatives and benchmarking to make design decisions.
… The WG was re-chartered last year until end of 2019. Scope includes creating an interoperable ecosystem.
… The Presentation API allows one device to present content to a second device with a display.
… The browser lists the devices compatible with the requested URL, the user chooses a display, and the browser tries to load the URL on the selected display.
… If that succeeds, the URL is loaded in the selected display, and a communication channel is created between the two sides, controller and receiver.
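
[Illustrative sketch: a minimal controller-side example in TypeScript of the flow just described, using the Presentation API as published; the presentation URL is a placeholder.]

    // Controller side: ask the browser to present a URL on a second screen.
    const request = new PresentationRequest(["https://example.com/slides.html"]);

    // Optionally monitor whether a compatible display is available.
    const availability = await request.getAvailability();
    availability.onchange = () => console.log("display available:", availability.value);

    // In a click handler (start() requires a user gesture): show the display
    // picker; the promise resolves once the receiver starts loading the URL.
    const connection = await request.start();
    connection.onconnect = () => connection.send("hello from the controller");
    connection.onmessage = (event) => console.log("from receiver:", event.data);
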
… The Remote Playback API allows one device to present media content to a second display.
… The API is connected to the HTMLMediaElement interface: "video.remote.prompt()"
… When the second device is connected, all control commands are passed on to it (play, pause, seek, etc.)
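
[Illustrative sketch: the corresponding Remote Playback API usage, assuming a page with a single video element.]

    const video = document.querySelector("video")!;

    // Watch for compatible remote playback devices for this media element.
    // Keep watchId to pass to cancelWatchAvailability() later.
    const watchId = await video.remote.watchAvailability((available) =>
      console.log("remote device available:", available));

    // On a user gesture: show the device picker and hand playback off.
    await video.remote.prompt();

    // Once connected, local commands (play, pause, seek, ...) are forwarded.
    video.remote.onconnect = () => video.play();
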
… Following reviews, including from the TAG, we had feedback on the importance of interoperability, for controllers and receivers that are on the same LAN.
… For the Presentation API, we focused on the flinging use case (2-UA mode), which is not about streaming.
… Users and developers should be able to get consistent behavior no matter what browser and secondary device they use.
… It's important that the protocol supports all the features of the APIs.
… In terms of the work we need to do: we want to write a spec that is complete enough.
… We want to keep in mind the constraints of the devices and platforms that are available.
… We want to use modern protocols and cryptography, because the devices will be around for a long time.
… First step was to look at functional requirements for the Presentation API.
… [going through the list]. Includes discovery of devices, reconnection to a presentation.
… We want to make sure that the messages exchanged are delivered in-order in a reliable way.
… [and a few others]
… Looking at non-functional aspects, these include usability, privacy and security, resource efficiency (discovery uses background processes, for mobile devices, we want to do that efficiently).
… We kind of think of the Raspberry Pi as a good target for constrained devices.
… The way Google has thought about the protocol is that it's a stack of layers.
… For some time, we've looked at alternatives for how to implement these different pieces.
… At Berlin, we settled on specific choices, with flexibility for the future.
… For Discovery: mDNS / DNS-SD
… For Authentication: TLS 1.3 via QUIC and J-PAKE
… For Transport: QUIC, new transport protocol using UDP
… One of the features that we're exploring for the future is network traversal, and QUIC can be used with ICE there
… For Application Protocol: CBOR, developed at IETF, a compact binary format.

anssik: Are you using TinyCBOR?

MarkFoltz: Yes.
… We're looking at benchmarking in our lab. How many packets are exchanged, how long it takes.
… We use Raspberry Pis in our "lab". It's a bit ad-hoc. As our implementation progresses, we'll be able to deliver concrete benchmarking data.
… Where are we today?
… At Berlin, we agreed to pick up the technologies I mentioned and design an end-to-end protocol. That's what we're doing nowadays.
… In Berlin, we announced that we were working on an open source implementation. I'll give some updates tomorrow.
… I think it's important that we come up with a single complete specification.
… Some remaining work to do: finalize details on mDNS usage, some open questions on QUIC (how many connections to use, when do we start one, etc.), the actual format to use in CBOR, support for the Remote Playback API, authentication mechanisms for the protocol.
… Once we're done with that, we can come up with a reasonable v1 document.
… As the implementations progress, we'll be able to run benchmarking.
… Some forward-looking work that is not yet in scope of the CG but that could make sense to include: Media streaming, LAN traversal (ICE) support. Also looking at alternatives for discovery in case mDNS fails.
… The protocol might be implementable in JS depending on WebRTC/ORTC progress. That's something we keep an eye on.

Peter: I could make some slides to present that topic tomorrow.

Anssi: Yes, let's allocate some time for the discussion

Discovery

Discovery slides (19-23)

Peter: The main things to sort out are the service name and the TXT record
… We need something of the form _X._udp.local, so what is X?
… Propose 'openscreen' for X

Mark: Do we need to allocate a port number?

Peter: No
… Will need to register the string with IANA

Anssi: How long would that take?

Peter: Not long

Anssi: Are people happy with this?

Mark: It seems to be available

Resolved: The service name will be _openscreen._udp.local

Peter: mDNS allows a place for key/value strings
… Two options: put device info into TXT, or put a minimal amount of stuff in TXT, then get device info over the QUIC connection
… For the first option, here are some examples.
… Could put the protocol version, device id, friendly name, more information
… Benefit is you have this information before doing a handshake or QUIC connection
… Downside is that this information is not encrypted or authenticated
… There are advantages to having encryption

Peter: Option B is to do a bare minimum in the TXT record, and put the rest over QUIC
… An identifier you can authenticate, e.g., a fingerprint
… Also a timestamp
… The example shows CBOR CDDL
… The nice thing with this is that all the data is encrypted and authenticated
… Downside is that it adds an extra RTT, one for the QUIC handshake, another for the information

Anssi: Impact on user experience?

Peter: There'd be 2 RTTs on the local network, so 5 milliseconds times 2

Francois: This is only for new devices?

Peter: Yes, you can cache after that
… With this option, there's an issue with the J-PAKE authentication
… The information would be unauthenticated before doing the authentication
… But with option A, the information is always unauthenticated

Anssi: Is there an attack vector here?

Peter: We'll talk about that tomorrow. One is stealing a display name
… If the display name is encrypted, I can't know what it is from watching the network

Mark: With the web today, you can see a URL with a hostname, but then you do the handshake for security
… We could show the unauthenticated information before pairing, then tell the user whether it matches
… If the only trade-off is the RTT, the UI flow sounds reasonable. Speaking personally, option B seems more robust

Francois: Are there also limits with option A in terms of size that don't apply to option B?

Peter: Yes, you're limited to the size of the UDP packet
… With option B, it's all in CBOR, so can add new things

MarkArita: How is the device fingerprint created?

Peter: Each side creates a self-signed certificate, then hashed. It's from WebRTC, a common technique

MarkArita: What about re-establishing connections after leaving / coming back to the LAN?

Peter: When you make the connection, QUIC will give you the fingerprint (or whole cert) to tell you who the other side is
… As long as the same cert is used, you know who it is. Otherwise the TLS handshake would fail
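
[Illustrative sketch: computing such a fingerprint in TypeScript, assuming the WebRTC-style convention of SHA-256 over the certificate's DER encoding; as noted later in these minutes, the exact generation still needs to be specified.]

    // Hash a certificate's DER bytes and hex-encode the digest with
    // colon separators, as WebRTC does for certificate fingerprints.
    async function certFingerprint(der: ArrayBuffer): Promise<string> {
      const digest = await crypto.subtle.digest("SHA-256", der);
      return Array.from(new Uint8Array(digest))
        .map((b) => b.toString(16).padStart(2, "0"))
        .join(":");
    }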

Anssi: What's your personal preference?

Peter: I think we should pursue option B

Francois: Typical use will be with devices in the home, so they're already known

Peter: So the con here is pretty rare

Mark: I want to think more about the use case where someone enters multiple devices not seen before
… May want to add some more information to option B to allow distinguishing the devices
… If it's something to authenticate later, can do so after the handshake

Peter: If we start with option B, we can move some things into option A later

Proposed: Think about adding a small amount of data to distinguish multiple new devices

FrancoisB: At some point, the mDNS API will allow web apps to discover devices on the web

Mark: This is a good argument for option B, don't want to expose more information

<anssik> PROPOSED RESOLUTION: Prefer option B (bare minimum; rest over QUIC) with possibly small amount of data to distinguish multiple new devices

Peter: Looking at the fields to be used for discovery, should we have a version field?

Mark: QUIC and TLS have version numbers
… Scenarios include changing the metadata, using something other than QUIC or CBOR

Resolved: Prefer option B (bare minimum; rest over QUIC) with possibly small amount of data to distinguish multiple new devices

PROPOSED RESOLUTION: We will add a version field to the TXT record to indicate the open screen protocol version used

Resolved: We will add a version field to the TXT record to indicate the open screen protocol version used

George: The things you're discussing are super cool, great work. I'm assuming things enabled on the driving device, e.g., closed captioning and audio description, are passed through. Is that a good assumption?

Francois: The Presentation API isn't about streaming, it's about presenting web content in general. So the page you present could include a video with closed captioning
… For Remote Playback, I don't know

Peter: We haven't discussed Remote Playback support in the open screen protocol so far, but will come to this tomorrow

George: Regarding interoperability between two implementations. On the controlling side I'd imagine the testing will include the use of assistive technology. I'd suggest that the UI of the app be tested with AT, so when it goes through horizontal review we see it can be used with AT out of the box

Francois: Does it change test cases?

George: Most tests would be automated. But the UI tests could be very simple, ensure you use ARIA in the controls
… All WGs should be aware of these kinds of things

Peter: There were two other fields that could be in the bare minimum. The first is a timestamp.
… If the metadata changes, you bump the timestamp

Mark: A follow-up item would be to list the steps that should be done through mDNS when the timestamp changes
… mDNS records have a TTL, devices will want to update their caches if the TXT record changes. It's spelled out by the mDNS TTL

PROPOSED RESOLUTION: Add a timestamp to allow detection of changes to the device info

PROPOSED RESOLUTION: Add a timestamp to the TXT record to allow detection of changes to the device info conveyed over QUIC

Resolved: Add a timestamp to the TXT record to allow detection of changes to the device info conveyed over QUIC

Peter: We need an identifier to know this is the same person I talked to previously
… It's nice if this isn't just a random string, but tied to the authentication when connecting over QUIC
… Allows detection of forging the identifier right away
… It's a typical thing that other specs do

Mark: Can we steal the WebRTC definition?

Peter: Maybe, take the string, hash it and specify how it's encoded

Peter: We want an identifier, but it's more than that, it's tied to the device's certificate

PROPOSED RESOLUTION: Add a fingerprint to the TXT record to function as an identifier that can be authenticated

Mark: We'll still need a spec for how the fingerprint will be generated

Resolved: Add a fingerprint to the TXT record to function as an identifier that can be authenticated
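
[Illustrative sketch: the minimal TXT record implied by the resolutions above; the key names are hypothetical, only the set of fields (version, timestamp, fingerprint) reflects the resolutions.]

    // Hypothetical key names for the option B TXT record.
    const txtRecord: Record<string, string> = {
      ve: "1",          // Open Screen Protocol version
      at: "1540425600", // timestamp, bumped whenever the device info changes
      fp: "ab:cd:ef",   // certificate fingerprint, an authenticatable identifier
    };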

Remote Playback API

anssik: The Remote Playback API is pending further implementation feedback.

Jer: Testing these features is incredibly hard

MarkFoltz: Chrome for Android implementation is pretty complete. It supports "src=", not MSE. Compatible with Cast devices.
… As far as testing goes, I know that there are pretty complete unit tests of the implementation. I'm not the expert on functional tests.

anssik: I can take an action to find a person to work on test coverage.

Mounir: I don't remember that we had end-to-end tests.

MarkFoltz: We have the infrastructure to do it, but not sure how much we do today.
… For desktop, we support the "disableRemotePlayback" attribute.
… But we don't allow users to initiate that mode through the Remote Playback API.

Jer: If they choose to disable remote playback, you still do mirroring?

MarkFoltz: Yes.

Jer: Is that related to licensing?

MarkFoltz: Not really.

Jer: The reason I'm asking. The Remote Playback API was designed to be compatible with the Apple feature implementation. It will support non-MSE streams that are encrypted. And I'm wondering whether you bumped into issues there.

Mounir: [mentions 360 use cases, and remoting of media element]

MarkFoltz: For accessibility, we don't have any way to stream captions for now.

Action: Anssi to talk to Intel's w-p-t and QA people to help improve test coverage of Remote Playback API with focus on functional testing

MarkFoltz: We do have end-to-end testing of the remoting feature, but it's not really exposed.

Jer: Our implementation has not landed. End-to-end testing is difficult in AirPlay.
… I don't think that there will be any additional feedback that we'll have regarding the spec itself.

<jernoble> https://bugs.webkit.org/show_bug.cgi?id=162971

Jer: I haven't had time personally to push this forward.
… No spec reason why we can't ship it, as far as I can tell.

MarkFoltz: We spent some time at TPAC last year on how to write tests for these features. I wrote a draft proposal for a WebDriver extension to deal with the modal dialog. I haven't dug into mocking the remote device.
… The pioneer here is the Permissions API.

Jer: The intersection between WebDriver and WPT is what is going to allow most of these tests to run through.

MarkFoltz: WPT are run with regular browsers. To use WebDriver, you need to pass some control flag.

<anssik> WebDriver Extension API for Generic Sensor defines a mock sensor type

Mounir: There is something similar to "internals" in WPT.

<anssik> WebDriver Extension API for Generic Sensor PR

Mounir: Ideally, we could have a simulated device on the other side. A fake device would probably be enough.

MarkFoltz: The spec does not specify the exact set of video commands that need to be supported, so that should be enough indeed.

Francois: [parallel with getUserMedia tests where a flag allows disabling the user prompt, so that tests can be automated]

Jer: The Sensor API proposal seems pretty useful looking at it.

QUIC

QUIC slides (24-32)

Peter: Two concepts - connect and stream.
… [going into details on connections and streams in QUIC]
… Questions: how many QUIC connections?
… There's really no reason to have more than one per client/server pair.
… (with some exception)
… There is a case where the browser may have different profiles, and the remote side would be able to figure out that these profiles are coming from the same port.
… If you want to prevent that, it would make sense to use separate QUIC connections, using different ports.
… An implementation can use any number of QUIC connections that it wants, but most of the time, one is enough.
… You can multiplex messages going over a connection to tell where each message goes.
… So multiple tabs are supported.

anssik: So if you have an incognito tab, you might want a separate QUIC connection

Peter: Right. In other cases, you can multiplex/demultiplex messages within the same connection, we just want to make sure that this is supported in the Open Screen Protocol.
… If the browser wants, it can create as many QUIC connections as it wants. Separate congestion contexts to maintain though.

PROPOSED RESOLUTION: Design the transport protocol so that the browser does not need to create more than one QUIC connection, and so that it may create multiple QUIC connections.

MarkFoltz: Do we want to disallow port sharing?

Peter: I don't think so.

PROPOSED RESOLUTION: Design the transport protocol so that the browser does not need to create more than one QUIC connection, and so that it may create multiple QUIC connections (e.g. for privacy reasons).

MarkFoltz: The QUIC algorithm, does it support fairness within the same connection or across connections?

Peter: Within one QUIC connection, there is one congestion control context.

MarkFoltz: I'm asking, because if we add streaming, we may want to assign different priorities to different types of stream.

Peter: Priorities are out of scope of QUIC. Implementation detail.
… We could, at the application level, implement different priorities.

[Some discussion on ways of achieving fairness and non-fairness across streams]

MarkFoltz: If the implementation chooses to aggregate multiple streams in the same connection, we could add that it should attempt to achieve fairness.
… I think that's an implementation detail at this point.

PROPOSED RESOLUTION: Design the transport protocol so that the browser does not need to create more than one QUIC connection, and so that it may create multiple QUIC connections (e.g. for privacy or congestion control reasons).

Resolved: Design the transport protocol so that the browser does not need to create more than one QUIC connection, and so that it may create multiple QUIC connections (e.g. for privacy or congestion control reasons).

Peter: Another question is: can a server demux more than one client?
… The answer is yes. When you receive a packet on an IP and port, you know where it needs to go.
… That's just part of QUIC.

MarkFoltz: From the same network interface on the client, would the IPv4 and IPv6 be considered as different connections?

Peter: Yes.
… Third question is: when QUIC connections should be kept alive.
… QUIC connections don't really care.
… You may not send anything.
… Problem is if one end disappears.
… You may want to send keep-alive messages. It has impacts on the battery.
… But then the question is: why do you want to keep the connection alive? You can also reconnect later on if you want from a client perspective.
… However, the server cannot reconnect to the client.

[Some discussion on which device is the server in our case. Controller or receiver? It does not matter]

Peter: Recommendation is for the server to send keep-alive messages to keep the connection alive. Not needed for the client.

anssik: How often to send messages?

Peter: There's a tension between sending pings more often and less often. ICE magic numbers are every 20-30 seconds. That seems a reasonable target.
… If we're using ICE, no need to worry about that, ICE does it on its own.
… We could add an API for controlling the connection timeout.

Mark: That would be on the PresentationRequest?

Peter: Yes

MarkFoltz: It needs more thought if we're using the same QUIC connection for multiple presentations.

Peter: Right, that's a TODO :)

MarkFoltz: If we could control that at the protocol level, that would be somewhat better.

Peter: We definitely want to avoid a situation where the app thinks it has to do its own keep-alive, because that would drain the battery. The radio would always be on.

PROPOSED RESOLUTION: Keep a QUIC connection alive if you're a client that needs to receive messages or a server that needs to send them. Otherwise, close the QUIC connection and reconnect when you need to (treat QUIC connections as ephemeral)

[Discussion on when clients/servers do NOT send keep-alive messages, default behavior, what the API would allow the application to say]

MarkFoltz: The proposal here is to define a keep-alive mechanism.

Louay: There is a mechanism to close the connection if you don't want to keep it around. And you can reconnect later on.

Peter: True.

Resolved: Keep a QUIC connection alive if you're a client that needs to receive messages or a server that needs to send them. Otherwise, close the QUIC connection and reconnect when you need to (treat QUIC connections as ephemeral)

Peter: Now that we want to send keep-alive, question is how?
… One option is to use ICE. But overkill in LAN scenarios.
… Another is using QUIC ping/pong frames, but QUIC libraries do not necessarily expose these.
… A third option is to send pings in QUIC streams as part of the open screen protocol. That's my recommendation, because the first two options are not too good.
… You won't need to send keep-alive messages if ICE is being used.
… but that can still be done.

PROPOSED RESOLUTION: For the keep-alive mechanism, add a ping/status message to the Open Screen Protocol.

Resolved: For the keep-alive mechanism, add a ping/status message to the Open Screen Protocol.

Peter: We talked about the frequency. I don't know if we need a resolution.

Francois: Is this specified in ICE for instance?

Peter: No. There is another spec that talks about it but that's all.

MarkFoltz: Do we need bi-directional keep-alive messaging?

Peter: If you send a ping and receive a pong, you know the other side can receive, but the other side does not know that you can receive.

MarkFoltz: But the ping could refer to the previous pong.

Peter: Hard to come up with a rule that works in all cases if one end is not exactly doing what you'd like it to do.

[ping-pong discussion]

MarkFoltz: If we decide to do unidirectional, then we need to have a good perspective as to when you're going to receive the next packet.
… That protocol would need more definition.

Francois: I suppose the timer resets if another message is received?

Peter: Yes.

PROPOSED RESOLUTION: For the keep-alive mechanism, add a SHOULD for implementation with a 25s delay (and further investigate a unidirectional keep-alive mechanism)

Resolved: For the keep-alive mechanism, add a SHOULD for implementation with a 25s delay (and further investigate a unidirectional keep-alive mechanism)
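
[Illustrative sketch: the SHOULD-level keep-alive, assuming a hypothetical sendMessage hook into the Open Screen messaging layer and a hypothetical "status-request" ping message.]

    const KEEP_ALIVE_MS = 25_000;
    let keepAliveTimer: ReturnType<typeof setTimeout>;

    // Call after sending any message: a ping is only needed once the
    // connection has been idle for the full 25 seconds.
    function scheduleKeepAlive(sendMessage: (msg: unknown) => void): void {
      clearTimeout(keepAliveTimer);
      keepAliveTimer = setTimeout(() => {
        sendMessage({ type: "status-request" }); // ping; expect a status back
        scheduleKeepAlive(sendMessage);
      }, KEEP_ALIVE_MS);
    }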

Peter: Last question. Each stream is what you want it to be. Messages in a stream are ordered. But messages in separate QUIC streams are not ordered. So if you need out-of-order, you want separate QUIC streams. And if you need ordering, you need the same QUIC stream.
… Basically, the way to think about it is that you have some groups of ordered messages.
… First recommendation is to map each group of ordered messages to a QUIC stream.
… Second recommendation is to use ephemeral QUIC stream IDs. Some IDs have precise meaning (e.g. stream ID: 0). We don't have any special stream IDs, but we might in the future. So we might want to reserve e.g. 1-10 for future usage.

MarkFoltz: So up to the implementation to decide when to use a new stream ID.

Peter: Right. It can pick any stream ID.

MarkFoltz: Do the client and server have to agree?

Peter: No.
… One bit says whether it's unidirectional/bidirectional. Another bit says [missed].
… We should talk about unidirectional/bidirectional too at some point.
… It's ok to have plenty of stream IDs.

MarkFoltz: Well, the implementations are going to allocate a buffer for each stream. That's not really good on constrained devices. So we might want to have a SHOULD somewhere to restrict the number of streams when possible

PROPOSED RESOLUTION: For QUIC stream messages, 1 group of ordered messages = 1 unidirectional QUIC stream. Reserve the 1-9 stream IDs for future usage, and use ephemeral QUIC stream IDs

Francois: About unidirectional?

Peter: I don't see any reason to use bidirectional.

MarkFoltz: Why have they been added to QUIC?

Peter: Originally, all streams were bidirectional. But then there was a need to have unidirectional.

MarkFoltz: OK, I don't really have any opinion.

PROPOSED RESOLUTION: For QUIC stream messages, 1 group of ordered messages = 1 unidirectional QUIC stream. Reserve the 1-10 stream IDs for future usage, and use ephemeral QUIC stream IDs

Resolved: For QUIC stream messages, 1 group of ordered messages = 1 unidirectional QUIC stream. Reserve the 1-10 stream IDs for future usage, and use ephemeral QUIC stream IDs
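
[Illustrative sketch: the stream-mapping resolution against a hypothetical QUIC connection interface; QuicConnection and QuicStream are stand-ins, not real APIs.]

    interface QuicStream { write(data: Uint8Array): void; close(): void; }
    interface QuicConnection {
      // Implementations pick ephemeral stream IDs, skipping reserved IDs 1-10.
      openUnidirectionalStream(): QuicStream;
    }

    // One group of ordered messages maps to one unidirectional QUIC stream.
    function sendOrderedGroup(conn: QuicConnection, messages: Uint8Array[]): void {
      const stream = conn.openUnidirectionalStream();
      for (const msg of messages) stream.write(msg); // ordered within the stream
      stream.close(); // streams are ephemeral
    }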

[lunch break]

CBOR

CBOR slides (33-42)

Peter: How do we put multiple CBOR messages in a QUIC stream? How do we know the type of a CBOR message when it arrives? And a few other questions.
… To put multiple CBOR messages in a QUIC stream, there are two main ways. One is length-prefixed: you put the size of the message first.
… The other alternative is to put them back to back.
… That's doable, because CBOR messages have a beginning and an end.
… Advantage is that it doesn't add overhead, which can be important.
… If your CBOR library can parse streams, that's easy.
… My recommendation is to use back-to-back. However we could consider length-prefixed if we consider that the overhead is a good trade-off for ease of implementation.
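
[Illustrative sketch: the length-prefixed alternative; the 4-byte big-endian prefix is an assumption. Back-to-back framing would simply concatenate the encoded buffers with no prefix.]

    // Frame one encoded CBOR message with a length prefix.
    function frameLengthPrefixed(encoded: Uint8Array): Uint8Array {
      const out = new Uint8Array(4 + encoded.length);
      new DataView(out.buffer).setUint32(0, encoded.length); // big-endian length
      out.set(encoded, 4);
      return out;
    }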

Francois: To detect the end, you need to understand the nested structure within CBOR? or there's a specific end token?

Peter: You need to understand the nested structure.

MarkFoltz: Each CBOR block has its internal size I believe.
… Are we requiring that all the top-level messages be dictionaries?

Peter: No.
… There would be no way to introduce non-CBOR-based data.

[Streaming would be structured CBOR messages]

MarkFoltz: One disadvantage is that you would not know where the message ends without parsing.

Peter: Right, if you're using a buffered parser, then you need to know the length in advance.

MarkFoltz: I'm trying to figure out how to make an informed decision.

Francois: Are there stream parsers for CBOR that would be good candidates for the implementation?

MarkFoltz: We can look into that.

[No good way to recover from broken CBOR messages in either option]

Some discussion on the "rest of stream" feature in CBOR. We don't really need that feature a priori.

PROPOSED RESOLUTION: For CBOR messaging in QUIC streams, don't use CBOR "rest of stream" feature

Subject to change based on implementation feedback: back-to-back is more efficient on the wire, so preferable. It all depends on whether the added implementation complexity is significant. Different opinions expressed. Length-prefixed messages would reduce complexity and CPU usage.

MarkFoltz: The best we can do in a reasonable time frame is to look at which libraries support stream parsing. If we find them, then that strongly supports back-to-back.
… In the meantime, I'd leave that undecided.

PROPOSED RESOLUTION: For CBOR messaging in QUIC streams, don't use CBOR "rest of stream" feature. Seek further implementation feedback to evaluate complexity of stream parsing before we take a decision on mechanism to put multiple CBOR messages in one QUIC stream.

Peter: It just occurred to me that we do have a different option. The two ends could negotiate which option to use, with a fallback to sending length-prefixed messages, which would have to be supported by everyone.

Francois: I guess I'm wondering about the gap in complexity between the two parsers. Adding an extra requirement to add code to handle the negotiation is probably not worth it.

Resolved: For CBOR messaging in QUIC streams, don't use CBOR "rest of stream" feature. Seek further implementation feedback to evaluate complexity of stream parsing before we take a decision on mechanism to put multiple CBOR messages in one QUIC stream.

Peter: About tagging CBOR messages. When the CBOR message comes across and you de-serialize it, what is the type of it?
… Several options. CBOR has a built-in tagging mechanism. The only thing is that you're supposed to register the type with IANA.
… That's good for us, but Mark pointed out to me that other people might want to extend this with new types.

anssik: What are the possible extensions that you can think of?

MarkFoltz: One use case would be setting up a new device.

Francois: Just to clarify the notion of "type", here, that's the type in terms of application protocol message type, right?

Peter: Yes, one example is "presentation initiation"
… Option B is to treat the QUIC stream as a CBOR array of type and value. But that's almost the same as doing type-prefixing in the QUIC stream.
… It doesn't require IANA registry.
… But it cannot be expressed in CDDL.
… Not such a big deal, we could just have our little comment convention.

Francois: Question about what CBOR libraries would actually do with the prefix.

Peter: With Option C, you don't give the prefixed type to the CBOR parser. It's out of the CBOR structure.

Francois: But then, if we have stream parsing, you would need to exclude the prefixed type from the stream you give to the parser.

Peter: Right, that's a good point.
… One advantage of option C is that you could envision a way to interleave CBOR messages with non-CBOR messages.

MarkFoltz: What I wasn't quite sure about with option A was whether we were following the spirit of CBOR by registering lots of types with IANA that are only specific to our context, whereas the registry seems to contain things that are common to separate protocols.
… That being said, we could allocate a thousand values in this table.

Francois: Can you register ranges? All of those in the table seem unassigned.

MarkFoltz: I think so, but that needs to be checked.

Francois: How many types would we need?

MarkFoltz: About 50.
… We need at least one tag for generic Open Screen Protocol messages.
… So that tooling can understand the type of messages in the traffic.
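
[Illustrative sketch: option C, a message type code written outside the CBOR structure; the one-byte type codes and the encodeCbor helper are hypothetical.]

    declare function encodeCbor(value: unknown): Uint8Array;

    // Prefix the encoded body with a type code so a receiver (or tooling)
    // can dispatch on the message type without parsing the CBOR itself.
    function frameTypePrefixed(typeCode: number, body: unknown): Uint8Array {
      const encoded = encodeCbor(body);
      const out = new Uint8Array(1 + encoded.length); // assumes codes fit in a byte
      out[0] = typeCode;
      out.set(encoded, 1);
      return out;
    }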

CBOR Tags

Some discussion on tooling and human readability.

Peter: I think that Francois made a good point that pushes me back to option A.

MarkFoltz: We may need some combination of option A and C.

PROPOSED RESOLUTION: Seek further implementation feedback to assess which of tagging CBOR message with built-in tagging or using type-prefixing would be better. Check with IANA whether a range of values can be reserved.

MarkFoltz: I'd like to reserve something like a 16-bit range.

Francois: That seems like a lot if we only need 50 for now, especially considering that we could allocate more ranges as needed later on.

PROPOSED RESOLUTION: Seek further implementation feedback to assess which of 1) tagging CBOR message with built-in tagging or 2) using type-prefixing would be better. Check with IANA whether a range of values can be reserved.

Resolved: Seek further implementation feedback to assess which of 1) tagging CBOR message with built-in tagging or 2) using type-prefixing would be better. Check with IANA whether a range of values can be reserved.

Action: tidoust to check with IANA whether reserving a range of CBOR tags is possible and suitable for our purpose

Peter: Moving on to CDDL, which stands for Concise Data Definition Language.
… [going through an intro to the CDDL syntax]
… Object. Key, value, optional, array, enum.
… Three different ways to have a key. Positional (very efficient on the wire, but not flexible, because you can't really do optional), string keys (very flexible, but takes space on the wire), Integer keys (in-between, flexible and limited impact on the wire)
… In CDDL, you have to put the name of the key in a comment when using integer keys though, which is not very convenient.
… In any case, using integer keys seems like a good default to go with, but we should have some comment pattern to label the integer keys.
… You cannot really mix the three different ways. You can, but it's a bit odd.

MarkFoltz: For debugging, we'll want some way to get back to names.

Peter: Right, that's where the comment pattern is going to help.
… Can be automated if needed, provided we stick to that convention.

PROPOSED RESOLUTION: For CBOR values, use integer keys with a CDDL comment convention for the field name

Resolved: For CBOR values, use integer keys with a CDDL comment convention for the field name
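
[Illustrative sketch: the integer-key convention from the decoding side; the key numbers and the decodeCbor helper are hypothetical, and the field names would live only in the spec's CDDL comments.]

    declare function decodeCbor(bytes: Uint8Array): Map<number, unknown>;

    const REQUEST_ID = 0; // ; request-id (name recorded in a CDDL comment)
    const URLS = 1;       // ; urls

    function parseAvailabilityRequest(bytes: Uint8Array) {
      const fields = decodeCbor(bytes);
      return {
        requestId: fields.get(REQUEST_ID) as number,
        urls: fields.get(URLS) as string[],
      };
    }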

Peter: Moving on to timestamps and durations. CBOR has a timestamp type. Problem is that it is defined as a float, which does not strike me as good for a timestamp.
… I would rather go for a microsecond integer.

Eric: One problem with doing this is that you have to use rational numbers to accurately represent timestamps according to a clock. Depending on the timescale, there are numbers for which this won't work.

Peter: Most of the use cases are for easy time cases. Not the streaming one. So I guess my proposal is for the default.

Eric: The correct way to encode a rational number would be to have 2 ints: numerator and denominator.

Peter: For some stats, the precision does not matter very much. Another use case is for how long you're interested in something, and you don't need precision.
… The other one is for HTMLMediaElement. And we should be talking about that.
… Main recommendation is "don't use floats", but we can have rational in some cases.

PROPOSED RESOLUTION: For encoding timestamps/durations in CBOR, never use floats. Use microseconds expressed as uint, unless there's a reason to use a different timescale (such as with audio and video)

Resolved: For encoding timestamps/durations in CBOR, never use floats. Use microseconds expressed as uint, unless there's a reason to use a different timescale (such as with audio and video)
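
[Illustrative sketch: the two time encodings from the resolution, as types; the names are illustrative.]

    // Plain timestamps/durations: unsigned integer microseconds.
    type Micros = number; // e.g. 1_500_000 means 1.5 s

    // Media times: a rational, so any timescale is represented exactly.
    interface MediaTime {
      value: number; // numerator, in timescale units
      scale: number; // denominator, units per second (e.g. 90_000 for MPEG-TS)
    }

    const seekTarget: MediaTime = { value: 135_000, scale: 90_000 }; // exactly 1.5 s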

Peter: Moving on. CBOR has a way to embed structure definitions. You'll see that most messages have "request-id" for instance.

Louay: Is request ID always needed?

Peter: If you expect a response, then yes. If you don't, then I wouldn't call that a request in the first place.
… In CBOR, common error code and types can be specified, slightly similar to an enum but without the ampersand.
… Ranges can be specified as well.
… From there on, we'll be looking at lots of CDDL and we need to figure out what kind of resolutions we want to have.

MarkFoltz: How do the binary protocol that was drafted some time ago and these CDDL proposals relate to each other?

Peter: I modeled the CDDL proposals out of the specification.

MarkFoltz: The validation of these API specific messages is: "can we map the algorithm steps in the API to specific CBOR messages?"
… If we can complete the algorithms with the data we get back, then that's good.
… Beyond that, it's looking at how efficient the messages are.
… Here, I think we want feedback on the general shape of the messages.

Francois: Were there some specific issues that the exercise revealed?

Peter: We'll go through them in more detail.

MarkFoltz: The area that may benefit from F2F discussion is remote playback.
… The spec does not give specific items on how to remote specific playback commands, so we might want to spend more time on that.

Janina: If I may, that's where we have some historical interest from an accessibility perspective.
… Second screen is a very good way to get captioning in some cases, e.g. in classrooms where captions are required for some.
… I'm hoping to prototype a smart campus where this would be possible, using an amplification system.

MarkFoltz: Anssi, do we have accessibility use cases?

anssik: We went through the wide review

Francois: And we went through the Media Accessibility User Requirements document. That was a long time ago though, certainly worth revisiting.

anssik: Looking at the wide review results, the APA WG was good with the API two years ago when we did the review.

Francois: There are interesting scenarios to consider, e.g. related to synchronization between the primary and secondary devices. That's out of scope of the Working Group.

MarkFoltz: Media synchronization is one extension being considered.

Francois: Talking about extension points, can the CBOR messages be extended?

<anssik> Results of Presentation API evaluation against the Media Accessibility User Requirements spec

Peter: Yes, implementations will simply ignore the fields they don't understand, provided we maintain backwards compatibility.

MarkFoltz: Still the question of what can be extended with custom fields.
… Actually, it might be worth creating a registry for extensions.

Francois: What would the registry contain?

MarkFoltz: We want to avoid collisions. If they are extending individual messages with keys, we want to make sure the extensions don't collide. It might be through registering a range of integer keys.
… I'm more interested in the process.

Francois: Happy to investigate. I don't really expect any simple answer to be honest. The question of registries has been on the table for years.

Action: tidoust to investigate ways to create a registry for Open Screen Protocol extensions (W3C, IANA, etc.)

Peter: Looking at ping/pong/status messages, easiest structure. Request/Response ID, status. Perhaps a timestamp for synchronization.

MarkFoltz: We mentioned computing clock skew for synchronization, so that indeed might be useful for that. But that depends on frequency requirements as well.
… But QUIC might give you some of that info already.

Peter: Definitely.

Presentation API messages

Presentation API messages slides (46-51)

Peter: First part of it is URL availability
… First message is a request message with a list of URLs, and the response gives a list of URL availabilities that tell you whether it's compatible, not compatible, or not even a URL.
… Then, there is an event object to notify changes.
… The mechanism to register interest in change notifications is to add a timestamp to the request.
… Two ways to specify the event object: 1) refer to the initial request ID. But then the requester would need to remember it, and the change may only affect one particular URL.
… 2) list the URLs that changed and the availabilities
… The receiver has to remember the URLs.
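
[Illustrative sketch: the availability request/response/event shapes just described, as interfaces; all field names are illustrative, not the final CDDL.]

    type UrlAvailability = "compatible" | "not-compatible" | "invalid";

    interface AvailabilityRequest {
      requestId: number;
      urls: string[];
      watchDuration?: number; // the "timestamp" above: how long to watch for changes
    }

    interface AvailabilityResponse {
      requestId: number;
      urlAvailabilities: UrlAvailability[]; // same order as the request's urls
    }

    interface AvailabilityEvent { // option 2: lists the URLs that changed
      urls: string[];
      urlAvailabilities: UrlAvailability[];
    }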

Francois: I note a crappy implementation could leak to controller A the list of URLs requested by controller B in this scenario. That cannot happen with option 1).

Side discussion on the need to have an ID there to be able to associate the message with the right tab, especially as QUIC connections will be shared by different tabs.

In the Remote Playback API, watchAvailability is tied to a particular iframe. In that scenario, there is a need for a watch-id.

MarkFoltz: Interestingly, the spec uses a "long" id, not a uint, but I think that's OK.

Some discussion on whether the timestamp needs to be precise to the point where a rational number would be useful. That's a priori not the case here, it's more "Give me updates in the next 5 minutes"

Peter: Moving on to initiation, tied to calling "start".
… Some requirements for the Presentation ID to be at least 16 characters long, which seems a bit odd, but that's OK.

anssik: That's to make the ID hard to guess.

MarkFoltz: Right, we used the GUID generation algorithm there. We could revisit that in the future.

Peter: Then we need to pass headers. I don't remember why.

MarkFoltz: That has to do with language headers, typically. Passing on the Accept-Language HTTP header typically so that the receiver can use the right fonts, language settings and so on to render the page.
… Goal is to pass the locale parameters that the controller would have used if it was loading the presentation.

Discussion on the presence of a connection-id field and whether it's optional. The spec mandates the creation of a connection ID whenever "start" is called. There is no reason not to make it mandatory.
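
[Illustrative sketch: generating a presentation ID that meets the at-least-16-characters requirement and is hard to guess; the exact encoding is not mandated by the spec.]

    // 128 bits of randomness, hex-encoded to 32 characters.
    function generatePresentationId(): string {
      const bytes = crypto.getRandomValues(new Uint8Array(16));
      return Array.from(bytes)
        .map((b) => b.toString(16).padStart(2, "0"))
        .join("");
    }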

Peter: Looking at the response, you'll simply get the result.

MarkFoltz: We could include an HTTP response code that could be useful for debugging purpose.
… Initiation really happens in two steps. The first is the receiver gets your requests, starts loading the page and from the controller point of view, the connection is in a connecting state.
… And when the loading is done, then you tell the controller that you're connected and you can start sending messages.

Peter: So you're saying that we do not need to send the response as long as we're not connected.

MarkFoltz: Right, that's why it would be useful to interleave this with Presentation API algorithm steps.

Chris: Is that also useful for the user agent to explain why things failed?

MarkFoltz: Yes. We haven't really talked about debug info for now, but that could be a useful thing to look at.

Louay: If we need special codes for other schemes, how do we do it?

MarkFoltz: We talked before about extending some of this with vendor specific fields.
… That would be part of that, but wouldn't be specified here.

Chris: We talked a bit about capability negotiation over lunch.

Peter: That's out of scope for this for now. This is really tied to the current Presentation API.

MarkFoltz: For the Presentation API, a more elaborate set of APIs related to capabilities has been talked about. There's an open GitHub issue around it.

Peter: Moving on to Presentation termination.

Discussion on whether to use a combined list for error codes. Inclination is to use a combined list for now.

Peter: Interesting thing about termination is that it can happen without the user requesting it, so an event is needed as well.

Event is only for when there is no request. If the controller sends a request, it gets a response but does not get the event.

The event is "broadcast" in the sense that it's sent to multiple connections.

Some bikeshedding on names of error results.

Peter: Moving on to open/close connections.
… Question around whether the presentation ID and the connection ID need to be always passed together, or whether you can pass the connection ID without the presentation ID when it's not needed.

MarkFoltz: Receiver needs to keep a map of connection ID per controller mapping to presentation ID.

Peter: Similar to terminating, the receiver may close the connection without user input.

Design discussions around the error messages.

Peter: Moving on to Presentation connection messages. This one's sweet and short.
… CBOR allows a field to be defined as either bytes or text.

In the spec, binaryType is tied to the connection, not to the event.

But binaryType only determines the handling of binary, not the handling of text messages.

MarkFoltz: My only comment here is that I want to make sure that it's compatible with the streams spec.

Peter: This certainly is.

anssik: Sangwhan just joined IRC. Through side discussion with him, I want to report that the TAG is pleased with this group adopting CBOR.

Remote Playback API messages

Remote Playback API messages slides (52-58)

Mark: Remote playback availability gets more complicated. The question of "can I play this URL?" can be answered by the remote device
… It would have to fetch the remote url and analyse the content

<sangwhan> TAG official reaction on using standardized binary serialization: http://alexhowe.com/wp-content/uploads/2016/04/Duo-V2.jpg

Mark: or the controller could do some of that work and pass over the metadata

Mark: The Chrome implementation takes the MIME type or canPlayType() string and uses that to make decisions about remote playback ability
… We may want to see what we can learn from current availability checks

<anssik> [The Second Screen group humbly acknowledges TAG's official reaction.]

Mark: We'd welcome feedback on this

Peter: Requires more investigation on the use of the URL to determine availability

Mark: There's Media Capabilities, but this may be overkill here

Peter: Next, you want to start or stop playback. Much like start/stop for presentations
… You provide the URL and an initial set of controls
… Response includes an initial state
… To stop, you provide the ID
… We don't have reasons in the stop event, currently

Peter: The HTMLMediaElement has many controls you can use from JavaScript
… Source, preloading, play, pause, playback rate, seek, loop, volume, mute, width and height (for video), poster image
… Some of these are optional, others should be mandatory for receivers to implement

Francois: What about enabling / disabling tracks?

Peter: If you have tracks, that's streaming not remote playback?

Eric: The HTML media spec has a single video track and an array of 0 or more audio tracks
… The video track can be turned off and similarly the audio tracks can be enabled and disabled
… And text tracks, of course

Mark: How is the track list populated?

Eric: By the UA, to reflect the internal state of the media file

Francois: Can you add or remove tracks while it's playing?

Eric: No
… There's a spec bug. You can create a TextTrack via script, but there's no method to remove a TextTrack
… Native support for captions is bad enough that polyfills use that API to add their own cues, supporting formats other than WebVTT by converting them to WebVTT
… It's important for accessibility

Peter: Is anything else missing?

Eric: Something you may or may not want to support is fastSeek
… For when you don't care about the exact time, you show the approximate frame

Mark: How is it implemented?

Eric: Seek to the approximate time, nearest i-frame. Controls use it while using the scrub bar, then do a seek to the precise position
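
[Illustrative sketch: the scrubbing pattern Eric describes, assuming a range input bound to the video's timeline; fastSeek() is in the HTML spec but not implemented everywhere.]

    const player = document.querySelector("video")!;
    const scrubber = document.querySelector<HTMLInputElement>("#scrubber")!;

    // While dragging: cheap approximate seeks to the nearest keyframe.
    scrubber.addEventListener("input", () => player.fastSeek(Number(scrubber.value)));

    // On release: one precise seek to the exact position.
    scrubber.addEventListener("change", () => {
      player.currentTime = Number(scrubber.value);
    });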

Peter: There are two ways we can send these controls to the receiver
… One is to have lots of small messages. Changing lots of things at the same time won't be possible

Eric: The spec says there should be no observable change, so anything done in one iteration would be applied at the same time

Peter: That suggests we should have one big message instead
… [explains fields in remote-playback-controls]
… currentTime is where it gets interesting
… What type would you prefer to use? Is this where you think a rational makes more sense?

Eric: Yes, definitely

Peter: Is it OK that it's not time any more, it's two integers?

Eric: That'd be fine

Peter: We'll need to define 'rational' if it's not in CBOR already

Eric: Can we pass infinite there? When you get the duration, we need to be able to represent a live stream
… How would setting width and height be used? They're read-only

Peter: Let's remove those
… There's state on the receiver to convey to the controller
… These are the read only attributes in HTMLMediaElement
… The receiver can send an event with the new state. It can also be included in the first response to start remote playback
… State values include current source, network and ready state, how optional things are supported, paused and ended flags, initial time, current time, etc
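
[Illustrative sketch: the "one big message" shape under discussion, with a controls message carrying only the fields being changed and a state event mirroring the read-only attributes; all names are illustrative.]

    interface RemotePlaybackControls {
      playbackId: number;
      paused?: boolean;
      muted?: boolean;
      volume?: number;       // 0.0 to 1.0
      playbackRate?: number;
      loop?: boolean;
      seekTime?: { value: number; scale: number }; // rational media time
    }

    interface RemotePlaybackState {
      playbackId: number;
      readyState?: number;   // 0 (HAVE_NOTHING) .. 4 (HAVE_ENOUGH_DATA)
      paused?: boolean;
      ended?: boolean;
      currentTime?: { value: number; scale: number };
      duration?: { value: number; scale: number } | "unknown" | "+Infinity";
    }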

Eric: The media element spec says you have to generate a timeupdate event every 250 milliseconds
… initial time is for a live stream where you want to indicate a time relative to the time the stream started

Chris: corresponds to getStartDate?

Peter: yes

Chris: We send feedback on this long ago. Use case was generating interactive events in the browser against a long running live stream
… where you want to synchronize the events against the stream time position, rather than the time that the user joined the stream

Louay: In HLS it's called programme date time

Eric: For duration, we need NaN, unknown, mediatime, or +Infinity

Peter: We'll make that its own type

Peter: Is there anything missing that should be readable?
… We'll need to add track ids and related attributes

Eric: These are used locally to allow the user to select the tracks to play

Peter: For values that can be controlled, e.g. volume, do we need these to also be readable, as the remote side could set them?

Eric: Yes
… [discussion of defaultMuted and other attributes that could change]

Peter: I'll take another pass at this to add the controls and readonly values
… How does the receiver indicate to the controller which optional things are supported?
… I added a list of bools for rate, preload, poster-image

Mark: Do we want to report supported for other things, such as playback rate?

Eric: What would this mean for applications that use custom controls?

Mark: It could cause a problem from a UI consistency point of view

Eric: I think that's fine, short of adding a bunch of stuff to the media element spec, which may not make sense

Peter: I think the enabling of tracks follows the state setter model we have. Adding and removing cues is more tricky

Peter: Are there other events we've missed, other than track-related ones?

<anssik> anssik: we move "HbbTV Support in Open Screen Protocol" to Day 2 morning in the interest of time

<anssik> [up-to-date agenda at https://www.w3.org/wiki/Second_Screen/Meetings/October_2018_F2F#Agenda]

Eric: readyState change
… volume change, timeupdate
… so long as you push state when anything changes, you're fine
… There's an abort event, but you can infer that by going into the error state
… There's not a specific event for network or ready state change, but different events fire as you go between them
… There's a suspend event, which fires when network loading stops, when the state goes to idle

Peter: What happens if you add a cue or remove a cue, does it splice things?

Eric: It's either in the list or not
… Do you have a seeking flag? The UA fires seeking and seeked events
… currentTime is not supposed to change until the seek completes
… So you're not trying to tackle MSE?

Peter: No

Eric: The UA is supposed to fire a progress event every 250 milliseconds while data loads. If no data has loaded for more than 3 seconds, it fires a stalled event

Peter: The readyState doesn't change?

Eric: No, it can be at any readyState after HAVE_METADATA
… Adding a stalled flag will do.
… once it starts up again, it will fire progress events again

Peter: pauseOnExit?

Eric: Getting this right is going to be a lot of work

Eric: Audio, Video, Text tracks are not implemented in Chrome, so you may want these to be optional
… although TextTracks should be mandatory as this is an accessibility feature

Mark: Many of these are optional support, as implementations don't have them all
… We should prioritise the accessibility related ones

Peter: [discussion of uniqueness of track ids across audio and video tracks]

Eric: The id is a reflection of the id in the media file. Not all formats have ids, e.g., MP3 streams
… There can be only one video track, but multiple audio or text tracks. But the video file itself could have more than one track

Peter: Are we ok with representing this as a list of ids, so when you disable a track you remove it from the list?

Eric: So when you read it, you infer the state of the others. That's fine.

Peter: [discussion of track fields]
… [list of active cues and mode]

Mark: Is there a way to send just the information needed to render the captions?

Eric: That would be hard. It makes sense for it to be done on the receiver. In the case of a VTT file, you want to load it on the remote side
… You don't have the data on the controller side
… Doesn't make sense to load the data and send the cues over one at a time
… There are three types of text tracks: in-band (where you need to reflect it back to the controller), out-of-band (which is loaded by the engine and specified by markup), and a track made by script
… All three of those types show up in getTextTracks()
… The user agent is responsible for rendering cues based on time and enable/disabled state of the track

Francois: When you start the remote playback, you'll need to send the URL of the text tracks

Eric: Yes. It's something we do for Apple TV. We don't support addTrack() or modifying the DOM. But you do need to send over the list of URLs for the tracks
… That list is in the markup, there can be <track> child elements of the video

Louay: Also multiple video URLs for multiple sources?

Eric: The video element doesn't support that, it's only for MSE?

Louay: I mean the src attribute for multiple encodings

Eric: Yes, you're right
… I think it makes sense to send the list of sources

Eric: A source element has a URL and an optional (extended) mime type that specifies codec info. Also a media query
… The rules say you look to the mime type, skip if not supported, then look at the media query which can give screen resolution
… To help the UA pick the most appropriate one
… Use of media query is fairly rare, mostly used for file formats supported on different platforms (e.g., codec support)
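
[Illustrative sketch: the source selection rules Eric outlines, approximated with the real canPlayType() and matchMedia() APIs.]

    // Skip a <source> if its MIME type is unsupported, then if its media
    // query doesn't match; the first acceptable source wins.
    function pickSource(video: HTMLVideoElement): HTMLSourceElement | null {
      for (const source of video.querySelectorAll("source")) {
        if (source.type && video.canPlayType(source.type) === "") continue;
        if (source.media && !matchMedia(source.media).matches) continue;
        return source;
      }
      return null;
    }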

Mark: This kind of functionality is being replaced somewhat by Media Capabilities API

Eric: It does and it doesn't. The nice thing about this is that you can do it with static markup, you don't need script

Louay: I think for text track cues, you should support language

Eric: Yes, you have to

Peter: Also missing are text track state - cues and active cues. And we don't have a way to distinguish showing and hidden
… Hidden means you're not showing it, but you should keep loading it

Eric: Yes

[adjourned]

Summary of action items

  1. Anssi to talk to Intel's w-p-t and QA people to help improve test coverage of Remote Playback API with focus on functional testing
  2. tidoust to check with IANA whether reserving a range of CBOR tags is possible and suitable for our purpose
  3. tidoust to investigate ways to create a registry for Open Screen Protocol extensions (W3C, IANA, etc.)

Summary of resolutions

  1. The service name will be _openscreen._udp.local
  2. Prefer option B (bare minimum; rest over QUIC) with possibly small amount of data to distinguish multiple new devices
  3. We will add a version field to the TXT record to indicate the open screen protocol version used
  4. Add a timestamp to the TXT record to allow detection of changes to the device info conveyed over QUIC
  5. Add a fingerprint to the TXT record to function as an identifier that can be authenticated
  6. Design the transport protocol so that the browser does not need to create more than one QUIC connection, and so that it may create multiple QUIC connections (e.g. for privacy or congestion control reasons).
  7. Keep a QUIC connection alive if you're a client that needs to receive messages or a server that needs to send them. Otherwise, close the QUIC connection and reconnect when you need to (treat QUIC connections as ephemeral)
  8. For the keep-alive mechanism, add a ping/status message to the Open Screen Protocol.
  9. For the keep-alive mechanism, add a SHOULD for implementation with a 25s delay (and further investigate a unidirectional keep-alive mechanism)
  10. For QUIC stream messages, 1 group of ordered messages = 1 unidirectional QUIC stream. Reserve the 1-10 stream IDs for future usage, and use ephemeral QUIC stream IDs
  11. For CBOR messaging in QUIC streams, don't use CBOR "rest of stream" feature. Seek further implementation feedback to evaluate complexity of stream parsing before we take a decision on mechanism to put multiple CBOR messages in one QUIC stream.
  12. Seek further implementation feedback to assess which of 1) tagging CBOR message with built-in tagging or 2) using type-prefixing would be better. Check with IANA whether a range of values can be reserved.
  13. For CBOR values, use integer keys with a CDDL comment convention for the field name
  14. For encoding timestamps/durations in CBOR, never use floats. Use microseconds expressed as uint, unless there's a reason to use a different timescale (such as with audio and video)
Minutes manually created (not a transcript), formatted by Bert Bos's scribe.perl version 2.49 (2018/09/19 15:29:32), a reimplementation of David Booth's scribe.perl. See CVS log.