WebID, a federated SignIn API

Facilitator: Majid Valipour

Understand the current state of federated sign-in on the web and brainstorm ideas on how to make it more privacy-preserving. Present the current thinking and ideas in the WebID proposal and brainstorm solutions.

Minutes (including discussions that were not audio-recorded)


Transcript

Welcome to the WebID Breakout Session.

My name's Ken.

Majid and Sam are also on this call.

We are at Google and we're engineers on the Chrome Team, and we've been working on some web platform identity projects for a little while.

And in particular this project called WebID which you can read about here.

It's been moved into the WICG, where we have a repo.

Lots of information's there, a lot more than I'm gonna be able to present in this.

This is more of a high level overview.

So we welcome feedback and discussion.

And one note about the name which has come up is that there actually is already a WebID within the W3C.

So there's a bit of a naming conflict: that one refers to an identity data structure and a protocol.

Whereas we're talking about trying to define a new API, and exactly what to do about that name is TBD.

So this session we're discussing is...

First, we're gonna go over the problem, and exactly why this project was spun up.

It has to do with federated identity, and particularly with the privacy changes that are happening on the web.

We're gonna go over that to try to get a shape of the problem, and to kinda get us toward the right mindsets to think about how to solve it.

We're going to give the framework we're using to think about how to solve it, because we don't have a specific solution in mind.

We've done a lot of work trying to identify trade-offs, and tried to bucket the general approaches we can use and, in large part, figure out the right questions to ask so that we can go forward.

And then there's a little bit at the end talking about where we're going from here, like the challenges and the community engagement that we're looking forward to.

Okay, so federated identity on the web.

We think this is a really good thing to exist.

It makes sign in for accounts on most websites much easier and more secure and better for users.

I think most people here understand how that works or have seen this on the web in some way.

Typically this website, which we would call a relying party, which is the site that you're gonna create an account on or sign into, might offer you a way to enter your email address or some other username and then a password to create an account, or you can sign in with Google, sign in with Facebook, sign in with Apple or others, which are some of the larger identity providers that exist on the web.

Identity provider is the term that we use for those services that can provide identity across origins to the relying parties. So I'll be using those two terms a lot, relying party or RP, and identity provider or IDP, throughout this.

Now this is good on the web, as I mentioned.

The IDPs can basically take on a lot of the security responsibility and invest a lot in it.

Account protection and abuse mitigation, for example, which offloads that responsibility from relying parties, and it also just makes it easy for users because you don't have to have a separate password for every site, which causes all sorts of problems.

So the problem is that these protocols that enable federated identity to work are built on top of the web.

They're not a part of the web per se; they exist independently of the web, I should say. If you think of OAuth or OpenID Connect and some other standards, there are ways that they are integrated into the web to make these flows work.

But essentially they just rely on the primitives that the web offers.

And at its heart federated identity relies on different origins being able to exchange identifying information, which becomes a problem when we're trying to change the web or change some of the behaviors of these primitives to improve privacy.

One example is that all the browsers, or all the major browsers I should say, have made efforts to limit third-party cookie access.

So when an iframe is loaded within another web page, it doesn't necessarily have access to the same cookies it had when you loaded it as a top-level frame.

This is problematic for some of these cases of federated identity.

In this case, you see a personalized sign-in button at the bottom, which is an iframe that's loaded to see who you are signed in as with the identity provider and to offer you an option to sign in that way, and that becomes a problem.
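
The effect on an embedded identity provider frame can be sketched roughly as follows. This is a deliberately simplified model and not any browser's actual policy; the function name `cookies_attached` and the site strings are made up for illustration.

```python
def cookies_attached(frame_site: str, top_level_site: str,
                     block_third_party: bool) -> bool:
    """Return True if a request from the frame would carry the frame's own cookies.

    Simplified assumption: a frame is "third-party" whenever its site differs
    from the site shown in the address bar.
    """
    is_third_party = frame_site != top_level_site
    return not (block_third_party and is_third_party)


# The IDP loaded as a top-level page.
print(cookies_attached("idp.example", "idp.example", block_third_party=True))
# The same IDP embedded as an iframe on a relying party page.
print(cookies_attached("idp.example", "rp.example", block_third_party=True))
```

Under this model the personalized button has no way to learn which IDP account is signed in once third-party cookies are blocked, because the embedded frame no longer receives its session cookie.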

There are also flows like sign-in reauthentication that can use third-party cookies, and identity providers need to figure out a way to deal with that however best they can as those cookies get locked down.

A larger looming problem is that, as cookies become less useful for trackers, another tracking vector is essentially just passing identifying information across top-level navigations, where all sites have access to their first-party cookies.

We've seen this called navigational or bounce tracking.

And as far as I know, there's not a consensus on exactly what to do about this yet, but this is technically indistinguishable from how most federated sign-ins work on the web.

So anything we do to mitigate this, or to give users more control over this kind of tracking, will negatively impact federated sign-in, and we could possibly make it unusable or unpalatable for relying parties or users, which would decrease its usage and, in the worst case, force us back into simply using usernames and passwords on every site on the web, creating all the attendant problems from that.

So we have a name for this, we call it the classification problem which is to say, we want information to be able to pass between sites, but we want it to be able to pass for specific purposes.

And it's hard for the user agent to be able to classify what data is for what purpose.

So we might say that users may want to allow this identifying information to pass between identity providers and relying parties for the sake of signing in, but they may choose not to allow it for other kinds of uses.

And a note about the graphics here throughout: these are kind of simple mocks that we drew up.

We use colors to try to distinguish who's drawing what.

So for instance on this page, the browser chrome is at the top in green, the relying party page, which is the site that you're signing into, is gray, and you can see iframes from the IDPs in blue, light blue and orange there.

So we'll be using conventions like that, because those are important as we try to talk about the user flows as we go forward.

Okay, so relying party consequences of web identity.

So, up until now, we've been talking about the problem that if we try to put web primitives more under user control and make them more private or privacy-enabling, then basically we have to adapt federated identity to work with that.

But there are also some problems specific to how identity works, and that has its own set of privacy consequences.

One of them is simply the act of signing in which is not just federated identity but all identity.

When you provide an email address to a website, whether from an identity provider, from autofill, or by typing it in manually, you give the RP global knowledge of who you are, which can be correlated with other relying parties' knowledge of who you are and generally with every place that your email address appears on the internet.

An issue with that is that it's completely beyond user control: those can be used to effectively do database joins and create a tracking profile of that user. This is something we also like to communicate, because we see this as one of the easier ways to create tracking profiles when some of the other ways become harder.
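
To make the "database join" concrete, here is a minimal sketch. The two account tables, their field names, and the `join_on_email` helper are all hypothetical and only illustrate how a shared global identifier lets two unrelated sites merge what they know.

```python
# Hypothetical account tables held by two unrelated relying parties.
news_site_accounts = [
    {"email": "alice@example.com", "reads": "politics"},
    {"email": "bob@example.com", "reads": "sports"},
]
shop_site_accounts = [
    {"email": "alice@example.com", "bought": "running shoes"},
    {"email": "carol@example.com", "bought": "headphones"},
]


def join_on_email(left, right):
    """Merge records from two sites that share the same email address."""
    right_by_email = {row["email"]: row for row in right}
    return [
        {**row, **right_by_email[row["email"]]}
        for row in left
        if row["email"] in right_by_email
    ]


# Only users present on both sites end up in the combined profile list.
profiles = join_on_email(news_site_accounts, shop_site_accounts)
print(profiles)
```

Nothing in this join involves the browser or the user, which is why it is outside user control once the same email address has been handed to both sites.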

And on the flip side of that, something inherent to federated sign-in as it works today is that the identity provider can create a profile, simply because they know every site where you do a federated sign-in or any kind of authorization.

And this has some advantages for the user.

You can manage the permissions that you're granting to your account or your information, but if a user would want to use federated sign-in and not have this risk, then they don't really have control of it, because it's inherent to how it works.

Okay, so talking about what we're trying to actually cover in this, there are a few things in this space that we've been asked about a few times but that we're not really thinking about right now. One thing is that identity providers have the ability to impersonate users using federated sign-in, and that's hard to solve.

It's not an immediate goal, but it would be something that would be nice if we could do.

We're not too focused on the issue of cross-device sign-in state. It would also be nice if you could carry sign-ins and sign-in sessions' credentials across devices or across browsing sessions in some way, but we're not trying to tackle that yet.

And another one for federated identity, and I don't know how commonly this term is known, is often called the NASCAR flag problem, which is going back a bit.

You see that there's a sign-in with A and a sign-in with B on the relying party.

Hypothetically, if there are six widely used identity providers, then you could have six buttons there.

Just to sign in, which is like the decals on a NASCAR car, and that's not appealing or user-friendly.

So sites tend to want to limit how many identity providers they support, which is not great for the ecosystem, and it would be better if we could provide a more user-friendly way for users to select which IDP they wanna use.

But that's also not what we're trying to tackle at this point.

A big one which we are thinking about but don't have solutions for yet is enterprise use cases, because what we're talking about with federated sign-in with identity providers on the web is really just one slice of this giant identity ecosystem.

There are many, many other ways it's used: in enterprises, in education, in institutional usage, in government.

They have different privacy requirements, so they may not care as much about the goals that we're trying to achieve, and to be honest, we're not even sure we know all the use cases yet, because they're hard to identify in many cases.

So we're trying to engage and get better answers on this but right now we're not discussing exactly how that's gonna go.

And so we move on to, here's what we are thinking.

Yeah, this is important.

We don't really have answers yet.

I kind of wanna keep stressing this: we're not trying to push anything through. This discussion has mostly been within Google.

We've been reaching out to other browsers, having some external discussions, we've been reaching out to the identity community but there are more questions than answers and we are still trying to shape it.

And in part that's why we don't have a firm proposal yet. What we have is more of a framework for how we're thinking about solving it as we try to get more information.

And this is the reason it's hard because the ecosystem is really complicated.

As I mentioned, we don't necessarily know all the use cases that should be covered by this.

There are really difficult trade-offs between usability, privacy properties, and offering developer control of the user experience. And because OpenID Connect has been rolling out for something like a decade and its usage has increased, there's a question of how easily we can adapt the web to a new API and what the timeline of doing that would be.

So all of these questions are sort of weighing into where we're going.

One thing that's a little bit separate there that's orthogonal to some of the other stuff we're gonna talk about is this notion of directed identifiers which tries to solve the problem I talked about when I said RPs can do a database join and correlate all the email addresses that they have to build a tracking profile.

It's something we would like to build in, and at least the notion exists in existing standards, although it's only used in fairly limited contexts as far as I know.

You have a global identifier such as your email address, and a directed identifier is essentially an identifier derived from it.

A directed identifier is something that can't be correlated and that effectively, every time you sign into a new relying party, they would get a different picture of who you are.

And then there's a one-way mapping from your global identity to each directed identifier.
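
One way to picture such a one-way mapping is a keyed hash over the global identifier and the relying party's origin. The sketch below is an assumption made for illustration, not a scheme described in the talk or in any standard; `directed_identifier`, the secret, and the origins are hypothetical names.

```python
import hashlib
import hmac


def directed_identifier(idp_secret: bytes, global_id: str, rp_origin: str) -> str:
    """Derive a per-relying-party identifier from a global identifier.

    HMAC-SHA256 is one-way: a relying party cannot recover the global
    identifier from the result without the IDP's secret.
    """
    message = f"{global_id}|{rp_origin}".encode("utf-8")
    return hmac.new(idp_secret, message, hashlib.sha256).hexdigest()


secret = b"hypothetical-idp-secret"  # assumed to be known only to the IDP
id_for_news = directed_identifier(secret, "alice@example.com", "https://news.example")
id_for_shop = directed_identifier(secret, "alice@example.com", "https://shop.example")

# The same user looks different to each relying party, but the same
# relying party sees a stable identifier on every sign-in.
print(id_for_news != id_for_shop)
```

Because the two relying parties hold unrelated values, the join on a shared identifier that a global email address enables is no longer possible.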

Sorry, was there a...

Okay, I just noticed there was a chat.

Not sure I understand.

Well, maybe we can talk after if that's okay.

Great.

Sorry for the interruption.

Right, so we would like this to be part of the solution, with identity providers offering the capability for directed identifiers to exist.

And we do see an example of this: Apple has been using it as part of their new federated sign-in system, in what they call their Hide My Email feature.

But we would like to have a standardized way of doing that on the web, and one of the things we're thinking about is this notion of verifiably directed identifiers, which allows the browser to positively assert that an identifier is directed, that it's not global and can't be correlated. That can be done using hashing.

So that's a little bit orthogonal, and it has some trade-offs built into it as well.
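
The talk does not spell out how hashing would make an identifier verifiable, so the following is only one possible reading, sketched under stated assumptions: the browser is given the inputs (a hypothetical per-user opaque value from the IDP, plus the relying party origin it already knows) and recomputes the hash itself. `derive` and `browser_verifies` are made-up names.

```python
import hashlib
import hmac


def derive(user_value: bytes, rp_origin: str) -> str:
    """Hash a per-user opaque value together with the relying party origin."""
    return hashlib.sha256(user_value + b"|" + rp_origin.encode("utf-8")).hexdigest()


def browser_verifies(identifier: str, user_value: bytes, rp_origin: str) -> bool:
    """Check that an identifier is the directed one for this relying party.

    The browser recomputes the hash from inputs it holds, so a global
    identifier (or one directed at another site) fails the check.
    """
    expected = derive(user_value, rp_origin)
    return hmac.compare_digest(identifier, expected)


user_value = b"hypothetical-opaque-value-from-idp"
good = derive(user_value, "https://news.example")

print(browser_verifies(good, user_value, "https://news.example"))
print(browser_verifies("alice@example.com", user_value, "https://news.example"))
```

The design point is that the browser does not have to trust a policy statement from the IDP: anything that does not match the recomputed hash is treated as potentially global.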

And so, sorry, that's orthogonal to the other things we're gonna talk about, which are the high-level approaches I'm moving on to now.

As we sort of brainstormed all the different ways that we could do this, they sort of naturally fell into sort of three buckets of like, how do we actually think about this working this API?

What does it actually do?

So the first and simplest is that we can just think of it as like a new permission on the web.

So we can say, well, do you give permission to the IDP to give identifying information to the relying party, and please understand that there are some tracking risks associated with it.

In each of these variations, there are questions underneath them that could affect how it's shaped, but at a high level we can think of it as maybe this is what that API simply needs to do.

It simply needs to say, we're going to exchange some information back and forth.

There are some tracking risks, so browser or user agent, please make sure that the user is aware of them and is okay with that.

A second approach is a little bit more invasive, in that the browser would actually mediate the protocols and take some of the identity provider's responsibilities into its own UI, which can provide some more user-friendly flows because it can condense permission prompts and make things clearer, but also offers a lot less developer flexibility.

And a third one is a little further out, which is to say, well, this is hard to deploy, so let's make it really hard to deploy and not worry about backwards compatibility at all.

We can sort of rethink how Federation works and come up with something better which would be kind of redefining the role of the identity provider in the federated ecosystem.

So in a way that would be creating something new that eventually would need to replace what happens today on the web.

And as with all of them, there are significant trade-offs with it.

So this is an example of the permission-oriented approach.

If we think about it as a permission, this is what the first time you try to sign in to a relying party with a given identity provider might look like.

The green section is browser UI, so you can see at the bottom of the first screen, and then in the last one, that it's rendering prompts.

Now that's not ideal to have two prompts like that.

There are some questions of, well, can we get that down to one? It's somewhat difficult, and it sort of depends on some of the privacy trade-offs that we're willing to work with.

And I should emphasize that when you return to a relying party and sign in again, those wouldn't necessarily be there, because the browser is aware that you've already granted permission and things can be streamlined.

So this is generally the sign-up experience or if you're signing in with a new browser or with a fresh profile on your browser, that you'd have to see those.
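
The "prompt the first time, streamline afterwards" behaviour can be sketched as a small per-profile store keyed by the relying party and identity provider pair. This is an illustrative assumption about how a browser might remember the grant; `PermissionStore` and its methods are hypothetical names, not part of any proposal.

```python
class PermissionStore:
    """Remembers which (relying party, identity provider) pairs the user approved."""

    def __init__(self):
        # A fresh browser profile starts with no grants at all.
        self._granted = set()

    def needs_prompt(self, rp_origin: str, idp_origin: str) -> bool:
        """Return True if the browser should show a permission prompt."""
        return (rp_origin, idp_origin) not in self._granted

    def grant(self, rp_origin: str, idp_origin: str) -> None:
        """Record that the user accepted the prompt for this pair."""
        self._granted.add((rp_origin, idp_origin))


profile = PermissionStore()
first_visit = profile.needs_prompt("https://news.example", "https://idp.example")
profile.grant("https://news.example", "https://idp.example")
return_visit = profile.needs_prompt("https://news.example", "https://idp.example")
print(first_visit, return_visit)
```

A new `PermissionStore()` models a new browser or a fresh profile, which is why the prompts reappear in that case.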

In the middle here, you still have the identity provider screen.

And here we have the identity provider offering to say, well, do you want to provide your real email address, or just a fake or proxy email address that would forward to your real email address, which goes back to the directed identifiers that I already talked about.

And you can imagine that if the user selects that they just want a proxy email address, then only a directed identifier would be provided, and that reduces the tracking risk and could potentially remove the requirement for another browser permission prompt on the third frame there.

So mediation is the second approach, in which, as I said, the browser becomes a deeper part of this flow.

And here you can see in this example that, given certain requirements are met, which is to say I'm signed in with that IDP with an active account and a valid session, when I click the sign-in button the browser can simply show me its own dialogue, and I don't necessarily have to interact with the identity provider at all.

Now, in some cases you would have to: if the provider needs you to sign in again, or re-auth, then you would still need a redirect.

But this would require the browser to know a lot more about, have a deeper concept of what's happening.

It would have to know about your accounts, and the question is, if you had multiple accounts, would it offer an account chooser dialogue, for instance, because you have multiple emails to sign in from.

And also, it would have to show the screen again if I chose to sign in with a different account from the one I previously signed in with.

(coughs) Excuse me.

So neither of those addresses the tracking capabilities that federated sign-in grants to an IDP, which I mentioned earlier.

Simply the fact that you're using an IDP for sign-in tells the IDP all the sites that you're signing into, which constitutes user information and, to some extent, a profile of the user.

The next step is to think about if we wanted to mitigate that, how would we do it?

And that's the delegation-based approach.

I said that it essentially redefines the role of the IDP in the ecosystem, and what happens is that the browser becomes an essential component.

It's essentially become part of the identity provider.

The identity provider delegates the ability to mint identity tokens to the browser so therefore the browser can issue them without the identity provider knowing who they're being issued to.

Some variants of this have been, this is not entirely novel, some variants of this have been tried and implemented before, but we're trying to match this with some of our other ideas such as directed identifiers and also trying to cover some cases like backup.

I'm not gonna fully go through this right now, but this messy diagram sort of illustrates how we envision the protocol working.

We've been trying to hash this out and modify it for a while now.

But the point is more that we're demonstrating that it can be done, it can work.

If we throw out requirements for backward compatibility, then we have this system by which the user agent can be granted a certificate to issue directed identifiers, such as directed email addresses, to prevent the joining problem and correlation by the relying party, and also hide from the identity provider who we're issuing these things to.

Now, server-side relying party backwards compatibility is something we think about quite a bit.

We think of it sort of as a bright line.

It's a lot easier to deploy if we don't require people to modify their current OpenID Connect servers or OAuth servers, if we don't require any code changes to them.

So basically, that's if we can in some way tunnel the existing protocols and hide from the endpoints that something has been in the middle.

This flow in particular doesn't need that at all.

So one other thing I'd like to mention is that we've talked about sign-in a lot.

We're also thinking about authorization, which is not a sign-in flow. In my head, I think of it as OAuth: all the other things it does besides being a layer that OpenID Connect is built on.

So OAuth is used for many things. People use it for Twitter, for Google services, for Facebook all the time, and that's effectively granting capabilities other than simply being able to sign into a website.

It's a lot harder for the browser to intermediate that, because we simply can't anticipate all the use cases.

So we think it sort of degrades into a permission-based flow.

Although there's still questions about how well we can do with that.

Going forward, there are a lot of challenges and open questions.

One of them is directed identifiers.

We think that they are very good and useful: they provide users a lot more control and they are very privacy-enabling.

Relying parties are mostly fairly negative on them. They like to have users' real email addresses, and they may or may not be intending to join them and create tracking profiles after the fact. There are a few reasons for that, and one, which we have written here, is customer support.

If I go to a newspaper and sign in using federated identity, and then I have a problem with my account or I want to contact the newspaper and tell them who I am, I can't actually do that, because they have a very specific directed notion of me. They don't know my full name unless I've told them outside of the flow, which defeats the point of directed identifiers, and they don't even know my real email address.

So there need to be ways to work around that, and it is going to carry some costs.

We have technical questions like how easy is it to enforce directed identifiers?

Right now, a lot of these identity flows happen through server-to-server communication, so effectively when you do this action through the web, the token gets passed from the IDP to the RP, then the RP contacts an API on the IDP's service to actually get the identity token, and the browser doesn't even see it.
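
The shape of that flow can be sketched with a toy in-memory model. In OAuth terms it corresponds to the authorization code flow, where the value that passes through the browser is a short-lived opaque code and the identity token is fetched on a back channel. The class, method, and variable names below are hypothetical and no real protocol messages are modelled.

```python
import secrets


class IdentityProvider:
    """Toy IDP that hands out one-time codes and redeems them for identity data."""

    def __init__(self):
        self._codes = {}

    def authorize(self, user_email: str) -> str:
        """Front channel: return an opaque code that travels via the browser."""
        code = secrets.token_urlsafe(16)
        self._codes[code] = user_email
        return code

    def exchange(self, code: str) -> dict:
        """Back channel: the RP's server redeems the code directly with the IDP."""
        email = self._codes.pop(code)  # each code can be redeemed only once
        return {"sub": email}


browser_observed = []  # everything the user agent gets to see

idp = IdentityProvider()
code = idp.authorize("alice@example.com")
browser_observed.append(code)      # the redirect carries only the code
id_token = idp.exchange(code)      # server to server, never via the browser

print(browser_observed)
print(id_token)
```

Since the identifier only ever appears in the back-channel response, a browser in this flow has nothing to inspect when deciding whether the identifier the relying party received was directed or global.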

So if we really think we need to enforce directed identifiers in a technical way, we can't allow that to happen, or we would have to assume that the identifiers are not directed if it does happen.

And then there's the question of how useful it is if we simply require, as a policy, that these identifiers are directed, and don't require that the user agent be able to verify it. That seems less good, but the question is how much, and this is part of the trade-offs that we're discussing.

And then I mentioned, I had mentioned earlier though, the issues with enterprise which we're giving some thought to as well.

And we welcome feedback and engagement on this.

We're trying to reach out to stakeholders: the relying parties and identity providers, both public ones and enterprise-focused ones, or companies that provide identity services.

We're talking to other web browsers, or browser implementers, about how they're thinking about these problems and how they're prioritizing them, 'cause everybody's thinking about privacy to some extent now, and the identity ecosystem is huge.

It's been around a long time. I think OpenID Connect has been in deployment for about 10 years, but the ecosystem is even much older than that really.

And yeah, feedback is welcomed.

So there's again a link to the WICG repository there.

And that concludes the presentation.

Thank you.
