
Secure Data Store Group

Facilitators: Kaliya Young, Dmitri Zagidulin, Tobias Looker

Update on the work of the Secure Data Store group that is jointly convened with the W3C Credentials Community Group and the Decentralized Identity Foundation.



Transcript

Yeah, hello everyone.

Welcome to the Secure Data Storage Working Group, possibly soon to be renamed the Confidential Data Storage Working Group, breakout session at TPAC.

So, we wanna do a brief introduction of the work that we're doing and answer any questions and invite you to collaborate, to come join us.

The link to the session slides is on IRC and in Zoom Chat, I'm also screen sharing.

Feel free to ask questions as we go along also.

Usual mechanism, q+ in IRC, or if you can't get to IRC, say something in Zoom Chat.

So I'm here co-presenting it with co-chairs and co-editors of the working group.

We have here on the call, Kaliya and- Hi, everyone.

And Orie. Hello.

And please, Kaliya and Orie, jump in and feel free to add comments as we go.

Dmitri, just one small thing, excuse me, to interrupt you.

Not at all.

But just, we have to be precise here.

When you are talking about Data Storage Working Group, this is not a W3C Working Group.

That's right, so- We have to be careful about that.

Got it, okay.

So Confidential Data Store Group, which is a joint group of W3C Credentials Community Group, and Decentralized Identity Foundation.

So it is a working group chartered under DIF, but it is with a W3C involvement.

The IPR policy is compatible.

There's been titanic amounts of work done between W3C leadership and DIF leadership to align the IPR policies, yeah, to bring the two communities together.

We have the usual weekly calls; in order to join, sign the IPR Agreement.

And we have a working draft specification under decentralized-identity/secure-data-store.

It is...

We'll get into the layered nature of the spec shortly, but in general, it's a coming together of multiple communities focused on Confidential Data Storage.

The W3C CCG Encrypted Data Vaults work, the DIF Identity Hub work, as well as a number of other smaller communities such as Hyperledger Aries, W3C Solid Community Group, as well as other similar groups working on matters of encrypted and Confidential Data Storage.

The group is chartered to create a low-level foundational layer for secure data storage in general.

So, very relevant to the W3C and to the TPAC audience in that it is focusing specifically on an HTTP API and encryption for not only data in transit, which is covered by TLS and so on, but also data at rest; it covers the protocol and the data model for them.

The problem that we're solving is full client-side encryption.

Everybody's familiar with the current state of cloud storage providers.

There's a number of very popular services like Dropbox, Google Drive, Amazon's AWS S3, any number of cloud storage providers.

The challenge with the current state of affairs in cloud storage is one, lack of standardization.

Everybody has a completely different, incompatible, both storage and authorization protocols and data models.

But the other underlying problem is that the cloud storage provider has the ability to access and view stored data.

The various companies have varying levels of policy and guarantees.

But at the end of the day, the technology is such that the data is server-side encrypted, and the company holds the encryption keys.

What we're attempting to do is to provide a full suite of protocol and data model to make sure that the data is encrypted before it hits the cloud storage layer.
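The shape of that flow, encrypt on the client and hand only opaque bytes to the provider, can be sketched as below. This is an illustrative toy, not the actual EDV protocol: the XOR keystream stands in for a real AEAD cipher such as XChaCha20-Poly1305, and every function and field name here is invented.

```python
import hashlib
import json
import secrets

def toy_cipher(key: bytes, data: bytes) -> bytes:
    # Stand-in for a real AEAD cipher: XOR with a key-derived keystream.
    # XOR is symmetric, so the same call encrypts and decrypts.
    # Do NOT use this in production.
    keystream = b""
    counter = 0
    while len(keystream) < len(data):
        keystream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, keystream))

def store_document(doc: dict, vault: dict, key: bytes) -> str:
    # Encrypt on the client, *before* the bytes reach any storage provider.
    ciphertext = toy_cipher(key, json.dumps(doc).encode())
    doc_id = secrets.token_hex(16)
    vault[doc_id] = ciphertext  # the provider only ever holds ciphertext
    return doc_id

# The "provider" (a plain dict here) never sees plaintext.
vault, key = {}, secrets.token_bytes(32)
doc_id = store_document({"note": "hello"}, vault, key)
assert b"hello" not in vault[doc_id]
assert json.loads(toy_cipher(key, vault[doc_id])) == {"note": "hello"}
```

The point of the sketch is only the ordering: encryption happens before any storage adapter takes over, so the provider is reduced to holding opaque bytes.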

So we're trying to approach this in a separation-of-concerns, layered manner.

We have a low-level encrypted storage primitive.

That's currently titled Encrypted Data Vaults.

We have a number of richer primitives and APIs on top of that; the current working name for that is Identity Hub.

And then obviously the usual engagement of outreach to similar groups across different standards bodies.

So we've had presentations and outreach to like-minded global movements: the IETF, GNAP, which is the Grant Negotiation and Authorization Protocol, various IPFS and ledger-based groups, and so on and so forth.

So many different, similar communities.

Okay so a couple of bullet points that I won't read out about the specific limitations of the charter.

We do wanna point out that as much as possible, we tried to tightly constrain the scope to the HTTP API and a couple of crucial data models.

So the idea is we're not gonna be designing any new cryptography.

We're not inventing any new authentication or authorization mechanisms.

We're not focusing on new querying mechanisms.

We're not developing any blockchains, or databases, or anything like that.

Though the group is interested in non-HTTP APIs for constrained devices, for Bluetooth, for web of things.

The group is explicitly focused on an HTTP API first.

So I wanna say a couple more words on the layered architecture approach to this then hand it over to the other chairs and editors and answer questions.

So, the way to read this diagram here is from the bottom up.

At base level, we recognize that there is a countless number of storage providers and mediums.

We have any number of cloud providers, we have local databases, we have the local file system, mobile devices and their intricacies, and any other storage both protocol and data model that you care to use.

So all of that is abstracted as Layer A on this diagram as the Raw Bytes Storage layer.

And our spec, our work really begins here at Layer B of how do we encrypt these objects before each individual adapter to various storages takes over?

So the API of the various storage methods is out of scope, the adapters are out of scope; what's in scope begins at Layer B. In terms of data model, we have only three really primitive entities.

We have what's known as a vault, which you can think of as a flat, one-level folder.

We have a number of encrypted resources that live in that vault.

And we have indexes, also encrypted, for encrypted querying on the vault, and that's it.

So we have the configuration of the vault, the resources themselves, and any number of indexes.

We have a typical CRUD HTTP API on those three.

So Create and Delete and so on, to the vault config, to each individual encrypted resource, and to the query indexes.
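As a rough sketch of what that CRUD surface looks like, here is a hypothetical route table. The concrete paths, methods, and payload descriptions are assumptions for illustration, not the spec's actual API.

```python
# Hypothetical HTTP route shapes for the three primitives (vault config,
# encrypted resources, encrypted query indexes). The real spec's paths,
# methods, and payloads may differ.
ROUTES = {
    # Vault configuration
    ("POST",   "/edvs"):                          "create a vault from a config document",
    ("GET",    "/edvs/{vault_id}"):               "read a vault's config",
    # Encrypted resources
    ("POST",   "/edvs/{vault_id}/docs"):          "create an encrypted resource",
    ("GET",    "/edvs/{vault_id}/docs/{doc_id}"): "read an encrypted resource",
    ("PUT",    "/edvs/{vault_id}/docs/{doc_id}"): "update an encrypted resource",
    ("DELETE", "/edvs/{vault_id}/docs/{doc_id}"): "delete an encrypted resource",
    # Encrypted query indexes
    ("POST",   "/edvs/{vault_id}/query"):         "query via the encrypted indexes",
}
assert len(ROUTES) == 7
```

Even in this toy form, the narrowness of the surface is the point: three entity types, plain CRUD verbs, nothing more.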

We have a primitive layer of authorization, which is currently what the group is focusing on, you know, this past week and the week before that.

We're very much in conversation about putting together selection protocols for which authorization scheme we're going to adopt, and looking at existing specifications such as OAuth2, GNAP, linked data authorization capabilities, and so on.

So currently the group is focusing on the parts outlined in green.

The other important thing that I haven't mentioned so far, but is crucial to the group and sort of differentiates it from other similar communities, is the emphasis on client-side encryption, and required encryption.

So everything from the data to metadata is always encrypted.

And we often get the question of, Okay, what if I wanna serve a public HTML file from my confidential data store?

And the answer is, You can always hand over the decryption key right with the URL to that resource.

It's very similar to how whole disk encryption works on any number of desktop and server operating systems.

You can have a plain-text, public text file on your server, but by the time the bytes hit the hard disk, the operating system encrypts it.

So it's important to note that encryption is required, but that also means that we can serve public use cases, like public data use cases.
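One common way to hand over the decryption key right with the URL (an illustrative pattern, not a mechanism the spec mandates) is to carry the key in the URL fragment, which browsers never send to the server. The host name and helper names below are invented.

```python
import base64
import secrets
from urllib.parse import urlsplit

def share_url(resource_url: str, key: bytes) -> str:
    # Carry the decryption key in the URL fragment. Fragments are never
    # transmitted in HTTP requests, so the storage provider still only
    # sees a request for an opaque encrypted resource.
    token = base64.urlsafe_b64encode(key).rstrip(b"=").decode()
    return f"{resource_url}#key={token}"

def key_from_url(url: str) -> bytes:
    # Recover the key a recipient was handed along with the URL.
    token = urlsplit(url).fragment.removeprefix("key=")
    return base64.urlsafe_b64decode(token + "=" * (-len(token) % 4))

key = secrets.token_bytes(32)
url = share_url("https://vault.example/edvs/abc/docs/123", key)
assert key_from_url(url) == key
```

So "public" data stays encrypted at rest; publicness is just the decision to distribute the key alongside the location.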

So that's one differentiating factor: required client-side encryption.

And the other differentiating factor is replication.

We recognize that the current storage paradigm is very much multi-cloud, multi-device.

And there is a need for spec-level, primitive-operation-level support for replication and synchronization, and its interplay with encryption.

So the other frequent question that we get is Storage isn't new, what are we doing?

Why aren't we using any number of existing standards?

And our answer to that has been, There is no current standard that requires client-side encryption.

And requiring client-side encryption has implications, architectural, spec level implications on the data model, the protocol, the authorization.

So starting from those two assumptions, encryption and replication, affects the rest of the layer and requires this new stack.

But one of the things we hope to convey in this presentation is we are a standards body that very much respects prior art and other current projects in this space.

And as much as possible, we want to reuse existing standards and not reinvent the wheel of course.

So that's about it for an overview.

Kaliya or Orie would you like to add anything?

I think you did a great job of explaining kind of what we're working on.

I think another sort of high-level, sort of statement to make around the work is there's a lot of interest in, you know digital signatures and decentralized identity, right?

And so there's this work and the DIDComm working group, which are both sort of a little bit more oriented towards key agreement and encryption versus digital signing.

So just from a generic, you know, where...

How do you go and use these decentralized identifiers?

Those are sort of two areas where we see you know, some work that isn't related to digital signing, which I think is important.

Thanks.

Yeah, just to add to that, that's a really good point Orie.

Everybody involved in this group is also deeply involved in other standards bodies and open-source communities.

We definitely did not want any more additional work.

The reason we all dived in here is, now that we have this advanced infrastructure for decentralized identifiers, for signatures, and encryption, we now have a need to store encrypted and confidential data somewhere.

And this is our answer to that.

This is the storage layer of wallets and key management, and credentials, and all that stuff.

All right.

So let's open the queue for questions.

Should I stop the recording now?

Sure I'm absolutely happy to record through the questions, whatever you like, but yes.

Are the others in the group here okay being recorded?

I'm fine but I'm gonna have to drop in a few minutes anyway.

(Ivan chuckling) Okay.

Okay, so let's go on recording and then if somebody objects, because she or he doesn't want to be recorded, then just tell me please.

You can use either the hands up on the Zoom if you want.

Dave, I see you are on the queue.

Right, this isn't so much a question but a use case that I wanted to highlight to everyone in the group here.

One of the very important use cases that has a lot to do with W3C is using this technology to enable web applications to get access to storage.

So one use case is visiting various websites from which you would download web applications and you can essentially bring your own storage to those web applications.

So a web application could ask you for access to your Identity Hub or an Encrypted Data Vault, and you could provide it access to it, and then take that storage with you regardless of whatever browser you happen to be using.

So I wanted to highlight that, as a sort of bring-your-own-storage use case, as very important from a W3C perspective.

Thanks Dave.

That's a really good point.

Can you elaborate a little bit on this Dave?

How would that work for me as a lay user?

Sure, so currently when people use web applications they, you know, they visit a website, their browser downloads the web app, and the web app communicates with the server that served the web app and then uses whatever kind of storage is involved there.

That might also include using a client-side storage in the browser such as indexedDB or localStorage.

All these are different storage mechanisms that are in use today.

But all of those storage mechanisms have different properties and characteristics from what's being proposed here.

So if you're using storage that is on a server, that storage typically is gonna be...

Is not something that you get to bring to that application.

It's something that the application is...

Now let me put it another way.

If there's an application that you're using on the web, and it provides a certain feature to you, whatever storage mechanism that application is using, you are effectively locked into that application.

So there are a number of different applications that might offer you different features on the web.

And some of those features could be offered by application A or application B.

If you're using the web as it exists today, there's no way for you to move information from application A to application B unless point to point integrations are created.

So by using Encrypted Data Vaults and using this technology, it is possible for you to transition from using application A to application B.

And that's one of the use cases we're targeting.

So it's very close in a sense to what Solid tries to do except that it is also combined with very strong encryption.

Yes, that's right.

And they've...

The Solid Community has been a participant in collaborating with this work.

Okay.

Thank you Dmitri you are on the queue, I am happy to manage the queue if you want.

Thanks.

I appreciate it Ivan.

Yeah, so I wanted to add to both of the things said by Dave. One is, there's definitely representation from the Solid community, the W3C Solid Community Group.

I am one of the editors of the Solid specification and long time open-source contributor in that world.

We see Confidential Data Storage and Encrypted Data Vaults as very much complementary to Solid, in the sense that, here in this diagram, Solid is operating on the top layer pictured in blue here.

We want to provide a low-level primitive to store Solid data under.

And interoperate between the two layers: the higher-level semantic operations, which are all Solid is concerned with, and low-level storage.

And to add to what Dave was saying, so there's a lot of emphasis and a lot of momentum behind offline-first applications for progressive web applications.

And we feel that we address an important niche in that ecosystem because the offline nature of it is currently addressed.

And I know there's a lot of work and a lot of breakout sessions on this during this TPAC with the IndexedDB and localStorage mechanisms, the offline storage the browser provides.

So we're trying to address what happens when you reconnect to the net.

Currently each progressive web application rolls their own replication and synchronization protocol, rolls their own read/write protocol to transfer the data from local storage to some sort of cloud storage so that it can be usable with other browsers, so that you can do server-side processing on that data, any number of things.

So we're hoping to standardize, and secure, and encrypt that server-side component of progressive and offline-first web applications.
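A minimal sketch of that reconnect-time step, pushing locally modified documents and encrypting each one before it leaves the device, might look like this. Everything here (the toy XOR cipher, the timestamps, the vault-as-dict) is an invented illustration, not the protocol the group is standardizing.

```python
import hashlib
import json

def toy_cipher(key: bytes, data: bytes) -> bytes:
    # Stand-in for a real AEAD cipher: XOR with a key-derived keystream.
    ks, c = b"", 0
    while len(ks) < len(data):
        ks += hashlib.sha256(key + c.to_bytes(8, "big")).digest()
        c += 1
    return bytes(a ^ b for a, b in zip(data, ks))

def sync_on_reconnect(local_docs, remote_vault, key, last_synced):
    # Push every locally modified document, encrypting it before it leaves
    # the device, so the sync target is interchangeable ciphertext storage.
    pushed = []
    for doc_id, (doc, modified_at) in local_docs.items():
        if modified_at > last_synced:
            remote_vault[doc_id] = toy_cipher(key, json.dumps(doc).encode())
            pushed.append(doc_id)
    return pushed

local = {
    "a": ({"title": "draft"}, 120),  # modified after the last sync
    "b": ({"title": "old"}, 40),     # already synced
}
remote = {}
assert sync_on_reconnect(local, remote, b"k" * 32, last_synced=100) == ["a"]
assert "b" not in remote
```

The standardization target is exactly this seam: today each PWA invents its own version of `sync_on_reconnect`, with its own wire format and its own (often absent) encryption.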

Thanks.

Hadley.

Hello.

Hey.

Two quick questions. First, how far are you in the process of building this, or developing specs or any of the proof of concept or implementations?

Where are you in the broad process?

And second, do you have a document of use cases?

I'd love to have a look.

Thank you.

Great questions.

So when was...

We were chartered earlier this year- In March-ish.

February, March this year. So we're about that far along. We currently have three independent implementations of the Encrypted Data Vault and about one implementation of the Identity Hub in blue, so...

And yes, the chartering process started with a use case document, and we have it in the specification repo.

I will link to it in IRC shortly.

Brilliant.

Thank you.

Gerard.

Hi guys looks really interesting just...

I think I lost the picture that I was gonna refer to.

Sorry Dmitri.

There it is. The question relates to Layer B. And I think, to the last point, it'll be good to view the use cases. From our perspective, we're interested in using that encrypted vault to store a public-private key pair.

Ideally something that inherently proves that a user has authenticated and trusted that device, that endpoint.

Then potentially that other endpoints could join it.

So we're sort of looking at it from a...

Almost not the same use case as yours is storing centrally, but being able to re-authenticate an endpoint.

And for that perspective an encrypted document, you know, sounds very appealing.

The ability to put something in there that you cannot get the private key out of, but that you can perform operations on it.

Signing as proof that this endpoint itself is present, signing it to potentially give access or authorize transactions because it's coming from a trusted endpoint, is that use case...

And apologies, I haven't looked at your specs in detail.

Is that use case at all something that could be covered by this?

'Cause the moment you've got encrypted documents on an endpoint, you know, it appeals to that use case.

Is something like that possible with this, or is this more for sort of securely sharing data between various different endpoints on the same set of data?

Great question.

So I would say both.

Because this is encrypted data storage.

Yes, you can absolutely store public and private key pairs.

In this diagram, we specifically call out that there is some intersection with key management systems and wallets.

Those are shown in red here on the upper-left corner.

So we do recognize that this is geared specifically for data storage. For private key storage and confidential computing operations, traditional key management systems are better suited.

And we do leave them out of scope for this spec.

So Encrypted Data Vault depends on having key storage and key management architecture.

Orie or Dave, if you wanna comment on this further.

Go ahead, Dave.

So your use case could be served by Encrypted Data Vaults.

My personal opinion would be that it would be a better idea to use some kind of KMS system for storing private key data.

Certainly the main reason for that is that any private key data, whether encrypted or not, especially if it's for digitally signing information...

Asymmetric key data is better placed on systems where it will never be exfiltrated.

And Encrypted Data Vaults enables you to replicate to that encrypted storage and move it across different providers.

One of the things that differentiates our work from some other work is that we take the tack that something that's encrypted doesn't mean that it will be safe from exfiltration forever, or safe from clear-text view forever; all encryption has a shelf life.

And so our design is such that if you wanna get access to encrypted data in an Encrypted Data Vault, you need both the keys to decrypt, but you also need some authorization to get access to the encrypted data to begin with.

So steps are get your authorization, get the encrypted data, and then you need to have keys to decrypt.
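Those steps, authorization first, then fetching the ciphertext, then decrypting, can be sketched as a toy model. The cipher and the ACL shape here are invented stand-ins; the real authorization scheme is exactly what the group is still in the process of selecting.

```python
import hashlib

def toy_cipher(key: bytes, data: bytes) -> bytes:
    # Stand-in for a real cipher: XOR with a key-derived keystream.
    ks, c = b"", 0
    while len(ks) < len(data):
        ks += hashlib.sha256(key + c.to_bytes(8, "big")).digest()
        c += 1
    return bytes(a ^ b for a, b in zip(data, ks))

def read_document(vault, acl, doc_id, bearer, key):
    # Gate 1: the provider enforces access control, even though it
    # cannot read the data it is protecting.
    if bearer not in acl.get(doc_id, set()):
        raise PermissionError("not authorized for this resource")
    # Gate 2: only a holder of the key can recover the plaintext.
    return toy_cipher(key, vault[doc_id])

key = b"k" * 32
vault = {"doc1": toy_cipher(key, b"secret payload")}
acl = {"doc1": {"alice"}}
assert read_document(vault, acl, "doc1", "alice", key) == b"secret payload"
try:
    read_document(vault, acl, "doc1", "mallory", key)
except PermissionError:
    pass  # unauthorized callers are refused before any bytes are returned
```

The two gates are independent on purpose: stealing the ciphertext without the key yields nothing today, and the authorization layer limits exposure even after the encryption's shelf life runs out.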

If you're talking about having private keys, you can certainly have an Encrypted Data Vault that's effectively accessed only by a single party.

But again, this design is to create trust boundaries, so your encrypted data can live on whatever storage provider you wanna use.

And the trust boundary there is that the encrypted storage provider is going to make a best effort never to try and break your encryption or do anything with it, and to enforce your Access Control Policy.

That's not necessarily.

There are use cases where that makes sense for doing things with private keys, but I would push you towards using KMS systems that guarantee that your private key material is not gonna be exfiltrated depending on how important it is for your use case to ensure that that doesn't happen.

Certainly an Encrypted Data Vault, for example, does not have any hardware security modules that put private keys into hardware.

So whether or not you use an Encrypted Data Vault for your use case kind of depends on how much security you need.

Yeah I think to some extent we're looking for endpoint identities and the interest was Layer A and et cetera there.

And it'd be great if you could hook a TPM in there and we can create some public-private key pairs in there and therefore store documents and sign documents on the device.

Not just to prove that the document was originated, maybe that the document was uploaded from a certain endpoint.

But I guess you've answered my question that this is not the primary function and the goal you're trying to solve.

So thanks for that.

Thanks for the answer.

Yeah, that's a really interesting use case of binding the KMS aspect to the data storage endpoint, but yes, like they said.

Yeah, I think that ultimately the, I mean, if you're storing it centrally that could be the private key of the human, of the owner or of the identity.

I think.

All right.

Potentially combining that with an endpoint identity to prove origin, prove where it was uploaded from.

And other things might be useful as an extra layer but again, I don't think that's the primary focus here.

So I think it might just distract from the work you're already doing.

Thanks, and I wanna add to what Dave was saying.

That's a really good point about the combination of encryption and authorization.

There are a number of similar projects that focus on one or the other.

So for example, Solid has the authorization layer but not the encryption layer.

And there's a number of projects in IPFS, various Ethereum Communities that use encryption as the authorization layer.

Meaning if you can decrypt something, you're allowed to have access to it.

We take the approach.

We emphasize that you need both.

Definitely need encryption because authorization may not be sufficient in some use cases and especially for opaqueness to the storage provider but also encryption itself is not sufficient without the authorization layer because as Dave mentioned, all encryption has a shelf life.

We do think that those two things go better together.

Thank you.

Any other questions?

Concerns?

I don't see anybody else on any queues.

All right.

So what are your plans? When do you plan to complete the work?

What will happen afterwards, et cetera?

I think we're just starting to articulate like a work plan so that's not totally clear yet.

We do have, as Dmitri said, working implementations, and those folks, really want a solid spec sooner rather than later.

We're also having a conversation about, do we keep it all in one spec or do we write one spec for the Encrypted Data Vault side and another for the Identity Hub?

That's all you know, we're in process with that right now.

So we don't have clear answers, but we will soon.

If you have opinions come and join us now while we make those decisions.

I agree.

Thank you Kaliya.

I'm sorry Dmitri.

I wanna add just a quick comment to that.

So the group formed trying to balance those two tensions.

We have existing implementations to standardize, but we want to leave room for the traditional W3C standards track.

So the DIF aspect of the group right now is more for incubation and rapid prototyping.

But the fact that this is coming from the Community Group and is being developed under W3C compatible IPR, we do very much want to go the full working group standards route after incubation.

Okay, that would have been my other question.

I mean, you know, I wouldn't be a W3C team member if I did not ask that.

So whether eventually this is something that you consider and there's no commitment here at all.

But if there is an idea or you know, exploration of eventually setting up a working group, that would have been my question but you essentially, you answered that this is a possibility in the books.

That is very much our hope.

And one of the things that we're trying to do here is to rope in and engage several communities that have rapid prototyping and incubation as their strengths, but are hungry for more traditional standards work as well.

So we're trying to provide that on ramp.

But then the question that will come eventually, if you go down that route, is whether the API is such that it would work as part of the browser as well, or is it only, you know, a separate thing working with Node or Python or whatever.

Great question.

And I didn't really mention or highlight it in this presentation, but yes, there very much is a strong part of the community that would like a browser API component to this.

Imagine if you had secure primitives to say, I have my IndexedDB now replicated to this encrypted store, that sort of thing.

So yes, a browser API component is very much a wishlist item and a use case for this group.

And do you have contact with browser developers?

Or let's put it this way.

I see your face.

So I see that you are not giving a yes answer right away.

I think it would greatly help a transition if you had some contacts with some browser developer groups well in advance, who would sort of look over your shoulder when you define your APIs, so that they would be, you know, browser compatible.

And don't ask me to look at that because it's not my expertise, but you know, you can find people in the various browser communities.

I was just going to ask in fact you specifically.

So at the moment, we have close contacts with the Microsoft Edge browser team, but certainly would love wider contacts in Chrome and Firefox and so on.

Yes.

That is something we very much want.

Yes.

Hadley, do you think that you have some contacts that we can give them?

I was just trying to think that through.

Edge definitely comes to mind.

I would imagine there's somebody on the Chrome team who's gonna be interested but I don't know who it is.

We can certainly try to find out.

I can absolutely take it back to the tag and see if we can drum up some contacts for you.

That will be so wonderful.

And I am happy to play the go-between for them.

Great, thanks.

Appreciate it both of you.

Equally, are you at all connected to any relevant work in the IETF?

Yes so.

Okay. Not formally connected, but we've had presentations to our group from the JOSE community, the CBOR community from the IETF, and- For instance, coming from GNAP.

Yeah.

And the GNAP working group.

So Justin Richer, one of the co-editors, or co-authors, of the GNAP spec, is aware of our work and is presenting next week.

And so yes, as much as possible, we do wanna maintain contact with the IETF.

Okay great.

That's really helpful.

Okay.

Okay.

Thank you very much.

Anybody else?

No, it's useful to see what you're up to.

Thanks so much for the presentation on the session.

Thanks everyone.

Thank you. I'm enjoying the calls.

See you on GitHub.

Thank you.

Bye-bye.

Thanks, bye. That is all.

Stopping the recording.

