LWS WG Face-to-Face Meeting (Day 1)

Meeting minutes

[laurens presents site logistics]

laurens: scribing on IRC. everyone should do about an hour
… I'll put the scribe-related stuff into a shared folder on Google Drive

[social event logistics]

laurens: goal over the next few days is to develop spec from our UC&R
… WG is 1 year old
… which gives us one year to get to REC

Introductions

acoburn: Aaron Coburn (Inrupt)
… I've implemented three or four LDP servers so I know it well
… I've been involved with Solid around six years so I know the pain points
… I'm most interested in simple/clear ways of authenticating access to storage
… there will be a lot around that, but that's the core for me.

jeswr: Working on Solid for ODI. Ozzie
… I'd like to see maximal deployment of Solid and I want to enable an LLM to find stuff for me.

laurens: I've been involved with Solid ~ seven years. I've encountered a few frustrations:
… e.g. a standard way to request access to resources

bendm: My colleagues and I have been doing SemWeb integration for 15 years.
… Most important UC is security/access for closed data

bartb: working on a data-sharing platform using Solid. built on ESS
… we have collaborations with health data sharing initiatives
… main challenge: sharing same data with different parties

<bendm> beau

<Beau> Beau

Beau: working on a project based on ESS also on health use cases
… hope to extend Solid to authenticate access inside resources
… background in commerce. data integration for two years prior to Vito

wonsuk: coeditor for Ontology for Media Resources and API for same

wonsuk: proposed ontology to integrate most of metadata formats
… since 2015 I've been in the Automotive WG for intra-car comms
… last year i've been working on personal data and phishing UCs (big prob in South Korea)
… funded by government

ryey: post-doc at Oxford U (with Jesse)
… research on data usage control and privacy-preserving computation
… want to work on data authorization and data usage control
… personal experience in decentralized technologies. see Solid as the only choice
… i watch out for security and privacy

starl3n: Steven De Costa: Ozzie. Working in Canberra
… I wasn't expecting Solid to be so well represented as it is
… I'm interested in verifiable agreements

ericP: Eric Prud'hommeaux, mostly interested in formal methods
… patient control over data, as implemented in an NHS project

eBremer: Erich Bremer (Stonybrook U)
… working on software to manage lifecycle of digital pathology imaging
… want to grease the rails of collaborative research, specifically federated learning

Frederik Byl: making health data exchangeable/interoperable

laurens: we have a year. we can ask to extend but we need to have a plan

acoburn: realistically, we need to get to CR well before the 1-year mark. need:
… .. spec text
… .. test suite
… .. implementations
… want to leave here with basic consensus on what the spec will look like, i think we can construct it over the next couple months

laurens: back-of-napkin sketch of what the protocol looks like

acoburn: are there areas where we have profound disagreements
… and can we bridge them or defer them for later
… we don't want to get bogged down in unfinishable tasks

jeswr: would be good to have a timeline for deliverables

laurens: i've been implementing apps on top of servers
… challenges: can't give fine-grained access
… apps impersonate me rather than being subject to finer access control

[delegation discussion expected tomorrow]

laurens: aspects of our protocol are complicated by e.g. controlling access by non-authenticated principals

[slide: Things that bother me...]

laurens: we have some ideas for how to solve these problems. want to gather those over the next few days
… what are the remaining blind spots?

bendm: server processes are focused around your POD
… identity bottleneck is a hurdle.
… would be nice not to have to create new accounts all the time
… discovery would also be nice

acoburn: we call this the bootstrapping prob:
… we have a new app or user and the setup is cumbersome
… making that simpler would improve developer experience

bendm: our research focuses around integrating existing identity infrastructure into Solid

ryey: we have attempted different apps on Solid using e.g. MPC
… or calendar or recommendation services
… have comments on specs and toolkits
… if I'm developing an app and I need to store shared data as well as app configuration, where do I store it?
… what if data changes, how do I respond to that?
… suppose i want to store in one format and other apps want to consume it in another?
… data interop focuses on that
… what should we consider as part of the knowledge graph
… how do we advertise the existence of e.g. identity services

Frederik Byl: +1 to ryey
… fine-grained access
… imposing a hierarchical structure on a graph leads to incompatibilities

bendm: +1, but...
… it's not entirely the protocol's problem
… we can have different views of the data. but maybe not on V1

bartb: [SPARQL access over ACL-controlled data]

jeswr: should be possible to perform ACL-controlled SPARQL

acoburn: possible but very expensive

bartb: question of what you want to afford at the protocol level

laurens: trade-off of the expense of the protocol

Beau: Q of what should be required in spec
… at the moment, it's hard to search and to get uniform access
… spec aims to enable decentralized identity and storage

[slide: ...and how to solve them.]

laurens: agenda
… afternoon: core operations (storage protocol)
… auxiliary resources, metadata handling, discovery
… (of data and capabilities)
… outcome I would expect: initial scope for the core operations

(all auxiliary files are at https://drive.google.com/drive/folders/1-zoqp1qwEHo0YgXN8c8Ld7NqvbPBdjc2)
… other outcome: refined and prioritized list of core operations

laurens: 2nd topic (at 16:00): scope of query in LWS
… what do we actually want query to do in LWS? what are the blind spots? what do we want for v1?

bartb: could the issue be data discovery instead of query?

laurens: could indeed be
… so we might want to rename this slot to discovery
… it's not like we have a fully agreed opinion on this already

acoburn: this session is only 1h
… scope is really important here
… what kinds of questions do we want to ask to a Solid storage?
… what is realistic for us as a group?
… let's set us up for in-depth conversations on the other side of this meeting

laurens: core and discovery feels like a natural fit
… at 19:00: dinner at Otomat
… for tomorrow: most of our morning will be about authorization
… afternoon is about authentication: should we flip that around?
… concerning authorization: pain points: granularity of authz, serverside client, delegated access for clientside clients

acoburn: authz is more complex than authn, so good to give most time to authz
… tackling them both in one day is a good idea
… from our charter: we are not going to invent an authn mechanism
… we may talk about how LWS integrates with existing authn mechanisms
… eg here's how you authenticate on the Web, not via a browser, etc... it's more about integration

bendm: also: seeing which authz mechanisms work for which use cases, and maybe we need a more flexible system

laurens: yes, you might want a more abstract interface to the authz server, but that gives complexities wrt interop
… feel free to add more topics at https://docs.google.com/document/d/1ZgQ3BjODHYhQ_PJEaTzRxIhXXDCl_ivDGScYI9nUwWw/edit?usp=drive_link
… we can continue authz discussion in the afternoon
… for authn: let's talk about use cases, and which authn mechanisms are in use today

acoburn: I'd like to start today with clarifying the conceptual entities
… you'll need those for authn/authz
… eg external server, user, user using application on the browser, etc

ryey: we probably need to discuss how entities will interact

laurens: we'll also bump into identity of users and applications
… i.e. what we need in terms of identity
… (I'm updating the agenda on the go)
… when discussing authn, we also need to discuss the complexities we introduce, we need to lower complexity for uptake

jeswr: I'd like to distinguish between the conceptual model for authz, and the protocol/flows used for that model
… eg model is WAC, flow is UMA to get the tokens

laurens: in that model, you have different levels, eg RBAC, ABAC, etc
… how you materialize that, could be via WAC or ACP
… how you map that to an authz protocol
… so 3 levels (conceptually, how do you materialize that, what interactions you need)

acoburn: and 4: what are the governing rules
… "the rules governing enforcement"
… e.g. you have ABAC, with attributes (user identity, app identity, time/space)
… you need a way: how do those attributes come together in a policy language
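A minimal sketch of how the attributes acoburn lists (user identity, app identity, time) could come together in one policy check. The `Policy` and `AccessRequest` types and their field names are hypothetical; they are not taken from WAC, ACP, or any LWS draft, and a real policy language would be considerably richer.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class AccessRequest:
    """Attributes presented with a request (hypothetical field names)."""
    user: str
    app: str
    at: time
    mode: str


@dataclass(frozen=True)
class Policy:
    """One ABAC-style rule: every attribute must match for the rule to apply."""
    users: frozenset
    apps: frozenset
    modes: frozenset
    not_before: time
    not_after: time

    def permits(self, req: AccessRequest) -> bool:
        return (
            req.user in self.users
            and req.app in self.apps
            and req.mode in self.modes
            and self.not_before <= req.at <= self.not_after
        )


def is_allowed(policies, req: AccessRequest) -> bool:
    """Grant access if any single policy permits the request."""
    return any(p.permits(req) for p in policies)


office_hours = Policy(
    users=frozenset({"alice"}),
    apps=frozenset({"calendar-app"}),
    modes=frozenset({"read"}),
    not_before=time(9, 0),
    not_after=time(17, 0),
)
```

Here the same user is refused when acting through a different app, which is the finer-grained control raised earlier in the meeting.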

laurens: e.g. in Solid, we don't have, in ACL, a DELETE mode

ryey: are we only talking about access control, or also usage control?

laurens: also important: how do you request and grant access
… i.e., what comes before
… also ties into data discovery

ryey: should we then move interaction patterns sooner?
… (interaction patterns discussion I mean)

laurens: I guess it'll get discussed

jeswr: also, the trust model: what can you consider authoritative data
… right now, typically 'what are your trusted IDPs'
… more generally, what is your root of trust?

laurens: i.e., what are your trusted entities in your network

acoburn: so, what is the model for understanding how trust is delegated?
… with webid, you say, here are the IdPs I trust
… currently, it's not a chain
… the way delegation generally gets handled, is via chains to a root of trust
… idp lists got mentioned, but they don't scale
… we need to think beyond that

laurens: (also social dinner on day 2 at 7pm at Patrick Foley's)
… day 3: start with notifications
… some level of support is requested from the community
… to be discussed: how much we want to support: resource changes vs inbox

acoburn: also, to be seen whether access requests are related to notifications
… historically, there has been websockets emphasis in Solid, there are a lot of problems with websockets

laurens: also, works well for clientside app, but not for serverside
… good stuff in the notifications spec to look into

acoburn: and everything interrelates with everything
… eg solid notifications: protocols have insufficiently defined authn/authz model
… let's keep in mind how authn/authz works for notifications

laurens: also on Friday: test suite and multiple implementations

acoburn: we need two independent implementations (i.e. not from the same organization, not using the same codebase) for each feature; everything else is marked 'at risk' after CR and gets dropped
… so we need to be conscious what would be put 'at risk'

ericP: a good test suite will balance between 'working for generic implementation' and 'cater to specific parts'
… eg 'here is a known user in the system'
… so make clear that it's easy for implementers to implement a specific class to do a specific functionality
… good idea is having facets in tests
… so make sure that a test links to a specific facet, so that you know which test tests which feature
… so, some shared state, some structured manifest, and facets in the manifest
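As a rough illustration of ericP's point about facets, the sketch below assumes a hypothetical manifest in which every test declares the facets it exercises. The entry names and facet labels are invented for the example and do not come from an existing test suite.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ManifestEntry:
    """One test in the manifest, tagged with the facets (features) it covers."""
    test_id: str
    facets: frozenset


# Hypothetical manifest; the identifiers are illustrative only.
MANIFEST = [
    ManifestEntry("read-data-resource", frozenset({"core:read"})),
    ManifestEntry("create-in-container", frozenset({"core:create", "containers"})),
    ManifestEntry("known-user-can-read", frozenset({"core:read", "authn:known-user"})),
]


def select_tests(manifest, supported_facets):
    """Return the ids of tests whose facets are all supported by an implementation."""
    supported = set(supported_facets)
    return [entry.test_id for entry in manifest if entry.facets <= supported]


def coverage_by_facet(manifest):
    """Map each facet to the tests that exercise it, so a gap is easy to spot."""
    coverage = {}
    for entry in manifest:
        for facet in sorted(entry.facets):
            coverage.setdefault(facet, []).append(entry.test_id)
    return coverage
```

An implementation that only claims `core:read` would be run against the first test alone, and the coverage map shows which tests back each feature when deciding what is 'at risk'.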

laurens: we have some buffer on Friday
… let's discuss roadmap and responsibilities at the end of each day

bartb: are the priorities discussed last week taken into account?

laurens: indeed, we took that into account for blocking time for topics
… and we need to circle back to each during each focus discussion

acoburn: the requirements document today is in a much better shape, but still needs a lot of work
… let's give that appropriate attention as well

starl3n: there's a CKAN extension compatible with Solid
… CKAN file storage

DataShades/ckanext-files

ericP: to ask how the CKAN extension changed the implementation
… that's going to influence our discussion, when we're figuring out what kind of API or model we need to support that extension

starl3n: there's no direct implementation with Solid for CKAN
… CKAN is a rather basic implementation, but over time, companies got more interested in granular access control
… people that have a public data catalogue, might have the actual data internally, and via AWS, so the model is decentralized
… it has the kinds of hooks needed to create those kinds of affordances
… I see a world where CKAN data integrates with personal data

Beau: notifications seem more complementary than data validation and data shapes
… shapes also relate to querying

<Zakim> ericP, you wanted to ask how the CKAN extension changed the implementation

Defining core operations of the LWS protocol

<starl3n> the note I was adding re reference implementation ideas is here: https://docs.google.com/document/d/1YapaCMiwSdqY1uHc5XfB6d_YvIkNwhP982sXbIY0BnM/edit?usp=sharing

acoburn: We want to see how all of the conceptual entities that we are working with relate
… if 2 different types of entities we have possess similar properties then they may be the same conceptual entity
… first, let's catalog the entities we are interested in
… The conceptual entities we have are: Data resources (RDF and non-RDF), Containers, Metadata Resources and Access Control Rules
… Data resources: These could be XML, a movie, RDF, JSON. Data in some format with some media type.

bendm: Do you link this to the endpoint where you get this from.

acoburn: We will get to that. In short, yes, we will want to define the HTTP operations to obtain these
… Containers: These are collections of resources and other containers
… Metadata: Information that is not part of a resource or container itself, but instead describes it
… In Solid today the RDF resources are largely self describing. For other binary files like an image we don't have a good way of describing properties of those files
… I am thinking of a metadata resource to describe properties of this resource

bendm: Could this be a data resource which has another type of link within another type of data resource. This would enable metadata on metadata resources.

acoburn: Potentially. I want to distinguish between data resources and metadata because you can then bind their lifecycles together
… Then you also automatically have a location where you manage metadata about given resources
… The semantics of data resources and metadata resources will be different. You want to ensure, e.g., that metadata resources cannot be deleted. Just actual resources.
… We could imagine a metadata resource specific to a data resource, container or storage. Whether each of these metadata resources have different semantics depending on which of these 3 things they are describing is to be determined
… Access Control Rules (ACL) are kind of like metadata resources, but I expect them to have different semantics.
… Credentials: Unlike the first 4, these are not necessarily addressable. These are the kinds of things that you use to obtain access to the first 4 types of resources.
… Validation / Key Material: These may be part of certain kinds of metadata.
… Services: This is a very broad category, but a placeholder to say "maybe you have a way of doing query" or "maybe you have a way of doing validation"

bendm: The storage server itself should be its own conceptual entity. Then depending on how much we want to go into personal data, a Pod within that server.

jeswr: Please elaborate more on Validation / Key Material

acoburn: One example. You have a token issued by an OpenID provider. They give you public keys that are used to get access to a server. Another example is a validation that the data you put into a system is the same data that you get out of it. Another example is using HTTP Headers to provide tamper protection.

bendm: Is this key material for both identity and signing the data itself

acoburn: Yes, both

laurens: What about profiles

acoburn: This somewhat falls under validation / key material

beau: When you ask about metadata, could it also be data that exists within the resource?

acoburn: I am thinking of something similar to an rdf:type that could go into the Link header of the payload response on the data resource. This gives a hint of what it is.

bendm: But the distinction is that the metadata is in the data resource. The assumption is that if you delete the original resource then the metadata resource also gets deleted.

acoburn: Yes, also the possibility that you have put the metadata resource in there, and now you want to edit those metadata labels. There may be constraints around what you want to do and how you want to do it. You should be able to do this without needing to re-upload your resource

bendm: There may be a case that you have so much metadata that it becomes a resource in itself. Or that the metadata becomes more important, and that you want to delete an image and keep its metadata

wonsuk: What about IDP servers

acoburn: We are not putting IDPs in this list right now, because we can cover this when we get to the AuthN section

bendm: Note that ACL rules aren't part of the resource server at this stage

acoburn: Ultimately the resource server needs to accept access tokens that need to be validated against something.
… there may be some rules that need to be handled by an authorisation server or many layers of authorisation servers

bendm: Validation is both on data level and on token level, does it make sense to keep both Validation and Key Materials as the same thing. These feel like 2 different concepts.

acoburn: They could be split apart. My thinking on this is a little hazy. In order to validate tokens and other types of credentials we will need some kind of key material

laurens: We are in a decentralised setting, so your trust model is most likely based on public keys. To me the key material is fairly related to PKI.

bendm: This feels like it gets into Verifiable Credential territory as well

laurens: They use similar infrastructure. Public keys and trust list, but this is more something that is happening on the client side for verification rather than the server side for verification.

bendm: We could have VC or wallet services that do this

wonsuk: When we use WebIDs or DIDs we need to use the WebID or DID document right? Could the WebID document be considered one of the resources.

acoburn: I would describe this as part of the key material part - this is why laurens similarly mentioned profile documents.

laurens: Yes, the profile resolves to the key material

acoburn: I am also thinking specifically about Controlled Identifier Documents which have a JWT keyID

ryey: Does this also include validating the data?

acoburn: I am thinking about the content of a data resource as a black box.
… I am not thinking of constraining in any way what you say inside the resource.

ryey: If we are thinking about authentication or identity this may be an issue if we want extensibility between documents. E.g. saying profileA sameAs profileB

acoburn: If we want to say that AgentA is the same as AgentB, then we just need the Authorisation system to recognise that they are the same, then you just need to produce a token that says these agents are the same thing.

laurens: An interesting question you could ask here is if ACL rules are managed and stored by the storage server or a separate service
… You could in theory have a simple interface which splits out the ACL to a different server. Though this can introduce performance issues.

acoburn: These ACL rules are part of an AuthZ server which is conceptually different from the resource server, even though many implementations put them together in practice.
… I wanted to highlight these as a conceptual entity because we have historically used the acl link relation in Solid. So we would need some concept of an ACL resource which we are linking to

laurens: Then how many types of metadata resource do we want to have

acoburn: In theory we could have arbitrarily many, in practice - I want to say there is 1 kind of metadata resource, it has this semantic, and it acts like this. implementors can choose to add more.

bendm: Are you talking about really having 2. One ACL and one non ACL metadata.

bartb: Why do we need this? Data discoverability?

acoburn: The reason will become clear when we get to data discovery and query. If we have a common set of metadata, there are certain types of query structures and data discovery structures that we can enforce or require to make this easier.
… if you have a metadata resource and a particular way of describing its type, then you can create a type index and use that across the pod and every resource acts the same way.
… in contrast if the type is going to apply differently between RDF and non-RDF resource, then you may need to have different ways of doing it for things that are not RDF

laurens: Data discovery is very important to the Authorization problem as well.
… One thing I still wonder about is why we have both resources and containers if we already have resources. Could containment of resources not just be metadata?

bartb: This goes back to the question of are we basing this on LDP or not. LDP is very heavily container based.

laurens: Containers are somewhat a stop-gap to supporting authorisation over many resources

acoburn: Here is a potential answer - I want to share data with an entity for everything that is in a particular container hierarchy. Now I want to create a new resource within that container hierarchy. An easy way to do that is if the container is itself a resource and we can post to the resource. If we can't do that because the container resource is entirely virtual, then what we will end up with from an HTTP perspective is something that is very flat.
… the question of which container it goes into is ... I worry about the developer experience.
… do we say here is an existing container and you get the developer to post there

ericP: I want to talk about stuff we did with shapetrees and SAI

<Zakim> ericP, you wanted to describe ShapeTrees approach to establishing where to place new stuff

ericP: in that space we replaced the notion of type indexes with ShapeTrees that say regardless of the type
… we want to find ???. In order to provide a somewhat manageable service in which applications that didn't know about each other could coordinate
… we got shapetrees and registries which say "for this container, everything in here conforms to this shape"
… what Aaron said is if you have a particular endpoint and everything you put in here is ??? these resources
… This doesn't manage multiple containership, so you can't say that this thing is both a recipe and a dessert topping. But it does provide an app surface where people can store pictures, X-rays etc.

acoburn: One thing to keep in mind on the container discussion is that we want to be careful about not diverging too far from Solid. If we have 2 ways of doing something that are equivalent, then we should try and go with the Solid way.

bendm: In GDrive you can post to a container, and then also put it in 17 other containers - do we want similar properties.

acoburn: Oh absolutely. If you look at what they do, whenever you get a resource it has a top-level URL. When you move that resource between containers; the URL of the resource does not change.
… With slash semantics you hit issues: e.g. if you give someone access to a resource and then move it between containers, the URL of that resource changes and people with access to individual resources can lose it.
… If you drop slash semantics we don't have this issue

laurens: Part of the issue is also developers applying implicit semantics to slash semantics that they shouldn't be
… if you look at it from a more abstract level then you could also think of all containers and resources as UUIDs with metadata
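A minimal sketch of the "opaque identifiers with containment as metadata" idea discussed above. The `OpaqueStorage` class and its method names are hypothetical; the point is only that a move changes metadata while the identifier (and so any URL derived from it) stays the same, and that a resource may sit in several containers.

```python
import uuid


class OpaqueStorage:
    """Resources get stable opaque ids; containment is mutable metadata."""

    def __init__(self):
        self.content = {}       # resource id -> body
        self.contained_in = {}  # resource id -> set of container ids

    def create(self, body, container=None):
        rid = uuid.uuid4().hex
        self.content[rid] = body
        self.contained_in[rid] = {container} if container is not None else set()
        return rid

    def add_to(self, rid, container):
        """Place the resource in an additional container."""
        self.contained_in[rid].add(container)

    def move(self, rid, source, target):
        """Change containment only; the id is never rewritten."""
        self.contained_in[rid].discard(source)
        self.contained_in[rid].add(target)

    def members(self, container):
        return sorted(
            rid for rid, parents in self.contained_in.items() if container in parents
        )
```

Because access rules would refer to the stable id rather than a path, a grant on an individual resource would survive a move in this model.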

bendm: yes this is what gdrive does right. ShapeTrees metadata could be put on the containers here

laurens: You could also have schema metadata attributes on resources which are used to e.g. enforce validation on write of resources.
… the question here is what semantic restrictions can the metadata impose

bendm: If in v1 we treat anything like shape validation, VC validation etc. as an optional service - something that we could add after v1 - then this would be a sound approach
… another question, should containers have metadata.

laurens: Yes, but this raises an issue. Containers are a special kind of server resource. They contain certain information that must be managed by the server (e.g. containment), users may also desire to add their own user information on the server.

<ericP> image in ShapeTrees Primer which gives some intuition

acoburn: Today in Solid a container resource has both server-managed and user-managed information.
… this makes it a composite resource. This makes use of ETags, and concurrent operations with PUT and POST, complicated.
… Consider a container which has 5 child objects and a dc:title field. Suppose you want to change the dc:title field, and someone else adds a new child object at the same time.

laurens: ESS and CSS both have complex logic to support this

bendm: Can't you have 2 files (server managed and client managed) for write, and a composite one for read

laurens: Yes. But we don't want to semantically mix the two (have a composite) for write operations.

bendm: It seems difficult to have different authorisation rules *within* a resource. So let's not try and have a composite write.
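The dc:title example can be sketched as follows. This is an illustration under stated assumptions, not how ESS or CSS work: `CompositeContainer` uses one ETag over both the server-managed children and the user-managed title, while the hypothetical `SplitContainer` gives the user-managed part its own ETag for writes.

```python
import hashlib
import json


class PreconditionFailed(Exception):
    """Stands in for an HTTP 412 response."""


def compute_etag(state):
    payload = json.dumps(state, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


class CompositeContainer:
    """One representation, one ETag, covering title and containment together."""

    def __init__(self, title, children):
        self.title = title
        self.children = list(children)

    def etag(self):
        return compute_etag({"title": self.title, "children": self.children})

    def add_child(self, name):
        # Server-managed change, e.g. another client POSTs a new resource.
        self.children.append(name)

    def put_title(self, title, if_match):
        if if_match != self.etag():
            raise PreconditionFailed("representation changed since it was read")
        self.title = title


class SplitContainer(CompositeContainer):
    """User-managed title is written through its own resource and ETag."""

    def title_etag(self):
        return compute_etag({"title": self.title})

    def put_title(self, title, if_match):
        if if_match != self.title_etag():
            raise PreconditionFailed("title changed since it was read")
        self.title = title
```

With the composite form, an unrelated child being added invalidates the title writer's ETag; with the split form the two writers no longer interfere.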

laurens: The finest level of granularity that we should manage access controls on is the resource level.

acoburn: Suppose you only want to give access to certain parts of a resource, it would make sense to split this resource.

ryey: For containers we're not thinking about transitive things?
… for example container A contains container B which contains resource C.
… does A then also contain C.

acoburn: Might be simpler to see that as a projection/view on the data.
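One way to read "a projection/view on the data" is that only direct containment is stored and the transitive relation is computed on demand. A small sketch, assuming a hypothetical `contains` mapping from container id to its direct members:

```python
def direct_members(contains, container):
    """Stored relation: what the container lists as its own members."""
    return set(contains.get(container, ()))


def all_descendants(contains, container):
    """Derived view: everything reachable by following containment."""
    seen = set()
    stack = list(contains.get(container, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(contains.get(node, ()))
    return seen


# ryey's example: A contains B, which contains C.
contains = {"A": ["B"], "B": ["C"]}
```

Under this reading A does not store C as a member, but a view over A can still include it.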

bendm: In UMA you have this notion of scopes. You could have authorization rules to apply to certain scopes (groups of resources).

acoburn: Flexibility might make things harder.
… I would probably have every resource point to some ACL resource.
… Implementers could choose to have multiple resources point to the same ACL.
… I wouldn't prohibit that. In general that might hinder extensibility.
… The simple approach might be the most logical, and other approaches may be allowed.

bendm: If a resource can be part of multiple containers, that might also resolve this.

acoburn: If the container information is part of the metadata of the resource
… imagine in the simplest server a resource is only part of a single container and you get an error when you change that metadata.
… In the more complex implementation you could move the data between containers.
… Even more complicated would be to have multiple contains relations in the metadata.
… This would open the door without requiring implementers to support this.

acoburn: Let's move to HTTP operations on these conceptual entities.
… For some we might not care because they are not addressable (e.g. credentials).
… So how do we approach that for each of these conceptual entities.
… Let's start with the data resources, these are the simplest.

laurens: Some discussion I still want to have is whether we want to have different protocol bindings (e.g. GraphQL, ...), but let's keep to REST for now.

acoburn: There is this rough structure in the protocol right now, where we might start with these conceptual entities and operations. And then provide a binding to a REST API, other bindings could then potentially follow.

acoburn: For data resources GET (read), DELETE (delete) and PUT (update) might make sense.

laurens: What about PATCH for updates?

acoburn: I wouldn't prohibit it, but it brings complexity.

laurens: It does again entail additional restrictions on the syntax and semantics of the data contained in the resource.

eBremer: Why don't we at least support a binary patch?

acoburn: We could rely on standard HTTP for this, e.g. the Accept-Patch header (https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/Accept-Patch)

laurens: Atomic updates may be a good use case for the patch operation. But this could be resolved by ETags and the If-None-Match HTTP header on a PUT (https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/If-None-Match)
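A minimal sketch of how conditional requests can make a PUT safe without PATCH. In standard HTTP, `If-None-Match: *` makes a PUT succeed only when nothing exists at the target yet, and its companion header `If-Match` (not mentioned in the discussion, added here for completeness) makes an update succeed only when the stored ETag is the one the client last saw. The `handle_put` function and the in-memory `store` are hypothetical.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive a strong ETag from the body (one possible choice among many)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def handle_put(store, path, body, if_match=None, if_none_match=None):
    """Return an HTTP status code for a PUT, honouring the two preconditions.

    `store` maps path -> (etag, body).
    """
    current = store.get(path)
    # If-None-Match: * means "only if no representation exists yet".
    if if_none_match == "*" and current is not None:
        return 412
    # If-Match: <etag> means "only if the stored ETag is the one I read".
    if if_match is not None and (current is None or current[0] != if_match):
        return 412
    store[path] = (make_etag(body), body)
    return 201 if current is None else 204
```

A client that reads a resource, edits it, and sends the PUT with the ETag it read gets a 412 instead of silently overwriting someone else's change.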

acoburn: There are three ways to create a resource in Solid: PUT, PATCH or POST.
… I think we should define one method to do that.

laurens: (chair hat off) We probably should not be imposing anything normative on the contents of data resources.

ryey: But then what about the Linked part in "Linked Web Storage"?

acoburn: This will probably shine in the metadata resources.
… What about POST for creating resources?

ericP: What about creating a resource in a certain place (PUT) vs letting the server define the location (POST)?

acoburn: What if the server does not allow the user to choose the location?

ericP: PUT is usually lower level than POST. POST delegates that authority to the server.

bendm: If we limit the writer's decision power to put a resource in a certain location, that might be very restrictive. But then we should allow alias creation for flexibility.

acoburn: PUT typically tells the server you must create a resource at a location. POST is more suggestive, it asks the server if it would create a resource at a specific location (which might not happen, and a different location may be chosen by the server).

acoburn: If containment is a metadata property. Imagine you had resources that looked more like Google Drive, where everything is an opaque string. And the resource containment is (potentially mutable) metadata. You could create a resource in a container (e.g. POST into that container) or at a global location of the storage (and then add metadata as a request header of where that resource should go).
… Theoretically they are the same, one of them looks more like what Solid currently is.
… If we want to do something different from Solid, we should have a good reason.

gibsonf: Suppose writing a resource to a storage is like writing to a named graph.
… In this case we would have no containers.

acoburn: There's no reason you couldn't do that with what is being discussed.
… There's a strong set of use cases to group resources in a certain way.
… The solution to this has historically been containment, and this should likely continue to be possible.

laurens: I think you should indeed try to retain compatibility with Solid, but you should put the authority on resource naming with the server allowing the user to make suggestions (e.g. through a Slug header). This way you can retain compatibility but also have clearer authority of resource identifications with the server.

acoburn: The target resource of a POST would be a container. The target resource of a PUT would be a data resource.

ryey: I cannot thus choose the location of data but just suggest it.

acoburn: This way you could also still support resources that have been created under slash semantics.

eBremer: Does the resource then have to exist?

acoburn: I wouldn't mandate this.
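The "client suggests, server decides" naming discussed above can be sketched as follows. The `mint_name` function and its sanitising rule are assumptions made for illustration; the only behaviour taken from the discussion is that a Slug is a hint and the server keeps authority over the final identifier.

```python
import re
import uuid


def mint_name(existing, slug=None):
    """Pick the name for a resource created by POST on a container.

    `existing` is the set of names already used in that container.
    The slug is honoured only if it is usable and not already taken.
    """
    if slug:
        candidate = re.sub(r"[^A-Za-z0-9._-]", "-", slug).strip("-.")
        if candidate and candidate not in existing:
            return candidate
    # Fall back to a server-chosen opaque name.
    return uuid.uuid4().hex
```

For example, a Slug of `notes` yields `notes` in an empty container, while the same Slug in a container that already has `notes` yields a server-generated name that the client learns from the response.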

acoburn: Let's move on to containers.

bendm: Have we concluded whether a PUT can create a resource or not?

acoburn: Do we agree that a POST on a container can be used to create a resource.

laurens: I think so.
… I don't think we should grandfather in slash semantics on new resources.

gibsonf: TrinPod does not have a filesystem. We have a triple which has the path information.

bendm: Then I do think you need an affordance for moving resources.

acoburn: On large file uploads, eBremer has looked into aspects of this.
… that could be layered on top of this approach in LWS.

acoburn: So, PUT to create resources... do we strike this or keep it?

acoburn: I think we strike this for now, and we can always put it back in later.

acoburn: Let's continue with containers now.
… Let's start with a READ on a container.
… We should consider pagination, content types, ...
… Do we use HTTP GET for this?

laurens: Agreed.

ryey: Is a slash required then?

acoburn: I haven't seen anything that requires a trailing slash yet.

bartb: How do you then know what a resource is by its URI?

laurens: You can't, a GET or HEAD response should tell you.
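A minimal sketch of how a client might learn what a resource is from a response instead of from its URI, assuming the server advertises the kind of resource in a `Link` header with `rel="type"` (as LDP and Solid do today). The `CONTAINER_TYPE` URI and the `is_container` helper are hypothetical, and the regular expression only handles the simple case where `rel` is the first parameter.

```python
import re

# Hypothetical type URI; the minutes do not define one.
CONTAINER_TYPE = "https://example.org/lws#Container"

# Matches `<target>; rel="value"`; other parameter orders are not handled.
LINK_RE = re.compile(r'<([^>]*)>\s*;\s*rel="?([^";,]+)"?')


def is_container(headers):
    """Return True if a Link header with rel="type" names the container type."""
    link = headers.get("Link", "")
    for target, rel in LINK_RE.findall(link):
        if "type" in rel.split() and target == CONTAINER_TYPE:
            return True
    return False
```

The `headers` argument stands for the headers of a GET or HEAD response, so the same check works without fetching the body.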

acoburn: What about DELETE operations?
… important here is recursive deletion.
… Currently we don't have a recursive delete in Solid.
… This is a developer nightmare, because of manual cleanup.

laurens: Why not have explicit semantics for recursive deletes, and do a non-recursive delete by default?

eBremer: WebDAV has a level header for this.
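A minimal sketch of the proposal above, assuming a toy storage modelled as a dict from each path to the list of its child paths: deletion is non-recursive by default and recursion must be requested explicitly. The `delete` function and `Conflict` exception are hypothetical. How the request would be expressed on the wire is left open; WebDAV's `Depth` header is one existing precedent.

```python
class Conflict(Exception):
    """Raised when a non-empty container is deleted without recursion."""


def delete(tree, path, recursive=False):
    """Delete `path` from `tree`, a dict mapping each path to its child paths."""
    children = tree[path]
    if children and not recursive:
        raise Conflict(f"{path} is not empty")
    # Remove descendants first; data resources have an empty child list.
    for child in list(children):
        delete(tree, child, recursive=True)
    del tree[path]
    # Detach the deleted path from whichever container listed it.
    for members in tree.values():
        if path in members:
            members.remove(path)
```

With this default, a client that forgets to ask for recursion gets an error instead of silently losing a whole subtree.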

acoburn: We'll have to come back to recursion.
… If containers are entirely server managed, there's no update operation.

laurens: And then for the creation I would propose a POST with a type link relation header defining it is a container.

Beau: Could you POST to a container that does not exist?

acoburn: No, you would get a 404.
… With the Solid specification you cannot do that.

acoburn: Now let's proceed to metadata.
… I think that create and delete are no-ops. Because the lifecycle of the data and metadata resource are tied.

bendm: What about server-managed vs. client-managed metadata? Have we considered that yet?

acoburn: Not yet.

ryey: Are we considering metadata over metadata?

acoburn: I wouldn't.

ryey: What about multiple metadata resources?

acoburn: I would specify any resource must have at least one metadata resource.
… eBremer has already looked into work on HTTP link sets (https://datatracker.ietf.org/doc/rfc9264/)

ryey: Does metadata have ACLs of its own?

acoburn: I would go so far as to say that they are governed by the ACLs of the data resource.

ryey: We have some use cases where this might be differently governed.

laurens: How is this today in Solid?

gibsonf: Today you can have separate ACLs on metadata.

laurens: You could have a metadata resource be a specific type of auxiliary resource governed by the same ACLs and lifecycle as the data or container resource.
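A minimal sketch of that idea, assuming a toy model in which the metadata is stored as part of the resource it describes, so both share one ACL and one lifecycle. The `Resource` class, its field names, and `read_metadata` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    body: bytes
    readers: set = field(default_factory=set)     # agents allowed to read
    metadata: dict = field(default_factory=dict)  # auxiliary metadata resource


def read_metadata(store, path, agent):
    """Read metadata; access is decided by the ACL of the described resource."""
    resource = store.get(path)
    if resource is None:
        return 404, None
    if agent not in resource.readers:
        return 403, None
    return 200, resource.metadata
```

Because the metadata lives inside the `Resource`, deleting the entry from `store` removes both at once, which matches the view that create and delete on metadata are no-ops.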

acoburn: For the READ of metadata, a GET operation?
… For the UPDATE operation it would make sense to use a PATCH operation.
… Another contender might be PUT but this gets complicated with server managed metadata.
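A minimal sketch of why PATCH fits here, assuming metadata is a flat set of properties of which some are server-managed: a partial update touches only the properties the client names, so the server-managed ones never have to be echoed back as they would with PUT. The property names, the `409` status, and `patch_metadata` are hypothetical; the "null removes a property" rule is loosely modelled on JSON Merge Patch.

```python
SERVER_MANAGED = {"modified", "size", "contains"}  # hypothetical property names


def patch_metadata(metadata, changes):
    """Apply a partial update; a value of None removes the property."""
    blocked = SERVER_MANAGED.intersection(changes)
    if blocked:
        # The client may not touch server-managed properties.
        return 409, metadata
    updated = dict(metadata)
    for key, value in changes.items():
        if value is None:
            updated.pop(key, None)
        else:
            updated[key] = value
    return 200, updated
```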

ryey: Containment is part of the metadata then?

eBremer: The RFC does not specify anything required about PATCH for link sets.

acoburn: For updates on the ACL, what do we do there? Both PUT and PATCH could work.

bendm: Let's park that for now.
… And what about ACLs on ACLs?

laurens: We have to watch out that we aren't getting ahead of ourselves in terms of authorization.

Scope of Query in LWS

laurens: Now we are at the last part of today. My opinion is that LWS shouldn't know the content format of the data. That means we shouldn't query on the content of the data. Or maybe we do want that? Worth discussing, including indexing, shape trees, etc.

bendm: I feel that if we, as a group, agreed to exclude patch based on content type, we should do the same for query.

laurens: yes indeed

bendm: so that's mainly a question of consensus
… I want that, but it doesn't necessarily have to be in the protocol.

acoburn: yes, supporting mechanism to allow that, but not mandate.

gibsonf1: we are now having issues with searching across thousands upon thousands of pods. we can do permission-scoped queries on the server.

bendm: so ACL is separate from search / query service, right?

bartb: so it's actually based on the types of data, right? E.g., for medication, it may relate to metadata as well

laurens: we may want to specify what is contained in the metadata tomorrow

acoburn: so for bendm's point, having "query over resource content" is out of the scope of the protocol itself, right?

bendm: the query doesn't need to be in the base protocol. from gibsonf1's case, is querying by types supported, even if not the data content query?

gibsonf1: no. In our view, metadata and data content are the same thing. If you prevent one, you prevent both.
… e.g. for genetic search, finding all patients matching certain criteria requires accessing the data content

acoburn: the question is whether *all* services should support querying over the data. The query can be efficient, but that's not the question now.

eBremer: I can have metadata about the shape, content, type, etc. of the data. Is that a metadata resource?

gibsonf1: you have enormous information about a patient, e.g. their age, etc. They are not metadata, at least I don't think so. You need to search across them.

eBremer: we need to delineate. ... DIECON (?) is a format, but that's different

gibsonf1: I don't care about the format. I'm concerned about the actual information contained. So, I care about content search, because that information will be needed in our case

acoburn: with my chair hat on, we are specifying things that apply to all servers. Some things can be add-ons, and we don't have to support those, due to resource and time constraints

gibsonf1: On that, I support. But I hope we can specify the capability of the search, so services can specify the capabilities, to improve interoperability. Something like a scheme.

acoburn: we didn't get to the "storage-level metadata" earlier. Let's assume we can get to that tomorrow.
… The Solid spec has some storage metadata, e.g. a controlled identifier document
… if we want to use that, it's basically an array of services. E.g. one particular service is of a certain type, e.g. query service. Then, you go to the URI of that service, to access that type of service.

laurens: yes, we could have a separate working group specifying that, if needed

acoburn: yes. and here, we only need to specify what services and where they are.
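A minimal sketch of the "array of services" idea, assuming the storage-level description is a JSON-like document with a `service` list whose entries carry a `type` and a `serviceEndpoint`. Those property names, the service type names, and all URLs below are assumptions for illustration; nothing here was agreed in the meeting.

```python
storage_description = {
    "id": "https://storage.example/",
    "service": [
        {"type": "QueryService", "serviceEndpoint": "https://storage.example/query"},
        {"type": "TypeIndexService", "serviceEndpoint": "https://storage.example/types"},
        {"type": "TypeIndexService", "serviceEndpoint": "https://index.example/federated"},
    ],
}


def find_services(description, service_type):
    """Return the endpoints of every service of the given type."""
    return [
        entry["serviceEndpoint"]
        for entry in description.get("service", [])
        if entry.get("type") == service_type
    ]
```

The lookup returns a list because a storage may point at several services of the same type; a client that asks for a type the storage does not offer simply gets an empty list.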

gibsonf1: how about we define a "must" list of services as well

laurens: yes. I don't like it when communities define extension points but then don't really use them in their own specifications
… I hope we can eat our own dog food. if we specify something, we use it. And, again, yes, fully agree

acoburn: how about defining as "can a service live without a part of these services"?

laurens: I would counter that. we are talking about query and data. i wouldn't have a generic "data discovery" service. But we should have an authorization service, etc.

acoburn: I'm not saying only these three. But these three are the main ones. Today, there are a lot of uses of Type Indexes (for RDF), especially client-managed ones, which are likely to be compromised. It would be great to move them to the server to make them authoritative.
… I'm saying Type Indexes because it's typical.
… where does it belong?

laurens: it may be on the boundary between data and authorization. Let's reserve for tomorrow.
… the word "authorization server" is overloaded, e.g. OIDC, etc. Better to give it another name.

bendm: let's do it tomorrow.

acoburn: one more on this. assuming a legitimate type is server managed. you can legitimately say "give me the list of containers / resources"
… because you have a token and server-managed. You can easily say that.

bendm: now TI in Solid is on RDF type, and pretty public.

acoburn: If type is included... if the metadata resource contains some server-managed type (e.g. it's a container), and some user-managed type (e.g. a URI representing a photo album), you can look that up on a given service.
… the amount of data that has such types is rather limited, so it is not a significant burden for the server.

bendm: if user managed types, will there be a jumble of types, e.g. some say "photo", some say "image"

laurens: but that at least solves the issue of what people use directory layout for

bendm: yes, but just to worry about the potential of having a lot of such cases of "photo" or "image"

acoburn: that lets different apps find the resource. It supports interop in fact.

gibsonf1: we have server using a general ontology type... it's extremely powerful to search based on that

bendm: I can imagine that, definitely
… do we need to support server types that you must support? And MIME types? Or is that too much detail?

acoburn: I think we can start with that.

laurens: we may get to that tomorrow

bendm: this would allow you to create services to do what gibsonf1 was discussing, right?
… you have an index, and you can pull that within your own query service, right?
… Is it on service level, or storage level?

acoburn: well... implementation is at the service level, definitely. the pointer to those services should be at the storage / pod level.
… you may want to support multiple services of the same type.

bendm: I'm worried the service provider will force you to use a particular service for a particular type

bendm: the CID is in service level, right?

acoburn: I would imagine that. But we may allow flexibility
… e.g. one being storage type index, one being federated type index

laurens: e.g. metadata is RDF LD. TI could be everything with predicate `rdf:type`. That's a simple solution.
… simple solutions would try all things. Complicated solutions may consider more. But not that complicated in the end, because they are governed by the same ACLs
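A minimal sketch of the "everything with predicate `rdf:type`" approach, assuming the metadata is available as plain (subject, predicate, object) tuples of IRIs. The `build_type_index` helper is hypothetical; a real server would work from its own RDF store.

```python
from collections import defaultdict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def build_type_index(triples):
    """Map each type IRI to the set of resources declared with that type."""
    index = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == RDF_TYPE:
            index[obj].add(subject)
    return index
```

Triples with any other predicate are ignored, which is what keeps this the simple variant.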

bendm: does that have any effect on writing side?
… you can imagine services for data validation. Where should I post data to, if TI is considered?

laurens: you can have additional metadata triples, as the simplest solution.

bendm: or you want the service to do server-managed metadata

acoburn: there are some existing specs we might want to build on

jeswr: e.g. Inrupt has QPF endpoint, with named graph. For metadata, is there something we can easily access the metadata? E.g. as named graphs?
… for query over resource metadata

acoburn: we may want to clarify that metadata query here is specifically about the metadata of the same data.

<bendm> some recent blog post of my colleague Pieter Colpaert on graphs and semantics etc https://pietercolpaert.be/linkeddata/2025/09/30/named-graphs

jeswr: still for QPF, if I do a xxxxxx query, what would be the result?

acoburn: ESS uses named graphs because the subject doesn't have to be in the same location as the resource itself, which can lead to overrides. Here, metadata is about a particular resource, which likely doesn't have the same issue -- you can simply rely on the subject.

jeswr: so all metadata is in default graph response from servers?

acoburn: we are discussing something that never existed before. we are discussing how it should be implemented

jeswr: QPF has an endpoint of named graphs. When you think of all RDF graphs in a Pod, it's just named graphs on the Web. I'm just trying to think what the meaning of this is.

gibsonf1: I see each pod is a graph, with resources potentially somewhere else. So putting named graphs into the graph doesn't make sense, if the pod is already a named graph

bendm: if you put it in that way, it may be quite problematic...? not something we can figure out in the next 15 min. Maybe worth discussing that later with use cases.

acoburn: it's easier to think from servers if you have VC, then you have named graph inside named graph, which is a violation. Or you have to figure out how that works.
… If you want to index that, you need to figure out how to do it.

bendm: before concluding, gibsonf1, does it allow querying over content of resources already, or do you need more?

gibsonf1: on search, we already support billions of triples. currently you specify which service you search, and then the permission check. we want to have a service that first checks who you are. But maybe for SPARQL, it's not a good idea. At least for TI, we don't do it, we just want to know what types are there, so we can quickly find them.

acoburn: there are types of queries that cannot be satisfied with TI. I'm fine with that
… we want to figure out today which of these should be required

bendm: I think you can do these queries with TI
… gibsonf1, do we need more?

gibsonf1: Apart from TI (or, type search), we need at least paging
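A minimal sketch of paging over type-search results, assuming a simple offset cursor. The `page` helper, the default page size, and the cursor style are hypothetical; the group did not pick a paging mechanism.

```python
def page(items, cursor=0, size=2):
    """Return one page of `items` plus the cursor for the next page, if any."""
    chunk = items[cursor:cursor + size]
    # A cursor of None tells the client there is nothing more to fetch.
    next_cursor = cursor + size if cursor + size < len(items) else None
    return chunk, next_cursor
```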

laurens: I would call it "metadata property index" or something else, rather than type index.

acoburn: sure. using TI is just for easy understanding.
… So TI is a must, querying over resource content is optional, and query over resource metadata is in limbo?

jeswr: I don't think TI should be must; we shall develop more docs for that

bendm: cccc ... we need something else, but something to start building TI is a must

gibsonf1: something to build predicate type?

ericP: TI is what the world can see?

acoburn: no, only what a particular agent can see
… if you have access to only one container, with different types, you'll get 0 results

ericP: so everyone enters will get a different result?

laurens: yes. It's easier than allowing arbitrary queries with ACLs
… at least
… you can apply cache, etc
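A minimal sketch of a per-agent view of the type index, assuming the index maps each type to a set of resources and that the server already knows which resources each agent may read. The `visible_entries` helper and the shape of `readable` are hypothetical.

```python
def visible_entries(index, readable, agent):
    """Filter a type index down to the resources `agent` may read."""
    allowed = readable.get(agent, set())
    view = {}
    for type_iri, resources in index.items():
        hits = resources & allowed
        if hits:
            # Types with no readable resource are left out entirely.
            view[type_iri] = hits
    return view
```

Two agents asking the same question can get different answers, and an agent with no access sees an empty index.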

acoburn: so do we agree TI is a must for ACL rules?
… any objections?
… no
… query over resource content is out of scope, but not prohibited

yes
… yes

ericP: clarification: is that for a specific API for metadata, or general?

ericP: about what's expressed in the query, for querying over metadata

acoburn: I want to be high-level, rather than details. we want to know what should be included, and later the mechanism

acoburn: for querying over resource metadata, skip?

laurens: put it as "may"

ericP: do you mean for the off-resource-server authorization?

gibsonf1: e.g. someone has 50m things, here is the place you can search, how to do that?

ericP: you are exploiting the server capabilities

gibsonf1: how you do that is on the implementation side. You can't have authorization and search being completely separate, otherwise it is too inefficient.

acoburn: we don't want to discuss ACL today for TI

Google drive with slides & pictures: https://tinyurl.com/2mx262z8/

08 October 2025