W3C

– DRAFT –
Linked Web Storage WG

27 October 2025

Attendees

Present
acoburn, bartb, dmitriz, eBremer, ericP, gibsonf1, jeswr, TallTed
Regrets
-
Chair
ericP
Scribe
jeswr

Meeting minutes

Introductions & announcements

acoburn: This week is daylight saving time in Europe. Next week we will be back to normal. Please use the W3C calendar to get the canonical time of meetings.

acoburn: We plan to spend the next 2 weeks discussing resource metadata, so that we can then draft spec text.
… We will then do 1 week of storage metadata, and then containment.
… By January we hope to be able to draft language for metadata and containment.
… Before we can specify the format and content of metadata, we need to agree on the categories of data we want to describe.

gibsonf1: How do we define metadata as compared to data?

acoburn: We have data resources; these could be in any format (RDF, JSON, XML). Binary resources, e.g. JPEG, cannot carry a self-description.
… A metadata resource is a resource attached to a data resource which describes it.

gibsonf1: So then the definition of a resource is a file?

acoburn: Yes

gibsonf1: I am confused because we can have resources that aren't files, e.g. request/response.

TallTed: My analogy is - if you have a book the stuff on the pages is the data. The title, copyright, version etc. are metadata.
… this is messy. My advice is not to worry about it too much until things are concrete.
… Think about databases: the query is not part of the result set, but the query can be put beside it, and that would be metadata.

<Zakim> gibsonf, you wanted to ask about definition of metadata

TallTed: As acoburn was saying, with a JPEG the data is what lets you display the image. The rest is the sidecar, e.g. where the picture was taken, the time, what device, etc.

<Zakim> ericP, you wanted to propose thinking of metadata as a combo of managed data and stuff that the user has sequestered as "metadata"

ericP: Do we also consider some metadata to be server managed, and other metadata to be user managed.

acoburn: Yes

acoburn: I have categories of metadata here ???. There are also types of metadata that we may want to allow for, but not require, e.g. Memento versioning.
… I will go through them quickly, then we can have a discussion about whether these categories need to be modified.
… First, should server-managed data be in a container or a resource? We have both Solid and Fedora as input documents and ???

<TallTed> "Categories" are "types"!

<TallTed> This list is confusing, because there are multiple layers mushed together, and same/VERY-similar line items repeat.

acoburn: Second is user-managed types in addition to server-managed types, e.g. for type indexes.
… Third is storage description resources. The Solid spec doesn't say what should be in this resource, just that there must be only one. Fedora does not specifically refer to a storage resource, but it does have a describedBy relation, which comes from LDP.

<Zakim> gibsonf, you wanted to ask about metadata location

gibsonf1: Why are we adding locations to find information about a request, rather than letting the server manage that, with the client simply saying "give me RDF about this resource" to the server?

acoburn: This is fine unless you want metadata in the same format as the base resource.

TallTed: This is where having a good understanding of HTTP is important. In HTTP the client asks for a resource, which is the URL they identify. They can also include the media types they want to get back, with different levels of preference. The server then returns some representation in a form that the client also requested, but not necessarily.
… The server is free to do anything that it wants. Servers can also have an internal quality rating, which mixes the client's quality ratings with the server's own quality ratings of resources.
… It may be text, or a PNG rather than a JPEG. Again, we are in a place of "don't worry about this until it comes up". It is not really easy to answer or understand until we have some real examples to discuss. There are so many possible examples that it doesn't work to discuss them on this call, because everyone has their own pet examples.
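The point that the server weighs the client's preferences against its own can be sketched as follows. This is a minimal illustration, assuming the server multiplies the client's q-value by its own per-representation quality score; HTTP leaves the actual selection algorithm to the server, and `negotiate` and `server_qs` are hypothetical names.

```python
from typing import Optional


def parse_accept(header: str) -> dict[str, float]:
    """Map each media range in an Accept header to its q-value (default 1.0)."""
    prefs: dict[str, float] = {}
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        prefs[media_range] = q
    return prefs


def client_q(prefs: dict[str, float], media_type: str) -> float:
    """Most specific match wins: exact type, then type/*, then */*."""
    major = media_type.split("/")[0]
    for key in (media_type, f"{major}/*", "*/*"):
        if key in prefs:
            return prefs[key]
    return 0.0


def negotiate(accept: str, server_qs: dict[str, float]) -> Optional[str]:
    """Pick the representation with the highest client q times server quality."""
    prefs = parse_accept(accept)
    best, best_score = None, 0.0
    for media_type, quality in server_qs.items():
        score = client_q(prefs, media_type) * quality
        if score > best_score:
            best, best_score = media_type, score
    return best
```

For example, `negotiate("image/jpeg, image/*;q=0.5", {"image/jpeg": 0.3, "image/png": 1.0})` returns `"image/png"`: JPEG scores 1.0 × 0.3 = 0.3 and PNG scores 0.5 × 1.0 = 0.5, so a client that prefers JPEG can still receive a PNG.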

gibsonf1: So if the request somehow says "I want to request this resource, BUT I want metadata", then that would resolve the issue?

TallTed: Yes, but everyone has their own definition of "what is metadata for this resource". So it makes more sense to ask for a description of the resource.
… e.g. if you ask for a text type on an image, you might get a text description of the image. These are all quote/unquote metadata, but they are also descriptions of the resource.
… We just can't cover this all in hypotheticals - it is untenable.

acoburn: This metadata discussion comes from LDP. There are 2 resources, a and b, where a is describedBy b. The implication is that they are different resources, one described by the other.
… Another category coming from both Solid and Fedora is having reference to an ACL which may be on the same or different server.
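The describedBy relation and the ACL reference are typically advertised as HTTP Link headers, which lets a client discover the metadata resource instead of guessing its URL. A minimal sketch, assuming a simplified Link parser (it does not handle commas or semicolons inside quoted parameters, which full RFC 8288 parsing must) and hypothetical example URLs:

```python
import re
from typing import Optional
from urllib.parse import urljoin

# Simplified: <target> followed by zero or more ;name=value parameters.
LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,]+)*)')


def parse_link_header(header: str) -> list[tuple[str, dict[str, str]]]:
    """Return (target, attributes) pairs from a Link header value."""
    links = []
    for match in LINK_RE.finditer(header):
        target, params = match.group(1), match.group(2)
        attrs = {}
        for param in params.split(";"):
            name, sep, value = param.strip().partition("=")
            if sep:  # skips the empty chunk before the first ';'
                attrs[name.strip().lower()] = value.strip().strip('"')
        links.append((target, attrs))
    return links


def find_link(base_url: str, header: str, rel: str) -> Optional[str]:
    """Return the absolute URL of the first link whose rel list contains `rel`."""
    for target, attrs in parse_link_header(header):
        if rel.lower() in attrs.get("rel", "").lower().split():
            return urljoin(base_url, target)
    return None
```

With `header = '<cat.jpg.meta>; rel="describedby", </acl/cat>; rel="acl"'` and base URL `https://example.org/photos/cat.jpg`, `find_link(..., "describedby")` resolves to `https://example.org/photos/cat.jpg.meta` and `find_link(..., "acl")` to `https://example.org/acl/cat`; no path convention is needed on the client side.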

<TallTed> (I'm guessing that this "Fedora" is not the same as "Fedora Linux"?)

acoburn: Another category is a reference to the container in which a resource currently exists.
… A few come from HTTP (media type, etc.), which would probably be included in standard HTTP headers.
… Fedora is NOT Fedora Linux. Fedora is https://fedora.info/spec/ which is one of the input documents in our charter.

<ericP> Fedora

acoburn: Are there any categories which are missing or shouldn't be here?

<ericP> could s/Categories/Sources/

TallTed: The current list is confusing. The first two categories are Types; other line items appear to intersect with each other. I would break this into sub-lists starting with server-managed and user-managed. These will be special cases in and of themselves. For example, in the case of size, server-managed might be down to the type; user-managed might be big and small.
… In the case of creation time, is it when a file was actually created, or when it was copied onto the linked web server?
… In the case of author, is it the client, the legal entity that created it, etc.?

acoburn: Yes I will put into server-managed vs. user-managed categories. Then there is ambiguity, e.g. size is often a function of the payload that the user has created. So they don't control it; but they quasi do.

dmitriz: The other categories I would love to see are (1) replication of auxiliary documents, (2) encryption, (3) linking to an access log.

acoburn: Do you want this on all implementations, or as something a particular implementation would add in a specific way?

dmitriz: The latter, so in this document's parlance they would be extensions.

<Zakim> ericP, you wanted to ask for replication clarification

ericP: Is chain of custody something that would be replicated from place to place?

dmitriz: The provenance of authorship would be one thing. Here I am more interested in where copies of this file exist, e.g. on my home server.
… we haven't dealt with replication in Solid much before, but it has always been a missing piece.

acoburn: Another point I wanted to discuss is the relationship between metadata resources and headers.
… e.g. for size, should that come back as Content-Length in the HTTP headers?

<dmitriz> here's an example of Replication related settings for a replicating db like PouchDB: https://pouchdb.com/guides/replication.html

acoburn: For these, we have often used Link headers in Solid and the Fedora project.

<dmitriz> think of it, at base level, as a git-like list of "remotes"

<ericP> +1 to simplicity of single source of truth

acoburn: A minimal approach is that anything in metadata resources is not included in HTTP headers; you just link to the external resource, and all metadata is included there. Both Solid and the Fedora project do that. Neither Solid nor Fedora has much to say about its format, how you modify it, etc.

<TallTed> possibly surprising weaknesses in HTTP -- content negotiation lets the client request any media type, and the server respond with any media type, but content size is always handled in bytes, notably including for paging, which is very unhelpful when working with RDF or CSV or TSV or various other "data" media types

acoburn: A unified approach is that anything you put in a metadata resource is added to the HTTP headers. The major con is that you are then constrained in the values that you can put in metadata resources.
… The split approach is that one metadata resource is used for content that will go into HTTP headers, and another metadata resource is for content that will _not_ go into HTTP headers.
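The three options differ mainly in which metadata ends up in HTTP headers. A minimal sketch of the split approach, assuming a hypothetical mapping from metadata keys to header fields: passing an empty mapping gives the minimal approach (everything stays behind the describedby link), and the `isinstance` check stands in for the unified approach's con that header values are constrained.

```python
from typing import Any

# Hypothetical mapping from metadata keys to HTTP response header fields.
HEADER_FIELDS = {"media-type": "Content-Type", "modified": "Last-Modified"}


def split_metadata(metadata: dict[str, Any], describedby_url: str,
                   header_fields: dict[str, str] = HEADER_FIELDS
                   ) -> tuple[dict[str, str], dict[str, Any]]:
    """Split metadata into HTTP headers and a separate metadata document."""
    headers: dict[str, str] = {}
    document: dict[str, Any] = {}
    for key, value in metadata.items():
        field = header_fields.get(key)
        # In this sketch only plain strings can be sent as header values.
        if field is not None and isinstance(value, str):
            headers[field] = value
        else:
            document[key] = value
    # Everything else is reached through a link to the metadata resource.
    headers["Link"] = f'<{describedby_url}>; rel="describedby"'
    return headers, document
```

Structured values such as a list of tags fall through to the metadata document, which is one reason a purely unified approach is restrictive.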

TallTed: Another challenge comes from a weakness in HTTP. The client can request any media type, and servers can respond with any media type, but size is only in bytes; therefore you cannot say "give me record 16" or "row 16".
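To make this concrete: HTTP range requests are expressed in bytes (the `bytes` range unit in RFC 9110), so to fetch "row 16" of a CSV file a client must already know where that row starts and ends, which in practice means it has already read the data. The hypothetical helper below computes such a byte range from the full content, assuming newline-delimited rows with no embedded newlines.

```python
def byte_range_for_row(data: bytes, row: int) -> str:
    """Return a Range header value covering one newline-delimited row.

    Rows are 0-based; assumes no quoted newlines inside CSV fields.
    Raises ValueError if the row does not exist.
    """
    start = 0
    for _ in range(row):
        # bytes.index raises ValueError if there are too few rows.
        start = data.index(b"\n", start) + 1
    if start >= len(data):
        raise ValueError(f"row {row} not found")
    end = data.find(b"\n", start)
    if end == -1:
        end = len(data) - 1  # last row has no trailing newline
    return f"bytes={start}-{end}"  # byte positions are inclusive
```

For `b"id,name\n1,alice\n2,bob\n"`, row 2 maps to `bytes=16-21`, which covers `2,bob\n`; nothing in HTTP lets the client ask for that row by number.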

<ericP> yeah, but HTTP's content-length is a transfer parameter

<Zakim> gibsonf, you wanted to comment on location

gibsonf1: One option is to specify a location: give me metadata on this thing, and then ???

TallTed: The deployment needs to know what to give you back when you ask for metadata on a resource, but you are not going to know what the deployment's setup was. The server will say "sure, here is the metadata", and you will say "that is a bunch of gobbledygook; I just wanted size, media type and something else."

gibsonf1: But here (in the LWS group), are we not defining exactly what metadata is?

TallTed: Here we have a shortlist of what metadata is. But someone else will then say "no, that causes limitations - here, here and here."
… We need to not be so boxed in that other people cannot extend it within the bounds of LWS.
… There is a lot of challenging stuff here without concrete examples. These concrete examples would take a lot of time to pull together. Some are in the UC&Rs, but we have not gotten very granular on many of those.

<Zakim> acoburn, you wanted to talk about the requirement for self-descriptive and discoverable APIs

TallTed: We would probably see resource size. Likely in bytes because that is what HTTP supports, but not much beyond that.

acoburn: I want to bring us back to self-descriptive and discoverable APIs.
… If you specify a particular location, then someone needs to read the specification to know what the location is. This goes against the principle of having self-descriptive APIs within the specification.

<gibsonf1> +1

acoburn: I think we should be really careful about specifying locations.

<gibsonf1> +1 on having metadata location in the header

<eBremer> https://datatracker.ietf.org/doc/html/draft-ietf-appsawg-uri-get-off-my-lawn-05

acoburn: There is an IETF specification about being very careful when specifying resource URIs, and instead using discovery.

<dmitriz> speaking of IETF specifications, there's a decent one on Linksets (which are basically a format to express all these auxiliary resources)

<dmitriz> https://www.rfc-editor.org/rfc/rfc9264.html

<ericP> "URI Design and Ownership"

acoburn: Are there thoughts on how rich this data should be and how we should link to it?
… One option is that we have a linkset resource which contains metadata that is going to influence what comes back on a GET request.
… There could be a linkset which links to the location where that linkset is managed.
… In addition, you could have an RDF or JSON resource which is pointed to by describedBy.
… The Solid specification currently prevents the situation where there are two describedBy resources.
… What does this group think about this kind of structure?

<dmitriz> (and fwiw, the Linkset RFC is json-ld enabled)

<eBremer> dmitriz - yes see: https://www.rfc-editor.org/rfc/rfc9264.html#name-the-linkset-relation-type-f

<eBremer> https://www.w3.org/TR/json-ld/#interpreting-json-as-json-ld

<Zakim> gibsonf, you wanted to say for link header example, it's super complex

acoburn: To explain linksets [RFC9264]: they enable you to go between HTTP Link headers and a JSON document.
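A rough sketch of that mapping: RFC 9264 defines an `application/linkset+json` format in which each link context has an `anchor` member and one array of target objects per relation type. This covers only the basic shape (it ignores the special handling RFC 9264 gives to attributes such as `hreflang` and `title*`), and the URLs are hypothetical.

```python
import json
from typing import Iterable


def to_linkset(anchor: str,
               links: Iterable[tuple[str, str, dict[str, str]]]) -> dict:
    """Group (href, rel, attributes) triples for one anchor into linkset JSON."""
    context: dict = {"anchor": anchor}
    for href, rel, attrs in links:
        # Each relation type maps to an array of target objects.
        context.setdefault(rel, []).append({"href": href, **attrs})
    return {"linkset": [context]}


if __name__ == "__main__":
    doc = to_linkset(
        "https://example.org/photos/cat.jpg",
        [("https://example.org/photos/cat.jpg.meta", "describedby",
          {"type": "text/turtle"})],
    )
    print(json.dumps(doc, indent=2))
```

The same links that would appear in a Link header become plain JSON, which is what makes the format attractive for a metadata resource that also drives response headers.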

gibsonf1: It seems to me that if we are talking about metadata, and assume the metadata is in RDF so you can do anything you want with it, then we would be solving this in the simplest and most interoperable way.
… whereas if we pick one of these things like RFC9264, then it gets complex.

ericP: It is challenging to have a document that is both server- and user-managed when you run into conflicts. Keeping them separate is useful. If you want to point to the user-managed document, then we are back in the space of how we point to it.

gibsonf1: Here you assume that it is a file; I am not sure why you make that assumption. It could be coming from a database etc.

ericP: If it is user-managed metadata, then they have rights on that as well.

gibsonf1: I don't see that as an issue either. Wouldn't the user just do a write request with whatever predicates we define, and the server knows how to handle it?

ericP: If a subset is server managed, does this then change the notion of the data that the server has got?

gibsonf1: From my implementor perspective, someone requests the metadata on a resource, and the server responds with both server- and user-managed metadata.
… There is no issue from the implementation side. You just send all of the metadata.
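The single-response model and the concern about conflicts between server- and user-managed metadata can coexist if the two sets are kept apart internally and merged only on read. A minimal sketch, assuming a hypothetical fixed set of server-managed keys: user writes to those keys are rejected, and server values win when the views are combined.

```python
from typing import Any

# Hypothetical: keys that only the server may set.
SERVER_MANAGED = frozenset({"size", "modified", "contained-in"})


class ServerManagedConflict(ValueError):
    """Raised when a client tries to write a server-managed key."""


def apply_user_update(user_meta: dict[str, Any],
                      update: dict[str, Any]) -> dict[str, Any]:
    """Return new user metadata, refusing writes to server-managed keys."""
    forbidden = SERVER_MANAGED.intersection(update)
    if forbidden:
        raise ServerManagedConflict(sorted(forbidden))
    return {**user_meta, **update}


def combined_view(server_meta: dict[str, Any],
                  user_meta: dict[str, Any]) -> dict[str, Any]:
    """Merge both sets for a read; later entries win, so server values override."""
    return {**user_meta, **server_meta}
```

Whether a conflicting write should be rejected or silently ignored is a policy choice; rejecting it makes the conflict visible to the client.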

acoburn: Let's wrap it here for the week.

<ericP> ADJOURNED

Minutes manually created (not a transcript), formatted by scribe.perl version 246 (Wed Oct 1 15:02:24 2025 UTC).

Diagnostics

All speakers: acoburn, dmitriz, ericP, gibsonf1, TallTed

Active on IRC: acoburn, bartb, dmitriz, eBremer, ericP, gibsonf1, jeswr, TallTed