
RDF Messages Task Force/Meeting 2026-02-13

From RDF Stream Processing Community Group

Attendees

  • Piotr Sowiński
  • Anastasiya Danilenka
  • Anh Le Tuan
  • Tobias Schwarzinger
  • Jean-Paul Calbimonte
  • Robin Keskisärkkä
  • Rafael Martin Lesmes

Agenda

  • Fill in attendance list
  • Agenda overview
  • Pick a scribe
  • https://github.com/w3c-cg/rsp/issues/9
    • Review of the changes introduced
    • How to rephrase the definition of an RDF Message?
    • -> after rephrasing, close issue
  • https://github.com/w3c-cg/rsp/issues/10
    • Review of the changes introduced
    • -> if OK, close issue
  • https://github.com/w3c-cg/rsp/issues/6
    • Review of the changes introduced
    • -> if OK, close issue
  • https://github.com/w3c-cg/rsp/issues/4
    • Backstory – current status on the side of JSON-LD WG
    • Review of the current draft in RDF Messages
    • -> decide what’s next
  • https://github.com/w3c-cg/rsp/issues/1
    • Brief summary of the idea for RDF Message Stream profiles
    • Current candidates: LDES, SOSA/SSN, SAREF, ActivityStreams
    • What should be within the scope of profiles?
      • Message shapes (using SHACL) – required, optional?
      • Specifying the timestamp property, if present
      • Procedure for enveloping an RDF dataset into a compliant message, if given also a timestamp – required, optional? Note that enveloping is required to “lift” streams of raw RDF data into something importable into a triple store without losing information.
      • Whether or not the first, second, etc. messages have any special meaning.
      • Scope to which messages should be asserted (e.g., for message N, we always assert it together with message 1, which provides the context).
    • Technical choices
      • How should the profiles be advertised, i.e., how do we know which profile is used in a stream? Define something for HTTP and leave this open for other implementations?
      • How should the timestamp property be specified? Leave it open? SHACL shape? SPARQL fragment? (An illustrative SHACL sketch follows this agenda list.)
    • -> gather feedback to write the first draft of the section on profiles
  • Any other matters
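
For reference on the profile questions above, here is a minimal, purely illustrative sketch of a profile-defined message shape with a timestamp constraint, written in SHACL (Turtle syntax). The prefix ex:, the shape name, and the choice of sosa:resultTime as the timestamp property are hypothetical examples, not anything the task force has agreed on.

    @prefix sh:   <http://www.w3.org/ns/shacl#> .
    @prefix sosa: <http://www.w3.org/ns/sosa/> .
    @prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
    @prefix ex:   <http://example.org/profile#> .

    # Hypothetical shape: every sosa:Observation in a message must carry
    # exactly one xsd:dateTime timestamp via sosa:resultTime.
    ex:MessageTimestampShape
        a sh:NodeShape ;
        sh:targetClass sosa:Observation ;
        sh:property [
            sh:path sosa:resultTime ;
            sh:datatype xsd:dateTime ;
            sh:minCount 1 ;
            sh:maxCount 1 ;
        ] .

A profile could mark such a shape as required or optional, and could additionally name the timestamp property explicitly so that consumers know how to order or window messages.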

Action points

  • Piotr: Update list of authors in the spec (+ Jean-Paul, + Robin, + Tobias)
  • Piotr: File PRs to fix the things that were discussed
  • Piotr: Organize the next TF meeting in a few weeks. To discuss:
    • Revisit the new RDF Message Stream definitions after the last iteration
    • New syntax for Turtle, N-Triples, N-Quads
    • YAML-LD / JSON-LD – are we ready to talk to the JSON-LD WG?
    • RDF Message Stream profiles
  • Text:
    • Message scope – point to profiles as defining the possible interpretation scope
    • Invert the assumption (we do not assume they are in the same world)
    • More – see the minutes

Minutes

Scribe: Anastasiya Danilenka

  • Piotr Sowiński: go through issues on GH, see if we can close any during this call

RDF Message definition [1]

  • Piotr Sowiński: issue #9: the definition of an RDF message is unclear. Jean-Paul pointed out that the message scope is unclear. Piotr restructured how the definition works; we now have a separate section for scope. It can be found in the spec. Is it good enough?
  • Jean-Paul Calbimonte: it is OK. We assume that what is asserted in one message is not asserted in another. It is really up to the consumer how to interpret messages; that is the default behaviour.
  • Piotr Sowiński: maybe we can say interpretation depends on the profile?
  • Jean-Paul Calbimonte: it is just a detail, not something major.
  • Robin Keskisärkkä: we do not assume that they (the messages) do not exist in the same world. It is okay to assume either. (See the message scope action point.)
  • Jean-Paul Calbimonte: it is not wrong, and an interesting thing to consider.
  • Robin Keskisärkkä: always keep blank nodes in mind. If messages are in the same world, blank nodes are affected (a small illustrative example follows this section's minutes).
  • Jean-Paul Calbimonte: by the definition of RDF Dataset, you cannot assume anything about blank nodes, even if they describe the same thing. We shouldn't make any assumptions; it is up to the consumer.
  • Jean-Paul Calbimonte: we do not have anything new; it is like an alias for an RDF dataset.
  • Piotr Sowiński: integrated Jean-Paul’s note on size.
  • Robin Keskisärkkä: I like the relative size definition.
  • Jean-Paul Calbimonte: messages may be large in terms of bytes, but small in triples. The new phrasing is OK with that.
  • Tobias Schwarzinger: why do we need that? To tell people that messages can be small?
  • Piotr Sowiński: context, motivation. It is not a necessary note, but nice to have.
  • Piotr Sowiński: why can't we refer to a message? The spec now says we do not provide any way to refer to an RDF message; you can instead refer to resources.
  • Jean-Paul Calbimonte: the previous wording could have been interpreted as forbidding references to RDF messages.
  • Piotr Sowiński: the spec states that an RDF message is an RDF dataset. I do not know how to rewrite that to address the comment.
  • Jean-Paul Calbimonte: just say that the RDF message is an RDF dataset. We can leave it as it is now.
  • Tobias Schwarzinger: just repeat the intention from the issue. Currently, it might be too short to grasp the idea for first-time readers.
  • Piotr Sowiński: rephrase the second sentence, but make sure the two do not contradict each other (the RDF message does not contain an RDF dataset; it is one).
  • Piotr Sowiński: Will rewrite the mentioned parts and will clarify the RDF Dataset part.
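
To make the blank-node point above concrete, here is a minimal illustrative example (made up, not from the spec): two consecutive messages in a log that happen to use the same blank node label.

    # Message 1 – a reading described via a blank node
    _:b0 <http://example.org/temperature> "21.5" .

    # Message 2 – another reading, coincidentally using the same label
    _:b0 <http://example.org/temperature> "22.1" .

If the consumer treats each message as its own scope, the two occurrences of _:b0 denote different resources; a consumer that merges the messages into one world must rename blank nodes per message (e.g., to _:m1_b0 and _:m2_b0), which is the parser behaviour mentioned later in the serialization discussion.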

RDF Message Stream definition [2]

  • Piotr Sowiński: clarifying RDF Message stream.
  • Piotr Sowiński: Removed mentions of instances. Rewrote a few things. Streams are close to academic definitions. Cleaned up sections.
  • Piotr Sowiński: Section 1.3: + a note on any streaming protocol and implications for the RDF message stream. Flow control can also be anything you like.
  • Robin Keskisärkkä: on removing instances: one consumer – one RDF message stream. The “specific consumer” wording is concerning; it sounds like only one consumer can consume the stream.
  • Piotr Sowiński: we should clarify it. Stream is in motion, it is produced and consumed in motion. While the consumer is reading that – we have a stream. Maybe we can note Kafka Streams and many consumers.
  • Jean-Paul Calbimonte: that’s where the instance idea probably came from. There is a stream living there that can be consumed by different entities. It is certainly a different way of defining streams than we had initially.
  • Jean-Paul Calbimonte: we say a stream carries messages from one specific producer to a specific consumer. We need to check that.
  • Piotr Sowiński: Kafka is a good example. You can read a stream from half an hour ago. Kafka Broker is the producer then.
  • Jean-Paul Calbimonte: maybe that was the intention behind the instances.
  • Piotr Sowiński: can we just have instances and not the overarching idea about the stream?
  • Jean-Paul Calbimonte: in the issue, the intent was to get rid of the “instance”.
  • Anh Le Tuan: what will you require for the stream? Do you want a mechanism to pull the data back? E.g., you go to Kafka and get data back, and they send data to you. There are two ways to get data – an endpoint can provide you a stream of data, and then you subscribe.
  • Piotr Sowiński: producer exposes the endpoint. When a consumer comes – we create the stream ad-hoc. If we limit discussion to one stream that happens right now – it is simple.
  • Anh Le Tuan: We should not specify the consumer. There might be one/many/no consumers.
  • Robin Keskisärkkä: can an RDF stream exist before it started producing/delivering messages?
  • Piotr Sowiński: it doesn’t, unless there is a producer and a consumer, so it is actually being transmitted.
  • Robin Keskisärkkä: a stream CAN become available. If you have signed it, authenticated, and set up the correct communication protocol, then the stream definition comes into force. Or do an abstraction on top, because protocols also have specifications for that.
  • Piotr Sowiński: in MQTT you can isolate that. Say sensors report to a message broker. But you have 3 consumers attached to the broker. So consumers may see 3 different streams, with differing orders or possibly missing messages.
  • Anh Le Tuan: is it worth knowing who consumed the stream? You might need to know that you delivered the message to the right person.
  • Anh Le Tuan: why specify one specific consumer?
  • Piotr Sowiński: (in the MQTT example) 3 consumers, and maybe some messages will be dropped. If the streams are independent, there will be no problem. It is up to the broker and the consumer to decide what is acceptable.
  • Jean-Paul Calbimonte: if we focus on an existing producer-consumer pair: how does it work with a stream that produces messages, and how does it work with queries?
  • Piotr Sowiński: A streaming query processor works on a single instance of the stream.
  • Jean-Paul Calbimonte: then we have 3 different message streams for which queries are executed.
  • Piotr Sowiński: and 3 different results.
  • Jean-Paul Calbimonte: it explains dropouts and other things that impact query results.
  • Piotr Sowiński: if there are strong guarantees (gRPC), then there is no issue like that.
  • Jean-Paul Calbimonte: it can create confusion.
  • Jean-Paul Calbimonte: say we use SQL or SPARQL – they will refer to 1 stream; there are not multiple streams. They refer to metadata; we refer to the actual messages that are streamed.
  • Piotr Sowiński: stick to what is streamed. We can explain this stream identifier as the producer/endpoint.
  • Jean-Paul Calbimonte: maybe it is for another place, where we discuss producers/consumers. There are already concepts like that. When we get there, we can do that.
  • Robin Keskisärkkä: notion of the endpoint is a good one. Depending on the characteristics, it will differ.
  • Piotr Sowiński: there will be another iteration of this discussion (see the action point)
  • Robin Keskisärkkä: first do the definition and an issue, where we can discuss that.
  • Piotr Sowiński: there will be a PR for discussion.
  • Tobias Schwarzinger: Maybe we can first define message streams in a mathematical way, and talk about producers/consumers in other sections. Something like “an unbounded sequence of RDF messages”. Think mathematically, without producers/consumers (a sketch of the idea follows this section’s minutes).
  • Piotr Sowiński: supports the abstract definition.
  • Tobias Schwarzinger: here the instance idea may come in: two systems talk to each other.
  • Piotr Sowiński: will try to put this into words in the PR. We may defer this to the next iteration of work on the definition, though.
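
A rough sketch of the abstract, producer/consumer-free formulation suggested above (only an illustration of the idea, not an agreed definition): an RDF message stream could be modelled as a potentially unbounded sequence

    S = (M_1, M_2, M_3, ...)

where each M_i is an RDF message, i.e., an RDF dataset. Producers, consumers, endpoints, and transport protocols would then be treated separately, as properties of how a concrete system realizes (“instantiates”) such a sequence.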

Message vs quad-level streaming [3]

  • Piotr Sowiński: added an introduction. Message-level streaming. Is it enough?
  • Robin Keskisärkkä: there can be one quad, right?
  • Piotr Sowiński: yes
  • Tobias Schwarzinger: insert “may” into “contains” sentence
  • Piotr Sowiński: we will close the issue after the small correction
  • Robin Keskisärkkä: should we say “each message may contain multiple triples or quads”? (See the example after this list.)
  • Piotr Sowiński: agreed
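
A minimal illustration of the distinction (the IRIs and values below are made up): in message-level streaming the unit of transmission is a whole message, which may contain one quad or many; in quad-level streaming the unit would be the individual quad.

    # Message 1 – several quads describing one observation
    <http://example.org/obs/1> <http://example.org/value> "21.5" <http://example.org/graph/a> .
    <http://example.org/obs/1> <http://example.org/sensor> <http://example.org/s7> <http://example.org/graph/a> .

    # Message 2 – a single quad is also a valid message
    <http://example.org/obs/2> <http://example.org/value> "22.0" <http://example.org/graph/a> .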

YAML-LD and other RDF Message Log serializations [4]

  • Piotr Sowiński: likes YAML, little experience with YAML-LD. JSON-LD WG wants to standardize YAML-LD (in the charter).
  • Piotr Sowiński: there is additional syntax for YAML streams, with three dashes separating the YAML documents in a stream. Q: how does that translate into JSON-LD? Previous discussion in the JSON-LD CG leaned towards RDF Message streams, but ended up with a design where a YAML stream becomes a JSON array.
  • Piotr Sowiński: you can specify two JSON-LD contexts in one JSON-LD document and this is how YAML-LD is interpreted right now according to the draft spec.
  • Piotr Sowiński: we instead propose to transform each YAML-LD document into one JSON-LD document and to separate the JSON-LD documents with newlines (NDJSON-LD).
  • Piotr Sowiński: we want to keep YAML-LD aligned with RDF messages, so the newline approach is better for us (a sketch follows this section’s minutes).
  • Robin Keskisärkkä: our approach is much more convenient.
  • Tobias Schwarzinger: we introduce an asymmetry between Turtle and JSON-LD. Current Turtle parsers can parse a serialized log, but JSON-LD parsers cannot. Somewhere in the spec it says that a Turtle parser can read a log but may interpret it differently. Is that important?
  • Piotr Sowiński: it is. The RDF/XML example is up for discussion: one XML document per line, so a basic parser won’t work on this example. So there is indeed an asymmetry.
  • Piotr Sowiński: maybe turn the message delimiters in Turtle (currently comments) into directives?
  • Robin Keskisärkkä: only important if we want to consume the whole log. So if we consume a message – it already works.
  • Piotr Sowiński: you can split the document by “@message”, split by line.
  • Robin Keskisärkkä: for a consumer, you use a standard parser; for the entirety of the log, you need some changes to the parser, e.g., for replaying.
  • Tobias Schwarzinger: it is a question of the use case. It is good to prevent regular parsers from reading entire logs; they can interpret them “wrong”.
  • Piotr Sowiński: true. Parsers must rename blank nodes for each document they read.
  • Piotr Sowiński: we need to introduce a directive. Check what RDF 1.2 introduces in terms of directives – maybe something in that style (a sketch follows this section’s minutes). Also, explain why we want to do it this way, and come back to the discussion in the next meeting.
  • Piotr Sowiński: YAML-LD/JSON-LD are only warming up. So we work on our side and then ask them for a meeting and have the discussion. Make our spec good first and then try to convince them.
  • Robin Keskisärkkä: this also relates to the discussion on whether we should allow repeated PREFIX and BASE directives in Turtle logs.
  • Piotr Sowiński: it is about compatibility with Turtle parsers. The RDF 1.1 Turtle spec allows that (later directives overwrite the previous values).
  • Piotr Sowiński: but many parsers are not very spec-compliant.
  • Jean-Paul Calbimonte: the current proposal (with comment-based message separators) sounds like a hack.
  • Piotr Sowiński: propose to allow defining prefixes just once – otherwise half of the stream would be just prefixes – but allow overriding the prefixes in later messages.
  • Robin Keskisärkkä: if we have the message log, do you need to read the head every time and then go to the place you are interested in? I would need a smart parser then; I cannot extract a portion of the log in an easy way.
  • Piotr Sowiński: you can never do that (skipping to an arbitrary place in a file) with Turtle. NT and NQ can do that.
  • Piotr Sowiński: speaking of which, do we need additional directive syntax for n-triples and n-quads?
  • Piotr Sowiński: is there anything new in the NT/NQ spec 1.2?
  • [5]: https://www.w3.org/TR/rdf12-n-quads/#sec-version
  • Piotr Sowiński: indeed, there is a VERSION directive, so we can add a new directive to n-quads as well!
  • Robin Keskisärkkä: interesting – we can split the n-quads log into message parts.
  • Piotr Sowiński: with Jelly we had a similar issue: how to get to the place you want in the stream. You must read all the prefixes and skip everything else. We are considering adding checkpoints where you clear the prefixes. But do we need that in RDF Messages? It would be a very complex feature to implement.
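
To illustrate the two designs discussed above for YAML-LD streams (the documents and context URL below are made-up placeholders, not from any spec): a YAML stream separates documents with three dashes, and the proposal here is to map each of them to its own JSON-LD document on its own line (newline-delimited JSON-LD), rather than folding the whole stream into a single JSON array.

    # A YAML-LD stream: two documents separated by "---"
    ---
    "@context": "https://example.org/context.jsonld"
    "@id": "http://example.org/msg/1"
    value: 21.5
    ---
    "@context": "https://example.org/context.jsonld"
    "@id": "http://example.org/msg/2"
    value: 22.0

Under the newline-based proposal, each YAML document would become one JSON-LD document per line (NDJSON-LD style):

    {"@context": "https://example.org/context.jsonld", "@id": "http://example.org/msg/1", "value": 21.5}
    {"@context": "https://example.org/context.jsonld", "@id": "http://example.org/msg/2", "value": 22.0}

whereas the array-based design mentioned in the minutes would collapse the whole stream into something like a single JSON array, losing the message boundaries that RDF Messages care about.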
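
A companion sketch for the Turtle side of the discussion (the prefix, IRIs, and exact delimiter text are illustrative; “@message” is taken from the discussion above): with comment-based separators, a standard Turtle parser simply skips the delimiters and silently merges all messages into one document, which is the asymmetry Tobias points out.

    @prefix ex: <http://example.org/> .

    # @message
    ex:obs1 ex:value "21.5" .

    # @message
    ex:obs2 ex:value "22.0" .

A directive-based delimiter, in the spirit of the VERSION directive that RDF 1.2 N-Quads introduces (see [5] above), would be a purely hypothetical extension such as:

    @prefix ex: <http://example.org/> .

    MESSAGE               # hypothetical delimiter directive, not defined anywhere
    ex:obs1 ex:value "21.5" .

    MESSAGE
    ex:obs2 ex:value "22.0" .

An off-the-shelf Turtle 1.1 parser would reject the second form, which makes it explicit that reading a whole log requires a log-aware parser, while individual messages remain plain Turtle.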

Closing

  • Piotr Sowiński: we will have another workshop in a few weeks