W3C

– DRAFT –
FHIR RDF

10 March 2022

Attendees

Present
Dagmar, David Booth, EricP, Gaurav Vaidya, Jim Balhoff
Regrets
-
Chair
David Booth
Scribe
dbooth

Meeting minutes

RDF lists

jim: Looking into OWL API, trying to understand how the parser works. No prognosis yet.

jim: Creating a test now.

Concept IRIs

dbooth: Heard back from Martin Durst. There is no standard algorithm for percent-encoding Unicode strings to append to IRIs. His reply:

Anyway, the characters you need to escape are essentially in two categories:
- ASCII characters that are syntactically relevant in URIs and IRIs.
  These are called reserved characters. Overall, they are all ASCII
  characters except letters, digits, and "-" / "." / "_" / "~".
  (https://datatracker.ietf.org/doc/html/rfc3987#section-2.2)
  (If you do a careful analysis on where your Unicode strings can
   end up in the IRI, you may be able to leave more characters as-is.
   As an example, you may be able to leave in a ":" if you know
   that you will always have a full IRI with http: or some such
   at the start.)
- Any Unicode codepoints (where there might or might not be characters)
  that you want to leave out for one reason or another. In RFC 3987,
  we on purpose didn't restrict this too much. But you definitely don't
  want surrogates (0xD800-0xDFFF), because these are only used in pairs
  in UTF-16. We also excluded private-use characters (except for query
  parts) and so-called non-characters (0xYZFFFE, 0xYZFFFF). You can
  exclude more, for example unassigned code points,... But it's probably
  better to exclude these altogether rather than to allow them when
  they are percent-encoded.
As for text for a standard, please have a look through
https://datatracker.ietf.org/doc/html/rfc3987#section-3
where you probably can find quite a few pieces (but you will have to select and put them together yourself).
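
The most conservative reading of the reply above can be sketched in Python as follows. This is only an illustration, not an agreed algorithm: the function names are hypothetical, and it percent-encodes non-ASCII characters as UTF-8 too (which RFC 3987 does not require), while rejecting surrogates and the 0xYZFFFE / 0xYZFFFF non-characters outright rather than encoding them. Private-use characters are not checked here.

```python
from urllib.parse import quote


def is_excluded(cp: int) -> bool:
    """True for code points the reply says to keep out entirely."""
    if 0xD800 <= cp <= 0xDFFF:          # surrogates, only meaningful in UTF-16 pairs
        return True
    return (cp & 0xFFFE) == 0xFFFE      # 0xYZFFFE and 0xYZFFFF in every plane


def percent_encode_conservative(text: str) -> str:
    """Leave only ASCII letters, digits, '-', '.', '_', '~' unescaped (hypothetical helper)."""
    for ch in text:
        if is_excluded(ord(ch)):
            raise ValueError(f"code point U+{ord(ch):04X} is not allowed")
    # safe="" makes quote() escape "/" as well; non-ASCII is escaped as UTF-8 bytes.
    return quote(text, safe="")


# percent_encode_conservative("a b/c:d")  ->  "a%20b%2Fc%3Ad"
```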

Gaurav: I started drafting an algorithm.

eric: We're trying to see the future. If people only use flat name spaces, they can. But it seems likely that someone may want to introduce hierarchy, and we may want to help support that.

eric: I think we only want to escape chars that either produce an invalid IRI or ____ .
… Unicode chars should never include surrogates anyway.
… Do we allow dots? Slashes? Colons?
… Hashes are out because they have a special meaning.

gaurav: In the first draft, I suggested that everything outside of iquery should be percent-encoded.

dbooth: Do we want to do a direct concatenation of stemIRI with percent-encoding of the code?

eric: I think so. Considered relative URL resolution, but don't need it.

gaurav: Yes.

Jim, dagmar: yes

dbooth: Do we want to place any restrictions on stemIRI other than being a valid absolute IRI?

eric: fragID is not sent to the server when an HTTP request is sent.
… I think we should allow users to engineer their URI space how they want.

dbooth: Should the absolute IRI be required to have a slash (after the iauthority) to prevent the code from changing the apparent domain name?
… Feels like a security risk if we don't.

dbooth: If the stemIRI is something like https://hl7.org (with no slash) and the code is .evil-hacker.com/
… then that could be a security risk.

eric: Could say that if the scheme is a URL scheme, then the stemIRI must contain a slash after the iauthority.
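
The concern and the proposed restriction can be illustrated with a small Python sketch. The helper names are hypothetical, the code is concatenated unescaped (as in the example discussed), and "has a slash after the iauthority" is approximated by checking that the parsed path starts with "/".

```python
from urllib.parse import urlsplit


def hostname_of(iri: str) -> str:
    """Host that a client would actually contact for this IRI."""
    return urlsplit(iri).hostname


def stem_has_slash_after_authority(stem_iri: str) -> bool:
    """Approximate check: the stem has an authority and a path beginning with '/'."""
    parts = urlsplit(stem_iri)
    return bool(parts.netloc) and parts.path.startswith("/")


code = ".evil-hacker.com/"

# Without a slash, the code extends the authority and changes the host.
# hostname_of("https://hl7.org" + code)   ->  "hl7.org.evil-hacker.com"
# With a slash, the code lands in the path and the host is unchanged.
# hostname_of("https://hl7.org/" + code)  ->  "hl7.org"
```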

dbooth: Should we have this restriction?

eric: yes, but it would have to be English text.

dbooth: We should also list currently known URL schemes.

gaurav: Wikidata uses a template, with $1 placeholder.
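
For comparison, a template with a $1 placeholder could be expanded roughly as below. This is a sketch only: the template URL and function name are made up for illustration, and the choice to escape everything except unreserved ASCII is an assumption, not something the group decided.

```python
from urllib.parse import quote


def expand_template(template: str, code: str) -> str:
    """Substitute the percent-encoded code for the single $1 placeholder (hypothetical helper)."""
    if template.count("$1") != 1:
        raise ValueError("template must contain exactly one $1 placeholder")
    return template.replace("$1", quote(code, safe=""))


# expand_template("https://example.org/concept/$1/details", "a/b")
#   ->  "https://example.org/concept/a%2Fb/details"
```

Unlike direct concatenation, a template lets the code appear in the middle of the IRI rather than only at the end.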

AGREED: Gaurav will draft a rewrite

dbooth: directly concatenate stemIRI + percentEncode(code) ?

eric: Agreed.

gaurav: Might want the percentEncode function to depend on the stemIRI -- whether it is in the query string, the fragID, etc.

gaurav: Should we percent-encode everything that is not in ifragment?

ACTION: gaurav to draft algorithm to percent-encode everything but ifragment chars, and show us corner cases or cases we might want to reconsider.
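
As a starting point for discussion only, the action above might look something like the following Python sketch. It is not Gaurav's draft. It assumes the RFC 3987 ifragment production (ipchar, "/" and "?"), treats the code as a raw string so that a literal "%" is always escaped, and uses hypothetical function names.

```python
# ASCII characters allowed in ifragment: unreserved, sub-delims, ":", "@", "/", "?"
ASCII_ALLOWED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
    "-._~" "!$&'()*+,;=" ":@/?"
)


def is_ucschar(cp: int) -> bool:
    """RFC 3987 ucschar ranges (non-ASCII characters allowed unescaped)."""
    if 0xA0 <= cp <= 0xD7FF or 0xF900 <= cp <= 0xFDCF or 0xFDF0 <= cp <= 0xFFEF:
        return True
    if 0x10000 <= cp <= 0xEFFFD:
        low = cp & 0xFFFF
        if cp >= 0xE0000:               # plane 14 starts at 0xE1000
            return 0x1000 <= low <= 0xFFFD
        return low <= 0xFFFD            # planes 1-13 exclude xFFFE and xFFFF
    return False


def percent_encode_for_ifragment(code: str) -> str:
    """Escape every character that is not an ifragment character."""
    out = []
    for ch in code:
        if ch in ASCII_ALLOWED or is_ucschar(ord(ch)):
            out.append(ch)
        else:
            # Lone surrogates cannot be encoded as UTF-8 and raise UnicodeEncodeError.
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)


def mint_concept_iri(stem_iri: str, code: str) -> str:
    """Direct concatenation of stemIRI + percentEncode(code)."""
    return stem_iri + percent_encode_for_ifragment(code)


# percent_encode_for_ifragment("a b#c%d/é")  ->  "a%20b%23c%25d/é"
```

Corner cases worth reviewing under this reading: "/", "?", ":" and "@" pass through unescaped, so a code can introduce apparent hierarchy or a query-like suffix, while "#" and "%" are always escaped.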

gaurav: I might be out next week.

ADJOURNED

Summary of action items

  1. gaurav to draft algorithm to percent-encode everything but ifragment chars, and show us corner cases or cases we might want to reconsider.
Minutes manually created (not a transcript), formatted by scribe.perl version 185 (Thu Dec 2 18:51:55 2021 UTC).
