Re: A real problem with CURIEs and a proposal

Niklas,

I think your analysis of the Open Graph protocol issue is correct.

My concern, however, is that if we go along the lines you propose, we move even further away from compatibility with Turtle/SPARQL, an issue that the RDF WG has already raised. I am not sure what the best forum for that discussion is.

Manu: will you be at the Coordination Group tomorrow? Maybe worth raising the issue there?

ivan

On Jan 24, 2012, at 04:01 , Niklas Lindström wrote:

> Hello,
> 
> I've been investigating some of the minute details and issues
> surrounding CURIEs, based on the discussion that recently cropped up
> with ISSUE-125 [1].
> 
> It seems to me that the definition we currently have is flawed in one
> more way, and quite crucially so.
> 
> 
> ## The Problem ##
> 
> As we already know, a bunch of Facebook OpenGraph properties are
> expressed with CURIEs where the parts after the prefix themselves
> contain colons. For instance, "video:actor:role", and
> "my-og-app:podcast:url" as seen in the examples at [2]. (There are
> also 13 such properties defined in <http://ogp.me/ns#>, e.g.
> "og:image:width" and "og:video:height".)
> 
> We currently define CURIEs as:
> 
>    curie       ::=   [ [ prefix ] ':' ] reference
>    reference   ::=   irelative-ref ; (as defined in [RFC3987])
> 
> Now, I may be too tired to see clearly, but if I read the definition
> of irelative-ref in section 2.2 of RFC 3987 [3] correctly, it actually
> prohibits such CURIEs!
> 
> Let me explain. I find these to be the relevant definitions in RFC 3987:
> 
>    irelative-ref  = irelative-part [ "?" iquery ] [ "#" ifragment ]
> 
>    irelative-part = "//" iauthority ipath-abempty
>                   / ipath-absolute
>                   / ipath-noscheme
>                   / ipath-empty
> 
>    ipath-absolute = "/" [ isegment-nz *( "/" isegment ) ]
>    ipath-noscheme = isegment-nz-nc *( "/" isegment )
>    ipath-empty    = 0<ipchar>
> 
>    isegment-nz-nc = 1*( iunreserved / pct-encoded / sub-delims
>                        / "@" )
>                  ; non-zero-length segment without any colon ":"
> 
> If I interpret the ABNF [4] properly, given "og:image:width", I get
> the following:
> 
> * "og:" matches the prefix and ":", so we match "image:width" against
> irelative-ref;
> * there is no "?" or "#" in that, so only irelative-part is considered;
> * it does not start with "//", so the first alternative ("//"
> iauthority ipath-abempty) does not apply;
> * it does not start with "/", so it is not an ipath-absolute;
> * it contains a colon ":", so it is not an ipath-noscheme (does not
> match isegment-nz-nc *( "/" isegment ));
> * it is not empty, so it is not an ipath-empty.
> 
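> With no more alternatives in irelative-part, I conclude that
> "og:image:width" is not a valid CURIE! (Reading the whole string as a
> reference without a prefix fails for the same reason: its first
> segment, "og:image:width", contains a colon.)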
> 
> Please correct me if I'm wrong here! If not, it is quite evident that
> we have to fix this (unless we are willing to break a widely deployed
> de facto usage).
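The derivation above can be checked mechanically. The Python sketch below transcribes the quoted RFC 3987 rules, and the current CURIE rule, into regular expressions. It is not a conformant IRI parser, and it rests on several assumptions:

- input is ASCII only (ucschar and iprivate are ignored);
- iauthority is approximated;
- the prefix is approximated as an NCName-like token;
- all names in it are made up for illustration.

```python
import re

# ASCII-only transcription of the RFC 3987 rules quoted above.
# Assumptions: ucschar/iprivate are ignored, and iauthority is approximated
# as "any run of characters other than / ? #".
UNRESERVED = r"A-Za-z0-9\-._~"
SUB_DELIMS = r"!$&'()*+,;="
PCT_ENCODED = r"%[0-9A-Fa-f]{2}"

IPCHAR = rf"(?:[{UNRESERVED}{SUB_DELIMS}:@]|{PCT_ENCODED})"
ISEGMENT = rf"{IPCHAR}*"
ISEGMENT_NZ = rf"{IPCHAR}+"
# isegment-nz-nc: like isegment-nz, but ":" is not allowed
ISEGMENT_NZ_NC = rf"(?:[{UNRESERVED}{SUB_DELIMS}@]|{PCT_ENCODED})+"
IAUTHORITY = r"[^/?#]*"

IPATH_ABEMPTY = rf"(?:/{ISEGMENT})*"
IPATH_ABSOLUTE = rf"/(?:{ISEGMENT_NZ}(?:/{ISEGMENT})*)?"
IPATH_NOSCHEME = rf"{ISEGMENT_NZ_NC}(?:/{ISEGMENT})*"
IPATH_EMPTY = ""

IRELATIVE_PART = (
    rf"(?://{IAUTHORITY}{IPATH_ABEMPTY}"
    rf"|{IPATH_ABSOLUTE}|{IPATH_NOSCHEME}|{IPATH_EMPTY})"
)
IQUERY = rf"(?:{IPCHAR}|[/?])*"
IFRAGMENT = rf"(?:{IPCHAR}|[/?])*"
IRELATIVE_REF = rf"{IRELATIVE_PART}(?:\?{IQUERY})?(?:#{IFRAGMENT})?"

# Current rule: curie ::= [ [ prefix ] ':' ] reference
# Assumption: prefix approximated as an ASCII NCName-like token.
PREFIX = r"[A-Za-z_][A-Za-z0-9_.\-]*"
CURRENT_CURIE = re.compile(rf"(?:(?:{PREFIX})?:)?{IRELATIVE_REF}")


def is_curie_current(value: str) -> bool:
    """True if `value` matches the current CURIE definition (sketch)."""
    return CURRENT_CURIE.fullmatch(value) is not None


for example in ["og:image", "og:image:width", "video:actor:role"]:
    print(example, is_curie_current(example))
```

If the transcription is faithful, it agrees with the step-by-step reading. "og:image" is accepted, while "og:image:width" and "video:actor:role" are rejected.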
> 
> Ironically, we *do* allow for CURIEs to begin with "//". This makes it
> possible to use CURIEs *indistinguishable* from "normal" IRIs (using
> authority and paths), as explained in ISSUE-125 (and in my old (dead
> horse) ISSUE-90 [5]).
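To illustrate the "//" point, here is a smaller Python sketch. It covers only the first irelative-part alternative, preceded by the optional prefix. It assumes ASCII input and approximates the authority and path segments as runs of characters other than "/", "?" and "#". split_authority_curie is a hypothetical helper name.

```python
import re

# Simplified ASCII sketch of only the first irelative-part alternative,
# "//" iauthority ipath-abempty, preceded by the optional CURIE prefix.
# Assumption: authority and segments are approximated as runs of
# characters other than / ? #.
PREFIX = r"[A-Za-z_][A-Za-z0-9_.\-]*"
AUTHORITY_FORM = re.compile(
    rf"(?:(?P<prefix>{PREFIX})?:)?(?P<reference>//[^/?#]*(?:/[^/?#]*)*)"
)


def split_authority_curie(value: str):
    """Return (prefix, reference) if `value` parses as a CURIE whose
    reference starts with "//", otherwise None. The prefix is None
    when no prefix is given."""
    match = AUTHORITY_FORM.fullmatch(value)
    if match is None:
        return None
    return match.group("prefix"), match.group("reference")


# A full http IRI also parses as prefix "http" + reference "//...".
print(split_authority_curie("http://example.org/page"))
```

Under these assumptions, "http://example.org/page" splits into the prefix "http" and the reference "//example.org/page". A document that declares an "http" prefix would therefore turn an ordinary IRI into a CURIE.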
> 
> 
> ## The Proposal ##
> 
> We have the opportunity here to fix a lot of things. I propose to
> define CURIEs along the lines of:
> 
>    curie           =   [ prefix ] ":" local
>    prefix          =   PN_PREFIX ; as defined in SPARQL 1.1 [6]
>    local           =   ( ipath-rootless / ipath-empty )
>                            [ "?" iquery ] [ "#" ifragment ]
> 
>    ipath-rootless  = isegment-nz *( "/" isegment )
>    isegment        = *ipchar
>    isegment-nz     = 1*ipchar
>    ipchar          = iunreserved / pct-encoded / sub-delims / ":"
>                        / "@"
> 
> For comparison, this is the definition of a full IRI in RFC 3987:
> 
>    IRI         = scheme ":" ihier-part [ "?" iquery ]
>                         [ "#" ifragment ]
> 
>    ihier-part  = "//" iauthority ipath-abempty
>                / ipath-absolute
>                / ipath-rootless
>                / ipath-empty
> 
> 
> ## The Consequences ##
> 
> This (if I'm awake enough) still allows for *all* the use cases that
> have been put forward as needed so far, e.g.:
> 
>    schema:Person/Doctor
>    og:video:height
>    db:resource/Albert_Einstein
>    ex:some?very=special#thing
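As a rough check of the proposal, the following Python sketch transcribes the proposed rules into a regular expression and runs the examples above through it. The same simplifying assumptions apply: ASCII only, ucschar and iprivate ignored, and PN_PREFIX approximated by its ASCII subset. is_curie_proposed is an illustrative name only.

```python
import re

# ASCII-only sketch of the proposed grammar.
# Assumptions: ucschar/iprivate are ignored; PN_PREFIX is approximated as
# an ASCII letter followed by letters, digits, "_", "-" or ".", not ending
# in ".".
UNRESERVED = r"A-Za-z0-9\-._~"
SUB_DELIMS = r"!$&'()*+,;="
PCT_ENCODED = r"%[0-9A-Fa-f]{2}"
IPCHAR = rf"(?:[{UNRESERVED}{SUB_DELIMS}:@]|{PCT_ENCODED})"  # ":" allowed

PN_PREFIX = r"[A-Za-z](?:[A-Za-z0-9_.\-]*[A-Za-z0-9_\-])?"
IPATH_ROOTLESS = rf"{IPCHAR}+(?:/{IPCHAR}*)*"  # isegment-nz *( "/" isegment )
IQUERY = rf"(?:{IPCHAR}|[/?])*"
IFRAGMENT = rf"(?:{IPCHAR}|[/?])*"
# local = ( ipath-rootless / ipath-empty ) [ "?" iquery ] [ "#" ifragment ]
LOCAL = rf"(?:{IPATH_ROOTLESS})?(?:\?{IQUERY})?(?:#{IFRAGMENT})?"

PROPOSED_CURIE = re.compile(rf"(?:{PN_PREFIX})?:{LOCAL}")


def is_curie_proposed(value: str) -> bool:
    """True if `value` matches the proposed CURIE definition (sketch)."""
    return PROPOSED_CURIE.fullmatch(value) is not None


for example in [
    "schema:Person/Doctor",
    "og:video:height",
    "db:resource/Albert_Einstein",
    "ex:some?very=special#thing",
]:
    print(example, is_curie_proposed(example))
```

Because ipchar includes ":", the OpenGraph forms pass. A value such as "http://example.org/" does not, since local may never begin with "/".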
> 
> (While it is true that it would prevent the "hack" once presented as a
> means of using full IRIs where RDFa 1.0 only allows CURIEs (by using
> @xmlns:http="http:"), isn't that moot? Any processor affected by this
> change in RDFa 1.1 should reasonably use RDFa 1.1 rules, where we now
> allow such IRIs anywhere CURIEs are allowed. (And for that matter, I
> don't recall any reports of actual usage of that.))
> 
> Most importantly, this completely eliminates the risk of confusing
> CURIEs with normal IRIs. That is, IRIs with a scheme followed by "://",
> an authority, and a path of segments (separated with "/"), followed by
> optional "?" query and "#" fragment parts. These are the kinds of IRIs
> that can be expressed in various relative forms and resolved against a
> base IRI.
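One consequence of the proposal can be sketched in Python. The sketch assumes that prefixes never contain a colon, which holds for PN_PREFIX. Given that, a processor can tell a hierarchical IRI from a CURIE by looking only at the character after the first ":". The helper name cannot_be_curie is hypothetical.

```python
def cannot_be_curie(value: str) -> bool:
    """Hypothetical helper. Under the proposed grammar, the first ":" ends
    the prefix (PN_PREFIX has no colons) and `local` never starts with "/",
    so a "/" right after the first ":" rules out a CURIE."""
    _, sep, rest = value.partition(":")
    return sep == ":" and rest.startswith("/")


for value in ["http://example.org/", "https://schema.org/Person",
              "og:image:width", "urn:isbn:0451450523"]:
    kind = "IRI" if cannot_be_curie(value) else "possible CURIE"
    print(value, "->", kind)
```

Opaque IRIs such as "urn:isbn:0451450523" still land in the "possible CURIE" bucket. Whether they are expanded then depends on the in-scope prefixes.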
> 
> Looking at the list of official and common URI schemes at [7], I find
> that of the 137 schemes, 71 (52%) are in the authority+path form. As
> we know, the two most prevalent schemes on the web, http and https, are
> of this kind (and arguably the only relevant ones). I'd wager that we
> can expect this form to stay prevalent on the web *even* if "http" were
> eventually to be superseded. (I say so because relative paths are
> immensely useful, and there is an abundance of code dealing with
> hierarchical URL/URI resolution. Combined with the DNS-based authority
> model, this form is reasonably certain to stay.)
> 
> Note also that "http" used as a prefix has already turned up in the
> wild, due to the HTTP Vocabulary Working Draft [8]. It has even been
> used in the RDFa 1.1 Core spec itself (as I recently reported in my
> review). To my knowledge, we have asked the ERT WG to change this, but
> this has not yet happened. With this change, such a prefix would no
> longer be a (technical) problem.
> 
> The other form is that of "opaque" IRIs (without an authority part and
> possibly without "/"-separated segments, i.e. "non-relativizable").
> Seemingly we have so far *unintentionally* prevented some of them
> (e.g. urn: and tag: URIs) from being read as CURIEs, but at the price
> of the OpenGraph CURIEs.
> There are some fairly well-known schemes in this group (official or
> not), e.g.: mailto, tag, urn, doi, geo, tel, callto, news, xmpp, sip,
> sms, bitcoin, gtalk, skype, spotify. Of these, "tag" and "geo" can be
> found in prefix.cc. (I've previously mentioned that "geo" may be of
> some concern for certain RDFa users [9].) But, as we concluded when
> resolving ISSUE-90, these will probably not be used as prefixes, and
> will be quite uncommon as schemes of subject or object IRIs in RDFa.
> Also, given that many IRIs using these
> schemes already are reminiscent of CURIEs, and are of a rather
> specialized nature, I'd imagine that it's easier for anyone coming
> across such oddities to recognize the collision risk, should it ever
> happen. We should still state very clearly in the section about CURIEs
> that prefixes overshadow schemes in IRIs of these forms, and advise
> users to monitor the in-scope prefixes for any such collision (along
> with the workaround of declaring e.g. @prefix="geo: geo:").
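The overshadowing rule and the @prefix="geo: geo:" workaround can be illustrated with a toy resolution function in Python. This is a hypothetical sketch, not the RDFa processing rules. It assumes a plain dictionary of in-scope prefixes and applies the proposed "no leading /" rule. The coordinates are made up, and the geo mapping uses the WGS84 vocabulary namespace commonly registered for that prefix.

```python
from typing import Mapping

GEO_VOCAB = "http://www.w3.org/2003/01/geo/wgs84_pos#"


def expand(value: str, prefixes: Mapping[str, str]) -> str:
    """Hypothetical resolution step, not taken from any RDFa processor.

    If the text before the first ":" is an in-scope prefix, the value is
    treated as a CURIE and expanded; otherwise it is kept as an IRI."""
    prefix, sep, local = value.partition(":")
    if not sep or local.startswith("/"):
        # No colon, or "/" right after it: not a CURIE under the proposal.
        return value
    if prefix in prefixes:
        return prefixes[prefix] + local
    return value


point = "geo:37.786971,-122.399677"
print(expand(point, {}))                   # no "geo" prefix: unchanged
print(expand(point, {"geo": GEO_VOCAB}))   # prefix overshadows the scheme
print(expand(point, {"geo": "geo:"}))      # workaround: maps back to itself
```

Mapping the prefix to its own scheme name makes the expansion an identity, which is why the workaround is safe.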
> 
> 
> ## Summary ##
> 
> I sincerely hope that I have interpreted the ABNF correctly and
> haven't raised the issue of OpenGraph CURIEs in error, and that I have
> made a clear and satisfactory draft proposal for fixing both this and
> the problems raised in ISSUE-125 (primarily the risk of confusing
> CURIEs with normal IRIs).
> 
> Best regards,
> Niklas
> 
> [1]: http://www.w3.org/2010/02/rdfa/track/issues/125
> [2]: http://developers.facebook.com/docs/opengraph/objects/builtin/
> [3]: http://tools.ietf.org/html/rfc3987#section-2.2
> [4]: http://en.wikipedia.org/wiki/Augmented_Backus%E2%80%93Naur_Form
> [5]: http://www.w3.org/2010/02/rdfa/track/issues/90
> [6]: http://www.w3.org/TR/2012/WD-sparql11-query-20120105/#rPNAME_LN
> [7]: http://en.wikipedia.org/wiki/URI_scheme
> [8]: http://www.w3.org/TR/HTTP-in-RDF10/
> [9]: http://lists.w3.org/Archives/Public/public-rdfa-wg/2011Aug/0039.html
> 


----
Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
FOAF: http://www.ivan-herman.net/foaf.rdf

Received on Tuesday, 24 January 2012 11:45:36 UTC