Re: A real problem with CURIEs and a proposal

Hi Ivan!

(I'm CC:ing Gavin Carothers who raised ISSUE-125, since we're now
discussing whether this addresses those concerns at all.)

2012/1/24 Ivan Herman <ivan@w3.org>:
> Niklas,
>
> I think your analysis on the Open Graph protocol issue is correct.

Good. Do you think that we should fix this? (I have assumed that we
do want to, and that most(?) of us thought it was already supported.)
Of course, since RDFa 1.0 defines CURIEs in the same way, the OG usage
is in fact already invalid today.
I don't know how many RDFa processors actually break on that though.
Since one could not mix (unsafe) CURIEs and IRIs in RDFa 1.0 I'd
expect most of them to just split on ":" and expand the prefix part.
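
To illustrate the kind of processing I have in mind, here is a minimal
Python sketch. The function name `expand_curie` and the prefix mapping
are made up for the example; this is not taken from any actual RDFa
processor.

```python
def expand_curie(curie, prefixes):
    """Split on the first ':' and expand the prefix part (hypothetical helper)."""
    prefix, sep, reference = curie.partition(":")
    if not sep or prefix not in prefixes:
        return None  # no colon, or a prefix this sketch does not know
    # Everything after the first colon is appended as-is, colons included.
    return prefixes[prefix] + reference


prefixes = {"og": "http://ogp.me/ns#"}
print(expand_curie("og:video:width", prefixes))
# prints "http://ogp.me/ns#video:width"
```

A processor written this way never looks at the reference part, so it
would not notice that the grammar disallows the second colon.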

> My issue, however, is: if we go along the lines you propose, we are getting even further away from a compatibility with Turtle/SPARQL, an issue that has already been raised by the RDF WG. I am not sure what the best forum is for that.

That was not my intention. :( The change *does* allow colon ":" in the
first segment of the local part now of course; explicitly in order to
support the OG form of CURIEs. This admittedly gets us further away.

But by disallowing CURIEs to start with "prefix://", I hoped that it
would mitigate (if not fully address) one of the concerns that the
RDF-WG expressed, of confusing them with normal IRIs. As Gavin said:
"These are very easy to confuse with normal IRIs. In general it seems
that the intent of CURIEs was to limit the right hand side to relative
references but that is not accomplished by using the "irelative-ref"
production from the IRI RFC."

So I set out to fulfill the goals of supporting CURIEs like:

    og:video:width
    schema:Person/Engineer
    ex:some?very=special#thing

while not allowing CURIEs of forms like Gavin's example:

    prefix://user:password[2001:0db8:85a3:0000:0000:8a2e:0370:7334]:8080/

Nor any other IRI using the "//" authority path form (like http and https IRIs).
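
As a rough check of these goals, the proposed local part can be
approximated with a regular expression. This is only a sketch under
stated assumptions: it is ASCII-only (the ucschar and iprivate ranges
of RFC 3987 are left out), it does not validate the prefix at all, and
the names `LOCAL` and `is_proposed_curie` are mine, not from any spec.

```python
import re

# ASCII-only approximation of ipchar (assumption: ucschar is left out).
IPCHAR = r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})"
# ipath-rootless: a non-empty first segment, then "/"-separated segments.
IPATH_ROOTLESS = rf"{IPCHAR}+(?:/{IPCHAR}*)*"
# iquery and ifragment, again ASCII-only (iprivate is left out).
QUERY_OR_FRAGMENT = rf"(?:{IPCHAR}|[/?])*"
# local = (ipath-rootless / ipath-empty) [ "?" iquery ] [ "#" ifragment ]
LOCAL = re.compile(
    rf"(?:{IPATH_ROOTLESS})?(?:\?{QUERY_OR_FRAGMENT})?(?:#{QUERY_OR_FRAGMENT})?"
)


def is_proposed_curie(curie):
    """True if the part after the first ':' fits the proposed local rule."""
    prefix, sep, local = curie.partition(":")  # prefix itself is not checked
    return bool(sep) and LOCAL.fullmatch(local) is not None


print(is_proposed_curie("og:video:width"))              # True
print(is_proposed_curie("schema:Person/Engineer"))      # True
print(is_proposed_curie("ex:some?very=special#thing"))  # True
print(is_proposed_curie("http://example.org/"))         # False
```

The last case fails because the first segment must be non-empty and
"/" is not an ipchar, so nothing that starts with "/" (and thus "//")
can match.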

I find five things to consider:

1. CURIEs do not currently allow e.g. "og:video:height". PNames don't
either. We however have RDFa in the wild using that form (both with
the original Open Graph Protocol using RDFa 1.0 and the new Open Graph
using RDFa 1.1 with @prefix).

2. CURIEs support lots of special characters in the local part; PNames
don't. The same reasoning as in 1 seems to apply, with our explicit
requirements being to support e.g. "schema:Person/Engineer" and
"db:resource/Albert_Einstein" (and "ex:some?very=special#thing", I
suppose).

3. CURIEs are allowed to be identical to full IRIs today (PNames most
definitely aren't). Gavin expressed concerns about this ("Host parts,
IPv4 and IPv6 segments") because they can be confused with normal
IRIs. I interpret that as meaning those with "//" and authority after
the scheme. I propose to not allow the CURIE local part to start with
"/" (and thus not "//").

4. CURIE prefixes, being defined as NCNames, allow some forms which
are not allowed in PName prefixes (e.g. prefixes starting with "_").
We may be able to use PN_PREFIX instead without breaking any real use
case.

5. I kept using the ABNF from the IRI RFC because CURIEs are based on
IRIs. The RDF WG asked us to use W3C EBNF. If we are to address any of
the above, I gather that it is a sound request to do so using EBNF.
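
Points 1 and 3 can be seen side by side by approximating the *current*
rule (reference = irelative-ref) in the same way. Again a sketch under
assumptions: ASCII-only, only the prefixed form of a CURIE is handled,
and `AUTHORITY` is a deliberately loose stand-in for iauthority rather
than the real production. All names are made up for the example.

```python
import re

# ASCII-only approximations of RFC 3987 productions (assumption).
NC = r"(?:[A-Za-z0-9\-._~!$&'()*+,;=@]|%[0-9A-Fa-f]{2})"  # ipchar minus ":"
PC = rf"(?:{NC}|:)"                                       # ipchar
SEGS = rf"(?:/{PC}*)*"                                    # *( "/" isegment )
AUTHORITY = r"[^/?#]*"  # loose stand-in for iauthority, not the real rule
# irelative-part: "//" authority path / ipath-absolute / ipath-noscheme / empty
IRELATIVE_PART = rf"(?://{AUTHORITY}{SEGS}|/(?:{PC}+{SEGS})?|{NC}+{SEGS}|)"
TAIL = rf"(?:\?(?:{PC}|[/?])*)?(?:#(?:{PC}|[/?])*)?"
IRELATIVE_REF = re.compile(IRELATIVE_PART + TAIL)


def is_current_curie(curie):
    """True if the part after the first ':' fits irelative-ref (approximately)."""
    prefix, sep, reference = curie.partition(":")
    return bool(sep) and IRELATIVE_REF.fullmatch(reference) is not None


print(is_current_curie("og:video:height"))          # False (point 1)
print(is_current_curie("schema:Person/Engineer"))   # True
print(is_current_curie("http://example.org/path"))  # True  (point 3)
```

So the current rule rejects the Open Graph form while accepting
something that reads exactly like a full http IRI.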

My hope is that if we were to address these, the RDF WG would find the
results satisfactory, even if the CURIE definition ends up as a
superset of PName.

(Note that points 1 and 2 may also be of interest for the RDF WG
regarding PNames.)

Best regards,
Niklas

PS. You know that point 3 has vexed me, but please believe that I
don't want to reopen ISSUE-90. That suggested more invasive changes
which don't work with use cases as per above. I've absolutely accepted
that. I approached this based on ISSUE-125 along with the observation
of the Open Graph issue. Part of that suggested that point 3 is of
concern, and that it may be addressed without affecting our needs. I
want to keep the changes to a minimum while addressing as many
concerns as possible (usability and safety being the primary
objectives).


> Manu: will you be at the Coordination Group tomorrow? Maybe worth raising the issue there?
>
> ivan
>
> On Jan 24, 2012, at 04:01 , Niklas Lindström wrote:
>
>> Hello,
>>
>> I've been investigating some of the minute details and issues
>> surrounding CURIEs, based on the discussion that recently cropped up
>> with ISSUE-125 [1].
>>
>> It seems to me that the definition we currently have is flawed in one
>> more way, and quite crucially so.
>>
>>
>> ## The Problem ##
>>
>> As we already know, a bunch of Facebook OpenGraph properties are
>> expressed with CURIEs where the parts after the prefix themselves
>> contain colons. For instance, "video:actor:role", and
>> "my-og-app:podcast:url" as seen in the examples at [2]. (There are
>> also 13 such properties defined in <http://ogp.me/ns#>, e.g.
>> "og:image:width" and "og:video:height".)
>>
>> We currently define CURIEs as:
>>
>>    curie       ::=   [ [ prefix ] ':' ] reference
>>    reference   ::=   irelative-ref ; (as defined in [RFC3987])
>>
>> Now, I may be too tired to see clearly, but if I read the definition
>> of irelative-ref in section 2.2 of RFC 3987 [3] correctly, it actually
>> prohibits such CURIEs!
>>
>> Let me explain. I find these to be the relevant definitions in RFC 3987:
>>
>>    irelative-ref  = irelative-part [ "?" iquery ] [ "#" ifragment ]
>>
>>    irelative-part = "//" iauthority ipath-abempty
>>                   / ipath-absolute
>>                   / ipath-noscheme
>>                   / ipath-empty
>>
>>    ipath-absolute = "/" [ isegment-nz *( "/" isegment ) ]
>>    ipath-noscheme = isegment-nz-nc *( "/" isegment )
>>    ipath-empty    = 0<ipchar>
>>
>>    isegment-nz-nc = 1*( iunreserved / pct-encoded / sub-delims
>>                        / "@" )
>>                  ; non-zero-length segment without any colon ":"
>>
>> If I interpret the ABNF [4] properly, given "og:image:width", I get
>> the following:
>>
>> * "og:" matches the prefix and ":", so we match "image:width" against
>> irelative-ref;
>> * there is no "?" or "#" in that, so only irelative-part is considered;
>> * it does not start with "//", so we skip the following (iauthority
>> ipath-abempty) of the first alternative;
>> * it does not start with "/", so it is not an ipath-absolute;
>> * it contains a colon ":", so it is not an ipath-noscheme (does not
>> match isegment-nz-nc *( "/" isegment ));
>> * it is not empty, so it is not an ipath-empty.
>>
>> With no more alternatives in irelative-part, I conclude that
>> "og:image:width" is not a valid CURIE!
>>
>> Please correct me if I'm wrong here! If not, it is quite evident that
>> we have to fix this (lest we accept to break a widely deployed
>> de-facto usage).
>>
>> Ironically, we *do* allow for CURIEs to begin with "//". This makes it
>> possible to use CURIEs *indistinguishable* from "normal" IRIs (using
>> authority and paths), as explained in ISSUE-125 (and in my old (dead
>> horse) ISSUE-90 [5]).
>>
>>
>> ## The Proposal ##
>>
>> We have the opportunity here to fix a lot of things. I propose to
>> define CURIEs along the lines of:
>>
>>    curie           =   [ prefix ] ':' local
>>    prefix          =   PN_PREFIX; as defined in SPARQL 1.1 [6]
>>    local           =   (ipath-rootless / ipath-empty)
>>                            [ "?" iquery ] [ "#" ifragment ]
>>
>>    ipath-rootless  = isegment-nz *( "/" isegment )
>>    isegment        = *ipchar
>>    isegment-nz     = 1*ipchar
>>    ipchar          = iunreserved / pct-encoded / sub-delims / ":"
>>                        / "@"
>>
>> .. For comparison, this is the definition of the full IRI:
>>
>>    IRI         = scheme ":" ihier-part [ "?" iquery ]
>>                         [ "#" ifragment ]
>>
>>    ihier-part  = "//" iauthority ipath-abempty
>>                / ipath-absolute
>>                / ipath-rootless
>>                / ipath-empty
>>
>>
>> ## The Consequences ##
>>
>> This (if I'm awake enough) still allows for *all* the use cases that
>> have hitherto been put forward as needed. E.g.:
>>
>>    schema:Person/Doctor
>>    og:video:height
>>    db:resource/Albert_Einstein
>>    ex:some?very=special#thing
>>
>> (While it is true that it would prevent the "hack" once presented as a
>> means of using full IRIs where RDFa 1.0 only allows CURIEs (by using
>> @xmlns:http="http:"), isn't that moot? Any processor affected by this
>> change in RDFa 1.1 should reasonably use RDFa 1.1 rules, where we now
>> allow such IRIs anywhere CURIEs are allowed. (And for that matter, I
>> don't recall any reports of actual usage of that.))
>>
>> Most importantly, this completely eliminates the risk of confusing
>> CURIEs with normal IRIs. That is, IRIs with a scheme followed by "//",
>> an authority, and a path of segments (separated with "/"), followed by
>> optional "?" query and "#" fragment parts. These are the kinds of IRIs
>> that can be expressed in various relative forms and resolved against a
>> base IRI.
>>
>> Looking at the list of official and common URI schemes at [7], I find
>> that of the 137 schemes, 71 (52%) are in the authority+path form. As
>> we know, the prevalent two on the web, http and https, are of this
>> kind (arguably the only relevant ones). I'd wager that we can expect
>> this form to stay prevalent on the web *even* if "http" were to be
>> eventually superseded. (I say so because relative paths are immensely
>> usable, and there is an abundance of code dealing with hierarchical
>> URL/URI resolution. Combined with the DNS-based authority model it's
>> reasonably here to stay.)
>>
>> Note also the fact that "http" used as prefix has already turned up in
>> the wild, due to the HTTP Vocabulary Working Draft [8]. This has even
>> been used in the RDFa 1.1 Core spec itself (as I recently reported in
>> my review). To my knowledge, we have asked the ERT WG to change this,
>> but this has not yet happened. With this change, such a prefix would
>> no longer be a (technical) problem.
>>
>> The other form is of the "opaque" IRIs (without an authority part and
>> possibly no "/" separated segments (i.e. "non-relativizable")).
>> Seemingly we've hitherto *unintentionally* prevented some of them
>> (e.g. urn: and tag: URIs); but at the price of the OpenGraph CURIEs.
>> There are some fairly well-known schemes in this group (official or
>> not), e.g.: mailto, tag, urn, doi, geo, tel, callto, news, xmpp, sip,
>> sms, bitcoin, gtalk, skype, spotify. Of these, "tag" and "geo" can be
>> found in prefix.cc. (I've previously mentioned that "geo" may be of
>> some concern for certain RDFa users [9].) But as we've already
>> concluded when resolving ISSUE-90, we argue that these will probably
>> not be used as prefixes, and will be quite uncommon as schemes of
>> subject or object IRIs in RDFa. Also, given that many IRIs using these
>> schemes already are reminiscent of CURIEs, and are of a rather
>> specialized nature, I'd imagine that it's easier for anyone coming
>> across such oddities to recognize the collision risk, should it ever
>> happen. We should still be very clear in the section about CURIEs
>> though, that prefixes overshadow schemes in IRIs of these forms, and
>> that we advise users to monitor the in-scope prefixes for any such
>> collision (along with the workaround accomplishable by using e.g.
>> @prefix="geo: geo:").
>>
>>
>> ## Summary ##
>>
>> I sincerely hope that I have interpreted the ABNF correctly and
>> haven't raised the issue of OpenGraph CURIEs in error. And that I have
>> made a clear and satisfactory draft proposal for fixing both this and
>> the problems raised in ISSUE-125 (primarily the risk of confusing
>> CURIEs with normal IRIs).
>>
>> Best regards,
>> Niklas
>>
>> [1]: http://www.w3.org/2010/02/rdfa/track/issues/125
>> [2]: http://developers.facebook.com/docs/opengraph/objects/builtin/
>> [3]: http://tools.ietf.org/html/rfc3987#section-2.2
>> [4]: http://en.wikipedia.org/wiki/Augmented_Backus%E2%80%93Naur_Form
>> [5]: http://www.w3.org/2010/02/rdfa/track/issues/90
>> [6]: http://www.w3.org/TR/2012/WD-sparql11-query-20120105/#rPNAME_LN
>> [7]: http://en.wikipedia.org/wiki/URI_scheme
>> [8]: http://www.w3.org/TR/HTTP-in-RDF10/
>> [9]: http://lists.w3.org/Archives/Public/public-rdfa-wg/2011Aug/0039.html
>>
>
>
> ----
> Ivan Herman, W3C Semantic Web Activity Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> FOAF: http://www.ivan-herman.net/foaf.rdf
>
>
>
>
>

Received on Wednesday, 25 January 2012 00:51:49 UTC