Re: RDF-ISSUE-8 (IRI vs URI): Incorporate IRI-s into the RDF documents [Cleanup tasks]

On Mon, Mar 7, 2011 at 11:12 AM, Mischa Tuffield <mischa.tuffield@garlik.com> wrote:

> Hello,
>
> <snip/>
>
> On 5 Mar 2011, at 15:26, Pat Hayes wrote:
>
>
> On Mar 5, 2011, at 5:19 AM, RDF Working Group Issue Tracker wrote:
>
>
> RDF-ISSUE-8 (IRI vs URI): Incorporate IRI-s into the RDF documents [Cleanup
> tasks]
>
>
> http://www.w3.org/2011/rdf-wg/track/issues/8
>
>
> Raised by: Ivan Herman
>
> On product: Cleanup tasks
>
>
> The IRI Spec[1] is from 2005, and it may be necessary to retrofit it to
> RDF. E.g., what is the relationship between "http://résumé.example.org" and
> "http://xn--rsum-bpad.example.org"? Are they the same resource or not?
> Note that SPARQL has something on that[2]...
>

Context matters here.  "http://xn--rsum-bpad.example.org" is the URI mapped
from the IRI "http://résumé.example.org", but it is also a valid IRI in its
own right (I think -- correct me if I'm wrong).  If you're dereferencing the
resource to fetch its representation, then I think you can safely conclude
that both identify the same resource, but that decision is up to your
application.

However, from the perspective of RDF semantics I think it would be wrong to
put the burden on the implementer to consider normalization when computing
term equality, graph equivalence, etc.  This is already an issue to some
extent; see the note in RDF Concepts [1] that says: "Because of the risk of
confusion between RDF URI references that would be equivalent if derefenced,
the use of %-escaped characters in RDF URI references is strongly
discouraged. See also the URI equivalence issue
<http://www.w3.org/2001/tag/issues.html#URIEquivalence-15> of the
Technical Architecture Group."

Nowhere in either the RDF or SPARQL specs do I see anything that implies
applications should normalize URIRefs when comparing them; they all seem to
specify a simple string comparison of the URIRefs.  Likewise, I think that
"http://xn--rsum-bpad.example.org" and "http://résumé.example.org", when
taken as IRIs, should be considered different
terms/nodes/resources/whatever you want to call them.
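
To make the distinction concrete, here is a minimal Python sketch (my own
illustration, not anything taken from the RDF or SPARQL specs).  It assumes
Python 3 and its built-in "idna" codec, and map_host_to_ascii is a
hypothetical helper name that only handles the host part of an IRI (no port
or userinfo).  It compares the two terms as plain strings, and then again
after mapping the IRI's host to its ASCII form:

```python
from urllib.parse import urlsplit, urlunsplit


def map_host_to_ascii(iri: str) -> str:
    """Hypothetical helper: map only the host of an IRI to its ASCII form.

    Assumes the IRI has a host and no port or userinfo; this is not a full
    implementation of the IRI-to-URI mapping.
    """
    parts = urlsplit(iri)
    # Python's built-in "idna" codec converts each label to Punycode.
    ascii_host = parts.hostname.encode("idna").decode("ascii")
    return urlunsplit(
        (parts.scheme, ascii_host, parts.path, parts.query, parts.fragment)
    )


iri = "http://r\u00e9sum\u00e9.example.org"   # "http://résumé.example.org"
uri = "http://xn--rsum-bpad.example.org"

# Simple string comparison: the two are different terms.
same_term = iri == uri

# Comparison after mapping the host: an application-level decision.
same_after_mapping = map_host_to_ascii(iri) == uri
```

Here same_term is False and same_after_mapping is True, which is the gap
between term equality and what an application might conclude when
dereferencing.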

>
> SPARQL says "IRI (corresponds to the Concepts and Abstract Syntax term "RDF
> URI reference")"
>
>
> As far as I am aware, the URIRef definition came out before the RFC
> defining IRIs. They are "pretty similar" insofar as the URIRef work was
> second guessing what IRIs would be, but it didn't manage to get it 100%
> correct.
>
>
> Is this strictly correct? That is, are IRIs in fact just URI references by
> another name? If not (as I suspect) can anyone briefly outline the points of
> difference?
>
>
> No, they are not the same thing. The differences lie in which characters
> get %-encoded and which don't. One example is the backtick
> character `, which doesn't need to be %-encoded when creating an IRI but
> does need to be when generating a URI Ref. I sent an email to the SWIG
> mailing list about this a while back [1], where people pointed out the
> history and some of the subtle differences between the two.
>

In addition to the encoding differences, note that the RFC defining IRIs
(RFC3987) is based on a more recent URI definition (RFC3986).  However, RDF
Concepts calls out an old definition of URI (RFC2396) when defining
URIRefs.  Among other differences, this old definition does not allow
percent-encoded characters in the host component, while IRIs and new-style
URIs do allow internationalized domain names.  So there seems to be a whole
class of IRIs that, strictly speaking, are not representable as RDF URIRefs
under the current definition.  (My apologies if this has been rehashed
elsewhere; I'm somewhat new to this discussion.)
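
As a rough illustration of the encoding point, the following Python sketch
(again my own, assuming Python 3's urllib.parse.quote and the built-in
"idna" codec; the variable names are just for illustration) shows two ASCII
spellings of the same internationalized host label, plus the percent-encoded
form of the backtick mentioned earlier in the thread:

```python
from urllib.parse import quote

label = "r\u00e9sum\u00e9"  # the host label "résumé"

# Percent-encoded UTF-8 form of the label (the form described above as not
# allowed in the host component by the old RFC2396 definition).
percent_encoded = quote(label)

# Punycode (IDNA) form of the same label.
punycode = label.encode("idna").decode("ascii")

# The backtick is percent-encoded by quote().
backtick = quote("`")

print(percent_encoded)  # r%C3%A9sum%C3%A9
print(punycode)         # xn--rsum-bpad
print(backtick)         # %60
```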

-Alex

[1] http://www.w3.org/TR/rdf-concepts/#section-Graph-URIref



>
> Mischa
>
> [1] http://lists.w3.org/Archives/Public/semantic-web/2010Jul/0426.html
>
>
> Pat
>
>
> [1] http://www.ietf.org/rfc/rfc3987.txt
>
> [2] http://www.w3.org/TR/rdf-sparql-query/#docTerminology

Received on Monday, 7 March 2011 20:36:12 UTC