Re: Addressing ISSUE-47 (invalid and relative IRIs) from Richard Cyganiak on 2011-07-07 (public-rdb2rdf-wg@w3.org from July 2011)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Thu, 7 Jul 2011 18:57:11 +0100
To: David McNeil <dmcneil@revelytix.com>
Cc: RDB2RDF WG <public-rdb2rdf-wg@w3.org>
Message-Id: <8EF72312-B3E4-4EB6-9F8E-FBAA33AF1A12@cyganiak.de>

Hi David,

Thanks for the comments.

On 7 Jul 2011, at 16:26, David McNeil wrote:
> On Mon, Jul 4, 2011 at 6:14 PM, Richard Cyganiak <richard@cyganiak.de> wrote:
>> 3. Invalid IRIs (e.g., anything containing spaces and so on) are skipped, and if any triple would include such an IRI then that triple is skipped
>> 
> 
> This worries me. I am uncomfortable with rows of data silently disappearing based on their contents.

The question is, what's the alternative? The only other workable option I can think of is to make this an error. I don't like that option much, because this error cannot be determined from the schema, and cannot be detected at startup time but only at query time. So it would be a “runtime error”, a new concept.

I'd prefer if the validity of a mapping depended only on the schema of a DB, but not on its contents, so that you could do validation at startup time.

(Thinking more about it, another reasonable option might be to make it a blank node.)

We'll run into similar questions with typed literals. What should happen when the mapping produces "aaa"^^xsd:integer? I'm tempted to say that these should also simply not be generated.
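
To make the "skip it" option concrete, here is a rough sketch in Python. Everything in it is an assumption for illustration only: the IRI check is a crude approximation rather than the real IRI grammar, and the names generate_triples, NAME, AGE and the http://example.com/age predicate are hypothetical. It only shows that whether a term is valid depends on the row contents, so the check can happen no earlier than when the triples are generated.

```python
import re

# Assumption: a deliberately rough IRI check, not the full RFC 3987 grammar.
# It requires a scheme and rejects whitespace and a few forbidden characters.
IRI_RE = re.compile(r'[A-Za-z][A-Za-z0-9+.-]*:[^\s<>"{}|\\^`]*')
# Lexical form of xsd:integer: optional sign followed by decimal digits.
INTEGER_RE = re.compile(r'[+-]?[0-9]+')


def is_valid_iri(value):
    return IRI_RE.fullmatch(value) is not None


def is_valid_integer(lexical):
    return INTEGER_RE.fullmatch(lexical) is not None


def generate_triples(rows):
    """Yield (subject, predicate, object) tuples, skipping invalid terms."""
    for row in rows:
        subject = "http://base.uri/person/" + row["NAME"]
        if not is_valid_iri(subject):
            continue  # invalid subject IRI: no triple is generated for this row
        if is_valid_integer(row["AGE"]):
            yield (subject, "http://example.com/age", row["AGE"])
        # otherwise the ill-typed literal is simply not generated


rows = [
    {"NAME": "Alice", "AGE": "30"},
    {"NAME": "Bob Jones", "AGE": "41"},  # the space makes the IRI invalid
    {"NAME": "Carol", "AGE": "aaa"},     # not a valid xsd:integer
]
triples = list(generate_triples(rows))
print(triples)
```

Only the first row yields a triple here; the other two disappear without any error, which is exactly the behaviour David is worried about.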

>> 4. rr:template is changed so that it %-encodes most characters. This means that rr:template "person/{NAME}" will work even if the name contains spaces; the result will be "http://base.uri/person/Alice%20Smith"
>> 
> 
> A couple of thoughts on this:
> 
> * I don't think we yet had group consensus that R2RML should perform automatic %-encoding.

That's right. I can't remember the issue being discussed.

> * I think a consequence of what you are proposing is that the following two R2RML snippets would behave differently with respect to encoding:
>   rr:subject [ rr:column "Name" ]
>   rr:subject [ rr:template "{Name}" ]
> 
> I think it would be less surprising to users if these two constructs had the same behavior.

You are right, it's a bit surprising, but I think it's easy enough to learn and remember that "{name}" performs %-encoding when generating IRIs while "name" doesn't.

I don't see how we could reasonably make both behave the same. Both can't be %-encoding, because then rr:column would %-encode already valid IRIs, making them invalid. If both are non-%-encoding, then we need some other mechanism for %-encoding, so we'd need a proposal for that.
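
A small Python sketch of the difference, under stated assumptions: "most characters" is taken here to mean everything outside the unreserved set, which is what urllib.parse.quote does with safe="", and the helper names expand_template and column_value as well as the HOMEPAGE column are made up for the example. This is not proposed spec text, just an illustration of why the same encoding rule cannot serve both constructs.

```python
import re
from urllib.parse import quote


def expand_template(template, row):
    """Fill {COLUMN} placeholders, %-encoding each inserted value."""
    def encode(match):
        # safe="" means reserved characters such as ":" and "/" are encoded too
        return quote(str(row[match.group(1)]), safe="")
    return re.sub(r"\{([^{}]+)\}", encode, template)


def column_value(column, row):
    """Use the column value as-is, with no encoding."""
    return str(row[column])


base = "http://base.uri/"
row = {"NAME": "Alice Smith", "HOMEPAGE": "http://example.com/alice"}

# Template with encoding: the space becomes %20.
print(base + expand_template("person/{NAME}", row))
# Column without encoding: an already valid IRI passes through unchanged.
print(column_value("HOMEPAGE", row))
# The same value with encoding applied: no longer usable as the intended IRI.
print(expand_template("{HOMEPAGE}", row))
```

The last line comes out as "http%3A%2F%2Fexample.com%2Falice", which is why applying %-encoding to rr:column would break columns that already hold valid IRIs.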

And to be honest, I believe that rr:column will be rarely used for generating IRIs.

If you'd prefer to see some other behaviour here, could you please open an issue (or make a change proposal)?

Best,
Richard

Received on Thursday, 7 July 2011 17:57:39 UTC