Re: Signal for semantic extensions from David Booth on 2013-05-20 (public-rdf-comments@w3.org from May 2013)

From: David Booth <david@dbooth.org>
Date: Mon, 20 May 2013 09:48:43 -0400
To: Richard Cyganiak <richard@cyganiak.de>
CC: public-rdf-comments <public-rdf-comments@w3.org>
Message-ID: <519A29BB.8080807@dbooth.org>

Hi Richard,

On 05/18/2013 12:56 PM, Richard Cyganiak wrote:
> David,
>
> On 18 May 2013, at 15:56, David Booth <david@dbooth.org> wrote:
>
>> Hi Richard,
>>
>> Yes, if RDF did not define a standard rdf:requires (or similar)
>> signal to explicitly indicate the semantic extension, then to be
>> safe a client would have to assume that *any* unrecognized class or
>> property *might* signal the need for additional entailments.
>
> But almost any such term *does* introduce additional entailments. If
> the class is defined as having a superclass, or the property has a
> defined domain and range, then you need their definitions to get all
> entailments.

Yes, if those superclass, domain or range declarations are not directly
in the RDF graph that the client receives, and the RDF author's intent
relies on determining those entailments.
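
As a minimal sketch of that dependency (not the full RDFS semantics;
the ex: terms and the rdfs_closure helper are hypothetical, and triples
are plain tuples of prefixed-name strings), the following applies only
the RDFS domain, range and subClassOf rules. The extra rdf:type triples
appear only when the schema triples are in the graph the client holds:

```python
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"
RDFS_RANGE = "rdfs:range"
RDFS_SUBCLASS = "rdfs:subClassOf"


def rdfs_closure(triples):
    """Apply the domain, range and subClassOf rules until nothing new appears."""
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            for s2, p2, o2 in inferred:
                if p2 == RDFS_DOMAIN and s2 == p:
                    new.add((s, RDF_TYPE, o2))  # subject is in the domain class
                elif p2 == RDFS_RANGE and s2 == p:
                    new.add((o, RDF_TYPE, o2))  # object is in the range class
                elif p2 == RDFS_SUBCLASS and p == RDF_TYPE and o == s2:
                    new.add((s, RDF_TYPE, o2))  # instance of the superclass too
        if new <= inferred:
            return inferred
        inferred |= new


# Hypothetical data graph and the schema triples that define its vocabulary.
data = {(":house", "ex:owner", ":alice")}
schema = {
    ("ex:owner", RDFS_DOMAIN, "ex:Building"),
    ("ex:owner", RDFS_RANGE, "ex:Person"),
    ("ex:Person", RDFS_SUBCLASS, "ex:Agent"),
}

# Without the schema the client derives nothing beyond the data itself.
assert rdfs_closure(data) == data

# With the schema, additional rdf:type triples are entailed.
extra = rdfs_closure(data | schema) - data - schema
for triple in sorted(extra):
    print(triple)
```

In this toy graph the schema contributes three additional rdf:type
triples that a client holding only the data graph would never derive.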

>
>> But that could result in an awful lot of false positives, which the
>> client would have no automated way of distinguishing from true
>> positives.
>
> Can you give an example where this would result in a false positive,
> and explain why you think such cases would be frequent?

I don't know how prevalent this would be, but I know I often define 
predicates that are simply used to attach data, with no additional 
entailments required, much like this:

   :house  v:color "red" .
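
To make the false positive concrete, here is a rough sketch of a client
that treats every unrecognized predicate or class as a possible signal
of a semantic extension. The KNOWN_TERMS set and the function name are
assumptions made up for illustration; triples are plain tuples of
strings:

```python
# Terms this hypothetical client already understands.
KNOWN_TERMS = {"rdf:type", "rdfs:subClassOf", "rdfs:domain", "rdfs:range"}


def possibly_extended_terms(triples, known=KNOWN_TERMS):
    """Return every predicate or class that the client does not recognize."""
    suspects = set()
    for s, p, o in triples:
        if p not in known:
            suspects.add(p)
        if p == "rdf:type" and o not in known:
            suspects.add(o)
    return suspects


graph = {(":house", "v:color", '"red"')}
print(possibly_extended_terms(graph))  # prints {'v:color'}
```

The client flags v:color even though the author intended no entailments
beyond the stated triple, and it has no automated way to tell this case
apart from a predicate that really does carry extra semantics.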

>
>> I should clarify that when I talk about entailments, I'm primarily
>> thinking of entailments that can be expressed in RDF as added
>> triples.
>
> Well, but anything can be expressed in RDF as triples, given
> appropriate vocabularies.

Agreed.  I just wanted to be clear that I wasn't trying to get into the 
"magic happens here" step of selecting interpretations beyond what is 
stated explicitly in the semantics.

>
>> If we assume that a Linked Data approach is used, then there is
>> another way that this problem can be addressed.  Suppose every
>> unrecognized class or property URI is followed to obtain its
>> definition, expressed also in RDF.  If the definition itself is
>> required to define all of the entailment rules that the client
>> would need for that semantic extension -- i.e., the semantic
>> extension implied by that class or predicate -- then the client
>> could be assured that if it had obtained definitions for all
>> unrecognized classes and predicates then it would be able to
>> determine all entailments.
>
> Hm.
>
> You assume that it is desirable that all parties in the communication
> have a shared sense of “all” or “complete entailments”.

Not quite.  I assume that it is desirable for the client to be *able* to 
have a shared sense of "all entailments" that the RDF author intended. 
An RDF author generally has a particular meaning in mind when writing an 
RDF document, i.e., the RDF author intends to make certain information 
available.  Some of that information is explicit in the RDF triples that 
are included directly in the RDF graph, and some is implicit in the 
author's intended entailments.

> I don't think
> that's the case, in general. Enabling communication despite partial
> understanding is a key feature of RDF. It's common for a server to
> publish data in a single document using a number of different
> vocabularies, because different clients understand different parts,
> and will generally ignore what they don't understand. As a publisher,
> I also assume that some clients will only understand parts of my data
> (e.g., generic SKOS) and I include bits in XKOS anyway for those
> clients that understand it.

Yes, that is an important feature of RDF.  But that's not the use case I 
am addressing, because in those cases, the client does not *need* all of 
the RDF author's intended entailments.  I am trying to address the use 
case in which the client needs to know that it has obtained *all* of the 
RDF author's intended entailments, and the RDF author wishes to enable 
the client to do so -- automatically and without additional 
communication between the client and the RDF author.

Imagine that the RDF author has sent an RDF message to the client, and 
the client needs to know: have I received the *entire* message or only a 
portion of it?  Was there something more that the RDF author was trying 
to tell me?  Given that some of the message is explicit in the triples 
that were sent, and some is implicit in the intended entailments, it is 
critical that the client have a clear way to know what those
intended entailments are.

> I also assume that some clients may draw
> conclusions from my data that I am completely unaware of, based on
> domain knowledge or based on combining it with additional data.

Yes.  The goal here is not to prevent the client from determining 
additional entailments, beyond what the RDF author intended.  The 
client's goal is to determine **at least** all of the entailments that 
the RDF author intended to convey.  If more are determined, that's fine.

>
> There are several aspects in the design of RDF that enable this. Most
> importantly, dropping some triples doesn't make any other triples
> false (although it may of course make the message unintelligible).
> Neither does adding additional triples. And semantic extensions are
> supposed to be designed in a way that they might produce additional
> triples, but not invalidate other triples.
>
> In short, I don't think that we should be designing for the notion
> that there is a unique and complete full interpretation of a given
> graph. No matter what we say in any spec, and no matter what you say
> is required to fully understand your data, different consumers with
> different capabilities will draw different conclusions from your
> data.

I agree, but that's not what I am trying to address here.  There are 
multiple goals one might consider:

  - (1a) Determining a cooperating RDF author's intent; versus
    (1b) defining a "universal interpretation"; and

  - (2a) Enabling the client to determine the RDF author's
    intended entailments; versus
    (2b) requiring the client to do so.

I'm only talking about 1a and 2a.

David


>
> Best, Richard
>
>
>> But if the client was unable to obtain a definition for some class
>> or predicate, then this would indicate that it may be missing some
>> entailment rules -- with no need for rdf:requires -- and the client
>> could notify the user.   However, for this approach to work, we
>> would have to adopt a convention that says "if a class or predicate
>> definition is supplied (in RDF), then it must either: (a) supply
>> (in RDF) all of its associated entailment rules (directly or
>> indirectly); or (b) use a class or predicate that has no RDF
>> definition".  In other words, the lack of an available RDF
>> definition would signal potentially missing entailments.  This
>> means that if a URI owner wanted to provide only a partial
>> definition in RDF of a class or predicate, then the URI owner would
>> have to be sure that the definition also references a class or
>> predicate that has no RDF definition, as a way to signal the
>> existence of additional entailment rules.  This would work and
>> might be the cleanest architectural design, but it is slightly more
>> implicit than using an rdf:requires predicate.
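
The convention above could be sketched roughly as follows. This is a
simplification under stated assumptions: a "definition" is modeled only
as the set of terms it uses, the WEB dictionary stands in for
dereferencing URIs, and all ex: names and the function name are
hypothetical.

```python
def undefined_terms(graph_terms, known_terms, fetch_definition):
    """Follow unrecognized terms to their definitions, transitively.

    fetch_definition(term) returns the set of terms used by the term's RDF
    definition, or None when no RDF definition is available.
    Returns the terms for which no definition could be obtained.
    """
    pending = [t for t in graph_terms if t not in known_terms]
    seen = set()
    undefined = set()
    while pending:
        term = pending.pop()
        if term in seen:
            continue
        seen.add(term)
        definition = fetch_definition(term)
        if definition is None:
            undefined.add(term)  # signals potentially missing entailments
            continue
        pending.extend(
            t for t in definition if t not in known_terms and t not in seen
        )
    return undefined


# In-memory stand-in for the web: term -> terms used in its definition.
WEB = {
    "v:color": {"rdfs:domain", "rdfs:range"},             # fully defined
    "ex:partOf": {"rdfs:domain", "ex:transitiveMagic"},   # partial definition
}
KNOWN = {"rdf:type", "rdfs:domain", "rdfs:range", "rdfs:subClassOf"}

print(undefined_terms({"v:color", "ex:partOf"}, KNOWN, WEB.get))
# prints {'ex:transitiveMagic'}
```

An empty result would mean every unrecognized term led to a definition,
so under the convention no entailment rules are missing. A non-empty
result is the cue to notify the user.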
>>
>> David
>>
>>
>> On 05/18/2013 05:01 AM, Richard Cyganiak wrote:
>>> David,
>>>
>>> (Unofficial response to ask for clarification)
>>>
>>> Given that any RDF vocabulary is a semantic extension, isn't the
>>> answer here simply that if a client sees a class or property IRI
>>> that it doesn't know, then it must assume that additional
>>> inferences are possible?
>>>
>>> Richard
>>>
>>>
>>> On 18 May 2013, at 04:05, David Booth <david@dbooth.org> wrote:
>>>
>>>> This comment raises an issue that is somewhat theoretical at
>>>> present.  I mentioned it over a year ago (message below) but
>>>> have not seen any discussion about it.  I have not seen it be a
>>>> problem in practice yet, so I do not think it is urgent for the
>>>> working group to address.  But if RDF gains popularity over the
>>>> coming years, and more semantic extensions are introduced, it
>>>> could become a practical consideration, given the long time
>>>> span between RDF versions.
>>>>
>>>> At present there is no standard way in RDF to unambiguously
>>>> signal the expectation of a particular semantic extension.
>>>> I'll explain further what I mean, and make a specific proposal.
>>>> Perhaps others will think of a better way to solve the problem,
>>>> but hopefully this will at least explain what it is.
>>>>
>>>> Suppose an RDF consumer receives a graph written by an RDF
>>>> author and (roughly speaking) the RDF consumer wants to be able
>>>> to fully "understand the author's intended meaning" of that
>>>> graph.  More precisely, the RDF author has used certain
>>>> semantic extensions that imply certain entailments, and wishes
>>>> to allow consumers of that graph to be able to automatically
>>>> (by machine) determine these entailments. In turn, the RDF
>>>> consumer wishes to be able to compute all of those entailments.
>>>> Note that this is *not* suggesting that the RDF consumer be
>>>> *required* to compute the RDF author's intended entailments.
>>>> It is only about *enabling* the RDF consumer to do so if
>>>> desired.
>>>>
>>>> For semantic extensions that are well known, such as OWL, the
>>>> RDF consumer can detect the presence of well known URIs (such
>>>> as OWL predicates) to know that those well known semantic
>>>> extensions are intended.  But for semantic extensions that are
>>>> *not* well known -- non-standard semantic extensions -- the RDF
>>>> consumer has no standard automatable way to know that certain
>>>> URIs are intended to signal the use of particular semantic
>>>> extensions.  Thus, the RDF consumer has no standard way of
>>>> determining whether or not he/she/it has computed all of the
>>>> entailments that the RDF author intended to convey.
>>>>
>>>> When the RDF consumer processes an RDF graph, the processor
>>>> should be able to clearly indicate to the user either: "I have
>>>> computed all of the author's intended entailments" or "I cannot
>>>> compute all of the author's intended entailments because I do
>>>> not have the module for semantic extension
>>>> 'http://example/BobsFavoriteExtension'.  Please load it and
>>>> try again."  But this is only possible if the RDF author has
>>>> an unambiguous standard way to signal the intended semantic
>>>> extensions.
>>>>
>>>> The motivation for this use case is to enable the vision of
>>>> the semantic web to work, even in the presence of new semantic
>>>> extensions.  This means that: (a) the RDF consumer cannot be
>>>> expected to have any other communication with the RDF author
>>>> (other than obtaining the graph that the author had provided);
>>>> and (b) the RDF consumer must be able to perform these steps
>>>> automatically (by machine).
>>>>
>>>> I suggest the RDF working group define a standard predicate
>>>> rdf:requires (or whatever name the group chooses) that an RDF
>>>> author can use to indicate that a particular semantic extension
>>>> is intended.  It could be used like this:
>>>>
>>>> <> rdf:requires <http://example/BobsFavoriteExtension> .
>>>>
>>>> which would indicate that the current document uses semantic
>>>> extension <http://example/BobsFavoriteExtension> .  Hence, to
>>>> be assured of determining all of the document author's
>>>> intended entailments, the RDF processor must understand that
>>>> semantic extension.
>>>>
>>>> Furthermore, for backward compatibility with OWL, it would be
>>>> good to define:
>>>>
>>>> owl:imports rdfs:subPropertyOf rdf:requires .
>>>>
>>>> and recommend that RDF processors also recognize owl:imports
>>>> as signaling a semantic extension.
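
A processor-side check along these lines might look like the sketch
below. Note the assumptions: rdf:requires is only the predicate proposed
in this message and is not part of any RDF specification, the document
IRI standing in for <> is hypothetical, and the function names are
invented for illustration. Triples are plain tuples of IRI strings.

```python
# Proposed in this thread; NOT a standard term in the RDF namespace.
RDF_REQUIRES = "http://www.w3.org/1999/02/22-rdf-syntax-ns#requires"
OWL_IMPORTS = "http://www.w3.org/2002/07/owl#imports"
SIGNAL_PREDICATES = {RDF_REQUIRES, OWL_IMPORTS}


def missing_extensions(triples, loaded_extensions):
    """Return the signaled semantic extensions the processor has not loaded."""
    required = {o for _, p, o in triples if p in SIGNAL_PREDICATES}
    return sorted(required - set(loaded_extensions))


def report(triples, loaded_extensions):
    """Build the user-facing message for a given graph and set of modules."""
    missing = missing_extensions(triples, loaded_extensions)
    if not missing:
        return "All signaled semantic extensions are loaded."
    return "\n".join(
        f"Missing module for semantic extension '{m}'. "
        "Please load it and try again."
        for m in missing
    )


DOC = "http://example/doc"  # hypothetical IRI standing in for <>
graph = {(DOC, RDF_REQUIRES, "http://example/BobsFavoriteExtension")}

print(report(graph, loaded_extensions=set()))
print(report(graph, loaded_extensions={"http://example/BobsFavoriteExtension"}))
```

The check only tells the processor whether it is equipped to compute
the intended entailments. Computing them is left to the extension
modules themselves.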
>>>>
>>>> Again, since I have not yet seen this issue arise in practice,
>>>> I would consider it a low priority to fix, and would not mind
>>>> if the working group decides to defer it to a future RDF
>>>> version.  On the other hand, it is a very easy gap to fix.
>>>>
>>>> Thanks, David
>>>>
>>>>
>>>> On 03/30/2012 06:18 PM, David Booth wrote:
>>>>> -------- Forwarded Message --------
>>>>> From: David Booth <david@dbooth.org>
>>>>> To: Pat Hayes <phayes@ihmc.us>
>>>>> Cc: Jonathan A Rees <rees@mumble.net>, Jeni Tennison
>>>>>  <jeni@jenitennison.com>, www-tag@w3.org List <www-tag@w3.org>
>>>>> Subject: Re: The TAG Member's Guide to ISSUE-57 Discussion -
>>>>>  F2F reading
>>>>> Date: Fri, 30 Mar 2012 18:17:06 -0400
>>>>>
>>>>> Hi Pat,
>>>>>
>>>>> On Wed, 2012-03-28 at 14:24 -0500, Pat Hayes wrote:
>>>>>> FWIW, I am willing to work actively (on- or off-list) with
>>>>>> anyone who wants to try reconciling any proposal with the
>>>>>> RDF semantics, or just to explore any semantic issues. This
>>>>>> is particularly timely as the RDF2 WG is right now debating
>>>>>> issues which impinge on the RDF semantics framework, so it
>>>>>> would be good to get any pending issues or problems out
>>>>>> into the open.
>>>>>
>>>>> I would suggest that the RDF WG look at Part 3 "Determining
>>>>> Resource Identity" of "Resource Identity and Semantic
>>>>> Extensions: Making Sense of Ambiguity":
>>>>> http://dbooth.org/2010/ambiguity/paper.html#part3
>>>>> That section proposes a standard process for determining resource
>>>>> identity. As far as I know, I did not invent this process.  I
>>>>> simply documented what seemed to be the general ideas
>>>>> floating around.
>>>>>
>>>>> However, I did identify one specific gap in the RDF specs: [[
>>>>> At present there is a minor gap in the RDF standards, in that
>>>>> there is no standard way for an RDF processor to recognize
>>>>> that a particular URI is intended to signal an opaque
>>>>> semantic extension: the knowledge of which URIs are intended
>>>>> to signal opaque semantic extensions must be externally
>>>>> supplied to the RDF processor.  The RDF processor must
>>>>> magically know about them in advance.  It cannot alert the
>>>>> user to the need for a new opaque semantic extension that was
>>>>> previously unknown. This gap could be addressed by defining a
>>>>> standard predicate, such as rdf2:requires, to explicitly
>>>>> indicate when a particular semantic extension is required.
>>>>> However, since it currently seems unlikely that many semantic
>>>>> extensions will be needed that cannot be defined using
>>>>> standard inference rules, this does not seem like a major
>>>>> gap. ]]
>>>>>
>>>>> I will forward this message separately to the RDF comments
>>>>> list, since I cannot post to the regular RDF list.
Received on Monday, 20 May 2013 13:49:14 UTC