Re: Issue 89 - why? from Graham Klyne on 2011-09-19 (public-prov-wg@w3.org from September 2011)

From: Graham Klyne <GK@ninebynine.org>
Date: Mon, 19 Sep 2011 17:20:54 +0100
To: Satya Sahoo <satya.sahoo@case.edu>
CC: "Myers, Jim" <MYERSJ4@rpi.edu>, W3C provenance WG <public-prov-wg@w3.org>
Message-ID: <4E776BE6.8080306@ninebynine.org>
On 19/09/2011 03:02, Satya Sahoo wrote:
> Hi Graham,
> My responses are interleaved.
>
>> d1, d1v1, d1v2 are surely different things, but I don't see that modelling
>> them is fundamentally different.  What sorts of things can one say about
>> d1v1/d1v2 that one cannot say about d1?  And vice versa?
>
> Thanks for correcting the typo - d1v2 and not d2v2!
>
> In an OWL ontology, if d1 is asserted to be an instance of class "Document"
> then its version information is optional. On the other hand, if d1v1 and
> d1v2 are to be asserted to be instances of a class "VersionedDocument" then
> a user has to associate a version number to its attribute "hasVersion".

Surely, that depends on how the OWL ontology is constructed?  E.g. the kind of 
cardinality constraint given for the hasVersion property?
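
To make the cardinality point concrete, here is a minimal sketch. It is not
OWL and not part of any PROV draft: the names "VersionedDocument" and
"hasVersion" come from the discussion above, while the triple set and the
helper function are hypothetical. It only shows that whether a missing version
gets flagged depends on the constraint one chooses to declare.

```python
# Hypothetical closed-world check over plain triples. This stands in for a
# constraint the ontology author may or may not declare on hasVersion.
# Note: an OWL reasoner works under the open-world assumption, so a missing
# hasVersion value is not, by itself, an inconsistency there.

TRIPLES = {
    ("d1", "rdf:type", "Document"),
    ("d1v1", "rdf:type", "VersionedDocument"),
    ("d1v1", "hasVersion", "1"),
    ("d1v2", "rdf:type", "VersionedDocument"),  # no hasVersion asserted
}


def missing_required(triples, cls, prop, min_count):
    """Return subjects typed as cls that have fewer than min_count values for prop."""
    members = {s for (s, p, o) in triples if p == "rdf:type" and o == cls}
    return sorted(
        s
        for s in members
        if sum(1 for (s2, p, _) in triples if s2 == s and p == prop) < min_count
    )


# A minimum cardinality of 1 reports d1v2; a minimum of 0 reports nothing.
strict = missing_required(TRIPLES, "VersionedDocument", "hasVersion", 1)
lenient = missing_required(TRIPLES, "VersionedDocument", "hasVersion", 0)
print(strict, lenient)
```

The same data is acceptable or not depending only on the declared minimum,
which is the sense in which the outcome depends on how the ontology is
constructed.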

> The advantages of having a class "VersionedDocument" versus only "Document"
> is now we can make all kinds of assertions that Jim wanted - "this is the
> latest version of the Appointment Letter", "this version of the Appointment
> Letter has wrong starting date", "this version of the Appointment Letter has
> the 2011 university rules on academic integrity", there is a clear link
> between various versions etc.

Indeed (assuming a corresponding OWL ontology), but does this need to be part
of the core *provenance* ontology?

> A "Document" class on the other hand *does not need* its instances to have
> version information, hence (1) there is no consistency check possible by an
> OWL reasoner to flag incorrect assertions which do not have version
> information - this will allow corruption of knowledge base, (2) if there are
> multiple instances of Appointment Letter with missing version information,
> there is very little information to distinguish between various versions - a
> user may have to manually review timestamps, name of file etc. (defeating
> the purpose of using SW technologies).
>
> A similar analogy is a class "Person" versus specialized classes "Employee",
> "US Senator", "Surgeon" etc.
>
> Overall, my point is that if we decide to create an OWL ontology for PROV we
> have to follow the OWL requirements.
>
>> my point was that it doesn't prevent one from describing (say) d1v1, d1v2,
>> etc. and also separately saying that they are "versions" of d1.  And it's a
>> *lot* simpler than the current proposal.
> Since OPMV (and even the Provenir ontology[1]) use OWL, it does exactly what
> we are doing in the PROV ontology - define what are "necessary" conditions
> for an instance to be member of a class.
>
> There is no difference in how any OWL ontology will describe d1v1 and d1v2
> as well as that they are versions of d1 in terms of "necessary" conditions
> for class definition. Just to clarify, we have not discussed anything that
> is not standard way of defining an OWL ontology class (no PROV ontology
> specific complication).

I don't remember reading about anything like this in OWL - can you please 
provide a link to the relevant part(s) of the OWL spec?

> Maybe a specific example will help us understand our points better or you
> (and others) are invited to attend our regular ontology telcon at 12noon US
> ET - please send me your Skype ID separately if you would like to attend.

Well, my comments are somewhat aimed at the conceptual model, as expressed using 
RDF. At this stage, I'm not specifically concerned with the OWL ontology, except 
to the extent that it is being used to clarify the underlying RDF structure used 
to express provenance.

#g
--

> [1] http://wiki.knoesis.org/index.php/Provenir_Ontology
>
> On Sun, Sep 18, 2011 at 5:16 PM, Graham Klyne<GK@ninebynine.org>  wrote:
>
>> On 18/09/2011 19:52, Satya Sahoo wrote:
>>
>>> Hi Jim and Graham,
>>>
>>>> If we don't distinguish at all, we have a mess - a document and a version
>>>> can't be distinguished if we can't talk about fixed content and we'd then
>>>> be unable to answer questions about when the document was created (with
>>>> the first version or only when the text was finalized).
>>>>
>>>
>>>
>>> I believe modeling a document d1 versus modeling versions of document
>>> d1v1,
>>> d2v2 are two distinct notions.
>>>
>>
>> d1, d1v1, d1v2 are surely different things, but I don't see that modelling
>> them is fundamentally different.  What sorts of things can one say about
>> d1v1/d1v2 that one cannot say about d1?  And vice versa?
>>
>>
>>   The d1v1 and d2v2 are specialized (maybe
>>> subclass) notions of d1. Also, modeling concepts such as d1v1, d2v2 are
>>> not
>>> required by all provenance applications.
>>>
>>>
>>>> For example, OPMV avoids this whole issue by saying that the things to
>>>> which provenance are applied are static [1].
>>> The OPMV has used the original OPM Artifact definition and hence the OPM
>>> notion of "static" Artifact.
>>>
>>
>> Certainly - my point was that it doesn't prevent one from describing (say)
>> d1v1, d1v2, etc. and also separately saying that they are "versions" of d1.
>>   And it's a *lot* simpler than the current proposal.
>>
>> #g
>> --
>>
>>   On Sun, Sep 18, 2011 at 6:20 AM, Graham Klyne<GK@ninebynine.org>   wrote:
>>>
>>>   Jim,
>>>>
>>>>
>>>> On 17/09/2011 16:15, Myers, Jim wrote:
>>>>
>>>>   Are you asking whether we need to distinguish between something and
>>>>> 'something that can't change in some ways' to unambiguously record
>>>>> provenance, or just whether frozen attributes is the best way to do
>>>>> that?
>>>>>
>>>>> If we don't distinguish at all, we have a mess - a document and a
>>>>> version
>>>>> can't be distinguished if we can't talk about fixed content and we'd
>>>>> then be
>>>>> unable to answer questions about when the document was created (with the
>>>>> first version or only when the text was finalized).
>>>>>
>>>>>
>>>> Agreed, we need to be able to distinguish between the document and its
>>>> "versions" for which some values about which we make provenance
>>>> assertions
>>>> are invariant.
>>>>
>>>>
>>>>> (This is the problem with things - we don't always agree on what aspects
>>>>> of a thing can change and still be recognizable as the same thing, so we
>>>>> define entities for which the aspects that are important relative to the
>>>>> provenance we're recording are clearly changeable or not changeable, not
>>>>> open to interpretation).
>>>>>
>>>>> If we consider the alternatives to fixing attributes, the most obvious
>>>>> would be to stick the constraint in the type/class - as we do with
>>>>> document
>>>>> and document-version. Either works, but you end up with a lot of type
>>>>> proliferation. 'document-version<#>-at-location<>-inEncoding<>-withEncryption<>'
>>>>> is well defined relative to moving, encoding and encryption changes,
>>>>> etc.
>>>>> The alternative encoding is to fix the attributes. To me, the
>>>>> interpretation
>>>>> should be the same in both cases - a version is really a different kind
>>>>> of
>>>>> thing than a document even if we record it as document with a fixed
>>>>> content
>>>>> attribute. (The statue and other examples make this clearer).
>>>>>
>>>>>
>>>> I take a view that something may be a "version" of something else if it
>>>> is
>>>> asserted to be (*).  The important consequence of being such a "version"
>>>> is
>>>> that valid provenance assertions made with respect to these versions are
>>>> permanent truths, and they can be said to be about some aspect of the
>>>> original resource.  Beyond that, why do we need to know what are the
>>>> particular constraints for a particular "version"?
>>>>
>>>> I guess I'm trying to dodge the philosophical minefields about what
>>>> constitutes identity.  I'm more concerned with what we need as a minimum
>>>> to
>>>> be able to record, exchange and do useful things with provenance
>>>> information.
>>>>
>>>> It could be that I'm missing something important here, hence my original
>>>> question being phrased as "what breaks?"
>>>>
>>>> ...
>>>>
>>>> You also raise what I see as a separate issue:  "a version is really a
>>>> different kind of thing than a document".  In some senses, this is almost
>>>> tautologically true, but from a perspective of ontologizing, I'm not sure
>>>> it's useful.  Can versions have versions (I think so).  Then we are faced
>>>> with a potentially infinite regress of types, or a type that can be
>>>> reflexive (if that's an allowable use) with respect to the version
>>>> relationship; i.e. a type that can be both range and domain of a "has
>>>> version".  To me, the latter seems to be the simpler course, unless and
>>>> until we find some essential functionality that is broken in such an
>>>> approach.
>>>>
>>>> ...
>>>>
>>>> (*) of course, it may be of interest to others to understand what makes
>>>> something a "version" of something else, and to understand the variant
>>>> and
>>>> invariant elements in detail.  I'm just asking if this needs to be part
>>>> of
>>>> the _provenance_ discussion, or if it can be treated separately.
>>>>
>>>> For example, OPMV avoids this whole issue by saying that the things to
>>>> which provenance are applied are static [1].  This is enough for OPMV to
>>>> be
>>>> useful in a significant range of applications for provenance (I
>>>> understand
>>>> it is used in the current UK open gov data work).  I personally think
>>>> that
>>>> might be too strong a constraint, but if the price of relaxing that
>>>> constraint is to wade into difficult philosophical territory, then I'm
>>>> not
>>>> so sure it's worth it.
>>>>
>>>> The fact that the things OPMV describes may be different versions of some
>>>> underlying thing is simply not part of this particular ontology, and it
>>>> seems to work OK so far.
>>>>
>>>> [1] http://open-biomed.sourceforge.net/opmv/ns.html#sec-specification -
>>>> see sub-section on "Artifact"
>>>>
>>>> #g
>>>> --
>>>>
>>>>
>>>>
>>>>   -----Original Message-----
>>>>
>>>>> From: public-prov-wg-request@w3.org [mailto:public-prov-wg-
>>>>>> request@w3.org] On Behalf Of Graham Klyne
>>>>>> Sent: Saturday, September 17, 2011 3:07 AM
>>>>>> To: W3C provenance WG
>>>>>> Subject: Issue 89 - why?
>>>>>>
>>>>>> I've been reading some of the discussion of Issue 89:
>>>>>>
>>>>>>     http://www.w3.org/2011/prov/track/issues/89
>>>>>>
>>>>>>
>>>>>> which seems to my mind to be getting rather like a counting of
>>>>>> angels-on-pinheads, and I wonder if we're not in danger of
>>>>>> over-ontologizing here.
>>>>>>
>>>>>> Going back to the original issue, I see:
>>>>>>
>>>>>> [[
>>>>>> The conceptual model defines an entity in terms of an identifier and a
>>>>>> list of
>>>>>> attribute-value pairs. It is indeed crucial for the asserter to
>>>>>> identify
>>>>>> the
>>>>>> attributes that have been frozen in a given entity.
>>>>>> ]]
>>>>>>
>>>>>> Why is it so crucial to identify what attributes have been frozen?
>>>>>>
>>>>>> What practical application of provenance is prevented if we don't
>>>>>> require this?
>>>>>>
>>>>>> #g
>>>>>> --
Received on Monday, 19 September 2011 23:20:18 UTC