Re: PROV-WG Reminder: Review PAQ before FPWD vote

On 15/11/2011 10:45, Luc Moreau wrote:
> Hi Graham and Paul,
>
> I unfortunately don't have time to give it a deep read-through before Thursday.
> However, after a first scan, I still have a number of issues. I am sending them
> to you now as a single message. When I have completed a reading of the document,
> I can raise them as specific issues, if appropriate (note that many were already
> raised).
>
>
>
>
> 1. ISSUE-52/ISSUE-55
> I still believe that 4.2.2 should be on an equal footing to section 2.

They're not either/or options.  Section 2 stands alone, as far as it goes. 
Section 4.2.2 describes a precursor mechanism for finding a URI to be used per 
section 2.

> Section 2 can only work in some limited circumstances when a provenance uri exists.

The term "limited" here is loaded.  I don't think it is (or should be) an 
uncommon circumstance on the web.

Web architecture says "Good practice: Identify with URIs" 
(http://www.w3.org/TR/webarch/#pr-use-uris).  As a specification under W3C 
aegis, I think we should be expecting and promoting good web practice.

URIs are not expensive or hard to generate; a service that has provenance 
information for some resource can easily enough auto-generate a URI to identify it.
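
As a minimal sketch of how cheap this can be: a service could derive the 
provenance URI mechanically from the entity URI.  The base URL and the "target" 
parameter name below are assumptions made up for illustration; PAQ does not 
define them.

```python
from urllib.parse import quote

# Hypothetical base address of a provenance service (not defined by PAQ).
PROVENANCE_BASE = "http://example.org/provenance"


def provenance_uri(entity_uri: str) -> str:
    """Derive a provenance URI for an entity URI of any scheme.

    The entity URI is percent-encoded in full (safe='') so that it can be
    carried as a single query parameter value.
    """
    return f"{PROVENANCE_BASE}?target={quote(entity_uri, safe='')}"
```

The same entity URI always maps to the same provenance URI, and nothing here 
depends on the entity URI being an http URI.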

> Section 4.2.2 is really the only mechanism that is workable when provenance is
> produced by a party different to the resource provider.

Section 2 still applies for 3rd party provenance providers: the section 4.2.2 
mechanism is a precursor to using the section 2 mechanism.

> 2. ISSUE-79
> I still don't have a good understanding of the expected properties for
> provenance-uri and provenance information.
> - can we expect the same provenance information for every dereference of a
> provenance-uri? If not, we must say it.
> - if resource changes, does provenance uri change?

I expect a common case will be constant provenance at a given URI.

But, depending on the provenance service, the provenance information returned 
may change, but only in restricted ways:  provenance claims once asserted cannot 
be un-asserted.  It's no different to any other (trustworthy) knowledge source 
on the web: if we obtain information that we expect to be true, we don't expect 
to get conflicting information when performing multiple retrievals.

We might add some words to explain this, but what is it that we can say about 
provenance that isn't also true of many other web resources?  My concern is that 
if we add more detail we may cause more confusion than clarity, when what we try 
to say would in some respects be restating the obvious.

We've already said, up front: "Provenance information, to be useful, must be 
persistent and not itself dependent on context" (sect 1.2).

Here's a possible paragraph for section 2:
[[
Provenance assertions are about pre-determined activities involving entities; as 
such, they are not dynamic.  Thus, provenance information returned at a given 
provenance-URI may commonly be static.  But the availability of provenance 
information about a resource may vary (e.g. if there is insufficient storage to 
keep it indefinitely, or new information becomes available at a later date), so 
the provenance information returned at a given URI may change, provided that 
such change does not contradict any previously retrieved information.

How much or how little provenance information is returned in response to a 
retrieval request is a matter for the provenance provider application.  At a 
minimum, for as long as provenance information about an entity remains 
available, sufficient should be returned to enable a client application to walk 
the provenance graph per <a class="sectionRef" href=#Incremental Provenance 
Retrieval></a>.
]]

I've added this in my working copy.

(I think there's a danger here of straying into the details of application 
design.  It's up to a provenance information provider how much or how little 
provenance information they provide in response to a request.  We can't 
prescribe how much provenance information must be available.  What we should be 
aiming for is to specify tools that developers can use (interoperably) to 
achieve their application goals.)

> I question implementability of provenance-uri.
> Isn't it the case that two compliant implementations of provenance-uri would be:
> - constant uri (returning all provenance)

Returning *some* provenance.  See above.

> - canned query (as described for the REST service)

That, too, is possible.  Seems useful to me.

> But, then, provenance-uri does not seem useful.

This is a logical fallacy.  A may be B or C.  Neither B nor C satisfies D, 
therefore A does not satisfy D.  The possibility that A may be something other 
than B or C has not been excluded.

Retrieving from a provenance-uri could invoke a process which evaluates 
available information on-the-fly.  There's no clear limit to what that could be 
used for.

> I can see a good usage when provider decides to cache/pre-compute
> provenance information. But this seems to be a very specific usage of
> provenance-uri.

The issue of precomputing/caching is a red herring.  URI retrieval doesn't 
constrain how the resource representation returned is obtained.

(On the web, application caching of precomputed values should not generally be 
necessary - the web is very good at handling that kind of thing, as long as 
applications work with the web and not against it.  Using normal web retrieval 
of URIs and a consistent URI allocation policy, the web caching infrastructure 
can avoid unnecessary recomputation.)

> How can an implementer with provenance stored in a database/triple
> store usefully use the provenance-uri?

Many ways.  Linked Data API would be one 
(http://code.google.com/p/linked-data-api/wiki/Specification).  SPARQL endpoint 
is another (a URI can encode a SPARQL query).  Again, it's not for us to tell 
developers how to code.
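
To illustrate the SPARQL case only, here is a rough sketch of a provenance URI 
that encodes a query, using the "query" parameter of the SPARQL protocol.  The 
endpoint address and the shape of the query are assumptions; a real store would 
use whatever query matches how it models provenance.

```python
from urllib.parse import urlencode

# Hypothetical SPARQL endpoint fronting the provenance triple store.
SPARQL_ENDPOINT = "http://example.org/sparql"


def provenance_query(entity_uri: str) -> str:
    """Build an illustrative query selecting statements about the entity."""
    # Refuse characters that would break out of the <...> IRI syntax.
    if any(ch in entity_uri for ch in "<>") or any(ch.isspace() for ch in entity_uri):
        raise ValueError("entity URI contains characters not allowed in an IRI")
    return f"CONSTRUCT {{ <{entity_uri}> ?p ?o }} WHERE {{ <{entity_uri}> ?p ?o }}"


def sparql_provenance_uri(entity_uri: str) -> str:
    """Return a dereferenceable URI that carries the query as a parameter."""
    return f"{SPARQL_ENDPOINT}?{urlencode({'query': provenance_query(entity_uri)})}"
```

A client given such a URI just does a normal retrieval; it need not know or 
care that a query is evaluated behind it.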

> 3. The specification should not make any assumption about the nature of
> entity-uri, which may or may not be HTTP URIs. The PROV-DM document gives an example
> with UUID URIs. The HTTP GET/HEAD approach may not always work.

Where does the specification make any such assumption?  As far as PAQ is 
concerned, entity URIs may use any scheme.  If it anywhere says otherwise, 
that's an error.

> 4. While it is reasonable to mention entity-uris in headers as resource
> representations are passed in get responses, a provenance service should not be
> limited to entity uris. We need to be able to retrieve the provenance of
> activities too.

(The first part of this isn't making any sense to me.)

For such a purpose, activities would surely be entities?

> It would also be reasonable to access the content of an account.

How/why is this not possible?

> 5. Section 6. There doesn't seem to be a mechanism to control the amount of
> provenance returned. A viewer for prov may want to obtain provenance at a
> maximum distance of 3 hops. Sparql is not a satisfactory solution for clients
> which request XML or json.

That's an application choice.  Section 6 covers a way to handle large provenance 
responses.
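
For what it's worth, a hop limit of that kind can be applied on the client side 
while walking the provenance graph incrementally.  The following is only a 
sketch under assumptions: "fetch_links" is a hypothetical callback standing in 
for "retrieve the provenance for this URI and return the URIs it refers to", 
and is not something PAQ specifies.

```python
from collections import deque
from typing import Callable, Dict, Iterable


def walk_provenance(
    start: str,
    fetch_links: Callable[[str], Iterable[str]],
    max_hops: int,
) -> Dict[str, int]:
    """Breadth-first walk from `start`, following at most `max_hops` links.

    Returns each URI reached, mapped to its distance in hops from `start`.
    """
    seen = {start: 0}
    queue = deque([start])
    while queue:
        uri = queue.popleft()
        depth = seen[uri]
        if depth == max_hops:
            continue  # at the limit: record the URI but do not expand it
        for linked in fetch_links(uri):
            if linked not in seen:
                seen[linked] = depth + 1
                queue.append(linked)
    return seen
```

The viewer decides how far to go; the provider only has to return enough at 
each step for the walk to continue.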

> 6. Section 6. Paul's recent blog on simple provenance is showing the limitation
> of the http head approach. When a get/head of resource ex:post returns a
> provenance-uri it may not be his provenance since Paul may not have control over
> this service. Why refer to this method in section 6?

If you ask a service provider for information, and trust that service provider, 
then presumably you trust the information returned.  There's nothing we can 
specify to enforce this.  Provenance information is no different in this respect 
from any other information that a user must decide to trust, or not.

This is discussed in the security considerations section.

> 7. A note is required for section 5. This is conditional on development of prov-o.
> But is it really the case we have such a hasProvenance property?

Not at present, but I think we'll want one.  If prov-o doesn't choose to define 
something like this, we can define it in a separate namespace.  It has already 
been introduced (section 3.3) for associating provenance information with a 
resource in RDF data.

> 8. As a client, how do I decide between 5.2 and 5.3 for a resource and provider
> discovered dynamically?

These are intended to be examples of how a query service might be used, not 
recipes to be slavishly followed.  A competent developer should figure out what 
will work for their environment.

> 9. Shouldn't Section 3.3 also include a property hasProvenanceService?

Hmmm... possibly.  At this stage, I think I'd prefer to focus on what we have, 
but if anyone says that they have an application that needs this, I'd happily 
consider adding it.

> 10. ISSUE-75 doesn't seem to be tackled.

I don't think we need to say anything about this:  a developer can choose to use 
either without breaking anything.  It is explicitly noted (sections 3.1.1, 
3.2.1) that a resource may have both provenance and service URIs, but that this 
is not considered to be normal.  I think adding details about handling such a 
situation is not a good use of either our effort or the reader's.

#g
--

Received on Wednesday, 16 November 2011 09:13:59 UTC