Re: ISSUE-7 (whyAccessUrl): Drop dcat:accessUrl, use the URI of the dcat:Download resource instead [DCAT]

On 12-02-10 09:56 AM, Richard Cyganiak wrote:
> On 10 Feb 2012, at 04:53, Sarven Capadisli wrote:
>>>> I agree with Ed Summers' proposal to drop dcat:accessURL and simply use dcat:distribution.
>>>
>>> The problem is that accessURL is optional – some datasets are not distributed online but through other means. In that case, what would be used as the node representing the dcat:Distribution? A blank node? A made-up URI? So there needs to be a special case in both production and consumption code to deal with this case, and that just increases the probability that implementers with less RDF experience get it wrong.
>>
>> I find Ed's example [3] fairly straightforward:
>>
>> ex:dataset1
>>     a dcat:Dataset ;
>>     dcat:distribution <http://example.gov/downloads/1> .
>>
>> <http://example.gov/downloads/1>
>>     dcat:format "text/csv" .
>>
>> No bnode or made-up URI. As far as I understand, this approach can still be used to access any distribution method.
>
> I said the problem is datasets that are distributed *not* online but through some other means, that is, datasets where you know the available formats, but don't know the accessURL for the specific distribution for some reason. A concrete example that occurs on data.gov.uk would be a dataset where we know that it's available in CSV and XLS, but don't have the specific download link for each of the formats but we only have a single URL of a generic download HTML page, which in turn contains links to the two versions. How do you model that in this simplified scheme?

If I'm not misinterpreting what you are saying, I wouldn't call that
"not online", since it is still published on the Web and available
digitally. I'd reserve "not online" for something physically tangible,
e.g., printed documents. I don't think that's what you meant.

In any case, I think it is important to distinguish the following cases 
for the record:

a) Data accessible only by interacting with a web application, e.g., it
may require a login, sessions, or XHR.
b) Directly accessible links, i.e., data that can be retrieved
independently with an HTTP request.

For (a), I think we are slightly out of luck. However, since the
dcat:Distribution class caters to any distribution type (including the
HTML page for the downloads), the following would at least point to the
last stop before the data itself:

ex:dataset1
     a dcat:Dataset ;
     dcat:distribution <http://example.gov/downloads/> .

For (b), I believe the example from earlier still holds, because you
state that the HTML page contains links to the two versions. I apologize
if I'm still missing your point, but I don't understand how the direct
links to all of the formats can /eventually/ be reached, yet can't be
described:

<http://data.gov.uk/dataset/performance-data-government-ict-projects-31-july-2010>
     a dcat:Dataset ;
     dcat:distribution <http://interim.cabinetoffice.gov.uk/media/428873/performance-data-government-ict-projects-31july2010.csv> ;
     dcat:distribution <http://interim.cabinetoffice.gov.uk/media/428879/performance-data-government-ict-projectscontracts-31July2010.xls> .

<http://interim.cabinetoffice.gov.uk/media/428873/performance-data-government-ict-projects-31july2010.csv>
     dcterms:format "text/csv" .

<http://interim.cabinetoffice.gov.uk/media/428879/performance-data-government-ict-projectscontracts-31July2010.xls>
     dcterms:format "application/vnd.ms-excel" .
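
To make the point about production code concrete, here is a rough
Python sketch of how a publisher might emit the pattern above from a
list of direct download links. It is only an illustration: the function
name dataset_to_turtle and the example.gov URLs are made up, nothing
here is defined by DCAT, and the prefix declarations are left out just
as in the snippets in this thread.

```python
def dataset_to_turtle(dataset_uri, distributions):
    """Render a dataset and its direct-download distributions as Turtle.

    `distributions` is a list of (download_url, media_type) pairs.
    Hypothetical helper for illustration; prefixes are not declared.
    """
    # Predicate-object pairs attached to the dataset node itself.
    statements = ["a dcat:Dataset"]
    statements += [f"dcat:distribution <{url}>" for url, _ in distributions]

    # First block: the dataset, with each distribution URI used directly
    # as the node for the distribution (no bnode, no accessURL).
    blocks = [f"<{dataset_uri}>\n    " + " ;\n    ".join(statements) + " ."]

    # One further block per distribution, carrying its format.
    for url, media_type in distributions:
        blocks.append(f'<{url}>\n    dcterms:format "{media_type}" .')

    return "\n\n".join(blocks)


if __name__ == "__main__":
    # Made-up URLs, used only to show the shape of the output.
    print(dataset_to_turtle(
        "http://example.gov/dataset/1",
        [
            ("http://example.gov/downloads/1.csv", "text/csv"),
            ("http://example.gov/downloads/1.xls", "application/vnd.ms-excel"),
        ],
    ))
```

No special case is needed in this sketch when the list of links is
empty: the dataset is simply emitted without any dcat:distribution.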

Why link to the generic HTML page for the distributions in the first place?

Perhaps an example from data.gov.uk that you are thinking of would help 
me understand your point better.

-Sarven

Received on Friday, 10 February 2012 14:08:24 UTC