Re: ISSUE-80: We need a definition of "dataset"

Note that the RDF Data Cube vocabulary has a different definition of
"dataset" than DCAT:

"Represents a collection of observations, possibly organized into various
slices, conforming to some common dimensional structure."

Assuming the DCAT definition is adopted, I think it would be useful to make
clear that a "common dimensional structure" is not implied.  FWIW, my prior
experience led me to assume the "common dimensional structure" meaning for
DCAT until I dug into the DCAT spec.


On the "too-broad" side, there are probably collections of data published or
curated by a single agent that are far larger than this definition is meant
to cover.  In particular, I agree with Bernadette Lóscio in thinking that
the collection's content should be related - not "a random assortment of
data".  As an extreme example, imagine the entire content of datahub.io
described as a single dataset!


So... I'd suggest adding the word "related":

"A related collection of data, published or curated by a single agent, 
   ^^^^^^^
and available for access or download in one or more formats."

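The addition of "related" deals with both concerns at once.  It rules out
arbitrary grab-bags like the one above, and because it would be strange and
tautological to require all the data in a single cube to be "related", the
word also signals that DCAT does not assume a common dimensional structure.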


-Ed Staub

Received on Friday, 14 November 2014 08:40:06 UTC