Re: action-152: "subsetting" [SEC=UNCLASSIFIED]

Hi Simon,

Perhaps I've missed the context of this thread, but I don't think we can be as definitive about the relationship between discrete and continuous coverages as the statement below suggests.

In many cases, we will have two distinctly different data sets.

Often there will be a sampling regime, e.g. a set of meteorology sensors that observed a particular phenomenon at a point in time, which can be thought of as a discrete coverage.

We would often then generate a continuous coverage using a range of spatial analysis algorithms to provide an estimate of what is happening in areas that are not covered by met sensors.
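
A minimal sketch of that distinction, assuming inverse distance weighting as the spatial analysis algorithm (the message does not name one). The sensor positions, values, and the name `idw_estimate` are hypothetical and only illustrate how a continuous estimate can be derived from a discrete set of observations.

```python
import math

# Hypothetical discrete coverage: (x, y) sensor positions mapped to an observed value.
sensors = {
    (0.0, 0.0): 10.0,
    (10.0, 0.0): 20.0,
    (0.0, 10.0): 30.0,
}


def idw_estimate(x, y, observations, power=2.0):
    """Estimate a value at (x, y) by inverse distance weighting (an assumed method)."""
    weighted_sum = 0.0
    weight_total = 0.0
    for (sx, sy), value in observations.items():
        distance = math.hypot(x - sx, y - sy)
        if distance == 0.0:
            # Exactly at a sensor: return the observation itself.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# The continuous coverage can be evaluated anywhere, not only at sensor positions.
at_sensor = idw_estimate(0.0, 0.0, sensors)
between = idw_estimate(5.0, 0.0, sensors)
```

Here `at_sensor` reproduces an observation, while `between` is an estimate that no sensor reported, which is one way of seeing the two coverages as different data sets.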

I typically think of the two types of coverages as being distinctly different entities in many of my use cases.

Bruce
________________________________
From: Simon.Cox@csiro.au <Simon.Cox@csiro.au>
Sent: Monday, 4 April 2016 8:06:43 PM
To: j.d.blower@reading.ac.uk
Cc: l.vandenbrink@geonovum.nl; kerry.taylor@anu.edu.au; m.riechert@reading.ac.uk; bill@swirrl.com; public-sdw-wg@w3.org
Subject: RE: action-152: "subsetting"

The continuous vs discrete coverage story could be seen as another version of the httpRange-14 issue. Any discrete coverage is likely to be strictly just a representation (serialization) of an underlying continuous coverage. Is it clear that a specific URI refers to a discrete coverage with a specific resolution? I suspect that the answer is ‘sometimes’.

From: Jon Blower [mailto:j.d.blower@reading.ac.uk]
Sent: Monday, 4 April 2016 5:29 PM
To: Cox, Simon (L&W, Clayton) <Simon.Cox@csiro.au>
Cc: l.vandenbrink@geonovum.nl; kerry.taylor@anu.edu.au; m.riechert@reading.ac.uk; bill@swirrl.com; public-sdw-wg@w3.org
Subject: Re: action-152: "subsetting"

Thanks Simon - if we think about the "ultimate parent" being a continuous coverage (which is rarely serialised on a server) then perhaps we can think of some derived products as subsets. But I don't think this works if we're talking about aggregation operations like averaging - I think it's a stretch to regard "average temperature" as a mathematical subset of a temperature field.

Continuous coverages are mostly notional anyway - in almost all cases a server will store a discrete coverage, and it's the sampling of this discrete coverage that I was talking about. If there is interpolation involved in this sampling/extraction then the set of values in the derived product is not a mathematical subset of the set of values in the parent discrete coverage.
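
A small sketch of this point, assuming a one-dimensional grid and linear interpolation (both are assumptions for illustration, as are the names `extract_nearest` and `extract_linear`). Nearest-neighbour extraction only returns stored values, so its result is a subset of the parent's values; interpolated extraction generally is not.

```python
# Hypothetical stored discrete coverage: a value at each integer position 0..3.
positions = [0.0, 1.0, 2.0, 3.0]
values = [10.0, 14.0, 13.0, 18.0]


def extract_nearest(x):
    """Return the stored value at the grid position closest to x."""
    index = min(range(len(positions)), key=lambda i: abs(positions[i] - x))
    return values[index]


def extract_linear(x):
    """Return a linearly interpolated value for positions[0] <= x <= positions[-1]."""
    for i in range(len(positions) - 1):
        left, right = positions[i], positions[i + 1]
        if left <= x <= right:
            fraction = (x - left) / (right - left)
            return values[i] + fraction * (values[i + 1] - values[i])
    raise ValueError("x lies outside the coverage domain")


# Sample both ways at positions that fall between the stored grid points.
targets = [0.25, 1.25, 2.75]
nearest = {extract_nearest(x) for x in targets}
interpolated = {extract_linear(x) for x in targets}

# Set-theoretic check against the values of the parent coverage.
is_subset_nearest = nearest <= set(values)
is_subset_interp = interpolated <= set(values)
```

With these sample numbers `is_subset_nearest` is true and `is_subset_interp` is false, since every interpolated value is newly computed rather than taken from the parent.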

Cheers, Jon

On 4 Apr 2016, at 07:37, "Simon.Cox@csiro.au" <Simon.Cox@csiro.au> wrote:
I too like ‘extract’, though I don’t think I grok Kerry’s objection to “subset”.
How does the common usage conflict with the set-theoretic view?

I’ve just seen Jon Blower’s contribution, where he points out that some of the geospatial services do re-sampling including interpolation, which is therefore not subsetting, and though I understand the point I’m not sure that I even agree with this without some discussion. It comes down to the continuous vs. discrete coverage argument. If what we are interested in is the continuous phenomenon, the discrete representation could be seen as merely a sampling of it. Then a case could be made that any representation of a coverage inside a spatio-temporally limited region, perhaps only containing a subset (!) of the range components, is still a ‘subset’ of the continuous coverage …

Simon

From: Linda van den Brink [mailto:l.vandenbrink@geonovum.nl]
Sent: Friday, 1 April 2016 5:50 PM
To: Kerry Taylor <kerry.taylor@anu.edu.au>; Maik Riechert <m.riechert@reading.ac.uk>; Bill Roberts <bill@swirrl.com>
Cc: SDW WG Public List <public-sdw-wg@w3.org>
Subject: RE: action-152: "subsetting"

I don’t care that much, but I like ‘extract’ better than subsetting.

From: Kerry Taylor [mailto:kerry.taylor@anu.edu.au]
Sent: Friday, 1 April 2016 04:59
To: Maik Riechert; Bill Roberts
Cc: SDW WG Public List
Subject: RE: action-152: "subsetting"

Thanks for your comments, Maik and Bill!

I could live with “filter” too (although it carries a notion of dynamic behaviour), and it is also a big improvement over “subsetting”!

As neither of you seems particularly concerned, and there is no other comment so far, maybe this is a non-issue.

Does anyone else care? If not, I can close the action and drop my objection to “subsetting”.

Kerry

From: Maik Riechert [mailto:m.riechert@reading.ac.uk]
Sent: Thursday, 31 March 2016 8:57 PM
To: Bill Roberts <bill@swirrl.com>; Kerry Taylor <kerry.taylor@anu.edu.au>
Cc: SDW WG Public List <public-sdw-wg@w3.org>
Subject: Re: action-152: "subsetting"

I think "extract" is sometimes more natural when speaking about it (says the German guy...), for example:
You can extract a vertical slice from a 4D grid coverage.
vs
You can subset a 4D grid coverage to a vertical slice.

Personally, I always use subset, just because you've got to use something, and subset is not that overloaded.

Having said that, there are also collections of coverages. And in that case I usually speak of a filtered collection when I select coverages according to some criteria. But this is really just because collection filtering is an established term elsewhere. And of course you could also filter a satellite image so that it only includes the parts within a bounding box (a min max filter on latitude/longitude for example).
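
The min/max filter mentioned above might look like the following sketch. The cell records, coordinates, and the name `filter_bbox` are hypothetical and not taken from any particular service.

```python
# Hypothetical cells of a satellite image: (latitude, longitude, value).
cells = [
    (51.0, -1.0, 0.31),
    (51.5, -0.5, 0.42),
    (52.0, 0.0, 0.27),
    (53.0, 1.5, 0.55),
]


def filter_bbox(records, min_lat, max_lat, min_lon, max_lon):
    """Keep only the records whose coordinates lie inside the bounding box (inclusive)."""
    return [
        (lat, lon, value)
        for lat, lon, value in records
        if min_lat <= lat <= max_lat and min_lon <= lon <= max_lon
    ]


inside = filter_bbox(cells, min_lat=51.2, max_lat=52.5, min_lon=-1.0, max_lon=0.5)
```

Because nothing is recomputed, every record in `inside` is also in `cells`, so in this case "filter", "extract" and "subset" all describe the same result.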

So, extract, subset, filter, it's all the same to me really. It's just that in some sentences/contexts one or the other sounds better because it is either more common or more natural. I agree though that "subset" is not common in the webby world, and I would say that "extract" is more associated with file unzipping.

Maik
On 29.03.2016 at 13:22, Bill Roberts wrote:
Hi Kerry

I find the notion of subsets of datasets a reasonable one. I acknowledge that 'subsetting' is a relatively ugly neologism (though there are a lot worse made-up words in use in the world of technology!). But I'd be happy to use your suggested alternative of 'extract' and 'extracting'.

Cheers

Bill



On 29 March 2016 at 13:10, Kerry Taylor <kerry.taylor@anu.edu.au> wrote:
I have an objection to the use of the word “subsetting”, prominent in the spatial community and leaking also into other “big data” technology discussions. It seems to have some heritage in the statistical community, too.
I partly dislike it because it is not a word, but also because the notion of a ‘subset’ feels wrong, as it treats a ‘dataset’ as an unstructured ‘set’ of things, whereas this is very rarely the case when “subsetting” is required.
The formal (and widely understood) mathematical notion of sets seems inappropriate.
Normally, the known structure is a very important part of the “subsetting” operation.

I do not think that “subsetting” carries the intended meaning to the audience for whom our writing is directed, at least not to the “webby but not spatial expert” audience. I note (probably due to our influence) DWBP is now also speaking of ‘subsetting’.

I have some suggested alternatives I raise for consideration by the SDW, ordered best-first in my opinion.

Noun       Verb
extract    extracting
snippet    snipping
selection  selecting
snip       snipping


--Kerry

Received on Monday, 4 April 2016 22:18:43 UTC