
BP Consolidated Narratives


Consolidated narratives / common themes

A first pass from Jeremy arising from assessment of the use cases.

I think that these are the base set of common themes. I've captured a number of sub-themes that I think we will need to address in our best practices too. WDYT?

NEXT STEPS:

  1. cross-reference these common themes with those in the Data on the Web Best Practices (editors' draft) to determine overlap (if any) - we want to avoid duplication
  2. make sure we are consistent with the Architecture of the Web (Volume One)
  3. map our requirements to these common themes to see if (a) we are missing any requirements, or (b) we have requirements that aren't covered in the common themes.

linking data

  • … resources and datasets
  • … the essential “webbyness” (connectedness) of “data on the web” e.g.
    • [observation -] paper publication - digital image - digitised dataset;
    • microscopy imagery - derived feature(s) - [ lab results | patient groups | morphological features ];
    • specimens - derived samples;
    • statistical data (& other ‘aspatial’ data) - geographic features (e.g. administrative regions, roads etc.)
  • … allow linking by consumer / downstream service - not only data publisher
  • … x-ref with other spatial features [“spatiotemporal correlation”, “correlating (named) events, epochs and calendars in time”];
  • ... spatial and temporal reasoning using the Region Connection Calculus [RCC8], Allen's interval calculus and ‘fuzzy’ (inexact) relationships [real-time query or pre-determined at ‘load’ time? these may vary with time] (see the sketch after this list)
  • … containment hierarchies of one ‘place’ within another (and other topological/mereological relationships); specific need for “Mutually Exclusive, Collectively Exhaustive” (MECE) sets
  • … relative (spatial) relationships based on context e.g. my location [expressing location and places in human terms]
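
The topological/RCC8 points above could be served either at query time or by pre-computing typed links at ‘load’ time. A minimal sketch in Python, assuming the Shapely library and two invented regions, of deriving a couple of RCC8-style relations (the derived relation could then be published as a typed link between the two features, e.g. a GeoSPARQL rcc8ntpp statement):

  # Rough sketch: deriving RCC8-style relations between two hypothetical
  # regions at 'load' time, using Shapely's topological predicates.
  from shapely.geometry import Polygon

  region_a = Polygon([(0, 0), (4, 0), (4, 4), (0, 4)])  # hypothetical admin region
  region_b = Polygon([(1, 1), (2, 1), (2, 2), (1, 2)])  # hypothetical suburb

  if region_a.contains(region_b) and not region_a.boundary.intersects(region_b.boundary):
      print("region_b NTPP region_a")   # non-tangential proper part
  elif region_a.touches(region_b):
      print("region_a EC region_b")     # externally connected
  elif not region_a.intersects(region_b):
      print("region_a DC region_b")     # disconnected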

need to cross-reference this topic with issues arising from the hypermedia discussion on the WG-Comments email list

1. Crossref with Data on the Web BP (Kerry)
  • 9.9 Data Vocabularies refers to vocabularies to characterise possible relationships
  • 9.11 Data Access says that data should be available for bulk download via bulk file formats or via an API, possibly implemented via one URL or multiple data URLs available through discoverable metadata
  • 9.6 Data Versioning suggests that the Memento protocol provides a possible implementation to relate a versioned resource to other versions.
  • That's about it -- surprisingly little!
2. Crossref with Web Architecture (Kerry)
  • Interaction: a retrieval action for a resource ... the server sends back a message containing what it determines to be a representation of the resource.
  • Benefits of URIs: a resource should have an associated URI if another party might reasonably want to create a hypertext link to it.
  • Details of retrieving a representation: describes how to traverse a link, using XLink 1.0 as embedded in SVG as an example.
  • Linking and access control: in general the link target URI should not be hidden or controlled, only the resource it identifies.
  • Hypertext:
    • When a representation of one resource contains a reference to another resource, expressed with a URI identifying that other resource, this constitutes a link between the two resources. (Note: in this document, the term "link" generally means "relationship", not "physical connection".)
    • A specification SHOULD provide ways to identify links to other resources, including to secondary resources (via fragment identifiers).
    • A specification SHOULD allow Web-wide linking, not just internal document linking.
    • Users of hypertext links expect to be able to navigate among representations by following links.
    • A data format SHOULD incorporate hypertext links if hypertext is the expected user interface paradigm.
    • Data formats that do not allow content authors to create hypertext links lead to the creation of "terminal nodes" on the Web.
  • Summary: links are really important, but not a lot in the Web Arch to help us with our narrative.
3. Mapping Requirements to these themes (Kerry)

There are a good few metadata requirements I have left out here -- in general, metadata should be linkable both to and from the data it describes!

CoverageTemporalExtent

CRSDefinition

ExSituSampling

Linkability Key requirement for this theme

MachineToMachine

MobileSensors and also MovingFeatures are not BP requirements, but may be good examples of linking needs

ObservationAggregations could relate to the containment hierarchies

Provenance is a requirement for linking to provenance, Referenceexternalvocabularies is a requirement to link to vocabularies (although perhaps "use" rather than link in this case), and ReferenceDataChunks is a requirement to be able to link to coarser-grained "subset" data. These are all linking requirements.

SamplingTopology is about representation, but could relate to spatio-temporal reasoning [aside -- not sure why spatio-temporal reasoning is in this linking theme]. SpatialRelationships and also SpatialOperators are about spatio-temporal relationships and reasoning

TemporalVagueness and UncertaintyInObservations and SpatialVagueness talk about fuzzy relationships

Are we missing any requirements? The catch-all linkability requirement could be taken to cover most of this theme. Relative/locally contextual spatial relationships seems to have been missed in requirements (surprising to me -- I thought it was there, somewhere).

publishing data with clear semantics

  • … enabling reconciliation with other vocabularies
  • … and which vocabulary should I use to describe my data anyway (where do I start?)
  • … how should I publish my vocabulary (& its relationships to some other vocabulary)
  • … different views on the same resource (e.g. from SKOS to HY_Features); how to relate these views (so that they can be reconciled)
  • … “because people involved in integration of data from multiple domains should not be burdened with having to grasp the full complexity of each domain”
  • … simple encodings (e.g. JSON) that can be mapped to the complex domain model as necessary; how are (the semantics of) these simple encodings described? (see the sketch after this list)
  • … being able to map multiple data formats to a single semantic model to build a homogeneous dataset from disparate sources
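
To illustrate the ‘simple encodings’ point, one common pattern (offered here only as a sketch; the vocabulary URIs are invented placeholders) is to attach a JSON-LD context to an otherwise plain JSON document, mapping its keys onto a domain model:

  # A plain JSON record given formal semantics by attaching a JSON-LD
  # context; the vocabulary URIs below are hypothetical placeholders.
  import json

  simple_record = {"name": "River Thames", "lengthKm": 346}

  jsonld_record = {
      "@context": {
          "name": "http://example.org/hydro#featureName",
          "lengthKm": "http://example.org/hydro#lengthInKilometres",
      },
      **simple_record,
  }

  print(json.dumps(jsonld_record, indent=2))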

this topic appears to overlap (at least partially) with the Data on the Web Best Practice doc

Cross-ref with data on the web BP (Kerry)

Best Practices 14 to 19 cover this area. BP 14 is categorised as "metadata" and the rest as "Data vocabularies". Note that outstanding Issue-8 questions whether vocabularies are in scope for DWBP.

Note also the Best Practices for Publishing Linked Data (Jan 2014). It includes some general advice about URI design and how to publish a vocabulary, and suggests encoding the data in RDFa, JSON-LD, Turtle and N-Triples or RDF/XML, preserving the semantics of the vocabulary.

Cross-ref with Web architecture (Kerry)

Cannot find anything at all of relevance to this issue.

exposing datasets through APIs

  • example APIs: OGC WxS [WPS; ‘micro-languages’], RESTful APIs, JavaScript APIs etc.
    • … example API: subsetting data in `x`, `y`, `z` dimensions e.g. for 3d geological data … equally applicable to the `t` [time] dimension (sketched after this list)
    • … example API: (near) real-time data streaming - e.g. of observation data to enable generation of alerts; integration with IoT protocols (CoAP, MQTT … Google’s Weave?)
    • … example API: query in space and/or time for relevant (information?) resources
  • … APIs should have targeted usage (designed to achieve a specific goal)
  • … self-describing (both content and API - see HAL for an example of the latter)
  • … mapping API parameters to other datasets e.g. gazetteer as nominal dimension on air quality dataset
  • … enabling acquisition of the subset of properties (that describe the target resource) needed for a particular outcome
  • … how do I package APIs for simple re-use e.g. coordinate transformation API
  • … extracting a spatio-temporal subset from a very large EO dataset (such as side-scan sonar) [but how to encode the response payload?]
  • … how to relate a service end-point to the dataset it exposes (or vice-versa)
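
As a concrete (but entirely hypothetical) illustration of the subsetting examples above, a client-side request for a spatio-temporal subset might look something like this; the endpoint, parameter names and dataset identifier are invented, not drawn from any existing API:

  # Sketch of a spatio-temporal subset request against a hypothetical API.
  import requests

  params = {
      "bbox": "-1.5,51.0,-1.0,51.5",  # x/y subset (WGS 84 lon/lat)
      "time": "2015-06-01T00:00:00Z/2015-06-02T00:00:00Z",  # t subset
      "properties": "airQualityIndex",  # only the properties we need
  }
  response = requests.get(
      "https://example.org/api/datasets/air-quality/subset", params=params)
  response.raise_for_status()
  subset = response.json()  # how to encode the response payload is a separate question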

Cross-referenced by Ed

Overlap between the exposing datasets through APIs theme and the Data on the Web Best Practices (editors' draft):

Consistent with the Architecture of the web (Volume One)?

Somewhat; however, I would suggest that some of the classic OGC interfaces do not always follow architectural guidelines, for example with respect to Representation Management.

enabling discovery

  • … discovery of datasets _and_ the features/attributes they contain
  • … summary records (metadata)[is it necessary to create separate ‘metadata’ records for harvest by discovery portals?]
  • … where do I (a consumer) discover what is available for use based on (current) context; querying in space and time? (where/what is the ‘search engine’? [common practice is to start with Google/Bing/Yahoo/Yandex])
  • … how do I discover the information resources that are available for a given real-world thing?
  • … what _else_ is of interest given a focus on a particular resource [in-bound links][how to manage the potential volume of in-bound links shared; there could be thousands of links to reference resources]
  • … how do I publish (spatio-temporal) information so that it can be readily indexed by “the big 4” search engines and/or domain-specific catalogues (see the sketch after this list)
  • … what is the set of information I need to provide interoperability across data portals/search engines?
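
For the search-engine indexing point, one plausible approach (a sketch only; the URLs and values are hypothetical) is to embed a schema.org Dataset description as JSON-LD in the dataset's landing page:

  # Building a schema.org 'Dataset' description, serialised as JSON-LD,
  # for embedding in a landing page so search engines can index it.
  import json

  dataset_markup = {
      "@context": "https://schema.org",
      "@type": "Dataset",
      "name": "Air quality observations, Greater London",
      "url": "https://example.org/datasets/air-quality-london",
      "temporalCoverage": "2014-01-01/2015-12-31",
      "spatialCoverage": {
          "@type": "Place",
          "geo": {"@type": "GeoShape",
                  "box": "51.28 -0.51 51.69 0.33"},  # S W N E (lat lon)
      },
  }

  html_snippet = ('<script type="application/ld+json">'
                  + json.dumps(dataset_markup) + '</script>')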

THIS MIGHT BE TWO NARRATIVES: publishing for discovery, and discovery itself

Crossreferencing (Linda)

Overlap between the enabling discovery theme and the Data on the Web Best Practices (editors' draft):

  • http://w3c.github.io/dwbp/bp.html#metadata. BP 1 states: Metadata MUST be provided. It mentions that computer applications, notably user agents, should be able to process the metadata, but it does not mention portals. The section gives a list of discovery metadata, incl. spatial and temporal coverage. This section does not seem to address the discovery of features/attributes that datasets contain. BP4, Provide structural metadata, might address this, but I don’t think that’s what we mean by discovery of features/attributes.
  • http://w3c.github.io/dwbp/bp.html#DataIdentification mentions that data discovery depends fundamentally on the use of HTTP URIs.


Consistent with the Architecture of the web (Volume One)?

  • Yes. The Architecture of the web (Volume One) does not contain information/rules on discovery and/or dataset metadata or things like that, so there is no real possibility for conflict.

Mapping requirements to this theme (Linda)

http://www.w3.org/TR/sdw-ucr/#Crawlability - how to publish spatial data so that it can be indexed.

http://www.w3.org/TR/sdw-ucr/#CRSDefinition - you might want to find only spatial data with a specific CRS.

http://www.w3.org/TR/sdw-ucr/#Discoverability - the main requirement for this theme! A lot of use cases are related to this.

http://www.w3.org/TR/sdw-ucr/#Linkability - traversable links help improve discoverability (both by human browsing and better crawlability)

http://www.w3.org/TR/sdw-ucr/#Machinetomachine - perhaps relevant to crawlability aspect in this theme - not sure

http://www.w3.org/TR/sdw-ucr/#Provenance - I don't see this as very directly improving discoverability, but I associate provenance with metadata and metadata with discoverability. Also, not sure in which other theme this would fit.

http://www.w3.org/TR/sdw-ucr/#Qualitymetadata - Mapped to this theme for same reason as previous one.

http://www.w3.org/TR/sdw-ucr/#Sensormetadata - to enable discovering data for specific sensor characteristics

http://www.w3.org/TR/sdw-ucr/#Spatialmetadata - metadata helps discoverability

http://www.w3.org/TR/sdw-ucr/#Spatialoperators - operators help you discover related spatial data

http://www.w3.org/TR/sdw-ucr/#Spatialvagueness - helps you find spatial data even with vague location indicators

http://www.w3.org/TR/sdw-ucr/#Temporalvagueness - same with vague time indicators

Note that in the requirements there is nothing about portals or about discovering inbound links. These parts of the theme seem not to be covered by requirements.

assignment of identifiers to ‘real-world things’ & information resources

(in-line cross-referencing by Armin, and possibly also Phil)

Mapping requirements to this theme (Armin)

expressing (geo)spatial information & temporal information

  • … _events_ cited specifically as these have both spatial and temporal attributes (observations are events too!)
  • … relative positioning [linear referencing]
  • … positional (in)accuracy [uncertainty, precision, probability, confidence][e.g. second quarter of the 9th century is _approx._ 825-850 but could be 823-852](note: the ends of a ‘sampling traverse’ could be known in a national/global CRS to within ±5m, whilst the position of the samples themselves may be accurate to ±0.01 along the traverse)
  • … imprecise spatial and temporal referencing (near, south of, after, around)
  • … which vocabulary/data model to use for spatial / temporal attributes
  • … how to express CRS (& temporal reference systems [calendars are different!]); should there be a default CRS?; implications of “wrong” CRS on positional accuracy
  • … graphs of literals vs structured objects expressed as single literal (WKT, GML) etc.
  • … what does downstream tooling support; what query capability is desired?
  • … for use in spatially aware applications (such as maps)
  • … describing how things (such as geometry) vary with time [is time treated as first class coordinate or an attribute of the resource? varies depending on case]
  • … geotagging e.g. using postcode
  • … 1d, 2d and 3d geometries e.g. TIN coverage in varying resolution & heterogeneous resolution voxel sets (e.g. to represent complex geology)
  • … times (instants, periods) in addition to relative time-frames (durations)
  • … "geometry can be 95% of the total data size” [Dutch Base Registry]: is there a need for performance optimisation (e.g. compression); can this be standardised? - think lidar point cloud for _big_ data
  • … how do I combine data expressed in different Coordinate Reference Systems? (see the sketch after this list)
  • … provision of multiple geometries for a single feature; e.g. a single reference point (for “pin on map”), a bounding box (for search), a simple geometry (for coarse spatial analysis), a detailed geometry (for resolving cadastral boundary disputes) etc.
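
On combining data in different CRSs: a minimal sketch using the pyproj library, reprojecting a WGS 84 longitude/latitude pair into British National Grid (the CRS pairing is arbitrary, chosen only for illustration):

  # Reprojecting a coordinate between two Coordinate Reference Systems with
  # pyproj - a prerequisite for combining data published in different CRSs.
  from pyproj import Transformer

  to_bng = Transformer.from_crs("EPSG:4326", "EPSG:27700", always_xy=True)
  easting, northing = to_bng.transform(-0.1276, 51.5074)  # London (lon, lat)
  print(round(easting), round(northing))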


not everything is geo; see UC 4.40 for Cartesian 2d and 3d data

Questions to consider

  • … is it feasible to select a single geospatial vocabulary? can we define (axiomatic) mappings between the most common?
  • … is it feasible to select a common (default) CRS? can we define standard CRS transformation functions?
  • ... should we provide guidance on how a particular community can agree common geographic representations and spatial reference systems?
  • ... should we provide / can we refer to validators for common formats (WKT, GeoJSON etc.)? (see the sketch below)
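
Not a real validator, but a crude sketch of the kind of sanity check that is already possible for GeoJSON geometries: parse with Shapely and ask whether the result is topologically valid.

  # Crude GeoJSON geometry check: shape() raises if the structure is
  # malformed, and is_valid is False for e.g. self-intersecting rings.
  from shapely.geometry import shape

  candidate = {"type": "Polygon",
               "coordinates": [[[0, 0], [4, 0], [4, 4], [0, 4], [0, 0]]]}

  geom = shape(candidate)
  print(geom.is_valid)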

Crossreferencing (Kerry)

  • Best Practice 3: Provide locale parameters metadata. “Information about locale parameters (date, time, and number formats, language) should be described by metadata... The machine-readable version of the discovery metadata may be provided according to the vocabulary recommended by W3C to describe datasets, i.e. the Data Catalog Vocabulary [VOCAB-DCAT]... Check that the metadata for the dataset itself includes the language in which it is published and that all numeric, date, and time fields have locale metadata provided either with each field or as a general rule.” So we could be specialising this BP to recommend OWL-Time for linked data, or asking DWBP to do so. We could consider asking for GEO-DCAT to get a mention.
  • Best Practice 22: Provide bulk download -- Data should be available for bulk download. Implementation ... could include Hosting a database, web page, or SPARQL endpoint that contains discoverable metadata [VOCAB-DCAT] describing the container and data URLs associated with the container.
  • That's all I can find from Data on the web. For Web Architecture, one principle should be kept in mind: Orthogonality: Orthogonal abstractions benefit from orthogonal specifications. A specification should clearly indicate which features overlap with those governed by another specification.

sensor data

  • … expressing the ‘observation context’ (location, time, observed property, quality etc.) (see the sketch after this list)
  • … & representing (data) processing chains (provenance) e.g. georeferencing, pixel classification
  • … need to include concepts of “sampling feature” (that is representative of the _subject of interest_; statistical, spatial or proxy) and the more specific concept of “specimen” (which is tested ex-situ)
  • … “sampling features” and “specimens” may be grouped; relationships between samples provide a ‘topology’ of sampling
    • … batches of specimens may include control samples
    • … these concepts are already well described in O&M2
  • … crowd-sourcing e.g. of observations - e.g.
    • #uksnow for crowd-sourced snowfall observations;
    • earthquake damage reports using address locations;
    • (geocoded) bushfire reports;
    • note: some of these social media channels don’t allow use of structured data; how do we parse textual reports to extract, say, location etc.
  • ... humans can be sensors too
  • ... correlation/geolocation with ‘authoritative’ sensor data streams
  • … virtual observations from numerical simulations
  • … publishing / subscribing to observation data stream
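
A very rough sketch of expressing an ‘observation context’ as RDF using rdflib; the property names loosely follow the (later) SOSA/SSN vocabulary, purely as an illustration, and the resource URIs are hypothetical:

  # Capturing an observation's context (feature of interest, observed
  # property, result time, result) as RDF triples with rdflib.
  from rdflib import Graph, Literal, Namespace
  from rdflib.namespace import RDF, XSD

  SOSA = Namespace("http://www.w3.org/ns/sosa/")
  EX = Namespace("http://example.org/obs/")   # hypothetical namespace

  g = Graph()
  obs = EX["obs-42"]
  g.add((obs, RDF.type, SOSA.Observation))
  g.add((obs, SOSA.hasFeatureOfInterest, EX["river-thames"]))
  g.add((obs, SOSA.observedProperty, EX["water-temperature"]))
  g.add((obs, SOSA.resultTime,
         Literal("2015-06-01T09:30:00Z", datatype=XSD.dateTime)))
  g.add((obs, SOSA.hasSimpleResult, Literal(14.2)))

  print(g.serialize(format="turtle"))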

crossref to Data on the web BP (Kerry)

In general terms, pretty much all of the Data on the Web best practices apply, although they are not so obvious from the analysis here, which is more obviously connected to SSN as it is today. With respect to SSN as it is currently defined, these things could be considered missing

crossref to Web Arch (Kerry)
  • 4.2.3. Extensibility says that providing for open extensibility is good but extensions must not interfere with the original spec and the spec should specify agent behaviour in the face of unrecognised extensions.

Mapping Requirements to this theme (Kerry)

Compatibility

Compressible because sensor data is often big, although this does not really appear in the theme description

CoverageTemporalExtent again does not appear in the description, but surely applies -- may just be covered in a different BP narrative

I've left out a few more generic ones that could go here too (Crawlability, CRSDefinition, date/time/duration, default CRS) - but they are not specific to sensor data, so I am assuming they are covered elsewhere.

Discoverability again not specifically sensor, but there are many sensor use cases against this requirement

DynamicSensorData This is a big one in the theme description (but BP is not in the deliverable part of the requirement -- prob should be)

Ex-situ sampling surely an SSN issue, not a BP one?

GeoreferencedSpatialData another obvious one, but not really mentioned in the narrative

Humans as sensors big feature in the narrative -- although surely an SSN issue, not a BP one?

Independence on reference systems another obvious one, but not really mentioned in the narrative

Lightweight API this is missing from the narrative -- but it should be a BP issue I think, if it is in scope at all.

Linkability this is a different theme -- but it is evident in the sensor data theme description too.

Machine to machine

Mobile sensors and Moving features these two feature big in the narrative -- they are SSN issues, but do they need a BP mention? I do not think so.

Provenance (http://w3c.github.io/sdw/UseCases/SDWUseCasesAndRequirements.html#Provenance)

Reference external vocabularies only has SSN against it -- but this is a BP issue, surely? A good few of the use cases are sensor-related.

Sampling topology this is big in the narrative, but this is surely an SSN issue alone? (as in the UCR now)

Space-time multi-scale Now this has SSN and Time in the UCR, but surely this is at least partly a consumption issue -- and should be in BP?

Spatial vagueness and also Temporal vagueness I think this is implied by some of the narrative issues, although neither is targeted at BP

Streamable data

TimeSeries this raises a common issue I see -- it has Time, SSN and Coverage against it, and I agree that those deliverables will have to be aligned to this requirement - but would it not really be delivered through Best Practice instead? I.e. those other deliverables will just throw it back this way? Or am I just thinking this way because the solution I have in mind would be to use the RDF Data Cube -- and somewhere we would need to say how to do this.

Uncertainty in observations The same could be said for this, currently against SSN

Virtual Observations surely an SSN issue instead of BP, but the narrative raised it.

Are we missing any requirements? maybe -- I did not spot "correlation/geolocation with ‘authoritative’ sensor data streams" or "pub/sub" (although streaming is there -- isn't pub/sub assuming a partial solution?)



---

Other stuff

Not quite figured out where this fits in ... feels like it's in scope but haven't found a home yet.

  • working with large data that doesn't play nicely with the typical RDF-centric Linked Data approach
    • … referencing “chunks” (subsets: slices, ranges etc.) of very large datasets e.g. assembly of large remote sensing dataset from individual “tiles” / “scenes” or extracting a particular band in a multi-spectral image
    • … raster data; pixels (2d) and voxels (3d)- describing ‘resolution’ etc.
    • … lidar data & acoustic survey - point clouds … not structured like raster data! … slices could be _derived_ from the raw data?
  • data encoding choices
    • ... what are our recommendations?
    • … how does a consuming application know which _vocabulary_ is used? (as distinct from format which is described using media types)
    • … (how) can a consuming application dictate which vocabulary is provided?
  • visual styling of features (for display on maps)
  • “sending the code to the data” for analysis of very big data [SCOPE?]
  • dealing with large datasets; “paging” (query) result sets to make them manageable to consumer applications (see the sketch after this list)
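
A sketch of what client-side paging might look like; the endpoint and the ‘next’-link convention are assumptions for illustration, not taken from any particular specification:

  # Iterate over a large result set page by page, following 'next' links
  # (a hypothetical convention) until the server stops providing one.
  import requests

  def fetch_all(url):
      while url:
          page = requests.get(url).json()
          yield from page["items"]
          url = page.get("next")  # absent on the last page

  for feature in fetch_all("https://example.org/api/features?page=1"):
      print(feature)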

Cross-ref with Data on the Web (Bill)

There are a number of relevant best practices from the Data on the Web working group http://w3c.github.io/dwbp/bp.html

  • BP4 provide structural metadata - can help a consuming tool know which vocabularies are used. Goes together with BP15 document vocabularies and BP16 share vocabularies
  • BP12 must be machine readable standardized format - this best practice does not require specific formats, so it's possible to select a format appropriate for the type of data (e.g. large remote sensing datasets), but it recommends using common existing formats where possible and mentions NetCDF (used in conjunction with the CF Conventions) in particular - an efficient format for array-oriented scientific data, which might be useful for some of our coverage data examples?
  • BP13 should be in multiple formats where possible - if we are able to offer data in more than one machine readable format, that's an advantage
  • BP22 - data should be available for bulk download
  • BP23 - follow REST principles

Cross-ref with Web architecture (Bill)

http://www.w3.org/TR/webarch/

Not so much of relevance here, other than that the Web architecture does not constrain the formats of resources on the Web, and there is the media-type system for identifying formats.

http://www.w3.org/TR/webarch/#formats http://www.w3.org/TR/webarch/#internet-media-type


Other notes