BP consolidation proposal

From Spatial Data on the Web Working Group

Best Practices: alignment with DWBP and consolidation

Proposals prepared by Jeremy Tandy for discussion on BP sub-group call (13-July-2016); this does not reflect group position of BP editors.

Summary

(Some) details are provided in later sections

Best practices listed according to their number as stated in the summary

  1. align with DWBP 11; add extra strategies relevant for spatial data (#2, #3 and #4)
  2. remove: merge with #1
  3. remove: merge with #1
  4. remove: merge with #1
  5. align with DWBP 18; provide examples for predetermined subsets and use of an API
  6. remove: I think this is covered by DWBP 16 (choose the right formalisation level)
  7. extend DWBP 15 with guidance on which specific standard vocabularies need to be used for geometry & how they should be used; this should include _how_ to reference a CRS and how to version [the descriptions of] Spatial Things … also note relationship to DWBP 13, 14 and 19 for serving data in multiple formats … extend DWBP 13 to define specific formats? DWBP 14 and 19 are well covered in DWBP and don’t need to be added to. … Also note that DWBP 4 talks about structural metadata; the specified data formats should do this … #7 also discusses performance considerations; the approaches one might take to simplify highly complex geometries … extend DWBP 18 to cover these concerns
  8. ¿ extend DWBP 7; could we consider CRS as providing “data quality” information … here our concern is supporting ‘high-precision applications’; DWBP 7 says “Data quality might seriously affect the suitability of data for specific applications”? … I don’t think that SRS or CRS can be treated as locale information (as per DWBP 3)
  9. extend DWBP 15; relative positions [what are our options here?]
  10. extend DWBP 7; positional accuracy information
  11. extend DWBP 15; things that change over time … UPDATE: x-ref vs. LvdB’s suggestion
  12. align with DWBP 15; spatial & topological relationships
  13. remove: merge with #23
  14. remove: out of scope - but illustrate in an example
  15. remove: out of scope
  16. remove: out of scope - but illustrate in an example
  17. > needs further consideration; the issue here is not “crowd-sourced observations”, it is “crowd-sourced data that includes spatial information” … ref. discussion about how we support the “platform providers” who work with crowd-sourced data (of whom social networks are an example)
  18. remove: out of scope - but specifically we should look at moving Spatial Things (such as cars, boats, marathon runners) whose position is being constantly updated in #11 … also see DWBP 20 (real-time access)
  19. remove: merge with #24 … #19 provides an ‘approach’
  20. remove: merge with #24
  21. remove: merge with #24 … ref. durable links, link between Spatial Things
  22. remove: merge with #24 … this describes an approach that describes what to link to
  23. remove: merge with #24 as it now overlaps
  24. amend to “Publish meaningful links to related resources”
  25. make your entity-level data indexable by search engines … ¿ could this be given as an extension of DWBP 15; providing details of which standard vocabularies can support search engine indexing? … also note the need to provide HTML for web-crawlers (potentially in addition to other data formats); ref. DWBP 14
  26. extend DWBP 1; spatial metadata … UPDATE: extend DWBP 2
  27. remove: I think this is covered by a combination of DWBP 17 and DWBP 23
  28. align with DWBP 23
  29. extend DWBP 25 to include that the API should describe what data is _actually_ available
  30. extend DWBP 23 to include provision of search in the API


note: Regarding SDWBP 4 (provide stable identifiers for things that change over time) I think that we don’t need to link this into DWBP 8 (provide a version indicator) or DWBP 12 (assign URIs to dataset versions and series), both of which assume that versioning occurs at the dataset level (e.g. the dataset is the unit of governance) … however, we often see the attributes “[dcterms:]modified” and “version” included in data models to help consumers identify which entities have been updated. I think we might need to _extend_ DWBP 11 to include the concept of versioning (or perhaps extend DWBP 12 to cover the [entities] within a dataset?) … and talk about the distinction between the Thing and the information resource that describes the Thing (as per URLs in Data Primer; Thing and Landing Page)
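
A minimal sketch of the entity-level versioning idea in the note above, assuming a hypothetical record layout: each description (the information resource) carries "modified" and "version" attributes, while the Thing it describes keeps one stable identifier. The URIs, field names and helper function are illustrative assumptions, not part of DWBP or SDWBP.

```python
from datetime import date

# Hypothetical records: each one is a description (information resource)
# of a Spatial Thing. The "thing" URI stays stable across versions.
descriptions = [
    {"thing": "http://example.org/id/lake/1", "version": 1, "modified": date(2016, 1, 10)},
    {"thing": "http://example.org/id/lake/1", "version": 2, "modified": date(2016, 6, 2)},
    {"thing": "http://example.org/id/river/7", "version": 1, "modified": date(2015, 11, 30)},
]


def updated_since(records, cutoff):
    """Return URIs of Things whose latest description was modified after cutoff."""
    latest = {}
    for rec in records:
        current = latest.get(rec["thing"])
        # Keep only the highest-numbered version of each Thing's description.
        if current is None or rec["version"] > current["version"]:
            latest[rec["thing"]] = rec
    return sorted(uri for uri, rec in latest.items() if rec["modified"] > cutoff)


print(updated_since(descriptions, date(2016, 3, 1)))
```

Here the consumer finds updated entities without any dataset-level version indicator, which is the gap the note identifies in DWBP 8 and DWBP 12.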


Referenced DWBP best practices:

  • DWBP 1: Provide metadata
  • DWBP 7: Provide data quality information
  • DWBP 11: Use persistent URIs as identifiers within datasets
  • DWBP 13: Use machine-readable standardised data formats
  • DWBP 14: Provide data in multiple formats
  • DWBP 15: Reuse vocabularies, preferably standardised ones
  • DWBP 16: Choose the right formalisation level
  • DWBP 17: Provide bulk download
  • DWBP 18: Provide subsets for large datasets
  • DWBP 19: Use content negotiation for serving data available in multiple formats
  • DWBP 20: Provide real-time access
  • DWBP 23: Make data available through an API
  • DWBP 25: Provide complete documentation for your API

New best practices:

  • Publish meaningful links to related resources (evolved from #24)
  • … and [possibly] working with multiple representations of a Spatial Thing


Grouping / consolidation of SDW Best Practices

1. Remove Best Practices specifically relating to Observations and Sensors: our scope is “spatial data on the web” … details of how to describe observations and sensors should be covered by the SSN work; the best practice need only consider the spatial (and temporal?) aspects of these data types

  • SDWBP 14: Provide context required to interpret observation data values
    • location is part of this context; we can use observation / sensor data and the associated [geo] spatial context as an example … for remote sensing, the spatial characteristics of the observed data are not directly related to the position of the sensor
  • SDWBP 15: Describe sensor data processing workflows
    • this is out of scope for a ‘spatial data on the web’ best practice
  • SDWBP 16: Relate observation data to the real world
    • linking the sensor & observation data to the Spatial Thing which is being monitored / observed; this is one of the key “relationships” that is used to find observation data (environmental monitoring data) associated with a specific Spatial Thing (such as a waterbody segment) … use as an example
  • SDWBP 17: How to work with crowd-sourced observations
    • the issue here is not “crowd-sourced observations” - it is “crowd-sourced data that includes spatial information” … it is not our place to say how “the crowd” should use, say, Twitter to convey this information … we are more concerned with helping the platform provider expose the aggregate set of data with appropriate spatial context … and the App developer to make it easy to comment about a Spatial Thing and/or include spatial information
    • [what are the key issues?]
  • SDWBP 18: How to publish (and consume) sensor data streams
    • what is special about a data stream? … it’s just data … the main concern is relating the rapidly changing data (this is provided in the data stream) with the infrequently changing contextual data that is required to interpret the data stream; e.g. associated metadata … sensor data provides a good example
    • specifically we should look at moving Spatial Things (such as cars, boats, marathon runners) whose position is being constantly updated (ref. SDWBP 11: How to describe properties that change over time) … [distinguish between things where data is being actively re-sampled and the (ad-hoc) update of asserted properties?]
    • this also relates to the API through which the data is accessed (ref. DWBP 20: Provide real-time access) … SensorThings API OGC #15-078

2. Remove best practices that are only methods (“possible approaches”) that could be applied to other best practices

  • the following best practices are “possible approaches” to meet SDWBP 1: Use globally unique HTTP identifiers for entity-level resources
    • SDWBP 2: Reuse existing identifiers when available
    • SDWBP 3: Convert or map dataset-scoped identifiers to URIs
    • SDWBP 4: Provide stable identifiers for Spatial Things that change over time

3. “Linking spatial data” and “Enabling discovery” sections are very closely related:

  • you can’t “use links to find related data” (SDWBP 24) without “[making] links visible” (SDWBP 19) … suggest conflating these best practices such that the best practice is “Publish [meaningful] links to related resources” and the outcome is that consumers (including search engines) can use those links to find related resources that are relevant to their needs
  • this provides the opportunity to talk about the mechanisms by which links can be published (embedded in data; or as a complementary resource) and look at providing summary information (such as Linksets) that enable the relationships between datasets to be evaluated without needing to download everything
  • include the mechanisms outlined in SDWBP 19: making links visible
  • this is particularly relevant to spatial data as people _may_ consider that ‘spatial correlation’ is sufficient to determine related resources … (i) requires the ‘geometry’ for all potentially related objects to be downloaded for assessment, (ii) spatial analysis is costly in terms of computation and (iii) does not provide any information on the semantic relationship
  • the new broader best practice now overlaps with SDWBP 13: Assert known relationships … and SDWBP 23: Link to related resources … suggest combining
  • [meaningful] conflate with SDWBP 20: Provide meaningful links … this is concerned with providing users with enough information to determine whether it is worthwhile resolving the target resource
  • [durable] conflate with SDWBP 21: Link to Spatial Things … this is concerned with providing durable relationships; the aim to link between things that have stable identifiers (such as Spatial Things) rather than the information resources (such as geometries) which may change from time-to-time
  • [network effect] conflate with SDWBP 22: Link to resources with well-known or authoritative identifiers … choosing ‘authoritative’ & ‘well-known’ resources is a strategy for improving discoverability - developers can build applications that [harvest links to] allow users to traverse ‘back-links’ from easily discoverable ‘hubs’ in the data network (e.g. these authoritative, well-known resources) to related information, even where the original data publisher did not make that connection
  • [Spatial Identifier Reference Framework (SIRF): crosswalks between different identifiers used for the same Spatial Thing and to information resources that use them]
  • retain SDWBP 25: Make entity-level data indexable by search engines … not just the “Big 4” (Google, Bing, Yahoo, Yandex), but other services such as ‘sameAs.org’ (which harvests links) and HyperCat
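
The ‘back-links’ idea in the list above can be sketched as follows. This is an illustration only: the link triples, relationship names and URIs are hypothetical, and a real service would harvest such links from published data or Linksets rather than from a hard-coded list.

```python
from collections import defaultdict

# Hypothetical published links: (source, relationship, target).
links = [
    ("http://example.org/id/station/42", "monitors", "http://example.org/id/waterbody/9"),
    ("http://example.com/report/2016-07", "about", "http://example.org/id/waterbody/9"),
    ("http://example.org/id/station/42", "locatedIn", "http://example.org/id/region/3"),
]


def build_back_links(link_list):
    """Index links by target so a consumer can go from a hub to its referrers."""
    index = defaultdict(list)
    for source, relation, target in link_list:
        index[target].append((relation, source))
    return index


back_links = build_back_links(links)
for relation, source in back_links["http://example.org/id/waterbody/9"]:
    print(relation, source)
```

Because each link names its relationship, the consumer learns how the resources are related without downloading or comparing any geometry.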


Alignment with “generic” Data on the Web best practices

  • SDWBP 1: Use globally unique HTTP identifiers for entity-level resources
    • … extend DWBP 11: use persistent URIs as identifiers within datasets
    • talk about the strategies relevant for spatial data; …
  • SDWBP 5: Provide identifiers for parts of larger information resources
    • … this describes a “possible approach” that could be used to meet DWBP 18: Provide subsets for large datasets
    • ensure that our examples include:
      • providing access to a predefined subset; e.g. using HTTP GET to access a particular time-slice
      • offering an API to enable a user to request only what they need; e.g. “cookie cut” an area from a larger dataset
  • SDWBP 12: Use spatial semantics for Spatial Things
    • … extend DWBP 15: Reuse vocabularies, preferably standardised ones
    • … assess how to determine which is the appropriate spatial vocabulary
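
The “cookie cut” example listed under SDWBP 5 above could look like the sketch below. It assumes point features in longitude/latitude and a simple bounding-box filter; the feature list, URIs and the function name cookie_cut are hypothetical, and a real API would sit behind an HTTP endpoint and handle full geometries.

```python
# Hypothetical point features: (identifier, longitude, latitude).
features = [
    ("http://example.org/id/site/a", -3.53, 50.72),
    ("http://example.org/id/site/b", -0.13, 51.51),
    ("http://example.org/id/site/c", 2.35, 48.86),
]


def cookie_cut(items, min_lon, min_lat, max_lon, max_lat):
    """Return only the features whose point falls inside the bounding box."""
    return [
        (uri, lon, lat)
        for uri, lon, lat in items
        if min_lon <= lon <= max_lon and min_lat <= lat <= max_lat
    ]


# Request only the area of interest rather than the whole dataset.
subset = cookie_cut(features, -6.0, 49.5, 2.0, 56.0)
print([uri for uri, _, _ in subset])
```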

Assessment of SDW Requirements against SDW Best Practices

Number of requirements related to each SDWBP: 1 (i), 6 (iii), 7 (xi), 8 (i), 9 (i), 10 (ii), 11 (iii), 12 (ii), 13 (iii), 14 (i), 15 (i), 18 (ii), 19 (i), 20 (i), 24 (i), 25 (iii), 26 (iii), 27 (ii), 28 (i)

  • SDWBPs referenced more than twice:
    • 6 Provide minimum set of information for your intended application
    • 7 How to describe geometry
    • 11 How to describe properties that change over time
    • 13 Assert known relationships
    • 25 Make entity level data indexable by search engines
    • 26 Include spatial information in dataset metadata
  • SDWBPs not referenced:
    • 2 Reuse existing identifiers when available
    • 3 Convert or map dataset scoped identifiers to URIs
    • 4 Provide stable identifiers for things that change over time
    • 5 Provide identifiers for parts of larger information resources
    • 16 Relate observation data to the real world
    • 17 How to work with crowd-sourced observations
    • 21 Link to Spatial Things
    • 22 Link to resources with well-known or authoritative identifiers
    • 23 Link to related resources
    • 29 APIs should be self describing
    • 30 Include search capability in your data access APIs

--

(those in italics are not specifically ‘Best Practice’ deliverables)

  • Bounding box and centroid
    • … describe how to assert bounding box and centroid of a Spatial Thing
    • SDWBP 7: How to describe geometry
    • DWBP 15: Re-use vocabularies, preferably standardised ones
      • Extend to describe specific vocabularies for describing Geometries - including bounding box & centroid.
      • ¿ but there are different approaches for different data formats … we’re not assuming RDF ?
  • Compatibility with existing practices
    • … compatible with existing methods of making spatial data available (OGC WxS)
    • SDWBP 28: Expose entity-level data through convenience APIs
  • Compressible
    • … should be compressible (for data transfer)
    • SDWBP 7: How to describe geometry
  • Crawlability
    • … crawlable; allowing data to be found and indexed by external agents
    • SDWBP 25: Make your entity-level data indexable by search engines
  • CRS definition
    • … recommended way of referencing a CRS with an HTTP URI, and to get useful information about that CRS when the URI is dereferenced
    • SDWBP 7: How to describe geometry
  • Default CRS
    • … default CRS that can be assumed where CRS is not specified
    • [assume that CRS is specified in the vocabulary / format definition?]
    • SDWBP 8: Specify CRS for high-precision applications
  • Discoverability
    • … easy for humans and machines to find spatial data on the Web; e.g. by means of discovery metadata
    • SDWBP 24: Use links to find related data
    • SDWBP 25: Make your entity-level data indexable by search engines
    • SDWBP 26: Include spatial information in dataset metadata
  • Encoding for vector geometry
    • … recommended way of encoding vector geometry
    • SDWBP 7: How to describe geometry
  • Independence on reference systems
    • … applicable to more than geospatial data; e.g. for microscopy imaging
    • SDWBP 7: How to describe geometry
  • Linkability
    • … link between spatial things and other resources
    • SDWBP 1: Use globally unique identifiers for entity level resources
    • SDWBP 19: Make your entity-level links visible on the web
    • SDWBP 20: provide meaningful links
  • Machine to machine
    • … support for machine to machine data exchange / usage
    • SDWBP 6: Provide a minimum set of information for your intended application
    • SDWBP 7: How to describe geometry
    • SDWBP 9: How to describe relative positions
    • SDWBP 10: How to describe positional (in)accuracy
    • SDWBP 11: How to describe properties that change over time
    • SDWBP 12: Use spatial semantics for Spatial Things
    • SDWBP 13: Assert known relationships
    • SDWBP 18: How to publish (and consume) sensor data streams
    • SDWBP 25: Make your entity level data indexable by search engines
  • Moving features
    • … it should be possible to refer to features that change their location
    • SDWBP 11: How to describe properties that change over time
    • SDWBP 12: Use spatial semantics for Spatial Things
  • Multilingual support
    • … all vocabularies that will be developed or revised should have annotation in multiple languages
    • [this doesn’t appear to be relevant to BP - except that when choosing your vocabulary / format, your community might need multi-lingual support … e.g. multi-lingual labels]
    • SDWBP 6: Provide a minimum set of information for your intended application (multi-lingual labels)
  • Provenance
    • … alignment (???) with models or vocabularies for describing provenance e.g. PROV-O and ISO 19115
    • SDWBP 15: How to describe sensor data processing workflows
    • SDWBP 26: Include spatial information in dataset metadata
  • Quality metadata
    • … possible to describe the properties of data quality; e.g. uncertainty
    • SDWBP 10: How to describe positional (in)accuracy
    • SDWBP 14: How to provide context required to interpret observation values
  • Reference data chunks
    • … possible to identify and reference chunks of data
    • SDWBP 27: Publish spatial data at the level of granularity you can support
  • Reference external vocabularies
    • … possible to refer to externally managed controlled vocabularies
    • DWBP 15: Re-use vocabularies
  • Spatial metadata
    • … recommended ways to describe spatial characteristics of [a dataset]: dimensionality, datatype, CRS, extent, resolution, etc.
    • SDWBP 7: How to describe geometry
    • SDWBP 26: Include spatial information in dataset metadata
  • Spatial relationships
    • … recommended way for expressing spatial relationships between Spatial Things
    • SDWBP 13: Assert known relationships
  • Spatial operators
    • … recommended way for the definition (???) and use of spatial operators
    • SDWBP 13: Assert known relationships
  • Spatial vagueness
    • … describe locations in a vague, imprecise manner
    • SDWBP 6: Provide a minimum set of information for your intended application
  • Streamable data
    • … [spatial] data should be streamable
    • SDWBP 11: How to describe properties that change over time
    • SDWBP 18: How to publish (and consume) sensor data streams
    • SDWBP 27: Publish spatial data at the level of granularity you can support
  • Support for 3D
    • … recommendations should be applicable to 3D spatial things
    • SDWBP 7: How to describe geometry
  • Time dependencies in CRS definitions
    • … CRS definitions to have time-dependent components such as point of origin
    • SDWBP 7: How to describe geometry
  • Support for tiling
    • … support tiling (for raster and vector data) […] to improve retrieval speed
    • SDWBP 7: How to describe geometry (performance considerations)
  • Validation
    • … possible to validate spatial data - to automatically detect conflicts with standards and definitions
    • [query if this is in scope]
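
For the “Bounding box and centroid” requirement listed above, a minimal sketch of what would be computed before it is asserted for a Spatial Thing. It assumes a simple, non-degenerate polygon in planar coordinates, with the ring given without repeating the first vertex; it ignores CRS handling, and the function names are illustrative rather than taken from any vocabulary.

```python
def bounding_box(ring):
    """Return (min_x, min_y, max_x, max_y) for a list of (x, y) vertices."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return (min(xs), min(ys), max(xs), max(ys))


def centroid(ring):
    """Area-weighted centroid of a simple polygon ring (shoelace formula)."""
    twice_area = 0.0
    cx = 0.0
    cy = 0.0
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]  # wrap around to close the ring
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    # Divide by 6 * area, which equals 3 * twice_area.
    return (cx / (3.0 * twice_area), cy / (3.0 * twice_area))


# A 4 x 2 rectangle used as a stand-in for a Spatial Thing's geometry.
rectangle = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
print(bounding_box(rectangle))
print(centroid(rectangle))
```

How the resulting values are published still depends on the chosen vocabulary or data format, which is the open question raised in that item.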