BP Consolidated Narratives
Consolidated narratives / common themes
A first pass from Jeremy arising from assessment of the use cases.
I think that these are the base set of common themes. I've captured a number of sub-themes that I think we will need to address in our best practices too. WDYT?
NEXT STEPS:
- cross-reference these common themes with those in the Data on the Web Best Practice (editors draft) to determine overlap (if any) - we want to avoid duplication
- make sure we are consistent with the Architecture of the web (Volume One)
- map our requirements to these common themes to see if (a) we are missing any requirements, or (b) we have requirements that aren't covered in the common themes.
linking data
- … resources and datasets
- … the essential “webbyness” (connectedness) of “data on the web” e.g.
- [observation -] paper publication - digital image - digitised dataset;
- microscopy imagery - derived feature(s) - [ lab results | patient groups | morphological features ];
- specimens - derived samples;
- statistical data (& other ‘aspatial’ data) - geographic features (e.g. administrative regions, roads etc.)
- … allow linking by consumer / downstream service - not only data publisher
- … x-ref with other spatial features [“spatiotemporal correlation”, “correlating (named) events, epochs and calendars in time”];
- ... spatial and temporal reasoning using Regional Connection Calculus [RCC8], Allen Calculus and ‘fuzzy' (inexact) relationships [real-time query or pre-determined at ‘load’ time? these may vary with time]
- … containment hierarchies of one ‘place’ within another (and other topological/mereological relationships); specific need for “Mutually Exclusive Collectively Exhaustive” (MECE) set (see the sketch after this list)
- … relative (spatial) relationships based on context e.g. my location [expressing location and places in human terms]
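As a concrete illustration of the third-party linking and containment items above, here is a minimal sketch (Python with rdflib; the place URIs are hypothetical) in which a consumer asserts a containment hierarchy between ‘places’ using GeoSPARQL's simple-feature relations, without touching the publisher's data:

```python
# A minimal sketch (rdflib; place URIs are hypothetical) of third-party
# linking: a consumer asserts a containment hierarchy between 'places'
# using GeoSPARQL's simple-feature relations, without touching the
# publisher's data.
from rdflib import Graph, Namespace, URIRef

GEO = Namespace("http://www.opengis.net/ont/geosparql#")

g = Graph()
g.bind("geo", GEO)

suburb = URIRef("http://example.org/id/place/suburb-x")        # hypothetical
city = URIRef("http://example.org/id/place/city-y")            # hypothetical
region = URIRef("http://example.org/id/place/admin-region-z")  # hypothetical

# Containment hierarchy: suburb within city, city within region.
g.add((suburb, GEO.sfWithin, city))
g.add((city, GEO.sfWithin, region))

print(g.serialize(format="turtle"))
```

Because these links live in the consumer's own graph, they can be published and shared independently of the source datasets.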
need to cross-reference this topic with issues arising from the hypermedia discussion on the WG-Comments email list
1. Crossref with data on the Web BP (Kerry)
- 9.9 Data Vocabularies refers to vocabularies to characterise possible relationships
- 9.11 Data Access says that data should be available for bulk download via bulk file formats or via an API, possibly implemented via one URL or multiple data URLs available through discoverable metadata
- 9.6 Data Versioning suggests that the Memento protocol provides a possible implementation to relate a versioned resource to other versions (see the sketch below).
- That's about it -- surprisingly little!
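For reference, a minimal sketch of how Memento's datetime negotiation (RFC 7089) looks from a client, using Python requests; the TimeGate URL is hypothetical:

```python
# A minimal sketch of Memento (RFC 7089) datetime negotiation:
# ask a TimeGate for the representation of a resource as it was
# at a given moment. The TimeGate URL here is hypothetical.
import requests

timegate = "http://example.org/timegate/http://example.org/dataset/42"
response = requests.get(
    timegate,
    headers={"Accept-Datetime": "Tue, 01 Sep 2015 00:00:00 GMT"},
)

# A Memento-aware server reports the version's datetime and links
# to other versions via the Link header.
print(response.headers.get("Memento-Datetime"))
print(response.headers.get("Link"))
```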
2. Crossref with Web Architecture (Kerry)
- Interaction: a retrieval action for a resource ... the server sends back a message containing what it determines to be a representation of the resource.
- Benefits of URIs: a resource should have an associated URI if another party might reasonably want to create a hypertext link to it
- Details of retrieving a representation describes how to traverse a link, using XLink 1.0 as embedded in SVG as an example.
- Linking and Access control: in general the link target URI should not be hidden or controlled, only the resource it identifies.
- Hypertext: "When a representation of one resource contains a reference to another resource, expressed with a URI identifying that other resource, this constitutes a link between the two resources." Note: in this document, the term "link" generally means "relationship", not "physical connection". Key points:
  - A specification SHOULD provide ways to identify links to other resources, including to secondary resources (via fragment identifiers).
  - A specification SHOULD allow Web-wide linking, not just internal document linking.
  - Users of hypertext links expect to be able to navigate among representations by following links.
  - A data format SHOULD incorporate hypertext links if hypertext is the expected user interface paradigm.
  - Data formats that do not allow content authors to create hypertext links lead to the creation of "terminal nodes" on the Web.
- Summary: links are really important, but not a lot in the Web Arch to help us with our narrative.
3. Mapping Requirements to these themes (Kerry)
There are a good few metadata requirements I have left out here -- in general, metadata should be linkable both to and from the data it describes!
Linkability: key requirement for this theme
MobileSensors and also MovingFeatures are not BP requirements, but may be good examples of linking needs
ObservationAggregations could relate to the containment hierarchies
Provenance is a requirement for linking to provenance, Referenceexternalvocabularies is a requirement to link to vocabularies (although perhaps "use" rather than link in this case), and ReferenceDataChunks is a requirement to be able to link to coarser-grained "subset" data. These are all linking requirements.
SamplingTopology is about representation, but could relate to spatio-temporal reasoning [aside -- not sure why spatio-temporal reasoning is in this linking theme]. SpatialRelationships and also SpatialOperators are also about spatio-temporal relationships and reasoning
TemporalVagueness and UncertaintyInObservations and SpatialVagueness talk about fuzzy relationships
Are we missing any requirements? The catch-all linkability requirement could be taken to cover most of this theme. Relative/locally contextual spatial relationships seems to have been missed in requirements (surprising to me -- I thought it was there, somewhere).
publishing data with clear semantics
- … enabling reconciliation with other vocabularies
- … and which vocabulary should I use to describe my data anyway (where do I start?)
- … how should I publish my vocabulary (& its relationships to some other vocabulary)
- … different views on the same resource (e.g. from SKOS to HY_Features); how to relate these views (so that they can be reconciled)
- … “because people involved in integration of data from multiple domains should not be burdened with having to grasp the full complexity of each domain”
- … simple encodings (e.g. JSON) that can be mapped to the complex domain model as necessary; how are (the semantics of) these simple encodings described? (see the sketch below)
- … being able to map multiple data formats to a single semantic model to build a homogeneous dataset from disparate sources
this topic appears to overlap (at least partially) with the Data on the Web Best Practice doc
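One possible shape for the "simple encoding with described semantics" item above is JSON-LD: a flat JSON document whose terms are mapped onto the richer domain model by an @context. A minimal sketch (Python; the vocabulary and resource URIs are hypothetical):

```python
# A minimal sketch of a simple JSON encoding whose semantics are made
# explicit with a JSON-LD @context (names and URIs are hypothetical).
# The same flat JSON that consumers already like can be mapped onto the
# richer domain model by expanding the context.
import json

document = {
    "@context": {
        "name": "http://example.org/def/hydro#featureName",   # hypothetical
        "flowsInto": {
            "@id": "http://example.org/def/hydro#flowsInto",  # hypothetical
            "@type": "@id",
        },
    },
    "@id": "http://example.org/id/river/thames",
    "name": "River Thames",
    "flowsInto": "http://example.org/id/sea/north-sea",
}

print(json.dumps(document, indent=2))
```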
Cross-ref with data on the web BP (Kerry)
Best Practices 14 to 19 cover this area. BP 14 is categorised as "metadata" and the rest as "Data vocabularies". Note: outstanding Issue-8 questions whether vocabularies are in scope for DWBP.
- Standardized terms should be used to provide metadata ...The Open Geospatial Consortium (OGC) could define the notion of granularity for geospatial datasets, while [DCAT] vocabulary provides a vocabulary reusing the same notion applied to catalogs on the Web.
- Vocabularies should be clearly documented. ....A vocabulary may be published together with human-readable Web pages, as detailed in the recipes for serving vocabularies with HTML documents in the Best Practice Recipes for Publishing RDF Vocabularies [SWBP-VOCAB-PUB]. Elements from the vocabulary are defined with attributes containing human-understandable labels ...
- Vocabularies should be shared in an open way ....Provide the vocabulary under an open license such as Creative Commons Attribution License CC-BY [CC-ABOUT]. Create entries for the vocabulary in repositories such as LOV, Prefix.cc, Bioportal and the European Commission's Joinup....
- Vocabularies should include versioning information
- Existing reference vocabularies should be re-used where possible ...The Standard Vocabularies section of the W3C Best Practices for Publishing Linked Data provides guidance on the discovery, evaluation and selection of existing vocabularies..
- When creating or re-using a vocabulary for an application, a data publisher should opt for a level of formal semantics that fit data and applications. ...The data supports all application cases but should not be more complex to produce and re-use than necessary;..Data producers should therefore seek to identify the right level of formalization for particular domains, audiences and tasks, and maybe offer different formalization levels when one size does not fit all....
Note also the Best Practices for Publishing Linked Data (Jan 2014). It includes some general advice about URI design, how to publish a vocabulary, and suggests encoding the data in RDFa, JSON-LD, Turtle and N-Triples or RDF/XML, preserving the semantics of the vocabulary.
Cross-ref with Web architecture (Kerry)
Cannot find anything at all of relevance to this issue
exposing datasets through APIs
- example APIs: OGC WxS [WPS; ‘micro-languages’], RESTful APIs, JavaScript APIs etc.
- … example API: subsetting data in `x`, `y`, `z` dimensions e.g. for 3d geological data … equally applicable to `t` [time] dimension
- … example API: (near) real-time data streaming - e.g. of observation data to enable generation of alerts; integration with IoT protocols (CoAP, MQTT … Google’s Weave?)
- … example API: query in space and/or time for relevant (information?) resources (see the sketch after this list)
- … APIs should have targeted usage (designed to achieve a specific goal)
- … self-describing (both content and API - see HAL for an example of the latter)
- … mapping API parameters to other datasets e.g. gazetteer as nominal dimension on air quality dataset
- … enabling acquisition of the subset of properties (that describe the target resource) needed for a particular outcome
- … how do I package APIs for simple re-use e.g. coordinate transformation API
- … extracting a spatio-temporal subset from a very large EO dataset (such as side-scan sonar) [but how to encode the response payload?]
- … how to relate a service end-point to the dataset it exposes (or vice-versa)
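To make the subsetting and query items above concrete, here is a minimal sketch of a spatio-temporal subsetting request against a dataset API; the endpoint and parameter names are hypothetical, not a proposed standard:

```python
# A minimal sketch of a spatio-temporal subsetting request against a
# dataset API. The endpoint and parameter names are hypothetical, not
# a proposed standard.
import requests

response = requests.get(
    "http://example.org/api/air-quality/observations",  # hypothetical endpoint
    params={
        "bbox": "-0.5,51.3,0.3,51.7",  # minx,miny,maxx,maxy in WGS 84
        "time": "2015-09-01T00:00:00Z/2015-09-02T00:00:00Z",  # ISO 8601 interval
        "observedProperty": "NO2",
    },
    headers={"Accept": "application/json"},
)
response.raise_for_status()
# Assuming the (hypothetical) API returns a JSON array of observations.
for observation in response.json():
    print(observation)
```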
Cross-referenced by Ed
Overlap between the exposing datasets through APIs theme and the Data on the Web Best Practice (editors draft):
- BP8, BP9 Require API to support versions of data - different from BP29?
- BP23 Follow REST principles when designing APIs suggests a RESTful approach to developing APIs, still an issue of some debate within the OGC community
- BP26 Maintain separate versions for a data API: if an API is used as an interface to data, the API itself should be clearly versioned with implicit backwards compatibility
- BP22 Provide bulk download suggests the use of APIs to make bulk downloads of data more practical.
- BP29 Update the status of identifiers: use HTTP status codes to point to previous versions of the resource - a simple spatio-temporal mechanism?
Consistent with the Architecture of the web (Volume One)?
Somewhat, however I would suggest that some of the classic OGC interfaces do not always follow architectural guidelines, for example with respect to Representation Management.
enabling discovery
- … discovery of datasets _and_ the features/attributes they contain
- … summary records (metadata) [is it necessary to create separate ‘metadata’ records for harvest by discovery portals?]
- … where do I (a consumer) discover what is available for use based on (current) context; querying in space and time? (where/what is the ‘search engine’? [common practice is to start with Google/Bing/Yahoo/Yandex])
- … how do I discover the information resources that are available for a given real-world thing?
- … what _else_ is of interest given a focus on a particular resource [in-bound links][how to manage the potential volume of in-bound links shared; there could be thousands of links to reference resources]
- … how do I publish (spatio-temporal) information so that it can be readily indexed by “the big 4” search engines and/or domain-specific catalogues (see the sketch below)
- … what is the set of information I need to provide interoperability across data portals/search engines?
THIS MIGHT BE TWO NARRATIVES: publishing for discovery, and discovery itself
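As one possible answer to the "publish for indexing" question above, a minimal sketch (Python; the dataset URI and values are hypothetical) of generating schema.org Dataset markup as JSON-LD for embedding in a landing page:

```python
# A minimal sketch of publishing for discovery: embed schema.org
# Dataset markup (as JSON-LD in a <script> element) so that the big
# search engines can index the dataset. URIs and values are hypothetical.
import json

dataset = {
    "@context": "http://schema.org",
    "@type": "Dataset",
    "@id": "http://example.org/id/dataset/flood-zones",  # hypothetical
    "name": "Flood risk zones",
    "description": "Indicative flood risk zones for river catchments.",
    "spatialCoverage": {
        "@type": "Place",
        "geo": {"@type": "GeoShape", "box": "51.3 -0.5 51.7 0.3"},
    },
    "temporalCoverage": "2010-01-01/2015-12-31",
}

print('<script type="application/ld+json">')
print(json.dumps(dataset, indent=2))
print("</script>")
```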
Cross-referencing (Linda)
Overlap between the enabling discovery theme with the Data on the Web Best Practice (editors draft):
- http://w3c.github.io/dwbp/bp.html#metadata. BP 1 states: Metadata MUST be provided. It mentions that computer applications, notably user agents should be able to process the metadata, but it does not mention portals. The section gives a list of Discovery metadata, incl. spatial and temporal coverage. This section does not seem to address the discovery of features/attributes that datasets contain. BP4, Provide structural metadata, might address this but I don’t think that’s what we mean by discovery of features/attributes.
- http://w3c.github.io/dwbp/bp.html#DataIdentification mentions that data discovery depends fundamentally on the use of HTTP URIs.
Consistent with the Architecture of the web (Volume One)?
- Yes. The Architecture of the web (Volume One) does not contain information/rules on discovery and/or dataset metadata or things like that, so there is no real possibility for conflict.
Mapping requirements to this theme (Linda)
http://www.w3.org/TR/sdw-ucr/#Crawlability - how to publish spatial data so that it can be indexed.
http://www.w3.org/TR/sdw-ucr/#CRSDefinition - you might want to find only spatial data with a specific CRS.
http://www.w3.org/TR/sdw-ucr/#Discoverability - the main requirement for this theme! A lot of use cases are related to this.
http://www.w3.org/TR/sdw-ucr/#Linkability - traversable links help improve discoverability (both by human browsing and better crawlability)
http://www.w3.org/TR/sdw-ucr/#Machinetomachine - perhaps relevant to crawlability aspect in this theme - not sure
http://www.w3.org/TR/sdw-ucr/#Provenance - I don't see this as very directly improving discoverability, but I associate provenance with metadata and metadata with discoverability. Also, not sure in which other theme this would fit.
http://www.w3.org/TR/sdw-ucr/#Qualitymetadata - Mapped to this theme for same reason as previous one.
http://www.w3.org/TR/sdw-ucr/#Sensormetadata - to enable discovering data for specific sensor characteristics
http://www.w3.org/TR/sdw-ucr/#Spatialmetadata - metadata helps discoverability
http://www.w3.org/TR/sdw-ucr/#Spatialoperators - operators help you discover related spatial data
http://www.w3.org/TR/sdw-ucr/#Spatialvagueness - helps you find spatial data even with vague location indicators
http://www.w3.org/TR/sdw-ucr/#Temporalvagueness - same with vague time indicators
Note that in the requirements there is nothing about portals or about discovering inbound links. These parts of the theme seem not to be covered by requirements.
assignment of identifiers to ‘real-world things’ & information resources
(in-line cross-referencing by Armin, and possibly also Phil)
- … URIs as canonical identifiers rather than ‘names’ [Best Practice 10: Use persistent URIs as identifiers]
- … non-unique naming [reconciling information associated with each name; ’sameAs’ is not enough?] [Best Practice 14: Use standardized terms]. However, there is no best practice proposed on what other weaker relations can be used beyond ’sameAs’. SKOS mapping relations, for example, offer weak relations to reconcile relations between terms: [SKOS mapping relations]
- … relating toponyms and aliases to ‘real-world things’ e.g. (historical) place names, sub-surface geological features [relationships may change over time] [httpRange-14 issue, use of hash URIs and slash URIs with a 303 redirect, Best Practice 10: Use persistent URIs as identifiers] (see the sketch after this list)
- … relating information resources to the things they describe … terms such as Renaissance Italy, a geographic entity, apply to a specific time range - albeit an inexact one [No best practice defined yet]
- … an _event_ is a real-world thing too e.g. birth of Albert Einstein or World War 1 [No best practice defined yet, but the notion of events is well established in ontologies, e.g. PROV-O is built upon instantaneous events]
- … relationship between _versions_ of an information resource that describe a ‘real-world thing’ e.g. the geometry of a named feature may be updated- either because of a policy change (for administrative geography) or new measurement (for a natural phenomenon) [Best Practice 17: Vocabulary versioning, Best Practice 29: Update the status of identifiers]
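To illustrate the slash-URI pattern mentioned above, a minimal sketch (Python with Flask; the URIs are hypothetical) in which the URI for a real-world thing 303-redirects to the information resource that describes it:

```python
# A minimal sketch (Flask; URIs are hypothetical) of the slash-URI
# pattern: the URI for the real-world thing 303-redirects to an
# information resource describing it, keeping the two identifiers distinct.
from flask import Flask, jsonify, redirect

app = Flask(__name__)

@app.route("/id/river/thames")   # the real-world thing (the river itself)
def thing():
    # 303 See Other: the thing itself cannot be sent over the wire,
    # but a description of it lives at the /doc/ URI.
    return redirect("/doc/river/thames", code=303)

@app.route("/doc/river/thames")  # the information resource
def document():
    return jsonify({"name": "River Thames", "type": "River"})

if __name__ == "__main__":
    app.run()
```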
Mapping requirements to this theme (Armin)
- … http://w3c.github.io/sdw/UseCases/SDWUseCasesAndRequirements.html#BoundingBoxCentroid - Bounding box itself requires an identifier
- … http://w3c.github.io/sdw/UseCases/SDWUseCasesAndRequirements.html#CoverageTemporalExtent - Spatial Coverage data itself requires an identifier to be referenced by temporal references
- … http://w3c.github.io/sdw/UseCases/SDWUseCasesAndRequirements.html#CRSDefinition - Coordinate Reference System is identified through a URI
- … http://w3c.github.io/sdw/UseCases/SDWUseCasesAndRequirements.html#Discoverability - Human readable URIs improve discoverability
- … http://w3c.github.io/sdw/UseCases/SDWUseCasesAndRequirements.html#DynamicSensorData - Sensor needs to be able to assign URIs in real-time (Range of permissible URIs?)
- … http://w3c.github.io/sdw/UseCases/SDWUseCasesAndRequirements.html#HumansAsSensors - Humans need unique identifiers
- … http://w3c.github.io/sdw/UseCases/SDWUseCasesAndRequirements.html#Linkability - Core Requirement: URIs allow linkability
- … http://w3c.github.io/sdw/UseCases/SDWUseCasesAndRequirements.html#ReferenceDataChunks - There needs to be a way to combine chunks of data under one URI (e.g. DataCube vocabulary)
- … http://w3c.github.io/sdw/UseCases/SDWUseCasesAndRequirements.html#TilingSupport - Tiles may require URIs
- … http://w3c.github.io/sdw/UseCases/SDWUseCasesAndRequirements.html#TimeSeries - TimeSeries may require URIs
expressing (geo)spatial information & temporal information
- … _events_ cited specifically as these have both spatial and temporal attributes (observations are events too!)
- … relative positioning [linear referencing]
- … positional (in)accuracy [uncertainty, precision, probability, confidence][e.g. second quarter of the 9th century is _approx._ 825-850 but could be 823-852](note: the ends of a ’sampling traverse’ could be known in a national/global CRS to within ±5m, whilst the position of the samples themselves may be accurate to ±0.01 along the traverse)
- … imprecise spatial and temporal referencing (near, south of, after, around)
- … which vocabulary/data model to use for spatial / temporal attributes
- … how to express CRS (& temporal reference systems [calendars are different!]); should there be a default CRS?; implications of “wrong” CRS on positional accuracy
- … graphs of literals vs structured objects expressed as a single literal (WKT, GML) etc.
- … what does downstream tooling support; what query capability is desired?
- … for use in spatially aware applications (such as maps)
- … describing how things (such as geometry) vary with time [is time treated as first class coordinate or an attribute of the resource? varies depending on case]
- … geotagging e.g. using postcode
- … 1d, 2d and 3d geometries e.g. TIN coverage in varying resolution & heterogeneous resolution voxel sets (e.g. to represent complex geology)
- … times (instants, periods) in addition to relative time-frames (durations)
- … "geometry can be 95% of the total data size” [Dutch Base Registry]: is there a need for performance optimisation (e.g. compression); can this be standardised? - think lidar point cloud for _big_ data
- … how do I combine data expressed in different Coordinate Reference Systems?
- … provision of multiple geometries for a single feature; e.g. a single reference point (for “pin on map”), a bounding box (for search), a simple geometry (for coarse spatial analysis), a detailed geometry (for resolving cadastral boundary disputes) etc. (see the sketch below)
not everything is geo; see UC 4.40 for Cartesian 2d and 3d data
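A minimal sketch of the "multiple geometries" item above (Python with rdflib and the GeoSPARQL vocabulary; the feature URIs and coordinates are hypothetical): one feature, two geometries for different purposes, each carrying its coordinates as a single WKT literal:

```python
# A minimal sketch (rdflib, GeoSPARQL vocabulary; URIs hypothetical)
# of a feature with several geometries for different purposes: a
# reference point for "pin on map" and a detailed boundary, each a
# structured object whose coordinates sit in a single WKT literal.
from rdflib import Graph, Literal, Namespace, URIRef

GEO = Namespace("http://www.opengis.net/ont/geosparql#")

g = Graph()
g.bind("geo", GEO)

feature = URIRef("http://example.org/id/park/hyde-park")            # hypothetical
point = URIRef("http://example.org/id/park/hyde-park/point")        # hypothetical
boundary = URIRef("http://example.org/id/park/hyde-park/boundary")  # hypothetical

g.add((feature, GEO.hasGeometry, point))
g.add((feature, GEO.hasGeometry, boundary))
g.add((point, GEO.asWKT,
       Literal("POINT(-0.1657 51.5073)", datatype=GEO.wktLiteral)))
g.add((boundary, GEO.asWKT, Literal(
    "POLYGON((-0.19 51.50, -0.15 51.50, -0.15 51.51, -0.19 51.51, -0.19 51.50))",
    datatype=GEO.wktLiteral)))

print(g.serialize(format="turtle"))
```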
Questions to consider
- … is it feasible to select a single geospatial vocabulary? can we define (axiomatic) mappings between the most common?
- … is it feasible to select a common (default) CRS? can we define standard CRS transformation functions? (see the sketch after this list)
- ... should we provide guidance on how a particular community can agree common geographic representations and spatial reference systems?
- ... should we provide / can we refer to validators for common formats; WKT, GeoJSON etc.
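On the CRS transformation question, a minimal sketch using pyproj (one of several libraries that wrap standard EPSG-registered transformations):

```python
# A minimal sketch of a standard CRS transformation using pyproj:
# re-project a WGS 84 longitude/latitude pair into the British
# National Grid (EPSG:27700).
from pyproj import Transformer

transformer = Transformer.from_crs("EPSG:4326", "EPSG:27700", always_xy=True)
easting, northing = transformer.transform(-0.1276, 51.5072)  # lon, lat of London
print(easting, northing)
```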
Cross-referencing (Kerry)
- Best Practice 3: Provide locale parameters metadata. "Information about locale parameters (date, time, and number formats, language) should be described by metadata... The machine readable version of the discovery metadata may be provided according to the vocabulary recommended by W3C to describe datasets, i.e. the Data Catalog Vocabulary [VOCAB-DCAT]... Check that the metadata for the dataset itself includes the language in which it is published and that all numeric, date, and time fields have locale metadata provided either with each field or as a general rule." So we could be specialising this BP to recommend OWL-Time for linked data, or asking DWBP to do so. We could consider asking for GeoDCAT to get a mention.
- Best Practice 22: Provide bulk download -- Data should be available for bulk download. Implementation ... could include Hosting a database, web page, or SPARQL endpoint that contains discoverable metadata [VOCAB-DCAT] describing the container and data URLs associated with the container.
- That's all I can find from Data on the web. For Web Architecture, one principle should be kept in mind: Orthogonality: Orthogonal abstractions benefit from orthogonal specifications. A specification should clearly indicate which features overlap with those governed by another specification.
sensor data
- … expressing the ‘observation context’ (location, time, observed property, quality etc.) (see the sketch after this list)
- … & representing (data) processing chains (provenance) e.g. georeferencing, pixel classification
- … need to include concepts of “sampling feature” (that is representative of the _subject of interest_; statistical, spatial or proxy) and the more specific concept of “specimen” (which is tested ex-situ)
- … “sampling features” and “specimens” may be grouped; relationships between samples provide a ’topology’ of sampling
- … batches of specimens may include control samples
- … these concepts are already well described in O&M2
- … crowd-sourcing of observations, e.g.
- #uksnow for crowd-sourced snowfall observations;
- earthquake damage reports using address locations;
- (geocoded) bushfire reports;
- note: some of these social media channels don’t allow use of structured data; how do we parse textual reports to extract, say, location etc.
- ... humans can be sensors too
- ... correlation/geolocation with ‘authoritative’ sensor data streams
- … virtual observations from numerical simulations
- … publishing / subscribing to observation data stream
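A minimal sketch of the "observation context" item above (Python with rdflib, using the SSN ontology as it stands today; URIs are hypothetical, and the result time is shown informally as a plain literal where SSN itself points to a time entity):

```python
# A minimal sketch (rdflib; URIs are hypothetical) of an observation
# context using the SSN ontology: what was observed, of which feature,
# by which sensor, and when. The result time is shown informally as a
# plain xsd:dateTime literal; SSN itself points to a time entity here.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, XSD

SSN = Namespace("http://purl.oclc.org/NET/ssnx/ssn#")
EX = Namespace("http://example.org/id/")  # hypothetical

g = Graph()
g.bind("ssn", SSN)

obs = EX["observation/42"]
g.add((obs, RDF.type, SSN.Observation))
g.add((obs, SSN.observedBy, EX["sensor/no2-sensor-7"]))
g.add((obs, SSN.observedProperty, EX["property/no2-concentration"]))
g.add((obs, SSN.featureOfInterest, EX["feature/city-centre-air"]))
g.add((obs, SSN.observationResultTime,
       Literal("2015-09-01T10:00:00Z", datatype=XSD.dateTime)))

print(g.serialize(format="turtle"))
```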
crossref to Data on the web BP (Kerry)
In general terms, pretty much all of the Data on the Web best practices apply, although they are not so obvious from the analysis here, which is more obviously connected to ssn as it is today. With respect to ssn as it is currently defined, these things could be considered missing:
- Best Practice 2: Provide descriptive metadata. On one hand most of ssn could be seen as exactly descriptive metadata, but it is quite different to the DoWBP idea of descriptive metadata.
- Best Practice 3: Provide locale parameters metadata, i.e. the language(s) of the dataset and the formats used for numeric values, dates and time -- although most of this can be done via xsd datatypes.
- Best Practice 5: Provide data license information -- definitely missing
- Best Practice 6: Provide data provenance information. In a way, ssn is all about provenance. However it predates prov-o and does not quite happily line up. See paper by Compton, Corsar and Taylor at ssn 2014 for a suggested alignment (btw, also makes ssn play better with O&M).
- Best Practice 7: Provide data quality information SSN does this, in its own way. May be worthwhile to look at "The machine readable version of the dataset quality metadata may be provided according to the vocabulary that is being developed by the DWBP working group , i.e., the Data Quality and Granularity vocabulary"
- Best Practice 8: Provide versioning information discusses real time data and suggests possible implementations that mostly seem (to me) inappropriate for real time data. Memento is mentioned. An unchanging "latest version" URI is mentioned. OWL annotation properties and PROV-O are mentioned. There are several outstanding issues. There is also a best practice for version history.
- 9.9 Data Vocabularies has a lot to say about using standard vocabularies, documenting them, sharing them, versioning them, re-using them, and choosing the right formalisation level. There are no surprises here--ssn meets these best practices apart from standardising, versioning, and licencing which this group will do. There is an issue about whether vocabularies are in scope for DWBP.
- Best Practice 22: Provide bulk download, Best Practice 24: Provide real-time access and Best Practice 25: Provide data up to date are relevant for services that may use the ssn ontology.
- Best Practice 30: Gather feedback from data consumers and Best Practice 31: Provide information about feedback suggest that publication supports feedback from data consumers to improve data quality and enhance the consumer experience. DWBP is developing a Dataset Usage Vocabulary for this purpose -- clearly we should ensure that SSN can work well with this.
crossref to Web Arch (Kerry)
- 4.2.3. Extensibility says that providing for open extensibility is good but extensions must not interfere with the original spec and the spec should specify agent behaviour in the face of unrecognised extensions.
Mapping Requirements to this theme (Kerry)
Compressible: because sensor data is often big, although this does not really occur in the theme description
CoverageTemporalExtent: again does not appear in the description, but surely applies -- may just be covered in a different BP narrative
I've left out a few more generic ones that could go here too (crawlability, CRS definition, date/time/duration, default CRS) -- but they are not specific to sensor data so I am assuming they are covered elsewhere.
Discoverability: again not specifically sensor, but there are many sensor use cases against this requirement
DynamicSensorData: this is a big one in the theme description (but BP is not in the deliverable part of the requirement -- probably should be)
Ex-situ sampling: surely an SSN issue, not a BP one?
GeoreferencedSpatialData: another obvious one, but not really mentioned in the narrative
Humans as sensors: big feature in the narrative -- although surely an SSN issue, not a BP one?
Independence on reference systems: another obvious one, but not really mentioned in the narrative
Lightweight API: this is missing from the narrative -- but it should be a BP issue I think, if it is in scope at all.
Linkability: this is a different theme, but it is evident in the sensor data theme description too.
Mobile sensors and Moving features: these two feature big in the narrative -- they are ssn issues, but do they need a BP mention? I do not think so.
Provenance (http://w3c.github.io/sdw/UseCases/SDWUseCasesAndRequirements.html#Provenance)
Reference external vocabularies: only has SSN against it -- but this is a BP issue, surely? A good few of the use cases are sensor-related.
Sampling topology: this is big in the narrative, but it is surely an SSN issue alone? (as in the UCR now)
Space-time multi-scale: now this has SSN and Time in the UCR, but surely this is at least partly a consumption issue -- and should be in BP?
Spatial vagueness and also Temporal vagueness: I think these are implied by some of the narrative issues, although neither is targeted at BP
TimeSeries: this raises a common issue I see -- it has Time, SSN and Coverage against it, and I agree that those deliverables will have to be aligned to this requirement -- but would it not really be delivered through Best Practice instead? I.e. those other deliverables will just throw it back this way? Or am I just thinking this way because the solution I have in mind would be to use the RDF Data Cube -- and somewhere we would need to say how to do this.
Uncertainty in observations: the same could be said for this, currently against SSN
Virtual Observations: surely an SSN issue rather than BP, but the narrative raised it.
Are we missing any requirements? Maybe -- I did not spot "correlation/geolocation with ‘authoritative’ sensor data streams" or "pub/sub" (although streaming is there -- isn't pub/sub assuming a partial solution?)
---
Other stuff
Not quite figured out where this fits in ... feels like it's in scope but haven't found a home yet.
- working with large data that doesn't play nicely with the typical RDF-centric Linked Data approach
- … referencing “chunks” (subsets: slices, ranges etc.) of very large datasets e.g. assembly of large remote sensing dataset from individual “tiles” / “scenes” or extracting a particular band in a multi-spectral image (see the sketch after this list)
- … raster data; pixels (2d) and voxels (3d)- describing ‘resolution’ etc.
- … lidar data & acoustic survey- point clouds … not structured like raster data! … slices could be _derived_ from the raw data?
- data encoding choices
- ... what are our recommendations?
- … how does a consuming application know which _vocabulary_ is used? (as distinct from format which is described using media types)
- … (how) can a consuming application dictate which vocabulary is provided?
- visual styling of features (for display on maps)
- “sending the code to the data” for analysis of very big data [SCOPE?]
- dealing with large datasets; “paging” (query) result sets to make them manageable to consumer applications
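A minimal sketch of the "chunk referencing" idea above: give each tile/band subset its own URI so consumers can fetch or cite just the slice they need. The URI template below is hypothetical, not a proposed convention:

```python
# A minimal sketch of referencing "chunks" of a very large dataset:
# each tile/band subset gets its own URI, so a consumer can fetch or
# cite just the slice it needs. The URI template is hypothetical.
import requests

TILE_URI = "http://example.org/data/scene/{scene}/tiles/{z}/{x}/{y}?band={band}"

uri = TILE_URI.format(scene="LS8-2015-244", z=9, x=261, y=170, band="nir")
response = requests.get(uri, headers={"Accept": "image/tiff"})
response.raise_for_status()
with open("tile.tif", "wb") as f:
    f.write(response.content)
```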
Cross-ref with Data on the Web (Bill)
There are a number of relevant best practices from the Data on the Web working group http://w3c.github.io/dwbp/bp.html
- BP4 provide structural metadata - can help a consuming tool know which vocabularies are used. Goes together with: BP15 document vocabularies - BP16 share vocabularies
- BP12 must be machine readable standardized format - this best practice does not require specific formats, so it's possible to select a format appropriate for the type of data (e.g. large remote sensing datasets), but it recommends using common existing formats where possible and mentions NetCDF (used in conjunction with CF Conventions) in particular - an efficient format for array-oriented scientific data, which might be useful for some of our coverage data examples?
- BP13 should be in multiple formats where possible - if we are able to offer data in more than one machine readable format, that's an advantage
- BP22 - data should be available for bulk download
- BP23 - follow REST principles
Cross-ref with Web architecture (Bill)
Not much of relevance here, other than that the web architecture does not constrain formats of resources on the web, and there is the media-type system for identifying formats:
http://www.w3.org/TR/webarch/#formats http://www.w3.org/TR/webarch/#internet-media-type
Other notes
- visual styling of map features: is this in scope? SVG working group: http://www.w3.org/TR/SVG2/ http://www.w3.org/TR/SVG2/styling.html
- dealing with large datasets: if RDF, can use LIMIT/OFFSET in SPARQL (see the paging sketch below). Raster data: tiling, so you download just the tiles you need.
- sending code to data - is this in scope?
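A minimal sketch of the LIMIT/OFFSET paging mentioned above (Python with SPARQLWrapper; the endpoint and predicate are hypothetical). Note that OFFSET-based paging needs a stable ORDER BY to be reliable:

```python
# A minimal sketch of paging a large result set with SPARQL
# LIMIT/OFFSET. The endpoint URL and predicate are hypothetical.
from SPARQLWrapper import SPARQLWrapper, JSON

endpoint = SPARQLWrapper("http://example.org/sparql")  # hypothetical
page_size = 100

for page in range(3):  # fetch the first three pages
    endpoint.setQuery(f"""
        SELECT ?feature ?name WHERE {{
            ?feature <http://example.org/def/name> ?name .
        }}
        ORDER BY ?feature
        LIMIT {page_size} OFFSET {page * page_size}
    """)
    endpoint.setReturnFormat(JSON)
    results = endpoint.query().convert()
    for binding in results["results"]["bindings"]:
        print(binding["feature"]["value"], binding["name"]["value"])
```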