See also: IRC log
<phila> rrsagent, make logs public
<phila> Simon Agass – Satellite Applications Catapult. Activating different sorts of data, making it available. Data innovation. Denise McKenzie, OGC. Introduces self/OGC. Work being done in the SDWWG – choosing standards. Geoffrey – geology, talked about work on glaciers etc. Mapping and coverage data is important (LIDAR etc). Strategic objectives for CODATA: • data science – integration. Classic techniques don't work • we need to be able to make data a[CUT]
<phila> Jitao: Works at Institute of Remote Sensing/CAS
<phila> ... Has largest amount of sat data in China, trying to open it to the world
<phila> Maik: From University of Reading. Work with Jon Blower on MELODIES project
<phila> MELODIES project
<phila> Maik: Talked about the project. Experimenting with different APIs etc.
<phila> ... quite new to the area. Joined Uni Reading recently, was training at ESA
<phila> ... Come from a Web Developer point of view. I just want to use data easily
<phila> Jianhui: From CNIC/CAS
<phila> ... Operate infrastructure in CAS, providing service to other scientists
<phila> ... inc high speed network
<phila> ...developing infrastructure to integrate research data within CAS which has > 100 institutes
<phila> ... want to provide data services to all the scientists.
<phila> ... For sat data, we deliver service systems, connected to 350TB of satellite data that can be downloaded freely
<phila> ... Open sat data - the big change is how to make it easier to find and access the data and get info from this sat data
<phila> Jianhui: So we want to work out how to put sat data on the open Web
<phila> ... Also a member of CODATA China
<phila> scribe: phila
jianhui: Work with Geoffrey and
Simon H on this
... This is a good opportunity for us to help make sat data
more open more easily
... and exchange data between UK and China
YangGao: Surrey Space
... want to automate real time processing of satellite
data
... not familiar with W3C and OGC - but encouraged by what I've
heard
... ambition of this group coincides with our work
... want to understand what we can contribute to
standardising
... want to identify gaps, methodologies etc. Encouraging
community to follow methods
... SSTL has long standing relationship with many Chinese
organisations
... First remote sensing satellite acquired in China came from
Surrey Space (SSTL)
... helpful in disaster monitoring, urban planning etc.
... very motivated to work with Chinese colleagues. Want to
share lessons learnt, etc.
Payam: From University of
Surrey
... work on IoT, data interop
... member of the SDW WG
phila: Can you, Geoffrey, say something about the UK-China angle, what are the funders asking for
Geoffrey: The FO is looking for
collaboration between UK and China that will help lead to
innovation
... I said that all you can do is stimulate the first part of
that. It's up to others to begin the companies that earn the
billions
<jtandy> (FO = UK Foreign Office)
Geoffrey: The widespread use of
data that is freely open to all can be the basis of a lot of
economic activity
... What we need to do in the short term is to identify the
issue we're trying to address, look for the way forward. In 6
months time we should be able to set out in precise terms the
work that needs to be done then to lead to future
development
<MaikRiechert> First correction: I'm not from the University of Berlin, but University of Reading :)
<chunming> for chinese: http://www.w3.org/2015/ceo-ld/Overview.html.zh-hans
<chunming> Wiki page: https://www.w3.org/2015/ceo-ld/wiki/Edinburgh
<chunming> W3C official website: http://www.w3.org/account/request
https://www.w3.org/accounts/request
<chunming> Wiki page for London Kick-off meeting: https://www.w3.org/2015/ceo-ld/wiki/London_Kick_Off_meeting
<jtandy> D_McKenzie: can participants in this group be from countries other than UK and China?
<jtandy> GeoffreyBolton: that is for our group to determine- we need to outline how _we_ want to work [to reach our goals]
<jtandy> phila: [talks about the logistics of the group]
<jtandy> phila: there are two face-to-face meetings
<jtandy> ... first Sapporo, for W3C TPAC
<chunming> W3C TPAC 2015: http://www.w3.org/2015/10/TPAC/Overview.html
<jtandy> ... this is not formally part of the project, it is where the W3C/OGC Spatial Data on the Web WG will meet
<jtandy> phila: the next F2F meeting will be in Beihang ... to be organised by chunming
<chunming> TPAC 2015 (in Chinese): http://www.chinaw3c.org/tpac2015-overview.html
<jtandy> phila: also there are the weekly teleconf calls for Spatial Data on the Web WG (SDW)
<jtandy> ... we'll be holding the next one tomorrow afternoon in this room (@ Royal Society no less), 2PM UK-time (UTC+1)
jtandy: The Use Cases eds worked out which use cases apply to Coverages etc.
http://w3c.github.io/sdw/UseCases/SDWUseCasesAndRequirements.html#arCoverageInLinkedData
GB: Use cases seem very specific. Why? Just written by people in the group?
jtandy: The UCs are written by the group. We wanted people to write UCs that are specific so that you can validate against it - you can find test data
<MaikRiechert> (link doesn't work in Firefox, says "undefined" in all headers)
jtandy: In the best practices doc, for example, we're looking at common themes for exposing GIS on the Web. We've not looked much at sat data as we knew this group was coming
<MaikRiechert> using Chrome now, works ;)
<jtandy> Geoffrey asks about the common themes; see here: https://www.w3.org/2015/spatial/wiki/BP_Consolidated_Narratives
<Payam> Jeremy describes the Spatial Data on the Web Working Group's use case and best practices
<Payam> there is a lack of satellite data related use-cases
<Payam> one of the key goals of the the Spatial Data on the Web Working Group is to encourage people to link their data to other assets on the web
<Payam> jtandy: one of the key goals of the the Spatial Data on the Web Working Group is to encourage people to link their data to other assets on the web
SimonAgass: I think linking data
is even more important for sat data
... was disrupted by Google Maps etc.
... prices of sat data dropping. Sentinel, SkyBox etc.
... bigger players are seeing a threat in some way to their
curated data model
... EO data has not been connected through the Web. It's been
in isolation. best you might get is in the metadata
... putting the data into context is where the value comes.
Mashing it with other data on the Web is where the value
comes. For the EO industry to move on, integrating it into the
Web of data will greatly increase the value
jtandy: The MELODIES project is
looking at that. Pipelines etc.
... (Processing pipelines)
SimonAgass: We did a project
earlier this year with Chile on disaster management. They had
lots of sat data, but not the resources to understand it and
use it
... we built some infrastructure that used Linked Data, NLP
etc. to bring in more info
GB: It strikes me that linking
sat data is crucial - but what sort of other data. You want
spectral range to be aware of others
... not just this image but have you looked at A, B, C
... geological maps of countries might be a general cover - but
that's not global. it's regional
... Thirdly point data - might be very ad hoc, incomplete
... I think we need to think about what sort of data we want to
link to
jtandy: +1 we can link from anything to anything but it rapidly dissolves into meaninglessness
GB: Typically on the Web, you ask
'what's at this location? - but we might want to ask what are
the locations that have these same properties?
... You can't do that with ad hoc observations
<Simon_Hodson> Important to be able to ask the question: at what locations is this property the case?
jtandy: I think we should come back to that conversation. It's a key topic
<Simon_Hodson> What needs to be done to the satellite EO data in order to be able to ask such questions?
jtandy: Maybe SimonAgass could give us an insight into what you had to do with the sat data for Chile to integrate it. I guess it was mostly manual?
SimonAgass: Mostly
jtandy: You have to be an expert
- you can't offload the set up time to a software agent
... So theme 1 was linking data
<Simon_Hodson> Themes for Use Cases
<Simon_Hodson> Theme One: linked data.
<Simon_Hodson> Theme Two: publishing data with clear semantics
jtandy: Talks through more of https://www.w3.org/2015/spatial/wiki/BP_Consolidated_Narratives
<Payam> jtandy: a part of the work at the Spatial Data on the Web Working Group is to formalise the vocabulary that can describe geospatial data
GB: It seems to me that you need
to compose your query well for discovery
... Would it be helpful to give advice on how to compose the
right query?
... Geo probs at the moment - you tend to go to Google
Maps
... what is the starting point, how do you ask the right
question?
jtandy: Implementers tend to
provide portals that encode a lot of the hidden knowledge
... imagine an air quality dataset. Data is being collected in
a number of places that might be listed in a gazetteer
... but if I use that gazetteer then there are more links/routes
to discoverability
GB: I think I'm getting back - do
we need to have experts to use this data
... So the question is, how much expertise do we expect people
to have in order to be able to access the information.
jtandy: Colleagues talk of the danger of giving access to data to people who don't understand it.
GB: A little knowledge is a dangerous thing
jtandy: You shouldn't need to be an expert to ask the question although you may need some expertise to understand the answer
D_McKenzie: And there's a danger
with locations since you can gather lots of info about a
location that can lead to privacy breaches
... Maybe there are some datasets that you shouldn't be able to
link
GB: This has resonance in code breaking
Payam: If you have multiple
providers of data you're looking for and then finding the data
within that
... There's an indexing issue there
... And then the interesting part, once you have the data, you
don't have everything in one document, you have links to find
more
Jianhui: We have a platform that
provides services to the scientists so we know about the
requirements that they have
... Sometimes you don't just want a link from one image to
another, you want a link to other info.
... They want to link at different scales or the whole
picture
... I think the use case is more complicated
... So I think we need to invite some users to tell us what
they're after
jtandy: Drawing input from the user community is something that the sat applications catapult has been doing for the last 2 years - so maybe SimonAgass can talk about that
SimonAgass: Users, yes, and the value adders. Look at some of the challenges we want to overcome.
jtandy: Before we create any new tech, we should look at what problem it should solve
DM: EO has many facets
jtandy: The broader SDW is about
coverage data. This group here has a series of expertise in
space based RS
... there are use cases in the SDW that mention LIDAR, Sonar
etc
... there are also in situ sensing questions. It's still EO
GB: I suggest we concentrate on
satellite data but then ask which principles apply to other
areas. Are there exceptions when talking about sat data
... There was a gov report on what sensor data was available,
ice sheets etc.
jtandy: There are also coverage
datasets that are not observations. A coverage varies with
space or time.
... The Met Office has coverages that are time series of
measurements at the same points for >150 years
GB: There are many datasets that have both spatial and temporal slices
jtandy: When we create these 4 D datasets, it's quite hard to work with them as they've been sliced and stored in one way.
GB: Sampling is a fundamental
issue. Most of the maps we generate are interpolated based on
point data
... do we want to think about sampling distances and
correlatability
jtandy: That's an example of the context we need to interpret the EO data
Payam: The Semantic Sensor Network is an upper-level ontology, but you can put in specifics.
jtandy: There has been a proposal
for simplification
... Talks about the CHARME project
... people find that they want to annotate sections of data
they've found
-> http://charme.org.uk/ CHARME project
Payam: SSN looked at the sensor devices - but if you have streaming data, you don't need to add the semantics to each item, just the stream
jtandy: The last theme is
obvious... which is how to express the geographic and temporal
elements
... how do we write that down
... there are multiple mechanisms for doing that
... we want to encourage convergence on one
... Recaps on
https://www.w3.org/2015/spatial/wiki/BP_Consolidated_Narratives
... Nothing specific there about coverage data - we can create
it if we need to but I think what we've said so far can fit
into those 7
Jianhui: You mentioned publishing data with clear semantics. RS data can be understood by machine?
jtandy: In the US there are a
number of orgs creating hydrology data (Aus has 500 such
orgs)
... Until recently, there was no governance on how that info was
produced. Everyone had their own data model that was very hard
to reconcile
DM: There was a trading model for water in Aus
jtandy: We need to be able to do the mapping automatically, do the crosswalks etc.
DM: Google Translate for data
MaikRiechert: In MELODIES - land cover categories are different in different countries and it's hard to map between them
jtandy: In Germany you might have
coniferous forest, elsewhere it's just forest
... That's a relatively simple mapping
... but it can be more complex
... we want to be able to offload a lot of the work to
machines
MaikRiechert: Yes, but you need
to experiment with different mappings. You can't have a
universal mapping
... Maybe there can be default mappings from some
organisations
jtandy: Once you've experimented
with mappings, you might want to publish 'these are the
mappings that worked for me' so we should make those durable
and publishable
... Which brings us to questions we've not touched on - how 3rd
parties can add or subtract value
... Annotations is one part of it.
<chunming> MELODIES project: http://www.melodiesproject.eu
jtandy: trust and provenance
Simon_Hodson: ... Can we think
about linking those attributes?
... So we need to identify those attributes and see how they'd
be used to describe some data
jtandy: I think that's a workable
approach
... I haven't come with any pre-baked ideas
... You're saying (SH) that these are things that scientists
typically ask in a given field
Simon_Hodson: it seems that what we're interested in the kind of attributes we need to think about
jtandy: Physical quantities?
Simon_Hodson: Yes, or an attribute that there might be pollution from a sensor or whatever
jtandy: The contextual info is
what you use to assess whether the data is any use for
you
... So starting with the attributes is common
DM: Sensor Enablement is a
framework for how you handle the geospatial aspects of sensors.
So it's a management toolkit providing a level of
consistency
... A way to see what you're going to get from that sensor
network. But you can put in info that allows you to make
further decisions
Payam: I think SWE has service
descriptions too
... so you can see the format so you know how to query
DM: Standards help here, you state how you've set up your network so people know what to expect.
<jtandy> (SWE = Sensor Web Enablement)
<chunming> minutes: http://www.w3.org/2015/09/29-ceo-ld-minutes.html
<jtandy> phila: resumes the meeting after coffee
<jtandy> ... we need to set out what we would like to achieve and what we can achieve
<jtandy> ... and what is not in scope
<jtandy> ... then ...
<jtandy> ... all around this room know about various standards- we can determine the gaps
<jtandy> ... we also need to respect the goals of this working group- based on the charter of the SDW WG
<jtandy> yang: (talking to phil over the break)
<jtandy> ... the key challenges that we need to highlight are
<jtandy> ... i) the diversity of satellite data payload
<jtandy> ... optical cameras, SAR, LIDAR, multi-spectral cameras
<jtandy> ... there is a diversity of instrument
<jtandy> ... which means that there is a diversity in different formats
<jtandy> ... the satellite operators [tend to] encrypt and compress the data for downlink
<jtandy> yang: then once the "data product" is received [at the ground station] this compression and encryption is reversed
<jtandy> ... the data is then turned into products that can be discovered and used
<jtandy> ... [...]
<jtandy> .... there is a diversity of approaches for creating these products (?) based on the heritage and background of different System suppliers
<jtandy> yang: so ... we want standards- but standards close to the data product end of this process
<jtandy> ... nevertheless, we need to consider two phases
<jtandy> ... i) generic processing that can be standardised for _any_ application
<jtandy> ... ii) [doing stuff] for specific domains, such as agriculture etc.
<jtandy> ... the second phase is difficult to do
<jtandy> ... but there is likely to be good support from the scientific community to do this
<jtandy> ... but, like said this morning, we need to engage with the end users
<jtandy> ... we also need to set up examples that we can refer to
<jtandy> phila: here's my use case ...
<jtandy> ... simon has a bunch of developers creating applications
<jtandy> ... they want to use satellite data- from Jianhui's data centre and SENTINEL (and other sources)
<jtandy> ... the point is that the application developer should be able to treat each of those sources in the same way
<jtandy> ... each 'product' will have the same structure
<jtandy> ... [phila points out some specific parts to consider]
<jtandy> phila: we need to standardise on the 'product' - the coverage
<jtandy> ... then, in subsequent phases of this working group's life, there is implementation to consider
<jtandy> ... we will seek money / funding for a phase 2 where
<jtandy> ... we can engage with the satellite manufacturers / operators to implement these standards
<jtandy> ... so that it easy for application developers to work with multiple sources
<jtandy> yang: there is a starting point for where we can standardise- [after the encryption and compression is removed]
<Payam> +1 chunming
<jtandy> ... it would be difficult to engage with satellite manufacturers earlier than this
<jtandy> yang: after this we can look at data formats
<jtandy> ... at this point we can work with users to see how these formats can be georeferenced (?) etc.
<jtandy> ... if this can be clarified, this would be better
<jtandy> phila: I think that this is what we're doing
<jtandy> ... we want to focus in the user end
<jtandy> ... I'm focused on the application developer who wants to access data through a set of APIs
<jtandy> ... they will be using http, apis, etc.
<jtandy> ... this is different to the [expert] users from Jianhui's organisation
<jtandy> ... but this is clearly different from the calibration and [detailed instrumentation] that is executed at the satellite Platform and Ground Station
<jtandy> phila: [in response to GB's question] I see two types of users:
<jtandy> ... i) the regular web developer ... they understand web-stuff but have no need to understand
<jtandy> ... the details of the satellite, the instrument, the processing chain etc.
<jtandy> ... this is hard- I'm asking for the moon on a stick
<jtandy> ... example: Geoscience Australia hackathon ... everyone created applications based on timeseries of pictures where they lived
<jtandy> ... not really adding value
<jtandy> ... given the investment to acquire and curate that data
<jtandy> ... in the linked data world there are many data sources that could be linked to satellite data
<jtandy> ... statistics etc. ...
<jtandy> ... it's the job of a data scientist to do this
<jtandy> ... but actually, we want web developers to be able to do this
<jtandy> ... that's our first user
<jtandy> ... this is the focus, the main priority
<Payam> we can consider 3 categories of users:
<Payam> i) users who are interested in observation and measurement data
<jtandy> jianhui: we have data scientist users
<Payam> ii) the second group are users who are interested in the O&M data linked to other assets on the web and/or to be able to link the data to other assets
<jtandy> jitao: users in my organisation are experts in the data but don't understand the web technologies - I do, but they generally don't
<jtandy> GB: so what kind of people are these?
<Payam> iii) users who are interested in O&M data + linked data+ provenance data and the processes that have been applied to the data in the pipeline
<jtandy> phila: I worry that [if we tightly define] the users, we will close off potential solutions
<jtandy> GB: why are we doing this? so that people can utilise this information creatively for their purpose
<jtandy> ... so we need to put the maximum usability in place whilst minimising the constraints on usage (such as the need for expertise)
<jtandy> phila: looking at Payam's categories, the first and second (?) map to 4-star and 5-star linked data
<chunming> 5-star data: http://www.w3.org/2014/Talks/0123_phila_lata/#(14)
<jtandy> [phila gives an overview of 5-star linked data]
<chunming> 6-star is data with provenance
<jtandy> (also see http://5stardata.info/en/)
<jtandy> Payam: we can look at the details for all these groups-
<jtandy> ... but we can start by looking at the core elements for the first category
<jtandy> ... and then add modules for categories 2 and 3
<jtandy> GB: let's look at the deficit here ... it seems to be expertise
<jtandy> ... a lack of expertise prevents people from using the data
<jtandy> ... if you're an expert, you can probably navigate this anyway
<jtandy> MaikRiechert: [missed]
<jtandy> GB: there's one category of users that are clear ... that's students, the educational value of this data is huge
<MaikRiechert> different data representation could map to different user types, e.g. a user without expert knowledge could use a WMS endpoint, whereas an expert user could use a more direct access to the data, like WCS, GeoSPARQL, or other APIs/formats
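Maik's split between user types can be sketched in code (purely illustrative; the endpoint and layer names are hypothetical): a non-expert consumer only needs a rendered picture, which a standard WMS GetMap URL provides, while an expert would go to WCS, GeoSPARQL or another API for the underlying values.

```python
from urllib.parse import urlencode

def wms_getmap_url(endpoint, layer, bbox, width=256, height=256):
    """Build a WMS 1.3.0 GetMap request URL - simple rendered-image
    access suited to users without expert knowledge. The endpoint
    and layer below are invented for illustration."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "CRS": "EPSG:4326",
        "BBOX": ",".join(str(v) for v in bbox),  # minLat,minLon,maxLat,maxLon in WMS 1.3.0 / EPSG:4326
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return endpoint + "?" + urlencode(params)

url = wms_getmap_url("https://example.org/wms", "land_cover",
                     (51.0, -1.5, 52.0, 0.5))
```

An expert-facing WCS or GeoSPARQL request would carry similar spatial parameters but return the underlying data values rather than a rendered image.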
<jtandy> jianhui: there are many cases where
jianhui: There are cases where people just want to share data with their colleagues. Location data etc and they'll make a map of where they went - social media style
jtandy: So they're creating a
derived product using sat data as input
... one example is burn mapping after wildfires in Greece so
that funds can be allocated to rejuvenation
SimonAgass: That applies in an industrial org as well. Imagine a distributed org - they want to share data without losing control
phila: We're not doing Access Control - LD doesn't mean LOD
jtandy: Summarises - assume data
has been downlinked, decrypted and put in a format ready for
exchange.
... We don't want to try and tell manufacturers to re-engineer
their satellites
MaikRiechert: There are different levels of processing. If they want to expose an earlier level they can
<jtandy> phila: there a standard in this space already- from W3C ...
<jtandy> ... RDF Data Cube, DCAT, PROV-O etc.
<jtandy> Payam: agrees- we can find existing technologies
jtandy: You can use satellites to
look at the surface roughness of the oceans to see how windy it
is
... typically those things are used to provide wind speed and
direction variation over an area
... I'm sure it's possible to create a time series over a
specific point but it's not done in meteorology as a rule
<jtandy> phila: can I ask an embarrassing question- I've looked at lots of satellite data
GB: matching data points is always difficult
<jtandy> phila: is GeoTIFF a coverage
<jtandy> ... or an image
<jtandy> [discussion]
jtandy: Creating an OWL ontology
from the Application Schema that came out of the coverages WG
sounds easy. However... what Peter Baumann has done is to build
something that is bound to the XML structure, rather than the
domain model
... There is a coverages standard ISO 19107 - that could
probably be exported as an OWL ontology
MaikRiechert: Auto generated OWL is typically horrible
jtandy: Kerry calls it
non-OWLy-OWL
... You need an expert to interpret that
... But there's a great description of what a coverage
includes.
... A set of domain values in any number of dimensional spaces,
and you have a set of values that you map on to that. The rest
of the complexity comes from expressing the space and time
aspects
... Can we give coordinates without giving the CRS
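A coverage as jtandy describes it - a set of domain points mapped onto a set of range values, with most of the complexity in the space/time description - can be sketched minimally as follows (the field names are illustrative, not any standard's schema):

```python
from dataclasses import dataclass, field

@dataclass
class Coverage:
    """Minimal coverage sketch: domain points (a lat/lon grid here,
    but any number of dimensions in general) mapped to range values
    such as reflectance. Illustrative only."""
    crs: str        # coordinate reference system, e.g. "EPSG:4326"
    lats: list      # domain axis values
    lons: list
    values: dict = field(default_factory=dict)  # (lat, lon) -> range value

    def value_at(self, lat, lon):
        """Look up the range value at a domain point, if sampled."""
        return self.values.get((lat, lon))

cov = Coverage(crs="EPSG:4326", lats=[50.0, 50.5], lons=[0.0, 0.5])
cov.values[(50.0, 0.0)] = 0.42
```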
DM: Some of that work is happening in other places
jtandy: Some of the really
valuable things to come out of the coverage implementation work
that Peter Baumann has done... rather than having to iterate
over each point, you can define a start point and a step and a
grid description
... those things don't appear in the ISO model
... you could provide those things in the metadata - the
rectified grids
... would that be helpful?
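The rectified-grid idea jtandy describes - publish an origin, a step and a grid shape rather than enumerating every point - can be sketched as follows (names are illustrative):

```python
def grid_coords(origin, step, shape):
    """Recover the explicit coordinates of a rectified grid from
    origin + step + shape, instead of listing every point in the
    coverage metadata. Sketch of the idea only."""
    lat0, lon0 = origin
    dlat, dlon = step
    nrows, ncols = shape
    return [(lat0 + i * dlat, lon0 + j * dlon)
            for i in range(nrows) for j in range(ncols)]

# A 2x3 grid starting at (60N, 10W), stepping 0.5 deg south and east
pts = grid_coords(origin=(60.0, -10.0), step=(-0.5, 0.5), shape=(2, 3))
```

Three numbers per axis replace thousands of explicit coordinates, which is why carrying this in the metadata is attractive.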
GB: I think that's a serious
problem... recoverability
... Say you have a bunch of school children in South Africa and
they come up with something brilliant. And people think that's
so important and so profound and we'd need to go and track back
to understand where their data came from
jtandy: The US National Climate
Assessment - they have the 'line of sight' concept between data
and application
... You may not be too concerned about the provenance info but a
user might be, so that they can determine whether to trust the
product.
GB: Not all users need to know everything but there is need to be able to trace
jtandy: So we've agreed where we
start our standardisation path
... we want to support developers who just want to get the job
done
... There are people who will need to track back the
provenance
... if you just have the data, that sounds limiting. Here's a
GeoTIFF
... without any more background
... that's still a coverage but you don't know where it came
from
... A lot of developers might be in that space
... A farmer just wants to know whether he should plant what
where
DM: But he does need to be able to trust the data
Yang: Is there a legality issue? The data source usually owns the data
<jtandy> [phila cites his next working group: Open License ...]
Yang: Even if you put a time stamp on the data - there might be other satellites that acquire similar data at similar times - can the user choose different sources and different times, so the prov is important
-> http://w3c.github.io/ole/charter.html Open License Expression WG (draft)
Yang: Explores provenance issues
phila: Talks about application specificity
SimonAgass: So is there a group that doesn't want provenance?
jtandy: There's a group that are happy to assume that someone else has looked at the prov
DM: using government data comes with an inherent trust level (justified or otherwise)
<chunming> s/applkication/application
<jtandy> (seems like we have two categories: application developers and data scientists?)
chunming: So there's a
requirement that we want to be able to query this data through
a kind of timestamp - I want to query some portion of data from
a defined time...
... one of my colleagues is trying to develop new algorithms
with time
... this is a new research area in the database domain
jitao: This kind of data has already been included in sat data
jtandy: So if we have 2 types of
user - app developers who make assumptions about data
... and we have people we rudely call data scientists who
really want to be able to check the prov
DM: And one feeds into the other
jtandy: Payam you said that we
could start with the core set of things that the devs want and
then add the modules that the data scientists want
... that seems like a useful approach to me
... People are going to want to get a lump of data out of the
big whole
... by starting from the web developer, we start simple
<jtandy> phila: I'm still struggling with my inexperience here- what does coverage data look like
<jtandy> deniseMcK: also, what does a coverage look like in the context of this group?
<jtandy> phila: so for example, if I have a "coverage" can I access a single point?
<chunming> GEOGLAM initiative
-> http://www.geoglam-crop-monitor.org/ GeoGLAM
<Payam> http://unstats.un.org/unsd/bigdata/
<chunming> UN Big Data for Official Statistics
<chunming> scribe: chunming
Payam: next meeting will be in Beijing
jtandy: we will find the community that will support the coverage data
Simon Agass: not only download data, but also providing some info to describe it.
jtandy: talk about relevant data, what is relevant
Simon Agass: example of population analysis; tag images; [missing]
scribe: addition, search for , damage assessment, need remote locations;
jtandy: so in the example, the
active volcano should be a relevant archived datasets.
... is there introduction of MELODIES project?
Maik Riechert: introduces the MELODIES project
scribe: link data set and provide
meanings.
... don't know how to link with datasets, apis,
... if there's a kind of document on this, it could be useful
<MaikRiechert> land cover example: http://melodiesproject.eu/content/challenges-mapping-land-cover
Denise McKenzie: [missing]
Phila: with the emphasis of this
meeting, try to help with the working group deliverable on coverage linked
data
... one thing we're doing is the best practice document
<jtandy> (Denise talked about using the NASA Space Apps Challenge hackathons as a source of questions that developers might have when working with earth observation data)
Maik Riechert: where is the place to discuss?
phila: working group
... talk about provenance; talked about sharing on the web,
encourage people to do that;
... second stage is about implementation, testing, encourage
people to do impl.
... talked about what is coverage data.
... category, infoset that may be relevant to the image
DM: large metadata vocabulary
on geospatial data
... for coverage, it's a subset (?)
phila: coverage data is - the RDF Data Cube (3 dim), or two dim (table), in RDF, or JSON format. this could be useful.
jtandy: for multi-dim
datasets
... it could be cut into horizontal pieces, and tiles, ...
... use RDF data cube, that related to different access
patterns
phila: talked about image accessible (accessibility), that might be relevant.
jtandy: image can be data
... take the value of a pixel
[morning session closed]
<Payam> phila: there are three areas that we would like to focus:
<Payam> i) metadata ii) access requirement and iii) representation options
<Payam> phila: discusses ways of describing geospatial datasets and existing standards
data linking to/from coverage data is a kind of access requirement.
<Payam> phila: what is a dataset? how do you define a dataset?
phila: how do you describe dataset?
<Payam> discussion on using Dublin core to describe a dataset
<Payam> difference between the dataset and distributions of the same data
<Payam> distinction between the actual concept of the dataset and distributions of the dataset
<Payam> jtandy: sometimes different systems use different identifiers for the same data and it is not obvious that for example two systems refer to the same data
<Payam> jtandy: it is important to note that the same data can appear in different systems and maybe with different identifiers
<Payam> GB: is this a problem that needs to be solved?
<Payam> MaikRiechert: it is sometimes not clear what people refer to as a dataset
<Payam> jtandy: reads the definition of dataset from DCAT
<Payam> phila: we need to decide how we are going to define a "coverage"
<Payam> phila: Access requirements: do we need to be able to access the individual observations?
<Payam> jtandy: granularity questions refer to the issue of whether one can reference individual items (e.g. cells in an image) and have direct access
<Payam> phila: talks about RDF data cube
access requirement: for a third party to reference/access a point/subset/part/whole of the dataset
<Payam> jtandy: talks about the observation in RDF data cube
W3C RDF Data Cube Vocabulary: http://www.w3.org/TR/vocab-data-cube/
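[As a sketch of what a single observation might look like in the Data Cube vocabulary; the dimension and measure properties here are invented for illustration:]

```turtle
@prefix qb:  <http://purl.org/linked-data/cube#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:  <http://example.org/ns#> .

# One cell of a coverage as a qb:Observation (hypothetical properties)
ex:obs-12-34 a qb:Observation ;
    qb:dataSet ex:coverage-dataset ;
    ex:refLat  "51.44"^^xsd:decimal ;    # dimension
    ex:refLon  "-0.94"^^xsd:decimal ;    # dimension
    ex:reflectance "0.27"^^xsd:decimal . # measure
```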
<Payam> Jianhui: RDF data cube provides the actual data not only the metadata
<Payam> Jianhui: if we publish the remote sensing data as RDF we may need to provide new tools to allow users to use these data
<Payam> Jianhui: if we have very large RDF described data, search and query will be slow
<Payam> phila: you don't necessarily store it as RDF... you access and process it in other formats...
<Payam> phila: you need to extract the part you need to transform it to RDF
<Payam> Payam: or take the attributes that you need out of RDF and handle/process them separately (e.g. index the attributes outside RDF)
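[A minimal Python sketch of the "extract the part you need, then express it as RDF" idea just described; the grid contents, URIs and property names are all invented for illustration:]

```python
# Sketch: pull a subset out of a (tiny, in-memory) coverage grid and
# emit RDF Data Cube-style observation triples as N-Triples strings.
# All URIs and property names here are hypothetical.

GRID = [  # pretend 3x3 coverage of reflectance values
    [0.10, 0.12, 0.11],
    [0.20, 0.27, 0.25],
    [0.30, 0.31, 0.29],
]

def subset_to_ntriples(grid, rows, cols, base="http://example.org/obs"):
    """Convert only the selected cells to N-Triples, one observation per cell."""
    triples = []
    for r in rows:
        for c in cols:
            s = f"<{base}/{r}-{c}>"
            triples.append(f"{s} <http://purl.org/linked-data/cube#dataSet> <http://example.org/coverage> .")
            triples.append(f'{s} <http://example.org/ns#row> "{r}" .')
            triples.append(f'{s} <http://example.org/ns#col> "{c}" .')
            triples.append(f'{s} <http://example.org/ns#reflectance> "{grid[r][c]}" .')
    return triples

# Extract just the middle cell rather than converting the whole grid
nt = subset_to_ntriples(GRID, rows=[1], cols=[1])
```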
<Payam> chunming: we should focus on interoperability and sharing data, not the implementation
Linked Data Fragments: http://linkeddatafragments.org
<Payam> Yang: is it possible to extend the current satellite data format to help to understand [interpret] the data easily
<Payam> Yang: NASA remote sensing data: http://rsd.gsfc.nasa.gov/rsd/RemoteSensing.html
<Payam> http://daac.gsfc.nasa.gov
<Payam> http://mirador.gsfc.nasa.gov
<Payam> link to metadata: http://hydro1.sci.gsfc.nasa.gov/data/s4pa//GLDAS_V1/GLDAS_CLM10SUBP_3H/2015/214/GLDAS_CLM10SUBP_3H.A2015214.2100.001.2015253203046.grb.xml
<SimonAgass> http://data.satapps.org/
<Payam> Yang: this machine readable/interpretable data should not replace the human readable (i.e. HTML) representation
<Payam> GB: we should be assuring that data and inferences made from it are accessible to wider groups in the society and not only the scientists who work with that data
<Payam> Simon: we should focus on the application, added value and use-cases that we couldn't do a certain task without the metadata
jtandy: HDF is commonly used for satellite data, but it is not supported by browsers :-)
<Payam> GB: summarises what we have discussed:
phila: Ideally, we want a user agent that reads a URL in a particular pattern and returns a dataset in a textual, structured, semantically/metadata-rich way.
<SimonAgass> Some sample data from Sentinel 1 http://sedas.satapps.org/download-sample-data/
<Payam> GB: What we discussed: purpose, function and ways to deliver
<phila> scribe: phila
<scribe> chair: Jeremy
jtandy: Checks who will be here
tomorrow
... invites those who won't to express anything they wanted to
raise
GB: Planning to circulate notes of today's meeting
SimonAgass: Nothing at the minute to raise.
Payam: Are we going to link what we're discussing here with the SDW WG?
jtandy: Yes
... Something I wanted to do today - was to remind ourselves of
the 7 themes being discussed by the broader group wrt BPs
... So it is necessary to partition the work between this
group and the SDW WG
... and I'd like to achieve that before we finish tomorrow.
Payam: This is a new way of
accessing data from satellites
... so we'll need funding to develop tools. So if we convert
some NASA data, how do we know we've done it right?
jtandy: We need to engage the data publishing and using community and get them to build
Payam: We're talking specifically about satellite data
jtandy: And if we have BPs, how
do we evaluate whether they have met the BPs
... and advise on how to do it
SimonAgass: We need to manage
expectations on this too.
... From my experience, saying we're going to make EO data
available raises expectations
... This is a stage for describing data and making it
available, but there are stages between the satellite and this
stage and after it has been published
jtandy: +1
... we're not dealing with the data processing. We're only
looking at the data publishing part of the story
... So a task for tomorrow, we need to be clear on what our
scope is
phila: I think we're close to having that definition based on what Simon just said
DM: The Interface...
jtandy: Maybe think of it as a contract
SimonAgass: It does imply a level of handshake agreement
jtandy: So a searching question
...
... When we're doing action planning for tomorrow, we need to
know how much time each of us can commit
... In order for us to be successful, some actual work has to
be done. We need to cut our cloth accordingly
SimonAgass: there are activities
that fall into this. One in particular on integrating LD
... Broker Technology 4 EO
jtandy: That gives us a pathway towards future funding?
SimonAgass: There's a potential
alignment that allows me to spend some time on this as it's
about LD and EO
... but not a massive amount of time
GB: I think a number of testbeds could be highly informative at this stage, even if they're simple
Payam: I think Yang can help.
Everyone in my dept is funded by a project so I can't pull
people off those.
... I can perhaps do a little
jtandy: And you're editor of the
SDW WG's BP doc
... Any overlap with any of those projects?
Payam: We do semantic models for smart cities etc. There are validation issues etc. I can ask someone there to join the meetings/check something.
Yang: I think if we could know a
little bit better what sort of commitment in terms of time
you're looking for
... In general we're very supportive of this work.
... We were chatting earlier - this forum is so useful - I
managed to come up with a project - I'll invite people to
Surrey to look at proposals to funding bodies.
... When we're working on those proposals maybe we can do some
linking.
jtandy: In terms of that future proposal... what's the time scale from now until some funding arrives?
Yang: After the end of this
project. Any bidding process will create new ideas in its own
right.
... Lots of potential from this WG
Payam: We can contribute to use cases of course
jtandy: So content review,
editorial, discussion, but not investigative work
... It's important that we don't over commit.
SimonH: It would be good to pull out examples. It will be most useful for me to comment on the documents.
jtandy: Are you content that we're working in a way that's compatible with CODATA
SH: I think I'd mention the
connections I've raised. GODAN, GeoGlam
... CODATA does a lot of work on policy around EO. We are very
active in that. Just submitted a white paper on the benefits of
data sharing.
... We'd like to contribute more to the tech standards
level
... We're also very active in training
... So this initiative provides a useful opportunity to affect
the development process.
DM: This is a topic to bring up with Barbara Ryan (?) at Eye on Earth next week.
SH: I'll be there as well
DM: OGC has 3 staff going to Geo
in November
... The plenary could be a useful place to validate any
requirements we have by then.
... I won't be at GEO but Mark Reichart will (OGC CEO)
jtandy: Before the break, Phil was scribing a list. I have a similar list.
jtandy: First on the list I think
we need to talk about is identifiers for the dataset and the
distribution
... is that a useful thing that we can talk about for coverage
data?
Yes
jtandy: Also when talking about
IDs, we need to be able to refer to slices, or subsets or
individual cells
... Can we provide patterns for slices and subsets?
... What vocabulary would be used to link a subset to its
parent?
... VoID does that already.
... We need to be able to describe the relationship between a
dataset and its parts
-> http://www.w3.org/TR/void/ VoID
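[A sketch of how VoID's subset relationship could link a coverage dataset to one of its slices; the URIs are hypothetical:]

```turtle
@prefix void: <http://rdfs.org/ns/void#> .

# A coverage dataset and one time-slice of it, linked via void:subset
<http://example.org/dataset/coverage-1> a void:Dataset ;
    void:subset <http://example.org/dataset/coverage-1/slice/2015-09-29> .

<http://example.org/dataset/coverage-1/slice/2015-09-29> a void:Dataset .
```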
jtandy: We'll need to see how
things like VoID map onto our concerns
... next on my list - how do we discover the dataset?
... Before you work with the data you have to find it so it
needs to be discoverable, i.e. have discovery metadata
... There's the ISO model
... and GeoDCAT-AP which matches DCAT and ISO 19115
... Earlier we talked about discovering datasets as a whole. We
talked about granularity
... Irrespective of the dataset, every time you produce a
coverage, there's metadata about the structure, the observed
properties, the physical properties that the coverage provides.
ISO 19123 provides a way of doing that, as does RDF Data
Cube
... So we can describe how it works inside.
... When we want to share data - if it needs to be downloaded
before you can access the metadata - that's not going to work.
So the metadata needs to be usable by a user agent, preferably
a browser that doesn't rely on a plugin.
... Logically that probably means multiple formats including
RDF
... Given that we need this metadata that is usable, how do we
make it parsable by standard search engines?
... When you publish your data, you want people to find it - it
would be easier if they can find it in their usual search
engine
... We need to be able to say 'this is how you publish your
discovery metadata'
... The site Yang pointed us to included a lot of human
readable content that you can browse through - we need to keep
that and provide a machine readable path
... So our metadata should support human and machine
browsing
Yang: We want search engines to
be able to find our data, yes. That sort of result can be
highlighted in future documentation.
... I don't know to what extent that can be defined in this
project.
<chunming> phila: Data on the Web Best Practices (almost done by W3C) - general stuff on sharing datasets on the Web
<chunming> ... we are focusing on coverage data
-> http://w3c.github.io/dwbp/bp.html Data on the Web Best Practices
phila: That covers general stuff about publishing data on the Web.
jianhui: I think it's very
ambitious to publish scientific/research data to the Web - but
maybe we should talk to the scientific users. Do they think
it's a good way or not? I'm not sure.
... Maybe we should make some demonstration or testbed to show the
scientists - they're the end users.
... I think that's important
... Scientists might have different views on this.
... Is this idea of publishing scientific data on the Web
right or not? Raw data - if the resolution is high - might be
>1 GB. If we publish it to the Web, how can we get this
data?
... Can we publish all that data? Do the scientists want it or
not? Secondly - what about the infrastructure - can it be
supported or not?
phila: refers to http://philarcher.org/diary/2015/50shadesofno/
GB: Years ago, it wasn't
difficult to publish your data in the paper you wrote. People
could see the data and decide whether you were right or
wrong
... The challenge now is that we base our science on enormous
amounts of data. It's a real challenge to make it open enough
for someone else to go through and see if you're right.
jtandy: But I do think it's
important that we recognise that our infrastructure puts some
restrictions on us.
... In our work, we see that our data will be bigger by an
order of magnitude in 10 years.
... So we're talking about uploading software to the data
... In some cases, you might be able to download the whole
dataset. But it might be that it's so large that, today, you
have to send an e-mail to a colleague and ask them to courier it
to you.
... Or it might be possible to offer an API
... So there are many paths - but the first step is to publish
the metadata
GB: I agree that it's not easy,
and maybe impossible
... But if I publish a paper revealing the secret of life, it's
worth accessing the PetaBytes of data
chunming: What we want to do is
not just copy how scientists share their data now, we want to
do it better
... making better links
... refer to a small portion of their data etc.
... Of course, if we just implement using today's tech, we'd
face problems with scalability
SimonAgass: We have to assume that tech will improve - so we don't need to wait until the scalable tech is available.
jtandy: And if we can build demonstrators that prove what we're trying to achieve, even accepting the constraints of bandwidth etc.
GB: CERN is a good example. They
don't make all their data available - there's nowhere like it.
But they do have a large internal community that check each
other.
... That's a sort of compromise as the amount of data they have
is so large
jtandy: But they may still use the same technology within their computers.
Yang: If you need more
justification.. one fact - any research council funded project
now will require that the data has to be publicly
accessible.
... So IT departments are adding DOIs for datasets as well as
papers.
... I don't think there is any leeway for not publishing your
data.
... Anything published after 2016 will be under this rule
jtandy: I think that was one of
the requirements from the Royal Society's work on Science as an
Open Enterprise.
... As people are now applying for funding that will deliver
after 2016, they're already including the costs of publishing
data persistently.
SH: It's a requirement in Horizon 2020
jtandy: So when we're talking
about publishing the data itself... we need to ask ourselves is
how do we encode the data itself.
... is it feasible or sensible to encode the whole lot in
RDF
Payam: We came up with a set of
smart city datasets for testing against
... we annotated the data and put it on the Web.
... You give people the data and the tools. And then you see
what the problems are
jtandy: I think that's a good idea. There are mid scale datasets - ones big enough to be problematic but not obviously too huge
<Payam> smart city dataset example:http://iot.ee.surrey.ac.uk:8080 (data and metadata)
<Payam> and tools
Yang: This is not just going to
benefit the current audience. Democratisation is not just
wanted in the space business. People are working on reducing
the amount of data needed to be downlinked etc.
... Lots of effort looking into that.
... This effort can merge with others in the space
domain
... We may be being slightly optimistic about solving
everything
... but as space people we want people to be able to access
space assets from mobile.
jtandy: If we might merge with others, do we know who that might be now?
<Payam> CCSDS: http://public.ccsds.org/default.aspx (standards from space domain)
Yang: CCSDS might be relevant http://public.ccsds.org/default.aspx
SH: There's a meeting coming up
with them and CODATA
... CCSDS is more on the data management, long term
management
jtandy: Might be worth letting them know about the work we're doing but at this point we may not expect them to have something for us
<scribe> ACTION: We just identified a stakeholder - we ought to identify other stakeholders that we should contact [recorded in http://www.w3.org/2015/09/29-ceo-ld-minutes.html#action01]
Payam: I see that CCSDS has a satellite data model
Yang: The WG I sit in within
CCSDS is specifically about intelligent systems for
spacecraft.
... You could propose to start something with them
jtandy: We'll create a workplan
on what we're going to do
... I suggest Simon H takes point on that potential
relationship
... We're talking about how to encode the data. Obviously there
are choices - NetCDF, HDF5, etc. Many formats we can use and I
think we can provide guidance on when it is best to use each
type.
... In particular, it's an interesting question to ask what
types of data encoding (not metadata) might work in a
browser
... can it consume it and display it with Canvas or
WebGL
... We should understand whether there are times when data can work
nicely with the browser etc.
... We have talked a lot about how to query the data and
interact with it.
... One of the things we should be looking at - what functions
should an API offer. Query by geoposition, time (vital for
smart cities), Simon talked about the observed quantity
... These are all examples of starting points for people to
interact with EO data.
... We talked about Strabon, Linked Data API
... How difficult SPARQL is to work with etc.
... Lots of work we could do to make the conversation on APIs
actionable.
... We talked about annotations on datasets
... The CHARMe project should help there
... being able to refer to bits of datasets or the whole
dataset
... How do I identify subsets, slices etc.
... To support users/non-experts - we need to support
provenance to describe the processing chain or how my subset
has been extracted.
... We'll certainly want to describe the provenance in terms of
the platform, source etc. - which maps to the Semantic Sensor
Network.
... Where did this come from (processing chain, platform)?
... And for the non-expert, how can a data publisher make an
assessment of quality
... Phil pointed me to http://w3c.github.io/dwbp/vocab-dqg.html
... Are there significant gaps?
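[A sketch of how the processing-chain provenance described above could be stated with PROV-O; all URIs are hypothetical:]

```turtle
@prefix prov: <http://www.w3.org/ns/prov#> .

# A subset, the dataset it came from, and the extraction activity
<http://example.org/dataset/coverage-1/slice/42> a prov:Entity ;
    prov:wasDerivedFrom <http://example.org/dataset/coverage-1> ;
    prov:wasGeneratedBy <http://example.org/activity/subset-extraction-42> .

<http://example.org/activity/subset-extraction-42> a prov:Activity ;
    prov:used <http://example.org/dataset/coverage-1> .
```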
Payam: Part of this data is live so that fact changes the way you handle it.
GB: What I suggest we do is to start writing our report in note form tomorrow morning.
phila: There are things that are relevant to us that are being covered elsewhere (BPs, SSN etc)
GB: We can identify the overlaps,
the other one is where we can best use our efforts in some sort
of test bed to be developed, rather than in an ad hoc way
... So who else do we want to inform about this.
<chunming> phila: as jianhui said, coverage data could be huge, need way to access a portion of data
phila: Makes general points about scalability