See also: IRC log
<phila> rrsagent, make logs public
<phila> Simon Agass – Satellite Applications Catapult. Activating different sorts of data, making it available. Data innovation. Denise McKenzie, OGC. Introduces self/OGC. Work being done in the SDWWG – choosing standards. Geoffrey – geology, talked about work on glaciers etc. Mapping and coverage data is important (LIDAR etc). Strategic objectives for CODATA: • data science – integration. Classic techniques don't work • we need to be able to make data a[CUT]
<phila> Jitao: Works at Institute of Remote Sensing/CAS
<phila> ... Has largest amount of sat data in China, trying to open it to the world
<phila> Maik: From University of Reading. Work with Jon Blower on MELODIES project
<phila> MELODIES project
<phila> Maik: Talked about the project. Experimenting with different APIs etc.
<phila> ... quite new to the area. Joined Uni Reading recently, was training at ESA
<phila> ... Come from a Web Developer point of view. I just want to use data easily
<phila> Jianhui: From CNIC/CAS
<phila> ... Operate infrastructure in CAS, providing service to other scientists
<phila> ... inc high speed network
<phila> ...developing infrastructure to integrate research data within CAS which has > 100 institutes
<phila> ... want to provide data services to all the scientists.
<phila> ... For sat data, we deliver service systems, connected to 350TB of satellite data that can be downloaded freely
<phila> ... Open sat data - the big change is how to make it easier to find and access the data and get info from this sat data
<phila> Jianhui: So we want to work out how to put sat data on the open Web
<phila> ... Also a member of CODATA China
<phila> scribe: phila
jianhui: Work with Geoffrey and
Simon H on this
... This is a good opportunity for us to help make sat data
more open more easily
... and exchange data between UK and China
YangGao: Surrey Space
... want to automate real time processing of satellite
data
... not familiar with W3C and OGC - but encouraged by what I've
heard
... ambition of this group coincides with our work
... want to understand what we can contribute to
standardising
... want to identify gaps, methodologies etc. Encouraging
community to follow methods
... SSTL has long standing relationship with many Chinese
organisations
... First remote sensing satellite acquired in China came from
Surrey Space (SSTL)
... helpful in disaster monitoring, urban planning etc.
... very motivated to work with Chinese colleagues. Want to
share lessons learnt, etc.
Payam: From University of
Surrey
... work on IoT, data interop
... member of the SDW WG
phila: Can you, Geoffrey, say something about the UK-China angle, what are the funders asking for
Geoffrey: The FO is looking for
collaboration between UK and China that will help lead to
innovation
... I said that all you can do is stimulate the first part of
that. It's up to others to begin the companies that earn the
billions
<jtandy> (FO = UK Foreign Office)
Geoffrey: The widespread use of
data that is freely open to all can be the basis of a lot of
economic activity
... What we need to do in the short term is to identify the
issue we're trying to address, look for the way forward. In 6
months time we should be able to set out in precise terms the
work that needs to be done then to lead to future
development
<MaikRiechert> First correction: I'm not from the University of Berlin, but University of Reading :)
<chunming> for chinese: http://www.w3.org/2015/ceo-ld/Overview.html.zh-hans
<chunming> Wiki page: https://www.w3.org/2015/ceo-ld/wiki/Edinburgh
<chunming> W3C official website: http://www.w3.org/account/request
https://www.w3.org/accounts/request
<chunming> Wiki page for London Kick-off meeting: https://www.w3.org/2015/ceo-ld/wiki/London_Kick_Off_meeting
<jtandy> D_McKenzie: can participants in this group be from countries other than UK and China?
<jtandy> GeoffreyBolton: that is for our group to determine- we need to outline how _we_ want to work [to reach our goals]
<jtandy> phila: [talks about the logistics of the group]
<jtandy> phila: there are two face-to-face meetings
<jtandy> ... first Sapporo, for W3C TPAC
<chunming> W3C TPAC 2015: http://www.w3.org/2015/10/TPAC/Overview.html
<jtandy> ... this is not formally part of the project, it is where the W3C/OGC Spatial Data on the Web WG will meet
<jtandy> phila: the next F2F meeting will be in Beihang ... to be organised by chunming
<chunming> TPAC 2015 (in Chinese): http://www.chinaw3c.org/tpac2015-overview.html
<jtandy> phila: also there are the weekly teleconf calls for Spatial Data on the Web WG (SDW)
<jtandy> ... we'll be holding the next one tomorrow afternoon in this room (@ Royal Society no less), 2PM UK-time (UTC+1)
jtandy: The Use Cases eds worked out which use cases apply to Coverages etc.
http://w3c.github.io/sdw/UseCases/SDWUseCasesAndRequirements.html#arCoverageInLinkedData
GB: Use cases seem very specific. Why? Just written by people in the group?
jtandy: The UCs are written by the group. We wanted people to write UCs that are specific so that you can validate against it - you can find test data
<MaikRiechert> (link doesn't work in Firefox, says "undefined" in all headers)
jtandy: In the best practices doc, for example, we're looking at common themes for exposing GIS on the Web. We've not looked much at sat data as we knew this group was coming
<MaikRiechert> using Chrome now, works ;)
<jtandy> Geoffrey asks about the common themes; see here: https://www.w3.org/2015/spatial/wiki/BP_Consolidated_Narratives
<Payam> Jeremy describes the Spatial Data on the Web Working Group's use case and best practices
<Payam> there is a lack of satellite data related use-cases
<Payam> one of the key goals of the the Spatial Data on the Web Working Group is to encourage people to link their data to other assets on the web
<Payam> jtandy: one of the key goals of the the Spatial Data on the Web Working Group is to encourage people to link their data to other assets on the web
SimonAgass: I think linking data
is even more important for sat data
... was disrupted by Google Maps etc.
... prices of sat data dropping. Sentinel, SkyBox etc.
... bigger players are seeing a threat in some way to their
curated data model
... EO data has not been connected through the Web. It's been
in isolation. best you might get is in the metadata
... putting the data into context is where the value comes.
Mashing it with other data on the Web is where the value
comes. For the EO industry to move on, integrating it into the
Web of data will greatly increase the value
jtandy: The MELODIES project is
looking at that. Pipelines etc.
... (Processing pipelines)
SimonAgass: We did a project
earlier this year with Chile on disaster management. They had
lots of sat data, but not the resources to understand it and
use it
... we built some infrastructure that used Linked Data, NLP
etc. to bring in more info
GB: It strikes me that linking
sat data is crucial - but what sort of other data. You want
spectral range to be aware of others
... not just this image but have you looked at A, B, C
... geological maps of countries might be a general cover - but
that's not global. it's regional
... Thirdly point data - might be very ad hoc, incomplete
... I think we need to think about what sort of data we want to
link to
jtandy: +1 we can link from anything to anything but it rapidly dissolves into meaninglessness
GB: Typically on the Web, you ask
'what's at this location? - but we might want to ask what are
the locations that have these same properties?
... You can't do that with ad hoc observations
<Simon_Hodson> Important to be able to ask the question: at what locations is this property the case?
jtandy: I think we should come back to that conversation. It's a key topic
<Simon_Hodson> What needs to be done to the satellite EO data in order to be able to ask such questions?
jtandy: Maybe SimonAgass could give us an insight into what you had to do with the sat data for Chile to integrate it. I guess it was mostly manual?
SimonAgass: Mostly
jtandy: You have to be an expert
- you can't offload the set up time to a software agent
... So theme 1 was linking data
<Simon_Hodson> Themes for Use Cases
<Simon_Hodson> Theme One: linked data.
<Simon_Hodson> Theme Two: publishing data with clear semantics
jtandy: Talks through more of https://www.w3.org/2015/spatial/wiki/BP_Consolidated_Narratives
<Payam> jtandy: a part of the work at the Spatial Data on the Web Working Group is to formalise the vocabulary that can describe geospatial data
GB: It seems to me that you need
to compose your query well for discovery
... Would it be helpful to give advice on how to compose the
right query?
... Geo probs at the moment - you tend to go to Google
Maps
... what is the starting point, how do you ask the right
question?
jtandy: Implementers tend to
provide portals that encode a lot of the hidden knowledge
... imagine an air quality dataset. Data is being collected in
a number of places that might be listed in a gazetteer
... but if I use that gazetteer then there are more links/routes
to discoverability
GB: I think I'm getting back - do
we need to have experts to use this data
... So the question is, how much expertise do we expect people
to have in order to be able to access the information.
jtandy: Colleagues talk of the danger of giving access to data to people who don't understand it.
GB: A little knowledge is a dangerous thing
jtandy: You shouldn't need to be an expert to ask the question although you may need some expertise to understand the answer
D_McKenzie: And there's a danger
with locations since you can gather lots of info about a
location that can lead to privacy breaches
... Maybe there are some datasets that you shouldn't be able to
link
GB: This has resonance in code breaking
Payam: If you have multiple
providers of data you're looking for and then finding the data
within that
... There's an indexing issue there
... And then the interesting part, once you have the data, you
don't have everything in one document, you have links to find
more
Jianhui: We have a platform that
provides services to the scientists so we know about the
requirements that they have
... Sometimes you don't just want a link from one image to
another, you want a link to other info.
... They want to link at different scales or the whole
picture
... I think the use case is more complicated
... So I think we need to invite some users to tell us what
they're after
jtandy: Drawing input from the user community is something that the sat applications catapult has been doing for the last 2 years - so maybe SimonAgass can talk about that
SimonAgass: Users, yes, and the value adders. Look at some of the challenges we want to overcome.
jtandy: Before we create any new tech, we should look at what problem it should solve
DM: EO has many facets
jtandy: The broader SDW is about
coverage data. This group here has a series of expertise in
space based RS
... there are use cases in the SDW that mention LIDAR, Sonar
etc
... there are also in situ sensing questions. It's still EO
GB: I suggest we concentrate on
satellite data but then ask which principles apply to other
areas. Are there exceptions when talking about sat data
... There was a gov report on what sensor data was available,
ice sheets etc.
jtandy: There are also coverage
datasets that are not observations. A coverage varies with
space or time.
... The Met Office has coverages that are time series of
measurements at the same points for >150 years
GB: There are many datasets that have both spatial and temporal slices
jtandy: When we create these 4 D datasets, it's quite hard to work with them as they've been sliced and stored in one way.
GB: Sampling is a fundamental
issue. Most of the maps we generate are interpolated based on
point data
... do we want to think about sampling distances and
correlatability
jtandy: That's an example of the context we need to interpret the EO data
Payam: The Semantic Sensor Network is an upper-level ontology, but you can put in specifics.
jtandy: There has been a proposal
for simplification
... Talks about the CHARME project
... people find that they want to annotate sections of data
they've found
-> http://charme.org.uk/ CHARME project
Payam: SSN looked at the sensor devices - but if you have streaming data, you don't need to add the semantics to each item, just the stream
jtandy: The last theme is
obvious... which is how to express the geographic and temporal
elements
... how do we write that down
... there are multiple mechanisms for doing that
... we want to encourage convergence on one
... Recaps on
https://www.w3.org/2015/spatial/wiki/BP_Consolidated_Narratives
... Nothing specific there about coverage data - we can create
it if we need to but I think what we've said so far can fit
into those 7
Jianhui: You mentioned publishing data with clear semantics. RS data can be understood by machine?
jtandy: In the US there are a
number of orgs creating hydrology data (Aus has 500 such
orgs)
... Until recently, there was no governance on how that info was
produced. Everyone had their own data model that was very hard
to reconcile
DM: There was a trading model for water in Aus
jtandy: We need to be able to do the mapping automatically, do the crosswalks etc.
DM: Google Translate for data
MaikRiechert: In MELODIES - land cover categories are different in different countries and it's hard to map between them
jtandy: In Germany you might have
coniferous forest, elsewhere it's just forest
... That's a relatively simple mapping
... but it can be more complex
... we want to be able to offload a lot of the work to
machines
MaikRiechert: Yes, but you need
to experiment with different mappings. You can't have a
universal mapping
... Maybe there can be default mappings from some
organisations
jtandy: Once you've experimented
with mappings, you might want to publish 'these are the
mappings that worked for me' so we should make those durable
and publishable
... Which brings us to questions we've not touched on - how 3rd
parties can add or subtract value
... Annotations is one part of it.
<chunming> MELODIES project: http://www.melodiesproject.eu
jtandy: trust and provenance
Simon_Hodson: ... Can we think
about linking those attributes?
... So we need to identify those attributes and see how they'd
be used to describe some data
jtandy: I think that's a workable
approach
... I haven't come with any pre-baked ideas
... You're saying (SH) that these are things that scientists
typically ask in a given field
Simon_Hodson: it seems that what we're interested in the kind of attributes we need to think about
jtandy: Physical quantities?
Simon_Hodson: Yes, or an attribute that there might be pollution from a sensor or whatever
jtandy: The contextual info is
what you use to assess whether the data is any use for
you
... So starting with the attributes is common
DM: Sensor Enablement is a
framework for how you handle the geospatial aspects of sensors.
So it's a management toolkit providing a level of
consistency
... A way to see what you're going to get from that sensor
network. But you can put in info that allows you to make
further decisions
Payam: I think SWE has service
descriptions too
... so you can see the format so you know how to query
DM: Standards help here, you state how you've set up your network so people know what to expect.
<jtandy> (SWE = Sensor Web Enablement)
<chunming> minutes: http://www.w3.org/2015/09/29-ceo-ld-minutes.html
<jtandy> phila: resumes the meeting after coffee
<jtandy> ... we need to set out what we would like to achieve and what we can achieve
<jtandy> ... and what is not in scope
<jtandy> ... then ...
<jtandy> ... all around this room know about various standards- we can determine the gaps
<jtandy> ... we also need to respect the goals of this working group- based on the charter of the SDW WG
<jtandy> yang: (talking to phil over the break)
<jtandy> ... the key challenges that we need to highlight are
<jtandy> ... i) the diversity of satellite data payload
<jtandy> ... optical cameras, SAR, LIDAR, multi-spectral cameras
<jtandy> ... there is a diversity of instrument
<jtandy> ... which means that there is a diversity in different formats
<jtandy> ... the satellite operators [tend to] encrypt and compress the data for downlink
<jtandy> yang: then once the "data product" is received [at the ground station] this compression and encryption is reversed
<jtandy> ... the data is then turned into products that can be discovered and used
<jtandy> ... [...]
<jtandy> .... there is a diversity of approaches for creating these products (?) based on the heritage and background of different System suppliers
<jtandy> yang: so ... we want standards- but standards close to the data product end of this process
<jtandy> ... nevertheless, we need to consider two phases
<jtandy> ... i) generic processing that can be standardised for _any_ application
<jtandy> ... ii) [doing stuff] for specific domains, such as agriculture etc.
<jtandy> ... the second phase is difficult to do
<jtandy> ... but there is likely to be good support from the scientific community to do this
<jtandy> ... but, like said this morning, we need to engage with the end users
<jtandy> ... we also need to set up examples that we can refer to
<jtandy> phila: here's my use case ...
<jtandy> ... simon has a bunch of developers creating applications
<jtandy> ... they want to use satellite data- from Jianhui's data centre and SENTINEL (and other sources)
<jtandy> ... the point is that the application developer should be able to treat each of those sources in the same way
<jtandy> ... each 'product' will have the same structure
<jtandy> ... [phila points out some specific parts to consider]
<jtandy> phila: we need to standardise on the 'product' - the coverage
<jtandy> ... then, in subsequent phases of this working group's life, there is implementation to consider
<jtandy> ... we will seek money / funding for a phase 2 where
<jtandy> ... we can engage with the satellite manufacturers / operators to implement these standards
<jtandy> ... so that it easy for application developers to work with multiple sources
<jtandy> yang: there is a starting point for where we can standardise- [after the encryption and compression is removed]
<Payam> +1 chunming
<jtandy> ... it would be difficult to engage with satellite manufacturers earlier than this
<jtandy> yang: after this we can look at data formats
<jtandy> ... at this point we can work with users to see how these formats can be georeferenced (?) etc.
<jtandy> ... if this can be clarified, this would be better
<jtandy> phila: I think that this is what we're doing
<jtandy> ... we want to focus in the user end
<jtandy> ... I'm focused on the application developer who wants to access data through a set of APIs
<jtandy> ... they will be using http, apis, etc.
<jtandy> ... this is different to the [expert] users from Jianhui's organisation
<jtandy> ... but this is clearly different from the calibration and [detailed instrumentation] that is executed at the satellite Platform and Ground Station
<jtandy> phila: [in response to GB's question] I see two types of users:
<jtandy> ... i) the regular web developer ... they understand web-stuff but have no need to understand
<jtandy> ... the details of the satellite, the instrument, the processing chain etc.
<jtandy> ... this is hard- I'm asking for the moon on a stick
<jtandy> ... example: Geoscience Australia hackathon ... everyone created applications based on timeseries of pictures where they lived
<jtandy> ... not really adding value
<jtandy> ... given the investment to acquire and curate that data
<jtandy> ... in the linked data world there are many data sources that could be linked to satellite data
<jtandy> ... statistics etc. ...
<jtandy> ... it's the job of a data scientist to do this
<jtandy> ... but actually, we want web developers to be able to do this
<jtandy> ... that's our first user
<jtandy> ... this is the focus, the main priority
<Payam> we can consider 3 categories of users:
<Payam> i) users who are interested in observation and measurement data
<jtandy> jianhui: we have data scientist users
<Payam> ii) the second group are users who are interested in the O&M data linked to other assets on the web and/or to be able to link the data to other assets
<jtandy> jitao: users in my organisation are experts in the data but don't understand the web technologies - I do, but they generally don't
<jtandy> GB: so what kind of people are these?
<Payam> iii) users who are interested in O&M data + linked data+ provenance data and the processes that have been applied to the data in the pipeline
<jtandy> phila: I worry that [if we tightly define] the users, we will close off potential solutions
<jtandy> GB: why are we doing this? so that people can utilise this information creatively for their purpose
<jtandy> ... so we need to put the maximum usability in place whilst minimising the constraints on usage (such as the need for expertise)
<jtandy> phila: looking at Payam's categories, the first and second (?) map to 4-star and 5-star linked data
<chunming> 5-star data: http://www.w3.org/2014/Talks/0123_phila_lata/#(14)
<jtandy> [phila gives an overview of 5-star linked data]
<chunming> 6-star is data with provenance
<jtandy> (also see http://5stardata.info/en/)
<jtandy> Payam: we can look at the details for all these groups-
<jtandy> ... but we can start by looking at the core elements for the first category
<jtandy> ... and then add modules for categories 2 and 3
<jtandy> GB: let's look at the deficit here ... it seems to be expertise
<jtandy> ... a lack of expertise prevents people from using the data
<jtandy> ... if you're an expert, you can probably navigate this anyway
<jtandy> MaikRiechert: [missed]
<jtandy> GB: there's one category of users that are clear ... that's students, the educational value of this data is huge
<MaikRiechert> different data representation could map to different user types, e.g. a user without expert knowledge could use a WMS endpoint, whereas an expert user could use a more direct access to the data, like WCS, GeoSPARQL, or other APIs/formats
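Maik's split between user types can be sketched in code (purely illustrative; the endpoint and layer names are hypothetical): a non-expert consumer only needs a rendered picture, which a standard WMS GetMap URL provides, while an expert would go to WCS, GeoSPARQL or another API for the underlying values.

```python
from urllib.parse import urlencode

def wms_getmap_url(endpoint, layer, bbox, width=256, height=256):
    """Build a WMS 1.3.0 GetMap request URL - simple rendered-image
    access suited to users without expert knowledge. The endpoint
    and layer below are invented for illustration."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "CRS": "EPSG:4326",
        "BBOX": ",".join(str(v) for v in bbox),  # minLat,minLon,maxLat,maxLon in WMS 1.3.0 / EPSG:4326
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return endpoint + "?" + urlencode(params)

url = wms_getmap_url("https://example.org/wms", "land_cover",
                     (51.0, -1.5, 52.0, 0.5))
```

An expert-facing WCS or GeoSPARQL request would carry similar spatial parameters but return the underlying data values rather than a rendered image.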
<jtandy> jianhui: there are many cases where
jianhui: There are cases where people just want to share data with their colleagues. Location data etc and they'll make a map of where they went - social media style
jtandy: So they're creating a
derived product using sat data as input
... one example is burn mapping after wildfires in Greece so
that funds can be allocated to rejuvenation
SimonAgass: That applies in an industrial org as well. Imagine a distributed org - they want to share data without losing control
phila: We're not doing Access Control - LD doesn't mean LOD
jtandy: Summarises - assume data
has been downlinked, decrypted and put in a format ready for
exchange.
... We don't want to try and tell manufacturers to re-engineer
their satellites
MaikRiechert: There are different levels of processing. If they want to expose an earlier level they can
<jtandy> phila: there a standard in this space already- from W3C ...
<jtandy> ... RDF Data Cube, DCAT, PROV-O etc.
<jtandy> Payam: agrees- we can find existing technologies
jtandy: You can use satellites to
look at the surface roughness of the oceans to see how windy it
is
... typically those things are used to provide wind speed and
direction variation over an area
... I'm sure it's possible to create a time series over a
specific point but it's not done in meteorology as a rule
<jtandy> phila: can I ask an embarrassing question- I've looked at lots of satellite data
GB: matching data points is always difficult
<jtandy> phila: is GeoTIFF a coverage
<jtandy> ... or an image
<jtandy> [discussion]
jtandy: Creating an OWL ontology
from the Application Schema that came out of the coverages WG
sounds easy. However... what Peter Baumann has done is to build
something that is bound to the XML structure, rather than the
domain model
... There is a coverages standard ISO 19107 - that could
probably be exported as an OWL ontology
MaikRiechert: Auto generated OWL is typically horrible
jtandy: Kerry calls it
non-OWLy-OWL
... You need an expert to interpret that
... But there's a great description of what a coverage
includes.
... A set of domain values in any number of dimensional spaces,
and you have a set of values that you map on to that. The rest
of the complexity comes from expressing the space and time
aspects
... Can we give coordinates without giving the CRS
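A coverage as jtandy describes it - a set of domain points mapped onto a set of range values, with most of the complexity in the space/time description - can be sketched minimally as follows (the field names are illustrative, not any standard's schema):

```python
from dataclasses import dataclass, field

@dataclass
class Coverage:
    """Minimal coverage sketch: domain points (a lat/lon grid here,
    but any number of dimensions in general) mapped to range values
    such as reflectance. Illustrative only."""
    crs: str        # coordinate reference system, e.g. "EPSG:4326"
    lats: list      # domain axis values
    lons: list
    values: dict = field(default_factory=dict)  # (lat, lon) -> range value

    def value_at(self, lat, lon):
        """Look up the range value at a domain point, if sampled."""
        return self.values.get((lat, lon))

cov = Coverage(crs="EPSG:4326", lats=[50.0, 50.5], lons=[0.0, 0.5])
cov.values[(50.0, 0.0)] = 0.42
```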
DM: Some of that work is happening in other places
jtandy: Some of the really
valuable things to come out of the coverage implementation work
that Peter Baumann has done... rather than having to iterate
over each point, you can define a start point and a step and a
grid description
... those things don't appear in the ISO model
... you could provide those things in the metadata - the
rectified grids
... would that be helpful?
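The rectified-grid idea jtandy describes - publish an origin, a step and a grid shape rather than enumerating every point - can be sketched as follows (names are illustrative):

```python
def grid_coords(origin, step, shape):
    """Recover the explicit coordinates of a rectified grid from
    origin + step + shape, instead of listing every point in the
    coverage metadata. Sketch of the idea only."""
    lat0, lon0 = origin
    dlat, dlon = step
    nrows, ncols = shape
    return [(lat0 + i * dlat, lon0 + j * dlon)
            for i in range(nrows) for j in range(ncols)]

# A 2x3 grid starting at (60N, 10W), stepping 0.5 deg south and east
pts = grid_coords(origin=(60.0, -10.0), step=(-0.5, 0.5), shape=(2, 3))
```

Three numbers per axis replace thousands of explicit coordinates, which is why carrying this in the metadata is attractive.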
GB: I think that's a serious
problem... recoverability
... Say you have a bunch of school children in South Africa and
they come up with something brilliant. And people think that's
so important and so profound and we'd need to go and track back
to understand where their data came from
jtandy: The US National Climate
Assessment - they have the 'line of sight' concept between data
and application
... You may not be too concerned about the provenance info but a
user might be, so that they can determine whether to trust the
product.
GB: Not all users need to know everything but there is need to be able to trace
jtandy: So we've agreed where we
start our standardisation path
... we want to support developers who just want to get the job
done
... There are people who will need to track back the
provenance
... if you just have the data, that sounds limiting. Here's a
GeoTIFF
... without any more background
... that's still a coverage but you don't know where it came
from
... A lot of developers might be in that space
... A farmer just wants to know whether he should plant what
where
DM: But he does need to be able to trust the data
Yang: Is there a legality issue? The data source usually owns the data
<jtandy> [phila cites his next working group: Open License ...]
Yang: Even if you put a time stamp on the data - there might be other satellites that acquire similar data at similar times - can the user choose different sources and different times, so the prov is important
-> http://w3c.github.io/ole/charter.html Open License Expression WG (draft)
Yang: Explores provenance issues
phila: Talks about application specificity
SimonAgass: So is there a group that doesn't want provenance?
jtandy: There's a group that are happy to assume that someone else has looked at the prov
DM: using government data comes with an inherent trust level (justified or otherwise)
<chunming> s/applkication/application
<jtandy> (seems like we have two categories: application developers and data scientists?)
chunming: So there's a
requirement that we want to be able to query this data through
a kind of timestamp - I want to query some portion of data from
a defined time...
... one of my colleagues is trying to develop new algorithms
with time
... this is a new research area in the database domain
jitao: This kind of data has already been included in sat data
jtandy: So if we have 2 types of
user - app developers who make assumptions about data
... and we have people we rudely call data scientists who
really want to be able to check the prov
DM: And one feeds into the other
jtandy: Payam you said that we
could start with the core set of things that the devs want and
then add the modules that the data scientists want
... that seems like a useful approach to me
... People are going to want to get a lump of data out of the
big whole
... by starting from the web developer, we start simple
<jtandy> phila: I'm still struggling with my inexperience here- what does coverage data look like
<jtandy> deniseMcK: also, what does a coverage look like in the context of this group?
<jtandy> phila: so for example, if I have a "coverage" can I access a single point?
<chunming> GEOGLAM initiative
-> http://www.geoglam-crop-monitor.org/ GeoGLAM
<Payam> http://unstats.un.org/unsd/bigdata/
<chunming> UN Big Data for Official Statistics
<chunming> scribe: chunming
Payam: next meeting will be in Beijing
jtandy: we will find the community that will support the coverage data
Simon Agass: not only download data, but also providing some info to describe it.
jtandy: talk about relevant data, what is relevant
Simon Agass: example of population analysis; tag images; [missing]
scribe: addition, search for , damage assessment, need remote locations;
jtandy: so in the example, the
active volcano should be a relevant archived datasets.
... is there introduction of MELODIES project?
Maik Riechert: introduces the MELODIES project
scribe: link data set and provide
meanings.
... don't know how to link with datasets, apis,
... if there's a kind of document on this, it could be useful
<MaikRiechert> land cover example: http://melodiesproject.eu/content/challenges-mapping-land-cover
Denise McKenzie: [missing]
Phila: with the emphasis of this
meeting, try to help with the working group deliverable on coverage linked
data
... one thing we're doing is the best practice document
<jtandy> (Denise talked about using the NASA Space Apps Challenge hackathons as a source of questions that developers might have when working with earth observation data)
Maik Riechert: where is the place to discuss?
phila: working group
... talk about provenance; talked about sharing on the web,
encourage people to do that;
... second stage is about implementation, testing, encourage
people to do impl.
... talked about what is coverage data.
... category, infoset that may be relevant to the image
DM: large metadata vocabulary
on geospatial data
... for coverage, it's a subset (?)
phila: coverage data is - the RDF Data Cube (3 dim), or two dim (table), in RDF, or JSON format. this could be useful.
jtandy: for multi-dim
datasets
... it could be cut into horizontal pieces, and tiles, ...
... use RDF data cube, that related to different access
patterns
phila: talked about image accessible (accessibility), that might be relevant.
jtandy: image can be data
... take the value of a pixel
[morning session closed]
<Payam> phila: there are three areas that we would like to focus:
<Payam> i) metadata ii) access requirement and iii) representation options
<Payam> phila: discusses ways of describing geospatial datasets and existing standards
data linking to/from coverage data is a kind of access requirement.
<Payam> phila: what is a dataset? how do you define a dataset?
phila: how do you describe dataset?
<Payam> discussion on using Dublin core to describe a dataset
<Payam> difference between the dataset and distributions of the same data
<Payam> distinction between the actual concept of the dataset and distributions of the dataset
<Payam> jtandy: sometimes different systems use different identifiers for the same data and it is not obvious that for example two systems refer to the same data
<Payam> jtandy: it is important to note that the same data can appear in different systems and maybe with different identifiers
<Payam> GB: is this a problem that needs to be solved?
<Payam> MaikRiechert: it is sometimes not clear what people refer to as a dataset
<Payam> jtandy: reads the definition of dataset from DCAT
<Payam> phila: we need to decide how we are going to define a "coverage"
<Payam> phila: Access requirements: do we need to be able to access the individual observations?
<Payam> jtandy: granularity questions refer to the issue of whether one can reference individual items (e.g. cells in an image) and have direct access
<Payam> phila: talks about RDF data cube
access requirement: for a third party to reference/access a point/subset/part/whole of the dataset
<Payam> jtandy: talks about the observation in RDF data cube
W3C RDF Data Cube Vocabulary: http://www.w3.org/TR/vocab-data-cube/
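[As a sketch of what a single observation might look like in the Data Cube vocabulary; the dimension and measure properties here are invented for illustration:]

```turtle
@prefix qb:  <http://purl.org/linked-data/cube#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:  <http://example.org/ns#> .

# One cell of a coverage as a qb:Observation (hypothetical properties)
ex:obs-12-34 a qb:Observation ;
    qb:dataSet ex:coverage-dataset ;
    ex:refLat  "51.44"^^xsd:decimal ;    # dimension
    ex:refLon  "-0.94"^^xsd:decimal ;    # dimension
    ex:reflectance "0.27"^^xsd:decimal . # measure
```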
<Payam> Jianhui: RDF data cube provides the actual data not only the metadata
<Payam> Jianhui: if we publish the remote sensing data as RDF we may need to provide new tools to allow users to use these data
<Payam> Jianhui: if we have very large RDF described data, search and query will be slow
<Payam> phila: you don't necessarily store it as RDF... you access and process it in other formats...
<Payam> phila: you need to extract the part you need to transform it to RDF
<Payam> Payam: or take the attributes that you need out of RDF and handle/process them separately (e.g. index the attributes outside RDF)
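[A minimal Python sketch of the "extract the part you need, then express it as RDF" idea just described; the grid contents, URIs and property names are all invented for illustration:]

```python
# Sketch: pull a subset out of a (tiny, in-memory) coverage grid and
# emit RDF Data Cube-style observation triples as N-Triples strings.
# All URIs and property names here are hypothetical.

GRID = [  # pretend 3x3 coverage of reflectance values
    [0.10, 0.12, 0.11],
    [0.20, 0.27, 0.25],
    [0.30, 0.31, 0.29],
]

def subset_to_ntriples(grid, rows, cols, base="http://example.org/obs"):
    """Convert only the selected cells to N-Triples, one observation per cell."""
    triples = []
    for r in rows:
        for c in cols:
            s = f"<{base}/{r}-{c}>"
            triples.append(f"{s} <http://purl.org/linked-data/cube#dataSet> <http://example.org/coverage> .")
            triples.append(f'{s} <http://example.org/ns#row> "{r}" .')
            triples.append(f'{s} <http://example.org/ns#col> "{c}" .')
            triples.append(f'{s} <http://example.org/ns#reflectance> "{grid[r][c]}" .')
    return triples

# Extract just the middle cell rather than converting the whole grid
nt = subset_to_ntriples(GRID, rows=[1], cols=[1])
```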
<Payam> chunming: we should focus on interoperability and sharing data, not the implementation
Linked Data Fragments: http://linkeddatafragments.org
<Payam> Yang: is it possible to extend the current satellite data format to help to understand [interpret] the data easily
<Payam> Yang: NASA remote sensing data: http://rsd.gsfc.nasa.gov/rsd/RemoteSensing.html
<Payam> http://daac.gsfc.nasa.gov
<Payam> http://mirador.gsfc.nasa.gov
<Payam> link to metadata: http://hydro1.sci.gsfc.nasa.gov/data/s4pa//GLDAS_V1/GLDAS_CLM10SUBP_3H/2015/214/GLDAS_CLM10SUBP_3H.A2015214.2100.001.2015253203046.grb.xml
<SimonAgass> http://data.satapps.org/
<Payam> Yang: this machine readable/interpretable data should not replace the human readable (i.e. HTML) representation
<Payam> GB: we should be assuring that data and inferences made from it are accessible to wider groups in the society and not only the scientists who work with that data
<Payam> Simon: we should focus on the application, added value and use-cases that we couldn't do a certain task without the metadata
jtandy: HDF is commonly used for satellite data, but it is not supported by browsers :-)
<Payam> GB: summarises what we have discussed:
phila: Ideally, we want a user agent that reads a URL in a particular pattern and returns a dataset in a textual, structured, semantically/metadata-rich way.
<SimonAgass> Some sample data from Sentinel 1 http://sedas.satapps.org/download-sample-data/
<Payam> GB: What we discussed: purpose, function and ways to deliver
<phila> scribe: phila
<scribe> chair: Jeremy
jtandy: Checks who will be here
tomorrow
... invites those who won't to express anything they wanted to
raise
GB: Planning to circulate notes of today's meeting
SimonAgass: Nothing at the minute to raise.
Payam: Are we going to link what we're discussing here with the SDW WG?
jtandy: Yes
... Something I wanted to do today - was to remind ourselves of
the 7 themes being discussed by the broader group wrt BPs
... So it is necessary to partition the work between this
group and the SDW WG
... and I'd like to achieve that before we finish tomorrow.
Payam: This is a new way of
accessing data from satellites
... so we'll need funding to develop tools. So if we convert
some NASA data, how do we know we've done it right?
jtandy: We need to engage the data publishing and using community and get them to build
Payam: We're talking specifically about satellite data
jtandy: And if we have BPs, how
do we evaluate whether they have met the BPs
... and advise on how to do it
SimonAgass: We need to manage
expectations on this too.
... From my experience, saying we're going to make EO data
available raises expectations
... This is a stage for describing data and making it
available, but there are stages between the satellite and this
stage and after it has been published
jtandy: +1
... we're not dealing with the data processing. We're only
looking at the data publishing part of the story
... So a task for tomorrow, we need to be clear on what our
scope is
phila: I think we're close to having that definition based on what Simon just said
DM: The Interface...
jtandy: Maybe think of it as a contract
SimonAgass: It does imply a level of handshake agreement
jtandy: So a searching question
...
... When we're doing action planning for tomorrow, we need to
know how much time each of us can commit
... In order for us to be successful, some actual work has to
be done. We need to cut our cloth accordingly
SimonAgass: there are activities
that fall into this. One in particular on integrating LD
... Broker Technology 4 EO
jtandy: That gives us a pathway towards future funding?
SimonAgass: There's a potential
alignment that allows me to spend some time on this as it's
about LD and EO
... but not a massive amount of time
GB: I think a number of testbeds could be highly informative at this stage, even if they're simple
Payam: I think Yang can help.
Everyone in my dept is funded by a project so I can't pull
people off those.
... I can perhaps do a little
jtandy: And you're editor of the
SDW WG's BP doc
... Any overlap with any of those projects?
Payam: We do semantic models for smart cities etc. There are validation issues etc. I can ask someone there to join the meetings/check something.
Yang: I think if we could know a
little bit better what sort of commitment in terms of time
you're looking for
... In general we're very supportive of this work.
... We were chatting earlier - this forum is so useful - I
managed to come up with a project - I'll invite people to
Surrey to look at proposals to funding bodies.
... When we're working on those proposals maybe we can do some
linking.
jtandy: In terms of that future proposal... what's the time scale from now until some funding arrives?
Yang: After the end of this
project. Any bidding process will create new ideas in its own
right.
... Lots of potential from this WG
Payam: We can contribute to use cases of course
jtandy: So content review,
editorial, discussion, but not investigative work
... It's important that we don't over commit.
SimonH: It would be good to pull out examples. It will be most useful for me to comment on the documents.
jtandy: Are you content that we're working in a way that's compatible with CODATA
SH: I think I'd mention the
connections I've raised. GODAN, GeoGlam
... CODATA does a lot of work on policy around EO. We are very
active in that. Just submitted a white paper on the benefits of
data sharing.
... We'd like to contribute more to the tech standards
level
... We're also very active in training
... So this initiative provides a useful opportunity to affect
the development process.
DM: This is a topic to bring up with Barbara Ryan (?) at Eye on Earth next week.
SH: I'll be there as well
DM: OGC has 3 staff going to Geo
in November
... The plenary could be a useful place to validate any
requirements we have by then.
... I won't be at GEO but Mark Reichart will (OGC CEO)
jtandy: Before the break, Phil was scribing a list. I have a similar list.
jtandy: First on the list I think
we need to talk about is identifiers for the dataset and the
distribution
... is that a useful thing that we can talk about for coverage
data?
Yes
jtandy: Also when talking about
IDs, we need to be able to refer to slices, or subsets or
individual cells
... Can we provide patterns for slices and subsets?
... What vocabulary would be used to link a subset to its
parent?
... VoID does that already.
... We need to be able to describe the relationship between a
dataset and its parts
-> http://www.w3.org/TR/void/ VoID
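[A sketch of how VoID's subset relationship could link a coverage dataset to one of its slices; the URIs are hypothetical:]

```turtle
@prefix void: <http://rdfs.org/ns/void#> .

# A coverage dataset and one time-slice of it, linked via void:subset
<http://example.org/dataset/coverage-1> a void:Dataset ;
    void:subset <http://example.org/dataset/coverage-1/slice/2015-09-29> .

<http://example.org/dataset/coverage-1/slice/2015-09-29> a void:Dataset .
```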
jtandy: We'll need to see how
things like VoID map onto our concerns
... next on my list - how do we discover the dataset?
... Before you work with the data you have to find it so it
needs to be discoverable, i.e. have discovery metadata
... There's the ISO model
... and GeoDCAT-AP which matches DCAT and ISO 19115
... Earlier we talked about discovering datasets as a whole. We
talked about granularity
... Irrespective of the dataset, every time you produce a
coverage, there's metadata about the structure, the observed
properties, the physical properties that the coverage provides.
ISO 19123 provides a way of doing that, as does RDF Data
Cube
... So we can describe how it works inside.
... When we want to share data - if it needs to be downloaded
before you can access the metadata - that's not going to work.
So the metadata needs to be usable by a user agent, preferably
a browser that doesn't rely on a plugin.
... Logically that probably means multiple formats including
RDF
... Given that we need this metadata that is usable, how do we
make it parsable by standard search engines?
... When you publish your data, you want people to find it - it
would be easier if they can find it in their usual search
engine
... We need to be able to say 'this is how you publish your
discovery metadata'
... The site Yang pointed us to included a lot of human
readable content that you can browse through - we need to keep
that and provide a machine readable path
... So our metadata should support human and machine
browsing
Yang: We want search engines to
be able to find our data, yes. That sort of result can be
highlighted in future documentation.
... I don't know to what extent that can be defined in this
project.
<chunming> phila: Data on the Web Best Practices (almost done by W3C) - general stuff on sharing datasets on the Web
<chunming> ... we are focusing on coverage data
-> http://w3c.github.io/dwbp/bp.html Data on the Web Best Practices
phila: That covers general stuff about publishing data on the Web.
jianhui: I think it's very
ambitious to publish scientific/research data to the Web - but
maybe we should talk to the scientific users. Do they think
it's a good way or not? I'm not sure.
... Maybe we should make some demonstration or testbed to show the
scientists - they're the end users.
... I think that's important
... Scientists might have different views on this.
... Is this idea of publishing scientific data on the Web
right or not? Raw data - if the resolution is high - might be
>1 GB. If we publish it to the Web, how can we get this
data?
... Can we publish all that data? Do the scientists want it or
not? Secondly - what about the infrastructure - can it be
supported or not?
phila: refers to http://philarcher.org/diary/2015/50shadesofno/
GB: Years ago, it wasn't
difficult to publish your data in the paper you wrote. People
could see the data and decide whether you were right or
wrong
... The challenge now is that we base our science on enormous
amounts of data. It's a real challenge to make it open enough
for someone else to go through and see if you're right.
jtandy: But I do think it's
important that we recognise that our infrastructure puts some
restrictions on us.
... In our work, we see that our data will be bigger by an
order of magnitude in 10 years.
... So we're talking about uploading software to the data
... In some cases, you might be able to download the whole
dataset. But it might be that it's so large that, today, you
have to send an e-mail to a colleague and ask them to courier it
to you.
... Or it might be possible to offer an API
... So there are many paths - but the first step is to publish
the metadata
GB: I agree that it's not easy,
and maybe impossible
... But if I publish a paper revealing the secret of life, it's
worth accessing the PetaBytes of data
chunming: What we want to do is
not just copy how scientists share their data now, we want to
do it better
... making better links
... refer to a small portion of their data etc.
... Of course, if we just implement using today's tech, we'd
face problems with scalability
SimonAgass: We have to assume that tech will improve - so we don't need to wait until the scalable tech is available.
jtandy: And if we can build demonstrators that prove what we're trying to achieve, even accepting the constraints of bandwidth etc.
GB: CERN is a good example. They
don't make all their data available - there's nowhere like it.
But they do have a large internal community that check each
other.
... That's a sort of compromise as the amount of data they have
is so large
jtandy: But they may still use the same technology within their computers.
Yang: If you need more
justification.. one fact - any research council funded project
now will require that the data has to be publicly
accessible.
... So IT departments are adding DOIs for datasets as well as
papers.
... I don't think there is any leeway for not publishing your
data.
... Anything published after 2016 will be under this rule
jtandy: I think that was one of
the requirements from the Royal Society's work on Science as an
Open Enterprise.
... As people are now applying for funding that will deliver
after 2016, they're already including the costs of publishing
data persistently.
SH: It's a requirement in Horizon 2020
jtandy: So when we're talking
about publishing the data itself... we need to ask ourselves is
how do we encode the data itself.
... is it feasible or sensible to encode the whole lot in
RDF
Payam: We came up with a set of
smart city datasets for testing against
... we annotated the data and put it on the Web.
... You give people the data and the tools. And then you see
what the problems are
jtandy: I think that's a good idea. There are mid scale datasets - ones big enough to be problematic but not obviously too huge
<Payam> smart city dataset example:http://iot.ee.surrey.ac.uk:8080 (data and metadata)
<Payam> and tools
Yang: This is not just going to
benefit the current audience. Democratisation is not just
wanted in the space business. People are working on reducing
the amount of data needed to be downlinked etc.
... Lots of effort looking into that.
... This effort can merge with others in the space
domain
... We may be being slightly optimistic about solving
everything
... but as space people we want people to be able to access
space assets from mobile.
jtandy: If we might merge with others, do we know who that might be now?
<Payam> CCSDS: http://public.ccsds.org/default.aspx (standards from space domain)
Yang: CCSDS might be relevant http://public.ccsds.org/default.aspx
SH: There's a meeting coming up
with them and CODATA
... CCSDS is more on the data management, long term
management
jtandy: Might be worth letting them know about the work we're doing but at this point we may not expect them to have something for us
<scribe> ACTION: We just identified a stakeholder - we ought to identify other stakeholders that we should contact [recorded in http://www.w3.org/2015/09/29-ceo-ld-minutes.html#action01]
Payam: I see that CCSDS has a satellite data model
Yang: The WG I sit in within
CCSDS is specifically about intelligent systems for
spacecraft.
... You could propose to start something with them
jtandy: We'll create a workplan
on what we're going to do
... I suggest Simon H takes point on that potential
relationship
... We're talking about how to encode the data. Obviously there
are choices - NetCDF, HDF5, etc. Many formats we can use and I
think we can provide guidance on when it is best to use each
type.
... In particular, it's an interesting question to ask what
types of data encoding (not metadata) might work in a
browser
... can it consume it and display it with Canvas or
WebGL
... We should understand whether there are times when data can work
nicely with the browser etc.
... We have talked a lot about how to query the data and
interact with it.
... One of the things we should be looking at - what functions
should an API offer. Query by geoposition, time (vital for
smart cities), Simon talked about the observed quantity
... These are all examples of starting points for people to
interact with EO data.
... We talked about Strabon, Linked Data API
... How difficult SPARQL is to work with etc.
... Lots of work we could do to make the conversation on APIs
actionable.
... We talked about annotations on datasets
... The CHARMe project should help there
... being able to refer to bits of datasets or the whole
dataset
... How do I identify subsets, slices etc.
... To support users/non-experts - we need to support
provenance to describe the processing chain or how my subset
has been extracted.
... We'll certainly want to describe the provenance in terms of
the platform, source etc. - which maps to the Semantic Sensor
Network.
... Where did this come from (processing chain, platform)?
... And for the non-expert, how can a data publisher make an
assessment of quality
... Phil pointed me to http://w3c.github.io/dwbp/vocab-dqg.html
... Are there significant gaps?
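[A sketch of how the processing-chain provenance described above could be stated with PROV-O; all URIs are hypothetical:]

```turtle
@prefix prov: <http://www.w3.org/ns/prov#> .

# A subset, the dataset it came from, and the extraction activity
<http://example.org/dataset/coverage-1/slice/42> a prov:Entity ;
    prov:wasDerivedFrom <http://example.org/dataset/coverage-1> ;
    prov:wasGeneratedBy <http://example.org/activity/subset-extraction-42> .

<http://example.org/activity/subset-extraction-42> a prov:Activity ;
    prov:used <http://example.org/dataset/coverage-1> .
```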
Payam: Part of this data is live so that fact changes the way you handle it.
GB: What I suggest we do is to start writing our report in note form tomorrow morning.
phila: There are things that are relevant to us that are being covered elsewhere (BPs, SSN etc)
GB: We can identify the overlaps,
the other one is where we can best use our efforts in some sort
of test bed to be developed, rather than in an ad hoc way
... So who else do we want to inform about this.
<chunming> phila: as jianhui said, coverage data could be huge, need way to access a portion of data
phila: Makes general points about scalability