Report on the second meeting of the CEO-LD Project


28 February 2016

Present:

UK
Geoffrey Boulton, CODATA
Phil Archer, W3C
Denise McKenzie, OGC
Jeremy Tandy, UK Met Office
Maik Riechert, University of Reading
Bill Roberts, SWIRRL
China
Li Jianhui, CNIC/CAS
Guoqing Li, Institute of Remote Sensing and Digital Earth, CAS
Chunming Hu, Beihang University/W3C
Jitao Yang, Beijing Language and Culture University/RADI
Wang Lizhe, Institute of Remote Sensing and Digital Earth, CAS
Jibo Xie, Institute of Remote Sensing and Digital Earth, CAS
Xuezhi Wang, CNIC, CAS
Tianyu Wo, Beihang University/W3C
Angel Li, Beihang University/W3C
Xueyuan Jia, Beihang University/W3C
Jinghua Zhao, CESI
Qun Zhang, CESI
Xianqian Chai, China Mobile
Xiaohai Li, China Mobile
Jinwei Wang, China Mobile



Introduction

The second meeting of the CEO-LD project took place at the Vision Hotel, adjacent to Beihang University in Beijing, on 28 February 2016. In the first half of the meeting, the participants heard a set of short presentations on current related efforts. With that in mind, the focus switched in the afternoon to discussing what could most usefully be done within those existing activities to advance the potential use of Linked Data in EO, concluding with a small set of concrete tasks.

During the morning session, the project partners were pleased to be joined by representatives from China Mobile and the China Electronics Standardization Institute.

CRSNet Project

Guoqing Li from the Institute of Remote Sensing and Digital Earth within the Chinese Academy of Sciences presented his group's work on the China Remote Sensing Network (CRSNet) project. The project began in 2011 and is now nearing its end. Coverages per se have not been a focus of the project, but the kinds of challenges it has addressed are very relevant to them.

The project operates in two stages: first break down the available data, then integrate it, using different sensors together to resolve problems at global scale. More than 40 'Common Products' visualise phenomena such as radiation budgets, vegetation structure and global ice change, and can be generated at different spatial and temporal scales. Guoqing Li's assertion is that when dealing with multi-dimensional data like this, online processing is not (currently) possible: it needs to be done offline, with the result made available through a product that allows exploration across one dimension only.

The geographic and temporal aspects alone are not sufficient for such products: it is also necessary to process things like the waveband(s) of the readings and the underlying semantics. This is where the Linked Data approach holds most promise, as it may remove the need to have a specialist on the team who understands each of the datasets in detail.

One consequence of creating pre-determined Common Products is that there are only three levels of spatial granularity (10m, 100m and 1km). So if the data has a granularity of, say, 30m, it has to be normalised to 10m. In temporal terms, the data may be available twice a day while the tool only wants data every 8 days, so again the broker has to take care of the mismatch. It should also be noted that although the focus of this project is satellite data, CRSNet also makes use of data from in situ and airborne sensors, the latter being particularly complex to handle.
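The two normalisation steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not CRSNet's processing chain: the rule for picking a standard level, nearest-neighbour upsampling and maximum-value compositing over fixed 8-day periods are choices made for this example, and all function names are hypothetical.

```python
from collections import defaultdict
from datetime import date

# The three standard spatial levels of the Common Products, in metres.
STANDARD_LEVELS_M = (10, 100, 1000)


def target_level(native_m: float) -> int:
    """Choose the coarsest standard level that is not coarser than the native
    resolution (hypothetical rule): 30 m -> 10 m, 250 m -> 100 m. Data finer
    than every level falls back to the finest one."""
    candidates = [lvl for lvl in STANDARD_LEVELS_M if lvl <= native_m]
    return max(candidates) if candidates else min(STANDARD_LEVELS_M)


def upsample_nearest(grid, factor: int):
    """Nearest-neighbour upsampling by an integer factor: each native cell
    becomes a factor x factor block (30 m -> 10 m is factor 3)."""
    return [
        [value for value in row for _col in range(factor)]
        for row in grid
        for _row in range(factor)
    ]


def composite(observations, start: date, period_days: int = 8):
    """Collapse (date, value) observations into fixed periods counted from
    start, keeping the maximum value in each period (an assumed rule)."""
    bins = defaultdict(list)
    for day, value in observations:
        bins[(day - start).days // period_days].append(value)
    return {index: max(values) for index, values in sorted(bins.items())}


twice_daily = [(date(2016, 1, 1), 0.21), (date(2016, 1, 1), 0.48), (date(2016, 1, 9), 0.35)]
print(composite(twice_daily, start=date(2016, 1, 1)))  # {0: 0.48, 1: 0.35}
```

Maximum-value compositing is one common choice for vegetation indices; a real broker might instead average values within each period or resample with a different kernel.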

These points highlight the need for the complex workflows used in CRSNet, and the value of the CEO-LD project in helping to apply Web technologies so that non-specialists can make use of satellite data much more easily.

The discussion following Guoqing Li's presentation concluded by considering how available the Chinese data is or might be. In broad terms, the data is available in China and can be made available for research elsewhere, but as things stand, access to the data is not sufficiently liberal for operational uses.

Geospatial Data Cloud

Xuezhi Wang presented the work being done by the Computer Network Information Center of the Chinese Academy of Sciences (CNIC) to provide a one-stop shop for spatial data services: GIS Cloud. The service is based on the USGS data that was opened for public use in 2008, and holds more than 350TB of data in more than 7 million records covering China, from Landsat, MODIS and other sources. The traditional way of working would be to download data and process it locally using ArcGIS. In contrast, CNIC's service allows users to submit tasks that the platform performs (on virtual machines), after which the processed data can be downloaded and/or visualised.

GIS Cloud is a powerful tool that allows users to search by address, geolocation or administrative unit, render multiple wavebands as raster layers with false colours, combine data across layers and edit vectors on the Web page. The visualisation tools in the platform use an HTML5-based map client with a dynamic cache, map style editing and more. However, all the processing is done server side, and CNIC is interested in ways to move at least some of it to the browser to reduce latency.
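As a sketch of what moving some of this processing to the browser would involve, the false-colour step is shown below in Python for readability (a browser client would do the same in JavaScript). The band-to-channel assignment (near-infrared to red, red to green, green to blue) is the conventional colour-infrared composite and an assumption here, as are the linear stretch limits and the function names.

```python
def stretch(band, low, high):
    """Linearly map values from [low, high] to 0..255, clipping values outside."""
    span = high - low
    return [
        [max(0, min(255, round((v - low) / span * 255))) for v in row]
        for row in band
    ]


def false_colour(nir, red, green, limits):
    """Combine three co-registered bands into RGB pixels, showing near-infrared
    as red, red as green and green as blue (colour-infrared composite).
    limits maps each band name to its (low, high) stretch range."""
    r = stretch(nir, *limits["nir"])
    g = stretch(red, *limits["red"])
    b = stretch(green, *limits["green"])
    return [list(zip(r_row, g_row, b_row)) for r_row, g_row, b_row in zip(r, g, b)]
```

Because the inputs are band values rather than pre-rendered pixels, the same data could be re-stretched or re-assigned to channels on the client without another server round trip.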

The meeting agreed that this case provides further evidence of the need for the kind of approach being looked at by CEO-LD. It highlighted the need to be able to identify and use subsets of data, and for any data to carry its contextual information. In the case of GIS Cloud, the platform provides all the context implicitly, but if the data itself is to be manipulated client side, the context needs to be explicit.

It was noted that USGS's opening of its data led directly to many new international relations and partnerships. The first use of any data is by its producer, but it is the second and third users who add value to that data through reuse. CNIC's metadata service is based on OGC's Catalogue Service for the Web (CSW) standard.

Taxi!

Beihang University's Tianyu Wo talked about his work with 'U-Car' chauffeured car services. At first sight this topic seems tangential to that of coverages; however, location-based data such as customer waiting times and fuel consumption along a particular route is important, and the amount of data being handled is substantial. The data platform for U-Car currently receives 120 million messages per day (each vehicle sends an update on its position and its on-board diagnostics, or OBD, every 10 seconds) and the company is growing rapidly. The total amount of data is too large to be handled in the browser and so needs some server side pre-processing, but manipulating processed data in the browser has great potential. The platform needs to handle multiple data sources in real time, and it is notable that the cars go in and out of network coverage (in tunnels etc.). A potentially useful design feature of HTML5 is therefore its ability to cope with intermittent connectivity. Non-spatial data is also potentially useful: events that are likely to generate demand for taxis, for example, or poor weather and other factors that can be predicted to increase demand in a given location.
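The quoted figures allow a quick back-of-the-envelope check. Assuming every vehicle reports continuously around the clock (an assumption; no duty cycle was given), the message rate implies a lower bound on the number of vehicles and a mean ingest rate:

```python
SECONDS_PER_DAY = 24 * 60 * 60
MESSAGES_PER_DAY = 120_000_000  # figure quoted for the U-Car platform
REPORT_INTERVAL_S = 10          # each vehicle reports every 10 seconds

# Assumption: every vehicle reports continuously for the whole day.
messages_per_vehicle_per_day = SECONDS_PER_DAY // REPORT_INTERVAL_S  # 8640
min_vehicles = MESSAGES_PER_DAY / messages_per_vehicle_per_day       # ~13,889
mean_messages_per_second = MESSAGES_PER_DAY / SECONDS_PER_DAY        # ~1,389

print(f"at least {min_vehicles:,.0f} vehicles, {mean_messages_per_second:,.0f} messages/s on average")
```

Because vehicles will not all be on the road 24 hours a day, the real fleet is likely larger than this bound, and peak message rates will exceed the mean.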

MELODIES

Maik Riechert of the University of Reading gave a demonstration of his work in the MELODIES project, which has advanced significantly since the initial CEO-LD meeting. As noted by Tianyu Wo, there will always be some things that need to be done server side, but by moving some of the data to the browser, it becomes available for use in a diversity of Web applications that may be created by people with no specialist knowledge of geospatial and satellite data. To an outsider, the MELODIES demo looks like a set of images, but it is data that is sent to the browser and laid on top of a base image. A 3000 x 2000 pixel coverage is readily handled in the browser, but care is needed: JSON can be compressed before being sent to the browser, but once uncompressed it can easily become too large to handle.
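The gap between transfer size and decompressed size can be illustrated with a short Python sketch that serialises a grid as a flat JSON array and compares its size before and after gzip. The grid is synthetic and the helper name is hypothetical; actual sizes depend heavily on the data, and it is the decompressed text that the browser has to parse.

```python
import gzip
import json


def json_sizes(grid):
    """Return (uncompressed, gzip-compressed) sizes in bytes of a grid
    serialised as a flat JSON array of values."""
    flat = [value for row in grid for value in row]
    raw = json.dumps(flat).encode("utf-8")
    return len(raw), len(gzip.compress(raw))


# A synthetic 300 x 200 grid with large uniform patches, loosely like land cover.
grid = [[(i // 10 + j // 10) % 5 for j in range(300)] for i in range(200)]
raw, packed = json_sizes(grid)
print(f"uncompressed: {raw} bytes, gzip: {packed} bytes")
```

Categorical data with large uniform areas compresses very well, so a modest download can still expand into a large document once it reaches the client.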

Having data available in the browser can provide more flexibility than server side processing. As an example, the remit of the UK's Department for Environment, Food and Rural Affairs doesn't extend to the sea, so it only ever wants to look at satellite data related to the land. Maik showed how he was able to strip out the sea areas of a land cover dataset and create and manipulate a smaller dataset, all in the browser using a regular Web developer's toolkit. To emphasise the point, his application is handling data, not images.
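The masking step can be sketched as follows, in Python for illustration (Maik's application ran in the browser). The sea class code and the function name are hypothetical. Returning the offset of the cropped subset keeps the information needed to place it back in the original grid, in line with the point that data manipulated client side has to carry its context explicitly.

```python
SEA = 0  # hypothetical class code for sea in the land cover grid


def strip_sea(grid, sea_code=SEA):
    """Crop a land cover grid to the bounding box of its land cells and mark
    any remaining sea cells as None. Returns (subset, (row_offset, col_offset))
    so the subset can still be located within the original grid."""
    land_rows = [i for i, row in enumerate(grid) if any(v != sea_code for v in row)]
    if not land_rows:
        return [], (0, 0)
    land_cols = [
        j for j in range(len(grid[0])) if any(row[j] != sea_code for row in grid)
    ]
    r0, r1 = land_rows[0], land_rows[-1]
    c0, c1 = land_cols[0], land_cols[-1]
    subset = [
        [None if v == sea_code else v for v in row[c0 : c1 + 1]]
        for row in grid[r0 : r1 + 1]
    ]
    return subset, (r0, c0)
```

Using None for masked cells means the result serialises to JSON null, so the smaller dataset can be passed on to other Web tools without a special sentinel value.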

The Chinese colleagues could see the advantages and potential of the MELODIES approach, and there followed a discussion about the maximum size of dataset that it is sensible to send to a browser. Not long ago, 40MB would have been too much; now it is probably OK. Perhaps content negotiation could be used to keep the amount of data returned within a defined maximum, either as a set of pages connected by HTTP Link headers or, as a more complex option, at different resolutions depending on the HTTP Accept headers received. A further alternative might be to send only, say, 10MB tiles at a time and allow the browser to 'say when' it has had as much as it can cope with. Maik showed how his application requests only a certain amount of data at a time, when a user clicks on a control for selecting a time step; as the data packages are small, the latency is well within acceptable limits. Importantly, the data held is discarded from memory before the next set is loaded, so the application doesn't become more bloated as it is used. As with other Web resources, the browser cache is used implicitly when reloading data that had been loaded before but discarded from memory in the meantime.
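The first of the options discussed, pages linked by HTTP Link headers, might look something like the Python sketch below. The byte budget, the example.org URL pattern and the function names are illustrative assumptions, and the size check is approximate: it estimates the JSON size row by row rather than serialising each page.

```python
import json


def paginate_rows(grid, max_bytes):
    """Split grid rows into pages whose JSON encoding stays within max_bytes.
    The estimate is slightly conservative; a single row larger than max_bytes
    still forms a page of its own."""
    pages, current, size = [], [], 2  # 2 bytes for the enclosing [ ]
    for row in grid:
        row_bytes = len(json.dumps(row)) + 2  # + 2 for the ", " separator
        if current and size + row_bytes > max_bytes:
            pages.append(current)
            current, size = [], 2
        current.append(row)
        size += row_bytes
    if current:
        pages.append(current)
    return pages


def link_header(base_url, page, page_count):
    """Build an HTTP Link header (RFC 8288) pointing at the neighbouring pages."""
    links = []
    if page > 1:
        links.append(f'<{base_url}?page={page - 1}>; rel="prev"')
    if page < page_count:
        links.append(f'<{base_url}?page={page + 1}>; rel="next"')
    return ", ".join(links)


pages = paginate_rows([[1, 2, 3]] * 10, max_bytes=40)
print(len(pages), link_header("https://example.org/coverage", 2, len(pages)))
```

A client that follows `rel="next"` links can keep only the current page in memory, which mirrors the discard-before-loading behaviour Maik demonstrated.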

UK-China Collaboration

The participants spent some time thinking about how to realise a solid collaboration. OGC's Denise McKenzie suggested that we focus on specific themes that are important in both countries, such as agriculture, disaster management, water, or fire management. It was also clear that those in the room were mostly concerned with data management, rather than being end users. Geoffrey Boulton remarked that in his work on ice caps, he has to handle images; he would much rather have Maik working with him to turn them into data that he can manipulate.

Jianhui Li pointed out that CNIC provides services to scientists, i.e. specialists in specific fields, not generalist Web application developers. Guoqing Li suggested working on a new Common Product, perhaps a national vegetation index, but noted that exposing data for use in a Web application, rather than providing a complete service based on back end processing, would require a new workflow. There was much discussion about where the participants could see the approach being useful in a future project tied to an ongoing programme, and about how the collaboration would work in practice.

The discussion for the remainder of the day continued to be wide-ranging, with many 'future possibilities', but everyone was cognisant of the limited time available in the current project and keen to deliver a practical outcome that can be the basis of future work. After a good deal of discussion, it was agreed that:

RADI
will develop two new Common Products, a Normalised Difference Vegetation Index and a Leaf Area Index, to be made available via a RESTful API using the WGS84 coordinate reference system. The Common Products will be available as HDF5 and GeoTIFF.
CNIC
will write a library that converts HDF5 and GeoTIFF to the CoverageJSON format under development at the University of Reading, and expose that data through an API. The data will be 'just too big' for a browser to handle in full, so the API will need to offer a mechanism for delivering subsets of the data. The service will be hosted on CNIC's Geospatial Data Cloud infrastructure.
University of Reading
will support CNIC in the development of their service, amending and updating the CoverageJSON spec in the light of the experiences gained. Reading will also develop a client application to support grid comparison.
Beihang University
will investigate an independent application that makes use of the RADI and CNIC APIs.
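To illustrate how RADI's NDVI product and CNIC's conversion could fit together, the sketch below computes NDVI from red and near-infrared reflectance using the standard formula NDVI = (NIR - Red) / (NIR + Red), and wraps the result in a minimal document modelled loosely on CoverageJSON. The CoverageJSON specification was still under development at the time, so the field names here follow its later published form and may not match the version the partners used. Reading HDF5 or GeoTIFF (which would need a library such as h5py or GDAL) is omitted, and the function names are hypothetical.

```python
import json


def ndvi(nir, red):
    """Per-cell NDVI = (NIR - Red) / (NIR + Red); None where NIR + Red is 0."""
    return [
        [None if (n + r) == 0 else (n - r) / (n + r) for n, r in zip(n_row, r_row)]
        for n_row, r_row in zip(nir, red)
    ]


def to_coverage(values, lons, lats, name="NDVI"):
    """Wrap a 2-D grid (rows follow lats, columns follow lons) in a minimal
    CoverageJSON-style document that carries its own spatial context."""
    return {
        "type": "Coverage",
        "domain": {
            "type": "Domain",
            "domainType": "Grid",
            "axes": {"x": {"values": lons}, "y": {"values": lats}},
            "referencing": [{
                "coordinates": ["x", "y"],
                # CRS84: WGS84 with longitude/latitude axis order.
                "system": {
                    "type": "GeographicCRS",
                    "id": "http://www.opengis.net/def/crs/OGC/1.3/CRS84",
                },
            }],
        },
        "parameters": {
            name: {"type": "Parameter", "observedProperty": {"label": {"en": name}}}
        },
        "ranges": {
            name: {
                "type": "NdArray",
                "dataType": "float",
                "axisNames": ["y", "x"],
                "shape": [len(lats), len(lons)],
                "values": [v for row in values for v in row],
            }
        },
    }


coverage = to_coverage(ndvi([[3.0, 1.0]], [[1.0, 1.0]]), lons=[116.0, 116.1], lats=[39.9])
print(json.dumps(coverage)[:80])
```

Keeping the coordinate reference system and axis values inside the document is what lets an independent application, such as the one Beihang University will investigate, interpret the values without relying on the serving platform.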

The outcome of the work will be demonstrated during OGC's Technical Committee meeting in Dublin on 22 June at 09:00 local time (16:00 Beijing time). OGC will manage the invitation and registration process for that occasion.

W3C will report on the project and continue to ensure good communication between the CEO-LD project and the OGC/W3C Spatial Data on the Web Working Group. It is notable in that regard that since the Beijing meeting, Bill Roberts of SWIRRL has convened a separate meeting series of SDW WG members interested in the coverages work (he, Maik Riechert and Jitao Yang of RADI are the editors of the Coverages in Linked Data specification). In parallel with the CEO-LD project, a team at the Australian National University in Canberra will also experiment with CoverageJSON and look at alternatives, thus providing an independent view.

In this way, the SDW WG will have empirical evidence, going substantially beyond what is normally available, on which to base the development of the standard. This is important since the concept of turning satellite images into data that can be manipulated by Web developers with little or no knowledge of Earth Observation technologies and methods is clearly new. The collaboration between colleagues in the UK and China ensures that the methods are cross-border and cross-cultural, with active discussions about future collaborations of mutual interest and benefit.