GeoKnow Public Datasets

Alejandra Garcia Rojas | Posted on: November 20, 2015

In this blogpost we want to present three public datasets that were improved/created within GeoKnow project.

LinkedGeoData
Size: 177GB zipped turtle file
URL: http://linkedgeodata.org/

LinkedGeoData is the RDF version of Open Street Map (OSM), which covers the entire planet geospatial data information. As of September 2014 the zipped xml file from OSM had 36GB of data, while the size of zipped LGD files in turtle format is 177GB. The detailed description of the dataset can be found in the D1.3.2 Continuous Report on Performance Evaluation. Technically, LinkedGeoData is set of SQL files, database-to-rdf (RDB2RDF) mappings, and bash scripts. The actual RDF conversion is carried out by the SPARQL-to-SQL rewriter Sparqlify. You can view the Sparqlify Mappings for LinkedGeoData here. Within The maintenance and improvement of the Mappings required to transform OSM data to RDF has being done during all the project. This dataset has being used in several use cases, but specially for all benchmarking tasks within GeoKnow.

Wikimapia
URL: http://wikimapia.org/api/

Wikimapia is a crowdsourced, open-content, collaborative mapping initiative, where users can contribute mapping information. This dataset existed already before the project started. However it was only accessible through Wikimapia’s API⁴ and provided in XML or JSON formats. Within GeoKnow, we downloaded several sets of geospatial entities from Wikimapia, including both spatial and non-spatial attributes for each entity and transformed them into RDF data. The process we followed is described next. We considered a set of cities throughout the world (Athens, London, Leipzig, Berlin, New York) and downloaded the whole content provided by Wikimapia regarding the geospatial entities included in those geographical areas. These cities where preferred since they are the base cities of several partners in the project, while the rest two cities were randomly selected, with the aim to reach our target of more than 100000 spatial entities from Wikimapia. Apart from geometries, Wikimapia provided a very rich set of metadata (non-spatial properties) for each entity (e.g. tags and categories describing the geospatial entities, topological relations with nearby entities, comments of the users, etc.). The aforementioned dumps were transformed into RDF triples in a straightforward way: (a) defining intermediate resources (functioning as blank nodes) where information was organized in more than one levels, (b) flattening the information of deeper levels where possible in order to simplify the structure of the dataset and (c) transforming tags into OWL classes. Specifically, we developed a parsing tool to communicate with the Wikimapia API and construct appropriate n-triples from the dataset. The tool takes as input a bounding box in the form of wgs84 coordinates (min long, min lat, max long, max lat). We chose five initial bounding boxes: one for each of the cities mentioned above. The bounding box was defined in such way so that it covered the whole area of the selected city. Each bounding box was then further divided by the tool into a grid of smaller bounding boxes in order to overcome the upper limit per area of the returned entities from Wikimapia API. For each place returned, we transformed all properties into RDF triples. Every tag was assigned an OWL class and an appropriate label, corresponding to the textual description in the initial Wikimapia XML file. Each place became an instance of the classes provided by its tags. For the rest of the returned Wikimapia attributes, we created a custom property in a uniform way for each attribute of the returned Wikimapia XML file. The properties resulting from the Wikimapia XML attributes point to their literal values. For example, we construct properties about each place’s language id, Wikipedia link, URL link, title, description, edit info, location info, global administrative areas, available languages and geometry information.

If these attributes follow a deeper tree structure, we assign the properties at intermediate custom nodes by concatenating the property with the place ID; these nodes function as blank nodes and connect the initial entity with a set of properties and the respective values. This process resulted to creating an initial geospatial RDF dataset containing, for each entity, the polygon geometry that represents it, along with a wealth of non-spatial properties of the entity. The dataset contains 102,019 geospatial entities and 4,629,223 triples.
Upon that, in order to create a synthetically interlinked pair of datasets, we split the Wikimapia RDF dataset, duplicating the geometries and dividing them into the two datasets in the following way. For each polygon geometry, we created another point geometry located in the centroid of the polygon and then shifted the point by a random (but bounded) factor⁵. The polygon was left in the first dataset where the point was transferred to the second dataset. The rest of the properties where distributed between the two datasets as
follows: The first dataset consists of metadata containing the main information about the Wikimapia places and edit information about users, timestamps, deletion state and editors. The second dataset consists of metadata concerning basic info, location and language information. This way, the two sub-datasets essentially refer to the same Wikimapia entities, differing only in geometric and metadata information. Each of the two sub-datasets contains 102,019 geospatial entities and the first one contains 1,225,049 triples while the second one 4,633,603 triples.

Seven Greek INSPIRE-compliant data themes of Annex I
URL: http://geodata.gov.gr/sparql/

For the INSPIRE to RDF use case, we selected seven data themes from Annex I,that are describes in the Table below. Although all metadata in geodata.gov.gr is fully compatible with INSPIRE regulations, data is not because it has been integrated from several diverse sources, which have rarely followed the proper standards. Thus, due to data variety, provenance, and excessive volume, its transformation into INSPIRE-compliant datasets is a time-consuming and demanding task. The first step was the alignment of the data to INSPIRE Annex I. To this goal, we utilised the Humboldt Alignment Editor, a powerful open-source tool with a graphical interface and a high-level language for expressing custom alignments. Such transformation can be used to turn a non-harmonised data source to an INSPIRE-compliant dataset. It only requires a source schema (an .xsd for the local GML file) and a target one (an .xsd implementing an INSPIRE data schema). As soon as the schema mapping was defined, the source GML data was loaded, and the INSPIRE-aligned GML file was produced. The second step was the transformation into RDF. This process was quite straightforward, provided the set of suitable XSL stylesheets. We developed all these transformations in XSLT 2.0, implementing one parametrised stylesheet per selected data theme. By default, all geometries were encoded in WKT serialisations according to GeoSPARQL.The produced RDF triples were finally loaded and made available in both Virtuoso and Parliament RDF stores, in http://geodata.gov.gr/sparql, as a proof of concept.

INSPIRE Data Theme	Greek dataset	Number of features	Number of triples
[GN] Geographical names	Settlements, towns, and localities in Greece.	13 259	304 957
[AU] Administrative units	All Greek municipalities after the most recent restructuring (”Kallikratis”).	326	9 454
[AD] Addresses	Street addresses in Kalamaria municipality.	10 776	277 838
[CP] Cadastral parcels	The building blocks in Kalamaria are used. Data from the official Greek Cadastre are not available through geodata. gov.gr.	965	13 510
[TN] Transport networks	Urban road network in Kalamaria.	2 584	59 432
[HY] Hydrography	All rivers and waterstreams in Greece.	4299	120 372
[PS] Protected sites	All areas of natural preservation in Greece according to the EU Natura 2000 network.	419	10 894

FAGI-gis: fusing geospatial RDF data

Giorgos Giannopoulos | Posted on: October 5, 2015

GeoKnow introduces the latest version of FAGI-gis, a framework for fusing Linked Data, that focuses on the geospatial properties of the linked entities. FAGI-gis receives as input two datasets (through SPARQL endpoints) and a set of links that interlink entities between the datasets and produces a new dataset where each pair of linked entities is fused into a single entity. Fusion is performed for each pair of matched properties between two linked entities, according to a selected fusion action, and considers both spatial and non-spatial properties.

The tool supports an interactive interface, offering visualization of the data at every part of the process. Especially for spatial data, it provides map previewing and graphical manipulation of geometries. Further, it provides advanced fusion functionality through batch mode fusion, clustering of links, link discovery/creation, property matching, property creation, etc.

As the first step of the fusion workflow, the tool allows the user to select and filter the interlinked entities (using the Classes they belong to or SPARQL queries) to be loaded for further fusion. Then, at the schema matching step, a semi-automatic process facilitates the mapping of entity properties from one dataset to the other. Finally, the fusion panel allows the map-based manipulation of geometries, and the selection from a set of fusion actions in order to produce a final entity, where each pair of matched properties are fused according to the most suitable action.

The above process can be enriched with several advanced fusion facilities. The user is able to cluster the linked entities according to the way they are interlinked, so as to handle with different fusion actions, different clusters of linked entities. Moreover, the user can load unlinked entities and be recommended candidate entities to interlink. Finally, training on past fusion actions and on OpenStreetMap data, FAGI-gis is able to recommend suitable fusion actions and OSM Categories (Classes) respectively, for pairs of fused entities.

The FAGI-gis is provided as free software and its current version is available from GitHub. An introductory user guide is also available. More detailed information on FAGI-gis is provided int he following documents:

OSMRec – Α tool for automatic annotation of spatial entities in OpenStreetMap

Giorgos Giannopoulos | Posted on: June 10, 2015

GeoKnow has recently introduced OSMRec, a JOSM plugin for automatic annotation of spatial features (entities) into OpenStreetMap. OSMRec trains on existing OSM data and is able to recommend to users OSM categories, in order to annotate newly inserted spatial entities. This is important for two reasons. First, users may not be familiar with the OSM categories; thus searching and browsing the OSM category hierarchy to find appropriate categories for the entity they wish to insert may often be a time consuming and frustrating process, to the point of users neglecting to add this information. Second, if an already existing category that matches the new entity cannot be found quickly and easily (although it exists), the user may resort instead to using his/her own term, resulting in synonyms that later need to be identified and dealt with.

The category recommendation process takes into account the similarity of the new spatial entities to already existing (and annotated with categories) ones in several levels: spatial similarity, e.g. the number of nodes of the feature’s geometry, textual similarity, e.g. common important keywords in the names of the features and semantic similarity (similarities on the categories that characterize already annotated entities). So, for each level (spatial, textual, semantic) we define and implement a series of training features that represent spatial entities into a multidimensional space. This way, by training the aforementioned models, we are able to correlate the values of the training features with the categories of the spatial entities, and consequently, recommend categories for new features. To this end, we apply multiclass SVM classification, using LIBLINEAR.

The following figure represents a screen of OSMRec within JOSM. The user can select an entity or draw a new entity on the map and ask for recommendations by clicking the “Add Recommendation” button. The recommendation panel opens and the plugin automatically loads the appropriate recommendation model that has previously been trained offline.

The recommendation panel provides a list with the top-10 recommended categories and the user can select from this list and click “Add and continue”. As a result the selected category is added to the OSM tags. By the time the user adds a new tag at the selected object, a new vector is computed for that OSM instance in order to recalculate the predictions and display an updated list of recommendations (taking into account the previously selected categories/tags, as extra training information). Further, OSMRec provides functionality for allowing the user to combine several recommendation models, based on (a) a selected geographic area, (b) user’s past editing history on OSM and (c) combination of (a) and (b). This way, personalized category recommendations can be provided that take into account the user’s editing history and/or the specific characteristics of a geographic area of OSM.

OSMRec plugin can be downloaded and installed in JOSM following the standard procedure. Detailed implementation information can be found in the following documents:

Geospatial-semantic Exploration on the Move

Uros Milosevic | Posted on: January 24, 2015

A typical tourist scenario is hard to picture without a map. Yet, such a scenario implies you are not familiar with your surroundings and, therefore, often not sure how to find the things that are of interest to you. Typical geospatial browsers will provide you with common exploration tools that will most often include a slippy map combined with keyword search, categorized points of interest (POIs) and a fixed set of filters. But, all of these imply either that you know what it is you’re looking for, or that the preset collection of POIs and criteria will be enough to satisfy your needs. In real life, however, those needs will often be affected by the given context, which is, in turn, dependent on multiple, dynamic factors, such as the place you’re visiting, your mood, interests, background etc. Imagine using your favorite geospatial browser to answer the following question:

“Where are the nearest buildings designed by Frank Lloyd Wright, typical of the Prairie School movement?”

GEM (Geospatial-semantic Exploration on the Move) is the very first geospatial exploration tool that offers a rich mobile experience and overcomes the abovementioned limitations of conventional solutions by exploiting all strengths of the Linked Open Data paradigm, such as built-in semantics in open, crowd-sourced knowledge found in publicly available sources, such as DBpedia, loaded and filtered on-demand, according to user’s needs, in order to prevent maps from overpopulating.

GitHub: https://github.com/GeoKnow/GEM

Linked Map

Francisco J. Lopez-Pellicer | Posted on: August 24, 2014

The Linked Map team is willing receive feedback on the results of the project. This project is a short term project (less than 1 year) and it is part of the larger FP7 project PlanetData. Linked Map address the development of a standard WMS that, at the same time, is a LD node offering read/write access to geographic knowledge. This vision is applied in a challenging scenario: the use of crowdsourcing techniques to improve the quality of the automatic integration of a National Map with existing VGI data.

Up to date we have developed:

A transparent semantic proxy for WMS 1.3.0 (deliverable D17.1)
Transformation into RDF of a National Map (BCN/BTN25 provided by IGN.es) and a VGI dataset (a portion of OSM) (deliverable D16.3)
Large scale alignment of both datasets using only name, type and location properties (deliverable D16.3)
Annotation of the resulting RDF datasets at feature level with W3C PROV ontology (deliverable D16.1 and implementation D16.3)
A web application that enables a basic crowdsourced approach to the quality assessment of mappings and data (deliverable D18.1). The early beta version is available at http://linkedmap.unizar.es/crowdsourcing-platform/

Deliverables are available for download at http://planet-data-wiki.sti2.at/web/Deliverables and http://linkedmap.unizar.es/deliverables/

We would very appreciate your comments.

W3C – Ontos Event about GeoKnow Generator

Daniel Hladky | Posted on: April 25, 2014

The GeoKnow project has been running over one year and is proud to show the first results of the research project during the W3C Switzerland event on May 22, 2014. Ontos, a W3C member and a partner of the GeoKnow project will demonstrate the GeoKnow Generator during the talk “Linked Open Data”. The event takes place in 1700 Fribourg, Switzerland and is free of charge. More information about the event and registration is available at the following link:

http://www.ontos.com/web-25-celebrating-25-years-of-the-web

GeoKnow Tutorials available

Daniel Hladky | Posted on: March 20, 2014

The GeoKnow team has created the first tutorials showing how to work with the GeoKnow Generator and the tools. The team plans to extend the list of tutorials during 2014 helping everybody to get a better understanding on how to work with geospatial data using the GeoKnow LD stack.

The GeoKnow consortium also welcomes everybody to work with the prototype available at http://generator.geoknow.eu:8080/generator/#/home. Just keep in mind it is the demo server and as with many software projects some minor problems can occur.
Any feedback is welcome via our Twitter channel https://twitter.com/geoknow.

GeoKnow Workshop at EDF 2014

Daniel Hladky | Posted on: March 1, 2014

One of the most important event in Europe about Open Data is the European data forum. During the event that will take place in March in Athens several workshop about tools and applications are organised. The GeoKnow consortium will present the LinkedDataEurope workshop the current version of the research project. During the talk we will show the GeoKnow research objectives related to geospatial linked open data, the GeoKnow Generator and various tools that help to fulfil the Linked Data Lifecycle with geospatial data.

Following is the link to the slides that will be presented at the workshop:
http://www.linkeddataeurope.eu/wp-content/uploads/2014/01/GeoKnow_Short.pdf

Geospatial Data User Survey Results by GeoKnow

Alejandra Garcia Rojas | Posted on: May 6, 2013

In the past month (April 2013), we invited geospatial data consumers and providers, GIS experts and Semantic Web specialists to participate in our Geospatial Data Users Survey. The goal of this survey was to collect general use cases and user requirements from people outside the GeoKnow consortium. We publicised the survey using mailing lists and social networks, and it was available for 25 days. During this period we received 122 responses, of these we had 51 full responses and 71 incomplete ones. Since we were interested in having good quality surveys, so we performed a manual control, which resulted in 39 useful responses – not too bad. In this blog post, we aim to show some interesting results from our survey. If you are interested to learn more about the results of this survey, you can check the public derivable available here.

User Scenarios

One of the goals of this survey was to learn more use cases different from those we already consider in the project. Thus, we asked participants how they use geospatial data in their work. To analyse this question, we grouped answers in different types which is shown in the graph at the right. Most of the scenarios were about visualisation and analysis, followed by geospatial data creation scenarios.

UserScenarionByType

Functionalities

We asked users for the most popular tools they use at their work. Responses to this question were OSM and Google Maps/Earth, as well as other GSI. After we asked about the features they like the most about these tools, participants reflected preference by easy to use and free tools for their work, referring to their popular choices of Google Maps or OSM. Also having an API to interact with the application is important. The fact that applications provided data that can be integrated was also appreciated. GIS applications were considered as difficult. Integration and interoperability were mentioned as goals. Besides the previous question, we were also interested in knowing the missing functionalities that may improve their work. A list of these functionalities grouped by the related work package within GeoKnow is presented in the image below.

Conclusions

This survey allow us to learn from different use cases, main features used, and desired functionalities, that are to be considered in the creation of the GeoKnow Generator. Some important high level findings from the survey were the emphasis in interoperability and reusability through open APIs and approachable visualisation components, support for common geospatial data formats and geodbs, and the necessity of simple tools to support data integration/reuse from geospatial LOD sources. We also found that some of the ideas of the GeoKnow project are further supported by user requirements like the integration of private and public data and the importance of using the web as an integration platform.

A Survey for Geospatial Data Users

Jon Jay Le Grange | Posted on: March 28, 2013

Many different applications we deal with on a daily basis have some kind of geographic dimension. This geospatial information is normally required for decision making at different levels. However, this information is dispersed among a multiplicity of sources. At GeoKnow we aim to make information seeking easier by allowing exploration, editing and interlinking of heterogeneous information sources with a spatial dimension.

Now we are interested in getting to know the people that face these kinds of issues in their everyday work. We have created a survey to help us to understand and to hear more about their experience with geospatial data. This survey targets geospatial data consumers and providers, and GIS users interested in having an integrated web of geospatial data.

If you use geospatial data in your work, your contribution in this survey will be highly appreciated. The outcome of this survey will impact the use cases and requirements for the GeoKnow project, which aims to create a versatile software framework to rapidly generate spatial semantic web applications.

We are offering a 20 euro Amazon voucher to the first 50 completed surveys. Willing to participate? Please go right away to:

The Geoknow Survey

We value your participation!

Community & Business Groups

Geospatial Semantic Web Community Group

No Reports Yet Published

GeoKnow Public Datasets

FAGI-gis: fusing geospatial RDF data

OSMRec – Α tool for automatic annotation of spatial entities in OpenStreetMap

Geospatial-semantic Exploration on the Move

Linked Map

W3C – Ontos Event about GeoKnow Generator

GeoKnow Tutorials available

GeoKnow Workshop at EDF 2014

Geospatial Data User Survey Results by GeoKnow

A Survey for Geospatial Data Users

Tools for this group

Pages

Group closed

Archives

Categories

Footer Navigation

Navigation

Contact W3C

W3C Updates