From W3C Wiki
RDFizing and Interlinking the EuroStat Data Set Effort - riese
A LinkingOpenData project
Note: riese was launched on 2008-01-31, cf. http://riese.joanneum.at/
This page is the main resource for the "RDFizing and interlinking the EuroStat data set" - riese project, an effort in the realm of the SWEO LinkingOpenData project.
We aim at RDFizing the whole EuroStat data set. Just to give you an idea of the scale: we're talking about a conservatively estimated 4,000,000,000 (4 billion) RDF triples. This estimate is based on the EuroStat TOC, assuming some 10 triples per item value.
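The arithmetic behind this estimate can be sketched as follows. The 10 triples per item value is the assumption stated above; the number of item values is only back-derived from the 4 billion figure and is not an official EuroStat count.

```python
# Back-of-the-envelope check of the triple estimate.
# TRIPLES_PER_ITEM is the assumption from the text; the item count is
# back-derived from the 4 billion figure, not taken from EuroStat.
ESTIMATED_TRIPLES = 4_000_000_000
TRIPLES_PER_ITEM = 10


def implied_item_values(total_triples: int, triples_per_item: int) -> int:
    """Return the number of item values implied by a triple estimate."""
    return total_triples // triples_per_item


if __name__ == "__main__":
    items = implied_item_values(ESTIMATED_TRIPLES, TRIPLES_PER_ITEM)
    print(f"{items:,} item values")
```

In other words, the estimate corresponds to roughly 400 million individual item values in the data set.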
In a first step, the existing EuroStat data schema is recreated in RDF (using RDF-S), along with a mapping to an RDF graph.
- Using rdfs:label instead of riese:hasLabel and riese:hasItemValue?
- Not sure about the time representation here: does TimePoint identify a single time point (yesterday,7pm) or a set of time points (everyday at 7pm)? In the first case we could just use dc:date for the sake of simplicity. For the second case, we might use a little bit of OWL-time?
- rdf:value instead of riese:hasDicValue and hasItemValue?
- wrt. the geographic parts, could we re-use the geonames ontology?
Michael's answers to YvesRaimond's comments:
- I actually thought about using rdfs:label everywhere, though I tend to introduce new props if they should state something semantically different. So I guess it might be a good idea to make riese:hasLabel an rdfs:subPropertyOf rdfs:label (or replace it), but for riese:hasItemValue I think it is cleaner to have a separate prop (it is the item's value, after all :)
- Ok! I guess it doesn't matter much in this case. However, I'd rather use nouns instead of verbs, so I would put riese:itemValue instead of riese:hasItemValue, and riese:label instead of riese:hasLabel.
- riese:TimePoint specifies the point in time (whose granularity is defined by the corresponding riese:TimeDataFormat) at which an item's value is valid (I'll make up an example soon ...); dc:date is not flexible enough wrt formatting; OWL time might be overkill, but I'll have a look at what we can utilise from it.
- Hmm, I guess I am not clear about that... So your time points are actually intervals (if it's yearly, then one time point covers an interval of one year). In this case, I would just use a "startsAt" property and a "duration" property (having xsd:dateTime and xsd:duration as their respective ranges); I think this should be expressive enough. I would also rename riese:TimePoint to riese:Interval. Actually, the more I think about it, the more I think the event ontology may be enough to cleanly model this "time dependent" part, as what we are trying to model is just a classification of a space/time region... This would look like:
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
    @prefix riese: <http://riese.joanneum.at/core#>.
    @prefix event: <http://purl.org/NET/c4dm/event.owl#>.
    @prefix time: <http://purl.org/NET/c4dm/timeline.owl#>.
    @prefix dic: <http://riese.joanneum.at/core#>.
    @prefix : <>.

    # Ontology
    riese:Item rdfs:subClassOf event:Event.
    riese:value rdfs:subPropertyOf event:literal_factor.
    riese:unit rdfs:subPropertyOf event:factor;
        rdfs:subPropertyOf riese:dic.
    riese:s_adj rdfs:subPropertyOf event:factor.
    # and so on for each DIC...

    # Some instance data
    # http://europa.eu.int/estatref/info/notes/en/read_me.htm
    :data a riese:Item;
        riese:value "11148";
        riese:unit "mio-eur";
        # are these properties operating over a discrete space? in this case we should
        # consider creating individuals for these, and we could just use riese:dic
        riese:s_adj "nsa";
        riese:partner "ext_eurozone";
        riese:flow "net";
        riese:indic "bp-100";
        event:place <http://dbpedia.org/resource/Europe>;
        event:time :int2004m05; # or just [time:at "2004-05"^^xsd:gYearMonth]
        .

    :int2004m05 a time:Interval;
        time:at "2004-05"^^xsd:gYearMonth;
        .
What do you think? The good thing is that it is extensible: if we happen to learn more about how these values are captured, we can still attach that to the event...
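The "startsAt" plus "duration" idea discussed above can be sketched in a few lines of Python. This is only an illustration under assumptions: the period-code patterns are hypothetical (only the monthly form is hinted at by :int2004m05 in the example), and the function name is made up for this sketch.

```python
import re
from typing import Tuple

# Hypothetical period codes: "2004" (yearly), "2004q2" (quarterly),
# "2004m05" (monthly). Only the monthly form is suggested by the
# example above; the other two are assumptions.
_PERIOD = re.compile(r"^(\d{4})(?:([mq])(\d{1,2}))?$")


def period_to_interval(code: str) -> Tuple[str, str]:
    """Map a period code to (xsd:dateTime start, xsd:duration)."""
    match = _PERIOD.match(code.lower())
    if match is None:
        raise ValueError(f"unrecognised period code: {code!r}")
    year, kind, number = match.groups()
    if kind is None:
        # A bare year covers the whole year.
        return f"{year}-01-01T00:00:00", "P1Y"
    n = int(number)
    if kind == "m":
        if not 1 <= n <= 12:
            raise ValueError(f"month out of range: {code!r}")
        return f"{year}-{n:02d}-01T00:00:00", "P1M"
    if not 1 <= n <= 4:
        raise ValueError(f"quarter out of range: {code!r}")
    first_month = 3 * (n - 1) + 1  # Q1 -> 01, Q2 -> 04, ...
    return f"{year}-{first_month:02d}-01T00:00:00", "P3M"


if __name__ == "__main__":
    print(period_to_interval("2004m05"))
```

The two returned strings would be the values of the proposed "startsAt" and "duration" properties; time zones are deliberately left out of this sketch.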
- rdf:value instead of riese:hasDicValue, again I'd prefer rdfs:subPropertyOf rdf:value, but open to discuss. Not sure about riese:hasItemValue ... you proposed rdfs:label earlier ... hm I guess we need a #swig session ;)
- reuse the geonames ontology: absolutely; AND interlink it ...
- change log for v0.1:
- removed riese:hasLabel and added rdfs:label, instead
- removed riese:hasItemValue and added rdf:value, instead
- added rdf:value to riese:TimePoint
- changed riese core NS to http://riese.joanneum.at/core#, which will be the actual server hosting the riese stuff
- Looking at the TOC of the EuroStat data set, it seems to be a good idea to use the so-called "open datasets" marked with "Full download" (and a trailing _t in the code) rather than the individual tables. For example, innore_t.tsv might be preferred over ir010.tsv - ir140.tsv. However, some data are only available as individual datasets/tables. There appears to be no difference between datasets and tables. In order to get all data, both datasets and tables have to be used.
- As the geo.dic offers more structured data, the proposed riese geo schema seems reasonable. Note: the cities and regions are available in the native language, i.e. it is "Wien", not "Vienna"; we might consider using an xml:lang tag? The files unload.dic contain exactly the same information as geo.dic. Other files offer deeper semantics as well.
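As a rough illustration of the language-tag idea for native-language names, the following Python sketch turns one dictionary entry into a language-tagged Turtle literal (Turtle's `@lang` suffix plays the role of xml:lang). The tab-separated "code, label" layout, the sample entry, and both function names are assumptions made for this sketch, not a description of the actual .dic format.

```python
from typing import Tuple


def parse_dic_line(line: str) -> Tuple[str, str]:
    """Split one dictionary line into (code, label).

    Assumes a tab-separated "code<TAB>label" layout, which is a guess
    about the .dic format rather than something confirmed here.
    """
    code, label = line.rstrip("\n").split("\t", 1)
    return code.strip(), label.strip()


def turtle_label(label: str, lang: str) -> str:
    """Render a label as a Turtle string literal with a language tag."""
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


if __name__ == "__main__":
    # Hypothetical entry; the code "at13" is made up for illustration.
    code, label = parse_dic_line("at13\tWien")
    print(code, turtle_label(label, "de"))
```

The language tag itself would have to come from somewhere else, for example from the country part of the region code; that mapping is not covered here.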
- A very simple import demonstrator - just some hacking; don't expect too much :)
- EuroStat home
- EuroStat copyright and licence policy
- code repository
Publications and Presentations
- Some slides introducing riese
People Interested in the Area
Please add yourself here in case you want to contribute (in terms of schema, mapping, development, testing, UI, etc.).