ISSUE-29 (Well-formedness): Criteria for well-formedness [Data Cube Vocabulary] + FW: [publishing-statistical-data] Re: qb:DimensionProperty subClassOf qb:CodedProperty ?

Hello,

I suggest adding the issue of creating hierarchies of literal values, which we discussed a while ago in the QB Google group (see below or [1]), either as a separate issue or as an aspect of ISSUE-29.

The problem is the following:

If observations use literal values as dimension values, how can one create a hierarchy (a qb:codeList pointing to a skos:ConceptScheme) over these values?

One possible solution is to create instances of skos:Concept representing and linking to those literal values using skos:notation. 

However, this makes it more difficult for applications to query for observations, since they do not know whether the observations will actually use literal values or skos:Concepts. 
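
As a rough sketch of what an application would have to do under this approach, here is the lookup in plain Python without any RDF library. The class names, the to_concept helper and the estat:y2000s parent concept are made up for illustration; only estat:y2003 and the idea of matching literals against skos:notation come from the discussion below.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union


@dataclass(frozen=True)
class Literal:
    """A typed RDF literal, e.g. Literal("2003", "xsd:gYear")."""
    lexical: str
    datatype: str


@dataclass(frozen=True)
class Concept:
    """A skos:Concept with one skos:notation and an optional skos:broader parent."""
    uri: str
    notation: Literal
    broader: Optional[str] = None


def to_concept(value: Union[str, Literal],
               code_list: Dict[str, Concept]) -> Optional[Concept]:
    """Resolve a dimension value to a concept of the code list.

    A URI (plain str) is looked up directly; a Literal is matched
    against the skos:notation of each concept in the code list.
    """
    if isinstance(value, Literal):
        for concept in code_list.values():
            if concept.notation == value:
                return concept
        return None
    return code_list.get(value)


# Hypothetical code list for a time dimension (estat:y2000s is an assumption).
CODE_LIST = {
    "estat:y2000s": Concept("estat:y2000s", Literal("2000s", "xsd:string")),
    "estat:y2003": Concept("estat:y2003", Literal("2003", "xsd:gYear"),
                           broader="estat:y2000s"),
}
```

With such a helper, an application could treat an observation that uses the literal and one that uses the concept URI in the same way, at the cost of this extra lookup for every dimension value.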

Best,

Benedikt

[1] <http://groups.google.com/group/publishing-statistical-data/browse_thread/thread/9903b29e670c5c94/c2386716f7b69cb4?lnk=gst> 



-----Original Message-----
From: publishing-statistical-data@googlegroups.com [mailto:publishing-statistical-data@googlegroups.com] On Behalf Of Dave Reynolds
Sent: Tuesday, December 06, 2011 11:53 PM
To: publishing-statistical-data@googlegroups.com
Cc: Dominik Siegele
Subject: Re: [publishing-statistical-data] Re: qb:DimensionProperty subClassOf qb:CodedProperty ?

Hi Benedikt,

I generally agree with your approach and with Richard's comments.

The notion of defining skos:Concepts but then using the literal values in the data is a little odd but I can see some point to it.

The one thing I would point out is that for dates the Interval URI Set [1] and associated service may be useful to you. We've tended to use that for all the Data Cube sets that we've published. One advantage of using the resources as the dimension values instead of date literals is that it makes it possible to query the data via other properties of those resources. For example, with data at a day resolution we can include the Interval Set properties in the published data and so pick out values for a month or year or government year without having to do time calculations in the SPARQL query. If your data is only at calendar year resolution that may be less relevant to you.

Cheers,
Dave

[1]
http://www.epimorphics.com/web/wiki/using-interval-set-uris-statistical-data

On Tue, 2011-12-06 at 22:05 +0000, Richard Cyganiak wrote: 
> Hi Benedikt,
> 
> On 6 Dec 2011, at 20:37, Benedikt Kämpgen wrote:
> > Suppose the task is to represent date as a date literal, geo as specific instances of NUTSRegion, and sex as instances of skos:Concept for male/female/total. We have this task, e.g., at [1], where we are representing Eurostat [2] data using the RDF Data Cube Vocabulary (QB).
> > 
> > The approach that we now consider to implement: 
> > *Optional: rdfs:range for DimensionProperty in order to have an understanding of what kinds of things are represented by the members, e.g., xsd:date for dc:date and NUTSRegion for geo.
> 
> That makes sense. I would always specify this when no qb:codeList is present.
> 
> > *qb:codeList for DimensionProperty in order to list the possible 
> > skos:Concepts that represent values of the dimension, e.g., 
> > estat:y2003 for one specific year, estat:AT for one specific 
> > country, and estat:F for one specific gender
> 
> I would use qb:codeList only with skos:ConceptSchemes. It looks like your intention is to create concept schemes for all dimensions, including time. I think that's ok.
> 
> > *skos:Concepts have as rdfs:seeAlso instances linked that they 
> > represent, e.g., estat:AT rdfs:seeAlso dbpedia:Austria
> 
> I would use skos:closeMatch (or skos:exactMatch if you're a radical; or skos:relatedMatch if you're a coward) instead of rdfs:seeAlso.
> 
> This has the consequence of typing dbpedia:Austria as a skos:Concept, but that surely is fine, given the definition of skos:Concept:
> 
> [[
> A SKOS concept can be viewed as an idea or notion; a unit of thought. However, what constitutes a unit of thought is subjective, and this definition is meant to be suggestive, rather than restrictive.
> ]]
> 
> Some might say: “A country is not an idea! It exists in the real world!” But I don't find that such arguments hold water. Countries are created and abolished through legislation and treaties; and decades can pass where large parts of mankind disagree on the question whether a particular entity is a country or not. Countries are really just the taxonomist's business objects of political geographers.
> 
> > and as rdfs:label Literal values linked that they represent, e.g., 
> > estat:y2003 rdfs:label "2003"^^xsd:date
> 
> Use skos:notation instead of rdfs:label. Note that "2003"^^xsd:date is ill-typed. It has to be "2003-01-01"^^xsd:date, or "2003"^^xsd:gYear.
> 
> > * The observations can either use the represented instances 
> > directly, e.g., dbpedia:Austria and "2003"^^xsd:date, or they can 
> > use the skos:Concept representations, e.g., estat:F
> 
> I agree that this makes sense in the case of literals (dates in particular). For URIs, it seems overly complicated. Why not just define a concept scheme that directly includes dbpedia:Austria as a concept using skos:inScheme?
> 
> > This approach brings the following advantages: 
> > * We can limit the number of literal values of a specific dimension
> 
> Right, and I like this. The logic would be: If a dimension property has a qb:codeList and is used with literal values, then assume that the literal values are the skos:notations of the concepts in the code list.
> 
> > * We can have relationships between dimension values, e.g., for 
> > hierarchies, and still use the literal values or the non-information 
> > URIs in the observations
> 
> Yup.
> 
> > * Publishers may still represent skos:Concepts as possible dimension values and can link them using owl:sameAs to the actual represented values.
> 
> Do not *EVER* link to a skos:Concept using owl:sameAs! ;-)
> 
> Seriously, skos:xxxMatch is always better for that purpose.
> 
> Best,
> Richard
> 
> 
> 
> > Although this may be wrong, as it would state the term (e.g., skos:Concept Germany) and the actual thing (dbpedia:Germany) as being the same thing, applications that would work with the explained approach would also work here.
> > 
> > I would be glad to hear your opinions on this.
> > 
> > Regards,
> > 
> > Benedikt
> > 
> > [1] <http://estatwrap.ontologycentral.com/page/teilm020>
> > [2] <http://estatwrap.ontologycentral.com/>
> > 
> > --
> > AIFB, Karlsruhe Institute of Technology (KIT)
> > Phone: +49 721 608-47946
> > Email: benedikt.kaempgen@kit.edu
> > Web: http://www.aifb.kit.edu/web/Hauptseite/en
> > 
> > 
> > 
> >> -----Original Message-----
> >> From: publishing-statistical-data@googlegroups.com 
> >> [mailto:publishing-statistical-data@googlegroups.com] On Behalf Of 
> >> Benedikt Kämpgen
> >> Sent: Friday, October 28, 2011 11:39 AM
> >> To: publishing-statistical-data@googlegroups.com
> >> Cc: Dominik Siegele
> >> Subject: RE: [publishing-statistical-data] Re: qb:DimensionProperty 
> >> subClassOf qb:CodedProperty ?
> >> 
> >> Hi Dave,
> >> 
> >> Thanks for your answer.
> >> 
> >>> This is one area where I think the current QB vocabulary could do 
> >>> with some extension. It would be nice to be able to define the 
> >>> property that is used for hierarchical relationships between 
> >>> dimension values when those are not skos:Concepts (and thus skos:broader/narrower).
> >> Ditto.
> >> 
> >> For example, we have now tried to model it for Eurostat correctly, 
> >> not using skos:ConceptScheme, but the actual regions from 
> >> nuts:NUTSRegion, see [1] and definition of geo dimension.
> >> 
> >> However, with this approach we can no longer say that only 
> >> certain regions are used in the dataset.
> >> 
> >> Best,
> >> 
> >> Benedikt
> >> 
> >> 
> >> [1] http://estatwrap.ontologycentral.com/dsd/tsieb010
> >> 
> >> 
> >> --
> >> AIFB, Karlsruhe Institute of Technology (KIT)
> >> Phone: +49 721 608-47946
> >> Email: benedikt.kaempgen@kit.edu
> >> Web: http://www.aifb.kit.edu/web/Hauptseite/en
> >> 
> >> 
> >> 
> >>> -----Original Message-----
> >>> From: publishing-statistical-data@googlegroups.com 
> >>> [mailto:publishing-statistical-data@googlegroups.com] On Behalf 
> >>> Of Dave Reynolds
> >>> Sent: Wednesday, October 19, 2011 11:16 AM
> >>> To: publishing-statistical-data@googlegroups.com
> >>> Subject: RE: [publishing-statistical-data] Re: qb:DimensionProperty 
> >>> subClassOf qb:CodedProperty ?
> >>> 
> >>> On Wed, 2011-10-19 at 11:01 +0200, Benedikt Kämpgen wrote:
> >>>> Hi,
> >>>> 
> >>>> I have a follow-up question regarding dimensions and code lists:
> >>>> 
> >>>> In QB, a dimension value used by an observation typically is an 
> >>>> instance of skos:Concept from a skos:ConceptScheme.
> >>> 
> >>> Not required, can also be instances of some defined 
> >>> [rdfs|owl]:Class
> >>> 
> >>>> I have seen some examples of
> >>>> datasets [1,2], that then link from such instances of 
> >>>> skos:Concept with owl:sameAs to entities they represent, e.g.
> >>>> <http://dbpedia.org/resource/Spain>. I guess this is fine from a 
> >>>> practical point of view, but is it not semantically incorrect;
> >>> 
> >>> Indeed, not correct.
> >>> 
> >>>> I am wondering whether
> >>>> this is really intended and will lead to problems, later,
> >>> 
> >>> In the datasets we've published we've tended to use "normal" 
> >>> resources directly for things like geographies and time periods 
> >>> and only use skos:Concepts for things that are definitely 
> >>> classification schemes - e.g. gender or age groups.
> >>> 
> >>>> e.g., if we want
> >>>> to define hierarchies on dimension values.
> >>> 
> >>> This is one area where I think the current QB vocabulary could do 
> >>> with some extension. It would be nice to be able to define the 
> >>> property that is used for hierarchical relationships between 
> >>> dimension values when those are not skos:Concepts (and thus skos:broader/narrower).
> >>> 
> >>> Dave
> >>> 
> >>> --
> >>> Epimorphics Ltd                        www.epimorphics.com
> >>> Court Lodge, 105 High Street, Portishead, Bristol BS20 6PT
> >>> Tel: 01275 399069                     Mobile: 07906 628814
> >>> 
> >>> Epimorphics Ltd. is a limited company registered in England 
> >>> (number
> >>> 7016688)
> >>> Registered address: Court Lodge, 105 High Street, Portishead, 
> >>> Bristol
> >>> BS20 6PT, UK
> >>> 
> >>> 
> >>> 
> >>>> 
> >>>> Regards,
> >>>> 
> >>>> Benedikt
> >>>> 
> >>>> 
> >>>> [1] <http://estatwrap.ontologycentral.com/data/tsieb010>
> >>>> [2] <http://estatwrap.ontologycentral.com/dic/geo#ES>
> >>>> 
> >>>> 
> >>>> --
> >>>> AIFB, Karlsruhe Institute of Technology (KIT)
> >>>> Phone: +49 721 608-47946
> >>>> Email: benedikt.kaempgen@kit.edu
> >>>> Web: http://www.aifb.kit.edu/web/Hauptseite/en
> >>>> 
> >>>> 
> >>>> 
> >>>>> -----Original Message-----
> >>>>> From: publishing-statistical-data@googlegroups.com 
> >>>>> [mailto:publishing-statistical-data@googlegroups.com] On Behalf 
> >>>>> Of Richard Cyganiak
> >>>>> Sent: Friday, September 23, 2011 8:02 PM
> >>>>> To: publishing-statistical-data@googlegroups.com
> >>>>> Subject: Re: [publishing-statistical-data] Re: qb:DimensionProperty 
> >>>>> subClassOf qb:CodedProperty ?
> >>>>> 
> >>>>> Hi Bill,
> >>>>> 
> >>>>> On 23 Sep 2011, at 08:05, BillRoberts wrote:
> >>>>>> But I see your point Richard. Maybe I'm thinking too much like 
> >>>>>> a physicist instead of a statistician!
> >>>>>> 
> >>>>>> In practice most of these continuous variables are 'chunked': 
> >>>>>> time into years or months, space into a list of points or 
> >>>>>> regions, age into
> >>>>>> 5 year bands etc etc
> >>>>> 
> >>>>> Exactly. Statistics tend to be aggregate data, where many 
> >>>>> individual “events” or “facts” (which often have continuous 
> >>>>> attributes) have been lumped together into a single 
> >>>>> observation. The values along a number of dimensions have been 
> >>>>> “classified” into discrete ranges, and everything that falls 
> >>>>> into the same bucket (cube cell) has been “tabulated” into a 
> >>>>> single total or average number, and we're interested only in 
> >>>>> these totals.
> >>>>> 
> >>>>> This aggregation can remove a lot of valuable detail, but also 
> >>>>> makes it easier to ask higher-level questions (especially for 
> >>>>> dimensions where the classification is hierarchical), and may 
> >>>>> make the datasets smaller and may anonymize the data to some 
> >>>>> extent.
> >>>>> 
> >>>>> 
> >>>>> If you have some values that you *truly* want to model as 
> >>>>> continuous, then you should ask yourself if you aren't really 
> >>>>> looking at a measure rather than a dimension.
> >>>>> 
> >>>>> Best,
> >>>>> Richard
> >>>>> 
> >>>>> 
> >>>>> 
> >>>>>> 
> >>>>>> So it's making a bit more sense to me now.
> >>>>>> 
> >>>>>> On Sep 22, 7:08 pm, Richard Cyganiak <rich...@cyganiak.de> wrote:
> >>>>>>> On 22 Sep 2011, at 16:18, BillRoberts wrote:
> >>>>>>> 
> >>>>>>>> But there are many dimension properties with values that are 
> >>>>>>>> not coded or codelist-able.
> >>>>>>> 
> >>>>>>> With the exception of time, I don't think that's true.
> >>>>>>> 
> >>>>>>> Can you give an example of some other dimension whose values 
> >>>>>>> don't come from a controlled/managed set of terms that ought 
> >>>>>>> to be represented as a SKOS concept scheme or RDFS class?
> >>>>>>> 
> >>>>>>> Best,
> >>>>>>> Richard
> >>>> 
> >>> 
> > 
> 

Received on Thursday, 22 March 2012 10:07:32 UTC