Dagstuhl workshop breakout group on vocabularies

01 Jul 2009

See also: IRC log


Chair: Guus Schreiber
Scribe: cerealtom, ivan

See also the slides of the report of the breakout group, as given by Ivan


topic 1. vocab hosting and persistence

re hosting there is the vocab hosting best practices document from swdbp

re vocabulary alignments there are various developments...

sorry, missed the details of the mentioned alignment work

OMV: ontology metadata vocabulary

re URI schemes...

<danbri> danbri: I have a specific proposal on federating independent namespaces, DNS risk reduction & long-term persistence



(and associated purl software)

danbri++ re dns risk reduction

also the google vocab namespace

let's not enumerate all of these right now

question: is physical hosting an issue?


consensus: generally yes

guus: are we just going to provide a list of existing services or develop some guidelines?

now covering the state of the art wrt directory services for vocabs

examples: swoogle, watson, falcons, talis schemacache

guus: distinction: concept search (e.g. watson/swoogle) vs schema search (schemacache)

schemaweb.info is outdated and not maintained. schemacache needs to be pushed live

schemacache current uri: http://schemacache.test.talis.com/

crawling vs curation

ivan points out there are some well used offerings in specific domains

distinction: general offering versus domain dependent

topic 3. versioning

publishing versioning histories for vocabs

(says danbri)

danbri mentions tom baker from DC

"when did dc:audience appear?" all these aspects are recorded
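A minimal sketch of what a machine-readable version history could look like, assuming each change to a term is recorded as a dated event. The `TermEvent` record, the `ex:` terms and all dates are hypothetical and invented for illustration; this is not the Dublin Core history format.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class TermEvent:
    term: str    # prefixed name of the vocabulary term
    change: str  # "added", "modified" or "deprecated"
    when: date   # date the change was published


# Hypothetical history: terms and dates are made up for this sketch.
HISTORY = [
    TermEvent("ex:title", "added", date(2000, 1, 15)),
    TermEvent("ex:audience", "added", date(2001, 6, 1)),
    TermEvent("ex:title", "modified", date(2002, 3, 10)),
    TermEvent("ex:audience", "deprecated", date(2008, 9, 30)),
]


def first_appearance(term: str, history=HISTORY) -> Optional[date]:
    """Return the earliest date on which `term` was added, or None."""
    dates = [e.when for e in history if e.term == term and e.change == "added"]
    return min(dates) if dates else None


def terms_as_of(day: date, history=HISTORY) -> set:
    """Return the terms that were added and not yet deprecated on `day`."""
    live = set()
    for event in sorted(history, key=lambda e: e.when):
        if event.when > day:
            break
        if event.change == "added":
            live.add(event.term)
        elif event.change == "deprecated":
            live.discard(event.term)
    return live
```

With such a record, a question like "when did this term appear?" becomes `first_appearance("ex:audience")`, and a consumer can reconstruct the vocabulary as it stood on any given date with `terms_as_of`.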

also cf the w3c document versioning scheme

danbri would like a mechanism for people to declare downstream dependencies on vocabularies

ivan: it's a hairy, i.e. research, problem: technical but also sociological

<ivan> Open provenance model

also a need to revisit some of the named graphs work and provide some guidance to the community at large on usage of NG

this was topic 4 (provenance) btw

preventing man in the middle attacks is a good use case for signing vocabs with pgp

research issue: robustness of vocabs and class hierarchies

<ivan> chris: falcon and others already have heuristics built in, ie, they reject statements that would define a superclass of foaf:Person

and trust mechanisms to address this
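The kind of heuristic chris describes could be sketched as follows. This is an illustration only, not the actual Falcons code: the `AUTHORITATIVE` table, the function names and the rule that only a namespace's own document may add superclasses to its classes are all assumptions made for the example.

```python
RDFS_SUBCLASSOF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
FOAF_NS = "http://xmlns.com/foaf/0.1/"

# Assumption: each protected namespace has exactly one authoritative source.
AUTHORITATIVE = {FOAF_NS: "http://xmlns.com/foaf/0.1/"}


def is_hijack(triple, source):
    """True if a non-authoritative source asserts a superclass for a protected class."""
    subject, predicate, _obj = triple
    if predicate != RDFS_SUBCLASSOF:
        return False
    for namespace, owner in AUTHORITATIVE.items():
        if subject.startswith(namespace) and source != owner:
            return True
    return False


def filter_triples(triples, source):
    """Keep only the triples from `source` that pass the hijack check."""
    return [t for t in triples if not is_hijack(t, source)]
```

Under these assumptions a third-party document may still declare its own class a subclass of foaf:Person, but a statement giving foaf:Person a new superclass is dropped unless it comes from the FOAF namespace document itself.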

<ivan> cerealtom: for rights, the legal work has been done at Open Data Commons and CC0 at Science Commons

<ivan> ... active usage of this should be pushed

<ivan> danbri: that is mostly a data issue

<danbri> tom: many diff jurisdictions have diff approaches

<danbri> ...whether there's some notion of a db right

<danbri> ...diff legal frameworks

<danbri> ...for making legal claims, of ownerships over sets of data, collections etc

<danbri> ... see work from Jordan Hatcher

<danbri> ... one strategy: put all in public domain

<danbri> ... layer on that social norms

<danbri> ... statements and requests for how it is used

<danbri> ... most ppl want attribution for what they do

<danbri> ... typically don't want to chase that thru the law courts, but want some ack

<danbri> ... cc lets you waive formal rights but express social norms

<danbri> ... open data commons

<danbri> stefan: eg i'd like to make my data public, but not have it aggregated

<ivan> -> www.opendatacommons.org Open data commons' site

<danbri> danbri: see TAMI and respect my privacy work from MIT/DIG (note : Oshani is in SocialWeb XG ...)

<danbri> see http://dig.csail.mit.edu/2009/SocialWebPrivacy/

<danbri> tom: talis community license > odc

<danbri> http://www.w3.org/2009/07/01-swdag2009-irc

Federation of Independent Namespaces

mutual back-watching for namespace hosts

proposal from danbri and tom baker

<hhalpin> http://dublincore.org/documents/singapore-framework/

<hhalpin> Application profiles

<aldogan> whole range of practices from bare reuse of ontology entities from other namespaces, to owl:imports

<aldogan> one intermediate practice is declaring a module of an ontology to be imported (this works for reasoning purposes)

<hhalpin> so you can think about this in two ways

<hhalpin> abstract space of possible combinations

<hhalpin> which subset of that on some level that can be characterized as graph patterns

<hhalpin> are good practices

<hhalpin> on a practical level:

<hhalpin> let's think about practical examples

<hhalpin> on an instance level

<hhalpin> that can be put in tutorials

<hhalpin> guus: alignments are just vocabularies that link.

<hhalpin> bizer: but you can have transformation rules rather than alignments via a common concept, à la SKOS

<hhalpin> transformation rules = data fusion

<ivan> ivan: there is also the issue of the granularity of terms within vocabularies, if I refer to a specific term, what else do I pull in

<hhalpin> how can alignments that are manually made

<hhalpin> rather than automatically made

<aldogan> is intensional vs. extensional alignments an issue here? or are we just discussing about publishing alignments?

<hhalpin> be found and used, and validated.

<hhalpin> I think intensional vs. extensional is vocabulary-dependent.

<hhalpin> research challenge: lack of semantics and semantics of alignment

<hhalpin> so you can figure out how to reason properly over aligned vocabularies (again, how much of another vocabulary do you pull in?)

<aldogan> owl:sameAs often used to map any kind of similarity

<aldogan> but this is on the social side, since owl:sameAs has a proper (extensional) semantics
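To make the point about owl:sameAs concrete, here is a small sketch of its identity semantics: sameAs is treated as an equivalence relation, so every statement about one resource also holds for the others in its class. The union-find helper, the `describe` function and the `ex:` data are assumptions for illustration, and only subjects are canonicalised to keep the example short.

```python
OWL_SAMEAS = "owl:sameAs"


def same_as_closure(triples):
    """Build a find() function giving each resource's owl:sameAs class representative."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == OWL_SAMEAS:
            parent[find(s)] = find(o)  # merge the two equivalence classes
    return find


def describe(resource, triples):
    """Return all (predicate, object) pairs that hold for `resource` under sameAs."""
    find = same_as_closure(triples)
    root = find(resource)
    return {(p, o) for s, p, o in triples
            if p != OWL_SAMEAS and find(s) == root}


# Hypothetical data: a "similar" city and region linked with owl:sameAs.
DATA = [
    ("ex:ParisCity", OWL_SAMEAS, "ex:ParisRegion"),
    ("ex:ParisCity", "ex:population", "2 million"),
    ("ex:ParisRegion", "ex:population", "12 million"),
]
```

Here `describe("ex:ParisCity", DATA)` returns both population values, which is the formally correct consequence of the link and shows why using owl:sameAs for loose similarity causes trouble downstream.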

<hhalpin> directory services make it hard to find alignments

<hhalpin> we need directory services for alignments and make it easier to find them, standard way to host, as well as who said what alignment and provenance of alignment.

<hhalpin> guus: named graphs may potentially be a way to get alignments off the ground

<hhalpin> guus: is this a w3c challenge for named graphs?

<hhalpin> ivan: how can I make ANY inferences in a multi-namespace document?

<hhalpin> ivan: does current RDF-based semantics work in multi-namespace documents? Does DL?

<hhalpin> ivan: Probably for RDF(S), not for DL.

<ivan> -------

<ivan> Start of session again after lunch

<ivan> guus Trying to summarize the research challenges and relations to other groups

<ivan> * Semantics of alignment

<ivan> * language for publishing alignments

<ivan> * characterizing graph patterns

<ivan> * internal and external dependencies across vocabularies

<ivan> * ranking and characterizing directory services

<ivan> * inference consequences of cherry-picking

<aldogan> isn't the last a special case of the previous one?

<ivan> * quality criteria for ranking deployed vocabularies and alignments

<ivan> (forget the last one, rephrase:)

<ivan> * metrics for selecting deployed vocabularies and vocabulary elements

<hhalpin> asun: not every language expresses the same concepts

<ivan> * problems of multilingual vocabulary alignments

<hhalpin> asun: like different kinds of fishes exist in say Greek, not English.

<hhalpin> harry: but we need to encourage schemas to have multiple languages.

<hhalpin> TAG's latest versioning:

<hhalpin> http://www.w3.org/2001/tag/doc/versioning-html/

<ivan> * vocabulary versioning in distributed environments

<ivan> Slogan: do not delay the future

<ivan> * SW specific provenance models

<danbri> 3 or 4 strands that must be brought together: hacker/internet/crypto provenance, database provenance, logical formalism / kr scene (OWL, RIF, ...) .... and then the RDF/dataweb/web2 scene where the best we have is SPARQL graphs

finding the top 5 research issues

<danbri> my personal ranking of topics:

<danbri> (this isn't private voting is it?)

<danbri> n2. graph patterns.

<danbri> n4. provenance models.

<danbri> n5. quality metrics for selecting vocabs.

<danbri> n3. vocab dependencies and cherry picking.

<danbri> n1. semantics of alignments / alignment language.

<danbri> n6. multilingual vocabs.

<ivan> After due deliberations and semi-secret voting the result is as follows:

<natasha> Final list of research challenges

<natasha> 1. Quality metrics for selecting vocabularies and vocabulary elements

<natasha> 2. Logical consequences from cherry picking vocabulary elements from distributed vocabularies

<natasha> 3. SW-specific provenance models

<natasha> 4. Semantics of alignments, including languages and models for publishing alignments

<natasha> 5. Characterizing graph patterns in published web data

<aldogan> 1. Semantics of alignments, including languages and models for publishing alignments

<aldogan> 2. vocab dependencies and cherry picking

<aldogan> 3. Quality metrics for selecting vocabularies and vocabulary elements

<aldogan> 4. provenance models

<aldogan> multilinguality and graph patterns are very important as well, uhm

<aldogan> unsure how much overlap exists among the topics that emerged, however

short term goals

<ivan> Subtopic: White paper

<ivan> guus outlining a structure

<ivan> - URI Scheme (PURL, etc)

<ivan> - Physical hosting, backup policies

<ivan> - Versioning strategy (current practice like W3C practice, DC practice)

<ivan> - DNS redirect advice, examples, templates

<ivan> - minimal attribution metadata

<ivan> - link & advice with respect to Open Data Commons

<hhalpin> i am listening in IRC

<hhalpin> "listening" via text

Summary of Action Items

[End of minutes]

Minutes formatted by David Booth's scribe.perl version 1.135 (CVS log)
$Date: 2009/07/02 16:11:22 $