Vocabulary Preservation Session, DC2013, Morning session

03 Sep 2013

See also: IRC log

See also: Session description and agenda, with links to presentations

See also: Session Discussion paper


Felix Ostrowski (GraphThinking)
Daniel Vila Suero (Ontology Engineering Group (UPM))
Daniel Garijo (UPM)
Alexander Haffiner (Deutsche National Bibliothek)
Malar Thomas (National Library Board Singapore)
Eric Childress (OCLC)
Stefanie Ruehle (SUB Goettingen)
Bob Bailey (Thomson Reuters)
Antoine Isaac (Vrije Universiteit Amsterdam)
Dan Brickley (Google)
Eva Mendez (University Carlos III of Madrid)
Ivan Herman (W3C)
Tom Baker (DCMI)
Bernard Vatant (Mondeca)
Pierre-Yves Vandenbussche (Fujitsu)
Andrea Perego (European Commission)
Karen Coyle (Consultant)
Richard Wallis (OCLC)
Michael Düro (Office des publications)
Joseph Busch (Taxonomy Strategies)
Lars Svensson (Deutsche National Bibliothek)
Tom Baker (DCMI), Ivan Herman (W3C)
Pierre-Yves Vandenbussche (Fujitsu)
Karen Coyle (Consultant)

Opening the session about Sustainable governance for long term preservation of RDF vocabularies

tbaker:We consider here RDF vocabularies as sets of properties and classes used to describe Linked Data. Vocabularies are a valuable part of knowledge that has to be maintained over time. As detailed in the session discussion paper, we identified 3 main requirements for preservation:
* Req.1 - Institutional guarantees for persistence of URIs

<bvatant> danbri disagrees about req.1 - will discuss later on

tbaker:* Req.2 - Persistence of documentation (links between URI/documentation should be maintained)
* Req.3 - Appropriate versioning of vocabularies (distinction between last stable and historical versions)

tbaker:Some solutions to be considered:
* Cooperation among vocabulary maintainers (e.g. FOAF/DCMI)
* Role of memory institutions? Can institutions offer standard agreements on long-term preservation? What would be the scope of these agreements (small/large/fragile vocabularies)?
* Initiatives like "safety through redundancy"
* Access to historical versions (LOCKSS, Memento protocol?)
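
As an illustration of access to historical versions: the Memento protocol lets a client ask for a resource as it existed around a given time, using the `Accept-Datetime` request header. The selection step might be sketched as below. This is a minimal sketch, not any archive's implementation: the snapshot paths and dates are hypothetical, and the rule "latest capture at or before the requested time, else the earliest capture" is an assumption made for the example.

```python
from bisect import bisect_right
from datetime import datetime, timezone

# Hypothetical archive of one vocabulary: (capture time, snapshot location).
SNAPSHOTS = [
    (datetime(2008, 1, 14, tzinfo=timezone.utc), "snapshots/2008-01-14/vocab.rdf"),
    (datetime(2010, 10, 11, tzinfo=timezone.utc), "snapshots/2010-10-11/vocab.rdf"),
    (datetime(2012, 6, 14, tzinfo=timezone.utc), "snapshots/2012-06-14/vocab.rdf"),
]


def select_snapshot(snapshots, requested):
    """Return the latest snapshot captured at or before `requested`.

    Falls back to the earliest snapshot when the requested time predates
    every capture (an assumed policy, chosen only for this sketch).
    """
    ordered = sorted(snapshots)
    times = [captured for captured, _ in ordered]
    index = bisect_right(times, requested)
    if index == 0:
        return ordered[0]
    return ordered[index - 1]
```

A client that used the vocabulary in early 2011 would call `select_snapshot(SNAPSHOTS, datetime(2011, 1, 1, tzinfo=timezone.utc))` to find the version that was current at that time.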

DCMI perspective (Tom Baker, DCMI)

tbaker:DCMI namespace and publication policies: meaning of key vocabulary URIs will not substantially change (emphasis on substantially). Commitment that non-key vocabularies will be preserved forever (as long as DCMI exists). Example of DCMI/FOAF cooperation.

<kcoyle> What does "maintainer" mean? preserving? changing?

tbaker:DCMI maintains snapshots of FOAF. DCMI has access to the server.

<kcoyle> does preservation mean freezing vocabularies?

danbri: FOAF arises from the need for small application/users to use simple data

tbaker:Could this cooperation become a model?

W3C perspective (Ivan Herman, W3C)

iherman:W3C has not really taken care of vocabularies, which had to be handled by the community. W3C sees the world as distributed. Why centralize vocab management? Vocabs have to be on the web and need a community (that takes technology). But the current state is problematic. Maintaining a vocabulary website is not easy, and the quality of vocabulary design may vary. Vocabs are really hard to find. A vocabulary should be stable so the community can use it (stability of URIs). In general, it is difficult for a community to develop vocabularies. A development and maintenance environment is very much needed (again, this is technology that not everyone can provide).
W3C wants to provide an environment for any community to develop, host, maintain, and preserve vocabularies. This could be articulated around Community Groups within W3C (easy to create, and they already offer an environment to discuss: wiki space, email lists, etc. No need for W3C membership to be part of a Community Group). Example of the Open Annotation Ontology community group. The OA CG brought together existing annotation vocabularies and sought convergence.

<kcoyle> it is hard to define "what is a vocabulary?" - when is a vocab a vocab?

iherman:A Cross domain community group could be created to support the development of other communities

iherman:W3C vocabulary hosting: W3C offers w3c URL namespaces

<kcoyle> Is there some advantage to developing best practices about naming? (IRI structure, content)

iherman:Some policies should be defined so a certain level of vocabulary quality is reached (metadata, language, etc.). Policies on stability of URIs. A persistence policy is already in place in W3C. W3C will not remove any URIs.

<kcoyle> definitely need best practices about maintenance (do NOT remove existing terms!)

iherman:The W3C web site is a CVS archive. W3C is in the process of getting agreement from MIT (which hosts most of the W3C website) to guarantee the sustainability of the W3C website in case W3C no longer exists.
There will be a working group on best practices for publishing data on the web, including vocabs. A vocabulary directory/registry should be created. Software? http://www.w3.org/2013/04/vocabs/. Work in progress, any input welcome.

PURLs perspective (Richard Wallis, OCLC)

rwallis:PURL offers Persistent URLs through re-direction - URLs can change. But this is more than simple redirects, it includes software with accounts for management (community driven), some interfaces and partial redirections. Community purlz.org. Open Source, developed by OCLC in 1995. Other institutions are also running PURL services, not just OCLC. For instance you have 2M hits for "dc" over 2 weeks.
OCLC plan for the future: PURL is a key component of the Web, there is NO plan for stopping/extending the service. Commitment to transfer the service in case OCLC support ends.
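
The idea of partial redirection mentioned above can be illustrated as follows: a registered prefix is redirected, and whatever follows the prefix is carried over to the target. This is a minimal sketch and not the PURL software's actual behavior or API. The redirect table, the `example.org` targets, and the longest-prefix rule are assumptions made for the example.

```python
# Hypothetical redirect table: registered path prefix -> current target base URL.
REDIRECTS = {
    "/dc/terms/": "http://example.org/vocab/terms/",
    "/dc/": "http://example.org/vocab/",
}


def resolve(path, table=REDIRECTS):
    """Resolve `path` against the table using the longest matching prefix.

    The part of the path after the prefix is appended to the target, so a
    single registration covers every term under that prefix. Returns None
    when no prefix matches.
    """
    for prefix in sorted(table, key=len, reverse=True):
        if path.startswith(prefix):
            return table[prefix] + path[len(prefix):]
    return None
```

If the vocabulary moves to a new host, only the table entry changes; the published persistent URLs stay the same.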

European Union publications office (OPOCE), metadata registry (MDR) (Michael Düro)

mduro:The metadata registry (MDR) is published on the EU Open Data Portal. https://joinup.ec.europa.eu/catalogue/repository/metadata-registry to see managed vocabs.
Governance over time : Based on a Common Data Model (IMMC Core Metadata schema). Called 'authority tables' - lists of terms. Partial publication in SKOS.

<kcoyle> discussion: authorities vs. SKOS vocabs - is there a difference? is it a difference that matters in practice?

mduro:Captured for each concept: authority code, start and end dates of use, predecessor concept, successor concept, and an @en label. URIs are used for concepts. In the future, the plan is to have a URI for each label so the system can handle multilingual labels. Very important for the future.
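
The per-concept information listed above could be modeled roughly as follows. This is a minimal sketch under stated assumptions: the field names, the sample codes `A1` and `A2`, and the helper functions are hypothetical and do not reflect the MDR's actual schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Optional


@dataclass(frozen=True)
class Concept:
    code: str                          # authority code
    label_en: str                      # English (@en) label
    start_use: date                    # first day the concept is in use
    end_use: Optional[date] = None     # last day in use; None if still current
    predecessor: Optional[str] = None  # code of the concept this one replaced
    successor: Optional[str] = None    # code of the concept that replaced it


def is_valid_on(concept: Concept, day: date) -> bool:
    """True if `day` falls within the concept's period of use."""
    if day < concept.start_use:
        return False
    return concept.end_use is None or day <= concept.end_use


def current_code(table: Dict[str, Concept], code: str) -> str:
    """Follow successor links from `code` to the concept with no successor."""
    seen = set()
    while table[code].successor is not None:
        if code in seen:
            raise ValueError("successor links form a cycle")
        seen.add(code)
        code = table[code].successor
    return code


# Hypothetical table with one deprecated concept and its replacement.
TABLE = {
    "A1": Concept("A1", "Old term", date(2000, 1, 1), date(2009, 12, 31), successor="A2"),
    "A2": Concept("A2", "New term", date(2010, 1, 1), predecessor="A1"),
}
```

With records like these, data that still carries an older code can be mapped forward with `current_code(TABLE, "A1")`, while `is_valid_on` answers whether a code was in use on a given date.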

Linked Open Vocabularies LOV (Bernard Vatant, Mondeca)

bvatant:http://lov.okfn.org currently registered 360 vocabs and the management team is composed of 5 curators. Open source, anyone is welcome to participate in the curation or development. Maps vocab use in linked data, shows relations with other vocabularies (computed, using SPARQL). Provides a documentation (metadata) about each vocabulary. Various services, APIs, SPARQL Endpoints, search, suggest features.
Lessons learned: history tracking (both documentation and files of previous versions) is tough, even though the vocabularies are 13 years old at most. Publication best practices need to be uniform and consistent. Example of DOLCE: the URI has changed (host moved). Some applications are still using the old URIs with no redirection available. Every bit of a vocabulary can change... No information about changes, policies, stability... Who should care about vocabulary preservation?

<kcoyle> mis-quoting Ranganathan: A vocabulary is a growing organism

bvatant:Need for a collaborative and sustainable governance.

aisaac: Who is going to care? LOV relies on Vocabulary owners?

bvatant: Good feedback from authors. But maintaining LOV requires a lot of manual curation.

danbri: What if LOV were down tomorrow? If we had a namespace apocalypse, what would we lose? Cultural loss, but would things break?

bvatant: One could reconstruct FOAF vocabulary from usage.


<kcoyle> preservation vs finding

bvatant: make small interesting vocabularies visible

<tbaker> ...Bootstrapping: who will be the first one to use it?

<kcoyle> usage may be a kind of self-documentation

danbri: Taxpayers' money issue. Three-year projects go away and someone does same thing four years later. We could do better work building on previous.

aperego: Vocabularies vs. thesauri, same requirements? For thesauri, we have IDs/codes but no URIs... SKOS concept schemes can be more dynamic than others. Organizations like ISO have language codes but no URIs. LC has URIs for these codes but does not control the evolution of these standards... Big issue: people using an older version of a thesaurus - semantics may change for the same URIs.

bvatant:Very similar issues Vocab/thesaurus: range modification/concept label modification

tbaker: Thesaurus developed by different groups, cultures, languages, the meaning is always changing

bvatant: Does sustainability mean freezing the meaning of terms? Need for an infrastructure to handle meaning over time while still using the same URI.

mduro: For Eurovoc, tracking versions over time is not yet addressed. Snapshots are published with modifications documented. A service could be provided, but there is nothing right now.

fostrowski: Notion of semantic persistence is complicated. Meaning will change. Dictionaries record usage. If users use it "wrong". Need metadata to determine which version one used.

iherman: If we go that way, at which granularity should we put the timestamp information? Very expensive for organisations.

kcoyle: Preservation or maintenance? Who can maintain a vocabulary if the author leaves their role? How much effort/cost for maintaining a vocabulary?

JensLudwig: In long-term preservation community, need to make case for use in order to make case for preservation.

iherman: No one-size-fits-all solution. Some vocabularies are small (1-2 people). But oil-drilling and medical ontologies are enormous and complex.

Minutes formatted by David Booth's scribe.perl version 1.138 (CVS log)
$Date: 2013-09-05 09:26:52 $