See also: IRC log
See also: Session description and agenda, with links to presentations
See also: Session Discussion paper
tbaker:We consider here RDF Vocabularies as
sets of properties and classes to describe Linked Data. Vocabularies are
a valuable part of knowledge that has to be maintained over time. As
detailed in the session
discussion paper, we identified 3 main requirements for
preservation:
* Req.1 - Institutional guarantees for persistence of URIs
<bvatant> danbri disagrees about req.1 - will discuss later on
tbaker:* Req.2 - Persistence of
documentation (links between URI/documentation should be maintained)
* Req.3 - Appropriate versioning of vocabularies (distinction between
last stable and historical versions)
tbaker:Some solutions to be considered:
* Cooperation among vocabulary maintainers (e.g. FOAF/DCMI)
* Role of memory institutions? Can institutions offer standard agreements
on long-term preservation? What would be the scope of these agreements
(small/large/fragile vocabularies)?
* Initiatives like "safety through redundancy"
* Access to historical versions (LOCKSS, Memento protocol?)
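As a rough illustration of the last bullet: the Memento protocol (RFC 7089) lets a client ask a TimeGate for the state of a resource at a given date by sending an Accept-Datetime header. The sketch below only builds such a request and does not send it. The TimeGate address and the function name `memento_request` are hypothetical placeholders, not services or tools mentioned in the session.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request

# Hypothetical TimeGate prefix (assumption); a real archive publishes its own.
TIMEGATE = "http://timegate.example.org/timegate/"

def memento_request(vocab_uri: str, when: datetime) -> Request:
    """Build (but do not send) a Memento datetime-negotiation request."""
    # Accept-Datetime takes an HTTP-date, always expressed in GMT.
    stamp = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return Request(TIMEGATE + vocab_uri, headers={"Accept-Datetime": stamp})

req = memento_request("http://xmlns.com/foaf/0.1/",
                      datetime(2010, 1, 1, tzinfo=timezone.utc))
# urllib stores header names capitalized as "Accept-datetime".
print(req.get_header("Accept-datetime"))
```

Whether a given vocabulary namespace is covered by any archive's TimeGate is a separate question that this sketch does not answer.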
tbaker:DCMI namespace and publication policies: the meaning of key vocabulary URIs will not substantially change (emphasis on "substantially"). Commitment that non-key vocabularies will be preserved forever (as long as DCMI exists). Example of DCMI/FOAF cooperation.
<kcoyle> What does "maintainer" mean? preserving? changing?
tbaker:DCMI maintains snapshots of FOAF. DCMI has access to the server.
<kcoyle> does preservation mean freezing vocabularies?
danbri: FOAF arises from the need for small application/users to use simple data
tbaker:Could this cooperation become a model?
iherman:W3C did not really take care of
vocabularies, which had to be handled by the community. W3C sees the world as
distributed. Why centralize vocab management? Vocabs have to be on the
web and need a community (that takes technology). But the state now is
problematic. Maintaining a vocabulary website is not easy, and the design
quality of vocabularies varies. Vocabs are really hard to find. A
vocabulary should be stable so the community can use it (stability of
URIs). In general, it is difficult for a community to
develop vocabularies. A development and maintenance environment is very much
needed (again, this is technology that not everyone can provide).
W3C wants to provide an environment for any community to develop, host,
maintain, and preserve vocabularies. This could be organized around Community
Groups within W3C (easy to create, and they already offer an environment for
discussion: wiki space, email lists, etc. W3C membership is not required to be
part of a Community Group). Example: the Open
Annotation Ontology community group. The OA CG brought together
existing annotation vocabularies and sought convergence.
<kcoyle> it is hard to define "what is a vocabulary?" - when is a vocab a vocab?
iherman:A Cross domain community group could be created to support the development of other communities
iherman:W3C vocabulary hosting: W3C offers w3c URL namespaces
<kcoyle> Is there some advantage to developing best practices about naming? (IRI structure, content)
iherman:Some policies should be defined so a certain level of vocabulary quality is reached (metadata, language, etc.). Policies on stability of URIs. A persistence policy is already in place in W3C. W3C will not remove any URIs.
<kcoyle> definitely need best practices about maintenance (do NOT remove existing terms!)
iherman:The W3C web site is a CVS archive. W3C is in the
process of getting agreement from MIT (which hosts most of the W3C website) to
guarantee the sustainability of the website in case W3C no longer
exists.
There will be a working group on best practices for publishing data on the
web, including vocabs. Vocabulary directory/registry should be created.
Software? http://www.w3.org/2013/04/vocabs/.
Work in progress, any input welcome.
rwallis:PURL offers persistent URLs through
redirection: the target URLs can change. But this is more than simple redirects;
it includes software with accounts for management (community driven),
some interfaces, and partial redirections. Community: purlz.org. Open
Source, developed by OCLC in 1995. Other institutions are also running
PURL services, not just OCLC. For instance you have 2M hits for "dc"
over 2 weeks.
OCLC plan for the future: PURL is a key component of the Web, there is
NO plan for stopping/ extending the service. Commitment to transfer the
service in case OCLC support ends
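To make the idea of redirection with partial redirects concrete, here is a minimal sketch of a resolver table. It is not the PURL software's actual implementation; the registered paths, the target URLs, and the function name `resolve` are all invented for illustration.

```python
from typing import Optional

# Hypothetical registrations: persistent path -> (current target, is_partial).
# Maintainers update the target when a resource moves; the persistent
# path handed out to users never changes.
PURLS = {
    "/dc/terms/": ("http://vocab.example.org/dcterms/", True),
    "/net/report": ("http://docs.example.org/2013/report.html", False),
}

def resolve(path: str) -> Optional[str]:
    """Return the redirect target for a persistent path, or None if unknown."""
    if path in PURLS:
        return PURLS[path][0]
    # Partial redirection: match the longest registered prefix and
    # append the unmatched remainder of the path to its target.
    for prefix in sorted(PURLS, key=len, reverse=True):
        target, partial = PURLS[prefix]
        if partial and path.startswith(prefix):
            return target + path[len(prefix):]
    return None

print(resolve("/dc/terms/title"))
```

With a partial registration, a single entry covers every term under a vocabulary namespace, so moving the vocabulary to a new host means updating one target rather than one entry per term.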
mduro:The
metadata registry (MDR) is published on the EU Open Data Portal.
https://joinup.ec.europa.eu/catalogue/repository/metadata-registry
to see managed vocabs.
Governance over time: based on a Common Data Model (IMMC Core Metadata
schema). The managed lists of terms are called 'authority tables'. Partial
publication in SKOS.
<kcoyle> discussion: authorities vs. SKOS vocabs - is there a difference? is it a difference that matters in practice?
mduro:For each concept, the registry captures: authority code, start and end dates of use, predecessor concept, successor concept, and an @en label. URIs are used for concepts. In the future, the plan is to have a URI for each label so the system can handle multilingual labels. Very important for the future.
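A minimal sketch of how such a record could be modelled, and how successor links let a consumer move from a retired code to the one currently in use. The field names, the sample codes "X01"/"X02", and the helper `current_code` are assumptions made for illustration; they are not taken from the MDR data model.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Optional

@dataclass
class Concept:
    code: str                          # authority code
    label_en: str                      # @en label
    start_use: date
    end_use: Optional[date] = None     # None while the concept is in use
    predecessor: Optional[str] = None  # code of the concept it replaced
    successor: Optional[str] = None    # code of the concept replacing it

def current_code(table: Dict[str, Concept], code: str) -> str:
    """Follow successor links until reaching a concept with no successor."""
    seen = set()
    while table[code].successor is not None:
        if code in seen:
            raise ValueError("cycle in successor chain")
        seen.add(code)
        code = table[code].successor
    return code

# Invented sample entries.
table = {
    "X01": Concept("X01", "Old name", date(2000, 1, 1),
                   end_use=date(2009, 12, 31), successor="X02"),
    "X02": Concept("X02", "New name", date(2010, 1, 1), predecessor="X01"),
}
print(current_code(table, "X01"))
```

Keeping the retired entry in the table, rather than deleting it, is what allows data that still uses the old code to remain interpretable.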
bvatant:LOV (http://lov.okfn.org)
currently has 360 registered vocabs, and the management team is composed of 5
curators. Open source, anyone is welcome to participate in the curation
or development. Maps vocab use in linked data, shows relations with
other vocabularies (computed, using SPARQL). Provides a documentation
(metadata) about each vocabulary. Various services, APIs, SPARQL
Endpoints, search, suggest features.
Lessons learned: history tracking (both documentation and files of
previous versions) is tough, even though the vocabularies are 13 years old at
most. Publication best practices need to be uniform and consistent. Example of
DOLCE: the URI has changed (host moved). Some applications are still using the
old URIs with no redirection available. Every bit of a vocabulary can change...
No information about changes, policies, stability... Who should care
about vocabulary preservation?
<kcoyle> mis-quoting Ranganathan: A vocabulary is a growing organism
bvatant:Need for a collaborative and sustainable governance.
aisaac: Who is going to care? LOV relies on Vocabulary owners?
bvatant: Good feedback from authors. But maintaining LOV requires a lot of manual curation.
danbri: what if LOV were down tomorrow? If we had a namespace apocalypse, what would we lose? Cultural loss, but would things break?
bvatant: One could reconstruct FOAF vocabulary from usage.
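In its crudest form, reconstruction from usage could mean scanning data for terms in the vocabulary's namespace and guessing from their position whether each is a class or a property. The sketch below assumes triples are available as plain string tuples; the sample triples and the function name `reconstruct` are invented, and only the FOAF namespace and rdf:type URIs are real.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Invented sample triples (subject, predicate, object).
triples = [
    ("http://example.org/alice", RDF_TYPE, FOAF + "Person"),
    ("http://example.org/alice", FOAF + "name", "Alice"),
    ("http://example.org/alice", FOAF + "knows", "http://example.org/bob"),
    ("http://example.org/bob", RDF_TYPE, FOAF + "Person"),
]

def reconstruct(triples, namespace):
    """Guess the classes and properties of a vocabulary from observed use."""
    classes, properties = set(), set()
    for _, p, o in triples:
        # A namespace term in predicate position is used as a property.
        if p.startswith(namespace):
            properties.add(p[len(namespace):])
        # A namespace term that is the object of rdf:type is used as a class.
        if p == RDF_TYPE and o.startswith(namespace):
            classes.add(o[len(namespace):])
    return sorted(classes), sorted(properties)

print(reconstruct(triples, FOAF))
```

This recovers term names and their kind only; labels, definitions, and any terms that nobody happened to use in the scanned data are not recoverable this way.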
<kcoyle> preservation vs finding
bvatant: make small interesting vocabularies visible
<tbaker> ...Bootstrapping: who will be the first one to use it?
<kcoyle> usage may be a kind of self-documentation
danbri: Taxpayers' money issue. Three-year projects go away and someone does same thing four years later. We could do better work building on previous.
aperego: Vocabularies vs. thesauri, same requirements? For thesauri, we have IDs/codes but no URIs... SKOS concept schemes can be more dynamic than others. Organizations like ISO have language codes but no URIs. LC has URIs for these codes but does not control the evolution of these standards... Big issue: people using an older version of a thesaurus - semantics may change for the same URIs.
bvatant:Very similar issues for vocabs and thesauri: range modification/concept label modification
tbaker: Thesauri are developed by different groups, cultures, languages; the meaning is always changing
bvatant: does sustainability mean freezing the meaning of terms? Need for an infrastructure to handle meaning over time while still using the same URI.
mduro: Eurovoc: tracking versions over time is not yet addressed. Snapshots are published and modifications are documented. A service could be provided, but there is nothing right now.
fostrowski: Notion of semantic persistence is complicated. Meaning will change. Dictionaries record usage. If users use it "wrong". Need metadata to determine which version one used.
iherman: If we go that way, at which granularity should we put the timestamp information? Very expensive for organisations.
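One coarse-grained option for the two points above is to timestamp whole snapshots rather than individual terms, and let data record only its own date. A minimal sketch, assuming snapshots are published in date order; the dates, version labels, and the function name `version_in_force` are invented for illustration.

```python
from bisect import bisect_right
from datetime import date

# Invented publication dates of vocabulary snapshots, in ascending order.
SNAPSHOTS = [
    (date(2008, 1, 14), "v1"),
    (date(2010, 10, 11), "v2"),
    (date(2012, 6, 14), "v3"),
]

def version_in_force(when: date):
    """Return the label of the latest snapshot published on or before `when`."""
    dates = [d for d, _ in SNAPSHOTS]
    i = bisect_right(dates, when)
    # i == 0 means `when` predates the first snapshot.
    return SNAPSHOTS[i - 1][1] if i else None

print(version_in_force(date(2011, 3, 1)))
```

Snapshot-level timestamps are cheap to maintain but cannot say which individual terms changed between versions; finer granularity would need per-term change records, which is where the cost concern applies.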
kcoyle: Preservation or maintenance? Who can maintain a vocabulary if the author leaves their role? How much effort/cost is involved in maintaining a vocabulary?
JensLudwig: In the long-term preservation community, one needs to make the case for use in order to make the case for preservation.
iherman: There is no one-size-fits-all solution. Some vocabularies are small (1-2 people). But oil-drilling and medical ontologies are enormous and complex.