Ontology Dowsing

From W3C Wiki
Revision as of 19:48, 20 May 2013 by Rliepins (Talk | contribs)

"Dowsing is a type of divination employed in attempts to locate ground water,
buried metals or ores, gemstones, oil, gravesites, and many other objects and
materials, as well as so-called currents of earth radiation, without the use of 
scientific apparatus."

--Wikipedia article on Dowsing, retrieved on 14th January 2010.


At the moment, the methods used in practice to locate an adequate vocabulary for describing one's data in RDF are more akin to dowsing than to an educated, technically-guided choice, supported by scientific tools and methodologies. While the situation is improving with the progress of Semantic Web search engines and better education, oftentimes data publishers still rely on informal criteria such as word-of-mouth, reputation or follow-your-nose strategies.

This page tries to identify methods, tools, applications, websites and communities that can help Linked Data publishers discover or build the vocabulary they need. The tools identified below are ordered from those that require the least time and effort from the publisher to those that require the most.


Lists of ontologies and services

  Main article: Lists of ontologies

There are several webpages that reference ontologies by simply matching a theme (e.g., People, Product) to a URI or listing tools to find ontologies. Examples:

This category requires minimal effort: if the publisher's data fall within a domain referenced in a list, the corresponding ontology can readily be used.

These lists raise the question of how their contents are selected. Popularity is one aspect; quality may be another. What is a quality ontology? When does it become popular? Who decides?

Search engines

  Main article: Search engines

Semantic Web search engines are applications for finding ontologies that require reasonable effort: queries are usually written as natural language keywords and results are ranked. Some additional information is often provided. Examples:

  • FalconS has both term search and ontology search features;
  • Sindice is a generic Semantic Web document search engine;
  • Swoogle is the grandfather of Semantic Web search engines;
  • SWSE is an RDF entity search engine;
  • vocab.cc is an RDF term search service;
  • Watson is an ontology search engine.

The problem here is that it is still hard to choose between two matching ontologies. What should guide publishers to the right choice? Should these ontologies be reused at all? See also BuildOrBuyTerms.

Repositories

  Main article: Ontology repositories

Ontology repositories are usually more specific than Semantic Web search engines, and their navigation and search interfaces can vary greatly. They offer tools that may be specific to the type of application the repository was designed for. Examples:

  • SchemaWeb is a quite old ontology directory, but it is still used;
  • Schemapedia is another ontology directory;
  • Cupboard is an ontology repository with some advanced features, powered by the Watson Semantic Web search engine;
  • Knoodl is a repository and collaborative ontology management tool;
  • Ontology Design Patterns is a repository for design patterns and for ontology modules that follow those patterns;
  • Prefix.cc is a namespace lookup service, which can be seen as a kind of vocabulary directory;
  • DERI Vocabularies is a repository and can be used as an online ontology editor;
  • OWL Seek is a repository of ontologies with additional metadata such as funding organisation, submitter and submission dates, and the possibility of getting a list sorted by various criteria.
  • SchemaCache is a tool by Talis.
  • Linked Open Vocabulary offers a visual navigation among dereferenceable vocabularies (made by Mondeca).
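A namespace lookup of the kind Prefix.cc provides can also be scripted. The following is a minimal sketch, not a description of the service's official API: it assumes that the service answers at a URL of the form http://prefix.cc/<prefix>.file.json with a JSON object mapping the prefix to its namespace URI. That URL pattern and the helper names (lookup_url, parse_namespace, fetch_namespace) are assumptions and are not taken from this page.

```python
import json
import urllib.request
from typing import Optional

# Assumed base URL of the lookup service (not stated on this page).
PREFIX_CC = "http://prefix.cc"


def lookup_url(prefix: str) -> str:
    """Build the assumed JSON lookup URL for a namespace prefix."""
    return f"{PREFIX_CC}/{prefix}.file.json"


def parse_namespace(payload: str, prefix: str) -> Optional[str]:
    """Return the namespace URI for `prefix` from a JSON object, or None."""
    return json.loads(payload).get(prefix)


def fetch_namespace(prefix: str, timeout: float = 10.0) -> Optional[str]:
    """Fetch and parse the namespace URI for `prefix` (needs network access)."""
    with urllib.request.urlopen(lookup_url(prefix), timeout=timeout) as resp:
        return parse_namespace(resp.read().decode("utf-8"), prefix)
```

Keeping the parsing step separate from the network call makes it possible to check the logic against a saved response without contacting the service.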

Mailing lists/online communities

  Main articles: Ontology-related mailing lists and Ontology-related online communities

If other tools are not sufficient to find an appropriate vocabulary, publishers can (and often do) rely on online communities by asking them directly. Examples:

This is a rather effortless solution that can be really efficient in some cases. However, repeated enquiries about vocabularies can easily pollute the traffic, so publishers should first try to find a solution on their own, e.g., by following the links and indications on this wiki page. See also MailingLists.

Ontology Editors

  Main article: Ontology editors

If data publishers cannot find a relevant vocabulary, or existing vocabularies are not suitable for the use case, they can make their own ontology. Editors can help with this, such as:

  • Protégé ontology editor (popular, pluggable).
  • WebProtégé is the online version of Protégé.
  • NeOn Toolkit is another ontology editor with many plugins available. It is especially suited for heavy-weight projects (e.g., multi-modular ontologies, multi-lingual ontologies, ontology integration, etc.);
  • SWOOP is a small and simple ontology editor;
  • Neologism is an online vocabulary editor and publishing platform;
  • TopBraid Composer is a multipurpose Semantic Web editor;
  • Vitro is an Integrated Ontology Editor and Semantic Web Application;
  • Knoodl is a community-oriented ontology and knowledge base editor.
  • Ontofly is a web-based ontology editor.
  • Altova OWL editor is another ontology editor.
  • PoolParty is a thesaurus management system and a SKOS editor.
  • IBM Integrated Ontology Development Toolkit is an Eclipse-based ontology toolkit for storage, manipulation, query, and inference of ontologies and corresponding instances.
  • Anzo for Excel will generate an initial ontology based on spreadsheet data and structure.
  • Euler GUI is an editor for N3, RDF, OWL and various other formats.
  • OWLGrEd is a graphical ontology editor for OWL.


Building an ontology requires considerable effort and calls for some guidelines.

Learning ontology design, best practices and evaluation

  Main articles: Best practices and Ontology evaluation

Guides to ontology design:

Here are best practices:

In addition to finding or making an ontology that contains the terms that are needed for the dataset, publishers may like to assess the quality of the ontologies, especially when they have the choice between several of them. Some possible factors:

  • Fully documented;
  • Used by independent data publishers;
  • There exist tools that support the vocabulary specifically;
  • The ontology is highly ranked by users in a voting system;
  • All terms are dereferenceable;
  • The ontology just covers the right domain (not an upper level "ontology of everything");
  • expressive enough: the ontology has axioms that make valuable inferences;
  • not too expressive: the ontology does not define axioms that have limited utility and would make reasoning costly;
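When several candidate ontologies are available, the factors above can be recorded as a simple checklist and compared side by side. The following is a minimal sketch under stated assumptions: the field names are hypothetical, every factor counts equally (this page does not propose any weighting), and each value is filled in by hand after inspecting the ontology.

```python
from dataclasses import dataclass, fields
from typing import Dict, List, Tuple


@dataclass
class QualityChecklist:
    """One boolean per factor listed above (field names are hypothetical)."""
    fully_documented: bool = False
    used_by_independent_publishers: bool = False
    has_tool_support: bool = False
    highly_ranked_by_users: bool = False
    terms_dereferenceable: bool = False
    covers_right_domain: bool = False
    expressive_enough: bool = False
    not_too_expressive: bool = False

    def score(self) -> int:
        """Count the satisfied factors; equal weighting is an assumption."""
        return sum(bool(getattr(self, f.name)) for f in fields(self))


def rank(candidates: Dict[str, QualityChecklist]) -> List[Tuple[str, int]]:
    """Order candidates by descending score, breaking ties by name."""
    scored = [(name, check.score()) for name, check in candidates.items()]
    return sorted(scored, key=lambda item: (-item[1], item[0]))
```

A plain count hides the fact that some factors matter more than others for a given dataset, so the ranking is best treated as a starting point for discussion rather than a decision.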

Tools:

  • OWL 2 Validator determines whether an ontology is in OWL 2 DL, OWL 2 EL, OWL 2 QL, OWL 2 RL or OWL 2 Full;
  • OWL 1 Validator determines whether an ontology is in OWL 1 DL, OWL 1 Lite or OWL 1 Full;
  • RDF Validator is the official W3C validator for RDF/XML syntax;
  • rdf:alerts is a tool for finding potential problems in linked data;
  • ...

Interlinking ontologies

  Main article: Interlinking vocabularies

Interlinking ontologies can mean reusing existing ontologies in a modular way, or aligning ontologies. There are quite a lot of tools for ontology matching, most of which are prototypes from research projects. The following are well maintained tools:

Some useful information is available at http://ontologymatching.org/, including references to existing tools and a very comprehensive list of scientific publications on the topic (500+ references).
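To give an idea of what aligning ontologies involves at its simplest, the sketch below pairs terms from two vocabularies by comparing their normalised labels. It is a naive illustration, not how any particular matching tool works: it relies only on string similarity from Python's standard difflib module, and the function names and the 0.85 threshold are assumptions chosen for the example.

```python
import re
from difflib import SequenceMatcher
from typing import List, Sequence, Tuple


def normalize(label: str) -> str:
    """Split camelCase, lowercase, and turn separators into single spaces."""
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", label)
    words = [w for w in re.split(r"[^A-Za-z0-9]+", spaced.lower()) if w]
    return " ".join(words)


def similarity(a: str, b: str) -> float:
    """String similarity of the two normalised labels, between 0 and 1."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def align(terms_a: Sequence[str], terms_b: Sequence[str],
          threshold: float = 0.85) -> List[Tuple[str, str, float]]:
    """For each term in terms_a, keep its best match in terms_b if the
    similarity reaches the (arbitrarily chosen) threshold."""
    matches = []
    for a in terms_a:
        best = max(terms_b, key=lambda b: similarity(a, b), default=None)
        if best is not None and similarity(a, best) >= threshold:
            matches.append((a, best, similarity(a, best)))
    return matches
```

For example, align(["firstName", "Person"], ["first_name", "Organization"]) pairs firstName with first_name and leaves Person unmatched. Label similarity alone misses synonyms and cannot distinguish homonyms, which is why dedicated matching tools also draw on structure and semantics.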

Related Events, Projects, etc.

  Main articles: Ontology-related events and Ontology-related projects

A significant amount of research is under way to solve parts of the problem of guiding publishers to the right vocabulary: