From W3C Wiki
"Dowsing is a type of divination employed in attempts to locate ground water, buried metals or ores, gemstones, oil, gravesites, and many other objects and materials, as well as so-called currents of earth radiation, without the use of scientific apparatus."
--Wikipedia article on Dowsing, retrieved on 14th January 2010.
At the moment, the methods used in practice to locate an adequate vocabulary for describing one's data in RDF are more akin to dowsing than to an educated, technically-guided choice, supported by scientific tools and methodologies. While the situation is improving with the progress of Semantic Web search engines and better education, oftentimes data publishers still rely on informal criteria such as word-of-mouth, reputation or follow-your-nose strategies.
This page tries to identify methods, tools, applications, websites or communities that can help Linked Data publishers to discover or build the vocabulary they need. The tools identified below are sorted from those that require the least time and effort on the publisher's part to those that require the most work.
Lists of ontologies
There are several webpages that reference ontologies by simply matching a theme (e.g., People, Product) to a URI. Examples:
- Semanticweb.org gives a short list on its homepage;
- VocabularyMarket provides links to ontologies, answering simple questions (how about music collections?);
- Ontology on Semanticweb.org has a list of ontologies, ranked according to their usage.
This category requires minimal effort: if the publisher's data are in one of the domains referenced in the list, the corresponding ontology can readily be used.
These lists raise the question of how their content is chosen. Popularity is one aspect; quality may be another. What is a quality ontology? When does it become popular? Who decides?
Semantic Web search engines are applications for finding ontologies that require reasonable effort: queries are usually written as natural language keywords and results are ranked. Some additional information is often provided. Examples:
- Sindice is a generic Semantic Web document search engine;
- FalconS has a term search feature;
- Swoogle is the grandfather of Semantic Web search engines;
- SWSE is an RDF entity search engine;
- Watson is an ontology search engine;
- Semantic Web Search is yet another search engine.
The problem here is that it is still hard to choose between two matching ontologies. What should guide publishers to the right choice? Should these ontologies be reused at all? See also BuildOrBuyTerms.
Ontology repositories are usually more specific than Semantic Web search engines, and their navigation/search interfaces can vary greatly. They offer tools that may be specific to the type of applications the repository was designed for. Examples:
- SchemaWeb is a quite old ontology directory, but it is still used;
- Schemapedia is another ontology directory;
- Cupboard is an ontology repository with some advanced features, powered by the Watson Semantic Web search engine;
- Knoodl is a repository and collaborative ontology management tool;
- Ontology Design Patterns is a repository for design patterns and for ontology modules that follow the patterns;
- Prefix.cc is a namespace lookup service, which can be seen as a kind of vocabulary directory;
- DERI Vocabularies is a repository and can be used as an online ontology editor.
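A namespace lookup of the kind Prefix.cc offers can be sketched in a few lines. The sketch below is hedged: the URL pattern `https://prefix.cc/<prefix>.file.json` and the response shape (a JSON object mapping the prefix to its namespace URI) are assumptions that should be checked against the service's own documentation, and the function names are hypothetical.

```python
import json
from urllib.request import urlopen


def lookup_url(prefix):
    # Assumed URL pattern for the Prefix.cc JSON output; verify before relying on it.
    return f"https://prefix.cc/{prefix}.file.json"


def parse_namespace(payload, prefix):
    """Return the namespace URI for `prefix` from a JSON payload, or None if absent.

    Assumes the payload is a JSON object such as {"foaf": "http://..."}.
    """
    return json.loads(payload).get(prefix)


def lookup_namespace(prefix, timeout=10):
    """Fetch the namespace URI registered for `prefix` (performs a network call)."""
    with urlopen(lookup_url(prefix), timeout=timeout) as response:
        return parse_namespace(response.read().decode("utf-8"), prefix)
```

Keeping the parsing separate from the network call makes it possible to try the logic on a saved response, e.g. `parse_namespace('{"foaf": "http://xmlns.com/foaf/0.1/"}', "foaf")`.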
Mailing lists/online community
If other tools are not sufficient to find an appropriate vocabulary, publishers can (and often do) rely on online communities by asking them directly. Examples:
- W3C Semantic Web mailing list;
- Linking Open Data ML;
- Semantic Web Yahoo! Group;
- SemanticOverflow is a Q&A service about semantic technologies.
This solution takes little effort and can be very efficient in some cases. However, repeated enquiries about vocabularies can easily pollute the traffic, so publishers should first try to find a solution on their own, e.g., by following the links and indications on this wiki page.
If data publishers cannot find a relevant vocabulary, or existing vocabularies are not suitable for the use case, they can make their own ontology. Editors can help with this, such as:
- Protégé is a popular, pluggable ontology editor;
- NeOn Toolkit is another ontology editor with many plugins available. It is especially suited for heavyweight projects (e.g., multi-modular ontologies, multilingual ontologies, ontology integration);
- SWOOP is a small and simple ontology editor;
- Neologism is an online vocabulary editor and publishing platform;
- TopBraid Composer is a multipurpose Semantic Web editor;
- Vitro is an Integrated Ontology Editor and Semantic Web Application;
- Knoodl is a community-oriented ontology and knowledge base editor.
Making an ontology requires considerable effort and some guidelines. Here are some best practices:
- DontWorryBeCrappy: it's ok to do it wrong, it can improve later;
- Best Practice Recipes for Publishing RDF Vocabularies.
In addition to finding or making an ontology that contains the terms that are needed for the dataset, publishers may like to assess the quality of the ontologies, especially when they have the choice between several of them. Some possible factors:
- Fully documented;
- Used by independent data publishers;
- There exist tools that support the vocabulary specifically;
- The ontology is highly ranked by users in a voting system;
- All terms are dereferenceable;
- The ontology just covers the right domain (not an upper level "ontology of everything");
- Expressive enough: the ontology has axioms that enable valuable inferences;
- Not too expressive: the ontology does not define axioms that have limited utility and would make reasoning costly.
Some validators can help with this assessment:
- OWL 2 Validator determines whether an ontology is in OWL 2 DL, OWL 2 EL, OWL 2 QL, OWL 2 RL or OWL 2 Full;
- OWL 1 Validator determines whether an ontology is in OWL 1 DL, OWL 1 Lite or OWL 1 Full;
- RDF Validator is the official W3C validator;
- rdf:alerts is a tool for finding potential problems in linked data.
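One of the factors listed above, whether the terms of a vocabulary can be dereferenced, lends itself to a quick automated check. The following is a minimal sketch rather than a definitive test: it assumes that a successful HTTP response to a request with an RDF `Accept` header is good enough evidence, and the function names and the chosen media types are illustrative.

```python
from urllib.parse import urldefrag
from urllib.request import Request, urlopen

# Media types requested from the server; an illustrative choice, not a standard list.
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9"


def document_url(term_uri):
    """Strip the fragment: a hash URI such as ...#Person is served by its document."""
    return urldefrag(term_uri).url


def build_request(term_uri):
    """Build a GET request for the term's document that asks for RDF."""
    return Request(document_url(term_uri), headers={"Accept": RDF_ACCEPT})


def is_dereferenceable(term_uri, timeout=10):
    """Return True if the term URI resolves with HTTP 200 (performs a network call).

    urlopen follows redirects, so a 303 redirect to a description document
    counts as success. Any HTTP or network error is treated as a failure.
    """
    try:
        with urlopen(build_request(term_uri), timeout=timeout) as response:
            return response.status == 200
    except (OSError, ValueError):
        return False
```

A stricter check would also parse the response and confirm that it actually describes the requested term; this sketch only verifies that something is served.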
Related Events, Projects, etc.
A significant amount of research is under way to solve parts of the problem of guiding publishers to the right vocabulary:
- EKAW 2010 Workshop on Ontology Quality;
- ISWC 2010 Workshop on Semantic Repositories for Web, SERES 2010;
- SEALS is a European project on evaluating semantic applications, including ontologies;
- Semantic Web Journal is an academic journal that encourages the publication of ontology descriptions (=> peer-reviewed ontologies => good ontologies, in principle).