"Dowsing is a type of divination employed in attempts to locate ground water, buried metals or ores, gemstones, oil, gravesites, and many other objects and materials, as well as so-called currents of earth radiation, without the use of scientific apparatus."
--Wikipedia article on Dowsing, retrieved on 14th January 2010.
At the moment, the methods used in practice to locate an adequate vocabulary for describing one's data in RDF are more akin to dowsing than to an educated, technically-guided choice, supported by scientific tools and methodologies. While the situation is improving with the progress of Semantic Web search engines and better education, oftentimes data publishers still rely on informal criteria such as word-of-mouth, reputation or follow-your-nose strategies.
This page tries to identify methods, tools, applications, websites or communities that can help Linked Data publishers discover or build the vocabulary they need. The tools identified below are ordered from those that require the least time and effort on the publisher's side to those that require the most.
Lists of ontologies and services
Main article: Lists of ontologies
There are several webpages that reference ontologies by simply matching a theme (e.g., People, Product) to a URI or listing tools to find ontologies. Examples:
- Semanticweb.org gives a short list on its homepage;
- VocabularyMarket provides links to ontologies, answering simple questions (how about music collections?) and some other links to services, similar to this page but with many outdated links and a disorganised structure;
- Ontology on Semanticweb.org has a list of ontologies, ranked according to their usage;
- RDF Schema Registry has a big list of ontologies on various topics;
- Ontologies that are fully documented, used by independent datasets and supported by tools, on SemanticOverflow;
- Semantic Web Searching provides links to many search services, both semantic and non-semantic, with some outdated links;
- The Library Linked Data XG has written a report on available metadata element sets in the library domain.
This category requires minimal effort: if the publisher's data fall within one of the domains referenced in a list, the corresponding ontology can readily be used.
These lists raise the question of how their content is selected. Popularity is one aspect, quality may be another. What is a quality ontology? When does it become popular? Who decides?
Main article: Search engines
Semantic Web search engines are applications for finding ontologies. They require a reasonable amount of effort: queries are usually written as natural language keywords and results are ranked. Some additional information is often provided. Examples:
- FalconS has both term search and ontology search features;
- Sindice is a generic Semantic Web document search engine;
- Swoogle is the grandfather of Semantic Web search engines;
- SWSE is an RDF entity search engine;
- vocab.cc is an RDF term search engine;
- Watson is an ontology search engine.
The problem here is that it is still hard to choose between two matching ontologies. What should guide publishers to the right choice? Should these ontologies be reused at all? See also BuildOrBuyTerms.
Main article: Ontology repositories
Ontology repositories are usually more specific than Semantic Web search engines, and their navigation/search interfaces can vary greatly. They offer tools that may be specific to the type of applications the repository was designed for. Examples:
- Linked Open Vocabularies is a living collaborative database of vocabularies, with rich metadata, interlinking, and version history. "All you need is LOV!"
- Prefix.cc is a namespace lookup service, which can be seen as a kind of vocabulary directory;
- vocab.cc is an RDF vocabulary search and lookup service;
- DERI Vocabularies is a repository and can be used as an online ontology editor;
- Knoodl was a repository and collaborative ontology management tool;
- Ontology Design Patterns is a repository for design patterns and for ontology modules that follow those patterns;
- OWL Seek was a repository of ontologies with additional metadata such as funding organisation, submitter and submission dates, and the possibility to get a list sorted by various criteria.
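To illustrate what a namespace lookup service such as Prefix.cc provides, here is a minimal Python sketch that expands a prefixed name into a full IRI and finds the prefix for a given IRI. It does not call Prefix.cc or any other service: the `PREFIXES` table is a small hand-written assumption holding a few well-known namespaces, and the function names `expand` and `namespace_of` are made up for this example.

```python
# Hypothetical local table; a real lookup service holds far more entries.
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def expand(curie, prefixes=PREFIXES):
    """Turn a prefixed name such as 'foaf:name' into a full IRI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"unknown prefix in {curie!r}")
    return prefixes[prefix] + local


def namespace_of(iri, prefixes=PREFIXES):
    """Return (prefix, local_name) for the longest matching namespace, or None."""
    best = None
    for prefix, namespace in prefixes.items():
        longer = best is None or len(namespace) > len(prefixes[best])
        if iri.startswith(namespace) and longer:
            best = prefix
    if best is None:
        return None
    return best, iri[len(prefixes[best]):]
```

Looking at which namespaces already appear in a dataset's terms is a quick way to see which vocabularies it reuses before searching for new ones.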
Mailing lists/online communities
If other tools are not sufficient to find an appropriate vocabulary, publishers can (and often do) rely on online communities by asking them directly. Examples:
- Ontology Dowsing Google Group;
- W3C Semantic Web mailing list;
- Linking Open Data ML;
- RDF Interest Group;
- Semantic Web Yahoo! Group;
- SemanticOverflow is a Q&A service about semantic technologies;
- LOV community is the Linked Open Vocabularies mailing list.
This is a rather effortless solution which can be really efficient in some cases. However, repeated enquiries about vocabularies can easily pollute the traffic, so publishers should first try to find a solution on their own, e.g., by following the links and indications on this wiki page. See also MailingLists.
Main article: Ontology editors
If data publishers cannot find a relevant vocabulary, or existing vocabularies are not suitable for the use case, they can make their own ontology. Editors can help them, such as:
- Protégé ontology editor (popular, pluggable).
- WebProtégé is the online version of Protégé.
- NeOn Toolkit is another ontology editor with many plugins available. It is especially suited for heavy-weight projects (e.g., multi-modular ontologies, multi-lingual, ontology integration, etc);
- SWOOP is a small and simple ontology editor;
- Neologism is an online vocabulary editor and publishing platform;
- TopBraid Composer is a multipurpose Semantic Web editor;
- Vitro is an Integrated Ontology Editor and Semantic Web Application;
- Knoodl is a community-oriented ontology and knowledge base editor.
- Ontofly is a web-based ontology editor.
- Altova OWL editor is another ontology editor.
- PoolParty is a thesaurus management system and a SKOS editor.
- IBM Integrated Ontology Development Toolkit is an ontology toolkit for storage, manipulation, query, and inference of ontologies and corresponding instances, based on Eclipse.
- Anzo for Excel will generate an initial ontology based on spreadsheet data and structure.
- Euler GUI is an editor for N3, RDF, OWL and various other formats.
- OWLGrEd is a graphical ontology editor for OWL.
- Fluent Editor is a tool for editing, manipulating and querying complex ontologies written in OWL, RDF or SWRL.
- Intelligent Topic Manager is a Web-based and ontology-driven application to collaboratively manage and maintain data models, multilingual vocabularies and rules. Helps centralize all terminology resources and expose them to business applications.
Building an ontology requires considerable effort and some guidelines.
Learning ontology design, best practices and evaluation
Guides to ontology design:
- Protégé guide
- An Executive Intro to Ontologies in AI³ blog;
- Creating an RDF vocabulary: Lessons learned is a blog post by Richard Cyganiak which provides advice for building a Web vocabulary.
Here are best practices:
- DontWorryBeCrappy: it's ok to do it wrong, it can improve later;
- Best Practice Recipes for Publishing RDF Vocabularies.
- Handbook about Metadata recommendations for linked open data vocabularies.
In addition to finding or making an ontology that contains the terms that are needed for the dataset, publishers may like to assess the quality of the ontologies, especially when they have the choice between several of them. Some possible factors:
- Fully documented;
- Used by independent data publishers;
- There exist tools that support the vocabulary specifically;
- The ontology is highly ranked by users in a voting system;
- All terms are dereferenceable;
- The ontology covers just the right domain (not an upper-level "ontology of everything");
- Expressive enough: the ontology has axioms that enable valuable inferences;
- Not too expressive: the ontology does not define axioms that have limited utility and would make reasoning costly;
The following tools can help check some of these factors:
- OWL 2 Validator determines whether an ontology is in OWL 2 DL, OWL 2 EL, OWL 2 QL, OWL 2 RL or OWL 2 Full;
- OWL 1 Validator determines whether an ontology is in OWL 1 DL, OWL 1 Lite or OWL 1 Full;
- RDF Validator is the official W3C validator for RDF/XML syntax;
- rdf:alerts is a tool for finding potential problems in linked data.
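The dereferenceability factor listed above can be checked mechanically. The Python sketch below is only an illustration, not the method of any tool named on this page: it assumes that a term counts as dereferenceable when an HTTP GET with an RDF `Accept` header ends in a 200 response with an RDF media type. The names `http_fetch`, `dereferenceable` and `undereferenceable_terms` are hypothetical, as is the short list of accepted media types. The fetch function is passed in as a parameter so the check can be tried without network access.

```python
import urllib.error
import urllib.request

# Assumption: only these two RDF serialisations are requested and accepted.
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9"
RDF_TYPES = {"text/turtle", "application/rdf+xml"}


def http_fetch(iri, timeout=10):
    """GET the IRI with an RDF Accept header; return (status, content_type)."""
    request = urllib.request.Request(iri, headers={"Accept": RDF_ACCEPT})
    try:
        # urlopen follows redirects, so 303-style term IRIs are resolved too.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.headers.get_content_type()
    except urllib.error.HTTPError as err:
        ctype = err.headers.get_content_type() if err.headers else None
        return err.code, ctype
    except (OSError, ValueError):
        # Network failure, timeout or malformed IRI.
        return None, None


def dereferenceable(iri, fetch=http_fetch):
    """True if the IRI resolves to a 200 response carrying an RDF media type."""
    status, ctype = fetch(iri)
    return status == 200 and ctype in RDF_TYPES


def undereferenceable_terms(iris, fetch=http_fetch):
    """Return the term IRIs that fail the check, in their original order."""
    return [iri for iri in iris if not dereferenceable(iri, fetch)]
```

This is a deliberately strict reading: a vocabulary that only serves an HTML page for its terms fails the check, which may or may not match what a given publisher needs.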
Interlinking ontologies can mean reusing existing ontologies in a modular way, or aligning ontologies. There are quite a lot of tools for ontology matching, most of which are prototypes from research projects. The following are well-maintained tools:
- The Alignment API is used and supported by many ontology matching tools;
Useful information is available at http://ontologymatching.org/, including references to existing tools and a very comprehensive list of scientific publications on the topic (500+ references).
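To give a feel for what aligning two ontologies involves, here is a deliberately naive Python sketch that pairs terms whose labels look alike. It does not use the Alignment API and is far simpler than real matching tools: it compares normalised labels with the standard library's `difflib.SequenceMatcher`. The function names, the input shape (a dict from term identifier to label) and the 0.85 threshold are all assumptions made for this example.

```python
from difflib import SequenceMatcher


def normalise(label):
    """Lower-case a label and treat underscores and hyphens as spaces."""
    return label.replace("_", " ").replace("-", " ").lower().strip()


def align(labels_a, labels_b, threshold=0.85):
    """Pair each term of A with its most similar term of B.

    labels_a and labels_b map a term identifier to a human-readable label.
    Returns (term_a, term_b, score) triples whose score reaches the threshold.
    """
    matches = []
    for term_a, label_a in labels_a.items():
        best_term, best_score = None, 0.0
        for term_b, label_b in labels_b.items():
            score = SequenceMatcher(
                None, normalise(label_a), normalise(label_b)
            ).ratio()
            if score > best_score:
                best_term, best_score = term_b, score
        if best_term is not None and best_score >= threshold:
            matches.append((term_a, best_term, round(best_score, 2)))
    return matches
```

With this approach, labels such as "Organisation" and "Organization" would be paired, while terms with unrelated labels are left unmatched. String similarity alone misses synonyms and ignores the structure of the ontologies, which is the kind of problem dedicated matching tools address.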
Related Events, Projects, etc.
A significant amount of research is under way to solve parts of the problem of guiding publishers to the right vocabulary:
- EKAW 2010 Workshop on Ontology Quality;
- ISWC 2010 Workshop on Semantic Repositories for Web, SERES 2010;
- SEALS is a European project on evaluating semantic applications, including ontologies;
- Semantic Web Journal is an academic journal which encourages the publication of ontology descriptions (=> peer-reviewed ontologies => good ontologies, in principle).