One of the major stumbling blocks in deploying RDF has been the difficulty data providers have in determining which vocabularies to use. For example, a publisher of scientific papers who wants to embed document metadata in the web pages about each paper has to search extensively to find the candidate vocabularies and gather the data needed to decide which of them suit this use. Many vocabularies may already exist, but they are difficult to find. There may be more than one covering the same subject area, with no clear indication of which have a reasonable level of stability and community acceptance. Or there may be none, in which case one has to be developed, and it is then unclear how to let the community know that the new vocabulary exists.
There have been several attempts to create vocabulary catalogs, indexes, etc., but none has gained general acceptance and few have remained up for very long. The latest notable attempt is LOV, created and maintained by Bernard Vatant (Mondeca) and Pierre-Yves Vandenbussche (DERI) as part of the DataLift project. Other application areas have more specific, application-dependent catalogs; e.g., the HCLS community has established application-specific "ontology portals" (vocabulary hosting and/or directory services) such as NCBO and OBO. (Note that for the purposes of this document, the terms "ontology" and "vocabulary" are synonyms.) Unfortunately, many past cataloging projects depended on a specific project or on a few individuals and, more often than not, became obsolete after a while.
Initially (1999-2003) W3C stayed out of this process, waiting to see if the community would sort out this issue by itself. We hoped to see the emergence of an open market for vocabularies, including development tools, reviews, catalogs, consultants, etc. When that did not emerge, we decided to begin offering ontology hosting (on www.w3.org) and we began the Ontaria project (with DARPA funding) to provide an ontology directory service. Implementation of these services was not completed, however, and project funding ended in 2005. After that, W3C took no active role until the emergence of schema.org and the eventual creation of the Web Schemas Task Force of the Semantic Web Interest Group. WSTF was created both to provide an open process for schema.org and as a general forum for people interested in developing vocabularies. At this point, we are contemplating taking a more active role supporting the vocabulary ecosystem.
The W3C Vocabulary management proposals set out here have emerged from our extensive discussions around the strategic direction that W3C should take in the Semantic Web and eGov Activities. They answer a clear community need and in that sense are simply something that W3C should do. Arguably, it's something W3C should have been doing for a long time, but that's water under the bridge. This work is required to underpin the development of the linked and open data visions. In that respect, undertaking it is something our members, actual and potential, can reasonably expect of us.
The proposals below are simple and easy to do. Each has potential to generate tremendous value for the community. We suggest waiting until the value is clearly present before putting much effort into monetization. At a few points below, we note some possible revenue sources. In addition, once this work has demonstrated its value, it may be easier to obtain grant funding to improve it.
Goal: provide a forum for experts to talk to each other and for newcomers to talk to the experts. This group can also help point out and coordinate areas of overlap among vocabularies, and help gather people into groups for new vocabularies.
Proposal: redirect the Web Schemas Task Force to take on this role. Rename it, avoiding the word 'schema' to help clarify it's not particularly about schema.org, but keep the mailing list name (firstname.lastname@example.org) to avoid disruption. Add another chair. Perhaps call it "Vocabulary Advice Task Force" or "Vocabulary Coordination Group".
If possible, the group should host regular discussions on general vocabulary development issues and answer questions. It might also hold regular presentations where one group presents its vocabulary to the wider audience to get feedback (like W3C staff project reviews).
Goal: provide a forum for the people involved in each vocabulary to communicate and share material. Provide a trusted archive of public comments.
Proposal: in general, use Community Groups (CGs), with their normal tools (mailing lists, wikis, etc). Use Working Groups in situations where enough W3C members want a more formal process, possibly more restricted participation, and the stamp of "W3C Recommendation" on the vocabulary. (possible revenue source)
This is already done sometimes, as with the Open Annotation CG. We can encourage more people to do this by linking it with other services, such as vocabulary hosting (item 3, below).
Note that the stability and interoperability characteristics of vocabularies differ somewhat from those of most W3C Recommended technologies, so the full Recommendation Track is often not warranted. A well-constructed and properly published vocabulary seems to be taken at least as seriously by much of the market as a W3C-Recommended one.
Goal: Make it practical, even easy, for people to publish their vocabulary namespace document in accord with best practice, especially with regard to long-term stability.
Proposal: offer URLs starting with http://www.w3.org/ns/ to any W3C group, including Community Groups, as long as that group has an open/consensus decision process. Provide a simple Web interface for people to allocate a namespace and then update the contents of the namespace document as needed. This would be subject to reasonable terms of service, including the understanding that individuals act as editors, on behalf of the group, and that W3C has ultimate authority over the content.
Namespace Policy already allows W3C groups to claim URLs under http://www.w3.org/ns/, but that policy was written before Community Groups existed and its applicability to them is unclear. At the moment, requests to allocate names and requests to update the contents of a namespace document have to be handled by W3C staff.
In a second iteration, the W3C vocab hosting service could provide various tools which support or even enforce good practice in vocabulary development, such as not removing terms that people might be using. Existing Web-based tools such as WebProtege (from Stanford), Neologism (from DERI), and Knoodl (from Revelytix) should be considered.
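As an illustration of what "enforcing good practice" could mean for term removal, the following is a minimal sketch, not a description of any planned W3C tool. It assumes each version of a vocabulary can be reduced to a set of term URIs, and it rejects an update that drops a term present in the previous version. The function names and the `http://www.w3.org/ns/example#` namespace are hypothetical.

```python
def removed_terms(old_terms, new_terms):
    """Return, in sorted order, the terms in the old version that the new version lacks."""
    return sorted(set(old_terms) - set(new_terms))


def check_update(old_terms, new_terms):
    """Return (ok, missing): ok is True only if no previously published term was removed."""
    missing = removed_terms(old_terms, new_terms)
    return (len(missing) == 0, missing)


# Hypothetical namespace and terms, used only for illustration.
NS = "http://www.w3.org/ns/example#"
old_version = {NS + "Paper", NS + "author", NS + "title"}
new_version = {NS + "Paper", NS + "author", NS + "abstract"}

ok, missing = check_update(old_version, new_version)
if not ok:
    print("Update rejected; removed terms:", ", ".join(missing))
```

A real service would more likely deprecate terms than forbid every removal outright, and would extract the term sets from the published namespace documents rather than take them as literals.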
A key advantage of W3C hosting, unlike most other options available, is that we can handle changes in personnel, business models, governments, corporate mergers, etc, through our normal group consensus processes.
Q: What if someone wants to host a vocabulary at W3C, but does not want to turn over control to the group?
A: For now: tell them to come talk to us and we'll consider it on a case-by-case basis. (possible revenue source)
Q: What if a name is allocated and never used?
A: Abandoned vocabulary names may be reclaimed, depending on evidence of their use.
Q: What about name conflicts? What if someone wants to claim "html" or "google" as a namespace name?
A: The terms of service will require that groups declare they have made reasonable effort to find other uses of the term, considered them, and concluded there is no significant likelihood of user confusion. If such confusion is reported, especially in the early days of the term being used, we may reallocate the name.
Q: Can we use https://www.w3.org/ns/ (TLS secure)?
A: That's included automatically; all of www.w3.org is simultaneously served with TLS.
Q: What about
A: We are considering this, as a way to better manage the load.
Q: What about
A: It's a possibility, if someone wants that. Does anyone?
Q: What about
A: Yes, subdirectories will be supported. (maybe this is just reserve-a-prefix)
Q: What about
A: Possibly in the future, to accommodate vocabularies that started outside W3C or that see a need for someday moving away from W3C. (possible revenue source)
Goal: Make sure that people selecting vocabularies have the data they need to make a good choice. Some of this data will be provided by the vocabulary providers (first-party metadata, self-reported) while some will be provided by others (third-party metadata).
Proposal: ask a group (maybe the experts group, item 1, above, or maybe a new CG) to come up with a vocabulary for this metadata, promote it, and also use it in the W3C vocabulary directory (item 5, below).
Some of this is currently in-scope for the Government Linked Data (GLD) Working Group, under Best Practices for Vocabulary Selection ("... issues of stability, security, and long-term maintenance commitment...").
Some possible items:
Goal: Provide vocabulary consumers (people publishing and in some cases consuming RDF data) with a convenient way to find the vocabularies that might work for them, along with metadata to help them choose among the options.
Proposal: in the first iteration, make a simple web page showing all the vocabularies we host, along with all the others that have been reported to us (via a web form which asks for basic metadata). For each vocabulary, provide some of the metadata from item 4, above. Depending on available resources and on feedback, grow this into a more complete "shopping site" where people can search, sort, and filter on various criteria, as well as enter their own ratings, reviews, and other metadata.
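The search, sort, and filter behaviour described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `VocabularyEntry` fields, the `find` function, and the sample entries are all hypothetical, since the actual metadata would come from the vocabulary defined under item 4.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VocabularyEntry:
    # Hypothetical metadata fields; the real set would be defined under item 4.
    name: str
    namespace: str
    hosted_at_w3c: bool
    subject: str


def find(entries, subject=None, hosted_only=False):
    """Filter entries by subject and hosting, then list W3C-hosted ones first, by name."""
    matches = [
        e for e in entries
        if (subject is None or e.subject == subject)
        and (not hosted_only or e.hosted_at_w3c)
    ]
    # False sorts before True, so negating the flag puts hosted entries first.
    return sorted(matches, key=lambda e: (not e.hosted_at_w3c, e.name))


# Sample entries invented for illustration only.
entries = [
    VocabularyEntry("Another Bibliography", "http://example.org/bib#", False, "publishing"),
    VocabularyEntry("Example Bibliography", "http://www.w3.org/ns/example-bib#", True, "publishing"),
    VocabularyEntry("Example Sensors", "http://www.w3.org/ns/example-sensors#", True, "sensors"),
]

for entry in find(entries, subject="publishing"):
    print(entry.name, entry.namespace)
```

Listing W3C-hosted vocabularies first is one possible ordering for the first iteration; ratings and reviews would add further sort keys later.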
Q: Do we include all known vocabularies, or only ones that seem pretty good?
A: We include any vocabulary for which someone is willing to fill out the form, or which already has embedded the basic required metadata.
Q: Do we include en masse the vocabularies known to NCBO, LOV, prefix.cc, etc?
A: Probably not in the first iteration, because without good search and filter tools, those will dwarf the others. At the start, focus on vocabularies hosted at W3C or which people specifically request to be included, via the submission form.
At some point, it may be best to move to existing software, such as LOV or Callimachus (see an article on some others).
Ivan Herman email@example.com, Sandro Hawke firstname.lastname@example.org
$Id: vrc.html,v 1.24 2013-06-20 08:08:51 phila Exp $