Draft. This plan is still under development. Please send feedback to vocab-services@w3.org.

Latest version at: http://www.w3.org/2013/04/vocabs/ (version history).
This is $Revision: 1.29 $ $Date: 2013/06/06 17:15:30 $

W3C Vocabulary Services

In order to promote the widespread interoperability of data, the W3C is beginning to offer a set of services to help people select, create, and maintain IRI vocabularies useful for creating reusable data. These vocabulary management services will build on existing W3C activities to create a sustainable community of people creating, maintaining, and using vocabularies for data sharing.

Introduction

Effective data sharing requires people or systems to know how elements in a dataset are supposed to be understood. For example, a CSV file can only be used by people who know what the column headings mean. When software is written to collect and analyze data, its developers need to understand these structural elements. When data comes from many different sources, either the data producers have to agree on structural elements (such as column names), or the data consumers have to analyze each source's style and write code to handle it.

One technique for addressing this problem is to use web addresses (URLs, URIs, or more recently IRIs) as identifiers for elements of the structure (e.g., column headings). This establishes a single authoritative source of information about the identifier's meaning (the web page at that address) while still allowing everyone who can create a website to create as many new identifiers as desired. It also allows people to refer to the identifiers easily and unambiguously, for example when recommending them to a colleague or searching for datasets which provide particular data.
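
As a rough illustration of this technique, the following Python sketch re-keys rows from two differently labeled CSV sources by shared IRIs, so that consuming code only has to handle one style. The IRIs under example.org, the column headings, and the sample data are all hypothetical and chosen purely for illustration; they are not part of any W3C vocabulary.

```python
import csv
import io

# Hypothetical IRIs; example.org is reserved for illustration.
NAME = "http://example.org/vocab/fullName"
CITY = "http://example.org/vocab/city"

# Each producer declares which shared IRI each local column heading means.
SOURCE_A_COLUMNS = {"name": NAME, "town": CITY}
SOURCE_B_COLUMNS = {"Full Name": NAME, "City": CITY}

# Made-up sample data with different headings and column order.
SOURCE_A_CSV = "name,town\nAda,London\n"
SOURCE_B_CSV = "City,Full Name\nParis,Marie\n"


def read_with_iris(text, column_iris):
    """Parse CSV text and re-key each row by IRI instead of local heading."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {column_iris[heading]: value for heading, value in row.items()}
        for row in reader
    ]


def merge_sources():
    """Combine both sources; the consumer only needs to know the IRIs."""
    return (read_with_iris(SOURCE_A_CSV, SOURCE_A_COLUMNS)
            + read_with_iris(SOURCE_B_CSV, SOURCE_B_COLUMNS))


if __name__ == "__main__":
    for record in merge_sources():
        print(record[NAME], "-", record[CITY])
```

Once the rows are keyed by IRI, the consumer no longer needs per-source code: adding a third source only requires a new heading-to-IRI mapping.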

While this use of IRIs has been adopted in some technical communities, there are several barriers to wider adoption:

With the growing world-wide demand for data interoperability, these barriers are becoming increasingly problematic. Fortunately, W3C is well-positioned to address these problems. The vocabulary services outlined below build on existing W3C services and strengths. These services will make it much easier for people to obtain high quality vocabularies and help create new ones; this in turn will promote data sharing, reuse, and interoperability.

In general, these services are inexpensive to provide and can be offered for free to the public. W3C may, however, decide to charge for certain services and/or limit their use to W3C members.

Vocabulary Groups

Summary: We will promote the use of W3C Community Groups for gathering people interested in developing and maintaining individual vocabularies, and we will encourage new and existing cross-domain groups to help guide others in creating better vocabularies.

W3C has several different kinds of groups, including Community Groups (which can be created by anyone with just four other interested people) and Working Groups (which are created after a review process by the W3C Advisory Committee). For creating or maintaining a vocabulary, Community Groups are likely to be the preferred option because of their lower financial and procedural overhead.

One drawback to Community Groups (in contrast to Working Groups) is that without W3C staff to help guide them, their progress depends on their leadership figuring out, on their own, how to make a new standard vocabulary. To reduce this burden and draw in people to use Community Groups, we are creating a guide to developing vocabularies at W3C.

In order to further help people, particularly on the technical aspects of the process, we will continue to support the WebSchemas (public-vocabs@w3.org) group as a community of practitioners willing to offer advice on vocabulary design. Although this group has so far largely focused on vocabularies hosted at schema.org, its mission covers all vocabularies. With help from the chair and schema.org, we intend to promote this group more broadly as a source for vocabulary technical reviews, coordination, and developing vocabulary design expertise.

Other cross-domain groups may also be formed, following the normal processes, for example to produce a vocabulary-design Best Practices document. As explained below, we may also form a group to develop vocabulary metadata suitable for helping people choose among available vocabularies.

Vocabulary Hosting

Summary: We will promote and clarify the existing policy of giving www.w3.org/ns space to any W3C group, including Community Groups, as well as to other organizations on a case-by-case basis. The goal is to allow vocabularies useful for open data interchange to be hosted by W3C and maintained by the people who care about them.

The longstanding W3C policy has been to give out namespaces (web space for vocabularies) upon request by any W3C group, including Community Groups and Business Groups. In practice, few groups have taken advantage of this policy and few people know it is an option. With this in mind, we intend to publicize this service and streamline the namespace document publication process.

Specifically, the chairs of W3C groups, including Community Groups, will have a simple way to reserve and update namespace documents, should they choose to have their vocabulary hosted by W3C and managed according to its policies. If they make use of this service, they will be required to confirm several facts, including:

  1. Your group has agreed to publish this new version, as per your stated decision policy.
  2. You and your group are acting on behalf of W3C as stewards for this vocabulary, helping maintain it on behalf of existing and potential users around the world.
  3. You believe this change will do no harm to W3C or to existing or potential users of this vocabulary.
  4. You have not removed documentation for terms that were previously included and that people have started to use.
  5. If this change is non-trivial, you have asked for review from public-vocabs@w3.org and other relevant cross-domain lists and your group has publicly responded to all comments.
  6. The group has made reasonable efforts to include labels and documentation in multiple languages and is prepared to add multilingual labels and documentation offered by users of the vocabulary.

When we publish the documents, we will add a prominent notice of their status (not endorsed by W3C) along with instructions for how to give feedback. At some point we may provide software tools which support group development of vocabularies.

This service provides both an easy-to-maintain vocabulary website and an institutional commitment to maintain that site as long as people are willing to participate in W3C groups to do the work. This second feature — a strong persistence policy — is essential to vocabulary users who want their data to remain usable for many years to come.

In order to support long-term persistence, we will continue to develop contingency plans for allowing w3.org to remain in operation even after W3C no longer uses it or no longer exists. We may also support vocabulary hosting on domains other than w3.org, so that W3C-administered groups can manage existing non-w3.org namespaces, and new vocabularies can be developed while retaining the option of someday moving away from W3C.

Vocabulary Directory

Summary: We will collect, maintain, and distribute information about all available vocabularies, with the aim of helping people identify and decide among alternatives. This will be an open data application, freely interoperating with related information services.

Even though at present vocabularies are generally available free of charge, one may consider vocabulary adoption as a market, with "consumers" trying to identify "products" and choose among them. From this perspective, consumers in the current vocabulary market have little information about available products and their features. This is hardly surprising: there is little or no "advertising", there is no simple business case for "retailers" to attract and guide consumers, and there is little available "product information" as might be printed on a package.

The current market has had some "retailers" who have since gone away (schema.net), some promising newcomers (LOV), and some successful efforts in subdomains with available funding (BioPortal). We plan to improve the flow of information in this market in two complementary ways:

  1. Vocabulary Directory Website. We will provide a "retail" website where people can maintain and search listings of vocabularies (wherever they are hosted), along with useful metadata ("product information"). Metadata may include simple endorsements ("like", "+1", star ratings) and more detailed information like reviews, the list of open/closed issues, and the list of public users/implementations.

  2. Vocabulary Market Database. The directory will be an open data application, making its internal data available for others and consuming data feeds from others. People who have existing metadata will be able to easily provide it to the directory, and people will be able to create new interfaces for exploring and exploiting the data.
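
The open data behavior described in these two items could be sketched roughly as follows, in Python. This is only an illustration under stated assumptions: the class name, the record fields (iri, title, reviews), the JSON feed layout, and the example.org namespaces are all hypothetical, not a format or design that W3C has defined.

```python
import json


class VocabularyDirectory:
    """Toy in-memory directory of vocabulary listings (hypothetical design)."""

    def __init__(self):
        # Listings are keyed by the vocabulary's namespace IRI.
        self._listings = {}

    def add_listing(self, iri, title, reviews=()):
        """Create a listing, or merge new reviews into an existing one."""
        entry = self._listings.setdefault(
            iri, {"iri": iri, "title": title, "reviews": []}
        )
        for review in reviews:
            if review not in entry["reviews"]:
                entry["reviews"].append(review)

    def search(self, keyword):
        """Return listings whose title contains the keyword (case-insensitive)."""
        needle = keyword.lower()
        return [e for e in self._listings.values() if needle in e["title"].lower()]

    def export_feed(self):
        """Serialize all listings as JSON so other services can reuse them."""
        entries = sorted(self._listings.values(), key=lambda e: e["iri"])
        return json.dumps(entries, indent=2)

    def import_feed(self, feed_text):
        """Consume a JSON feed produced by another directory."""
        for entry in json.loads(feed_text):
            self.add_listing(entry["iri"], entry["title"], entry.get("reviews", []))


def demo():
    """Merge a second directory's feed into ours and return the result."""
    ours = VocabularyDirectory()
    ours.add_listing("http://example.org/ns/recipes#", "Recipes", ["Clear docs"])
    theirs = VocabularyDirectory()
    theirs.add_listing("http://example.org/ns/recipes#", "Recipes", ["Widely used"])
    theirs.add_listing("http://example.org/ns/events#", "Events")
    ours.import_feed(theirs.export_feed())
    return ours
```

Because the export and import sides share one feed format, any third party holding existing metadata could contribute it, or build a different search interface on top of the exported data.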

This will be a development project involving both developing software and developing metadata vocabularies. W3C staff effort will be partially supported by external funding, and we will work with volunteers/partners. The software will be open source and the vocabularies will, of course, make use of the vocabulary management services described above, using a Community Group and review by public-vocabs@w3.org.

Using an open data architecture for these services is important not just as a validation and demonstration of the underlying technologies, but because it keeps down the barriers to entry for everyone, everywhere, trying to share data. Where a traditional closed directory acts as a bottleneck, stifling new approaches and dissuading people from participating because of uncertainties in how that directory might be run, an open directory welcomes everyone to participate. Everyone is free to list new vocabularies, add information about vocabularies, and create apps to help people find and work with vocabularies. This openness and innovation is a hallmark of shared data and will be a key benefit of this service.


Sandro Hawke, sandro@w3.org, editor
