Draft. This plan is still under development. Please send feedback to sandro@w3.org.

Latest version at: http://www.w3.org/2013/04/vocabs/ (version history).
This is Revision: 1.19, Date: 2013-04-25 20:37:04

W3C

W3C Vocabulary Services

In order to promote the widespread interoperability of data, the W3C is beginning to offer a set of services to help people select, create, and maintain IRI vocabularies useful for creating reusable data. These vocabulary management services will build on existing W3C activities to create a sustainable community of people creating, maintaining, and using vocabularies for data sharing.

Introduction

Effective data sharing requires people or systems to know how elements in a dataset are supposed to be understood. For example, a CSV file can only be used by people who know what the column headings mean. When software is written to collect and analyze data, its developers need to understand these structural elements. When data is coming from many different sources, the data producers have to agree on structural elements (column names) or else the data consumers will have to analyze and then write code for each different style.

One technique for addressing this problem is to use web addresses (URLs, URIs, or more recently IRIs) as identifiers for elements of the structure (e.g., column headings). This establishes a single authoritative source of information about the identifier's meaning, namely the web page, while still allowing everyone who can create a website to create as many new identifiers as desired. It also allows people to easily and unambiguously refer to the identifiers, for example when recommending them to a colleague or asking a question about the meaning.
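As a minimal sketch of this technique, the following Python fragment relabels the columns of two CSV sources with shared IRIs, so that a data consumer can process both with a single code path. The column headings, the sample rows, and the http://example.org/vocab/ IRIs are hypothetical and serve only as an illustration.

```python
import csv
import io

# Hypothetical vocabulary IRIs; example.org is a placeholder domain.
NAME = "http://example.org/vocab/name"
POPULATION = "http://example.org/vocab/population"

# Each producer declares what its own column headings mean.
SOURCE_A_COLUMNS = {"city": NAME, "pop": POPULATION}
SOURCE_B_COLUMNS = {"Town Name": NAME, "Residents": POPULATION}

# Made-up sample data using two different heading styles.
SOURCE_A_CSV = "city,pop\nSpringfield,30000\n"
SOURCE_B_CSV = "Town Name,Residents\nShelbyville,25000\n"


def read_with_iris(text, column_iris):
    """Parse CSV text and key each value by the IRI of its column."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({column_iris[heading]: value for heading, value in row.items()})
    return rows


# The consumer needs only one code path, keyed on the shared IRIs.
combined = read_with_iris(SOURCE_A_CSV, SOURCE_A_COLUMNS) + read_with_iris(
    SOURCE_B_CSV, SOURCE_B_COLUMNS
)
total_population = sum(int(row[POPULATION]) for row in combined)
print(total_population)  # prints 55000
```

Without the shared identifiers, the consumer would need separate handling for "pop" and "Residents", and again for every further heading style that appears.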

While this use of IRIs has been adopted in some technical communities, several barriers remain to wider adoption:

With the growing world-wide demand for data interoperability, these barriers are becoming increasingly problematic. Fortunately, W3C is well-positioned to address these problems. The vocabulary services outlined below build on existing W3C services and strengths. These services will make it much easier for people to obtain high quality vocabularies and help create new ones; this in turn will promote data sharing, reuse, and interoperability.

In general, these services are inexpensive to provide and can be offered for free to the public. W3C may, however, decide to charge for certain services and/or limit their use to W3C members.

Vocabulary Groups

Summary: We will promote the use of W3C Community Groups for gathering people interested in developing and maintaining individual vocabularies, and we will form a Vocabulary Review Group to help guide others in creating better vocabularies.

W3C has several different kinds of groups, including Community Groups (which can be created by anyone with just four other interested people), and Working Groups (which are created after a review process by the W3C Advisory Committee). For creating or maintaining a vocabulary, Community Groups are likely to be the preferred option, because of their lower financial and procedural overhead.

One drawback to Community Groups (in contrast to Working Groups) is that without W3C staff to help guide them, their progress depends on their leadership figuring out, on their own, how to make a new standard vocabulary. To reduce this burden and draw in people to use Community Groups, we will create a guide to developing vocabularies at W3C.

In order to further help people, particularly on the technical aspects of the process, we will form a cross-domain group of experts prepared to examine vocabularies and give suggestions, a Vocabulary Review Group. We plan to create this group by retargeting the WebSchemas (public-vocabs@w3.org) group. That group was created as a public forum for discussing and reviewing new vocabularies and vocabulary extensions, but it was initially motivated by just the schema.org vocabulary and since then has largely focused on that one (large) vocabulary.

With help from the chairs and schema.org, we intend to rename the group and clarify the mission. The new Vocabulary Review Group (VRG) will be charged with helping people produce high-quality vocabularies by examining ones that are submitted and offering advice. Depending on interest within the VRG, these reviews might be performed by the entire VRG (e.g., during a teleconference), by a subgroup, or by individuals who volunteer. The advice may be public, and become part of the information people consider in deciding whether to adopt a vocabulary. The VRG will not, however, be a decision-making body, and such advice will be attributed to particular individuals (perhaps with other individuals concurring or dissenting), not the VRG as a whole.

Vocabulary Hosting

Summary: We will promote and clarify the existing policy of giving www.w3.org/ns space to any W3C group, including community groups. The goal is to allow vocabularies useful for open data interchange to be hosted by W3C and maintained by the people who care about them.

The longstanding W3C policy has been to give out namespaces (spaces for vocabularies) upon request by any W3C group, including Community Groups and Business Groups. In practice, few groups have taken advantage of this policy. It is not widely known, and the process for updating the namespace document (the vocabulary website) is not specified. With this in mind, we intend to publicize this service and automate the namespace document publication process.

Specifically, the chairs of W3C groups, including Community Groups, will have access to a web form which allows them to reserve and update namespace documents. The form will ask for some metadata and request that the group seek review from public-vocabs@w3.org before making significant updates. It will require confirmation of certain terms of service, including a stipulation that the group is maintaining the vocabulary on behalf of the broader community of stakeholders and that W3C retains ownership. When it publishes the documents, the system will add a prominent notice of the status (not being endorsed by W3C) along with instructions for how to give feedback.
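The publication step described here might be sketched as follows. This is only an illustration of the flow (confirm the terms of service, then prepend a status notice with feedback instructions); the class, its field names, and the wording of the notice are all hypothetical, since the form has not yet been specified.

```python
from dataclasses import dataclass


@dataclass
class NamespaceRequest:
    """Hypothetical form submission; every field name is illustrative."""

    shortname: str         # requested path segment under www.w3.org/ns/
    group_name: str        # W3C group maintaining the vocabulary
    feedback_address: str  # where readers should send comments
    accepted_terms: bool   # chair confirmed the terms of service
    document_body: str     # namespace document supplied by the group


def publish(request):
    """Return the text to publish, with a status notice prepended."""
    if not request.accepted_terms:
        raise ValueError("terms of service must be confirmed before publication")
    notice = (
        f"Status: maintained by {request.group_name}; not endorsed by W3C. "
        f"Send feedback to {request.feedback_address}."
    )
    return notice + "\n\n" + request.document_body


# Made-up example values.
example = NamespaceRequest(
    shortname="example",
    group_name="Example Community Group",
    feedback_address="public-example@example.org",
    accepted_terms=True,
    document_body="Terms defined in this vocabulary: ...",
)
published = publish(example)
```

Refusing to publish until the terms are confirmed keeps the ownership and stewardship stipulations attached to every namespace document the system produces.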

At some point this interface may be expanded to provide software tools which support group development of vocabularies. It may also be extended to cover namespace documents on domains other than w3.org, in order to allow vocabularies to potentially become independent of W3C.

Vocabulary Directory

Summary: We will collect, maintain, and distribute information about vocabularies, with the aim of helping people identify and decide among alternatives. This will be an open data application, freely interoperating with related information services.

Even though at present vocabularies are generally available free of charge, we may consider vocabulary adoption as a market, with "consumers" trying to identify "products" and choose among them. From this perspective, consumers in the current vocabulary market have very little information about available products and their features. This is hardly surprising: there is little or no "advertising", there is no simple business case for "retailers" to attract and guide consumers, and there is little available "product information" as might be printed on a package.

The current market has had some "retailers" who have since gone away (schema.net), some promising newcomers (LOV), and some successful efforts in subdomains with available funding (BioPortal). We plan to improve the flow of information in this market in two complementary ways:

  1. Vocabulary Directory Website. We will provide a "retail" website where people can maintain and search listings of vocabularies, along with useful metadata ("product information"). Metadata may include simple endorsements ("like", "+1", star ratings) and more detailed information like reviews, the list of open/closed issues, and the list of public users/implementations.

  2. Vocabulary Market Database. The directory will be an open data application, making its internal data available for others and consuming data feeds from others. People who have existing metadata will be able to easily provide it to the directory, and people will be able to create new interfaces for exploring and exploiting the data. The system will be architected to give the directory website no special status; people will be able to create alternative vocabulary directories (other "retail" sites) backed by the same data. (This is a "dogfood" project, using open data technologies to support the open data ecosystem.)
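To illustrate how a directory and an alternative "retail" site could share the same data, here is a rough sketch in Python. The listing fields, the JSON feed layout, and the function names are assumptions made for this example; the plan does not yet define a data format.

```python
import json

# Hypothetical listing; every field name and value is illustrative.
listing = {
    "vocabulary": "http://example.org/vocab/",
    "title": "Example Vocabulary",
    "endorsements": {"plus_one": 12},
    "reviews": [{"author": "A. Reviewer", "text": "Clear term definitions."}],
    "open_issues": 3,
    "closed_issues": 7,
    "known_users": ["http://example.org/app"],
}


def export_feed(listings):
    """Serialize listings as a JSON feed that other sites can consume."""
    return json.dumps({"listings": listings}, indent=2, sort_keys=True)


def import_feed(text):
    """Load listings from a JSON feed published by another directory."""
    return json.loads(text)["listings"]


def search(listings, keyword):
    """Return listings whose title contains the keyword, ignoring case."""
    return [item for item in listings if keyword.lower() in item["title"].lower()]


# An alternative directory can be backed by exactly the same data.
feed = export_feed([listing])
mirror = import_feed(feed)
matches = search(mirror, "example")
```

Because the mirror is built only from the published feed, the original website holds no information that a competing directory cannot also obtain.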

By making the directory an open data application, and by branding it as a W3C service, we are likely to be able to include all vocabularies currently in use. By using an open data architecture, we avoid stifling the market; we provide an extremely low barrier to entry for either new vocabulary producers or innovative retailers.


Sandro Hawke, sandro@w3.org, editor

Date: 2013-04-25 20:37:04