Registries

From W3C Wiki

Background discussion

This is a close companion to the Living Standards question. (This page was previously called Repositories, but Registries is the more common industry term.)

There is a Process CG Issue open.

A registry is a place where people can request to have something stored for sharing among the community. This differentiates it from a specification, which is produced by a group of some sort.

The purposes of a registry include at least:

  • non-collision. Avoiding the problem of two organizations using the same code-point value with different semantics.
  • non-duplication. Avoiding the problem of having two or more different code-points in use with the same semantics.
  • information. Providing a central reference/dispatch place where anyone can find out what a code-point means and what its formal definition is (and where it is).
  • ease of adding new terms
  • new stakeholders can add new terms
  • a clear consensus of the community on the terms
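As a minimal sketch of the first three purposes, assuming a registry that simply maps each code-point to the URL of the specification defining it, the following Python shows how non-collision and non-duplication could be enforced at registration time. Treating "same semantics" as "same defining URL" is a simplifying assumption; in practice, deciding whether two meanings coincide needs human review. `Registry` and `RegistrationError` are hypothetical names.

```python
from dataclasses import dataclass, field


class RegistrationError(ValueError):
    """Raised when a request would break non-collision or non-duplication."""


@dataclass
class Registry:
    # Maps each code-point to the URL of the specification defining its semantics.
    entries: dict[str, str] = field(default_factory=dict)

    def register(self, code_point: str, semantics_url: str) -> None:
        # Non-collision: a code-point may carry only one meaning.
        if code_point in self.entries:
            raise RegistrationError(f"collision: {code_point!r} is already registered")
        # Non-duplication. Assumption: two entries share semantics iff they cite the same URL.
        for existing, url in self.entries.items():
            if url == semantics_url:
                raise RegistrationError(
                    f"duplicate: {semantics_url} is already registered as {existing!r}"
                )
        self.entries[code_point] = semantics_url

    def lookup(self, code_point: str) -> str:
        # Information: anyone can find out where a code-point is defined.
        return self.entries[code_point]
```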

There are various terms used in the industry to describe those items that require registries. These include:

A registry typically contains a set of values for a specific 'slot' in some format. There are many examples of registries. IANA is perhaps one of the more famous sites hosting a number of registries. MP4RA is another, hosting code-points for MP4-family box-structured files.

In an Accessibility API Mapping, a W3C specification "maps" W3C-specified elements and attributes to platform accessibility APIs such as NSAccessibility (macOS and iOS); ATK (GNU/Linux); IAccessible2 and UIAutomation (Windows).

Vocabularies may be used to disambiguate terms used in different datasets.

It is rare for Registries to have IPR policies; however, a lot depends on the requirements on implementations (see below).

Registries vs. specification tables

Not every table in a specification is a potential registry. If the intent or effect is that the table enumerates all the possibilities the authors of the specification expect or envisage, then the table by itself is enough. Similarly, if the table is managed by the working group and only updated as part of a specification update, then the complexities of third-party registration applications and rules are not needed.

Requirements on implementations

A lot depends on what the expected implementation requirements of a registry are. Possibilities include:

  • every implementation of the specification that embeds the registry is expected to recognize and implement the effect/semantics of all registered values;
  • every implementation of the specification that embeds the registry is expected to recognize all registered values, but can choose which to implement;
  • the registry or specification has two or more 'classes' – required code-points that all must implement, and extension code-points that are optional
  • the registry and specification merely document existence, and it's for external constraints or other specifications, or for implementers, to decide which to implement.
  • a registry might be independent of a particular specification, but document registered values that could relate to multiple specifications.
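The "classes" option can be sketched as follows, assuming each registry entry carries a hypothetical `Requirement` marker and an implementation reports the set of code-points it supports. The code-point names are illustrative only and are not taken from any real registry.

```python
from enum import Enum


class Requirement(Enum):
    REQUIRED = "required"    # every implementation must support it
    EXTENSION = "extension"  # implementations may choose to support it


# Hypothetical registry contents: code-point -> class.
REGISTRY = {
    "identity": Requirement.REQUIRED,
    "deflate": Requirement.REQUIRED,
    "fancy-codec": Requirement.EXTENSION,
}


def missing_required(supported: set[str]) -> set[str]:
    """Return the required code-points an implementation does not support."""
    return {
        code
        for code, req in REGISTRY.items()
        if req is Requirement.REQUIRED and code not in supported
    }
```

An implementation supporting only `{"identity"}` would be reported as missing `"deflate"`; whether it supports `"fancy-codec"` does not matter.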

An example of a code-point that all must implement would be one that identifies the compression or format of a data transfer, such as the transfer encoding in HTTP and MIME. A new value is useless if a source uses it and the sink does not recognize it: the protocol then fails.

An example of a code-point that is optional is the MP4 registration of codecs. There is no expectation that every multimedia player will implement all possible codecs; what is actually required is set by vertical or application specifications.
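The difference between the two examples shows up in how a receiver treats a value it does not recognize. A rough sketch, with hypothetical names and illustrative value sets: an unknown value in a must-implement slot is fatal, while an unsupported optional codec only causes that track to be skipped.

```python
class ProtocolError(Exception):
    """The sink cannot interpret a value it was required to understand."""


# Values this hypothetical sink implements; the sets are illustrative.
KNOWN_ENCODINGS = {"identity", "gzip"}
SUPPORTED_CODECS = {"avc1", "mp4a"}


def accept_encoding(encoding: str) -> str:
    # Must-implement slot: an unrecognized value makes the transfer unusable.
    if encoding not in KNOWN_ENCODINGS:
        raise ProtocolError(f"unknown transfer encoding {encoding!r}")
    return encoding


def playable_tracks(track_codecs: list[str]) -> list[str]:
    # Optional slot: tracks with unsupported codecs are skipped, not fatal.
    return [codec for codec in track_codecs if codec in SUPPORTED_CODECS]
```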

A registry which is independent of a particular specification could be a vocabulary which is utilized by multiple specifications.

Clearly, the licensing status needed varies greatly depending on what expectation is placed on implementations. If something must be implemented, we may want the IPR policy that applies to the specification embedding the registry to apply to the registry as well, if indeed there are any IPR issues.

Admission criteria

Introduction

We must address the question of 'how do I get a new value into a registry?' When there is a relevant Working Group, it would play a major role. When there is no such Working Group (either because the registry spans Working Groups or because the WG is out of charter), we need to determine who has this responsibility. It could be the W3C Team, a Community Group (although there is weak structure there), or we may create a lightweight registry Group.

Review Requirements

For tables embedded in specifications, the implied answer is "persuade the owners of the specification – the WG at W3C – to do an update". Since adding a code-point would generally be considered a 'substantive change', this is moderately slow and painful, involving, for an existing Recommendation, a trip around the rec-track process. A Living Standards process might be more suitable.

IANA requires an explicit declaration of what is needed to get a new code-point into each of its registries, and records that registration procedure alongside every registry.

Possibilities include, from the W3C point of view:

  • Full Rec-track process; WG approval, wide review and community consensus, AC review, and Director's approval (i.e. equivalent to updating a table in the recommendation)
  • Some subset of the Rec-track, herein referred to as a registry Group: one likely popular option would be WG review
  • no review required
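One way to make the chosen review level explicit, much as IANA labels each registry with its registration procedure, is to record it as data and check a request's approvals against it. The step names below paraphrase the options above; they are hypothetical labels, not Process terminology.

```python
from enum import Enum


class ReviewPolicy(Enum):
    FULL_REC_TRACK = "full Rec-track"
    REGISTRY_GROUP = "registry Group review"
    NO_REVIEW = "no review"


# Approvals each policy needs before a new code-point is added (hypothetical labels).
REQUIRED_APPROVALS = {
    ReviewPolicy.FULL_REC_TRACK: [
        "WG approval",
        "wide review",
        "community consensus",
        "AC review",
        "Director's approval",
    ],
    ReviewPolicy.REGISTRY_GROUP: ["registry Group review"],
    ReviewPolicy.NO_REVIEW: [],
}


def may_register(policy: ReviewPolicy, approvals: set[str]) -> bool:
    """True when every approval the policy demands has been obtained."""
    return all(step in approvals for step in REQUIRED_APPROVALS[policy])
```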

Other Requirements

For entry into registries, there may be other requirements (in addition to the obvious ones of non-collision and non-duplication):

  • that the specification of the semantics of the code-point, and where to find it, be identified
  • that the specification be publicly available
  • that the specification be freely available
  • that there is evidence of implementability, implementation, and interoperability
  • that the code-point be implementable with specific licensing requirements (e.g. RAND, Royalty-free, etc.)
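A registry that adopts some of these criteria could check requests mechanically before human review. This is a sketch under assumptions: the `RegistrationRequest` fields are hypothetical, "evidence of implementation" is reduced to a count of known implementations, and each registry would pick its own subset of checks.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RegistrationRequest:
    # Hypothetical request fields mirroring the criteria listed above.
    code_point: str
    spec_url: Optional[str]
    publicly_available: bool
    freely_available: bool
    implementations: int
    licensing: str  # e.g. "royalty-free", "RAND"


def admission_problems(req: RegistrationRequest, allowed_licensing: set[str]) -> list[str]:
    """Return the unmet criteria; an empty list means the request may proceed."""
    problems = []
    if not req.spec_url:
        problems.append("no specification of the semantics is identified")
    if not req.publicly_available:
        problems.append("specification is not publicly available")
    if not req.freely_available:
        problems.append("specification is not freely available")
    if req.implementations < 1:
        problems.append("no evidence of implementation")
    if req.licensing not in allowed_licensing:
        problems.append(f"licensing {req.licensing!r} is not accepted")
    return problems
```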

Development hosting and other considerations

Maintaining IANA takes some effort, as does maintaining the MP4RA. How registries that live outside their specification are hosted, backed up, and maintained needs consideration. We could, for example, explicitly ask IANA to host them, but many people find IANA rather bureaucratic and the IANA site rather hard to navigate. An alternative is to create a place in /TR where these are hosted.

It can be important to be able to trace the history of a registry: when code-points were entered, and so on. There may be aspects of registration requests (e.g. personal or corporate contact information) that are required for a request but that should not be made publicly visible.

GitHub natively renders CSV (comma-separated values) files as tables; if a registry can be represented by one or more tables, this may be a simple hosting option, since history is kept automatically.
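As an illustration of the CSV option, the following sketch parses a registry file with Python's standard `csv` module and reports code-points that appear on more than one row; a check like this could run on every proposed change. The file contents and column names are hypothetical.

```python
import csv
import io
from collections import Counter

# Hypothetical registry file contents; in practice this would be a .csv file in the repository.
REGISTRY_CSV = """code,description,specification
gzip,GZIP compression,https://example.org/spec#gzip
identity,No transformation,https://example.org/spec#identity
"""


def load_rows(text: str) -> list[dict[str, str]]:
    """Parse CSV text into one dict per registry entry, keyed by column header."""
    return list(csv.DictReader(io.StringIO(text)))


def collisions(rows: list[dict[str, str]], key: str = "code") -> list[str]:
    """Return the code-points that appear on more than one row."""
    counts = Counter(row[key] for row in rows)
    return sorted(code for code, n in counts.items() if n > 1)
```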

Proposed Process Text – Registries

(This section is proposed as an addition to the Process. It is deliberately both short (example-free) and as unconstraining as possible; it makes sense to read it along with the guidelines on implementation.)

Definition

A Registry is a data set that documents logically independent 'atoms'; conceptually a table with independent rows, and rules for the values in the columns (e.g. that values in a column be drawn from a defined set).
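The "rules for the values in the columns" can be expressed as one predicate per column. A minimal sketch for a hypothetical thing-type registry, assuming string-valued cells; cross-row rules such as uniqueness are not covered here.

```python
from typing import Callable

# A column rule takes a cell value and reports whether it is acceptable.
ColumnRule = Callable[[str], bool]


def drawn_from(allowed: set[str]) -> ColumnRule:
    """Rule: the value must come from a defined set."""
    return lambda value: value in allowed


def non_empty(value: str) -> bool:
    """Rule: the value must not be blank."""
    return value.strip() != ""


# Hypothetical thing-type registry: column name -> rule for that column.
COLUMN_RULES: dict[str, ColumnRule] = {
    "value": non_empty,
    "status": drawn_from({"active", "deprecated"}),
}


def row_errors(row: dict[str, str]) -> list[str]:
    """Check a single row against the per-column rules."""
    errors = [f"missing column {col!r}" for col in COLUMN_RULES if col not in row]
    errors += [
        f"bad value {row[col]!r} in column {col!r}"
        for col, rule in COLUMN_RULES.items()
        if col in row and not rule(row[col])
    ]
    return errors
```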

There are four conceptual elements in a Registry:

  1. Registries are always owned by, or incorporated directly into, a Referencing Document (though the registry may, of course, be referenced by other documents as well). Roughly, this is the specification that has some aspect (e.g. a field or an attribute value) for which registration of new values needs to be possible.
  2. Registries are defined by a Registry Definition, which contains the definition of, and requirements for, the registry values.
  3. Published Registry values. The formal published location to which URLs referring to registry values (notably from the Referencing Document) point.
  4. Registry values management. This is where the registry values are maintained, and so on.

Published Registry values MUST be in /TR (the place we publish all formal publications of the W3C), and are purely documentational and contain no requirements on implementers. They are updated following the (approved) process for that registry as defined in the Registry definition.

The W3C may publish a list of W3C Registries.

Combined Publication

We are currently discussing which of these elements can be combined. Examples include:

1+2) Referencing Document + Registry Definition, with a pointer to the Published Registry. This is like the IETF, where RFCs typically document a technology and say which fields are open to registration, and under what conditions, while the registry values themselves are held at IANA. (Some do not like this practice; others do.)

2+3) Registry Definition + Published Registry. This would be the case where it makes sense to define a registry as a stand-alone Recommendation in its own right.

1+2+3) Referencing Document + Registry Definition + Published Registry. This would be suitable for ’small’ tables for which the current values and the definitions can easily be inlined into the spec. (“thing-type: a value drawn from the thing-type registry; current values and the procedures for registering a new thing-type can be found in Annex G”). You could insert the Published Registry as an iframe, for example.

Referencing Document requirements

The Referencing Document

  1. MUST contain all normative statements and information that affect implementations and the conformance of implementations. It must not be possible for a change to the Registry values to affect the conformance of implementations to the Referencing Document; if there are values that must be implemented, or any other such restrictions, they must be documented in the Referencing Document without reference to the registry.
  2. may either contain the registry or contain a reference stating where to view it.

Registry Definition requirements

The Registry definition must:

  1. contain a section defining the registry that must
    1. state that it is a Registry per the W3C Process,
    2. identify the Custodian. The Custodian can be the W3C group that owns the Referencing Document, the W3C Team if no such W3C group exists, or some other entity.
    3. define the fields of its table of items,
    4. define the policy and procedure for changes (additions, and deletions or modifications if they are permitted); how to request a registration, what information is required, and what criteria must be met
  2. contain all requirements and normative language related to the registry; the rules for values (logically, the column values) in each entry (e.g. uniqueness, matching to a value from some other specification or registry, etc.)
  3. contain documentation on the policy and mechanisms for changes to existing entries:
    1. can entries be retired (deleted) or deprecated?
    2. can entries be changed after being published?
    3. if entries can be deleted, can any code-points be re-used, or are they 'reserved' indefinitely?
  4. provide the location of the published Registry values, and the Registry management.
  5. provide any restrictions on links to other publications, e.g. that the specification be
    1. publicly available (as opposed to company-private, for example)
    2. freely available (no cost)
    3. published by a recognized standards body
    4. a W3C publication
  6. document any requirements of evidence of implementability, implementation, interoperability, etc.
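Points 3.1 to 3.3 above amount to a change policy. The sketch below, with hypothetical class names and one illustrative choice of defaults (retirement allowed, modification forbidden, retired code-points reserved indefinitely), shows how those answers constrain later operations on the entries.

```python
from dataclasses import dataclass, field


@dataclass
class ChangePolicy:
    # Answers to the questions the Registry Definition must document (illustrative defaults).
    may_retire: bool = True
    may_modify: bool = False
    may_reuse_retired: bool = False


@dataclass
class ManagedRegistry:
    policy: ChangePolicy
    active: dict[str, str] = field(default_factory=dict)
    retired: set[str] = field(default_factory=set)

    def add(self, code: str, meaning: str) -> None:
        if code in self.active:
            raise ValueError(f"{code!r} is already registered")
        if code in self.retired and not self.policy.may_reuse_retired:
            raise ValueError(f"{code!r} is retired and reserved indefinitely")
        self.retired.discard(code)
        self.active[code] = meaning

    def modify(self, code: str, meaning: str) -> None:
        if not self.policy.may_modify:
            raise ValueError("entries cannot be changed after publication")
        if code not in self.active:
            raise KeyError(code)
        self.active[code] = meaning

    def retire(self, code: str) -> None:
        if not self.policy.may_retire:
            raise ValueError("entries cannot be retired")
        del self.active[code]
        self.retired.add(code)
```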

Hence changes to the Registry other than managing entries are all controlled by the update process for the Registry Definition. Registry Definitions are approved through the same process as the Referencing Document for that registry, or a stronger one. Specifically, if the Referencing Document is a W3C Recommendation, then the Registry Definition is also a Recommendation, while a W3C Referencing Document that is a Note may, of course, have a Registry Definition that is a Recommendation.

If the defined process is followed, changes to the values, and their publication as Published Registry values, should be automatic and rapid.

Published Registry values requirements

The Registry values

  1. must only be updated in accordance with the rules in the associated Registry Definition.
  2. must not contain any normative statements (must, should, etc.).
  3. if it is not embedded in the referencing document or registry definition, must contain a link back to the registry definition (which contains the rules etc.) and should contain a link back to the referencing document (for which it is a registry), if that document is separate;
  4. may contain informative material, including descriptions of the various values ('columns') and their meaning
  5. must contain or reference the information on how to request updates (a new entry or any other permitted changes).
  6. may contain a machine-readable copy (e.g. CSV file) which MUST match the human-readable copy of the same version of the Published registry
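Rules 2 and 6 lend themselves to automated checks before publication. The sketch below is a heuristic under stated assumptions: it looks only for upper-case RFC 2119 keywords (lower-case "must" in descriptive prose is not flagged), and it assumes the human-readable copy is a single HTML table whose rows, header included, should equal the rows of the CSV copy. `check_published` and `TableRows` are hypothetical names.

```python
import csv
import io
import re
from html.parser import HTMLParser
from typing import List, Optional

# Upper-case RFC 2119 keywords; a heuristic for "normative statements".
NORMATIVE = re.compile(r"\b(MUST|SHALL|SHOULD|REQUIRED|RECOMMENDED|MAY|OPTIONAL)\b")


class TableRows(HTMLParser):
    """Collect the text of each <tr> in an HTML page as a list of cell strings."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._cell: Optional[List[str]] = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None


def check_published(html_text: str, csv_text: str) -> List[str]:
    """Return the problems found; an empty list means both checks passed."""
    problems = []
    if NORMATIVE.search(html_text):
        problems.append("contains normative language")
    parser = TableRows()
    parser.feed(html_text)
    parser.close()
    if parser.rows != list(csv.reader(io.StringIO(csv_text))):
        problems.append("machine-readable copy does not match the table")
    return problems
```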

Registry management requirements

(Note that this probably should apply to the development of any W3C publication. The general rules are currently unstated.)

The Registry management must be a location that is suitable for development of a W3C publication. The rules are currently unstated but probably should be stated for the general case, and not specifically for registries. Registry management must:

  • have the ability to file bug reports (aka issues)
  • support history tracking (so we can see what was changed when, by whom, etc.)
  • have backups maintained by the W3C (so that a disc crash or a service outage does not lose data)
  • be set up in such a way that reviews (e.g. cross-functional, wide, and AC review) can be managed using automatic notifications, such as notifications and periodic summaries sent to interested populations;
  • have development tools and a site that are accessible (a) to those who need accessible access and (b) to our international community.

(A scrapable Wiki may be suitable here, while a Wiki is probably not suitable for document management generally.)

Proposed supporting text – Guidelines on Implementation of Registries

Registries exist to:

  • list possibilities
  • document meanings (so an implementer can ask "what does Pfoogle mean?")
  • avoid code-point collision (two 'rows' never share the same code-value)
  • avoid duplication (two 'rows' with the same meaning but different code-points)

Many registries assign a meaning to code-points (e.g. "the value 1 means the identity matrix").

The requirements are set such that there can be no essential IPR or other normative effect caused by any change to the registry; all requirements have to be evident from reading the Referencing Document without recourse to the Registry. The rules therefore are those of the IPR policy that applies to the referencing document.

Registration requests could be made via a GitHub issue, email, or other suitable means of communicating with the Custodian.

Registration requirements might include that the value is documented in a publicly available specification, that it's implemented, that it doesn't contravene any laws, etc.

Examples of ways to host a W3C registry include:

  • a dedicated section within the Referencing Document ('embedded');
  • a separate HTML document;
  • a data file such as CSV, JSON, RDF, XML etc., supported by some suitable framework to make it viewable;
  • an API-accessible resource;
  • a database-backed system, etc.

All of these must be able to publish to /TR (ideally, automatically) for the formal published Registry values.
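For a file-based registry, publishing to /TR could be as simple as regenerating a static page from the data file. A sketch, assuming CSV input and a hypothetical `render_registry_html` function; it also includes the link back to the Registry Definition that Published Registry values must carry when they are not embedded.

```python
import csv
import html
import io


def render_registry_html(csv_text: str, title: str, definition_url: str) -> str:
    """Render registry rows as a static HTML page that links to its Registry Definition."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    lines = [
        "<!DOCTYPE html>",
        f"<title>{html.escape(title)}</title>",
        f'<p>Rules for this registry: <a href="{html.escape(definition_url)}">Registry Definition</a></p>',
        "<table>",
        "<tr>" + "".join(f"<th>{html.escape(h)}</th>" for h in header) + "</tr>",
    ]
    for row in body:
        # Escape every cell so registry text cannot inject markup into the page.
        lines.append("<tr>" + "".join(f"<td>{html.escape(c)}</td>" for c in row) + "</tr>")
    lines.append("</table>")
    return "\n".join(lines)
```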

Examples of management and publication tools for file-based registries are GitHub repositories, a scrapable Wiki, or a database-backed system.

Where Registry elements are combined, they must nonetheless be in distinct sections or otherwise clearly separated. (E.g. "This annex contains the definition of the XX Registry." "Table 1 contains the values of the XX Registry, and is dynamically updated as changes occur.")