Evolution of the Web:Registries

Editors Draft of 24 December 2011; see Overview for copyright, caveats, etc.

This section is intended to help resolve some of the issues around how MIME types, URI schemes, link relations, character encodings, and other registered values are used and extended in the Web, and to help point the way forward for some of the remaining open controversies. Recent W3C issues that this discussion is intented to explain and contribute to:

Where should the link relation registry be, and how should it be managed?

If there are multiple, but different, specifications adopted by different communities (forking), which one should the registry point to?

Reconcile with (update) URNs, Namespaces and Registries

In the case where a registry is used for maintaining a list of identifiers and their meaning or pointers to their specifications, there are some common practices that will enhance the reliability and interoperability of the Web while allowing rapid evolution.

The Internet Assigned Numbers Authority (IANA)[IANADEF] is the primary organization whose charter and purpose is to maintain registries of values needed for Internet protocols and languages as specified in the IETF.[ref BCP from which this was quoted]. IANA administers the registries for many protocol elements in the core of the Web, including URI scheme names, Internet media type identifiers ("MIME types"), HTTP protocol header values, HTTP result codes, names of character sets. (See Appendix A.)

Detailed analysis of registries

Lower cost of evolution, Preserve Interoperability, Matching reality, allow for private extensions, give implementers guidance about what is actually needed to be interoperable with other deployed systems, allow discovery of what is meaningful and important, insure the information is timely, doesn't go out of date, disappear, make sure that it is stable and evolvable at the same time. Manage transition from experimental to stable to standard, make sure the process for extending a standard needs to have similar characteristics as the standard itself, in terms of "fair" and "transparent", make sure the registry is as long-lived as the specifications that use it, avoid problems of trademark, spam, denial of service, ..., and Identifier length (Some protocols and languages are sensitive to the space usage and compressibility of long strings used as identifiers; in such cases, identifier length is a consideration)

Update: A registry has a specific update policy. Matching reality: Registries tend to go out of step with reality unless costs of registration or registry update are low and benefits are high to at least one of the parties authorized to make a registration or update. (See "Matching Reality" below). Discovery: Manual discovery is hindered by many alternative places to find a registry, and the possibility of alternative locations (Wikipedia for MIME types, for example.) Timeliness: In particular for registries, there is a tension between establishing a long-lived extensibility point meaning, between cementing the value too soon (before consensus is reached) and too late (after widespread deployment), especially when those overlap (extensions are widely deployed before consensus is reached). The policy needs to move toward "registration before deployment" independent of where in the standards cycle that holds. If the standard differs from early deployment, the registry should be updated to point to not only the standard but also the facts for what one might encounter "in the wild".

Transition: If registries encode status in registered names (as the MIME registry does), transition and grandfathering are issues. Lifetime: The documents pointed to by IANA registry are not as long-lived as the registry itself, and much of the information is obsolete. See "Registry Stability" below.

Fairness: IANA is capable of administering a "fair" process with a reasonable dispute reslution mechanism, if those are specified at the time the registry is established. Wikis and other methods for maintaining a registry have more (or at least different) potential for abuse.

The IETF best current practice specification [BCP 26][RFC 5226] gives guidelines to protocol designers for establishing the registry rules associated with an IANA registry. Note that IANA acts as the operator of each registry, but itself does not evalute registry requests, but merely adminmisters a process by which the organization or individuals authorized to review or approve registry entries are accepted. These guidelines apply to IANA namespaces established or requested by W3C working groups or task forces.

Matching Reality

In some cases, implementations have evolved and the registries have not followed: the registries have not tracked the use of identifiers.. In some cases, the registry process is percieved as a bottleneck. If there is a registry, it is only useful if values are registered. A registry which does not match actual use (as is currently the case with URI schemes, Media Types) is not very useful.

Over time, divergence of meaning of identifiers used in the same protocol element but in different languages or protocols is harmful to interoperability. We need some way of creating a positive force so that, over the long run, divergence is reduced.

(try to address willful violations) Technical specifications that wish to override an existing registry for some values and use it for another should (a) attempt to correct the extensiting registry; in cases where it cannot, (b) a new "override" registry should be established with new values, where the spec points to the new registry. (????)

References to Specifications within Registry

(see section References)

Often, a registry will not contain the complete definition needed to understand the meaning of an identifier, and contains a pointer to a specification or specification series. For example, the Internet Media Type registry defining file formats and languages often contains a pointer to a specification. In those cases, the registry entry for an identifier might be updated, or the registry value itself established in a way that notes the possible evolution of the specification indicated.

Forking is a situation where there are multiple specifications or specification series which are associated with the same identifier used in a single protocol element.

An identifier in a registry which identifies a protocol or language may contain a pointer to a specification, but the use of that identifier in an implementation needs to identify the language or protocol intended, even when those have evolved beyond what the specification referenced has described.

Standards Status of Registry Entries

Registry values and the specifications they point to typically go through a life-cycle, where a parameter is introduced experimentally, deployed in a limited or vendor-specific context, and then adopted more broadly.

Frequently, groups with registries or registered values attempt to convey status of a registered value in the name chosen within the registry, e.g., using an "x-" prefix for experimental names, "vnd." prefixes in internet media types, etc. In practice, these conventions are failures, counter-productive, because there is no simple deployment path when status changes, e.g., vendor proposed extension become public standards, experiments succeed, etc.

Extensibility requires review against criteria, but some identifiers haven't been reviewed and are unsafe. Registries at least give you the option of noting the review status of the proposed extension as a warning to implementors, if it is an extensibility point that requires careful review.

Reasons for using a registry over other methods:

  1. to avoid conflict (main purpose for all of the methods)
  2. to set a bar and set review - you want to have a quality of anything introduced
  3. to provide look-up
  4. limit the number because there is a cost of introducing each one

For example, some protocol designers thought a new URI scheme could cause a lot of extra work. For HTML tags, when you introduce a new section, everyone needs to understand that who implements browsers.

But if you add metadata, it's no skin off anyone's nose. so you have 2 situations - one on which you need whole community to get involved and one in which anyone besides a sub-community can ignore.

([12]http://lists.w3.org/Archives/Public/www-tag/2011Dec/0049.html) Roy Fielding as calling mustUnderstand-based approaches "socially reprehensible" we need a decision tree - questions to answer to understand what kind of extension you're doing and which of these techniques you should use

Compound extensibility points: when a new version of an exensibility point defines a new context in which old extensibility points are interpreted. (This is "willful violation" territory, if not also "sniffing" territory). see discussion following http://lists.w3.org/Archives/Public/www-archive/2011Nov/0009.html

Organizational support for Registries

(this section is left here for the cases where a W3C spec needs a registry and IANA and its processes don't fit the needs... it's intended to be specific to W3C,)

How to encourage W3C staff & working group participants to manage the registration information, update the chair document on establishing and managing registries and extensibility points. Other registrations have their own administrative procedure. A regular "have obligations related to registration been met" check into the W3C document publication/advancement procedure.

Some possible findings:

Registries should allow updates, and note warnings. In particular, documents rarely change without making a change which is incompatible in at least one direction (old content is invalid under the new definition, vs. new content is invalid or not processed interoperably in the old value.).
If specifications are "forked" in incompatible ways, then use separate names for the forks. If the same name is used for multiple forks (specifications which diverge technically) and where different implementations are widely deployed, the registry should contain pointers to all of the different specification branches. This means that in those cases, the registry entry cannot be in the same document as a description of only one of the forks.
Do NOT attempt to encode parameter status in the name; do not use "vnd.", or "x-".
There is a tradeoff between requiring that registry entries contain complete information and the goal of insuring the registry contains at least some information about identifiers likely to be encountered. In general, the cost of using unregistered values must be non-negligible to the organizations allowed or encouraged to register a value, if a distributed development community is to use the registry.
W3C specifications SHOULD use IANA registration methods for those extensibility points which are shared with other (IETF-managed) application protocols, rather than inventing their own registries.
Any extensibility points in a W3C specification MUST be explicit about the method and management of the registration of new values in a public, fair, and transparent way.