Evolution/Identifiers

From W3C Wiki

One way a standard can facilitate evolution is to allow for extensibility in some of the protocol elements it uses. An extension is made by adding a new value for a protocol element, where that value was not part of the protocol or language at the time the specification was written.

Some protocol elements used in a language/protocol/interface allow values which act as identifiers: determining the meaning of those values requires information which is specified independently. In some cases, an identifier might come from a fixed set of values (e.g., identifiers for the days of the week). But in many cases, extensibility is accomplished by adding a new value with an associated meaning.

The Web uses many protocol elements which are, or use, identifiers. Examples include character entities in HTML, content-types, URI schemes, color names, host names, and HTTP headers.

There are many different methods used for managing identifiers, and the choice is often a controversial element of the design of a language or protocol and the specification for it. Some of the differences are well-justified, but others are artifacts of history.

What are the considerations when designing a protocol element which uses identifiers?

  • Make sure the process for extending a standard has the same characteristics as the process for the standard itself, in terms of being "fair" and "transparent".
  • Make sure the registry is as long-lived as the specifications that use it, and that it avoids problems of trademark, spam, denial of service, and the like.
  • Consider identifier length: some protocols and languages are sensitive to the space usage and compressibility of long strings used as identifiers.
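
To make the identifier-length consideration concrete, here is a minimal sketch that compares the raw and zlib-compressed size of a message that repeats a few identifiers, once as short tokens and once as full URIs. The identifier names and the example.org namespace are hypothetical, and zlib stands in for whatever compression a real protocol might use.

```python
import zlib

# Hypothetical identifiers: the same three properties named two ways.
SHORT_IDS = ["title", "creator", "date"]
URI_IDS = ["http://example.org/terms/" + name for name in SHORT_IDS]


def encoded_sizes(identifiers, repeats=50):
    """Return (raw_bytes, compressed_bytes) for a message repeating the identifiers."""
    message = "\n".join(identifiers * repeats).encode("utf-8")
    return len(message), len(zlib.compress(message))


short_raw, short_packed = encoded_sizes(SHORT_IDS)
uri_raw, uri_packed = encoded_sizes(URI_IDS)
print("short tokens:", short_raw, "bytes raw,", short_packed, "bytes compressed")
print("full URIs:   ", uri_raw, "bytes raw,", uri_packed, "bytes compressed")
```

The raw size grows with identifier length, while a shared URI prefix tends to compress well. Whether that is good enough depends on whether the protocol in question can rely on compression at all.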

Identifiers vs. Data Structures

In some cases, an identifier is a completely unstructured string of characters (or bytes). In other cases, identifiers have structure. In many cases, there is a trade-off between using an identifier and a data structure. In the case of a URI, the URI is a kind of data structure which appears as a string, but the data structure generally is a recipe for how one might (in the best of worlds) access more information about the thing identified.

Identifier spaces: categories

There are a variety of ways for allocating identifiers and the meaning associated with them:

in specification 
Many specifications limit the set of identifiers allowed in a protocol element to those explicitly listed in the specification: extending or changing the meaning of the identifiers allowed requires a new specification. This still allows implementations to evolve (through private extensions), with the standard following. (Example: element names in HTML)
use a registry 
A registry for a protocol element has a list of identifiers and corresponding information about each identifier, including references to specifications. A registry is maintained by a registrar (an organization or individual), and has associated processes for adding to or updating the registry. (Example: Internet media types, HTTP response codes)
use a URI (IRI) as the identifier 
Some protocol elements use a URI to name an extensibility point, where the URI itself provides a mechanism for determining the "meaning" of the extension [httpRange-14]. (Example: RDF)
use a "vendor prefix" 
A "vendor prefix" is a short string which identifies an organization which controls one or more implementations. The organization maintaining an implementation uses prefixed identifiers for its own unique extensions. As extensions are made part of the standard, the unprefixed identifier is then substituted. (Example: CSS)
use URI-named namespace 
The protocol element uses an identifier in a way (with prefixes or scoped contexts or otherwise) where there is a URI-identified namespace, and the meaning of individual identifiers is understood with respect to that namespace. This allows linking together multiple namespace values, and short identifiers. (Example: RDF with #)
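
The last two categories can be illustrated with a small sketch. It assumes a "prefix:local" syntax for names in a URI-named namespace and the leading-hyphen form that CSS uses for vendor prefixes. The prefix bindings, the example.org and example.net URIs, and the vendor name "acme" are all hypothetical.

```python
# Hypothetical prefix-to-namespace bindings; real documents declare their own.
NAMESPACES = {
    "ex": "http://example.org/terms#",
    "other": "http://example.net/vocab#",
}


def expand(qualified_name, namespaces=NAMESPACES):
    """Expand 'prefix:local' into a full URI using the declared namespaces."""
    prefix, sep, local = qualified_name.partition(":")
    if not sep or prefix not in namespaces:
        raise KeyError(f"no namespace bound for {qualified_name!r}")
    return namespaces[prefix] + local


def split_vendor_prefix(name):
    """Split '-vendor-name' into (vendor, name); unprefixed names get vendor None."""
    if name.startswith("-"):
        vendor, sep, rest = name[1:].partition("-")
        if sep and vendor and rest:
            return vendor, rest
    return None, name


print(expand("ex:title"))
print(split_vendor_prefix("-acme-glow"))
print(split_vendor_prefix("color"))
```

In both cases the short form is what appears in the protocol element, and the party responsible for the meaning is recoverable from the namespace URI or the vendor string.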

Evolution with extensible identifier space

What are the processes for designing and deploying extensions?

assigning an identifier 
Inventing a new identifier requires obtaining one that is not already used, and making information about the identifier available to others who need to know it.
discovering the meaning of an identifier 
Starting from the identifier, finding the information that defines it.
using an identifier in a protocol or language 
Identifiers are added to other languages and protocols.
extracting metadata from the identifier 
...
merging two identifiers 
...
making one identifier obsolete 
...
merging private identifier into public identifier space
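
As a rough illustration of these processes, the sketch below models a toy in-memory registry that supports assigning, discovering, merging, and obsoleting identifiers. The class name, method names, and example identifiers are assumptions made for illustration; a real registry adds review, persistence, and governance.

```python
class IdentifierRegistry:
    """Toy in-memory registry (hypothetical API, for illustration only)."""

    def __init__(self):
        self._entries = {}  # identifier -> {"spec": str, "obsolete": bool}
        self._aliases = {}  # merged identifier -> identifier that replaced it

    def assign(self, identifier, spec):
        """Assign a new identifier, refusing one that is already in use."""
        if identifier in self._entries or identifier in self._aliases:
            raise ValueError(f"{identifier!r} is already assigned")
        self._entries[identifier] = {"spec": spec, "obsolete": False}

    def resolve(self, identifier):
        """Follow earlier merges to the surviving identifier."""
        while identifier in self._aliases:
            identifier = self._aliases[identifier]
        return identifier

    def lookup(self, identifier):
        """Discover the meaning: return the entry, or None if unregistered."""
        return self._entries.get(self.resolve(identifier))

    def merge(self, old, new):
        """Merge two identifiers: 'old' becomes an alias of 'new'."""
        target = self.resolve(new)
        if old not in self._entries or target not in self._entries:
            raise KeyError("both identifiers must be registered")
        if target == old:
            raise ValueError("cannot merge an identifier into itself")
        del self._entries[old]
        self._aliases[old] = target

    def make_obsolete(self, identifier):
        """Mark an identifier obsolete while keeping its record discoverable."""
        self._entries[self.resolve(identifier)]["obsolete"] = True


registry = IdentifierRegistry()
registry.assign("x-glow", "private vendor note")  # private identifier
registry.assign("glow", "public specification")   # public identifier
registry.merge("x-glow", "glow")                  # private merged into public
print(registry.lookup("x-glow"))
```

Keeping the merged identifier as an alias, rather than deleting it, means content that still uses the private name continues to resolve to the public meaning.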

Choosing identifier space category

(this section is intended to give a broad evaluation of the categories against the evaluation criteria; also still an outline of notes ...)

In Specification 
Low cost of implementation extension, higher cost of specification update, fairness depends on same standards process as anything else, long lifetime, transition from implementation to specification is painful but that's what standards are about.
Registry 
Cost of setting up the registry, managing it, expert review, benefit of avoiding interactions, fairness issues, trademark and spam. Allows using numbers and meaningless values to avoid trademark and spam problems and the difficulties of internationalization.
Using URIs 
Example: RDF. Meaning is discovered by httpRange-14. Low cost (no registration process, though it might require maintaining the URI). Very timely. Transition unnecessary. Lifetime up to lifetime of URI. Very fair. Hard to misuse because no registry. Preferred method, modulo longevity of URIs. Note that URN allows naming a registry as a URI.
Vendor Prefix 
Example: CSS. Transition path difficulties outlined in ...
URI-named namespace
XML namespaces, RDF (?).


Possible best practices

Extensibility and evolution must be planned and provided for in specifications that become standards. Standards that use identifiers should also specify the expected behavior of compliant implementations when confronted with unrecognized identifiers; for example, to distinguish between "must understand" and "must ignore" for unrecognized identifiers. Without also constraining implementation behavior, the fact that the specification might be extensible will not translate into an effective way of allowing implementations to evolve.
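
The difference between "must understand" and "must ignore" can be sketched as follows. This is a minimal illustration, not any particular protocol's rule: the set of known identifiers, the exception name, and the way a sender marks an identifier as mandatory are all assumptions.

```python
# Identifiers this hypothetical implementation understands.
KNOWN = {"width", "height"}


class UnsupportedExtension(Exception):
    """Raised when a mandatory identifier is not recognized."""


def process(fields, must_understand=frozenset()):
    """Return the recognized fields, ignoring or rejecting unrecognized ones.

    fields: mapping of identifier -> value received from a peer
    must_understand: identifiers the sender marked as mandatory
    """
    accepted = {}
    for name, value in fields.items():
        if name in KNOWN:
            accepted[name] = value
        elif name in must_understand:
            # "Must understand": fail rather than guess at the meaning.
            raise UnsupportedExtension(name)
        # Otherwise "must ignore": skip the unrecognized identifier silently.
    return accepted


print(process({"width": 10, "x-shadow": "soft"}))
```

With an explicit rule like this, a sender can introduce a new identifier knowing in advance how older implementations will react to it.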

Non-IANA registries have a problem with long-term viability and control of the registry over, say, a 20-year period. Avoiding registries or using IANA seems preferable for the long term. In the short term, using a wiki may be acceptable. Whether or not to use a registry with some gatekeeping and review may depend on the cost of extensibility: if it is low, use a URI or vendor prefix; if it is high, use In Specification; in the middle, use a Registry with review.

Background

This document originally came from [Identifiers].