Re: ISSUE-27 Change Proposal: defer to the Microformats community for cataloging HTML rel values

On 09.12.2010 01:49, Edward O'Connor wrote:
> Hi all,
>
> Please consider this Change Proposal when deciding ISSUE-27:
>
> http://www.w3.org/html/wg/wiki/User:Eoconnor/ISSUE-27
> ...

Quoting from there:

> The Microformats community actively maintains the de facto listing of HTML rel values on its existing-rel-values wiki page. The HTML specification should bless this as the official location for listing HTML rel values.
> ...

It would be great if the Microformats community were also willing to be 
the registry for rel values, independently of the format they occur in.

Or do you believe that consistency with other formats is not needed?


> Reflect reality
>
> Whatever solution we adopt should, above all else, reflect the deployed reality of HTML rel values. (Support Existing Content) In 2009, DeWitt Clinton conducted a survey of deployed HTML rel values (A Survey of Rel Values on the Web), and while he found "a staggering 1.8M unique rel value strings in use[…] the top 11 alone were responsible for 90% of all usage." Comparing the Microformats listing and the IANA registry of link relation types, we find that the Microformats list has been much more successful at capturing the top 25 rel values in DeWitt's survey:
> HTML rel value    Present in µf list  Registered with IANA
> nofollow          y                   y
> stylesheet        y                   y
> tag               y                   n
> alternate         y                   y
> icon              y                   y
> chapter           y                   y
> forum             n                   n
> shortcut          n                   n
> bookmark          y                   y
> archives          y                   y
> category          n                   n
> external          y                   n
> search            y                   y
> edituri           y                   n
> apple-touch-icon  n                   n
> help              y                   y
> prev              y                   y
> next              y                   y
> pingback          y                   n
> wlwmanifest       n                   n
> contents          y                   y
> contact           y                   n
> service.post      n                   n
> top               y                   n
> me                y                   n
> Total             76%                 48%
>
> As stated in the W3C-hosted registry Change Proposal, "the Microformat community has the best track record for running a registry of rel values."

As Toby already pointed out, the IANA registry is new. As of today, ~10 
additional registrations are pending, mainly because they have active 
tickets over here, in the HTML WG (see <http://paramsr.us/tracker>).
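
For anyone who wants to re-check the totals in the quoted table, or 
re-run them once the pending registrations land, here is a minimal 
Python sketch. The SURVEY dict and the coverage() helper are my own 
illustrative names; the data is transcribed by hand from the table 
above and is not pulled from either registry.

```python
# rel value -> (present in Microformats list, registered with IANA),
# transcribed from the table quoted in the Change Proposal.
SURVEY = {
    "nofollow": (True, True),
    "stylesheet": (True, True),
    "tag": (True, False),
    "alternate": (True, True),
    "icon": (True, True),
    "chapter": (True, True),
    "forum": (False, False),
    "shortcut": (False, False),
    "bookmark": (True, True),
    "archives": (True, True),
    "category": (False, False),
    "external": (True, False),
    "search": (True, True),
    "edituri": (True, False),
    "apple-touch-icon": (False, False),
    "help": (True, True),
    "prev": (True, True),
    "next": (True, True),
    "pingback": (True, False),
    "wlwmanifest": (False, False),
    "contents": (True, True),
    "contact": (True, False),
    "service.post": (False, False),
    "top": (True, False),
    "me": (True, False),
}


def coverage(column: int) -> float:
    """Percentage of surveyed values marked 'y' in the given column (0 or 1)."""
    hits = sum(1 for flags in SURVEY.values() if flags[column])
    return 100.0 * hits / len(SURVEY)


if __name__ == "__main__":
    print(f"Microformats list: {coverage(0):.0f}%")
    print(f"IANA registry:     {coverage(1):.0f}%")
```

Flipping an entry's second flag to True when a registration completes 
is enough to see how the IANA column moves.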

> Diversity and openness of maintainership
>
> HTML rel values are minted by a diverse set of markup authors: standards-aware web designers and developers, search engine engineers, browser vendors, Semantic Web tool developers and users, and many others. An effective registry of HTML rel values should engender the support and participation of all of these groups.
>
> Several active participants of the Microformats community have been championing the principled minting of new HTML rel values for many years, as documented in this email of Tantek's to public-html. More specifically, the community of contributors to the existing-rel-values page is as diverse and informed a group as we could possibly hope to attract to the task of registering and curating a list of HTML rel values. The page has been edited by, among others:
>
>     * DiSo project co-founder Steve Ivy;
>     * Apple accessibility engineer James Craig;
>     * Toby Inkster, a Semantic Web developer and advocate;
>     * Niall Kennedy, a prominent Web developer and Open Web evangelist;
>     * accessibility advocate and HTML WG contributor Leif Halvard Silli.

That's great. We like the people over there, so we use their registry. 
That seems to be a weak argument. (BTW, I like their work, too, but that 
doesn't mean it's the best place to run a registry.)

> Ease of registration of pre-existing values
>
> All things being equal, it should be as easy as possible for interested parties to register already-deployed HTML rel values. Such interested parties might not be the originators or publishers of such HTML rel values.

Yes.

> The Microformats listing is maintained on a wiki page, and as such is straightforward for anyone to update at any time. MediaWiki, the wiki software used by the Microformats community, is very widespread and so its wikitext syntax is more widely understood than competing wiki languages.

Yes. How is that relevant?

> Several widespread HTML rel values (tag, external, pingback, and various XFN values come to mind) are absent from the IANA registry, and due to its onerous registration requirements are likely to never be successfully registered. At least one such widespread HTML rel value's registration attempt failed (pingback). (ref) As Anne van Kesteren said: "Writing a specification as a barrier to enter the registry is too much work. Many link relations have seen widespread adoption before a formal specification was written." (src) "For instance, for 'nofollow' it was not really clear whether a specification would at some point arrive, but everyone was using it so it should really be on the list[…]" (src) One of our chairs put it this way:

"tag" and "external" have open bugs in the HTML WG, and thus their 
registration is pending until this WG has resolved these issues.

"pingback" hasn't got a spec in a stable place (it's on an individual's 
web page). It has been proposed that the author publish the spec 
somewhere else (such as a W3C Note, an IETF RFC, or on microformats.org), 
but he hasn't done that so far.

You say "onerous". The actual requirements can easily be looked up, see 
<http://tools.ietf.org/html/rfc5988#section-6.2.1>:

    Relation types are registered on the advice of a Designated Expert
    (appointed by the IESG or their delegate), with a Specification
    Required (using terminology from [RFC5226]).

RFC 5226 says:

       Specification Required - Values and their meanings must be
             documented in a permanent and readily available public
             specification, in sufficient detail so that interoperability
             between independent implementations is possible.  When used,
             Specification Required also implies use of a Designated
             Expert, who will review the public specification and
             evaluate whether it is sufficiently clear to allow
             interoperable implementations.  The intention behind
             "permanent and readily available" is that a document can
             reasonably be expected to be findable and retrievable long
             after IANA assignment of the requested value.  Publication
             of an RFC is an ideal means of achieving this requirement,
             but Specification Required is intended to also cover the
             case of a document published outside of the RFC path.  For
             RFC publication, the normal RFC review process is expected
             to provide the necessary review for interoperability, though
             the Designated Expert may be a particularly well-qualified
             person to perform such a review.

             Examples: Diffserv-aware TE Bandwidth Constraints Model
             Identifiers [RFC4124], TLS ClientCertificateType Identifiers
             [RFC4346], ROHC Profile Identifiers [RFC4995].

It has been pointed out that a specification on microformats.org indeed 
would qualify. So the only additional requirement is to fill out a 
template and send it to a mailing list.

>     I don't think it's a good idea to leave values in common use completely undocumented in the registry. It means that a good samaritan who finds someone else's unregistered header cannot ensure that it is documented without writing a full specification that will survive formal review.
>
> See also the Editor's experience with the IANA registry.

I recommend reading the whole thread, and also the report from Mike 
Smith, who actually picked up the work and got most relations 
registered. Those that are not registered either have open bugs or 
issues, or aren't actually defined in HTML5 (pingback).

> Must be able to register HTML-specific details
>
> Three HTML elements (<a>, <area>, and <link>) take rel attributes, but for various reasons different HTML rel values may be used in different circumstances. Because of this, any registry of HTML rel values must be able to hold such element-specific data on a per-value basis. When the Editor tried to use the IANA registry, he encountered pushback when attempting to register HTML-specific metadata about link relations:
>
>     One thing that came out during this discussion was the possibility that link relations would be rejected if they were HTML-specific. 17 For example, something specific to HTML cache manifests would likely not be able to be used as a <link rel> value since it wouldn't work in Atom. This is apparently a risk even if the value is already well-established de facto, meaning the registry could in fact by design fail to match deployed content.
>
>     The conversation shifted to suggesting that the HTML specification should not use the registry to maintain information about link relations as they apply to HTML. Since this would apparently mean that people who wanted to use link relations with HTML would have to register their relations twice — once with IANA, and once with whatever other registry mechanism HTML has, this seemed highly suboptimal. 18

The IANA registry has been designed to include application-specific data.

The first attempt to register these fields led to a discussion about the 
best way to do so, which is a separate issue 
(<http://www.w3.org/html/wg/tracker/issues/127>). Once that issue is 
resolved, the IANA registry is likely to be modified accordingly.

> ...
> There is no allowance for provisional registration in the IANA registry scheme.

No, there is not, because we believed that registration is simple enough 
that separate categories aren't needed.

> ...
> Unification of HTML and Atom rel values
>
> While atom:link superficially resembles HTML's link element, Atom rel values are (currently) distinct from HTML rel values. This WG has made no decision on whether or not unification of Atom and HTML rel values is a goal we should pursue.

I think that's a decision we're trying to make here.

>
> The IANA registry set up by RFC 5988 explicitly unifies HTML and Atom rel values. While this CP takes no stance on whether or not HTML and Atom rel values should be unified, this may be harmful to the future minting of HTML rel values by this WG or by other parties. As described by one of our chairs in an email to public-html:
>
>     At least some of the designated experts for the IANA link relation type registry seemed to indicate that all future entries in the registry should be appropriate for all contexts where they might be used (including the HTTP Link header, and Atom link relations), and so future (or current de facto standard) link relations that are HTML-specific by nature may well be rejected. Such relations might need to go in a separate HTML-specific registry.

It would be interesting to look at a concrete example. Do you have one?

> This is problematic for at least three reasons:
>
>     * By choosing the IANA registry, this WG and other HTML rel value minters may be constrained to minting only those rel values which make sense in an HTTP or Atom context, thus impeding their ability to register useful-yet-HTML-specific rel values.

FUD. As a matter of fact, I'd expect any relation that works in 
link/@rel to be usable as is in an HTTP Link header. See also 
<http://tools.ietf.org/html/rfc5988#appendix-A>.
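
To make that concrete, here is a minimal Python sketch of carrying an 
HTML link/@rel pair over into an HTTP Link header field value. The 
link_header() function and the example.org URL are illustrative 
assumptions of mine; the sketch only covers the target and the rel 
parameter and ignores the other parameters RFC 5988 defines.

```python
def link_header(target: str, rel_values: list[str]) -> str:
    """Build a Link header field value from a target URI and rel values.

    Mirrors <link href="..." rel="a b">: the relation types are kept as
    one space-separated list inside the quoted rel parameter.
    """
    rel = " ".join(rel_values)
    return f'<{target}>; rel="{rel}"'


if __name__ == "__main__":
    # Equivalent of <link rel="stylesheet" href="http://example.org/style.css">
    print("Link: " + link_header("http://example.org/style.css", ["stylesheet"]))
```

The relation type itself is passed through untouched, which is the 
point: nothing about the token has to change between the two contexts.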

>     * §4 of RFC 5988 restricts future link relations from having a different semantics when used in conjunction with another link relation. (HTML's stylesheet alternate is grandfathered in.)
>
>       If we adopt the IANA registry, we prevent future HTML rel value minters from minting "modifier" rel values which augment other values. While this CP takes no position on whether or not such "modifier" rel values are sound design, they do have precedent in the Web platform (stylesheet alternate, shortcut icon) so it's reasonable to suppose such things may be minted in the future.

Yes. That's a feature.

>     * The need to maintain a separate, parallel registry to contain real-world HTML rel values rejected for not being generic enough defeats the purpose of choosing the registry in the first place.

Again. Example, please.

> ...
> HTML rel values are tokens
>
> All HTML rel values, whether defined in the HTML specification, registered in whatever scheme we end up choosing, or minted by web authors and unregistered, are case-insensitive tokens.
>
> RFC 5988 requires unregistered rel values (what it calls Extension Relation Types) to be URIs. Formats which allow Extension Relation Types to be expressed as simple tokens are required to provide a mechanism for them to be converted to URIs for comparison.

You are confusing "unregistered" (as in "could be registered, but hasn't 
been yet") with "extension". See 
<http://tools.ietf.org/html/rfc5988#section-4.2>.

> There are a couple of issues with this:
>
>     * Because legacy HTML processors compare HTML rel values as simple tokens only, such processors would not consider simple tokens to be equivalent to their URI form.

A relation either has a short name *or* has a URI (because then it's an 
extension relation).
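
A rough Python sketch of that distinction follows. The classify() and 
same_relation() helpers are hypothetical names of mine, the token 
pattern is my reading of the reg-rel-type rule in RFC 5988, and "has a 
URI scheme" is only a crude stand-in for real URI validation.

```python
import re
from urllib.parse import urlsplit

# Short names: a lowercase letter, then lowercase letters, digits, "." or "-".
SHORT_NAME = re.compile(r"[a-z][a-z0-9.-]*")


def classify(rel: str) -> str:
    """Return 'token' for a short name, 'extension' for a URI, else 'invalid'."""
    value = rel.lower()  # relation types are compared case-insensitively
    if SHORT_NAME.fullmatch(value):
        return "token"
    if urlsplit(value).scheme:
        return "extension"
    return "invalid"


def same_relation(a: str, b: str) -> bool:
    """Compare two relation types character by character, ignoring case."""
    return a.lower() == b.lower()


if __name__ == "__main__":
    for rel in ("stylesheet", "apple-touch-icon", "http://example.org/rels/custom"):
        print(rel, "->", classify(rel))
```

Note that a short name never gets turned into a URI here, and a URI 
never gets a short name; each value falls into exactly one branch.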

>     * It is unreasonable to expect web authors to use URIs as HTML rel values. (ref)

Aha.

> These issues came up during the 2008 F2F:
>
>     anne: I don't want to be able to write everything as a URI. e.g. rel="stylesheet" shouldn't have an equivalent using a full URI

It doesn't.

>     […]
>
>     gsnedders: The only problem I can see with absolute URIs is using current relations like "stylesheet" is that you can't express them as absolute URIs without breaking backwards compatibility

Yes. Did anybody propose that?

> ...
> This change proposal, if adopted, would result in HTML link relations remaining separate from Atom and HTTP link relations. This is negative to those who have a goal of converging Atom, HTML, and HTTP link relations into a unified system.
> Conformance Classes Changes
> ...

Which, btw, we already agreed upon once last spring (when this issue was 
resolved for the first time).

> Conformance checkers which currently source valid HTML rel values from the WHATWG RelExtensions wiki page would have to be modified to instead source such values from the Microformats wiki.
> Risks
>
>     * Rohit Khare could be hit by a bus.
> ...

The *actual* risk is to have a registry that isn't long-lived, because 
it depends on a single person. I don't believe this is the case for 
microformats.org.

Best regards, Julian

Received on Saturday, 11 December 2010 17:05:35 UTC