This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 17899 - Update the Web registries model (<meta>, rel="", schemes, etc) (starting with <meta>)
Status: RESOLVED WONTFIX
Alias: None
Product: WHATWG
Classification: Unclassified
Component: HTML
Version: unspecified
Hardware: Other
OS: All
Importance: P3 normal
Target Milestone: Unsorted
Assignee: Ian 'Hixie' Hickson
QA Contact: contributor
URL: http://www.whatwg.org/specs/web-apps/...
Whiteboard: registry
Keywords:
Depends on:
Blocks:
 
Reported: 2012-07-18 07:14 UTC by contributor
Modified: 2019-03-29 19:22 UTC
CC List: 12 users

See Also:



Description contributor 2012-07-18 07:14:41 UTC
This bug was cloned from bug 14363 as part of operation convergence.
Originally filed: 2011-10-03 11:32:00 +0000

================================================================================
 #0   contributor@whatwg.org                          2011-10-03 11:32:48 +0000 
--------------------------------------------------------------------------------
Specification: http://www.w3.org/TR/2011/WD-html5-20110525/
Multipage: http://www.whatwg.org/C#top
Complete: http://www.whatwg.org/c#top

Comment:
Section 4.2.5.2 appears to be saying that conformance checkers must obtain the
list of valid meta names by screen-scraping a public wiki? *Seriously*? That's
a joke, right?

Posted from: 86.179.45.246
User agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.835.187 Safari/535.1
================================================================================
 #1   John Foliot                                     2011-10-03 15:43:49 +0000 
--------------------------------------------------------------------------------
W3C Reference URL: http://www.w3.org/TR/2011/WD-html5-20110525/semantics.html#other-metadata-names

Aside from the crude method of data retrieval, I also have concerns about the following section:

Status
   Ratified
      The name has received wide peer review and approval. 

Please define "wide peer review". At issue is the accuracy and validity of the assertion of Ratified.

If I show it to a few of my friends via an IRC chat at 2:00 AM, and they all agree that it looks good, does that constitute a wide peer review? Can I then claim my newly minted metadata name Ratified? 


Proposal to resolve this bug:
Remove section "4.2.5.2 Other metadata names" from the W3C specification until such time as a more robust method of adding metadata names to the collection is established. Six friends with the keys to a public wiki hardly seem accountable, and the mechanism would likely be ignored by conformance checkers because of the high overhead imposed on them to remain up to date.
================================================================================
 #2   Ian 'Hixie' Hickson                             2011-10-03 18:45:02 +0000 
--------------------------------------------------------------------------------
Not a joke, no. It's in fact the same mechanism that the HTML working group agreed to use for rel="" values. Welcome to the new Web.

The exact mechanism needs work, but it's not a big problem.

(In reply to comment #1)
> 
> If I show it to a few of my friends via an IRC chat at 2:00 AM, and they all
> agree that it looks good, does that constitute a wide peer review? Can I then
> claim my newly minted metadata name Ratified?

Certainly within that community you should be able to use it, sure. That's always been the way Web standards work. If you have a community who want to do something, you just write a spec and agree to it and then within that community, that's how the technology works. The HTML spec actually calls that out explicitly; see the last few paragraphs of the "Extensibility" section.
================================================================================
 #3   John Foliot                                     2011-10-04 02:12:17 +0000 
--------------------------------------------------------------------------------
Please define "wide peer review". 
At issue is the accuracy and validity of the assertion of Ratified.
This should be measurable and verifiable by any concerned 3rd party, and the specification should specify how this is done.
================================================================================
 #4   Jon Ribbens                                     2011-10-04 17:20:55 +0000 
--------------------------------------------------------------------------------
There are a couple of problems with what it says currently:

(a) There is no way defined to parse the list of acceptable names from the Wiki.
(b) Anyone anywhere could, at any time, blank the wiki page and hey presto a very large percentage of all HTML 5 documents in the world are suddenly invalid.
(c) It makes the W3C HTML standard dependent on an anonymous third-party website.

(a) in particular is surely a show-stopper.

I suggest that the list should be hosted on the w3c.org site, and should be in a computer-readable format. This list can then, behind the scenes, be automatically scraped from the Wiki if that's what you want to happen (and it could do things like alert someone at the W3C if the list suddenly changes significantly). You could also say that conformance checkers should pay attention to the HTTP Expires header when fetching the list, as an indication as to how long to cache the list for.
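
As a rough sketch of what I mean (the URL and JSON format below are purely hypothetical; no such resource exists today), a conformance checker could fetch the list and honour the Expires header along these lines:

  # Hypothetical: fetch a machine-readable registry of <meta> names from
  # w3.org and cache it until the server-supplied Expires time has passed.
  import json
  import time
  import urllib.request
  from email.utils import parsedate_to_datetime

  # Illustrative endpoint only; nothing like this actually exists yet.
  REGISTRY_URL = "https://www.w3.org/registries/meta-names.json"

  _cache = {"names": None, "expires": 0.0}

  def registered_meta_names():
      """Return the registered meta names, refetching only after the
      server-supplied Expires time has passed."""
      if _cache["names"] is not None and time.time() < _cache["expires"]:
          return _cache["names"]

      with urllib.request.urlopen(REGISTRY_URL) as resp:
          body = resp.read()
          expires_header = resp.headers.get("Expires")

      # Fall back to caching for one day if no Expires header is sent.
      expires_at = time.time() + 86400
      if expires_header:
          expires_at = parsedate_to_datetime(expires_header).timestamp()

      _cache["names"] = set(json.loads(body))
      _cache["expires"] = expires_at
      return _cache["names"]

The point is that the fetch location, the file format, and the caching policy would all be pinned down, instead of each checker improvising its own scraping.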
================================================================================
 #5   John Foliot                                     2011-10-04 18:26:28 +0000 
--------------------------------------------------------------------------------
...and I would add that b) is a highly plausible and very scary scenario as well. There is zero security in the proposed model if anyone can, at any time, make modifications unchecked.

Like it or not, certain things require a trusted gate-keeper.
================================================================================
 #6   Ian 'Hixie' Hickson                             2011-10-06 23:12:42 +0000 
--------------------------------------------------------------------------------
(In reply to comment #4)
> There are a couple of problems with what it says currently:
> 
> (a) There is no way defined to parse the list of acceptable names from the
> Wiki.

I wouldn't expect any software to literally crawl the wiki. You'd do it manually, or have a custom script to do it.
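
For example, a custom script might look like the following (purely illustrative; it assumes the wiki presents the registry as an HTML table whose first cell on each row is the keyword, which is not something the spec defines anywhere):

  # Rough scrape of the keyword column from the MetaExtensions wiki page.
  import urllib.request
  from html.parser import HTMLParser

  WIKI_URL = "https://wiki.whatwg.org/wiki/MetaExtensions"

  class FirstCellCollector(HTMLParser):
      """Collect the text of the first <td> in every table row."""
      def __init__(self):
          super().__init__()
          self.in_first_cell = False
          self.cell_index = -1
          self.keywords = []

      def handle_starttag(self, tag, attrs):
          if tag == "tr":
              self.cell_index = -1
          elif tag == "td":
              self.cell_index += 1
              self.in_first_cell = (self.cell_index == 0)

      def handle_endtag(self, tag):
          if tag == "td":
              self.in_first_cell = False

      def handle_data(self, data):
          if self.in_first_cell and data.strip():
              self.keywords.append(data.strip())

  with urllib.request.urlopen(WIKI_URL) as resp:
      page = resp.read().decode("utf-8", "replace")

  collector = FirstCellCollector()
  collector.feed(page)
  print(sorted(set(collector.keywords)))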


> (b) Anyone anywhere could, at any time, blank the wiki page and hey presto a
> very large percentage of all HTML 5 documents in the world are suddenly
> invalid.

*shrug*. Vandalism happens. It is trivially reverted. This is not an issue.

Someone could crack into the HTML spec's Web server and change the required DOCTYPE to <!DOCTYPE LADYGAGA>, but that wouldn't make all the pages invalid. What matters is what people think matters.

Heck, I could change the spec tomorrow to say all documents are invalid. That wouldn't mean that all documents were invalid, it would just mean the spec was wrong.


> (c) It makes the W3C HTML standard dependent on an anonymous third-party
> website.

Anonymous?


> I suggest that the list should be hosted on the w3c.org site, and should be in
> a computer-readable format.

The W3C hasn't fared well with having computer-readable data in the past. (DTDs have caused the W3C to essentially DDOS itself by having lots of badly authored software read it continuously.)


Anyway, the whole registration mechanism really needs updating in general. Just need to work out what the right solution is first.
================================================================================
 #7   Jon Ribbens                                     2011-10-07 00:44:18 +0000 
--------------------------------------------------------------------------------
(In reply to comment #6)
> I wouldn't expect any software to literally crawl the wiki. You'd do it
> manually, or have a custom script to do it.

That's my whole point. Every "conformance checker" would do the scraping slightly differently, because there's no defined "correct way" of doing it. It would be incredibly fragile. Computers trying to parse non-computer-readable formats is never a good idea; making that a mandated part of a fundamental standard is inconceivable. Nobody's going to do it manually; that's ridiculous.

> > (b) Anyone anywhere could, at any time, blank the wiki page and hey presto a
> > very large percentage of all HTML 5 documents in the world are suddenly
> > invalid.
> 
> *shrug*. Vandalism happens. It is trivially reverted. This is not an issue.

I realise that you do indeed have a lot of authority here, but nevertheless "argument from authority" is still a logical fallacy. It is not "not an issue" simply because you say so.

> Someone could crack into the HTML spec's Web server and change the required
> DOCTYPE to <!DOCTYPE LADYGAGA>, but that wouldn't make all the pages invalid.

People hacking into secure servers is one thing. People trivially changing deliberately insecure public wikis is another.

> > (c) It makes the W3C HTML standard dependent on an anonymous third-party
> > website.
> 
> Anonymous?

Have you checked the 'whois' for whatwg.org recently? Or, for that matter, the whatwg.org web site?

The final HTML specification should not be fundamentally dependent on any site other than w3.org, ietf.org, or similar.

> The W3C hasn't fared well with having computer-readable data in the past.
> (DTDs have caused the W3C to essentially DDOS itself by having lots of badly
> authored software read it continuously.)

And this problem is somehow avoided by having the list hosted on a less-well-funded web site instead?

> Anyway, the whole registration mechanism really needs updating in general.
> Just need to work out what the right solution is first.

Excellent, well, hopefully things will improve.
================================================================================
 #8   Ian 'Hixie' Hickson                             2011-10-21 22:37:00 +0000 
--------------------------------------------------------------------------------
> That's my whole point. Every "conformance checker" would do the scraping
> slightly differently, because there's no defined "correct way" of doing it.

Well we should definitely have a defined way to determine what the registered types are, sure. I don't see why this is a problem.


> > *shrug*. Vandalism happens. It is trivially reverted. This is not an issue.
> 
> I realise that you do indeed have a lot of authority here, but nevertheless
> "argument from authority" is still a logical fallacy. It is not "not an issue"
> simply because you say so.

Why would vandalism be an issue? It's not an issue because you say it is, either. :-)


> Have you checked the 'whois' for whatwg.org recently? Or, for that matter, the
> whatwg.org web site?

Currently, I pay for it.


> The final HTML specification should not be fundamentally dependent on any site
> other than w3.org, ietf.org, or similar.

I don't see why. Even if it was dependent on a site that went dark two months from now, it would just be updated to point to another site then.


> > The W3C hasn't fared well with having computer-readable data in the past.
> > (DTDs have caused the W3C to essentially DDOS itself by having lots of badly
> > authored software read it continuously.)
> 
> And this problem is somehow avoided by having the list hosted on a
> less-well-funded web site instead?

The problem is apparently not made worse, at least.
================================================================================
 #9   Jon Ribbens                                     2011-10-21 23:37:25 +0000 
--------------------------------------------------------------------------------
(In reply to comment #8)
> > That's my whole point. Every "conformance checker" would do the scraping
> > slightly differently, because there's no defined "correct way" of doing it.
> 
> Well we should definitely have a defined way to determine what the registered
> types are, sure. I don't see why this is a problem.

The problem is that currently you *don't* have a defined way to determine what the registered types are. If that's a known defect with the specification that will be fixed before it's finalised then that's fine.

> Why would vandalism be an issue? It's not an issue because you say it is,
> either. :-)

Because it's trivially easy and could potentially cause significant problems for people doing conformance checking (in that their tools will suddenly indicate that most websites are invalid).

> > Have you checked the 'whois' for whatwg.org recently? Or, for that matter,
> > the whatwg.org web site?
> 
> Currently, I pay for it.
> 
> > The final HTML specification should not be fundamentally dependent on any
> > site other than w3.org, ietf.org, or similar.
> 
> I don't see why. Even if it was dependent on a site that went dark two months
> from now, it would just be updated to point to another site then.

Both of these replies tend to indicate that we have a different idea of what a "standard" is. Generally speaking, one would expect a standard to be released on a certain date and not to change after that, or at least, not to change on a daily basis. Standards are supposed to have some stability, and people are supposed to be able to place some trust in them. Contracting out part of a standard to a wiki would be, um, novel.

> > > The W3C hasn't fared well with having computer-readable data in the past.
> > > (DTDs have caused the W3C to essentially DDOS itself by having lots of
> > > badly authored software read it continuously.)
> > 
> > And this problem is somehow avoided by having the list hosted on a
> > less-well-funded web site instead?
> 
> The problem is apparently not made worse, at least.

Or rather, it's made a lot worse. Instead of the potential victim of the accidental DDoS being a sizeable organisation with the funds and experience to cope with the situation, it's just you. No criticism of you personally intended, but most people have more limited means in terms of time and money than most organisations, and it leaves HTML with a "bus factor" of 1, which is somewhat unfortunate.
================================================================================
 #10  Ian 'Hixie' Hickson                             2011-12-02 18:12:23 +0000 
--------------------------------------------------------------------------------
(In reply to comment #9)
> > 
> > Well we should definitely have a defined way to determine what the registered
> > types are, sure. I don't see why this is a problem.
> 
> The problem is that currently you *don't* have a defined way to determine what
> the registered types are. If that's a known defect with the specification that
> will be fixed before it's finalised then that's fine.

Yes, this needs to be cleared up.


> > Why would vandalism be an issue? It's not an issue because you say it is,
> > either. :-)
> 
> Because it's trivially easy and could potentially cause significant problems
> for people doing conformance checking (in that their tools will suddenly
> indicate that most websites are invalid).

Tools shouldn't be just scraping the sites automatically (and don't, in practice).


> > I don't see why. Even if it was dependent on a site that went dark two months
> > from now, it would just be updated to point to another site then.
> 
> Both of these replies tend to indicate that we have a different idea of what a
> "standard" is. Generally speaking, one would expect a standard to be released
> on a certain date and not to change after that, or at least, not to change on a
> daily basis.

HTML will continue to evolve until it is dead. It's a living standard.


> Or rather, it's made a lot worse. Instead of the potential victim of the
> accidental DDoS being a sizeable organisation with the funds and experience to
> cope with the situation, it's just you. No criticism of you personally
> intended, but most people have more limited means in terms of time and money
> than most organisations, and it leaves HTML with a "bus factor" of 1, which is
> somewhat unfortunate.

This is false. If I were to die suddenly, people would just lift up the spec and wiki and put it elsewhere, assuming they couldn't get the site reassigned to them, which would be the more likely situation.
================================================================================
 #11  Ian 'Hixie' Hickson                             2011-12-02 18:12:28 +0000 
--------------------------------------------------------------------------------
*** Bug 12854 has been marked as a duplicate of this bug. ***
================================================================================
 #12  Jon Ribbens                                     2011-12-02 19:18:24 +0000 
--------------------------------------------------------------------------------
(In reply to comment #10)
> Tools shouldn't be just scraping the sites automatically (and don't, in
> practice).

It's what they *must* do, according to the current HTML5 specification.

> HTML will continue to evolve until it is dead. It's a living standard.

> This is false. If I were to die suddenly, people would just lift up the spec
> and wiki and put it elsewhere, assuming they couldn't get the site reassigned
> to them, which would be the more likely situation.

I'm sure this argument has been done to death elsewhere, but suffice to say that your usage of the word "standard" appears to me to be somewhat... non-standard. This state of affairs is rather alarming for such an important specification.
================================================================================
Comment 1 Ian 'Hixie' Hickson 2012-10-02 19:37:14 UTC
Marking LATER as I'm not planning on doing the registry revamp in the near
future. Will get back to this.
Comment 2 Antoine Amarilli 2013-03-09 02:02:53 UTC
Notice that some relevant changes have been made to the specification since the last updates to this discussion. Compare <http://web.archive.org/web/20111215112636/http://www.w3.org/TR/html5/links.html>:

  "Conformance checkers must use the information given on the Microformats wiki existing-rel-values page to establish if a value is allowed or not"

with <http://www.w3.org/TR/2012/CR-html5-20121217/document-metadata.html#other-metadata-names>:

  "Conformance checkers may use the information given on the microformats wiki existing-rel-values page to establish if a value is allowed or not"

(The "must" was changed to "may".) This is probably a more reasonable way to state things, except that the rest of the relevant paragraph still uses "must" indiscriminately to things that require wiki-scraping and things that should be done in any case.

Note that some of this loosening isn't very consistent. In the current specification, compare <http://www.w3.org/TR/html5/document-metadata.html#other-metadata-names>:

  "When an author uses a new metadata name not defined by either this specification or the Wiki page, conformance checkers may offer to add the value to the Wiki, with the details described above, with the "proposed" status."

with <http://www.w3.org/TR/html5/links.html#other-link-types>:

  "When an author uses a new type not defined by either this specification or the Wiki page, conformance checkers should offer to add the value to the Wiki, with the details described above, with the "proposed" status."

(One says "may" and the other says "should", but this is roughly the same mechanism, just for different attributes.)
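
For what it's worth, a minimal sketch of the checker behaviour that both passages describe might look like this (the spec-defined list is abbreviated and the helper is illustrative, not taken from any actual validator):

  # Check a <meta name=...> value against the spec-defined names plus a
  # snapshot of the wiki registry; if unknown, point at the wiki's
  # "proposed" registration process.
  SPEC_DEFINED = {"application-name", "author", "description", "generator", "keywords"}

  def check_meta_name(name, wiki_names):
      """Return a (level, message) diagnostic for a metadata name."""
      name = name.lower()
      if name in SPEC_DEFINED or name in wiki_names:
          return ("ok", "registered metadata name: %s" % name)
      return ("warning",
              "unknown metadata name %r; consider proposing it at "
              "https://wiki.whatwg.org/wiki/MetaExtensions with status "
              '"Proposed"' % name)

Whether the checker merely "may" or actively "should" offer that registration step is exactly the inconsistency noted above.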
Comment 3 Ian 'Hixie' Hickson 2014-02-25 18:37:09 UTC
A proposal for <meta>:
   http://lists.w3.org/Archives/Public/public-whatwg-archive/2014Jan/0097.html

hsivonen, input from you in particular (either here or on that thread) would be especially welcome.
Comment 4 Domenic Denicola 2019-03-29 19:22:37 UTC
Nobody has an appetite for a new registry system, as flawed as the current one may be, so we'll close this constellation of bugs for now. If folks want to continue discussing the issues with the IANA and meta registries, https://github.com/whatwg/html/issues is the place.