Re: a/@ping discussion (ISSUE-1 and ISSUE-2), was: An HTML language specification vs. a browser specification from Ian Hickson on 2008-11-24 (public-html@w3.org from November 2008)

From: Ian Hickson <ian@hixie.ch>
Date: Mon, 24 Nov 2008 05:07:56 +0000 (UTC)
To: Julian Reschke <julian.reschke@gmx.de>
Cc: "Roy T. Fielding" <fielding@gbiv.com>, HTML WG <public-html@w3.org>
Message-ID: <Pine.LNX.4.62.0811240348190.17401@hixie.dreamhostps.com>
On Sun, 23 Nov 2008, Julian Reschke wrote:
> Ian Hickson wrote:
> > ...
> >>> HTML5? Would it help if you consider HTML5 spec as it stands today 
> >>> to _be_ the separate proposal?
> >>
> >> Separate to what?
> >
> > Separate to the specifications it is intended to replace, namely, 
> > HTML4, XHTML 1.x, and DOM2 HTML.
> 
> No, that wouldn't help, as it's supposed to *replace* those, not extend 
> them.

Then I don't understand what you want.


> >> I would expect that for implementations to be consistent, there would need
> >> to be a proposal *somewhere* how to implement this particular aspect.
> >
> > I have discussed such interface proposals with several browser 
> > vendors. For example, to address the above requirement, at least three 
> > proposals have been made: appending the hostnames of the ping targets 
> > to the status bar, as in:
> >
> >   +--------------------------------------------------------------+
> >   | http://example.com/ (Notifying example.org)                //|
> >   +--------------------------------------------------------------+
> >
> > ...or using a special cursor with a tooltip, or appending an inline 
> > icon next to the link. ...
> 
> (I note that there are browsers that do not display the link target 
> itself in the status bar)

Indeed, different browsers have different user interfaces, and so 
different solutions make sense for different browsers. This is why the 
specification doesn't say anything about UI.


> So, what has been the feedback on these proposals? Where did the 
> discussion occur?

Feedback has been generally positive. These discussions mostly occurred in 
person and in private e-mail with individual browser vendor engineers.


> So let's assume for a second that national regulations (in a country 
> with significant population) would force a vendor to ship with this 
> feature disabled. Would it still be used for link tracking in practice?

Assuming that Web pages intend to follow local laws, then sure.


> >> img/@src causes GET requests, while a/@ping causes POST requests.
> >
> > Ok, then use <form>. ping="" is as easy to trigger as a form 
> > submission.
> 
> Not with scripting disabled, right? (yes, I use the FF noscript 
> extension).

With forms today it is as easy to trigger a POST as it will be to trigger 
a ping="", with the additional problem that with the POST you can include 
any arbitrary payload. Both can be done without scripting. For example, 
look at demo 2 here:

   http://damowmow.com/playground/demos/http/

I've already shown this to you:

   http://lists.w3.org/Archives/Public/public-html/2007Nov/0086.html


> >> This introduces an additional party (example.org) to the operation. 
> >> Why is this needed?
> >
> > It's a common use case. The ad publisher, the ad provider (and click 
> > tracker) and the ad target are commonly three different companies.
> 
> Why can't the site hosting the document do the link notification?

I don't understand the question.

If bloomberg.com buys ad inventory from google.com, and the ad Google 
provides is a link to the economist.com, then when the ad is clicked, 
Google needs to be notified (so it can charge the economist.com and pay 
bloomberg.com), and the user needs to be redirected to the economist.com. 
Currently, this is done with redirects, akin to:

   <a href="http://example.google.com/ads/12345678901234567890">...</a>

The user clicks the link, Google records the click, and redirects the user 
to the economist.com. However, in this scenario, the user has no way to 
opt-out of the tracking, or even to know that tracking will occur, and no 
way to see where the link will really lead.

To solve this, with ping="", we could have:

   <a href="http://economist.com/ad1"
      ping="http://example.google.com/ads/12345678901234567890">...</a>

I don't see how bloomberg.com could do the link notification. It certainly 
doesn't seem like something that a publisher would be interested in doing.
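
As a rough illustration of why the ping="" form is more transparent than 
the redirect form, here is a minimal Python sketch of what a user agent 
could do with the markup above: read the real destination from href="", 
read the notification targets from ping="", and build a status-bar string 
in the style of the first UI proposal quoted earlier. The names 
LinkAuditParser and status_text are hypothetical, and treating ping="" as 
a space-separated list of URLs is an assumption taken from the spec text, 
not something this message states.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkAuditParser(HTMLParser):
    """Collects (href, ping targets) pairs from <a> elements."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is None:
            return
        # Assumption: ping="" is a space-separated list of URLs.
        ping_targets = (attributes.get("ping") or "").split()
        self.links.append((href, ping_targets))


def status_text(href, ping_targets):
    """Builds a status-bar string that names the hosts being notified."""
    hosts = []
    for target in ping_targets:
        host = urlsplit(target).hostname
        if host and host not in hosts:
            hosts.append(host)
    if not hosts:
        return href
    return f"{href} (Notifying {', '.join(hosts)})"


parser = LinkAuditParser()
parser.feed(
    '<a href="http://economist.com/ad1" '
    'ping="http://example.google.com/ads/12345678901234567890">...</a>'
)
for href, ping_targets in parser.links:
    print(status_text(href, ping_targets))
```

With the redirect form there is only one URL to inspect, so the same 
function could show nothing but the tracker's address.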


> >> But volume of comments can be an indicator of whether something has 
> >> consensus or not.
> >
> > Sure. Nobody claims that we have consensus on this feature (or, for 
> > that matter, any feature). What's the relevance of this?
> 
> Nice strategy :-) By saying nothing has consensus, and consensus isn't 
> relevant, it's of course simple to argue that controversial stuff should 
> stay in.

Whether something is controversial or not certainly shouldn't be a concern 
as to whether it stays in or not. (It's not really clear to me how you 
would objectively decide that something is controversial, either.)


> So, avoiding the term "consensus"... This feature is much more 
> controversial than many other new features.

Oh my, no, not at all. We've had far, far more complaints about, say, 
headers="", or <img alt="">, or the video codecs issue. (The latter in 
particular has triggered orders of magnitude more feedback than ping="" 
ever has. We probably got more new subscribers out of the codecs debacle 
than we've ever received e-mails total on ping="".)


> > Well, what method would you propose? GET is unacceptable for a number 
> > of reasons, such as bad caching behavior with proxies. Not having the 
> > feature rather fails to address the use cases.
> 
> HEAD/GET would work when used with the proper Cache headers (and yes, 
> this was discussed before).

And as discussed before, HEAD/GET don't satisfy the requirements that we 
have (which include, basically, "don't use GET"). We need a method that 
proxies aren't going to replay, that has no caching problems (setting 
cache headers often doesn't work, sadly), and that isn't the default, so 
that people visiting the URL in a browser won't trigger the logging 
behaviour that pinging would.
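
To make the last requirement concrete, here is a minimal Python sketch of 
a tracking endpoint that only records a click for POST. The function name 
handle_request, the in-memory click_log list, and the choice of response 
codes are all assumptions for illustration; nothing here is mandated by 
the spec.

```python
from http import HTTPStatus


def handle_request(method, path, click_log):
    """Returns (status, headers); records a click only for POST."""
    if method == "POST":
        # A ping: record it. There is nothing useful to send back.
        click_log.append(path)
        return HTTPStatus.NO_CONTENT, {}
    # Someone visiting the URL directly (a GET) must not be logged.
    return HTTPStatus.METHOD_NOT_ALLOWED, {"Allow": "POST"}


click_log = []
print(handle_request("GET", "/ads/12345678901234567890", click_log))
print(handle_request("POST", "/ads/12345678901234567890", click_log))
print(click_log)
```

A person pasting the ping URL into the address bar issues a GET and so 
leaves the log untouched, which is the property a GET-based ping cannot 
offer.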


> > Would the HTTP working group be willing to add a more appropriate 
> > method?
> 
> This also was discussed before; you *could* use a new method, and you do 
> not need the HTTPbis working group for that. You should try to get IETF 
> approval though.

Since you are the one who believes that using POST is a problem, and since 
you are apparently better versed in these matters than me, could you 
please do the honours? If we had a new method, I would be happy to use it. 

(I personally don't think it's necessary, though, and the bulk of the 
feedback on the topic doesn't suggest it is either, so I'm not really 
interested in doing the work to get it approved.)


> > We already have a way to create POST requests by simply navigating a 
> > Web site. This isn't adding anything new as far as that goes.
> 
> That is incorrect, unless you count "pressing buttons" as web site 
> navigation.

It is trivial to cause users to click things. Please don't forget that you 
can cause form submission, even without script, from clicks on random text 
and images, as well as from getting the user to hit enter or the spacebar.


> >> To summarize:
> >>
> >> - it's not clear that it will be used
> >
> > A number of groups, including Google, have said they will use it.
> 
> In which case I'd propose:
> 
> - see that it gets implemented in at least one browser (Chrome comes to 
> mind)
> 
> - *demonstrate* that it's going to be used with that browser

That's what we're doing with _everything_ in HTML5. The ping="" attribute 
is certainly no exception.


> >> - the way it's implemented over HTTP is problematic
> >
> > Suggestions on improving it are welcome. I'm trying to do the best I 
> > can given HTTP's limitations.
> 
> Lots of suggestions have been made already, such as
> 
> - not doing it at all,

Hardly an improvement, since it doesn't address any of the use cases.


> - not doing it with HTTP (TimBl proposed UDP, if I recall correctly),

UDP wouldn't go through proxies. It would be nice if we could use UDP; I 
would be able to put back the UDP-based networking APIs if we could.


> - use GET/HEAD,

GET/HEAD fail at least three requirements as listed above and as discussed 
in depth in previous discussions.


> - when using POST, at least make the message self-descriptive by using a 
> body + well-defined MIME type, or

I've changed the spec to include an entity body with a new MIME type.
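
For illustration, a ping request with a self-descriptive body could be 
built like this in Python. This message does not name the new MIME type 
or the body; the values "text/ping" and "PING" below are assumptions 
taken from the spec's hyperlink auditing text, and build_ping_request is 
a hypothetical helper. The request is only constructed here, not sent.

```python
from urllib.request import Request


def build_ping_request(ping_url):
    """Builds (but does not send) a POST request for one ping target."""
    return Request(
        ping_url,
        data=b"PING",  # assumed entity body
        headers={"Content-Type": "text/ping"},  # assumed MIME type
        method="POST",
    )


request = build_ping_request(
    "http://example.google.com/ads/12345678901234567890"
)
print(request.get_method(), request.full_url)
print(request.header_items())
```

A server or intermediary can then tell from the Content-Type alone that 
the POST is a ping rather than a form submission.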


> - use a different method.

If you can get us approval for use of a particular method, I'd be happy to 
make the spec use it. Personally I don't understand why POST isn't enough, 
as it appears to address all our requirements without introducing any new 
problems. But I understand that you disagree.


> >> - there's no proposal for a UI that would comply with the 
> >> requirements in the spec
> >
> > There are several such proposals.
> 
> I would recommend to put these proposals as examples of potential 
> implementations into the spec, so people can review them and comment on 
> them.

As noted above, the specification is not an appropriate place for user 
interface discussions.


> >> So this is a very good example for a part of HTML5 that clearly is 
> >> not stable
> >
> > It's as stable as most parts of the document. More stable than many.
> 
> I don't see it as "stable", its definition has changed several times in 
> the last two months (at least twice), and I personally expect it to be 
> taken out before we're done.

I didn't say it was stable, I said it was as stable as most of the spec, 
which is a statement I stand by.

With the exception of the change I just made for you, the "auditing 
hyperlinks" section hasn't changed at all since August, hasn't changed in 
a relevant way since July, and hasn't changed in a way that affects 
implementation conformance since February.


> >> and also could *easily* be specified separately.
> >
> > Not really; it's part of the vocabulary and directly affects link 
> > navigation, a pretty core part of HTML.
> 
> Now that's an interesting statement. Are you saying that the vocabulary 
> *needs* to be defined in a single place, even a truly optional attribute? 

I think "needs" is a bit strong, but it certainly is preferable.


> Where does that leave us with respect to future extensibility?

I would assume that all extensions to HTML would be done in future 
revisions of the HTML spec (HTML6, HTML7, etc) just as it has been since 
the start. Experimentation with doing things in a more modular fashion, in 
particular Ruby, RDFa, and ARIA, has proved somewhat problematic. (Which 
isn't to say that those features aren't good ideas, just that having them 
be defined in separate documents has resulted in far more trouble than we 
would have had if they had been defined as part of the core language from 
the start.)


> Furthermore, while we're at it:
> 
> "For URLs that are HTTP URLs, the requests must be performed by fetching 
> the specified URLs using the POST method, with an empty entity body in 
> the request...."
> 
> This text is very misleading. You don't "fetch" URLs.

The word "fetch" is a term defined by HTML5 (hence why it is hyperlinked 
in that sentence).


> Also, POST isn't a retrieval method, it operates on the specified URI.
> The result *can* be a response (which doesn't need to represent the 
> resource in any way), plus, optionally, a pointer to a separate resource 
> from which a representation that can be fetched (Location header + 3xx).

How does this affect the text quoted above?

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Monday, 24 November 2008 05:08:33 UTC