Re: change proposal for issue-86, was: ISSUE-86 - atom-id-stability - Chairs Solicit Proposals from Maciej Stachowiak on 2010-04-15 (public-html@w3.org from April 2010)

From: Maciej Stachowiak <mjs@apple.com>
Date: Thu, 15 Apr 2010 02:05:30 -0700
To: Ian Hickson <ian@hixie.ch>
Cc: Julian Reschke <julian.reschke@gmx.de>, Sam Ruby <rubys@intertwingly.net>, "public-html@w3.org WG" <public-html@w3.org>
Message-id: <22204E59-FE43-421E-9801-7AEDB51B64D3@apple.com>

On Apr 15, 2010, at 1:49 AM, Ian Hickson wrote:

> On Thu, 15 Apr 2010, Julian Reschke wrote:
>> On 15.04.2010 01:08, Ian Hickson wrote:
>>> ...
>>>>> Basically making this a MUST would lead to implementations having
>>>>> to violate the spec to do anything useful. When we require that
>>>>> implementations violate the spec, we lead to them ignoring the
>>>>> spec even when it's not necessary.
>>>>
>>>> Based on my experience with feeds (predating Atom), this part of the
>>>> spec will not be ignored.  Users will write bug reports against the
>>>> software that implements the algorithm.
>>>
>>> If a feed producer has to invent an ID from nothing, and doesn't know
>>> what ID it used in the past, yet the spec uses "MUST" here, how
>>> exactly can it do anything _but_ ignore the spec?
>>
>> Either you store the ID with the item, or you derive the ID from
>> something sufficiently unique in the set of items, or ... you don't
>> produce an Atom feed.
>
> Storing an ID doesn't work when you're on a read-only medium.
>
> A hashing mechanism by which we can generate an ID from a DOM tree would
> end up making identical posts in unrelated Atom feeds have the same ID,
> and would mean that minor edits (e.g. typo fixes) would generate new IDs,
> both of which would cause all kinds of problems, as Sam pointed out.
>
> Not producing a feed doesn't solve the problem of producing a feed.

If you take a strong view of the Atom requirements, it seems like the  
only possible way to convert HTML to Atom that would conform to the  
Atom spec is to include the Atom IDs in the HTML in the first place.  
That's assuming the HTML document is the sole input and there isn't  
some out-of-band mapping to IDs.

Consider: if the same IDs must be produced even if individual posts  
are edited, the same IDs must be produced even if the feed is  
regenerated on another machine with a completely different tool, and  
yet at the same time identical text in unrelated feeds must not  
produce the same ID, then there is no way to solve the problem with  
unadorned HTML as the sole input.
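
To make the conflict concrete, here is a minimal sketch of the two
obvious hash-based schemes. It is not taken from the HTML or Atom specs
or from any existing tool: the function names, the feed URLs, and the
"urn:example:" prefix are all made up for illustration.

```python
import hashlib


def content_hash_id(article_text: str) -> str:
    """Hypothetical scheme 1: derive the ID from the article text alone."""
    digest = hashlib.sha256(article_text.encode("utf-8")).hexdigest()
    return "urn:example:sha256:" + digest


def salted_hash_id(feed_url: str, article_text: str) -> str:
    """Hypothetical scheme 2: mix the feed's own URL into the hash."""
    data = (feed_url + "\n" + article_text).encode("utf-8")
    return "urn:example:sha256:" + hashlib.sha256(data).hexdigest()


post = "Atom IDs are supposed to be permanent."
edited = "Atom IDs are supposed to be permanent!"  # a one-character edit

# Any tool on any machine computes the same ID from the same text...
same_across_tools = content_hash_id(post) == content_hash_id(post)

# ...but a minor edit changes the ID,
changes_on_edit = content_hash_id(post) != content_hash_id(edited)

# and identical text in an unrelated feed gets the very same ID, because
# the feed is not an input to scheme 1 at all.

# Salting with the feed URL separates unrelated feeds...
feed_a = "http://a.example/feed"
feed_b = "http://b.example/feed"
distinct_across_feeds = salted_hash_id(feed_a, post) != salted_hash_id(feed_b, post)

# ...but the ID still changes whenever the post is edited.
still_changes_on_edit = salted_hash_id(feed_a, post) != salted_hash_id(feed_a, edited)
```

Neither scheme meets all three constraints at once. Anything computed
from the content changes when the content is edited, and anything not
computed from the content has to be stored somewhere other than the
unadorned HTML.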

That seems like a flaw in the Atom spec if that's really what it  
means, but on the other hand it seems wrong to specify a feature for  
producing Atom that can't possibly be implemented in a way that  
conforms to Atom. In other words, I think even the current SHOULD is  
unimplementable. No conversion tool can prevent the input it was given  
from being sent to another tool, or know that this hasn't already  
been done.

> If you prefer a process-based argument: we can't progress past CR if we
> can't find two interoperable implementations of every feature. If one of
> the features is "you must make consistent IDs for HTML <article>s when
> converting them to Atom, even if they are slightly modified from time to
> time, and even if you have no persistent storage, and you must not create
> IDs that conflict with other entries", then we won't be able to exit CR,
> because that requirement is not implementable.

Do you think we'll have two interoperable implementations of the  
"convert HTML to Atom" feature at all? It certainly doesn't seem like  
something browsers would implement. Do we have any implementors of any  
kind of HTML processor who have expressed interest in implementing  
this feature? I am increasingly leaning towards taking this feature  
out, because it really seems like the only options are to have an  
unimplementable MUST, or to violate the Atom spec.

Regards,
Maciej

Received on Thursday, 15 April 2010 09:06:06 UTC