Re: looking for the use case for HTML->Atom conversion

Maciej wrote:
> Would using hAtom be a viable option for you, as the second tool apparently
> does already?

hAtom is great--I'm a big fan, and I use it already. In fact, the
widespread use of hAtom in blog templates is one of the sources of
inspiration for <article> & <time pubdate> in the first place. That
said, the converting-hAtom-to-Atom story is actually worse than the
converting-HTML-to-Atom story.

The hAtom spec[1] doesn't actually define what to use for <atom:id>.[2]
It *does* define something called the Entry Permalink like so:

* an Entry Permalink element is identified by rel-bookmark
* an Entry should have an Entry Permalink
* an Entry Permalink element represents the concept of an Atom link in
  an entry
* if the Entry Permalink is missing, use the URI of the page; if the
  Entry has an "id" attribute, add that as a fragment to the page URI to
  distinguish individual entries

So the Entry Permalink is the equivalent of <atom:link>, not <atom:id>.
Also, note the use of RFC2119 SHOULD and the fallback, for every entry,
to the document URL.
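
To make the fallback concrete, here's a rough Python sketch of the
Entry Permalink rules as I read them from the list above. The function
name and parameters are made up for illustration (they aren't from the
hAtom spec or from any tool), and stripping an existing fragment from
the page URI before appending the entry's id is my own assumption:

```python
from typing import Optional
from urllib.parse import urldefrag, urljoin


def entry_permalink(page_uri: str,
                    bookmark_href: Optional[str] = None,
                    entry_id: Optional[str] = None) -> str:
    """Hypothetical helper: resolve an hAtom Entry Permalink.

    page_uri      -- URI of the page containing the entry
    bookmark_href -- href of the entry's rel-bookmark link, if any
    entry_id      -- value of the entry element's id attribute, if any
    """
    if bookmark_href:
        # rel-bookmark present: resolve it against the page URI.
        return urljoin(page_uri, bookmark_href)
    if entry_id:
        # No permalink, but the entry has an id: use it as a fragment.
        # (Assumption: drop any fragment already on the page URI.)
        return urldefrag(page_uri).url + "#" + entry_id
    # No permalink and no id: every such entry gets the page URI.
    return page_uri
```

Note the last branch: every entry that lacks both a permalink and an id
ends up with the same value.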

The non-normative hAtom parsing document[3] says to use the Entry
Permalink for <atom:id> as well as for an <atom:link>, and this is
almost what hAtom2Atom implements. Suppose you ran a (valid hAtom) page
with several entries, all of which fail to provide a permalink (and
lack id=""), through hAtom2Atom. Per hAtom, the resultant <atom:entry>s
should all have the page's URL for their <atom:id>s. That would be bad
enough, but hAtom2Atom doesn't implement the fallback to the document
URL--it generates empty <atom:id/>s instead, and so produces invalid
Atom. Here's a test case:

http://edward.oconnor.cx/tests/html5/ISSUE-86/hAtom-no-id.html

Run through hAtom2Atom:

http://lukearno.com/projects/hatom2atom/?url=http://edward.oconnor.cx/tests/html5/ISSUE-86/hAtom-no-id.html&ctype=application/atom%2Bxml&tidy=yes

So converting hAtom to Atom with hAtom2Atom suffers from worse Atom
conformance issues than the HTML5 spec's HTML to Atom algorithm. Empty
<atom:id/>s are worse than unstable <atom:id>s in my book. Software that
implemented the Entry Permalink fallback correctly would suffer from a
worse <atom:id> story too, because in the above scenario all of the
distinct <atom:entry>s in the feed would share the same <atom:id>.
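
If you want to check a converter's output for either problem, something
like the following Python sketch would do. It is only an illustration:
the function name is hypothetical, it assumes the feed uses the standard
Atom namespace, and it has nothing to do with how hAtom2Atom itself
works.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Clark-notation prefix for the Atom namespace.
ATOM = "{http://www.w3.org/2005/Atom}"


def atom_id_problems(feed_xml: str):
    """Hypothetical check: return (empty_or_missing_count, duplicate_ids)."""
    root = ET.fromstring(feed_xml)
    ids = []
    empty = 0
    for entry in root.iter(ATOM + "entry"):
        id_el = entry.find(ATOM + "id")
        text = (id_el.text or "").strip() if id_el is not None else ""
        if text:
            ids.append(text)
        else:
            # Covers both <id/> and an entry with no <id> at all.
            empty += 1
    # Ids that appear on more than one entry in the same feed.
    duplicates = sorted(i for i, n in Counter(ids).items() if n > 1)
    return empty, duplicates
```

A feed with empty <atom:id/>s shows up in the first number; a feed where
every entry fell back to the page URL shows up in the second list.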


Ted

1. http://microformats.org/wiki/hatom
2. There are several open hAtom issues related to feed and entry IDs:
   http://microformats.org/wiki/hatom-issues#Entry_id_.28atom:id.29
   http://microformats.org/wiki/hatom-issues#Feed_id_.28atom:id.29
   http://microformats.org/wiki/hatom-issues#Relationship_of_rel-bookmark_to_url.2Buid
   http://microformats.org/wiki/hatom-issues#add_url_property_to_hentry
3. http://microformats.org/wiki/hatom-parsing

Received on Thursday, 15 April 2010 18:01:16 UTC