Re: change proposal for issue-86, was: ISSUE-86 - atom-id-stability - Chairs Solicit Proposals

Maciej wrote:
> Do you think we'll have two interoperable implementations of the
> "convert HTML to Atom" feature at all?

We'll have at least one. I've been working on one, off and on (though
mostly off) for a bit. Which is why I jumped in on this thread in the
first place. :)

Sam wrote:
> Perhaps some validators can be produced which have an option to check
> for such conditions.

For my html2atom service, I plan on the default behavior being that the
generated Atom feed is run through the Feed Validator, with some kind of
"here's how to improve your HTML for Atom conversion" document to click
on if there are validation errors.

I suggested this change to the HTML->Atom algorithm:
> Suppose there's an HTML document with several <article>s, only one of
> which triggers the "otherwise" clause of step 15, substep 9. Instead
> of throwing an exception and aborting--not producing any feed at
> all--why not just leave out that one problematic <atom:entry> from the
> resulting feed? So instead of "or ... you don't produce an Atom feed,"
> we don't produce an Atom *entry* for that specific <article>.
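
A minimal sketch of that suggestion, in Python. It is not the spec's
algorithm: the dict-based article model and the names entry_id and
build_entries are hypothetical, and the ID derivation (rel=bookmark
first, then the id attribute) is my assumption about what the step
checks. The only point is the "skip, don't abort" control flow.

```python
from urllib.parse import urljoin


def entry_id(article, doc_url):
    """Return an ID for one article, or None if none can be derived.

    `article` is a hypothetical dict with optional "bookmark" (the href
    of a rel=bookmark link) and "id" (the element's id attribute) keys.
    """
    if article.get("bookmark"):
        # Resolve the permalink against the document URL.
        return urljoin(doc_url, article["bookmark"])
    if article.get("id"):
        # Fall back to the document URL plus a fragment identifier.
        return f"{doc_url}#{article['id']}"
    return None


def build_entries(articles, doc_url):
    """Build entries, leaving out any article with no derivable ID."""
    entries = []
    for article in articles:
        eid = entry_id(article, doc_url)
        if eid is None:
            # Drop this one entry instead of aborting the whole feed.
            continue
        entries.append({"id": eid, "title": article.get("title", "")})
    return entries
```

With three articles of which one has neither a permalink nor an id,
build_entries returns two entries rather than raising.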

Ian replied:
> Wouldn't this just mean that the algorithm would fail to generate
> anything at all in most cases?

It's already the case that the algorithm won't generate anything in most
cases (as most HTML documents lack <article>s entirely).

> (I'm assuming most <article>s don't have an ID or a rel=bookmark.)

I'd guess differently, but since <article> isn't widely deployed, we're
both guessing. I expect that at least the early adopters of <article> will
be standards-savvy bloggers updating their hand-written templates. Such
templates are often fairly high-fidelity HTML, so I wouldn't be at all
surprised if they already a) have permalinks and b) use rel=bookmark.

> That seems like a problem, if the goal is to get each article into a
> feed.

I don't think my goal is to turn each and every <article>, even
<article></article>, into a valid <atom:entry>. That would be awesome,
but I don't see how to get there. My more modest goal is to get
<atom:entry>s out of "sufficiently high-fidelity" <article>s; I'm less
interested in lower-fidelity markup (which is unlikely to use <article>
at all anyway). That said, I believe that the algorithm should make the
best <atom:entry>s possible (which might be no <atom:entry> at all)
given any <article> as input.

> To put it another way: the goal here is that if someone wants to get
> their HTML file turned into a feed, they have a set of steps they can
> follow that reliably give a predictable result, so that they can use
> off-the-shelf software to do it and can later change to different
> software and get the same result.

Agreed, with the caveat that that someone probably only wants a feed
that won't annoy the crap out of its subscribers. I assume this is what
Sam was getting at when he mentioned NetNewsWire:

> Forgive me[...] I don't expect that the use case is "I want to produce
> something that may or may not be valid and may or may not be
> useful[...] If anything, I think the use case is "I want to produce
> something that people can use in NetNewsWire[...]

Ian wrote:
> If we make that algorithm not work[...] then people are going to
> extend the algorithm in proprietary ways (all of which still violate
> Atom), and then moving from one piece of software to another is going
> to be expensive.

I agree that this is to be avoided if possible.

Julian wrote:
> That could also be achieved by clearly stating the requirements on the HTML
> source, and by requiring the conversion not to generate broken output (but
> to abort on either the feed or entry level instead).

I'm not crazy about aborting the algorithm because of crappy input--to
me, one of the most appealing aspects of HTML5 is that we generally
define something sensible for all inputs.

I'm not even crazy about my own suggestion to drop problematic
<atom:entry>s, but doing so seems to move the algorithm closer to
being acceptable to you, Julian, while still keeping the algorithm in
the spec. That's a pretty good compromise in my book.

Sam wrote:
> I'll be honest here: if we were talking about feed IDs, I would be hard
> pressed to demonstrate the practical effects of what happens if they were
> invalid or outright omitted.

Indeed. But feed IDs are far easier to "generate"--just use the URL of
the HTML document.
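
As a rough illustration in Python (feed_id is a hypothetical name, and
stripping the fragment is my own assumption, not something the
algorithm says):

```python
from urllib.parse import urldefrag


def feed_id(doc_url):
    """Use the document's URL, minus any fragment, as the feed ID."""
    return urldefrag(doc_url).url
```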


Ted

Received on Thursday, 15 April 2010 21:46:10 UTC