Re: Change proposal for ISSUE-56 from Sam Ruby on 2010-06-23 (public-html@w3.org from June 2010)

From: Sam Ruby <rubys@intertwingly.net>
Date: Wed, 23 Jun 2010 06:19:13 -0400
To: Adam Barth <w3c@adambarth.com>
CC: HTML WG <public-html@w3.org>
Message-ID: <4C21DFA1.7070607@intertwingly.net>

This change proposal contains three clearly worded objections, and
further historical data to back up those objections.  These objections
will prove helpful at the later stages in the resolution of this issue.
Meanwhile, this change proposal needs to be updated in order to
specify the text to be restored, and then to provide rationale for this
text.

- Sam Ruby

On 06/16/2010 01:40 AM, Adam Barth wrote:
> (Apologies if I've missed the deadline for submitting a Change
> Proposal for this issue.  Roy only recently explained this issue to
> me.)
>
> == Summary ==
>
> There is no need to align "URL" processing in HTML documents with the
> IRI specifications because HTML documents do not contain IRIs (or URIs
> for that matter).  We should restore the removed text that explained
> how to translate input strings contained in text/html documents into
> URIs.
>
> == Rationale ==
>
> ISSUE-56 was raised in error by Michael(tm) Smith based on a
> misunderstanding of Roy's messages to the working group.  Roy said
> that "pretending to define a new URL standard as part of HTML5 is not
> acceptable ... HTML will never define the identifiers for the Web.
> That would be a fundamental violation of the Web architecture."  Based
> on my current understanding of the web architecture and of how a
> sequence of characters in a text/html document becomes a URI, he is
> correct.  However, that does not imply that we ought to remove the
> "URL" processing requirements from the HTML5 specification.
>
> In a recent message to the IRI working group [1], Roy writes:
>
> [[
>
> RFC 3986 defines how to parse URIs (for recipients) and provides
> many rules for scheme-specific specs to define how to generate URIs
> of a given scheme (for producers) within the overall constraint of
> matching the URI syntax (the formal ABNF).
>
> [...]
>
> Please understand that browsers almost never parse URI or IRI or
> anything in between.  Browsers have input strings that contain one
> or more references, usually in the document encoding, and so there
> is a sequence of context-specific and charset-specific and
> media-type-specific processing that occurs before you even get to
> the individual URI-reference or IRI-reference that are defined by
> 3986/3987.
>
> Some people have proposed that most of that pre-processing be added
> to the IRIbis spec, but I have seen no evidence to suggest that
> such pre-processing is even remotely standardizable (it seems to
> be different for every input context).  If you can demonstrate or
> get agreement on a single way to preprocess an input string, or at
> least a few named processes (like single-ref and multi-ref), then
> that would be useful.
>
> ]]
>
> From this more detailed message, it appears that it is fully
> appropriate for HTML5 to define an algorithm for translating input
> strings containing one or more references into one or more URIs (or
> IRIs, as appropriate).  In particular, Roy expects such translations
> to be context-specific, charset-specific, and (importantly)
> media-type-specific.  To wit: HTML5 ought to define the pre-processing
> rules that are specific to the text/html media type.
>
> To lend even more credence to this rationale, I quote from the very
> same email message [2] written by Roy that Michael(tm) Smith cited in
> the description of ISSUE-56.  This quote was omitted from the
> description of ISSUE-56 for reasons unknown to me:
>
> [[
>
> I suggest that the section be removed or replaced with the
> limited and specific needs for parsing href and src attribute
> values such that the attribute's value string is mapped to a
> URI-reference with a defined base-URI.  HTML owns that process
> of extracting a valid URI-reference from an attribute's value
> string.  A simple string parsing description, with associated
> context-specific error-handling, is more than sufficient to
> satisfy the needs of HTML5 without appearing to override an
> existing standard that has recently been agreed to by all
> vendors, including the few browser vendors that care about HTML5.
>
> ]]
>
> In effect, this change proposal urges the working group to adopt Roy's
> proposal: HTML5 should define how to extract a URI-reference from
> strings contained in text/html documents, complete with
> context-specific error handling.
>
> For those that prefer rationales expressed in terms of objections,
> this change proposal makes the following objections:
>
> 1) I object to HTML5 deferring to RFC 3987 for parsing input strings
> containing one or more references because RFC 3987 does not define an
> algorithm for parsing input strings containing one or more references
> that takes into account the context-specific, charset-specific, and
> media-type-specific rules required by user agents to interoperably
> parse such input strings in text/html documents.
>
> 2) I object to HTML5 being blocked in the IRIbis working group for
> defining an algorithm for extracting URI-references from strings
> contained in text/html documents for two reasons:
>    a) Defining such an algorithm is out of scope for that working
> group's charter [3] because these strings are not IRIs and therefore
> are not subject to the requirements contained in RFC 3987.
>    b) The IRIbis working group has made essentially no technical
> progress since its inception.  To wit: the working group has published
> only a -00 version of a single Internet-Draft.  In contrast to Larry's
> claim in his change proposal, the mailing list is essentially dead:
>      i) There have been only two messages in June.
>      ii) The messages in May consisted (essentially) of a discussion of
> how to render BIDI URIs on billboards.
>      iii) The messages in April consisted of coordinating with this
> working group.
>
> 3) I (strongly) object to HTML5 not defining how to interoperably
> process a hyperlink because a hyperlink is the essential feature of a
> *hypertext* markup language.
>
> == Proposal Details ==
>
> The proposal details herein take the form of a set of edit
> instructions, specific enough that they can be applied without
> ambiguity.
>
> 1) Restore the removed text regarding translating input strings
> containing one or more references into one or more URIs.
> 2) Update the surrounding text to distinguish between these input
> strings and the URIs to which they are translated.
>
> == Impact ==
>
> 1) Positive effects: User agents will be able to implement
> interoperable error handling for translating strings in HTML documents
> into URIs.
> 2) Negative effects: Readers of the HTML5 specification will need to
> learn the difference between these input strings and the URIs they
> represent.
>
> Q: What conformance classes will have to change?
> A: User agents.
>
> Q: What are the risks?
> A: We might actually be able to process hyperlinks interoperably,
> leading to joy and happiness.  With so much joy in the work, purveyors
> of whisky might go out of business.
>
> [1] http://lists.w3.org/Archives/Public/public-iri/2010May/0008.html
> [2] http://lists.w3.org/Archives/Public/public-html/2008Jun/0435.html
> [3] http://tools.ietf.org/wg/iri/charters
>
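
To illustrate the kind of pre-processing the quoted proposal has in mind (mapping an attribute's value string to a URI-reference and resolving it against a defined base URI), here is a minimal sketch in Python. It is not the text removed from HTML5 and not any browser's actual algorithm. The function name, the set of characters trimmed, and the choice to percent-encode using a single charset are all assumptions made for illustration.

```python
from urllib.parse import quote, urljoin

# Assumption for this sketch: whitespace trimmed from both ends of the
# attribute value is space, tab, LF, FF, and CR.
_TRIMMED = " \t\n\f\r"

# Characters left untouched: RFC 3986 reserved characters plus "%" so that
# existing percent-escapes survive. (quote() never escapes unreserved ones.)
_SAFE = "%/:?#[]@!$&'()*+,;=-._~"


def attribute_value_to_uri(value: str, base_uri: str, charset: str = "utf-8") -> str:
    """Hypothetical helper: map an href/src value string to an absolute URI."""
    # Context-specific step: strip surrounding whitespace from the raw value.
    reference = value.strip(_TRIMMED)
    # Charset-specific step: percent-encode anything outside the URI
    # repertoire (non-ASCII characters, embedded spaces) using `charset`.
    reference = quote(reference, safe=_SAFE, encoding=charset)
    # Generic step covered by RFC 3986: resolve against the base URI.
    return urljoin(base_uri, reference)


if __name__ == "__main__":
    print(attribute_value_to_uri("  caf\u00e9/menu.html?x=1 ",
                                 "http://example.com/a/b.html"))
```

Only the last step is covered by RFC 3986. The first two are the context-specific and charset-specific handling that the proposal argues HTML5 should specify, and real user agents differ in the details (for example, in how the query component is encoded).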
Received on Wednesday, 23 June 2010 10:20:09 UTC