Re: "Outline" algorithm (document length and complexity) from jgraham@opera.com on 2009-05-26 (public-html@w3.org from May 2009)

From: <jgraham@opera.com>
Date: Tue, 26 May 2009 20:01:09 +0000
To: Larry Masinter <masinter@adobe.com>
Cc: HTML WG <public-html@w3.org>
Message-ID: <20090526200109.l5puwwaeaso4wsks@staff.opera.com>
Larry Masinter wrote:
> The document is currently over 930 pages when printed
> "letter" size. The first complaint I get from implementors
> wanting to review the specification is that it is
> unreviewable: too long, to complex, too difficult to
> review individual sections, too difficult to find
> the definition of terms, or where things are used.

Who are these implementors? What are they (planning on) implementing? Is
it possible to encourage them to speak for themselves? If there are
really significant issues with the document they should, of course, be
fixed, but not at the expense of making the document incomplete.

> My calling out this section was part of the review of the
> use of pseudo-code algorithmic specifications. It is
> well known  that it is difficult to verify whether an
> algorithm produces expected results,

Can you provide pointers to back this up, please?

> and even more
> difficult to determine whether two algorithms produce
> equivalent results, which someone wishing to test
> conformance would have to do.

My understanding is that typically conformance is ascertained by running
testsuites rather than by attempting formal proofs of the equivalence
between two algorithms. So the difficulty associated with doing the
latter seems inconsequential to the current discussion. Admittedly there
is some burden on a developer who wants to achieve reasonable certainty
that their implementation has the same behavior as the spec text.
However this difficulty exists in any scheme where the description in
the spec does not translate directly into production code. I am far from
convinced that an initially algorithmic style makes this more of a problem.
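
To make the testsuite point concrete, here is a minimal sketch. It is
not the HTML 5 outline algorithm; the function names, the (rank, title)
input format and the test cases are all hypothetical. Two differently
structured implementations of a toy heading-nesting function are checked
against one shared set of cases, with no attempt at a formal proof that
they are equivalent.

```python
# Toy stand-in for an outline algorithm (an assumption for illustration,
# NOT the algorithm in the HTML 5 spec): turn a flat list of
# (rank, title) headings into nested (title, children) nodes.

def outline_iterative(headings):
    """Step-by-step version using an explicit stack of open sections."""
    root = []
    stack = [(0, root)]  # (rank of open section, its list of children)
    for rank, title in headings:
        # Close every open section of the same or deeper rank.
        while stack[-1][0] >= rank:
            stack.pop()
        node = (title, [])
        stack[-1][1].append(node)
        stack.append((rank, node[1]))
    return root


def outline_recursive(headings):
    """Differently structured version using recursive descent."""
    def build(i, parent_rank):
        nodes = []
        while i < len(headings) and headings[i][0] > parent_rank:
            rank, title = headings[i]
            children, i = build(i + 1, rank)
            nodes.append((title, children))
        return nodes, i
    return build(0, 0)[0]


# Hypothetical shared testsuite: (input, expected output) pairs.
TEST_CASES = [
    ([], []),
    ([(1, "A"), (2, "B"), (3, "C"), (2, "D"), (1, "E")],
     [("A", [("B", [("C", [])]), ("D", [])]), ("E", [])]),
    ([(2, "X"), (1, "Y")],
     [("X", []), ("Y", [])]),
]


def run_testsuite(implementation):
    """Return True if the implementation passes every shared case."""
    return all(implementation(given) == expected
               for given, expected in TEST_CASES)
```

Calling run_testsuite(outline_iterative) and
run_testsuite(outline_recursive) checks both against the same expected
results, which is all a conformance test needs to do.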

> Expressing normative
> requirements in terms of sets of constraints which the
> results of the implementation must satisfy is far
> preferable from the point of view of validation,
> testing, and document review.

Do you have some evidence to back that up? My experience is that an
algorithmic style of specification makes it rather easy to determine
what the expected behaviour is. It has worked well for me when
implementing various parts of the HTML 5 specification (parsing, table
structure + headers, outline, microdata) although I was generally not
interested in producing the most efficient implementation. It has also worked
well for me when QAing implementations of other specifications that use
a similar algorithmic style e.g. ECMAScript (as an aside, I will note
that the presentation of the algorithms can, of course, make a big
difference; the switch from a goto-based style in ECMAScript 3 to a
loop-based style in ECMAScript 5 significantly improved the readability
of the spec). Indeed for many things I have no idea how one would convey
the equivalent semantics in a non-algorithmic style (it is worth noting
that informative text specifying the intended output of the algorithm is
often helpful; maybe more of this would address your primary concern?).

> If we are concerned about whether the document can be
> reviewed, then an algorithmic normative section that
> is also lengthy is even more egregious. Being
> "precise" in this way may be counter-productive,
> if no one is really capable of evaluating the
> precision.

The idea that no one can evaluate these sections is demonstrably false;
there already exist implementations of most of the algorithms in HTML 5.
For example there is an implementation of the outline algorithm at [1].
The source code [2] contains extensive comments taken from the spec
text. In writing the program the author was able to review the spec
(e.g. [3]). Since there is a requirement that the WG produce testcases,
we will produce testcases that verify the implementation matches the
spec. Such an implementation is helpful for other people looking to
understand the spec.

Of course there are other ways to review the spec than by
implementation, but for some parts testcases and implementations are the
only reasonable approach. The parsing section is an example of this; it
needs to match what implementations are prepared to ship. The only way to
determine if it actually does match what implementations are prepared to
ship is to implement it in mass-market implementations and see where it
fails on the existing corpus of web documents.

> While there may be other applications which want a common definition  
> of "outline", those other applications
> have their own requirements, ways of determining
> conformance, and constraints which this working group
> is not in a position to review. There is no way of
> determining conformance, for example. There are no
> requirements for "outline" against which this particular
> outline algorithm can be reviewed.

I really don't understand this position. The outline is a property of
the document; its logical structure. Explicitly marking such structure
in documents is commonplace; the HTML 5 spec itself is a good example.
Saying that HTML shouldn't define how the elements of the document form
a logical structure is rather like saying that it shouldn't define the
meaning of <em> because different consumers might want to interpret it
differently.

The utility of outline tools is in their ability to present that logical
structure in a way that is useful to the end user e.g. as a table of
contents, as a sidebar, or as a set of position-dependent navigation
commands in a voice browser. Regardless of the specific presentational
form chosen by a given tool, the underlying document structure is
something that all tools should agree on. That is what this section defines.
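
As a minimal sketch of that separation (the outline data, function names
and output formats below are hypothetical, chosen only for
illustration): two tools render the same outline differently, but both
start from one agreed structure.

```python
# Hypothetical outline of a document as nested (title, children) pairs.
# In practice this would be produced by the outline algorithm.
OUTLINE = [
    ("Introduction", []),
    ("Syntax", [("Parsing", []), ("Serialization", [])]),
]


def as_toc(nodes, depth=0):
    """One presentation: an indented, bulleted table of contents."""
    lines = []
    for title, children in nodes:
        lines.append("  " * depth + "- " + title)
        lines.extend(as_toc(children, depth + 1))
    return lines


def as_numbered(nodes, prefix=""):
    """Another presentation: flat lines with hierarchical numbers."""
    lines = []
    for i, (title, children) in enumerate(nodes, start=1):
        number = f"{prefix}{i}"
        lines.append(f"{number} {title}")
        lines.extend(as_numbered(children, number + "."))
    return lines
```

The two renderers disagree about presentation only; neither has to make
its own decision about which heading belongs under which.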

> In any case, if the "outline" specification has
> no application within HTML itself, then it can
> be put in a separate document and processed
> independently.

I find the concept of defining the core semantics of HTML in a document
outside HTML itself to be utterly bizarre.


[1] http://gsnedders.html5.org/outliner/
[2] http://hg.gsnedders.com/anolis/file/b6d93515d41e/anolislib/processes/outliner.py
[3] http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2008-June/015083.html
Received on Tuesday, 26 May 2009 20:01:51 UTC