6530 – <div class="p"> is an abomination

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.


Summary: <div class="p"> is an abomination

Status:	RESOLVED WORKSFORME

Alias:	None

Product:	XML Schema
Classification:	Unclassified
Component:	Structures: XSD Part 1
Version:	1.1 only
Hardware:	All All

Importance:	P2 normal
Target Milestone:	---
Assignee:	David Ezell
QA Contact:	XML Schema comments list

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2009-02-04 13:46 UTC by Elliotte Harold
Modified:	2009-02-09 17:36 UTC
CC List:	2 users

See Also:


Description Elliotte Harold 2009-02-04 13:46:53 UTC

Looking at the source code of the structures draft I find many instances of <div class="p">. I'm not sure what editor produced this, but it is ugly, unnecessary, and bad form. The underlying XHTML of a W3C spec should be more semantic and correct. If the W3C can't get this right, who can?

These should all be replaced with standard <p> tags. 

The same issue seems to appear in the datatypes draft.

Comment 1 Michael Kay 2009-02-04 14:11:45 UTC

Personal response:

I'm no expert on the technology used to generate the HTML of this specification, but I know it is extremely complex, and I suspect that the problem is that <div> elements can be nested and <p> elements can't. I guess one could do some post-processing to turn the innermost <div> elements into <p> elements, provided they satisfy all the right rules, but is it really worth it? The HTML here is for presentation only - if anyone wants access to the spec at a semantic level, they should surely be using the original XML directly.
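The post-processing suggested here can be sketched in a few lines. This is a minimal Python illustration, not anything taken from the actual W3C toolchain. It assumes un-namespaced markup (real XHTML would use `{http://www.w3.org/1999/xhtml}`-qualified tags), and the `BLOCK` set and the function names are hypothetical simplifications of the XHTML 1.0 Strict content model. A `<div class="p">` is renamed to `<p>` only when nothing inside it is block-level, so innermost divs get converted and outer ones are left alone.

```python
import xml.etree.ElementTree as ET

# Assumed, simplified set of block-level tags that HTML's <p> may not contain
# (XHTML 1.0 Strict flavour; edge cases such as ins/del are ignored).
BLOCK = {"address", "blockquote", "div", "dl", "fieldset", "form", "h1", "h2",
         "h3", "h4", "h5", "h6", "hr", "noscript", "ol", "p", "pre", "table", "ul"}

def has_block_descendant(elem):
    """True if any element below elem (not elem itself) is block-level."""
    return any(d.tag in BLOCK for d in elem.iter() if d is not elem)

def convert_safe_divs(root):
    """Rename <div class="p"> to <p> when its content is purely inline.

    Returns the number of elements converted. Divs that still contain
    block content (lists, examples, nested divs) are left untouched.
    """
    count = 0
    # Snapshot the divs first so renaming cannot disturb the traversal.
    for div in list(root.iter("div")):
        if div.get("class") == "p" and not has_block_descendant(div):
            div.tag = "p"
            del div.attrib["class"]
            count += 1
    return count
```

For `<div class="p">a<div class="p">b</div></div>` only the inner element would be converted. The hard cases, paragraphs interrupted by a list or an example, are exactly the ones this approach leaves as they are.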

Comment 2 Michael Kay 2009-02-04 14:18:23 UTC

Actually another point comes to mind: who says <div class="p"> is an abomination? It reminds me rather of a COBOL old-timer who looked at my COBOL code and told me that "COMPUTE X = Y + 1" was an abomination - every self-respecting programmer would write "ADD 1 TO Y GIVING X", and it was criminal to use a general-purpose construct where a special-purpose statement was available. To which the only legitimate response is: "Why?".

Comment 3 C. M. Sperberg-McQueen 2009-02-04 19:23:30 UTC

    [Executive summary: 
    The answer to the question "If W3C can't get this right, who can?" 
    is essentially "But this IS right -- or as nearly right as HTML's
    faulty document grammar allows one to get".]

I should probably fess up; the editor who produced the div elements
with class="p" is me.

The root of the problem is that two schools of thought analyse modern
technical documentation in two different ways.  One school of thought
distinguishes rigorously between character-level styles and
paragraph-level styles, and holds that objects with paragraph-level
styling do not nest.  Many word processors take essentially this view;
perhaps it simplifies the layout calculations.  The other school of
thought observes that after a short block-style example

  <eg>like this one</eg>

it is not unusual for the same paragraph -- or even the same sentence
-- to continue.

I have known intelligent, thoughtful people on both sides of this
question, and I don't want to re-argue it here.  The two schools of
thought exist, and their analyses have observable consequences for the
document grammars they write.

HTML's rules for p, ul, ol, etc. align it with the first school.  The
rules for p, list, etc. in the XMLspec / specprod vocabulary align it
with the second; this reflects its heritage from TEI (and probably
also DocBook).

Replacing all the occurrences of div with class="p" by 'p' elements
would result in an invalid document, and thus in a document
unpublishable on the W3C /TR page.
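To make the validity point concrete, here is a hedged Python sketch of what a blanket rename would do to a paragraph that contains a list. `naive_rename`, `blocks_inside_p` and the simplified `BLOCK` set are hypothetical names chosen for illustration, and the markup is assumed to be un-namespaced. It is a content-model spot check, not a substitute for a real validator.

```python
import xml.etree.ElementTree as ET

# Assumed, simplified subset of elements that HTML's content model does not
# allow inside <p> (XHTML 1.0 Strict); not an exhaustive list.
BLOCK = {"address", "blockquote", "div", "dl", "fieldset", "form", "h1", "h2",
         "h3", "h4", "h5", "h6", "hr", "noscript", "ol", "p", "pre", "table", "ul"}

def naive_rename(root):
    """Rename every <div class="p"> to <p>, regardless of its content."""
    for div in list(root.iter("div")):
        if div.get("class") == "p":
            div.tag = "p"
            del div.attrib["class"]

def blocks_inside_p(root):
    """Return the tags of block-level children found directly inside a <p>."""
    return [child.tag for p in root.iter("p") for child in p
            if child.tag in BLOCK]
```

Applied to `<div class="p">Given<ul><li>one</li></ul>we conclude.</div>`, the check finds nothing before the rename and reports `ul` after it: the source paragraph is faithfully preserved, and the result is a `p` that HTML does not permit.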

Translating from the second style to the first style is always
possible in theory, sometimes easy, and often feasible if the
stylesheet author has a high enough pain threshold, but in my
experience it can be remarkably error prone.  When I began maintaining
the editorial system, our stylesheets routinely produced invalid XHTML
for this reason among others.  We were able to change the stylesheets
to make them produce better XHTML, but chunk-level objects of many
different kinds can appear inside specprod paragraphs, there are
very complicated interactions with the diff markup, and from time to
time the first version of a change I installed would turn out to break
something else.  As would the second through fifth versions of the
change.  Writing each fix six different ways can really eat into a
time budget.

For a while we tried tidy, but I was unable to find ways to prevent
tidy from introducing unwanted white space in semantically sensitive
locations, so we no longer use it.

Eventually, I did in the stylesheets for the XSD spec what I had long
ago done in my stylesheets for TEI markup.  Since the HTML 'p' element
does not model paragraphs as I understand paragraphs, but the HTML
'div' element does, I began translating 'p' elements in TEI (and now
in specprod) into 'div', not 'p'.  (In my TEI stylesheets, the class
attribute gets the value 'real-P', which captures my sentiments but
seemed unnecessarily truculent for the XSD spec.)

Ultimately, I guess my defense of the current markup is that 'div
class="p"' is a better semantic match for the 'p' of the source
document than the HTML 'p' element.  And unlike the HTML 'p' element
it does not require jumping through hoops to make its usage valid.
That is to say, the answer to the question "If W3C can't get this
right, who can?" is essentially "but this IS right; translating a
specprod 'p' into HTML 'p' is tag abuse".

This suggests, of course, that we ought to eliminate ALL of the 'p'
elements in the output, in favor of 'div class="p"', for consistency.
But some are produced by templates I don't own.

It might be entertaining to write a stylesheet which does nothing at
all but try to turn 'div class="p"' elements into appropriate
sequences of HTML 'p' and other elements; if nothing else, it's a good
advertisement for the grouping features of XSLT 2.0.  So I'll put this
on my someday pile.
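The core of such a stylesheet would be XSLT 2.0's `xsl:for-each-group` with `group-adjacent`: each run of inline content is grouped into its own `p`, and block children are passed through between those groups. As a rough analogue of that idea, here is a Python sketch. `split_paragraph` and the `BLOCK` set are assumptions for illustration. Nested `div class="p"` elements are passed through rather than split recursively, and diff markup is ignored entirely.

```python
import copy
import xml.etree.ElementTree as ET

# Assumed, simplified set of block-level tags (XHTML 1.0 Strict flavour).
BLOCK = {"address", "blockquote", "div", "dl", "fieldset", "form", "h1", "h2",
         "h3", "h4", "h5", "h6", "hr", "noscript", "ol", "p", "pre", "table", "ul"}

def split_paragraph(div):
    """Turn one <div class="p"> into a list of sibling elements.

    Runs of text and inline children are wrapped in new <p> elements;
    block-level children are emitted unchanged between them. The input
    element is deep-copied first, so it is not modified.
    """
    div = copy.deepcopy(div)
    out = []
    current = None  # the <p> currently collecting inline content, if any

    def open_p():
        nonlocal current
        if current is None:
            current = ET.Element("p")
            out.append(current)
        return current

    def add_text(text):
        if not text:
            return
        if current is None and not text.strip():
            return  # don't start a new <p> just to hold whitespace
        p = open_p()
        if len(p):
            p[-1].tail = (p[-1].tail or "") + text
        else:
            p.text = (p.text or "") + text

    add_text(div.text)
    for child in list(div):
        tail, child.tail = child.tail, None
        if child.tag in BLOCK:
            current = None  # a block closes any open paragraph
            out.append(child)
        else:
            open_p().append(child)
        add_text(tail)
    return out
```

Whitespace between inline elements is kept whenever a paragraph is open, since dropping it would run adjacent words together. Whitespace-only text is discarded only where it would otherwise start an empty `p`.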

I'm sympathetic to the cause of semantic cleanliness.  
But in the immediate future, any change would require a rather large
investment of effort, which would seem to produce either a small
benefit, or a small decrement in quality.  Given that our editorial
resources have just been cut back severely, I'm not confident this
issue will make the cut.

Comment 4 David Ezell 2009-02-09 17:36:43 UTC

On 2009-02-06 the WG decided to mark this bug as WORKS FOR ME and to take no further action.  We wish to extend thanks to Mr. Harold for reading and commenting.  The WG is indeed concerned (as is W3C) with the ability of documents to interoperate with various software implementations, and to be accessible.

The comments from Michael Kay (comment 1 and comment 2) and Michael Sperberg-McQueen (comment 3) give some justification for this decision on the part of the WG, essentially that the editors on the WG are very uncertain that the semantics in question can be replicated using the suggested changes in markup.

We hope Mr. Harold is satisfied with the WG's decision for the reasons given.