Bug 7829 - [SER] Serialization of minimized attributes.
Status: CLOSED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: Serialization 1.0
Version: 2nd Edition Recommendation
Hardware: PC Windows NT
Importance: P2 normal
Target Milestone: ---
Assignee: Henry Zongaro
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
Reported: 2009-10-07 16:28 UTC by Oliver Hallam
Modified: 2010-06-29 13:53 UTC

Description Oliver Hallam 2009-10-07 16:28:33 UTC
Quoting section 7.2 of the serialization spec:

The HTML output method MUST output boolean attributes (that is attributes with only a single allowed value that is equal to the name of the attribute) in minimized form.

You could argue that no attributes in the HTML DTDs have "a single allowed value that is equal to the name of the attribute": the HTML DTDs are case insensitive, so every boolean attribute has several allowed values that differ only by case.

Take the following example:

<input readonly="READONLY" DISABLED="no" />

The most logical serialization of this is as follows:

<INPUT readonly DISABLED="no"></INPUT>

but as far as I can tell, the spec dictates that it should be:

<INPUT readonly DISABLED></INPUT>

My reasoning is as follows:

The disabled attribute is an attribute that only has a single allowed value (ignoring case differences) of "DISABLED", and so it is minimized, even though the value in the instance document is not this value.

I would suggest the following changes:

<new>
The HTML output method MUST output boolean attributes in minimized form.  A boolean attribute is an attribute whose name is equal to its value regardless of case, and which is defined in the HTML 4.01 DTDs with a single allowed value (disregarding case differences) that is equal to the name of the attribute.
</new>

It would also be worth adding a note somewhere to clarify that the HTML 4.01 DTDs are case insensitive for element names and attribute names, and for some attribute values.

Other places in the spec could also be clearer about case insensitivity.  The only place where case insensitivity is mentioned is section 7.1:

The HTML output method MUST recognize the names of HTML elements regardless of case. For example, elements named br, BR or Br MUST all be recognized as the HTML br element and output without an end-tag.

This could be interpreted as meaning that element names are output in lower case, which I believe is not the intention.
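
To illustrate the reading argued for here (recognize the element regardless of case, but keep the name as written), a minimal Python sketch follows. `EMPTY_ELEMENTS` and `serialize_element` are hypothetical names invented for this example; the set lists the elements declared EMPTY in HTML 4.01, and namespaces, attributes and escaping are deliberately left out.

```python
# Elements declared EMPTY in the HTML 4.01 DTDs, stored lower-cased.
EMPTY_ELEMENTS = frozenset({
    "area", "base", "basefont", "br", "col", "frame", "hr",
    "img", "input", "isindex", "link", "meta", "param",
})

def serialize_element(name: str, content: str = "") -> str:
    """Recognize the element case-insensitively, but output the name unchanged."""
    start = f"<{name}>"
    # An empty element with no content is written as a start tag only.
    if name.lower() in EMPTY_ELEMENTS and not content:
        return start
    return f"{start}{content}</{name}>"

for name in ("br", "BR", "Br"):
    print(serialize_element(name))
```

Under this assumption br, BR and Br are all written without an end-tag, and none of them is folded to lower case.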


Take erratum SE.E9:

Whitespace MUST NOT be added or removed inside a formatted element, the formatted elements being pre, script, style, and textarea.

It is not immediately clear that these element names are referring to the entries in the HTML DTDs and are therefore case insensitive.
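
A sketch of the case-insensitive reading of SE.E9, assuming the serializer knows the names of the enclosing elements. `FORMATTED_ELEMENTS` and `may_adjust_whitespace` are hypothetical names used only for illustration.

```python
# The four formatted elements named in SE.E9, stored lower-cased.
FORMATTED_ELEMENTS = frozenset({"pre", "script", "style", "textarea"})

def may_adjust_whitespace(ancestor_names: list[str]) -> bool:
    """Return False when any enclosing element is a formatted element.

    Names are compared without regard to case, which is the interpretation
    suggested in this report rather than something SE.E9 states explicitly.
    """
    return not any(name.lower() in FORMATTED_ELEMENTS for name in ancestor_names)

print(may_adjust_whitespace(["BODY", "PRE"]))   # inside PRE: leave whitespace alone
print(may_adjust_whitespace(["body", "div"]))   # no formatted ancestor
```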
Comment 1 Oliver Hallam 2009-10-07 17:31:44 UTC
It could also be that

<INPUT readonly DISABLED></INPUT>

is the desired result.

The strongest argument for this is that an implementation is free to test for boolean attributes at any point during serialization; whereas if the value is important then it must be handled before character replacement.

I suspect it is better to leave this difference implementation-dependent.

In this case I would still argue for the definition to be changed, but to something more like the following:

<new>
The HTML output method MUST output boolean attributes in minimized form.  A
boolean attribute is an attribute which is defined in the HTML 4.01 DTDs with
a single allowed value (disregarding case differences) that is equal to the
name of the attribute.

If the value of a boolean attribute is not equal to its name disregarding case differences then an implementation MAY choose to treat it as a non-boolean attribute.
</new>

This allows the most flexibility in matching what implementations already do.
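
The implementation-dependent choice proposed in this comment could be sketched in Python as below. Everything here is an assumption made for illustration: `BOOLEAN_ATTRS` is a small hand-picked subset of the HTML 4.01 boolean attributes, `minimize_mismatched` is a hypothetical switch standing in for the implementation's choice, and attribute value escaping is omitted.

```python
# Illustrative subset of HTML 4.01 boolean attribute names, lower-cased.
BOOLEAN_ATTRS = frozenset({"checked", "disabled", "readonly", "selected"})

def serialize_attr(name: str, value: str, minimize_mismatched: bool = False) -> str:
    """Serialize one attribute under the wording proposed in this comment.

    A boolean attribute whose value equals its name (ignoring case) is always
    minimized. When the value differs, the hypothetical flag decides whether
    the attribute is still minimized or treated as a non-boolean attribute.
    """
    if name.lower() in BOOLEAN_ATTRS:
        matches = value.lower() == name.lower()
        if matches or minimize_mismatched:
            return name
    return f'{name}="{value}"'

print(serialize_attr("DISABLED", "no"))                            # treated as non-boolean
print(serialize_attr("DISABLED", "no", minimize_mismatched=True))  # minimized anyway
```

Both outputs would be conformant under this wording, which is the flexibility the proposal is after.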
Comment 2 Henry Zongaro 2009-11-26 21:19:35 UTC
Thank you for reporting this problem and for your thoughtful analysis.

In the past the working groups have ruled that instances of the Data Model that do not adhere to the HTML DTD should be serialized as faithfully as possible.  For instance, according to erratum SE.E7,[1] an element whose content model is empty must be output using only a start tag (for HTML) or an empty element tag (for XHTML) if the element node in the Data Model instance actually has no children.

So, I propose the following change:

. In section 7.2, replace the second paragraph with, "A boolean attribute is an attribute with only a single allowed value in any of the HTML DTDs, where the allowed value is equal without regard to case to the name of the attribute.  The HTML output method MUST output any boolean attribute in minimized form if and only if the value of the attribute node actually is equal to the name of the attribute without regard to case."

[1] http://www.w3.org/XML/2007/qt-errata/xslt-xquery-serialization-errata.html#E7
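
As a rough Python sketch of the "if and only if" rule proposed in this comment: `serialize_start_tag` is a hypothetical helper, `BOOLEAN_ATTRS` lists the HTML 4.01 boolean attributes without tracking which elements each one belongs to, and the escaping shown is a simplification rather than the full character handling of the HTML output method.

```python
# Boolean attributes of HTML 4.01, lower-cased. Per-element applicability
# is ignored here, which is a simplification.
BOOLEAN_ATTRS = frozenset({
    "checked", "compact", "declare", "defer", "disabled", "ismap", "multiple",
    "nohref", "noresize", "noshade", "nowrap", "readonly", "selected",
})

def serialize_start_tag(name: str, attrs: list[tuple[str, str]]) -> str:
    """Minimize an attribute only if it is boolean AND its value equals its name."""
    parts = [name]
    for attr_name, value in attrs:
        if attr_name.lower() in BOOLEAN_ATTRS and value.lower() == attr_name.lower():
            parts.append(attr_name)
        else:
            # Simplified escaping; real serializers apply more detailed rules.
            escaped = value.replace("&", "&amp;").replace('"', "&quot;")
            parts.append(f'{attr_name}="{escaped}"')
    return "<" + " ".join(parts) + ">"

print(serialize_start_tag("INPUT", [("readonly", "READONLY"), ("DISABLED", "no")]))
# prints: <INPUT readonly DISABLED="no">
```

With this rule the example from the description keeps DISABLED="no" in full, so a value that does not conform to the DTD is serialized faithfully.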
Comment 3 Henry Zongaro 2009-12-02 11:31:58 UTC
At the joint teleconference of the XQuery and XSL Working Groups of 2009-12-01,[2]
the proposal in comment #2 was accepted.  As only a few members of the XSL WG were present on the call, I will bring the proposal back to that working group for final ratification.

[2] http://lists.w3.org/Archives/Member/w3c-xsl-query/2009Dec/0005.html (Member-only link)
Comment 4 Henry Zongaro 2009-12-03 19:58:26 UTC
At its teleconference of 2009-12-03,[3] the XSL Working Group ratified the decision reported in comment #2.  Oliver, as you were present at the joint call of 2009-12-01, I will assume this resolution is acceptable to you.

This will be Serialization erratum SE.E14.

[3] http://lists.w3.org/Archives/Member/w3c-xsl-wg/2009Dec/0008.html