ChangeProposals/UShouldBeConforming

From HTML WG Wiki
Revision as of 20:07, 3 April 2011 by Kennyluck


<u> should be conforming

Started by Kang-Hao (Kenny) Lu, amended by Henri Sivonen and Aryeh Gregor (you are free to make any change as long as the conclusion is to make <u> conforming, but please add your name here if you edit the wiki page)

Summary

Making <u> non-conforming has too many drawbacks. The main use case for this element is content generated by authoring tools. Minor use cases, such as proper noun marks in Chinese documents and misspelled words, also exist.

Introduction

Making HTML a semantic language rather than a presentational one has been a design goal since its beginning. HTML5 deprecates a fair number of presentational features, but the popularity of certain elements (<b>, <i>, <hr>, <s>, <small>) makes dropping them unrealistic, so these elements are redefined in HTML5 to be media-independent. The content that the <b> and <i> elements represent on the Web cannot be easily described, so the editor chose the following strategy:

  • enumerate a subset of common use cases (also known as a semantic fig leaf)
  • build an unbounded set by defining these elements as "a span of text offset from the normal prose/presentational mode, whose typical typographic presentation is bolded/italicized."

However, the editor refuses to define the <u> element in a similar way, claiming without giving details that <u> is far more presentational than <b> and <i>.

Rationale

  • The element has been interoperably implemented and deployed for a long time, and it is the sixth most frequently used phrase element, which shows that it is genuinely needed. The following software outputs the <u> element for underlining:
    • Internet Explorer (tested in 9 RC)
    • Firefox (tested in 4b11)
    • Chrome (tested in 10.0.642.2 dev)
    • Safari (tested in 5.0.3)
    • Opera (tested in 11.00)
    • Thunderbird 3.1.7
    • Word 2002
    • GMail
    • OpenOffice.org 3.2.0
    • (see also software that does not output <u>)
  • (Browsers were tested using the following markup: <!doctype html><button onclick="execCommand('underline', false, null)"><u>U</u></button><div contenteditable=true><p>Hello there!</div>. Firefox requires an extra execCommand('useCSS', false, true); call before it will use <u>, because it defaults to using CSS.)
  • Requiring HTML5 conformance checkers to report errors when authors use <u> would mask other conformance messages that are far more important. It is also better not to add unnecessary complications for authoring tool developers. Even if they are willing to make their tools conform to the current HTML5 draft, which omits <u>, the tools are likely to output markup like <span class="s1">, which is unnecessarily long and no more semantic. In a WYSIWYG editor, asking the user to justify using a particular typographical feature is a non-starter for usability.
  • WYSIWYG editors traditionally provide three buttons: bold, italic, and underline. Ordinary users find it odd that <b>, <i>, and <u> have different statuses. The three elements should be either:
    • all conforming (defined as semantic, presentational, or mixed elements),
    • all deprecated but conforming, or
    • all non-conforming.
  • <u> should not be invalid merely because other obsolete presentational features, such as <font color=red> or <p align=right>, are invalid:
    1. The length savings are much greater. <u></u> is seven characters, while <span style=text-decoration:underline></span> is 45 characters, over six times longer. This greatly harms both typability and readability. In contrast, markup like <span style=color:red> is less than twice as long as its obsolete counterpart, while retaining the benefits of using pure CSS (i.e., authors don't need to learn two separate formatting languages).
    2. <u> is more comparable to other tags that were declared valid, such as <b>, <i>, <s>, and so on. It's more inconsistent to leave it invalid than to make it valid.
    3. It's possible that some other presentational markup should in fact be made valid in the future. This shouldn't stop us from considering each individual tag or attribute one by one. We should aim for the final specification to be consistent, but it's not essential that every Working Draft is fully consistent. Consistency is best evaluated at a later stage of specification maturity, when we have a better idea of what the final feature set will be.
    4. The chairs have already ruled that consistency does not preempt other legitimate concerns. Even if adopting this change proposal increased inconsistency somewhat, that alone would not be enough reason to reject it, unless all the concerns raised here are shown to be so negligible that even an academic concern like consistency takes precedence. Insofar as this rationale highlights real, concrete costs to authors of making <u> invalid (such as having to change WYSIWYG authoring tools), it can only fairly be rebutted by real, concrete costs of adopting it. Consistency is largely a question of theoretical purity, which ranks last in the priority of constituencies.
    5. Although the editor has currently chosen to make some types of presentational markup invalid, that was his decision, not the Working Group's. If the Working Group decides to make <u> valid, and the editor believes this harms consistency, he is free to make other markup conforming to restore consistency. He has done this before: when asked to remove microdata from HTML5, he responded by splitting off a number of other pieces as well, with the aim of being more consistent. If the editor wishes to argue that this change proposal would make the specification less consistent, he should also have to argue that either a) he could not restore consistency without violating other Working Group decisions, or b) restoring consistency by making more markup conforming would be undesirable for some reason.
  • Automated conversion of non-Web-native documents (e.g. OCR of paper documents, or plain file format converters) must handle inline styling somehow to avoid data loss, and it cannot guess the semantics of its input. For italics, bold, and strike-through, there are <i>, <b>, and <s>. It is odd to pretend that <u> is not likewise available. Using CSS would not solve the data loss issue insofar as CSS is considered optional (i.e. not necessarily honored for presentation). Sometimes a human supervises the conversion, but the subject matter is so delicate that the human wants to refrain from applying semantic judgement. (For example, at http://hsivonen.iki.fi/mustaa-valkoisella/ an official print document is reproduced with its underlining intact, without any judgement about what the underlining is meant to signify.)
  • There are use cases that cannot be covered by other elements, such as proper noun marks in Chinese and misspelled words.
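The character counts given in point 1 of the consistency argument above can be checked mechanically. The following JavaScript sketch is only an illustration: the markupCost helper is hypothetical, and it counts both the opening and closing tags of each pair, so the <font>/<span> figures differ slightly from a comparison of opening tags alone.

```javascript
// Hypothetical helper (not part of any spec or tool): compares the length of a
// presentational tag pair with a CSS-based replacement, counting both tags.
function markupCost(shortMarkup, longMarkup) {
  return {
    shortLen: shortMarkup.length,
    longLen: longMarkup.length,
    ratio: longMarkup.length / shortMarkup.length,
  };
}

// Pairs taken from the rationale: <u> vs. an inline-styled <span>,
// and the obsolete <font color=red> vs. its <span style=...> counterpart.
const comparisons = [
  ["<u></u>", "<span style=text-decoration:underline></span>"],
  ["<font color=red></font>", "<span style=color:red></span>"],
];

for (const [shortMarkup, longMarkup] of comparisons) {
  const { shortLen, longLen, ratio } = markupCost(shortMarkup, longMarkup);
  // Prints the two lengths and how many times longer the CSS version is.
  console.log(`${shortMarkup} (${shortLen}) vs ${longMarkup} (${longLen}): ${ratio.toFixed(2)}x`);
}
// prints:
// <u></u> (7) vs <span style=text-decoration:underline></span> (45): 6.43x
// <font color=red></font> (23) vs <span style=color:red></span> (29): 1.26x
```

Counting only the opening tags, <font color=red> is 16 characters and <span style=color:red> is 22, still well under twice as long, whereas the underline replacement stays more than six times longer either way.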

It is worth mentioning that, compared to <s> (for inaccurate information) and <small> (for side comments), <b> and <i> do not even have a coherent set of use cases. What do technical terms have to do with alternative voices? What do key words have to do with product names?

Details

(Note: This section outlines a specific proposed change, because change proposals must provide change details that can be applied unambiguously. However, any change that makes <u> conforming is acceptable, since the arguments above don't depend on exactly how it's defined or other details. If the editor would prefer to, e.g., define <u> and/or other elements as presentational, he should feel free to do so even if this Change Proposal is adopted, provided <u> remains fully conforming.)

Redefine <u> as a (vaguely) semantic element. Proposed wording:

The u element represents a span of text to be stylistically offset from the normal prose, such as misspelled words, proper nouns in Chinese, or other spans of text whose typical typographic presentation is underlined. (adapted from <b>)

accompanied by a sentence stating that this element should be used as a last resort, as with <b> and <i>. The element would then arguably carry slight semantics (the "offset from the normal prose" semantics), in contrast to <span>, which has no meaning.

Note that the qualifier "in Chinese" is important because, as the editor has pointed out, authors are unlikely to use the <u> element correctly without it. This proposal also suggests changing "ship name" in 4.6.16 to "ship name in Western typography" for English readers who are not familiar with this convention.

Impact

Positive Effects

  • An interoperable feature that makes more sites conforming, so site developers can focus on issues that are more important.
  • Consistent with existing content.
  • Reduced Internet traffic, since <u> is much shorter than its CSS equivalent.

Negative Effects

  • Authors will have an excuse not to use more appropriate markup where underlining carries meaning (e.g. <ins> for insertions, <em> for emphasis).

Conformance Classes Changes

HTML conformance checkers would need to stop reporting <u> as an error. Authoring tools would not need to change.

Risks

None.

References