ChangeProposals/BIUHaveNoSemantics

From HTML WG Wiki

<b>, <i> and <u> should have no semantics

Started by Kang-Hao (Kenny) Lu, amended by Henri Sivonen (you are free to make any change as long as the conclusion is to make <u> conforming, but please add your name here if you edit the wiki page)

Summary

Although its purposes are diverse, <u> has a significant number of use cases that are not covered by other elements. These include proper noun marks in Chinese documents, misspelled words, and conceivably other typographic conventions around the world. The <u> element should be reintroduced as having no semantics. <b> and <i> should likewise have no semantics, to be consistent with <u>.

Introduction

Making HTML a semantic language rather than a presentational one has been a design goal from the start. HTML5 deprecates a fair number of presentational features, but the popularity of certain elements (<b>, <i>, <hr>, <s>, <small>) makes dropping them unrealistic, so these elements are redefined in HTML5 to be media-independent. What the <b> and <i> elements represent on the Web cannot be easily described, and the editor chose the following strategy:

  • enumerate a subset of common use cases (also known as a semantic fig leaf)
  • build an unbounded set by defining these elements as "a span of text offset from the normal prose/presentational mode, whose typical typographic presentation is bolded/italicized."

However, the editor refuses to define the <u> element in a similar way, claiming without giving details that <u> is far more presentational than <b> and <i>.

Rationale

Why is <u> needed?

  • The element has been interoperably implemented and deployed for a long time, and it is the sixth most frequently used phrase element, which indicates significant demand. The following software outputs the <u> element for underlining:
  • Requiring HTML5 conformance checkers to report errors when authors use <u> will bury other conformance messages that are far more important. It is also better not to add unnecessary complications for authoring tool developers. Even if authoring tool developers are willing to make their tools conform to the current draft of HTML5, in which <u> is missing, the tools are likely to output source code like <span class="s1">, which is unnecessarily long and not semantic either. In a WYSIWYG editor, asking the user for a reason to use a certain typographical feature is a non-starter for usability.
  • WYSIWYG editors traditionally provide three buttons: bold, italic and underline. Ordinary users find it odd that <b>, <i> and <u> have different statuses. They should be either
    • all conforming (defined as semantic, presentational or mixed elements)
    • all deprecated but conforming
    • all non-conforming
  • Automated conversion of non-Web-native documents (e.g. OCR of paper documents, or plain file format converters) needs to deal with inline styling somehow to avoid data loss, and cannot guess the semantics of the input. For italics, bold and strike-through, there are <i>, <b> and <s>. It is odd to pretend that <u> isn't likewise available. Using CSS would not solve the data loss issue to the extent that CSS is considered optional (i.e. not necessarily honored for presentation). Sometimes a human supervises the conversion, but the subject matter is so delicate that the human wants to refrain from applying semantic judgement. (For example, at http://hsivonen.iki.fi/mustaa-valkoisella/ an official print document is reproduced with underlining intact, without making a judgement about what the document tries to signify.)
  • There are significant use cases that cannot be covered by other elements, such as proper noun marks in Chinese and misspelled words.
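
The two use cases named above, and the <span class="s1"> fallback mentioned earlier, could look roughly like this. It is only a sketch: the sample sentences, the lang values and the class name spelling-error are illustrative assumptions, not taken from any draft or tool.

```html
<!-- Proper noun marks: names of people and places are underlined. -->
<p lang="zh-Hant"><u>孫中山</u>生於<u>廣東</u>。</p>

<!-- A misspelled word flagged by underlining (class name is hypothetical). -->
<p lang="en">Please <u class="spelling-error">recieve</u> the attached file.</p>

<!-- What a tool might emit if <u> is non-conforming: longer markup that is
     no more semantic, and that loses the underline when CSS is not applied. -->
<p lang="zh-Hant"><span class="s1">孫中山</span>生於<span class="s1">廣東</span>。</p>
```

In the last fragment the underline depends entirely on a style sheet rule for the s1 class, which is the data loss concern raised above for environments where CSS is treated as optional.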

Why should <b> and <i> have no semantics?

  • Compared to <s> (for wrong info) and <small> (for side comments), <b> and <i> do not even have a coherent use case. What do technical terms have to do with alternative voices? What do key words have to do with product names?
  • It better reflects reality.
  • The use of <b> and <i> without the class attribute is known to be harmful for localization. For example, the current draft uses <i> for both technical terms and alternative voices ("This section is non-normative"). Defining <b> and <i> as semantic elements encourages authors to omit the class attribute, which is a bad thing.
  • Every use of <i> or <b> claimed to be semantic is inherently ambiguous and questionable because of the element's incoherent nature. "This section is non-normative" is certainly not a technical term ("non-normative" is), but then why do screen readers have to read this sentence in an alternative voice? Why isn't <small> (for side comments) used in this case?
  • For <i>, the wording "or some other prose whose typical typographic presentation is italicized" makes this element totally inconsistent, because italic type does not appear in traditional Asian documents for technical terms, ship names or anything else.
  • It is hard to imagine that authors would ever converge on using these elements as semantic elements, given the ambiguity.
  • It was mentioned that the "alternative voices" meaning of <i> could be useful for accessibility tools, but this is questionable as long as <i> could also be used for terms.
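
The localization point can be illustrated with the two uses of <i> cited above. In this sketch the class names note and term are hypothetical; the only claim is that a class lets a translator, style sheet or accessibility tool tell the two uses apart.

```html
<!-- Without classes, the two uses of <i> are indistinguishable. -->
<p><i>This section is non-normative.</i></p>
<p>A <i>non-normative</i> section imposes no requirements.</p>

<!-- With (hypothetical) classes, each use can be restyled independently,
     e.g. for a script in which italic type is not customary. -->
<p><i class="note">This section is non-normative.</i></p>
<p>A <i class="term">non-normative</i> section imposes no requirements.</p>
```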

Details

Recognize that there is no way for authors and authoring tool developers to understand and correctly use these elements as semantic elements, given the status quo, the diverse use cases and the usability issues of WYSIWYG editors, and that "stylistically offset from the normal prose", the wording for <b> and <i>, is too general and confusing. We could:

  1. Reintroduce <u>. Tag all of <b>, <i> and <u> as having no semantics (=? presentational), i.e. equivalent to <span>. They represent their children. If necessary, separate them, along with <span>, from 4.6 into a new section called Text-Level Grouping Elements.
  2. Retain the use cases of <b> and <i>. Add cases for <u> (misspelled words, proper noun marks). These are valid use cases of these elements, since they are not covered by any other elements apart from these three and <span>.
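
One reading of "equivalent to <span>" in item 1 is sketched below. The sample sentence and the inline style values are assumptions chosen for illustration: under the proposal each pair would carry the same meaning, differing only in default rendering.

```html
<!-- Each element simply represents its children; only the default styling differs. -->
<p>Press <b>Save</b> to continue.</p>
<p>Press <span style="font-weight: bold">Save</span> to continue.</p>

<p>Press <i>Save</i> to continue.</p>
<p>Press <span style="font-style: italic">Save</span> to continue.</p>

<p>Press <u>Save</u> to continue.</p>
<p>Press <span style="text-decoration: underline">Save</span> to continue.</p>
```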

Impact

Positive Effects

  • An interoperable way to make more sites conforming. Site developers can focus on other issues that are more important.
  • Consistent with existing content.
  • Reduces Internet traffic: <i> (or <b> or <u>) can be used in place of <span>.

Negative Effects

  • Authors will have an excuse not to use more appropriate markup (e.g. for insertion or emphasis) when applying underlines.
  • Duplicate functionality: <b>, <i>, <u> and <span> would be semantically the same.

Conformance Classes Changes

HTML conformance checkers would need to change. Authoring tools would not need to change.

Risks

None.

References