IncludeRB

From HTML WG Wiki
Revision as of 03:12, 23 January 2012 by Lsilli

Change Proposal for ISSUE-172: Include the rb element

Leif Halvard Silli 03:12, 23 January 2012 (UTC)

Summary

This proposal requests:

  1. that the rb element, which is currently considered obsolete, is included, for the following reasons:
    • legacy documents: rb inclusion makes legacy mark-up (in the wild and per XHTML 1.1, where rb is mandatory) HTML5-compatible.
    • legacy tools: rb inclusion allows tools to operate with largely identical ruby modes for XHTML 1.1 simple ruby and (simple) HTML5 ruby (instead of operating two modes that differ greatly) and also allows authors to use XHTML 1.1 tools to create HTML5-compatible ruby content.
    • native wrapper: because rb belongs only to the ruby format, tools can auto-insert it together with the parent ruby element, which is often more practical than manually adding a wrapper as an afterthought. By contrast, the current HTML5 alternative can never be as convenient, since:
      1. the alternative wrapper (e.g. span) is not part of the format, and thus one would have to add it manually (by contrast, when the editor oXygen inserts the ruby element in an XHTML 1.1 document, the rb element is added automatically so that the author can start to type inside it, much like many editors, when inserting a dl or a table, also insert the required child elements);
      2. when deleting (with a browser-based WYSIWYG tool) the content of a non-native wrapper, the wrapper itself will often be deleted too (in order to prevent littering the code with stray, empty elements; XStandard does this, see the heading “Empty Tags”), whereas an empty element that is part of the format itself could just remain in the code, ready to be refilled with content.
    • CSS2 selector: the rb element provides a cross-browser CSS2 selector (rb{}) that, unlike e.g. span{}, selects only ruby base text. In order to offer a similar feature without inclusion of rb, HTML5's editor has proposed extending the old and largely unimplemented CSS3 Generated and Replaced Content Module with, quote, “a pseudo-element that can style certain spans of descendants; the flip side of ::outside”. His proposed “flip side of ::outside” does, however, have zero implementations (according to CanIuse.com). That the editor also questions the need to select the ruby base text at all does not make this option any more credible. Note: For simple ruby, it is not uncommon (examples: one, two (two a)) to use ruby{display:inline-table} in combination with rb{display:table-row-group; /* or similar, from CSS tables */}. It might be that removing the rb makes the styling less robust, but other than that, it seems to work without rb as well.
    • CSS2 backup styling: HTML5 prescribes the style rules ruby{display:ruby;}rt{display:ruby-text;}, which stem from the CSS3 ruby module, but which none of the common browsers fully support yet (not even Webkit or IE, even though they have some ruby support). Thus, if one wants it to look like ruby in Opera and Firefox, one must hack up some backup CSS that works, and a wrapper element for the ruby base text may then come in handy, even if it is not impossible to make it work without it: demo of styling that works in Opera + Firefox.
    • metadata readiness: A wrapper around the ruby base words makes the ruby base text ready to be independently language tagged (<rb lang="*">), to become ARIA “styled” (e.g. <rb aria-hidden="true">) and to receive an HTML class name (<rb class="important">) without affecting <ruby>, <rt> or <rp>.
  2. that the content model of ruby is changed so that it is non-conforming for <rt> to occur adjacent to <rb> more than once per ruby element, for the following reasons:
    • source order is important:
      • The HTML5 content model breaks with the content model of XHTML 1.1 by requiring that one write:
        • <ruby><rb>W</rb><rt>World</rt><rb>W</rb><rt>Wide</rt><rb>W</rb><rt>Web</rt></ruby>
      • By contrast, XHTML 1.1 requires that one write the following, which, as one can see, allows words and letters to be written in source order:
        • <ruby><rbc><rb>W</rb><rb>W</rb><rb>W</rb></rbc><rtc><rt>World</rt><rt>Wide</rt><rt>Web</rt></rtc></ruby>
      • HTML5's break from the source order found in XHTML 1.1 creates problems for every parser/reader that needs to detect words by more than visual means.
        Problem examples: A user who runs a find-in-page search in the browser for 'WWW' when the HTML5 mark-up above is used will not locate the 'WWW' that the user can see. For the same reason, it creates problems in screen readers, in online translation services like Google Translate, in copy-and-paste and selections, and so on. It is even hard to author, since the author cannot type the letters/words that belong together in one chunk; the author instead has to type one ruby base letter and then a ruby text letter/word, etc., which is cumbersome and error-prone. It is comparable to a table model where the author has to work with cells on two rows simultaneously.
      • The content model of this change proposal requires that the above example be written like this:
        • <ruby><rb>W</rb><rb>W</rb><rb>W</rb><rt>World</rt><rt>Wide</rt><rt>Web</rt></ruby>
          or this (NOTE: rtc is not made conforming, as of yet, due to legacy parser problems):
        • <ruby><rbc><rb>W</rb><rb>W</rb><rb>W</rb></rbc><rt>World</rt><rt>Wide</rt><rt>Web</rt></ruby>.
  3. that the rbc element is included as a ruby base wrapper, for the following reasons:
    • The rbc is very helpful when styling a ruby element: it becomes almost impossible to style ruby cross-browser unless it is included. And yet, it is made optional, to allow authors to “type less”.
  4. that the HTML5 parser is updated to handle rtc, for the following reasons:
    • currently, Gecko and Trident will auto-close any element as soon as the parser sees rp or rt. This prevents the introduction of rtc.
    • without rtc, double sided ruby is more or less impossible to achieve.
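
The find-in-page problem described under point 2 can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the helper names TextCollector and source_text are hypothetical, and the sketch assumes a tool that reads character data strictly in source order, which is the behaviour the proposal is concerned about and not a statement about any particular browser or screen reader.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects character data in source order and ignores all tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def source_text(markup):
    """Return the text that a source-order reader (assumed) would see."""
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return "".join(collector.chunks)


# Current HTML5 content model: base and annotation alternate.
interleaved = ("<ruby><rb>W</rb><rt>World</rt><rb>W</rb><rt>Wide</rt>"
               "<rb>W</rb><rt>Web</rt></ruby>")
# Content model of this change proposal: all bases first, then all texts.
grouped = ("<ruby><rb>W</rb><rb>W</rb><rb>W</rb>"
           "<rt>World</rt><rt>Wide</rt><rt>Web</rt></ruby>")

print(source_text(interleaved))           # WWorldWWideWWeb
print(source_text(grouped))               # WWWWorldWideWeb
print("WWW" in source_text(interleaved))  # False
print("WWW" in source_text(grouped))      # True
```

With the interleaved mark-up, the three base letters are never contiguous in the source text, so a search for 'WWW' fails; with the grouped mark-up the search succeeds.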



Rationale

  • The inclusion of rb addresses the problem that the alternative (exclusion of rb) encourages ad-hoc solutions with regard to marking up or styling the ruby base text.
  • The inclusion of rb allows thought-free, direct transition to/from e.g. XHTML 1.1 and HTML5. (Thought-free = automatic: The author, or the authoring tool, does not have to ponder which element to use.)
  • A dedicated wrapper element offers thought-free simplicity with regard to adding CSS, ARIA, a language tag or semantic meta data to the ruby base word, without effects on <ruby>, <rt> or <rp>.
  • The change of the content model addresses the need for words/compounds to appear as words/compounds in the source as well, in order to be compatible with non-visual parsing (screen readers, find-in-page, translation services etc.), which overall needs to work with text that is just as logical in the source as in the display.
  • The inclusion of rbc allows advanced ruby to be styled more simply.
  • Preparing the HTML5 parser to handle rtc allows the rtc element to be introduced in HTML6, thereby allowing double-sided ruby.

Details

  1. A set of edit instructions:
  • Inside “Text-level semantics”: Add a new section about the rb element.
  • Inside “Text-level semantics”: Add a new section about the rbc element.
  • Inside “Rendering”: Add a new note about the default CSS styling of rb (rb{display:ruby-base})
  • Inside “Rendering”: Add a new note about the default CSS styling of rbc (probably rbc{display:ruby-base-container})
  • Inside “Obsolete features”, delete the rb element.
  • About the HTML5 parser:
    1. With the exception of the ruby element itself, and the rtc element, let the parser auto-close the current element (be it rb, rt, span or whatever) when the parser sees an rp or an rt element. This is almost what Gecko and Webkit currently do, with the exception that they also auto-close rtc.
  • As authoring requirements:
    1. Say that rb SHOULD be manually closed by the author, in order to accommodate legacy UAs (that do not auto-close it).
    2. Say that the rb element is optional but RECOMMENDED (for styling, ARIA, language-tagging, semantic-web purposes (RDFa and Microdata)).
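
The parser instruction above can be sketched as a toy tree builder in Python. This is a minimal sketch under stated assumptions, not the HTML5 tree construction algorithm: the function name build_tree, the token format and the keep_open parameter are all hypothetical, and everything except the rt/rp rule is simplified away.

```python
def build_tree(tokens, keep_open=("ruby", "rtc")):
    """Build nested lists [name, child, ...] from a flat token list.

    tokens holds ("start", name), ("end", name) or ("text", data) tuples.
    keep_open names the elements that a new rt or rp must not auto-close.
    """
    root = ["#root"]
    stack = [root]  # stack of currently open elements
    for kind, value in tokens:
        if kind == "start":
            inside_ruby = any(node[0] == "ruby" for node in stack)
            if value in ("rt", "rp") and inside_ruby:
                # Proposed rule: close whatever is open (rb, rt, span, ...)
                # until ruby or rtc is the current element.
                while len(stack) > 1 and stack[-1][0] not in keep_open:
                    stack.pop()
            node = [value]
            stack[-1].append(node)
            stack.append(node)
        elif kind == "end":
            # Close the nearest open element with this name, if there is one.
            if any(node[0] == value for node in stack[1:]):
                while stack[-1][0] != value:
                    stack.pop()
                stack.pop()
        else:
            stack[-1].append(value)
    return root[1:]


# <ruby><rb>W<rt>World</ruby> : the rb is left open by the author.
unclosed = [("start", "ruby"), ("start", "rb"), ("text", "W"),
            ("start", "rt"), ("text", "World"), ("end", "ruby")]
print(build_tree(unclosed))
# [['ruby', ['rb', 'W'], ['rt', 'World']]]

# <ruby><rb>W</rb><rtc><rt>a<rt>b</rtc></ruby>
with_rtc = [("start", "ruby"), ("start", "rb"), ("text", "W"), ("end", "rb"),
            ("start", "rtc"), ("start", "rt"), ("text", "a"),
            ("start", "rt"), ("text", "b"), ("end", "rtc"), ("end", "ruby")]
proposed = build_tree(with_rtc)                     # rtc survives
legacy = build_tree(with_rtc, keep_open=("ruby",))  # rtc is auto-closed too
```

Passing keep_open=("ruby",) models the behaviour attributed above to Gecko and Webkit, where rtc is also auto-closed: in that case the rt elements end up as siblings of an empty rtc instead of as its children.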

Impact

Positive Effects

  • Offers simple transition from existing (simple) ruby mark-up to HTML5. E.g. it is simple to define cross-language 'microformats' that include ruby if HTML5 and the other language include the same elements. It is also simple to make tools, and to build on existing tools, that work in HTML5 as well as XHTML 1.1.
  • Offers simple, thought-free mark-up of ruby base text: it is simple to add language tagging, ARIA attributes, RDFa/Microdata attributes etc. to the ruby base text.
  • Simple, direct styling of ruby base text
  • Future readiness (for advanced ruby): The rb is necessary in advanced ruby.
  • Most authors and authoring tools will continue to use rb - no need to learn something new.
  • Forbidding rb would affect authors and authoring tools, with reasons that are difficult to explain as justification. Allowing rb instead benefits those authors, parsers and authoring tools that already use/implement it, and avoids bothering them needlessly.
  • Authors don't have to wait for new CSS features to be invented (and deployed) before they get a CSS selector that is dedicated to styling ruby base text, and only ruby base text!
  • Change of content model assures that text is meaningful both in source and in display, thus simplifying treatment of ruby in AT, translation services, find-in-page features, spell-checkers etc.
  • Simpler styling and more efficient meta data tagging of ruby base, due to inclusion of rbc (tag one element instead of all the rb elements)
  • Preparedness for the future, due to the change in the HTML5 parser so that it doesn't auto-close the rtc element.

Negative Effects

  • rb is not well supported in all legacy HTML parsers (the Trident parser), hence authors have to be aware of the need to use helping scripts (Modernizr, HTML5shiv etc) and helping CSS in order to get good styling.
    • Counter argument: When this causes problems, authors have the option of either dropping the rb (if it helps) and/or adding an additional wrapper, such as span.
    • Counter argument: If, as in legacy IE versions, the rb renders as an empty element, then no harm is done. Effectively, it means that the ruby base text is rendered without a wrapper, which is equivalent to dropping the rb.
    • Counter argument: Why would the lacking support in legacy HTML parsers be any more important to consider when it comes to rb compared to other, new HTML elements?
    • Counter argument: Since there actually is some support of rb in legacy UAs (at least, it is treated as span), it could be argued that it is simpler to include rb compared to the inclusion of many completely new HTML elements in HTML5.
    • IE since IE9 supports rb natively.
  • Due to little visual feedback when rb is supported in contrast to when it is not supported, authors may think that rb works, while in reality it doesn't.
    • Counter argument: This can be said about legacy parsers with regard to many new elements. And since it hasn't been held against any other new element in HTML5, it seems unjustified to hold it against rb.
  • Some pages that are authored according to the current content model will become invalid due to this CP's requirement that no more than a single adjacent pair of rb and rt occur in the same ruby element.
    • The benefits of fixing the page outweigh this slight annoyance.
  • Change of content model affects how one browser (Webkit) with partial support for ruby displays the ruby.
    • This is true. However, the source order is so important that it is worth it.
  • Use of the rbc element affects how one browser (Webkit) with partial support for ruby displays the ruby.
    • Counter argument: This is not true. It is the change of the content model that can cause this. The inclusion of rbc instead allows authors to style the ruby so that it looks fine in Webkit as well.
    • Counter argument: Actually, in Trident, the <rbc> does no harm.

Conformance Classes Changes

  • It becomes conforming to use the rb element inside ruby
  • The HTML5 parser must auto-close the rb element (and any other element except ruby and rtc) when it sees rt or rp
  • Until UAs offer broad support for auto-closing, it becomes RECOMMENDED to manually close the rb element.
  • The rbc element becomes valid
  • Only a single adjacent pair of rb and rt is conforming
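
The last rule can be expressed as a small check over the child elements of a ruby element. The sketch below is one possible reading of the rule, not a normative definition: the function names are hypothetical, the input is assumed to be the list of child element names in source order, and ignoring rp when testing adjacency is an assumption of this sketch.

```python
def adjacent_rb_rt_pairs(children):
    """Count places where an rb is directly followed by an rt.

    children is the list of child element names of one ruby element,
    in source order. rp elements are skipped (an assumption).
    """
    names = [name for name in children if name != "rp"]
    return sum(1 for first, second in zip(names, names[1:])
               if (first, second) == ("rb", "rt"))


def conforms(children):
    """True if at most one adjacent rb/rt pair occurs in the ruby element."""
    return adjacent_rb_rt_pairs(children) <= 1


print(conforms(["rb", "rt", "rb", "rt", "rb", "rt"]))  # False
print(conforms(["rb", "rb", "rb", "rt", "rt", "rt"]))  # True
print(conforms(["rbc", "rt", "rt", "rt"]))             # True
print(conforms(["rb", "rp", "rt", "rp"]))              # True
```

The first list corresponds to the interleaved 'WWW' mark-up of the current content model, the second and third to the two forms this proposal makes conforming, and the fourth to simple ruby with rp fallback.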

Risks

  • If the author fails to manually close the rb element, then legacy UAs may place the rest of the ruby content inside the rb element, causing the mark-up to malfunction in legacy parsers.
    • Counter argument: In existing usage, the rb seems to almost always be closed.
    • Counter argument: This is a temporary problem: UAs will update. (Currently released versions of Gecko and Webkit plus IE10 do it.)
  • Authors who do close the rb element, might think that closing the rb will guarantee that it works in legacy UAs.
    • Counter argument: Why? It is already well known that legacy versions of e.g. Firefox and IE do not handle unknown elements well, and that one must use various tricks in order to make new HTML5 elements work in legacy UAs.
  • If authors are required to use span instead, then they know that span does not get auto-closed, whereas for rb, they have to learn that while it is intended to auto-close, it so far does not get auto-closed.
    • Counter argument: Actually, this is not true since, in Gecko, Webkit and IE10, any element gets auto-closed when the parser sees rp or rt.
    • Counter argument: To the degree that it is true (see above), it is a temporary problem: UAs will update. Also: Because pre-HTML5 ruby is part of XHTML 1.1, the bulk of legacy code does close the rb element, so it does not seem to be much of a problem. This CP does, however, suggest that authors SHOULD close the rb until UAs catch up.
  • Not making the rb element obligatory (as in XHTML 1.1) creates the risk that authors omit the rb element just because they can, and because they think there is a benefit in doing so.
    • Comment: Agreed. I lean towards making the rb element obligatory, as I don't see it as particularly healthy that one can omit the rb element. From my perspective, allowing the rb to be omitted is just a compromise position (and I don't rule out that my argumentation in favour of including rb would have been more successful if I had asked for it to be obligatory). There is primarily only one benefit, namely that it fits slightly better with the fact that IE6-8 does not recognize this element by default. But this does not seem to be much of an argument since, in an HTML5 parser, any unknown element would be handled in a defined way. Thus, while <rb>word</rb> might fail to work in an un-prepped copy of IE6/IE7/IE8 (and maybe in Firefox 2), it would nevertheless work in an HTML5 parser (as well as in e.g. an IE6/7/8 browser prepped with an HTML5 helper script).
    • Comment: Indeed. There would be no more benefit for an author in omitting the rb element than there would be in dropping the html, head or body element. E.g. if the author drops the html element, then he/she also gives up adding a language tag, semantic meta data and so on for the entire document. Actually, when html is dropped, the element is still auto-generated by the HTML5 parser, which means that it is readily available for scripting and styling. In contrast, when the rb element is dropped, there is no automatic generation of the element. This means that an author who omits the rb element also takes away the direct opportunity to apply e.g. CSS to the element. (Fortunately, however, by including the rb in HTML5, the author (or the authoring tool) has a very simple recipe for fixing such situations.)

References

Relevant tests

rb versus span

  • Richard Ishida's test of rb versus the alternatives, demonstrates that:
    • for (more or less) HTML5-capable parsers (Firefox 9 and Opera 11.5), use of span does not work any better than using rb: In either case, the author, if he/she wants to use ruby, must use alternative CSS styling, due to the lack of support for the Ruby CSS module.
    • for UAs that have some kind of built-in ruby support (IE, Safari/Chrome), span and rb work equally well.
    • if one adds the HTML5 shiv (<script>document.createElement("rb")</script>) to Richard's tests (see demo), then rb works as well/badly as span in legacy IE versions.

auto-closing of the rb element

  • Test: auto-closing test.
  • Results:
    1. Parser does not recognize <rb> as an element (even if it recognizes <ruby> and <rt>): IE5-8.
      In IE5-8, this has two effects: a) rb styling does not work; b) it might make it seem as if auto-closing does work.
    2. Parser auto-closes any element (except ruby itself) when it sees <rt> or <rp>: Firefox, Webkit
    3. No auto-closing happens: Opera, IE9 and IE5-8 with HTML5 shiv (<script>document.createElement("rb");</script>)

Use cases for rb

  • Use cases for rb amount to documenting that there are use cases for the addition of language tags, CSS, metadata (via Microformats, Microdata or RDFa) and ARIA to ruby base text. In our view, it does not make sense to accept that it has to be documented, via use cases, that ruby base text needs language tagging, styling, metadata or ARIA, since these are features which are generally accepted as needed anywhere on any element. Nevertheless, we will mention some such examples:
    • Language tagging:
      • The very idea behind ruby markup is to express a translation (in the widest sense of the word) of a ruby base text in the form of a ruby (annotation) text. Often the difference between base and text is only a difference in script; thus the language is the same while the script differs. Other times, the language differs. In either case, the difference in script or language can be expressed via language tags on either rb or rt, or on both. We can conclude that language tagging (to the degree that it has any relevance at all) is more frequently needed for ruby mark-up than for any other HTML construct.
      • The language is inherited from a parent element, e.g. from p or html. And since the ruby text (rt) is supposed to explain the ruby base (rb), one must conclude that it is the ruby base that most frequently will need to be language tagged. (The rt will just inherit the language tagging value from the parent element.) Without the rb element, one would have to first add a language tag on the ruby element, and then add a language tag on the rt element in order to cancel the (inherited) effect of setting the language on the ruby element, which is quite ad hoc and impractical, in our view.
    • ARIA, CSS
      • In the WebAIM forum, it was recently asked how to convey to an AT user that a Roman numeral (such as IV) stood for 4 and was not to be read as the letters I plus V. I provided an answer where one of the options was to use ruby markup. With the help of <rb aria-hidden="true">, this was the only method that worked even in the Mac OS X screen reader (VoiceOver). In the demo code, I also added styling of the rb element in the form of rb{text-decoration:underline}, to hint to sighted users as well that hovering above this text would provide information. Thus I was able to, by default, hide the ruby text for everyone except screen reader users (see demo).
    • CSS selectors for ruby base
      • Koji Ishii suggests treating rb just like tbody: thus, the selector should work even if the author skips actually typing it. A good idea. However, this change proposal does not suggest that, as it would mean that one would have to change the HTML5 parser so that it autogenerates the element. No such change is on the horizon. Alternatively, one could add a pseudo selector in CSS, e.g. ruby:base{}. But no such selector is on the horizon either. The point stands that it is necessary to be able to select the ruby base, and that the simplest way is to use rb.
    • metadata (Microformats, RDFa, Microdata)
      • No examples.