20115 – Double-sided ruby

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Summary: Double-sided ruby

Status:	RESOLVED WORKSFORME

Alias:	None

Product:	WHATWG
Classification:	Unclassified
Component:	HTML
Version:	unspecified
Hardware:	Other other

Importance:	P3 normal
Target Milestone:	Unsorted
Assignee:	Ian 'Hixie' Hickson
QA Contact:	contributor

URL:	http://www.whatwg.org/specs/web-apps/...
Whiteboard:
Keywords:

Depends on:
Blocks:	21040

Reported:	2012-11-28 04:40 UTC by Ian 'Hixie' Hickson
Modified:	2013-02-18 14:57 UTC
CC List:	5 users

See Also:

Attachments

Description Ian 'Hixie' Hickson 2012-11-28 04:40:50 UTC

From: http://lists.w3.org/Archives/Public/public-whatwg-archive/2012Nov/0370.html


When you want to create double-sided ruby, with the ruby text on both
sides of base text, the current HTML model posits two separate and
fairly different markup models.

In the first, when the group boundaries for both ruby text runs are
the same, it allows you to have two <rt>s following an <rb>, with the
obvious meaning.

In the second, when the group boundaries do *not* line up (in
particular, for the common case where one line of ruby is
per-character and the other is for the whole group, such as with a
pinyin and English translation), it requires you to nest two <ruby>
elements, with the inner one supplying the per-character annotations
and the outer supplying the whole-group ones.

Having to learn and use two different markup patterns for two nearly
identical use-cases is sub-optimal for authors.  It would be best if
they could just learn one model that works for both.

On the implementation side, this also requires two different layout
models for essentially the exact same thing.  This is unnecessarily
complicated; again, one simple way to get both would be preferred.

This is easy to address.  Add an <rtc> element (name taken from the
XHTML Ruby module), which is used for the second line of text.  You
can fill an <rtc> with <rt> elements, in which case they match up
index-wise with the preceding run of <rb> elements.  The last <rt>
(or, if no <rt>s were given at all, the naked text that was implicitly
wrapped in an <rt>) automatically spans the remaining bases in the
preceding run.
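
The matching rule described above can be sketched roughly as follows. This is only an illustration in Python, not spec text: the function name pair_annotations and the half-open (start, end) index ranges are hypothetical, and the sketch assumes there are never more annotations than bases, a case this proposal does not address.

```python
def pair_annotations(bases, annotations):
    """Match the annotations of one <rtc> against the preceding run of bases.

    Each annotation is paired index-wise with a base, except that the last
    annotation spans every remaining base. Returns (text, start, end) tuples,
    where [start, end) is a range of base indices.
    Assumption: len(annotations) <= len(bases).
    """
    spans = []
    for i, text in enumerate(annotations):
        is_last = i == len(annotations) - 1
        # The last annotation stretches to the end of the base run.
        end = len(bases) if is_last else i + 1
        spans.append((text, i, end))
    return spans
```

With two bases, two annotations give one annotation per base, while a single (implicit) annotation covers both bases.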

This makes both cases trivial.  If both runs of ruby are
per-character, you can just write:

<ruby><rb>FOO<rb>BAR<rt>baz1<rt>baz2<rtc><rt>qux1<rt>qux2</ruby>

Or, in the pure column-based model:

<ruby>FOO<rt>baz1<rtc>qux1<rb>BAR<rt>baz2<rtc>qux2</ruby>

Alternately, if the second line of ruby text spans the entire group,
that's also trivial, and very similar:

<ruby><rb>FOO<rb>BAR<rt>baz1<rt>baz2<rtc>qux1 qux2</ruby>

As you can see, the only difference is that the <rtc> contains a
single (implicit) <rt>, rather than two <rt>s.  It seems plainly
obvious that this is simpler for authors; it's also simpler for
implementors, because we don't have to infer that we should be
formatting something as double-ruby from the presence of nested <ruby>
elements.

Comment 1 Ian 'Hixie' Hickson 2012-12-07 22:46:08 UTC

There's really only one model here -- the nested <ruby> case -- it's just that in the case of two annotations covering the same region, there's a kind of shorthand.


> This is easy to address.

I disagree that the proposal is simpler than what we have now. In fact it seems several orders of magnitude more confusing.

Comment 2 Tab Atkins Jr. 2012-12-07 23:10:26 UTC

(In reply to comment #1)
> There's really only one model here -- the nested <ruby> case -- it's just
> that in the case of two annotations covering the same region, there's a kind
> of shorthand.

As the shorter version does not desugar into the longer version, it's not a shorthand in any meaningful technical sense.

In particular, it means that CSS has to handle two different box structures while producing the same display, which requires some properties to work in confusing ways.  Forcing authors to understand two different box structures for solving basically identical problems is bad if we can possibly avoid it.  It also prevents simple things like styling every <ruby> element in a particular way without 

>> This is easy to address.
> 
> I disagree that the proposal is simpler than what we have now. In fact it
> seems several orders of magnitude more confusing.

While I respect your right to be confused, I don't see how you can possibly justify it as "several orders of magnitude" more confusing.

For simple runs (corresponding to your "shorthand"), the markup is literally the *exact* same, except the second ruby text uses an <rtc> tagname rather than <rt>.  It's not possible to honestly assert that this is *any* more complicated, let alone "orders of magnitude".

For longer runs (corresponding to your stacked ruby case), the markup is directly analogous to tables, and is demonstrably of similar length and complexity.  Here's a reproduction of the code in my example:

<ruby><rb>FOO<rb>BAR<rt>baz1<rt>baz2<rtc>qux1 qux2</ruby>

And here's the same thing in the spec's current markup style:

<ruby><ruby><rb>FOO<rt>baz1<rb>BAR<rt>baz2</ruby><rt>qux1 qux2</ruby>

If we indent each to make them easier to read, we get:

<ruby>
  <rb>FOO<rb>BAR
  <rt>baz1<rt>baz2
  <rtc>qux1 qux2
</ruby>

<ruby>
  <ruby>
    <rb>FOO<rt>baz1
    <rb>BAR<rt>baz2
  </ruby>
  <rt>qux1 qux2
</ruby>

In both cases, the two seem of similar complexity, and I believe it's reasonable to argue that the second one (the spec's current model) is actually marginally *more* complex.  Even if one believes the opposite, I don't think it's possible to honestly assert that it's "orders of magnitude" more complex.

Comment 3 Ian 'Hixie' Hickson 2012-12-08 00:05:07 UTC

Re complexity, I think the more elements in a solution space, the more complex it is. The idea of nesting <ruby>s is straightforward — it's just a logical application of the existing model, but with an extra iteration. Anyone who understands how simple <ruby> works can understand how the more complex two-sided nested <ruby> case works, there's nothing new. A new element, on the other hand, is a new element, and therefore requires that the author learn something. That's where the order of magnitude comes in.

I don't think the CSS model being more complicated is a big deal here, since really this doesn't need to be supported in CSS, we could just hardcode it the way e.g. <fieldset> is done. Obviously supporting it in CSS would be _better_, but if that requires a slightly more complicated model, that seems fine to me.

But that's even assuming that it requires a more complicated model, which isn't clear to me at all. The model could just be a reflection of the HTML model:

ruby { display: ruby }
rt { display: ruby-text }
rp { display: none }

...and have the resulting boxes be constructed the same way that the HTML semantics are:

- inline content inside a 'ruby' box forms the base
- a 'ruby' box inside a 'ruby' box is transparent
- a 'ruby-text' inside a 'ruby' is placed on top of the corresponding part of
the base, unless there are more annotations on that side than the other side
covering this part of the base, in which case put it on the bottom instead.
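
One possible reading of the placement rule above, sketched in Python. This is an interpretation rather than a definitive algorithm: the name assign_sides, the "over"/"under" labels, and the representation of each annotation as a (text, start, end) range of base positions are all hypothetical.

```python
def assign_sides(annotations):
    """Assign each annotation a side, processing them in document order.

    annotations: list of (text, start, end), where [start, end) is the part
    of the base the annotation covers. An annotation goes on top unless the
    annotations already placed over the same part of the base are more
    numerous on top than on the bottom; then it goes on the bottom.
    """
    placed = []  # (start, end, side) of annotations handled so far
    result = []
    for text, start, end in annotations:
        # Count earlier annotations overlapping this range, per side.
        over = sum(1 for s, e, side in placed
                   if side == "over" and s < end and start < e)
        under = sum(1 for s, e, side in placed
                    if side == "under" and s < end and start < e)
        side = "under" if over > under else "over"
        placed.append((start, end, side))
        result.append((text, side))
    return result
```

For a nested case with two per-character annotations followed by one whole-group annotation, this puts the first two on top and the third on the bottom.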

Then there's some additional properties you'd need to do the alignment stuff, but that's mostly just text-level stuff, not box-level.

I don't really see the problem here.

Comment 4 Tab Atkins Jr. 2012-12-08 00:21:59 UTC

(In reply to comment #3)
> Re complexity, I think the more elements in a solution space, the more
> complex it is. The idea of nesting <ruby>s is straightforward — it's just a
> logical application of the existing model, but with an extra iteration.
> Anyone who understands how simple <ruby> works can understand how the more
> complex two-sided nested <ruby> case works, there's nothing new. A new
> element, on the other hand, is a new element, and therefore requires that
> the author learn something. That's where the order of magnitude comes in.

You're asserting that an extra element (<rtc> in this case) makes the feature more complicated, but discounting the fact that the spec's current situation has an extra *markup pattern*, which is at least as complex.

I mean, if people have double-sided ruby that is amenable to per-base annotations (no spanning stuff), there's no way they'll use nested <ruby>s.  They'll *definitely* use the shorthand with sibling <rt>s, because that's easier to read and write.  But then, if they ever want spanning ruby on one side, they've got to remember that it's done with a different markup model.

In my suggestion, the simple case is equally simple (rather than <rb><rt><rt>, you do <rb><rt><rtc>), and the spanning case uses the same model and the same markup structure.


> I don't think the CSS model being more complicated is a big deal here, since
> really this doesn't need to be supported in CSS, we could just hardcode it
> the way e.g. <fieldset> is done. Obviously supporting it in CSS would be
> _better_, but if that requires a slightly more complicated model, that seems
> fine to me.

It does need to be supported in CSS.  The box model needs to integrate into inline layout, which is much more significant than <fieldset>'s magical behavior, and there are legitimate use cases for varying the presentation of the ruby based on context (inline or not, jukugo vs. simple).

> But that's even assuming that it requires a more complicated model, which
> isn't clear to me at all. The model could just be a reflection of the HTML
> model:
> 
>   ruby { display: ruby }
>   rt { display: ruby-text }
>   rp { display: none }
> 
> ...and have the resulting boxes be constructed the same way that the HTML
> semantics are:
> 
> - inline content inside a 'ruby' box forms the base
> - a 'ruby' box inside a 'ruby' box is transparent
> - a 'ruby-text' inside a 'ruby' is placed on top of the corresponding part of
>   the base, unless there are more annotations on that side than the other side
>   covering this part of the base, in which case put it on the bottom instead.
> 
> Then there's some additional properties you'd need to do the alignment
> stuff, but that's mostly just text-level stuff, not box-level.
> 
> I don't really see the problem here.

The model I'm suggesting is:

  ruby { display: ruby }
  rt { display: ruby-text; }
  rtc { display: ruby-text; ruby-side: below; }
  rp { display: none }

and then the box construction would be:

- inline content inside a 'ruby' box forms the base
- a 'ruby-text' inside a 'ruby' is placed on the side of the base specified by 'ruby-side'.
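
As a rough Python sketch of that rule: the side is read straight from the computed style, with no counting of other annotations. The names UA_STYLES and side_for are hypothetical, and the "above" fallback for 'ruby-side' is an assumption, since no initial value is stated here.

```python
# Hypothetical UA defaults mirroring the style rules above.
UA_STYLES = {
    "rt": {"display": "ruby-text"},
    "rtc": {"display": "ruby-text", "ruby-side": "below"},
}


def side_for(tag, author_style=None):
    """Return the side of the base on which a ruby-text box is placed.

    Author declarations override the UA defaults. The "above" fallback is
    an assumed initial value, used when 'ruby-side' is not set at all.
    """
    style = dict(UA_STYLES.get(tag, {}))
    style.update(author_style or {})
    return style.get("ruby-side", "above")
```

Under these assumptions an <rt> lands above the base and an <rtc> below it, and an author rule can flip either one.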

Notice how it's identical to yours, except with one less box-construction rule and one more style rule, and the style rule just works on a property that is exposed *anyway* for stylistic control even if we only supported single-sided ruby.


(My descriptions here actually assume a slightly different model than what I posted in the OP; I'm assuming that <rtc> is just an alternate <rt>, not a container for further <rt>s.  Either way works, and they're approximately equivalent in complexity.  The OP matches up with the older XHTML Ruby Module.)

Comment 5 Ian 'Hixie' Hickson 2012-12-08 21:41:24 UTC

I think we understand each other completely, and have the same facts at hand. I think we just disagree on what is simple and what is not.

Comment 6 Ian 'Hixie' Hickson 2013-02-12 23:31:28 UTC

(In reply to comment #4)
> 
> You're asserting that an extra element (<rtc> in this case) makes the
> feature more complicated, but discounting the fact that the spec's current
> situation has an extra *markup pattern*, which is at least as complex.

I don't think it's a new pattern. Nesting is a pattern HTML uses throughout.

If it was a new pattern, then a new element would similarly be a new pattern. So by definition a new element would be at least as complex.

However, a new element alone is new complexity beyond the complexity of patterns. Patterns given a set of elements and their rules can be explored and discovered. There's no way to explore your way to a new element. You just have to know it exists. This, pretty much by definition IMHO, is why new elements are an extra level of complexity beyond new patterns.


> I mean, if people have double-sided ruby that is amenable to per-base
> annotations (no spanning stuff), there's no way they'll use nested <ruby>s. 
> They'll *definitely* use the shorthand with sibling <rt>s, because that's
> easier to read and write.  But then, if they ever want spanning ruby on one
> side, they've got to remember that it's done with a different markup model.

Are you arguing that the simplest case should require more complex markup just so that the rarer two-sided spanning case is a logical extension of the same pattern? If so, I disagree even more. The simplest cases should use the simplest markup, even if that means that the more complex cases have more complex markup than strictly necessary. In this particular instance, we can have both the simple case with simple markup _and_ the more complex case with still relatively simple markup.


> In my suggestion, the simple case is equally simple (rather than
> <rb><rt><rt>, you do <rb><rt><rtc>)

IMHO that's an order of magnitude more complicated, not equally simple. It requires knowing about a whole new element.


> Notice how it's identical to yours, except with one less box-construction
> rule and one more style rule, and the style rule just works on a property
> that is exposed *anyway* for stylistic control even if we only supported
> single-sided ruby.

I don't see any value in optimising for simplifying the CSS here. Ruby CSS is only rarely going to be actually authored. Most of the time it'll just be in the UA style sheet and the UA rendering code. Far better to put complexity there, and pay the cost up-front in a few implementations, than force all authors of ruby to deal with it.


In conclusion, I disagree with the premise of this bug. The use cases in question are already simple to author, and the proposals are not simpler.