Bug 13398 - i18n-ISSUE-80: Default rules for the quotes property
Summary: i18n-ISSUE-80: Default rules for the quotes property
Status: NEW
Alias: None
Product: WHATWG
Classification: Unclassified
Component: HTML
Version: unspecified
Hardware: PC All
Importance: P2 normal
Target Milestone: Unsorted
Assignee: Ian 'Hixie' Hickson
QA Contact: contributor
URL:
Whiteboard: blocked awaiting response to comment ...
Keywords:
Depends on:
Blocks:
 
Reported: 2011-07-27 18:47 UTC by I18n Core WG
Modified: 2014-05-07 23:03 UTC
Description I18n Core WG 2011-07-27 18:47:05 UTC
10.2.6 Punctuation and decorations
http://www.w3.org/TR/html5/text-level-semantics.html#the-q-element

On behalf of the i18n WG. 


"Rules setting the 'quotes' property appropriately for the locales and languages understood by the user are expected to be present."

How are the 'locales and languages understood by the user' determined? Is this related to the Accept-Language settings of the browser?

Assuming that the intent here is to provide only a single default per browser installation, rather than defaults for text in various languages, we believe that this is incorrect. A quotation in a page or passage in German should by default use German quotation marks, i.e. the default should depend on the language of the text.

By the way, using 'rules' like

q:lang(en) { quotes: '"' '"' "'" "'"; }
q:lang(no) { quotes: "«" "»" '"' '"' }
etc.

in the default stylesheet would provide defaults for the language of the text, and not according to the user's locale and language preferences.

Obviously, a browser would not be able to provide accurate defaults for all possible languages, but we feel that browsers should make a good attempt at covering the more commonly used ones. After that, they could fall back to some default, which could be decided by the browser but is likely to be the one shown above for the English case.

Also, it should be made clear that rendering of the quotation marks should depend on the language of the text surrounding the q element, not the language of the text inside the q element.
Comment 1 Michael[tm] Smith 2011-08-04 05:03:48 UTC
mass-moved component to LC1
Comment 2 Ian 'Hixie' Hickson 2011-08-11 02:36:35 UTC
Based on bug 13718, my plan is to include explicit defaults.

If you have a comprehensive list of defaults to suggest (in the form of CSS rules), then I'll use that. Otherwise I'll probably just use this:

   q { quotes: '"' '"' "'" "'"; }

...as the default.
Comment 3 Anne 2011-08-16 09:57:36 UTC
See https://bugzilla.mozilla.org/show_bug.cgi?id=16206 for the last time you tried this, Ian. Personally I am with the people who suggest we give up on <q>.
Comment 4 I18n Core WG 2011-08-16 12:36:20 UTC
How about this CLDR data?
http://unicode.org/repos/cldr-tmp/trunk/diff/by_type/misc.delimiters.html

Would it make sense to suggest that browsers provide defaults based on the language of the surrounding text, and fall back to { quotes: '"' '"' "'" "'"; } if they don't have a language-specific fallback defined?
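
The fallback idea in this comment can be sketched in Python. This is only an illustration under stated assumptions: the table holds just the two examples from the description, the names LANGUAGE_QUOTES, DEFAULT_QUOTES and quotes_for are hypothetical, and trimming subtags from the right is an assumption that roughly mirrors the prefix matching of :lang().

```python
# Illustrative table only: the two entries are the examples from the bug
# description, not a complete or authoritative list.
LANGUAGE_QUOTES = {
    "en": ('"', '"', "'", "'"),
    "no": ("«", "»", '"', '"'),
}

# Fallback proposed in the thread when no language-specific entry exists.
DEFAULT_QUOTES = ('"', '"', "'", "'")


def quotes_for(lang_tag):
    """Return the quote marks for the language of the surrounding text.

    Tries the full tag first, then drops trailing subtags one at a time
    (e.g. "no-NO" -> "no"), and finally falls back to DEFAULT_QUOTES.
    """
    subtags = lang_tag.lower().split("-") if lang_tag else []
    while subtags:
        candidate = "-".join(subtags)
        if candidate in LANGUAGE_QUOTES:
            return LANGUAGE_QUOTES[candidate]
        subtags.pop()
    return DEFAULT_QUOTES
```

A tag with no entry, or a missing tag, ends up with the ASCII default, which is the behavior asked about above.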
Comment 5 Ian 'Hixie' Hickson 2011-08-31 23:59:30 UTC
Anne: I agree with what dbaron wrote in that bug — if we have a spec for it, we can use it.

(In reply to comment #4)
> How about this CLDR data?
> http://unicode.org/repos/cldr-tmp/trunk/diff/by_type/misc.delimiters.html

If you would like to use this list, please provide it in the form of CSS selectors and I'll be happy to add it to the spec.
Comment 6 Martin Dürst 2011-09-01 01:08:50 UTC
(In reply to comment #5)
> Anne: I agree with what dbaron wrote in that bug

It would be good to know which of David Baron's comments in that bug is meant;
I'm assuming https://bugzilla.mozilla.org/show_bug.cgi?id=16206#c84.

> — if we have a spec for it, we can use it.
> 
> (In reply to comment #4)
> > How about this CLDR data?
> > http://unicode.org/repos/cldr-tmp/trunk/diff/by_type/misc.delimiters.html
> 
> If you would like to use this list, please provide it in the form of CSS
> selectors and I'll be happy to add it to the spec.

I think the CLDR data should be used by reference, not by copying it.
Otherwise, we will be limited to exactly the languages that CLDR covers now.

At https://bugzilla.mozilla.org/show_bug.cgi?id=16206#c84, David Baron says:
"We certainly shouldn't "refine" the values from release to release, since that would cause even more confusion for authors."

I somewhat agree with respect to languages for which quotes are already defined. But I don't agree with respect to languages that aren't yet covered.

Even for languages for which the data is already defined, if we assume that the selection of quotes is part of styling, then the actual quotes may indeed differ from browser to browser. As an example, assume that there is a language where it's customary to use '"' for quotations with sans-serif fonts and "«" "»" with serif fonts, or some other style dependency. Then the browser default style sheet will depend on the fonts selected in the default stylesheet.

If the above example of style-dependent quotes sounds too unrealistic (as far as I know, it's just a hypothetical example), then having the default stylesheet inserting quotes around a <q> element may be unnecessary.

Overall, my position would be to have the browsers implement one of two options:
a) no default quotes for the 'q' element (authors who want automatic addition can do so via their own style sheets)
b) full language-dependent automatic quote addition, not limited to the CLDR data we currently have, but updated at a reasonable pace as new data becomes available.

Of course the choice between a) and b) must be the same for all browsers!
Comment 7 Ian 'Hixie' Hickson 2011-09-21 20:46:44 UTC
We're going to have quotes. What quotes are used can change occasionally as the spec is updated.

I will shortly update the spec to require that there is no language-dependent styling for 'quotes', with the only quotes being ".

I am happy to make the default quotes be language-specific; if you would like this, please provide a CSS block for me to include in the specification.
Comment 8 Addison Phillips 2011-09-21 21:12:47 UTC
(In reply to comment #7)
> We're going to have quotes. What quotes are used can change occasionally as the
> spec is updated.
> 
> I will shortly update the spec to require that there is no language-dependent
> styling for 'quotes', with the only quotes being ".
> 
> I am happy to make the default quotes be language-specific; if you would like
> this, please provide a CSS block for me to include in the specification.

Hi Ian,

I am having a hard time figuring out your comment (above). Should I read this as saying "I intend to make an edit to clarify that quotes are shown and this edit will specify the ASCII quotes and no language-sensitivity, but I will change this to be language specific if you give me a CSS block showing exactly what to include instead"?

Or did you mean something else?

Would it be workable to refer to the quotes in CLDR rather than providing an exhaustive list inside HTML5? 

thanks,

Addison
Comment 9 Ian 'Hixie' Hickson 2011-09-28 23:22:47 UTC
I mean basically what you said, yes.

Implementors have indicated that the only way they're going to do this is if we provide them with a CSS block they can just copy and paste. So referencing something that isn't a CSS block is unlikely to be sufficient.
Comment 10 Edward O'Connor 2011-09-28 23:37:27 UTC
If people are interested in working on such a list, feel free to fork https://github.com/hober/mothereffingquotestyles :)
Comment 11 Martin Dürst 2011-09-29 11:16:08 UTC
(In reply to comment #9)
> Implementors have indicated that the only way they're going to do this is if we
> provide them with a CSS block they can just copy and paste. So referencing
> something that isn't a CSS block is unlikely to be sufficient.

If we can find a way, any way, that this can be updated for more and more languages, that may be okay. If it has to be a one-time shot, it's a really bad idea.


(In reply to comment #10)
> If people are interested in working on such a list, feel free to fork
> https://github.com/hober/mothereffingquotestyles :)

I had a look at it. I see at least two problems:

1) It would be better to use actual characters, with character numbers in comments or some such. Github and other tools these days shouldn't have problems with UTF-8. This would make things much easier to verify.

2) There's an essential error I think in that the quotes should be determined by the language outside the quote, not the language of the quote itself. So for example, instead of:
   q:lang(en-gb) { quotes: "\2018" "\2019" "\201C" "\201D" }
   q q:lang(en-gb) { quotes: "\201C" "\201D" }
it should be something like:
   *:lang(en-gb) q { quotes: "\2018" "\2019" "\201C" "\201D" }
   q:lang(en-gb) q { quotes: "\201C" "\201D" }
Comment 12 Addison Phillips 2011-09-29 15:12:27 UTC
(In reply to comment #9)
> I mean basically what you said, yes.
> 
> Implementors have indicated that the only way they're going to do this is if we
> provide them with a CSS block they can just copy and paste. So referencing
> something that isn't a CSS block is unlikely to be sufficient.

Thanks for clarifying. That sounds good.

I'll echo Martin's comment: allowing for future extensibility, even if done by implementers, would be most ideal. Perhaps replacing the existing paragraph in 10.3.4 with something akin to:

--
Rules for the 'quotes' property provide quote styles appropriate for many languages; these were defined from data in the current version of [CLDR]. Implementors may extend or update this list to encompass additional languages, preferably using data gleaned from updates of the CLDR.
--
Comment 13 Ian 'Hixie' Hickson 2011-09-30 17:51:39 UTC
The spec will be continuously updated on this matter. Feel free to e-mail me or file a new bug whenever you want the list updated.

Incidentally, for practical purposes we should use CSS escapes rather than literal characters, so that implementors don't have to worry about making sure their codebase is UTF-8 clean (not to mention the risk of copy and pasting content from the spec through software that tries to "help", e.g. introducing smart quotes and the like).

Anyway. Is there a style sheet for me to use here? (No rush, we can always fix this later, but without something to paste in I can't fix it!)
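
One way to produce the ASCII-safe escapes described in this comment is sketched below in Python. The function name css_escape_string is hypothetical and not part of any library. The sketch assumes the output is a double-quoted CSS string, pads escapes to at least four hex digits, and adds a terminating space only where the following character would otherwise be read as part of the escape.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")


def css_escape_string(text):
    """Return `text` as a double-quoted CSS string using only ASCII.

    Non-ASCII and control characters become backslash-hex escapes.
    A single space is appended after an escape when the next character
    is a hex digit or whitespace, so the escape ends where intended.
    """
    out = []
    for i, ch in enumerate(text):
        if ch in ('"', "\\"):
            # Characters that would end or corrupt the string literal.
            out.append("\\" + ch)
        elif ord(ch) < 0x20 or ord(ch) > 0x7E:
            out.append(f"\\{ord(ch):04x}")
            nxt = text[i + 1 : i + 2]
            if nxt and (nxt in HEX_DIGITS or nxt.isspace()):
                out.append(" ")
        else:
            out.append(ch)
    return '"' + "".join(out) + '"'
```

For instance, css_escape_string("«") gives "\00ab", a form that survives tools that mangle raw non-ASCII characters.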
Comment 14 fantasai 2011-09-30 20:00:24 UTC
You want
  :lang(en) > q { quotes: '“' '”' '‘' '’'; }

I believe the second rule (q q) is unnecessary... but I could be wrong. What was it intended to accomplish? Also, the current behavior of CSS quotes is to repeat the last pair. Is that correct behavior, or should it be cycling?

Wrt tracking CLDR updates, hixie, I agree with the i18n folks that you should put that normatively in the spec. The CSS should effectively be a transform of the CLDR data, and should be defined as such so that implementations understand they should (ideally) track that data and that the CSS syntax is provided as a convenience. If the HTML spec gets out of sync, the CLDR should be the one to follow.
Comment 15 Ian 'Hixie' Hickson 2011-09-30 20:12:19 UTC
If it's a transform of the data, then provide the script that can create the transform, and I'll just plug that into the build system, the same way we do with the named character references (which are a transformation of the MathML group's work). That would be ideal. Failing that, I'm happy to regularly update a CSS sheet based on input.

A normative reference is a non-starter since implementors have said they will not take the burden of doing the transformation (understandably so).

By the way, comment 14 demonstrates why the data should be ASCII-safe explicit escapes and not raw characters. It got all corrupted here because Bugzilla doesn't send an explicit charset definition. If we can't trust ourselves to get this right, let's not rely on all the browser vendors getting it right. :-)
Comment 16 L. David Baron (Mozilla) 2011-09-30 20:54:34 UTC
I think a vague normative reference is a non-starter.  A normative reference that says exactly where to get the data and what CSS rules to construct from it would be OK with me, assuming the performance effects of those CSS rules are acceptable.
Comment 17 Ian 'Hixie' Hickson 2011-10-03 18:48:24 UTC
Makes sense.

I had a look at the Unicode data and couldn't work out the right way to generate the language codes for the selectors.

If someone can provide either a CSS sheet, or a script that generates the CSS sheet, or a clear description of what such a script would have to do (at the level of how to process each XML file and what selectors and rules to output), I can update the spec accordingly.
Comment 18 Ian 'Hixie' Hickson 2011-10-24 18:49:31 UTC
EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Did Not Understand Request
Change Description: no spec change
Rationale:

Please reopen this bug once there is either a CSS sheet, or a script that generates the CSS sheet, or a clear description of what such a script would have to do (at the level of how to process each XML file and what selectors and rules to output), so that I have something actionable to add to the spec.
Comment 19 Ian 'Hixie' Hickson 2011-11-01 19:34:17 UTC
Apparently the way to generate the language codes is just to take the filename and s/_/-/g. I'll try that.
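
A minimal Python sketch of that step, for illustration only. The function names are hypothetical, the quote values are assumed to be already escaped, the sample input is the en-gb example from comment 11, and the selector shape is just one of the forms proposed in this bug rather than a settled choice. Reading the quote values out of the CLDR XML files is not shown.

```python
from pathlib import PurePosixPath


def lang_from_cldr_filename(filename):
    """Apply the s/_/-/g step to the base name: "en_GB.xml" -> "en-GB"."""
    return PurePosixPath(filename).stem.replace("_", "-")


def quotes_rule(filename, escaped_quotes):
    """Build one CSS rule for the language derived from `filename`.

    `escaped_quotes` is a sequence of already-escaped quote strings,
    outermost pair first. The selector form follows comment 11 and is
    an assumption, not necessarily what the spec ends up using.
    """
    lang = lang_from_cldr_filename(filename)
    values = " ".join(f'"{q}"' for q in escaped_quotes)
    return f"*:lang({lang}) q {{ quotes: {values} }}"
```

Calling quotes_rule("en_GB.xml", ["\\2018", "\\2019", "\\201C", "\\201D"]) yields a rule for en-GB with the four escaped marks in order.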
Comment 20 Ian 'Hixie' Hickson 2011-11-02 16:59:59 UTC

Status: Accepted
Change Description: see diff given below
Rationale: Concurred with reporter's comments.

You can see it in the spec here:
http://www.whatwg.org/specs/web-apps/current-work/#quotes
Comment 21 contributor 2011-11-02 17:01:27 UTC
Checked in as WHATWG revision r6812.
Check-in comment: Language-specific 'quote' CSS rules generated from the CLDR.
http://html5.org/tools/web-apps-tracker?from=6811&to=6812
Comment 22 i18n IG 2014-02-21 16:40:45 UTC
I believe the syntax of the styling at http://www.whatwg.org/specs/web-apps/current-work/#quotes is incorrect, because my understanding is that when you have an embedded quotation in a different language the quotation marks should be determined by the surrounding language, not that of the quotation.

I therefore think we need to change:

:root:lang(x), :not(:lang(x)) > :lang(x) { quotes: '\00ab' '\00bb' '\2039' '\203a' } /* « » ‹ › */

to:

:lang(x)  { quotes: '\00ab' '\00bb' '\2039' '\203a' } /* « » ‹ › */
:lang(x) > :not(:lang(x))  { quotes: '\00ab' '\00bb' } /* « »  */


See a test file that shows the various outcomes at
http://www.w3.org/International/tests/test-incubator/the-q-element/html5-q-styling.html
Comment 23 Ian 'Hixie' Hickson 2014-02-21 17:36:32 UTC
The problem with that is that it would set 'quotes' on every element, and we only want to set 'quotes' when the quotes change, letting inheritance do the rest.

But maybe we could do something like this?:

   :root:lang(x),
   :not(:lang(x)) > :lang(x),
   :lang(x) > q:not(:lang(x))::before,
   :lang(x) > q:not(:lang(x))::after { quotes: ... }
Comment 24 i18n IG 2014-02-25 17:14:26 UTC
There are still two problems with that:

(1) :root means that q elements don't get properly quoted if there's no lang on the html, or if the q appears in part of the doc that happens to be in another language

(2) the quoting level is not quite right in mixed language embeddings, ie. you get

un «two 'drei ‚vier‘ fünf' six» sept

instead of 

un «two "drei „vier“ fünf" six» sept

I tried various alternatives, but my brain hurts and I still haven't found a way to get around that.

Any other suggestions?
Comment 25 Ian 'Hixie' Hickson 2014-02-25 17:49:30 UTC
(In reply to i18n IG from comment #24)
> There are still two problems with that:
> 
> (1) :root means that q elements don't get properly quoted if there's no lang
> on the html, or if the q appears in part of the doc that happens to be in
> another language

I think you misunderstand the selector. Commas are "or" in Selectors, so the first line of my proposed selector is just about setting 'quotes' on the first element in the tree, and it doesn't affect the rest of the selectors. In particular, it doesn't require lang="" (it could come from out-of-band data), and doesn't prevent the other rules from applying when the document has multiple languages (indeed the second line is exclusively for handling that specific case, regardless of nesting level).


> (2) the quoting level is not quite right in mixed language embeddings, ie.
> you get
> 
> un «two 'drei ‚vier‘ fünf' six» sept
> 
> instead of 
> 
> un «two "drei „vier“ fünf" six» sept
> 
> I tried various alternatives, but my brain hurts and I still haven't found a
> way to get around that.

That seems separate from the issue of setting the 'quotes' property. That should be resolved at the CSS level.
Comment 26 Ian 'Hixie' Hickson 2014-02-27 20:56:32 UTC
(Let me know if I'm wrong in comment 25; my plan otherwise is to apply comment 23 to the spec in the near future. As far as the issue of nested languages needing to reset the quote count goes, I think CSS needs a property to trigger the depth resetting. Trying to fake it using 'quotes' is not scalable.)
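
The difference between the two outcomes in comment 24 can be modeled with a short Python sketch. It is a simplified model, not an implementation of CSS: each nested q is represented only by the language of the text surrounding it, the nesting depth picks a pair from that language's list, and the last pair repeats once the list runs out (as noted in comment 14). The quote tables are illustrative values taken from the example in comment 24; the second-level French pair and the reset_on_language_change switch are assumptions.

```python
# Quote pairs per language, outermost level first. Illustrative values only.
QUOTES = {
    "fr": [("«", "»"), ("‹", "›")],
    "en": [('"', '"'), ("'", "'")],
    "de": [("„", "“"), ("‚", "‘")],
}


def pick_pair(pairs, depth):
    """Pick the pair for a nesting depth; the last pair repeats when exhausted."""
    return pairs[min(depth, len(pairs) - 1)]


def quote_marks(surrounding_langs, reset_on_language_change=False):
    """Return the (open, close) pair for each nested q, outermost first.

    `surrounding_langs[i]` is the language of the text around the i-th
    nested q. With the hypothetical reset switch on, the depth counter
    restarts at zero whenever that language changes.
    """
    marks = []
    depth = 0
    previous = None
    for lang in surrounding_langs:
        if reset_on_language_change and previous is not None and lang != previous:
            depth = 0
        marks.append(pick_pair(QUOTES[lang], depth))
        depth += 1
        previous = lang
    return marks
```

With ["fr", "en", "de"] the default call reproduces the first rendering in comment 24 (« » then ' ' then ‚ ‘), and the reset variant reproduces the second (« » then " " then „ “).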
Comment 27 Richard Ishida 2014-02-27 21:13:42 UTC
Point 1 in comment 24 was the result of a sudden bout of stupidity while I was typing the comment in bugzilla. Please ignore.

Wrt your suggestion in comment 25, it's certainly better in that it uses the right quotes 'linguistically', but I think there's still an issue about how to get the primary quotes to appear at the start of the quote in a new language. 

I'd like to get wider input on two questions, and I'm thinking of writing to the typography experts in the digital publishing IG and to the CSS list:

a. am I right to suppose that the quote counter should be reset when the language changes? (ie. un «two "drei... instead of un «two 'drei...)  

b. if so, is it actually possible to do this? ie. can the CSS quotes mechanism be made to do it? Addison pointed out today that this could be messed up by using more language tags than needed, or by small changes to subtags, even though the language hasn't changed. 

I suspect it might be possible to express what's needed if quotes has only one pair of values and we use more selectors to identify the nesting, but I also suspect that that will create rather unwieldy code.

Should we wait at least until I get an answer to question a?
Comment 28 Ian 'Hixie' Hickson 2014-05-07 23:03:53 UTC
Tab, dbaron, any other CSS people: any input on comment 27?