TPAC 2022: Internationalization Working Group

Meeting minutes

Meeting Info: https://www.w3.org/events/meetings/121afb09-553e-4b68-854d-2ba64111c34b#agenda

FIND US IN 'FINBACK', 3rd Floor (same as registration but in the far corner)

Use this link: https://us02web.zoom.us/j/85856632124?pwd=TzdnYzZTbUZNTkNGLzBkMG1rbDdEdz09

Introductions

atsushi: JL-TF is preparing two documents, one is simple ruby, one is ruby-t2s-req

r12a: Richard Ishida

xfq: Fuqiao Xue

Bert: Bert Bos

florian: Florian Rivoal

<fremy> François Remy, Invited Expert in the CSS WG

<r12a> Paul Libbrecht, MathML WG

<florian> Florian Rivoal, Invited Expert, CSS WG, i18n WG, Advisory Board

Greg Whitworth

atsushi: team contact for i18n, timed text, immersive web

<r12a> Elika Etemad, CSS, i18n, ex-AB

CSS backlog

https://github.com/w3c/csswg-drafts/issues/5421

florian: we need someone from Apple to discuss this effectively

r12a: there may be one or even two other issues that need to be read in conjunction with this one

https://github.com/w3c/csswg-drafts/issues?q=is%3Aissue+is%3Aopen+label%3A%22Agenda%2B+TPAC%22+label%3Ai18n-tracker

https://github.com/w3c/csswg-drafts/issues/6848

addison: Backslash & Yen sign behavior

fantasai: last time I looked at the issue, there was no good solution for it

addison: this is the famous issue that has existed forever
… probably we need to have someone from WebKit to discuss it

https://github.com/w3c/csswg-drafts/issues/4606

addison: Kaiti & cursive

florian: it's probably good to discuss this in the joint session
… we have some CSS people here, but we're missing some

https://github.com/w3c/csswg-drafts/issues/6730

florian: we have specified in CSS Text Level 4
… a pair of properties
… if you have wbr in the markup
… @@ which is typically used in titles
… this is also useful because allowing line breaks
… especially in children's books or for people with dyslexia
… for line breaking you can use the usual
… here it's trying to tackle the same problem differently
… whether we reuse the existing machinery for line breaking or whether we make a new one
… do we just have a giant pile of AI and it figures out where to put the breaks on its own
… or do we need to be more strict on this and say it's Chinese
… from an author's point of view
… it's going to be important to know what it's going to do
… you need to be able to ask not just about line breaking (with @supports), but line breaking in Japanese
… with the properties we have in CSS, you specify which language's line-breaking rules you want
… and let the browser figure it out
… those are the two general approaches to tackling the same problem
… CSS line breaking is already extraordinarily complicated, with so many properties interacting with each other
… I'm not excited about adding one magic switch that says ignore everything and do new line breaking

addison: I hear a lot from Japanese designers
… a lot of unfortunate line breaks
… do you think it's related to that?

addison: I'm trying to understand what people think the problem is

florian: I gave a talk about many line-breaking issues, including this one

<florian> https://florian.rivoal.net/talks/line-breaking/#ja-titles

florian: I don't think I can take over and project, but I just dropped a link on IRC ^
… if you press space and shift+space repeatedly to move forward

[florian shows the slides]

florian: we have existing properties that let you switch between two different modes

addison: you can mark all the boundaries

florian: if you mark all the boundaries, it's just like English
… as Richard mentions, there is more than one way to do this
… there's varying opinions
… the new approach is don't add wbr, ignore the line break properties in CSS
… I'm concerned about the inability to specify which language it is
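The "mark all the boundaries" approach can be illustrated with a minimal sketch. It assumes the author has already split the text into unbreakable segments, for example at each <wbr>, and that width is counted in characters rather than glyph advances. A greedy line filler then only breaks between those segments. The function name `break_lines` is hypothetical, and this is not how any browser actually implements line breaking.

```python
from typing import List


def break_lines(segments: List[str], max_width: int) -> List[str]:
    """Greedily pack unbreakable segments into lines of at most max_width.

    Segments are joined without spaces, as in Japanese text. Breaks happen
    only between segments (e.g. where an author placed <wbr>). A segment
    wider than max_width is put on its own line and simply overflows.
    Width is counted in code points, a simplification of real glyph widths.
    """
    lines: List[str] = []
    current = ""
    for seg in segments:
        if current and len(current) + len(seg) > max_width:
            # The next segment does not fit: close the current line here.
            lines.append(current)
            current = seg
        else:
            current += seg
    if current:
        lines.append(current)
    return lines


if __name__ == "__main__":
    # "今日は<wbr>良い<wbr>天気です" with a 5-character line
    print(break_lines(["今日は", "良い", "天気です"], 5))
```

Because the breaks come entirely from the markup, this behaves "just like English". The cost is that authors have to mark every boundary themselves.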

fremy: why not 'word-break: avoid'?

fremy: @@
… if you can, fallback to 'word-break: normal'
… if it doesn't work, it's the normal behaviour

florian: there's word-boundary-detection and word-boundary-expansion
… word-spacing is for where there is already a space and makes it bigger
… this is for inserting a space

florian: we could add a second keyword

<myles> hello

florian: for languages like Thai, word-boundary-detection has three values

fremy: interesting to see if anybody cares about this

r12a: they do
… we're talking about the language Thai
… there are many languages using the Thai script
… like Northern Thai
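For scripts such as Thai, which do not put spaces between words, word-boundary detection has to find the boundaries itself before any space can be inserted. Real engines use large dictionaries; ICU, for example, ships a Thai word dictionary. The sketch below uses greedy longest-match segmentation against a toy two-word dictionary, which is a deliberate simplification. The function name `segment` is hypothetical. The example splits ภาษาไทย ("Thai language") into ภาษา + ไทย.

```python
from typing import List, Set


def segment(text: str, dictionary: Set[str]) -> List[str]:
    """Greedy longest-match word segmentation.

    At each position, take the longest dictionary word that matches.
    Characters not covered by any word become single-character segments,
    so the loop always makes progress.
    """
    max_len = max((len(w) for w in dictionary), default=1)
    out: List[str] = []
    i = 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary or length == 1:
                out.append(candidate)
                i += length
                break
    return out


if __name__ == "__main__":
    print(segment("ภาษาไทย", {"ภาษา", "ไทย"}))
```

A dictionary built for Thai would not necessarily segment Northern Thai correctly, even though both use the Thai script, which is r12a's point.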

florian: you could switch out auto

addison: if you don't know Northern Thai, don't use Thai
… with proper language tagging

florian: not language for the context, but language for the algorithm
… maybe browsers don't know how to do line breaking for Cantonese, but they know how to do line breaking for Chinese
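A common way to choose an algorithm from a language tag is RFC 4647 "lookup", which repeatedly truncates the tag until it matches something supported. The sketch below assumes a hypothetical set of supported line-breaking languages. It shows why florian distinguishes the content language from the algorithm language: truncation maps zh-Hant-HK to zh, but it can never map Cantonese (yue) to zh or Northern Thai (nod) to th. The function name `lookup_algorithm` is hypothetical.

```python
from typing import Optional, Set


def lookup_algorithm(tag: str, supported: Set[str]) -> Optional[str]:
    """RFC 4647-style lookup: truncate the tag until it matches a supported one.

    Returns None when nothing matches; the caller would then fall back to
    default (language-agnostic) line breaking.
    """
    parts = tag.lower().split("-")
    while parts:
        candidate = "-".join(parts)
        if candidate in supported:
            return candidate
        parts.pop()
        # Lookup also drops a single-character subtag (e.g. an 'x' singleton)
        # left dangling at the end after truncation.
        if parts and len(parts[-1]) == 1:
            parts.pop()
    return None


if __name__ == "__main__":
    algorithms = {"zh", "ja", "th"}  # hypothetical set of supported algorithms
    for tag in ("zh-Hant-HK", "yue-HK", "nod", "th-TH"):
        print(tag, "->", lookup_algorithm(tag, algorithms))
```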

<r12a> (languages using the Thai script: http://r12a.github.io/scripts/thai/index.html#languages)

florian: there's something dealing with the normalization of languages

r12a: we have another issue about that topic as well

<Bert> (Example of an unfortunate line break in English: ‘Hi, My daughter had an accident and now we need body parts to fix her / car’, from https://freediculous.blogspot.com/2006/02/unfortunate-line-break.html )

r12a: it is a little difficult to hear you with your masks on

florian: combination of word-boundary-detection and word-break: keep-all
… go to look at the content
… if you language-tag your content properly and the browser has an exact algorithm for that language, it will do the right thing; otherwise it will fall back to normal

fantasai: I would not classify that as normal vs strict
… kinsoku rules are independent

florian: at some point, if you're extremely picky about how to line break, you can go and add wbr

fantasai: if you do word-based breaking and suddenly switch to phrase-based line breaking

Bert: there's other thing you want, like length of a line

florian: I don't want a complete new thing, a new magic mode that completely ignores the line breaking properties
… I think I got useful ideas

<fantasai> [current discussion is adding 'words' and 'phrases' keywords or something to 'word-break' property]

<fantasai> https://github.com/w3c/csswg-drafts/issues?q=is%3Aopen+is%3Aissue+label%3A%22Agenda%2B+TPAC%22+label%3Ai18n-tracker

<fantasai> https://github.com/w3c/csswg-drafts/issues/5995

r12a: you started by saying you wanted to talk about the needs-resolution issues
… but these are tracker issues, addison

<r12a> https://w3c.github.io/i18n-activity/reviews/#

r12a: we don't have a single CSS label

<r12a> filter on needs-resolution

r12a: if you go here ^ and filter on needs-resolution

<fantasai> https://github.com/w3c/csswg-drafts/issues?q=is%3Aopen+is%3Aissue+label%3Ai18n-needs-resolution+

https://github.com/w3c/csswg-drafts/issues?q=is%3Aopen+is%3Aissue++label%3Ai18n-needs-resolution

<fantasai> https://github.com/w3c/csswg-drafts/issues/771#issuecomment-1182339573

fantasai: proposal is to get the WG to resolve it

<fantasai> if i18n agrees with the proposal, I'll get CSSWG to resolve on it

addison: atsushi and r12a seem to agree with fantasai's comment

fantasai: should we do this to handle justification for ruby annotations?

r12a: what we discovered is that it's more likely the Latin text is centered

<fantasai> https://github.com/w3c/csswg-drafts/issues/5995

addison: doesn't sound controversial

fantasai: Should auto-hide match use NFKC or other normalization?

addison: NFKC is usually not a good idea
… there's a lot of things, it's kind of an uncontrolled @@

r12a: I think it's NFK mapping

addison: I suppose we need to do some research

fantasai: if it's too aggressive we can do some custom normalization

fantasai: I think it's legitimate using different representations

r12a: it would not be a good idea to normalize stuff that people have typed
… automatically annotate your Japanese or Chinese
… e.g., you could have katakana with the dakuten marks decomposed
… if it's a different kanji character, you probably don't want to unify them in any way
… if it's real normalization, maybe it's useful

fantasai: we should probably do NFC for auto-hiding

<fantasai> https://www.w3.org/TR/css-ruby-1/#hiding

r12a: if it matches, it removes the annotation

addison: the comments are all pointing to things like whitespace normalization
… possibly normalize internal whitespace
… two spaces become one

r12a: if you inline, you're not removing anything from view
… we're talking about real edge cases here

florian: Xian and Xi'an example in Chinese

<David-Clarke> Should half-width kana match full-width in this case?

fantasai: I feel that is a different issue

<fantasai> * whitespace normalization

<fantasai> * NFC normalization

<fantasai> * East Asian Width folding

florian: you're using half-width katakana because it's tiny
… [explains the half-width katakana example]

fantasai: whitespace is sometimes accidentally introduced
… you have trailing/leading whitespace
… I think what we should do is whitespace and NFC normalization for auto-hiding

addison: sounds like a reasonable starting point
… need to think about the edge cases

florian: I suspect if we start with NFC, it's safe
… if it's rare enough edge case, we probably shouldn't do anything by default

addison: choose your code point carefully

<fantasai> Proposal for normalization of base and annotation text before auto-hiding:

<fantasai> - Use NFC normalization (not NFKC)

<fantasai> - Trim white space

<fantasai> Anything else, authors should adjust manually using `visibility: collapse`.
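Here is a minimal sketch of fantasai's proposal: apply NFC normalization, not NFKC, then trim white space, then compare base and annotation. The function names are hypothetical. Note that Python's `str.strip()` removes all Unicode white space, including U+3000 IDEOGRAPHIC SPACE. That is broader than CSS's notion of document white space, so it is an assumption here, not the spec's definition. The example shows a decomposed ガ (カ + U+3099) matching the precomposed form under NFC. Half-width ｶﾞ only matches under NFKC, which is the kind of folding the proposal leaves to authors.

```python
import unicodedata


def normalize_for_autohide(text: str) -> str:
    """NFC-normalize, then trim leading/trailing white space (per the proposal)."""
    return unicodedata.normalize("NFC", text).strip()


def should_autohide(base: str, annotation: str) -> bool:
    """Hide the ruby annotation if it matches the base after normalization."""
    return normalize_for_autohide(base) == normalize_for_autohide(annotation)


if __name__ == "__main__":
    precomposed = "\u30ac"           # ガ as one code point
    decomposed = " \u30ab\u3099 "    # カ + combining dakuten, with stray spaces
    halfwidth = "\uff76\uff9e"       # ｶﾞ (half-width katakana)
    print(should_autohide(precomposed, decomposed))   # NFC composes these
    print(should_autohide(precomposed, halfwidth))    # NFC keeps half-width distinct
    print(unicodedata.normalize("NFKC", halfwidth) == precomposed)  # NFKC would fold it
```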

addison: it's not asking people to store or something like that

fantasai: my suggestion going forward is to ask the WG to see how they feel with NFC matching

r12a: you're not actually displaying different things, you're just matching
… if you got 2 or 3 spaces in between two words
… it's not really relevant here

addison: we've just finished discussing wbr

<fantasai> Commentary on why we have the current spec text https://github.com/w3c/csswg-drafts/commit/0c972dc6d3a3bd34ee9ce63bfd5babc55f0afb14

Ruby markup status

florian: extremely long discussion to try and make it possible to write the ruby markup
… a pull request against the HTML spec
… as soon as we find time to actually do it we should be very quick to make a FPWD
… we actually have two impls
… Firefox and Amazon Kindle

~* Break *~

MathML

polx: Math WG is working on v4 of MathML
… MathML is XML format for writing math notations
… in v4, the biggest novelty is trying to get math to be spoken out properly
… so that a11y tools can read math out loud
… MathML is known to sort of work for that, but we want it to work properly
… current development means adding an attribute, intent, that describes how parts of the tree will be spoken out
… should be combined with default knowledge of how to speak things, which is currently the fuzzy part of the spec

addison: intent is structured data?

fremy: is it fixed options or freetext?

polx: freetext, but has some placeholder that allow you to delegate the rest of the speaking to inside

fremy: so templates?

polx: a template language is being developed; that is one of the less settled parts

florian: It's freetext in a human language?

polx: Yes, that's why i18n aspect is interesting
… I believe MathML lives in lang-tagged trees
… and I think the voice that is used to speak this, depends on user
… not sure if i18n has special concerns?

addison: You're touching on some hot buttons
… one is that putting natural language text into an attribute makes it hard to localize / translate
… can't be lang-tagged

polx: it's mono-language, so whole subtree of language
… there are alternative representations but

addison: common thing to want to do, if you want to localize something you have the intent ... can have multiple ones with different lang tags
… can localize sub-parts

polx: would translate the whole subtree

addison: also, natural language isn't structured so that you can just add words together to make a sentence
… so if you structure things that way, then when you put it into another language, the attributes aren't in the correct order
… need to rearrange to make it sensible

polx: There is a dynamic to how intents are combined
… pull things out and make one single sentence
… makes it non-navigatable, a11y tools like to navigate sub-elements
… but [missed]

addison: I'm not an a11y expert, but usually what you want is a stream of words
… that you're feeding to the TTS engine
… might feed language tags to get it to pronounce correctly

polx: The whole world around it would be Russian, expectedly

addison: what I'm saying is, there's a stream of text that would go to processor that will read it out
… your problem is that to generate the stream of text
… you're providing a way for ppl to mark up their math with the content such that it generates that string of text
… and if you only ever had a document in one language at a time, that would be maybe possible to do
… but different languages have different requirements
… you'd have to reformulate your content to be in a different language
… e.g. Japanese has very different word order than English
… so you would need to set it up so that the stream of text would be in the target language

florian: If I've understood correctly, the idea is not that each subtree has a piece of text that gets added together
… but the idea of having a language template thing
… can have subtrees invoke the grammar in the right way
… so concatenation order wouldn't be a problem
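To make the word-order concern concrete, here is a minimal sketch of per-language templates with named slots. It is hypothetical and is not MathML intent syntax. Take "the third root of 64" (the example r12a gives later): English puts the index before the radicand, while Japanese 64の3乗根 puts the radicand first. Filling named slots from a per-language template handles this. Concatenating the children's strings in document order would not. The names `TEMPLATES`, `EN_ORDINALS` and `speak_root` are illustrative only.

```python
from typing import Dict

# Hypothetical per-language templates for an n-th root (not MathML 'intent' syntax).
TEMPLATES: Dict[str, str] = {
    "en": "the {index} root of {radicand}",
    "ja": "{radicand}の{index}乗根",
}

# English needs an ordinal word for the index; Japanese uses the numeral.
# Toy table: only small indices are covered.
EN_ORDINALS: Dict[int, str] = {2: "second", 3: "third", 4: "fourth"}


def speak_root(lang: str, index: int, radicand: str) -> str:
    """Fill the language's template; slot order comes from the template, not the tree."""
    idx = EN_ORDINALS[index] if lang == "en" else str(index)
    return TEMPLATES[lang].format(index=idx, radicand=radicand)


if __name__ == "__main__":
    print(speak_root("en", 3, "64"))
    print(speak_root("ja", 3, "64"))
```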

polx: Sometimes can't, so in "...", you'd have different needs to put things on the parent alone
… because you cannot use the templates in a reasonable manner

addison: Harder than it looks, because you have agreement issues, e.g. you don't have just zero/one/plural
… and math of course has lots of numbers
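To show why "zero/one/plural" is not enough, here is a minimal sketch of the CLDR plural categories for non-negative integers in English and Arabic. In Arabic, the category depends on n % 100, so 103 takes the same form as 3, while 102 does not take the "two" form. A spoken-math template that inserts numbers would have to choose word forms by category like this. The function name is hypothetical, and only integer rules for these two languages are covered.

```python
def plural_category(lang: str, n: int) -> str:
    """CLDR plural category for a non-negative integer (en and ar only)."""
    if lang == "en":
        return "one" if n == 1 else "other"
    if lang == "ar":
        if n == 0:
            return "zero"
        if n == 1:
            return "one"
        if n == 2:
            return "two"
        if 3 <= n % 100 <= 10:
            return "few"
        if 11 <= n % 100 <= 99:
            return "many"
        return "other"
    raise ValueError(f"no rules for {lang!r} in this sketch")


if __name__ == "__main__":
    for n in (0, 1, 2, 3, 11, 100, 102, 103):
        print(n, plural_category("en", n), plural_category("ar", n))
```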

<r12a> it would be very useful to see an example !

addison: interesting project at Unicode, next generation XXX format, to describe localizable structures
… for inserting runtime formatted strings
… called the Message Format WG of CLDR

<florian> fantasai: have you heard of l20n project at Mozilla?

<florian> fantasai: it was a templating system that Mozilla was working on maybe a decade ago

<florian> fantasai: that was to deal with agreement or inflections and other grammatical things, in order to deal with these in the Mozilla UI

addison: I think that evolved into Fluent, which is a format that does that
… other groups doing similar things
… and all those groups are working on this message format too
… to build a system

polx: Problem with math is that there's extremely big variation on abstraction
… predicting contents, can depend on resolution you might not come t
… speaking in an abstract way

addison: I have illustrations of why doesn't work but

polx: Send it around, it would be useful

addison: My first reaction is, I think I understand what you want to do, super common to want to build a templating language
… trick is it's hard to do well from i18n pov, build it for one language, and have to rewrite for other languages
… there are better mechanisms to support doing these things
… I think it would be a good idea for us to look at your proposals and help make connections early on

<Bert> Example of ‘intent’ in MathML4

addison: from a very high level your description sounds like it would be problematic

florian: another thing to mention, since you put text in attributes, is limiting because you can't have markup
… since trying to display text, often need some extra markup
… if speaking rather than displaying, could be different
… but also have other things in CSS, that supposed to let you style how things are spoken

fremy: feedback I got is ppl don't want that

florian: if you stick things into an attr, you can't extend it
… if you can use regular elements, that opens up more possibilities
… maybe more than you want, but won't run into problem of less than you need

polx: There's an element in MathML called <semantics> for alternate representations, e.g. a LaTeX representation alongside MathML
… these things are all there, but known to be too complex to be of use
… hard to make it simpler, honestly
… because in the end what you want is parallel trees
… and need to hook them up with IDs, and it works, but it's art

fremy: I think what you're trying to do, it seems you're trying to create a text representation for elements in the a11y tree
… very close to concept of aria-label and aria-description

polx: it is

fremy: why not use the existing system that they use?
… they already have this concept
… if you use aria-labelledby, you can have a list of things

polx: There are guys who are in the aria groups, and aria-label is considered part of this scenario, but different impl possibilities
… offer in ways that are independent
… not sufficient for our formatting

fremy: I would like to understand why it's insufficient, because it would help us understand what needs to be worked on

r12a: This discussion would be a lot easier if we had an example

r12a: Another possibility occurs to me: you have a templating language which creates something in English and you translate it to other languages
… rather than trying to have a templating language that serves every language

polx: What do you mean by translating?

r12a: As I understand it, you're coming out with a sentence that says "The third root of 64"
… and that represents the formula that will appear on the screen

polx: right

r12a: you've got all those bits in that formula which you can assign words to, and then you have to understand relationships among them and how to create syntax for that
… then you need to figure out agreements, e.g. pluralization
… and do that for every language
… but another possibility is that you generate a string in English
… then only have to build all that complex stuff in English, and then you use translation mechanisms

polx: This is a support aspect. You could do that, and you could do that at the authoring level or in the browsers
… but at some point you want control over that, and this is the space we're creating with the intent
… we want author to control how is spoken

fremy: also translation isn't cheap
… can't run it on client side every time you have a math equation, it doesn't scale

addison: if you have a true machine-translation engine, then it's not cheap to create but maybe can do that

fremy: I work in machine learning, and machine translation is multiple gigabytes in memory
… very few programs translate things correctly on a computer
… that's why they use servers
… small machine translation is very low quality, help if you are stuck without internet
… but not something that can be relied on

polx: Sometimes can do wonders with automatic translation, and can help author
… but whether author wants to render everything into a string, and then get that translated, and then get it checked by a math expert is one thing
… [missed]
… as soon as formula becomes really big, becomes essential

<fremy> fantasai: I have concerns about having natural language in attributes

<addison> fantasai: some concern about natural language in an attribute, because we often need markup

<fremy> fantasai: not all accessibility engines use a speech engine

<fremy> fantasai: sometimes braille output for example

polx: Braille has a special math pattern
… One Hungarian guy I forget has a system for this
… I don't remember
… but I know that many ppl are feeling that this standard for Braille math is limited, but it is what everyone uses

addison: Also don't get wrapped up in saying it's just a11y; lots of documents are read aloud these days
… so general purpose TTS is more prevalent than it has been

<Bert> Math in Braille often uses Nemeth Braille.

addison: so is it good enough to serve a11y audiences? Maybe, and that's where tech has been driven from historically
… but it is expanding quite a bit

polx: Yes, all wondering about listening to math in the car while driving

addison: "Alexa, read this paper"
… that's when you have some piece of MathML embedded, and needs to become...

fremy: aspect to keep in mind, MathML has 2 standards, and the one that is used is a presentational format
… it says how it looks on the screen, but same notation can be used for multiple things
… that's why you need intent
… need this on a letter can mean multiple things, that's part of the intent aspect right?

polx: you can use MathML Content to do better, but it's too expensive
… math professionals are more comfortable thinking about how to write things rather than how you mean them

fremy: Content is intended to be a middle ground, still describing presentation but with more info
… I think it does make sense to me, looking at the example
… one thing that maybe I am wondering is removing the idea of the string representation
… I would argue that this is not a good idea to put in intent
… I would limit it to things that can be understood
… If you want to express something outside of intent, should use aria-label
… for example in the spec you have x power to the ?
… suppose you want to read as position
… then this should be done at the aria-label level
… so you have the x arg, you have x aria-label = position
… and then you can compose the sentence
… but I would refrain from using intent to scope things outside the scope of intent
… I think it misses the bar
… because it becomes very confusing if you can rename things
… if you go deeper in the hierarchy, these renames won't be consistent
… So I want to see if we can remove the freetext option from intent
… and if you need freetext, use aria-label on the functions to give the freetext you need
… and that is a tech that already exists
… and get those translated
… it flows into natural localization pipeline for HTML
… and enforces the idea that 'intent' is something the computer can understand
… freetext is something the computer cannot understand

polx: What do you mean computer can understand
… being able to understand more of the intent of the expression
… my experience is this an extremely American point of view
… as soon as you go farther
… the bigger problem when you do this understanding, you want to understand in a semantic world that is well-defined
… and mathematicians have been creating math for centuries, and many things are not encoded

fremy: Not saying computer understands the equation, but understand each piece
… intent should be structured, but if you need a name, should use a name from the markup
… still rely on existing tech, but compose [missed]
… this seems more reasonable I think

polx: This is interesting, we'll be meeting on Wednesday
… is interesting thoughts

<fantasai> +1

polx: Wondering if we should consider, if single-language is safe enough
… or should be safe enough

florian: One of the beauties of math notation is that it is not natural language
… in translation description can be different, but the equation will be the same
… to a large extent

florian: It's shared
… and if we could enable those formulas that are not strongly tied to a natural language to be re-used as-is in a bunch of different language documents as-is
… would be nice, but certainly more complicated

polx: There's a will in the intent definition that trying to make it as simple as possible is a most important quality
… and might be reason why all these templating languages feel inappropriate

addison: if it integrates well with other pieces of the tech stack, then it makes sense
… the more special things people invent, the less likely they are to get widespread support
… e.g. re-using aria-label insofar as possible, already widespread

polx: One thing unsure about is how to encode defaults
… so that a11y tools don't need intent as much as possible
… probably this is doable for basic math, for English language
… if you go to any European language, there is no complete tool with these defaults
… if you go further away, then this will be almost impossible
… for every other language, you could stick in the English name and translate it; that seems doable, but I'm not sure
… and then things like i is used as root of -1 , but understood to be something else in different fields
… e.g. H2 is hydrogen or 2nd homology group
… currently we seem to avoid being able to speak a proper domain name
… this is crystallography or organic chemistry
… we don't know whether there's a way to model this kind of subdomain things
… because at the end you end up very scattered

addison: I understand what you're saying


r12a: We're talking about describing an expression, why don't we have something like alt attr

polx: 2 reasons

polx: This is one string for the whole subtree, which is what aria-label/aria-description can do
… but this is not enough for navigating through the subtree
… as you move in the subtree
… take out some parts and re-use other things

r12a: Thanks
… point I wanted to make, before I joined W3C I worked at Xerox as global design consultant and helped develop the i18n aspects of the corporate engineering process
… if you're developing a product, principle of develop it in at least 2 languages
… My recommendation to you, because it sounds extremely complicated, is to develop it in English and another language, e.g. Arabic or Japanese, which are substantially different in syntax
… and try to develop the tech concurrently in all those languages
… you'll have a better idea of how to develop

addison: The danger is that Western European types can assemble something that works, but it breaks down as you move to other languages

polx: Exactly the problem we have right now in non-standardized software

addison: You get a proof of existence, and then encounter problems like Japanese having very different word order
… or different agreement with numbers
… and that's where you discovered have features, but can't go there
… if you can make it work for an array of languages then you can sneak up on some aspects of the problems
… as you see from earlier discussion in CSS, still corner cases that are hard
… don't have "well it worked in English" and then get stuck

r12a: I chose Arabic and Japanese on purpose
… Japanese has a SOV word order
… but also has very little agreement and very vague language
… Arabic on the other has lots of agreement, and VSO order
… and also has singular, dual, and plural forms
… so those two languages cover a lot of range in the problems you're likely to run into

polx: Unfortunately both those languages are colonized in terms of math notation. They write in French notation
… and I believe that the Japanese have been taking math notation since 1920s from Americans with almost no difference

florian: From notation, yes, but from the way they speak it

polx: you're right

r12a: You know about our note about MathML?

polx: ?? is the author, but is unfortunately not involved anymore
… we had one guy who has just left recently, and might come back; he is Bulgarian and has somewhat more exotic math formula formatting
… so we have French, German, and English in the group
… and Dutch with Bert :)

addison: Point though is not the math notation that's different, it's the natural language aspect

polx: You're right that Bulgarian might not differ as much from grammar

fremy: Right now the spec doesn't include list of templates

polx: working in Google sheets

fremy: Exercise that seems worthwhile is to sample 100 equations from Wikipedia, and ask people to write how they would read these formulas in their own language

<Bert> https://www.w3.org/TR/2006/NOTE-arabic-math-20060131/

fremy: it's difficult to imagine without this sampling
… it will tell you which patterns are most often recurring, which will tell you the focus of intents
… and will also tell you the different ways these are described in different languages
… will show whether your strategy will work
… and if so, whether you need more, e.g. you realize you need singular/plural or male/female
… for some of the letters
… maybe then you need to say this is an attribute we may want to consider
… you will not be able to solve all the challenges in the first version
… but it would get you idea of what are the major issues
… Consider how can you cover with simplest possible approach these cases
… It's a survey also that's not too hard to run
… this will help a lot in shaping your design

polx: Also within a language, ppl will speak things differently

<fremy> fantasai: I think we could probably run the survey at a Math conference

<fremy> fantasai: and some would think of this exercise as "fun"

<fremy> fantasai: compare how they would voice a formula vs friends

<fremy> fantasai: and there would be people from all over the world in these conferences

florian: When I was in engineering school, Vietnamese students and us understood each other better in math than anything else
… they had learned to speak the notation in Vietnamese, and also learned in French

r12a: You also have to be careful, I spent 6-7 years teaching globalization
… and I would be teaching developers who spoke those languages how to develop i18n
… and they'd never apply the idea of "oh, we do this differently to how it's being implemented here"
… so you can ask them, but they might not have ever thought about it

fantasai: The advantage of fremy's question is it's very simple, don't have to think deeply about it just write down how you would read it

<fremy> fantasai: reading in your own language is easier because participants don't need to think about it

addison: There are common patterns to this, this is similar to other things that ppl have done
… so maybe we can connect you with some resources
… and have some guiding discussion to show you the kinds of things that you can

polx: One thing done in MathML 3 introducing long division
… you have an amount of ppl, asking "how do you write long division in your country"
… and found 17 different ways
… and it differs

addison: There are styles even within languages
… many ways to do the same thing, all of which are valid, just stylistic or preferential
… so have to account for those differences

r12a: Have to account for, whatever you come up with should be understood by everyone

addison: Myles will join in 5 minutes, any other things on Math?

polx: If you can send me links to experiments, would be very helpful
… indeed the design seems like it is something ppl have been doing

addison: where are you in the cycle?

polx: This is the FPWD
… so enough time to inform the design
… really a big trade-off between simplicity and explicitness

[discussion of possible survey]

fremy: If you have this presentation, how do you read it?
… might not be the preferred presentation but how do you read it

<addison> https://github.com/w3c/csswg-drafts/issues?q=is%3Aissue+is%3Aopen+label%3Ai18n-tracker

<addison> https://github.com/w3c/csswg-drafts/issues?q=is%3Aissue+is%3Aopen+label%3Ai18n-tracker+label%3A%22Agenda%2B+TPAC%22

<r12a> https://github.com/w3c/csswg-drafts/issues?q=is%3Aopen+is%3Aissue++label%3Ai18n-needs-resolution

<addison> https://github.com/w3c/csswg-drafts/issues/6848

CSS issues

myles: on Windows, certain fonts display backslash as a yen sign, so people use backslash where they mean yen
… so on macOS we have to do something to make these fonts display what the user intended
… only certain fonts or certain encodings

fantasai: do we have an idea of the best way forward?

r12a: kida-san provided some recommendations

<r12a> https://github.com/w3c/csswg-drafts/issues/6848#issuecomment-1226798241

florian: we can take a shortcut and talk about yen, but Korean has the won sign
… they appear in file paths for windows
… in Asia, this is very familiar
… makes me wonder if kida-san's recommendation is correct, since any webpage will use Unicode 5C but expect to show yen or won or yuan
… normally characters should be different for a reason

addison: this is holdover from DOS days
… see ppl use \ as currency symbol
… I don't think modern APIs generate that often

florian: keyboards do

addison: I agree with Myles that we need to solve this in a consistent manner
… because it will be tricky
… because the intention is lost

myles: Is there a key on Japanese keyboards for the yen or won sign?

florian: I think the answer is no, you press just the one key
… the backslash/yen key
… as for how software converts that to Unicode, maybe it produces 5C

￥

florian: but there's just one key

atsushi: Some keyboards have both
… my keyboard has both

myles: Do you know if those keyboards are common?

addison: just switching to the IME doesn't get you the yen sign until you switch out of direct mode
… but in a command shell you'll see paths displayed consistently in those locales with those symbols

r12a: What about escape codes? Do they all start with yen sign?

florian: I guess so
… but I'm not sure, I haven't been on Windows for a long time
… and this really is a Windows-ism
… it's not a Linux thing and not a Mac thing

addison: You could wish to start to repair the world

<Bert> Some photos of Japanese keyboards

addison: certainly backslashes as backslashes outside a path context

florian: If you're thinking about a Mac author writing an article about Windows, it would be fine if you don't get it automatically

<atsushi> keyboard map examples

florian: and you'd have to work to find the character for Windows users
… but if you have a machine where the font renders \ as yen
… then you won't notice the oddness
… Kida-san's advice: does it work if we can't fix the font?
… Removing tricks from fonts is nice, but the fonts are already out there. Too late to fix

myles: an interesting observation is that if you use ICU to convert the byte 5C from Shift-JIS encoding
… e.g. say this sequence of bytes is a Shift-JIS encoded string, and that byte is 5C
… if you then take this 1-byte string and ask ICU to convert it to UTF-8, the result that ICU produces is also 5C
… so ICU at least seems to think that the encoded byte 5C in Shift-JIS is backslash rather than the yen sign

addison: it absolutely has to, because under the hood the OS expects a backslash in the path
… it's just a thing in East Asian OSes that the DOS fonts and later presentational fonts show paths as having the symbol in them
… I don't think it was Shift-JIS, I think it was the single-byte national code sets that had the yen sign in them
… so I think that's the right behavior for a converter
… but what's happened is that everyone got used to path separators looking like a currency symbol
… even though under the hood they're really 5C
… which is horrifying
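A minimal illustration of the conversion point above, using Python's built-in shift_jis and cp932 codecs as a stand-in for ICU (an assumption: this is Python's codec behavior, not ICU's API). The WHATWG Encoding Standard's Shift_JIS decoder likewise passes ASCII bytes such as 0x5C through unchanged. The helper name is hypothetical.

```python
# Sketch: which code point does byte 0x5C decode to under Shift_JIS?
# Python's codecs stand in for ICU here; decode_and_report is a hypothetical helper.

def decode_and_report(raw: bytes, codec: str) -> str:
    """Decode raw bytes with the given codec and print the code point
    produced for each path-separator-like character."""
    text = raw.decode(codec)
    for ch in text:
        # U+005C REVERSE SOLIDUS vs U+00A5 YEN SIGN
        if ch in ("\\", "\u00a5"):
            print(f"{codec}: U+{ord(ch):04X}")
    return text

# b"C:\\Windows" is the byte sequence 43 3A 5C 57 69 6E 64 6F 77 73.
path_bytes = b"C:\\Windows"
for codec in ("shift_jis", "cp932"):
    decoded = decode_and_report(path_bytes, codec)
    # The 0x5C byte comes back as a backslash, not a yen sign.
    assert decoded == "C:\\Windows"
```

Both codecs report U+005C, which matches addison's point: the converter keeps the path separator, and only the font's glyph made it look like a currency symbol.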

fantasai: So what do we want to do here? Do we want other browsers to adopt the WebKit behavior, or something else?

myles: not a mode, just any time you have a particular encoding OR certain fonts, we will automatically swap out the two characters

addison: My question is, is this something one could style on or off?

myles: not with a CSS property. That's one potential option, could control with a CSS property

florian: Should we have in @font-face some descriptors to tell what the font is doing?
… currently triggering WebKit behavior on several famous fonts, but could be non-famous fonts

myles: sounds reasonable

myles: also this list of names is heuristic
… if you make @font-face rule with same name, but source is a different font, that will still trigger

fantasai: I think at that point you're asking for trouble

florian: Maybe the initial value of the descriptor can be auto
… [missed] and trigger the right behavior

myles: This code is older than WebKit-Blink fork, and Blink doesn't have it so must have intentionally removed it

fantasai: They also aren't as focused on Mac, so maybe not as focused on that?

florian: These fonts are not on Android either

addison: These fonts are named in the stylesheet and subbed in OS
… but taking the behavior

florian: Chrome on Android should be having the same problems as WebKit on MacOS
… but Chrome removed it, possibly on purpose

<florian> fantasai: the two options we have are

<florian> fantasai: option 1: remove this special behavior from WebKit, and just let the font do what it does

<florian> fantasai: this will result in pages rendering very differently on Windows vs. other OSes

<florian> fantasai: option 2: encode this behavior in all browsers, and possibly add some css to control it

<florian> myles: we could change our heuristic

<florian> fantasai: but something more or less like it

<florian> fantasai: we should probably take that to the CSSWG
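A hypothetical sketch of what "option 2" could look like as a display-time rule, combining myles's description (swap when a particular encoding or certain fonts are in use) with florian's idea of an @font-face descriptor whose initial value is auto. Everything here is assumed for illustration: the trigger encodings, the font name, the descriptor, and the function are placeholders, not WebKit's actual lists or any specified CSS feature.

```python
# Hypothetical sketch of a display-time U+005C -> U+00A5 swap ("option 2").
# TRIGGER_ENCODINGS, the font names, and font_says_yen are placeholders,
# NOT WebKit's real heuristic and NOT a specified @font-face descriptor.
from typing import FrozenSet, Optional

TRIGGER_ENCODINGS = frozenset({"shift_jis", "euc-jp"})  # hypothetical list

def glyph_text(text: str, encoding: str, font_family: str,
               trigger_fonts: FrozenSet[str],
               font_says_yen: Optional[bool] = None) -> str:
    """Return the string handed to the renderer. The underlying text
    (DOM, copy/paste) would still contain U+005C; only the glyph changes.

    font_says_yen models a hypothetical descriptor: True/False override
    the heuristic, None behaves like an 'auto' initial value."""
    if font_says_yen is not None:
        swap = font_says_yen
    else:
        swap = (encoding.lower() in TRIGGER_ENCODINGS
                or font_family in trigger_fonts)
    return text.replace("\\", "\u00a5") if swap else text

fonts = frozenset({"Example Mincho"})  # hypothetical font name
print(glyph_text("C:\\Windows", "Shift_JIS", "Arial", fonts))       # encoding trigger
print(glyph_text("C:\\Windows", "UTF-8", "Example Mincho", fonts))  # font trigger
print(glyph_text("C:\\Windows", "UTF-8", "Arial", fonts))           # no swap
```

Keeping the swap purely presentational is the design point under debate: the text stays U+005C, so paths still work, while a descriptor would let authors correct the name-based heuristic myles mentioned.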

Bert: This might also occur in other languages

florian: It happens for sure in Japan and Korea

addison: Also affects simplified Chinese, maybe also traditional

fantasai: If we standardize this, should expand to other affected languages

addison: I think limited to East Asian at least

Bert: WebKit only does Yen sign, right?

florian: Do you have equivalent heuristic for Korean, or don't do it for Korean?

myles: I've exhaustively listed our criteria

polx: Is there special behavior for French francs?

florian: There were symbols, but never intermingled with backslash in encodings

ACTION: fantasai to summarize into issue, for discussion in CSSWG

<trackbot> Created ACTION-1194 - Summarize into issue, for discussion in csswg [on Elika Etemad - due 2022-09-19].

myles: If other browsers refuse to implement, this makes our decision for us

fantasai: That's why we need to discuss it on Friday

<r12a> https://github.com/w3c/csswg-drafts/issues/7183

<r12a> [css-text-4] Make autospace a property, rather than a value of text-spacing #7183

https://github.com/w3c/csswg-drafts/issues/7183

r12a: I think there are advantages of splitting these two apart
… and may even be able to do some additional stuff, such as replacing normal spaces with autospacing

myles: When you say autospacing, can you describe?

r12a: in Japanese, there's typically a little bit of extra space between Japanese chars and numbers
… or between Japanese chars and Latin
… and if you put an actual space before/after
… those spaces are too big
… and don't really belong there
… so the autospace property applies that extra spacing without the author having to add it
… which everyone wants
… whereas text-spacing is stretching gaps

myles: I'm confused, what's the difference?

r12a: text-spacing applies an equal amount of space
… autospacing is particular to context
… and there's another question of applying lots of these spaces across a range of text
… about surrounding text with a bit of space on either side
… often a fixed-size space

<r12a> myles, see this (read the whole section) https://r12a.github.io/scripts/jpan/#letterspace

[fantasai explains what text-spacing does]

myles: transform spaces in source?

fantasai: either to transform them or to insert spacing where it's not already there

r12a: also includes reduction of space around punctuation
… everything to do with space, rather than different types

addison: so we could split out different classes of mechanical spacing
… for CJK, autospacing would give you spacing around runs of non-native text
… and not affect any other spacing

r12a: Splitting it out allows you to be more specific
… apply to certain cases and not others

r12a: I wanted to throw this out there because I think there's been no movement on it

fantasai: haven't been working on Text 4 lately

myles: I don't want to comment on the property split
… but our native text engine CoreText has a similar feature for Chinese and Japanese text
… where it inserts spacing
… in various places between different kana, punctuation, for Chinese and Japanese
… and it has specific rules about where that happens
… text-spacing property in Text L4 has a bunch of values which are fairly prescriptive about where space goes
… so for us, the reason that we like the auto value here is it's a way for CSS text to match the native text engine
… to get equal fidelity with native apps and webapps

r12a: I'm not arguing against an auto value

myles: If we have an auto value in its own property, what would it mean if you specify "do autospacing", which for us would mean "match the platform", *and* you supply a different value to text-spacing in conjunction?

r12a: have a read of this stuff and the description I pasted into IRC
… what I'm saying is that these are different things that involve gaps
… for different reasons and in different ways

myles: The question is what does it mean to say "do autospacing" and also say "text-spacing: trim-start"?

r12a: I'm not sure that there's a clash there
… you're just offering content author ability to handle independently
… I don't think they overlap

fantasai: different ways of splitting the control
… text-spacing could shorthand two properties
… one for punct. vs. script boundaries
… or could have an independent property for controlling the space replacement vs. how much
… set for whole doc "how much it is" vs. turning on and off
… think about what is more ergonomic for authors
… want to control how much spacing
… could go in another level, was originally in L3, could consider in L4
… for example, underline position is separate from whether it is on or off

myles: When reviewing r12a's document, I see text about letter-spacing, initial-line punctuation, text-indent,
… want to make sure I'm not missing autospacing

r12a: the autospacing I'm talking about is the spacing around alphabetic or numeric phrases
… separately, there's spacing around punctuation
… felt it easier to split it up that way for readers

~* Lunch *~

New Dial-in: https://us02web.zoom.us/j/85205096646?pwd=Z0tIVk1PdHlPZ20vQlBVRmVqSG1RZz09

Intros

PeterR: Peter Rushforth

PeterR: interested because of indigenous languages and making them happen in browsers
… maps is my focus, but fact finding

David: connect with Andrew Cunningham perhaps?

CSS Stuff

fantasai: color contrast discussions https://github.com/w3c/csswg-drafts/issues/6319
… to be aware of

fantasai: question about whether color contrast values are affected by writing system, and how to have the algorithms account for this

fantasai: Another unsolved issue is top metrics for non-Western scripts https://github.com/w3c/csswg-drafts/issues/5244

fantasai: Related to Kaiti issue is fangsong issue https://github.com/w3c/csswg-drafts/issues/4425

[discussion of what styles fall back to what]

florian: Define grass script only over the CJK range
… because don't want it to fall back to children's handwriting font

dsinger: Maybe look at semantics of what the styles convey
… e.g. if about emphasis, translate it

florian: but what's the Khmer equivalent to writing German in Fraktur?

dsinger: That would be archaic, that's the semantic, roughly
… but this is how you emphasize in Chinese, so go to bold or italic in English

florian: we sort of used to try to do, either serif or sans-serif or cursive, but moving away from that because mapping is too hard

addison: semantically different and not a 1-1 mapping
… Japanese emphasis might have a background color difference, or emphasis marks, not bold or italic
… can style em or strong to be these things
… but really different things

addison: drop-cap thing, if you try to smash everything into Western typographic, that doesn't match how fonts are structured or how the script works

<florian> fantasai: what we need to consider is that we're not going to be able to map every style of font, even in western typography

<florian> fantasai: it should not be our goal to be exhaustive

<florian> fantasai: the reason to create a new generic style is if you were using that to convey semantic differences or contrast

<florian> fantasai: we need to have css be able to fall back when the font is missing to something else that would express the same semantic contrast

<florian> fantasai: in English text, you wouldn't switch between Times and Palatino to express anything, but you might switch between italics or monospace or something. That needs preservation

<florian> fantasai: same logic should apply to chinese: if the text switches from something to grass styles to express a distinction, then we'd need it, but I suspect you won't actually find text…

<florian> fantasai: …where that is the only difference. Using a different style for a heading isn't strong enough, as there's other things that distinguish the heading.

addison: are generics about "give me a font with this type of styling generally" or ...

fantasai: They were added originally for that, but that's not what we need
… fantasy and cursive are useless because their purpose is to convey a feeling, and they can't do that because there's such a wide variety of fonts in each category
…

fantasai: you can use lang tags to tweak generic choices

dsinger: ...

dsinger: We don't have a place to put information about shaping of certain language/writing systems

addison: If you look at Urdu vs Arabic, they have different stylistic variations
… not serif vs sans-serif
… you can look at them and say they're not really serif or sans-serif
… can smash into those buckets, or do we recognize that without changing language there are different font styles
… is it semantic thing
… I can argue both sides, it's really hard to add generics
… shaping engines work in specific ways with the info we have
… some token to pass, this is what I intend
… without being able to know what fonts are installed on a machine

dsinger: I'm hearing it's inappropriate to talk about generics in Latin terminology
… so we should have names for things they do in those scripts
… but then we have a problem of translation, what does it mean in other script

florian: don't necessarily have a problem, can apply :lang() selectors to choose fonts differently
… but if we say that Kaiti is not cursive, but new, then how many such new things should we have?
… do we want to go as far as fantasai said?
… or go further, e.g. I want Humanist typeface?
… not just about adding keywords, but also browser needs to have access to the fonts *and* know which fonts map to each keyword

fantasai: I think we have two criteria
… one is what Florian mentioned, which is can we reasonably implement this generic
… other is do we need the generic in order to ensure the semantic preservation
… e.g. if these two fall back to the same font, will the text be less understandable
… never mind whether it feels appropriate

florian: typical example would be italics in English
… if you lose the italics, you lose the fact that there was emphasis
… if you have a document which uses italics for emphasis and you fall back to normal text instead of italics, you lose information
… what are the cases in other languages?

<Bert> CSS font classification isn't based on Vox, but showing that others are struggling with classifications, too: ATypI abandoned the Vox classification and is working on new one.

<florian> fantasai: the reason for the change should be in the markup, and then you style it however you want. But if, as often happens, the way you style things means the only difference is the font face, then we need generics to be able to preserve that distinction

addison: I think I agree with you, but your test may be incorrect
… if you suppose that someone used that as the only distinction
… e.g. I've seen serif vs sans-serif, e.g. as a form of emphasis
… but you could imagine that a document that would pass your test and still say, well, the fact that the browser smashed these two styles together is because of a limitation of our ability to express in generics

<florian> fantasai: we should not introduce generics to deal with that problem just because some one-off document made a distinction, but if it is one commonly made in the language, then it calls for generics

fantasai: ...

florian: I would like to see generics for more things, but if we are going to get a more limited set, the criteria you mention are the minimum we should aim for
… I think it would still be nice to pick from general categories for preferences
… e.g. nastaliq
… But regardless, we can't just make nice keywords in specs. The browsers need to be able to map them
… if we create 500 generics in all known languages, it's not going to have good coverage and not going to be helpful

addison: it's like counter styles
… I know I want certain things, but to force everyone to implement
… if you're styling documents can use these keywords in this way, and it will do a good job of getting fonts that matches

florian: maybe can provide premade style sheets for this
… but even though I would like all to be covered by browser, if we have smaller set
… fantasai is hinting at the minimum necessary for international text to work
… nice to go beyond, but should at least start there

addison: it's about where font management is taking place

<florian> fantasai: where it gets implemented is a bit more of an open question

addison: not necessary to spec
… up to implementations

<florian> fantasai: it often happens that introducing it in CSS puts pressure on the rest of the ecosystem to make it happen

<florian> fantasai: as the i18n WG, what we need to do is to identify the critical things that need to exist so that the earlier criteria can be handled

<florian> fantasai: just like western designers may wish to get the distinction between serif and slab serif, Arabic designers might wish for many distinctions, but that's not a priority

dsinger: if writing a document [gives example of switching font styles]

florian: if it's a one-off, that's one thing. If it's a regular pattern, need to build into CSS

<florian> fantasai: but if the common type of document wouldn't make sense on a phone because the phone doesn't show the right distinction, then that's a problem.

addison: [...]

<florian> fantasai: CSS should be designed in a way that as you fall back through fonts, you may lose some styling, but you shouldn't lose meaning. Whichever generics are needed to make that happen should exist

florian: Imagine we were not all familiar with Latin, and only had distinction relevant to our own language
… discussing about adding generic keywords
… as i18n, and they explain italics
… if you miss that, you'll have difficulty understanding
… if you can't preserve that you will miss information
… they have many different font styles, which is nice, but need italics vs non italics
… CSSWG knows how to introduce generics
… but doesn't know what's needed to add
… if i18n can say, in language X you will use font face changes to distinguish these different uses
… functions similar to switching to monospace or switching to italics in Latin
… if i18nwg comes back and says these 7 keywords would solve these problems, CSSWG can add them
… but i18nwg needs to find these cases

addison: Can identify here's a group of languages, and here's what they do
… forgetting about the outside world, this is how they classify fonts

fantasai: but we don't care how they classify fonts if they're not using those classifications to make distinctions within the same document

addison: there are mental classifications
… for emphasis, we've introduced different ways to style emphasis
… because obliquing things is not the way to do it
… We can describe what those all are
… but can show what the cases are and have a discussion of where the bar should be
… before we take the plunge and introduce a new generic
… or should we do interstitial work that's separate

<florian> fantasai: nastaliq vs kufi is not going to be a distinction used within a document to contrast things. Would be nice to have, but not critical for understanding

[discussion of kaiti vs non-kaiti being used similarly to italic vs non-italic]

fantasai: I think the problem with classifying kaiti as cursive would be that if you ask for kaiti, you might get grass script, which would be totally inappropriate

florian: Would be like asking for monospace to express code and fell back to Zapfino
… the contrast would be there, but what it means is lost
… falling back to monospace would be better

dsinger: in this document, use the font as distinction, and in other as stylistic
… what do we do in that case
… want both documents to be readable at least

florian: problem of mapping fonts to categories, browser can do it if we introduce 3 new keywords; but not if we introduce 50
… a handful (worldwide), they can do it and it will be usable
… if instead of 3 (ignoring the 2 useless ones) we had 8 or 9, it would be manageable
… if we are asking for 50, it will not be implemented
… so what are the few extra ones that are critical for understandability?

florian: can we action i18n to find the cases where font face category switches are needed for understanding common documents?

addison: other challenge is we'll not find a global generic
… we'll find a set of traditions over here with Kaiti, over there with another one, etc.
… will find islands of variations

florian: that's fine

David-Clarke: Would things like old-fashioned/modern/etc. be types of categories to look at?

fantasai: no, because that's just a stylistic preference

florian: The distinction here is critical to have for understanding documents, vs stylistic preferences

ACTION: addison: follow up with r12a and others about gap analysis for font generics

<trackbot> Created ACTION-1195 - Follow up with r12a and others about gap analysis for font generics [on Addison Phillips - due 2022-09-19].

<florian> Florian: the distinction between old style or modern isn't a wrong one, but it isn't a critical one in the sense that both aren't commonly used in the same document to contrast two pieces of text

Triage

https://github.com/w3c/csswg-drafts/issues?page=2&q=is%3Aopen+is%3Aissue+label%3Ai18n-tracker

https://github.com/w3c/i18n-request/projects/1

https://github.com/w3c/csswg-drafts/issues/1790

fantasai: some kind of overview might make sense to me
… a lot of details are handled in there: not just baseline alignment within one writing system, but also when scripts are mixed
… and this section tries to account for all of that

overview of baselines in CSS at https://www.w3.org/TR/css-inline-3/#css-metrics

fantasai: discussion of text-spacing and adding rules to handle non-fullwidth punctuation https://github.com/w3c/csswg-drafts/issues/6091

ACTION: atsushi: follow up with jlreq on csswg#6091 to see if non-CJK enclosing punctuation should be included in space-trimming

<trackbot> Created ACTION-1196 - Follow up with jlreq on csswg#6091 to see if non-cjk enclosing punctuation should be included in space-trimming [on Atsushi Shimono - due 2022-09-19].

https://github.com/w3c/csswg-drafts/issues/1282

https://github.com/w3c/csswg-drafts/issues/1282#issuecomment-952428897

fantasai: review Miriam's comment linked above and convince the CSSWG about direction

https://github.com/w3c/csswg-drafts/issues/6915

[addison explains how lang tags for undetermined language work]

<florian> conclusion 1: :lang("") matches lang=""

<florian> conclusion 2: :lang("*") matches everything but lang=""

<florian> conclusion 3: maybe add a note about lang="und" and lang="" being treated distinctly, despite having similar semantics
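A minimal sketch of how the three conclusions could behave in a matcher. Assumptions: `content_lang` is the element's language as a string ("" for lang=""), and ordinary ranges use simple case-insensitive prefix matching rather than the full RFC 4647 extended filtering that Selectors Level 4 specifies. The function name is hypothetical.

```python
# Sketch of the proposed :lang() behavior for "" and "*" (simplified matching).

def lang_matches(content_lang: str, lang_range: str) -> bool:
    tag = content_lang.lower()
    rng = lang_range.lower()
    if rng == "":
        return tag == ""        # conclusion 1: :lang("") matches lang=""
    if rng == "*":
        return tag != ""        # conclusion 2: :lang("*") matches everything but lang=""
    # Simplified prefix matching: "ja" matches "ja" and "ja-JP", not "jav".
    return tag == rng or tag.startswith(rng + "-")

# Conclusion 3: "und" is an ordinary tag here, so it is treated distinctly from "".
print(lang_matches("und", "*"), lang_matches("und", ""))  # True False
```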

ACTION: florian to reread issue, and if conclusions still make sense in the end, post as the proposal

<trackbot> Created ACTION-1197 - Reread issue, and if conclusions still make sense in the end, post as the proposal [on Florian Rivoal - due 2022-09-19].

AOB?

– DRAFT –
TPAC 2022: Internationalization Working Group

12 September 2022

Attendees