Meeting minutes
Meeting Info: https://
FIND US IN 'FINBACK', 3rd Floor (same as registration but in the far corner)
Use this link: https://
Introductions
atsushi: JL-TF is preparing two documents, one is simple ruby, one is ruby-t2s-req
r12a: Richard Ishida
xfq: Fuqiao Xue
Bert: Bert Bos
florian: Florian Rivoal
<fremy> François Remy, Invited Expert in the CSS WG
<r12a> Paul Libbrecht, MathML WG
<florian> Florian Rivoal, Invited Expert, CSS WG, i18n WG, Advisory Board
Greg Whitworth
atsushi: team contact for i18n, timed text, immersive web
<r12a> Elika Etemad, CSS, i18n, ex-AB
CSS backlog
https://
florian: we need someone from Apple to discuss this effectively
r12a: maybe there's one or even two other issues that need to be read in conjunction with this one
<r12a> see also https://
https://
addison: Backslash & Yen sign behavior
fantasai: last time I looked at the issue there's no good solution for it
addison: this is the famous issue that existed forever
… probably we need to have someone from WebKit to discuss it
https://
addison: Kaiti & cursive
florian: it's probably good to discuss this in the joint session
… we have some CSS people here, but we're missing some
https://
florian: we have specified in css text L4
… a pair of properties
… if you have wbr in the markup
… @@ which is typically used in titles
… this is also useful because allowing line breaks
… especially in children's books or for people with dyslexia
… for line breaking you can use the usual
… here it's trying to tackle the same problem differently
… whether we reuse the existing machinery for line breaking or whether we make a new one
… do we just have a giant pile of AI and it figures out where to put the breaks on its own
… or do we need to be more strict on this and say it's Chinese
… from an author's point of view
… it's going to be important to know what it's going to do
… you need to be able to ask not just about line breaking (with @supports), but line breaking in Japanese
… the property we have in CSS, specify which language you want to line break
… let the browser figure it out
… that's the general two different approaches to try and tackle the same problem
… CSS line breaking properties are already extraordinarily complicated, with so many properties interacting with each other
… I'm not excited about adding one magic switch that says ignore everything and do new line breaking
addison: I hear a lot from Japanese designers
… a lot of unfortunate line breaks
… do you think it's related to that?
addison: I'm trying to understand what people think the problem is
florian: I made a talk about many line breaking things, including this one
<florian> https://
florian: I don't think I can take over and project, but I just dropped a link on IRC ^
… if you press space and shift+space repeatedly to move forward
[florian shows the slides]
florian: we have existing properties that let you switch between two different modes
addison: you can mark all the boundaries
florian: if you mark all the boundaries, it's just like English
… as Richard mentions, there is more than one way to do this
… there's varying opinions
… the new approach is don't add wbr, ignore the line break properties in CSS
… I'm concerned about the inability to specify which language it is
Francois: why not 'word-break: avoid'?
fremy: @@
… if you can, fallback to 'word-break: normal'
… if it doesn't work, it's the normal behaviour
florian: there's word-boundary-detection and word-boundary-expansion
… word-spacing is for where there is already a space and makes it bigger
… this is for inserting a space
florian: we could add a second keyword
<myles> hello
florian: for languages like Thai, word-boundary-detection has three values
fremy: interesting to see if anybody cares about this
r12a: they do
… we're talking about the language Thai
… there are many languages using the Thai script
… like Northern Thai
florian: you could switch out auto
addison: if you don't know Northern Thai, don't use Thai
… with proper language tagging
florian: not language for the context, but language for the algorithm
… maybe browsers don't know how to do line breaking for Cantonese, but they know how to do line breaking for Chinese
<r12a> (languages using the Thai script: http://
florian: there's something dealing with the normalization of languages
r12a: we have another issue about that topic as well
<Bert> (Example of an unfortunate line break in English: ‘Hi, My daughter had an accident and now we need body parts to fix her / car’, from https://
r12a: it is a little difficult to hear you with your masks on
florian: combination of word-boundary-detection and word-break: keep-all
… go to look at the content
… when you language tag things, if you language tag your content properly and the browser has an exact algorithm for this, it will do the right thing, otherwise it will fall back to normal
fantasai: I would not classify that as normal vs strict
… kinsoku rules are independent
florian: at some point, if you're extremely picky about how to line break, you can go and add wbr
fantasai: if you do word-based breaking and suddenly switch to phrase-based line breaking
Bert: there's other thing you want, like length of a line
florian: I don't want a complete new thing, a new magic mode that completely ignores the line breaking properties
… I think I got useful ideas
<fantasai> [current discussion is adding 'words' and 'phrases' keywords or something to 'word-break' property]
<fantasai> https://
r12a: you started by saying you wanted to talk about the needs-resolution issues
… but these are tracker issues, addison
<r12a> https://
r12a: we don't have a single CSS label
<r12a> filter on needs-resolution
r12a: if you go here ^ and filter on needs-resolution
<fantasai> https://
https://
<fantasai> https://
fantasai: proposal is to get the WG to resolve it
<fantasai> if i18n agrees with the proposal, I'll get CSSWG to resolve on it
addison: atsushi and r12a seem to agree with fantasai's comment
fantasai: should we do this to handle justification for ruby annotations?
r12a: what we discovered it's more likely the Latin text is centered
<fantasai> https://
addison: doesn't sound controversial
fantasai: Should auto-hide match use NFKC or other normalization?
addison: NFKC is usually not a good idea
… there's a lot of things, it's kind of an uncontrolled @@
r12a: I think it's NFK mapping
addison: I suppose we need to make some research
fantasai: if it's too aggressive we can do some custom normalization
fantasai: I think it's legitimate using different representations
r12a: would not be good idea to normalize stuff that people have typed
… automatically annotate your Japanese or Chinese
… e.g., you could have katakana, and the kuten marks decomposed
… if it's a different kanji character, you probably don't want to unify them in any way
… if it's real normalization, maybe it's useful
fantasai: we should probably do NFC for auto-hiding
<fantasai> https://
r12a: if it matches, it removes the annotation
addison: the comments are all pointing to things like whitespace normalization
… possibly normalize internal whitespace
… two spaces become one
r12a: if you inline, you're not removing anything from view
… we're talking about real edge cases here
florian: Xian and Xi'an example in Chinese
<David-Clarke> Should half-width kana match full-width in this case?
fantasai: I feel that is a different issue
<fantasai> * whitespace normalization
<fantasai> * NFC normalization
<fantasai> * East Asian Width folding
florian: you're using half-width katakana because it's tiny
… [explains the half-width katakana example]
fantasai: whitespacing is sometimes accidentally introduced
… you have trailing/leading whitespace
… I think what we should do is whitespace and NFC normalization for auto-hiding
addison: sounds like a reasonable starting point
… need to think about the edge cases
florian: I suspect if we start with NFC, it's safe
… if it's rare enough edge case, we probably shouldn't do anything by default
addison: choose your code point carefully
<fantasai> Proposal for normalization of base and annotation text before auto-hiding:
<fantasai> - Use NFC normalization (not NFKC)
<fantasai> - Trim white space
<fantasai> Anything else, authors should adjust manually using `visibility: collapse`.
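A minimal sketch of the proposal above, assuming Python's `unicodedata` module as a stand-in for whatever a browser would use. The helper names `normalize_for_auto_hide` and `should_auto_hide` are hypothetical, and `str.strip()` is only an approximation of CSS white space trimming.

```python
import unicodedata


def normalize_for_auto_hide(text: str) -> str:
    """Trim surrounding white space, then apply NFC (not NFKC)."""
    return unicodedata.normalize("NFC", text.strip())


def should_auto_hide(base: str, annotation: str) -> bool:
    """Hypothetical helper: True when base and annotation match
    after normalization, so the annotation would be hidden."""
    return normalize_for_auto_hide(base) == normalize_for_auto_hide(annotation)


# Precomposed vs. decomposed katakana GA are canonically equivalent,
# so they match under NFC.
precomposed = "\u30ac"       # KATAKANA LETTER GA
decomposed = "\u30ab\u3099"  # KATAKANA LETTER KA + combining voiced sound mark
print(should_auto_hide(precomposed, decomposed))

# Leading and trailing white space is ignored.
print(should_auto_hide("  \u30ac\n", "\u30ac"))

# Half-width and full-width katakana KA are only compatibility-equivalent,
# so NFC keeps them distinct (NFKC would fold them together).
print(should_auto_hide("\uff76", "\u30ab"))
```

Under these assumptions the first two checks match and the third does not, which is the distinction between NFC and East Asian Width folding drawn in the discussion.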
addison: it's not asking people to store or something like that
fantasai: my suggestion going forward is to ask the WG to see how they feel with NFC matching
r12a: you're not actually displaying different things, you're just matching
… if you got 2 or 3 spaces in between two words
… it's not really relevant here
addison: we've just done discussing wbr
<fantasai> Commentary on why we have the current spec text https://
Ruby markup status
florian: extremely long discussion to try and make it possible to write the ruby markup
… a pull request against the HTML spec
… as soon as we find time to actually do it we should be very quick to make a FPWD
… we actually have two impls
… Firefox and Amazon Kindle
~* Break *~
MathML
polx: MathWG is working on v4 of MathML
… MathML is XML format for writing math notations
… v4 the biggest novelty is trying to get speakout to work
… so that a11y tools can read math out loud
… MathML is known to sort-of work on that, but we want this to become proper
… current development means adding an attr, intent, that describes how parts of tree will be spoken out
… should be combined with default knowledge of how to speak things, which is currently the fuzzy part of the spec
addison: intent is structured data?
fremy: is it fixed options or freetext?
polx: freetext, but has some placeholder that allow you to delegate the rest of the speaking to inside
fremy: so templates?
polx: template language being developed, part of more unclear part
florian: It's freetext in a human language?
polx: Yes, that's why i18n aspect is interesting
… I believe MathML lives in lang-tagged trees
… and I think the voice that is used to speak this, depends on user
… not sure if i18n has special concerns?
addison: You're touching on some hot buttons
… one is that putting natural language text into an attribute makes it hard to localize / translate
… can't be lang-tagged
polx: it's mono-language, so whole subtree of language
… there are alternative representations but
addison: common thing to want to do, if you want to localize something you have the intent ... can have multiple ones with different lang tags
… can localize sub-parts
polx: would translate the whole subtree
addison: also the case that the structure of natural language is not such that you can just add words together to make a sentence
… so if you structure things that way, then when you put it into another language, you don't have the attributes in the correct order
… need to rearrange to make it sensible
polx: There is a dynamic to how intents are combined
… pull things out and make one single sentence
… makes it non-navigatable, a11y tools like to navigate sub-elements
… but [missed]
addison: I'm not an a11y expert, but usually what you want is you want a stream of words
… that you're feeding to the TTS engine
… might feed language tags to get it to pronounce correctly
polx: The whole world around it would be Russian, expectedly
addison: what I'm saying is, there's a stream of text that would go to processor that will read it out
… your problem is that to generate the stream of text
… you're providing a way for ppl to mark up their math with the content such that it generates that string of text
… and if you only ever had a document in one language at a time, that would be maybe possible to do
… but different languages have different requirements
… you'd have to reformulate your content to be in a different language
… e.g. Japanese has very different word order than English
… so you would need to set it up so that stream of text would be in the target language
florian: If I've understood correct, idea is not that each subtree has a piece of text that gets added together
… but idea of having a language template thing
… can have subtrees invoke the grammar in the right way
… so concatenation order wouldn't be a problem
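The word-order point can be illustrated with a small sketch. Everything here is hypothetical and is not the MathML `intent` syntax: `TEMPLATES`, `DEGREE_WORDS`, and `speak_root` are invented names, the English wording borrows the "third root of 64" phrasing used elsewhere in these minutes, and the Japanese wording is an illustrative assumption.

```python
# Hypothetical per-language templates for one notion, "n-th root of x".
# The placeholders let each language place its arguments where its
# grammar needs them, instead of concatenating in document order.
TEMPLATES = {
    "en": "the {degree} root of {radicand}",
    "ja": "{radicand}の{degree}乗根",
}

# Hypothetical per-language wording for the degree argument.
DEGREE_WORDS = {
    "en": {3: "third"},
    "ja": {3: "3"},
}


def speak_root(lang: str, degree: int, radicand: int) -> str:
    """Fill the template for lang; argument order comes from the template."""
    return TEMPLATES[lang].format(
        degree=DEGREE_WORDS[lang][degree],
        radicand=radicand,
    )


print(speak_root("en", 3, 64))  # the third root of 64
print(speak_root("ja", 3, 64))  # 64の3乗根
```

The degree comes first in the English string and last in the Japanese one, which plain concatenation of subtree strings could not express.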
polx: Sometimes can't, so in "...", you'd have different needs to put things on the parent alone
… because you cannot use the templates in a reasonable manner
addison: Harder than it looks, because you have agreement issues, e.g. don't have just zero/one/plural
… and math of course has lots of numbers
<r12a> it would be very useful to see an example !
addison: interesting project at Unicode, next generation XXX format, to describe localizable structures
… for inserting runtime formatted strings
… called the Message Format WG of CLDR
<florian> fantasai: have you heard of l20n project at Mozilla?
<florian> fantasai: it was a templating system that Mozilla was working on maybe a decade ago
<florian> fantasai: that was to deal with agreement or inflexions and other grammatical things, in order to deal with these in the Mozilla UI
addison: I think that evolved into Fluent, which is a format that does that
… other groups doing similar things
… and all those groups working on this message format also
… to build a system
polx: Problem with math is that there's extremely big variation on abstraction
… predicting contents, can depend on resolution you might not come t
… speaking in an abstract way
addison: I have illustrations of why doesn't work but
polx: Send it around, it would be useful
addison: My first reaction is, I think I understand what you want to do, super common to want to build a templating language
… trick is it's hard to do well from i18n pov, build it for one language, and have to rewrite for other languages
… there are better mechanisms to support doing these things
… I think it would be a good idea for us to look at your proposals and help make connections early on
<Bert> Example of ‘intent’ in MathML4
addison: from a very high level your description sounds like it would be problematic
florian: another thing to mention, since you put text in attributes, is limiting because you can't have markup
… since trying to display text, often need some extra markup
… if speaking rather than displaying, could be different
… but also have other things in CSS, that supposed to let you style how things are spoken
fremy: feedback I got is ppl don't want that
florian: if you stick things into an attr, you can't extend it
… if you can use regular elements, that opens up more possibilities
… maybe more than you want, but won't run into problem of less than you need
polx: There's an element in MathML called <semantic> for alternate representations, e.g. LaTeX representation alongside MathML
… these things are all there, but known to be too complex to be of use
… hard to make it simpler, honestly
… because in the end what you want is parallel trees
… and need to hook them up with IDs, and it works, but it's art
fremy: I think what you're trying to do, it seems you're trying to create a text representation for elements in the a11y tree
… very close to concept of aria-label and aria-description
polx: it is
fremy: why not use the existing system that they use?
… they already have this concept
… if you use aria-labelledby, can have a list of things
polx: There are guys who are in the aria groups, and aria-label is considered part of this scenario, but different impl possibilities
… offer in ways that are independent
… not sufficient for our formatting
fremy: I would like to understand why it's insufficient, because it would help us understand what needs to be worked on
r12a: This discussion would be a lot easier if we had an example
r12a: The other possibility occurs to me, you have a templating language which creates something in English and you translate it to other languages
… rather than trying to have a templating language that serves every language
polx: What do you mean by translating?
r12a: As I understand it, you're coming out with a sentence that says "The third root of 64"
… and that represents the formula that will appear on the screen
polx: right
r12a: you've got all those bits in that formula which you can assign words to, and then you have to understand relationships among them and how to create syntax for that
… then need to figure out agreements e.g. pluralization
… and do that for every language
… but another possibility is that you generate a string in English
… then only have to build all that complex stuff in English, and then you use translation mechanisms
polx: This is a support aspect. You could do that, and you could do that at the authoring level or in the browsers
… but at some point you want control over that, and this is the space we're creating with the intent
… we want author to control how is spoken
fremy: also translation isn't cheap
… can't run it on client side every time you have a math equation, it doesn't scale
addison: if you have a true machine-translation engine, then it's not cheap to create but maybe can do that
fremy: I work in machine learning, and machine translation is multiple gigabytes in memory
… very few programs translate things correctly on a computer
… that's why they use servers
… small machine translation is very low quality, help if you are stuck without internet
… but not something that can be relied on
polx: Sometimes can do wonders with automatic translation, and can help author
… but whether author wants to render everything into a string, and then get that translated, and then get it checked by a math expert is one thing
… [missed]
… as soon as formula becomes really big, becomes essential
<fremy> fantasai: I have concerns about having natural language in attributes
<addison> fantasai: some concern about natural language in an attribute, because we often markup
<fremy> fantasai: not all accessibility engines use a speech engine
<fremy> fantasai: sometimes braille output for example
polx: Braille has a special math pattern
… One Hungarian guy I forget has a system for this
… I don't remember
… but I know that many ppl are feeling that this standard for Braille math is limited, but it is what everyone uses
addison: Also don't get wrapped up in saying it's just a11y, lots of documents are read aloud these days
… so general purpose TTS is more prevalent than it has been
<Bert> Math in Braille often uses Nemeth Braille.
addison: so is it good enough to serve a11y audiences? Maybe, and that's where tech has been driven from historically
… but it is expanding quite a bit
polx: Yes, all wondering about listening to math in the car while driving
addison: "Alexa, read this paper"
… that's when you have some piece of MathML embedded, and needs to become...
fremy: aspect to keep in mind, MathML has 2 standards, and the one that is used is a presentational format
… it says how it looks on the screen, but same notation can be used for multiple things
… that's why you need intent
… need this on a letter can mean multiple things, that's part of the intent aspect right?
polx: you can use MathML Content to do better, but it's too expensive
… math professionals are more comfortable thinking about how to write things rather than how you mean them
fremy: Content is intended to be a middle ground, still describing presentation but with more info
… I think it does make sense to me, looking at the example
… one thing that maybe I am wondering is removing the idea of the string representation
… I would argue that this is not a good idea to put in intent
… I would limit it to things that can be understood
… If you want to express something outside of intent, should use aria-label
… for example in the spec you have x power to the ?
… suppose you want to read as position
… then this should be done at the aria-label level
… so you have the x arg, you have x aria-label = position
… and then you can compose the sentence
… but I would refrain from using intent to scope things outside the scope of intent
… I think it misses the bar
… because it becomes very confusing if you can rename things
… if you go deeper in the hierarchy, these renames won't be consistent
… So want to see if can remove the freetext option from intent
… and if you need freetext, use aria-label on the functions to give the freetext you need
… and that is a tech that already exists
… and get those translated
… it flows into natural localization pipeline for HTML
… and enforces the idea that 'intent' is something the computer can understand
… freetext is something the computer cannot understand
polx: What do you mean computer can understand
… being able to understand more of the intent of the expression
… my experience is this an extremely American point of view
… as soon as you go farther
… the bigger problem when you do this understanding, you want to understand in a semantic world that is well-defined
… and mathematicians have been creating math for centuries, and many things are not encoded
fremy: Not saying computer understands the equation, but understand each piece
… intent should be structured, but if you need a name, should use a name from the markup
… still rely on existing tech, but compose [missed]
… this seems more reasonable I think
polx: This is interesting, we'll be meeting on Wednesday
… is interesting thoughts
<fantasai> +1
polx: Wondering if we should consider, if single-language is safe enough
… or should be safe enough
florian: One of the beauties of math notation is that it is not natural language
… in translation description can be different, but the equation will be the same
… to a large extent
florian: It's shared
… and if we could enable those formulas that are not strongly tied to a natural language to be re-used as-is in a bunch of different language documents as-is
… would be nice, but certainly more complicated
polx: There's a will in the intent definition that trying to make it as simple as possible is a most important quality
… and might be reason why all these templating languages feel inappropriate
addison: if it integrates well with other tech stack pieces then it makes sense
… the more different special things ppl invent, the less likely to have widespread support
… e.g. re-using aria-label insofar as possible, already widespread
polx: One thing unsure about is how to encode defaults
… so that a11y tools don't need intent as much as possible
… probably this is doable for basic math, for English language
… if you go to any European language, there is no complete tool with these defaults
… if you go further away, then this will be almost impossible
… to use, for every different language, can stick the English name and translate, seems doable but not sure
… and then things like i is used as root of -1 , but understood to be something else in different fields
… e.g. H2 is hydrogen or 2nd homology group
… currently we seem to avoid being able to speak a proper domain name
… this is crystallography or organic chemistry
… we don't know whether there's a way to model this kind of subdomain things
… because at the end you end up very scattered
addison: I understand what you're saying
r12a: We're talking about describing an expression, why don't we have something like alt attr
polx: 2 reasons
polx: This is one string for whole subtree, which is what aria-label/descri can do
… but this is not enough for navigating through the subtree
… as you move in the subtree
… take out some parts and re-use other things
r12a: Thanks
… point I wanted to make, before I joined W3C I worked at Xerox as global design consultant and helped develop the i18n aspects of the corporate engineering process
… if you're developing a product, principle of develop it in at least 2 languages
… My recommendation to you, because sounds extremely complicated, is that you develop it in English and another language e.g. Arabic or Japanese, which are substantially different in syntax
… and try to concurrently develop the tech in all those languages at the same time
… you'll have a better idea of how to develop
addison: Danger is that Western European types can assemble something that works, but breaks down as you move to other language sets
polx: Exactly the problem we have right now in non-standardized software
addison: Get proof of existence, and then encounter problem of like Japanese having very different word order
… or different agreements with numbers
… and that's where you discovered have features, but can't go there
… if you can make it work for an array of languages then you can sneak up on some aspects of the problems
… as you see from earlier discussion in CSS, still corner cases that are hard
… don't have "well it worked in English" and then get stuck
r12a: I chose Arabic and Japanese on purpose
… Japanese has a SOV word order
… but also has very little agreement and very vague language
… Arabic on the other has lots of agreement, and VSO order
… and also has singular, dual, and plural forms in terms of plurality
… so those two languages cover a lot of range in the problems you're likely to run into
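The agreement point can be made concrete with plural categories. This is a simplified sketch that assumes integer-only rules modeled on the CLDR plural categories for English, Japanese, and Arabic. `plural_category` is a hypothetical helper, and a real implementation would take its rules from CLDR data rather than hard-code them.

```python
def plural_category(lang: str, n: int) -> str:
    """Return a plural category name for a non-negative integer n."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    if lang == "ja":
        # No grammatical number distinction: everything is "other".
        return "other"
    if lang == "en":
        return "one" if n == 1 else "other"
    if lang == "ar":
        if n in (0, 1, 2):
            return ("zero", "one", "two")[n]
        rem = n % 100
        if 3 <= rem <= 10:
            return "few"
        if 11 <= rem <= 99:
            return "many"
        return "other"
    raise ValueError(f"no rules for language {lang!r} in this sketch")


for n in (1, 2, 3, 11, 100):
    print(n, [plural_category(lang, n) for lang in ("en", "ja", "ar")])
```

A templating scheme designed only around the English one/other split has no place to put the Arabic dual, few, and many forms, which is why testing against both languages early exposes more of the design space.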
polx: Unfortunately both those languages are colonized in terms of math notation. They write in French notation
… and I believe that the Japanese have been taking math notation since 1920s from Americans with almost no difference
florian: From notation, yes, but from the way they speak it
polx: you're right
r12a: You know about our note about MathML?
polx: ?? is the author, but is unfortunately not involved anymore
… we had one guy which has just left recently, might come back, is Bulgarian and have a bit more exotic math formulae formatting
… so we have French, German, and English in the group
… and Dutch with Bert :)
addison: Point though is not the math notation that's different, it's the natural language aspect
polx: You're right that Bulgarian might not differ as much from grammar
fremy: Right now the spec doesn't include list of templates
polx: working in Google sheets
fremy: Exercise that seems worthwhile is to sample 100 equations from Wikipedia, and ask people to write how they would read these formulas in their own language
<Bert> https://www.w3.org/TR/2006/NOTE-arabic-math-20060131/
fremy: it's difficult to imagine without this sampling
… it will tell you which patterns are most often recurring, which will tell you the focus of intents
… and will also tell you the different ways these are described in different languages,
… will show whether your strategy will work
… and if so do you need more, e.g. you realize you need singular/plural. or male/female
… for some of the letters
… maybe then you need to say this is an attribute we may want to consider
… you will not be able to solve all the challenges in the first version
… but it would get you idea of what are the major issues
… Consider how can you cover with simplest possible approach these cases
… It's a survey also that's not too hard to run
… this will help a lot in shaping your design
polx: Also within a language, ppl will speak things differently
<fremy> fantasai: I think we could probably run the survey at a Math conference
<fremy> fantasai: and some would think of this exercise as "fun"
<fremy> fantasai: compare how they would voice a formula vs friends
<fremy> fantasai: and there would be people from all over the world in these conferences
florian: When I was in engineering school, Vietnamese students and us understood each other better in math than anything else
… they had learned to speak the notation in Vietnamese, and also learned in French
r12a: You also have to be careful, I spent 6-7 years teaching globalization
… and I would be teaching developers who spoke those languages how to develop i18n
… and they'd never apply the idea of "oh, we do this differently to how it's being implemented here"
… so you can ask them, but they might not have ever thought about it
fantasai: The advantage of fremy's question is it's very simple, don't have to think deeply about it just write down how you would read it
<fremy> fantasai: reading in your own language is easier because participants don't need to think about it
addison: There are common patterns to this, this is similar to other things that ppl have done
… so maybe we can connect you with some resources
… and have some guiding discussion to show you the kinds of things that you can
polx: One thing done in MathML 3 introducing long division
… you have an amount of ppl, asking "how do you write long division in your country"
… and found 17 different ways
… and it differs
addison: There are styles even within languages
… many ways to do the same thing, all of which are valid, just stylistic or preferential
… so have to account for those differences
r12a: Have to account for, whatever you come up with should be understood by everyone
addison: Myles will join in 5 minutes, any other things on Math?
polx: If you can send me links to experiments, would be very helpful
… indeed the design seems like it is something ppl have been doing
addison: where are you in the cycle?
polx: This is the FPWD
… so enough time to inform the design
… really a big trade-off between simplicity and explicitness
[discussion of possible survey]
fremy: If you have this presentation, how do you read it?
… might not be the preferred presentation but how do you read it
<addison> https://
<r12a> https://
CSS issues
myles: on Windows certain fonts display backslash as yen sign, so people use backslash where they mean yen
… so on macOS we have to do something to make these fonts display to the user intention
… only certain fonts or certain encodings
fantasai: do we have an idea of the best way forward
r12a: kida-san provided some recommendations
<r12a> https://
florian: can take a shortcut to talk about yen, but korean has won sign
… they appear in file paths for windows
… in asia, very familiar
… makes me wonder if kida-san's recommendation is correct, since any webpage will use Unicode 5C but expect to show yen or won or yuan
… normally characters should be different for a reason
addison: this is holdover from DOS days
… see ppl use \ as currency symbol
… I don't think modern APIs generate that often
florian: keyboards do
addison: I agree with Myles we need to solve this in a consistent manner
… because will be tricky
… because intention is lost
myles: Is there a key on Japanese keyboards for Yen or Won sign
florian: I think answer is no, you press just the one key
… backslash/yen key
… how software converts that to Unicode is maybe they do 5C
¥
florian: but there's just one key
atsushi: Some keyboards have both
… my keyboard has both
myles: Do you know if those keyboards are common?
addison: just switching to IME doesn't get you yen sign until you switch out of directed mode
… but in command shell you'll see paths displayed consistently in those locales with those symbols
r12a: What about escape codes? Do they all start with yen sign?
florian: I guess so
… but not sure, not on Windows for too long
… and this really is a Windows-ism
… it's not a Linux thing and not a Mac thing
addison: You could wish to start to repair the world
<Bert> Some photos of Japanese keyboards
addison: certainly backslashes as backslashes outside a path context
florian: If you're thinking about a Mac author writing an article about Windows, it would be fine if you don't get it automatically
<atsushi> keyboard map examples
florian: and have to work to find char for Windows users
… but if you have a machine where the font renders \ as Yen
… then won't notice the oddness
… Kida-san's advice, does it work if we can't fix the font?
… Removing tricks from fonts is nice, but fonts are already out there. Too late to fix
myles: interesting observation is that if you use ICU to convert the byte 5C from Shift-JIS encoding
… e.g. say this sequence of bytes is a Shift-JIS encoded string, and that byte is 5C
… if you then take this string as 1 byte and ask convert to UTF-8, the result that ICU produces is also 5C
… so ICU at least seems to be thinking that the encoded byte 5C in Shift-JIS is backslash rather than meaning yen sign
addison: it absolutely has to, because underneath the hood the OS expects a backslash in the path
… just a thing in East Asian OSes that the DOS fonts and later presentational fonts show paths as having the symbol in them
… I don't think it was Shift-JIS, I think it was the single-byte national code sets that had yen sign in them
… so I think that's the right behavior for a converter
… but what's happened is that everyone got used to path separators looking like currency symbol
… even though underneath the hood they're really 5C
… which is horrifying
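[Editorial aside: the conversion myles describes can be tried without ICU. The following is a minimal sketch that assumes Python's built-in `shift_jis` codec as a stand-in; it is not ICU, and other converters may treat this byte differently. It decodes the single byte 5C as Shift-JIS, re-encodes it as UTF-8, and reports whether the result is U+005C (backslash) or U+00A5 (yen sign).]

```python
# Stand-in for the ICU experiment described in the minutes, using Python's
# own shift_jis codec (assumption: this is NOT ICU, only a comparable converter).
raw = b"\x5c"                      # the single byte under discussion

decoded = raw.decode("shift_jis")  # interpret the byte as Shift-JIS text
utf8 = decoded.encode("utf-8")     # re-encode the result as UTF-8

BACKSLASH = "\u005c"               # U+005C REVERSE SOLIDUS
YEN_SIGN = "\u00a5"                # U+00A5 YEN SIGN

print(f"decoded code point: U+{ord(decoded):04X}")
print("decoded as backslash:", decoded == BACKSLASH)
print("decoded as yen sign:", decoded == YEN_SIGN)
print("UTF-8 bytes:", utf8.hex())
```

[If the converter treats 5C as a backslash, as reported for ICU, the UTF-8 output is the same single byte 5C, and any yen-sign appearance comes purely from the font.]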
fantasai: So what do we want to do here? Do we want other browsers to adopt WebKit behavior or something else?
myles: not a mode, just any time you have a particular encoding OR certain fonts, we will automatically swap out the two characters
addison: My question is, is this something one could style on or off
myles: not with a CSS property. That's one potential option, could control with a CSS property
florian: Should we have in @font-face some descriptors to tell what the font is doing?
… currently triggering WebKit behavior on several famous fonts, but could be non-famous fonts
myles: sounds reasonable
myles: also this list of names is heuristic
… if you make @font-face rule with same name, but source is a different font, that will still trigger
fantasai: I think at that point you're asking for trouble
florian: Maybe initial value of descriptor can be auto
… [missed] and trigger the right behavior
myles: This code is older than WebKit-Blink fork, and Blink doesn't have it so must have intentionally removed it
fantasai: They also aren't as focused on Mac, so maybe not as focused on that?
florian: These fonts are not on Android either
addison: These fonts are named in the stylesheet and subbed in OS
… but taking the behavior
florian: Chrome on Android should be having the same problems as WebKit on MacOS
… but Chrome removed it, possibly on purpose
<florian> fantasai: the two options we have are
<florian> fantasai: 1: remove this special behavior from webkit, and just let the font do what it does
<florian> fantasai: this will result in pages rendering very differently on Windows vs other OSes
<florian> fantasai: option 2: encode this behavior in all browsers, and possibly add some css to control it
<florian> myles: we could change our heuristic
<florian> fantasai: but something more or less like it
<florian> fantasai: we should probably take that to the CSSWG
Bert: This might also occur to other languages
florian: It happens for sure in Japan and Korea
addison: Also affects simplified Chinese, maybe also traditional
fantasai: If we standardize this, should expand to other affected languages
addison: I think limited to East Asian at least
Bert: WebKit only does Yen sign, right?
florian: Do you have equivalent heuristic for Korean, or don't do it for Korean?
myles: I've exhaustively listed our criteria
polx: Is there special behavior for French francs?
florian: There were symbols, but never intermingled with backslash in encodings
ACTION: fantasai to summarize into issue, for discussion in CSSWG
<trackbot> Created ACTION-1194 - Summarize into issue, for discussion in csswg [on Elika Etemad - due 2022-09-19].
myles: If other browsers refuse to implement, this makes our decision for us
fantasai: That's why we need to discuss on Friday
<r12a> https://
<r12a> [css-text-4] Make autospace a property, rather than a value of text-spacing #7183
https://github.com/w3c/csswg-drafts/issues/7183
r12a: I think there are advantages of splitting these two apart
… and may even be able to do some additional stuff, such as replacing normal spaces with autospacing
myles: When you say autospacing, can you describe?
r12a: in Japanese, there's typically a little bit of extra space between Japanese chars and numbers
… or between Japanese chars and Latin
… and that's something that if you put in an actual space before/after
… those spaces are too big
… and don't really belong there
… so the autospace property applies that extra spacing without having to add that spacing
… which everyone wants that
… whereas text-spacing is stretching gaps
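[Editorial aside: a minimal sketch of where the extra spacing r12a describes would apply. It only locates the boundaries between Japanese characters and Latin letters or digits; it does not render any spacing and is not the CSS algorithm. The character ranges and the name `autospace_boundaries` are assumptions chosen for illustration.]

```python
import re

# Assumption: a deliberately small notion of "Japanese character" (hiragana,
# katakana, and the main CJK ideograph block) and of "Latin or number"
# (ASCII letters and digits). Real text needs broader classes.
JAPANESE = r"[\u3040-\u30FF\u4E00-\u9FFF]"
LATIN_OR_DIGIT = r"[A-Za-z0-9]"

# Zero-width match wherever one class is immediately followed by the other.
BOUNDARY = re.compile(
    rf"(?<={JAPANESE})(?={LATIN_OR_DIGIT})|(?<={LATIN_OR_DIGIT})(?={JAPANESE})"
)

def autospace_boundaries(text: str) -> list[int]:
    """Return the string indices where a small gap would be added."""
    return [m.start() for m in BOUNDARY.finditer(text)]

sample = "東京2020大会"
print(autospace_boundaries(sample))
```

[The source text contains no space characters at those positions; the gap would be a presentational adjustment, which is the point of doing it in CSS rather than by typing spaces.]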
myles: I'm confused, what's the difference?
r12a: text-spacing applies equal amount of space
… autospacing is particular to context
… and another question of applying lots of these spaces across range of text
… about surrounding text with a bit of space on either side
… often fixed-size space
<r12a> myles, see this (read the whole section) https://
[fantasai explains what text-spacing does]
myles: transform spaces in source?
fantasai: either transform or to insert where not already there
r12a: also includes reduction of space around punctuation
… everything to do with space, rather than different types
addison: so could split different classes of mechanical spacing
… for CJK autospacing would give you for runs of non-native text
… and not affect any other spacing
r12a: Splitting it out allows you to be more specific
… apply to certain cases and not others
r12a: I wanted to throw this out there because I think there's been no movement on it
fantasai: haven't been working on Text 4 lately
myles: I don't want to comment on property split
… but our native text engine CoreText has a similar feature for Chinese and Japanese text
… where it inserts spacing
… in various places between different kana, punctuation, for Chinese and Japanese
… and it has specific rules about where that happens
… text-spacing property in Text L4 has a bunch of values which are fairly prescriptive about where space goes
… so for us, the reason that we like the auto value here is it's a way for CSS text to match the native text engine
… to get equal fidelity with native apps and webapps
r12a: i'm not arguing against an auto value
myles: If we have auto value in its own property, what would be the meaning if you specify "do autospacing" which for us would mean match platform *and* you supply different value to text-spacing in conjunction
r12a: have a read of this stuff and the description I pasted into IRC
… what I'm saying is that these are different things that involve gaps
… for different reasons and in different ways
myles: Question is what does it mean "do autospacing" and also say "text-spacing: trim-start"
r12a: I'm not sure that there's a clash there
… you're just offering content author ability to handle independently
… I don't think they overlap
fantasai: different ways of splitting the control
… text-spacing could shorthand two properties
… one for punct. vs. script boundaries
… or could have an independent property for controlling the space replacement vs. how much
… set for whole doc "how much it is" vs. turning on and off
… think about what is more ergonomic for authors
… want to control how much spacing
… could go in another level, was originally in L3, could consider in L4
… for example, underline position is separate from whether it is on or off
myles: When reviewing r12a's document, I see text about letter-spacing, initial-line punctuation, text-indent,
… want to make sure I'm not missing autospacing
r12a: autospacing I'm talking about is the spacing around alphabetic or numeric phrases
… separately is spacing around punctuation
… felt it easier to split up that way for readers
<br type=lunch duration=50min>
~* Lunch *~
New Dial-in: https://
Intros
PeterR: Peter Rushforth
PeterR: interested because of indigenous languages and making them happen in browsers
… maps is my focus, but fact finding
David: connect with Andrew Cunningham perhaps?
CSS Stuff
fantasai: color contrast discussions https://
… to be aware of
fantasai: question about whether color contrast values are affected by writing system, and how to have algos account for this
fantasai: Another unsolved issue is top metrics for non-Western scripts https://
fantasai: Related to Kaiti issue is fangsong issue https://
[discussion of what styles fall back to what]
florian: Define grasscript only over the CJK range
… because don't want it to fall back to children's handwriting font
dsinger: Maybe look at semantics of what the styles convey
… e.g. if about emphasis, translate it
florian: but what's the Khmer equivalent to writing German in Fraktur?
dsinger: That would be archaic, that's the semantic, roughly
… but this is how you emphasize in Chinese, so go to bold or italic in English
florian: we sort of used to try to do, either serif or sans-serif or cursive, but moving away from that because mapping is too hard
addison: semantically different and not 1-1 mapping
… Japanese emphasis might have bg color difference, or emphasis marks, not bold or italic
… can style em or strong to be these things
… but really different things
addison: drop-cap thing, if you try to smash everything into Western typographic, that doesn't match how fonts are structured or how the script works
<florian> fantasai: what we need to consider is that we're not going to be able to map every style of font, even in western typography
<florian> fantasai: it should not be our goal to be exhaustive
<florian> fantasai: the reason to create a new generic style is if you were using that to convey semantic differences or contrast
<florian> fantasai: we need to have css be able to fall back when the font is missing to something else that would express the same semantic contrast
<florian> fantasai: in English text, you wouldn't switch between Times and Palatino to express anything, but you might switch between italics or monospace or something. That needs preservation
<florian> fantasai: same logic should apply to chinese: if the text switches from something to grass styles to express a distinction, then we'd need it, but I suspect you won't actually find text…
<florian> fantasai: …where that is the only difference. Using a different style for a heading isn't strong enough, as there's other things that distinguish the heading.
addison: are generics about "give me a font with this type of styling generally" or ...
fantasai: They were added originally for that, but that's not what we need
… fantasy or cursive are useless because their purpose is to convey a feeling and they can't do that because such a wide variety of fonts in each category
…
fantasai: you can use lang tags to tweak generic choices
dsinger: ...
dsinger: We don't have a place to put information about shaping of certain language/writing systems
addison: If you look at Urdu vs Arabic, they have different stylistic variations
… not serif vs sans-serif
… you can look at them and say they're not really serif or sans-serif
… can smash into those buckets, or do we recognize that without changing language there are different font styles
… is it semantic thing
… I can argue both sides, it's really hard to add generics
… shaping engines work in specific ways with info we have
… some token to pass, this is what I intend
… without being able to know what fonts are installed on a machine
dsinger: I'm hearing it's inappropriate to talk about generics in Latin terminology
… so we should have names for things they do in those scripts
… but then we have a problem of translation, what does it mean in other script
florian: don't necessarily have a problem, can apply :lang() selectors to choose fonts differently
… but if we say that Kaiti is not cursive, but new, then how many such new things should we have?
… do we want to go as far as fantasai said?
… or go further, e.g. I want Humanist typeface?
… not just about adding keywords, but also browser needs to have access to the fonts *and* know which fonts map to each keyword
fantasai: I think we have two criteria
… one is what Florian mentioned, which is can we reasonably implement this generic
… other is do we need the generic in order to ensure the semantic preservation
… e.g. if these two fall back to the same font, will the text be less understandable
… nevermind whether it feels appropriate
florian: typical example would be italics in English
… if you lose the italics, you lose the fact that there was emphasis
… if you have a document which uses italics for emphasis and you fall back to normal text instead of italics, you lose information
… what are the cases in other languages?
<Bert> CSS font classification isn't based on Vox, but showing that others are struggling with classifications, too: ATypI abandoned the Vox classification and is working on new one.
<florian> fantasai: the reason for the change should be in the markup, and then you style it however you want. But it often happens that the way you style things is that the only difference is the font face, and then we need generics to be able to preserve that distinction
addison: I think I agree with you, but your test may be incorrect
… if you suppose that someone used that as the only distinction
… e.g. I've seen serif vs sans-serif, e.g. as a form of emphasis
… but you could imagine that a document that would pass your test and still say, well, the fact that the browser smashed these two styles together is because of a limitation of our ability to express in generics
<florian> fantasai: we should not introduce generics to deal with that problem just because some one-off document made a distinction, but if it is one commonly made in the language, then it calls for generics
fantasai: ...
florian: I would like to see generics for more things, but if we are going to get a more limited set, the criteria you mention are the minimum we should aim for
… I think it would still be nice to pick from general categories for preferences
… e.g. nastaliq
… But regardless, we can't just make nice keywords in specs. The browsers need to be able to map them
… if we create 500 generics in all known languages, it's not going to have good coverage and not going to be helpful
addison: it's like counter styles
… I know I want certain things, but to force everyone to implement
… if you're styling documents can use these keywords in this way, and it will do a good job of getting fonts that matches
florian: maybe can provide premade style sheets for this
… but even though I would like all to be covered by browser, if we have smaller set
… fantasai is hinting at the minimum necessary for international text to work
… nice to go beyond, but should at least start there
addison: it's about where font management is taking place
<florian> fantasai: where it gets implemented is a bit more of an open question
addison: not necessary to spec
… up to implementations
<florian> fantasai: it often happens that introducing it in CSS puts the pressure on the rest of the ecosystem to make it happen
<florian> fantasai: as the i18n WG, what we need to do is to identify the critical things that need to exist so that the earlier criteria can be handled
<florian> fantasai: just like western designers may wish to get the distinction between serif and slab serif, Arabic designers might wish for many distinctions, but that's not a priority
dsinger: if writing a document [gives example of switching font styles]
florian: if it's a one-off, that's one thing. If it's a regular pattern, need to build into CSS
<florian> fantasai: but if the common type of document wouldn't make sense on a phone because the phone doesn't show the right distinction, then that's a problem.
addison: [...]
<florian> fantasai: css should be designed in a way that as you fall back through fonts, you may lose some styling, but you shouldn't lose meaning. Whichever generics are needed to make that happen should exist
florian: Imagine we were not all familiar with Latin, and only had distinction relevant to our own language
… discussing about adding generic keywords
… as i18n, and they explain italics
… if you miss that, you'll have difficulty understanding
… if you can't preserve that you will miss information
… they have many different font styles, which is nice, but need italics vs non italics
… CSSWG knows how to introduce generics
… but doesn't know what's needed to add
… if i18n can say, in language X you will use font face changes to distinguish these different uses
… functions similar to switching to monospace or switching to italics in Latin
… if i18nwg comes back and says these 7 keywords would solve these problems, CSSWG can add them
… but i18nwg needs to find these cases
addison: Can identify here's a group of languages, and here's what they do
… forgetting about the outside world, this is how they classify fonts
fantasai: but we don't care how they classify fonts if they're not using those classifications to make distinctions within the same document
addison: there are mental classifications
… for emphasis, we've introduced different ways to style emphasis
… because obliquing things is not the way to do it
… We can describe what those all are
… but can show what the cases are and have a discussion of where the bar should be
… before we take the plunge and introduce a new generic
… or should we do interstitial work that's separate
<florian> fantasai: nastaliq vs kufi is not going to be a distinction used within a document to contrast things. Would be nice to have, but not critical for understanding
[discussion of kaiti vs non-kaiti being used similar to italic vs non-italic]
fantasai: I think the problem with classifying kaiti as cursive would be that if you ask for kaiti, you might get grasscript which would be totally inappropriate
florian: Would be like asking for monospace to express code and falling back to Zapfino
… the contrast would be there, but what it means is lost
… falling back to monospace would be better
dsinger: in this document, use the font as distinction, and in other as stylistic
… what do we do in that case
… want both documents to be readable at least
florian: problem of mapping fonts to categories, browser can do it if we introduce 3 new keywords; but not if we introduce 50
… a handful (worldwide), they can do it and it will be usable
… if instead of 3 (ignoring the 2 useless ones) we had 8 or 9, would be manageable
… if we are asking for 50, will not be implemented
… so what are the few extra ones that are critical for understandability?
florian: can we action i18n to find the cases where font face category switches are needed for understanding common documents?
addison: other challenge is we'll not find a global generic
… we'll find a set of traditions over here with Kaiti, over there with another one, etc.
… will find islands of variations
florian: that's fine
David-Clarke: Would things like old-fashioned/modern/etc be types of categories to look at?
fantasai: no, because that's just a stylistic preference
florian: The distinction here is critical to have for understanding documents, vs stylistic preferences
ACTION: addison: follow up with r12a and others about gap analysis for font generics
<trackbot> Created ACTION-1195 - Follow up with r12a and others about gap analysis for font generics [on Addison Phillips - due 2022-09-19].
<florian> Florian: the distinction between old style or modern isn't a wrong one, but it isn't a critical one in the sense that both aren't commonly used in the same document to contrast two pieces of text
Triage
https://
https://
https://
fantasai: some kind of overview might make sense to me
… lot of details handled in there, baseline alignment not just in one writing system, but when mixed
… and this section tries to account for all of that
overview of baselines in CSS at https://
fantasai: discussion of text-spacing and adding rules to handle non-fullwidth punctuation https://
ACTION: atsushi: follow up with jlreq on csswg#6091 to see if non-CJK enclosing punctuation should be included in space-trimming
<trackbot> Created ACTION-1196 - Follow up with jlreq on csswg#6091 to see if non-cjk enclosing punctuation should be included in space-trimming [on Atsushi Shimono - due 2022-09-19].
https://
https://
fantasai: review miriam's comment linked above and convince csswg about direction
https://
[addison explains how lang tags for undetermined language work]
<florian> conclusion 1: :lang("") matches lang=""
<florian> conclusion 2: :lang("*") matches everything but lang=""
<florian> conclusion 3: maybe add a note about lang="und" and lang="" being treated distinctly, despite having similar semantics
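[Editorial aside: the conclusions above can be sketched as a small matching function. This is a simplified illustration under stated assumptions, not the Selectors matching algorithm: the name `lang_pseudo_matches` is hypothetical, and ranges other than `""` and `"*"` use plain case-insensitive prefix matching on subtags.]

```python
def lang_pseudo_matches(language_range: str, element_lang: str) -> bool:
    """Sketch of :lang(range) against an element's lang value.

    Covers only the conclusions recorded in the minutes; everything else
    is a simplifying assumption.
    """
    if language_range == "":
        # Conclusion 1: :lang("") matches lang=""
        return element_lang == ""
    if language_range == "*":
        # Conclusion 2: :lang("*") matches everything but lang=""
        return element_lang != ""
    # Assumption: simple prefix match, e.g. "ja" matches "ja" and "ja-JP".
    wanted = language_range.lower()
    actual = element_lang.lower()
    return actual == wanted or actual.startswith(wanted + "-")

# Conclusion 3: lang="und" and lang="" are treated distinctly.
print(lang_pseudo_matches("", ""))      # empty range against lang=""
print(lang_pseudo_matches("", "und"))   # empty range against lang="und"
print(lang_pseudo_matches("*", "und"))  # wildcard against lang="und"
print(lang_pseudo_matches("*", ""))     # wildcard against lang=""
```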
ACTION: florian to reread issue, and if conclusions still make sense in the end, post as the proposal
<trackbot> Created ACTION-1197 - Reread issue, and if conclusions still make sense in the end, post as the proposal [on Florian Rivoal - due 2022-09-19].
AOB?
<Meeting adjourned for the day at 15:40>