Internationalization Working Group - TPAC 2025

Meeting minutes

Introductions

<addison> Richard Ishida

Martin Dürst

Eemeli Aro

Florian Rivoal

Bert Bos

Fuqiao Xue

Daisuke Shiohara

Ryusei Saijiki

Bobby Tung

Addison Phillips

https://www.w3.org/events/meetings/86ea031d-776b-426e-aa2a-bdf6ba6d50af/

Agenda Parking Lot

<r12a> oops

addison: the parking lot is just requests that we have

DOM localization

eemeli: a couple of different items on dom localization
… should this be a thing that should be done?
… and if we do would this be going into the html spec eventually?
… separately from that where do we work on and incubate and standardize the representation of a message resource as a file format?
… does not belong in the html spec
… is this group a right place to incubate some or all parts of this?
… or should that technically be happening somewhere else?
… it is sounding to me like it would be much more beneficial to talk about dom l10n more in depth after tomorrow's breakout session

florian: i would appreciate a 5-minute intro

addison: i would suggest maybe if this is necessary do it a little later
… there is no way to localize a web app built-in
… everybody rolling their own little l10n thing
… eemeli and i have been working on MF2 for a while
… would like to see MF2 be a native participant in the web

r12a: tomorrow morning there's also the wcag and non-latin language breakout
… i can go to that
… i would be useful there
… i don't know whether anybody else needs to be at the wcag one

addison: i suspect it's going to take more than one conversation

IRI vulnerability, IRI status in general (cf. RFC3987, WHATWG URL)

addison: the errata piece is a short conversation
… the larger conversation is we haven't finished this work
… WHATWG URL etc.

r12a: the ICANN UA Expert Group is looking at what standards need to be addressed
… you came up obviously with the IRI stuff
… just FYI

eemeli: @@1

eemeli: is there any representation of what is missing from the URL spec for it to be a full successor to the IRI spec?

martin: i think it depends on probably for web browsers there's not much that is missing
… on the other hand there are things like how exactly should bidi work?
… that's a very difficult problem

addison: I know mark davis is working on linkification
… if that were to turn to a standard of some sort
… you would want harmony on that

martin: it's very clearly a problem
… if there's an easy solution somebody would easily do it
… but the problem is that there's no solution
… and in browsers @@ non-ASCII copy that out it turns into percent escaping
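[Editorial illustration, not part of the discussion] The percent escaping mentioned above can be sketched with Python's standard library. The path below is a made-up example, and this only shows the generic UTF-8 percent-encoding step, not what any particular browser does when copying from the address bar.

```python
from urllib.parse import quote, unquote

# Hypothetical IRI path containing a non-ASCII character.
iri_path = "/wiki/Dürst"

# Percent-encode the UTF-8 bytes of each non-ASCII character; "/" is left as-is.
uri_path = quote(iri_path, safe="/")
print(uri_path)  # /wiki/D%C3%BCrst

# Decoding restores the readable form.
print(unquote(uri_path) == iri_path)  # True
```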

addison: the address bar is a special place
… the challenge is that the address bar is not the only place where urls need to go

martin: it's more like a UI issue

eemeli: feels like the easiest here is to consider all of the slashes to be directionally ltr
… and to break up parts according to those

martin: the IRI spec currently says something but browsers do it a little bit differently

addison: the url standard has some text about presentation, maybe not complete

eemeli: can we identify the pieces that ought to be added to the url standard so that we could possibly even deprecate the iri spec?
… martin would you be interested in putting something on a list of what is missing from the URL standard?

martin: i can do that

ACTION: martin: create a list of gaps in URL standard

<gb> Cannot create action. Validation failed. Maybe martin is not a valid user for w3c/i18n-actions?

ACTION: duerst: create a list of gaps in URL standard

<gb> Cannot create action. Validation failed. Maybe duerst is not a valid user for w3c/i18n-actions?

ACTION: addison: remind @duerst to create a list of gaps in URL standard

<gb> Created action #196

<gb> Action 196 remind @duerst to create a list of gaps in URL standard (on aphillips, duerst) due 2025-11-17

CSS

addison: we should think about how best to engage

https://github.com/w3c/i18n-activity/issues?q=is%3Aissue%20state%3Aopen%20label%3Awg%3Acss ==> 156 open

addison: i think that meeting at one time was very helpful

https://github.com/w3c/i18n-activity/issues?q=is%3Aissue%20state%3Aopen%20label%3AAgenda%2BI18N%2BCSS

addison: we have shared interest

florian: i think the problem is not shared interest
… my lack of attendance of our sync up meetings
… i don't believe i can realistically be that champion
… tho i wish i could

addison: as long as css uses some mechanism like our agenda+ tag
… or something to say this one is currently active and so interaction would be useful

r12a: i18n is part of the architecture , not an add-on

florian: i completely agree with that
… i don't think there is a general neglect of the i18n aspects
… but i18n questions can be of various levels of complexity

addison: we do tend to notice when there is action on something
… at least i'm looking for new pending issues
… we'll need to make sure joel gets that message
… there are some higher level things like physical versus logical
… on our radar for multiple years
… needs to get done

r12a: it needs to be done because customers for css are people from all around the world
… one thing we could look at is there's a champion in the csswg for i18n
… the champion doesn't have to know anything in great detail about i18n
… but they need to be aware when those discussions need to take place
… and they need to engage discussion

florian: when we design something new do we take i18n considerations into account properly? i would say the answer is yes we do
… logical vs physical thing is the language as a whole has a gap
… maybe it's insufficient attention

r12a: for me the problem is not so much the detailed work
… but having somebody monitoring what's happening and trying to facilitate discussions
… we don't have any meetings scheduled anymore

addison: we didn't accomplish anything

r12a: it's not "let's do extra i18n things"
… it's a case of making sure that CSS meets the needs of the world
… it's a part of the process of building CSS

Requirements for the layout of rosters

w3c/clreq#268

<gb> Issue 268 Requirements for the layout of rosters (by xfq) [未來工作/future] [i:justification]

xfq: common in traditional media, alignment by column
… aligned in three characters, and for two character cell there is space in middle
… how to do this with CSS

xfq: dot in second line connects two, in half size
… for third column it should align at the first

florian: for current CSS, text-justify
… justify by name but not by character, names need to be marked up with a span or similar

florian: flexbox might also work potentially

xfq: ideally alignment should not be done by ideographic character, but by system spacing

r12a: names should have minimum length, and to be aligned by system

florian: I think you can get close to this with either flex or grid
… either will have different shortcomings

martin: maybe clreq can see how far they get with grid/flex
… and CSSWG can help

r12a: what would help is if you could put the actual text for that black box in the github issue

xfq: I can do that

CSS

https://github.com/w3c/i18n-activity/issues?q=is%3Aissue%20state%3Aopen%20label%3Awg%3Acss

Ruby

r12a: i want to show people what we have in terms of finding issues related to a particular language such as japanese and ruby and so on

<r12a-again> https://www.w3.org/International/

<r12a-again> https://www.w3.org/TR/typography/

r12a: this is the i18n homepage
… if you look under language enablement there's a link called language enablement index
… if you scroll down that page

<r12a-again> https://www.w3.org/TR/arab-lreq/#vertical_text

[r12a shows the page]

<r12a-again> https://www.w3.org/TR/jpan-lreq/#inline_notes

<r12a-again> https://www.w3.org/TR/hani-lreq/#inline_notes

<r12a-again> https://www.w3.org/TR/kore-lreq/#inline_notes

<r12a-again> https://www.w3.org/TR/mong-lreq/#inline_notes

florian: i know that there is a wealth of interconnected information in these sets of pages
… it's never quite been clear to me on when i'm supposed to go there

r12a: if you want to check reqs go the reqs page
… if you want to check tests go to the tests page

Fonts

bobby: limitations about local fonts in CSS
… cannot use system fonts like Kai and Fangsong
… important in Chinese typography
… used for emphasis etc.
… some system fonts only have one weight
… we do not use italics
… we cannot use synthetic oblique fonts
… we change typeface for emphasis
… if there's no way to use a local font
… there is no way to indicate emphasis
… we talked about this in clreq calls
… finally we have a new generic() function in CSS fonts L4
… but I don't know when it will be implemented in browsers
… that's the problem
… it's related to the CSS-i18n champion issue we just talked about

bobby: we have documented this in clreq

xfq: they are documented in the gap analysis

eemeli: i can forward this to the right people

r12a: who should we talk to?

eemeli: Henri
… does quite a bit of work with characters

<atsushi> xfq: we had discussed within i18n, and have more than 10 trackers on this

<atsushi> r12a: let's take this up again when florian is back

<atsushi> xfq: for ruby, Murata-san is working on it, and will join tomorrow?

Glossary and the normative approach

https://github.com/w3c/i18n-glossary/pulls

<atsushi> xfq: have discussed this glossary for a while, ready to merge?

w3c/i18n-glossary#95

<gb> Pull Request 95 Update the definition of 'Mojibake' (by xfq)

<atsushi> xfq: #95 for Mojibake

https://github.com/w3c/i18n-glossary/pull/95/files

<gb> CLOSED Action 95 write endorsement of html ruby markup extensions (on aphillips) due 2024-05-02

[xfq introduces the PR]

martin: it's not an issue of encoding, but an issue of decoding

bobby: we can still find some old web pages on the web that use shift-jis and when you open it with modern browsers
… they decode it with utf-8
… and you cannot read anything
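[Editorial illustration, not part of the discussion] The decoding mismatch described above can be reproduced in a few lines. This is a minimal sketch: the sample string is an arbitrary choice, and `errors="replace"` is an assumption standing in for how a lenient decoder substitutes U+FFFD for invalid bytes.

```python
# Sample text (arbitrary choice for the illustration).
original = "文字化け"

# Bytes as a legacy page might have stored them.
legacy_bytes = original.encode("shift_jis")

# Wrong decoder: these bytes are not valid UTF-8, so each invalid
# sequence is replaced with U+FFFD and the text becomes unreadable.
garbled = legacy_bytes.decode("utf-8", errors="replace")

# Right decoder: the same bytes read back correctly.
recovered = legacy_bytes.decode("shift_jis")

print(garbled)
print(recovered == original)  # True
```

The bytes themselves are intact in both cases; only the choice of decoder differs, which matches the point that this is a decoding problem rather than an encoding one.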

r12a: tofu is a different thing
… lack of glyph

martin: or lack of fonts

r12a: how to say mojibake in chinese?

bobby: 乱码

Luànmǎ

乱 means disorder, confused

码 means encoding

<bobby> https://zh-yue.wikipedia.org/wiki/亂碼

xfq: specdev uses Mojibake
… i don't think any other spec uses it

r12a: so it's an informative term and we can have Luànmǎ too
… maybe it's not so important for the specs

w3c/i18n-glossary#89

<gb> https://github.com/w3c/i18n-glossary/pull/89

[xfq introduces the PR]

ok to merge

w3c/i18n-glossary#88

<gb> Pull Request 88 Update the definition of 'Bidirectional isolate' and 'Bidi isolation' (by xfq)

<atsushi> [xfq shows on screen for discussion on text/change in PR]

ok to merge

https://github.com/w3c/i18n-glossary/pull/91/files

<gb> Pull Request 91 Update the definition of 'First-strong detection' (by xfq)

martin: maybe say something like first-strong is used when auto is set

xfq: i can add a link

r12a: "then uses that to guess at the appropriate base direction for the string as a whole" is missing from the new def

martin: "guess" is the core here
… it should be used when the directionality is not known yet
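[Editorial illustration, not part of the discussion] A minimal sketch of the first-strong heuristic, assuming the bidi classes reported by Python's `unicodedata` module. The function name is hypothetical, and the sketch deliberately ignores the isolate-skipping rules of the full Unicode Bidirectional Algorithm, so it is only a guess at the base direction, in the sense discussed above.

```python
import unicodedata

def first_strong_direction(text):
    """Guess a base direction from the first strongly directional character.

    Returns "ltr", "rtl", or None when no strong character is found.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return None

print(first_strong_direction("123 שלום hello"))  # rtl
print(first_strong_direction("hello שלום"))      # ltr
print(first_strong_direction("123 !?"))          # None
```

Digits and punctuation are not strong, so the first example is classified by its first Hebrew letter even though the string begins with ASCII digits. A caller would need its own fallback for the `None` case.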

Open issues and PRs

<atsushi> xfq: jumping into pending issues

w3c/i18n-drafts#701

<gb> Pull Request 701 Update qa-i18n (by xfq)

<atsushi> xfq: raised PR while ago, adding line to list of i18n targets

martin: "Keyboard usage" to "Keyboard layout and usage"

r12a: "Accessibility requirements" is too vague
… maybe things like "readability requirements" and "legal requirements"
… "script-specific readability requirements"

w3c/i18n-drafts#702

<gb> Pull Request 702 Add a brief mention of security issues (by xfq)

https://deploy-preview-702--i18n-drafts.netlify.app/questions/qa-escapes.en.html#security

eemeli: "inserting it into HTML"
… from a reader's point of view there's a little ambiguity of what "inserting it into HTML" means
… the way you're using it is correct
… but it is easy to misunderstand

r12a: and it's only the syntax characters
… if i say hello in another language
… you don't need to escape it

https://github.com/w3c/i18n-drafts/pull/705/files

<gb> Pull Request 705 Use "text content" instead of "content" (by xfq)

r12a: we can remove "There are many character encodings to choose from."

Mention layout mirroring for bidi

w3c/bp-i18n-specdev#163

<gb> Pull Request 163 Mention layout mirroring for bidi (by xfq)

https://deploy-preview-163--bp-i18n-specdev.netlify.app/#typ_bidi_styling

<Bert> xfq: There were some comments and I made a pull request ^^

eemeli: "It should preferably automatic" -> "It should be preferably automatic"

<Bert> eemeli: Typo: missing "be"

<Bert> martin: what languages use sloping in both directions (in section 9.4)?

<Bert> r12a_: In Hebrew it is a choice

Add some best practices to string-search

w3c/string-search#28

<gb> Pull Request 28 Add some best practices (by xfq)

<r12a_> https://r12a.github.io/scripts/hebr/he.html#fontstyle

<r12a_> https://r12a.github.io/scripts/arab/arb.html#letterforms

<Bert> xfq: This is about string searching. I added that UAs should by default offer case-insensitive searching, using Unicode case folding.
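[Editorial illustration, not part of the discussion] A minimal sketch of case-insensitive matching with Unicode case folding, using Python's built-in `str.casefold()`. The function name is hypothetical, and the sketch applies no normalization, which a real implementation would probably also want.

```python
def contains_caseless(haystack, needle):
    """Substring test after applying Unicode full case folding to both sides."""
    return needle.casefold() in haystack.casefold()

# Case folding maps "ß" to "ss", so the upper-case query matches.
print(contains_caseless("Die Straße ist lang", "STRASSE"))  # True

# Simple lower-casing leaves "ß" alone, so the same query fails.
print("STRASSE".lower() in "Die Straße ist lang".lower())   # False
```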

<Bert> eemeli: Section 5.18 of Unicode 17

<Bert> xfq: Current string search document doesn't refer to Unicode directly, but does point to charmod-norm, which does.

<Bert> ... That might be enough.

<Bert> eemeli: Maybe better to link directly and reduce need for clicks.

<Bert> ... Another typo: s/forms forms/character forms/

<Bert> r12a_: Or maybe just characters, instead of character forms.

<Bert> xfq: Another patch is to add ‘User agents MAY normalize numeric values to their ASCII forms (0-9) in string searching operations.’

<Bert> eemeli: Is that about characters that represent numbers?

<Bert> xfq: Yes
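[Editorial illustration, not part of the discussion] One possible reading of the proposed "normalize numeric values to their ASCII forms" is to fold decimal digits only. The sketch below assumes that reading; the function name is hypothetical, and characters that carry a numeric value without being decimal digits (Roman numerals, for example) are left untouched.

```python
import unicodedata

def fold_decimal_digits(text):
    """Replace every Unicode decimal digit with its ASCII equivalent (0-9)."""
    out = []
    for ch in text:
        # decimal() returns the digit value, or the default for non-digits.
        value = unicodedata.decimal(ch, None)
        out.append(str(value) if value is not None else ch)
    return "".join(out)

print(fold_decimal_digits("٢٠٢٥"))  # 2025 (Arabic-Indic digits)
print(fold_decimal_digits("२०२५"))  # 2025 (Devanagari digits)
print(fold_decimal_digits("Ⅻ"))     # Ⅻ (not a decimal digit, unchanged)
```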

<Bert> hsivonen: Is this document normative?

<Bert> ... Performance difference on long documents for collator-based search.

<hsivonen> https://docs.google.com/document/d/1nUCQxSCCIdfBas5l-jGu58O38FaCLuvlsBFAjvXrgNM/edit?tab=t.0

<Bert> ... There is a request to me to write about this. Haven't written it yet. Let me paste something ^^

<Bert> ... Firefox doesn't do some things from this list.

<Bert> xfq: I'll read through that document.

<Bert> hsivonen: Firefox probably doesn't want to add a checkbox to fold numbering systems, but it could be treated as an accent difference.

<Bert> xfq: That's why I wrote ‘may’.

<Bert> ... as in ‘MAY provide an option for diacritics-sensitive search’

<Bert> r12a_: About ASCII digits: also search for the value if the number is not decimal?

<Bert> eemeli: It is a ‘may’

<Bert> ... Allow implementers to think about what can be done.

<Bert> r12a_: I've been searching a lot and keep finding things that I don't want. Such as finding é when I really want e.

<Bert> eemeli: In many languages, letters with accent are really different and you don't want to mix them.

<Bert> xfq: Maybe the ‘should‘ in the diacritics rule should be a ‘may’ then.

<Bert> r12a_: For me that applies to digits, too: I often do want to search for the character, not for anything with that value.

<Bert> hsivonen: Accent-sensitive search per language, e.g., for Finnish.

<Bert> ... I have only once seen a complaint about that.

<Bert> ... That is what Firefox does.

<Bert> ... Chrome and Safari search accent-insensitive.

<Bert> ... But if your UI language is Finnish or Swedish, then accents that are analyzed to form a separate base letter are not ignored.

<Bert> ... Need documenting what the cases and languages are. I've been asked to write it up, but haven't done so yet.

<Bert> eemeli: So we should change the ‘should’ (in ‘SHOULD ignore diacritics’) to a ‘may’.

<Bert> xfq: UAs may provide different UIs.

<Bert> Looking at example in the spec of Dürst vs Duerst for German.
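[Editorial illustration, not part of the discussion] A minimal sketch of diacritic-insensitive folding with a per-language exception list, in the spirit of the behaviour described above. The function name and the `FINNISH_LETTERS` set are assumptions made for the example, not a description of what any browser actually ships.

```python
import unicodedata

def strip_diacritics(text, preserve=frozenset()):
    """Drop combining marks, except on letters listed in `preserve`."""
    out = []
    for ch in unicodedata.normalize("NFC", text):
        if ch.lower() in preserve:
            out.append(ch)  # treated as a separate base letter
            continue
        decomposed = unicodedata.normalize("NFD", ch)
        out.append("".join(c for c in decomposed
                           if unicodedata.category(c) != "Mn"))
    return "".join(out)

# Illustrative assumption: letters a Finnish UI might keep distinct.
FINNISH_LETTERS = frozenset("\u00e5\u00e4\u00f6")  # å ä ö

print(strip_diacritics("Dürst"))                             # Durst
print(strip_diacritics("Hämäläinen café", FINNISH_LETTERS))  # Hämäläinen cafe
```

Stripping marks turns "Dürst" into "Durst", not "Duerst", so an equivalence like the German one in the example needs language-specific rules beyond this kind of folding.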

<Bert> Introduction of some observers: Nicolò, Andreu and Itoe

DOM localization

<Bert> eemeli: Breakout about this tomorrow.

<Bert> ... Pretty easy to get 90 or 95% of the way.

<Bert> ... But the last few percent can be hard, depending on the model you start with.

<Bert> ... We have quite a bit of experience with localization of UI/UX.

<Bert> ... Localization is more than translation.

<Bert> ... Automated translation is pretty decent these days, but it is different for words in a UI. What does ‘accept’ on a button mean?

<Bert> ... Still need human translators.

<Bert> ... Goal is to have the web platform support localization, i.e., HTML.

<Bert> ... Compare how CSS is attached to elements.

<Bert> ... A lot of work has been done in Unicode on MessageFormat.

<Bert> ... What does a single message look like? How do you format it?

<Bert> ... We need an imperative way to do localization, as well as a declarative way.

<Bert> ... This needs work in HTML, but also work on a file format for holding the information. JSON probably not good enough.

<Bert> ... There are various formats in use, including JSON or XML-based.

<Bert> ... Question for this group is how much of the incubation for this should happen here?

<Bert> ... Firefox has a lot of experience with this and we have a system for building the frontend this way.

<Bert> florian: It says ‘DOM’ localization. You mean a system where the localization happens in one document, with one URL?

<Bert> eemeli: There is no single correct solution. Can be via a URL or some other state.

<Bert> florian: You mentioned the Firefox UI, which doesn't have a URL.

<Bert> eemeli: Not a visible URL, but internally it has a similar identifier.

<Bert> florian: So also for different versions of a local document, as for an app?

<Bert> eemeli: Yes.

<Bert> ... Breakout session is tomorrow morning.

CSS

Ruby

r12a: there's markup and CSS for ruby
… ruby is used in Chinese and Japanese
… Korean and Mongolian a little bit
… a couple of years ago, you had a bit of money to develop an HTML add-on spec
… it'd be great to know what the progress is

[florian talks about the funding issue]

florian: my plan is this month to chase up what has happened during the horizontal review
… which is a little bit from i18n and not from anyone else
… my understanding is that firefox implements all that we have in the spec
… amazon kindle implements some of what we have in this spec
… the part that would be pushed to a level 2
… it's basically the rtc part of the markup
… and the multi-layered ruby
… within this month or so enough work to actually call for CR
… on the CSS side of things there remains plenty of work to do
… there's diminishing returns to working on the spec
… before impls catch up
… the CSS spec pretty much picks the html set extension we've been talking about
… i don't know how productive it is if you're too far ahead of the impls

r12a: i don't think we can expect much movement on the impls before we have the markups
… we have it in draft form, it is published as a WD

https://github.com/w3c/i18n-activity/issues?q=is%3Aissue%20state%3Aopen%20label%3As%3Ahtml-ruby-extensions

r12a: would it be possible to create L1 and L2 at the same time?

florian: yes

[Discuss how and when to draft L2]

florian: for example, for rtc, my plan will be to leave it in L1, marked at risk until we're forced to try to go to REC

[Discuss if we ever want to go to REC]

florian: while we have 2 impls
… one of them is not a browser
… that means this text will not be accepted by whatwg
… 2. we have some maintenance work to do on it
… tts representation of ruby
… will likely need some additional attributes with some new values
… we would offer pull requests against the HTML spec to keep that subset in sync

r12a: it just seems to me that the going to REC bit adds extra time and effort
… and knocks things out of the spec that haven't been implemented
… we can do CR and push implementers to implement it
… then it just makes life simpler

r12a: rtc is not that much more to do
… it's in the parser already

[Discuss how to test it in Amazon Kindle]

CSS

bobby: four Chinese typeface styles
… Chinese do not have italics
… we change typefaces for emphasis
… like switching between Hei (sans-serif) and Kai
… we can list all Kai system fonts
… that's stupid, but works
… CSS fonts L4 introduced a generic() function

xfq: @@2

florian: the font fingerprinting problem is more than tricky
… it's hard

r12a: I seem to remember we were getting closer

florian: I think we were getting pretty close in terms of allowing people to do various things
… some of which might be the right one

w3c/csswg-drafts#11775

<gb> Issue 11775 [meta][css-fonts-4] Index of local font issues: fingerprinting, I18n, privacy (by svgeesus) [css-fonts-4] [i18n-tracker] [meta] [privacy-tracker]

<bobby> https://en.wikipedia.org/wiki/CJK_Unified_Ideographs_Extension_G

bobby: another case
… Unihan Extension G
… very recent new block in Unicode
… the Jigmo font has glyphs from extension G
… but if it's a local font
… Safari can't load it
… and it's a large font

<xfq> I made a demo a while ago: https://xfq.github.io/large-webfont/

florian: if we want to talk about multiple things with CSSWG, probably do not start with this issue
… it will consume all the time

w3c/csswg-drafts#11257

<gb> Issue 11257 [css-text-decor] Control the line height / proximity of text containing emphasis marks (by xfq) [css-text-decor-3] [css-text-decor-4] [i18n-needs-resolution] [i18n-jlreq] [i18n-clreq] [i18n-klreq] [i18n-mlreq]

w3c/csswg-drafts#10844

<gb> CLOSED Issue 10844 [css-overflow] Line-clamp and approaches to ellipsis insertion (by frivoal) [css-overflow-4] [Closed Accepted by CSSWG Resolution] [i18n-tracker] [Needs Testcase (WPT)] [i18n-jlreq] [i18n-alreq] [topic: line-clamp]

Andreu: CSSWG #10844

<gb> Issue 10844 not found

Andreu: this is a closed issue
… but I do not agree with Addison's comments in the issue

florian: I'll introduce line-clamp
… there already exists something in CSS which people often confuse this with
… we're not talking about the thing that lets you add a dot dot dot at the end of a line
… that exceeds its box
… when a line is too long and it overflows in the inline direction

florian: we have solid agreement between the i18n and CSS WGs that the chopping should happen logically not physically
… when we're doing this in multiple lines
… but
… the removal of extra content to make room for the ellipsis is logical
… but physically where does the ellipsis go?

Andreu: the ellipsis indicates that the text is truncated
… does it indicate that the embedding level is truncated or does it indicate that the paragraph level is truncated?
… I showed several examples to Arabic and Hebrew speakers
… including multiple nested levels of Hebrew and English
… they did seem to agree that it would be better to place the ellipsis at the visual end of the line
… the way the CSSWG has resolved on this is in agreement with what Andreu wants to do
… I don't think it is in conflict with what i18n WG has said as a formal resolution
… however
… the last comment that Addison left
… seems to suggest another way

eemeli: I think if you've got user research, even if it's informal
… that is strongly indicative that speakers think paragraph level makes more sense
… that sounds very believable to me
… this feels like a thing that what the humans expect does not necessarily match what logic might dictate
… or you can argue the logic either way

Andreu: I was trying to implement Addison's suggestion in Chrome
… this is completely alien to the way that Chrome or other browsers do things
… because it's just at the wrong level
… at the wrong place in the layout stage

Bert: it depends on what kind of symbol you use
… ellipsis vs arrow
… if you end with a hyphenated word

r12a: be careful when you're saying hyphenation
… do you mean words with hyphens in between
… or do you mean end of line?

florian: currently we use the same logic as what we use for line breaking
… we're trying to reuse the existing mechanism of CSS
… avoid reinventing them poorly

Andreu: in my impl
… I had just been assuming that you can compute the answers ahead of time

Bert: hanging ellipsis?

florian: separate question

eemeli: if you're a human dealing with this
… @@2

r12a: Arabic language does not use hyphenation
… but Arabic script used for Uyghur
… you'll find lots of hyphenation

r12a: Persian doesn't

w3c/hlreq#8

<gb> Issue 8 Hebrew Hyphen (by r12a) [i:segmentation] [s:hebr]

breakout sessions

r12a: wcag is trying to create readability guidelines
… like leaving a certain amount of spacing between lines
… it works for english
… but not necessarily for other scripts
… they put together a task force that is looking at how they can extend wcag guidelines so that it meets the needs of people who use different scripts
… they're struggling a bit in terms of how they're gonna capture that info
… they've tried to choose 5 scripts
… latin, cyrillic

– DRAFT –
Internationalization Working Group - TPAC 2025

09 November 2025
